Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upSearch for recursive is too slow (unusable). Look at helm-projectile-ag or ripgrep-regexp for fast search samples #2
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 13, 2018
Owner
Hi,
Actually I thought the performance for recursive was decent for running inside Emacs. It loads files in raw mode to avoid loading major modes, and it avoids processing any files and directories it shouldn't. I tested it on the org-mode repo which has a lot of files, and I was satisfied with the performance even on a relatively slow system.
But it could definitely be faster. I initially wanted to avoid calling external processes so we could use Emacs's major mode syntax tables to match comment delimiters for each file type automatically. That works fine for already opened files, but finding unopened files in raw mode defeats that. However, I do have a few ideas about how we might get the syntax tables without paying the penalty of loading the whole major mode...
Another idea would be to use async.el to run the code in another Emacs process, and then insert the section in a callback. The speed might not improve (might be worse in large repos), but the status buffer wouldn't be delayed in loading.
Having said all that, I'm certainly open to the idea of alternative backends that use programs like rg. As long as we aren't using the syntax tables, it doesn't matter much. It would be good to maintain an Emacs native one as a fallback, so users wouldn't have to install an external tool.
So, thanks for the suggestions and example PR. I'll see what I can do. :)
|
Hi, Actually I thought the performance for recursive was decent for running inside Emacs. It loads files in raw mode to avoid loading major modes, and it avoids processing any files and directories it shouldn't. I tested it on the org-mode repo which has a lot of files, and I was satisfied with the performance even on a relatively slow system. But it could definitely be faster. I initially wanted to avoid calling external processes so we could use Emacs's major mode syntax tables to match comment delimiters for each file type automatically. That works fine for already opened files, but finding unopened files in raw mode defeats that. However, I do have a few ideas about how we might get the syntax tables without paying the penalty of loading the whole major mode... Another idea would be to use async.el to run the code in another Emacs process, and then insert the section in a callback. The speed might not improve (might be worse in large repos), but the status buffer wouldn't be delayed in loading. Having said all that, I'm certainly open to the idea of alternative backends that use programs like rg. As long as we aren't using the syntax tables, it doesn't matter much. It would be good to maintain an Emacs native one as a fallback, so users wouldn't have to install an external tool. So, thanks for the suggestions and example PR. I'll see what I can do. :) |
alphapapa
self-assigned this
Jun 13, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 13, 2018
Contributor
Thanks, and nice work! I'm using with my branch change, as I have rg with the -b flag and even on massive repositories (non-Emacs projects) there is no lag time for the magit window to pop up - so nice to have all the TODOs categorized in one spot (and reminding me to make them TODONEs each time I do a commit).
|
Thanks, and nice work! I'm using with my branch change, as I have rg with the -b flag and even on massive repositories (non-Emacs projects) there is no lag time for the magit window to pop up - so nice to have all the TODOs categorized in one spot (and reminding me to make them TODONEs each time I do a commit). |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 13, 2018
Owner
Cool! BTW I'm curious, how many files are being scanned in those massive repos?
|
Cool! BTW I'm curious, how many files are being scanned in those massive repos? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 13, 2018
Contributor
I am pretty sure the magit-todos.el as is was not ignoring "node_modules" even when I added it in the list (possibly a separate issue). If that was the case, it was scanning 27,000 files.
With the directory ignored (such as how rg is ignoring it), still 1200 files (pre-edit accidentally listed .git file count as well
|
I am pretty sure the magit-todos.el as is was not ignoring "node_modules" even when I added it in the list (possibly a separate issue). If that was the case, it was scanning 27,000 files. With the directory ignored (such as how rg is ignoring it), still 1200 files (pre-edit accidentally listed .git file count as well |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 13, 2018
Owner
Yikes, yeah that would definitely be slow. I think directory exclusion works correctly, but maybe I broke something, or maybe I already fixed it after you cloned the repo. Please let me know if you notice any other issues.
|
Yikes, yeah that would definitely be slow. I think directory exclusion works correctly, but maybe I broke something, or maybe I already fixed it after you cloned the repo. Please let me know if you notice any other issues. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 13, 2018
Contributor
Another fun idea I just pushed up - splitting each 'type' into its own magit heading (TODOs FIXMEs etc.) - this could be useful to have the codebase grep on anything the user wants categorized/grouped, instead of just TODOs, but may be growing the scope of this project (you could have magit summarize and group ticket references etc.).
|
Another fun idea I just pushed up - splitting each 'type' into its own magit heading (TODOs FIXMEs etc.) - this could be useful to have the codebase grep on anything the user wants categorized/grouped, instead of just TODOs, but may be growing the scope of this project (you could have magit summarize and group ticket references etc.). |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 13, 2018
Owner
That's an interesting idea. I guess as long as they were grouped under a top-level section, that could be useful. I wouldn't want each type to have its own top-level section. I don't think I'd want that to be the default behavior, though.
I think that, instead of adding that feature, I'd prefer to factor out the insertion code and let users call a custom function.
An idea I had is to experiment with aligning the filenames and/or keywords, as in a big list it can look jagged. That's not necessarily a problem, as filenames with different lengths essentially create their own visual grouping, but some users might prefer a tidier look. What do you think?
|
That's an interesting idea. I guess as long as they were grouped under a top-level section, that could be useful. I wouldn't want each type to have its own top-level section. I don't think I'd want that to be the default behavior, though. I think that, instead of adding that feature, I'd prefer to factor out the insertion code and let users call a custom function. An idea I had is to experiment with aligning the filenames and/or keywords, as in a big list it can look jagged. That's not necessarily a problem, as filenames with different lengths essentially create their own visual grouping, but some users might prefer a tidier look. What do you think? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Definitely a good idea! Would look much better with some alignment |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zhaojiangbin
Jun 14, 2018
Any reason this is not using 'git-grep'? I heard that rg is very fast. But using 'git-grep' won't require an external tool. And 'git-grep' is usually much faster than the plain old grep.
[UPDATE: Oh, I see why. git-grep does not support -b/--byte-offset.]
zhaojiangbin
commented
Jun 14, 2018
•
|
Any reason this is not using 'git-grep'? I heard that rg is very fast. But using 'git-grep' won't require an external tool. And 'git-grep' is usually much faster than the plain old grep. [UPDATE: Oh, I see why. git-grep does not support -b/--byte-offset.] |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 14, 2018
Owner
@zhaojiangbin Yeah, git grep is great, but it's not always a viable replacement for grep or other tools. Also, I stayed within Emacs for some reasons (see above), but supporting external tools also would be good.
|
@zhaojiangbin Yeah, |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 14, 2018
Owner
@ahungry By the way, if you don't mind, would you mind testing something for me on your big repos? Change magit-todos-scan-file-predicate to nil. I wonder how much of the slowness you notice is due to the default setting calling magit-file-tracked-p for every file, which runs a git process for every file. We might be able to make a faster, caching version without too much trouble.
|
@ahungry By the way, if you don't mind, would you mind testing something for me on your big repos? Change |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Good idea, and sure, I'll test later |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
zhaojiangbin
Jun 14, 2018
@alphapapa I see why you want to stay within Emacs. I agree that there are certain advantages of doing that. But it is slow in large projects. I tested it in a project with about 1000 source code files, opening the magit-status buffer took too long. I had to kill the buffer. That is what led me to this conversation in search for a solution.
Here is another idea. How about using an external tool such as git-grep or rg to filter - that is to return a list of file names and line numbers where the keywords are found, then open each file in Emacs, jump to the lines, use whatever Emacs tools available to locate the keywords. If the tool respects .gitignore, such as git-grep and rg, you don't have to call magit-todos-scan-file-predicate on every files.
zhaojiangbin
commented
Jun 14, 2018
|
@alphapapa I see why you want to stay within Emacs. I agree that there are certain advantages of doing that. But it is slow in large projects. I tested it in a project with about 1000 source code files, opening the magit-status buffer took too long. I had to kill the buffer. That is what led me to this conversation in search for a solution. Here is another idea. How about using an external tool such as git-grep or rg to filter - that is to return a list of file names and line numbers where the keywords are found, then open each file in Emacs, jump to the lines, use whatever Emacs tools available to locate the keywords. If the tool respects .gitignore, such as git-grep and rg, you don't have to call magit-todos-scan-file-predicate on every files. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 14, 2018
Owner
@zhaojiangbin Yes, that's a good idea, thanks. Getting the line number and then doing what we're doing now in Emacs would probably work well.
By the way, would you also mind testing with magit-todos-scan-file-predicate set to nil?
One tip BTW: you can C-g in a Magit status buffer to abort what it's doing, rather than having to kill the buffer first.
|
@zhaojiangbin Yes, that's a good idea, thanks. Getting the line number and then doing what we're doing now in Emacs would probably work well. By the way, would you also mind testing with One tip BTW: you can |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 15, 2018
Owner
If you are interested, please try this branch, which memoizes the file-tracked-p function so that subsequent status buffers will open more quickly. In quick testing it seems about 10 times faster on the Org repo.
|
If you are interested, please try this branch, which memoizes the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 15, 2018
Owner
Never mind that branch. :)
agis now supported inmaster, enabled by default ifagis found byexecutable-find. Please let me know how it works for you. The depth is set to0by default, which stops it from scanning past the root directory. It seems to work well in the Org repo with a large depth setting, and you can setmagit-todos-depthin a dir-local variable.- When not set to use
ag, the speed should be improved because it now usesmagit-list-files, which only callsgitonce.
I don't have rg installed, as it's not packaged in the version of Ubuntu I'm using, and I haven't needed it enough to install it by other means. If rg support is really needed, I can see about doing that (or patches welcome, of course).
Thanks.
|
Never mind that branch. :)
I don't have Thanks. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 16, 2018
Owner
I may almost have asynchronous insertion of todos working, which would let the magit status buffer open quickly even while a large repo is searched in the background (e.g. the Emacs repo has thousands of todos and takes many seconds to search with ag). However, ag has some bugs with how it handles output options, giving different output depending on whether it's going to a pty or a pipe, and I cannot find a combination of settings and output type that gives output in a simple file:line:column: match format when Emacs runs it asynchronously (it works fine when called synchronously). There are several bugs on the ag issue tracker, the first over 5 years old, and a PR that fixes it which is 2 years old.
I'm not eager to support more than one external backend, but if rg handles this better, maybe it would be worth it. Can anyone with experience with rg comment on this? Thanks.
|
I may almost have asynchronous insertion of todos working, which would let the magit status buffer open quickly even while a large repo is searched in the background (e.g. the Emacs repo has thousands of todos and takes many seconds to search with I'm not eager to support more than one external backend, but if rg handles this better, maybe it would be worth it. Can anyone with experience with rg comment on this? Thanks. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 16, 2018
Owner
Ok, I finally got it working. Please test this branch in a huge repo: https://github.com/alphapapa/magit-todos/tree/async-ag It should open the magit-status buffer immediately while running ag in the background, and then insert the todo section when ag finishes.
|
Ok, I finally got it working. Please test this branch in a huge repo: https://github.com/alphapapa/magit-todos/tree/async-ag It should open the magit-status buffer immediately while running ag in the background, and then insert the todo section when ag finishes. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 16, 2018
Contributor
Wow, yea, the Emacs source is slow with this package, even with rg it pauses for 2 to 3 seconds. I'll test out your branch now.
|
Wow, yea, the Emacs source is slow with this package, even with rg it pauses for 2 to 3 seconds. I'll test out your branch now. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 16, 2018
Contributor
It locks up for about 2 seconds (unresponsive keys) and doesn't seem to complete in Emacs project (waited 20+ seconds).
In this own repo, it completes, showing ~13 todos, however now at top of buffer instead of bottom (I like bottom better).
AFAIK, ag doesn't have byte offset support, while rg does.
|
It locks up for about 2 seconds (unresponsive keys) and doesn't seem to complete in Emacs project (waited 20+ seconds). In this own repo, it completes, showing ~13 todos, however now at top of buffer instead of bottom (I like bottom better). AFAIK, ag doesn't have byte offset support, while rg does. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 16, 2018
Owner
Wow, yea, the Emacs source is slow with this package, even with rg it pauses for 2 to 3 seconds. I'll test out your branch now.
Haha, 2 to 3 seconds, is that all? Takes like 12-20 seconds for me.
It locks up for about 2 seconds (unresponsive keys) and doesn't seem to complete in Emacs project (waited 20+ seconds).
20 seconds may not be enough in the Emacs repo, especially if it's hitting the disk instead of the cache.
In this own repo, it completes, showing ~13 todos, however now at top of buffer instead of bottom (I like bottom better).
I had to do some hacks to make it work, so it doesn't respect the usual magit section insertion order. It might be possible to improve that.
AFAIK, ag doesn't have byte offset support, while rg does.
Thanks. I'm just using line and column numbers for now.
I'm trying to get rg working but running into a weird bug.
Haha, 2 to 3 seconds, is that all? Takes like 12-20 seconds for me.
20 seconds may not be enough in the Emacs repo, especially if it's hitting the disk instead of the cache.
I had to do some hacks to make it work, so it doesn't respect the usual magit section insertion order. It might be possible to improve that.
Thanks. I'm just using line and column numbers for now. I'm trying to get rg working but running into a weird bug. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 16, 2018
Owner
I got rg working, so the async-ag branch supports both ag and rg now. Please let me know how it works for you.
|
I got |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 20, 2018
Owner
Seems to be working pretty well now, so I'm going to close this. Feedback welcome!
|
Seems to be working pretty well now, so I'm going to close this. Feedback welcome! |
alphapapa
closed this
Jun 20, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 24, 2018
Owner
@ahungry If you're still interested, I just pushed some more improvements, including optional automatic grouping. It also supports 3 scanners now: find/grep, ag, and rg. I'd appreciate any feedback. The grouping seems helpful in large repos, but it also may introduce some delay in very large repos, so dir-local variables may be helpful to configure it appropriately.
|
@ahungry If you're still interested, I just pushed some more improvements, including optional automatic grouping. It also supports 3 scanners now: find/grep, ag, and rg. I'd appreciate any feedback. The grouping seems helpful in large repos, but it also may introduce some delay in very large repos, so dir-local variables may be helpful to configure it appropriately. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 25, 2018
Contributor
I am still interested, thanks! Reading the new defcustom's and code, it looks very promising and awesome, however I'm having some trouble with the new settings. When I set magit-todos-scan-fn to "Automatic" (or "rg" or #'rg) I receive errors about: 'funcall: Invalid function: "Automatic"' (or "rg"). Using the #'rg just says invalid number of options to the call.
My config is as such:
(use-package magit-todos
:config
(setq magit-todos-scan-fn "Automatic")
(setq magit-todos-max-items 0)
(setq magit-todos-auto-group-items 0)
(setq magit-todos-recursive t)
(setq magit-todos-ignore-directories-always '(".git" ".cask" "node_modules" "vendor"))
(magit-todos-mode))I'll try with a plain settings config.
|
I am still interested, thanks! Reading the new defcustom's and code, it looks very promising and awesome, however I'm having some trouble with the new settings. When I set magit-todos-scan-fn to "Automatic" (or "rg" or #'rg) I receive errors about: 'funcall: Invalid function: "Automatic"' (or "rg"). Using the #'rg just says invalid number of options to the call. My config is as such: (use-package magit-todos
:config
(setq magit-todos-scan-fn "Automatic")
(setq magit-todos-max-items 0)
(setq magit-todos-auto-group-items 0)
(setq magit-todos-recursive t)
(setq magit-todos-ignore-directories-always '(".git" ".cask" "node_modules" "vendor"))
(magit-todos-mode))I'll try with a plain settings config. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 25, 2018
Contributor
It worked in this repo, however in any of the repos where I need it to recurse, even with magit-todos-recursive set to t, it does not seem to recurse (I used above config settings with just magit-todos-recursive uncommented).
|
It worked in this repo, however in any of the repos where I need it to recurse, even with magit-todos-recursive set to t, it does not seem to recurse (I used above config settings with just magit-todos-recursive uncommented). |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
alphapapa
Jun 25, 2018
Owner
When I set magit-todos-scan-fn to "Automatic" (or "rg" or #'rg) I receive errors about: 'funcall: Invalid function: "Automatic"' (or "rg"). Using the #'rg just says invalid number of options to the call.
"Automatic" is not a valid choice for magit-todos-scan-fn. Use customize-option and it will work correctly. :)
It worked in this repo, however in any of the repos where I need it to recurse, even with magit-todos-recursive set to t, it does not seem to recurse (I used above config settings with just magit-todos-recursive uncommented).
magit-todos-recursive doesn't do anything anymore; I need to remove that. Searches are recursive by default. Use magit-todos-depth.
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ahungry
Jun 26, 2018
Contributor
Hah good catch, works nicely, would be nice to add the --hidden flag (either directly or by a defcustom) to rg so that it will search against hidden directories as well (rg already ignores most VCS ignorables in .gitignore, so no reason to also exclude hidden)
|
Hah good catch, works nicely, would be nice to add the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Nevermind, I see its already a defcustom, sweet! Nice work |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Great, please let me know if you have any other feedback. Thanks. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Btw, use-package has a |
ahungry commentedJun 13, 2018
I would use one of the built in Emacs recursive search mechanisms for performing the search of TODO/FIXME etc. tags, as what you have currently (I assume an own-rolled file open + text parse) is way too slow.
If you base it off one of the projectile-search commands, it will also ignore 'project' type directories (.git, node_modules) the user has already customized for ignoring, without requiring being explicit in your definition of it.