Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

counsel-git-grep fails if there are unreadable directories #1502

Closed
icarus-sparry opened this issue Mar 22, 2018 · 17 comments

Comments

@icarus-sparry
Copy link

commented Mar 22, 2018

If du outputs an error message, e.g. du: cannot read directory './doc': Permission denied then counsel-git-grep fails with Wrong type argument: number-or-marker-p, du:

Reproduction steps.

git clone https://github.com/abo-abo/swiper/
cd swiper
chmod 111 doc
make plain

and then invoke counsel-git-grep (C-c j) and observe the error.

Possible fix:

diff --git a/counsel.el b/counsel.el
index b601b90..5b4e992 100644
--- a/counsel.el
+++ b/counsel.el
@@ -1232,7 +1232,7 @@ files in a project.")
   "Default defun to calculate `counsel--git-grep-count'."
   (if (eq system-type 'windows-nt)
       0
-    (read (shell-command-to-string "du -s"))))
+    (read (shell-command-to-string "du -s 2>/dev/null"))))
 
 (defvar counsel--git-grep-count-func #'counsel--git-grep-count-func-default
   "Defun to calculate `counsel--git-grep-count' for `counsel-git-grep'.")
@abo-abo

This comment has been minimized.

Copy link
Owner

commented Mar 25, 2018

Thanks.

@shalin24

This comment has been minimized.

Copy link

commented Apr 3, 2018

This fix seems to break counsel-git-grep with tcsh.

~/.emacs.d % du -s 2>/dev/null
du: cannot access `2': No such file or directory

In emacs, when I do C-c j, I see:
Wrong type argument: number-or-marker-p, du:

@icarus-sparry

This comment has been minimized.

Copy link
Author

commented Apr 4, 2018

Sorry about that.

Solutions include

  1. Let emacs handle the redirection of stderr to /dev/null itself.
  2. Find another function which will invoke a POSIX shell rather than $SHELL
  3. Change function to call sh -c "du 2>/dev/null" (yuk)
  4. dynamically bind shell-file-name (and shell-command-switch) to invoke /bin/sh
  5. Put in more cases into counsel--git-grep-count-func-default to handle tcsh (maybe by getting it to call sh, or just returning 0 as it is on NT.)
  6. Remove counsel--git-grep-count and associated code
  7. something else

The number one reason in the csh considered harmful paper suggesting not to use csh, at least as a programming language, is the difficulty of throwing away errors from programs but not the desired output.

@shalin24

This comment has been minimized.

Copy link

commented Apr 4, 2018

For my project's git repo:
du -s .git returns 8388
git ls-files | xargs wc -l returns 385305
So it looks like du -s .git is not returning the number of lines in my git repo.

I like the idea of being able to configure counsel-git-grep, via counsel--git-grep-count-threshold, to not require a minimum of 3 characters for 'small' repositories.

@basil-conto

This comment has been minimized.

Copy link
Collaborator

commented Apr 4, 2018

@shalin24

So it looks like du -s .git is not returning the number of lines in my git repo.

du stands for "disk usage" and does not count lines or number of files, if that's what you meant. See man du.

@shalin24

This comment has been minimized.

Copy link

commented Apr 4, 2018

I know that.

counsel--git-grep-count-func-default is using du -s .git to calculate counsel--git-grep-count, which is described as Store the line count in current repository. I was merely trying to point out that du -s .git does not return the intended value.

@icarus-sparry

This comment has been minimized.

Copy link
Author

commented Apr 4, 2018

@shalin24 so your point is that the phrase line count is wrong and it should say size (in some arbitrary units, e.g. lines, disk blocks, gallons of beer needed to reproduce) would be better?

@shalin24

This comment has been minimized.

Copy link

commented Apr 4, 2018

@icarus-sparry I assumed that the intended metric here was 'number of lines' as documented in the code, so was just pointing out the inconsistency, in case using something other than du helps to avoid the problem. I am fine with using any metric for determining the size of the repository.

Going back to the problem I saw, I don't have the expertise to comment on the solutions you listed.

I tried solution 3 in tcsh but with sh -c "du -s .git 2>/dev/null", and it worked.

@icarus-sparry

This comment has been minimized.

Copy link
Author

commented Apr 5, 2018

@shalin24 Addressing your points.

Units for size

I am not aware of anything that would not be a lot slower than du (or equivalent) to get an estimate of the size of the repo. Number of lines really doesn't make sense as almost all the data stored in the .git directory is in the sub-directories of the .git/objects where they are held compressed.

sh -c "du -s .git 2>/dev/null"

Yes, it will work. My concern is efficiency. With sh -c "du -s .git 2>/dev/null" emacs first does a fork/exec dance to invoke your tcsh. This then does a fork/exec to fire up sh. This will need to do an exec (and probably another fork depending on the implementation) to fire up du. On an x86_64 machine it takes about 2.3 million instructions to start up a new copy of bash, I don't have the figure for tcsh to hand but it is probably about the same. So here we are spending about 5 million instructions just to get to starting du. Sure in some sense they are not lost, we can't save them up to make something else go faster, and 5 million instructions is not a lot on the scale of modern cpus running in the GHz frequency range, but it still feels wrong.

Other solutions

I was essentially just brainstorming. The yuk comment on 3 is because of the extra shell being invoked. Solution 4 is the most lisp like solution, it is saying "just whilst we are running this code, ignore the existing value of $SHELL and just use sh". If there is another function that invokes /bin/sh rather than $SHELL then solution 2 is the simplest. Solution 5 says use sh -c "du -s 2>/dev/null" if the user is using tcsh (or csh), but just use du -s 2>/dev/null if they are using a POSIX shell, i.e.save the 2.3 million instructions if we can, but if we can't then just pay up.

@basil-conto

This comment has been minimized.

Copy link
Collaborator

commented Apr 5, 2018

Sorry, I hadn't been following the content of this issue; I think I could have spared everyone some time/writing.

I believe all of the aforementioned issues can and should be solved via

  1. Let emacs handle the redirection of stderr to /dev/null itself.

and

  1. something else

by calling du directly rather than via the shell, as I prophesied almost two moons ago: #1470 (comment).

I propose either:

diff --git a/counsel.el b/counsel.el
index 83cbbc9..b9e8cd5 100644
--- a/counsel.el
+++ b/counsel.el
@@ -1248,7 +1248,9 @@ counsel--git-grep-count-func-default
   "Default defun to calculate `counsel--git-grep-count'."
   (if (eq system-type 'windows-nt)
       0
-    (read (shell-command-to-string "du -s .git 2>/dev/null"))))
+    (read (with-output-to-string
+            (call-process "du" nil (list standard-output nil) nil
+                          "-s" ".git")))))
 
 (defvar counsel--git-grep-count-func #'counsel--git-grep-count-func-default
   "Defun to calculate `counsel--git-grep-count' for `counsel-git-grep'.")

in which case the size of the unreadable parent directory itself is returned in this example (8 KiB on my machine), or:

diff --git a/counsel.el b/counsel.el
index 83cbbc9..c9e9830 100644
--- a/counsel.el
+++ b/counsel.el
@@ -1248,7 +1248,12 @@ counsel--git-grep-count-func-default
   "Default defun to calculate `counsel--git-grep-count'."
   (if (eq system-type 'windows-nt)
       0
-    (read (shell-command-to-string "du -s .git 2>/dev/null"))))
+    (with-temp-buffer
+      (let ((status (call-process "du" nil '(t nil) nil "-s" ".git")))
+        (if (eq status 0)
+            (read (buffer-string))
+          ;; Handle error
+          )))))
 
 (defvar counsel--git-grep-count-func #'counsel--git-grep-count-func-default
   "Defun to calculate `counsel--git-grep-count' for `counsel-git-grep'.")

in order to handle the error differently.

@abo-abo

This comment has been minimized.

Copy link
Owner

commented Apr 5, 2018

5 million instructions

This is negligible to me, since these instructions are tied to a single key press - they're not some code that gets called in a loop. An overhead of ~0.1s per key press would actually be serious and something to fix, since it's perceptible by the user. But again, we're nowhere close to that.

(call-process "du" nil '(t nil) nil "-s" ".git")

This is a lot less clear than:

(shell-command-to-string "du -s .git 2>/dev/null")

Not everyone is familiar with Elisp, but most are familiar with a shell. So it's a huge advantage to be able to see transparently that Emacs does the same thing as the shell. On numerous bugs I've told users to simply copy-paste the shell command from code and run it to examine the output, it's very useful.

Now if someone cares a lot about overhead, it could be possible to make a compile-time macro that would parse the "du -s .git 2>/dev/null" string and compile it into (call-process "du" nil '(t nil) nil "-s" ".git"). This could actually result in a useful package, and if it made its way to ELPA I wouldn't mind to have Ivy start using it.

@basil-conto

This comment has been minimized.

Copy link
Collaborator

commented Apr 6, 2018

@abo-abo

if someone cares a lot about overhead

This issue isn't about shell overhead; it's about dealing with incompatibilities between and nuisances of shells; see #1502 (comment).

(call-process "du" nil '(t nil) nil "-s" ".git")

This is a lot less clear than:

(shell-command-to-string "du -s .git 2>/dev/null")

Indeed, the calling convention of call-process is, unfortunately, not as pleasant to the eye as that of, say, process-lines or start-process, but I think it's unwise to deliberately choose an inferior (in that it produces real, not theoretical or philosophical, issues like this one) solution which inconveniences users just because of code readability.

If the only blocker here is code readability, just add a convenience wrapper. Here's a toy example, off the top of my head:

(defun counsel--process-output (&rest command)
  "Call `process-file' on COMMAND, returnings its stdout."
  (with-output-to-string
    (apply #'process-file
           (car command)
           nil (list standard-output nil) nil
           (cdr command))))

;; Usage
(read (counsel--process-output "du" "-s" ".git"))

One can obviously take this idea to town, e.g. by adding another command which also returns the error code returned, for further error handling.

In a programming environment like Emacs, which already provides a system interface, the customisable shell should be reserved, where possible, for users; not programmers of libraries written in ultra-high-level languages like Elisp.

As a mere interested user I obviously can't speak from authority, but I see Ivy/Counsel as libraries which provide many interactive conveniences for the average Emacs user. Providing an over-simplified source, at the expense of a restriction on the shell users are able to use, is not one of said conveniences. It is up to convenience libraries to bite the bullet, where necessary, in the name of increased usefulness and reduced restriction to the user.

Not everyone is familiar with Elisp, but most are familiar with a shell. So it's a huge advantage to be able to see transparently that Emacs does the same thing as the shell.

As I've said, it is possible to make calling processes directly more readable. Given the incompatibilities between different shells, I would argue calling processes directly is more transparent than any shell invocation under the sun. When invoking the shell, I can never be entirely confident that all the arguments and shell redirections are escaped and quoted properly to be as cross-platform as I'd like them to be. Calling processes directly gives me more confidence and reduces mental strain.

On numerous bugs I've told users to simply copy-paste the shell command from code and run it to examine the output, it's very useful.

I don't find telling someone to run "du -s .git" any more difficult than copy-pasting a shell command from an Elisp program, but that's just me.


To summarise my points:

  • Calling the shell makes the program less portable and restricts the user's choice of shell, which results in real issues like the current one.
  • Calling processes directly need not be as unreadable as the argument list of call-process.
  • Calling the shell is usually slower than calling the equivalent process directly.
  • I find arguments in favour of shell invocations in the source weak.

This is just my $0.02; if you disagree, you disagree. Either way, thanks for all your work.

@abo-abo abo-abo reopened this Apr 6, 2018

@abo-abo

This comment has been minimized.

Copy link
Owner

commented Apr 6, 2018

Reopen to fix for tcsh.

@shalin24

This comment has been minimized.

Copy link

commented Nov 27, 2018

tcsh throws more errors for the latest counsel--git-grep-count-func-default.
Replacing du -s \"$(git rev-parse --git-dir)\" 2>/dev/null with du -s `git rev-parse --git-dir` works for tcsh.

I defined my own function that uses the latter command from above, and set counsel--git-grep-count-func to call that function.

Still, I am wondering if and when this will be updated/fixed in a different way?

@basil-conto

This comment has been minimized.

Copy link
Collaborator

commented Nov 27, 2018

Still, I am wondering if and when this will be updated/fixed in a different way?

I think consensus has been reached, since this issue was originally reported, that the ideal solution for this and several other issues would be to move away from the shell to direct process invocation. I have been interested in doing this for a while now, but have lacked and will continue to lack the time to do it wholesale. In the meantime, @abo-abo has recently started incrementally refactoring specific features to avoid the shell. This might inspire others (and me) to tackle one shell issue at a time, rather than incubating a major overhaul which never comes. Either way, sorry about the delay.

basil-conto added a commit to basil-conto/swiper that referenced this issue Nov 28, 2018
counsel.el: Call du process directly
(counsel--git-grep-count-func-default): Call git and du directly to
avoid shell incompatibilities in redirection.

Re: abo-abo#1460
Fixes abo-abo#1502
basil-conto added a commit to basil-conto/swiper that referenced this issue Nov 28, 2018
counsel.el: Call du process directly
(counsel--git-grep-count-func-default): Call git and du directly to
avoid shell incompatibilities in redirection.

Re: abo-abo#1470, abo-abo#1558
Fixes abo-abo#1502

@abo-abo abo-abo closed this in 3ac2b6b Nov 28, 2018

@abo-abo

This comment has been minimized.

Copy link
Owner

commented Nov 28, 2018

@basil-conto Thanks for the PR.
@shalin24 Please test if it works.

@shalin24

This comment has been minimized.

Copy link

commented Dec 4, 2018

@basil-conto @abo-abo Thanks for the fix, it works for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.