Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing +x bit on scripts that are run and not sourced #1345

Merged
merged 1 commit into from
Apr 14, 2024

Conversation

EliahKagan
Copy link
Contributor

@EliahKagan EliahKagan commented Apr 13, 2024

This adds executable permissions to shell scripts that begin with a #! and are meant to be run rather than sourced into another script. Most of these already had those set, so the inconsistency of having a handful of them unset does not appear intended.

The purpose of this change is to make clearer which .sh files are meant to be run directly in a new shell process (which is most, but not all, of them), and which are meant to be sourced into another script rather than run (i.e., those that are script "libraries"). Executable permissions are intentionally not added to .sh files without shebangs that are meant for sourcing rather than running.

This was not causing test failures because the way the scripts were invoked did not rely on being able to directly execute them. Nonetheless, it is clearest for these metadata to match the scripts' shebangs. (Scripts that shouldn't have the executable bit set should likewise not have shebangs, but there didn't appear to be cases where scripts wrongly had shebangs or wrongly had +x set.)

A further benefit is that executable scripts are traditionally named with no special suffix while files meant for sourcing into a shell are traditionally named with a .sh suffix. Since that is not the convention followed here, setting the executable bits may be especially useful to clarify intent.

This pull request also removes doubled end-of-file newlines (where a file ended in two newline characters instead of just one) where present in a few files whose permissions were being changed.

The rationale for including that here, even though no other script cleanup is done here, is that sometimes this happens as an overcorrection when editing scripts on Windows, where many editors don't automatically add even a single end-of-file newline. Windows is also where intended executable bits are sometimes omitted. That, in turn, is due to the absence of a Unix-style chmod, and how Cygwin-like environments infer likely executable permissions from shebangs rather than file metadata. (Git still tracks them, but Windows's own permissions system differs).

Files whose metadata were not being changed were not checked for extra newlines, nor otherwise edited, here.


I had originally intended also to include a change to one or more .gitattributes files here, to make it so that files meant to be run or sourced as shell scripts (i.e., in this project, all .sh files) are checked out with their actual LF line endings on all platforms, including Windows where Git is usually configured to replace LF line endings with CR LF line endings.

The reason is that, unlike code in most languages, shell scripts technically require LF line endings. I say "technically" because Bash in Cygwin-like environments--but not WSL--will automatically pretend CR LF endings are LF in order to work around these kinds of issues. One disadvantage of such line endings this is that it is hard to run shellcheck on them on Windows (without running dos2unix first) because of all the warnings it gives about the line endings being wrong.

The reason I have not included that change is that there is already specific and intentional handling of line endings for shell scripts in .gitattributes files in this project, which I think I may not fully understand. I think the simpler rule *.sh text=auto eol=lf may be sufficient to achieve those goals, but I am not sure.

If such a change is wanted then I could add another commit here for it, or do it in a separate PR.

This adds executable permissions to shell scripts that begin with a
`#!` and are meant to be run rather than sourced into another
script. Most of these already had those set, so the inconsistency
of having a handful of them unset does not appear intended.

The purpose of this change is to make clearer which .sh files are
meant to be run directly in a new shell process (which is most, but
not all, of them), and which are meant to be sourced into another
script rather than run (i.e., those that are script "libraries").
Executable permissions are intentionally not added to .sh files
without shebangs that are meant for sourcing rather than running.

This was not causing test failures because the way the scripts were
invoked did not rely on being able to directly execute them.
Nonetheless, it is clearest for these metadata to match the
scripts' shebangs. (Scripts that shouldn't have the executable bit
set should likewise not have shebangs, but there didn't appear to
be cases where scripts wrongly had shebangs or wrongly had +x set.)

A further benefit is that executable scripts are traditionally
named with no special suffix while files meant for sourcing into a
shell are traditionally named with a .sh suffix. Since that is not
the convention followed here, setting the executable bits may be
especially useful to clarify intent.

This commit also removes doubled end-of-file newlines (where a file
ended in two newline characters instead of just one) where present
in a few files whose permissions were being changed.

The rationale for including that here, even though no other script
cleanup is done here, is that sometimes this happens as an
overcorrection when editing scripts on Windows, where many editors
don't automatically add even a single end-of-file newline. Windows
is also where intended executable bits are sometimes omitted. That,
in turn, is due to the absence of a Unix-style chmod, and how
Cygwin-like environments infer likely executable permissions from
shebangs rather than file metadata. (git still tracks them, but
Windows's own permissions system differs).

Files whose metadata were not being changed were not checked for
extra newlines, nor otherwise edited, here.
@EliahKagan EliahKagan marked this pull request as ready for review April 13, 2024 22:32
Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for cleaning this up, it's much appreciated! I also enjoyed reading the rationale, I learned something.

The reason I have not included that change is that there is already specific and intentional handling of line endings for shell scripts in .gitattributes files in this project, which I think I may not fully understand. I think the simpler rule *.sh text=auto eol=lf may be sufficient to achieve those goals, but I am not sure.

If such a change is wanted then I could add another commit here for it, or do it in a separate PR.

I'd very much appreciated cleaning up the attributes on shell-scripts! The reason they are like they are is me not being fully aware of what's truly needed, and me not working on Windows so I wouldn't really find out.

It's notable that the way the fixture-system is designed, it won't have to actually execute most scripts on Windows anyway as it instead decompresses an archive that has been created on a unix system. This is usually done in the name of performance, but also when it's considered to have no effect.

However, since you are working on Windows and know everything about its differences and subtleties, it might be better for validating system compatibility in tests if Windows would not rely on Unix archives as much. Again, these are created for performance as scripts take some time to run, particularly on Windows, but having done so whenever possible might have introduced a smoother cross-platform testing experience than would otherwise be the case.

For me, testing and making changes for Windows is rather painful, so I was OK with taking that shortcut. But I am also very open to seeing this changed.

A way to find out about differences between the archive and actual script execution would be to set GIX_TEST_IGNORE_ARCHIVES=1 and try to run the test suite. It might fail because there are differences in expected outcomes, or because it can't actually execute the fixture scripts at all. Ideally, the archives are just for performance optimization, and all scripts should work.

Lastly, should you decide to take on such investigation at some point, gix-testtools (source) already uses .gitignore to indicate that archives shouldn't be created (or used, for that matter), and I could totally imagine that additional differences and tuning could be performed by means of .gitattributes.

Please let me know if there are any other questions about this - here I can answer them all in detail :).

@Byron Byron merged commit fe24c89 into Byron:main Apr 14, 2024
19 checks passed
@EliahKagan EliahKagan deleted the shell-scripts branch April 14, 2024 06:43
@EliahKagan
Copy link
Contributor Author

A way to find out about differences between the archive and actual script execution would be to set GIX_TEST_IGNORE_ARCHIVES=1 and try to run the test suite. It might fail because there are differences in expected outcomes, or because it can't actually execute the fixture scripts at all. Ideally, the archives are just for performance optimization, and all scripts should work.

I had thought I'd tried this before, but actually I had not. 16 tests fail when this is done. I can try to open an issue about this sometime soon. I don't know how much detail would be desired.

This should not be confused with failures that occur when the tests are run outside a Git Bash environment, which happens even without setting that environment variable and is due to having paths as the current working directory, and possibly in some other environment variables, with Windows-style \ separators that are treated as escape characters by the shell scripts. There should probably be a (separate) issue about that too, at least if it is not known. (The CI environment uses Git Bash, so it doesn't happen there.)

@Byron
Copy link
Owner

Byron commented Apr 15, 2024

I had thought I'd tried this before, but actually I had not. 16 tests fail when this is done. I can try to open an issue about this sometime soon. I don't know how much detail would be desired.

Thanks for trying it out! 16 tests is only a small fraction, fortunately, even though trying to address these can be a time-sink.
Maybe you could open the issue and refer to the Gist for details so it's publicly stated to be a known issue. Trying to address these one by one might be an excellent motivation to get to know the different crates, even though I'd expect the failures are mostly due to incorrect test expectations rather than actual issues.

One test failure that stuck out to me might be a curious case where gitoxide is probably more correct than Git, at least so I'd hope. As it's part of a baseline test, an exception could probably be added if that's true. This paragraph is probably proof that this could be a great avenue to get to know the codebase, while improving the cross-platform experience. In a way, the test-suite doesn't actually run on Windows yet as it relies on caches, which isn't the case on Linux.

Please also note that one day, once tests work on Windows without archives, one could consider running the actual tests (without archives) on Windows as well. It's probably going to take twice as long as on Linux which means that it would probably have to run so that failures are just warnings so one doesn't necessarily have to wait for it to finish.

There should probably be a (separate) issue about that too, at least if it is not known. (The CI environment uses Git Bash, so it doesn't happen there.)

I agree, this sounds like a separate issue which requires a separate kind of fix. Maybe it happens in 2 stages as well. Once with archives, and once without once the 16 test failures are resolved.

Please do, however, proceed in any direction you want, with the issues nothing will be forgotten.

@EliahKagan
Copy link
Contributor Author

For that one, is the idea that the pattern */\' should not match the pathname XXX/\' on the grounds that, although * matches XXX and / matches /, \' does not match \' because \' instead matches only a literal '?

If so, then I wonder if this is a difference between upstream Git, and Git for Windows. On Windows, \ is the primary character used as a directory separator (with / also being usable). Git prefers to output paths that use /, including on Windows. But in input, it tries to support \ as a directory separator on Windows and usually manages to do so.

Should .gitignore make an effort to permit \ as a directory separator in .gitignore on Windows? It seems like a bad idea, if doing so wouldn't work on other systems that check out the code. The upstream Git nonetheless documents no such thing and I think does not do it. Looking through the Git for Windows repository does not seem to reveal such behavior either. Yet someone has reported that, at least at some point, backslashes seemed to be usable this way.

I can do some experiments later if the answer is not apparent.

@Byron
Copy link
Owner

Byron commented Apr 15, 2024

For that one, is the idea that the pattern */\' should not match the pathname XXX/\' on the grounds that, although * matches XXX and / matches /, \' does not match \' because \' instead matches only a literal '?

I am not sure what it is supposed to do or why there is a difference on Windows, but it seems that gitoxide has a match whereas Git does not, as left should be the actual value, whereas the baseline produced by Git provides the expected one.

Should .gitignore make an effort to permit \ as a directory separator in .gitignore on Windows? It seems like a bad idea, if doing so wouldn't work on other systems that check out the code. The upstream Git nonetheless documents no such thing and I think does not do it.

I also couldn't find anything that would indicate \ is recognised as path separator in the Git codebase, but didn't check Git for Windows.

However, it does look like maybe that's the reason they differ, and gitoxide should be able to provide a good user-experience on Windows as well.

I can do some experiments later if the answer is not apparent.

Yes, it would be nice to learn what is what here. In theory, if just Git is compiled on Windows, the baseline would be expected to match. But with Git for Windows, we see the issue noted above.
Once that is validated, gitoxide could probably learn to treat backslashes as path-separator at least as well as Git for Windows does. And due to the way the matching machinery works, I'd expect this to work better for directory-based matches (like target/, which matches only a directory).

@EliahKagan
Copy link
Contributor Author

EliahKagan commented May 6, 2024

I am not sure what it is supposed to do or why there is a difference on Windows, but it seems that gitoxide has a match whereas Git does not, as left should be the actual value, whereas the baseline produced by Git provides the expected one.

I've returned to this (with the intention of opening an issue for the 16 failures on Windows with GIX_TEST_IGNORE_ARCHIVES=1, as discussed above), and I'm having some trouble understanding what is going on with this specific test. I don't plan to examine each of the 16 failing tests before opening the issue, but I figured that since we were already discussing this one and you had identified it as particularly interesting, I should get clear enough on it that I can report or summarize what is known about it, in the forthcoming issue.

It looks to me like the assertion that is failing is not directly testing gitoxide, but is instead checking that Git itself behaves in the expected way, so subsequent checks can reasonably compare against it. On the assertion that is failing, the expected results for a case seems to expressed as a combination of which file make_baseline.sh writes the case into, and hard-coded true or false in matching.rs for that file.

while read -r pattern value; do
echo "$pattern" "$value"
echo "$pattern" > .gitignore
echo "$value" | git check-ignore -vn --stdin 2>&1 || :
done <<EOF >git-baseline.nmatch
/*foo bam/barfoo/baz/bam
/*foo bar/bam/barfoo/baz/bam
foo foobaz
*/\' XXX/\'

for (input_file, expected_matches, case) in &[
("git-baseline.match", true, pattern::Case::Sensitive),
("git-baseline.nmatch", false, pattern::Case::Sensitive),
("git-baseline.match-icase", true, pattern::Case::Fold),
] {

In a loop though the entries, we have the assertion that is failing:

assert_eq!(
is_match, *expected_matches,
"baseline for matches must be {expected_matches} - check baseline and git version: {m:?}"
);

This keeps it from getting to the "actual" logic that calls gix_glob code, turns a panic into a test failure if there was one, and otherwise assert the expected result. Rather than failing for the particular pattern at issue, that logic is not reached due to the assertion that precedes it.

So to me it looks like the "expected" value in the failing assertion is the value false, hard-coded for that iteration of the outer loop, and that, in the inner loop with the pattern */\' and the pathname XXX/\', the value of true that is unequal to it comes from Git itself, prior to checking what any facility in gix_glob does with the pattern. But that would be the opposite of what you are suggesting about Git considering them non-matching.

I think there is a substantial amount of relevant code that I have not examined here. So I would not be surprised to learn that my impression so far is mistaken. However, I figured I'd post my thinking on this as it stands, in case there's a easy explanation or correction related to my thinking.

(My short-term goal in opening this issue is not to work on the bug immediately, but instead to open issues for test failures and other bugs that I am aware of but not working on at the moment. As you have said, with the issues, nothing will be forgotten. However, there is no rush here, since GitHub will notify me whenever you reply. You can also of course ask me to open a brief issue before any of this is clarified, in which case I will do so.)

@Byron
Copy link
Owner

Byron commented May 6, 2024

Thanks for sharing all these details which help me to get back into it. The baseline-test was definitely written with unix-only assumptions, and it never inconvenienced itself with dealing with Windows-specific differences that might arise without an archive. Git for Windows not being a match here might mean there is some special handling of backslashes.

Ultimately, what would have to be done is to examine what Git for Windows does, and then see if the test can be made to match its outcome, if especially if it seems more sensible. Then again, I was never looking into Git for Windows and hence didn't see anything specific it might have added to the wildmatch.c code, or maybe in other places.

Ultimately, once the test passes, one would have to .gitignore the respective archive and ignore it to avoid it from being used and generated in future.

@EliahKagan
Copy link
Contributor Author

Git for Windows not being a match here might mean there is some special handling of backslashes.

This is my main area of confusion: it seems to me that Git for Windows is a match here, and that Git on other systems is not, and that the purpose of the failing assertion is to verify that it is not a match. But I've interpreted some of the things you've said, including that quoted text above, to mean that this understanding of mine is not correct.

When I create a repository with the .gitignore file whose content is

*/\'

and run the command

git check-ignore -vn "XXX/\'"

I get this output on Ubuntu:

::      "XXX/\\'"

And when I run it on Windows (the quoting rules for "XXX/\'" happens to be the same in PowerShell as in POSIX-compatible shells, even though they not the same in general), I get this output:

.gitignore:1:*/\'       "XXX/\\'"

To double-check this, and also to make it more demonstrable, I performed a variation on this experiment on CI, where Ubuntu, macOS, and Windows were all tested, and where bash was used as the shell on all of them. (In GitHub Actions, on Windows runners, bash is Git Bash.) The results are as above: the pattern is shown as not a match on Ubuntu and on macOS, and as a match on Windows.

That seems to accord with my understanding of the test failure as indicating that this is a match only on Windows, that the assertion is asserting that it is never a match, and that the assertion is testing only expected behavior of Git as opposed to the subsequent test logic that also involves behavior of gitoxide. Yet, I worry that I may be mistaken, since it seems like we are each expressing opposite interpretations of what is being asserted--though maybe we are saying the same thing, somehow, yet talking past one another.

(But I reiterate that there is no hurry here. It looks like I may end up opening an issue for this even before I am totally clear on what you are saying about what this shows, since it looks like at least one other issue that I wish to open would benefit from being able to refer to that issue for clarification. If I open an issue before I full understand this, then I can and will expand or otherwise revise it when more information is available.)

@EliahKagan
Copy link
Contributor Author

EliahKagan commented May 7, 2024

As suggested above, I've gone ahead and opened issue #1358 for this even though I'm not confident that I adequately understand the specifics of the particular failure we were discussing here, so that I can reference the issue in other issues.

Further discussion of gix_glob-related issue discussed here could continue in either place, as I can always edit the most important information into the issue there and/or add a comment wherever information about it is absent but needed.

@Byron
Copy link
Owner

Byron commented May 7, 2024

Thanks for elaborating - I am pretty sure that I flipped the meaning of match and not-match as you are referring to the .nmatch variant of these baselines, which are not supposed to match.
Ultimately, the only way to resolve this is to dig in and look at what's happening specifically. In any case, it seems clear something different is happening on Windows which seems to match \ directly (maybe).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants