Autogenerate gix-packetline-blocking/src #1340

EliahKagan · 2024-04-09T05:07:58Z

This replaces the symbolic link at gix-packetline-blocking/src with a copied source tree from gix-packetline/src, with headers prepended to state that they are copies and, for each file, what file it is a copy of (other than that header).

The headers are in the format suggested in #1310 (comment), except that a blank line (i.e., two newlines rather than one) appears after each one. In files where there are already //! comments at the top, I am unsure if a blank line is correct, or if it should instead be omitted or a second line of just //! added to signify a paragraph break. Currently there is only one such file, gix-packetline/src/lib.rs.

This also adds a shell script, etc/copy-packetline.sh, which was used to perform the copy, and can be used at any time to update the copy when changes are made in gix-packetline/src. It can be invoked directly, or by running just copy-packetline. Both approaches work even if gix-packeltine-blocking/src has is absent or has been checked out as a regular file due to core.symlinks being false or symbolic links being otherwise unavailable.

However, the latter approach, even though at this time no Rust tool is used, will still print an error message that may mislead the person running it into thinking it has failed. It shows the same message as in #1310 (comment). But this message actually always happens whenever a just command is run when the project structure is unacceptable to cargo. The cause is:

https://github.com/Byron/gitoxide/blob/68d1a29e8e497fa130e5733d1ecb6b7694a489ca/justfile#L193

It seems to me that it should be possible to keep that message from being printed for recipes that don't rely on it. But I think that ought be considered outside the scope of this PR.

The script is used on CI to check that it has no effect. For now I have added three jobs with a matrix strategy, but this is unnecessary to check that the script has been run if needed; for that we just need one. Instead, running it on three platforms checks that the script works on all of them (I don't have a macOS system). Because it does not test the full range of the script's functionality--as noted below, the script implements specific rules for when it is willing to make the copy--I also tested the script, I believe thoroughly, locally on Windows and Ubuntu.

I've kept all three jobs so if the script has to be changed during further work on this PR then they will be there for testing. I think all but the ubuntu-latest job can ultimately be removed.

The script tries to refuse only when its effects could lead to data loss or confusion. The rules it follows, for the target, are implemented and commented in the check_target_dir shell function. Basically:

If nothing at all exists at the target location gix-packetline-blocking/src on disk, then always proceed. This is safe, and it provides a simple way to get to a state where the script will regenerate the files: deleting the entire subtree indicates a willingness for it to be replaced.
If something exists at the target on disk and there is no merge in progress, then proceed so long as all changes at or under the target location are staged and no ignored files are present there.
If something exists at the target location on disk and there is a merge in progress (whether due to a conflict or a deliberately uncommitted merge), then proceed so long as there are no changes of any kind an the target location (not even staged changes) and no ignored files are present there.

The script does not stage its own changes. This helps avoid data loss, makes it easy to keep changes from the script identifiable by staging and then running it, and makes the script suitable for use in a pre-commit hook if desired.

The script decides between \n and \r\n for the line endings of each file's prepended header by checking how the first line of the file being copied ends. I believe this is a better approach that analyzing the file as a whole, since if a file is to be inconsistent, it ought not to be made more inconsistent by having more separate sections with different line endings. Because various platforms, git configurations, and approaches to editing files affect line endings, I think there is no reasonable way to decide this besides to look at the files themselves. They are checked in a way that overcomes the strong tendency of tools in Cygwin-like environments (including MSYS2 and "Git Bash") to claim that \r\n is merely \n.

The script performs a single pass, adding the headers at the time the files are written, rather than copying the files and then modifying them. This has some advantages, which might justify retaining this approach over others that have been discussed, as detailed below.

It may be that this shell script should be partly or wholly replaced with a tool coded in Rust. I believe that was the idea in #1310 (comment): that a shell script would check if it is safe and reasonable to copy the file and, if so, do so and then invoke a Rust tool to prepend the headers. The reason not to write the whole thing in Rust was that it can fail initially if there is no symlink or directory tree. I embarked on writing the script with the idea that I would write the whole thing as a shell script to get clear on exactly what it should do, then pare down the script, replacing the header-prepending code with a Rust tool.

As I wrote the script, I found that it was unexpectedly tricky, relative to the apparent simplicity of the task, to implement it as a sufficiently portable shell script that was robust to rare but plausible oddities. That made me think as much of it should be turned into Rust as possible. But I persisted in implementing the whole thing as a shell script initially, because I also noticed some advantages of it being implemented that way. I now suspect those advantages are sufficient to justify it remaining as a shell script in its entirety:

Copying the files and then prepending their headers runs the risk of having copies without headers, that appear to be files that are reasonable to edit and that are easy to edit by accident instead of the correct ones, if the task is interrupted or fails. Failure is likely as long as running a separate Rust tool requires cargo to be able to run and see files where they're supposed to be everywhere in the workspace; it may fail due to conceptually unrelated breakages in any crate.
Copying the files and then prepending their headers seems like it will introduce delays, and occasional minor annoyances, due to language servers (as used in editors and IDEs) triggering an analysis of the files as they are originally created, and then again when they are edited. The delay in between may be enough for this to happen, and slight experimentation suggests this is a real issue at least on Windows, but I do not really know for sure how big of a problem this would be. (In particular, this sort of thing could be even further ameliorated by avoiding removing an existing tree before updating it, and I did not go that far.)
The script is very fast on CI, which runs the script directly rather than through just, because no Rust toolchain has to be installed.
When one is making frequent edits inside gix-packetline/src, one might wish to perform the updates automatically using a pre-commit hook. Having them done by a script that doesn't depend on other state of the project benefits this, mainly in that one could leave the hook enabled all the time even when making other unrelated changes that don't require cargo to accept the state of the workspace.

I am inclined to find those factors persuasive when taken together, but I am not certain. In particular, this should be weighed against one of the benefits of having as much of this logic in Rust as possible: more gitoxide developers are likely to be able to work efficiently with the code, since gitoxide is a Rust project. (On the other hand, gitoxide does rely substantially on shell scripts as fixtures.) In addition, while this is the sort of thing one would think a shell script would be great for, it actually incurs some complexities due to limitations related to portability that I believe would be absent in Rust code. Ways the script is weird or cumbersome in service of portability are detailed below.

Although I did use CI checks during development (rebasing onto a branch of my fork in which CI runs on all branches), I've decided not to deliberately push each commit separately in this PR as I often do in GitPython. I'd be pleased to do something that like on request. The reasons I've refrained are that I don't know if it would be welcome here, with the more involved CI checks, and that I wonder if I may need to rebase this anyway, as I think this is a case where messages in the conventional commits style aren't called for, but I'm not sure.

Cygwin-like environments on Windows, including MSYS2 and the MSYS2-ish "Git Bash" environment, are weird, and some of the weirdness, as mentioned above and as commented in the script, constrains the approaches that can be taken to distinguish \r\n from mere \n, especially since the script should avoid using anything that may not be available on other systems that have nothing to do with Windows. This is covered in full in the big comment in first_line_ends_crlf, and thus not detailed here.

Bash offers a number of nice features beyond what are standardized for sh by POSIX. But many of them were introduced in Bash 4, which macOS does not and probably never will ship. This needs to work on macOS systems where no newer Bash has been installed; furthermore, most scripts in the project specify /bin/bash as the interpreter, and it could be confusing to change that to /usr/bin/env bash just for this. (I have tried to adhere to the style of other scripts.) Before major version 4, Bash lacks recursive globbing (globstar), the ability to have the rightmost command in a pipeline affect the calling environment (lastpipe), and the ability to check the exit status of a command run in process substitution (process substitution would otherwise be a workaround for not having lastpipe).

I assume Bash 3 or later, and the existence of command-line tools that are required by POSIX (in addition, I try to avoid using any that are, in practice, ever absent; for example, file is often not installed in Docker images). This is with one notable exception: I assume that find, which I'm using because I don't have recursive ** globs, supports the -print0 action; if it does not, then the script will fail (but signal the failure properly, due to set -o pipefail, which is supported even in Bash 3).

The contents of this repository are generally trusted, and I doubt anyone will deliberately create a file with a newline character in its name (which Windows can't tolerate, but other systems can) without good reason. But:

There may be a reason to have such files, for the purpose of testing gitoxide (or its interaction with other tools such as gpg or ssh) in their presence.
They can be created by accident.
Assuming the full range of possible filenames cannot occur goes wrong more often than one tends to predict.

Of course, this can be made fully portable by using the -exec action. But this has a couple of problems:

It makes lots of subprocess invocations (or incurs the further complexity of also implementing a loop to receive multiple paths from the + form).
It is hard to portably stop the entire traversal on the first failure.
I did it that way first and the script was more complicated, longer, and harder to read.

Fortunately -print0 seems in practice to be very portable. GNU findutils, BusyBox find (as in minimal environments or Alpine Linux), and the find command in BSD systems, including macOS, all support it, and have supported it for a long time.

It may be that there is a much simpler and more elegant way to do this than I think, even in a shell script. Ordinarily, when writing an explicit loop in a shell script, one should consider alternatives (and I have considered some). However, before proceeding with an idea to simplify the script, the general problem that most Unix-style tools that conceptually operate on text do not properly distinguish between \n and \r\n in Cygwin-like environments should be considered. If the cumbersome state of the script ought to be ameliorated, then I suspect that replacing it with Rust may be the overall best approach. I am also curious if there is some way to do the whole thing in Rust yet let it work even if cargo frowns on the state of the workspace.

By copying the files with added headers, instead of having a symlink. Right now this is entirely as a shell script. It may be beneficial to rewrite at least some parts, especially those that are unnatural to implement portably as a shell script even given Bash, in Rust.

This was always intended to work, but didn't because script subprocesses always used the original command name/path, but this was after changing directory. The script already relies on the repository being organized a particular way, so relying on itself (or another script suitable to do its work) being at the known location is no worse. This also makes it unnecessary to guard against the rare {}-in-path case. This change has the further benefit that if the script is ever successfully invoked via a Windows-style path with "\" separators (which can happen on Windows), its subprocesses can still execute. This also removes an altogether ineffective attempt to support the rare case of a working tree whole top-level directory name ends in a newline character, and instead changes the comment to note that this does work. A few other comments throughout are reworded for clarity as well. It may make sense to change the script more deeply than this (even before replacing part of it with a Rust tool). By sacrificing a small amount of portability, a loop could be used, as find recognizes -print0 in GNU/Linux (via GNU findutils), most BSD systems including macOS, and busybox (thus also Alpine Linux). Unfortunately a recursive glob, which would be even nicer, is not supported by versions of bash that ship with macOS.

This modifies the script so that: - When the target does not exist on disk at all, we proceed, even if the deletion is an unstaged change. - When there is an uncommitted merge, as happens when a conflict is being resolved (and occasionally otherwise), recognize this case (rather than misinterpreting the short status output), proceeding only if there are no changes at all at the target. Even staged changes cause the copy to be cancelled in that case. - Some cumbersomely expressed code is written more clearly, a couple of specific error messages are added for what should be rare failures of git commands, checking for grep failure is removed (grep should not actually fail if used correctly), and the big comment in first_line_ends_crlf is made more accurate.

This doesn't happen on Windows filesystems, but can happen elsewhere. It can happen on occasion by accident.

`find -print0` is not POSIX, but supported systems include: - GNU/Linux systems, using GNU findutils - Other Linux-based systems using BusyBox find (e.g. Alpine Linux) - BSD systems, including macOS (all versions, since 10) Since this is a Bash script, and even the old versions of Bash on macOS systems support `read -d ''` to read null-byte-delimited input, assuming `find -print0` is supported everywhere needed, this is highly likely to work everywhere needed. It allows filenames to be safely read even if they contain weird characters including newlines, as can happen accidentally (hopefully not intentionally). It would be nicer to loop through the result of a recursive glob (globstar, **), but Bash only supports that starting in version 4, which no version of macOS ships (or is likely to ship). The benefit of this approach, compared to the `find -exec` way used before, is that the code is simpler, no longer needs to call itself as a subprocess through find, and no longer needs a help to clarify a mode that is not meant to be used from the outside. (It may also be slightly faster.) This makes some behavioral changes, in areas where the design had been driven by the implementation rather than the other way around: - We stop on the first failure. - Because of that, there is no need to restate what files could not be generated (it is at most one, and the failure should show it). - As noted above, this script can no longer be invoked to process an individual file (which as not a design goal), and it therefore no longer accepts any command-line arguments (they are ignored). This also fixes a misspelling the big first_line_ends_crlf comment.

This also refactors slightly in other ways, clarifies a comment, and improves an error message.

For the real purpose of this new check, which is to verify that `just copy-packetline` has been run when needed, to ensure that gix-packetline-blocking/src is not out of date, it should only be necessary to run this check on one platform. So this should probably be changed to `runs-on: ubuntu-latest` instead of a matrix strategy soon. I'm running it on three platforms initially to check the script (though some functionality of the script is not exercised). I've also tested the script locally on Ubuntu 22.04.4 LTS, and on Windows 10 (in Git Bash).

Somewhat unintuitively, the "-" is considered to be a file operand and not an option, even though it treated to signify standard input rather than a file named "-". On some systems including macOS, options to cat cannot follow paths. On those systems, writing "-" before "--" causes "--" to be treated as a filename, causing the script to break on macOS. This changes the order, which lets the cat command work on all systems, though (as before) if the subsequent argument were a literal "-" then that would not have the desired effect. In this case that is okay, since we actually know that the other argument, the value of the source_file variable, is not exactly "-". (We even know that it does not begin "-". But keeping the "--" before it makes clearer to human readers that this argument is a path and never an option.)

This speeds things up, as no cargo or just commands have to be run, and no Rust toolchain has to be installed. If part of all of the script is replaced by a tool implemented in Rust, then of course this should (and probably must) be revisited.

Since --porcelain is guaranteed stable across future git versions.

Only the secondary reason, which is the less important one, was given before. Now both are given and the more important reason is emphasized. (If the entire subtree is deleted on disk, staged or not, then we do proceed, since neither disadvantage applies. I think this is clear from context, since this is checked first, and like the other checks, it is commented.)

These are copies, except for their headers, and generated by the etc/copy-packetline.sh script. This should make the new CI checks, which use the script, pass.

So when gix-packetline-blocking/src was not updated, the diff from running the script is shown in full. Right now this is to investigate how the script does not seem to work on macOS, but it may be useful to keep it.

While a command like `find foo` should add a `/` to make paths like `foo/bar`, some find implementations, including in macOS, add their own second slash even when there already was one, so `find foo/` outputs paths like `foo//bar`. (The current POSIX standard seems to forbid this, but it seems early standards did not, and in any case it is the behavior of find even on recent macOS systems.) That broke the copy-packetline.sh script on macOS: - Unsightly extra slashes were added to the headers placed in the copied/generated files inside gix-packetline-blocking/src. - Due to this happening on macOS, the script produced different output on macOS and other systems, preventing a precise portable CI check that the worktree remains clean even after running it. This commit addresses that by not adding a `/` to the path anymore, and making the required slight change to how the source prefix is removed (to be replaced by the target prefix) to accommodate this. That prefix removal was the original reason I had added a `/` originally. It served two purposes: to make the nature of the replacement slightly clearer to human readers, and to safeguard against a prefix that was not a whole path component. The first goal remains valuable but the benefit was slight, so it's okay to lose out on that. The second is no longer needed, because it was related to a way the script was more complex and thus more prone to to error at that time; the associated check was already removed.

Byron

Thanks a million for this improvement!

Please note that I just took a quick look and didn't read any of the PR description yet, which I will get to in a bit.

etc/copy-packetline.sh

See GitoxideLabs#1340 (comment).

This reruns `just copy-packetline`, turning the comments in files in gix-packetline-blocking/src from doc comments to regular ones.

- avoid `target` prefix as it reminds me of `target/` too much - run script on only one platform on CI, with explanation - make CI failures an error

EliahKagan · 2024-04-09T15:40:41Z

Building on d25e5d3, another option for names could be input and output.

Byron

Thanks a lot for this wonderful work, leading to a very impactful fix! Finally, we can use cargo on Windows again.

I only have made a couple of smaller adjustments. One change that I am not perfectly certain about is the removal of the continue-on-error: true line, which I thought was a left-over rather than intentional. If that's wrong, let's fix it before the merge.

Although I did use CI checks during development (rebasing onto a branch of my fork in which CI runs on all branches), I've decided not to deliberately push each commit separately in this PR as I often do in GitPython. I'd be pleased to do something that like on request. The reasons I've refrained are that I don't know if it would be welcome here, with the more involved CI checks, and that I wonder if I may need to rebase this anyway, as I think this is a case where messages in the conventional commits style aren't called for, but I'm not sure.

I'd be happy to keep the current style of pushing once, as otherwise I am pretty sure CI would be blocked for a long time which could be an impediment.

If the cumbersome state of the script ought to be ameliorated, then I suspect that replacing it with Rust may be the overall best approach. I am also curious if there is some way to do the whole thing in Rust yet let it work even if cargo frowns on the state of the workspace.

I understand the reasoning for the script, find it very elaborate and well made, and chose to consider it read-only. It's not that it's unmaintainable, it's just that it has one purpose that it fulfils so we can move on to greener pastures :).

Byron · 2024-04-09T15:46:19Z

Building on d25e5d3, another option for names could be input and output.

I am done with the refactor and encourage you to bring in last minute changes, also to the names - the ones proposed are shorter, and they are also not names target :).

I plan to merge once CI is complete, but probably only in an hour or so.

EliahKagan · 2024-04-09T16:04:53Z

Thanks, I'll push a change to rename the identifiers shortly.

In the copy-packetline.sh script: - Rename all identifiers with "source" to "input", and with "destination" or "target" to "output". This includes variables and functions. - Adjust messages accordingly. This is not a simple search and replace, since sometimes "target" becomes "output location" (to avoid giving the impression that it must always being as a dir). - Adjust comments accordingly, but use the word "destination" in a few comments that had said "target", where it's clear it applies to "output" and seems to be clearer wording than "output". - A couple very small unrelated comment rephrasings for clarity. In the copy-packetline job definition in ci.yml: - Put `fail-fast: false`, which has no effect on a matrix that only generates one job, but will aid in producing results for all three platforms if the other two have to be reenabled to test a change to the script. This is what I had been using `continue-on-error` for, but `fail-fast: false` is the better way to do it (and also avoids confusion with the other meanings of `continue-on-error`). - Tweak indentation so that removing the leading `# ` on the macos-latest and windows-latest entries would place them in the list (so that if they ever have to be reenabled to test changes to the script, then this will be slightly easier).

Byron · 2024-04-09T16:48:11Z

Thank you, much better!

I also just now realise that the new CI job takes a mere 3 seconds, so running more of these isn't ever going to be an issue.
The only argument I can make to leave it as is, is that it's less noisy with just one job, and less confusing as well since it would otherwise be puzzling why this has to be tested on all platforms (if one knows the intention of the script). Thus, I think I can indeed merge in a moment.

EliahKagan · 2024-04-09T16:52:11Z

I only have made a couple of smaller adjustments. One change that I am not perfectly certain about is the removal of the continue-on-error: true line, which I thought was a left-over rather than intentional. If that's wrong, let's fix it before the merge.

Really I should never have used that, because it was applied to each job in the matrix for the purpose of ensuring that results from all platforms appeared during testing, rather than to specific jobs considered experimental. I should have used fail-fast: false instead. The effect of fail-fast makes no difference to a matrix strategy of only one job, but I've put that in so that, if the other jobs ever have to be enabled to test a change to the script, then all platforms' results will be produced and shown.

I also just now realise that the new CI job takes a mere 3 seconds, so running more of these isn't ever going to be an issue. The only argument I can make to leave it as is, is that it's less noisy with just one job, and less confusing as well since it would otherwise be puzzling why this has to be tested on all platforms (if one knows the intention of the script). Thus, I think I can indeed merge in a moment.

I think it is less confusing with just one, yes. If the script will rarely be edited, then I think it's better to have just the ubuntu-latest job for the check.

EliahKagan added 15 commits April 8, 2024 01:31

Properly cover directory names with trailing newlines

1125ca0

This doesn't happen on Windows filesystems, but can happen elsewhere. It can happen on occasion by accident.

Extract constants; rename functions/variables for clarity

b312d8d

This also refactors slightly in other ways, clarifies a comment, and improves an error message.

Use --porcelain instead of --short for git status

84992d7

Since --porcelain is guaranteed stable across future git versions.

Make CI clean working tree checks simpler and more robust

3710c34

Replace gix-packetline-blocking/src with generated files

a793bde

These are copies, except for their headers, and generated by the etc/copy-packetline.sh script. This should make the new CI checks, which use the script, pass.

Show full diffs on CI

93c0149

So when gix-packetline-blocking/src was not updated, the diff from running the script is shown in full. Right now this is to investigate how the script does not seem to work on macOS, but it may be useful to keep it.

EliahKagan marked this pull request as ready for review April 9, 2024 05:13

Byron reviewed Apr 9, 2024

View reviewed changes

etc/copy-packetline.sh Outdated Show resolved Hide resolved

EliahKagan and others added 3 commits April 9, 2024 02:37

Create gix-packetline-blocking headers as regular comments

6a5db0c

See GitoxideLabs#1340 (comment).

Rerun script to get // instead of //! comments

ec2427f

This reruns `just copy-packetline`, turning the comments in files in gix-packetline-blocking/src from doc comments to regular ones.

refactor

d25e5d3

- avoid `target` prefix as it reminds me of `target/` too much - run script on only one platform on CI, with explanation - make CI failures an error

Byron force-pushed the copy-packetline branch from 4e73bb9 to d25e5d3 Compare April 9, 2024 15:37

Byron approved these changes Apr 9, 2024

View reviewed changes

Byron merged commit a9b783d into GitoxideLabs:main Apr 9, 2024
19 checks passed

EliahKagan deleted the copy-packetline branch April 9, 2024 16:55

EliahKagan changed the title ~~Autogenerate ci-packetline-blocking/src~~ Autogenerate gix-packetline-blocking/src May 5, 2024

EliahKagan mentioned this pull request Nov 7, 2024

Test cross-platform symlink and +x permission expectations on Windows #1652

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autogenerate gix-packetline-blocking/src #1340

Autogenerate gix-packetline-blocking/src #1340

EliahKagan commented Apr 9, 2024 •

edited

Loading

Byron left a comment

EliahKagan commented Apr 9, 2024

Byron left a comment

Byron commented Apr 9, 2024

EliahKagan commented Apr 9, 2024

Byron commented Apr 9, 2024

EliahKagan commented Apr 9, 2024

Autogenerate gix-packetline-blocking/src #1340

Autogenerate gix-packetline-blocking/src #1340

Conversation

EliahKagan commented Apr 9, 2024 • edited Loading

Byron left a comment

Choose a reason for hiding this comment

EliahKagan commented Apr 9, 2024

Byron left a comment

Choose a reason for hiding this comment

Byron commented Apr 9, 2024

EliahKagan commented Apr 9, 2024

Byron commented Apr 9, 2024

EliahKagan commented Apr 9, 2024

EliahKagan commented Apr 9, 2024 •

edited

Loading