Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gix clean with -r or -xd deletes the repo's own nested worktrees #1464

Closed
EliahKagan opened this issue Jul 23, 2024 · 4 comments · Fixed by #1465
Closed

gix clean with -r or -xd deletes the repo's own nested worktrees #1464

EliahKagan opened this issue Jul 23, 2024 · 4 comments · Fixed by #1465
Assignees
Labels
acknowledged an issue is accepted as shortcoming to be fixed

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Jul 23, 2024

Current behavior 😯

When a worktree has been created with git worktree add inside the repository, gix clean treats it the same as any untracked nested repository, even though it is a working tree of the current repository.

  • gix clean -re will delete it.
  • If * is listed in .gitignore, then gix clean -xde will also delete it, even without -r.

In both cases, this happens even if there is no intermediate ignored directory.

Significance

Users shouldn't usually create worktrees nested inside the main worktree (or inside each other). But it is supported to do so, and users may occasionally do so deliberately.

It is also easy to do by accident if one is not experienced with git worktree, since the one-argument form of git worktree add takes a path from which a branch is inferred, rather than taking a branch. git worktree mybranch creates a mybranch branch if it does not exist, and creates a worktree for it in the mybranch subdirectory of the current directory.

Since such a worktree works fine and does not usually interfere with the main worktree, even though the user may have meant to write git worktree ../mybranch, the user may proceed to use the nested worktree and to rely on data there not being lost due to cleaning in the main worktree.

In addition to being an area where I think the behavior is unexpected and carries some risk of inadvertent data loss, the question of how gix clean should treat nested worktrees is relevant to improving the help text for -r (especially if the recommendation I make below that they be given the same protection as submodule working trees is not followed).

Relationship to #1458

In all observations and manual testing described in this issue, I made sure to use a build containing the changes from #1462.

#1462 did fix #1458. The case in this issue is something I had not thought of at that time and did not include in #1458. This issue may be seen as a sequel to that one.

Expected behavior 🤔

It seems to me that:

  • Working trees managed by git worktree should be protected to the same degree as submodules' working trees.
  • No working tree of the repository gix clean is run in should ever be removed by any gix clean command, regardless of the options passed.

It seems to me that this can be avoided without much overhead even for worktrees that are nested inside ignored directories, because worktrees are discoverable by examining the repository.

(They have directories in .git/worktrees. This is of course with the usual variations when $GIT_DIR has a nonstandard value, or when .git is a regular file due to them being worktrees of a submodule.)

Edge cases

Strange edge cases are possible, such as a submodule with a git worktree managed worktree in a subdirectory of its superproject's working tree or of a sibling submodule's working tree, or a superproject with a git worktree managed worktree in a subdirectory of a submodule's working tree.

I am not sure what to expect, for what happens when gix clean is run the repository whose working tree contains the related repository's extra worktree. I lean slightly toward thinking it should be deleted in this situation, if a nested repository would otherwise be deleted. This seems like the simplest approach to implement.

I do not think these edge cases, even if the related repository's working tree is still to be deleted, challenge the expectations given above.

  • A repository has a close relationship to all its working trees, including git worktree managed working trees.
  • They are more closely related to each other, conceptually, than a repository is related to its own submodules--for example, a repository's local configuration is not used in its submodules.
  • The reason gix clean does not and should not remove submodules is that they are tracked. This is different from the reason gix clean should not remove its own worktrees, which is that they have the same status as the repository's own main working tree or .git directory.
  • By the point we get to worktrees belonging to submodules, superprojects, or sibling submodules, we are conceptually much further removed, such that the situation is closer to that of a separately cloned repository that would be eligible for removal with -r and in some cases without -r when also ignored.

When the current repository is a submodule, of course its own git worktree managed working trees should be protected from deletion when git clean commands are run within it. That's conceptually identical to a situation with git worktree managed worktrees where no submodules are involved in any way. The only difference is that to find out where to look to find out where the worktrees are, we have to follow the path in the .git file (or use a mechanism that already takes care of this).

Git behavior

git clean does not remove them with any combination of options.

As in #1458, this is area where git behavior is not decisive because git clean does not remove nested repositories, ignored or otherwise. However, also as in #1458, the git clean behavior is is relevant here because of the expectations it sets about what cleaning means.

Steps to reproduce 🕹

With -r and not -x or -d – instructions

Create a repository with at least one commit:

git init with-worktree
cd with-worktree
touch file
git add .
git commit -m 'Initial commit'

Add a worktree for a new feature branch, but place it inside the main worktree instead of alongside it as would ordinarily be preferable:

git worktree add mybranch

Observe that, even though git status reports it as untracked, git clean will not remove it, and that, for git clean, the reason appears to be that it regards it as a nested repository (optionally use -f instead of -n):

git status
git clean -dn
git clean -xdn
git status
git add .
git status

Unstage and check that the situation is back to where it was when the worktree was added:

git rm --cached mybranch
git worktree list
git status

See what gix clean would do, and that it really does it, deleting the mybranch subdirectory that belonged to a git worktree managed working tree:

gix clean -n
gix clean -rn
gix clean -re
git status
git worktree list
ls mybranch

With -r and not -x or -d – with output

Here's all that, with output:

ek@Glub:~/src$ git init with-worktree
Initialized empty Git repository in /home/ek/src/with-worktree/.git/
ek@Glub:~/src$ cd with-worktree
ek@Glub:~/src/with-worktree (main #)$ touch file
ek@Glub:~/src/with-worktree (main #%)$ git add .
ek@Glub:~/src/with-worktree (main +)$ git commit -m 'Initial commit'
[main (root-commit) be9dc23] Initial commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file
ek@Glub:~/src/with-worktree (main)$ git worktree add mybranch
Preparing worktree (new branch 'mybranch')
HEAD is now at be9dc23 Initial commit
ek@Glub:~/src/with-worktree (main %)$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        mybranch/

nothing added to commit but untracked files present (use "git add" to track)
ek@Glub:~/src/with-worktree (main %)$ git clean -dn
ek@Glub:~/src/with-worktree (main %)$ git clean -xdn
ek@Glub:~/src/with-worktree (main %)$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        mybranch/

nothing added to commit but untracked files present (use "git add" to track)
ek@Glub:~/src/with-worktree (main %)$ git add .
warning: adding embedded git repository: mybranch
hint: You've added another git repository inside your current repository.
hint: Clones of the outer repository will not contain the contents of
hint: the embedded repository and will not know how to obtain it.
hint: If you meant to add a submodule, use:
hint:
hint:   git submodule add <url> mybranch
hint:
hint: If you added this path by mistake, you can remove it from the
hint: index with:
hint:
hint:   git rm --cached mybranch
hint:
hint: See "git help submodule" for more information.
ek@Glub:~/src/with-worktree (main +)$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   mybranch
ek@Glub:~/src/with-worktree (main +)$ git rm --cached mybranch
rm 'mybranch'
ek@Glub:~/src/with-worktree (main %)$ git worktree list
/home/ek/src/with-worktree           be9dc23 [main]
/home/ek/src/with-worktree/mybranch  be9dc23 [mybranch]
ek@Glub:~/src/with-worktree (main %)$ git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        mybranch/

nothing added to commit but untracked files present (use "git add" to track)
ek@Glub:~/src/with-worktree (main %)$ gix clean -n
Nothing to clean (Skipped 1 repository - show with -r)
ek@Glub:~/src/with-worktree (main %)$ gix clean -rn
WOULD remove repository mybranch/
ek@Glub:~/src/with-worktree (main %)$ gix clean -re
removing repository mybranch/
ek@Glub:~/src/with-worktree (main)$ git status
On branch main
nothing to commit, working tree clean
ek@Glub:~/src/with-worktree (main)$ git worktree list
/home/ek/src/with-worktree           be9dc23 [main]
/home/ek/src/with-worktree/mybranch  be9dc23 [mybranch] prunable
ek@Glub:~/src/with-worktree (main)$ ls mybranch
ls: cannot access 'mybranch': No such file or directory

Without -r, and with -x and -d, when .gitignore has * – instructions

This can be done as a variation on the above example, for which separate explication may not be required. This instead produces it on a real-world repository that lists * followed by ! exclusions--the cargo-update repository that was also used as an example in #1458.

Get the repository and observe what it ignores:

git clone https://github.com/nabijaczleweli/cargo-update.git
cd cargo-update
cat .gitignore

Observe that #1458 is fixed (#1462):

gix clean -xdn
gix clean -xde

Create a nested worktree:

git worktree add mybranch
git worktree list

Verify that git clean does not delete it, even when told to remove directories and ignored entries:

git clean -xdn
git clean -xdf
ls -ld mybranch

Use a gix clean command to delete it, even though this is expected to delete some nested repositories but is intended not to delete any of our worktrees, and verify that it is gone:

gix clean -xdn
gix clean -xde
ls -ld mybranch

Without -r, and with -x and -d, when .gitignore has * – output

Here's all that, with output:

ek@Glub:~/src$ git clone https://github.com/nabijaczleweli/cargo-update.git
Cloning into 'cargo-update'...
remote: Enumerating objects: 328251, done.
remote: Counting objects: 100% (83271/83271), done.
remote: Compressing objects: 100% (1630/1630), done.
remote: Total 328251 (delta 81523), reused 83220 (delta 81477), pack-reused 244980
Receiving objects: 100% (328251/328251), 110.53 MiB | 35.35 MiB/s, done.
Resolving deltas: 100% (320324/320324), done.
ek@Glub:~/src$ cd cargo-update
ek@Glub:~/src/cargo-update (master=)$ cat .gitignore
*
!.gitignore
!.travis.yml
!gh_rsa.enc
!appveyor.yml
!LICENSE
!Cargo.toml
!rustfmt.toml
!build.rs
!cargo-install-update-manifest.rc
!cargo-install-update.exe.manifest
!*.sublime-project
!*.md
!.github
!.github/**
!src
!src/**
!man
!man/**
!tests
!tests/**
!test-data
!test-data/**
ek@Glub:~/src/cargo-update (master=)$ gix clean -xdn
Nothing to clean
ek@Glub:~/src/cargo-update (master=)$ gix clean -xde
ek@Glub:~/src/cargo-update (master=)$
ek@Glub:~/src/cargo-update (master=)$ git worktree add mybranch
Preparing worktree (new branch 'mybranch')
HEAD is now at 6334da9b99 they should invent rustfmt that works imo
ek@Glub:~/src/cargo-update (master=)$ git worktree list
/home/ek/src/cargo-update           6334da9b99 [master]
/home/ek/src/cargo-update/mybranch  6334da9b99 [mybranch]
ek@Glub:~/src/cargo-update (master=)$ git clean -xdn
ek@Glub:~/src/cargo-update (master=)$ git clean -xdf
ek@Glub:~/src/cargo-update (master=)$ ls -ld mybranch
drwxr-xr-x 7 ek ek 4096 Jul 23 16:39 mybranch
ek@Glub:~/src/cargo-update (master=)$ gix clean -xdn
WOULD remove mybranch/ (🗑️)

WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/cargo-update (master=)$ gix clean -xde
removing mybranch/ (🗑️)
ek@Glub:~/src/cargo-update (master=)$ ls -ld mybranch
ls: cannot access 'mybranch': No such file or directory
@EliahKagan EliahKagan changed the title gix clean commands readily delete the repo's own nested worktrees gix clean with -p or -xd can delete the repo's own nested worktrees Jul 23, 2024
@EliahKagan EliahKagan changed the title gix clean with -p or -xd can delete the repo's own nested worktrees gix clean with -r or -xd can delete the repo's own nested worktrees Jul 23, 2024
@EliahKagan EliahKagan changed the title gix clean with -r or -xd can delete the repo's own nested worktrees gix clean with -r or -xd deletes the repo's own nested worktrees Jul 23, 2024
@Byron Byron added the acknowledged an issue is accepted as shortcoming to be fixed label Jul 24, 2024
@Byron Byron self-assigned this Jul 24, 2024
@Byron
Copy link
Owner

Byron commented Jul 24, 2024

Thanks so much for researching this and the deep analysis - I was particularly impressed by thinking of submodules that have worktrees in the superproject.

Regarding the behaviour of git clean -f, I thought you might be interested in the undocumented 'double-force feature', which would then indeed remove nested repositories. It's worth noting that at some point git seemingly removed nested ignored repositories, but stopped doing so in favor of -ff. This is no argument at all for not protecting worktrees, just something I thought you might find interesting.

With that said, I also believe that it should never touch the worktrees of the repository it is run in, while being unsure of what to do with worktrees of submodules that are reaching into the superproject. My take here is that it would probably be so unlikely that it basically never happens, and if it does it's more of an accident. As such, it should probably show up as eligible to be cleaned. If one day that shouldn't be desired anymore, then implementing this will be trivial by collecting worktrees of all submodules recursively as well.

I see two points of action here:

  • add a way to classify worktrees and pass in their locations so that the algorithm can identify them. Having this as part of the API also makes callers aware.
  • when the mode is 'for deletion', always double-check if an ignored directory is also a repository based on the passed criterion (i.e. fast if it has .git entry inside or slow by actual repository check). This would also mean that a 'status' would identify such a worktree as untracked, but clean would know more. That way, accidental deletion while ignored will be prevented, even if such repository isn't a worktree. (One will have to specify -r).

@Byron Byron mentioned this issue Jul 24, 2024
2 tasks
@EliahKagan
Copy link
Contributor Author

Regarding the behaviour of git clean -f, I thought you might be interested in the undocumented 'double-force feature', which would then indeed remove nested repositories. It's worth noting that at some point git seemingly removed nested ignored repositories, but stopped doing so in favor of -ff. This is no argument at all for not protecting worktrees, just something I thought you might find interesting.

Thanks--I was totally unaware of -ff!

With that said, I also believe that it should never touch the worktrees of the repository it is run in, while being unsure of what to do with worktrees of submodules that are reaching into the superproject. My take here is that it would probably be so unlikely that it basically never happens, and if it does it's more of an accident. As such, it should probably show up as eligible to be cleaned. If one day that shouldn't be desired anymore, then implementing this will be trivial by collecting worktrees of all submodules recursively as well.

It occurs to me--and maybe this is what you're already thinking of--that there is case where a submodule's git worktree managed worktree being present in the superproject is if it is .gitignored in the superproject and created there deliberately so that it exists alongside the submodule's main worktree, in the same way that one would usually create a repository's extra worktrees in the parent directory of the repository's main working tree.

But I guess when doing this one would know they are ignored and not run gix clean -xd without specifying a path that omits them. Maybe precious files will ultimately be the solution to that.

when the mode is 'for deletion', always double-check if an ignored directory is also a repository based on the passed criterion

Is this just for top-level ignored directories (i.e. those that are not subdirectories of ignored directories and that are ignored because they match an ignore pattern)?

If it applies to git repositories in all ignored directories no matter how deep down, then it would make it easier to understand -r because it would always be needed to delete nested repositories. Maybe that would go against the desire to be as fast as possible by limiting traversal. Then again, maybe that is not a problem in deletion, where one is always taking the time to traverse at least once anyway (since a recursive deletion traverses fully).

@Byron
Copy link
Owner

Byron commented Jul 25, 2024

But I guess when doing this one would know they are ignored and not run gix clean -xd without specifying a path that omits them. Maybe precious files will ultimately be the solution to that.

Yes, precious files would add a layer of protection (even though you shouldn't take my word for it :)), even though with the already implemented change it would at least detect directly ignored repositories as such, so one has to -r explicitly.

Is this just for top-level ignored directories (i.e. those that are not subdirectories of ignored directories and that are ignored because they match an ignore pattern)?

If it applies to git repositories in all ignored directories no matter how deep down, then it would make it easier to understand -r because it would always be needed to delete nested repositories. Maybe that would go against the desire to be as fast as possible by limiting traversal. Then again, maybe that is not a problem in deletion, where one is always taking the time to traverse at least once anyway (since a recursive deletion traverses fully).

No, it's just for the top-level, no nesting. Indeed, this is for performance reasons and there is that warning indicating that ignored directories may include repositories (but they may not be one anymore).

This performance I really must protect, as for instance in GitButler with node_modules/ and target/, it takes ~0.04s with gix clean -xd , but ~3.8s with git clean -nxd. It's day and night.

Interestingly, gix clean -xd --skip-hidden-repositories non-bare is still faster than git.

❯ hyperfine -M3 -w1 'git clean -nxd' 'gix clean -xd --skip-hidden-repositories non-bare'
Benchmark 1: git clean -nxd
  Time (mean ± σ):      4.404 s ±  0.738 s    [User: 0.390 s, System: 3.587 s]
  Range (min … max):    3.958 s …  5.255 s    3 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: gix clean -xd --skip-hidden-repositories non-bare
  Time (mean ± σ):      3.229 s ±  0.248 s    [User: 0.408 s, System: 2.621 s]
  Range (min … max):    3.075 s …  3.515 s    3 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (3.515 s). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.

Summary
  gix clean -xd --skip-hidden-repositories non-bare ran
    1.36 ± 0.25 times faster than git clean -nxd

Byron added a commit that referenced this issue Jul 25, 2024
That way it's possibel for them to be equivalent to submodules, which
would never be deleted by accident due to their 'tracked' status.

This works by passing repository-relative paths of worktree locations
that are within this repository.
@Byron
Copy link
Owner

Byron commented Jul 25, 2024

Please note that this was implemented as breaking change, which will prevent me from publishing a patch release unfortunately. Thus this fix will be released in a month or two with the next regular 'breaking' one.

Technically, this is breaking just for gix-dir but not for gix, but the publishing system doesn't analyse the public API at all and thus relies on me flagging breaking changes, which are propagated 'downstream' within the workspace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants