Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save would make git crash replacing a file with directory with content #6558

Closed
yarikoptic opened this issue Mar 16, 2022 · 1 comment · Fixed by #6581
Closed

save would make git crash replacing a file with directory with content #6558

yarikoptic opened this issue Mar 16, 2022 · 1 comment · Fixed by #6581
Labels

Comments

@yarikoptic
Copy link
Member

yarikoptic commented Mar 16, 2022

might populate with more details later, for now just a record of reproducing the problem reported in dandi/dandisets#127 (comment) and shown in CI runs of dandi/dandisets#134

cut paste of manual reproducing
(git-annex)lena:/tmp/clean[master]git-annex
$> touch path1; datalad save -m 1
add(ok): path1 (file)                                                                                                                                                                
save(ok): . (dataset)                                                                                                                                                                
action summary:                                                                                                                                                                      
  add (ok: 1)
  save (ok: 1)
1 13723 [1].....................................:Wed 16 Mar 2022 02:59:07 PM EDT:.
(git-annex)lena:/tmp/clean[master]git-annex
$> rm path1; mkdir path1; echo 1 > path1/path2; git annex add path1/path2
add path1/path2 
ok                                
(recording state in git...)
1 13724 [1].....................................:Wed 16 Mar 2022 02:59:13 PM EDT:.
(git-annex)lena:/tmp/clean[master]git-annex
$> git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    path1
	new file:   path1/path2

1 13725 [1].....................................:Wed 16 Mar 2022 02:59:16 PM EDT:.
(git-annex)lena:/tmp/clean[master]git-annex
$> datalad save -m 2       
[WARNING] Received an exception CommandError(CommandError: 'git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1' failed with exitcode 128 under /tmp/clean [err: 'error: 'path1' does not have a commit checked out
fatal: updating files failed']). Canceling not-yet running jobs and waiting for completion of running. You can force earlier forceful exit by Ctrl-C. 
[INFO   ] Canceled 0 out of 0 jobs. 0 left running. 
Total: 0.00 datasets [00:00, ? datasets/s]CommandError: 'git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1' failed with exitcode 128 under /tmp/clean
error: 'path1' does not have a commit checked out
fatal: updating files failed
1 13726 ->128 [1].....................................:Wed 16 Mar 2022 02:59:27 PM EDT:.
(git-annex)lena:/tmp/clean[master]git-annex
$> git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1
error: 'path1' does not have a commit checked out
fatal: updating files failed
1 13727 ->128 [1].....................................:Wed 16 Mar 2022 02:59:40 PM EDT:.
(git-annex)lena:/tmp/clean[master]git-annex
$> git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2      
[master a03d813] 2
 2 files changed, 1 insertion(+), 1 deletion(-)
 delete mode 120000 path1
 create mode 120000 path1/path2

which boils down to "not provide path of removed (and staged for removal) directory into 'git commit' call"

DataLad 0.15.5 WTF (configuration, credentials, datalad, dataset, dependencies, environment, extensions, git-annex, location, metadata_extractors, metadata_indexers, python, system)

WTF

configuration <SENSITIVE, report disabled by configuration>

credentials

  • keyring:
    • active_backends:
      • SecretService Keyring
      • kwallet DBusKeyring
      • libsecret Keyring
      • EncryptedKeyring with [PBKDF2] AES256.CFB v.1.0 at /home/yoh/.local/share/python_keyring/crypted_pass.cfg
      • PlaintextKeyring with no encyption v.1.0 at /home/yoh/.local/share/python_keyring/keyring_pass.cfg
    • config_file: /home/yoh/.config/python_keyring/keyringrc.cfg
    • data_root: /home/yoh/.local/share/python_keyring

datalad

  • version: 0.15.5

dataset

  • branches:
    • git-annex@dc82499
    • master@a03d813
  • id: c625b32c-9d97-4034-aeb9-dd14a0d50b23
  • metadata: <SENSITIVE, report disabled by configuration>
  • path: /tmp/clean
  • repo: AnnexRepo

dependencies

  • annexremote: 1.6.0
  • appdirs: 1.4.4
  • boto: 2.49.0
  • cmd:7z: 16.02
  • cmd:annex: 8.20211123
  • cmd:bundled-git: UNKNOWN
  • cmd:git: 2.34.1
  • cmd:system-git: 2.34.1
  • cmd:system-ssh: 8.7p1
  • exifread: 2.3.2
  • git: 3.1.24
  • gitdb: 4.0.9
  • humanize: 0.0.0
  • iso8601: 0.1.16
  • keyring: 23.5.0
  • keyrings.alt: 4.1.0
  • msgpack: 1.0.2
  • mutagen: 1.45.1
  • requests: 2.25.1
  • scrapy: 2.5.1
  • wrapt: 1.13.3

environment

  • GIT_PAGER: less --no-init --quit-if-one-screen
  • LANG: en_US.UTF-8
  • PATH: /home/yoh/gocode/bin:/home/yoh/gocode/bin:/home/yoh/bin:/home/yoh/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/sbin:/usr/sbin:/usr/local/sbin

extensions

git-annex

  • build flags:
    • Assistant
    • Webapp
    • Pairing
    • Inotify
    • DBus
    • DesktopNotify
    • TorrentParser
    • MagicMime
    • Feeds
    • Testsuite
    • S3
    • WebDAV
  • dependency versions:
    • aws-0.22
    • bloomfilter-2.0.1.0
    • cryptonite-0.26
    • DAV-1.3.4
    • feed-1.3.0.1
    • ghc-8.8.4
    • http-client-0.6.4.1
    • persistent-sqlite-2.10.6.2
    • torrent-10000.1.1
    • uuid-1.3.13
    • yesod-1.6.1.0
  • key/value backends:
    • SHA256E
    • SHA256
    • SHA512E
    • SHA512
    • SHA224E
    • SHA224
    • SHA384E
    • SHA384
    • SHA3_256E
    • SHA3_256
    • SHA3_512E
    • SHA3_512
    • SHA3_224E
    • SHA3_224
    • SHA3_384E
    • SHA3_384
    • SKEIN256E
    • SKEIN256
    • SKEIN512E
    • SKEIN512
    • BLAKE2B256E
    • BLAKE2B256
    • BLAKE2B512E
    • BLAKE2B512
    • BLAKE2B160E
    • BLAKE2B160
    • BLAKE2B224E
    • BLAKE2B224
    • BLAKE2B384E
    • BLAKE2B384
    • BLAKE2BP512E
    • BLAKE2BP512
    • BLAKE2S256E
    • BLAKE2S256
    • BLAKE2S160E
    • BLAKE2S160
    • BLAKE2S224E
    • BLAKE2S224
    • BLAKE2SP256E
    • BLAKE2SP256
    • BLAKE2SP224E
    • BLAKE2SP224
    • SHA1E
    • SHA1
    • MD5E
    • MD5
    • WORM
    • URL
    • X*
  • local repository version: 8
  • operating system: linux x86_64
  • remote types:
    • git
    • gcrypt
    • p2p
    • S3
    • bup
    • directory
    • rsync
    • web
    • bittorrent
    • webdav
    • adb
    • tahoe
    • glacier
    • ddar
    • git-lfs
    • httpalso
    • borg
    • hook
    • external
  • supported repository versions:
    • 8
  • upgrade supported from repository versions:
    • 0
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
  • version: 8.20211123

location

  • path: /tmp/clean
  • type: dataset

metadata_extractors

  • annex (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.annex
    • version: None
  • audio (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.audio
    • version: None
  • datacite (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.datacite
    • version: None
  • datalad_core (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.datalad_core
    • version: None
  • datalad_rfc822 (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.datalad_rfc822
    • version: None
  • exif (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.exif
    • version: None
  • frictionless_datapackage (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.frictionless_datapackage
    • version: None
  • image (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: None
    • module: datalad.metadata.extractors.image
    • version: None
  • xmp (datalad 0.15.5):
    • distribution: datalad 0.15.5
    • load_error: ModuleNotFoundError(No module named 'libxmp')
    • module: datalad.metadata.extractors.xmp

metadata_indexers

python

  • implementation: CPython
  • version: 3.9.10

system

The same problem with bleeding edge master datalad and most recent git available in debian ATM
(git-annex)lena:/tmp/clean[master]git
$> git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1
error: 'path1' does not have a commit checked out
fatal: updating files failed

$> datalad save -m 2                                                 
[WARNING] Received an exception CommandError(CommandError: 'git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1' failed with exitcode 128 under /tmp/clean [err: 'error: 'path1' does not have a commit checked out
fatal: updating files failed']). Canceling not-yet running jobs and waiting for completion of running. You can force earlier forceful exit by Ctrl-C. 
[INFO   ] Canceled 0 out of 0 jobs. 0 left running. 
Total: 0.00 datasets [00:00, ? datasets/s]CommandError: 'git -c diff.ignoreSubmodules=none commit -m 2 -- path1/path2 path1' failed with exitcode 128 under /tmp/clean
error: 'path1' does not have a commit checked out
fatal: updating files failed

$> datalad --version
datalad 0.15.6+829.gb23174877

$> git --version
git version 2.35.1.473.g83b2b277ed

I think we should fix it in DataLad via "remove paths from the call to commit which are parents for another path in the call" but that might be too expensive although I remember we had already done something like that for something else.

but I also think it is worthwhile reporting to git folks -- IMHO operation is inconsistent

e.g. if I remove the file and regardless either I add that to index or not -- commit works fine

$> rm path1                
(dev3) 1 13781 [1].....................................:Wed 16 Mar 2022 03:45:14 PM EDT:.
(git-annex)lena:/tmp/clean[master]git
$> git status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    path1

no changes added to commit (use "git add" and/or "git commit -a")
(dev3) 1 13782 [1].....................................:Wed 16 Mar 2022 03:45:15 PM EDT:.
(git-annex)lena:/tmp/clean[master]git
$> git add path1 
(dev3) 1 13783 [1].....................................:Wed 16 Mar 2022 03:45:22 PM EDT:.
(git-annex)lena:/tmp/clean[master]git
$> git status   
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    path1

(dev3) 1 13784 [1].....................................:Wed 16 Mar 2022 03:45:24 PM EDT:.
(git-annex)lena:/tmp/clean[master]git
$> git commit -m removed path1
[master 0444c4c] removed
 1 file changed, 1 deletion(-)
 delete mode 120000 path1
@yarikoptic
Copy link
Member Author

update: Junio (of git) already replied confirming it is a bug in git (citing: "guess is wrong and it is a bug that it does not notice it") and (rightfully so) reprimanded me for saying that git crashed whenever it just exited with non-0, should remember better ;)

yarikoptic added a commit to yarikoptic/datalad that referenced this issue Mar 18, 2022
accompanied with a test to demonstrate save defect on renaming file into directory.

Prior Commits optimizing the body/data structures of _save should have helped
to also minimize potential run time for this added check/manipulation of status.

Underlying issue is actually within git, see
datalad#6558
and correspondence on git mailing list
https://lore.kernel.org/git/xmqqv8wdk308.fsf@gitster.g/T/#madd83451d39754c044691910f3897659ed291abf
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Mar 18, 2022
accompanied with a test to demonstrate save defect on renaming file into directory.

Prior Commits optimizing the body/data structures of _save should have helped
to also minimize potential run time for this added check/manipulation of status.

Underlying issue is actually within git, see
datalad#6558
and correspondence on git mailing list
https://lore.kernel.org/git/xmqqv8wdk308.fsf@gitster.g/T/#madd83451d39754c044691910f3897659ed291abf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment