Presumably corrupted git-annex branches #132

Open
adswa opened this issue Oct 17, 2022 · 2 comments

@adswa

adswa commented Oct 17, 2022

This re-posts an issue I previously created at https://gin.g-node.org/G-Node/Info/issues/62.

Hi! First and foremost a huge thank you for Gin! It is an immeasurably useful infrastructure for science.

I've recently noticed what I presume to be a corruption of the git-annex branch after pushing to Gin, and reported it originally at datalad/datalad-gooey#349.

Describe the bug

The issue presents as follows:
At the moment, pushing a DataLad dataset/git annex repo causes a severance of the git-annex branch, and complete divergence of my local and the remote git-annex branch on Gin. This happens with datasets I previously pushed successfully (small datasets I often use for demonstrations or ad-hoc testing).

An example is this dataset (you might see different gin repos in the errors below, as I tried to pin this down to parametrization or operating system, but the errors were identical across scenarios). It originally comes from https://github.com/datalad-datasets/machinelearning-books and contains PDFs that have a web special remote registered (i.e., the files came from a git annex addurl call). If I add a new gin repository as a remote and push to it using datalad push, the push succeeds for the default branch but fails with a non-fast-forward error for the git-annex branch, similar to the one below:

*	refs/heads/master:refs/heads/master	[new branch]
!	refs/heads/git-annex:refs/heads/git-annex	[rejected] (non-fast-forward)
Done'] [err: 'Delta compression using up to 16 threads
Total 422 (delta 198), reused 149 (delta 33), pack-reused 0
error: failed to push some refs to 'gin.g-node.org:/adswa/ml-books-only-ssh.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.']

Investigating the remote git-annex branch on Gin shows that the git-annex branch has been re-created from scratch (it seems), by a committer ID called "Gogs": https://gin.g-node.org/adswa/mlbooksmoretests/src/git-annex.
The local git-annex branch shows commits indicating that the branch was rewritten or otherwise vastly changed:

(gooyey) C:\Users\adina\Desktop\ml-books2>git log git-annex
commit 4e226892a69de8989b56cef5f41c49f138aee09e (git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Fri Oct 14 09:22:57 2022 +0200

    continuing transition ["forget git history"]

commit 38be5a7d07b019e2a7e42c8dff0734926c276f7d
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Fri Oct 14 09:17:56 2022 +0200

    update

commit 72cd967f9648209aab5c55aebf5b60f1aea41099 (origin/git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Tue Apr 19 13:29:07 2022 +0200

    update

A manual pull fails locally:

❱ git pull gin git-annex
From https://gin.g-node.org/adswa/mlbooksmoretests
 * branch            git-annex  -> FETCH_HEAD
fatal: refusing to merge unrelated histories
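
For anyone trying to confirm the same symptom, here is a minimal sketch of a check for whether two refs share any history at all. The helper name `shares_history` is hypothetical and not part of git, git-annex, or DataLad; it only wraps `git merge-base`, which exits non-zero when two commits have no common ancestor.

```shell
# Hypothetical helper: report whether two refs have a common ancestor.
# Usage: shares_history <ref-a> <ref-b>
shares_history() {
    # git merge-base prints the best common ancestor and exits 0 when one
    # exists; it exits non-zero when the histories are unrelated.
    if git merge-base "$1" "$2" >/dev/null 2>&1; then
        echo "related"
    else
        echo "unrelated"
    fi
}
```

Assuming the remote is named gin as in the pull above, running `git fetch gin git-annex` followed by `shares_history git-annex FETCH_HEAD` inside the dataset should print `unrelated` if the remote branch really was re-created from scratch.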

And annexed data that should be readily available from the web special remote can't be retrieved after cloning the repository.

(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git-annex whereis A.Shashua-Introduction_to_Machine_Learning.pdf          1 !
whereis A.Shashua-Introduction_to_Machine_Learning.pdf (0 copies) failed
whereis: 1 failed
(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master

❱ git annex get A.Shashua-Introduction_to_Machine_Learning.pdf            130 !
get A.Shashua-Introduction_to_Machine_Learning.pdf (not available) 
  No other repository is known to contain the file.
failed
get: 1 failed
(gooey) adina@mun

I have seen this on Linux and Windows-based operating systems with different versions of git-annex, using DataLad but also using only git push and git annex sync commands. I also reproduced it with several datasets I had previously pushed successfully, with data available from web special remotes, from other types of special remotes, or locally. The only datasets I pushed successfully were ones that were created on my local computer rather than cloned, and thus had all file content available locally and no special remotes. We could not pin the cause down with certainty, though. Can you advise what might be wrong?

To reproduce

  • Clone https://github.com/datalad-datasets/machinelearning-books (datalad clone https://github.com/datalad-datasets/machinelearning-books)
  • Create a sibling with datalad (datalad create-sibling-gin somereponame). Or create a new repo and add it manually as a remote (git remote add gin git@gin.g-node.org:/<user>/somereponame.git)
  • datalad push --to gin or perform a manual git push and git-annex sync
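
The steps above can be sketched as a single shell function. This is only an illustration under assumptions: the function name `reproduce_push_failure` is hypothetical, `somereponame` is the placeholder from the list, and it presumes that datalad and git-annex are installed and that GIN credentials are already configured. Defining the function runs nothing; it only acts when called.

```shell
# Hypothetical wrapper around the three reproduction steps.
# The body runs in a subshell so the caller's working directory is unchanged.
reproduce_push_failure() (
    repo_name="${1:-somereponame}"   # placeholder name for the new GIN repository
    workdir="$(mktemp -d)" || exit 1

    # 1. Clone the example dataset from GitHub.
    datalad clone https://github.com/datalad-datasets/machinelearning-books \
        "$workdir/ml-books" || exit 1
    cd "$workdir/ml-books" || exit 1

    # 2. Create a new GIN repository and register it as a sibling.
    datalad create-sibling-gin "$repo_name" || exit 1

    # 3. Push to the sibling called "gin", as in the report; the git-annex
    #    branch is the ref that was rejected as non-fast-forward.
    datalad push --to gin
)
```

Calling `reproduce_push_failure somereponame` would create a real repository on GIN, so it is best run with a throwaway name.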
@mpsonntag
Contributor

Hi @adswa,

thank you for reporting this issue! We did not introduce any changes to GIN or the gin client that touch the annex behavior in the recent past.
Since you write that you have been able to push successfully before, could this be an issue with recent updates to git annex or datalad? Have you tried previous versions of datalad or git annex to check whether they show the same problem? Unfortunately, we currently only fully support our own gin-client, which comes with a pre-packaged git annex binary, and we do not have the capacity to test all datalad and annex distributions in tandem with GIN. If we can narrow it down to a working version of datalad/annex, we may be able to figure out whether we can update GIN accordingly.

@mpsonntag
Contributor

It might also have to do with the git annex version that you are using. gin and the gin-client both use git-annex version 8, which is incompatible with the latest annex version 10 that is now installed by default. This might be the root of these issues.
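
To compare versions quickly, a small sketch along these lines could help. The helper name `annex_major` is hypothetical; it only extracts the part of a version string before the first dot, so it relies on the assumption that the string looks like `10.20220927` or `8.20210223`.

```shell
# Hypothetical helper: print the major release number of a git-annex
# version string, i.e. everything before the first dot.
annex_major() {
    printf '%s\n' "${1%%.*}"
}
```

With git-annex installed, `annex_major "$(git annex version --raw)"` prints the major release of the local installation, which can then be compared against the version 8 mentioned above.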
