incorrect versionId in .rmet file(s) #1791

Closed
yarikoptic opened this issue Sep 1, 2020 · 4 comments

@yarikoptic
Contributor

Describe the bug

The .rmet file for ds000001 sub-01/anat/sub-01_inplaneT2.nii.gz contains a wrong versionId.

To Reproduce

#!/bin/bash

export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"

git clone http://datasets.datalad.org/openneuro/ds000001/.git
cd ds000001

git annex whereis sub-01/anat/sub-01_inplaneT2.nii.gz
# that is for the key of that file
git show git-annex:./b0b/6a2/MD5E-s669578--0017a7174b9fdebeb1e57f36027bfb96.nii.gz.log.rmet

# it will manage to get it only on the 2nd, blind try,
# after abandoning versionId altogether
git annex get --debug sub-01/anat/sub-01_inplaneT2.nii.gz

# and this one would fail to drop since it would fail to
# verify that known versionId'ed url is still available
git annex drop --debug sub-01/anat/sub-01_inplaneT2.nii.gz
produces the following output:
$> bash ds000001-rmet.sh
> set -eu
>> mktemp -d /home/yoh/.tmp/dl-XXXXXXX
> cd /home/yoh/.tmp/dl-zRCA7ES
> git clone http://datasets.datalad.org/openneuro/ds000001/.git
Cloning into 'ds000001'...
remote: Counting objects: 2432, done.
remote: Compressing objects: 100% (639/639), done.
remote: Total 2432 (delta 1127), reused 2381 (delta 1102)
Receiving objects: 100% (2432/2432), 240.24 KiB | 3.81 MiB/s, done.
Resolving deltas: 100% (1127/1127), done.
> cd ds000001
> git annex whereis sub-01/anat/sub-01_inplaneT2.nii.gz
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
whereis sub-01/anat/sub-01_inplaneT2.nii.gz (2 copies) 
  	b5dd2e3d-825f-4bc2-b719-cba1059f6bfc -- root@93184394ac19:/datalad/ds000001
   	deaa691f-c824-4416-9bf8-a94a47dd31b5 -- [s3-PUBLIC]

  s3-PUBLIC: http://openneuro.org.s3.amazonaws.com/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz?versionId=NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ
ok
> git show git-annex:./b0b/6a2/MD5E-s669578--0017a7174b9fdebeb1e57f36027bfb96.nii.gz.log.rmet
1531531226s deaa691f-c824-4416-9bf8-a94a47dd31b5:V +NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ#ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz
> git annex get --debug sub-01/anat/sub-01_inplaneT2.nii.gz
[2020-09-01 13:06:45.753442819] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-09-01 13:06:45.764879201] process done ExitSuccess
[2020-09-01 13:06:45.765316608] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-09-01 13:06:45.777240277] process done ExitSuccess
[2020-09-01 13:06:45.777722833] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","sub-01/anat/sub-01_inplaneT2.nii.gz"]
[2020-09-01 13:06:45.778861247] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2020-09-01 13:06:45.780058453] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2020-09-01 13:06:45.780986013] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-09-01 13:06:45.789373444] process done ExitSuccess
[2020-09-01 13:06:45.789626747] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-09-01 13:06:45.796984082] process done ExitSuccess
[2020-09-01 13:06:45.798025215] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ef6b41bf58869c65376f45e3bc9cfece333dbbdb","--pretty=%H","-n1"]
[2020-09-01 13:06:45.807267844] process done ExitSuccess
[2020-09-01 13:06:45.823835505] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:45.826361293] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-09-01 13:06:45.835442674] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
get sub-01/anat/sub-01_inplaneT2.nii.gz [2020-09-01 13:06:45.841927169] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:45.842542306] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
(from s3-PUBLIC...) 

[2020-09-01 13:06:45.891958405] Request {
  host                 = "openneuro.org.s3.amazonaws.com"
  port                 = 80
  secure               = False
  requestHeaders       = [("Accept-Encoding","identity"),("User-Agent","git-annex/8.20200810+git47-g27329f0bb-1~ndall+1")]
  path                 = "/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz"
  queryString          = "?versionId=NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ"
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}


  download failed: Not Found
[2020-09-01 13:06:46.258572407] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","diff-tree","-z","--raw","--no-renames","-l0","-r","4b825dc642cb6eb9a060e54bf8d69288fbee4904","739c73f4d89cbd1b42c4a605409463afaafe84ff","--"]
[2020-09-01 13:06:46.273868076] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:46.27461288] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-09-01 13:06:46.296141264] process done ExitSuccess

[2020-09-01 13:06:46.344478573] Request {
  host                 = "openneuro.org.s3.amazonaws.com"
  port                 = 80
  secure               = False
  requestHeaders       = [("Accept-Encoding","identity"),("User-Agent","git-annex/8.20200810+git47-g27329f0bb-1~ndall+1")]
  path                 = "/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}

(checksum...) ok                   
[2020-09-01 13:06:46.686059547] process done ExitSuccess
[2020-09-01 13:06:46.687454267] process done ExitSuccess
[2020-09-01 13:06:46.688804557] process done ExitSuccess
[2020-09-01 13:06:46.689983422] process done ExitSuccess
[2020-09-01 13:06:46.690394994] process done ExitSuccess
[2020-09-01 13:06:46.690610381] process done ExitSuccess
[2020-09-01 13:06:46.691043126] process done ExitSuccess
[2020-09-01 13:06:46.692561143] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2020-09-01 13:06:46.694772828] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-index","-z","--index-info"]
[2020-09-01 13:06:46.70745508] process done ExitSuccess
[2020-09-01 13:06:46.707716623] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-09-01 13:06:46.71494411] process done ExitSuccess
(recording state in git...)
[2020-09-01 13:06:46.715154775] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","write-tree"]
[2020-09-01 13:06:46.722643263] process done ExitSuccess
[2020-09-01 13:06:46.722737792] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","c473bd9f7b2110d39adc69f64f05de6d991bcd35","--no-gpg-sign","-p","refs/heads/git-annex"]
[2020-09-01 13:06:46.726165435] process done ExitSuccess
[2020-09-01 13:06:46.726205297] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/heads/git-annex","c9fb48cfde9ed727462376a64940380a01492f82"]
[2020-09-01 13:06:46.729285292] process done ExitSuccess
[2020-09-01 13:06:46.729554466] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:46.729850874] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-09-01 13:06:46.736016553] process done ExitSuccess
[2020-09-01 13:06:46.736857497] process done ExitSuccess
[2020-09-01 13:06:46.737529477] process done ExitSuccess
> git annex drop --debug sub-01/anat/sub-01_inplaneT2.nii.gz
[2020-09-01 13:06:46.776951692] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-09-01 13:06:46.779781413] process done ExitSuccess
[2020-09-01 13:06:46.779832399] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-09-01 13:06:46.785570419] process done ExitSuccess
[2020-09-01 13:06:46.786122549] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","sub-01/anat/sub-01_inplaneT2.nii.gz"]
[2020-09-01 13:06:46.786830179] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2020-09-01 13:06:46.787477395] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2020-09-01 13:06:46.787994954] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-09-01 13:06:46.792951656] process done ExitSuccess
[2020-09-01 13:06:46.793249514] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-09-01 13:06:46.801028072] process done ExitSuccess
[2020-09-01 13:06:46.80164594] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..c9fb48cfde9ed727462376a64940380a01492f82","--pretty=%H","-n1"]
[2020-09-01 13:06:46.809912894] process done ExitSuccess
[2020-09-01 13:06:46.819934219] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:46.821406138] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-09-01 13:06:46.825138902] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2020-09-01 13:06:46.828044545] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2020-09-01 13:06:46.831617123] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-09-01 13:06:46.833993264] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
drop sub-01/anat/sub-01_inplaneT2.nii.gz (checking s3-PUBLIC...) [2020-09-01 13:06:46.876826897] Request {
  host                 = "openneuro.org.s3.amazonaws.com"
  port                 = 80
  secure               = False
  requestHeaders       = [("Accept-Encoding",""),("User-Agent","git-annex/8.20200810+git47-g27329f0bb-1~ndall+1")]
  path                 = "/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz"
  queryString          = "?versionId=NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ"
  method               = "HEAD"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}

(checking s3-PUBLIC...) [2020-09-01 13:06:47.53090843] Request {
  host                 = "openneuro.org.s3.amazonaws.com"
  port                 = 80
  secure               = False
  requestHeaders       = [("Accept-Encoding",""),("User-Agent","git-annex/8.20200810+git47-g27329f0bb-1~ndall+1")]
  path                 = "/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz"
  queryString          = "?versionId=NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ"
  method               = "HEAD"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}

(unsafe) 
  Could only verify the existence of 0 out of 1 necessary copies

  Try making some of these repositories available:
  	b5dd2e3d-825f-4bc2-b719-cba1059f6bfc -- root@93184394ac19:/datalad/ds000001

  (Use --force to override this check, or adjust numcopies.)
failed
[2020-09-01 13:06:47.575793279] process done ExitSuccess
[2020-09-01 13:06:47.576336045] process done ExitSuccess
[2020-09-01 13:06:47.57670703] process done ExitSuccess
[2020-09-01 13:06:47.576785507] process done ExitSuccess
[2020-09-01 13:06:47.576844576] process done ExitSuccess
[2020-09-01 13:06:47.576894115] process done ExitSuccess
git-annex: drop: 1 failed
Expected behavior

It should have contained the currently correct versionId or that of some previous version (in this case there seems to be only one):

$> datalad ls -aL s3://openneuro.org/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz             
Connecting to bucket: openneuro.org
[INFO   ] S3 session: Connecting to the bucket openneuro.org anonymously 
Bucket info:
  Versioning: S3ResponseError: 403 Forbidden
     Website: S3ResponseError: 403 Forbidden
         ACL: <Policy: openneurocommon (owner) = FULL_CONTROL, http://acs.amazonaws.com/groups/global/AllUsers = READ, rblair2 = FULL_CONTROL, http://acs.amazonaws.com/groups/global/AllUsers = READ_ACP>
ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz 2020-08-21T20:24:15.000Z 669578 ver:2XOQz5Qk6RCKoiwNFp2dtYVrD3b2aPhv  acl:AccessDenied  http://openneuro.org.s3.amazonaws.com/ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz?versionId=2XOQz5Qk6RCKoiwNFp2dtYVrD3b2aPhv [OK]

so it should have been 2XOQz5Qk6RCKoiwNFp2dtYVrD3b2aPhv, not NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ as it is now:

$> git show git-annex:./b0b/6a2/MD5E-s669578--0017a7174b9fdebeb1e57f36027bfb96.nii.gz.log.rmet
1531531226s deaa691f-c824-4416-9bf8-a94a47dd31b5:V +NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ#ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz
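
To make the mismatch concrete, here is a minimal sketch of how that .rmet line maps onto the URL that git annex requested. It assumes the line layout seen in the output above (`<timestamp>s <remote-uuid>:V +<versionId>#<object key>`) and an object key stored verbatim; `rmet_to_url` and its `bucket_host` argument are hypothetical names for illustration, not git-annex code.

```shell
#!/bin/bash
# Sketch only: turn one line of a .log.rmet file into a versioned S3 URL.
# Assumed line layout (taken from the output in this report):
#   <timestamp>s <remote-uuid>:V +<versionId>#<object key>
# rmet_to_url and bucket_host are hypothetical names.
rmet_to_url() {
    local bucket_host=$1 line=$2
    local field version_id key
    field=${line#*:V +}          # keep what follows ":V +" -> <versionId>#<key>
    version_id=${field%%'#'*}    # text before the first "#"
    key=${field#*'#'}            # text after the first "#"
    printf 'http://%s/%s?versionId=%s\n' "$bucket_host" "$key" "$version_id"
}

rmet_to_url openneuro.org.s3.amazonaws.com \
    '1531531226s deaa691f-c824-4416-9bf8-a94a47dd31b5:V +NbC0xQSwG8gRcTCNq5aSrlVkru9STddJ#ds000001/sub-01/anat/sub-01_inplaneT2.nii.gz'
```

The printed URL is the same one that `git annex whereis` reported for s3-PUBLIC and that the first GET request failed on with "Not Found".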

Additional context

Maybe related to OpenNeuroOrg/datalad-service#71, which was to populate .rmet for those datasets that missed it in the early days?

@nellh
Contributor

nellh commented Sep 1, 2020

This dataset was recently re-exported to S3 by clearing all versions of all files under the prefix, renaming the git-annex remote, marking it with git-annex dead, and running git-annex export for each tag. We did see a few 500 errors from S3 on the initial export across a few datasets and exported those again. Maybe we are still missing some versions for the objects that threw a 500 error? The wrong version might be for another tag?

@yarikoptic
Contributor Author

> We did see a few 500 errors from S3 on the initial export across a few datasets and exported those again.

Hm, did git annex not tolerate them (by waiting/retrying)? I think it should; 500s can be very intermittent.

I think in this particular issue report it was "my bad": apparently my "update openneuro datasets" run somehow did not update the git-annex branch with the new availability information. I have just tested on a fresh clone from https://github.com/OpenNeuroDatasets/ds000001.git and was able to get/drop all files in the current HEAD. Sorry for the noise. I will look into what is happening on my side.

@yarikoptic
Contributor Author

re 500s -- submitted https://git-annex.branchable.com/todo/tolerate_intermittent_errors_upon___34__export__34_____40__and_probably_copy__41___to_S3/?updated
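
Until something like that exists, a caller could wrap the export in a retry loop. The following is only a rough sketch of that idea, not anything git-annex does: `retry_with_backoff` is a hypothetical helper, and it retries on any non-zero exit status since it cannot tell a 500 apart from other failures.

```shell
#!/bin/bash
# Sketch only: rerun a command a few times, doubling the pause after each failure.
# retry_with_backoff is a hypothetical helper, not part of git-annex.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example call (tag and remote name are placeholders to fill in):
#   retry_with_backoff 5 2 git annex export "$tag" --to "$remote"
```

A blanket retry like this would also repeat permanent failures, so the attempt limit should stay small.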

@yarikoptic
Contributor Author

Ah, I know what happened! I use --since, and the summary of the issue (and of the need for another option) is now in datalad/datalad#4857
