Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Able to get all individual files, but not folder as a whole from RIA: "invalid start byte" & "broken pipe" #7214

Closed
mslw opened this issue Dec 5, 2022 · 8 comments · Fixed by #7215
Assignees
Labels
RIA/ORA Issues related to RIA-based workflows and related components

Comments

@mslw
Copy link
Contributor

mslw commented Dec 5, 2022

I am working in a dataset clone made from ria+ssh. When trying to datalad get a folder with several files, the result is "ok" for the first file in that folder, and then "error" for all subsequent files:

❱ datalad get input/2.1 
get(ok): input/2.1/1626079095.600487.json (file) [from entrystore-storage...]                                                         
get(error): input/2.1/1626095182.2839966.json (file) [Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]]
get(error): input/2.1/1626095628.319196.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626161029.8703449.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626161392.6669025.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626249325.7805614.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626249452.0744905.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(impossible): input/2.1 (directory) [could not get some content in /home/mszczepanik/test-get/rawdata/input/2.1 ['/home/mszczepanik/test-get/rawdata/input/2.1/1626095182.2839966.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626095628.319196.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626161029.8703449.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626161392.6669025.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626249325.7805614.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626249452.0744905.json']]
action summary:
  get (error: 6, impossible: 1, ok: 1)

However, I can get all these files one by one:

❱ datalad get input/2.1/1626095182.2839966.json
get(ok): input/2.1/1626095182.2839966.json (file) [from entrystore-storage...]                                                        
(tgr) mszczepanik@juseless in ~/test-get/rawdata on git:master
❱ datalad get input/2.1/1626095628.319196.json
get(ok): input/2.1/1626095628.319196.json (file) [from entrystore-storage...]

Trying with --log-level debug reveals that the error comes from git annex itself (at this moment we have three files locally, it gets the fourth, and errors on subsequent):

[DEBUG  ] received JSON result from annex: {'command': 'get', 'wanted': [{'here': False, 'uuid': '2f2283e5-a1ab-4c77-b328-914f8a8eef02', 'description': 'SFB 1451 Data Entry Set'}], 'note': 'from entrystore-storage...\nUnable to access these remotes: entrystore-storage\n(Note that these git remotes have annex-ignore set: origin)', 'success': False, 'input': ['input/2.1'], 'key': 'MD5E-s4772--c0dc07e23583fa1dd16eb1dafa80a553.json', 'error-messages': ['  Failed to obtain key: ["\'utf-8\' codec can\'t decode byte 0x85 in position 0: invalid start byte", "\'utf-8\' codec can\'t decode byte 0xf0 in position 4: invalid continuation byte"]'], 'file': 'input/2.1/1626161392.6669025.json'} 
get(error): input/2.1/1626161392.6669025.json (file) [Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]]
[DEBUG  ] received JSON result from annex: {'command': 'get', 'wanted': [{'here': False, 'uuid': '2f2283e5-a1ab-4c77-b328-914f8a8eef02', 'description': 'SFB 1451 Data Entry Set'}], 'note': 'from entrystore-storage...\nUnable to access these remotes: entrystore-storage\n(Note that these git remotes have annex-ignore set: origin)', 'success': False, 'input': ['input/2.1'], 'key': 'MD5E-s4771--7e63db35116ceb0ea794952b5081c22a.json', 'error-messages': ["  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']"], 'file': 'input/2.1/1626249325.7805614.json'} 
get(error): input/2.1/1626249325.7805614.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]

And indeed:

❱ git annex get input/2.1 
get input/2.1/1626161392.6669025.json (from entrystore-storage...) 
(checksum...) ok
get input/2.1/1626249325.7805614.json (from entrystore-storage...) 

  Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed

Note: the remote which git-annex suggests to "maybe add" above is where the RIA store got populated from (same machine as the RIA). For the record, I can clone that "initial" dataset through SSH and have no problems getting the folder in that situation. But my intention is to drop data from there and only access the RIA from the outside.

Context:

  • Data in the RIA storage sibling is encrypted using git-annex encryption (sharedpubkey), I have a private (decryption) key on the machine where I run datalad get
  • observed with datalad 0.17.8 and git-annex version: 8.20210310
  • also observed with datalad 0.17.9 and git-annex version: 10.20221004-gbf27a02b0
@mslw
Copy link
Contributor Author

mslw commented Dec 5, 2022

Repeating the process with git-annex --debug shows the following messages surrounding the first (ok) and second (failed) file:

[2022-12-05 10:49:34.821873785] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f .git/annex/tmp/GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.824091362] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.824245801] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE KQ/Mj/
[2022-12-05 10:49:34.894627396] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-SUCCESS RETRIEVE GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.896633793] (Utility.Process) process [2944567] chat: gpg ["--quiet","--trust-model","always","--batch","--decrypt"]
[2022-12-05 10:49:34.9446311] (Utility.Process) process [2944567] done ExitSuccess
(checksum...) [2022-12-05 10:49:34.949195904] (Annex.Perms) freezing content .git/annex/objects/Wp/x1/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json
[2022-12-05 10:49:34.962490214] (Annex.Perms) freezing content directory .git/annex/objects/Wp/x1/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json
[2022-12-05 10:49:34.963456332] (Annex.Branch) read 331/734/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json.log
[2022-12-05 10:49:34.967673746] (Annex.Branch) set 331/734/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json.log
ok
get input/2.1/1626095182.2839966.json (from entrystore-storage...) 
[2022-12-05 10:49:34.974336686] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595 .git/annex/tmp/GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595
[2022-12-05 10:49:34.974794755] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595
[2022-12-05 10:49:34.974893725] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE Vz/QJ/
[2022-12-05 10:49:34.983072983] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595 Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]
[2022-12-05 10:49:34.983294433] (Annex.Branch) read d42/b46/MD5E-s5524--89ec912c893f55cbc9ce0f9697262675.json.log.cnk

  Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]

  Unable to access these remotes: entrystore-storage
[2022-12-05 10:49:34.990286932] (Annex.Branch) read uuid.log

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed

Apparently, the error comes from the ora special remote (git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE)?

Full debug output:
❱ datalad drop input/2.1  # we were running out of locally unavailable files
❱ git annex --debug get input/2.1
[2022-12-05 10:49:33.803932633] (Utility.Process) process [2944490] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","input/2.1"]
[2022-12-05 10:49:33.804891742] (Utility.Process) process [2944491] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-12-05 10:49:33.805406561] (Utility.Process) process [2944492] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-12-05 10:49:33.806384399] (Utility.Process) process [2944493] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2022-12-05 10:49:33.820757248] (Utility.Process) process [2944493] done ExitSuccess
[2022-12-05 10:49:33.821500726] (Utility.Process) process [2944494] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2022-12-05 10:49:33.83248133] (Utility.Process) process [2944494] done ExitSuccess
[2022-12-05 10:49:33.833140689] (Utility.Process) process [2944495] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..73088ef65f305741a7262c2441d5956c02d57810","--pretty=%H","-n1"]
[2022-12-05 10:49:33.8457162] (Utility.Process) process [2944495] done ExitSuccess
[2022-12-05 10:49:33.846303939] (Utility.Process) process [2944496] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..ced6e40db5f5a5ef9b21fdddf648239aa9ec8aa0","--pretty=%H","-n1"]
[2022-12-05 10:49:33.858192551] (Utility.Process) process [2944496] done ExitSuccess
[2022-12-05 10:49:33.860174098] (Utility.Process) process [2944497] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
get input/2.1/1626079095.600487.json [2022-12-05 10:49:33.886995958] (Utility.Process) process [2944499] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(from entrystore-storage...) 
[2022-12-05 10:49:33.907326428] (Utility.Process) process [2944500] chat: /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora []
[2022-12-05 10:49:34.35880531] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> VERSION 1
[2022-12-05 10:49:34.3590544] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- EXTENSIONS INFO GETGITREMOTENAME ASYNC
[2022-12-05 10:49:34.359189529] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> EXTENSIONS
[2022-12-05 10:49:34.359247919] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- PREPARE
[2022-12-05 10:49:34.359356519] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETGITDIR
[2022-12-05 10:49:34.359405629] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE .git
[2022-12-05 10:49:34.473992087] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETCONFIG name
[2022-12-05 10:49:34.474264316] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE entrystore-storage
[2022-12-05 10:49:34.474566266] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETCONFIG url
[2022-12-05 10:49:34.474672136] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE ria+ssh://sfb1451.inm7.de:/data/sfb1451.inm7.de/z03-data-encrypted
[2022-12-05 10:49:34.474890645] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETCONFIG push-url
[2022-12-05 10:49:34.474998025] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 
[2022-12-05 10:49:34.475937784] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETCONFIG archive-id
[2022-12-05 10:49:34.476049474] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 8e465aa4-af88-4abd-aaa0-d248339780be
[2022-12-05 10:49:34.476255283] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> GETCONFIG force-write
[2022-12-05 10:49:34.476371693] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 
[2022-12-05 10:49:34.821178566] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> PREPARE-SUCCESS
[2022-12-05 10:49:34.821873785] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f .git/annex/tmp/GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.824091362] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.824245801] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE KQ/Mj/
[2022-12-05 10:49:34.894627396] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-SUCCESS RETRIEVE GPGHMACSHA1--5d9512c5760ab5df6857f00e6ad9690b5d6ab41f
[2022-12-05 10:49:34.896633793] (Utility.Process) process [2944567] chat: gpg ["--quiet","--trust-model","always","--batch","--decrypt"]
[2022-12-05 10:49:34.9446311] (Utility.Process) process [2944567] done ExitSuccess
(checksum...) [2022-12-05 10:49:34.949195904] (Annex.Perms) freezing content .git/annex/objects/Wp/x1/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json
[2022-12-05 10:49:34.962490214] (Annex.Perms) freezing content directory .git/annex/objects/Wp/x1/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json
[2022-12-05 10:49:34.963456332] (Annex.Branch) read 331/734/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json.log
[2022-12-05 10:49:34.967673746] (Annex.Branch) set 331/734/MD5E-s4720--600741a960628d7a37992deb7e54e4e2.json.log
ok
get input/2.1/1626095182.2839966.json (from entrystore-storage...) 
[2022-12-05 10:49:34.974336686] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595 .git/annex/tmp/GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595
[2022-12-05 10:49:34.974794755] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595
[2022-12-05 10:49:34.974893725] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE Vz/QJ/
[2022-12-05 10:49:34.983072983] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--d2d1ec171c533ff42f5a104d433f311260e18595 Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]
[2022-12-05 10:49:34.983294433] (Annex.Branch) read d42/b46/MD5E-s5524--89ec912c893f55cbc9ce0f9697262675.json.log.cnk

  Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]

  Unable to access these remotes: entrystore-storage
[2022-12-05 10:49:34.990286932] (Annex.Branch) read uuid.log

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
(from entrystore-storage...) 96.json 
[2022-12-05 10:49:34.997580331] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--e65bd4dcf0f067ce1137502c80678cdd4d95fef9 .git/annex/tmp/GPGHMACSHA1--e65bd4dcf0f067ce1137502c80678cdd4d95fef9
[2022-12-05 10:49:34.997806351] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--e65bd4dcf0f067ce1137502c80678cdd4d95fef9
[2022-12-05 10:49:34.997897751] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 76/Qf/
[2022-12-05 10:49:34.99827858] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--e65bd4dcf0f067ce1137502c80678cdd4d95fef9 Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']
[2022-12-05 10:49:34.99840149] (Annex.Branch) read 2f6/f1f/MD5E-s5523--ba11dcb323ee18b8a3d80651e1ee80c4.json.log.cnk

  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
(from entrystore-storage...) 449.json 
[2022-12-05 10:49:35.015770374] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--970f0ec117b5f157f45a30ca58a9aadbb4cca648 .git/annex/tmp/GPGHMACSHA1--970f0ec117b5f157f45a30ca58a9aadbb4cca648
[2022-12-05 10:49:35.016100523] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--970f0ec117b5f157f45a30ca58a9aadbb4cca648
[2022-12-05 10:49:35.016172343] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE pg/g5/
[2022-12-05 10:49:35.016531042] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--970f0ec117b5f157f45a30ca58a9aadbb4cca648 Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']
[2022-12-05 10:49:35.016614032] (Annex.Branch) read 09e/ed7/MD5E-s4772--3c0a402442fa8daf30dbff5d4ec08a0c.json.log.cnk

  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
(from entrystore-storage...) 025.json 
[2022-12-05 10:49:35.026329498] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--9dbae60db03d5f1f3eeb9dea402ce4a5b20dc179 .git/annex/tmp/GPGHMACSHA1--9dbae60db03d5f1f3eeb9dea402ce4a5b20dc179
[2022-12-05 10:49:35.026538837] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--9dbae60db03d5f1f3eeb9dea402ce4a5b20dc179
[2022-12-05 10:49:35.026627537] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE Q6/64/
[2022-12-05 10:49:35.026949207] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--9dbae60db03d5f1f3eeb9dea402ce4a5b20dc179 Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Bro
ken pipe']
[2022-12-05 10:49:35.027066047] (Annex.Branch) read 226/b20/MD5E-s4772--c0dc07e23583fa1dd16eb1dafa80a553.json.log.cnk

  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
(from entrystore-storage...) 614.json 
[2022-12-05 10:49:35.037143992] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--624a5140a5fa53b688fab8bb09b7cab41707604f .git/annex/tmp/GPGHMACSHA1--624a5140a5fa53b688fab8bb09b7cab41707604f
[2022-12-05 10:49:35.037346531] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--624a5140a5fa53b688fab8bb09b7cab41707604f
[2022-12-05 10:49:35.037432051] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 3P/17/
[2022-12-05 10:49:35.037741141] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--624a5140a5fa53b688fab8bb09b7cab41707604f Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']
[2022-12-05 10:49:35.03786586] (Annex.Branch) read cfd/e6b/MD5E-s4771--7e63db35116ceb0ea794952b5081c22a.json.log.cnk

  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
(from entrystore-storage...) 905.json 
[2022-12-05 10:49:35.048588444] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- TRANSFER RETRIEVE GPGHMACSHA1--96912464b78cc3286f1e8994c03784576b6ec05e .git/annex/tmp/GPGHMACSHA1--96912464b78cc3286f1e8994c03784576b6ec05e
[2022-12-05 10:49:35.048800944] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> DIRHASH GPGHMACSHA1--96912464b78cc3286f1e8994c03784576b6ec05e
[2022-12-05 10:49:35.048887314] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] <-- VALUE 61/pP/
[2022-12-05 10:49:35.049198943] (Annex.ExternalAddonProcess) /home/mszczepanik/test-get/tgr/bin/git-annex-remote-ora[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA1--96912464b78cc3286f1e8994c03784576b6ec05e Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']
[2022-12-05 10:49:35.049321113] (Annex.Branch) read 5b4/31b/MD5E-s4771--4db471169fb28d66f8f0793e01083fc8.json.log.cnk

  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed
[2022-12-05 10:49:35.054830465] (Utility.Process) process [2944497] done ExitSuccess
[2022-12-05 10:49:35.054983135] (Utility.Process) process [2944492] done ExitSuccess
[2022-12-05 10:49:35.055101405] (Utility.Process) process [2944491] done ExitSuccess
[2022-12-05 10:49:35.055198265] (Utility.Process) process [2944490] done ExitSuccess
[2022-12-05 10:49:35.195051005] (Utility.Process) process [2944500] done ExitSuccess
[2022-12-05 10:49:35.198907309] (Database.Handle) commitDb start
[2022-12-05 10:49:35.200992346] (Database.Handle) commitDb done
[2022-12-05 10:49:35.205349549] (Utility.Process) process [2944589] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","hash-object","-w","--stdin-paths","--no-filters"]
[2022-12-05 10:49:35.208006245] (Utility.Process) process [2944590] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-index","-z","--index-info"]
[2022-12-05 10:49:35.224083211] (Utility.Process) process [2944590] done ExitSuccess
[2022-12-05 10:49:35.225416129] (Utility.Process) process [2944591] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2022-12-05 10:49:35.235012904] (Utility.Process) process [2944591] done ExitSuccess
(recording state in git...)
[2022-12-05 10:49:35.236399062] (Utility.Process) process [2944592] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","write-tree"]
[2022-12-05 10:49:35.261804394] (Utility.Process) process [2944592] done ExitSuccess
[2022-12-05 10:49:35.262889853] (Utility.Process) process [2944593] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","commit-tree","c2a43abb951fb6148d0557731555c96a1505d9f1","--no-gpg-sign","-p","refs/heads/git-annex"]
[2022-12-05 10:49:35.2777399] (Utility.Process) process [2944593] done ExitSuccess
[2022-12-05 10:49:35.278755329] (Utility.Process) process [2944594] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/heads/git-annex","faf95934ac8c04220428732ca80523a4baccffe5"]
[2022-12-05 10:49:35.292544898] (Utility.Process) process [2944594] done ExitSuccess
[2022-12-05 10:49:35.298270109] (Utility.Process) process [2944499] done ExitSuccess
[2022-12-05 10:49:35.299046778] (Utility.Process) process [2944589] done ExitSuccess
get: 6 failed

@bpoldrack
Copy link
Member

bpoldrack commented Dec 5, 2022

Interesting. This smells like the ORA remote is unable to write the streamed chunks into the target file. (As in: Are we writing a utf-8 string instead of binary somewhere?)

To be clear: 'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte is the only thing relevant. The subsequent failures are due to the fact, that the special remote crashed.

@bpoldrack bpoldrack self-assigned this Dec 5, 2022
@bpoldrack bpoldrack added the RIA/ORA Issues related to RIA-based workflows and related components label Dec 5, 2022
@christian-monch
Copy link
Contributor

We had the decoding error before. A component split its output into random chunks, Andi the Receiver just decoded every chunk. That led the errors, but f the chunk borders where located inside an utf8 encoded character

@christian-monch
Copy link
Contributor

So it's either that, or we decode() binary data, e.g. encrypted data.

@bpoldrack
Copy link
Member

@christian-monch

@mslw and I found it.

It's a little more twisted, but yes we ended up decoding binary data. The decode() call is not in itself incorrect, though. We are reading from a pipe where we do not expect binary data. It was written into it, but never consumed and then mistaken for the output of something else later on.

@bpoldrack
Copy link
Member

bpoldrack commented Dec 5, 2022

Leaving a note on how the bug would be triggered:

It depends on several things: First and foremost, the triggering annex keys must not have the key's size in it and the conncetion to the RIA store must be via SSH and lastly several keys are being retrieved. Such a key would in itself succeed, but screw up the ORA remote for the following key by leaving "garbage" in stdout of the persistent remote shell.

Actual bug, however, was to issue a remote cat on the key when we don't actually know that we want to use it yet, implying it's not just buggy but also an unnecessary operation (hence impacts performance).

mslw added a commit to mslw/datalad that referenced this issue Dec 5, 2022
This delays the remote-end `cat` command until just before its output
is needed, for two reasons:

1. The command output is not needed if `get` decides to use scp

2. When getting multiple files encrypted by git-annex (which would use
scp, due to size not being part of the annex key), the `cat` output
would remain in standard output and would be read when checking
whether the *next* file exists, crashing upon `decode()` datalad#7214
@mih
Copy link
Member

mih commented Dec 7, 2022

FTR: The ssh downloader in datalad-next shows how to get the file size when a key doesn't have it. This would make (slow and inconvenient) fallbacks unnecessarry.

@yarikoptic-gitmate
Copy link
Collaborator

Issue fixed in 0.17.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RIA/ORA Issues related to RIA-based workflows and related components
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants