
Able to get all files individually, but not the folder as a whole, from RIA: "invalid start byte" & "broken pipe" #7214

@mslw

Description


I am working in a dataset clone made from a ria+ssh URL. When I run datalad get on a folder with several files, the result is "ok" for the first file in that folder and "error" for all subsequent files:

❱ datalad get input/2.1 
get(ok): input/2.1/1626079095.600487.json (file) [from entrystore-storage...]                                                         
get(error): input/2.1/1626095182.2839966.json (file) [Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]]
get(error): input/2.1/1626095628.319196.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626161029.8703449.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626161392.6669025.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626249325.7805614.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(error): input/2.1/1626249452.0744905.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
get(impossible): input/2.1 (directory) [could not get some content in /home/mszczepanik/test-get/rawdata/input/2.1 ['/home/mszczepanik/test-get/rawdata/input/2.1/1626095182.2839966.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626095628.319196.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626161029.8703449.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626161392.6669025.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626249325.7805614.json', '/home/mszczepanik/test-get/rawdata/input/2.1/1626249452.0744905.json']]
action summary:
  get (error: 6, impossible: 1, ok: 1)
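The first error is a Python UnicodeDecodeError. As a minimal illustration (an interpretation, not a confirmed diagnosis), the same two messages appear whenever bytes that are not valid UTF-8, such as binary or encrypted data, are decoded as UTF-8 text. The byte strings and the decode_error helper below are hypothetical; only the offending bytes and positions (0x85 at position 0, 0xf0 at position 4) mirror the log.

```python
# Hypothetical byte strings chosen so that UTF-8 decoding fails at the same
# byte/position pairs reported in the log above.
samples = [
    b"\x85\x00\x01",  # 0x85 is a continuation byte, so it cannot start a character
    b"abcd\xf0A",     # 0xf0 starts a 4-byte sequence, but "A" is not a continuation byte
]


def decode_error(data: bytes) -> str:
    """Return the UnicodeDecodeError message for data, or '' if it decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)
    return ""


for sample in samples:
    print(decode_error(sample))
```

If this reading is right, something along the transfer path treats part of the (encrypted) payload as text; the sketch only shows where such messages can come from, not which component emits them here.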

However, I can get all these files one by one:

❱ datalad get input/2.1/1626095182.2839966.json
get(ok): input/2.1/1626095182.2839966.json (file) [from entrystore-storage...]                                                        
(tgr) mszczepanik@juseless in ~/test-get/rawdata on git:master
❱ datalad get input/2.1/1626095628.319196.json
get(ok): input/2.1/1626095628.319196.json (file) [from entrystore-storage...]
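Since per-file gets succeed, a possible stopgap (a sketch, not a fix) is to call datalad get once per file. The helper names files_in and get_one_by_one are hypothetical. The sketch assumes that datalad get exits with a non-zero status when a result is an error, and that annexed files without local content appear as broken symlinks (the default for locked files).

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def files_in(directory: str) -> List[str]:
    """List entries of a dataset directory that look like annexed files.

    Locked annexed files whose content is absent are broken symlinks, for
    which Path.is_file() returns False, so symlinks are included explicitly.
    """
    return sorted(
        str(p) for p in Path(directory).iterdir() if p.is_symlink() or p.is_file()
    )


def get_one_by_one(
    paths: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run `datalad get` separately for each path; return the paths that failed."""
    failed = []
    for path in paths:
        # One datalad (and thus one git-annex) invocation per file.
        result = run(["datalad", "get", path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

From the dataset root this would be called as get_one_by_one(files_in("input/2.1")). The run parameter exists only so the command runner can be swapped out, e.g. for testing.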

Running with --log-level debug shows that the error comes from git-annex itself (at this point three files are present locally; it gets the fourth and fails on the subsequent ones):

[DEBUG  ] received JSON result from annex: {'command': 'get', 'wanted': [{'here': False, 'uuid': '2f2283e5-a1ab-4c77-b328-914f8a8eef02', 'description': 'SFB 1451 Data Entry Set'}], 'note': 'from entrystore-storage...\nUnable to access these remotes: entrystore-storage\n(Note that these git remotes have annex-ignore set: origin)', 'success': False, 'input': ['input/2.1'], 'key': 'MD5E-s4772--c0dc07e23583fa1dd16eb1dafa80a553.json', 'error-messages': ['  Failed to obtain key: ["\'utf-8\' codec can\'t decode byte 0x85 in position 0: invalid start byte", "\'utf-8\' codec can\'t decode byte 0xf0 in position 4: invalid continuation byte"]'], 'file': 'input/2.1/1626161392.6669025.json'} 
get(error): input/2.1/1626161392.6669025.json (file) [Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]]
[DEBUG  ] received JSON result from annex: {'command': 'get', 'wanted': [{'here': False, 'uuid': '2f2283e5-a1ab-4c77-b328-914f8a8eef02', 'description': 'SFB 1451 Data Entry Set'}], 'note': 'from entrystore-storage...\nUnable to access these remotes: entrystore-storage\n(Note that these git remotes have annex-ignore set: origin)', 'success': False, 'input': ['input/2.1'], 'key': 'MD5E-s4771--7e63db35116ceb0ea794952b5081c22a.json', 'error-messages': ["  Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']"], 'file': 'input/2.1/1626249325.7805614.json'} 
get(error): input/2.1/1626249325.7805614.json (file) [Failed to obtain key: ['[Errno 32] Broken pipe', '[Errno 32] Broken pipe']]
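To tell the two failure modes apart across a longer run, the per-file records can be grouped by error type. This is a minimal sketch assuming one JSON object per line with the 'file', 'success' and 'error-messages' fields seen in the debug log above (for instance, the output of git annex get --json --json-error-messages input/2.1). The function name classify_failures and the category labels are hypothetical.

```python
import json
from typing import Dict, Iterable


def classify_failures(json_lines: Iterable[str]) -> Dict[str, str]:
    """Map each failed file to a coarse error category.

    Expects one JSON object per line with 'file', 'success' and
    'error-messages' fields, as in the debug log above.
    """
    categories = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("success", True):
            continue  # only failed transfers are of interest
        messages = " ".join(record.get("error-messages", []))
        if "codec can't decode" in messages:
            category = "decode error"
        elif "Broken pipe" in messages:
            category = "broken pipe"
        else:
            category = "other"
        categories[record["file"]] = category
    return categories
```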

And indeed, plain git annex get behaves the same way:

❱ git annex get input/2.1 
get input/2.1/1626161392.6669025.json (from entrystore-storage...) 
(checksum...) ok
get input/2.1/1626249325.7805614.json (from entrystore-storage...) 

  Failed to obtain key: ["'utf-8' codec can't decode byte 0x85 in position 0: invalid start byte", "'utf-8' codec can't decode byte 0xf0 in position 4: invalid continuation byte"]

  Unable to access these remotes: entrystore-storage

  Maybe add some of these git remotes (git remote add ...):
  	2f2283e5-a1ab-4c77-b328-914f8a8eef02 -- SFB 1451 Data Entry Set

  (Note that these git remotes have annex-ignore set: origin)
failed

Note: the remote that git-annex suggests I "maybe add" above is the dataset from which the RIA store was populated (on the same machine as the RIA store). For the record, I can clone that "initial" dataset over SSH and get the folder without problems. However, I intend to drop the data there and access it only through the RIA store from outside.

Context:

  • Data in the RIA storage sibling is encrypted using git-annex encryption (sharedpubkey); I have the private (decryption) key on the machine where I run datalad get
  • observed with datalad 0.17.8 and git-annex 8.20210310
  • also observed with datalad 0.17.9 and git-annex 10.20221004-gbf27a02b0

Metadata

Labels: RIA/ORA (Issues related to RIA-based workflows and related components)