OSX & git-annex: fails to addurl empty files: getFileStatus: does not exist (No such file or directory) #79

Closed
yarikoptic opened this issue Mar 24, 2015 · 8 comments
Labels: annex (Git-annex related issue), platform-osx (Issue concerned with MacOSX)

@yarikoptic
Member

uff... at first we thought it was just a matter of the obscure filename we used in that test: \"';a&b&cd |` but it fails the same way in simpler cases as well. Here is the full trace leading up to the failing command

['git', 'annex', 'addurl', '-c', 'annex.alwayscommit=false', '--debug', '--file', '2/d/1d', 'http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d']

which seems to be all "kosher":

2015-03-24 13:26:06,905 [INFO   ] Annexing (mode=download) /var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmpmlUvNC//2/d/1d originating from url=http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d present locally under /var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmptLhRMA//2/d/1d (repos.py:199)
2015-03-24 13:26:06,915 [DEBUG  ] Hardlinking /var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmptLhRMA/2/d/1d under /var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmpmlUvNC/2/d/1d (cmd.py:256)
2015-03-24 13:26:07,048 [DEBUG  ] Running: ['git', 'annex', 'addurl', '-c', 'annex.alwayscommit=false', '--debug', '--file', '2/d/1d', 'http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d'] (cmd.py:237)
2015-03-24 13:26:07,239 [DEBUG  ] HTTP: "HEAD /.tmp-page2annex-SIDRg3/2/d/1d HTTP/1.1" 200 - (utils.py:165)
2015-03-24 13:26:07,278 [DEBUG  ] HTTP: "GET /.tmp-page2annex-SIDRg3/2/d/1d HTTP/1.1" 200 - (utils.py:165)
2015-03-24 13:26:07,290 [DEBUG  ] stdout| addurl 2/d/1d (downloading http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d ...) 
| 
| failed (cmd.py:237)
2015-03-24 13:26:07,290 [ERROR  ] stderr| [2015-03-24 13:26:07 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","git-annex"]
| [2015-03-24 13:26:07 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","--hash","refs/heads/git-annex"]
| [2015-03-24 13:26:07 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","log","refs/heads/git-annex..71d65c58fceef000fb39693413ecb35aef63c87c","-n1","--pretty=%H"]
| [2015-03-24 13:26:07 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","cat-file","--batch"]
| [2015-03-24 13:26:07 EDT] read: quvi ["--version"]
| [2015-03-24 13:26:07 EDT] call: curl ["-f","-L","-C","-","-#","-o",".git/annex/tmp/URL-s0--http&c%%localhost&c8302%.tmp-page2annex-SIDRg3%2%d%1d","http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d","--user-agent","git-annex/5.20150322-gb17566d"]
| 
| [2015-03-24 13:26:07 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","check-attr","-z","--stdin","annex.backend","annex.numcopies","--"]
| [2015-03-24 13:26:07 EDT] read: git ["--version"]
| git-annex: .git/annex/tmp/URL-s0--http&c%%localhost&c8302%.tmp-page2annex-SIDRg3%2%d%1d: getFileStatus: does not exist (No such file or directory)
| git-annex: addurl: 1 failed (cmd.py:237)
2015-03-24 13:26:07,291 [ERROR  ] Failed to run ['git', 'annex', 'addurl', '-c', 'annex.alwayscommit=false', '--debug', '--file', '2/d/1d', 'http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d'] under '/var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmpmlUvNC'. Exit code=1 (cmd.py:200)

@joeyh -- any ideas on WTF with this OSX? git annex is fresh: 5.20150322-gb17566d

to repl

yarikoptic added the annex label Mar 24, 2015
@yarikoptic
Member Author

uff -- figured it out. Apparently it is due to the file being empty! So it seems that if the file behind the URL is empty, addurl fails under OSX... Just in case, here is my protocol for replicating it, first with an empty file (fails) and then with a non-empty one (succeeds):

datalads-imac:tmpxaN0zN datalad$ rm 2/d/1d                                                                                                          
datalads-imac:tmpxaN0zN datalad$ touch 2/d/1d
datalads-imac:tmpxaN0zN datalad$ curl http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d         
127.0.0.1 - - [24/Mar/2015 14:06:47] "GET /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -
datalads-imac:tmpxaN0zN datalad$ echo $?                                                          
0
<yscommit=false' '--debug' '--file' '2/d/1d' 'http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d'
[2015-03-24 14:06:55 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","git-annex"]
[2015-03-24 14:06:55 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","--hash","refs/heads/git-annex"]
[2015-03-24 14:06:55 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","log","refs/heads/git-annex..befc6b50f8f8e591692384d013a80b4daa838bec","-n1","--pretty=%H"]
[2015-03-24 14:06:55 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","cat-file","--batch"]
[2015-03-24 14:06:55 EDT] read: quvi ["--version"]
127.0.0.1 - - [24/Mar/2015 14:06:55] "HEAD /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -
addurl 2/d/1d (downloading http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d ...) 
[2015-03-24 14:06:55 EDT] call: curl ["-f","-L","-C","-","-#","-o",".git/annex/tmp/URL-s0--http&c%%localhost&c8296%.tmp-page2annex-CUNZCK%2%d%1d","http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d","--user-agent","git-annex/5.20150322-gb17566d"]
127.0.0.1 - - [24/Mar/2015 14:06:55] "GET /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -

[2015-03-24 14:06:55 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","check-attr","-z","--stdin","annex.backend","annex.numcopies","--"]
[2015-03-24 14:06:55 EDT] read: git ["--version"]

git-annex: .git/annex/tmp/URL-s0--http&c%%localhost&c8296%.tmp-page2annex-CUNZCK%2%d%1d: getFileStatus: does not exist (No such file or directory)
failed
git-annex: addurl: 1 failed
datalads-imac:tmpxaN0zN datalad$ cd -
/Users/datalad/code/datalad
datalads-imac:datalad datalad$ echo lkjsdf > .tmp-page2annex-CUNZCK/2/d/1d                                                                          
datalads-imac:datalad datalad$ cd -
/var/folders/90/vkz4dwlx0ss72djd8hywgppc0000gp/T/tmpxaN0zN
datalads-imac:tmpxaN0zN datalad$ curl http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d
127.0.0.1 - - [24/Mar/2015 14:07:23] "GET /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -
lkjsdf
<yscommit=false' '--debug' '--file' '2/d/1d' 'http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d'                                                  
[2015-03-24 14:07:27 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","git-annex"]
[2015-03-24 14:07:27 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","show-ref","--hash","refs/heads/git-annex"]
[2015-03-24 14:07:27 EDT] read: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","log","refs/heads/git-annex..befc6b50f8f8e591692384d013a80b4daa838bec","-n1","--pretty=%H"]
[2015-03-24 14:07:27 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","cat-file","--batch"]
[2015-03-24 14:07:27 EDT] read: quvi ["--version"]
127.0.0.1 - - [24/Mar/2015 14:07:27] "HEAD /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -
addurl 2/d/1d (downloading http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d ...) 
[2015-03-24 14:07:27 EDT] call: curl ["-f","-L","-C","-","-#","-o",".git/annex/tmp/URL-s7--http&c%%localhost&c8296%.tmp-page2annex-CUNZCK%2%d%1d","http://localhost:8296/.tmp-page2annex-CUNZCK/2/d/1d","--user-agent","git-annex/5.20150322-gb17566d"]
127.0.0.1 - - [24/Mar/2015 14:07:27] "GET /.tmp-page2annex-CUNZCK/2/d/1d HTTP/1.1" 200 -
######################################################################## 100.0%
[2015-03-24 14:07:27 EDT] chat: git ["--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","check-attr","-z","--stdin","annex.backend","annex.numcopies","--"]
[2015-03-24 14:07:27 EDT] read: git ["--version"]
ok
(recording state in git...)
[2015-03-24 14:07:27 EDT] feed: xargs ["-0","git","--git-dir=.git","--work-tree=.","-c","annex.alwayscommit=false","add","--"]
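
For reference, the protocol above can be condensed into a self-contained check. This is only a sketch under assumptions: it fetches with Python's `urllib` instead of the curl that git-annex calls, so it illustrates the expected behavior (a zero-byte file on disk after downloading an empty URL) rather than reproducing the bug. The names `QuietHandler`, `fetch`, `serve_directory`, and `download_size` are hypothetical and not part of datalad or git-annex.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def fetch(url, dest):
    """Download url to dest; dest is created even if the body is empty."""
    # An empty ProxyHandler keeps environment proxies away from localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response, open(dest, "wb") as out:
        out.write(response.read())


def serve_directory(path):
    """Serve path over HTTP on a free localhost port; return the server."""
    handler = functools.partial(QuietHandler, directory=path)
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def download_size(content):
    """Serve content as a file, fetch it, and return the downloaded size."""
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "1d"), "wb") as f:
            f.write(content)
        server = serve_directory(src)
        try:
            url = "http://127.0.0.1:%d/1d" % server.server_address[1]
            dest = os.path.join(dst, "1d")
            fetch(url, dest)
            return os.path.getsize(dest)
        finally:
            server.shutdown()
            server.server_close()
```

With this sketch, the empty case should end with a 0-byte file and the `echo lkjsdf` case with a 7-byte one. A downloader that leaves no file behind in the first case shows the behavior reported here.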

yarikoptic added the severity-normal (standard severity) and platform-osx (Issue concerned with MacOSX) labels and removed the severity-normal (standard severity) label Mar 24, 2015
yarikoptic changed the title from "OSX & git-annex: addurl fails with git-annex: .git/annex/tmp/URL-s0--http&c%%localhost&c8157%.tmp-page2annex-AqTXUC%2%d%1d: getFileStatus: does not exist (No such file or directory)" to "OSX & git-annex: fails to addurl empty files: getFileStatus: does not exist (No such file or directory)" Mar 25, 2015

joeyh commented Mar 26, 2015

Reproduced on OSX: git-annex addurl http://tmp.kitenet.net/empty/

see shy jo

joeyh commented Mar 26, 2015

This seems to be a wget bug on OSX:

oberon:annex joeyh$ rm foo
oberon:annex joeyh$ wget -O foo http://tmp.kitenet.net/empty
--2015-03-26 12:26:24-- http://tmp.kitenet.net/empty
Resolving tmp.kitenet.net... 66.228.36.95, 2600:3c03::f03c:91ff:fe73:b0d2
Connecting to tmp.kitenet.net|66.228.36.95|:80... connected.
HTTP request sent, awaiting response... 200 OK

The file is already fully retrieved; nothing to do.

On Linux with wget 1.16, it does not behave this way.

My OSX account has wget 1.16.1. I don't know if this is a regression
in the newer wget version, or some OSX specific bug.

see shy jo

joeyh commented Mar 26, 2015

I was able to reproduce the bug when I built
http://ftp.gnu.org/gnu/wget/wget-1.16.1.tar.gz

And, happily, the bug is fixed in
http://ftp.gnu.org/gnu/wget/wget-1.16.3.tar.gz

wget is not currently bundled with git-annex for OSX, because there
were problems with the cert store. Instead, git-annex for OSX bundles
curl, but will use a locally installed wget if there is one.

This doesn't seem worth forcing git-annex to use curl over, since it
only affects one buggy version of wget. Looks like OSX homebrew has
already been updated to 1.16.3 too.

I suppose for datalad, the fix is to update the locally installed wget,
or perhaps force git-annex to use curl.

see shy jo

@yarikoptic
Member Author

That is great -- thanks for boiling this one down!
I will close the issue once we upgrade wget there (that OSX box is a base install with only xcode/git/annex added, no homebrew etc.) and adjust the tests so they no longer work around it.

Overall, this one is yet another indicator that we need version-aware handling of external tools, which datalad does not support in any way yet (#76 points to how we did it in PyMVPA). I will add a new label ("bug-external-tools") to group such issues.
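
A minimal sketch of what such version-aware handling might look like. Everything here is hypothetical (`KNOWN_BAD`, `parse_version`, `is_known_bad`, and `installed_version` are not datalad APIs), and the only version listed as bad is the wget 1.16.1 that was reported as broken in this thread.

```python
import re
import subprocess

# Versions reported as broken in this thread; anything else is assumed fine.
KNOWN_BAD = {"wget": {(1, 16, 1)}}


def parse_version(version_output):
    """Extract the first dotted version number from `tool --version` output."""
    match = re.search(r"\d+(?:\.\d+)+", version_output)
    if match is None:
        raise ValueError("no version found in %r" % version_output)
    return tuple(int(part) for part in match.group(0).split("."))


def is_known_bad(tool, version_output):
    """Return True if the reported version is on the known-bad list."""
    return parse_version(version_output) in KNOWN_BAD.get(tool, set())


def installed_version(tool):
    """Ask the locally installed tool for its version (needs the tool on PATH)."""
    result = subprocess.run([tool, "--version"],
                            capture_output=True, text=True, check=True)
    return parse_version(result.stdout)
```

A test could then be skipped, or a fallback chosen, when something like `is_known_bad("wget", output)` is true. The exact wording of the `--version` banner is an assumption, which is why the parser only looks for the first dotted number.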

joeyh commented Mar 26, 2015

Yaroslav Halchenko wrote:

> That is great -- thanks for boiling this one down!
> I will close the issue whenever we upgrade wget there (that OSX is base with
> xcode/git/annex installed, no homebrews etc) and adjust tests for not working
> around there.

Can you tell where the bad wget comes from? If it's shipped with git, I
guess it will be fixed quickly, but Apple is known to ship badly out of
date utilities (such as a many year old version of bash), so if OSX base
includes a broken wget, it may be worth working around in git-annex
somehow.

see shy jo

@yarikoptic
Member Author

looking back at my report: who said that it was wget? ;)

| [2015-03-24 13:26:07 EDT] call: curl ["-f","-L","-C","-","-#","-o",".git/annex/tmp/URL-s0--http&c%%localhost&c8302%.tmp-page2annex-SIDRg3%2%d%1d","http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d","--user-agent","git-annex/5.20150322-gb17566d"]

btw -- I have added your ssh key to the datalad account (under which buildbot slave is running) on that mac, so you could check directly (not that you have Administrator access there anyways ;) ).

joeyh commented Mar 27, 2015

Yaroslav Halchenko wrote:

> looking back at my report: who said that it was wget? ;)
>
> | [2015-03-24 13:26:07 EDT] call: curl ["-f","-L","-C","-","-#","-o",".git/annex/tmp/URL-s0--http&c%%localhost&c8302%.tmp-page2annex-SIDRg3%2%d%1d","http://localhost:8302/.tmp-page2annex-SIDRg3/2/d/1d","--user-agent","git-annex/5.20150322-gb17566d"]

Aha, well spotted!

It seems that curl also has a bug here; if the url is empty it just
doesn't create an empty file locally.

Apparently empty urls are a tricky business if you're building a url
downloader. :-/

Sent bug to curl maintainers. Put in a workaround in git-annex
for this.

see shy jo
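
The workaround itself is not shown in this thread. Purely as an illustration of the idea, and not git-annex's actual code (which is written in Haskell), a wrapper could treat a successful exit with no output file as an empty download and create the file itself. The `download` helper and the stand-in command below are assumptions made up for this sketch.

```python
import os
import subprocess
import sys
import tempfile


def download(cmd, dest):
    """Run a download command; return True if it succeeded.

    Assumption: exit code 0 with no file at dest means the URL was empty,
    so an empty file is created to match what the caller expects.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        return False
    if not os.path.exists(dest):
        # The tool reported success but wrote nothing: create the file here.
        open(dest, "wb").close()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "1d")
        # Stand-in for a downloader that exits 0 without writing its output.
        ok = download([sys.executable, "-c", "pass"], dest)
        print(ok, os.path.getsize(dest))
```

The trade-off in this sketch is that it trusts the exit code: a tool that exits 0 after some other silent failure would also end up with an empty file.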

joeyh closed this as completed Mar 27, 2015
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Apr 20, 2015
yarikoptic added a commit that referenced this issue Apr 20, 2015
ENH: #79 was addressed so we can test with empty load now
yarikoptic pushed a commit that referenced this issue May 3, 2021
Break up key statements to make them more readable