
ENH: AnnexRepo.get_metadata(batch=True) #3364

Merged: 3 commits into datalad:master on Apr 30, 2019

Conversation

mih (Member) commented Apr 30, 2019

This is a fragile enterprise due to the limitations of BatchedCommand(). When queried for a non-annexed file, git-annex does not report the error in its JSON response, but on stderr:

{"file": "."}
git-annex: not an annexed file: .

which ATM is not monitored at all, I assume due to the general sadness of subprocess under PY2.

Docstrings warn about this, and there is no batch=None automagic because of this.
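To make the failure mode concrete, here is a minimal sketch. It does not call git-annex or DataLad: `CHILD` is a hypothetical stand-in process that answers JSON requests on stdout, except for the file `.`, which it rejects on stderr only, mimicking the behavior described above. The helper name `run_batch` is likewise made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a batch process (NOT git-annex itself):
# one JSON request per stdin line, one JSON answer per stdout line,
# except for ".", which is only complained about on stderr.
CHILD = r'''
import json, sys
for line in sys.stdin:
    name = json.loads(line)["file"]
    if name == ".":
        print("stand-in: not an annexed file: " + name, file=sys.stderr, flush=True)
    else:
        print(json.dumps({"file": name, "fields": {}}), flush=True)
'''


def run_batch(files):
    """Send all requests at once and collect both output streams."""
    requests = "".join(json.dumps({"file": f}) + "\n" for f in files)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=requests,
        capture_output=True,
        text=True,
    )
    return proc.stdout.splitlines(), proc.stderr.splitlines()


stdout_lines, stderr_lines = run_batch(["a.dat", ".", "b.dat"])
# Three requests, but only two stdout answers: a caller that blocks on
# one stdout line per request would wait forever on the second request.
print(len(stdout_lines), "stdout answers;", len(stderr_lines), "stderr message(s)")
```

Because the sketch closes stdin and reads everything afterwards, it cannot hang itself; the mismatch between requests and stdout lines is what would stall a reader that waits for one stdout line per request.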

Other than that, it works.

This is needed to achieve sensible performance of the annex extractor in -revolution (the behavior of the one in -core is pointless anyway), where all these limitations do not matter.

Waiting here: https://github.com/datalad/datalad-revolution/compare/enh-annex


codecov bot commented Apr 30, 2019

Codecov Report

Merging #3364 into master will increase coverage by 0.02%.
The diff coverage is 94.73%.


@@            Coverage Diff             @@
##           master    #3364      +/-   ##
==========================================
+ Coverage   91.15%   91.18%   +0.02%     
==========================================
  Files         263      263              
  Lines       34196    34210      +14     
==========================================
+ Hits        31172    31193      +21     
+ Misses       3024     3017       -7
| Impacted Files | Coverage Δ |
|---|---|
| datalad/support/tests/test_annexrepo.py | 96.01% <100%> (ø) ⬆️ |
| datalad/support/annexrepo.py | 87.53% <93.75%> (+0.01%) ⬆️ |
| datalad/cmd.py | 96.12% <95%> (+0.03%) ⬆️ |
| datalad/interface/run_procedure.py | 89.93% <0%> (+0.62%) ⬆️ |
| datalad/downloaders/http.py | 86.5% <0%> (+2.77%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 96826fa...9ab87c2.

mih added 3 commits April 30, 2019 13:58
Can now be used with a generator as input and yields a generator
as output.
without juggling data types and return values.
This is a fragile enterprise due to the limitations of
`BatchedCommand()`. When queried for a non-annexed file, git annex
will error not via JSON response, but through stderr

```
{"file": "."}
git-annex: not an annexed file: .
```

which ATM is not monitored at all. I assume due to the general
sadness of subprocess under PY2.

Docstrings warn about this, and there is no `batch=None` automagic
because of this.

Other than that, it works.
@mih mih requested review from kyleam and yarikoptic and removed request for kyleam April 30, 2019 12:18
mih added a commit to datalad/datalad-revolution that referenced this pull request Apr 30, 2019
yarikoptic (Member) commented

Two things to fix:

  • our BatchedCommand should not hang when all of the output goes to stderr
  • the git-annex metadata --json call should report errors as JSON records (@yarikoptic thinks we had a similar case with some other annex command before, maybe add or the like)
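The first point could be approached along these lines. This is only a sketch under stated assumptions, not DataLad's `BatchedCommand` and not a proposed patch: `CHILD` is again a hypothetical stand-in for the batch process, `batched_query` and `_pump` are made-up names, and the sketch assumes each request produces exactly one line on exactly one of the two streams. Both streams are drained by threads into one queue, so an answer on stderr ends the wait just like an answer on stdout, and a timeout bounds the wait if neither arrives. It takes any iterable as input and yields results lazily.

```python
import json
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in child (NOT git-annex): answers on stdout,
# except for ".", which it rejects on stderr only.
CHILD = r'''
import json, sys
for line in sys.stdin:
    name = json.loads(line)["file"]
    if name == ".":
        print("stand-in: not an annexed file: " + name, file=sys.stderr, flush=True)
    else:
        print(json.dumps({"file": name, "fields": {}}), flush=True)
'''


def _pump(stream, tag, q):
    """Forward every line of `stream` into the shared queue."""
    for line in stream:
        q.put((tag, line.rstrip("\n")))


def batched_query(files, timeout=5.0):
    """Yield one record per requested file, without blocking on stdout alone."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes in text mode
    )
    q = queue.Queue()
    for stream, tag in ((proc.stdout, "out"), (proc.stderr, "err")):
        threading.Thread(target=_pump, args=(stream, tag, q), daemon=True).start()
    try:
        for f in files:
            proc.stdin.write(json.dumps({"file": f}) + "\n")
            proc.stdin.flush()
            # Whichever stream answers first ends the wait; queue.Empty
            # is raised if nothing arrives within `timeout` seconds.
            tag, line = q.get(timeout=timeout)
            if tag == "out":
                yield json.loads(line)
            else:
                yield {"file": f, "error": line}
    finally:
        proc.stdin.close()
        proc.wait()


results = list(batched_query(iter(["a.dat", ".", "b.dat"])))
for record in results:
    print(record)
```

Turning the stderr line into an error record on the caller's side is a workaround for the second point; if the tool reported errors as JSON records itself, the stderr branch would only be needed for unexpected diagnostics.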

@yarikoptic yarikoptic merged commit 02e344a into datalad:master Apr 30, 2019
@mih mih deleted the enh-batched branch April 30, 2019 15:16
@yarikoptic yarikoptic added this to the Release 0.12.0 milestone May 2, 2019