ENH: add annex key to the extracted "annex" metadata record #2952

merged 2 commits into datalad:master from enh-annexkey2 on Nov 1, 2018


@yarikoptic yarikoptic commented Oct 26, 2018

  • Replaces #2950 with more native placement of key record
  • adjusts AnnexRepo.get_file_key with explicit batch=None option to trigger batch mode of operation

Cons: with this change (as the test fixups show), every annexed file now gets a metadata record in the "annex" section. Time will tell whether this has any notable impact on size or performance.

Member Author

@yarikoptic yarikoptic commented Oct 26, 2018

FWIW @chrisfilo, with this RF a search would look more like

$ datalad_ -f '{path}: {metadata[annex][key]} {metadata[datalad_core][url]}' -c search path:.*\.nii.gz annex.key:MD5E.* datalad_core.url:.

to limit the results to files that have a URL and use the MD5E backend (although matching just MD5 would suffice, should someone use that backend ;-), i.e. the one without file extensions in the key name)
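
For readers unfamiliar with the key layout that the `annex.key:MD5E.*` match relies on, here is a minimal sketch, assuming the usual git-annex key form `BACKEND-sSIZE--NAME`. The helper `parse_annex_key` and the `AnnexKey` tuple are hypothetical names for illustration only; they are not part of DataLad or git-annex, and the sample key is made up.

```python
import re
from typing import NamedTuple, Optional


class AnnexKey(NamedTuple):
    """Illustrative container; not a DataLad or git-annex type."""
    backend: str
    size: Optional[int]
    name: str


# BACKEND, then optional "-<letter><digits>" fields (e.g. -s1234), then "--NAME"
_KEY_RE = re.compile(
    r'^(?P<backend>[A-Z0-9_]+)'
    r'(?P<fields>(?:-[a-zA-Z][0-9]+)*)'
    r'--(?P<name>.+)$'
)


def parse_annex_key(key: str) -> AnnexKey:
    """Split an annex key into backend, size (if recorded) and name."""
    m = _KEY_RE.match(key)
    if m is None:
        raise ValueError("does not look like an annex key: %r" % key)
    size = None
    for field in m.group('fields').split('-'):
        # only the lowercase "s" field carries the content size
        if field.startswith('s'):
            size = int(field[1:])
    return AnnexKey(m.group('backend'), size, m.group('name'))


# made-up key, for illustration
key = parse_annex_key("MD5E-s1234--d41d8cd98f00b204e9800998ecf8427e.nii.gz")
print(key.backend, key.size, key.name)
```

With the `E` variant of a backend the file extension is kept in the name part of the key, which is why the extension shows up at the end of `key.name` here.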

@codecov codecov bot commented Oct 28, 2018

Codecov Report

Merging #2952 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2952      +/-   ##
+ Coverage   90.32%   90.32%   +<.01%     
  Files         246      246              
  Lines       32029    32044      +15     
+ Hits        28930    28945      +15     
  Misses       3099     3099

| Impacted Files | Coverage Δ | |
|---|---|---|
| datalad/metadata/tests/ | 93.23% <ø> (ø) | ⬆️ |
| datalad/metadata/extractors/tests/ | 88.37% <100%> (+1.19%) | ⬆️ |
| datalad/metadata/tests/ | 99.26% <100%> (+0.75%) | ⬆️ |
| datalad/support/ | 88.04% <100%> (-0.07%) | ⬇️ |
| datalad/support/tests/ | 96.38% <100%> (+0.01%) | ⬆️ |
| datalad/metadata/extractors/ | 93.33% <100%> (+1.02%) | ⬆️ |

Last update 84f44bc...f044295.

Member Author

@yarikoptic yarikoptic commented Oct 29, 2018

@mih any comments/recommendations/merge?

@Shotgunosine Shotgunosine commented Oct 31, 2018

I'd like to see this merged too.

Member Author

@yarikoptic yarikoptic commented Nov 1, 2018

well, since the government wants it, how could we refuse? ;-) no objections were stated, so it must be the best piece of code out there in the wild!

@yarikoptic yarikoptic merged commit 504545d into datalad:master Nov 1, 2018
9 of 10 checks passed
@yarikoptic yarikoptic added this to the Release 0.11.1 milestone Nov 24, 2018
yarikoptic added a commit that referenced this issue Nov 27, 2018
	## 0.11.1 (Nov 25, 2018) -- v7-better-than-v6

	Rushed out bugfix release to stay fully compatible with recent
	[git-annex] which introduced v7 to replace v6.

	### Fixes

	- [install]: be able to install recursively into a dataset ([#2982])
	- [save]: be able to commit/save changes whenever files potentially
	  could have swapped their storage between git and annex
	  ([#1651]) ([#2752]) ([#3009])
	- [aggregate-metadata]:
	  - the dataset itself is no longer "aggregated" if specific paths are
		provided for aggregation ([#3002]). That resolves the issue of
		`-r` invocation aggregating all subdatasets of the specified dataset
		as well
	  - also compare/verify the actual content checksum of aggregated metadata
		while considering subdataset metadata for re-aggregation ([#3007])
	- `annex` commands are now chunked assuming 50% "safety margin" on the
	  maximal command line length. Should resolve crashes while operating
	  on too many files at once ([#3001])
	- `run` sidecar config processing ([#2991])
	- no double trailing period in docs ([#2984])
	- correct identification of the repository with symlinks in the paths
	  in the tests ([#2972])
	- re-evaluation of dataset properties in case of dataset changes ([#2946])
	- [text2git] procedure to use `ds.repo.set_gitattributes`
	  ([#2974]) ([#2954])
	- Switch to use plain `os.getcwd()` if inconsistency with env var
	  `$PWD` is detected ([#2914])
	- Make sure that credential defined in env var takes precedence
	  ([#2960]) ([#2950])

	### Enhancements and new features

	- [shub://datalad/datalad:git-annex-dev](
	  provides a Debian buster Singularity image with build environment for
	  [git-annex]. [tools/bisect-git-annex]() provides a helper for running
	  `git bisect` on git-annex using that Singularity container ([#2995])
	- Added [.zenodo.json]() for better integration with Zenodo for citation
	- [run-procedure] now provides names and help messages with a custom
	  renderer for ([#2993])
	- Documentation: point to [datalad-revolution] extension (prototype of
	  the greater DataLad future)
	- [run]
	  - support injecting of a detached command ([#2937])
	- `annex` metadata extractor now extracts `annex.key` metadata record.
	  This should now make it possible to identify uses of specific files, etc. ([#2952])
	- Test that we can install from
	- Proper rendering of `CommandError` (e.g. in case of "out of space"
	  error) ([#2958])

* tag '0.11.1':
  Adjust the date -- 25th fell through due to __version__ fiasco
  BF+ENH(TST): boost hardcoded version + provide a test to guarantee consistency in the future
  This (expensive) approach is not needed in v6+
  small tuneup to changelog
@yarikoptic yarikoptic deleted the enh-annexkey2 branch Feb 17, 2019