ENH: add annex key to the extracted datalad-core metadata record #2950

Closed
wants to merge 3 commits into from


@yarikoptic
Member

@yarikoptic yarikoptic commented Oct 26, 2018

This could facilitate additional search functionality, such as that requested in
https://neurostars.org/t/return-hash-and-url-for-every-nifti-file-in-datalad-super-dataset/2733
in cases where a known checksum matches the backend used by git-annex.

I bet it is done better in the rev- version of the universe. Here I chose to provide an ugly hack to avoid a secondary call to look up the key for a file.

This should break some tests, possibly in datalad-neuroimaging, since the output record is now extended.

  • maybe it should be just `key` or `annex_key`, since I think `-` is not used in the metadata fields
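
To illustrate why having the annex key in the metadata record helps with checksum searches: for hash-based backends, a git-annex key embeds the backend name, the file size, and the digest. The sketch below is not part of this PR or of DataLad; `parse_annex_key` and `matches_checksum` are hypothetical helper names, and the parsing assumes the common `BACKEND-sSIZE--DIGEST[.ext]` key layout and that a trailing `E` on the backend name means "extension appended".

```python
import hashlib
import re

# Assumed key layout: BACKEND, optional "-<letter><digits>" fields
# (e.g. -s<size>), then "--" followed by the key name.
_KEY_RE = re.compile(
    r"^(?P<backend>[A-Z0-9_]+)(?P<fields>(?:-[a-zA-Z][0-9]+)*)--(?P<name>.+)$"
)


def parse_annex_key(key):
    """Split an annex key into backend, size, and digest (hypothetical helper)."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError("not a recognized annex key: %r" % key)
    backend = match.group("backend")
    size = None
    for field in match.group("fields").split("-"):
        if field.startswith("s"):  # lowercase "s" carries the file size
            size = int(field[1:])
    name = match.group("name")
    # Simplifying assumption: backends ending in "E" append the file
    # extension after the digest, so cut at the first period.
    digest = name.split(".", 1)[0] if backend.endswith("E") else name
    return {"backend": backend, "size": size, "digest": digest}


def matches_checksum(key, algorithm, hexdigest):
    """True if the key's backend and digest agree with a known checksum."""
    parsed = parse_annex_key(key)
    backend = parsed["backend"]
    if backend.endswith("E"):
        backend = backend[:-1]
    return backend == algorithm.upper() and parsed["digest"] == hexdigest.lower()


# Usage: build an MD5E-style key for known content and check it.
content = b"test"
md5 = hashlib.md5(content).hexdigest()
example_key = "MD5E-s%d--%s.txt" % (len(content), md5)
print(parse_annex_key(example_key))
print(matches_checksum(example_key, "md5", md5))
```

A search by checksum would only work this way when the checksum type the user has matches the backend recorded in the key; keys from non-hash backends carry no content digest.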
@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 26, 2018

@mih @bpoldrack @kyleam what do you think? Should annex_key be a part of datalad_core or annex (just a key there), or maybe even at the top level alongside path and type?

@codecov

@codecov codecov bot commented Oct 26, 2018

Codecov Report

Merging #2950 into master will decrease coverage by 46.27%.
The diff coverage is 36.36%.


@@             Coverage Diff             @@
##           master    #2950       +/-   ##
===========================================
- Coverage   90.32%   44.04%   -46.28%     
===========================================
  Files         246      245        -1     
  Lines       32029    32021        -8     
===========================================
- Hits        28930    14105    -14825     
- Misses       3099    17916    +14817

| Impacted Files | Coverage Δ |
|---|---|
| datalad/metadata/extractors/tests/test_base.py | 0% <0%> (-87.18%) ⬇️ |
| datalad/metadata/extractors/datalad_core.py | 77.55% <100%> (-11.82%) ⬇️ |
| datalad/support/annexrepo.py | 24.53% <33.33%> (-63.59%) ⬇️ |
| datalad/customremotes/tests/__init__.py | 0% <0%> (-100%) ⬇️ |
| datalad/distribution/tests/test_subdataset.py | 0% <0%> (-100%) ⬇️ |
| datalad/interface/tests/test_annotate_paths.py | 0% <0%> (-100%) ⬇️ |
| datalad/cmdline/tests/__init__.py | 0% <0%> (-100%) ⬇️ |
| datalad/distribution/tests/test_dataset.py | 0% <0%> (-100%) ⬇️ |
| datalad/metadata/extractors/tests/__init__.py | 0% <0%> (-100%) ⬇️ |
| datalad/interface/tests/test_download_url.py | 0% <0%> (-100%) ⬇️ |

... and 173 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f935535...83c7bee. Read the comment docs.

@mih
Member

@mih mih commented Oct 26, 2018

I would prefer to have annex stuff done by the annex extractor.

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 26, 2018

ok. While at it - do you see any other information from annex worthwhile extracting? We already do metadata and urls (as part of the datalad_core). It could be the list of clones (remotes) where the file is present (too variable imho).

@yarikoptic
Member Author

@yarikoptic yarikoptic commented Oct 26, 2018

superseded by #2952

@yarikoptic yarikoptic closed this Oct 26, 2018
yarikoptic added a commit that referenced this issue Nov 27, 2018
	## 0.11.1 (Nov 25, 2018) -- v7-better-than-v6

	Rushed out bugfix release to stay fully compatible with recent
	[git-annex] which introduced v7 to replace v6.

	### Fixes

	- [install]: be able to install recursively into a dataset ([#2982])
	- [save]: be able to commit/save changes whenever files potentially
	  could have swapped their storage between git and annex
	  ([#1651]) ([#2752]) ([#3009])
	- [aggregate-metadata]:
	  - the dataset itself is now not "aggregated" if specific paths are
		provided for aggregation ([#3002]). That resolves the issue of
		`-r` invocation aggregating all subdatasets of the specified dataset
		as well
	  - also compare/verify the actual content checksum of aggregated metadata
		while considering subdataset metadata for re-aggregation ([#3007])
	- `annex` commands are now chunked assuming 50% "safety margin" on the
	  maximal command line length. Should resolve crashes while operating
	  on too many files at once ([#3001])
	- `run` sidecar config processing ([#2991])
	- no double trailing period in docs ([#2984])
	- correct identification of the repository with symlinks in the paths
	  in the tests ([#2972])
	- re-evaluation of dataset properties in case of dataset changes ([#2946])
	- [text2git] procedure to use `ds.repo.set_gitattributes`
	  ([#2974]) ([#2954])
	- Switch to use plain `os.getcwd()` if inconsistency with env var
	  `$PWD` is detected ([#2914])
	- Make sure that credential defined in env var takes precedence
	  ([#2960]) ([#2950])

	### Enhancements and new features

	- [shub://datalad/datalad:git-annex-dev](https://singularity-hub.org/containers/5663/view)
	  provides a Debian buster Singularity image with build environment for
	  [git-annex]. [tools/bisect-git-annex]() provides a helper for running
	  `git bisect` on git-annex using that Singularity container ([#2995])
	- Added [.zenodo.json]() for better integration with Zenodo for citation
	- [run-procedure] now provides names and help messages with a custom
	  renderer for ([#2993])
	- Documentation: point to [datalad-revolution] extension (prototype of
	  the greater DataLad future)
	- [run]
	  - support injecting of a detached command ([#2937])
	- `annex` metadata extractor now extracts `annex.key` metadata record.
	  Should allow now to identify uses of specific files etc ([#2952])
	- Test that we can install from http://datasets.datalad.org
	- Proper rendering of `CommandError` (e.g. in case of "out of space"
	  error) ([#2958])

* tag '0.11.1':
  Adjust the date -- 25th fell through due to __version__ fiasco
  BF+ENH(TST): boost hardcoded version + provide a test to guarantee consistency in the future
  This (expensive) approach is not needed in v6+
  small tuneup to changelog
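
The 0.11.1 changelog entry above about chunking `annex` commands with a 50% "safety margin" on the maximal command line length can be sketched roughly as follows. This is a minimal illustration, not DataLad's actual implementation: `chunk_args`, `max_cmdline_len`, `base_len`, and `safety` are hypothetical names, and counting one separator character per argument is an assumption.

```python
def chunk_args(args, max_cmdline_len, base_len=0, safety=0.5):
    """Yield lists of arguments whose joined length stays within a budget.

    max_cmdline_len -- assumed maximal command line length of the system
    base_len        -- length already taken by the command itself
    safety          -- fraction of the maximum to use (0.5 = 50% margin)
    """
    budget = int(max_cmdline_len * safety) - base_len
    chunk, used = [], 0
    for arg in args:
        cost = len(arg) + 1  # +1 for the separating space
        # Start a new chunk once adding this argument would exceed the
        # budget; a single oversized argument still gets its own chunk.
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
        chunk.append(arg)
        used += cost
    if chunk:
        yield chunk


# Usage: each chunk would be passed to a separate command invocation.
for files in chunk_args(["aa", "bb", "cc"], max_cmdline_len=12):
    print(files)
```

With a limit of 12 and a 50% margin, the budget is 6 characters, so the first two two-character arguments (3 each including the separator) share a chunk and the third starts a new one.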