Failed forced metadata extraction #2752

Closed
cmaumet opened this issue Aug 8, 2018 · 5 comments

@cmaumet cmaumet commented Aug 8, 2018

What is the problem?

When force-extracting metadata whose object file used to be large (tracked by git-annex) and has since become small (so it should now be tracked by git directly), we get the following error:

$ datalad aggregate-metadata --force-extraction
[INFO   ] Aggregate metadata for dataset /PATH_TO_DATASET/demo-datalad-nidmresults
[INFO   ] Update aggregate metadata in dataset at: /PATH_TO_DATASET/demo-datalad-nidmresults
aggregate_metadata(ok): /PATH_TO_DATASET/demo-datalad-nidmresults (dataset)
[INFO   ] Attempting to save 6 files/datasets
Failed to run ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'commit', '-m', '[DATALAD] Dataset aggregate metadata update', '--', u'.datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e', u'.datalad/metadata/objects/ec/cn-dc8dc60c8b2bb980a0b03fd1e1ce8e.xz', u'.datalad/metadata/aggregate_v1.json'] under '/PATH_TO_DATASET/demo-datalad-nidmresults'. Exit code=1. out= err=git-annex: Cannot make a partial commit with unlocked annexed files. You should `git annex add` the files you want to commit, and then run git commit.

git-annex: Cannot make a partial commit with unlocked annexed files. You should `git annex add` the files you want to commit, and then run git commit.
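
The git-annex message itself suggests a manual workaround: stage the affected files with `git annex add`, then commit without a pathspec so that the commit is no longer partial. A minimal sketch of that sequence follows, assuming it is run by hand inside the dataset. `workaround_commands` and `run_workaround` are hypothetical helpers for illustration, not part of DataLad, and this is not necessarily how DataLad itself resolved the issue.

```python
import subprocess


def workaround_commands(paths, message):
    """Return the two argv lists suggested by the git-annex error message.

    Hypothetical helper for illustration; not part of DataLad.
    """
    if not paths:
        raise ValueError("need at least one path to add")
    return [
        # Let git-annex (re)stage the files it complained about.
        ["git", "annex", "add"] + list(paths),
        # Commit whatever is staged; no pathspec, so not a partial commit.
        ["git", "commit", "-m", message],
    ]


def run_workaround(dataset_path, paths, message):
    """Run both commands inside the dataset, stopping at the first failure."""
    for argv in workaround_commands(paths, message):
        subprocess.run(argv, cwd=dataset_path, check=True)
```

Note that a commit without a pathspec records everything currently staged, so any unrelated staged changes would end up in the same commit.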

What steps will reproduce the problem?

$ ls -Ll .datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e
-r--r--r--  1 login  staff  46939 Aug  8 14:29 .datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e

$ ls -l .datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e
lrwxr-xr-x  1 login  staff  128 Aug  8 14:36 .datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e -> ../../../../.git/annex/objects/m1/FV/MD5E-s46939--4c4dcc1c58da0c5f8da6f73db1807dcc/MD5E-s46939--4c4dcc1c58da0c5f8da6f73db1807dcc

$ datalad aggregate-metadata --force-extraction
[INFO   ] Aggregate metadata for dataset /PATH_TO_DATASET/demo-datalad-nidmresults
[INFO   ] Update aggregate metadata in dataset at: /PATH_TO_DATASET/demo-datalad-nidmresults
aggregate_metadata(ok): /PATH_TO_DATASET/demo-datalad-nidmresults (dataset)
[INFO   ] Attempting to save 6 files/datasets
Failed to run ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'commit', '-m', '[DATALAD] Dataset aggregate metadata update', '--', u'.datalad/metadata/objects/ec/ds-dc8dc60c8b2bb980a0b03fd1e1ce8e', u'.datalad/metadata/objects/ec/cn-dc8dc60c8b2bb980a0b03fd1e1ce8e.xz', u'.datalad/metadata/aggregate_v1.json'] under '/PATH_TO_DATASET/demo-datalad-nidmresults'. Exit code=1. out= err=git-annex: Cannot make a partial commit with unlocked annexed files. You should `git annex add` the files you want to commit, and then run git commit.

git-annex: Cannot make a partial commit with unlocked annexed files. You should `git annex add` the files you want to commit, and then run git commit.
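
The `ls -l` listing in the steps above shows the metadata object as a symlink into `.git/annex/objects`, with the size (46939 bytes) recorded in the key name. The same check can be sketched in Python, assuming a checkout where annexed files are locked symlinks (unlocked files in v6+ repositories are regular files and would not be detected this way). `annex_key` and `key_size` are hypothetical helper names, not DataLad or git-annex API.

```python
import os
import re

# Key layout as seen in the listing: BACKEND-sSIZE--NAME
# (optional extra numeric fields between size and "--" are tolerated).
KEY_RE = re.compile(r"^(?P<backend>[A-Z0-9_]+)-s(?P<size>\d+)(?:-[A-Za-z]\d+)*--")


def annex_key(path):
    """Return the git-annex key a symlink points to, or None.

    None means the path is not a symlink into .git/annex/objects.
    """
    if not os.path.islink(path):
        return None
    target = os.readlink(path).replace(os.sep, "/")
    if ".git/annex/objects/" not in target:
        return None
    # The last path component of the link target is the key itself.
    return target.rsplit("/", 1)[-1]


def key_size(key):
    """Extract the size in bytes recorded in a key, or None if absent."""
    match = KEY_RE.match(key)
    return int(match.group("size")) if match else None
```

Running `annex_key` on the three paths from the failed commit would show which of them are still annexed, which is the state that trips the partial commit.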

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

datalad 0.10.2.dev98

## system 
  - distribution:  10.13.5/x86_64
  - encoding: 
    - default: ascii
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 318
  - name: Darwin
  - release: 17.6.0
  - type: posix
  - version: Darwin Kernel Version 17.6.0: Tue May  8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64

Is there anything else that would be useful to know in this context?

This problem was identified with @yarikoptic at the Montreal neuroinformatics hackathon.

Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to https://github.com/datalad/datalad/wiki/Testimonials as well)

Plenty!

@yarikoptic-gitmate yarikoptic-gitmate commented Aug 8, 2018

GitMate.io thinks possibly related issues are #1971 (metadata-aggregation fails to rewrite an unavailable object file), #1930 (crawl of openfmri datasets fails since aggregate-metadata crashes with AttributeError), #2698 (metamovie metadata extractor?), #1615 (-S for aggregate-metadata), and #1994 (Metadata parsing).

@joeyh joeyh commented Aug 9, 2018

@yarikoptic yarikoptic commented Aug 9, 2018

@mih @bpoldrack and @kyleam - your analysis of/feedback on that issue on git-annex.branchable would be very much appreciated. E.g., I've just posted one more comment there with an idea which might die quickly or might live and prosper into a bright future ;)

@mih mih commented Nov 24, 2018

Related to #1651

yarikoptic added a commit that referenced this issue Nov 27, 2018
	## 0.11.1 (Nov 25, 2018) -- v7-better-than-v6

	Rushed out bugfix release to stay fully compatible with recent
	[git-annex] which introduced v7 to replace v6.

	### Fixes

	- [install]: be able to install recursively into a dataset ([#2982])
	- [save]: be able to commit/save changes whenever files potentially
	  could have swapped their storage between git and annex
	  ([#1651]) ([#2752]) ([#3009])
	- [aggregate-metadata]:
	  - the dataset itself is no longer "aggregated" if specific paths are
		provided for aggregation ([#3002]). That resolves the issue of
		`-r` invocation aggregating all subdatasets of the specified dataset
		as well
	  - also compare/verify the actual content checksum of aggregated metadata
		while considering subdataset metadata for re-aggregation ([#3007])
	- `annex` commands are now chunked assuming 50% "safety margin" on the
	  maximal command line length. Should resolve crashes while operating
	  on too many files at once ([#3001])
	- `run` sidecar config processing ([#2991])
	- no double trailing period in docs ([#2984])
	- correct identification of the repository with symlinks in the paths
	  in the tests ([#2972])
	- re-evaluation of dataset properties in case of dataset changes ([#2946])
	- [text2git] procedure to use `ds.repo.set_gitattributes`
	  ([#2974]) ([#2954])
	- Switch to use plain `os.getcwd()` if inconsistency with env var
	  `$PWD` is detected ([#2914])
	- Make sure that credential defined in env var takes precedence
	  ([#2960]) ([#2950])

	### Enhancements and new features

	- [shub://datalad/datalad:git-annex-dev](https://singularity-hub.org/containers/5663/view)
	  provides a Debian buster Singularity image with build environment for
	  [git-annex]. [tools/bisect-git-annex]() provides a helper for running
	  `git bisect` on git-annex using that Singularity container ([#2995])
	- Added [.zenodo.json]() for better integration with Zenodo for citation
	- [run-procedure] now provides names and help messages with a custom
	  renderer ([#2993])
	- Documentation: point to [datalad-revolution] extension (prototype of
	  the greater DataLad future)
	- [run]
	  - support injecting of a detached command ([#2937])
	- `annex` metadata extractor now extracts `annex.key` metadata record.
	  This should make it possible to identify uses of specific files, etc. ([#2952])
	- Test that we can install from http://datasets.datalad.org
	- Proper rendering of `CommandError` (e.g. in case of "out of space"
	  error) ([#2958])

* tag '0.11.1':
  Adjust the date -- 25th fell through due to __version__ fiasco
  BF+ENH(TST): boost hardcoded version + provide a test to guarantee consistency in the future
  This (expensive) approach is not needed in v6+
  small tuneup to changelog
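
The 0.11.1 fix list above mentions chunking `annex` commands with a 50% "safety margin" on the maximal command line length. A minimal sketch of that idea follows, assuming command length is estimated as the sum of argument lengths plus one separator each. `chunk_by_length` is a hypothetical name, and this is not DataLad's actual implementation.

```python
def chunk_by_length(base_cmd, files, max_len, margin=0.5):
    """Yield lists of files so that each command stays within the budget.

    The budget is max_len * margin. A single file name that exceeds the
    budget on its own is still yielded, alone in its chunk.
    """
    budget = int(max_len * margin)
    # Length taken up by the fixed part of the command in every chunk.
    base_len = sum(len(arg) + 1 for arg in base_cmd)
    chunk, used = [], base_len
    for name in files:
        cost = len(name) + 1
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], base_len
        chunk.append(name)
        used += cost
    if chunk:
        yield chunk
```

On POSIX systems, `os.sysconf("SC_ARG_MAX")` is one possible source for `max_len`. Each yielded chunk would then be appended to `base_cmd` and run as a separate command.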