crawl of openfmri datasets fails since aggregate-metadata crashes with AttributeError #1930

Closed · Fixed by #1939
yarikoptic opened this issue on Oct 29, 2017 · 3 comments

@yarikoptic (Member) commented:

(git)smaug:…atasets-openfmri-crawl-20171028/datalad/crawl/openfmri/ds000001[master]git
$> datalad crawl         
[INFO   ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg 
[INFO   ] Creating a pipeline for the openfmri dataset ds000001 
Connecting to bucket: openneuro
[INFO   ] S3 session: Connecting to the bucket openneuro 
Bucket info:
  Versioning: S3ResponseError: 403 Forbidden
     Website: S3ResponseError: 403 Forbidden
         ACL: <Policy: openfmri (owner) = READ, openfmri (owner) = WRITE, openfmri (owner) = READ_ACP, openfmri (owner) = WRITE_ACP, http://acs.amazonaws.com/groups/global/AllUsers = READ, http://acs.amazonaws.com/groups/global/AllUsers = READ_ACP>
ds000001/
[INFO   ] Creating a pipeline for the openfmri bucket 
[WARNING] ATM we assume prefixes to correspond only to directories, adding / 
[INFO   ] Running pipeline [[<function switch_branch at 0x7f40e1039aa0>, [<datalad.crawler.nodes.s3.crawl_s3 object at 0x7f40e0678d90>, sub(ok_missing=True, subs=<<{'url': {'^s3://openfm...>>), switch(default=None, key='datalad_action', mapping=<<{'commit': <function _...>>, re=False)], <function switch_branch at 0x7f40e065ded8>], <function switch_branch at 0x7f40e065df50>, [<datalad.crawler.nodes.crawl_url.crawl_url object at 0x7f40e0678dd0>, [a_href_match(query='.*release_history.txt'), assign(assignments={'filename': 'changelog.txt'}, interpolate=False), <datalad.crawler.nodes.annex.Annexificator object at 0x7f40e101c690>], [a_href_match(query='.*/.*\\.(tgz|tar.*|zip)'), sub(ok_missing=False, subs=<<{'url': {'(http)s?(://...>>), <function func_node at 0x7f40dfff3140>, <datalad.crawler.nodes.annex.Annexificator object at 0x7f40e101c690>], <function _commit_versions at 0x7f40dfff31b8>], <datalad.crawler.nodes.annex._remove_obsolete object at 0x7f40e0678c50>, [{'loop': False}, <function switch_branch at 0x7f40dfff3230>, <function merge_branch at 0x7f40dfff3398>, <function _remove_other_versions at 0x7f40dfff3488>, [{'loop': True}, find_files(dirs=False, fail_if_none=True, regex='\\.(zip|tgz|tar(\\..+)?)$', topdir='.'), assign(assignments=<<{'dataset_file': 'ds00...>>, interpolate=True), switch(default=<function>, key='dataset_file', mapping=<<{'.*///ds000030_R1\\.0...>>, re=True)], [find_files(dirs=False, fail_if_none=False, regex=<<'(\\.(tsv|csv|txt|json...>>, topdir='.'), fix_permissions(executable=False, file_re='.*', input='filename', path=None)], <function switch_branch at 0x7f40dfff3668>, <function merge_branch at 0x7f40dfff3758>, <function _finalize at 0x7f40dfff37d0>], <function switch_branch at 0x7f40dfff3848>, <function _finalize at 0x7f40dfff38c0>] 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch incoming-s3-openneuro 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch master 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch incoming 
[INFO   ] Fetching 'https://openfmri.org/dataset/ds000001/' 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch incoming-processed 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch master 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Aggregate metadata for dataset /mnt/btrfs/datasets-openfmri-crawl-20171028/datalad/crawl/openfmri/ds000001 
[ERROR  ] __exit__ [json_py.py:dump2xzstream:64] (AttributeError) 
datalad crawl  5.17s user 2.84s system 53% cpu 15.010 total
@mih (Member) commented on Oct 29, 2017:

Could you run this with --dbg? It looks as if this is related to AutomagicIO.

@yarikoptic (Member, Author) commented:

[DEBUG  ] Dump metadata of <Dataset path=/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001> (merge mode: init) into <Dataset path=/mnt/btrfs/datasets/datalad/crawl> 
[DEBUG  ] no usable BIDS metadata for CHANGES in <Dataset path=/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001>: File '/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/CHANGES' could not be found in the current BIDS project. [bids_layout.py:get_nearest_helper:33] 
[DEBUG  ] no usable BIDS metadata for README in <Dataset path=/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001>: File '/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/README' could not be found in the current BIDS project. [bids_layout.py:get_nearest_helper:33] 
[DEBUG  ] no usable BIDS metadata for participants.tsv in <Dataset path=/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001>: File '/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/participants.tsv' could not be found in the current BIDS project. [bids_layout.py:get_nearest_helper:33] 
Traceback (most recent call last):
  File "/home/yoh/proj/datalad/datalad-master/venvs/dev/bin/datalad", line 8, in <module>
    main()
  File "/home/yoh/proj/datalad/datalad-master/datalad/cmdline/main.py", line 347, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/base.py", line 425, in call_from_parser
    ret = cls.__call__(**kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/crawl.py", line 130, in __call__
    output = run_pipeline(pipeline, stats=stats)
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 114, in run_pipeline
    output = list(xrun_pipeline(*args, **kwargs))
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 194, in xrun_pipeline
    for idata_out, data_out in enumerate(xrun_pipeline_steps(pipeline, data_in, output=output_sub)):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 286, in xrun_pipeline_steps
    for data_out in xrun_pipeline_steps(pipeline_tail, data_, output=output):
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/pipeline.py", line 270, in xrun_pipeline_steps
    for data_ in data_in_to_loop:
  File "/home/yoh/proj/datalad/datalad-master/datalad/crawler/nodes/annex.py", line 1329, in _finalize
    aggregate_metadata(dataset='^', path=self.repo.path)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 437, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 425, in return_func
    results = list(results)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 382, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 449, in _process_results
    for res in results:
  File "/home/yoh/proj/datalad/datalad-master/datalad/metadata/aggregate.py", line 667, in __call__
    to_save)
  File "/home/yoh/proj/datalad/datalad-master/datalad/metadata/aggregate.py", line 238, in _extract_metadata
    store(meta, objpath)
  File "/home/yoh/proj/datalad/datalad-master/datalad/support/json_py.py", line 64, in dump2xzstream
    with lzma.LZMAFile(fname, mode='w') as f:
AttributeError: __exit__
()
> /home/yoh/proj/datalad/datalad-master/datalad/support/json_py.py(64)dump2xzstream()
-> with lzma.LZMAFile(fname, mode='w') as f:
(Pdb) p fname
'/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/.datalad/metadata/objects/11/cn-ceee6877e54bbbe61688bcfa2dadac'
(Pdb) 
[1]  + 13262 suspended  datalad --dbg -l debug crawl
(dev)3 10569 ->148 [1].....................................:Sun 29 Oct 2017 09:41:08 AM EDT:.
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001[master]git
$> ls -l /mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/.datalad/metadata/objects/11/cn-ceee6877e54bbbe61688bcfa2dadac
-rw------- 1 yoh datalad 32 Oct 29 09:40 /mnt/btrfs/datasets/datalad/crawl/openfmri/ds000001/.datalad/metadata/objects/11/cn-ceee6877e54bbbe61688bcfa2dadac
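
The metadata object file was created, so LZMAFile itself was constructed fine; the crash comes when the "with" statement then looks up __exit__ on it. That AttributeError is what Python 2 raises for an object lacking the context-manager protocol, and under Python 2 the lzma module typically comes from the pyliblzma backport, whose LZMAFile has close() but no __enter__/__exit__ (unlike the stdlib lzma on Python 3). Assuming that is the cause here, a minimal sketch of a workaround (the dump2xzstream signature below is a guess, not the actual one in json_py.py) is to wrap the file in contextlib.closing:

import contextlib
import lzma  # pyliblzma under Python 2, stdlib under Python 3

def dump2xzstream_sketch(records, fname, dumper):
    # contextlib.closing supplies __enter__/__exit__ around any object
    # with a close() method, which pyliblzma's LZMAFile does provide
    with contextlib.closing(lzma.LZMAFile(fname, mode='w')) as f:
        for r in records:
            f.write(dumper(r).encode('utf-8'))
            f.write(b'\n')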

FWIW, you could trigger it by running

datalad install  ///openfmri/ds000001; cd ds000001; datalad crawl
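
The lzma failure can also be probed in isolation, without any crawling; assuming a Python 2 environment where pyliblzma provides the lzma module, this should fail the same way:

import lzma  # pyliblzma under Python 2

# Under Python 2 with pyliblzma this raises AttributeError: __exit__,
# since its LZMAFile is not usable as a context manager; under
# Python 3 the stdlib lzma handles the with statement fine.
with lzma.LZMAFile('/tmp/probe.xz', mode='w') as f:
    f.write(b'test')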

@mih (Member) commented on Oct 30, 2017:

I ran this (twice) successfully (using current master) and could not replicate the problem.

mih@meiner /tmp/openfmri/ds000001 (git)-[master] % datalad crawl
[INFO   ] Loading pipeline specification from ./.datalad/crawl/crawl.cfg 
[INFO   ] Creating a pipeline for the openfmri dataset ds000001 
Connecting to bucket: openneuro
You need to authenticate with 'datalad-test-s3' credentials. team@datalad.org provides information on how to gain access

<REDACTED>
[INFO   ] S3 session: Connecting to the bucket openneuro 
Bucket info:
  Versioning: S3ResponseError: 403 Forbidden
     Website: S3ResponseError: 403 Forbidden
         ACL: <Policy: openfmri (owner) = READ, openfmri (owner) = WRITE, openfmri (owner) = READ_ACP, openfmri (owner) = WRITE_ACP, http://acs.amazonaws.com/groups/global/AllUsers = READ, http://acs.amazonaws.com/groups/global/AllUsers = READ_ACP>
ds000001/
[INFO   ] Creating a pipeline for the openfmri bucket 
[WARNING] ATM we assume prefixes to correspond only to directories, adding / 
[INFO   ] Running pipeline [[<function switch_branch at 0x7fd8bd05ee60>, [<datalad.crawler.nodes.s3.crawl_s3 object at 0x7fd8b91799d0>, sub(ok_missing=True, subs=<<{'url': {'^s3://openfm...>>), switch(default=None, key='datalad_action', mapping=<<{'commit': <function _...>>, re=False)], <function switch_branch at 0x7fd8b9e90488>], <function switch_branch at 0x7fd8b9e671b8>, [<datalad.crawler.nodes.crawl_url.crawl_url object at 0x7fd8b9179910>, [a_href_match(query='.*release_history.txt'), assign(assignments={'filename': 'changelog.txt'}, interpolate=False), <datalad.crawler.nodes.annex.Annexificator object at 0x7fd8c26abcd0>], [a_href_match(query='.*/.*\\.(tgz|tar.*|zip)'), sub(ok_missing=False, subs=<<{'url': {'(http)s?(://...>>), <function func_node at 0x7fd8c43515f0>, <datalad.crawler.nodes.annex.Annexificator object at 0x7fd8c26abcd0>], <function _commit_versions at 0x7fd8b91737d0>], <datalad.crawler.nodes.annex._remove_obsolete object at 0x7fd8b9179b90>, [{'loop': False}, <function switch_branch at 0x7fd8b9173848>, <function merge_branch at 0x7fd8b9173938>, <function _remove_other_versions at 0x7fd8b9173a28>, [{'loop': True}, find_files(dirs=False, fail_if_none=True, regex='\\.(zip|tgz|tar(\\..+)?)$', topdir='.'), assign(assignments=<<{'dataset_file': 'ds00...>>, interpolate=True), switch(default=<function>, key='dataset_file', mapping=<<{'.*///ds000030_R1\\.0...>>, re=True)], [find_files(dirs=False, fail_if_none=False, regex=<<'(\\.(tsv|csv|txt|json...>>, topdir='.'), fix_permissions(executable=False, file_re='.*', input='filename', path=None)], <function switch_branch at 0x7fd8b9173c08>, <function merge_branch at 0x7fd8b9173c80>, <function _finalize at 0x7fd8b9173cf8>], <function switch_branch at 0x7fd8b9173d70>, <function _finalize at 0x7fd8b9173de8>] 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Did not find branch 'incoming-s3-openneuro' locally. Checking out remote one 'origin/incoming-s3-openneuro' 
[INFO   ] Checking out an existing branch incoming-s3-openneuro 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch master 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Did not find branch 'incoming' locally. Checking out remote one 'origin/incoming' 
[INFO   ] Checking out an existing branch incoming 
[INFO   ] Fetching 'https://openfmri.org/dataset/ds000001/' 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Did not find branch 'incoming-processed' locally. Checking out remote one 'origin/incoming-processed' 
[INFO   ] Checking out an existing branch incoming-processed 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Checking out an existing branch master 
[INFO   ] Found branch non-dirty -- nothing was committed 
[INFO   ] Aggregate metadata for dataset /tmp/openfmri/ds000001 
[INFO   ] Aggregate metadata for dataset /tmp/openfmri 
[INFO   ] Update aggregate metadata in dataset at: /tmp/openfmri/ds000001 
[INFO   ] Update aggregate metadata in dataset at: /tmp/openfmri 
[INFO   ] Attempting to save 176 files/datasets 
[INFO   ] No git house-keeping performed as no notable changes to git 
[INFO   ] Finished running pipeline: URLs processed: 7,  Files processed: 7, skipped: 6225 
[INFO   ] Total stats: URLs processed: 7,  Files processed: 7, skipped: 6225,  Datasets crawled: 1 
datalad crawl  7,84s user 4,73s system 18% cpu 1:09,61 total
