A variety of enhancements and fixes for metadata search #2724

Merged
merged 7 commits into datalad:master on Aug 15, 2018

2 participants
@yarikoptic
Member

yarikoptic commented Aug 1, 2018

See individual commits please for the summaries

Please have a look @datalad/developers

yarikoptic added some commits Aug 1, 2018

@yarikoptic yarikoptic added the WIP label Aug 1, 2018

@codecov

codecov bot commented Aug 1, 2018

Codecov Report

Merging #2724 into master will decrease coverage by 0.2%.
The diff coverage is 52.17%.

@@            Coverage Diff             @@
##           master    #2724      +/-   ##
==========================================
- Coverage   90.42%   90.22%   -0.21%     
==========================================
  Files         245      245              
  Lines       30918    31236     +318     
==========================================
+ Hits        27959    28182     +223     
- Misses       2959     3054      +95

| Impacted Files | Coverage Δ | |
|---|---|---|
| datalad/metadata/extractors/annex.py | 92.3% <100%> (+0.3%) | ⬆️ |
| datalad/metadata/extractors/base.py | 83.33% <100%> (+1.51%) | ⬆️ |
| datalad/metadata/tests/test_search.py | 93.23% <100%> (+0.37%) | ⬆️ |
| datalad/metadata/metadata.py | 85.75% <100%> (+0.27%) | ⬆️ |
| datalad/metadata/search.py | 79.42% <26.66%> (-0.46%) | ⬇️ |
| datalad/plugin/wtf.py | 79.89% <0%> (-10.27%) | ⬇️ |
| datalad/support/keyring_.py | 84.44% <0%> (-6.67%) | ⬇️ |
| datalad/ui/progressbars.py | 86.59% <0%> (-5.27%) | ⬇️ |
| datalad/customremotes/base.py | 84.14% <0%> (-3.89%) | ⬇️ |
| datalad/utils.py | 86.28% <0%> (-2.49%) | ⬇️ |
| ... and 30 more | | |

Last update cce8cf5...54ecd3b.

@yarikoptic

Member

yarikoptic commented Aug 1, 2018

crap -- something funky in direct mode -- a file managed to get added to git instead of annex, although I could not replicate it manually... added an explicit check, which should now get tripped in direct mode:

$> DATALAD_REPO_DIRECT=1 python -m nose -s -v datalad/metadata/tests/test_search.py:test_within_ds_file_search 
datalad.metadata.tests.test_search.test_within_ds_file_search ... FAIL
Versions: appdirs=1.4.3 boto=2.48.0 cmd:annex=6.20180626+gitg12cd64369-1~ndall+1 cmd:git=2.18.0 cmd:system-git=2.18.0 cmd:system-ssh=7.6p1 exifread=2.1.2 git=2.1.8 gitdb=2.0.3 humanize=0.5.1 iso8601=0.1.11 msgpack=0.5.1 mutagen=1.40.0 requests=2.18.4 scrapy=1.5.0 six=1.11.0 tqdm=4.19.5 wrapt=1.10.11
Obscure filename: str=b' "\';a&b&c\xce\x94\xd0\x99\xd7\xa7\xd9\x85\xe0\xb9\x97\xe3\x81\x82 `| ' repr=' "\';a&b&cΔЙקم๗あ `| '
Encodings: default='utf-8' filesystem='utf-8' locale.prefered='UTF-8'
Environment: LANG='en_US.utf8' PATH='/home/yoh/proj/datalad/datalad-master/venvs/dev3/bin:/home/yoh/gocode/bin:/home/yoh/gocode/bin:/home/yoh/bin:/home/yoh/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin:/sbin:/usr/sbin:/usr/local/sbin' GIT_PAGER='less --no-init --quit-if-one-screen' GIT_PYTHON_GIT_EXECUTABLE='/usr/lib/git-annex.linux/git'

======================================================================
FAIL: datalad.metadata.tests.test_search.test_within_ds_file_search
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 587, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/yoh/proj/datalad/datalad-master/datalad/metadata/tests/test_search.py", line 196, in test_within_ds_file_search
    ok_file_under_git(path, opj('stim', 'stim1.mp3'), annexed=True)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 237, in ok_file_under_git
    assert(annexed == in_annex)
AssertionError: 
-------------------- >> begin captured logging << --------------------
datalad.utils: Level 5: Importing datalad.utils
datalad.utils: Level 5: Done importing datalad.utils
datalad.cmd: Level 9: Will use git under '/usr/lib/git-annex.linux' (no adjustments to PATH if empty string)
datalad.cmd: Level 9: Running: ['git', 'version']
datalad.cmd: Level 8: Finished running ['git', 'version'] with status 0
datalad.cmd: Level 9: Running: ['git', 'config', '-z', '-l', '--show-origin']
datalad.cmd: Level 8: Finished running ['git', 'config', '-z', '-l', '--show-origin'] with status 0
datalad.ui: Level 5: Starting importing ui
datalad.ui.dialog: Level 5: Starting importing ui.dialog
datalad.ui.dialog: Level 5: Done importing ui.dialog
datalad.ui: Level 5: Initiating UI switcher
datalad.ui: DEBUG: UI set to DialogUI(out=<TextIOWrapper>)
datalad.ui: Level 5: Done importing ui
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 3.418s

FAILED (failures=1)

The most amazing part is that we have plenty of tests for basic add'ing, and so far that has worked ok in direct mode as far as I can see... maybe those git add --dry-run follow-ups somehow confuse git annex???

/tmp/indirect-1.log:[Level 9] Running: ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'add', '--dry-run', '-N', '--ignore-missing', '--verbose', '--', '.datalad/config', 'stim/stim1.mp3'] 
/tmp/indirect-1.log:[Level 8] Finished running ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'add', '--dry-run', '-N', '--ignore-missing', '--verbose', '--', '.datalad/config', 'stim/stim1.mp3'] with status 0 
/tmp/indirect-1.log:[Level 9] Running: ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--debug', '--json', '--', '.datalad/config', 'stim/stim1.mp3'] 
/tmp/indirect-1.log:[Level 8] Finished running ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--debug', '--json', '--', '.datalad/config', 'stim/stim1.mp3'] with status 0 
/tmp/indirect-1.log:[DEBUG] received JSON result from annex: {'command': 'add', 'success': True, 'key': 'MD5E-s5702--3473a2b39b83b7b4282bbb5875250c2d.mp3', 'file': 'stim/stim1.mp3'} 
/tmp/indirect-1.log:[Level 9] Running: ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'add', '--dry-run', '-N', '--ignore-missing', '--verbose', '--', 'stim/stim1.mp3', '.datalad/config'] 
/tmp/indirect-1.log:[Level 8] Finished running ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'add', '--dry-run', '-N', '--ignore-missing', '--verbose', '--', 'stim/stim1.mp3', '.datalad/config'] with status 0 
/tmp/indirect-1.log:[Level 9] Running: ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--debug', '--json', '--', 'stim/stim1.mp3', '.datalad/config'] 
/tmp/indirect-1.log:[Level 8] Finished running ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--debug', '--json', '--', 'stim/stim1.mp3', '.datalad/config'] with status 0 

@mih @bpoldrack - any clues?
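
For reading the JSON record in the log above: the `key` value follows git-annex's key layout of backend, optional prefixed fields such as the size, a double dash, and the name. Below is a minimal sketch of splitting such a key; `parse_annex_key` is a hypothetical helper written for illustration and is not part of DataLad or git-annex.

```python
def parse_annex_key(key):
    """Split a git-annex key into backend, size (bytes), and name.

    Hypothetical helper for illustration; only the 's' (size) field is decoded.
    """
    fields, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"not a git-annex key: {key!r}")
    backend, *optional = fields.split("-")
    # Optional fields are prefixed with a letter; 's' carries the size.
    size = next((int(f[1:]) for f in optional if f.startswith("s")), None)
    return {"backend": backend, "size": size, "name": name}


info = parse_annex_key("MD5E-s5702--3473a2b39b83b7b4282bbb5875250c2d.mp3")
print(info["backend"], info["size"], info["name"])
```

So annex itself reported a successful add of the 5702-byte file under the MD5E backend, which makes its ending up in git rather than annex all the more puzzling.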

@yarikoptic yarikoptic added this to the Release 0.10.3 milestone Aug 2, 2018

@yarikoptic

Member

yarikoptic commented Aug 4, 2018

@mih @bpoldrack - any wisdom on the abnormal behavior above?

@mih

Member

mih commented Aug 8, 2018

It looks weird -- I have no insights yet.

BF(TST,workaround): mark test known to fail in direct mode
fix for direct mode will come in a separate branch

@yarikoptic yarikoptic referenced this pull request Aug 14, 2018

Merged

BFing direct mode: "git commit file(s)" etc #2770

5 of 6 tasks complete

@yarikoptic yarikoptic merged commit 559b694 into datalad:master Aug 15, 2018

7 of 9 checks passed

codecov/patch 52.17% of diff hit (target 90.42%)
codecov/project 90.22% (-0.21%) compared to cce8cf5
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
datalad-pr-dl-osx-64 DEV build done.
datalad-pr-docker-dl-nd14_04 DEV build done.
datalad-pr-docker-dl-nd16_04 DEV build done.
datalad-pr-docker-dl-nd80 DEV build done.
datalad-pr-docker-dl-nd90 DEV build done.

yarikoptic added a commit that referenced this pull request Sep 13, 2018

Merge tag '0.10.3' into debian
0.10.3 (Sep 13, 2018) -- Almost-perfect

This is largely a bugfix release which addressed many (but not yet all)
issues of working with git-annex direct and version 6 modes, and operation
on Windows in general.  Among enhancements you will see the
support of public S3 buckets (even with periods in their names),
ability to configure new providers interactively, and improved `egrep`
search backend.

Although we do not require it with this release, it is recommended to make
sure that you are using a recent `git-annex` since it also had a variety
of fixes and enhancements in the past months.

Fixes

- Parsing of combined short options has been broken since DataLad
  v0.10.0. ([#2710])
- The `datalad save` instructions shown by `datalad run` for a command
  with a non-zero exit were incorrectly formatted. ([#2692])
- Decompression of zip files (e.g., through `datalad
  add-archive-content`) failed on Python 3.  ([#2702])
- Windows:
  - colored log output was not being processed by colorama.  ([#2707])
  - more codepaths now try multiple times when removing a file to deal
    with latency and locking issues on Windows.  ([#2795])
- Internal git fetch calls have been updated to work around a
  GitPython `BadName` issue.  ([#2712]), ([#2794])
- The progress bar for annex file transferring was unable to handle an
  empty file.  ([#2717])
- `datalad add-readme` halted when no aggregated metadata was found
  rather than displaying a warning.  ([#2731])
- `datalad rerun` failed if `--onto` was specified and the history
  contained no run commits.  ([#2761])
- Processing of a command's results failed on a result record with a
  missing value (e.g., absent field or subfield in metadata).  Now the
  missing value is rendered as "N/A".  ([#2725]).
- A couple of documentation links in the "Delineation from related
  solutions" were misformatted.  ([#2773])
- With the latest git-annex, several known V6 failures are no longer
  an issue.  ([#2777])
- In direct mode, commit changes would often commit annexed content as
  regular Git files.  A new approach fixes this and resolves a good
  number of known failures.  ([#2770])
- The reporting of command results failed if the current working
  directory was removed (e.g., after an unsuccessful `install`). ([#2788])
- When installing into an existing empty directory, `datalad install`
  removed the directory after a failed clone.  ([#2788])
- `datalad run` incorrectly handled inputs and outputs for paths with
  spaces and other characters that require shell escaping.  ([#2798])
- Globbing inputs and outputs for `datalad run` didn't work correctly
  if a subdataset wasn't installed.  ([#2796])
- Minor (in)compatibility with git 2.19 - (no) trailing period
  in an error message now. ([#2815])
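
To illustrate the "N/A" fix listed above (a result record with an absent field or subfield), here is a minimal sketch of a string formatter that substitutes a placeholder instead of raising. `NAFormatter`, the sample record, and the template are assumptions made for this example; this is not DataLad's actual implementation.

```python
from string import Formatter


class NAFormatter(Formatter):
    """Formatter that renders unresolvable fields as a placeholder."""

    def __init__(self, placeholder="N/A"):
        super().__init__()
        self.placeholder = placeholder

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError, TypeError):
            # Absent field or subfield: fall back to the placeholder.
            return self.placeholder, field_name


# Hypothetical result record: 'status' and the nested 'name' are missing.
record = {"path": "stim/stim1.mp3", "metadata": {"audio": {}}}
template = "{path}: {metadata[audio][name]} ({status})"
rendered = NAFormatter().format(template, **record)
print(rendered)
```

Overriding `get_field` covers both a missing top-level field and a missing nested key, since the standard `Formatter` resolves the whole dotted or indexed path inside that one method.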

Enhancements and new features

- Anonymous access is now supported for S3 and other downloaders.  ([#2708])
- A new interface is available to ease setting up new providers.  ([#2708])
- Metadata: changes to egrep mode search  ([#2735])
  - Queries in egrep mode are now case-sensitive when the query
    contains any uppercase letters and are case-insensitive otherwise.
    The new mode egrepcs can be used to perform a case-sensitive query
    with all lower-case letters.
  - Search can now be limited to a specific key.
  - Multiple queries (list of expressions) are evaluated using AND to
    determine whether something is a hit.
  - A single multi-field query (e.g., `pa*:findme`) is a hit, when any
    matching field matches the query.
  - All matching key/value combinations across all (multi-field)
    queries are reported in the query_matched result field.
  - egrep mode now shows all hits rather than limiting the results to
    the top 20 hits.
- The documentation on how to format commands for `datalad run` has
  been improved.  ([#2703])
- The method for determining the current working directory on Windows
  has been improved.  ([#2707])
- `datalad --version` now simply shows the version without the
  license.  ([#2733])
- `datalad export-archive` learned to export under an existing
  directory via its `--filename` option.  ([#2723])
- `datalad export-to-figshare` now generates the zip archive in the
  root of the dataset unless `--filename` is specified.  ([#2723])
- After importing `datalad.api`, `help(datalad.api)` (or
  `datalad.api?` in IPython) now shows a summary of the available
  DataLad commands.  ([#2728])
- Support for using `datalad` from IPython has been improved.  ([#2722])
- `datalad wtf` now returns structured data and reports the version of
  each extension.  ([#2741])
- The internal handling of gitattributes information has been
  improved.  A user-visible consequence is that `datalad create
  --force` no longer duplicates existing attributes.  ([#2744])
- The "annex" metadata extractor can now be used even when no content
  is present.  ([#2724])
- The `add_url_to_file` method (called by commands like `datalad
  download-url` and `datalad add-archive-content`) learned how to
  display a progress bar.  ([#2738])
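
The egrep-mode rules listed above (case sensitivity decided by the query, AND across multiple queries, any matching field for a multi-field query, and reporting of all matched key/value pairs) can be sketched roughly as follows. This is a simplified illustration, not DataLad's search code: the function names are hypothetical, the key part is assumed to be a regular expression matched from the start of the field name, and the query is naively split at the first colon.

```python
import re


def compile_query(query, force_case_sensitive=False):
    """Split an optional 'key:' prefix off a query and compile both parts."""
    key_expr, sep, value_expr = query.partition(":")
    if not sep:
        # No key prefix: the expression applies to every field.
        key_expr, value_expr = "", query
    # Any uppercase letter in the query makes it case-sensitive.
    case_sensitive = force_case_sensitive or any(c.isupper() for c in query)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(key_expr, flags), re.compile(value_expr, flags)


def search_record(record, queries, force_case_sensitive=False):
    """Return all matched key/value pairs if every query hits, else None."""
    matched = {}
    for query in queries:
        key_re, value_re = compile_query(query, force_case_sensitive)
        hits = {
            key: value
            for key, value in record.items()
            if key_re.match(key) and value_re.search(str(value))
        }
        if not hits:
            return None  # queries are combined with AND
        matched.update(hits)  # analogous to the query_matched field
    return matched


# Hypothetical metadata record for demonstration.
record = {"name": "Findme", "parent": "other", "path": "dir/findme.txt"}
print(search_record(record, ["pa*:findme"]))
print(search_record(record, ["Findme"]))
print(search_record(record, ["findme"], force_case_sensitive=True))
```

Passing `force_case_sensitive=True` stands in for the egrepcs mode, where an all-lowercase query is still matched case-sensitively.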

* tag '0.10.3': (214 commits)
  Changelog entry for 2.19 git compat fix
  DOC: slight tune ups to the ChangeLog
  ENH: link issue/pull #s in CHANGELOG, use tools/link_issues_CHANGELOG
  BF: remove trailing period while matching a mesage from git
  DOC: try_multiple_dec: Add description of `duration` parameter
  CLN: annexrepo: Fix grammar in a recently added comment
  TST: auto: Reformat comment from 900ee08
  DOC: _rmtree: Drop a word from summary line
  DOC: Info on IDs (fixes gh-2801)
  BF: diff -- when reraising - just raise, do not raise that instance of exception from new location
  BF(TST): precommit before removing submodule so there is no batched processes
  ENH+BF(TST): close established ssh sockets upon test finish
  BF(TST): One more "clost all log handlers in the test"
  CHANGELOG.md: Start adding entries for v0.10.3
  BF(TST): close cookies db in the teardown since atexit is later, so cannot assure no open files
  BF(TST): explicitly close created log handlers
  ENH(TST): @known_failure_windows to replace plain @skip_if_on_windows where there is hope
  BF(TST): do not swallow output while testing AutomagicIO to not cause some open files issue
  ENH(TST): Skip a test when cannot remove curent directory
  BF(TST): explicitly precommit a repo used under swallow_outputs
  ...