Skip to content

Do not use gen4-metadata methods in datalad metadata-command #7001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

christian-monch
Copy link
Contributor

Fixes #7000

This PR fixes issue #7000. It restricts legacy metadata code that implents datalad metadata to only use legacy metadata query methods.

Changelog

🐛 Bug Fixes

This commit ensures that legacy metadata code in datalad
core use the legacy functions to query aggregated metadata,
i.e. `legacy_query_aggregated_metadata()`.

This fixes an error described in issue datalad#7000.
This commit fixes the "impossible" result records
that are returned if no gen4-metadata is available,
but a gen4-metadata is querried.

Before this commit, all queried types would be
included in a tuple in a single result record.
Now a result record is created for every requested
type, i.e. "file" or "dataset".
This commit unifies the behavior of
query_aggregated_metadata in legacy and gen4
cases. It adds a warning that is emitted, if
no gen4 metadata can be found in a dataset
@christian-monch christian-monch added the semver-internal Changes only affect the internal API label Sep 1, 2022
@christian-monch christian-monch requested review from mih and yarikoptic and removed request for mih September 1, 2022 13:30
@yarikoptic yarikoptic added the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 19, 2022
@github-actions github-actions bot removed the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 19, 2022
@yarikoptic yarikoptic added the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 19, 2022
@github-actions github-actions bot removed the CHANGELOG-missing When a PR's description does not contain a changelog item, yet. label Sep 19, 2022
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay with review
Left minor recommendations, trusting that it would just do the right thing (since no test demonstrating prior error was added). Also I think semver-path would be more appropriate since it addresses an issue which was "user-triggered", so I will change that label.

@yarikoptic yarikoptic added semver-patch Increment the patch version when merged and removed semver-internal Changes only affect the internal API labels Sep 19, 2022
@codecov
Copy link

codecov bot commented Sep 19, 2022

Codecov Report

Base: 90.14% // Head: 76.06% // Decreases project coverage by -14.07% ⚠️

Coverage data is based on head (7731545) compared to base (6a21dab).
Patch coverage: 68.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##            maint    #7001       +/-   ##
===========================================
- Coverage   90.14%   76.06%   -14.08%     
===========================================
  Files         354      355        +1     
  Lines       46338    59019    +12681     
  Branches     6613     6315      -298     
===========================================
+ Hits        41771    44894     +3123     
- Misses       4551    14111     +9560     
+ Partials       16       14        -2     
Impacted Files Coverage Δ
datalad/metadata/tests/test_search.py 69.68% <ø> (-28.99%) ⬇️
datalad/metadata/search.py 81.91% <33.33%> (-7.68%) ⬇️
datalad/metadata/metadata.py 90.13% <50.00%> (-0.68%) ⬇️
datalad/metadata/tests/test_base.py 70.58% <50.00%> (-28.66%) ⬇️
datalad/metadata/tests/test_metadata.py 100.00% <100.00%> (ø)
datalad/tests/test_misc.py 48.00% <0.00%> (-52.00%) ⬇️
datalad/core/local/tests/test_results.py 58.33% <0.00%> (-41.67%) ⬇️
datalad/tests/test_api.py 56.14% <0.00%> (-40.83%) ⬇️
datalad/local/tests/test_configuration.py 62.12% <0.00%> (-37.88%) ⬇️
datalad/runner/tests/test_exception.py 63.76% <0.00%> (-36.24%) ⬇️
... and 175 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@yarikoptic
Copy link
Member

eh, I wish you have finished this PR @christian-monch ... Trying to make datalad-neuroimaging sane again I spent time solving this issue , and differently and IMHO suboptimally (I just was disabling status="impossible" bu default) :-/ Are you planing to finish this PR?

@yarikoptic
Copy link
Member

FTR: checked -- actually it is not solving it for neuroimaging . Since it is done here at the level of metadata command and not query_aggregate_metadata where IMHO it should be done -- tests in neuroimaging would still get the "impossible" record.

@christian-monch
Copy link
Contributor Author

@yarikoptic sorry for the delay. Will finish it now

yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Oct 8, 2022
…le querying metadata

.... eh -- might not be needed/suboptimal since there is WiP
datalad#7001
christian-monch and others added 3 commits October 9, 2022 07:55
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
@christian-monch
Copy link
Contributor Author

@yarikoptic I added a test. looking into moving it into query_aggregate_metadata now

@christian-monch
Copy link
Contributor Author

christian-monch commented Oct 9, 2022

FTR: checked -- actually it is not solving it for neuroimaging . Since it is done here at the level of metadata command and not query_aggregate_metadata where IMHO it should be done -- tests in neuroimaging would still get the "impossible" record.

I think the change should remain at the level of metadata because metadata is meant to be used with "legacy" metadata. When the metadata-command was introduced, "gen4" metadata was not around. So it seems like a natural place and I think the limitation to "legacy" metadata should remain to keep the initial semantics. I think the change should
remain at the level of metadata in order to allow keeping metadata's old semantics, i.e. legacy-metadata only, or select a new semantics when calling metadata.

Also, since query_aggregated_metadata receives the metadata stores that it should search explicitly, it should yield an impossible-result when gen4-metadata is defined as metadata_source but does not exist.

Having said that, the default for the metadata_source parameter in query_aggregated_metadata should probably change. The thing is that the current default for metadata_source specifies querying both, "legacy" and "gen4" metadata. That changes the semantics of a call to query_aggregated_metadata that does not provide the metadata_source argument. I will change the default to only search in "legacy" metadata. After this change, existing code that does not specify the new argument metadata_sourceshould behave properly. With this change in place, the metadata-level code can also remain as it was before this PR.

This change should therefore work with neuroimaging as well.

This commit changes the default of `metadata_source` to
be 'legacy' in the metadata code in datalad core. This
restores the original semantics of search-calls as well
as of `query_aggregated_metadata`.
Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR you are suggesting to revert back query_aggregated_metadata to

  • replace None with "all"
  • make "legacy" to be default for that function and search
    As a result, query_aggregated_metadata, metadata and search would now not error out while encountering datasets only with legacy metadata. Since it "fits my use-case" of immediate need to test datalad-neuroimaging, which I checked would pass now, I marked it as "approve".

But ideally I think it should have been more of a "any" instead of "all" which was added, and then sensing for any (legacy or gen4) was done or if indeed all are impossible, yield an impossible rec?

PS I will accept my suggestion on fixing warping of changelog entry to expedite

yarikoptic and others added 2 commits October 10, 2022 14:15
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
@yarikoptic-gitmate
Copy link
Collaborator

PR released in 0.17.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants