Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove metadata-related code after its move to -deprecated #7014

Merged
merged 29 commits into from
Nov 1, 2022

Conversation

mih
Copy link
Member

@mih mih commented Sep 7, 2022

This changeset is part of the transition from the 2nd-gen metadata implementation to then gen3 code provided by the -metalad extension.

This removal has side-effect on other code, not directly related to metadata. This following list can be considered a TODO for further maintenance work:

Now unused code:

  • datalad/support/json_py.py:load_xzstream() : load_xzstream and dump_xzstream removed in 8263128
  • datalad/interface/common_opts.py:reporton_opt: removed in 1c26072
  • datalad/utils.py:all_same(): removed in 5e50038
  • datalad/utils.py:as_unicode(): removed in 13514ba
  • datalad/utils.py:unicode_srctypes made obsolete by removal of as_unicode(): removed in 13514ba (same as above)

Now barely used (technical debt?):

  • nosave_opt: Residual use in addurls() and download_url()
  • datalad/support/path.py:split_ext(): Residual use in a single spot in addurls() <- Dissolve datalad.support.path and fully replace with pathlib #4056
  • datalad/utils.py:path_startswith(): 2 pieces of code still use this outside the tests
  • datalad/utils.py:get_suggestions_msg(): CLI parser and addurls() make the total of 2 calls made to this helper
  • datalad/utils.py:shortened_repr(): one call in get() and one in foreach_dataset()

Issues to file:

Related TODOs:

Closes #7012

  • remove temporary commit testing against a PR in deprecated rather than the released version
  • double-check the list above and changelog

@mih mih added semver-minor Increment the minor version when merged metadata labels Sep 7, 2022
@christian-monch
Copy link
Contributor

I think the dependency on simplejson can be removed in favor of the built-in json-module. The only use outside of json_py.py is in annexrepo.py and there it can just be replaced with the corresponding json-calls. The comment seems to indicate that simplejson was used to be able to log errors, but that is done json-based code as well.

@codecov
Copy link

codecov bot commented Sep 7, 2022

Codecov Report

Base: 89.42% // Head: 90.62% // Increases project coverage by +1.20% 🎉

Coverage data is based on head (d2acc02) compared to base (2346326).
Patch coverage: 52.63% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7014      +/-   ##
==========================================
+ Coverage   89.42%   90.62%   +1.20%     
==========================================
  Files         358      325      -33     
  Lines       46553    44001    -2552     
  Branches     6322        0    -6322     
==========================================
- Hits        41628    39876    -1752     
+ Misses       4910     4125     -785     
+ Partials       15        0      -15     
Impacted Files Coverage Δ
datalad/cli/interface.py 100.00% <ø> (ø)
datalad/cli/tests/test_main.py 95.42% <ø> (ø)
datalad/consts.py 100.00% <ø> (ø)
datalad/core/local/create.py 99.29% <ø> (ø)
datalad/core/local/tests/test_resulthooks.py 100.00% <ø> (ø)
datalad/distribution/tests/test_dataset_api.py 100.00% <ø> (ø)
datalad/interface/__init__.py 100.00% <ø> (ø)
datalad/interface/common_cfg.py 100.00% <ø> (ø)
datalad/interface/common_opts.py 100.00% <ø> (ø)
datalad/local/tests/test_clean.py 100.00% <ø> (ø)
... and 37 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

christian-monch added a commit to datalad/datalad-metalad that referenced this pull request Sep 8, 2022
This commit adds the extractors that were removed
from datalad core in the datalad PR #7014
(datalad/datalad#7014)
including their tests to datalad-metalad. This is
done to keep them available with only `datalad`
and `datalad-metalad` installed.
@christian-monch
Copy link
Contributor

christian-monch commented Sep 9, 2022

Should we move this out of draft and review it?

The extension tests can best be fixed after this PR is released.

Besides removing metadata-related code and fixing tests, the following has been done

  • All "now unused code" removed (see description for the respective commits)
  • Investigation whether the dependency on simplejson is still required has been performed. simplejson is no longer required and replaced with json (mainly done in: 66c2a63, minor fixes in additional commits)
  • Removed SEARCH_INDEX_DOTGITDIR (in: 7289576)

The following has NOT been tackled yet in this PR:

  • Remove "Now barely used (technical debt?)" code
  • No action on "datalad/interface/utils.py:discover_dataset_trace_to_targets() is now an internal helper of save()"

@christian-monch christian-monch marked this pull request as ready for review September 9, 2022 15:14
christian-monch added a commit to christian-monch/datalad-deprecated that referenced this pull request Sep 10, 2022
This commit adds the metadata code that is removed
from datalad core in the datalad PR #7014
(datalad/datalad#7014).

It is tested against the datalad commit
ff0f6a71b460075b7aa85a9da8052ab570f04590
Copy link
Member Author

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

datalad/consts.py Outdated Show resolved Hide resolved
datalad/local/clean.py Outdated Show resolved Hide resolved
mih and others added 14 commits October 20, 2022 11:31
This changeset is part of the transition from the 2nd-gen metadata
implementation to then gen3 code provided by the -metalad extension.

This removal has side-effect on other code, not directly related
to metadata. This following list can be considered a TODO for further
maintenance work:

Now unused code:

- `datalad/support/json_py.py:load_xzstream()`
- `datalad/interface/common_opts.py:reporton_opt`
- `datalad/utils.py:all_same()`
- `datalad/utils.py:as_unicode()`
- `datalad/utils.py:unicode_srctypes` made obsolete by removal of `as_unicode()`

Now barely used (technical debt?):

- `nosave_opt`: Residual use in `addurls()` and `download_url()`
- `datalad/support/path.py:split_ext()`: Residual use in a single spot in `addurls()`
- `datalad/utils.py:path_startswith()`: 2 pieces of code still use this outside the tests
- `datalad/utils.py:get_suggestions_msg()`: CLI parser and `addurls()` make the total of 2 calls made to this helper
- `datalad/utils.py:shortened_repr()`: one call in `get()` and one in `foreach_dataset()`

Issues to file:

- [ ] Investigate whether the dependency on `simplejson` is still needed.
- [ ] `datalad/interface/utils.py:discover_dataset_trace_to_targets()`
      is now an internal helper of `save()`

Closes datalad#7012
This commit removes tests that are related to the
metadata code that was contained in datalad-core.
These tests fail after the removal of the metadata
code.
This commit removes parts of the test
`test_duecredit_dataset`, that tried to patch
`Dataset.metadata`. The patching fails now
because the metadata code has been removed.
This commit removes `simplejson` dependency in favor
of the built-in `json`-module. Encoding is now a
property of the streams and not the encoder.
This commit removes the `reporton_opt` parameter
definition because with the removal of metalad
from datalad core it is not used anymore.
This commit adds a `readme.txt` to `docs/source/_extras`,
to ensure that the directory is added to the git-repo
The current implementation cannot rely on the
presence of metalad-commands, since the extension
might not be installed. This requires some
code that is conditionally executed, based on the
availability of certain metalad-commands.
This commit brings back the ability to clean old
metadata indices into core.
@yarikoptic
Copy link
Member

appveyor mac fail: FAILED ../datalad/support/tests/test_parallel.py::test_stalling - AssertionError: Future has not finished in 10x time . in 0.17.4 (#6995) we boosted from 5 to 10... odd that it is still not enough, might need to boost to some 100 and if that ever fails -- there must be a real hard to reproduce issue

@christian-monch
Copy link
Contributor

appveyor mac fail: FAILED ../datalad/support/tests/test_parallel.py::test_stalling - AssertionError: Future has not finished in 10x time . in 0.17.4 (#6995) we boosted from 5 to 10... odd that it is still not enough, might need to boost to some 100 and if that ever fails -- there must be a real hard to reproduce issue

I am not able to reproduce it locally on darwin 19.6.0. Since it seems unrelated to the metadata move, it would suggest to solve this in a dedicated issue

@bpoldrack
Copy link
Member

bpoldrack commented Oct 25, 2022

Ok, so still not ready. Apparently the move to deprecated is still incomplete. datalad-deprecated is trying to import annotate_path stuff from core, that has moved to deprecated.

datalad/datalad-deprecated#72

@codeclimate
Copy link

codeclimate bot commented Nov 1, 2022

Code Climate has analyzed commit b61c0e4 and detected 4 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 1
Security 3

View more on Code Climate.

@bpoldrack
Copy link
Member

Remaining failures are currently happening everywhere and are clearly unrelated. Looking into this.

However, extension tests now pass. I'm gonna go ahead and merge this finally.

@yarikoptic-gitmate
Copy link
Collaborator

PR released in 0.18.0

yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Jan 31, 2023
Follow up to datalad#7014
which removed all metadata handling.  Without this change we get

    ❯ make -C docs html

    make: Entering directory '/home/yoh/proj/datalad/datalad-maint/docs'
    DATALAD_SPHINX_RUN=1 sphinx-build -b html -d build/doctrees  -W source build/html
    Running Sphinx v6.1.3
    running build_manpage
    running build_cfginfo
    usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: setup.py --help [cmd1 cmd2 ...]
       or: setup.py --help-commands
       or: setup.py cmd --help

    error: invalid command 'build_schema'
    ... proceeding normally

without crashing so we missed that
@adswa adswa deleted the mnt-metadata-mv branch March 3, 2023 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor Increment the minor version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move old-style metadata facilities to deprecated
5 participants