-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT: Modernize export-archive #6745
Conversation
meh, shit, this seems to break something in the metadata code. edit: found the cause. |
This change contains maintenance work on export-archive. This includes a complete switch from os.path to Pathlib, as well as abandoning old almost never used and @normalize_path-polluted repo methods get_indexed_files() and is_under_annex(), and replacing them with get_content_{annex,}info(). This allows us to simplify the logic in the for loop over all files further. Coincidentally, because the switch from normlize_path functions to get_content_{annex,}info() prevents faulty path handling when export-archive is called from subdirectories inside of datasets, this change fixes datalad#6741
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! LGTM!
From my POV this calls for an immediate move of GitRepo.get_indexed_files()
into datalad/tests/utils.py
only leaving a deprecated shim in GitRepo
-- but no TODO for this PR.
Moreover, it seems that there is only a single user for file_has_content()
left (in metadata.py
) -- an important milestone for #4595.
Likewise time is running out for is_under_annex()
only download_url()
, add_archive_content()
and that same metadata.py
line are left TODO.
Wonderful contribution, thx much!
filename = path.join(filename, default_filename) # under given directory | ||
if not filename.endswith(file_extension): | ||
filename += file_extension | ||
filename = Path(default_filename) # in current directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: Was wondering whether default_filename
should become Path already on line 112, but this rest of the logic seems to encourage not making this change.
Codecov Report
@@ Coverage Diff @@
## master #6745 +/- ##
==========================================
+ Coverage 88.82% 90.41% +1.58%
==========================================
Files 353 353
Lines 45761 45747 -14
==========================================
+ Hits 40647 41360 +713
+ Misses 5114 4387 -727
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! one notable suggestion to keep behavior consistent and logic "centralized" on "what is basename" when working with archives.
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
committed your suggestions, @yarikoptic |
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Code Climate has analyzed commit df977b6 and detected 0 issues on this pull request. View more on Code Climate. |
one appveyor run |
This change contains maintenance work on export-archive. This includes
a complete switch from os.path to Pathlib, as well as abandoning old
almost never used and @normalize_path-polluted repo methods
get_indexed_files() and is_under_annex(), and replacing them with
get_content_{annex,}info(). This allows us to simplify the logic in
the for loop over all files further.
Coincidentally, because the switch from normlize_path functions
to get_content_{annex,}info() prevents faulty path handling when
export-archive is called from subdirectories inside of datasets,
this change fixes #6741
Changelog
🐛 Bug Fixes
export-archive
doesn't rely onnormalize_path()
methods anymore and became more robust when called from subdirectories (fixesexport-archive
fails on dataset with subdatasets, only if run from a subdirectory #6741)