NF: status(eval_subdataset_state={'no'|'commit'|'full'}) #3324
New mode to fully or partially disable subdataset state evaluation (which involves recursion into all present subdatasets to find potential uncommitted changes), without disabling subdataset discovery and reporting. This addition allows status() to be used unconditionally for fast(er) discovery operations. This also replaces the (undocumented) argument and behavior switch 'ignore_submodules', which was not used in any sensible way because it did not actually provide sensible switching: it could only distinguish "clever" operation from "needlessly wasteful" operation.
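To make the difference between the three modes concrete, here is a toy model in plain Python. It is only a sketch under assumptions: `ToyDataset`, `is_dirty()` and `report_subdatasets()` are hypothetical names and not DataLad code, and reporting `None` for an unevaluated state is a choice made for this illustration. The counter tracks how many worktrees the expensive check has to visit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; this is not a DataLad class."""
    path: str
    head: str             # commit currently checked out
    dirty: bool = False   # uncommitted changes in the worktree
    # pairs of (subdataset, commit recorded for it in this dataset)
    subdatasets: List[Tuple["ToyDataset", str]] = field(default_factory=list)


def is_dirty(ds: ToyDataset, visits: List[int]) -> bool:
    """Expensive check: inspect the worktree of ds and everything below it."""
    visits[0] += 1
    if ds.dirty:
        return True
    return any(sub.head != rec or is_dirty(sub, visits)
               for sub, rec in ds.subdatasets)


def report_subdatasets(ds: ToyDataset, mode: str,
                       visits: List[int]) -> List[Tuple[str, Optional[str]]]:
    """Discover and report direct subdatasets; evaluate state per mode."""
    results = []
    for sub, recorded in ds.subdatasets:
        if mode == "no":
            state = None  # discovered and reported, state not evaluated
        elif mode == "commit":
            # cheap: compare recorded commit with the subdataset's HEAD only
            state = "clean" if sub.head == recorded else "modified"
        elif mode == "full":
            # expensive: additionally recurse to find uncommitted changes
            changed = sub.head != recorded or is_dirty(sub, visits)
            state = "modified" if changed else "clean"
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        results.append((sub.path, state))
    return results


# Small hierarchy: 'a' is clean itself but contains a dirty nested dataset,
# 'b' has moved to a commit other than the one recorded in the superdataset.
deep = ToyDataset("a/deep", head="d1", dirty=True)
a = ToyDataset("a", head="c1", subdatasets=[(deep, "d1")])
b = ToyDataset("b", head="c2")
super_ds = ToyDataset(".", head="s1", subdatasets=[(a, "c1"), (b, "c1")])
```

In this model, 'commit' never touches a worktree and therefore misses the uncommitted change hidden in `a/deep`, while 'full' finds it at the cost of visiting two worktrees. All three modes still report both subdatasets.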
Otherwise the exception would show a tuple with the format string and the value.
Codecov Report
@@ Coverage Diff @@
## master #3324 +/- ##
==========================================
- Coverage 91.15% 91.15% -0.01%
==========================================
Files 263 263
Lines 34230 34246 +16
==========================================
+ Hits 31204 31218 +14
- Misses 3026 3028 +2
I don't have a solid understanding of the underlying code, but the overall idea sounds good to me and the results looked good in the hierarchies I played around with. Pushed a couple fixups (feel free to squash in if you prefer) and left minor comments.
datalad/core/local/status.py
Outdated
    )
    eval_subdataset_state=Parameter(
        args=("--eval-subdataset-state",),
This seems useful enough to get a short flag (-e?). Also, perhaps setting metavar to something shorter (STATE?) would make the short help a bit more readable. Currently it says
[--eval-subdataset-state EVAL_SUBDATASET_STATE]
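For illustration, the effect of both suggestions can be reproduced with the standard library's argparse alone. This is a sketch, not the DataLad `Parameter` definition: `build_parser()` and the `status` program name are made up for the example, and `-e` is only the short flag proposed above, not a confirmed option.

```python
import argparse


def build_parser(improved: bool) -> argparse.ArgumentParser:
    """Build a toy parser, with or without the suggested short flag/metavar."""
    parser = argparse.ArgumentParser(prog="status")
    if improved:
        # suggested variant: short flag plus a compact metavar
        flags, extra = ("-e", "--eval-subdataset-state"), {"metavar": "STATE"}
    else:
        # variant as in the reviewed diff: argparse derives the
        # placeholder from the option name, upper-cased
        flags, extra = ("--eval-subdataset-state",), {}
    parser.add_argument(
        *flags,
        default="full",
        help="how to evaluate subdataset state: no, commit, or full",
        **extra,
    )
    return parser


if __name__ == "__main__":
    print(build_parser(improved=False).format_usage(), end="")
    print(build_parser(improved=True).format_usage(), end="")
```

The parsed value ends up in `eval_subdataset_state` either way; only the usage line gets shorter.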
True, will do.
metavar issue is addressed via #3326
@kyleam Thx for the feedback. I added a performance note to the top comment.
Follow up on dataladgh-3324 by removing a stale reference to ignore_submodules and describing the eval_subdataset_state parameter in the docstrings of diff() and status(). [ci skip]
Performance info on the /// dataset:
datalad status --eval-subdataset-state commit -r  744.29s user 601.62s system 102% cpu 21:47.48 total
Looks ridiculous, but a plain git status at the top level already takes 7min. In contrast, this is a recursion across all datasets, with reporting on all datasets. Interestingly, the difference to "full" on the present state of that dataset's working tree (mixture of all kinds of things) is only 2min (~24min total runtime).
So here is an artificial "worst case" example with a deep and clean hierarchy that shows the maximum performance difference between the modes "commit" and "full":