Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF(DOC)+RF: report "present" for the state of a present/installed/fulfilled (sub)dataset #5655

Merged
merged 1 commit into from May 17, 2021

Conversation

yarikoptic
Copy link
Member

I think that docstring was left from old(er) times whenever git submodule was
used to report the "state" of the subdatasets. After it was replaced with
ad-hoc parsing of .gitmodules as far as I see it "state" no longer corresponds
to any of those values besides "absent", and we simply do not report any state
(as changed unittest shows) for "present" (a clear antonym to "absent")
datasets. All those states are reported by "status" command though. E.g.
compare:

$> datalad -f '{type} {path}: {state} {status}' --report-type dataset status -d . | grep dataset
dataset /tmp/testds3/mod-inited: clean ok
dataset /tmp/testds3/not-committed: added ok
dataset /tmp/testds3/not-mod-inited: clean ok
dataset /tmp/testds3/openneuro: clean ok
dataset /tmp/testds3/uninstalled: clean ok

to (with these changes, without -- would say NA instead of "present" since
there would be no "state")

$> datalad -f '{type} {path}: {state} {status}' --report-type dataset subdatasets -d . | grep dataset
dataset /tmp/testds3/mod-inited: present ok
dataset /tmp/testds3/not-committed: present ok
dataset /tmp/testds3/not-mod-inited: present ok
dataset /tmp/testds3/openneuro: present ok
dataset /tmp/testds3/uninstalled: absent ok

to see that "status" does not care about "subdataset state" and reports it as
"clean" if absent.

To mitigate it, I have established "present" as a possible value for the
subdataset's "state", thus overall making it a "bool" value as just a field
with two possible states. ATM I do not see any other possible value we might
want to report which would be complimentary (not a sub-category) to those two.

This might establish a precedent for RFing of --fulfilled into
--state={any,present,absent} for selection of subdatasets according to their
state, making interface more coherent IMHO. But with that in mind I would
still have consideration of the fact that "state" means different things
depending on the command (subdatasets vs status) which might cause
confusion/conflict here. But I think it is status="ok" (pun intended) as long
as we document that it is "subdatasets state" and possibly use "--status-state"
or just "--status" whenever we decide to harmonize options for that.

WDYT? if we decide to proceed with this one, then for foreach (#749) I would add --state={present,absent,any} and possibly suggest a PR for --fulfilled -> --state={any,present,absent} RFing/deprecation of fulfilled

…filled dataset

I think that docstring was left from old(er) times whenever `git submodule` was
used to report the "state" of the subdatasets.  After it was replaced with
ad-hoc parsing of .gitmodules as far as I see it "state" no longer corresponds
to any of those values besides "absent", and we simply do not report any state
(as changed unittest shows) for "present" (a clear antonym to "absent")
datasets.  All those states are reported by "status" command though.  E.g.
compare:

	$> datalad -f '{type} {path}: {state} {status}' --report-type dataset status -d . | grep dataset
	dataset /tmp/testds3/mod-inited: clean ok
	dataset /tmp/testds3/not-committed: added ok
	dataset /tmp/testds3/not-mod-inited: clean ok
	dataset /tmp/testds3/openneuro: clean ok
	dataset /tmp/testds3/uninstalled: clean ok

to (with these changes, without -- would say NA instead of "present" since
there would be no "state")

	$> datalad -f '{type} {path}: {state} {status}' --report-type dataset subdatasets -d . | grep dataset
	dataset /tmp/testds3/mod-inited: present ok
	dataset /tmp/testds3/not-committed: present ok
	dataset /tmp/testds3/not-mod-inited: present ok
	dataset /tmp/testds3/openneuro: present ok
	dataset /tmp/testds3/uninstalled: absent ok

to see that "status" does not care about "subdataset state" and reports it as
"clean" if absent.

To mitigate it, I have established "present" as a possible value for the
subdataset's "state", thus overall making it a "bool" value as just a field
with two possible states.  ATM I do not see any other possible value we might
want to report which would be complimentary (not a sub-category) to those two.

This might establish a precedent for RFing of --fulfilled into
--state={any,present,absent} for selection of subdatasets according to their
state, making interface more coherent IMHO.  But with that in mind I would
still have consideration of the fact that "state" means different things
depending on the command (subdatasets vs status) which might cause
confusion/conflict here.  But I think it is status="ok" (pun intended) as long
as we document that it is "subdatasets state" and possibly use "--status-state"
or just "--status" whenever we decide to harmonize options for that.
@yarikoptic yarikoptic added enhancement documentation Documentation-related issue UX user experience DX developer experience labels May 11, 2021
@yarikoptic yarikoptic mentioned this pull request May 11, 2021
9 tasks
@codecov
Copy link

codecov bot commented May 11, 2021

Codecov Report

Merging #5655 (b8651e5) into master (5126f7a) will decrease coverage by 11.49%.
The diff coverage is 100.00%.

❗ Current head b8651e5 differs from pull request most recent head facaaa2. Consider uploading reports for the commit facaaa2 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5655       +/-   ##
===========================================
- Coverage   90.35%   78.85%   -11.50%     
===========================================
  Files         305      302        -3     
  Lines       41740    41705       -35     
===========================================
- Hits        37713    32888     -4825     
- Misses       4027     8817     +4790     
Impacted Files Coverage Δ
datalad/local/subdatasets.py 94.01% <100.00%> (-2.51%) ⬇️
datalad/local/tests/test_subdataset.py 100.00% <100.00%> (ø)
datalad/plugin/wtf.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/addurls.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/no_annex.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/add_readme.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/check_dates.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/support/tests/utils.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/export_archive.py 0.00% <0.00%> (-100.00%) ⬇️
... and 145 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5126f7a...facaaa2. Read the comment docs.

@yarikoptic yarikoptic added the merge-if-ok OP considers this work done, and requests review/merge label May 12, 2021
@yarikoptic
Copy link
Member Author

failed travis run - conda connection fluke (hopefully), restarted

@yarikoptic
Copy link
Member Author

well, if nobody objects, will merge monday. It seems to not break anything and makes returned records more consistent etc

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your reasoning is sound. Moreover, --state={any,present,absent} feel natural to me.

I cannot say that I have a good picture of when and why this all "broke", but this seems like a reasonable way forward.

Thx!

PS: Please don't take this approval as a statement of confidence that this move would not break anything else -- however, nothing comes to my mind, right now.

@yarikoptic
Copy link
Member Author

awesome, thanks @mih! Let's proceed!

@yarikoptic yarikoptic merged commit 043ce87 into datalad:master May 17, 2021
@yarikoptic yarikoptic added the semver-minor Increment the minor version when merged label Jun 2, 2021
@yarikoptic yarikoptic deleted the rf-subdatasets-state branch October 8, 2021 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation-related issue DX developer experience enhancement merge-if-ok OP considers this work done, and requests review/merge semver-minor Increment the minor version when merged UX user experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants