Revise DataLad additions over git/git-annex section #64

mih · 2021-04-17T14:09:29Z

This section is arguably the key section of "Statement of need" and in turn the entire paper. Currently it puts forth 5 reasons:

1. They are generic and lack support for domain-specific solutions
2. They require a layer above to establish a distribution
3. Modularization is needed to scale
4. Annotation of changes is not "re-executable"
5. Git and git-annex do not necessarily facilitate the best scientific workflow

I would propose to trim the list, and to straighten the argument:

A. Seamless nesting of independent modular units (with emphasis on "seamless", which is what DataLad adds to Git's submodules)
B. Reproducible execution (or capture of actionable provenance)
C. Interoperability adapters and interfaces (more of a collection of the former, rather than a definition of the latter)

I think 1-5 are outcomes that can be achieved with A-C, rather than the technological contribution.

The current text seems to be easily sortable under A, B, and C to illustrate more or less intuitive use cases, why one would want such features.

The description of B could be extended to reach beyond provenance capture and hint at a wider metadata support.

yarikoptic · 2021-04-17T16:19:22Z

Well, there are always multiple ways to present an argument ;) And none of the dimensions to characterize against would be totally orthogonal.

The current paragraphs in the DataLad subsection of "Statement of need" serve as answers to the question "Why Git and git-annex alone are not enough", which complements "Why Git and git-annex". IMHO such a formulation fits the "Statement of need" section quite well. If we go for A-C, I think the subsection names and the wording within would need to be adjusted to reflect that new structure, or the proposed A-C renamed to pretty much the current ones (see mapping below) to reflect the deficiencies we are addressing instead of the strengths of DataLad.
Overall it boils down to a mapping of 1 -> C, 3 -> A, 4 -> B, with pretty much a complete removal of 5 (which I think would be a loss) and making 2 just a footnote of A.
"Wider metadata support" ATM has little to nothing to do with B (reproducible execution), but already fits well with the distribution aspect. Thus, if it is to be refactored into those 3, it might better be hinted at in "modular" (after all, "aggregation" of metadata across subdatasets is a unique feature of DataLad here as well).

Overall, besides a possible "contraction", given just the above description of the possible change I am not yet convinced that it would provide a clearer presentation of the "Statement of need", since it would pretty much boil down to losing the "distribution" and "best practices" arguments, and it seems it would require reshaping the entire "Statement of need" presentation.

mih · 2021-04-17T16:36:24Z

We seem to have rather different views on what DataLad contributes in essence. I would prefer to have each point be a crisp declaration of an added value. However, none of the five points (just looking at the tag lines) clicks with me. That may be just me. However, I don't think I am able to improve upon the present points.

bpoldrack · 2021-04-18T13:38:48Z

I lean towards @mih's view here.
A notion of "distribution" that would imply anything on top of git/git-annex has always been a very vague thing to me. As far as I consider it something valuable, it is an emergent property of datalad's entirety that is best shown in use cases (essentially: the handbook). 5 is a vague thing to me, too.

For me it boils down to A, B, C. Possibly plus a clearer notion of "making git/git-annex easier to handle", which is somewhat hidden in A's seamless.

yarikoptic · 2021-04-18T14:07:37Z

"distribution" aspect is what started it all, and it is still there. Ok, let's sacrifice 5, and go with A(3), B(4), C(1), and move distribution (2) to 4 and see if it survives.

adswa · 2021-04-18T16:16:16Z

I will try to implement the proposed structure

mih · 2021-04-21T06:18:49Z

As the OP, I think the manuscript has evolved in the spirit of this issue, hence it should be safe to close it.

mih mentioned this issue Apr 17, 2021

Primary demo #66

Closed

This was referenced Apr 17, 2021

Add section/paragraph on design principles #65

Closed

Eventually becoming a full pass (start at the top) #63

Merged

adswa added a commit that referenced this issue Apr 18, 2021

Rework 'why git-annex is not enough' section in accordance to the res…

7bf4ea9

…tructuring proposed in #64

adswa mentioned this issue Apr 18, 2021

Edits that became larger than intended because I started to drink while I was at it #74

Merged

bpoldrack mentioned this issue Apr 19, 2021

Proposals for Statement of Need #75

Closed

mih closed this as completed Apr 21, 2021
