Brainstorming: path to DataLad v2? #7560

Open
mih opened this issue Feb 8, 2024 · 2 comments

mih commented Feb 8, 2024

This is not simply about a datalad v2. This is about a strategy to reorganize the DataLad ecosystem, in which datalad and its extensions are only one gear in the box.

The primary aim is to create more homogeneous modules with streamlined dependencies: modules that decouple code bases that evolve at different paces (a more stable foundation vs. faster iteration on prototypes and focused applications), that have disjoint dependencies (not just for installation, but also in how much code needs to be imported to use a particular piece of DataLad), and that have different test demands (network operations with specific services vs. local code).

One (possibly more) scenario(s) will be posted below. They should be discussed regarding their individual merits and problems. This issue is about collecting ideas, not about making decisions.

Please do not use this issue for discussions -- GitHub issues don't work well for that. Rather, post any alternative/derived ideas (longform) in a dedicated response. If we keep individual ideas self-contained, and also updated over time, it will be easier to refer to them and also refine them.

To communicate appreciation of or opposition to individual concepts, please use the "reactions" interface.

mih commented Feb 8, 2024

Factor out a foundational package (FP)

The purpose of such a package would be to serve as a foundation to build DataLad-powered libraries and apps -- implemented in Python. This package is:

The development procedures should be suitable for creating a package that radiates enough confidence for others to build 3rd-party code on it:

  • mandatory code-reviews by two or more people
  • release when "done"
  • benchmarks
  • mandatory "full" (something like >95%) test coverage
  • detailed documentation targeting developers
  • PRs need to be comprehensive (code, test, documentation), all at once

"Phase-in" process

The FP would be introduced gradually, by shifting and elevating code from other projects. Pretty much never would from-scratch implementations be introduced to the FP directly.

This will make sure that code has seen some usage, and that some "application" code already exists downstream to illustrate concrete usage patterns and to immediately justify a code addition that serves dependent packages.

Once the FP is established, code can flow into it from any source. After an FP release has been made, the source project sheds that code and adds a dependency on the FP.

Envisioned development trajectory for "datalad/datalad"

With respect to a v2 concept, code would flow out of the present main datalad package, and it would gain a dependency on the FP. It would continue to be the main entrypoint.

If and when we approach a modernization of the CLI, we would need to reevaluate its role again. It could then become an application/meta package:

graph TD;
    FP-->datalad;
    FP-->datalad-cli;
    datalad-cli-->datalad;

or continue as a provider of assorted functionality that is exposed via different APIs (and hence have its own CLI implementation stripped):

graph TD;
    FP-->datalad;
    FP-->datalad-cli;
    datalad-->datalad-cli;
    datalad-->datalad-gooey
    FP-->datalad-gooey
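
The two layouts above can be checked mechanically. The following is a minimal sketch, assuming each arrow `A-->B` is read as "B depends on A" (that reading is an interpretation, not stated in the issue). It uses only the standard library's `graphlib`; the helper name `install_order` is made up for this illustration.

```python
from graphlib import TopologicalSorter

# Edges copied from the two graphs above.
# Assumption: "A --> B" means "B depends on A".
META_PACKAGE = [
    ("FP", "datalad"),
    ("FP", "datalad-cli"),
    ("datalad-cli", "datalad"),
]
PROVIDER = [
    ("FP", "datalad"),
    ("FP", "datalad-cli"),
    ("datalad", "datalad-cli"),
    ("datalad", "datalad-gooey"),
    ("FP", "datalad-gooey"),
]


def install_order(edges):
    """Return the packages ordered so that dependencies come first.

    Raises graphlib.CycleError if the layout contains a dependency cycle.
    """
    deps = {}
    for upstream, downstream in edges:
        deps.setdefault(upstream, set())
        deps.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(install_order(META_PACKAGE))
    print(install_order(PROVIDER))
```

Under this reading, both layouts are acyclic and the FP has no dependencies of its own. The layouts differ only in whether datalad sits on top of datalad-cli or underneath it.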

Pros

  • starting an FP from scratch has the benefit of laying out clear rules from the start that contributions have to follow, and all code matches them
  • people have expressed discomfort regarding the complexity of the datalad package, a bottleneck that can be avoided with a clean setup
  • zero impact forced onto present users of datalad. The main package can make independent decisions on how to deal with changes, whether or not to grease transitions, or whether to provide traditional interfaces (forever)
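
One way the main package could "grease transitions" after shedding code to the FP is a forwarding shim at the old import location. The sketch below is only an illustration of that idea, not a proposal from this issue: `fp_demo`, `legacy_utils`, and `ensure_list` are hypothetical names, and both modules are built in memory so the example is self-contained. It relies on the module-level `__getattr__` hook from PEP 562.

```python
import sys
import types
import warnings

# Stand-in for the foundational package (hypothetical name and function).
fp = types.ModuleType("fp_demo")
fp.ensure_list = lambda x: x if isinstance(x, list) else [x]
sys.modules["fp_demo"] = fp

# Stand-in for the old location in the source project, now emptied of code.
legacy = types.ModuleType("legacy_utils")


def _forward(name):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    import fp_demo

    try:
        target = getattr(fp_demo, name)
    except AttributeError:
        raise AttributeError(
            f"module 'legacy_utils' has no attribute {name!r}"
        ) from None
    warnings.warn(
        f"legacy_utils.{name} has moved to fp_demo.{name}",
        DeprecationWarning,
        stacklevel=2,
    )
    return target


# Installing the hook in the module namespace enables the forwarding.
legacy.__getattr__ = _forward

if __name__ == "__main__":
    print(legacy.ensure_list("a"))  # old path still works, with a warning
```

In a real package, the hook would be a plain `def __getattr__(name)` at the top level of the old module file. Whether such a shim is kept temporarily or forever is exactly the independent decision the point above leaves to the main package.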

Cons

  • the two-reviewer rule is important for creating a useful (consensus) library. However, it will be hard to make it a reality. @yarikoptic and @mih can serve as reviewers, but when they do development themselves, at least one additional qualified reviewer must be found.
  • introducing additions to the FP does not simultaneously improve the main package (just like with datalad-next). Demonstrations of impact (if applicable) would need to come as a companion PR to the main package (one that diverts the dependency to a PR branch). This is cumbersome.

Discussion

  • ...

Updates

  • the originally employed name datalad-core has been replaced by "foundational package" (FP) to reduce ambiguity, given the many purposes for which the label "core" has been used in the past

mih transferred this issue from datalad/datalad-next on Feb 8, 2024

mih commented May 14, 2024

An effort towards a foundational library has started at https://github.com/datalad/datasalad
