DataLad GitHub Action #6929

Closed
thewtex opened this issue Aug 10, 2022 · 14 comments

Comments

@thewtex
Contributor

thewtex commented Aug 10, 2022

What is the problem?

Hi folks!

I am interested in a DataLad GitHub Action to provide CI testing data. Desired functionality:

  • Download / install DataLad
  • Across platforms
  • Possibly of a specific version
  • Optionally clone and get datasets

What steps will reproduce the problem?

No response

Datalad information

No response

Additional context

No response

Have you had any success using DataLad before?

Yes, with a bit of help from @yarikoptic :-) I recently read FAIRly big: A framework for computationally reproducible processing of large-scale data and was inspired. Going through the wonderful handbook now.

@yarikoptic
Member

I think it would indeed make total sense, to simplify life for others. What "action API" (options etc.) do you envision? We might want to start with a rudimentary design doc, as a PR adding a file under https://github.com/datalad/datalad/tree/master/docs/source/design ?

But I think it should probably be a separate repo, since it would have its own life cycle. Do you have any experience with setting up GitHub Actions? Is there a template to start from? (attn @vsoch and @jwodder, who might know more)

@vsoch
Collaborator

vsoch commented Aug 10, 2022

It's pretty easy to make actions - when y'all have a design I'd be happy to take an initial shot, or let me know if install/get is a good start!

@yarikoptic
Member

Yeah, I think by and large it is install/get, and I would say:

  • should use datalad-installer for install needs. Command should be something like datalad-installer -E {miniconda_env_file:=~/miniconda_env.sh} miniconda --path {miniconda_path:=~/miniconda} miniconda git-annex -m conda datalad -m conda (might need a dedicated "git for windows" git COMPONENT? datalad-installer#46 attn @jwodder).
    (I used bash-inspired {var:=DEFAULT} above, meaning: use the configured value of var, else DEFAULT)

  • action configuration should expose those options above ({miniconda*})

[miniconda_env_file: str ...]
[miniconda_path: str ...]  # if it exists, could just reuse
datasets: list[dict(source: str, [path: str], [recursive: bool], [recursion_limit: int], [get_paths: list[str]])]
  

with [] marking optional fields, and each dict in datasets having:

  • source: URL of the dataset to install
  • path: path to install under
  • recursive: whether to install recursively
  • recursion_limit: optional limit on recursion depth
  • get_paths: paths to get
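To make the proposed shape concrete, here is a minimal sketch of one `datasets` entry as a Python dataclass. The field names come from the list above; the class name `DatasetSpec`, the helper `parse_datasets`, and the default values (for example `recursive=False`) are assumptions for illustration, not part of any agreed design.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetSpec:
    """One entry of the proposed `datasets` list (hypothetical class name)."""
    source: str                            # required: URL of the dataset to install
    path: Optional[str] = None             # optional: where to install it
    recursive: bool = False                # optional: assumed default, not from the thread
    recursion_limit: Optional[int] = None  # optional: cap on recursion depth
    get_paths: List[str] = field(default_factory=list)  # optional: content to fetch


def parse_datasets(raw):
    """Turn a list of plain dicts (e.g. parsed action inputs) into specs."""
    specs = []
    for entry in raw:
        # `source` is the only field not marked optional in the proposal.
        if "source" not in entry:
            raise ValueError(f"dataset entry is missing 'source': {entry!r}")
        specs.append(DatasetSpec(**entry))
    return specs
```

Unknown keys raise a `TypeError` from the dataclass constructor, which may or may not be the behavior an action would want.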

so for this step it is roughly:

import datalad.api as dl

for ds_spec in datasets_spec:
    # kw should be populated based on relevant options
    # (path, recursive, recursion_limit)
    ds = dl.install(source=ds_spec.source, **kw)
    ds.get(ds_spec.get_paths)
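One way the `kw` population could look, as a rough sketch: only options that were actually given are forwarded to `install`. The helper names `install_kwargs` and `install_and_get` are hypothetical, the specs are assumed to be plain dicts, and `install` is assumed to return a single dataset object as in the loop above (with `recursive` set, DataLad may return something else). The API module is passed in as a parameter so the loop can be tried without DataLad installed.

```python
def install_kwargs(ds_spec):
    """Collect only the optional install options that were actually given."""
    optional = ("path", "recursive", "recursion_limit")
    return {k: ds_spec[k] for k in optional if ds_spec.get(k) is not None}


def install_and_get(datasets_spec, api):
    """Install each dataset, then fetch the requested paths, if any.

    `api` is expected to behave like `datalad.api` (an assumption of this
    sketch): `api.install(source=..., **kw)` returns an object with `.get()`.
    """
    installed = []
    for ds_spec in datasets_spec:
        ds = api.install(source=ds_spec["source"], **install_kwargs(ds_spec))
        # Skip the get step entirely when no paths were requested.
        if ds_spec.get("get_paths"):
            ds.get(ds_spec["get_paths"])
        installed.append(ds)
    return installed
```

With DataLad available, this would presumably be called as `install_and_get(datasets_spec, datalad.api)`.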

@thewtex
Contributor Author

thewtex commented Aug 10, 2022

@yarikoptic @vsoch awesome, thanks for the support! 🙏

I took a first stab at a design specification, based on @yarikoptic's suggestion here: #6931

I was not sure how miniconda fits, so I left it out for now.

@vsoch
Collaborator

vsoch commented Aug 10, 2022

lol ok, my help is not needed! Thanks @thewtex

@vsoch
Collaborator

vsoch commented Aug 10, 2022

Oh I didn't read well 😆 Do y'all still want me to take a shot for the implementation? I was thinking a composite action would be appropriate here.

@thewtex
Contributor Author

thewtex commented Aug 11, 2022

@vsoch your help in implementation would be greatly appreciated! 🙏

@vsoch
Collaborator

vsoch commented Aug 11, 2022

Woot! Ok, I’ll carve out some time soon, at the latest over the weekend.

@thewtex
Contributor Author

thewtex commented Aug 11, 2022

@vsoch thank you!

> I was thinking a composite action would be appropriate here

Cool! Maybe there could be a cache action to make subsequent runs super fast ⚡ ?

@vsoch
Collaborator

vsoch commented Aug 14, 2022

okay, just started https://github.com/vsoch/datalad-action on my own account (and can move it over when ready). Install is looking good. I'm not familiar with datalad, but will try to use the examples here to take a shot at the downloads tomorrow!

@vsoch
Collaborator

vsoch commented Aug 14, 2022

okay I'm done, see https://github.com/vsoch/datalad-action (it's missing some of the args mentioned, mostly because I couldn't find good usage examples). This is a personal project, so I can only work evenings / weekends. When you are ready, I can transfer it over here, and maybe you could give me maintainer permission on it (or add me to the datalad org) so I can continue working on it. I did tweak the design a bit: I don't think it's a good design to allow providing multiple datasets to one action. My 0.02 (and the design here) is that one dataset get call == one source plus the parameters relevant to it. For next steps:

  • find whatever args are missing, show me how to add them
  • open other issues for discussion, etc.

Ty!

@yarikoptic
Member

Woohoo!! Thank you @vsoch ! As soon as I am done replacing roofing before the rain comes, I will have a look/give it a try.

@vsoch
Collaborator

vsoch commented Aug 14, 2022

ohno, don't get wet! 😆

@yarikoptic
Member

ok, we have https://github.com/datalad/datalad-action and I just tagged 0.1.0 of it. Thank you @vsoch for making it happen!
Let's consider this issue closed and open any other desired issue in that repo.
