-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove input data folder from SpeedyWeather.jl #178
base: main
Are you sure you want to change the base?
Conversation
Maybe that's now a good time to start an org for Speedy and move the repo to it, as this would also help with setting up GPU CI. Then the input data repo could also be there. |
Just set up github.com/SpeedyWeather and therein a repository called SpeedyWeatherInputData |
I think we should agree on an interface between SpeedyWeather.jl and SpeedyWeatherInputData.jl, because this might be also the time to think about what to do if someone wants to use their own orography for example. One way could be do officially register SpeedyWeatherInputData.jl and then have as a default run_speedy(input_data=SpeedyWeatherInputData) but someone could just load another module import MyInputData
using SpeedyWeather
run_speedy(input_data=MyInputData) For orography for example we need just an array on a grid that is supported by the spectral transform in SpeedyWeather.jl, so SpeedyWeatherInputData could provide a function like |
Downloads input files as required by SpeedyWeather.jl | ||
""" | ||
function download_input_data(input_data_path) | ||
input_data_url = "https://github.com/dmey/speedy-weather-input-data/archive/refs/heads/main.zip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we shouldn't hardcode the url. But I'm not sure what the best way is to provide a nice interface, which remains compatible or at least provides some control between input data version and SpeedyWeather.jl version. I like the idea to just create a download some netcdf files and create a folder once. But not sure what undesired consequences of that are compared to having a registered package with version and we can then adapt and adjust version compatibility in SpeedyWeather.jl.
Regarding orga see #182 |
I'll come back to this tomorrow but some food for thought... How do we want the data to be consumed eventually. Storing data on GitHub maybe fine initially but that's not really what we should do in the long-er term. We could theoretically store the data as artefacts in the release etc but I wonder if by registering a package like SpeedyWeatherInputData.jl you were thinking of permanently keeping it as a git repo... |
I'm honestly thinking to postpone this idea. While I agree with it in general, moving the data somewhere else and finding a good interface should probably only be done once we've had the chance to play around with it more. This is to understand what we would want from such a code/input data separation and what interface would make most sense. At the moment, I wanted to have primitive equations (but not necessarily all the parameterizations) for v0.5 and only if the surface parameterizations are implemented we will know better how to structure this separation. Does that make sense? |
Something simpler--why not just move the data for now and we can see if there is demand for adding an interface in the future? A standard dataset should be provided by default but my point was about not making it part of the codebase.
For now it's fine to keep it on GitHub they are not much data anyway. Re hard coding I don't see an issue as I would create a release for the dataset at https://github.com/SpeedyWeather/SpeedyWeatherInputData an use that in one or more SpeedyWeather releases. I think data releases will be very few so it will not create much overhead.
I was not thinking of creating an interface, I only meant it as a way to remove static artefacts from the current code repo. What you propose about interface sounds good but could be done at a later stage. The additional point I made about not storing static artifacts on GitHub still applies but I meant it as a future step--moving the data to HTTP storage would make little sense resource-wise for us now. In other words, moving the data to https://github.com/SpeedyWeather/SpeedyWeatherInputData will do for now... |
Another point is reproducibility. If we turn some SpeedyWeather.jl release into a doi (say with zenodo) in the future then we would have a dependency on another repository which should be clearly tagged with a version number. What I like about having the smallest default data set within the repository itself is that we don't have to deal with that. |
Yes but you'd do anyway. In fact I would have suggested to put both repos on Zenodo as soon as you can and then pull the data with the DOI. Reproducibility has little to do with it as you would version it anyway as you do with dependencies and that does not make SpeedyWeather unreproducible. I outlined this briefly at https://doi.org/10.1029/2019MS001961, Chapter 5.
Yes I agree for 2 something megabytes it does not make sense this is to start getting things ready for the future. |
Alright, I give this PR a v0.6 milestone for now and we can pick that up again once the primitive equation model is up and running and we actually use all that input data. |
This PR removes the input_data folder from SpeedyWeather.jl and adds functions to check and download input data at the first run of SpeedyWeather.jl. As more input data are likely to be added soon (e.g. surface fluxes param), I thought this is a good time to add something like this before this repo gets too bloated. For now, I use the same licence as in SpeedyWeather.jl but we should eventually correct the copyright and licencing information to that of the data. For now, and for simplicity, the input_data archive is stored on GitHub at https://github.com/dmey/speedy-weather-input-data but we should eventually move it onto HTTP storage. @milankl more immediately, if you like the sound of this, I would like to move the input_data repo under your username or on org for SpeedyWeather.jl.