data update 2023 #55
Comments
The change from v3 to v4 was because the E-OBS data was updated with a different spatial resolution, right?
As we changed the resolution to 500 m, it was clear that we would need to re-calculate the whole time series. However, in previous iterations (v2 and v3) we also always updated the whole time series, although the resolution didn't change. These are the changes (apart from continuing previous time series) in the two E-OBS versions released since our last calculation: They may warrant a re-calculation of the whole time series. I guess there is also a reason why E-OBS always releases a new version instead of just "updating" the old one. We could also decide on some update policy, e.g. a new version only every 3 years, and in between just updating the current version.
I find it important to keep all versions stored and accessible to the users. This made me think that maybe we would need to define a reference period for calculating monthly and yearly values, so that the values do not change every year. What do you think?
Hi, I agree that ideally we should store all data versions for the sake of reproducibility. Last time we talked about this we discarded the idea for lack of resources ($$). But it would be nice to secure some online hosting to save all data versions. Alternatively, we could publish the source code that takes the E-OBS dataset and produces the rasters that are then hosted on the FTP server and served through easyclimate. Archiving the source code is trivial and free (e.g. in Zenodo), and would permit anyone to reproduce the rasters in case they needed to. We would just need to specify which version of the E-OBS dataset was used in each of our data versions. That would free us from having to store all former data versions, so we could serve only the most recent and updated rasters (perhaps storing the penultimate version too, just in case). It looks like users will often request the latest year to be added soon, and IMO it looks better to serve the most correct, updated version whenever possible, rather than waiting 2-3 years between releases. So, I think we could publish the source code and update the dataset yearly, storing only the latest and penultimate versions on the server. Does that sound like a good option to you?
I would like to relaunch this discussion! We get lost in details I think. I propose:
What do you think? And in relation to this:
Sounds good to me! When you say "only create a new version if there are substantial changes", if we "update the whole time series every year", that means we will have one new version every year, right? So according to this plan the server would store the current and last year's versions.
I understand you want to include climatological averages besides monthly and yearly rasters. I'm fine with that, but if we update the whole series every year, the averages should be updated too, otherwise the data would be incoherent. But I'm fine with setting a reference period (maybe 1990-2020 would be more useful). This average would have to be recalculated every year with the yearly E-OBS update.
I understand it the same way, a new version each year.
I'm not sure we have to provide this. We already give them yearly data, so they can calculate their periodic averages for whatever period they like if needed.
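To illustrate how little a user would need in order to do this themselves, here is a minimal sketch of averaging yearly values over a chosen reference period. It is only an illustration: the function name `reference_mean`, the dictionary layout, and the temperature values are all hypothetical and are not part of easyclimate or of the served rasters.

```python
from statistics import mean


def reference_mean(yearly_values, start_year, end_year):
    """Average yearly values over an inclusive reference period.

    `yearly_values` is assumed to map year -> yearly value for one location.
    """
    selected = [
        value
        for year, value in yearly_values.items()
        if start_year <= year <= end_year
    ]
    if not selected:
        raise ValueError("no data available in the reference period")
    return mean(selected)


# Hypothetical yearly mean temperatures (degrees C) for a single location.
yearly_tmean = {1990: 10.0, 1991: 11.0, 1992: 12.0, 1993: 13.0}

# Average over 1990-1992 only; 1993 falls outside the chosen period.
baseline = reference_mean(yearly_tmean, 1990, 1992)
print(baseline)
```

Because the period is an argument, the same yearly data supports any reference period a user prefers, and the result would simply change whenever the underlying yearly values are re-calculated.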
Two options (@cpucher):