Enable download of large (spatial extent) cutouts from ERA5 via cdsapi. #236
How does that interact with queuing at CDSAPI? Does that increase the chances of getting stuck in the request in month 9 or so?
I don't know. The downloads for the larger cutouts worked relatively smoothly (1-2 hours), but the number of requests is 12x higher for a normal year, so the chances might be higher. On the other hand, since the downloaded slices are smaller, I would not expect major performance changes. That is probably acceptable, since you're not downloading cutouts on an everyday basis. I don't know enough about the internals of the ERA5 climate store, and I don't think we should optimise our retrieval routines for it as long as we haven't received any complaints about poor performance.
Alright. I did not encounter any issues downloading large datasets. Seems to work nicely, @FabianHofmann. What would be helpful is a message indicating which month/year combination is currently being downloaded. Do you have an idea of how to implement this easily, @FabianHofmann? Then I'd suggest @davide-f tries to download his cutout as well, and if that works without issues we can merge.
@euronion Super! Thank you very much. Unfortunately, I am currently a bit busy with other stuff and cannot keep the machine running while Copernicus takes a long time for the analysis. As soon as I have free resources, I'll test that.
Great. For the logging I would suggest going with e.g. "2013-01" instead of "2013" only. See atlite/atlite/datasets/era5.py, line 309 in 3c7b4b8,
which could be changed into timestr = f"{request['year']}-{request['month']}" and replaced accordingly in atlite/atlite/datasets/era5.py, line 311 in 3c7b4b8.
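A minimal sketch of the suggested label, assuming the request is a plain dict whose "year" and "month" entries are single values (an int or a numeric string); the helper name month_label is hypothetical and not part of atlite:

```python
def month_label(request):
    """Return a 'YYYY-MM' label for a request dict (hypothetical helper).

    Assumes request["year"] and request["month"] are single values,
    either ints or numeric strings such as "01".
    """
    # int() accepts both 1 and "01"; the format spec zero-pads the month.
    return f"{int(request['year'])}-{int(request['month']):02d}"


print(month_label({"year": 2013, "month": 1}))  # 2013-01
```

Single quotes are used inside the f-string because reusing double quotes there is a syntax error before Python 3.12.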
As discussed with @euronion, I'll wait for his latest updates by the end of the week (estimate), and then I'll run the model for the entire world. As a comment, the "number of slices", currently one per month, could be a parameter as well.
@davide-f You're good to give it a try! Regarding your comment: if it works for you, @davide-f, and the time it takes is acceptable (please report it as well if you can), then I'd stay away from overoptimising this aspect and just keep the monthly retrieval.
@euronion The branch is running :) I'll track it and update you when I have news. I totally agree on first checking whether the monthly retrieval works fine and how long it takes. I fear that it may take a very long time, though.
I confirm that the first 1-month chunk has been downloaded. I'll wait for the entire procedure to end and let you know :)
@euronion The procedure for the world (±180° lat/lon) completed successfully in 5 to 12 hours (I ran it twice) and produced an output file of 380 GB (large, but we are talking about a lot of data); see the settings below.

atlite:
  nprocesses: 4
  cutouts:
    # geographical bounds automatically determined from countries input
    world-2013-era5:
      module: era5
      dx: 0.3  # cutout resolution
      dy: 0.3  # cutout resolution
      # Below customization options are dealt in an automated way depending on
      # the snapshots and the selected countries. See 'build_cutout.py'
      time: ["2013-01-01", "2014-01-01"]  # specify different weather year (~40 years available)
      x: [-180., 180.]  # manual set cutout range
      y: [-180., 180.]  # manual set cutout range

As a recommendation, in case you are interested in silencing some warnings, the following warning was raised:
The output also makes sense. However, it has some weird white bands, though I don't think this is related to this PR. What do you think?
As discussed, for efficiency purposes, it may be interesting to be able to choose the number of chunks into which the output is divided.
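One way such a parameter could look, as a rough sketch only: the months of a year are grouped into chunks of a configurable size, and each chunk would then become one retrieval request. The function name chunk_months and the parameter chunk_size are hypothetical and do not exist in atlite.

```python
def chunk_months(months, chunk_size=1):
    """Group a list of months into consecutive chunks (hypothetical helper).

    chunk_size=1 corresponds to the monthly retrieval in this PR,
    chunk_size=12 to the previous annual retrieval.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Slice the list in steps of chunk_size; the last chunk may be shorter.
    return [months[i:i + chunk_size] for i in range(0, len(months), chunk_size)]


months = list(range(1, 13))
print(chunk_months(months, 3))  # four quarterly chunks
print(len(chunk_months(months, 1)))  # 12 monthly chunks
```

Larger chunks mean fewer but bigger requests, so the parameter would trade the number of queued requests against the size of each download.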
I attempted to compress cutouts during/after creation but without much success. using I would have preferred a solution where compression is done by |
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
- Coverage   72.83%   72.74%   -0.09%
==========================================
  Files          19       19
  Lines        1590     1596       +6
  Branches      227      270      +43
==========================================
+ Hits         1158     1161       +3
- Misses        362      363       +1
- Partials       70       72       +2
@davide-f If you wish to reduce the file size you can follow the instructions in the updated doc: Should save ~50% :) |
A month indicator has been added: the info prompt shown during cutout creation now indicates the month currently being retrieved.
I suggest we offload the heuristic into a separate issue and tackle it if necessary. ATM I think it would be a nice but unnecessary feature.
RTR @FabianHofmann would you? |
Tested by @nworbmot
No idea why the CI keeps failing (no issues locally) or why it is still running the old CI.yaml with Python 3.8 instead of 3.11.
Closes #221.
Change proposed in this Pull Request
Split download of ERA5 into monthly downloads (currently: annual downloads) to prevent too-large downloads from ERA5 CDSAPI.
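The idea of the split can be sketched as follows. This is an illustration under assumptions, not the code of this PR: the function name monthly_requests and the shape of the request dicts (string "year" and zero-padded "month") are hypothetical.

```python
from datetime import date


def monthly_requests(start, end):
    """Return one request dict per calendar month from start to end, inclusive.

    Hypothetical sketch: each dict would be sent as a separate download
    request instead of one request covering the whole year.
    """
    requests = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        requests.append({"year": str(year), "month": f"{month:02d}"})
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return requests


reqs = monthly_requests(date(2013, 1, 1), date(2013, 12, 31))
print(len(reqs))  # 12
print(reqs[0], reqs[-1])
```

For a normal year this yields 12 requests instead of one, which matches the 12x increase in the number of requests mentioned in the conversation.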
TODO
Description
Motivation and Context
See #221.
How Has This Been Tested?
Locally by downloading a large cutout.
Type of change
Checklist
pytest inside the repository and no unexpected problems came up.
doc/
environment.yaml file.
doc/release_notes.rst
pre-commit run --all to lint/format/check my contribution