
Add IRI #3

Open · t-downing wants to merge 20 commits into main
Conversation

t-downing (Collaborator):

Key processing steps:

  • Processing IRI seasonal forecasts globally. Currently this is done as a notebook in the exploration folder, but it can be moved to src if we decide we're happy with this method.
  • Processing ACAPS seasonal calendars (in src)
  • Aggregating all CODABs for which we have a country_config in AnticiPy and which are included in the ACAPS dataset (in src)

Also includes approx_mask_raster() to up-sample Xarray datasets.
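The implementation of approx_mask_raster() isn't shown in this thread; as a conceptual sketch only, nearest-neighbour up-sampling of a 2-D grid can be done by repeating cells along both spatial axes (upsample_nearest is a hypothetical stand-in, not the function from this PR):

```python
import numpy as np

def upsample_nearest(arr: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour up-sampling: repeat each cell `factor` times
    along both spatial axes."""
    return np.repeat(np.repeat(arr, factor, axis=0), factor, axis=1)

coarse = np.array([[1, 2], [3, 4]])
fine = upsample_nearest(coarse, 2)
# each original cell becomes a 2x2 block in the 4x4 output
```

The real function presumably also handles coordinates and masking on the xarray side.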

Also includes notebooks in exploration for exploring:

  • ACAPS seasonal calendars
  • ASAP seasonal calendar
  • FEWSNET livelihood zones (just loading)

All very much preliminary stuff, but I imagine we'll be using the seasonal calendars soon, so I thought you'd want to get a look at the processing.

Also, the branch name is definitely too broad in scope! Sorry

@t-downing t-downing requested a review from zackarno October 4, 2023 17:46
@t-downing t-downing requested a review from caldwellst October 20, 2023 20:44
t-downing (Collaborator, Author):

I go through the steps for processing the ASAP phenology in exploration/asap_phenology. The slowest step is process_asap_phenology_dekads(), which has to iterate over longitude due to memory constraints.

I'm also using the outputs (in Data/public/processed/glb/asap/season) for the Sahel regional framework analysis.
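The longitude-chunked pattern described above can be sketched in plain numpy; the per-pixel computation here is a placeholder, and the real process_asap_phenology_dekads() is not reproduced:

```python
import numpy as np

def process_by_lon_chunks(data: np.ndarray, chunk_size: int) -> np.ndarray:
    """Process a (lat, lon) array one longitude slab at a time to keep
    peak memory low, then stitch the results back together."""
    out_chunks = []
    for start in range(0, data.shape[1], chunk_size):
        slab = data[:, start:start + chunk_size]
        out_chunks.append(slab * 2)  # placeholder per-pixel computation
        del slab  # slab can be freed before the next iteration
    return np.concatenate(out_chunks, axis=1)

data = np.arange(12.0).reshape(3, 4)
result = process_by_lon_chunks(data, chunk_size=2)
```

With xarray the same effect is usually achieved by slicing along the longitude coordinate, but the memory trade-off is identical: only one slab's intermediate results live at a time.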

caldwellst (Collaborator) left a comment:

Okay, not so many comments here since this obviously overlaps a lot with the Sahel codebase, we aren't going to use the livelihood zones or other admin stuff for now, and the full processing code will be ported into the global monitoring repository.

I put a few comments in the code, but I think these are the overall comments:

  1. Just general comments in the files, to explain what's being looked at and anything interesting or useful found there. This can help in the future when returning to these, for other people (or even yourself) trying to parse exactly the reasoning for doing certain things.
  2. In this vein, I think a simple README.md explaining what was explored here would be good, mentioning the files that are there and giving a quick explanation of what was done or what to look at.
  3. For this, I think we should just have top-level folders for the different datasets being explored; otherwise, this repo will get way too cluttered. Each folder should be self-contained and require nothing from outside. Basically a way to put everything in one location without requiring these modules to work together. So it would be asap/src, asap/exploration, and the README would sit under the folder as asap/README.md. Then the overall README.md can just have a line or two explaining what is in the asap folder, with a link to the folder itself (so that asap/README.md is displayed on the GitHub platform).
  4. In the future, it might be better to split out the utilities. It doesn't really matter here, but for instance in the Sahel repo it would make it cleaner to read through and see which parts work together and which provide disparate functionality.
  5. No requirements file. Not a big deal, but probably best to provide one at least. You could use pipreqs to generate the file for the asap folder and store it under there.

"""
filepath = (
DATA_DIR
/ "public/processed/glb/acaps/seasonal-events-calendar_processed.csv"

Likely never an issue, but if you separate all folder and file names as individual strings, the paths will be fully system-independent via pathlib.Path; currently this still assumes the / path separator.
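A minimal sketch of the suggestion (the base directory and the short filename are placeholders, not the real DATA_DIR or CSV name): passing each folder name as its own string lets pathlib join components with the correct separator for the platform:

```python
from pathlib import Path

DATA_DIR = Path("Data")  # hypothetical base dir standing in for the real DATA_DIR

# current style: a single string that assumes "/" separators
filepath_mixed = DATA_DIR / "public/processed/glb/acaps/file.csv"

# fully component-wise: each folder and file name is its own string
filepath_parts = DATA_DIR / "public" / "processed" / "glb" / "acaps" / "file.csv"
```

In practice pathlib also splits the "/"-joined string correctly on most platforms, but the component-wise form makes the intent explicit and avoids relying on that behaviour.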

src/utils.py


```python
def process_drought_codabs():
    """
```

For exploration this is fine, but when eventually bringing this into global monitoring we won't want to restrict to countries with AnticiPy config files. If we need admin1 at the country level, it's much easier to use something like https://fieldmaps.io/data, which is already aggregated globally and was based off the original CODAB files, so in most cases the names should match.

```python
# join with CODAB
# note - some asap1_ids don't match up,
# hence will be missing from plot
cod_crop = cod_asap.merge(
```

cod_asap is not defined in this notebook.


```python
lon, lat = 0, 0
da.where(da < 251).sel(x=slice(lon, lon + 40), y=slice(lat + 40, lat)).plot()
```

I know these are just autoplots, but it would be good here to have a single color bar, which would make it easier to see where the intensity lies. And is it the case that this is the # of dekads in 3 months that are in season? If so, there seems to be a large swath with just 1?

```python
# 253 = hard to tell, barely used
# 252 = hard to tell
# 251 = no season (desert but also rainforest?)
lon, lat = -68, -14
```

Maybe just add a comment on where we're looking.


```python
# plot cumulative distribution of probability (reversed)
df.hist("prob", cumulative=-1, bins=100, density=1)
```

Lovely plot.
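For reference, the reversed cumulative distribution that df.hist(..., cumulative=-1) draws is just the exceedance fraction at each threshold; a small numpy sketch with made-up probabilities (the values stand in for df["prob"]):

```python
import numpy as np

# hypothetical forecast probabilities standing in for df["prob"]
prob = np.array([0.1, 0.2, 0.2, 0.4, 0.5, 0.7, 0.9])

# fraction of values >= each threshold (the "reversed" cumulative distribution)
thresholds = np.array([0.0, 0.25, 0.5, 0.75])
exceedance = np.array([(prob >= t).mean() for t in thresholds])
# exceedance[0] is 1.0: every value meets a threshold of 0
```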

```python
geobb = GeoBoundingBox.from_shape(adm0)

ds_adm0 = ds_f.rio.clip(adm0["geometry"], all_touched=True)
# resample to 0.01 degrees
```

Question: what's the point of the resampling here instead of just using the base IRI output? Might be good to make that clear in the analysis files. Is this something we want to do in the full implementation?

```python
da_d = da_d.rio.set_spatial_dims("x", "y", inplace=True)
# da_d = da_d.astype("uint8")
for dekad in dekads:
    da_d.loc[{"dekad": dekad}] = (
```

Not sure if it's more efficient, but you might be able to use the modulo operator I referenced in the other PR to reduce this to just the 2 conditions connected by | for seasons 1 and 2.
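The modulo trick from the other PR isn't shown in this thread, but here is a sketch of how it could work: with 36 dekads per year, a single modulo comparison handles a season even when it wraps past dekad 36, so the loop body could reduce to two such conditions joined by |. All names and season bounds below are illustrative:

```python
import numpy as np

N_DEKADS = 36  # dekads per year

def in_season(dekad, start, end):
    """True if `dekad` falls in the season running from `start` to `end`
    (inclusive), correctly handling seasons that wrap past dekad 36."""
    return (dekad - start) % N_DEKADS <= (end - start) % N_DEKADS

dekads = np.arange(1, N_DEKADS + 1)
# season 1 within the year, season 2 wrapping over the new year
mask = in_season(dekads, 10, 15) | in_season(dekads, 30, 3)
```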


```python
# save
# File too big to save as 36-band raster, so must save as multiple files
# Note that saving as an actual Boolean in a NetCDF is even bigger somehow.
```

That's so weird haha.
