Skip to content

360 days calendar support#144

Merged
emileten merged 20 commits intoClimateImpactLab:mainfrom
emileten:360_days_calendar_support
Dec 8, 2021
Merged

360 days calendar support#144
emileten merged 20 commits intoClimateImpactLab:mainfrom
emileten:360_days_calendar_support

Conversation

@emileten
Copy link
Contributor

@emileten emileten commented Dec 2, 2021

This PR adds support for 360-day calendar GCM input data. The changes are :

  1. I added the function xclim_convert_360day_calendar_interpolate to dodola.core. This function takes a 360-day calendar time indexed dataset and converts its calendar to a chosen target calendar type, offering the option to interpolate the inserted missing values.

  2. This functionality is now used by default with linear interpolation in standardize_gcm to handle the GCM data we currently use that has a 360-day calendar.

  3. I also added a test for this in dodola.tests.test_core, which required to add a calendar option in the _timeseriesfactory function.

This PR solves this downscaleCMIP6 issue.

@emileten emileten marked this pull request as draft December 2, 2021 03:35
@emileten emileten added the enhancement New feature or request label Dec 2, 2021
@emileten
Copy link
Contributor Author

emileten commented Dec 2, 2021

This was ready but I am actually unable to make xclim do the 360_day -> gregorian or 360_day -> noleap conversion in the tests, even manually. Surprisingly it works the other way around. I might be doing something wrong, will look more into that tomorrow.

For reference, I also tried with this manual testing code

import xarray as xr
from xclim.core import calendar as calc
import numpy as np

n = 1500
x = np.sin(np.linspace(-10 * np.pi, 10 * np.pi, n)) * 0.5
start_time = str("1950-01-01")
calendar="360_day" # can swap that with 'gregorian' to convert the opposite way
time = xr.cftime_range(
    start=start_time, freq="D", periods=len(x), calendar=calendar
)

ds = xr.Dataset(
    {
        "fakevariable": (
            ["time", "lon", "lat", "member_id"],
            x[:, np.newaxis, np.newaxis, np.newaxis],
        )
    },
    coords={
        "index": time,
        "time": time,
        "bnds": [0, 1],
        "lon": (["lon"], [1.0]),
        "lat": (["lat"], [1.0]),
        "member_id": (["member_id"], [1.0]),
        "height": (["height"], [1.0]),
        "time_bnds": (["time", "bnds"], np.ones((len(x), 2))),
    },
)

# test 360_day -> gregorian, doesn't work ?
assert calc.get_calendar(ds)=="360_day"
assert len(ds.sel(time="1950").time)==360
converted = calc.convert_calendar(source=ds, target="gregorian", align_on="random")
assert calc.get_calendar(converted)=="gregorian"
assert len(converted.sel(time="1950").time)==365 # fails
converted = calc.convert_calendar(source=ds, target="noleap", align_on="random")
assert calc.get_calendar(converted)=="noleap"
assert len(converted.sel(time="1950").time)==365 # fails


# test gregorian -> 360_day, works
assert calc.get_calendar(ds)=="gregorian"
assert len(ds.sel(time="1950").time)==365
converted = calc.convert_calendar(source=ds, target="360_day", align_on="random")
assert calc.get_calendar(converted)=="360_day"
assert len(converted.sel(time="1950").time)==360 # does not fail

@emileten emileten requested a review from brews December 3, 2021 03:32
@emileten emileten changed the title Draft : 360 days calendar support 360 days calendar support Dec 3, 2021
@emileten emileten marked this pull request as ready for review December 3, 2021 03:33
@emileten
Copy link
Contributor Author

emileten commented Dec 3, 2021

@brews @dgergel you can ignore my second comment. I realized afterwards that xclim doesn't do any interpolation but just (optionally) inserts missing values after conversion. One has to manually interpolate ex-post.

@emileten emileten requested review from dgergel and removed request for brews December 3, 2021 05:32
Copy link
Member

@dgergel dgergel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emileten nice work here! One comment regarding documentation and a clarification question as well. Really like the testing as well.

One thing is we may want to also add a services test in addition to the core one. @brews thoughts on that?

@brews
Copy link
Member

brews commented Dec 3, 2021

@dgergel You could go a couple of different ways about testing this.

Here it might be good to focus on testing for desired behavior near the code that has that responsibility. dodola.services.clean_cmip6 is actually pretty simple, it reads and calls stuff from dodola.core. It has some basic test coverage. Now what it's calling in the core is a bit more messy. dodola.core.standardize_gcm doesn't really have a summary description in its docstr (the one line summary at the top of the docstr) and it's kinda unclear about what it does. I'm picking on this because I'd try to write things so functions have simply-defined jobs or responsibilities and then test for that. So, my gut says that the core is a bit messy and it might be worth the effort to clean up dodola.core.standardize_gcm and make it look simple by having it call other functions in the core -- for example dodola.core._standardize_calendar or calendar_to_noleapdays or something pithy and intuitive. Anyways, I would then test the shit out of these functions for all the things they are responsible for doing.

So, long story short, I think dodola.services.clean_cmip6 doesn't really need more testing unless you have something particular in mind or a history of a particular kind of problem.

It'd hopefully be easier to write and check simple test cases rather than for the whole service. I'd clean up and refactor the core functions it needs so responsibility is clear, and then I'd test those for all the little tricky stuff. Because we're short on time, maybe just focus on cleaning up the calendar stuff changed in this PR.

Thanks for coming to my ted talk.

@dgergel
Copy link
Member

dgergel commented Dec 3, 2021

@brews thanks for the input. You make a lot of good points, but since we are in crunch mode let's table what you outlined until after Dec. 31st. Again, agree with what you said - but we don't have time to do this until after we produce our first round of deliverables.

@brews
Copy link
Member

brews commented Dec 4, 2021

Yup. No sweat. Deadlines are a-comin' to town. 👍

@emileten
Copy link
Contributor Author

emileten commented Dec 7, 2021

@dgergel @brews thanks for the review. Just FYI : @dgergel I did see this part of your slack conversation with the xclim people, related to the case where one has nan pre-existing in the data before calendar conversion and interpolation.

You were saying there that it's fine for us to ignore this, since we don't expect nan in our input data. I initially followed your thought, and the current changes do not handle this possibility. But, with this dtr issue we're having, I think it's safer if I explicitly handle that. In the current state of the changes, if there are nan pre-existing in the data those get silently interpolated on the way.

In the case of randomly selected interpolation, it's quite tricky to keep track of pre existing nans, so it's taking me some time. Just wanted to let you know (and also, let me know if you really think it's unnecessary).

@dgergel
Copy link
Member

dgergel commented Dec 7, 2021

@emileten would be great if we can go ahead and get this merged in ASAP so that we can rerun the 360-day cal models that we're including in our deliverable - any chance you could tie this up today?

@emileten
Copy link
Contributor Author

emileten commented Dec 7, 2021

@dgergel yep, this will be merged today 👍

@emileten
Copy link
Contributor Author

emileten commented Dec 8, 2021

Ready to be merged, I added docs. Also, regarding my comment above about nan values, for reference :

The code becomes messy if I try to preserve these values, and in any case would fail later at validation. So, I decided to add another nan check there. That’s going to be a little memory intensive, and it’s duplicating the later validation check that contains a nan check as well. But it seems to be the best solution at this moment. Also this function is going to be called for one model only.

Edit : this check is now skipped by default.

@emileten emileten merged commit 8e983d4 into ClimateImpactLab:main Dec 8, 2021
@brews
Copy link
Member

brews commented Dec 8, 2021

We're missing a changelog entry for this PR.

@emileten
Copy link
Contributor Author

emileten commented Dec 8, 2021

We're missing a changelog entry for this PR.

I just saw that @brews sorry. Can I add it directly in the main branch ?

@brews
Copy link
Member

brews commented Dec 8, 2021

I beat you by 60 seconds, @emileten !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants