
first attempt at adding basic bias correction code and fixing tests #2

Merged: 16 commits into ClimateImpactLab:main on Dec 24, 2020

Conversation

@dgergel (Member) commented Dec 14, 2020

This PR adds some super basic bias correction code, as well as a dependency on a forked version of scikit-downscale with an interim fix for a current bug in PointWiseDownscaler.
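For context, a rough sketch of the kind of core function this PR adds. The signature, argument names, and variable selection are illustrative only, pieced together from the traceback discussed below, not the exact dodola implementation:

import xarray as xr
from skdownscale.pointwise_models import PointWiseDownscaler, BcsdTemperature


def bias_correct_bcsd(gcm_training_ds, obs_training_ds, gcm_predict_ds, variable):
    """Fit a pointwise BCSD temperature model and bias correct future GCM data."""
    model = PointWiseDownscaler(BcsdTemperature())
    # PointWiseDownscaler expects DataArrays, so pull out the variable of interest.
    model.fit(gcm_training_ds[variable], obs_training_ds[variable])
    predicted = model.predict(gcm_predict_ds[variable])
    return predicted.to_dataset(name=variable)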

@brews added the "enhancement" label on Dec 14, 2020
@brews (Member) commented Dec 15, 2020

Interesting, you're getting this error because of https://github.com/dgergel/xsd/blob/2f76a3a02a5cccf962d3c801f1603c545ced967b/skdownscale/pointwise_models/core.py#L46 where it's trying to get the .data attribute from an xr.Dataset.

Stacktrace just before this error looks like:

...
  /Users/bmalevich/Projects/dodola/dodola/core.py(35)bias_correct_bcsd()
-> model.fit(gcm_training_ds, obs_training_ds)
  /Users/bmalevich/miniconda3/envs/dodola_dev/lib/python3.8/site-packages/skdownscale/pointwise_models/core.py(177)fit()
-> self._models = _fit_wrapper(X, *args, **kws)
  /Users/bmalevich/miniconda3/envs/dodola_dev/lib/python3.8/site-packages/skdownscale/pointwise_models/core.py(74)_fit_wrapper()
-> ydf = y[index].pipe(_da_to_df, feature_dim)
  /Users/bmalevich/miniconda3/envs/dodola_dev/lib/python3.8/site-packages/xarray/core/common.py(634)pipe()
-> return func(self, *args, **kwargs)
> /Users/bmalevich/miniconda3/envs/dodola_dev/lib/python3.8/site-packages/skdownscale/pointwise_models/core.py(47)_da_to_df()
-> data = da.transpose('time', ...).data
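The distinction behind that failure: xr.DataArray has a .data attribute but xr.Dataset does not, so passing a Dataset down into _da_to_df() blows up. A minimal standalone illustration (not from the PR itself):

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3.0), dims=["time"], name="fakevariable")
ds = da.to_dataset()

print(da.transpose("time", ...).data)  # fine: returns the underlying numpy array
# ds.transpose("time", ...).data       # AttributeError: 'Dataset' object has no attribute 'data'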

@brews (Member) commented Dec 15, 2020

Looks like inputs to the *.fit() and *.predict() methods for a PointWiseDownscaler need to be an xr.DataArray, or maybe an xr.Dataset if we're careful. I should have read the docs on PointWiseDownscaler more carefully.

It also looks like dimensions are transposed in the output...

So if we hackishly change tests.test_services._datafactory() to

def _datafactory(x, start_time="1950-01-01"):
    """Populate xr.Dataset with synthetic data for testing"""
    start_time = str(start_time)
    if x.ndim != 1:
        raise ValueError("'x' needs dim of one")

    out = xr.Dataset(
        {"fakevariable": (["time", "lon", "lat"], x[:, np.newaxis, np.newaxis])},
        coords={
            "time": xr.cftime_range(
                start=start_time, freq="D", periods=len(x), calendar="noleap"
            ),
            "lon": (["lon"], [1.0]),
            "lat": (["lat"], [1.0]),
        },
    )
    return out["fakevariable"]

The test completes but output values differ from the goal:

E       AssertionError: Left and right DataArray objects are not equal
E       
E       Differing values:
E       L
E           array([[[-726]],
E           
E                  [[-725]],
E           
E                  ...,
E           
E                  [[2549]],
E           
E                  [[2551]]])
E       R
E           array([[[   0]],
E           
E                  [[   1]],
E           
E                  ...,
E           
E                  [[1823]],
E           
E                  [[1824]]])

What say you, @dgergel? You're more familiar with the method - does this L array match your expectations given the input?

Edit: If this is correct, then we need to change the inputs to DataArrays and change the docstrings to reflect this.

@dgergel (Member Author) commented Dec 16, 2020

@brews inputs to the *.fit() and *.predict() methods for the PointWiseDownscaler do need to be DataArrays unless we set it up a bit more cleverly. I should have paid closer attention to that in the test code when I was looking at it earlier.

As for what's happening with the outputs, I'm going to do a bit more exploring/testing myself to figure that out - I don't immediately know what's going on. But I don't think those outputs are right. More soon.

@dgergel (Member Author) commented Dec 17, 2020

@brews I've been digging into this and I think something is going wrong with how the bias correction code is ingesting our test data; it seems to be due to a mismatch between what it expects and what we're giving it. For example, given that our training and testing data is 0 - 1825 +/- 10, our bias-corrected values should not be negative (or equal to -725). I tried a couple of things, for example changing the time index in case the time grouper was struggling with a cftime index, e.g.

def _datafactory(x, start_time="1950-01-01"):
    """Populate xr.Dataset with synthetic data for testing"""
    start_time = str(start_time)
    if x.ndim != 1:
        raise ValueError("'x' needs dim of one")

    time = pd.date_range(start=start_time, freq="D", periods=len(x))
    
    out = xr.Dataset(
        {"fakevariable": (["time", "lon", "lat"], x[:, np.newaxis, np.newaxis])},
        coords={"index": time, "time": time, "lon": (["lon"], [1.0]), "lat": (["lat"], [1.0])})
    
    return out["fakevariable"]

But that didn't do it. I tested the branch of scikit-downscale that we're using with the global bias correction workflow and looked at a timestep of the output before we tried it here, so I'm thinking there is a problem between what PointWiseDownscaler is expecting and the test data we're giving it, as I mentioned earlier. I think removing PointWiseDownscaler and just testing on a gridcell with a test DataFrame and the BcsdTemperature model is probably the best next step, unless you have other ideas?
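For concreteness, a single-gridcell test along those lines might look roughly like the sketch below. It assumes the forked skdownscale's BcsdTemperature follows the sklearn-style fit/predict pattern used above and accepts single-column DataFrames with a datetime index; the data values are placeholders:

import numpy as np
import pandas as pd
from skdownscale.pointwise_models import BcsdTemperature

time = pd.date_range("1950-01-01", periods=1825, freq="D")
obs = pd.DataFrame({"fakevariable": np.linspace(0, 10, 1825)}, index=time)
gcm_train = pd.DataFrame({"fakevariable": np.linspace(0, 10, 1825) + 2.0}, index=time)
gcm_predict = pd.DataFrame({"fakevariable": np.linspace(0, 10, 1825) + 2.0}, index=time)

model = BcsdTemperature()
model.fit(gcm_train, obs)  # same argument order as the model.fit() call in the traceback above
corrected = model.predict(gcm_predict)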

@brews (Member) commented Dec 17, 2020

@dgergel So if I follow you right, you're saying this is an upstream package problem with PointWiseDownscaler?

In that case, I think that trying to get this to work without PointWiseDownscaler might be the way to go. We can continue to prototype this and come back to this later once things are fixed upstream. That's why we test things like this 👍

Given that our fake data for this test case is only a single grid point anyways... can we make the test work with an xr.DataArray?

Regardless, I say go ahead and drop PointWiseDownscaler and do what you need to do to get the test to run. I suggest any input and output data for your public core.py functions be an xr data structure. If the BcsdTemperature model needs a pd.DataFrame, then I suggest we make the conversion in your core.py functions.
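One possible shape for that conversion, as a hypothetical helper inside core.py (the name and exact handling are illustrative, not dodola's actual API):

import xarray as xr


def _da_to_frame(da: xr.DataArray):
    """Squeeze a single-grid-cell DataArray down to a one-column pandas DataFrame."""
    series = da.squeeze(drop=True).to_series()
    return series.to_frame(name=da.name or "variable")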

(And stop me anytime if you want to walk through anything or you have any questions)

@brews (Member) commented Dec 17, 2020

I tried something different for this failing test. I fed it np.linspace(0, 10, n) (giving a range of 0 to 10) instead of np.arange(0, n) (a range of 0 to 1825) to give it a "realistic" temperature range, and got:

E       Differing dimensions:
E           (time: 1825, lon: 1, lat: 1) != (lon: 1, lat: 1, time: 1825)
E       Differing values:
E       L
E           array([[[-3.985746]],
E           
E                  [[-3.977522]],
E           
E                  ...,
E           
E                  [[13.977522]],
E           
E                  [[13.985746]]])
E       R
E           array([[[0.000000e+00, 5.482456e-03, ..., 9.994518e+00, 1.000000e+01]]])
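If the transposed dimensions end up being the only structural difference, reordering them before comparison is a one-liner; a small self-contained illustration (not from the PR itself):

import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((1, 1, 5)), dims=["lon", "lat", "time"])
print(arr.transpose("time", "lon", "lat").dims)  # ('time', 'lon', 'lat')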

@brews (Member) commented Dec 19, 2020

From our meeting today it sounds like this is actually working as expected, and the unexpected numbers stem from quantile mapping in addition to the bias correction.

Potential solutions are:

  • Adjust the data we input to the service to a case where we can directly calculate the expected output DataArray and compare the expected and actual output DataArrays in their entirety. Only do this if we can easily create the input data and easily calculate the expected output.
  • Continue with the test input data we have and then check the output DataArray dimensions/shape plus the head/tail 3-5 values via a slice or .sel() (sketched below).

Choose whichever is less work at this point.
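A rough sketch of what that second option could look like in the test. The helper name and tolerances here are hypothetical, not existing dodola test code:

import numpy as np
import xarray as xr


def check_bias_corrected(result: xr.DataArray, expected_head, expected_tail):
    """Spot-check output dimensions plus the first and last few values."""
    assert result.dims == ("time", "lon", "lat")
    head = result.isel(time=slice(0, 3)).values.ravel()
    tail = result.isel(time=slice(-3, None)).values.ravel()
    np.testing.assert_allclose(head, expected_head, rtol=1e-5)
    np.testing.assert_allclose(tail, expected_tail, rtol=1e-5)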

In addition, we need to ensure that the input data used to fit the model is passed in as a DataArray, not a Dataset as in the original setup.

Brainstorming for later work: at some point in the future we're going to need to allow users to choose which variable name to extract from input data. The alternative is to allow only a single variable in each input dataset, but this feels like it might be clunky and too implicit. Do we allow users to have a different variable name for each input file, or does the variable name need to be the same for all files? Just things to chew on for later.

@brews self-requested a review on December 24, 2020 00:38
@brews (Member) left a comment

Nice work @dgergel ! Thanks again for putting this together.

The automated tests passed and that's your goal!

I have a few suggestions but nothing essential to fix before this merges. I don't think any of this is going to immediately break anything -- and we're still prototyping everything -- so it's okay if we get some of this in a later PR. Your call.

Go ahead and merge when ready!

Edit: Also let me know if anything is unclear or if you want to discuss things further.

dodola/core.py Outdated
observation data for building quantile map
gcm_predict_ds : Dataset
future model data to be bias corrected
predicted : Dataset
@brews (Member):

I think this "predicted" arg was renamed or removed. If so, I'd remove the two docstring lines for predicted.

@dgergel (Member Author):

good catch - it should be ds_predicted instead of predicted, thanks!

dodola/core.py Outdated
# import numpy as np
# import xarray as xr
# from skdownscale.pointwise_models import PointWiseDownscaler, BcsdTemperature
import numpy as np
@brews (Member):

I think we can remove the numpy and xarray imports - they're not used.

dodola/core.py Outdated


def morenerdymathstuff(*args):
    raise NotImplementedError
    # TO-DO: implement additional bias correction functionality
    return None
@brews (Member):

I think you're set to just remove morenerdymathstuff() altogether. 👍

@dgergel (Member Author):

agreed - done 👍

Other review comments on setup.py and dodola/tests/test_services.py were resolved.
@dgergel (Member Author) commented Dec 24, 2020

@brews all of your comments should be addressed, so this is ready to merge (I'd do it myself, but it looks like I don't have merge privileges - can you update that, btw?)

@brews merged commit 103fe43 into ClimateImpactLab:main on Dec 24, 2020