Interp update #456
Conversation
Seems like the test failures are related to iiasa queries!
Already pinged @peterkolp and @fonfon about this...
LGTM, thanks @gidden!
Minor quibble: I guess a substantial performance increase could be had by avoiding calling df.timeseries()
for every iteration, instead trying something like:
df = self.timeseries()
time = set(time) if islistable(time) else {time}  # cast to set to drop duplicates
lst = [None] * len(time)
# apply fill_series, re-add time dimension to index, set series name
for i, t in enumerate(time):
    lst[i] = df[np.isnan(df[t])].apply(fill_series, raw=False, axis=1, time=t).dropna()
    lst[i].index = append_index_level(
        index=lst[i].index, codes=[0] * len(lst[i]),
        level=[t], name=self.time_col, order=self._data.index.names)
    lst[i].name = 'value'
# append interpolated values to `_data` and sort index
self._data = self._data.append(pd.concat(lst)).sort_index()
Looks like @danielhuppmann has the review under control. My only thought: if you cast to wide format first, could you get an extra performance boost by using pandas' or numpy's built-in interpolation rather than looping over all the times?
Agreed. Something like

if not islistable(time):
    time = [time]
pyam.IamDataFrame(
    idf.timeseries().reindex(columns=np.sort(idf.year + time)).interpolate(method='slinear', axis=1),
    meta=idf.meta
)

has proven to beat pyam's implementation by leaps and bounds.
Indeed, it would be great to switch to a more performant implementation of the interpolation! One issue to keep in mind, though, is that pandas.DataFrame.interpolate fills in all missing values in the dataframe, not just the values for one new column. That's why I didn't use it at the time. And numpy.interp assumes that the values are constant outside the timeseries domain, which is also not a valid general assumption.
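A minimal toy illustration of both caveats (hypothetical data, not from the PR):

import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, np.nan, 3.0, np.nan]], columns=[2010, 2015, 2020, 2025])
# pandas fills *every* gap along the row, not only a requested column:
# the interior 2015 becomes 2.0 and the trailing 2025 is padded with 3.0
print(df.interpolate(axis=1))

# numpy.interp holds values constant outside the data range instead of extrapolating
print(np.interp([2005, 2035], [2010, 2020], [1.0, 3.0]))  # -> [1.0, 3.0]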
Yes, you'd have to choose the subset you want to interpolate first if you didn't just want to interpolate everything.

Another thought: you could also change the API so the user specifies all the new time points they desire (including any which are already present), rather than only adding extra time points. So, say you had data for 2010, 2020 and 2030 and you wanted to interpolate to have 2010, 2015, 2020, 2025 and 2030. In the current implementation you'd do

df.interpolate([2015, 2025])

but we could make it so it's

df.interpolate([2010, 2015, 2020, 2025, 2030])

and just leave all the calculations to pandas. That interface would mean that interpolating would always leave you with a nan-free dataset.
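A rough sketch of what that interface could look like, assuming `time` lists all desired time points (existing ones included); `interpolate_all` is a hypothetical name, not pyam API:

import numpy as np
import pyam

def interpolate_all(df, time):
    # hypothetical sketch: reindex to the full set of requested columns
    # (assumed to cover all existing ones), then let pandas fill the gaps row-wise
    wide = df.timeseries().reindex(columns=np.sort(time))
    return pyam.IamDataFrame(wide.interpolate(method='slinear', axis=1), meta=df.meta)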
scipy's interp1d will extrapolate if you want, via the fill_value='extrapolate' option. P.S. @coroa,
was that a typo?
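For reference, a minimal sketch of that scipy behavior:

import scipy.interpolate

f = scipy.interpolate.interp1d(
    [2010, 2020], [1.0, 3.0], kind='slinear',
    bounds_error=False, fill_value='extrapolate')
print(f([2005, 2030]))  # -> [0.0, 5.0], linear extension beyond the data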
Not sure about the interpolate interface change; it will be difficult to guarantee an efficient deprecation path. On the other hand, it's a nice way to solve the all-NaNs-filled problem.
Something like that :)
Otherwise, something along the lines of (untested):

from bisect import bisect
import pandas as pd
import scipy.interpolate

df = self.timeseries()
# interpolate each row at the requested time points (interp1d takes `kind`, not `method`)
new_values = df.apply(
    lambda s: pd.Series(scipy.interpolate.interp1d(s.index, s.values, kind='slinear', **kwargs)(time), index=time),
    axis=1)
for year in time:
    if year in df:
        df[year].fillna(new_values[year], inplace=True)
    else:
        i = bisect(df.columns, year)  # insert position that keeps columns sorted
        df.insert(i, year, new_values[year])
return pyam.IamDataFrame(df, meta=self.meta)

should work fine.
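A quick toy check of the row-wise apply above (hypothetical data):

import pandas as pd
import scipy.interpolate

df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=[2010, 2020, 2030])
time = [2015, 2025]
new_values = df.apply(
    lambda s: pd.Series(scipy.interpolate.interp1d(s.index, s.values, kind='slinear')(time), index=time),
    axis=1)
print(new_values)  # expects 1.5 and 2.5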
Thanks for the discussion all! I'll take a look at the suggestions, update, and flag for final review.
OK folks, I have updated the implementation, as shown in the attached notebook. Let me know what you think. It should maintain prior behavior (only updating the columns asked for).
OK... this took me a bit further down a rabbit hole than I expected. The basic approach follows what @coroa suggested earlier, and includes:
There were a few issues I ran into that I will raise with others, namely an issue for @danielhuppmann / @Rlamboll regarding interpolation. @coroa, happy if you take a look at the final version here too.
def test_interpolate_extra_cols():
    # check that interpolation with non-matching extra_cols has no effect (#351)
    EXTRA_COL_DF = pd.DataFrame([
        ['foo', 2005, 1],
        ['foo', 2010, 2],
@Rlamboll and @danielhuppmann, I had to update this test to get correct behavior. The current implementation uses the wide-form timeseries object, which treats each index value (row) as an independent observation. Therefore, the foo and bar entries here are each interpolated separately. If I left it as is, I would get an error, because there are too few datapoints to interpolate (i.e., 1 each).
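A toy illustration of why that raises (hypothetical data, not the actual test fixture):

import pandas as pd

# in wide form, foo and bar become separate rows, each with one observation
wide = pd.DataFrame([[1.0, None], [None, 2.0]], index=['foo', 'bar'], columns=[2005, 2010])
try:
    wide.interpolate(method='slinear', axis=1)
except ValueError as exc:
    print(exc)  # scipy's slinear needs at least two data points per row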
OK, I believe this is consistent with the current state as of the discussion in #351. It's unclear whether this is the desired behaviour or not, but it's what we're used to.
Slightly concerned that this behavior might throw off users who have an IamDataFrame where just one timeseries row has only one datapoint...
For the moment, an error will be raised when trying to interpolate with a single data point. I suggest we allow this, as I am not sure we 'want' to support a use case of pyam data with a single data point (this is what metadata is for, right?). @danielhuppmann, I will let you mark this as resolved.
I'm worried about the case where a user has a table like the one below and wants to interpolate for 2015.

variable       | 2010 | 2020 | 2030
---------------|------|------|-----
Primary Energy | 1    | 2    | 3
Population     | 5    | nan  | nan

I would expect a return-object like

variable       | 2010 | 2015 | 2020 | 2030
---------------|------|------|------|-----
Primary Energy | 1    | 1.5  | 2    | 3
Population     | 5    | nan  | nan  | nan
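For what it's worth, a sketch of how pandas could produce exactly that, interpolating only rows that have at least two observations (toy data mirroring the table above):

import numpy as np
import pandas as pd

wide = pd.DataFrame(
    [[1.0, np.nan, 2.0, 3.0], [5.0, np.nan, np.nan, np.nan]],
    index=['Primary Energy', 'Population'], columns=[2010, 2015, 2020, 2030])
enough = wide.notna().sum(axis=1) >= 2
wide.loc[enough] = wide.loc[enough].interpolate(method='slinear', axis=1)
print(wide)  # Primary Energy gets 1.5 for 2015; Population keeps its NaNs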
Looks really great, thanks @gidden! Just a bunch of minor comments...
I ran a test on my standard large-data ensemble (4 scenarios, 2m data points) and can report:
- old implementation (using two separate calls for interpolation): 64.4 sec
- new implementation (using two separate calls for interpolation): 50.8 sec
- new implementation (calling it via a list): 29.3 sec
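For anyone who wants to reproduce such numbers, a hypothetical timing harness (assuming df is a large IamDataFrame and interpolate operates in place, as in this PR; the years are illustrative):

import time

df2 = df.copy()
t0 = time.time()
df2.interpolate(2007)
df2.interpolate(2013)
print('two separate calls:', time.time() - t0)

df2 = df.copy()
t0 = time.time()
df2.interpolate([2007, 2013])
print('one call with a list:', time.time() - t0)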
tests/test_core.py (outdated)
@@ -642,8 +686,11 @@ def test_interpolate_extra_cols():
    df2 = df.copy()
    df2.interpolate(2007)

    # assert that interpolation didn't change any data
    assert_iamframe_equal(df, df2)
    # interpolate should work as if this is a new index
Not clear to me what this comment means...
Have cleared it up.
Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
OK @danielhuppmann, this should be good for final review now. I must admit, I am a bit surprised at the rather minor speedups (@coroa, can you confirm these numbers match your observations?). But in general, this PR was meant to support additional input, not optimization, so no need to block it on that.
Thanks @gidden, really nice!
Resolved merge conflicts
Please confirm that this PR has done the following:
Description of PR
This PR fixes a bug when there are no values to interpolate (all scenarios already have this value) and allows users to pass a list-like input to interpolate multiple time points.
closes #240
closes #371