
Commit

Merge b43fe04 into d6bd877
danielhuppmann committed Jan 20, 2020
2 parents d6bd877 + b43fe04 commit f8b875f
Showing 9 changed files with 262 additions and 24 deletions.
1 change: 1 addition & 0 deletions doc/source/tutorials.rst
@@ -9,6 +9,7 @@ Tutorials

   tutorials/pyam_first_steps.ipynb
   tutorials/aggregating_downscaling_consistency.ipynb
   tutorials/subannual_time_resolution.ipynb
   tutorials/ipcc_colors.ipynb
   tutorials/iiasa_dbs.ipynb
   tutorials/aggregating_variables_and_plotting_with_negative_values.ipynb
134 changes: 134 additions & 0 deletions doc/source/tutorials/subannual_time_resolution.ipynb
@@ -0,0 +1,134 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aggregating subannual timeseries data\n",
"\n",
"The **pyam** package offers many tools to facilitate processing of scenario data.\n",
"In this notebook, we illustrate methods to aggregate timeseries data that is given at a sub-annual resolution using timeslices (seasons, representative days, etc.).\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"The features for working with subannual time resolution are still in an experimental status.\n",
"The functions illustrated in this tutorial are operational and tested, but other tools such as the plotting library may not work as expected (yet) when working with subannual data.\n",
"\n",
"</div>\n",
"\n",
"## Overview\n",
"\n",
"This notebook illustrates the following features:\n",
"\n",
"0. Load timeseries data from a snapshot file and inspect the scenario\n",
"1. Aggregate timeseries data given at a sub-annual time resolution to a yearly value"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Load timeseries data from snapshot file and inspect the scenario\n",
"\n",
"The stylized scenario used in this tutorial has data for primary-energy timeseries for two subannual timeslices `summer` and `winter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Aggregating timeseries across sub-annual timesteps\n",
"\n",
"Per default, the [aggregate_time()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.aggregate_time) function\n",
"aggregates (by summation) the data from all sub-annual timesteps (given in the column `subannual`) to a `year` value.\n",
"\n",
"The function returns an `IamDataFrame`, so we can use [timeseries()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.timeseries) to display the resulting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time('Primary Energy').timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function also supports directly appending the aggregated data to the original `IamDataFrame`. You can also pass a a list of variables, or call [variables()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.variables) to perform the aggregation on all timeseries data.\n",
"\n",
"You will notice that the following cell returns a larger dataset compared to calling the same function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time(df.variables(), append=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
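Beyond the defaults shown in the tutorial, the new method also accepts custom timeslices and aggregation methods. A minimal sketch (not part of this commit) based on the `aggregate_time()` signature added in pyam/core.py below, assuming the tutorial CSV is available locally:

import pyam

df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# aggregate only selected timeslices into a new 'year' entry
summer_only = df.aggregate_time('Primary Energy', components=['summer'])

# use the maximum across timeslices instead of the default sum
peak = df.aggregate_time('Primary Energy', method='max')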
5 changes: 5 additions & 0 deletions doc/source/tutorials/tutorial_data_subannual_time.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
model,scenario,region,variable,unit,subannual,2005,2010
model_a,scen_a,World,Primary Energy,EJ/y,summer,3.5999999999999996,4.5
model_a,scen_a,World,Primary Energy,EJ/y,winter,8.399999999999999,10.5
model_a,scen_a,World,Primary Energy|Coal,EJ/y,summer,2.6999999999999997,3.0
model_a,scen_a,World,Primary Energy|Coal,EJ/y,winter,6.3,7.0
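As a quick sanity check (illustrative only, plain pandas): the `summer` and `winter` timeslices in this snapshot sum to annual totals of 12.0 and 15.0 EJ/y for `Primary Energy` and 9.0 and 10.0 EJ/y for `Primary Energy|Coal`:

import pandas as pd

data = pd.read_csv('tutorial_data_subannual_time.csv')
totals = data.groupby(['model', 'scenario', 'region', 'variable', 'unit'])[['2005', '2010']].sum()
print(totals.round(10))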
36 changes: 31 additions & 5 deletions pyam/_aggregate.py
@@ -1,3 +1,4 @@
import pandas as pd
import numpy as np
import logging

@@ -50,8 +51,8 @@ def _aggregate(df, variable, components=None, method=np.sum):
    return _group_and_agg(_df, [], method)


def _aggregate_region(self, variable, region, subregions=None,
                      components=False, method='sum', weight=None):
def _aggregate_region(df, variable, region, subregions=None, components=False,
                      method='sum', weight=None):
    """Internal implementation for aggregating data over subregions"""
    if not isstr(variable) and components is not False:
        msg = 'aggregating by list of variables with components ' \
@@ -63,7 +64,7 @@ def _aggregate_region(self, variable, region, subregions=None,
        raise ValueError(msg)

    # default subregions to all regions other than `region`
    subregions = subregions or self._all_other_regions(region, variable)
    subregions = subregions or df._all_other_regions(region, variable)

    if not len(subregions):
        msg = 'cannot aggregate variable `{}` to `{}` because it does not'\
@@ -73,7 +74,7 @@ def _aggregate_region(self, variable, region, subregions=None,
        return

    # compute aggregate over all subregions
    subregion_df = self.filter(region=subregions)
    subregion_df = df.filter(region=subregions)
    rows = subregion_df._apply_filters(variable=variable)
    if weight is None:
        col = 'region'
@@ -86,7 +87,7 @@ def _aggregate_region(self, variable, region, subregions=None,
    # if not `components=False`, add components at the `region` level
    if components is not False:
        with adjust_log_level(logger):
            region_df = self.filter(region=region)
            region_df = df.filter(region=region)

            # if `True`, auto-detect `components` at the `region` level,
            # defaults to variables below `variable` only present in `region`
@@ -106,6 +107,31 @@ def _aggregate_region(self, variable, region, subregions=None,
    return _data


def _aggregate_time(df, variable, column, value, components, method=np.sum):
    """Internal implementation for aggregating data over subannual time"""
    # default `components` to all entries in `column` other than `value`
    if components is None:
        components = list(set(df.data.subannual.unique()) - set([value]))

    # compute aggregate over time
    filter_args = dict(variable=variable)
    filter_args[column] = components
    index = _list_diff(df.data.columns, [column, 'value'])

    _data = pd.concat(
        [
            df.filter(**filter_args).data
            .pivot_table(index=index, columns=column)
            .value
            .rename_axis(None, axis=1)
            .apply(_get_method_func(method), axis=1)
        ], names=[column] + index, keys=[value])

    # reset index-level order to original IamDataFrame
    _data.index = _data.index.reorder_levels(df._LONG_IDX)

    return _data


def _group_and_agg(df, by, method=np.sum):
    """Groupby & aggregate `df` by column(s), return indexed `pd.Series`"""
    by = [by] if isstr(by) else by
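The pivot-and-apply pattern at the core of `_aggregate_time()` can be reproduced on a toy long-format frame; this standalone sketch uses plain pandas and stays independent of the pyam internals:

import numpy as np
import pandas as pd

# long format: one row per subannual timeslice
toy = pd.DataFrame({
    'variable': ['Primary Energy', 'Primary Energy'],
    'year': [2005, 2005],
    'subannual': ['summer', 'winter'],
    'value': [3.6, 8.4],
})

agg = (
    toy.pivot_table(index=['variable', 'year'], columns='subannual')
    .value                      # select the 'value' block of the pivoted columns
    .rename_axis(None, axis=1)  # drop the residual column-axis name
    .apply(np.sum, axis=1)      # collapse the timeslices row-wise
)
print(agg)  # ('Primary Energy', 2005) -> 12.0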
52 changes: 44 additions & 8 deletions pyam/core.py
@@ -40,7 +40,8 @@
)
from pyam.read_ixmp import read_ix
from pyam.timeseries import fill_series
from pyam._aggregate import _aggregate, _aggregate_region, _group_and_agg
from pyam._aggregate import _aggregate, _aggregate_region, _aggregate_time,\
    _group_and_agg

logger = logging.getLogger(__name__)

@@ -802,7 +803,7 @@ def aggregate(self, variable, components=None, method='sum', append=False):
            method to use for aggregation, e.g. np.mean, np.sum, 'min', 'max'
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate(self, variable, components=components, method=method)

@@ -888,12 +889,11 @@ def aggregate_region(self, variable, region='World', subregions=None,
            (currently only supported with `method='sum'`)
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate_region(
            self, variable, region=region, subregions=subregions,
            components=components, method=method, weight=weight
        )
        _df = _aggregate_region(self, variable, region=region,
                                subregions=subregions, components=components,
                                method=method, weight=weight)

        # return None if there is nothing to aggregate
        if _df is None:
@@ -969,6 +969,42 @@ def check_aggregate_region(self, variable, region='World', subregions=None,
        _df.index = _df.index.reorder_levels(self._LONG_IDX)
        return _df

    def aggregate_time(self, variable, column='subannual', value='year',
                       components=None, method='sum', append=False):
        """Aggregate a timeseries over a subannual time resolution

        Parameters
        ----------
        variable: str or list of str
            variable(s) to be aggregated
        column: str, default 'subannual'
            the data column to be used as subannual time representation
        value: str, default 'year'
            the name of the aggregated (subannual) time
        components: list of str
            subannual timeslices to be aggregated; defaults to all subannual
            timeslices other than ``value``
        method: func or str, default 'sum'
            method to use for aggregation, e.g. np.mean, np.sum, 'min', 'max'
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate_time(self, variable, column=column, value=value,
                              components=components, method=method)

        # return None if there is nothing to aggregate
        if _df is None:
            return None

        # else, append to `self` or return as `IamDataFrame`
        if append is True:
            self.append(_df, inplace=True)
        else:
            df = IamDataFrame(_df)
            df.meta = self.meta.loc[_make_index(df.data)]
            return df

    def downscale_region(self, variable, proxy, region='World',
                         subregions=None, append=False):
        """Downscale a timeseries to a number of subregions
@@ -985,7 +1021,7 @@ def downscale_region(self, variable, proxy, region='World',
            list of subregions, defaults to all regions other than `region`
        append: bool, default False
            append the downscaled timeseries to `self` and return None,
            else return downscaled data as new `IamDataFrame`
            else return downscaled data as new :class:`IamDataFrame`
        """
        # get default subregions if not specified
        subregions = subregions or self._all_other_regions(region)
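A short usage sketch (illustrative, not from this commit) of the two return modes documented in the `aggregate_time()` docstring above, again using the tutorial snapshot:

import pyam

df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# default: returns the aggregate as a new IamDataFrame, with `meta` copied over
annual = df.aggregate_time('Primary Energy')

# append=True: adds the aggregate to `df` in place and returns None
df.aggregate_time('Primary Energy', append=True)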
1 change: 1 addition & 0 deletions pyam/utils.py
@@ -130,6 +130,7 @@ def read_file(fname, *args, **kwargs):
def format_data(df, **kwargs):
    """Convert a `pd.Dataframe` or `pd.Series` to the required format"""
    if isinstance(df, pd.Series):
        df.name = df.name or 'value'
        df = df.to_frame()

    # Check for R-style year columns, converting where necessary
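The one-line guard added to `format_data()` matters because an unnamed `pd.Series` would otherwise produce a column named `0` after `to_frame()`. A minimal sketch of the effect:

import pandas as pd

s = pd.Series([1.0, 2.0])    # `name` is None
s.name = s.name or 'value'   # the guard added above
print(s.to_frame().columns)  # Index(['value'], dtype='object')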
39 changes: 28 additions & 11 deletions tests/conftest.py
@@ -16,12 +16,20 @@
TEST_DATA_DIR = os.path.join(here, 'data')


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


TEST_DF = pd.DataFrame([
    ['model_a', 'scen_a', 'World', 'Primary Energy', 'EJ/y', 1, 6.],
    ['model_a', 'scen_a', 'World', 'Primary Energy|Coal', 'EJ/y', 0.5, 3],
    ['model_a', 'scen_b', 'World', 'Primary Energy', 'EJ/y', 2, 7],
],
    columns=IAMC_IDX + [2005, 2010],
    columns=IAMC_IDX + TEST_YEARS,
)


@@ -52,7 +60,7 @@
    ['reg_a', 'Population', 'm', 2, 3],
    ['reg_b', 'Population', 'm', 1, 2],
],
    columns=['region', 'variable', 'unit', 2005, 2010],
    columns=['region', 'variable', 'unit'] + TEST_YEARS,
)


@@ -64,7 +72,7 @@
    ['MESSAGE-GLOBIOM', 'a_scenario', 'AFR', 'Primary Energy', 'EJ/y', 2, 7],
    ['MESSAGE-GLOBIOM', 'a_scenario', 'World', 'Primary Energy', 'EJ/y', 3, 13],
],
    columns=IAMC_IDX + [2005, 2010],
    columns=IAMC_IDX + TEST_YEARS,
)


@@ -84,14 +92,6 @@
TEST_STACKPLOT_DF['scenario'] = 'a_scen'


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


# minimal IamDataFrame with four different time formats
@pytest.fixture(
    scope="function",
@@ -137,6 +137,23 @@ def simple_df(request):
    yield IamDataFrame(model='model_a', scenario='scen_a', data=_df)


# IamDataFrame with subannual time resolution
@pytest.fixture(scope="function")
def subannual_df():
    _df = FULL_FEATURE_DF.iloc[0:6].copy()

    def add_subannual(_data, name, value):
        _data['subannual'] = name
        _data[TEST_YEARS] = _data[TEST_YEARS] * value
        return _data

    # primary energy is a direct sum across sub-annual timeslices
    mapping = [('year', 1), ('winter', 0.7), ('summer', 0.3)]
    lst = [add_subannual(_df.copy(), name, value) for name, value in mapping]

    yield IamDataFrame(model='model_a', scenario='scen_a', data=pd.concat(lst))


@pytest.fixture(scope="function")
def reg_df():
    df = IamDataFrame(data=REG_DF)
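A hypothetical test (not part of this commit; the test name and assertion are assumptions) sketching how the new `subannual_df` fixture could be exercised — summing the `winter` (0.7) and `summer` (0.3) timeslices should recover the `year` values up to floating-point noise:

import pandas as pd

def test_aggregate_time_fixture(subannual_df):
    obs = subannual_df.aggregate_time('Primary Energy').timeseries()
    exp = subannual_df.filter(variable='Primary Energy',
                              subannual='year').timeseries()
    # check_like ignores row/column order; default tolerance absorbs float noise
    pd.testing.assert_frame_equal(obs, exp, check_like=True)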
