Commit

Merge 1f94ccf into 394d7f3
danielhuppmann authored Mar 31, 2020
2 parents 394d7f3 + 1f94ccf commit ae8cd23
Showing 9 changed files with 268 additions and 23 deletions.
1 change: 1 addition & 0 deletions doc/source/tutorials.rst
@@ -17,6 +17,7 @@ The source code of the tutorials is available in the folder
tutorials/data_table_formats.ipynb
tutorials/unit_conversion.ipynb
tutorials/aggregating_downscaling_consistency.ipynb
tutorials/subannual_time_resolution.ipynb
tutorials/ipcc_colors.ipynb
tutorials/iiasa_dbs.ipynb
tutorials/aggregating_variables_and_plotting_with_negative_values.ipynb
140 changes: 140 additions & 0 deletions doc/source/tutorials/subannual_time_resolution.ipynb
@@ -0,0 +1,140 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aggregating subannual timeseries data\n",
"\n",
"The **pyam** package offers many tools to facilitate processing of scenario data.\n",
"In this notebook, we illustrate methods to aggregate timeseries data that is given at a sub-annual resolution using timeslices (seasons, representative days, etc.).\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"The features for working with subannual time resolution are still in an experimental status.\n",
"The functions illustrated in this tutorial are operational and tested, but other tools such as the plotting library may not work as expected (yet) when working with subannual data.\n",
"\n",
"</div>\n",
"\n",
"## Overview\n",
"\n",
"This notebook illustrates the following features:\n",
"\n",
"0. Load timeseries data from a snapshot file and inspect the scenario\n",
"1. Aggregate timeseries data given at a sub-annual time resolution to a yearly value\n",
"\n",
"***"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Load timeseries data from snapshot file and inspect the scenario\n",
"\n",
"The stylized scenario used in this tutorial has data for primary-energy timeseries for two subannual timeslices `summer` and `winter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Aggregating timeseries across sub-annual timesteps\n",
"\n",
"Per default, the [aggregate_time()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.aggregate_time) function\n",
"aggregates (by summation) the data from all sub-annual timesteps (given in the column `subannual`) to a `year` value.\n",
"\n",
"The function returns an `IamDataFrame`, so we can use [timeseries()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.timeseries) to display the resulting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time('Primary Energy').timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function also supports directly appending the aggregated data to the original `IamDataFrame`. You can also pass a a list of variables, or call [variables()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.variables) to perform the aggregation on all timeseries data.\n",
"\n",
"A user can also manually set the \"target\" sub-annual value and the components to be aggregated;\n",
"for example, this can then be used to process an aggregate of hourly data to monthly values.\n",
"\n",
"You will notice that the following cell returns a larger dataset compared to calling the same function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time(df.variables(), value='year', components=['summer', 'winter'],\n",
" append=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
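
For readers who prefer a plain script over the notebook above, the workflow condenses to a few lines. A minimal sketch, assuming the tutorial CSV added in this commit sits in the working directory:

import pyam

# load the stylized scenario with 'summer' and 'winter' timeslices
df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# default: sum all subannual timeslices of a variable to a 'year' value
df.aggregate_time('Primary Energy').timeseries()

# explicit form: aggregate all variables and append to the original frame
df.aggregate_time(df.variables(), value='year',
                  components=['summer', 'winter'], append=True)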
5 changes: 5 additions & 0 deletions doc/source/tutorials/tutorial_data_subannual_time.csv
@@ -0,0 +1,5 @@
model,scenario,region,variable,unit,subannual,2005,2010
model_a,scen_a,World,Primary Energy,EJ/y,summer,3.5999999999999996,4.5
model_a,scen_a,World,Primary Energy,EJ/y,winter,8.399999999999999,10.5
model_a,scen_a,World,Primary Energy|Coal,EJ/y,summer,2.6999999999999997,3.0
model_a,scen_a,World,Primary Energy|Coal,EJ/y,winter,6.3,7.0
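
The long decimal tails in this file are not typos but ordinary binary floating-point artifacts; the values appear to have been generated by scaling annual totals with the same 0.3/0.7 factors used in the `subannual_df` test fixture below:

>>> 12 * 0.3
3.5999999999999996
>>> 12 * 0.7
8.399999999999999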
36 changes: 31 additions & 5 deletions pyam/_aggregate.py
@@ -1,3 +1,4 @@
import pandas as pd
import numpy as np
import logging

@@ -50,8 +51,8 @@ def _aggregate(df, variable, components=None, method=np.sum):
return _group_and_agg(_df, [], method)


def _aggregate_region(self, variable, region, subregions=None,
components=False, method='sum', weight=None):
def _aggregate_region(df, variable, region, subregions=None, components=False,
method='sum', weight=None):
"""Internal implementation for aggregating data over subregions"""
if not isstr(variable) and components is not False:
msg = 'aggregating by list of variables with components ' \
@@ -63,7 +64,7 @@ def _aggregate_region(self, variable, region, subregions=None,
raise ValueError(msg)

# default subregions to all regions other than `region`
subregions = subregions or self._all_other_regions(region, variable)
subregions = subregions or df._all_other_regions(region, variable)

if not len(subregions):
msg = 'cannot aggregate variable `{}` to `{}` because it does not'\
@@ -73,7 +74,7 @@ def _aggregate_region(self, variable, region, subregions=None,
return

# compute aggregate over all subregions
subregion_df = self.filter(region=subregions)
subregion_df = df.filter(region=subregions)
rows = subregion_df._apply_filters(variable=variable)
if weight is None:
col = 'region'
@@ -86,7 +87,7 @@ def _aggregate_region(self, variable, region, subregions=None,
# if not `components=False`, add components at the `region` level
if components is not False:
with adjust_log_level(logger):
region_df = self.filter(region=region)
region_df = df.filter(region=region)

# if `True`, auto-detect `components` at the `region` level,
# defaults to variables below `variable` only present in `region`
@@ -106,6 +107,31 @@ def _aggregate_region(self, variable, region, subregions=None,
return _data


def _aggregate_time(df, variable, column, value, components, method=np.sum):
"""Internal implementation for aggregating data over subannual time"""
# default `components` to all entries in `column` other than `value`
if components is None:
components = list(set(df.data.subannual.unique()) - set([value]))

# compute aggregate over time
filter_args = dict(variable=variable)
filter_args[column] = components
index = _list_diff(df.data.columns, [column, 'value'])

_data = pd.concat(
[
df.filter(**filter_args).data
.pivot_table(index=index, columns=column)
.value
.rename_axis(None, axis=1)
.apply(_get_method_func(method), axis=1)
], names=[column] + index, keys=[value])

# reset index-level order to original IamDataFrame
_data.index = _data.index.reorder_levels(df._LONG_IDX)

return _data

def _group_and_agg(df, by, method=np.sum):
"""Groupby & aggregate `df` by column(s), return indexed `pd.Series`"""
by = [by] if isstr(by) else by
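
The core of `_aggregate_time` is a pivot over the subannual column followed by a row-wise reduction, with the aggregated label re-attached as a new index level via `pd.concat(..., keys=[value])`. A standalone pandas sketch of that idea (hypothetical minimal columns, not pyam's full long-format index):

import numpy as np
import pandas as pd

# long-format rows with a 'subannual' column, mimicking the tutorial data
data = pd.DataFrame([
    ['Primary Energy', 2005, 'summer', 3.6],
    ['Primary Energy', 2005, 'winter', 8.4],
    ['Primary Energy', 2010, 'summer', 4.5],
    ['Primary Energy', 2010, 'winter', 10.5],
], columns=['variable', 'year', 'subannual', 'value'])

# pivot the timeslices into columns, then reduce row-wise (here: sum)
index = ['variable', 'year']
agg = (data.pivot_table(index=index, columns='subannual')
           .value                     # drop the outer 'value' column level
           .rename_axis(None, axis=1)
           .apply(np.sum, axis=1))

# prepend the aggregated label ('year') as a new 'subannual' index level
agg = pd.concat([agg], keys=['year'], names=['subannual'] + index)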
51 changes: 44 additions & 7 deletions pyam/core.py
@@ -50,7 +50,8 @@
)
from pyam.read_ixmp import read_ix
from pyam.timeseries import fill_series
from pyam._aggregate import _aggregate, _aggregate_region, _group_and_agg
from pyam._aggregate import _aggregate, _aggregate_region, _aggregate_time,\
_group_and_agg
from pyam.units import convert_unit, convert_unit_with_mapping

logger = logging.getLogger(__name__)
@@ -849,7 +850,7 @@ def aggregate(self, variable, components=None, method='sum', append=False):
e.g. :func:`numpy.mean`, :func:`numpy.sum`, 'min', 'max'
append : bool, default False
append the aggregate timeseries to `self` and return None,
else return aggregate timeseries
else return aggregate timeseries as new :class:`IamDataFrame`
"""
_df = _aggregate(self, variable, components=components, method=method)

@@ -938,12 +939,11 @@ def aggregate_region(self, variable, region='World', subregions=None,
(currently only supported with `method='sum'`)
append : bool, default False
append the aggregate timeseries to `self` and return None,
else return aggregate timeseries
else return aggregate timeseries as new :class:`IamDataFrame`
"""
_df = _aggregate_region(
self, variable, region=region, subregions=subregions,
components=components, method=method, weight=weight
)
_df = _aggregate_region(self, variable, region=region,
subregions=subregions, components=components,
method=method, weight=weight)

# return None if there is nothing to aggregate
if _df is None:
@@ -1021,6 +1021,43 @@ def check_aggregate_region(self, variable, region='World', subregions=None,
_df.index = _df.index.reorder_levels(self._LONG_IDX)
return _df

def aggregate_time(self, variable, column='subannual', value='year',
components=None, method='sum', append=False):
"""Aggregate a timeseries over a subannual time resolution
Parameters
----------
variable : str or list of str
variable(s) to be aggregated
column : str, default 'subannual'
the data column to be used as subannual time representation
value : str, default 'year'
the name of the aggregated (subannual) time
components : list of str
subannual timeslices to be aggregated; defaults to all subannual
timeslices other than ``value``
method : func or str, default 'sum'
method to use for aggregation,
e.g. :func:`numpy.mean`, :func:`numpy.sum`, 'min', 'max'
append : bool, default False
append the aggregate timeseries to `self` and return None,
else return aggregate timeseries as new :class:`IamDataFrame`
"""
_df = _aggregate_time(self, variable, column=column, value=value,
components=components, method=method)

# return None if there is nothing to aggregate
if _df is None:
return None

# else, append to `self` or return as `IamDataFrame`
if append is True:
self.append(_df, inplace=True)
else:
df = IamDataFrame(_df)
df.meta = self.meta.loc[_make_index(df.data)]
return df

def downscale_region(self, variable, proxy, region='World',
subregions=None, append=False):
"""Downscale a timeseries to a number of subregions
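
Beyond the defaults shown in the tutorial, the docstring above states that `method` accepts a numpy function or its name, and that `append=True` modifies the frame in place and returns None. A short sketch under those documented semantics, reusing the tutorial CSV from this commit:

import numpy as np
import pyam

df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# average across the timeslices instead of summing them
avg = df.aggregate_time('Primary Energy', method=np.mean)

# append=True adds the aggregate to `df` itself and returns None
assert df.aggregate_time('Primary Energy', append=True) is None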
1 change: 1 addition & 0 deletions pyam/utils.py
@@ -128,6 +128,7 @@ def read_file(fname, *args, **kwargs):
def format_data(df, **kwargs):
"""Convert a pandas.Dataframe or pandas.Series to the required format"""
if isinstance(df, pd.Series):
df.name = df.name or 'value'
df = df.to_frame()

# Check for R-style year columns, converting where necessary
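
The one-line change to `format_data` guards against an unnamed `pandas.Series`: without a name, `Series.to_frame()` yields a column named 0 rather than the required 'value' column. A standalone sketch of the behavior:

import pandas as pd

s = pd.Series([1.0, 2.0])      # no name set
s.to_frame().columns           # column named 0, not 'value'

s.name = s.name or 'value'     # the fallback added in this commit
s.to_frame().columns           # Index(['value'], dtype='object')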
39 changes: 28 additions & 11 deletions tests/conftest.py
@@ -16,12 +16,20 @@
TEST_DATA_DIR = os.path.join(here, 'data')


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


TEST_DF = pd.DataFrame([
['model_a', 'scen_a', 'World', 'Primary Energy', 'EJ/yr', 1, 6.],
['model_a', 'scen_a', 'World', 'Primary Energy|Coal', 'EJ/yr', 0.5, 3],
['model_a', 'scen_b', 'World', 'Primary Energy', 'EJ/yr', 2, 7],
],
columns=IAMC_IDX + [2005, 2010],
columns=IAMC_IDX + TEST_YEARS,
)


@@ -52,7 +60,7 @@
['reg_a', 'Population', 'm', 2, 3],
['reg_b', 'Population', 'm', 1, 2],
],
columns=['region', 'variable', 'unit', 2005, 2010],
columns=['region', 'variable', 'unit'] + TEST_YEARS,
)


@@ -67,7 +75,7 @@
msg + ['AFR', 'Primary Energy', 'EJ/yr', 2, 7],
msg + ['World', 'Primary Energy', 'EJ/yr', 3, 13],
],
columns=IAMC_IDX + [2005, 2010],
columns=IAMC_IDX + TEST_YEARS,
)


@@ -87,14 +95,6 @@
TEST_STACKPLOT_DF['scenario'] = 'a_scen'


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


# minimal IamDataFrame with four different time formats
@pytest.fixture(
scope="function",
@@ -140,6 +140,23 @@ def simple_df(request):
yield IamDataFrame(model='model_a', scenario='scen_a', data=_df)


# IamDataFrame with subannual time resolution
@pytest.fixture(scope="function")
def subannual_df():
_df = FULL_FEATURE_DF.iloc[0:6].copy()

def add_subannual(_data, name, value):
_data['subannual'] = name
_data[TEST_YEARS] = _data[TEST_YEARS] * value
return _data

# primary energy is a direct sum across sub-annual timeslices
mapping = [('year', 1), ('winter', 0.7), ('summer', 0.3)]
lst = [add_subannual(_df.copy(), name, value) for name, value in mapping]

yield IamDataFrame(model='model_a', scenario='scen_a', data=pd.concat(lst))


@pytest.fixture(scope="function")
def reg_df():
df = IamDataFrame(data=REG_DF)
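
Because the fixture scales the annual rows by 0.7 ('winter') and 0.3 ('summer'), the two timeslices sum back exactly to the 'year' rows, which is the invariant that tests for `aggregate_time` can assert. A minimal sketch of such a consistency check (hypothetical test, not part of this commit):

def test_aggregate_time_consistency(subannual_df):
    # summing 'summer' and 'winter' must reproduce the 'year' timeslice
    obs = subannual_df.filter(subannual=['summer', 'winter'])\
        .aggregate_time('Primary Energy', value='year')
    exp = subannual_df.filter(subannual='year', variable='Primary Energy')
    assert obs.timeseries().equals(exp.timeseries())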