
Commit

Merge b43fe04 into d6bd877
danielhuppmann committed Jan 20, 2020
2 parents d6bd877 + b43fe04 commit f8b875f
Showing 9 changed files with 262 additions and 24 deletions.
1 change: 1 addition & 0 deletions doc/source/tutorials.rst
@@ -9,6 +9,7 @@ Tutorials

   tutorials/pyam_first_steps.ipynb
   tutorials/aggregating_downscaling_consistency.ipynb
   tutorials/subannual_time_resolution.ipynb
   tutorials/ipcc_colors.ipynb
   tutorials/iiasa_dbs.ipynb
   tutorials/aggregating_variables_and_plotting_with_negative_values.ipynb
134 changes: 134 additions & 0 deletions doc/source/tutorials/subannual_time_resolution.ipynb
@@ -0,0 +1,134 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aggregating subannual timeseries data\n",
"\n",
"The **pyam** package offers many tools to facilitate processing of scenario data.\n",
"In this notebook, we illustrate methods to aggregate timeseries data that is given at a sub-annual resolution using timeslices (seasons, representative days, etc.).\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"\n",
"The features for working with subannual time resolution are still in an experimental status.\n",
"The functions illustrated in this tutorial are operational and tested, but other tools such as the plotting library may not work as expected (yet) when working with subannual data.\n",
"\n",
"</div>\n",
"\n",
"## Overview\n",
"\n",
"This notebook illustrates the following features:\n",
"\n",
"0. Load timeseries data from a snapshot file and inspect the scenario\n",
"1. Aggregate timeseries data given at a sub-annual time resolution to a yearly value"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pyam"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Load timeseries data from snapshot file and inspect the scenario\n",
"\n",
"The stylized scenario used in this tutorial has data for primary-energy timeseries for two subannual timeslices `summer` and `winter`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Aggregating timeseries across sub-annual timesteps\n",
"\n",
"Per default, the [aggregate_time()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.aggregate_time) function\n",
"aggregates (by summation) the data from all sub-annual timesteps (given in the column `subannual`) to a `year` value.\n",
"\n",
"The function returns an `IamDataFrame`, so we can use [timeseries()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.timeseries) to display the resulting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time('Primary Energy').timeseries()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function also supports directly appending the aggregated data to the original `IamDataFrame`. You can also pass a a list of variables, or call [variables()](https://pyam-iamc.readthedocs.io/en/stable/api.html#pyam.IamDataFrame.variables) to perform the aggregation on all timeseries data.\n",
"\n",
"You will notice that the following cell returns a larger dataset compared to calling the same function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.aggregate_time(df.variables(), append=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.timeseries()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
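Beyond the defaults shown in the tutorial, the new method also accepts custom timeslices and aggregation methods. A minimal sketch (not part of this commit) based on the `aggregate_time()` signature added in pyam/core.py below, assuming the tutorial CSV is available locally:

import pyam

df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# aggregate only selected timeslices into a new 'year' entry
summer_only = df.aggregate_time('Primary Energy', components=['summer'])

# use the maximum across timeslices instead of the default sum
peak = df.aggregate_time('Primary Energy', method='max')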
5 changes: 5 additions & 0 deletions doc/source/tutorials/tutorial_data_subannual_time.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
model,scenario,region,variable,unit,subannual,2005,2010
model_a,scen_a,World,Primary Energy,EJ/y,summer,3.5999999999999996,4.5
model_a,scen_a,World,Primary Energy,EJ/y,winter,8.399999999999999,10.5
model_a,scen_a,World,Primary Energy|Coal,EJ/y,summer,2.6999999999999997,3.0
model_a,scen_a,World,Primary Energy|Coal,EJ/y,winter,6.3,7.0
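As a quick sanity check (illustrative only, plain pandas): the `summer` and `winter` timeslices in this snapshot sum to annual totals of 12.0 and 15.0 EJ/y for `Primary Energy` and 9.0 and 10.0 EJ/y for `Primary Energy|Coal`:

import pandas as pd

data = pd.read_csv('tutorial_data_subannual_time.csv')
totals = data.groupby(['model', 'scenario', 'region', 'variable', 'unit'])[['2005', '2010']].sum()
print(totals.round(10))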
36 changes: 31 additions & 5 deletions pyam/_aggregate.py
@@ -1,3 +1,4 @@
import pandas as pd
import numpy as np
import logging

@@ -50,8 +51,8 @@ def _aggregate(df, variable, components=None, method=np.sum):
    return _group_and_agg(_df, [], method)


def _aggregate_region(self, variable, region, subregions=None,
                      components=False, method='sum', weight=None):
def _aggregate_region(df, variable, region, subregions=None, components=False,
                      method='sum', weight=None):
    """Internal implementation for aggregating data over subregions"""
    if not isstr(variable) and components is not False:
        msg = 'aggregating by list of variables with components ' \
@@ -63,7 +64,7 @@ def _aggregate_region(self, variable, region, subregions=None,
        raise ValueError(msg)

    # default subregions to all regions other than `region`
    subregions = subregions or self._all_other_regions(region, variable)
    subregions = subregions or df._all_other_regions(region, variable)

    if not len(subregions):
        msg = 'cannot aggregate variable `{}` to `{}` because it does not'\
@@ -73,7 +74,7 @@ def _aggregate_region(self, variable, region, subregions=None,
        return

    # compute aggregate over all subregions
    subregion_df = self.filter(region=subregions)
    subregion_df = df.filter(region=subregions)
    rows = subregion_df._apply_filters(variable=variable)
    if weight is None:
        col = 'region'
@@ -86,7 +87,7 @@ def _aggregate_region(self, variable, region, subregions=None,
    # if not `components=False`, add components at the `region` level
    if components is not False:
        with adjust_log_level(logger):
            region_df = self.filter(region=region)
            region_df = df.filter(region=region)

            # if `True`, auto-detect `components` at the `region` level,
            # defaults to variables below `variable` only present in `region`
@@ -106,6 +107,31 @@ def _aggregate_region(self, variable, region, subregions=None,
    return _data


def _aggregate_time(df, variable, column, value, components, method=np.sum):
    """Internal implementation for aggregating data over subannual time"""
    # default `components` to all entries in `column` other than `value`
    if components is None:
        components = list(set(df.data.subannual.unique()) - set([value]))

    # compute aggregate over time
    filter_args = dict(variable=variable)
    filter_args[column] = components
    index = _list_diff(df.data.columns, [column, 'value'])

    _data = pd.concat(
        [
            df.filter(**filter_args).data
            .pivot_table(index=index, columns=column)
            .value
            .rename_axis(None, axis=1)
            .apply(_get_method_func(method), axis=1)
        ], names=[column] + index, keys=[value])

    # reset index-level order to original IamDataFrame
    _data.index = _data.index.reorder_levels(df._LONG_IDX)

    return _data


def _group_and_agg(df, by, method=np.sum):
    """Groupby & aggregate `df` by column(s), return indexed `pd.Series`"""
    by = [by] if isstr(by) else by
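The pivot-and-apply pattern at the core of `_aggregate_time()` can be reproduced on a toy long-format frame; this standalone sketch uses plain pandas and stays independent of the pyam internals:

import numpy as np
import pandas as pd

# long format: one row per subannual timeslice
toy = pd.DataFrame({
    'variable': ['Primary Energy', 'Primary Energy'],
    'year': [2005, 2005],
    'subannual': ['summer', 'winter'],
    'value': [3.6, 8.4],
})

agg = (
    toy.pivot_table(index=['variable', 'year'], columns='subannual')
    .value                      # select the 'value' block of the pivoted columns
    .rename_axis(None, axis=1)  # drop the residual column-axis name
    .apply(np.sum, axis=1)      # collapse the timeslices row-wise
)
print(agg)  # ('Primary Energy', 2005) -> 12.0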
52 changes: 44 additions & 8 deletions pyam/core.py
@@ -40,7 +40,8 @@
)
from pyam.read_ixmp import read_ix
from pyam.timeseries import fill_series
from pyam._aggregate import _aggregate, _aggregate_region, _group_and_agg
from pyam._aggregate import _aggregate, _aggregate_region, _aggregate_time,\
    _group_and_agg

logger = logging.getLogger(__name__)

@@ -802,7 +803,7 @@ def aggregate(self, variable, components=None, method='sum', append=False):
            method to use for aggregation, e.g. np.mean, np.sum, 'min', 'max'
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate(self, variable, components=components, method=method)

@@ -888,12 +889,11 @@ def aggregate_region(self, variable, region='World', subregions=None,
            (currently only supported with `method='sum'`)
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate_region(
            self, variable, region=region, subregions=subregions,
            components=components, method=method, weight=weight
        )
        _df = _aggregate_region(self, variable, region=region,
                                subregions=subregions, components=components,
                                method=method, weight=weight)

        # return None if there is nothing to aggregate
        if _df is None:
@@ -969,6 +969,42 @@ def check_aggregate_region(self, variable, region='World', subregions=None,
        _df.index = _df.index.reorder_levels(self._LONG_IDX)
        return _df

    def aggregate_time(self, variable, column='subannual', value='year',
                       components=None, method='sum', append=False):
        """Aggregate a timeseries over a subannual time resolution

        Parameters
        ----------
        variable: str or list of str
            variable(s) to be aggregated
        column: str, default 'subannual'
            the data column to be used as subannual time representation
        value: str, default 'year'
            the name of the aggregated (subannual) time
        components: list of str
            subannual timeslices to be aggregated; defaults to all subannual
            timeslices other than ``value``
        method: func or str, default 'sum'
            method to use for aggregation, e.g. np.mean, np.sum, 'min', 'max'
        append: bool, default False
            append the aggregate timeseries to `self` and return None,
            else return aggregate timeseries as new :class:`IamDataFrame`
        """
        _df = _aggregate_time(self, variable, column=column, value=value,
                              components=components, method=method)

        # return None if there is nothing to aggregate
        if _df is None:
            return None

        # else, append to `self` or return as `IamDataFrame`
        if append is True:
            self.append(_df, inplace=True)
        else:
            df = IamDataFrame(_df)
            df.meta = self.meta.loc[_make_index(df.data)]
            return df

    def downscale_region(self, variable, proxy, region='World',
                         subregions=None, append=False):
        """Downscale a timeseries to a number of subregions
@@ -985,7 +1021,7 @@ def downscale_region(self, variable, proxy, region='World',
            list of subregions, defaults to all regions other than `region`
        append: bool, default False
            append the downscaled timeseries to `self` and return None,
            else return downscaled data as new `IamDataFrame`
            else return downscaled data as new :class:`IamDataFrame`
        """
        # get default subregions if not specified
        subregions = subregions or self._all_other_regions(region)
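A short usage sketch (illustrative, not from this commit) of the two return modes documented in the `aggregate_time()` docstring above, again using the tutorial snapshot:

import pyam

df = pyam.IamDataFrame(data='tutorial_data_subannual_time.csv')

# default: returns the aggregate as a new IamDataFrame, with `meta` copied over
annual = df.aggregate_time('Primary Energy')

# append=True: adds the aggregate to `df` in place and returns None
df.aggregate_time('Primary Energy', append=True)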
1 change: 1 addition & 0 deletions pyam/utils.py
@@ -130,6 +130,7 @@ def read_file(fname, *args, **kwargs):
def format_data(df, **kwargs):
    """Convert a `pd.Dataframe` or `pd.Series` to the required format"""
    if isinstance(df, pd.Series):
        df.name = df.name or 'value'
        df = df.to_frame()

    # Check for R-style year columns, converting where necessary
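The one-line guard added to `format_data()` matters because an unnamed `pd.Series` would otherwise produce a column named `0` after `to_frame()`. A minimal sketch of the effect:

import pandas as pd

s = pd.Series([1.0, 2.0])    # `name` is None
s.name = s.name or 'value'   # the guard added above
print(s.to_frame().columns)  # Index(['value'], dtype='object')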
39 changes: 28 additions & 11 deletions tests/conftest.py
@@ -16,12 +16,20 @@
TEST_DATA_DIR = os.path.join(here, 'data')


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


TEST_DF = pd.DataFrame([
    ['model_a', 'scen_a', 'World', 'Primary Energy', 'EJ/y', 1, 6.],
    ['model_a', 'scen_a', 'World', 'Primary Energy|Coal', 'EJ/y', 0.5, 3],
    ['model_a', 'scen_b', 'World', 'Primary Energy', 'EJ/y', 2, 7],
],
    columns=IAMC_IDX + [2005, 2010],
    columns=IAMC_IDX + TEST_YEARS,
)


@@ -52,7 +60,7 @@
    ['reg_a', 'Population', 'm', 2, 3],
    ['reg_b', 'Population', 'm', 1, 2],
],
    columns=['region', 'variable', 'unit', 2005, 2010],
    columns=['region', 'variable', 'unit'] + TEST_YEARS,
)


@@ -64,7 +72,7 @@
    ['MESSAGE-GLOBIOM', 'a_scenario', 'AFR', 'Primary Energy', 'EJ/y', 2, 7],
    ['MESSAGE-GLOBIOM', 'a_scenario', 'World', 'Primary Energy', 'EJ/y', 3, 13],
],
    columns=IAMC_IDX + [2005, 2010],
    columns=IAMC_IDX + TEST_YEARS,
)


@@ -84,14 +92,6 @@
TEST_STACKPLOT_DF['scenario'] = 'a_scen'


TEST_YEARS = [2005, 2010]
TEST_DTS = [datetime(2005, 6, 17), datetime(2010, 7, 21)]
TEST_TIME_STR = ['2005-06-17', '2010-07-21']
TEST_TIME_STR_HR = ['2005-06-17 00:00:00', '2010-07-21 12:00:00']

DTS_MAPPING = {2005: TEST_DTS[0], 2010: TEST_DTS[1]}


# minimal IamDataFrame with four different time formats
@pytest.fixture(
    scope="function",
@@ -137,6 +137,23 @@ def simple_df(request):
    yield IamDataFrame(model='model_a', scenario='scen_a', data=_df)


# IamDataFrame with subannual time resolution
@pytest.fixture(scope="function")
def subannual_df():
    _df = FULL_FEATURE_DF.iloc[0:6].copy()

    def add_subannual(_data, name, value):
        _data['subannual'] = name
        _data[TEST_YEARS] = _data[TEST_YEARS] * value
        return _data

    # primary energy is a direct sum across sub-annual timeslices
    mapping = [('year', 1), ('winter', 0.7), ('summer', 0.3)]
    lst = [add_subannual(_df.copy(), name, value) for name, value in mapping]

    yield IamDataFrame(model='model_a', scenario='scen_a', data=pd.concat(lst))


@pytest.fixture(scope="function")
def reg_df():
    df = IamDataFrame(data=REG_DF)
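A hypothetical test (not part of this commit; the test name and assertion are assumptions) sketching how the new `subannual_df` fixture could be exercised — summing the `winter` (0.7) and `summer` (0.3) timeslices should recover the `year` values up to floating-point noise:

import pandas as pd

def test_aggregate_time_fixture(subannual_df):
    obs = subannual_df.aggregate_time('Primary Energy').timeseries()
    exp = subannual_df.filter(variable='Primary Energy',
                              subannual='year').timeseries()
    # check_like ignores row/column order; default tolerance absorbs float noise
    pd.testing.assert_frame_equal(obs, exp, check_like=True)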
