# Algebraic operations on timeseries data

The **pyam** package offers many tools to facilitate processing of scenario data.
In this notebook, we illustrate algebraic operations on the timeseries data of an **IamDataFrame**:
addition, subtraction, multiplication, and division.

The algebraic operations are (by default) "unit-aware", meaning that **pyam** tries to handle units correctly.
This is implemented via the [iam-units](https://github.com/IAMconsortium/units) package,
an extension of the [pint](https://pint.readthedocs.io) package.

The **pint** package natively handles conversion of standard (SI) units and commonly used equivalents 
(e.g., exajoule to terawatt-hours, *EJ -> TWh*), and it can parse combined units
(e.g., exajoule per year, *EJ/yr*).
To better support common use cases in energy systems analysis and integrated-assessment scenarios,
**pyam** uses the **iam-units** registry (see [IAMconsortium/units](https://github.com/IAMconsortium/units))
as its default [pint.UnitRegistry](https://pint.readthedocs.io/en/stable/developers_reference.html#pint.UnitRegistry).
This registry extends the **pint** defaults with a wide range of conversion factors commonly used in that domain.
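
As a quick sanity check of the *EJ -> TWh* example, the conversion factor can be derived by hand from the SI prefixes. The sketch below uses plain Python rather than **pint**; the helper name `ej_to_twh` is hypothetical and only serves this illustration.

```python
# Plain-Python check of the EJ -> TWh conversion (no pint involved).
JOULE_PER_EJ = 1e18          # exa = 10^18
JOULE_PER_TWH = 1e12 * 3600  # tera = 10^12, one watt-hour = 3600 joule


def ej_to_twh(value_ej):
    """Convert a value in exajoule to terawatt-hours (hypothetical helper)."""
    return value_ej * JOULE_PER_EJ / JOULE_PER_TWH


print(round(ej_to_twh(1), 2))  # prints 277.78
```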

## Overview

0. Import data from file and inspect the scenario
1. A simple subtraction
2. Multiplying timeseries data with scalars
3. Calculating shares and dealing with units
4. Overriding unit handling
5. Working on other dimensions of timeseries data

<div class="alert alert-info">

**See Also**

The **pyam** package also supports aggregation and downscaling
along the sectoral and regional dimensions including consistency checks.
See the [aggregation/downscaling tutorial notebook](https://pyam-iamc.readthedocs.io/en/stable/tutorials/aggregating_downscaling_consistency.html)
for more information.

</div>

In [None]:
import pandas as pd
import pyam

## 0. Import data from file and inspect the scenario

The stylized scenario used in this tutorial has data for two regions (`reg_a` and `reg_b`) as well as the `World` aggregate, and four categories of variables: primary energy demand, emissions, carbon price, and population.

In [None]:
df = pyam.IamDataFrame(data='tutorial_data_aggregating_downscaling.csv')
df

In [None]:
df.variable

## 1. A simple subtraction

We first display the existing variables *Primary Energy* and *Primary Energy|Coal*.

In [None]:
df.filter(variable=["Primary Energy", "Primary Energy|Coal"]).timeseries()

Now, we subtract fossil fuels (coal) from the total to see non-fossil energy use, and display the timeseries in wide format.

All methods for algebraic operations follow the same syntax:

```
df.<method>(a, b, c) => a <op> b = c
```

Note that in simple cases, **pyam** will try to keep the unit consistent during the operation.

In [None]:
(
    df.subtract("Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil")
    .timeseries()
)
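
To make the `a <op> b = c` convention concrete, here is a toy sketch on plain dictionaries. It is not how **pyam** is implemented; the function `apply_op` and the numbers are made up for illustration, and units are ignored entirely.

```python
import operator

# Toy data: one dict of {year: value} per variable (values are made up).
data = {
    "Primary Energy": {2005: 10.0, 2010: 12.0},
    "Primary Energy|Coal": {2005: 4.0, 2010: 5.0},
}


def apply_op(data, op, a, b, name):
    """Compute `a <op> b` year by year and return it under the key `name`."""
    years = sorted(set(data[a]) & set(data[b]))  # only years present in both
    return {name: {year: op(data[a][year], data[b][year]) for year in years}}


result = apply_op(
    data, operator.sub, "Primary Energy", "Primary Energy|Coal",
    "Primary Energy|Non-Fossil",
)
print(result)  # {'Primary Energy|Non-Fossil': {2005: 6.0, 2010: 7.0}}
```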

We can also merge newly computed timeseries directly into the original **IamDataFrame** using the keyword argument ``append=True``.

The new variable *Primary Energy|Non-Fossil* is then part of the variable list.

In [None]:
(
    df.subtract("Primary Energy", "Primary Energy|Coal", "Primary Energy|Non-Fossil",
                append=True)
)

In [None]:
df.variable

## 2. Multiplying timeseries data with scalars

The algebraic operations work not only on items in the **IamDataFrame**; you can also pass scalars.

You will see that in more elaborate computations, **pyam** may change the notation of the units.
In the example below, *EJ/yr* is changed to *EJ / a*.
This is due to how the **pint** package works internally.

In [None]:
df.multiply("Primary Energy", 3, "PE * 3").timeseries()

You can also define a [pint.Quantity](https://pint.readthedocs.io/en/stable/developers_reference.html#pint.Quantity) from the **iam-units** registry
and use this in the calculation. Note that **pyam** will (try to) correctly reduce the fraction.

In [None]:
from iam_units import registry

q = registry.Quantity(3, "t / EJ")
df.multiply("Primary Energy", q, "custom variable").timeseries()
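
The idea of "reducing the fraction" can be sketched without **pint** by tracking each unit as a mapping from base unit to exponent. This is only a conceptual toy under that assumption, not the algorithm **pint** uses; `multiply_units` is a hypothetical helper.

```python
def multiply_units(left, right):
    """Multiply two units given as {base unit: exponent} dicts."""
    combined = dict(left)
    for unit, exponent in right.items():
        combined[unit] = combined.get(unit, 0) + exponent
    # Drop units whose exponents cancel to zero ("reducing the fraction").
    return {unit: exp for unit, exp in combined.items() if exp != 0}


ej_per_year = {"EJ": 1, "a": -1}  # EJ / a
t_per_ej = {"t": 1, "EJ": -1}     # t / EJ
print(multiply_units(ej_per_year, t_per_ej))
```

The *EJ* terms cancel, leaving one power of *t* and minus one power of *a*, i.e. *t / a*.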

## 3. Calculating shares and dealing with units

As a next step, we calculate the primary energy use per capita.

In [None]:
(
    df.divide("Primary Energy", "Population", "Energy/Capita")
    .timeseries()
)

As illustrated above, the notation of the units may be changed during the computation.

If you do not like the returned units, you can change them using the [rename()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.rename) function.

In [None]:
(
    df.divide("Primary Energy", "Population", "Energy/Capita")
    .rename(unit={"EJ / a / million": "EJ/yr/million"})
    .timeseries()
)

Alternatively, you can use the [convert_unit()](https://pyam-iamc.readthedocs.io/en/stable/api/iamdataframe.html#pyam.IamDataFrame.convert_unit) function;
see the [unit conversion tutorial notebook](https://pyam-iamc.readthedocs.io/en/stable/tutorials/unit_conversion.html) for more information.

In [None]:
(
    df.divide("Primary Energy", "Population", "Energy/Capita")
    .convert_unit("EJ / a / million", "GWh/yr")
    .timeseries()
)

## 4. Overriding unit handling

Even though **pint** is quite powerful, it does not always work as expected.
For example, *Mt CO2* is (strictly speaking) not a unit, but a species indicator *CO2* combined with a unit.

As a result, computing the emissions per capita with the default unit handling will raise a [pint.UndefinedUnitError](https://pint.readthedocs.io/en/stable/developers_reference.html#pint.errors.UndefinedUnitError).

We can override this behavior by setting ``ignore_units=True``; in this case, the unit of the returned timeseries data will be set to *unknown*.

In [None]:
(
    df.divide("Emissions|CO2", "Population", "Emissions/Capita",
              ignore_units=True)
    .timeseries()
)

You can also pass a string as the ``ignore_units`` keyword argument. This string is then used as the unit of the returned timeseries data.

Because emissions are given in *Mt CO2* and population is given in *million*, we know that the returned per-capita value is in *tons of CO2*.

In [None]:
(
    df.divide("Emissions|CO2", "Population", "Emissions/Capita",
              ignore_units="t CO2")
    .timeseries()
)
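
The unit reasoning above can be checked with a few lines of plain arithmetic: the factor of one million in *Mt* cancels against the factor of one million in the population unit. The helper name and the numbers below are made up for illustration.

```python
TONNES_PER_MT = 1e6        # mega = 10^6
PERSONS_PER_MILLION = 1e6


def per_capita_in_tonnes(emissions_mt, population_million):
    """Return emissions per person in tonnes (hypothetical helper)."""
    tonnes = emissions_mt * TONNES_PER_MT
    persons = population_million * PERSONS_PER_MILLION
    return tonnes / persons


# Made-up numbers: 30,000 Mt CO2 shared among 6,000 million people.
print(per_capita_in_tonnes(30000, 6000))  # prints 5.0
```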

## 5. Working on other dimensions of timeseries data

By default, algebraic operations in **pyam** will work on the *variable* dimension.
But you can pass an ``axis`` keyword argument to, for example, perform computations
between scenarios or regions.
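
Conceptually, operating along another axis means that rows are matched on all remaining dimensions and only the label on the chosen axis is replaced. The following toy sketch illustrates this for the *region* dimension; it is not **pyam** code, and `subtract_along`, `DIMENSIONS`, and the values are assumptions made for this example.

```python
# Toy records keyed by (region, variable, year); the values are made up.
records = {
    ("reg_a", "Primary Energy", 2005): 6.0,
    ("reg_b", "Primary Energy", 2005): 4.0,
    ("reg_a", "Primary Energy", 2010): 7.0,
    ("reg_b", "Primary Energy", 2010): 5.0,
}
DIMENSIONS = ("region", "variable", "year")


def subtract_along(records, axis, a, b, name):
    """Compute `a - b` along `axis`, aligning rows on all other dimensions."""
    pos = DIMENSIONS.index(axis)
    result = {}
    for key, value in records.items():
        if key[pos] != a:
            continue
        # Find the row that differs from `key` only by having `b` on the axis.
        partner = key[:pos] + (b,) + key[pos + 1:]
        if partner in records:
            new_key = key[:pos] + (name,) + key[pos + 1:]
            result[new_key] = value - records[partner]
    return result


diff = subtract_along(records, "region", "reg_a", "reg_b", "reg_a - reg_b")
print(diff)
```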

Try it!