# Multiple infillers

Here we investigate two infillers that each combine many cruncher operations. One is designed to break down composite variables, such as the Kyoto gases or HFCs, into their components; the other is designed to infill all required data. These functions are purposefully less object-oriented than the crunchers, so that modellers unfamiliar with that coding structure can use them.

You will need to install silicone from https://github.com/znicholls/silicone/ in order to run this code. 

## Imports

In [1]:
import os.path
import traceback

import pandas as pd
import pyam
import matplotlib.pyplot as plt
import numpy as np

import silicone.multiple_infillers as mi
from silicone.utils import (
    _get_unit_of_variable,
    find_matching_scenarios,
    _make_interpolator,
    _make_wide_db,
    get_sr15_scenarios, 
    return_cases_which_consistently_split,
    convert_units_to_MtCO2_equiv
)



pyam - INFO: Running in a notebook, setting `pyam` logging level to `logging.INFO` and adding stderr handler


In [2]:
SR15_SCENARIOS = "./sr15_scenarios.csv"

## Example data

Here we pull some example data by downloading a selection of the SR1.5 scenarios from the model families listed below. The download is skipped if the file already exists.

In [3]:
valid_model_ids = [
    "MESSAGE*",
    "AIM*",
    "C-ROADS*",
    "GCAM*",
    "IEA*",
    "IMAGE*",
    "MERGE*",
    "POLES*",
    "REMIND*",
    "WITCH*",
]
if not os.path.isfile(SR15_SCENARIOS):
    get_sr15_scenarios(SR15_SCENARIOS, valid_model_ids)

### Starting point

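Our starting point is the test data, loaded as a `pyam.IamDataFrame`. We treat the WITCH scenarios as the data to infill and the scenarios from all other models as the database.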

In [4]:
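sr15_data = pyam.IamDataFrame(SR15_SCENARIOS)
target = "Emissions|CO2"
# "*" is a wildcard, so this matches the sub-variables of Emissions|CO2
constituents = ["Emissions|CO2|*"]
# The WITCH scenarios are the data to infill...
to_infill = sr15_data.filter(model="WITCH*", variable=[target] + constituents)
# ...and all other models form the database
database = sr15_data.filter(model="WITCH*", keep=False)
to_infill.head()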

pyam.utils - INFO: Reading `./sr15_scenarios.csv`


| | model | scenario | region | variable | unit | year | subannual | meta | value |
|---|---|---|---|---|---|---|---|---|---|
| 231183 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CO2 | Mt CO2/yr | 2005 | 0.0 | 0 | 31922.04435 |
| 231184 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CO2 | Mt CO2/yr | 2010 | 0.0 | 0 | 35303.38776 |
| 231185 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CO2 | Mt CO2/yr | 2020 | 0.0 | 0 | 37312.30711 |
| 231186 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CO2 | Mt CO2/yr | 2030 | 0.0 | 0 | 10560.00846 |
| 231187 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CO2 | Mt CO2/yr | 2040 | 0.0 | 0 | 7435.934987 |
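The `*` in `"WITCH*"` and `"Emissions|CO2|*"` acts as a wildcard in pyam's `filter`, and it also matches the `|` separator: the output further down includes `Emissions|CO2|Energy|Supply|Electricity`, three levels below the total. pyam uses its own regex-based matcher; as a rough stand-in for patterns that contain only `*`, Python's `fnmatch.fnmatchcase` behaves the same way. The helper `select` is hypothetical and is meant for intuition only, not as pyam's implementation.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a pattern in which '*' matches any characters."""
    return [name for name in names if fnmatchcase(name, pattern)]


variables = [
    "Emissions|CO2",
    "Emissions|CO2|AFOLU",
    "Emissions|CO2|Energy and Industrial Processes",
    "Emissions|CO2|Energy|Supply|Electricity",
    "Emissions|CH4",
]

# The constituents pattern picks up sub-variables at every depth, but not the total itself
print(select(variables, "Emissions|CO2|*"))
# The model pattern works the same way on model names
print(select(["WITCH-GLOBIOM 3.1", "REMIND-MAgPIE 1.7-3.0"], "WITCH*"))
```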


## Investigating where the data is consistent

The utility function `return_cases_which_consistently_split` indicates the cases in which the target variable consists only of the listed constituent variables. This is useful for working out where data can be split consistently using `DecomposeCollectionTimeDepRatio`. Consistency is not a requirement for using that method (a consistent aggregate value is constructed in any case), but it indicates that the approach is rigorous.

In the first example below, no cases are found, because there are several layers of constituents. With only one layer, however, the function works as expected. Note that if only one layer is used, pyam's built-in `check_aggregate` method has a similar effect.
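To see why several layers of constituents prevent a match, consider a hypothetical scenario in which the total equals the sum of its direct children. Summing every matched sub-variable also adds the nested ones, which are already contained in their parents, so the sum overshoots. The numbers and the helper `sum_below` are illustrative only; this is not how silicone implements the check.

```python
# Illustrative values (Mt CO2/yr) for one hypothetical scenario and year;
# not taken from the SR1.5 data.
values = {
    "Emissions|CO2": 40.0,
    "Emissions|CO2|AFOLU": 4.0,
    "Emissions|CO2|Energy and Industrial Processes": 36.0,
    # Deeper layer: already included in the energy and industry value above
    "Emissions|CO2|Energy|Supply": 20.0,
    "Emissions|CO2|Energy|Demand": 16.0,
}


def sum_below(data, parent, direct_children_only):
    """Sum the variables below `parent`, optionally restricted to one layer."""
    total = 0.0
    for name, value in data.items():
        if not name.startswith(parent + "|"):
            continue
        # Count the '|' separators beyond the first layer below the parent
        extra_depth = name[len(parent) + 1:].count("|")
        if direct_children_only and extra_depth > 0:
            continue
        total += value
    return total


print(sum_below(values, "Emissions|CO2", direct_children_only=False))  # 76.0, overshoots
print(sum_below(values, "Emissions|CO2", direct_children_only=True))   # 40.0, matches
```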

In [5]:
no_cases = return_cases_which_consistently_split(
    to_infill, target, constituents, 
)
len(no_cases)

0

In the case below, however, we select only the next level of constituents and find that the split is consistent in all cases. The number of cases does not depend on the tolerance, as the second cell shows.

In [6]:
all_cases = return_cases_which_consistently_split(
    to_infill, target, ["Emissions|CO2|AFOLU", "Emissions|CO2|Energy and Industrial Processes"], 
)
len(all_cases)

39

In [7]:
all_cases = return_cases_which_consistently_split(
    to_infill, target, ["Emissions|CO2|AFOLU", "Emissions|CO2|Energy and Industrial Processes"],
    how_close={
        "equal_nan": True,
        "rtol": 100,  # This means that we accept a factor of ~100 inaccuracy.
    },
)
len(all_cases)

39
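The keys of `how_close` (`equal_nan`, `rtol`) are keyword arguments of `numpy.isclose`. Assuming silicone forwards them there, a value `a` passes against a reference `b` when `|a - b| <= atol + rtol * |b|`. The scalar sketch below reimplements that test in plain Python to show how much slack `rtol=100` gives. Since every case already passes at the default tolerance, loosening it cannot add more.

```python
import math


def isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Scalar version of the numpy.isclose test; note it is not symmetric in a and b."""
    if math.isnan(a) or math.isnan(b):
        # NaNs only compare equal to each other, and only when equal_nan is set
        return equal_nan and math.isnan(a) and math.isnan(b)
    return abs(a - b) <= atol + rtol * abs(b)


# Sum of constituents vs. reported total (illustrative numbers)
print(isclose(40.00001, 40.0))        # True: within the default tolerance
print(isclose(76.0, 40.0))            # False
print(isclose(76.0, 40.0, rtol=100))  # True: rtol=100 accepts very large discrepancies
print(isclose(float("nan"), float("nan"), equal_nan=True))  # True
```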

## Using the infiller functions
Here we show the use of the `infill_all_required_variables` function and the `DecomposeCollectionTimeDepRatio` class.
### InfillAllRequiredVariables
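This function conveniently infills, in a single call, all of the required variables that are not already present in the data to be infilled. Below, we pass the scenarios to infill, the database from which the relationships are derived, the lead variable(s) (`[target]`, i.e. `Emissions|CO2`) and the years for which we want output.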
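The core idea can be sketched in plain Python: find the required variables that a scenario does not report, then infill each one from the lead variable. The fixed ratios and the helpers `missing_variables` and `infill_missing` are hypothetical stand-ins; silicone instead derives each relationship from the database using one of its crunchers.

```python
def missing_variables(reported, required):
    """Required variables that the scenario does not already report, in order."""
    return [var for var in required if var not in reported]


def infill_missing(scenario, lead, ratios, required, output_timesteps):
    """Add each missing variable as ratio * lead (a stand-in for a real cruncher)."""
    filled = dict(scenario)
    for var in missing_variables(scenario, required):
        filled[var] = {
            year: ratios[var] * scenario[lead][year] for year in output_timesteps
        }
    return filled


# Hypothetical scenario reporting only CO2, and hypothetical lead-to-follower ratios
scenario = {"Emissions|CO2": {2020: 40.0, 2030: 20.0}}
ratios = {"Emissions|CH4": 0.25, "Emissions|BC": 0.125}
required = ["Emissions|CO2", "Emissions|CH4", "Emissions|BC"]

result = infill_missing(scenario, "Emissions|CO2", ratios, required, [2020, 2030])
print(sorted(result))
print(result["Emissions|CH4"])  # {2020: 10.0, 2030: 5.0}
```

Variables the scenario already reports are left untouched, which mirrors the "not already present" behaviour described above.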

In [8]:
database.data["subannual"] = 0
database.data["meta"] = 0
database.tail()

| | model | scenario | region | variable | unit | year | subannual | meta | value |
|---|---|---|---|---|---|---|---|---|---|
| 231123 | REMIND-MAgPIE 1.7-3.0 | SMP_REF_Sust | World | Emissions\|VOC\|Other | Mt VOC/yr | 2060 | 0 | 0 | 23.2311 |
| 231124 | REMIND-MAgPIE 1.7-3.0 | SMP_REF_Sust | World | Emissions\|VOC\|Other | Mt VOC/yr | 2070 | 0 | 0 | 22.1971 |
| 231125 | REMIND-MAgPIE 1.7-3.0 | SMP_REF_Sust | World | Emissions\|VOC\|Other | Mt VOC/yr | 2080 | 0 | 0 | 21.1632 |
| 231126 | REMIND-MAgPIE 1.7-3.0 | SMP_REF_Sust | World | Emissions\|VOC\|Other | Mt VOC/yr | 2090 | 0 | 0 | 20.1292 |
| 231127 | REMIND-MAgPIE 1.7-3.0 | SMP_REF_Sust | World | Emissions\|VOC\|Other | Mt VOC/yr | 2100 | 0 | 0 | 19.0953 |


In [9]:
to_infill.data["subannual"] = 0
to_infill.data["meta"] = 0
to_infill.tail()

| | model | scenario | region | variable | unit | year | subannual | meta | value |
|---|---|---|---|---|---|---|---|---|---|
| 244614 | WITCH-GLOBIOM 4.4 | CD-LINKS_NoPolicy | World | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr | 2060 | 0 | 0 | 29487.271577 |
| 244615 | WITCH-GLOBIOM 4.4 | CD-LINKS_NoPolicy | World | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr | 2070 | 0 | 0 | 31404.870008 |
| 244616 | WITCH-GLOBIOM 4.4 | CD-LINKS_NoPolicy | World | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr | 2080 | 0 | 0 | 31911.146283 |
| 244617 | WITCH-GLOBIOM 4.4 | CD-LINKS_NoPolicy | World | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr | 2090 | 0 | 0 | 30995.012917 |
| 244618 | WITCH-GLOBIOM 4.4 | CD-LINKS_NoPolicy | World | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr | 2100 | 0 | 0 | 29852.45485 |


In [10]:
infilled = mi.infill_all_required_variables(
    to_infill,
    database,
    [target],
    output_timesteps = list(range(2020, 2101, 10))
)
infilled.head()

Filling required variables: 100%|████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.94it/s]
Filling required variables: 100%|██████████████████████████████████████████████████████| 22/22 [00:19<00:00,  1.12it/s]


| | model | scenario | region | variable | unit | year | subannual | meta | value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|BC | Mt BC/yr | 2020 | 0 | 0 | 6.643182 |
| 1 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|BC | Mt BC/yr | 2030 | 0 | 0 | 5.033023 |
| 2 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|BC | Mt BC/yr | 2040 | 0 | 0 | 4.253766 |
| 3 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|BC | Mt BC/yr | 2050 | 0 | 0 | 3.64123 |
| 4 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|BC | Mt BC/yr | 2060 | 0 | 0 | 3.198676 |


We now have a complete scenario, with all required variables. For comparison, the original data to infill contained only the CO2 variables listed below.

In [11]:
to_infill.filter().variables(True)

| | variable | unit |
|---|---|---|
| 0 | Emissions\|CO2 | Mt CO2/yr |
| 1 | Emissions\|CO2\|AFOLU | Mt CO2/yr |
| 2 | Emissions\|CO2\|Energy and Industrial Processes | Mt CO2/yr |
| 5 | Emissions\|CO2\|Energy\|Demand | Mt CO2/yr |
| 8 | Emissions\|CO2\|Energy\|Demand\|Industry | Mt CO2/yr |
| 9 | Emissions\|CO2\|Energy\|Demand\|Residential and Co... | Mt CO2/yr |
| 6 | Emissions\|CO2\|Energy\|Demand\|Transportation | Mt CO2/yr |
| 3 | Emissions\|CO2\|Energy\|Supply | Mt CO2/yr |
| 4 | Emissions\|CO2\|Energy\|Supply\|Electricity | Mt CO2/yr |
| 7 | Emissions\|CO2\|Energy\|Supply\|Liquids | Mt CO2/yr |


### DecomposeCollectionTimeDepRatio
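This class splits an aggregate variable into its known components, using the relationships between the components found in database scenarios that report all of the components (but not necessarily the aggregate). In the first step, the aggregate is calculated in the database as the sum of the components, so the components must share a unit; here we first convert them to Mt CO2-equivalent.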
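A minimal sketch of the idea, assuming that in each year the share of a component is its mean across the database scenarios divided by the mean of the summed aggregate, and that each component is then infilled as that share times the aggregate. Because the mean is linear, the shares in each year sum to one, so the infilled components add back up to the aggregate. The data and the helpers `time_dep_ratios` and `decompose` are hypothetical, and the sketch ignores complications such as negative aggregate values.

```python
from statistics import mean

# Hypothetical database: per scenario, component values by year (all in one unit)
database = {
    "scen_a": {"CO2": {2030: 30.0, 2050: 10.0}, "CH4": {2030: 10.0, 2050: 6.0}},
    "scen_b": {"CO2": {2030: 50.0, 2050: 20.0}, "CH4": {2030: 10.0, 2050: 4.0}},
}
components = ["CO2", "CH4"]


def time_dep_ratios(database, components, years):
    """Share of each component in the summed aggregate, per year, from scenario means."""
    ratios = {comp: {} for comp in components}
    for year in years:
        # Step 1: construct the aggregate in every database scenario and average it
        aggregate_mean = mean(
            sum(scen[comp][year] for comp in components) for scen in database.values()
        )
        # Step 2: ratio of the component mean to the aggregate mean
        for comp in components:
            comp_mean = mean(scen[comp][year] for scen in database.values())
            ratios[comp][year] = comp_mean / aggregate_mean
    return ratios


def decompose(aggregate, ratios):
    """Step 3: split the aggregate time series using the per-year ratios."""
    return {
        comp: {year: share[year] * value for year, value in aggregate.items()}
        for comp, share in ratios.items()
    }


ratios = time_dep_ratios(database, components, [2030, 2050])
split = decompose({2030: 100.0, 2050: 40.0}, ratios)
print(split)
```

Note that the ratios change from year to year (0.8 for CO2 in 2030, 0.75 in 2050 here), which is what makes the method time-dependent.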

In [12]:
# Define some calculation parameters
components = ["Emissions|CO2", "Emissions|CH4", "Emissions|N2O", "Emissions|F-Gases"]
aggregate = "Emissions|Kyoto Gases (AR4-GWP100)"
to_infill = sr15_data.filter(model="WITCH*", variable=aggregate)
unit_consistant_db = convert_units_to_MtCO2_equiv(database.filter(variable=components))
unit_consistant_db.variables(True)

| | variable | unit |
|---|---|---|
| 0 | Emissions\|CH4 | Mt CO2-equiv/yr |
| 1 | Emissions\|CO2 | Mt CO2/yr |
| 2 | Emissions\|F-Gases | Mt CO2-equiv/yr |
| 3 | Emissions\|N2O | Mt CO2-equiv/yr |
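The conversion matters because the aggregate is built by summing the components, which only makes sense in a common unit; `Emissions|CO2` keeps its unit `Mt CO2/yr`, which is already numerically CO2-equivalent. Below is a minimal sketch of a GWP-based conversion, using the IPCC AR4 100-year GWPs (CH4: 25, N2O: 298) to match the `AR4-GWP100` in the aggregate's name. Which metric `convert_units_to_MtCO2_equiv` applies is an assumption worth checking, since a different metric would make the components inconsistent with the aggregate. The unit parsing and the helper `to_mt_co2_equiv` are hypothetical.

```python
# IPCC AR4 100-year global warming potentials (assumed metric, see above)
AR4_GWP100 = {"CO2": 1, "CH4": 25, "N2O": 298}
# Mass prefixes expressed in megatonnes
PREFIX_IN_MT = {"Mt": 1.0, "kt": 1e-3}


def to_mt_co2_equiv(value, unit):
    """Convert a value in a unit like 'kt N2O/yr' to Mt CO2-equiv/yr."""
    prefix, rest = unit.split(" ", 1)  # e.g. "kt", "N2O/yr"
    gas = rest.split("/")[0]           # e.g. "N2O"
    return value * PREFIX_IN_MT[prefix] * AR4_GWP100[gas]


print(to_mt_co2_equiv(10.0, "Mt CH4/yr"))    # 250.0
print(to_mt_co2_equiv(1000.0, "kt N2O/yr"))  # 298.0
```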


In [13]:
decomposer = mi.DecomposeCollectionTimeDepRatio(unit_consistant_db)
results = decomposer.infill_components(aggregate, components, to_infill.filter(year=[2000, 2005], keep=False))
results.head()



| | model | scenario | region | variable | unit | year | subannual | meta | value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CH4 | Mt CO2-equiv/yr | 2010 | 0.0 | 0 | 9661.641271 |
| 1 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CH4 | Mt CO2-equiv/yr | 2015 | 0.0 | 0 | 9742.231406 |
| 2 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CH4 | Mt CO2-equiv/yr | 2020 | 0.0 | 0 | 9586.593085 |
| 3 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CH4 | Mt CO2-equiv/yr | 2030 | 0.0 | 0 | 3449.001415 |
| 4 | WITCH-GLOBIOM 3.1 | SSP1-19 | World | Emissions\|CH4 | Mt CO2-equiv/yr | 2040 | 0.0 | 0 | 2861.02983 |


In [14]:
# We now have variable information for each of the components
results.variables()

0        Emissions|CH4
1        Emissions|CO2
2    Emissions|F-Gases
3        Emissions|N2O
Name: variable, dtype: object