# Summarising the data fitting process in 4 cells

This notebook is a minimal working example of the correlation framework. It is kept short to illustrate how the framework can be extended to other data sources.

The correlation framework follows 4 steps:

+ data extraction;
+ data recombination;
+ correlation;
+ and prediction.

Additional info:

+ data preparation: datasets are in [`data/processed`](data/processed) and are generated by the notebooks in the [`data/`](data/) folder;
+ visualisation: [`data_viz`](data_viz.ipynb);
+ in-depth correlation: [`correlate`](correlate.ipynb).

In [1]:
# Single import needed
import brsflufight_nerc2 as bff2

## Data extraction

In [2]:
data_sets = bff2.load_data_files()

country = 'United Kingdom'
d_uk = data_sets.get_country(country)

A loading function must be defined in `data_access.default_file_read_functions`.
Success: historical_GHG_Sectors_GCP
Success: historical_GHG_Sectors_PIK
Success: historical_GHG_Sectors_UNFCCC
Success: mobility_apple
Failed: United Kingdom not in mobility_citymapper
Success: mobility_google
Success: uk_energy_daily
Success: uk_energy_demand_reduction
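
The first line of the output above says that each data file needs a loading function defined in `data_access.default_file_read_functions`. The structure of that registry is not shown in this notebook, so the sketch below only assumes that it behaves like a dictionary mapping a dataset name to a reader function. The names `read_functions`, `read_my_source` and `my_source` are hypothetical and used for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for `data_access.default_file_read_functions`;
# the real registry may be structured differently.
read_functions = {}


def read_my_source(file_or_buffer):
    """Read a CSV with a `date` column into a date-indexed DataFrame."""
    return pd.read_csv(file_or_buffer, parse_dates=["date"], index_col="date")


# Register the reader under the (hypothetical) dataset name.
read_functions["my_source"] = read_my_source

# Usage with an in-memory CSV instead of a file from data/processed.
csv_text = "date,demand\n2020-01-01,100.0\n2020-01-02,95.5\n"
df = read_functions["my_source"](io.StringIO(csv_text))
```

Keeping one reader per dataset name would let a new data source be added without touching the rest of the pipeline, which is the extensibility this example aims to illustrate.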


## Data recombination

In [3]:
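# Aggregate the daily energy data to yearly values; the test function
# accepts only years with more than 350 daily records.
d_uk['uk_energy_yearly'] = bff2.summarise_to_freq(
    d_uk['uk_energy_daily'],
    freq='1Y',
    fun_test_data=lambda gr: gr.count()["dayofyear"] > 350,
)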
d_uk['uk_energy_demand_reduction_yearly'] = bff2.summarise_to_freq(
    d_uk['uk_energy_demand_reduction'], freq='1Y', fun_test_data=None
)

d_uk['historical_GHG_Sectors_GCP']['Gas and Coal (CO2)'] = \
    d_uk['historical_GHG_Sectors_GCP']['Gas (CO2)'] \
    + d_uk['historical_GHG_Sectors_GCP']['Coal (CO2)']
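
The internals of `bff2.summarise_to_freq` are not shown here. As a rough illustration of the idea, the sketch below groups a date-indexed DataFrame by calendar year, drops the years that fail a completeness test, and aggregates the rest. The function name `summarise_yearly` is hypothetical, and the choice of the mean as the aggregate is an assumption; the real function may aggregate differently.

```python
import numpy as np
import pandas as pd


def summarise_yearly(df, fun_test_data=None):
    """Average a date-indexed DataFrame per calendar year.

    Years for which `fun_test_data(group)` is false are dropped first.
    The mean is an assumed aggregate, chosen for illustration.
    """
    if fun_test_data is not None:
        # Keep only the rows belonging to years that pass the test.
        df = df.groupby(df.index.year).filter(fun_test_data)
    return df.groupby(df.index.year).mean()


# Synthetic daily data: a complete 2018 and only half of 2019.
days = pd.date_range("2018-01-01", "2019-06-30", freq="D")
daily = pd.DataFrame({"demand": np.arange(len(days), dtype=float)}, index=days)
daily["dayofyear"] = daily.index.dayofyear

yearly = summarise_yearly(
    daily,
    fun_test_data=lambda gr: gr.count()["dayofyear"] > 350,
)
```

With this synthetic input only 2018 survives, because 2019 has fewer than 350 daily records.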


## Correlation

In [4]:
to_correlate = {
    'uk_energy_yearly': ['oil', 'ccgt', 'coal', 'demand'],  # Carbon power generation
    'historical_GHG_Sectors_GCP': [  # Select only CO2 data not other gases
        s for s in d_uk['historical_GHG_Sectors_GCP'].columns if "(CO2)" in s
    ],
    'historical_GHG_Sectors_PIK': [  # Select Energy data not other sectors
        'Energy (KYOTOGHG)', 'Energy (CO2)', 'Energy (N2O)', 'Energy (CH4)'
    ],
}

correlation_dict = bff2.correlate(
    selector=to_correlate,  # datasets defined above
    data_sets=d_uk,  # UK data
    main_compare='uk_energy_yearly',  # reference dataset, the others are correlated to it
)

bff2.display_correlations(correlation_dict, display_fun=display)

Linear regression failed for: 'Bunkers (CO2)' fit to  'oil':
  Message: Input contains NaN, infinity or a value too large for dtype('float64').
Linear regression failed for: 'Bunkers (CO2)' fit to  'ccgt':
  Message: Input contains NaN, infinity or a value too large for dtype('float64').
Linear regression failed for: 'Bunkers (CO2)' fit to  'coal':
  Message: Input contains NaN, infinity or a value too large for dtype('float64').
Linear regression failed for: 'Bunkers (CO2)' fit to  'demand':
  Message: Input contains NaN, infinity or a value too large for dtype('float64').
Pearson correlation coefficients in dataset 'historical_GHG_Sectors_GCP'
	 on data from 2012 to 2018 (inclusive)
_________________________________________________________


| | Bunkers (CO2) | Cement (CO2) | Coal (CO2) | Gas (CO2) | Gas flaring (CO2) | Oil (CO2) | Total fossil fuels and cement (CO2) | Gas and Coal (CO2) |
|---|---|---|---|---|---|---|---|---|
| oil | | -0.845 | 0.883 | -0.613 | -0.7 | -0.883 | 0.883 | 0.883 |
| ccgt | | 0.739 | -0.786 | 0.643 | 0.487 | 0.857 | -0.786 | -0.786 |
| coal | | -0.703 | 1.0 | -0.571 | -0.775 | -0.786 | 1.0 | 1.0 |
| demand | | -0.703 | 1.0 | -0.571 | -0.775 | -0.786 | 1.0 | 1.0 |



Pearson correlation coefficients in dataset 'historical_GHG_Sectors_PIK'
	 on data from 2012 to 2017 (inclusive)
_________________________________________________________


| | Energy (KYOTOGHG) | Energy (CO2) | Energy (N2O) | Energy (CH4) |
|---|---|---|---|---|
| oil | 0.986 | 0.986 | 0.812 | 0.986 |
| ccgt | -0.829 | -0.829 | -0.829 | -0.886 |
| coal | 1.0 | 1.0 | 0.771 | 0.943 |
| demand | 1.0 | 1.0 | 0.771 | 0.943 |
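
To make the layout of these tables concrete, the sketch below computes Pearson coefficients between every column of a reference dataset (rows) and every column of another dataset (columns), using only the years present in both. It is not the implementation of `bff2.correlate`; the helper name `correlate_to_reference` is hypothetical and all numbers are synthetic, not project data. The all-NaN column shows how a column without usable values ends up as an empty entry, as `Bunkers (CO2)` does above.

```python
import numpy as np
import pandas as pd


def correlate_to_reference(reference, other):
    """Pearson coefficient of each `reference` column against each `other` column.

    Only index values present in both frames are used.
    """
    common = reference.index.intersection(other.index)
    ref, oth = reference.loc[common], other.loc[common]
    # corrwith returns one coefficient per reference column.
    return pd.DataFrame({col: ref.corrwith(oth[col]) for col in oth.columns})


# Synthetic yearly data (illustrative values only).
reference = pd.DataFrame(
    {
        "coal": [40.0, 35.0, 30.0, 22.0, 15.0, 9.0, 5.0],
        "ccgt": [20.0, 24.0, 27.0, 30.0, 33.0, 38.0, 40.0],
    },
    index=range(2012, 2019),
)
other = pd.DataFrame(
    {
        # From 2012 onwards this column equals 2 * coal + 10.
        "Energy (CO2)": [120.0, 110.0, 90.0, 80.0, 70.0, 54.0, 40.0, 28.0],
        "Bunkers (CO2)": [np.nan] * 8,
    },
    index=range(2010, 2018),
)

table = correlate_to_reference(reference, other)
```

Here the overlap is 2012 to 2017, so `coal` against `Energy (CO2)` is perfectly correlated by construction, while every `Bunkers (CO2)` entry is NaN.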





## Prediction

In [5]:
prediction, change_corona = bff2.predict_correlation_model(
    d_uk['uk_energy_demand_reduction_yearly'],
    correlation_dict['historical_GHG_Sectors_PIK'],
)

display(change_corona)

| date | predictor_value | Energy (KYOTOGHG) | Energy (CO2) | Energy (N2O) | Energy (CH4) | quantity |
|---|---|---|---|---|---|---|
| 2020-01-01 | -3741.930094 | -76.637021 | -74.360175 | -0.094797 | -1.903151 | absolute difference |
| 2020-01-01 | -0.117657 | -0.199307 | -0.198163 | -0.037639 | -0.273903 | relative difference |
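
The messages in the correlation step indicate that the framework fits linear regressions, but the internals of `bff2.predict_correlation_model` are not shown. The sketch below is one plausible reading, under the assumption that a straight line is fitted to historical data and then evaluated at a baseline and at a reduced predictor value. The function name `predict_change` and all numbers are hypothetical.

```python
import numpy as np


def predict_change(x_hist, y_hist, x_baseline, x_new):
    """Fit y = slope * x + intercept and compare predictions at two x values.

    Returns the absolute and the relative difference of the prediction.
    """
    slope, intercept = np.polyfit(x_hist, y_hist, deg=1)
    y_baseline = slope * x_baseline + intercept
    y_new = slope * x_new + intercept
    absolute = y_new - y_baseline
    relative = absolute / y_baseline
    return absolute, relative


# Synthetic history lying exactly on y = 2 * x + 5.
x_hist = np.array([10.0, 20.0, 30.0, 40.0])
y_hist = 2.0 * x_hist + 5.0

# A 10 % drop of the predictor, from 40 to 36.
absolute, relative = predict_change(x_hist, y_hist, x_baseline=40.0, x_new=36.0)
```

Under such a model, a non-zero intercept means the relative change of the predicted quantity need not equal the relative change of the predictor, which would be consistent with the differing values in the `relative difference` row above.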
