# Cumulative emissions

This example walks through calculating and visualizing cumulative emissions.

In [None]:
import concurrent.futures
from itertools import cycle
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
import numpy as np
from openclimate import Client
import pandas as pd

We will first initialize a `Client()` object.

In [None]:
client = Client()

If you are using a Jupyter environment, you will first need to run `client.jupyter`.
This patches the `asyncio` library to work in Jupyter environments using [nest-asyncio](https://pypi.org/project/nest-asyncio/).

In [None]:
client.jupyter

## Get country codes 

OpenClimate references each country by its two-letter [ISO-3166](https://www.iso.org/iso-3166-country-codes.html) code.
To access these codes in `openclimate`, we can use the `.parts()` method to get all the "parts" of EARTH. Other codes we use are [UN/LOCODEs](https://unece.org/trade/cefact/unlocode-code-list-country-and-territory) for cities and [LEI](https://www.gleif.org/en/about-lei/introducing-the-legal-entity-identifier-lei) for companies. As a catch-all term, we call each of these identifiers an `actor_id`.

In [None]:
df_country = client.parts('EARTH')

Looking at the dataframe that's returned, we have a column with each country's `actor_id`. 

In [None]:
df_country.head()

Let's save each `actor_id` together with its country name as a list of pairs.

In [None]:
iso_and_name = list(zip(df_country['actor_id'], df_country['name']))
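
A list of `(actor_id, name)` pairs is easy to split back into codes or turn into a lookup table. The sketch below uses a few hand-written pairs in place of the real `iso_and_name` list; the variable `name_by_id` is a hypothetical helper, not part of `openclimate`.

```python
# Hand-written stand-in for the list built from df_country (illustrative only)
iso_and_name = [("CA", "Canada"), ("IT", "Italy"), ("US", "United States of America")]

# Just the codes, in the original order
iso_codes = [actor_id for actor_id, _ in iso_and_name]

# Hypothetical lookup table from actor_id to country name
name_by_id = dict(iso_and_name)

print(iso_codes)
print(name_by_id["IT"])
```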

## Which datasets are available?

To get a list of datasets available for an actor you can use the `.emissions_datasets()` method.
Here I am asking for datasets with Canadian emissions.

In [None]:
client.emissions_datasets('CA')

You can return datasets for multiple actors at once by passing them as an iterable, such as a list or tuple. Here I am asking for Canadian and Italian emission datasets, but only returning a sample of 5 records.

In [None]:
client.emissions_datasets(['CA', 'IT']).sample(5)

## Get emissions

If we just pass an `actor_id` to the `.emissions()` method, all the emissions will be returned.

In [None]:
df_tmp = client.emissions(actor_id='US')
df_tmp.head()

Keep in mind that this will return *all* the data for that actor. Below are the datasets available.

In [None]:
set(df_tmp['datasource_id'])

In most cases, we want to filter this and use a particular dataset. We can do that with the `datasource_id` parameter. 

In [None]:
df_tmp = client.emissions(actor_id='US', datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4')

As a sanity check, let's look at which datasets are returned.

In [None]:
set(df_tmp['datasource_id'])

As you can see, only PRIMAP was returned.

## Get emissions for all countries

Now let's get emissions for all countries

In [None]:
%%time
iso_codes = [iso_code[0] for iso_code in iso_and_name]
df_emissions = client.emissions(
    actor_id=iso_codes, 
    datasource_id='PRIMAP:10.5281/zenodo.7179775:v2.4'
)

Retrieving all of this data takes about 30 seconds, even with `asyncio` working behind the scenes.
The result is one large dataframe with the data from all countries concatenated together.

In [None]:
df_emissions.sample(5)

## Calculate cumulative emissions

Let's first make sure all the datasets have the same starting year.

In [None]:
df_emissions.groupby('actor_id')['year'].min().nunique() == 1
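
One way to compare starting years per actor is to take the earliest year in each group and count the distinct values. The sketch below runs on made-up data; the actor IDs `AA` and `BB` and their years are hypothetical and only illustrate the check.

```python
import pandas as pd

# Made-up data: AA starts in 1750, BB starts in 1800
toy = pd.DataFrame({
    "actor_id": ["AA", "AA", "BB", "BB"],
    "year": [1750, 1751, 1800, 1801],
})

# Earliest year on record for each actor
first_year = toy.groupby("actor_id")["year"].min()

# True only if every actor shares the same first year
same_start = first_year.nunique() == 1

# Actors whose records begin later than the earliest start
late_starters = first_year[first_year > first_year.min()].index.tolist()

print(same_start, late_starters)
```

If the check comes back false, the cumulative totals of late starters cover a shorter period and are not directly comparable with the rest.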

Now we can calculate cumulative emissions

In [None]:
df_out = df_emissions.sort_values(['actor_id', 'year'])
df_out['cumulative_emissions'] = df_out.groupby('actor_id')['total_emissions'].cumsum()

Now we have a column for cumulative emissions

In [None]:
df_out.head()
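
To see what the grouped cumulative sum does, here is a small worked example on made-up numbers. The actors `AA` and `BB` and their emissions are hypothetical. The rows are deliberately out of order to show that sorting by year first matters, since `cumsum` adds values in row order.

```python
import pandas as pd

# Made-up emissions; AA's rows are intentionally not in year order
toy = pd.DataFrame({
    "actor_id": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "year": [2001, 2000, 2002, 2000, 2001, 2002],
    "total_emissions": [20, 10, 30, 1, 2, 3],
})

# Sort so that each actor's rows run forward in time
toy = toy.sort_values(["actor_id", "year"]).reset_index(drop=True)

# Running total within each actor, restarting at every new actor_id
toy["cumulative_emissions"] = toy.groupby("actor_id")["total_emissions"].cumsum()

print(toy)
```

Each actor's final row holds its total over the whole period, which is the value used for ranking.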

## Rank country by cumulative emissions

Now that we know the cumulative emissions, we can rank the countries by their cumulative emissions in the most recent year.

In [None]:
last_year = df_out['year'].max()
df_sorted = (
    df_out.loc[df_out['year'] == last_year, ['actor_id', 'cumulative_emissions', 'year']]
    .sort_values(by='cumulative_emissions', ascending=False)
)

df_sorted['rank'] = df_sorted['cumulative_emissions'].rank(ascending=False)


Here are the top 10 cumulative emitters

In [None]:
pd.merge(df_sorted.loc[df_sorted['rank'] <= 10], df_country[['actor_id', 'name']], on='actor_id')
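
The rank-then-merge pattern can be tried on a tiny made-up table. The actor IDs, names, and values below are hypothetical; only the pandas calls mirror the cells above.

```python
import pandas as pd

# Made-up cumulative totals for the most recent year
latest = pd.DataFrame({
    "actor_id": ["AA", "BB", "CC"],
    "cumulative_emissions": [5.0, 9.0, 7.0],
})

# Made-up lookup table of actor names
names = pd.DataFrame({
    "actor_id": ["AA", "BB", "CC"],
    "name": ["Alpha", "Beta", "Gamma"],
})

# Rank 1 is the largest cumulative emitter
latest["rank"] = latest["cumulative_emissions"].rank(ascending=False).astype(int)

# Keep the top 2 and attach the names
top2 = (
    latest.loc[latest["rank"] <= 2]
    .merge(names, on="actor_id")
    .sort_values("rank")
)

print(top2)
```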

The United States and China are the top two emitters, with the U.S. emitting about 50% more than China over the period from 1750 to 2021.

In [None]:
561240060000 / 375048000000
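
The same arithmetic with named values makes the "about 50% more" claim easier to read. This sketch assumes the two numbers in the cell above are the U.S. and China totals, in that order; the variable names are illustrative.

```python
# Values copied from the cell above (assumed to be the U.S. and China totals)
us_total = 561_240_060_000
cn_total = 375_048_000_000

# How many times larger the first total is than the second
ratio = us_total / cn_total

# The same comparison expressed as "percent more"
percent_more = (ratio - 1) * 100

print(f"ratio: {ratio:.3f}, percent more: {percent_more:.1f}%")
```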

## Plot cumulative emissions

Now that we know the top emitters, we can plot a time series

In [None]:
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111)

# top 8 emitters
top_emitters = list(df_sorted.head(8).actor_id)

# wong color palette (https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40)
colors = ['#000000', '#E69F00', '#56B4E9', '#009E73', '#F0E442', '#0072B2', '#D55E00', '#CC79A7']

for actor_id, color in zip(top_emitters, cycle(colors)):
    actor_name = df_country.loc[df_country['actor_id'] == actor_id, 'name'].values[0]
    filt = df_out['actor_id'] == actor_id
    df_tmp = df_out.loc[filt]

    ax.plot(df_tmp['year'], df_tmp['cumulative_emissions']/10**9, 
            linewidth=4, 
            label = actor_name,
            color=color)

# Axis formatting applies to the whole figure, so it runs once after the loop
ax.set_ylim([0, 600])
ax.set_xlim([1850, 2022])

# Turn off the display of all ticks.
ax.tick_params(which='both',     # Options for both major and minor ticks
               top=False,        # turn off top ticks
               left=False,       # turn off left ticks
               right=False,      # turn off right ticks
               bottom=False)     # turn off bottom ticks

# Keep x tick labels horizontal
plt.setp(ax.get_xticklabels(), rotation=0)

# Hide all four spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# Only show ticks on the left and bottom spines
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')

# major/minor tick lines
ax.xaxis.set_minor_locator(AutoMinorLocator(5))
ax.grid(axis='y',
        which='major',
        color=[0.8, 0.8, 0.8], linestyle='-')

ax.set_ylabel("Cumulative Emissions (GtCO$_2$e)", fontsize=12)
ax.legend(loc='upper left', frameon=False)