# Comparison between ClimateTRACE and C40 inventories

This analysis compares city-aggregated estimates from [ClimateTRACE](https://climatetrace.org/) to estimates from [GPC](https://ghgprotocol.org/ghg-protocol-cities)-compliant C40 city inventories, downloaded from the [C40 Knowledge Hub dashboard](https://www.c40knowledgehub.org/s/article/C40-cities-greenhouse-gas-emissions-interactive-dashboard?language=en_US).

I am not sure whether the C40 inventories are high quality, so treat them as a reference for comparison rather than as ground truth. Comparing against downscaled observations would not be a fair comparison.

In [1]:
import fnmatch
import json
import os
import tarfile

import pandas as pd
import requests
from sqlalchemy import create_engine, MetaData, text
from sqlalchemy.orm import sessionmaker
from tqdm import tqdm

In [2]:
from utils import (
    get_c40_data, 
    filter_out_notation_keys,
    climatetrace_file_names,
    load_climatetrace_file,
    point_to_lat_lon,
    lat_lon_to_locode
)

## Read raw C40 data

**Units**: metric tonnes CO2-eq. (I am assuming these are the units, since the inventories should follow the GPC.)

In [3]:
df_c40_raw = get_c40_data()

### Filter C40

In [4]:
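# GPC reference number II.1.1: on-road transportation, emissions from
# fuel combustion within the city boundary (scope 1).
refnos = ['II.1.1']
columns = ['city', 'locode', 'year'] + refnos

# Drop rows that report a notation key instead of a number for these refnos.
df_tmp = filter_out_notation_keys(df_c40_raw, refnos)
df_c40 = (
    df_tmp
    .loc[:, columns]
    .rename(columns={'II.1.1': 'emissions_c40'})
)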
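
`filter_out_notation_keys` comes from the local `utils` module and its implementation is not shown here. The sketch below is only a guess at the kind of filtering it performs: it assumes that cells holding a GPC notation key (`IE`, `NE`, `NO`, `C`) should be treated as missing and that rows without a numeric value are dropped. The function name `drop_notation_keys` and the toy data are hypothetical.

```python
import pandas as pd

# GPC notation keys: IE (included elsewhere), NE (not estimated),
# NO (not occurring), C (confidential).
NOTATION_KEYS = {"IE", "NE", "NO", "C"}


def drop_notation_keys(df: pd.DataFrame, refnos: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in for utils.filter_out_notation_keys (assumed behavior)."""
    out = df.copy()
    for col in refnos:
        # Blank out notation keys, then coerce what is left to numbers.
        values = out[col].where(~out[col].isin(NOTATION_KEYS))
        out[col] = pd.to_numeric(values, errors="coerce")
    # Keep only rows with a numeric value in every requested column.
    return out.dropna(subset=refnos).reset_index(drop=True)


# Toy data, not taken from the C40 download.
toy = pd.DataFrame({
    "city": ["City A", "City B", "City C"],
    "year": [2020, 2020, 2020],
    "II.1.1": ["1000.5", "NE", "IE"],
})
clean = drop_notation_keys(toy, ["II.1.1"])
print(clean)
```

If the real helper behaves differently (for example, by keeping rows and filling zeros), the number of cities available for the comparison will differ as well.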

## Read ClimateTRACE

**Units**: tonnes of CO2-eq (100-year GWP), since only the `co2e_100yr` rows are kept.

In [22]:
asset_file = './transportation/asset_road-transportation_emissions.csv'
df_ct_raw = load_climatetrace_file(asset_file)

# Keep only the CO2-equivalent (100-year GWP) rows.
filt = (df_ct_raw['gas'] == 'co2e_100yr')
df_data = df_ct_raw.loc[filt]

df_ct = (
    df_data
    # The callable passed to assign receives the whole DataFrame.
    .assign(year=lambda df: pd.to_datetime(df['start_time']).dt.year)
    # Note: emissions_factor_units describes the emissions factor, not emissions_quantity.
    .loc[:, ['asset_name', 'year', 'emissions_quantity', 'emissions_factor_units']]
    .rename(columns={'emissions_quantity': 'emissions_ct', 'asset_name': 'city'})
)
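
The filtering and year extraction above can be checked on a small toy frame before running it on the full asset file. This is a minimal sketch: the column names match the ones used above, but the timestamp format and all values are assumptions made for illustration. It also checks that each (asset, year) pair appears only once, which the later merge on `year` and `city` implicitly relies on.

```python
import pandas as pd

# Toy rows; the ISO timestamp format is an assumption about the file.
raw = pd.DataFrame({
    "asset_name": ["City A", "City A", "City A"],
    "gas": ["co2e_100yr", "co2", "co2e_100yr"],
    "start_time": [
        "2020-01-01T00:00:00Z",
        "2021-01-01T00:00:00Z",
        "2021-01-01T00:00:00Z",
    ],
    "emissions_quantity": [1.0, 2.0, 3.0],
})

# Keep the CO2-equivalent rows and derive the calendar year.
co2e = raw.loc[raw["gas"] == "co2e_100yr"]
tidy = co2e.assign(year=lambda df: pd.to_datetime(df["start_time"]).dt.year)

# True if any asset has more than one row for the same year.
has_duplicates = tidy.duplicated(subset=["asset_name", "year"]).any()
print(tidy[["asset_name", "year", "emissions_quantity"]])
print("duplicate asset-year rows:", has_duplicates)
```

If duplicates do show up in the real data, the rows would need to be aggregated (for example, summed per asset and year) before merging.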

## Comparison

In [30]:
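# Inner join: keep only city-year pairs present in both datasets.
df_int = pd.merge(df_ct, df_c40, on=['year', 'city'], how='inner')

# Difference and percent error, taking the C40 inventory as the reference.
df_int['diff'] = df_int['emissions_ct'] - df_int['emissions_c40']
df_int['percent_error'] = (df_int['diff'] / df_int['emissions_c40']) * 100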

In [31]:
df_int

| | city | year | emissions_ct | emissions_factor_units | locode | emissions_c40 | diff | percent_error |
|---|---|---|---|---|---|---|---|---|
| 0 | Istanbul | 2021 | 7124044.0 | average_tonnes_gas_per_vehicle_km_traveled | TR IST | 14147990.0 | -7023946.0 | -49.646248 |
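
Only one matched row (Istanbul, 2021) appears in the output above. One way to see which city-year pairs fall out of the inner join is an outer merge with `indicator=True`. The sketch below uses toy frames: the Istanbul values are copied from the output above, while `Exampleville` and its numbers are made up to show a name mismatch. Whether name mismatches explain the single match in the real data is an assumption I have not verified.

```python
import pandas as pd

# Toy frames; only the Istanbul values come from the output above.
df_ct_toy = pd.DataFrame({
    "city": ["Istanbul", "Exampleville"],
    "year": [2021, 2021],
    "emissions_ct": [7124044.0, 100.0],
})
df_c40_toy = pd.DataFrame({
    "city": ["Istanbul", "exampleville"],  # deliberate case mismatch
    "year": [2021, 2021],
    "emissions_c40": [14147990.0, 120.0],
})

# The outer join keeps every row; _merge records where each row came from.
merged = pd.merge(
    df_ct_toy, df_c40_toy, on=["year", "city"], how="outer", indicator=True
)
unmatched = merged.loc[merged["_merge"] != "both", ["city", "year", "_merge"]]
print(unmatched)

# Recompute the comparison columns for the matched rows only.
matched = merged.loc[merged["_merge"] == "both"].copy()
matched["diff"] = matched["emissions_ct"] - matched["emissions_c40"]
matched["percent_error"] = matched["diff"] / matched["emissions_c40"] * 100
print(matched[["city", "year", "diff", "percent_error"]])
```

Rows flagged `left_only` exist only in the ClimateTRACE frame and rows flagged `right_only` exist only in the C40 frame. Matching on `locode` instead of the city name may be more robust, but that requires a locode for each ClimateTRACE asset.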
