# Prepare Carbon Intensity Data from National Grid ESO

National Grid ESO provides grid electricity carbon intensity data from 2009 to the present at https://data.nationalgrideso.com/carbon-intensity1/historic-generation-mix

"This dataset contains data from the 1st of Jan 2009. It has seasonal decomposition applied to correct missing or irregular data points. The carbon intensity of electricity is a measure of how much Carbon dioxide emissions are produced per kilowatt hour of electricity consumed. The data is subject to change due to a data cleansing process taking place to provide the most accurate figures."

The values shown at https://dashboard.nationalgrideso.com/ and the scale of the figures in this dataset imply that the carbon intensity units are [gCO2/kWh]. The units required here are [kg_CO2/kWh], so the values are divided by 1000 during processing.
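As a minimal sketch of the unit conversion, the snippet below converts a single value from gCO2/kWh to kg_CO2/kWh. The helper name `g_to_kg_per_kwh` and the constant `G_PER_KG` are illustrative only and are not part of the notebook.

```python
# Illustrative helper (hypothetical name): convert gCO2/kWh to kg_CO2/kWh
G_PER_KG = 1000  # grams in one kilogram


def g_to_kg_per_kwh(intensity_g_per_kwh):
    """Return the carbon intensity in kg_CO2/kWh given a value in gCO2/kWh."""
    return intensity_g_per_kwh / G_PER_KG


# 525 gCO2/kWh is the first value in the dataset
print(g_to_kg_per_kwh(525.0))  # 0.525
```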

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
# use data_dir rather than dir, which would shadow the Python built-in
data_dir = 'raw_data'
fname = 'df_fuel_ckan.csv'
fpath = os.path.join(data_dir, fname)

No metadata is required, as these figures are for the whole of the UK.

In [3]:
data = pd.read_csv(
    fpath,
    usecols=['DATETIME','CARBON_INTENSITY']
)

In [4]:
print(data)

                         DATETIME  CARBON_INTENSITY
0       2009-01-01 00:00:00+00:00             525.0
1       2009-01-01 00:30:00+00:00             526.0
2       2009-01-01 01:00:00+00:00             527.0
3       2009-01-01 01:30:00+00:00             528.0
4       2009-01-01 02:00:00+00:00             529.0
...                           ...               ...
266660  2024-03-18 10:00:00+00:00             152.0
266661  2024-03-18 10:30:00+00:00             144.0
266662  2024-03-18 11:00:00+00:00             133.0
266663  2024-03-18 11:30:00+00:00             126.0
266664  2024-03-18 12:00:00+00:00             117.0

[266665 rows x 2 columns]


In [5]:
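# a final timestamp on the hour means the last hour has only its first
# half-hourly reading, so drop that row to leave complete half-hour pairs
if pd.Timestamp(data['DATETIME'].iloc[-1]).minute == 0:
    data.drop(data.tail(1).index, inplace=True)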

In [6]:
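# convert half-hourly values from gCO2/kWh to kg_CO2/kWh
hh_intensities = data['CARBON_INTENSITY'].to_numpy() / 1000
# average each pair of half-hourly values into an hourly value, rounded to 3 d.p.
hr_intensities = np.around(np.mean([hh_intensities[:-1:2], hh_intensities[1::2]], axis=0), 3)

# check that no missing values remain
assert not np.isnan(hr_intensities).any()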
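To illustrate the pairing used for the hourly averages, the sketch below applies the same slicing to the first four half-hourly values of the dataset and compares it with an equivalent reshape-based form. The variable names are illustrative, and the reshape variant assumes an even number of half-hourly values.

```python
import numpy as np

# first four half-hourly values from the dataset, in gCO2/kWh
half_hourly = np.array([525.0, 526.0, 527.0, 528.0])

# slicing approach: pair each even-indexed value with the odd-indexed value after it
by_slicing = np.mean([half_hourly[:-1:2], half_hourly[1::2]], axis=0)

# equivalent form: reshape to (n_hours, 2) and average across each row
by_reshape = half_hourly.reshape(-1, 2).mean(axis=1)

print(by_slicing)  # [525.5 527.5]
print(by_reshape)  # [525.5 527.5]
```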

In [7]:
# keep the timestamp at the start of each hour (every second row)
timestamps = [pd.Timestamp(ts) for ts in data['DATETIME'][::2]]

In [8]:
intensities = pd.DataFrame({'datetime':timestamps,'Carbon Intensity [kg_CO2/kWh]':hr_intensities})

In [9]:
print(intensities)

                        datetime  Carbon Intensity [kg_CO2/kWh]
0      2009-01-01 00:00:00+00:00                          0.526
1      2009-01-01 01:00:00+00:00                          0.528
2      2009-01-01 02:00:00+00:00                          0.530
3      2009-01-01 03:00:00+00:00                          0.531
4      2009-01-01 04:00:00+00:00                          0.534
...                          ...                            ...
133327 2024-03-18 07:00:00+00:00                          0.178
133328 2024-03-18 08:00:00+00:00                          0.166
133329 2024-03-18 09:00:00+00:00                          0.159
133330 2024-03-18 10:00:00+00:00                          0.148
133331 2024-03-18 11:00:00+00:00                          0.130

[133332 rows x 2 columns]


In [10]:
# save processed data to csv, one file per complete year
years = range(2009, 2024)
save_dir = 'processed_data'
os.makedirs(save_dir, exist_ok=True)

for year in years:
    year_data = intensities[intensities['datetime'].dt.year == year].copy()
    # write timestamps without the UTC offset suffix
    year_data['datetime'] = year_data['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
    year_data.to_csv(os.path.join(save_dir, f'{year}.csv'), index=False)

# save the full series under its own name; reusing f'{year}.csv' here would
# overwrite 2023.csv, because `year` keeps its last value after the loop
# ('all_years.csv' is an assumed file name)
intensities.to_csv(os.path.join(save_dir, 'all_years.csv'), index=False)

Interesting note: NumPy uses a round-half-to-even convention, so a value exactly halfway between two candidates is rounded to the even one. This avoids the systematic upward bias of always rounding halves up - https://stackoverflow.com/questions/28617841/rounding-to-nearest-int-with-numpy-rint-not-consistent-for-5
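A short sketch of this behaviour on values that are exactly representable as floats is given below. Decimal values such as 0.5255 are generally not stored exactly in binary, so whether a given value counts as a tie depends on its stored representation.

```python
import numpy as np

# values exactly halfway between two integers
halves = np.array([0.5, 1.5, 2.5, 3.5])

# each tie is rounded to the nearest even integer
rounded = np.around(halves)
print(rounded)  # [0. 2. 2. 4.]
```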