# Prepare Electricity Pricing Data from Agile Octopus Downloads

Agile Octopus has:
- import (buy price) data from 18th Feb 2018
- outgoing (sell price) data from 15th May 2019

Available at: https://energy-stats.uk/download-historical-pricing-data/

Pricing data is given by Distribution Network Operator region (see https://commons.wikimedia.org/wiki/File:DNO_map.png)

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
pricing_metadata = {
    'region': 'Eastern England',
    'code': 'A'
}

In [3]:
raw_dir = 'raw_data'  # named raw_dir to avoid shadowing the built-in dir()
fname = 'csv_agile_{0}_{1}.csv'.format(pricing_metadata['code'], pricing_metadata['region'].replace(' ', '_'))
fpath = os.path.join(raw_dir, fname)

In [4]:
# the raw file has no header row; keep only the timestamp (column 0) and the price (column 4)
data = pd.read_csv(
    fpath,
    names=['time','price'],
    header=None,
    usecols=[0,4]
)

In [5]:
print(data)

                             time    price
0       2018-02-21 00:00:00+00:00  10.3425
1       2018-02-21 00:30:00+00:00  10.5735
2       2018-02-21 01:00:00+00:00  10.6680
3       2018-02-21 01:30:00+00:00  10.1850
4       2018-02-21 02:00:00+00:00  10.1430
...                           ...      ...
106457  2024-03-18 20:30:00+00:00  14.7735
106458  2024-03-18 21:00:00+00:00  14.2905
106459  2024-03-18 21:30:00+00:00  12.1695
106460  2024-03-18 22:00:00+00:00  14.7735
106461  2024-03-18 22:30:00+00:00  11.6865

[106462 rows x 2 columns]
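Averaging consecutive half-hourly rows into hourly values is only valid if the rows really are an unbroken 30-minute sequence. A minimal sketch of a check for that, assuming the timestamps parse as UTC; `find_gaps` is a hypothetical helper, not part of the original notebook, and the example timestamps are made up:

```python
import pandas as pd

def find_gaps(times, step="30min"):
    """Return the timestamps whose gap to the previous entry is not exactly `step`."""
    ts = pd.Series(pd.to_datetime(times, utc=True)).reset_index(drop=True)
    deltas = ts.diff().iloc[1:]  # first diff is NaT, so skip it
    return ts.iloc[1:][deltas != pd.Timedelta(step)]

# Toy half-hourly series with one missing slot (01:00)
example = [
    "2018-02-21 00:00:00+00:00",
    "2018-02-21 00:30:00+00:00",
    "2018-02-21 01:30:00+00:00",
]
gaps = find_gaps(example)
print(gaps)
```

On the real data this would be called as `find_gaps(data['time'])`; an empty result means every row follows the previous one by exactly 30 minutes.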


In [6]:
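# half-hourly prices in p/kWh; pairing below assumes the data starts on the hour and has no gaps
hh_prices = data['price'].to_numpy()
# average each pair of half-hourly prices into an hourly price and convert p/kWh to £/kWh
hr_prices = np.around(np.mean([hh_prices[:-1:2], hh_prices[1::2]], axis=0) / 100, 4)
assert not np.isnan(hr_prices).any()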
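The positional pairing above depends on row order. As an alternative sketch (not the notebook's method), the same hourly mean can be keyed on the timestamps with `resample`, so that a missing half hour cannot shift every later pair. `hourly_mean_gbp` is a hypothetical helper; the four prices in the example are the first four rows of the raw data shown earlier:

```python
import pandas as pd

def hourly_mean_gbp(df):
    """Average half-hourly p/kWh prices into hourly £/kWh, grouped by timestamp."""
    s = pd.Series(
        df["price"].to_numpy(),
        index=pd.DatetimeIndex(pd.to_datetime(df["time"], utc=True)),
    )
    hourly = s.resample(pd.Timedelta(hours=1)).mean() / 100  # p/kWh -> £/kWh
    return hourly.round(4)

sample = pd.DataFrame({
    "time": [
        "2018-02-21 00:00:00+00:00",
        "2018-02-21 00:30:00+00:00",
        "2018-02-21 01:00:00+00:00",
        "2018-02-21 01:30:00+00:00",
    ],
    "price": [10.3425, 10.5735, 10.6680, 10.1850],
})
hourly = hourly_mean_gbp(sample)
print(hourly)
```

With this approach an hour that has only one half-hourly reading is averaged over that single value, and an hour with no readings becomes NaN, so the NaN assertion would still catch fully missing hours.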

In [7]:
timestamps = [pd.Timestamp(ts) for ts in data['time'][::2]]

In [8]:
in_prices = pd.DataFrame({'datetime':timestamps,'Electricity Pricing [£/kWh]':hr_prices})

In [9]:
print(in_prices)

                       datetime  Electricity Pricing [£/kWh]
0     2018-02-21 00:00:00+00:00                       0.1046
1     2018-02-21 01:00:00+00:00                       0.1043
2     2018-02-21 02:00:00+00:00                       0.1017
3     2018-02-21 03:00:00+00:00                       0.1034
4     2018-02-21 04:00:00+00:00                       0.0992
...                         ...                          ...
53226 2024-03-18 18:00:00+00:00                       0.3457
53227 2024-03-18 19:00:00+00:00                       0.1799
53228 2024-03-18 20:00:00+00:00                       0.1451
53229 2024-03-18 21:00:00+00:00                       0.1323
53230 2024-03-18 22:00:00+00:00                       0.1323

[53231 rows x 2 columns]


In [10]:
# save processed data to csv, one file per complete calendar year
years = range(2019, 2024)  # 2019 to 2023 inclusive; 2018 and 2024 are only partially covered
save_dir = os.path.join('processed_data', 'import_price_reg{0}'.format(pricing_metadata['code']))

os.makedirs(save_dir, exist_ok=True)

for year in years:
    year_data = in_prices[in_prices['datetime'].dt.year == year].copy()
    # write timestamps without the UTC offset
    year_data['datetime'] = year_data['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
    year_data.to_csv(os.path.join(save_dir, f'{year}.csv'), index=False)
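Before relying on the per-year files, it may be worth confirming that each year is complete. A rough sketch, assuming timestamps are in UTC (so there are no daylight-saving jumps); `expected_hours` and `is_complete_year` are hypothetical helpers, not part of the original notebook:

```python
import calendar
import pandas as pd

def expected_hours(year):
    """Number of hourly rows in a complete UTC year."""
    return 24 * (366 if calendar.isleap(year) else 365)

def is_complete_year(datetimes, year):
    """True if `datetimes` holds every hour of `year` exactly once."""
    dt = pd.Series(pd.to_datetime(datetimes, utc=True))
    in_year = dt[dt.dt.year == year]
    return bool(in_year.is_unique and len(in_year) == expected_hours(year))

# Synthetic example: every hour of 2020 in UTC
full_2020 = pd.date_range(
    "2020-01-01 00:00", "2020-12-31 23:00", freq=pd.Timedelta(hours=1), tz="UTC"
)
print(expected_hours(2020), is_complete_year(full_2020, 2020))
```

In the notebook this could be applied as `is_complete_year(in_prices['datetime'], year)` inside the save loop.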

To do: repeat the steps above for the outgoing (sell price) data. It may also be worth averaging the import and outgoing prices for each hour to get a better estimate of true grid costs.
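One way the averaging idea could look, as a sketch only: it assumes a second frame, here called `out_prices` (hypothetical), built from the outgoing file with the same two columns as `in_prices`. The prices in the example are made-up values for illustration, not real Agile data.

```python
import pandas as pd

PRICE_COL = "Electricity Pricing [£/kWh]"

def average_prices(in_prices, out_prices):
    """Mean of import and outgoing price for each hour present in both frames."""
    merged = in_prices.merge(
        out_prices, on="datetime", how="inner", suffixes=("_import", "_outgoing")
    )
    pair = merged[[PRICE_COL + "_import", PRICE_COL + "_outgoing"]]
    merged[PRICE_COL] = pair.mean(axis=1).round(4)
    return merged[["datetime", PRICE_COL]]

hours = pd.to_datetime(
    ["2019-05-15 00:00", "2019-05-15 01:00", "2019-05-15 02:00"], utc=True
)
in_example = pd.DataFrame({"datetime": hours, PRICE_COL: [0.10, 0.20, 0.30]})
out_example = pd.DataFrame({"datetime": hours[:2], PRICE_COL: [0.04, 0.06]})

avg = average_prices(in_example, out_example)
print(avg)
```

The inner join keeps only hours present in both series, which matters here because the outgoing data starts later than the import data.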