# NESO Regional Carbon Intensity Data Transformation

### Notebook Objectives

### Primary Goals
- **Transform raw NESO regional carbon intensity data** into a clean, analysis-ready format
- **Standardize datetime handling** across all energy datasets
- **Create consistent regional mapping** for geographic analysis
- **Generate summary statistics** and data quality reports for carbon intensity trends

### User Stories
> **As a data analyst**, I want clear documentation and explanations for each NESO dataset we extract so that I and other team members can understand the source, structure, meaning and caveats of the data without digging into code.

> **As a climate researcher**, I want standardized regional carbon intensity data so that I can easily identify the cleanest and dirtiest electricity by time and location across Great Britain.

> **As an energy consumer**, I want reliable carbon intensity transformation pipelines so that I can trust insights about when and where to use electricity most sustainably.

### About This Dataset

### Source: NESO Regional Carbon Intensity
- **Update Frequency**: Daily
- **Time Resolution**: 30-minute intervals
- **Attribution**: "Supported by National Energy SO Open Data"
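
The 30-minute resolution stated above can be verified once the timestamps are parsed. The following is a minimal sketch, assuming a `datetime` column of ISO 8601 strings in UTC as in the raw file; the helper name `check_half_hourly` and the three-row sample frame are illustrative only and not part of the notebook.

```python
import pandas as pd

def check_half_hourly(df, column="datetime"):
    """Return timestamps whose gap to the previous record is not 30 minutes."""
    # Parse as timezone-aware UTC and sort so gaps are measured in time order
    ts = pd.to_datetime(df[column], utc=True).sort_values().reset_index(drop=True)
    gaps = ts.diff()
    # The first record has no predecessor (NaT gap), so exclude it explicitly
    irregular = ts[gaps.notna() & (gaps != pd.Timedelta(minutes=30))]
    return irregular.reset_index(drop=True)

# Hypothetical sample: the 00:00 record is missing, leaving a 60-minute gap
sample = pd.DataFrame({"datetime": [
    "2018-09-17T23:00:00",
    "2018-09-17T23:30:00",
    "2018-09-18T00:30:00",
]})
irregular = check_half_hourly(sample)
print(irregular)
```

An empty result on the full dataset would indicate a regular half-hourly series; any timestamps returned mark where a gap or duplicate precedes them.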

### Stage 1: Environment Setup

In [7]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime

### Stage 2: Load raw data

In [8]:
# Load the raw regional carbon intensity data
df = pd.read_csv('../data/raw/regional_carbon_intensity.csv')

# Check the row and column count and preview the first 5 rows
print(f"Loaded regional carbon intensity data: {df.shape[0]:,} rows × {df.shape[1]} columns")
df.head()

Loaded regional carbon intensity data: 119,629 rows × 15 columns


Unnamed: 0,datetime,North Scotland,South Scotland,North West England,North East England,Yorkshire,North Wales and Merseyside,South Wales,West Midlands,East Midlands,East England,South West England,South England,London,South East England
0,2018-09-17T23:00:00,30.0,11.0,38.0,24.0,319.0,157.0,271.0,40.0,325.0,58.0,72.0,116.0,49.0,73.0
1,2018-09-17T23:30:00,44.0,7.0,41.0,30.0,298.0,183.0,157.0,39.0,340.0,61.0,90.0,152.0,69.0,70.0
2,2018-09-18T00:00:00,44.0,8.0,39.0,30.0,295.0,180.0,155.0,39.0,337.0,60.0,89.0,152.0,68.0,70.0
3,2018-09-18T00:30:00,44.0,11.0,37.0,31.0,288.0,175.0,151.0,38.0,330.0,58.0,87.0,150.0,69.0,70.0
4,2018-09-18T01:00:00,45.0,13.0,34.0,32.0,278.0,171.0,148.0,37.0,326.0,57.0,85.0,151.0,70.0,69.0
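
The preview shows the data in wide format, with one column per region. For geographic analysis it is often convenient to reshape this to one row per timestamp and region. A minimal sketch of that reshape, assuming the column layout above; the output column names `region` and `carbon_intensity` are illustrative choices, not part of the NESO schema, and the sample uses only two regions from the preview.

```python
import pandas as pd

# Two rows and two regions copied from the preview above
wide = pd.DataFrame({
    "datetime": ["2018-09-17T23:00:00", "2018-09-17T23:30:00"],
    "North Scotland": [30.0, 44.0],
    "South Scotland": [11.0, 7.0],
})

# Every column except `datetime` becomes a (region, carbon_intensity) pair
long_df = wide.melt(
    id_vars="datetime",
    var_name="region",
    value_name="carbon_intensity",
)
print(long_df)
```

The long layout makes per-region grouping and plotting simpler, at the cost of more rows (one per timestamp and region).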


### Stage 3: Custom Function to Create a Data Dictionary
A reusable function builds the data dictionary, so the NESO dataset can be profiled both before and after it is transformed.

In [9]:
# Custom Function to create a comprehensive data dictionary for NESO Regional Carbon Intensity dataset
# Takes a DataFrame and returns a data dictionary with NESO-specific column descriptions
def create_data_dictionary(df):
    # Official descriptions from NESO Regional Carbon Intensity dataset
    # Source: https://www.neso.energy/data-portal/regional-carbon-intensity-forecast
    descriptions = {
        'datetime': 'Timestamp of record, given in UTC (Coordinated Universal Time)',
        'North Scotland': 'North Scotland carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'South Scotland': 'South Scotland carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'North West England': 'North West England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'North East England': 'North East England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'Yorkshire': 'Yorkshire carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'North Wales and Merseyside': 'North Wales and Merseyside carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'South Wales': 'South Wales carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'West Midlands': 'West Midlands carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'East Midlands': 'East Midlands carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'East England': 'East England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'South West England': 'South West England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'South England': 'South England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'London': 'London carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)',
        'South East England': 'South East England carbon intensity forecast, predicted using machine learning models and metered generation (gCO2/kWh)'
    }
    
    dictionary_data = []
    for column in df.columns:
        # Get 3 sample values (non-null)
        sample_values = df[column].dropna().head(3).tolist()
        sample_str = ', '.join([str(x) for x in sample_values])
        
        # Truncate very long sample strings so the table stays readable
        if len(sample_str) > 100:
            sample_str = sample_str[:100] + "..."
        
        dictionary_data.append({
            'Column': column,
            'Data Type': str(df[column].dtype),
            'Missing Values': df[column].isnull().sum(),
            'Missing %': round((df[column].isnull().sum() / len(df)) * 100, 2),
            'Unique Values': df[column].nunique(),
            'Sample Values': sample_str,
            'Description': descriptions.get(column, 'Additional column - description needed (may be new regional metric)')
        })
    return pd.DataFrame(dictionary_data)

# Store the dictionary in a variable
raw_data_dictionary = create_data_dictionary(df)

# Display data dictionary
print("NESO Regional Carbon Intensity Dataset - Data Dictionary")
raw_data_dictionary

NESO Regional Carbon Intensity Dataset - Data Dictionary


Unnamed: 0,Column,Data Type,Missing Values,Missing %,Unique Values,Sample Values,Description
0,datetime,object,0,0.0,119629,"2018-09-17T23:00:00, 2018-09-17T23:30:00, 2018...","Timestamp of record, given in UTC (Coordinated..."
1,North Scotland,float64,1311,1.1,382,"30.0, 44.0, 44.0","North Scotland carbon intensity forecast, pred..."
2,South Scotland,float64,1311,1.1,241,"11.0, 7.0, 8.0","South Scotland carbon intensity forecast, pred..."
3,North West England,float64,1311,1.1,289,"38.0, 41.0, 39.0","North West England carbon intensity forecast, ..."
4,North East England,float64,1311,1.1,262,"24.0, 30.0, 30.0","North East England carbon intensity forecast, ..."
5,Yorkshire,float64,1311,1.1,461,"319.0, 298.0, 295.0","Yorkshire carbon intensity forecast, predicted..."
6,North Wales and Merseyside,float64,1311,1.1,670,"157.0, 183.0, 180.0",North Wales and Merseyside carbon intensity fo...
7,South Wales,float64,1311,1.1,641,"271.0, 157.0, 155.0","South Wales carbon intensity forecast, predict..."
8,West Midlands,float64,1311,1.1,526,"40.0, 39.0, 39.0","West Midlands carbon intensity forecast, predi..."
9,East Midlands,float64,1311,1.1,749,"325.0, 340.0, 337.0","East Midlands carbon intensity forecast, predi..."
