# 📊 Mexico Electricity Demand Analysis (2023–2025)

**Author:** Diego Ramírez  
**Date:** July 2025  
**Description:**  
This notebook explores hourly electricity demand patterns in Mexico using public data. We analyze temporal trends, visualize seasonality, and extract actionable insights useful for forecasting and grid planning.

**Tools:** pandas, matplotlib, seaborn, numpy  
**Data:** CSVs downloaded from CENACE: https://www.cenace.gob.mx/Paginas/SIM/Reportes/EstimacionDemandaReal.aspx

## 0. ⚙️ Imports

In [13]:
import os
import glob

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.notebook import tqdm

## 1. 📥 Data Loading

We begin by loading the daily CSV files, tagging each row with its source file (the files themselves carry no date column), and checking the overall shape and structure.

In [9]:
# Files have 8 rows at the top that we must skip, since the dataset starts at row 9.

df_sample = pd.read_csv("./data/raw_data/Demanda Real Balance_0_v3 Dia Operacion 2023-01-01 v2023 01 15_12 25 01.csv", skiprows=8)
df_sample.head()

Unnamed: 0,Sistema,Area,Hora,Generacion (MWh),Importacion Total (MWh),Exportacion Total (MWh),Intercambio neto entre Gerencias (MWh),Estimacion de Demanda por Balance (MWh)
0,BCA,BCA,1,1036.03252,22.10174,20.97918,---,1037.15508
1,BCA,BCA,2,1027.37909,43.79153,44.91889,---,1026.25173
2,BCA,BCA,3,1042.72685,51.27296,46.05216,---,1047.94764
3,BCA,BCA,4,1022.79761,55.63775,55.51087,---,1022.92449
4,BCA,BCA,5,1001.48521,65.39402,70.70259,---,996.17664


In [12]:
df_sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 216 entries, 0 to 215
Data columns (total 8 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   Sistema                                    216 non-null    object 
 1    Area                                      216 non-null    object 
 2    Hora                                      216 non-null    int64  
 3    Generacion (MWh)                          216 non-null    float64
 4    Importacion Total (MWh)                   216 non-null    float64
 5    Exportacion Total (MWh)                   216 non-null    float64
 6    Intercambio neto entre Gerencias (MWh)    216 non-null    object 
 7    Estimacion de Demanda por Balance (MWh)   216 non-null    float64
dtypes: float64(4), int64(1), object(3)
memory usage: 13.6+ KB


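The data is clean and well structured, but a few issues need attention before the files are consolidated into one dataset:
* There is no date column, so the date will probably have to be derived from the file name.
* The "---" placeholder in Intercambio neto entre Gerencias (MWh) should be replaced with np.nan so the column can be numeric.
* Most column names carry leading or trailing whitespace (visible in the `info()` output above) and should be stripped.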
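The date and "---" concerns can be handled while the files are being read. The following is a minimal sketch rather than the notebook's final implementation: it assumes the operating date always follows the words "Dia Operacion" in the file name (as in the sample file above), and the helper name `extract_operation_date` and the `date` column are illustrative.

```python
import re

import pandas as pd

# Assumed pattern: the operating date follows "Dia Operacion " as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"Dia Operacion (\d{4}-\d{2}-\d{2})")


def extract_operation_date(path: str) -> pd.Timestamp:
    """Return the operating date embedded in a file name, or NaT if none is found."""
    match = DATE_PATTERN.search(path)
    return pd.Timestamp(match.group(1)) if match else pd.NaT


example_path = (
    "./data/raw_data/Demanda Real Balance_0_v3 Dia Operacion 2023-01-01 "
    "v2023 01 15_12 25 01.csv"
)
operation_date = extract_operation_date(example_path)

# Toy frame standing in for one loaded file (values are made up).
toy = pd.DataFrame({"Intercambio neto entre Gerencias (MWh)": ["---", "12.5", "---"]})

# errors="coerce" turns anything non-numeric, including "---", into NaN.
toy["Intercambio neto entre Gerencias (MWh)"] = pd.to_numeric(
    toy["Intercambio neto entre Gerencias (MWh)"], errors="coerce"
)
toy["date"] = operation_date
```

Note that `errors="coerce"` also hides any other unexpected text in the column, so it is worth checking which distinct non-numeric values exist before relying on it.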

In [14]:
DATA_FOLDER = "./data/raw_data/"

csv_files = glob.glob(os.path.join(DATA_FOLDER, "*.csv"))
print(f"Found {len(csv_files)} CSV files")

Found 3548 CSV files


In [15]:
df_sample = pd.read_csv(csv_files[0], skiprows=8)
df_sample.head()

Unnamed: 0,Sistema,Area,Hora,Generacion (MWh),Importacion Total (MWh),Exportacion Total (MWh),Intercambio neto entre Gerencias (MWh),Estimacion de Demanda por Balance (MWh)
0,BCA,BCA,1,1036.03252,22.10174,20.97918,---,1037.15508
1,BCA,BCA,2,1027.37909,43.79153,44.91889,---,1026.25173
2,BCA,BCA,3,1042.72685,51.27296,46.05216,---,1047.94764
3,BCA,BCA,4,1022.79761,55.63775,55.51087,---,1022.92449
4,BCA,BCA,5,1001.48521,65.39402,70.70259,---,996.17664


In [16]:
def add_file_name(path: str) -> pd.DataFrame:
    # Read the CSV file at `path` and add a column recording the path it came from.
    df = pd.read_csv(path, skiprows=8)
    df["file_name"] = path

    return df

all_dataframes = []

for file in tqdm(csv_files, desc="Reading CSVs"):
    try:
        df = add_file_name(file)
        all_dataframes.append(df)
    except Exception as e:
        print(f"Failed to read {file}: {e}")


In [19]:
full_df = pd.concat(all_dataframes, ignore_index=True)
full_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 766364 entries, 0 to 766363
Data columns (total 9 columns):
 #   Column                                     Non-Null Count   Dtype  
---  ------                                     --------------   -----  
 0   Sistema                                    766364 non-null  object 
 1    Area                                      766364 non-null  object 
 2    Hora                                      766364 non-null  int64  
 3    Generacion (MWh)                          766364 non-null  float64
 4    Importacion Total (MWh)                   766364 non-null  float64
 5    Exportacion Total (MWh)                   766364 non-null  float64
 6    Intercambio neto entre Gerencias (MWh)    766364 non-null  object 
 7    Estimacion de Demanda por Balance (MWh)   766364 non-null  float64
 8   file_name                                  766364 non-null  object 
dtypes: float64(4), int64(1), object(4)
memory usage: 52.6+ MB


In [23]:
full_df.columns

Index(['Sistema', ' Area', ' Hora', ' Generacion (MWh)',
       ' Importacion Total (MWh)', ' Exportacion Total (MWh)',
       ' Intercambio neto entre Gerencias (MWh)',
       ' Estimacion de Demanda por Balance (MWh) ', 'file_name'],
      dtype='object')

In [24]:
full_df.head()

Unnamed: 0,Sistema,Area,Hora,Generacion (MWh),Importacion Total (MWh),Exportacion Total (MWh),Intercambio neto entre Gerencias (MWh),Estimacion de Demanda por Balance (MWh),file_name
0,BCA,BCA,1,1036.03252,22.10174,20.97918,---,1037.15508,./data/raw_data\Demanda Real Balance_0_v3 Dia ...
1,BCA,BCA,2,1027.37909,43.79153,44.91889,---,1026.25173,./data/raw_data\Demanda Real Balance_0_v3 Dia ...
2,BCA,BCA,3,1042.72685,51.27296,46.05216,---,1047.94764,./data/raw_data\Demanda Real Balance_0_v3 Dia ...
3,BCA,BCA,4,1022.79761,55.63775,55.51087,---,1022.92449,./data/raw_data\Demanda Real Balance_0_v3 Dia ...
4,BCA,BCA,5,1001.48521,65.39402,70.70259,---,996.17664,./data/raw_data\Demanda Real Balance_0_v3 Dia ...


In [None]:
full_df.to_csv("./data/consolidated_data.csv", index=False)

## 2. 🧹 Data Cleaning

This section handles:
- Missing values
- Incorrect data types
- Duplicates (if any)
- Parsing datetime columns correctly

We'll also inspect and clean column names for consistency.
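A possible shape for these cleaning steps is sketched here on a few toy rows. It assumes a `date` column has already been derived from the file name, and it assumes that `Hora` runs from 1 to 24 with hour 1 starting at midnight; both points should be checked against the CENACE documentation. The `timestamp` column name is illustrative.

```python
import pandas as pd

# Toy rows mimicking the consolidated layout, including the stray spaces
# around column names. The `date` column is assumed to exist already.
raw = pd.DataFrame(
    {
        "Sistema": ["BCA", "BCA", "BCA"],
        " Area": ["BCA", "BCA", "BCA"],
        " Hora": [1, 2, 2],
        " Estimacion de Demanda por Balance (MWh) ": [1037.15508, 1026.25173, 1026.25173],
        "date": ["2023-01-01", "2023-01-01", "2023-01-01"],
    }
)

clean = raw.copy()

# Strip leading and trailing whitespace from every column name.
clean.columns = clean.columns.str.strip()

# Drop exact duplicate rows (the third toy row repeats the second).
clean = clean.drop_duplicates().reset_index(drop=True)

# Parse the date and combine it with the hour of day.
# Assumption: Hora h covers the hour that starts at (h - 1):00.
clean["date"] = pd.to_datetime(clean["date"])
clean["timestamp"] = clean["date"] + pd.to_timedelta(clean["Hora"] - 1, unit="h")

# Count missing values per column to decide how to handle them.
missing_per_column = clean.isna().sum()
```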

## 3. 🧮 Feature Engineering

We'll create new columns to support analysis:
- Hour, Day, Month, Weekday from timestamps
- Weekend vs Weekday
- Peak vs Off-Peak labeling (optional)

This makes it easier to explore time-based demand patterns.
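As a rough sketch, the calendar features listed above can be derived with the pandas `.dt` accessor. The `timestamp` column and the feature names are hypothetical, and the peak window used here (19:00 to 20:59) is only a placeholder, not an official definition.

```python
import numpy as np
import pandas as pd

# Toy hourly timestamps covering Friday 2023-01-06 through Sunday 2023-01-08.
hours = pd.Timestamp("2023-01-06") + pd.to_timedelta(np.arange(72), unit="h")
features = pd.DataFrame({"timestamp": hours})

ts = features["timestamp"]
features["hour"] = ts.dt.hour
features["day"] = ts.dt.day
features["month"] = ts.dt.month
features["weekday"] = ts.dt.dayofweek  # Monday = 0 ... Sunday = 6
features["is_weekend"] = features["weekday"] >= 5

# Assumption: "peak" means hours 19 and 20; adjust once the real peak is known.
features["is_peak"] = features["hour"].between(19, 20)
```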

## 4. 📈 Exploratory Data Analysis (EDA)

This is the core of our notebook, where we visualize and analyze:

### 4.1. ⚡ Overall Demand Over Time
- Total or average demand by hour/day
- Line plot of demand trends
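One way to build the demand-over-time view is to sum demand across areas for each hour and then aggregate to daily totals. The sketch uses synthetic placeholder values, not CENACE data, and the column names `timestamp`, `Area`, and `demand_mwh` are assumptions about how the cleaned dataset might be laid out.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in: two areas, 14 days of hourly values (not real data).
rng = np.random.default_rng(0)
hours = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(24 * 14), unit="h")
demo = pd.concat(
    [pd.DataFrame({"timestamp": hours, "Area": area}) for area in ["BCA", "BCS"]],
    ignore_index=True,
)
demo["demand_mwh"] = rng.uniform(900, 1100, size=len(demo))

# Sum over areas for each hour, then add the hours up into daily totals.
hourly_total = demo.groupby("timestamp")["demand_mwh"].sum()
daily_total = hourly_total.resample("D").sum()

fig, ax = plt.subplots(figsize=(10, 4))
daily_total.plot(ax=ax)
ax.set_xlabel("Date")
ax.set_ylabel("Demand (MWh per day)")
ax.set_title("Daily total demand (synthetic data)")
```

Replacing `.sum()` with `.mean()` in the `resample` step gives the average hourly demand per day instead of the daily total.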

### 4.2. 🕓 Hourly Patterns
- Mean demand by hour of the day
- Compare across months or seasons
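The hourly comparison can be expressed as a pivot table with hours as rows and months as columns. The data here is a deterministic toy series chosen so the result is easy to verify by hand; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: two days in January and two in July, with a made-up demand formula.
january = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(48), unit="h")
july = pd.Timestamp("2023-07-01") + pd.to_timedelta(np.arange(48), unit="h")
demo = pd.DataFrame({"timestamp": january.append(july)})

demo["hour"] = demo["timestamp"].dt.hour
demo["month"] = demo["timestamp"].dt.month
demo["demand_mwh"] = 1000 + 10 * demo["hour"] + 100 * (demo["month"] == 7)

# Rows are hours of the day, columns are months, cells are mean demand.
hourly_by_month = demo.pivot_table(
    index="hour", columns="month", values="demand_mwh", aggfunc="mean"
)
```

Calling `hourly_by_month.plot()` would then draw one line per month, which makes differences in the daily profile easy to compare.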

### 4.3. 📆 Weekly & Monthly Trends
- Boxplots by day of week
- Monthly demand averages
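A boxplot per weekday and a monthly average could be produced along these lines. This is a sketch on synthetic values, not real results; it uses the pandas `boxplot` wrapper around matplotlib, though `seaborn.boxplot` would be an equally reasonable choice. Column names are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in: eight full weeks of hourly values starting on a Monday.
rng = np.random.default_rng(0)
hours = pd.Timestamp("2023-01-02") + pd.to_timedelta(np.arange(24 * 56), unit="h")
demo = pd.DataFrame(
    {"timestamp": hours, "demand_mwh": rng.uniform(900, 1100, size=len(hours))}
)
demo["weekday"] = demo["timestamp"].dt.dayofweek  # Monday = 0 ... Sunday = 6
demo["month"] = demo["timestamp"].dt.month

# One box per day of the week.
fig, ax = plt.subplots(figsize=(8, 4))
demo.boxplot(column="demand_mwh", by="weekday", ax=ax)
fig.suptitle("")  # remove the automatic "grouped by" super-title
ax.set_title("Hourly demand by day of week (synthetic data)")
ax.set_xlabel("Day of week (0 = Monday)")
ax.set_ylabel("Demand (MWh)")

# Average demand per calendar month.
monthly_mean = demo.groupby("month")["demand_mwh"].mean()
```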

### 4.4. 🌡️ Seasonal Patterns
- Compare summer vs winter demand profiles
- Heatmaps: hour vs day, hour vs month
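The seasonal comparison and the hour-by-month heatmap can both be derived from pivot tables. In this sketch the season boundaries (summer as June to August, winter as December to February) are an assumption, the demand values come from a made-up formula, and the heatmap is drawn with matplotlib's `imshow`; `seaborn.heatmap` would work on the same table.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data: one year of hourly timestamps with a made-up demand formula.
hours = pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(24 * 365), unit="h")
demo = pd.DataFrame({"timestamp": hours})
demo["hour"] = demo["timestamp"].dt.hour
demo["month"] = demo["timestamp"].dt.month
demo["demand_mwh"] = 1000 + 10 * demo["hour"] + 20 * demo["month"]

# Assumption: summer = Jun-Aug, winter = Dec-Feb; other months are left out.
season_of_month = {12: "winter", 1: "winter", 2: "winter",
                   6: "summer", 7: "summer", 8: "summer"}
seasonal = demo.assign(season=demo["month"].map(season_of_month)).dropna(subset=["season"])
seasonal_profile = seasonal.pivot_table(
    index="hour", columns="season", values="demand_mwh", aggfunc="mean"
)

# Hour-of-day by month table that feeds the heatmap.
hour_by_month = demo.pivot_table(
    index="hour", columns="month", values="demand_mwh", aggfunc="mean"
)

fig, ax = plt.subplots(figsize=(8, 5))
image = ax.imshow(hour_by_month.to_numpy(), aspect="auto", origin="lower")
ax.set_xticks(range(12))
ax.set_xticklabels([str(month) for month in hour_by_month.columns])
ax.set_xlabel("Month")
ax.set_ylabel("Hour of day")
fig.colorbar(image, ax=ax, label="Mean demand (MWh)")
```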

## 5. 🔍 Key Insights

We summarize the most important insights from the EDA section:

- Demand peaks consistently at 7–9 PM, especially in summer.
- Weekends show significantly lower demand than weekdays (~X%).
- Winter shows a morning demand spike not seen in other seasons.
- [Other interesting insight here]

## 6. 📌 Conclusions & Next Steps

This exploratory analysis revealed consistent temporal demand patterns in Mexico's grid that may support better forecasting and operations planning.

### Next steps:
- Break down demand by geographic region (if data available)
- Correlate demand with temperature or weather patterns
- Build a simple forecasting model using Prophet or scikit-learn

## 🧠 Appendix

Any extra functions, alternative visualizations, failed experiments, or detailed technical notes can go here.

This keeps the main notebook clean and reader-friendly.