# Introduction

---

Exploratory data analysis (EDA) is an approach to analyzing data sets that summarizes their main characteristics, often with visual methods. Here, we perform EDA on 2023 day-ahead electricity prices from CAISO to understand how the prices are distributed and how they change over time.

Specifically, we will cover the following topics:

1. Data from the CAISO DA market: electricity prices in the day-ahead market at the Fremont node in California.
2. Analysis of the data: we examine the distribution of the electricity price and its trend over time (time series analysis).
3. Conclusion: we summarize our findings and offer some insights and ideas for future work.

In [1]:
import pandas as pd
import numpy as np
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

## Load data

Let's start by loading the data and understanding its structure.

In [2]:
df_Jan = pd.read_csv('../data/20230101_20230201_PRC_LMP_DAM_20240717_11_47_11_v12.csv')
df_Feb = pd.read_csv('../data/20230201_20230301_PRC_LMP_DAM_20240717_11_51_08_v12.csv')
df_Mar = pd.read_csv('../data/20230301_20230401_PRC_LMP_DAM_20240717_11_52_59_v12.csv')
df_Apr = pd.read_csv('../data/20230401_20230501_PRC_LMP_DAM_20240717_12_34_03_v12.csv')
df_May = pd.read_csv('../data/20230501_20230601_PRC_LMP_DAM_20240717_12_36_04_v12.csv')
df_Jun = pd.read_csv('../data/20230601_20230701_PRC_LMP_DAM_20240717_12_40_16_v12.csv')
df_Jul = pd.read_csv('../data/20230701_20230801_PRC_LMP_DAM_20240719_16_07_44_v12.csv')
df_Aug = pd.read_csv('../data/20230801_20230831_PRC_LMP_DAM_20240717_12_43_44_v12.csv')
df_Sep = pd.read_csv('../data/20230901_20231001_PRC_LMP_DAM_20240717_12_45_13_v12.csv')
df_Oct = pd.read_csv('../data/20231001_20231101_PRC_LMP_DAM_20240717_12_47_07_v12.csv')
df_Nov = pd.read_csv('../data/20231101_20231201_PRC_LMP_DAM_20240717_12_48_48_v12.csv')
df_Dec = pd.read_csv('../data/20231201_20240101_PRC_LMP_DAM_20240717_12_50_22_v12.csv')
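
The twelve `read_csv` calls above could also be collapsed into a loop. The following is a minimal sketch, assuming every monthly export sits in one directory and matches the pattern `*_PRC_LMP_DAM_*.csv`. The helper name `load_monthly_files` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_monthly_files(data_dir, pattern="*_PRC_LMP_DAM_*.csv"):
    """Read every CSV in data_dir that matches pattern into one DataFrame.

    Hypothetical helper: the directory layout and file pattern are assumptions.
    """
    paths = sorted(Path(data_dir).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir}")
    # File names start with the YYYYMMDD start date, so sorting by name
    # keeps the months in chronological order.
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

With the layout used in this notebook, a call such as `load_monthly_files('../data')` would return all months in a single frame.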

### Put all the data together

The data is split into twelve monthly files. We concatenate the monthly data frames into a single dataset covering 2023.

In [3]:
df_2023 = pd.concat(
    [
        df_Jan,
        df_Feb,
        df_Mar,
        df_Apr,
        df_May,
        df_Jun,
        df_Jul,
        df_Aug,
        df_Sep,
        df_Oct,
        df_Nov,
        df_Dec,
    ],
)
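
To get a feel for the structure of the combined frame, a quick inspection can help. The sketch below uses a small made-up frame named `toy` in place of `df_2023`. Its column names mirror the ones used in this notebook (`OPR_DT`, `OPR_HR`, `LMP_TYPE`, `MW`), but the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for df_2023; all values are invented for illustration only.
toy = pd.DataFrame(
    {
        "OPR_DT": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01"],
        "OPR_HR": [1, 1, 2, 2],
        "LMP_TYPE": ["LMP", "MCE", "LMP", "MCE"],
        "MW": [50.0, 48.0, 52.5, 49.0],
    }
)

print(toy.shape)   # number of rows and columns
print(toy.dtypes)  # column types as inferred by pandas

# How many rows each LMP type contributes.
rows_per_type = toy["LMP_TYPE"].value_counts()
print(rows_per_type)

# Missing values per column.
missing = toy.isna().sum()
print(missing)
```

Running the same calls on `df_2023` should show which `LMP_TYPE` values the export contains and whether any hours are missing.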

### Organize the data

The data frame needs to be organized and indexed by LMP type and by date and time.

In [4]:
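# Combine operating date and operating hour into one timestamp (OPR_HR 1 maps to 00:00)
df_2023['datetime'] = pd.to_datetime(df_2023['OPR_DT']) + pd.to_timedelta(df_2023['OPR_HR'] - 1, unit='h')
df_2023 = df_2023.drop(columns=['OPR_DT', 'OPR_HR'])
# Index by LMP type first, then by time, and sort the index
df_2023.set_index(['LMP_TYPE', 'datetime'], inplace=True)
df_2023.sort_index(inplace=True)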
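
The `OPR_HR - 1` offset in the cell above treats `OPR_HR` as an hour-ending label, so that hour 1 maps to 00:00 and hour 24 to 23:00 of the operating day. That reading of the CAISO convention is an assumption here, not something the data states. A toy sketch of the same transformation, with invented values and a hypothetical frame named `toy`:

```python
import pandas as pd

# Toy stand-in for df_2023; all values are invented for illustration only.
toy = pd.DataFrame(
    {
        "OPR_DT": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-01"],
        "OPR_HR": [2, 1, 24, 1],
        "LMP_TYPE": ["LMP", "LMP", "LMP", "MCE"],
        "MW": [52.5, 50.0, 61.0, 48.0],
    }
)

# Assumed hour-ending convention: hour 1 -> 00:00, hour 24 -> 23:00.
toy["datetime"] = pd.to_datetime(toy["OPR_DT"]) + pd.to_timedelta(toy["OPR_HR"] - 1, unit="h")
toy = (
    toy.drop(columns=["OPR_DT", "OPR_HR"])
    .set_index(["LMP_TYPE", "datetime"])
    .sort_index()
)

# Taking a cross-section on the outer level leaves a series indexed by time only.
lmp = toy.xs("LMP", level="LMP_TYPE")["MW"]
print(lmp)
```

After sorting, the `LMP` rows come back in chronological order even though the toy input was shuffled.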

# Plot data

---

We plot the LMP as a time series to see how the electricity price evolves over the year and how widely it varies.

In [5]:
figure = go.Figure()
figure.add_trace(
    go.Scatter(
        x=df_2023.loc['LMP'].index,
        y=df_2023.loc['LMP', 'MW'],
        mode='lines',
        name='LMP',
        line=dict(color='blue'),
    ),
)
figure.update_layout(
    title="2023 wholesale electricity prices in CAISO, Fremont node",
    xaxis_title="Date",
    yaxis_title="Locational Marginal Price ($/MWh)",
    showlegend=True,
)
# Slanted x-axis labels
figure.update_layout(xaxis_tickangle=-45)
figure.show()
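
Alongside the hourly line plot, summary statistics and a daily average can make the distribution and the longer-term trend easier to read. The following is a minimal sketch on synthetic hourly prices. The values and the names `hourly`, `summary`, and `daily_mean` are invented for illustration; in the notebook the series would come from the `LMP` rows of `df_2023`.

```python
import numpy as np
import pandas as pd

# Three days of synthetic hourly prices; values are invented for illustration.
hours = np.arange(72)
idx = pd.Timestamp("2023-01-01") + pd.to_timedelta(hours, unit="h")
# A flat base price plus a repeating daily shape, purely synthetic.
hourly = pd.Series(40.0 + 10.0 * np.sin(2 * np.pi * hours / 24), index=idx, name="MW")

# Count, mean, standard deviation, min, quartiles, and max of the prices.
summary = hourly.describe()
print(summary)

# One average price per calendar day, which smooths out the intraday shape.
daily_mean = hourly.resample("D").mean()
print(daily_mean)
```

The daily series could then be passed to the same `go.Scatter` call used above to overlay a smoother trend line on the hourly prices.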