# Spatio-Temporal Prediction and Coordination of EV Charging Demand for Power System Resilience

## Research Objectives

Recent studies have explored electric vehicles (EVs) from different perspectives, ranging from estimating vehicle range based on battery capacity, model specifications, and internal components (Ahmed et al., 2022) to forecasting charging behavior using machine learning methods such as Random Forest and SVM with factors like previous payment data, weather, and traffic (Shahriar et al., 2020). In parallel, research on smart cities has focused on managing traffic flow efficiently to reduce congestion and energy consumption (Dymora, Mazurek, & Jucha, 2024).

Building on these insights, this study links traffic dynamics with EV energy consumption to better predict when and where charging demand will arise. By integrating spatio-temporal traffic features with deep learning models, the goal is to anticipate EV charging needs in real time and enable coordinated charging strategies that support overall power system resilience.


## Load Required Libraries 

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## Load and Clean the Data 

In [2]:
df = pd.read_csv("cleaned_traffic_data.csv")

## How the Data Looks Directly from PeMS

In [3]:
df.head()

Unnamed: 0,Timestamp,Station,District,Route,Direction of Travel,Lane Type,Station Length,Samples,% Observed,Total Flow,...,Lane 5 Avg Speed,Lane 6 Flow,Lane 6 Avg Occ,Lane 6 Avg Speed,Lane 7 Flow,Lane 7 Avg Occ,Lane 7 Avg Speed,Lane 8 Flow,Lane 8 Avg Occ,Lane 8 Avg Speed
0,10/01/2024 00:00:00,308512,3,50,W,ML,3.995,197,0,497.0,...,,,,,,,,,,
1,10/01/2024 00:00:00,311831,3,5,S,OR,,101,92,27.0,...,,,,,,,,,,
2,10/01/2024 00:00:00,311832,3,5,S,FR,,101,92,78.0,...,,,,,,,,,,
3,10/01/2024 00:00:00,311844,3,5,N,OR,,202,92,43.0,...,,,,,,,,,,
4,10/01/2024 00:00:00,311847,3,5,N,OR,,303,92,73.0,...,,,,,,,,,,


### We drop features that contain only NaN values and keep a selected subset of the remaining features.

In [4]:
# Define the final selected columns
selected_columns = [
    "Timestamp", "Station", "Route", "Direction of Travel",
    "Total Flow", "Avg Speed", "% Observed","Samples","Lane Type"
]

# Keep only the selected columns
df = df[selected_columns]
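
The all-NaN columns can also be identified programmatically rather than by inspection. The following is a minimal sketch on a toy frame; the values are invented for illustration and are not taken from the PeMS file.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw export; values are illustrative only.
toy = pd.DataFrame({
    "Station": [308512, 311831, 311832],
    "Total Flow": [497.0, 27.0, np.nan],       # partially missing
    "Lane 8 Flow": [np.nan, np.nan, np.nan],   # entirely missing
})

# Names of the columns in which every value is missing.
all_nan_cols = toy.columns[toy.isna().all()].tolist()

# Drop them; columns with only some missing values are kept.
toy_clean = toy.dropna(axis=1, how="all")
```

Note that the manual list above is narrower than this filter: it also leaves out columns such as District and Station Length, which are not entirely empty in the preview.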

In [5]:
df

Unnamed: 0,Timestamp,Station,Route,Direction of Travel,Total Flow,Avg Speed,% Observed,Samples,Lane Type
0,10/01/2024 00:00:00,308512,50,W,497.0,64.1,0,197,ML
1,10/01/2024 00:00:00,311831,5,S,27.0,,92,101,OR
2,10/01/2024 00:00:00,311832,5,S,78.0,,92,101,FR
3,10/01/2024 00:00:00,311844,5,N,43.0,,92,202,OR
4,10/01/2024 00:00:00,311847,5,N,73.0,,92,303,OR
...,...,...,...,...,...,...,...,...,...
4114675,12/31/2024 23:00:00,3423094,99,S,68.0,64.8,96,118,ML
4114676,12/31/2024 23:00:00,3900021,50,E,803.0,66.5,67,292,ML
4114677,12/31/2024 23:00:00,3900022,50,E,509.0,68.0,0,0,HV
4114678,12/31/2024 23:00:00,3900023,50,W,881.0,67.4,67,289,ML


## Check the Percentage of Missing Data in Every Feature

In [8]:
pd.set_option('display.float_format', '{:.4f}'.format)

missing_percent = (df.isna().sum() / len(df)) * 100
print(missing_percent)


Timestamp              0.0000
Station                0.0000
Route                  0.0000
Direction of Travel    0.0000
Total Flow             7.3827
Avg Speed             38.4621
% Observed             0.0000
Samples                0.0000
Lane Type              0.0000
dtype: float64
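
In the preview rows, the missing Avg Speed values fall on the OR and FR lane types, so it may be worth checking whether the missingness is concentrated in particular lane types before choosing an imputation method. The sketch below shows one way to compute that breakdown; the toy values are assumptions made for illustration, not results from the dataset.

```python
import numpy as np
import pandas as pd

# Toy sample; lane-type codes follow the preview, speeds are illustrative.
toy = pd.DataFrame({
    "Lane Type": ["ML", "OR", "FR", "ML", "OR"],
    "Avg Speed": [64.1, np.nan, np.nan, 66.5, np.nan],
})

# Share of missing Avg Speed values per lane type, in percent.
missing_by_lane = (
    toy["Avg Speed"].isna().groupby(toy["Lane Type"]).mean() * 100
)
print(missing_by_lane)
```

On the real frame the same expression can be applied to `df` directly.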


## Imputation Strategy for Key Traffic Variables

We retained both Avg Speed and Total Flow rather than dropping them because they are the core variables describing traffic dynamics. Average speed reflects congestion levels and driving conditions, while total flow is the number of vehicles passing a station; both directly influence how traffic affects EV range and, ultimately, charging demand. Dropping them would discard the behavior that determines how energy is consumed on the road. Although both features have missing values (about 38% for Avg Speed and 7% for Total Flow), traffic data are strongly structured in time and space, which makes them good candidates for informed imputation rather than removal.

For Average Speed, we applied a two-step imputation within each station: a forward fill followed by a backward fill. This keeps each station's hourly series continuous, and it is a reasonable approximation because traffic speed rarely changes abruptly from one hour to the next unless an external event intervenes.

For Total Flow, the missingness was much lower, so a simpler approach was sufficient. We performed linear interpolation within each station to fill small hourly gaps, keeping flow values smooth and consistent with neighboring observations. These imputation steps preserve information about how vehicles move through the network while limiting the artificial noise or bias that imputation can introduce. Reconstructing rather than discarding incomplete records keeps the dataset intact as a basis for spatio-temporal modeling of EV charging demand and range prediction.

In [9]:
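# Sort per station in time order (the MM/DD/YYYY strings sort chronologically only within a single year)
df = df.sort_values(['Station', 'Timestamp'])
# Forward- then backward-fill inside each station; stations with no speed readings stay NaN
df['Avg Speed'] = df.groupby('Station')['Avg Speed'].transform(lambda s: s.ffill().bfill())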
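
One detail of group-wise filling is worth checking on a toy frame: calling `.bfill()` on the result of a grouped `.ffill()` is a plain Series operation, so it no longer respects station boundaries. The sketch below uses hypothetical stations "A" and "B" with invented speeds, and compares the chained form with a fill kept inside each group.

```python
import numpy as np
import pandas as pd

# Station "A" has no speed readings at all; station "B" is complete.
toy = pd.DataFrame({
    "Station": ["A", "A", "B", "B"],
    "Avg Speed": [np.nan, np.nan, 30.0, 31.0],
})

# Group-wise forward fill, then a backward fill on the combined result.
# The second call ignores the grouping and pulls values across stations.
chained = toy.groupby("Station")["Avg Speed"].ffill().bfill()

# Both fills applied inside each station group.
per_station = toy.groupby("Station")["Avg Speed"].transform(
    lambda s: s.ffill().bfill()
)

print(chained.tolist())      # station A receives station B's first speed
print(per_station.tolist())  # station A stays missing
```

The two forms differ only for stations with no speed observations at all, which then either inherit a neighbor's value in sort order or remain missing and need a separate treatment.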

In [10]:
df['Total Flow'] = df.groupby('Station')['Total Flow'].transform(
    lambda x: x.interpolate(method='linear')
)
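
To make the behavior of the interpolation step concrete, here is a small sketch on an invented hourly flow series for a single station. It relies on the pandas defaults for `interpolate`, which fill in the forward direction only.

```python
import numpy as np
import pandas as pd

# Illustrative flow series with a leading gap, an interior gap, and a trailing gap.
flow = pd.Series([np.nan, 10.0, np.nan, np.nan, 40.0, np.nan])

# Default linear interpolation: interior gaps are filled on a straight line,
# the leading gap is left missing, and the trailing gap repeats the last value.
filled = flow.interpolate(method="linear")
print(filled.tolist())
```

Two consequences follow. Values missing at the start of a station's series are not filled by this call, and `method='linear'` treats rows as equally spaced, so it assumes that no hourly rows are absent from the station's series.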

## How the Data Looks Now

In [11]:
df.head()

Unnamed: 0,Timestamp,Station,Route,Direction of Travel,Total Flow,Avg Speed,% Observed,Samples,Lane Type
1827,10/01/2024 01:00:00,308511,50,E,12.0,67.5,100,202,ML
3688,10/01/2024 02:00:00,308511,50,E,12.0,67.0,100,197,ML
5549,10/01/2024 03:00:00,308511,50,E,20.0,66.3,92,197,ML
7410,10/01/2024 04:00:00,308511,50,E,55.0,67.4,100,197,ML
9271,10/01/2024 05:00:00,308511,50,E,228.0,66.1,83,168,ML
