# Stream 02 — Pricing Analysis 

## 1. Imports & Load Dataset

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("ev_charging_patterns.csv")
df.head()

Unnamed: 0,User ID,Vehicle Model,Battery Capacity (kWh),Charging Station ID,Charging Station Location,Charging Start Time,Charging End Time,Energy Consumed (kWh),Charging Duration (hours),Charging Rate (kW),Charging Cost (USD),Time of Day,Day of Week,State of Charge (Start %),State of Charge (End %),Distance Driven (since last charge) (km),Temperature (°C),Vehicle Age (years),Charger Type,User Type
0,User_1,BMW i3,108.463007,Station_391,Houston,2024-01-01 00:00:00,2024-01-01 00:39:00,60.712346,0.591363,36.389181,13.087717,Evening,Tuesday,29.371576,86.119962,293.602111,27.947953,2.0,DC Fast Charger,Commuter
1,User_2,Hyundai Kona,100.0,Station_428,San Francisco,2024-01-01 01:00:00,2024-01-01 03:01:00,12.339275,3.133652,30.677735,21.128448,Morning,Monday,10.115778,84.664344,112.112804,14.311026,3.0,Level 1,Casual Driver
2,User_3,Chevy Bolt,75.0,Station_181,San Francisco,2024-01-01 02:00:00,2024-01-01 04:48:00,19.128876,2.452653,27.513593,35.66727,Morning,Thursday,6.854604,69.917615,71.799253,21.002002,2.0,Level 2,Commuter
3,User_4,Hyundai Kona,50.0,Station_327,Houston,2024-01-01 03:00:00,2024-01-01 06:42:00,79.457824,1.266431,32.88287,13.036239,Evening,Saturday,83.120003,99.624328,199.577785,38.316313,1.0,Level 1,Long-Distance Traveler
4,User_5,Hyundai Kona,50.0,Station_108,Los Angeles,2024-01-01 04:00:00,2024-01-01 05:46:00,19.629104,2.019765,10.215712,10.161471,Morning,Saturday,54.25895,63.743786,203.661847,-7.834199,1.0,Level 1,Long-Distance Traveler


## 2. Preprocessing & Feature Engineering

In [2]:
# Parse timestamps so durations, hours, and dates can be derived
df['Charging Start Time'] = pd.to_datetime(df['Charging Start Time'])
df['Charging End Time'] = pd.to_datetime(df['Charging End Time'])
df['Session Duration (hours)'] = (df['Charging End Time'] - df['Charging Start Time']).dt.total_seconds() / 3600
df['Hour'] = df['Charging Start Time'].dt.hour
df['Session Date'] = df['Charging Start Time'].dt.date.astype(str)
# Derived pricing and efficiency metrics
df['Price per kWh'] = df['Charging Cost (USD)'] / df['Energy Consumed (kWh)']
df['Efficiency (km per kWh)'] = df['Distance Driven (since last charge) (km)'] / df['Energy Consumed (kWh)']
# Division by zero yields inf; treat it as missing and drop rows without a valid price
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df = df.dropna(subset=['Price per kWh'])
df.head()

Unnamed: 0,User ID,Vehicle Model,Battery Capacity (kWh),Charging Station ID,Charging Station Location,Charging Start Time,Charging End Time,Energy Consumed (kWh),Charging Duration (hours),Charging Rate (kW),...,Distance Driven (since last charge) (km),Temperature (°C),Vehicle Age (years),Charger Type,User Type,Session Duration (hours),Hour,Session Date,Price per kWh,Efficiency (km per kWh)
0,User_1,BMW i3,108.463007,Station_391,Houston,2024-01-01 00:00:00,2024-01-01 00:39:00,60.712346,0.591363,36.389181,...,293.602111,27.947953,2.0,DC Fast Charger,Commuter,0.65,0,2024-01-01,0.215569,4.835954
1,User_2,Hyundai Kona,100.0,Station_428,San Francisco,2024-01-01 01:00:00,2024-01-01 03:01:00,12.339275,3.133652,30.677735,...,112.112804,14.311026,3.0,Level 1,Casual Driver,2.016667,1,2024-01-01,1.712292,9.08585
2,User_3,Chevy Bolt,75.0,Station_181,San Francisco,2024-01-01 02:00:00,2024-01-01 04:48:00,19.128876,2.452653,27.513593,...,71.799253,21.002002,2.0,Level 2,Commuter,2.8,2,2024-01-01,1.864577,3.753449
3,User_4,Hyundai Kona,50.0,Station_327,Houston,2024-01-01 03:00:00,2024-01-01 06:42:00,79.457824,1.266431,32.88287,...,199.577785,38.316313,1.0,Level 1,Long-Distance Traveler,3.7,3,2024-01-01,0.164065,2.511745
4,User_5,Hyundai Kona,50.0,Station_108,Los Angeles,2024-01-01 04:00:00,2024-01-01 05:46:00,19.629104,2.019765,10.215712,...,203.661847,-7.834199,1.0,Level 1,Long-Distance Traveler,1.766667,4,2024-01-01,0.517674,10.375504


## 3. Pie Chart — Charger Type Distribution

In [3]:
px.pie(df, names='Charger Type', title='Charging Sessions by Charger Type', hole=0.3).show()

This pie chart shows the distribution of charging sessions across different charger types. Level 1 chargers are used the most, while Level 2 and DC Fast Chargers are used in almost equal proportions. The chart highlights how users rely on a mix of slow, medium, and fast charging options.
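
The shares behind a pie chart like this can also be checked numerically. The following is a minimal sketch on a toy frame; the rows are made up for illustration and do not come from `ev_charging_patterns.csv`.

```python
import pandas as pd

# Toy stand-in for the real dataset; these rows are illustrative only.
toy = pd.DataFrame({
    "Charger Type": ["Level 1", "Level 1", "Level 2", "DC Fast Charger"]
})

# Share of sessions per charger type, as fractions that sum to 1.
shares = toy["Charger Type"].value_counts(normalize=True)
print(shares)
```

Running the same `value_counts(normalize=True)` call on the notebook's `df` would give the exact proportions the pie chart displays.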

## 4. Avg Price by Time of Day

In [4]:
order=['Morning','Afternoon','Evening','Night']
df['Time of Day']=pd.Categorical(df['Time of Day'],categories=order,ordered=True)
avg=df.groupby('Time of Day')['Price per kWh'].mean().reset_index()
px.bar(avg,x='Time of Day',y='Price per kWh',title='Average Price by Time of Day').show()





This bar chart shows the average price per kWh across different times of the day. Prices remain fairly low during the morning, afternoon, and night, but there is a significant spike in the evening, indicating higher demand or peak-hour pricing. This pattern highlights when charging is most expensive for users.
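
Because price per kWh is a ratio, a handful of extreme sessions can dominate a group mean. A hedged way to check whether a spike reflects typical sessions is to compare mean, median, and count per group. The sketch below uses made-up values, with one deliberately extreme price, rather than the real data.

```python
import pandas as pd

# Toy sessions; the single 5.00 value mimics an extreme price-per-kWh outlier.
toy = pd.DataFrame({
    "Time of Day": ["Morning", "Morning", "Evening", "Evening", "Evening"],
    "Price per kWh": [0.20, 0.30, 0.25, 0.35, 5.00],
})

# Mean is pulled up by the outlier; median stays close to a typical session.
summary = (
    toy.groupby("Time of Day")["Price per kWh"]
       .agg(["mean", "median", "count"])
)
print(summary)
```

If the median for a time slot stays low while the mean is high, the spike is likely driven by a few sessions rather than by consistently higher prices.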

## 5. Scatter — Price vs Hour (Charger Type)

In [5]:
px.scatter(df, x='Hour', y='Price per kWh', color='Charger Type',
           title='Price vs Hour (Charger Type)').show()

This scatter plot shows how the price per kWh varies across different hours of the day, grouped by charger type. Most prices stay low throughout the day, but a few outliers appear where certain charger types have unusually high costs. The visualization helps reveal hourly pricing patterns and highlights which charger types experience spikes in price.

## 6. Scatter — Price vs Hour (City)

In [6]:
px.scatter(df, x='Hour', y='Price per kWh', color='Charging Station Location',
           title='Price vs Hour (City)').show()

This scatter plot compares the price per kWh across different hours of the day for each city. Most cities show consistently low prices throughout the day, but a few cities have occasional spikes at certain hours. This helps identify which locations experience unusual or peak-time pricing.

## 7. Treemap — City → Charger Type → Station

In [7]:
px.treemap(df,path=['Charging Station Location','Charger Type','Charging Station ID'],
           values='Charging Cost (USD)',color='Price per kWh',
           title='Treemap: Cost & Pricing Structure').show()

This treemap visualizes the cost and pricing structure across cities, charger types, and individual charging stations.

Each top-level block represents a city.

Within each city, the spaces are divided by charger type (Level 1, Level 2, DC Fast Charger).

Each small block inside represents a charging station, sized by total charging cost and colored by price per kWh.

Darker or warmer colors indicate higher prices, helping us quickly spot which cities or charger types tend to be more expensive. This view makes it easy to compare pricing patterns and cost distribution across different locations and charger categories.

## 8. Dynamic Pricing Animation

In [8]:
subset=['Los Angeles','San Francisco','New York','Houston','Chicago']
dfA=df[df['Charging Station Location'].isin(subset)]
fig=px.scatter(dfA,x='Charging Start Time',y='Price per kWh',
               animation_frame='Session Date',color='Vehicle Model',
               title='Dynamic Pricing Over Time')
fig.update_yaxes(range=[0,df['Price per kWh'].quantile(0.99)])
fig.show()

This animated scatter plot shows how the price per kWh changes over time across selected cities. Each point represents a charging session, with colors indicating different vehicle models. As the animation progresses by date, you can observe how pricing fluctuates throughout the day and across different vehicles. This helps reveal trends such as peak-time price increases, city-specific variations, and differences in charging behavior across vehicle models.

## 9. Charging Speed vs Price (Stable Trendline)

In [9]:
clean=df[['Charging Rate (kW)','Price per kWh']].replace([np.inf,-np.inf],np.nan).dropna()
X=clean[['Charging Rate (kW)']]
y=clean['Price per kWh']
lr=LinearRegression().fit(X,y)
fig=px.scatter(df,x='Charging Rate (kW)',y='Price per kWh',color='Charger Type',
               title='Charging Speed vs Price (Trendline)')
fig.add_trace(go.Scatter(x=clean['Charging Rate (kW)'],y=lr.predict(X),
                         mode='lines',name='Trendline'))
fig.show()

This scatter plot compares charging rate (kW) with the price per kWh, highlighted by charger type. Most points cluster at lower price levels, but a few outliers show unusually high pricing. The added trendline helps reveal the overall relationship: as charging speed increases, price per kWh does not significantly rise, indicating weak or no direct correlation. This visualization helps identify whether faster chargers are consistently more expensive or if pricing is influenced more by other factors.
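
A trendline is easier to interpret alongside a number. The sketch below computes a Pearson correlation and a least-squares slope on synthetic data in which price is generated independently of charging rate. The arrays are assumptions for illustration, not the notebook's columns.

```python
import numpy as np

rng = np.random.default_rng(0)
rate_kw = rng.uniform(3, 60, size=500)   # synthetic charging rates (kW)
price = rng.normal(0.5, 0.1, size=500)   # synthetic prices, independent of rate

# Pearson correlation: values near 0 indicate little linear relationship.
r = np.corrcoef(rate_kw, price)[0, 1]

# Least-squares line, equivalent to a one-feature LinearRegression fit.
slope, intercept = np.polyfit(rate_kw, price, deg=1)

print(f"r = {r:.3f}, slope = {slope:.5f} per kW, intercept = {intercept:.3f}")
```

Applied to `clean['Charging Rate (kW)']` and `clean['Price per kWh']`, the same two calls would quantify how weak the relationship in the plot actually is.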

## 10. Price Prediction Model

In [None]:

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor


In [None]:

# Prepare data for modeling
# Only the location column actually changes name; the other columns keep theirs
df = df.rename(columns={'Charging Station Location': 'City'})

# Extract hour if not already present
if 'Hour' not in df.columns:
    df['Charging Start Time'] = pd.to_datetime(df['Charging Start Time'])
    df['Hour'] = df['Charging Start Time'].dt.hour

# Drop rows with missing or infinite values for model columns
model_cols = ['City', 'Charger Type', 'Time of Day', 'Hour', 'Charging Rate (kW)', 'Price per kWh']
df_model = df[model_cols].replace([np.inf, -np.inf], np.nan).dropna()


This code prepares the dataset for modeling. It renames 'Charging Station Location' to 'City', derives the hour from the charging start time if the 'Hour' column is not already present (it was created in Section 2), and keeps only the columns needed for prediction. Rows with missing or infinite values in those columns are dropped so the model trains on complete records.

## 10.1 Train a Pricing Prediction Model

In [None]:
# Train price prediction model

X = df_model[['City', 'Charger Type', 'Time of Day', 'Hour', 'Charging Rate (kW)']]
y = df_model['Price per kWh']

categorical_features = ['City', 'Charger Type', 'Time of Day']
numeric_features = ['Hour', 'Charging Rate (kW)']

preprocess = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features),
        ('num', 'passthrough', numeric_features)
    ]
)

model = Pipeline(steps=[
    ('preprocess', preprocess),
    ('regressor', RandomForestRegressor(
        n_estimators=200,
        random_state=42,
        n_jobs=-1
    ))
])

model.fit(X, y)
print("Model trained on", len(df_model), "rows.")


Model trained on 1191 rows.


This code trains a price prediction model using city, charger type, time of day, hour, and charging rate as features. Categorical features are one-hot encoded, while numeric features pass through unchanged. A Random Forest Regressor then learns the pricing patterns. The model is fitted on all 1,191 clean rows with no hold-out set, so this cell does not report an out-of-sample error.
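
Section 1 imports `train_test_split` and `mean_absolute_error`, but the cell above never uses them. The sketch below shows one way a held-out check could look. It runs on synthetic data with a simplified feature set, so the toy columns, the 80/20 split, and the mean-price baseline are assumptions rather than the notebook's actual evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data: price depends on city plus noise (illustrative only).
rng = np.random.default_rng(42)
n = 400
toy = pd.DataFrame({
    "City": rng.choice(["Houston", "Chicago"], size=n),
    "Charger Type": rng.choice(["Level 1", "Level 2"], size=n),
    "Hour": rng.integers(0, 24, size=n),
})
toy["Price per kWh"] = (
    0.3 + 0.2 * (toy["City"] == "Chicago") + rng.normal(0, 0.05, size=n)
)

X = toy[["City", "Charger Type", "Hour"]]
y = toy["Price per kWh"]

# Hold out 20% of rows that the model never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline(steps=[
    ("preprocess", ColumnTransformer(transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["City", "Charger Type"]),
        ("num", "passthrough", ["Hour"]),
    ])),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=42)),
])
pipe.fit(X_train, y_train)

# Compare model error against always predicting the training mean.
mae = mean_absolute_error(y_test, pipe.predict(X_test))
baseline = mean_absolute_error(y_test, np.full(len(y_test), y_train.mean()))
print(f"Model MAE: {mae:.3f}  |  Mean-baseline MAE: {baseline:.3f}")
```

A model that cannot beat the mean baseline on held-out rows is not yet capturing usable pricing structure, which is worth knowing before relying on its what-if predictions.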

In [None]:
#  Helper function for prediction

def predict_price(city, charger_type, time_of_day, hour, rate_kw):
    """Return predicted price per kWh for a single scenario."""
    sample = pd.DataFrame([{
        'City': city,
        'Charger Type': charger_type,
        'Time of Day': time_of_day,
        'Hour': hour,
        'Charging Rate (kW)': rate_kw
    }])
    return float(model.predict(sample)[0])


This helper function generates a single price prediction for any user-defined scenario. It creates a one-row DataFrame containing the selected city, charger type, time of day, hour, and charging rate, and then passes it to the trained model. The function returns the predicted price per kWh as a simple numeric value.

## 10.2 Dynamic Pricing Predictor (What-If Heatmap + Simple Text Output)


In [None]:
#  Quick “what-if” calculator (text)
# Simple what-if example (edit values and re-run)

city_input = 'Los Angeles'
charger_input = 'DC Fast Charger'
time_of_day_input = 'Evening'
hour_input = 19
rate_input = 50  # kW

predicted_price = predict_price(city_input, charger_input, time_of_day_input, hour_input, rate_input)

# If you have Energy Consumed column, estimate total cost
if 'Energy Consumed (kWh)' in df.columns:
    typical_kwh = df['Energy Consumed (kWh)'].median()
else:
    typical_kwh = 40  # fallback assumption

estimated_cost = predicted_price * typical_kwh

print(f"City: {city_input}")
print(f"Charger: {charger_input}")
print(f"Time of Day: {time_of_day_input} at {hour_input}:00")
print(f"Charging Rate: {rate_input} kW")
print(f"Predicted price per kWh: ${predicted_price:0.2f}")
print(f"Estimated session cost ({typical_kwh:.0f} kWh): ${estimated_cost:0.2f}")


City: Los Angeles
Charger: DC Fast Charger
Time of Day: Evening at 19:00
Charging Rate: 50 kW
Predicted price per kWh: $0.96
Estimated session cost (43 kWh): $41.03


This section creates a simple what-if pricing calculator. The user enters a city, charger type, time of day, hour, and charging rate. The model then predicts the price per kWh for that scenario. If the dataset contains energy consumption values, it uses the median to estimate a typical session size; otherwise, it assumes 40 kWh. Finally, it calculates and prints the estimated total session cost, allowing users to understand how changing charging conditions affects price.

## 10.3 Interactive heatmap – Price vs Hour & Charger Type (per city)

In [None]:
# Dynamic Pricing Heatmap with city dropdown

cities = sorted(df_model['City'].unique())
charger_types = sorted(df_model['Charger Type'].unique())
hours = list(range(24))
time_of_day = 'Evening'  # fixed for now; you can loop or add more dropdowns if you like

figs_data = {}

for city in cities:
    grid = []
    for charger in charger_types:
        row = []
        for h in hours:
            row.append(predict_price(city, charger, time_of_day, h, 
                                     df_model['Charging Rate (kW)'].median()))
        grid.append(row)
    figs_data[city] = np.array(grid)

# Build initial heatmap for first city
initial_city = cities[0]
heat = go.Heatmap(
    z=figs_data[initial_city],
    x=hours,
    y=charger_types,
    coloraxis="coloraxis"
)

fig = go.Figure(data=[heat])

# Add dropdown to switch city
buttons = []
for city in cities:
    buttons.append(
        dict(
            label=city,
            method="update",
            args=[
                {"z": [figs_data[city]]},  # update heatmap values
                {"title": f"Predicted Price per kWh – {city} ({time_of_day})"}
            ]
        )
    )

fig.update_layout(
    title=f"Predicted Price per kWh – {initial_city} ({time_of_day})",
    xaxis_title="Hour of Day",
    yaxis_title="Charger Type",
    coloraxis_colorbar_title="USD/kWh",
    updatemenus=[dict(
        type="dropdown",
        x=1.15,
        y=1.0,
        showactive=True,
        buttons=buttons
    )]
)

fig.show()


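This code creates an interactive heatmap of the predicted price per kWh for every hour of the day and each charger type, for the selected city. For each city, a grid of predictions is generated with the trained model, holding the time of day fixed at 'Evening' and the charging rate at the dataset median. Because 'Evening' is paired with all 24 hours, some cells describe combinations (for example, hour 8 labelled as evening) that may be rare or absent in the training data, so those predictions should be read with caution. A dropdown menu switches between cities and updates the heatmap. The visualization helps compare predicted cost patterns across cities and identify peak-price hours for different charger types.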
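
The grid above is filled one prediction at a time. An alternative is to build every combination as rows of a single DataFrame, call `predict` once, and pivot the result into the heatmap matrix. The sketch below illustrates this under assumptions: `toy_model` and its random training data are hypothetical stand-ins for the notebook's `model`, and only two features are used to keep it short.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in model fitted on random data (illustrative only).
rng = np.random.default_rng(0)
n = 300
train = pd.DataFrame({
    "Charger Type": rng.choice(["DC Fast Charger", "Level 1", "Level 2"], size=n),
    "Hour": rng.integers(0, 24, size=n),
})
train["Price per kWh"] = rng.uniform(0.1, 1.0, size=n)

toy_model = Pipeline(steps=[
    ("preprocess", ColumnTransformer(transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Charger Type"]),
        ("num", "passthrough", ["Hour"]),
    ])),
    ("regressor", RandomForestRegressor(n_estimators=20, random_state=0)),
])
toy_model.fit(train[["Charger Type", "Hour"]], train["Price per kWh"])

# Build every (charger type, hour) combination as one DataFrame.
charger_types = sorted(train["Charger Type"].unique())
hours = list(range(24))
grid = pd.DataFrame(
    list(itertools.product(charger_types, hours)),
    columns=["Charger Type", "Hour"],
)

# One batched predict call instead of one call per cell.
grid["Predicted"] = toy_model.predict(grid[["Charger Type", "Hour"]])

# Rows = charger types, columns = hours: the shape a heatmap's z expects.
z = grid.pivot(index="Charger Type", columns="Hour", values="Predicted").to_numpy()
print(z.shape)
```

With the real pipeline, the grid would also need the `City`, `Time of Day`, and `Charging Rate (kW)` columns, and the pivot would be repeated per city.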

## 10.4 Cost Savings Simulator (Current vs Optimised Scenario)

In [None]:
#  Cost Savings Simulator

# === USER INPUTS – edit these ===
city = 'Los Angeles'
current_charger = 'DC Fast Charger'
optimal_charger = 'Level 2'
current_time_of_day = 'Evening'
optimal_time_of_day = 'Night'
current_hour = 19
optimal_hour = 22
rate_kw_current = 50
rate_kw_optimal = 22
session_kwh = 40  # kWh per session
sessions_per_month = 20
# ================================

# Predict prices
price_current = predict_price(city, current_charger, current_time_of_day, current_hour, rate_kw_current)
price_optimal = predict_price(city, optimal_charger, optimal_time_of_day, optimal_hour, rate_kw_optimal)

monthly_cost_current = price_current * session_kwh * sessions_per_month
monthly_cost_optimal = price_optimal * session_kwh * sessions_per_month
monthly_saving = monthly_cost_current - monthly_cost_optimal

print(f"Current monthly cost: ${monthly_cost_current:0.2f}")
print(f"Optimised monthly cost: ${monthly_cost_optimal:0.2f}")
print(f"Estimated monthly saving: ${monthly_saving:0.2f}")

# Bar chart comparison
sim_df = pd.DataFrame({
    'Scenario': ['Current behaviour', 'Optimised behaviour'],
    'Monthly Cost (USD)': [monthly_cost_current, monthly_cost_optimal]
})

fig_sim = px.bar(
    sim_df,
    x='Scenario',
    y='Monthly Cost (USD)',
    title=f"Cost Savings Simulator – {city}",
    text=[f"${monthly_cost_current:0.0f}", f"${monthly_cost_optimal:0.0f}"]
)
fig_sim.update_traces(textposition='outside')
fig_sim.update_layout(yaxis_title="Monthly Cost (USD)", uniformtext_minsize=12, uniformtext_mode='show')
fig_sim.show()


Current monthly cost: $768.78
Optimised monthly cost: $1571.99
Estimated monthly saving: $-803.21


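This code compares the user's current charging behaviour with an alternative scenario. It predicts the price per kWh for each setup, multiplies by the session size and the number of sessions per month to get a monthly cost, and reports the difference. In the recorded run, the scenario labelled "optimised" (Level 2, Night, 22 kW) is predicted to cost about 1,572 per month against about 769 for the current behaviour, so the saving is negative (about -803). For these inputs the model does not support the assumption that the hand-picked alternative is cheaper. The bar chart shows how the monthly cost changes when the charger type, charging time, or charging speed is switched, which makes it easy to test candidate strategies rather than assume one is better.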
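
The recorded output above shows a negative saving, so it can help to let the numbers pick the cheapest option rather than choosing an "optimal" scenario by hand. The sketch below applies the same cost formula (price × kWh per session × sessions per month) to several candidates. The scenario names and prices are made-up placeholders, not model output.

```python
def monthly_cost(price_per_kwh, session_kwh, sessions_per_month):
    """Monthly cost = price per kWh x energy per session x sessions per month."""
    return price_per_kwh * session_kwh * sessions_per_month


# Hypothetical predicted prices per scenario (illustrative values only).
scenario_prices = {
    "DC Fast Charger, Evening": 0.50,
    "Level 2, Night": 0.40,
    "Level 1, Morning": 0.45,
}

session_kwh = 40
sessions_per_month = 20

costs = {
    name: monthly_cost(price, session_kwh, sessions_per_month)
    for name, price in scenario_prices.items()
}

# Pick the scenario with the lowest monthly cost.
cheapest = min(costs, key=costs.get)
print(cheapest, round(costs[cheapest], 2))
```

In the notebook, the placeholder prices would be replaced by calls to `predict_price` for each candidate scenario.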

## 10.5 Predictive Hotspot Map / City Demand View

In [None]:
# Predictive city + charger hotspots

future_hours = [8, 12, 18, 22]  # typical hours tomorrow
time_of_day_map = {8: 'Morning', 12: 'Afternoon', 18: 'Evening', 22: 'Night'}

rows = []
for city in cities:
    for charger in charger_types:
        for h in future_hours:
            tod = time_of_day_map[h]
            price_pred = predict_price(
                city, charger, tod, h, df_model['Charging Rate (kW)'].median()
            )
            rows.append({
                'City': city,
                'Charger Type': charger,
                'Hour': h,
                'Time of Day': tod,
                'Predicted Price per kWh': price_pred
            })

df_hotspot = pd.DataFrame(rows)


This code generates a predictive hotspot dataset by predicting the price per kWh for each city, charger type, and one representative hour per time of day (8 for morning, 12 for afternoon, 18 for evening, 22 for night), with the charging rate held at the dataset median. The model has no date feature, so these "next day" values are the same for any day; they describe typical predicted prices rather than a date-specific forecast. The results are stored in a new DataFrame, which is then used to create visuals that highlight high-cost locations and charger types.

## 10.6 Interactive “hotspot” treemap (by city → charger type)

In [20]:
# Predictive Hotspot Treemap

fig_hot = px.treemap(
    df_hotspot,
    path=['City', 'Charger Type', 'Time of Day'],
    values='Predicted Price per kWh',
    color='Predicted Price per kWh',
    color_continuous_scale='Turbo',
    title="Predicted Pricing Hotspots – Next Day (by City, Charger Type, Time of Day)"
)

fig_hot.update_layout(margin=dict(t=60, l=0, r=0, b=0))
fig_hot.show()


This treemap visualizes predicted charging prices for the next day across cities, charger types, and times of day. Each block’s size represents the predicted price per kWh, while its color indicates how high or low that price is. Brighter, warmer colors highlight expensive hotspots, helping users quickly identify where and when charging is expected to cost more. This visualization is useful for planning charging strategy, avoiding peak-price periods, and comparing pricing patterns across different cities.
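
Alongside the treemap, a sorted table gives the most expensive combinations directly. The sketch below ranks rows of a toy frame shaped like `df_hotspot`; the values are invented for illustration.

```python
import pandas as pd

# Toy frame with the same columns as df_hotspot (values are illustrative only).
toy_hotspot = pd.DataFrame({
    "City": ["Houston", "Houston", "Chicago", "Chicago"],
    "Charger Type": ["Level 1", "Level 2", "Level 1", "Level 2"],
    "Time of Day": ["Evening", "Night", "Evening", "Night"],
    "Predicted Price per kWh": [0.90, 0.30, 1.20, 0.45],
})

# The two most expensive city / charger / time-of-day combinations.
top = toy_hotspot.nlargest(2, "Predicted Price per kWh")
print(top)
```

The same `nlargest` call on the real `df_hotspot` would list the predicted hotspots without relying on colour alone.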

## Overall Insights

Across all visuals and predictive models, several key themes emerge:

✔ Evening hours consistently show the highest pricing.     
✔ Charger types have balanced usage but very different pricing behaviors.      
✔ Cities display distinct pricing patterns and occasional anomalies.              
✔ Charging speed alone does not drive price—location and time matter more.                 
✔ Predictive models enable powerful planning, optimization, and cost-saving insights.