# EV Charging Analysis - Feature Engineering & SA Market Translation

**Author:** Luqmaan
**Date:** December 2025
**Purpose:** Calculate efficiency metrics and translate US charging data to SA market context

---

## Project Context
This notebook takes cleaned EV charging data from US cities and:
1. Calculates efficiency and cost metrics
2. Translates costs to South African context (GridCars rates, Eskom rates)
3. Creates comparison scenarios for BYD Dolphin Surf
4. Compares EV costs to petrol equivalent

**Key SA Market Data (Dec 2025):**
- DC Fast Charging (GridCars): R7.35/kWh
- AC Charging (GridCars): R5.88/kWh
- Home Charging (Eskom): R3.00/kWh
- Petrol: R21/liter, ~7L/100km = R1.47/km
- BYD Dolphin Surf: 30-38 kWh battery, R339,900-R389,900

## 1. Setup & Imports

In [2]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 2. Define SA Market Constants

In [3]:
# SA Charging Rates (Dec 2025)
GRIDCARS_DC_FAST = 7.35   # R/kWh - DC Fast Charging
GRIDCARS_AC = 5.88        # R/kWh - AC Charging (Level 2)
HOME_CHARGING = 3.00      # R/kWh - Eskom residential average

# Currency conversion
USD_TO_ZAR = 17.00        # Exchange rate Dec 2025

# BYD Dolphin Surf Specifications
BYD_BATTERY_STANDARD = 30.08   # kWh
BYD_RANGE_STANDARD = 232       # km
BYD_BATTERY_PREMIUM = 38.88    # kWh
BYD_RANGE_PREMIUM = 295        # km
BYD_PRICE_MIN = 339900         # ZAR
BYD_PRICE_MAX = 389900         # ZAR

# Efficiency implied by the manufacturer's rated range and battery capacity
# (not measured from the charging dataset)
REAL_WORLD_EFFICIENCY = BYD_RANGE_STANDARD/BYD_BATTERY_STANDARD   # km/kWh

# Petrol comparison
PETROL_PRICE_PER_LITER = 21.00 # ZAR
PETROL_CONSUMPTION = 7.0       # liters per 100km (average sedan)
PETROL_COST_PER_KM = (PETROL_CONSUMPTION / 100) * PETROL_PRICE_PER_LITER

print("SA MARKET RATES (Dec 2025)")
print("-" * 50)
print(f"DC Fast Charging: R{GRIDCARS_DC_FAST:.2f}/kWh")
print(f"AC Charging:      R{GRIDCARS_AC:.2f}/kWh")
print(f"Home Charging (AC):    R{HOME_CHARGING:.2f}/kWh")
print(f"\nPetrol Cost:      R{PETROL_COST_PER_KM:.2f}/km")
print(f"\nUSD to ZAR:       R{USD_TO_ZAR:.2f}")
print(f"\nBYD Dolphin Efficiency: {REAL_WORLD_EFFICIENCY:.2f} km/kWh")

SA MARKET RATES (Dec 2025)
--------------------------------------------------
DC Fast Charging: R7.35/kWh
AC Charging:      R5.88/kWh
Home Charging (AC):    R3.00/kWh

Petrol Cost:      R1.47/km

USD to ZAR:       R17.00

BYD Dolphin Efficiency: 7.71 km/kWh


## 3. Load Clean Data

In [4]:
# Load the cleaned dataset
df = pd.read_csv('ev_charging_patterns_CLEAN.csv')

print("DATA LOADED")
print(f"Total rows: {len(df):,}")
print(f"Columns: {len(df.columns)}")
print(f"\nDate range: {df['start_time'].min()} to {df['start_time'].max()}")
print(f"\nCharger types: {df['charger_type'].unique()}")
print(f"User types: {df['user_type'].unique()}")
print(f"Cities: {df['city'].unique()}")

# Show first few rows
print("\nFirst 3 rows:")
df.head(3)

DATA LOADED
Total rows: 1,320
Columns: 22

Date range: 2024-01-01 00:00:00 to 2024-02-24 23:00:00

Charger types: ['DC Fast Charger' 'Level 1' 'Level 2']
User types: ['Commuter' 'Casual Driver' 'Long-Distance Traveler']
Cities: ['Houston' 'San Francisco' 'Los Angeles' 'Chicago' 'New York']

First 3 rows:


Unnamed: 0,user_id,vehicle_model,battery_capacity_kwh,station_id,city,start_time,end_time,energy_consumed_kwh,duration_hours,charging_rate_kw,cost_usd,time_of_day,day_of_week,soc_start_pct,soc_end_pct,distance_km,temperature_c,vehicle_age,charger_type,user_type,has_distance_data,has_energy_data
0,User_1,BMW i3,108.46,Station_391,Houston,2024-01-01 00:00:00,2024-01-01 00:39:00,60.71,0.59,36.39,13.09,Evening,Tuesday,29.37,86.12,293.6,27.95,2.0,DC Fast Charger,Commuter,True,True
1,User_2,Hyundai Kona,100.0,Station_428,San Francisco,2024-01-01 01:00:00,2024-01-01 03:01:00,12.34,3.13,30.68,21.13,Morning,Monday,10.12,84.66,112.11,14.31,3.0,Level 1,Casual Driver,True,True
2,User_3,Chevy Bolt,75.0,Station_181,San Francisco,2024-01-01 02:00:00,2024-01-01 04:48:00,19.13,2.45,27.51,35.67,Morning,Thursday,6.85,69.92,71.8,21.0,2.0,Level 2,Commuter,True,True


## 4. Calculate Basic Efficiency Metrics

In [5]:
# Create a copy for complete data (has distance and energy)
df_complete = df[df['has_distance_data'] & df['has_energy_data']].copy()

print(f"Working with {len(df_complete):,} complete records (out of {len(df):,} total)")
print(f"Data completeness: {len(df_complete)/len(df)*100:.1f}%")

# Calculate efficiency: km per kWh
df_complete['km_per_kwh'] = df_complete['distance_km'] / df_complete['energy_consumed_kwh']

# Calculate cost per km (USD)
df_complete['cost_per_km_usd'] = df_complete['cost_usd'] / df_complete['distance_km']

# Calculate cost per kWh (USD)
df_complete['cost_per_kwh_usd'] = df_complete['cost_usd'] / df_complete['energy_consumed_kwh']

# Calculate charging rate (kW)
df_complete['kwh_per_hour (kW)'] = df_complete['energy_consumed_kwh'] / df_complete['duration_hours']

# Calculate state of charge gained during the session (percentage points)
df_complete['battery_used_pct'] = df_complete['soc_end_pct'] - df_complete['soc_start_pct']

print("\n Efficiency metrics calculated")
print("\nNew columns created:")
print("  - km_per_kwh: Efficiency (distance per energy unit)")
print("  - cost_per_km_usd: Cost per kilometer traveled")
print("  - cost_per_kwh_usd: Cost per energy unit")
print("  - kwh_per_hour (kW): Charging speed")
print("  - battery_used_pct: Battery charge gained")

Working with 1,193 complete records (out of 1,320 total)
Data completeness: 90.4%

 Efficiency metrics calculated

New columns created:
  - km_per_kwh: Efficiency (distance per energy unit)
  - cost_per_km_usd: Cost per kilometer traveled
  - cost_per_kwh_usd: Cost per energy unit
  - kwh_per_hour (kW): Charging speed
  - battery_used_pct: Battery charge gained
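
The ratios above divide by `energy_consumed_kwh`, `distance_km` and `duration_hours`, so a zero in any of those columns would produce `inf`. The notebook does not say whether such rows exist. A minimal sketch of one way to guard against them, using a hypothetical helper `safe_ratio` and toy data (neither is part of the original notebook):

```python
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two columns, returning NaN where the denominator is zero, negative or missing."""
    # Hypothetical helper for illustration; not part of the original notebook.
    cleaned = denominator.where(denominator > 0)  # non-positive values become NaN
    return numerator / cleaned


# Toy data for illustration only (not taken from the dataset)
demo = pd.DataFrame({
    "distance_km": [100.0, 50.0, 80.0],
    "energy_consumed_kwh": [20.0, 0.0, 16.0],
})
demo["km_per_kwh"] = safe_ratio(demo["distance_km"], demo["energy_consumed_kwh"])
print(demo)
```

Rows with a zero denominator then show up as missing values, which `describe()`, `mean()` and `median()` skip by default.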


## 5. Quick Data Quality Check

In [6]:
# Check for any unrealistic values
print("EFFICIENCY METRICS SUMMARY")
print("\nkm per kWh (efficiency):")
print(df_complete['km_per_kwh'].describe())

print("\nCost per km (USD):")
print(df_complete['cost_per_km_usd'].describe())

print("\nCost per kWh (USD):")
print(df_complete['cost_per_kwh_usd'].describe())

# Check for outliers (values that seem unrealistic)
print("\n" + "-" * 50)
print("OUTLIER CHECK")
print("-" * 50)

# Flag extremely high efficiency (might indicate data error)
high_efficiency = df_complete[df_complete['km_per_kwh'] > 50]
print(f"\nSessions with >50 km/kWh: {len(high_efficiency)} ({len(high_efficiency)/len(df_complete)*100:.1f}%)")

# Flag extremely low efficiency
low_efficiency = df_complete[df_complete['km_per_kwh'] < 1]
print(f"Sessions with <1 km/kWh: {len(low_efficiency)} ({len(low_efficiency)/len(df_complete)*100:.1f}%)")

print("\n For analysis, we'll use median values which are less affected by outliers")

EFFICIENCY METRICS SUMMARY

km per kWh (efficiency):
count    1193.00
mean       11.47
std       173.42
min         0.05
25%         1.86
50%         3.59
75%         6.91
max      5962.57
Name: km_per_kwh, dtype: float64

Cost per km (USD):
count    1.19e+03
mean     2.86e-01
std      5.64e-01
min      8.52e-04
25%      8.30e-02
50%      1.48e-01
75%      2.89e-01
max      1.24e+01
Name: cost_per_km_usd, dtype: float64

Cost per kWh (USD):
count    1.19e+03
mean     1.68e+00
std      2.10e+01
min      4.52e-03
25%      2.97e-01
50%      5.39e-01
75%      9.55e-01
max      6.55e+02
Name: cost_per_kwh_usd, dtype: float64

--------------------------------------------------
OUTLIER CHECK
--------------------------------------------------

Sessions with >50 km/kWh: 3 (0.3%)
Sessions with <1 km/kWh: 149 (12.5%)

 For analysis, we'll use median values which are less affected by outliers
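
The summary above reports a mean of 11.47 km/kWh against a median of 3.59, with a maximum of 5962.57. The sketch below uses toy values, not the dataset, to illustrate how a single extreme value moves the mean while leaving the median almost unchanged. The 1.5 × IQR filter is a common convention assumed here for illustration; the notebook itself does not apply it.

```python
import pandas as pd

# Toy efficiency values (km/kWh) with one extreme outlier, for illustration only
efficiency = pd.Series([2.0, 3.0, 3.5, 4.0, 6.0, 5000.0])

print(f"Mean:   {efficiency.mean():.2f}")
print(f"Median: {efficiency.median():.2f}")

# Assumed filter (not the notebook's method): keep values within 1.5 * IQR of the quartiles
q1, q3 = efficiency.quantile([0.25, 0.75])
iqr = q3 - q1
within_fences = efficiency.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(f"Mean without flagged outliers: {efficiency[within_fences].mean():.2f}")
```

Once the flagged value is dropped, the mean falls back close to the median, which is why per-group medians may be a safer summary for this data.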


## 6. Calculate SA Charging Costs (GridCars Rates)

In [7]:
# Calculate what each session would cost at SA rates
# Based on energy consumed and charger type

def calculate_sa_cost(row):
    """
    Calculate what this charging session would cost in SA
    based on the charger type and GridCars/Eskom rates
    """
    energy = row['energy_consumed_kwh']
    
    if row['charger_type'] == 'DC Fast Charger':
        return energy * GRIDCARS_DC_FAST
    elif row['charger_type'] == 'Level 2':
        return energy * GRIDCARS_AC
    else:  # Level 1 (home charging)
        return energy * HOME_CHARGING

# Apply SA pricing
df_complete['cost_sa_zar'] = df_complete.apply(calculate_sa_cost, axis=1)
df_complete['cost_per_km_sa_zar'] = df_complete['cost_sa_zar'] / df_complete['distance_km']

print("SA CHARGING COSTS CALCULATED")
print("-" * 50)
print("\nUsing GridCars & Eskom rates:")
print(f"  DC Fast: R{GRIDCARS_DC_FAST:.2f}/kWh")
print(f"  Level 2: R{GRIDCARS_AC:.2f}/kWh")
print(f"  Level 1: R{HOME_CHARGING:.2f}/kWh")

print("\nAverage SA costs by charger type:")
sa_costs_by_type = df_complete.groupby('charger_type').agg({
    'cost_sa_zar': 'mean',
    'cost_per_km_sa_zar': 'mean',
    'duration_hours': 'mean'
}).round(2)
print(sa_costs_by_type)

SA CHARGING COSTS CALCULATED
--------------------------------------------------

Using GridCars & Eskom rates:
  DC Fast: R7.35/kWh
  Level 2: R5.88/kWh
  Level 1: R3.00/kWh

Average SA costs by charger type:
                 cost_sa_zar  cost_per_km_sa_zar  duration_hours
charger_type                                                    
DC Fast Charger       305.21                4.02            2.30
Level 1               125.88                1.59            2.27
Level 2               261.88                3.48            2.30
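
The row-wise `apply` above could also be written as a column-wise lookup. This is a sketch on toy sessions, with the rates copied from the Section 2 constants; the dictionary name `RATE_BY_CHARGER` is hypothetical.

```python
import pandas as pd

# Rates copied from the constants defined in Section 2 (R/kWh)
RATE_BY_CHARGER = {
    "DC Fast Charger": 7.35,  # GRIDCARS_DC_FAST
    "Level 2": 5.88,          # GRIDCARS_AC
    "Level 1": 3.00,          # HOME_CHARGING
}

# Toy sessions for illustration only
sessions = pd.DataFrame({
    "charger_type": ["DC Fast Charger", "Level 2", "Level 1"],
    "energy_consumed_kwh": [10.0, 10.0, 10.0],
})

# Look up each row's rate, then multiply whole columns (no Python-level loop)
rate = sessions["charger_type"].map(RATE_BY_CHARGER)
sessions["cost_sa_zar"] = sessions["energy_consumed_kwh"] * rate
print(sessions)
```

One behavioural difference: an unrecognised charger type yields `NaN` here, whereas the `if/elif/else` version silently falls back to the home rate.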


## 7. Compare EV vs Petrol Costs

In [8]:
# Add petrol cost comparison
df_complete['petrol_cost_equivalent_zar'] = df_complete['distance_km'] * PETROL_COST_PER_KM
df_complete['savings_vs_petrol_zar'] = df_complete['petrol_cost_equivalent_zar'] - df_complete['cost_sa_zar']
df_complete['savings_pct'] = (df_complete['savings_vs_petrol_zar'] / df_complete['petrol_cost_equivalent_zar']) * 100

print("EV vs PETROL COMPARISON")
print("-" * 50)
print(f"Petrol cost: R{PETROL_COST_PER_KM:.2f}/km")
print(f"\nAverage savings per charging session:")
print(f"  Amount: R{df_complete['savings_vs_petrol_zar'].mean():.2f}")
print(f"  Percentage: {df_complete['savings_pct'].mean():.1f}%")

print("\nSavings by charger type:")
savings_by_type = df_complete.groupby('charger_type').agg({
    'cost_per_km_sa_zar': 'mean',
    'savings_pct': 'mean'
}).round(2)
savings_by_type['petrol_cost_per_km'] = PETROL_COST_PER_KM
print(savings_by_type)

EV vs PETROL COMPARISON
--------------------------------------------------
Petrol cost: R1.47/km

Average savings per charging session:
  Amount: R-1.43
  Percentage: -103.5%

Savings by charger type:
                 cost_per_km_sa_zar  savings_pct  petrol_cost_per_km
charger_type                                                        
DC Fast Charger                4.02      -173.45                1.47
Level 1                        1.59        -8.18                1.47
Level 2                        3.48      -136.91                1.47
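
The output above pairs a small mean loss per session (R-1.43) with a large mean percentage loss (-103.5%). The two can diverge because averaging per-session percentages gives a short trip the same weight as a long one. A toy illustration with invented values:

```python
import pandas as pd

# Toy sessions for illustration only: a short trip with a large relative loss,
# and a long trip with a modest relative saving
toy = pd.DataFrame({
    "petrol_cost_equivalent_zar": [10.0, 200.0],
    "cost_sa_zar": [40.0, 150.0],
})
toy["savings_zar"] = toy["petrol_cost_equivalent_zar"] - toy["cost_sa_zar"]
toy["savings_pct"] = toy["savings_zar"] / toy["petrol_cost_equivalent_zar"] * 100

# Mean of per-session percentages (what the cell above reports)
mean_of_pct = toy["savings_pct"].mean()

# Aggregate percentage: total savings relative to total petrol cost
aggregate_pct = toy["savings_zar"].sum() / toy["petrol_cost_equivalent_zar"].sum() * 100

print(f"Mean of per-session percentages: {mean_of_pct:.1f}%")
print(f"Aggregate percentage:            {aggregate_pct:.1f}%")
```

In this toy case the mean of percentages is strongly negative even though the two sessions together save money, so the aggregate figure may be the more meaningful headline number.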


### Dataset Limitations

The preceding analysis (Sections 6-7) applies SA charging rates to the US
dataset. The resulting costs are higher than expected:

**Dataset-based costs (mean per charger type):**
- DC Fast: ~R4.0/km
- Level 2: ~R3.5/km
- Level 1 (home): ~R1.6/km

**Why these seem high:**
The dataset shows a median efficiency of only 3.6 km/kWh, well below that of
modern EVs (typically 6-10 km/kWh). The tables in Sections 6-7 also report
means rather than medians, so low-efficiency sessions (12.5% of sessions fall
below 1 km/kWh) pull the average cost per km up further. The low efficiency
may be due to:
- Older vehicle models in the dataset
- Synthetic or aggregated data
- Data quality issues

**For more realistic SA cost projections, see Sections 8 and 10**, which use
BYD Dolphin Surf specifications (7.71 km/kWh) with GridCars and Eskom rates.
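
A quick way to see why efficiency matters so much here is the break-even efficiency: the km/kWh at which charging costs the same per km as petrol. Since EV cost per km is rate divided by efficiency, break-even efficiency is rate divided by petrol cost per km. A sketch using only the Section 2 constants (the dictionary names are hypothetical):

```python
# Break-even efficiency = charging rate (R/kWh) / petrol cost (R/km)
PETROL_COST_PER_KM = (7.0 / 100) * 21.00  # R/km, as in Section 2

rates = {"DC Fast": 7.35, "Level 2": 5.88, "Home": 3.00}  # R/kWh, from Section 2

break_even = {name: rate / PETROL_COST_PER_KM for name, rate in rates.items()}
for name, km_per_kwh in break_even.items():
    print(f"{name}: break-even at {km_per_kwh:.2f} km/kWh")
```

This gives roughly 5.0 km/kWh for DC Fast, 4.0 for Level 2 and 2.0 for home charging. The dataset median of 3.6 km/kWh sits below the first two thresholds, which is in line with the negative savings reported for DC Fast and Level 2, while the BYD figure of 7.71 km/kWh clears all three.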

## 8. BYD Dolphin Surf - SA Calculations - Colleague Scenario

In [9]:
print("BYD DOLPHIN SURF - REAL-WORLD SA CALCULATIONS")
print("-" * 50)
print("Using manufacturer specifications + GridCars rates\n")

# BYD Dolphin Surf Standard specifications (same values as the Section 2 constants)
BATTERY_CAPACITY = 30.08  # kWh
RANGE_WLTP = 232  # km (manufacturer-rated range, not a measured real-world figure)
REAL_EFFICIENCY = RANGE_WLTP / BATTERY_CAPACITY  # = 7.71 km/kWh

print(f"BYD Dolphin Surf Standard:")
print(f"  Battery: {BATTERY_CAPACITY} kWh")
print(f"  Range: {RANGE_WLTP} km")
print(f"  Efficiency: {REAL_EFFICIENCY:.2f} km/kWh")

# Cost per km for different charging scenarios
cost_per_km_home = HOME_CHARGING / REAL_EFFICIENCY
cost_per_km_ac = GRIDCARS_AC / REAL_EFFICIENCY
cost_per_km_dc = GRIDCARS_DC_FAST / REAL_EFFICIENCY

print(f"\nCost per km (SA rates):")
print(f"  Home charging: R{cost_per_km_home:.2f}/km")
print(f"  AC (GridCars): R{cost_per_km_ac:.2f}/km")
print(f"  DC Fast (GridCars): R{cost_per_km_dc:.2f}/km")
print(f"  Petrol equivalent: R{PETROL_COST_PER_KM:.2f}/km")

# Colleague's scenario: 60km daily commute
DAILY_COMMUTE = 60  # km
WORK_DAYS_PER_MONTH = 22

# Scenario 1: 80% home charging, 20% AC charging
home_km = DAILY_COMMUTE * WORK_DAYS_PER_MONTH * 0.80
ac_km = DAILY_COMMUTE * WORK_DAYS_PER_MONTH * 0.20

monthly_ev_cost = (home_km * cost_per_km_home) + (ac_km * cost_per_km_ac)
monthly_petrol_cost = (DAILY_COMMUTE * WORK_DAYS_PER_MONTH) * PETROL_COST_PER_KM

print(f"\nMonthly Cost Scenario (60km/day, 22 work days):")
print(f"  80% home, 20% AC charging")
print(f"  EV cost: R{monthly_ev_cost:.2f}/month")
print(f"  Petrol cost: R{monthly_petrol_cost:.2f}/month")
print(f"  Monthly savings: R{monthly_petrol_cost - monthly_ev_cost:.2f}")
print(f"  Annual savings: R{(monthly_petrol_cost - monthly_ev_cost) * 12:.2f}")
print(f"  5-year savings: R{(monthly_petrol_cost - monthly_ev_cost) * 60:,.2f}")

# Charging time estimates (idealised: constant charger power, no losses or tapering)
print(f"\nCharging Time Estimates:")
print(f"  Home (AC, 6.6kW): {BATTERY_CAPACITY / 6.6:.1f} hours (0-100%)")
print(f"  GridCars AC (22kW): {BATTERY_CAPACITY / 22:.1f} hours (0-100%)")
print(f"  GridCars DC (60kW): {BATTERY_CAPACITY / 60 * 60:.0f} minutes (0-100%)")
print(f"\n  Note: Real-world charging is typically 20-80%, reducing times by ~40%")

BYD DOLPHIN SURF - REAL-WORLD SA CALCULATIONS
--------------------------------------------------
Using manufacturer specifications + GridCars rates

BYD Dolphin Surf Standard:
  Battery: 30.08 kWh
  Range: 232 km
  Efficiency: 7.71 km/kWh

Cost per km (SA rates):
  Home charging: R0.39/km
  AC (GridCars): R0.76/km
  DC Fast (GridCars): R0.95/km
  Petrol equivalent: R1.47/km

Monthly Cost Scenario (60km/day, 22 work days):
  80% home, 20% AC charging
  EV cost: R612.01/month
  Petrol cost: R1940.40/month
  Monthly savings: R1328.39
  Annual savings: R15940.63
  5-year savings: R79,703.17

Charging Time Estimates:
  Home (AC, 6.6kW): 4.6 hours (0-100%)
  GridCars AC (22kW): 1.4 hours (0-100%)
  GridCars DC (60kW): 30 minutes (0-100%)

  Note: Real-world charging is typically 20-80%, reducing times by ~40%
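
The time estimates above divide battery capacity by charger power. The same idealised model extends to a partial charge window. This is a sketch under the assumption of constant power and no losses, using a hypothetical helper `charging_hours`; real sessions are also limited by the car's on-board charger and its charging curve.

```python
def charging_hours(battery_kwh: float, charger_kw: float,
                   soc_start: float = 0.0, soc_end: float = 1.0) -> float:
    """Idealised charging time in hours, assuming constant power and no losses."""
    # Hypothetical helper for illustration; not part of the original notebook.
    if not 0.0 <= soc_start < soc_end <= 1.0:
        raise ValueError("Require 0 <= soc_start < soc_end <= 1")
    return battery_kwh * (soc_end - soc_start) / charger_kw


BATTERY_CAPACITY = 30.08  # kWh, BYD Dolphin Surf Standard (from Section 2)

full = charging_hours(BATTERY_CAPACITY, 6.6)
partial = charging_hours(BATTERY_CAPACITY, 6.6, soc_start=0.20, soc_end=0.80)
print(f"0-100%: {full:.1f} h, 20-80%: {partial:.1f} h ({partial / full:.0%} of full)")
```

Under this model a 20-80% session transfers 60% of the battery's capacity, so it takes 60% of the 0-100% time.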


## 9. User Type Behavior Analysis

In [10]:
# Analyze which user types prefer which charger types
print("USER TYPE BEHAVIOR ANALYSIS")
print("-" * 50)

# Create cross-tabulation
user_charger_prefs = pd.crosstab(
    df_complete['user_type'], 
    df_complete['charger_type'],
    normalize='index'
) * 100  # Convert to percentage

print("\nCharger Type Preferences by User Type (% of sessions):")
print(user_charger_prefs.round(1))

# Calculate average costs and time by user type
user_analysis = df_complete.groupby('user_type').agg({
    'cost_sa_zar': 'mean',
    'cost_per_km_sa_zar': 'mean',
    'duration_hours': 'mean',
    'distance_km': 'mean',
    'savings_vs_petrol_zar': 'mean'
}).round(2)

print("\nAverage Behavior by User Type:")
print(user_analysis)

print("\n User behavior patterns analyzed")

USER TYPE BEHAVIOR ANALYSIS
--------------------------------------------------

Charger Type Preferences by User Type (% of sessions):
charger_type            DC Fast Charger  Level 1  Level 2
user_type                                                
Casual Driver                      34.2     34.8     31.0
Commuter                           30.1     33.6     36.2
Long-Distance Traveler             33.5     37.2     29.2

Average Behavior by User Type:
                        cost_sa_zar  cost_per_km_sa_zar  duration_hours  \
user_type                                                                 
Casual Driver                237.64                3.15            2.32   
Commuter                     221.54                2.87            2.33   
Long-Distance Traveler       226.27                2.97            2.23   

                        distance_km  savings_vs_petrol_zar  
user_type                                                   
Casual Driver                147.14          

The data clearly contain errors. My suspicion is that this is synthetic data from Kaggle: the mean session durations for DC Fast, Level 1 and Level 2 charging are all about 2.3 hours (see Section 6), which is implausible given how much more power a DC fast charger delivers than an AC charger.
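
Another check that may support this suspicion is whether the recorded `day_of_week` agrees with the weekday derived from `start_time`. The sketch below rebuilds only the three sample rows printed in Section 3; running the same comparison on the full `df` is left as an assumption about how one might extend it.

```python
import pandas as pd

# The three sample rows printed in Section 3 (only the columns needed here)
sample = pd.DataFrame({
    "start_time": ["2024-01-01 00:00:00", "2024-01-01 01:00:00", "2024-01-01 02:00:00"],
    "day_of_week": ["Tuesday", "Monday", "Thursday"],
})

# Derive the weekday name from the timestamp and compare with the recorded label
derived = pd.to_datetime(sample["start_time"]).dt.day_name()
sample["weekday_matches"] = derived == sample["day_of_week"]

mismatches = int((~sample["weekday_matches"]).sum())
print(sample)
print(f"Mismatched rows: {mismatches} of {len(sample)}")
```

All three sessions start on 1 January 2024, yet they carry three different weekday labels, so at most one of them can be correct.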

## 10. BYD Dolphin Surf - Annual Cost Scenarios

In [11]:
# Calculate annual costs for different usage scenarios
print("BYD DOLPHIN SURF - ANNUAL COST SCENARIOS")
print("-" * 50)
print(f"Using real-world efficiency: {REAL_EFFICIENCY:.2f} km/kWh")
print(f"(Based on BYD Dolphin Surf: {RANGE_WLTP}km range / {BATTERY_CAPACITY}kWh battery)")

# Calculate cost per km for each charging type
cost_per_km_home = HOME_CHARGING / REAL_EFFICIENCY
cost_per_km_ac = GRIDCARS_AC / REAL_EFFICIENCY
cost_per_km_dc = GRIDCARS_DC_FAST / REAL_EFFICIENCY

print(f"\nCost per km by charging type:")
print(f"  Home (R{HOME_CHARGING:.2f}/kWh): R{cost_per_km_home:.2f}/km")
print(f"  AC (R{GRIDCARS_AC:.2f}/kWh): R{cost_per_km_ac:.2f}/km")
print(f"  DC Fast (R{GRIDCARS_DC_FAST:.2f}/kWh): R{cost_per_km_dc:.2f}/km")
print(f"  Petrol: R{PETROL_COST_PER_KM:.2f}/km")

# Define realistic SA scenarios
scenarios = {
    'Commuter (My Colleague)': {
        'description': '60km daily commute, charges at home + occasional AC top-ups',
        'annual_km': 15600,  # 60km/day × 260 work days (5 days × 52 weeks); 22 days × 12 months would give 15,840
        'dc_fast_pct': 0.05,  # Rare emergency use
        'ac_pct': 0.15,       # Occasional top-ups at work/mall
        'home_pct': 0.80      # Majority overnight charging
    },
    'Weekend Driver': {
        'description': 'Casual use, mostly short trips with home charging',
        'annual_km': 8000,
        'dc_fast_pct': 0.10,
        'ac_pct': 0.20,
        'home_pct': 0.70
    },
    'Road Warrior': {
        'description': 'High mileage with frequent long trips',
        'annual_km': 25000,
        'dc_fast_pct': 0.30,  # Regular long-distance travel
        'ac_pct': 0.30,
        'home_pct': 0.40
    }
}

results = []

for scenario_name, params in scenarios.items():
    annual_km = params['annual_km']

    # Calculate km per charging type
    dc_km = annual_km * params['dc_fast_pct']
    ac_km = annual_km * params['ac_pct']
    home_km = annual_km * params['home_pct']

    # Calculate costs (cost per km × km traveled)
    dc_cost = dc_km * cost_per_km_dc
    ac_cost = ac_km * cost_per_km_ac
    home_cost = home_km * cost_per_km_home

    total_ev_cost = dc_cost + ac_cost + home_cost
    total_petrol_cost = annual_km * PETROL_COST_PER_KM
    annual_savings = total_petrol_cost - total_ev_cost

    # Calculate monthly equivalent
    monthly_ev = total_ev_cost / 12
    monthly_petrol = total_petrol_cost / 12

    results.append({
        'Scenario': scenario_name,
        'Description': params['description'],
        'Annual km': f"{annual_km:,}",
        'EV Cost/Year': f"R{total_ev_cost:,.0f}",
        'Petrol Cost/Year': f"R{total_petrol_cost:,.0f}",
        'Annual Savings': f"R{annual_savings:,.0f}",
        'Monthly Savings': f"R{annual_savings/12:,.0f}",
        '5-Year Savings': f"R{annual_savings * 5:,.0f}"
    })

results_df = pd.DataFrame(results)

# Print with better formatting
print("\n" + "-" * 100)
for idx, row in results_df.iterrows():
    print(f"\n{row['Scenario']}")
    print(f"  {row['Description']}")
    print(f"  Annual distance: {row['Annual km']} km")
    print(f"  EV cost: {row['EV Cost/Year']}/year ({row['Monthly Savings']}/month savings)")
    print(f"  Petrol cost: {row['Petrol Cost/Year']}/year")
    print(f"  Total savings: {row['Annual Savings']}/year | {row['5-Year Savings']} over 5 years")

print("\n" + "-" * 100)
print("KEY INSIGHTS")
print("-" * 100)

# Calculate the commuter scenario specifically (your colleague)
commuter_data = scenarios['Commuter (My Colleague)']
commuter_monthly_ev = (commuter_data['annual_km'] / 12) * (
    cost_per_km_home * commuter_data['home_pct'] +
    cost_per_km_ac * commuter_data['ac_pct'] +
    cost_per_km_dc * commuter_data['dc_fast_pct']
)
commuter_monthly_petrol = (commuter_data['annual_km'] / 12) * PETROL_COST_PER_KM

print(f"\nFor a typical Cape Town commuter (60km/day):")
print(f"  Monthly EV cost: R{commuter_monthly_ev:.0f} (80% home, 15% AC, 5% DC)")
print(f"  Monthly petrol: R{commuter_monthly_petrol:.0f}")
print(f"  Monthly savings: R{commuter_monthly_petrol - commuter_monthly_ev:.0f}")

BYD DOLPHIN SURF - ANNUAL COST SCENARIOS
--------------------------------------------------
Using real-world efficiency: 7.71 km/kWh
(Based on BYD Dolphin Surf: 232km range / 30.08kWh battery)

Cost per km by charging type:
  Home (R3.00/kWh): R0.39/km
  AC (R5.88/kWh): R0.76/km
  DC Fast (R7.35/kWh): R0.95/km
  Petrol: R1.47/km

----------------------------------------------------------------------------------------------------

Commuter (My Colleague)
  60km daily commute, charges at home + occasional AC top-ups
  Annual distance: 15,600 km
  EV cost: R7,382/year (R1,296/month savings)
  Petrol cost: R22,932/year
  Total savings: R15,550/year | R77,752 over 5 years

Weekend Driver
  Casual use, mostly short trips with home charging
  Annual distance: 8,000 km
  EV cost: R4,160/year (R633/month savings)
  Petrol cost: R11,760/year
  Total savings: R7,600/year | R37,998 over 5 years

Road Warrior
  High mileage with frequent long trips
  Annual distance: 25,000 km
  EV cost: R16,755/ye
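
The scenario loop above relies on each charging mix summing to 100%. Below is a sketch of the same calculation as a function that validates the mix first. It uses the Section 2 constants; the names `annual_ev_cost`, `RATES` and `EFFICIENCY` are hypothetical.

```python
import math

EFFICIENCY = 232 / 30.08  # km/kWh, BYD Dolphin Surf Standard (Section 2)
RATES = {"home": 3.00, "ac": 5.88, "dc_fast": 7.35}  # R/kWh (Section 2)


def annual_ev_cost(annual_km: float, mix: dict) -> float:
    """Annual charging cost in rand for a given split of km across charger types."""
    # Hypothetical helper for illustration; not part of the original notebook.
    total_share = sum(mix.values())
    if not math.isclose(total_share, 1.0):
        raise ValueError(f"Charging mix must sum to 1.0, got {total_share:.2f}")
    blended_rate = sum(RATES[kind] * share for kind, share in mix.items())  # R/kWh
    return annual_km * blended_rate / EFFICIENCY


commuter = annual_ev_cost(15600, {"home": 0.80, "ac": 0.15, "dc_fast": 0.05})
print(f"Commuter EV cost: R{commuter:,.0f}/year")
```

For the commuter mix this reproduces the R7,382/year shown in the output above, and a mix that does not sum to 1 raises an error instead of silently under- or over-counting kilometres.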

## 11. Save Dataset

In [12]:
# Save the dataset with all new features
output_file = 'ev_charging_patterns_WITH_FEATURES.csv'
df_complete.to_csv(output_file, index=False)

print("-" * 50)
print(f" Saved to: {output_file}")
print(f" Total rows: {len(df_complete):,}")
print(f" Total columns: {len(df_complete.columns)}")

print("\nNew columns added:")
new_cols = [
    'km_per_kwh',
    'cost_per_km_usd',
    'cost_per_kwh_usd',
    'kwh_per_hour (kW)',
    'battery_used_pct',
    'cost_sa_zar',
    'cost_per_km_sa_zar',
    'petrol_cost_equivalent_zar',
    'savings_vs_petrol_zar',
    'savings_pct'
]

for col in new_cols:
    print(f"  - {col}")

print("\n" + "-" * 50)
print("FEATURE ENGINEERING COMPLETE!")
print("-" * 50)
print("\nYou are now ready for:")
print("1. Creating visualizations")
print("2. Building your LinkedIn post")
print("3. Writing your analysis narrative")

--------------------------------------------------
 Saved to: ev_charging_patterns_WITH_FEATURES.csv
 Total rows: 1,193
 Total columns: 32

New columns added:
  - km_per_kwh
  - cost_per_km_usd
  - cost_per_kwh_usd
  - kwh_per_hour (kW)
  - battery_used_pct
  - cost_sa_zar
  - cost_per_km_sa_zar
  - petrol_cost_equivalent_zar
  - savings_vs_petrol_zar
  - savings_pct

--------------------------------------------------
FEATURE ENGINEERING COMPLETE!
--------------------------------------------------

You are now ready for:
1. Creating visualizations
2. Building your LinkedIn post
3. Writing your analysis narrative
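
A hand-maintained list of new columns can drift from what was actually created. One alternative, sketched here on toy frames, is to derive the list by comparing the engineered frame with the loaded one. In the notebook the two frames would be `df` and `df_complete`; the names `original` and `engineered` are stand-ins.

```python
import pandas as pd

# Toy frames for illustration only; stand-ins for df and df_complete
original = pd.DataFrame({"distance_km": [100.0], "energy_consumed_kwh": [20.0]})
engineered = original.copy()
engineered["km_per_kwh"] = engineered["distance_km"] / engineered["energy_consumed_kwh"]

# Columns present after feature engineering but absent from the loaded data,
# kept in the order they were created
new_cols = [col for col in engineered.columns if col not in original.columns]
print(new_cols)
```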


## 12. Quick Summary Statistics

In [13]:
# Generate summary statistics for your LinkedIn post
print("SUMMARY STATISTICS")
print("-" * 50)

print("\nDATASET OVERVIEW")
print(f"Total charging sessions analyzed: {len(df_complete):,}")
print(f"Cities covered: {', '.join(df_complete['city'].unique())}")
print(f"Date range: {df_complete['start_time'].min()[:10]} to {df_complete['start_time'].max()[:10]}")

# Use BYD-specific calculations (not dataset-based)
cost_per_km_home = HOME_CHARGING / REAL_WORLD_EFFICIENCY
cost_per_km_ac = GRIDCARS_AC / REAL_WORLD_EFFICIENCY
cost_per_km_dc = GRIDCARS_DC_FAST / REAL_WORLD_EFFICIENCY

print("\nCHARGING COSTS (BYD Dolphin Surf at SA rates)")
print(f"Home:   R{cost_per_km_home:.2f}/km")
print(f"AC:     R{cost_per_km_ac:.2f}/km")
print(f"DC Fast: R{cost_per_km_dc:.2f}/km")
print(f"Petrol: R{PETROL_COST_PER_KM:.2f}/km")

print("\nCOMMUTER SCENARIO (60km/day, 22 work days/month)")
commuter_annual = 15600  # km/year, as in Section 10 (60 km × 260 work days; 60 × 22 × 12 would be 15,840)
commuter_monthly_ev = (commuter_annual / 12) * (0.8 * cost_per_km_home + 0.15 * cost_per_km_ac + 0.05 * cost_per_km_dc)
commuter_monthly_petrol = (commuter_annual / 12) * PETROL_COST_PER_KM

print(f"Monthly EV cost: R{commuter_monthly_ev:.0f}")
print(f"Monthly petrol: R{commuter_monthly_petrol:.0f}")
print(f"Monthly savings: R{commuter_monthly_petrol - commuter_monthly_ev:.0f}")
print(f"Annual savings: R{(commuter_monthly_petrol - commuter_monthly_ev) * 12:,.0f}")
print(f"5-year savings: R{(commuter_monthly_petrol - commuter_monthly_ev) * 60:,.0f}")

SUMMARY STATISTICS
--------------------------------------------------

DATASET OVERVIEW
Total charging sessions analyzed: 1,193
Cities covered: Houston, San Francisco, Los Angeles, Chicago, New York
Date range: 2024-01-01 to 2024-02-24

CHARGING COSTS (BYD Dolphin Surf at SA rates)
Home:   R0.39/km
AC:     R0.76/km
DC Fast: R0.95/km
Petrol: R1.47/km

COMMUTER SCENARIO (60km/day, 22 work days/month)
Monthly EV cost: R615
Monthly petrol: R1911
Monthly savings: R1296
Annual savings: R15,550
5-year savings: R77,752
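
These savings depend on the 7.71 km/kWh figure, which comes from the manufacturer's rated range. Below is a sketch of how the commuter result might change if real-world efficiency were lower. The derating factors are hypothetical and not taken from any measurement; all other values are the Section 2 constants.

```python
EFFICIENCY_SPEC = 232 / 30.08            # km/kWh from manufacturer figures (Section 2)
PETROL_COST_PER_KM = (7.0 / 100) * 21.0  # R/km (Section 2)
BLENDED_RATE = 0.80 * 3.00 + 0.15 * 5.88 + 0.05 * 7.35  # R/kWh, commuter mix
MONTHLY_KM = 15600 / 12


def monthly_savings(derate: float) -> float:
    """Monthly savings vs petrol when efficiency is `derate` times the spec value."""
    # Hypothetical helper for illustration; not part of the original notebook.
    efficiency = EFFICIENCY_SPEC * derate
    ev_cost = MONTHLY_KM * BLENDED_RATE / efficiency
    return MONTHLY_KM * PETROL_COST_PER_KM - ev_cost


# Hypothetical derating factors: 1.0 = spec efficiency, 0.7 = 30% below spec
for derate in (1.0, 0.9, 0.8, 0.7):
    print(f"{derate:.0%} of spec efficiency: savings R{monthly_savings(derate):.0f}/month")
```

At 100% of spec this matches the R1296 monthly savings above; lower factors show how much of that figure rests on the rated range.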
