## EV Adoption Forecasting
As electric vehicle (EV) adoption surges, urban planners need to anticipate infrastructure needs, especially charging stations. Inadequate planning can lead to charging bottlenecks that hurt user satisfaction and hinder sustainability goals.

**Problem Statement:** Using the electric vehicle dataset (which includes information on EV populations, vehicle types, and possibly historical charging usage), create a model to forecast future EV adoption. For example, predict the number of electric vehicles in upcoming years based on the trends in the data.

**Goal:** Build a regression model that forecasts future EV adoption demand based on historical trends in EV growth, types of vehicles, and regional data.

**Dataset:** This dataset shows the number of vehicles registered by the Washington State Department of Licensing (DOL) each month. The data is broken down by county for passenger vehicles and trucks.

- Date: Counts of registered vehicles are taken on this day (the end of the month). Range: 2017-01-31 to 2024-02-29.
- County: This is the geographic region of a state that a vehicle's owner is listed to reside within. Vehicles registered in Washington
- State: This is the geographic region of the country associated with the record. These addresses may be located in other
- Vehicle Primary Use: This describes the primary intended use of the vehicle (Passenger: 83%, Truck: 17%).
- Battery Electric Vehicles (BEVs): The count of vehicles that are known to be propelled solely by energy derived from an onboard electric battery.
- Plug-In Hybrid Electric Vehicles (PHEVs): The count of vehicles that are known to be propelled by energy partially sourced from an onboard electric battery.
- Electric Vehicle (EV) Total: The sum of Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs).
- Non-Electric Vehicle Total: The count of vehicles that are not electric vehicles.
- Total Vehicles: All powered vehicles registered in the county. This includes electric vehicles.
- Percent Electric Vehicles: Electric vehicles as a percentage of total vehicles (for example, 7 EVs out of 467 vehicles is recorded as 1.5).
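
The column definitions above imply three arithmetic relationships that are cheap to verify before modelling. The sketch below checks them on a small hand-built frame; the helper name `check_column_consistency`, the tolerance, and the percentage formula (EV total divided by total vehicles, times 100) are assumptions for illustration, not part of the dataset documentation.

```python
import pandas as pd

# Hand-built sample rows standing in for the loaded dataset (illustrative only)
sample = pd.DataFrame({
    "Battery Electric Vehicles (BEVs)": [7, 1, 0],
    "Plug-In Hybrid Electric Vehicles (PHEVs)": [0, 2, 1],
    "Electric Vehicle (EV) Total": [7, 3, 1],
    "Non-Electric Vehicle Total": [460, 188, 32],
    "Total Vehicles": [467, 191, 33],
    "Percent Electric Vehicles": [1.50, 1.57, 3.03],
})


def check_column_consistency(frame, tol=0.01):
    """Return one boolean column per relationship, one row per record."""
    ev = frame["Electric Vehicle (EV) Total"]
    total = frame["Total Vehicles"]
    bev_plus_phev = (frame["Battery Electric Vehicles (BEVs)"]
                     + frame["Plug-In Hybrid Electric Vehicles (PHEVs)"])
    return pd.DataFrame({
        # EV total should equal BEVs + PHEVs
        "ev_total_ok": bev_plus_phev == ev,
        # Total vehicles should equal EV total + non-EV total
        "total_ok": ev + frame["Non-Electric Vehicle Total"] == total,
        # Assumed formula: percent = 100 * EV total / total vehicles
        "percent_ok": (100 * ev / total
                       - frame["Percent Electric Vehicles"]).abs() <= tol,
    })


checks = check_column_consistency(sample)
print(checks.all())
```

Rows that fail any check are worth inspecting by hand, since they may point to parsing problems such as thousands separators in the numeric columns.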

**Dataset Link:** https://www.kaggle.com/datasets/sahirmaharajj/electric-vehicle-population-size-2024/data

### Import Required Libraries

In [1]:
import joblib
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

### Load Dataset

In [2]:
# Load data
df = pd.read_csv("Electric_Vehicle_Population_By_County.csv")

### Explore and Understand the Data

In [3]:
# Check Dataset Dimensions
print("Dataset Shape:", df.shape)
print("\nDataset Info:")
print(df.info())
print("\nMissing Values:")
print(df.isnull().sum())

Dataset Shape: (20819, 10)

Dataset Info:

Missing Values:


The dataset contains 20,819 rows and 10 columns.

In [4]:
df.head()

| | Date | County | State | Vehicle Primary Use | Battery Electric Vehicles (BEVs) | Plug-In Hybrid Electric Vehicles (PHEVs) | Electric Vehicle (EV) Total | Non-Electric Vehicle Total | Total Vehicles | Percent Electric Vehicles |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | September 30 2022 | Riverside | CA | Passenger | 7 | 0 | 7 | 460 | 467 | 1.5 |
| 1 | December 31 2022 | Prince William | VA | Passenger | 1 | 2 | 3 | 188 | 191 | 1.57 |
| 2 | January 31 2020 | Dakota | MN | Passenger | 0 | 1 | 1 | 32 | 33 | 3.03 |
| 3 | June 30 2022 | Ferry | WA | Truck | 0 | 0 | 0 | 3575 | 3575 | 0.0 |
| 4 | July 31 2021 | Douglas | CO | Passenger | 0 | 1 | 1 | 83 | 84 | 1.19 |


In [5]:
# Data preprocessing
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Quarter'] = df['Date'].dt.quarter

# Clean numeric columns
numeric_cols = ['Battery Electric Vehicles (BEVs)', 'Plug-In Hybrid Electric Vehicles (PHEVs)', 
                'Electric Vehicle (EV) Total', 'Non-Electric Vehicle Total', 'Total Vehicles']

for col in numeric_cols:
    df[col] = df[col].astype(str).str.replace(',', '').astype(int)

# Create growth rate features
df = df.sort_values(['State', 'County', 'Vehicle Primary Use', 'Date'])
df['EV_Growth_Rate'] = df.groupby(['State', 'County', 'Vehicle Primary Use'])['Electric Vehicle (EV) Total'].pct_change()
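# pct_change yields inf when the previous count is 0; treat inf and NaN as zero growth
df['EV_Growth_Rate'] = df['EV_Growth_Rate'].replace([np.inf, -np.inf], np.nan).fillna(0)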

# Encode categorical variables
le_state = LabelEncoder()
le_county = LabelEncoder()
le_use = LabelEncoder()

df['State_encoded'] = le_state.fit_transform(df['State'])
df['County_encoded'] = le_county.fit_transform(df['County'])
df['Vehicle_Use_encoded'] = le_use.fit_transform(df['Vehicle Primary Use'])
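
One detail of the growth-rate feature is worth checking: many county series start at zero EVs, and `pct_change` divides by the previous value. The sketch below uses a made-up series to show which values come out as `NaN` or `inf`, and one possible way to neutralise them (mapping both to zero growth is an assumption, not the only reasonable choice).

```python
import numpy as np
import pandas as pd

# Made-up monthly EV counts for one county, starting from zero
counts = pd.Series([0, 0, 2, 3], dtype=float)

raw = counts.pct_change()
# Index 0: NaN (no previous month)
# Index 1: NaN (0 / 0)
# Index 2: inf (2 / 0), which fillna(0) alone would leave in place
# Index 3: 0.5 (3 / 2 - 1)

# Map infinities to NaN first, then treat every undefined rate as zero growth
cleaned = raw.replace([np.inf, -np.inf], np.nan).fillna(0.0)
print(cleaned.tolist())
```

Scikit-learn estimators reject infinite inputs, so it is usually safer to remove them at this stage than to discover them when fitting.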

In [6]:
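# Prepare features and target
# Caution: the target (EV Total) is the sum of the BEV and PHEV columns, and both
# are included as features, so the evaluation scores are likely optimistic.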
features = ['Year', 'Month', 'Quarter', 'State_encoded', 'County_encoded', 'Vehicle_Use_encoded',
           'Battery Electric Vehicles (BEVs)', 'Plug-In Hybrid Electric Vehicles (PHEVs)',
           'Non-Electric Vehicle Total', 'Total Vehicles', 'EV_Growth_Rate', 'Percent Electric Vehicles']

X = df[features]
y = df['Electric Vehicle (EV) Total']

# Split data chronologically for time series
split_date = df['Date'].quantile(0.8)
train_mask = df['Date'] <= split_date

X_train, X_test = X[train_mask], X[~train_mask]
y_train, y_test = y[train_mask], y[~train_mask]

In [7]:
# Hyperparameter tuning
param_dist = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5]
}

rf_model = RandomForestRegressor(random_state=42)
random_search = RandomizedSearchCV(rf_model, param_dist, n_iter=5, cv=3, random_state=42)
random_search.fit(X_train, y_train)

# Best model
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
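
With `cv=3`, the search uses ordinary K-fold splits, so some folds validate on rows that are older than the rows used for training. A time-aware alternative is scikit-learn's `TimeSeriesSplit`, sketched below on synthetic data. The arrays `X_demo` and `y_demo` and the reduced parameter grid are placeholders; with the real data, the training rows would first need to be sorted by date, since `TimeSeriesSplit` relies on row order.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic, time-ordered data: column 0 is a time index, column 1 is noise
rng = np.random.default_rng(42)
n_rows = 120
X_demo = np.column_stack([np.arange(n_rows), rng.normal(size=n_rows)])
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=1.0, size=n_rows)

# Each split trains on an initial segment and validates on the rows after it
tscv = TimeSeriesSplit(n_splits=3)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    {"n_estimators": [20, 50], "max_depth": [5, None]},
    n_iter=3,
    cv=tscv,
    random_state=42,
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

Scores from time-ordered folds are often lower than K-fold scores, but they are closer to how the model is actually used: predicting later periods from earlier ones.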

In [8]:
# Evaluate model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
print(f'R² Score: {r2:.4f}')
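
These metrics are easier to interpret next to a naive baseline. A common choice for forecasting is persistence: predict that each series stays at its last observed training value. The sketch below computes the baseline MAE on a small made-up frame; the column names `County` and `EV_Total` and all values are illustrative.

```python
import pandas as pd

# Made-up monthly EV totals for two counties
history = pd.DataFrame({
    "County": ["A"] * 4 + ["B"] * 4,
    "Date": pd.to_datetime(
        ["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30"] * 2
    ),
    "EV_Total": [10, 12, 13, 15, 100, 104, 109, 115],
})

# Chronological split, mirroring the train/test split by date
split_date = pd.Timestamp("2023-02-28")
train = history[history["Date"] <= split_date]
test = history[history["Date"] > split_date]

# Persistence baseline: carry each county's last training value forward
last_seen = train.sort_values("Date").groupby("County")["EV_Total"].last()
baseline_pred = test["County"].map(last_seen)

baseline_mae = (test["EV_Total"] - baseline_pred).abs().mean()
print(f"Persistence baseline MAE: {baseline_mae:.2f}")
```

If the model's MAE is not clearly below the baseline's, the extra features and tuning are adding little beyond "next month looks like this month".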

In [9]:
# Feature importance
feature_importance = pd.DataFrame({
    'feature': features,
    'importance': best_model.feature_importances_
}).sort_values('importance', ascending=False)

print('Feature Importance:')
print(feature_importance)

# Visualize top features
plt.figure(figsize=(10, 6))
sns.barplot(data=feature_importance.head(8), x='importance', y='feature')
plt.title('Top 8 Feature Importance for EV Adoption Forecasting')
plt.tight_layout()
plt.show()

In [10]:
# Save model and encoders
joblib.dump(best_model, 'ev_adoption_model.pkl')
joblib.dump({'state': le_state, 'county': le_county, 'use': le_use}, 'encoders.pkl')
print('Model and encoders saved successfully!')
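
The saved encoders matter as much as the model, because new inputs must be encoded with the same label-to-integer mapping used in training. A minimal round-trip sketch follows, using a throwaway encoder and a temporary file rather than the files written above; the file name `encoders_demo.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Throwaway encoder fitted on the two vehicle-use labels
le_use_demo = LabelEncoder().fit(["Passenger", "Truck", "Passenger"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encoders_demo.pkl")
    joblib.dump({"use": le_use_demo}, path)
    restored = joblib.load(path)

# LabelEncoder assigns codes in sorted label order: Passenger -> 0, Truck -> 1
truck_code = int(restored["use"].transform(["Truck"])[0])
first_label = restored["use"].inverse_transform([0])[0]
print(truck_code, first_label)
```

`transform` raises a `ValueError` for a label the encoder never saw, so a county or state missing from the training data needs explicit handling before prediction.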

### Future Predictions

Create predictions for future EV adoption based on historical trends.

In [11]:
# Create illustrative future scenarios (hand-picked values, not derived from the data)
# Scenario 1: Conservative growth
conservative_data = pd.DataFrame({
    'Year': [2025, 2026, 2027],
    'Month': [6, 6, 6],
    'Quarter': [2, 2, 2],
    'State_encoded': [df['State_encoded'].mode()[0]] * 3,
    'County_encoded': [df['County_encoded'].mode()[0]] * 3,
    'Vehicle_Use_encoded': [le_use.transform(['Passenger'])[0]] * 3,  # Passenger vehicles
    'Battery Electric Vehicles (BEVs)': [100, 120, 140],
    'Plug-In Hybrid Electric Vehicles (PHEVs)': [50, 55, 60],
    'Non-Electric Vehicle Total': [5000, 4800, 4600],
    'Total Vehicles': [5150, 4975, 4800],
    'EV_Growth_Rate': [0.15, 0.12, 0.10],
    'Percent Electric Vehicles': [2.9, 3.5, 4.2]
})

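# Scenario 2: Aggressive growth
aggressive_data = conservative_data.copy()
aggressive_data['Battery Electric Vehicles (BEVs)'] = [200, 280, 380]
aggressive_data['EV_Growth_Rate'] = [0.25, 0.22, 0.20]
# Recompute the totals so they stay consistent with the higher BEV counts
ev_total = (aggressive_data['Battery Electric Vehicles (BEVs)']
            + aggressive_data['Plug-In Hybrid Electric Vehicles (PHEVs)'])
aggressive_data['Total Vehicles'] = ev_total + aggressive_data['Non-Electric Vehicle Total']
aggressive_data['Percent Electric Vehicles'] = (100 * ev_total / aggressive_data['Total Vehicles']).round(2)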

# Make predictions
conservative_pred = best_model.predict(conservative_data)
aggressive_pred = best_model.predict(aggressive_data)

print('EV Adoption Forecasting Results:')
print('\nConservative Growth Scenario:')
for i, year in enumerate([2025, 2026, 2027]):
    print(f'{year}: {conservative_pred[i]:.0f} EVs')

print('\nAggressive Growth Scenario:')
for i, year in enumerate([2025, 2026, 2027]):
    print(f'{year}: {aggressive_pred[i]:.0f} EVs')
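
When reading these forecasts, keep in mind a general property of tree ensembles: a random forest predicts by averaging training targets, so it cannot output a value above the largest target it saw during training. The sketch below illustrates this on a synthetic, steadily growing series; the numbers are made up and only the year is used as a feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic yearly EV counts growing by 100 per year (100, 200, ..., 800)
years = np.arange(2017, 2025).reshape(-1, 1)
ev_counts = 100.0 * (years.ravel() - 2016)

forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(years, ev_counts)

# Ask for years beyond the training range
future_years = np.array([[2025], [2026], [2027]])
future_pred = forest.predict(future_years)

print("Largest training target:", ev_counts.max())
print("Future predictions:", future_pred)
```

The three future predictions are identical and never exceed the training maximum, even though the underlying trend keeps rising. For long-horizon forecasts, this suggests modelling growth rates or differences, or pairing the forest with a model that can extrapolate a trend.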