# **Electric Vehicle Distribution & Infrastructure Optimization**

Group - 3

Team Members: Fangzhou Zheng, Haojiang Wu, Yixi Yu, Pranaya Bindu Buyya

## **Introduction**

## **Problem Statement**

The rapid growth of electric vehicle (EV) adoption presents both challenges and opportunities, particularly in charging infrastructure planning, regional differences in adoption, and policy effectiveness. EV sales are growing by 35% (IEA, 2023), and a shortage of charging facilities is the largest barrier to widespread adoption (McKinsey, 2022). Moreover, existing studies rarely provide an integrated evaluation that combines vehicle characteristics, geographic distribution, and policy impact.

Our objective is to measure EV distribution patterns, identify charging demand hotspots, and assess the impact of policy incentives. We will use two machine learning techniques, association rule mining and clustering, to derive insights that can optimize the placement of charging infrastructure, inform policy decisions, and strengthen market strategies. The findings will give energy retailers, policymakers, and automakers actionable insights that support an efficient and sustainable EV ecosystem.

## **Data Source**

**Source:** The dataset is publicly released through the Washington State Open Data Portal. It provides up-to-date records of electric vehicles registered in the state.

**Dataset URL:** https://drive.google.com/file/d/1iLFacW1f3ENf4u6t4VgsW6GaeNn99za_/view?usp=drive_link   

**Data Dictionary:**

This dataset lists electric vehicles (EVs) registered in Washington State. It includes vehicle details, geographic location, and policy-related attributes such as CAFV eligibility and legislative district. The data comes from the Washington State Department of Licensing (DOL) and is updated periodically to reflect the latest EV registrations.

Dataset Size: approximately 51.7 MB

Number of Records: 223,995 rows (one for each EV registration)

Number of Attributes: 17 columns



| **Feature Name**                                       | **Description**                                                                            | **Data Type**                      |
|--------------------------------------------------------|--------------------------------------------------------------------------------------------|------------------------------------|
| **VIN (1-10)**                                         | First 10 characters of the Vehicle Identification Number                                   | Categorical                        |
| **County**                                             | County where the EV is registered                                                          | Categorical                        |
| **City**                                               | City where the EV is registered                                                            | Categorical                        |
| **State**                                              | State abbreviation                                                                         | Categorical                        |
| **Postal Code**                                        | ZIP code where the vehicle is registered                                                   | Categorical (Numeric)              |
| **Model Year**                                         | Model year of the vehicle                                                                  | Numeric                            |
| **Make**                                               | Vehicle manufacturer                                                                       | Categorical                        |
| **Model**                                              | Specific model of the EV                                                                   | Categorical                        |
| **Electric Vehicle Type**                              | EV classification: Battery Electric Vehicle (BEV) or Plug-in Hybrid Electric Vehicle (PHEV) | Categorical                        |
| **Clean Alternative Fuel Vehicle (CAFV) Eligibility**  | Eligibility for Clean Alternative Fuel Vehicle incentives                                  | Categorical                        |
| **Electric Range**                                     | Estimated driving range on a full charge (miles)                                           | Numeric                            |
| **Base MSRP**                                          | Manufacturer’s Suggested Retail Price (USD)                                                | Numeric                            |
| **Legislative District**                               | Washington State legislative district                                                      | Categorical (Numeric)              |
| **DOL Vehicle ID**                                     | Vehicle identifier assigned by the Department of Licensing                                 | Categorical (Numeric)              |
| **Vehicle Location**                                   | Geographic coordinates stored as a WKT point, `POINT (longitude latitude)`                 | Categorical (converted to numeric) |
| **Electric Utility**                                   | Power provider(s) serving the vehicle's registered location                                | Categorical                        |
| **2020 Census Tract**                                  | 2020 U.S. Census tract identifier                                                          | Categorical (Numeric)              |


```python
# Import data
import pandas as pd
import numpy as np

# Direct-download link for the Google Drive file listed above
url = 'https://drive.google.com/uc?export=download&id=1iLFacW1f3ENf4u6t4VgsW6GaeNn99za_'
df = pd.read_csv(url)
df.head()
```

```
,VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
0,1C4JJXP66P,Kitsap,Poulsbo,WA,98370.0,2023,JEEP,WRANGLER,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,21.0,0.0,23.0,258127145,POINT (-122.64681 47.73689),PUGET SOUND ENERGY INC,53035090000.0
1,1G1FX6S08K,Snohomish,Lake Stevens,WA,98258.0,2019,CHEVROLET,BOLT EV,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238.0,0.0,44.0,4735426,POINT (-122.06402 48.01497),PUGET SOUND ENERGY INC,53061050000.0
2,WBY1Z2C58F,King,Seattle,WA,98116.0,2015,BMW,I3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,81.0,0.0,34.0,272697666,POINT (-122.41067 47.57894),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033010000.0
3,5YJ3E1EBXK,King,Seattle,WA,98178.0,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220.0,0.0,37.0,477309682,POINT (-122.23825 47.49461),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033010000.0
4,5YJSA1V24F,Yakima,Selah,WA,98942.0,2015,TESLA,MODEL S,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,208.0,0.0,15.0,258112970,POINT (-120.53145 46.65405),PACIFICORP,53077000000.0
```
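The data dictionary treats Postal Code, Legislative District, and 2020 Census Tract as categorical codes, but the output above shows that pandas reads them as floats (`98370.0`, `53035090000.0`). Below is a minimal sketch of converting such columns into string categories. It assumes every non-missing value is a whole number. The sample frame and the helper `codes_to_category` are hypothetical and are not part of the original notebook:

```python
import pandas as pd

# Hypothetical sample mirroring how the identifier columns are parsed:
# float64 values such as 98370.0, with NaN where a value is missing
sample = pd.DataFrame({
    "Postal Code": [98370.0, 98258.0, None],
    "Legislative District": [23.0, 44.0, 34.0],
    "2020 Census Tract": [53035090000.0, 53061050000.0, 53033010000.0],
})

def codes_to_category(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Convert float-coded identifier columns to categorical string codes."""
    out = df.copy()
    for col in cols:
        # Nullable Int64 drops the trailing ".0" and keeps missing values as <NA>
        out[col] = out[col].astype("Int64").astype("string").astype("category")
    return out

id_cols = ["Postal Code", "Legislative District", "2020 Census Tract"]
clean = codes_to_category(sample, id_cols)
print(clean)
```

Storing these columns as codes prevents them from being averaged or scaled as if they were quantities. It also avoids the scientific-notation display (`5.303509e+10`) that appears later in the notebook.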


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 223995 entries, 0 to 223994
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype  
---  ------                                             --------------   -----  
 0   VIN (1-10)                                         223995 non-null  object 
 1   County                                             223992 non-null  object 
 2   City                                               223992 non-null  object 
 3   State                                              223995 non-null  object 
 4   Postal Code                                        223992 non-null  float64
 5   Model Year                                         223995 non-null  int64  
 6   Make                                               223995 non-null  object 
 7   Model                                              223995 non-null  object 
 8   Electric Vehicle Type                              223995 non-null  object
```

## **Data Cleaning & Preprocessing**

## **K-Means Clustering**

We applied K-Means clustering to analyze geographic patterns in EV adoption across Washington State, segmenting areas into high-, medium-, and low-adoption regions. This segmentation helps us strategically locate charging infrastructure, identify where policy action is needed, and project near-term EV demand.
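Below is a minimal sketch of the high/medium/low segmentation described above. It makes three assumptions that are not stated in the original: the single feature is the number of registrations per ZIP code, that count is log-scaled, and `k = 3`. The counts are hypothetical and are not results from this dataset. K-Means assigns arbitrary cluster IDs, so the sketch ranks the clusters by their mean count before naming the tiers:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-ZIP registration counts; in the notebook these could come from
# df.groupby("Postal Code", observed=True).size()
counts = pd.DataFrame({
    "Postal Code": ["98004", "98052", "98103", "98370", "98258", "99362", "98942", "99114"],
    "ev_count": [5200, 4800, 4500, 900, 1100, 60, 45, 30],
})

# Log-scale the counts so a few very dense ZIPs do not dominate the distances
X = np.log1p(counts[["ev_count"]].to_numpy(dtype=float))

km = KMeans(n_clusters=3, n_init=10, random_state=42)
counts["cluster"] = km.fit_predict(X)

# Cluster IDs are arbitrary: order clusters by mean count, then name the tiers
order = counts.groupby("cluster")["ev_count"].mean().sort_values().index
tier_names = dict(zip(order, ["Low", "Medium", "High"]))
counts["adoption_tier"] = counts["cluster"].map(tier_names)
print(counts.sort_values("ev_count", ascending=False))
```

Adding latitude and longitude as extra features would make the clusters spatially contiguous as well as adoption-based. In that case the features should be standardized first, because raw counts and degrees are on very different scales.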



```python
df.isnull().sum()
```

```
VIN (1-10),0
County,3
City,3
State,0
Postal Code,3
Model Year,0
Make,0
Model,0
Electric Vehicle Type,0
Clean Alternative Fuel Vehicle (CAFV) Eligibility,0
```


```python
# Drop every row that has a missing value in any column
df.dropna(inplace=True)
```
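The `isnull()` output above shows 3 missing values each in County, City, and Postal Code. Yet the row count falls from 223,995 to 223,496, which means `dropna()` removed 499 rows. The other removals must come from columns cut off in the printed output. If the clustering only needs location fields, `dropna(subset=...)` keeps more rows. Below is a sketch on a hypothetical mini-frame. The choice of `geo_cols` is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one lacks all location fields, one lacks only Legislative District
sample = pd.DataFrame({
    "County": ["King", None, "Pierce"],
    "City": ["Seattle", None, "Tacoma"],
    "Postal Code": [98116.0, np.nan, 98466.0],
    "Legislative District": [34.0, 23.0, np.nan],
})

# dropna() with no arguments removes a row if ANY column is missing
all_dropped = sample.dropna()

# Restricting the check to the columns the geographic analysis needs keeps more rows
geo_cols = ["County", "City", "Postal Code"]
geo_dropped = sample.dropna(subset=geo_cols)

print(len(sample), len(all_dropped), len(geo_dropped))
```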

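```python
# Note: isnull().count() returns the number of rows in each column (223,496 after
# dropna()), not the number of missing values; df.isnull().sum() reports those
df.isnull().count()
```

```
VIN (1-10),223496
County,223496
City,223496
State,223496
Postal Code,223496
Model Year,223496
Make,223496
Model,223496
Electric Vehicle Type,223496
Clean Alternative Fuel Vehicle (CAFV) Eligibility,223496
```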


```python
# Split the WKT string "POINT (longitude latitude)" into numeric coordinate columns
df[['Longitude', 'Latitude']] = df['Vehicle Location'].str.extract(r'POINT \((-?\d+\.\d+) (-?\d+\.\d+)\)')
df['Longitude'] = df['Longitude'].astype(float)
df['Latitude'] = df['Latitude'].astype(float)
df.drop(columns=['Vehicle Location'], inplace=True)

df
```

```
,VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Electric Utility,2020 Census Tract,Longitude,Latitude
0,1C4JJXP66P,Kitsap,Poulsbo,WA,98370.0,2023,JEEP,WRANGLER,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,21.0,0.0,23.0,258127145,PUGET SOUND ENERGY INC,5.303509e+10,-122.64681,47.73689
1,1G1FX6S08K,Snohomish,Lake Stevens,WA,98258.0,2019,CHEVROLET,BOLT EV,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238.0,0.0,44.0,4735426,PUGET SOUND ENERGY INC,5.306105e+10,-122.06402,48.01497
2,WBY1Z2C58F,King,Seattle,WA,98116.0,2015,BMW,I3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,81.0,0.0,34.0,272697666,CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),5.303301e+10,-122.41067,47.57894
3,5YJ3E1EBXK,King,Seattle,WA,98178.0,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220.0,0.0,37.0,477309682,CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),5.303301e+10,-122.23825,47.49461
4,5YJSA1V24F,Yakima,Selah,WA,98942.0,2015,TESLA,MODEL S,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,208.0,0.0,15.0,258112970,PACIFICORP,5.307700e+10,-120.53145,46.65405
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
223990,7SAYGDEE4R,Pierce,Puyallup,WA,98374.0,2024,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,0.0,0.0,2.0,264662359,PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),5.305307e+10,-122.27575,47.13959
223991,WBY8P2C00M,Snohomish,Lake Stevens,WA,98258.0,2021,BMW,I3,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,0.0,0.0,44.0,157728168,PUGET SOUND ENERGY INC,5.306105e+10,-122.06402,48.01497
223992,JN1AZ0CP3B,Pierce,University Place,WA,98466.0,2011,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,73.0,0.0,28.0,261733433,BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM...,5.305307e+10,-122.53756,47.23165
223993,5YJ3E1EA2R,Pierce,Puyallup,WA,98374.0,2024,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,0.0,0.0,25.0,275283487,PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),5.305307e+10,-122.27575,47.13959
```
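The extraction regex relies on the WKT convention that a `POINT` lists longitude before latitude. Below is a slightly more defensive sketch that assumes the same `POINT (lon lat)` format. Named groups make the column order explicit, and the decimal part is optional. A rough Washington bounding box flags swapped or out-of-state coordinates. The bounds of about -125 to -116 longitude and 45 to 49.5 latitude are an approximation chosen for illustration:

```python
import pandas as pd

# WKT points store longitude first, then latitude: "POINT (lon lat)"
locations = pd.Series([
    "POINT (-122.64681 47.73689)",
    "POINT (-120.53145 46.65405)",
])

# Named groups become the column names of the extracted DataFrame;
# (?:...) groups are non-capturing, so they do not create extra columns
coords = locations.str.extract(
    r"POINT \((?P<Longitude>-?\d+(?:\.\d+)?) (?P<Latitude>-?\d+(?:\.\d+)?)\)"
).astype(float)

# Approximate Washington bounding box (an assumption, not an official boundary)
in_wa = coords["Longitude"].between(-125, -116) & coords["Latitude"].between(45, 49.5)
print(coords)
print(in_wa.all())
```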


```python
# Remove exact duplicate rows (the row count stays at 223,496, so none were found)
df.drop_duplicates(inplace=True)
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 223496 entries, 0 to 223994
Data columns (total 18 columns):
 #   Column                                             Non-Null Count   Dtype  
---  ------                                             --------------   -----  
 0   VIN (1-10)                                         223496 non-null  object 
 1   County                                             223496 non-null  object 
 2   City                                               223496 non-null  object 
 3   State                                              223496 non-null  object 
 4   Postal Code                                        223496 non-null  float64
 5   Model Year                                         223496 non-null  int64  
 6   Make                                               223496 non-null  object 
 7   Model                                              223496 non-null  object 
 8   Electric Vehicle Type                              223496 non-null  object 
```
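The frame has 223,496 rows both after `dropna()` and after `drop_duplicates()`, so there are no fully identical rows. Duplicates could still exist at the vehicle level if the same vehicle appears with slightly different text in some column. Below is a sketch of a key-based check. It assumes `DOL Vehicle ID` uniquely identifies a vehicle, and the rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows: the same vehicle appears twice, once with a trailing space in
# the utility name, so a full-row duplicate check does not catch it
sample = pd.DataFrame({
    "DOL Vehicle ID": [258127145, 4735426, 258127145],
    "Make": ["JEEP", "CHEVROLET", "JEEP"],
    "Electric Utility": ["PUGET SOUND ENERGY INC", "PUGET SOUND ENERGY INC",
                         "PUGET SOUND ENERGY INC "],
})

exact_dupes = sample.duplicated().sum()                       # identical across all columns
key_dupes = sample.duplicated(subset=["DOL Vehicle ID"]).sum()  # same vehicle ID
print(exact_dupes, key_dupes)

# Keep the first record for each vehicle ID
deduped = sample.drop_duplicates(subset=["DOL Vehicle ID"], keep="first")
```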

```python
# Store repeated text values (County, Make, Model, ...) as pandas categoricals to reduce memory use
object_cols = df.select_dtypes(include=['object']).columns
df[object_cols] = df[object_cols].astype('category')
```
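This conversion pays off because columns such as County and Make repeat a small set of values across more than 200,000 rows. A categorical stores each distinct string once, plus a small integer code per row. Below is a quick check on a hypothetical column:

```python
import pandas as pd

# Hypothetical column: 4 distinct makes repeated 50,000 times each
makes = pd.Series(["TESLA", "NISSAN", "CHEVROLET", "BMW"] * 50_000, dtype="object")

# deep=True counts the string objects themselves, not just the pointers to them
obj_bytes = int(makes.memory_usage(deep=True))
cat_bytes = int(makes.astype("category").memory_usage(deep=True))
print(f"object: {obj_bytes:,} bytes  category: {cat_bytes:,} bytes")
```

One side effect worth knowing is that grouping by a categorical column triggers pandas' `observed` handling. This is why the `groupby("County")` in the DBSCAN cell interacts with this conversion.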

## **DBSCAN Clustering**

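```python
from sklearn.cluster import DBSCAN
import numpy as np
import pandas as pd

# Candidate charging-station sites: one centroid per dense cluster of registered EVs
charging_stations = []

# observed=True iterates only over counties present in the data and silences the
# pandas FutureWarning about the observed= default when grouping by a categorical
for county, county_data in df.groupby("County", observed=True):
    car_locations = county_data[['Latitude', 'Longitude']].values

    # eps is measured in degrees because raw latitude/longitude values are used
    dbscan = DBSCAN(eps=0.01, min_samples=5)
    dbscan.fit(car_locations)

    labels = dbscan.labels_
    unique_clusters = set(labels) - {-1}  # label -1 marks noise points

    print(f"DBSCAN detected {len(unique_clusters)} clusters for {county}")

    for cluster in unique_clusters:
        cluster_points = car_locations[labels == cluster]
        cluster_center = np.mean(cluster_points, axis=0)
        charging_stations.append([county, cluster, cluster_center[0], cluster_center[1]])

charging_stations_df_db = pd.DataFrame(charging_stations, columns=["County", "Cluster", "Latitude", "Longitude"])
```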


```
DBSCAN detected 2 clusters for Adams
DBSCAN detected 2 clusters for Asotin
DBSCAN detected 8 clusters for Benton
DBSCAN detected 8 clusters for Chelan
DBSCAN detected 6 clusters for Clallam
DBSCAN detected 20 clusters for Clark
DBSCAN detected 1 clusters for Columbia
DBSCAN detected 8 clusters for Cowlitz
DBSCAN detected 6 clusters for Douglas
DBSCAN detected 3 clusters for Ferry
DBSCAN detected 5 clusters for Franklin
DBSCAN detected 0 clusters for Garfield
DBSCAN detected 9 clusters for Grant
DBSCAN detected 14 clusters for Grays Harbor
DBSCAN detected 7 clusters for Island
DBSCAN detected 8 clusters for Jefferson
DBSCAN detected 79 clusters for King
DBSCAN detected 17 clusters for Kitsap
DBSCAN detected 8 clusters for Kittitas
DBSCAN detected 6 clusters for Klickitat
DBSCAN detected 19 clusters for Lewis
DBSCAN detected 4 clusters for Lincoln
DBSCAN detected 9 clusters for Mason
DBSCAN detected 11 clusters for Okanogan
DBSCAN detected 8 clusters for Pacific
DBSCAN detected 3 cluster
```
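The DBSCAN cell uses `eps=0.01` on raw latitude/longitude, so the neighborhood is measured in degrees. At Washington's latitude (about 47°N), 0.01° of latitude is about 1.1 km, but 0.01° of longitude is only about 0.76 km. The search region is therefore an ellipse rather than a circle. Below is a sketch of a distance-based alternative that uses scikit-learn's haversine metric. The 1 km radius and the sample coordinates are assumptions, not values from this analysis:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical [latitude, longitude] pairs in degrees: two tight groups and one outlier
points_deg = np.array([
    [47.6062, -122.3321], [47.6070, -122.3330], [47.6055, -122.3310],
    [47.6065, -122.3340], [47.6058, -122.3325],   # group near Seattle
    [47.2529, -122.4443], [47.2535, -122.4450], [47.2520, -122.4438],
    [47.2532, -122.4447], [47.2526, -122.4440],   # group near Tacoma
    [46.6540, -120.5314],                          # isolated point near Selah
])

# The haversine metric expects [lat, lon] in radians, and eps is an angle:
# divide the desired search radius in km by the Earth's radius
eps_km = 1.0  # assumed search radius
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=5,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(points_deg))
print(labels)  # -1 marks noise
```

Inside the per-county loop, the same estimator could take `np.radians(car_locations)` as input. The cluster centroids would still be averaged in degrees from `car_locations`.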

```python
print(charging_stations_df_db.head())
```

```
   County  Cluster  Latitude  Longitude
0   Adams        0  47.12740 -118.37977
1   Adams        1  46.82616 -119.17420
2  Asotin        0  46.41402 -117.04556
3  Asotin        1  46.34056 -117.04784
4  Benton        0  46.31484 -119.26844
```


```python
# Number of DBSCAN clusters (candidate sites) per county
cluster_counts = charging_stations_df_db.groupby("County").size()
print(cluster_counts.sort_values(ascending=False))
```

```
County
King            79
Pierce          54
Spokane         33
Snohomish       27
Clark           20
Lewis           19
Kitsap          17
Yakima          17
Whatcom         16
Grays Harbor    14
Thurston        12
Okanogan        11
Stevens          9
Mason            9
Skagit           9
Grant            9
Cowlitz          8
Kittitas         8
Jefferson        8
Benton           8
Chelan           8
Pacific          8
San Juan         7
Island           7
Skamania         6
Douglas          6
Clallam          6
Klickitat        6
Walla Walla      5
Franklin         5
Whitman          5
Lincoln          4
Pend Oreille     3
Ferry            3
Wahkiakum        3
Asotin           2
Adams            2
Columbia         1
dtype: int64
```


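```python
import folium
from folium.plugins import HeatMap

# Center the map on the mean registration location
m = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=8)

# Heat layer of all registered EV locations
heat_data = list(zip(df['Latitude'], df['Longitude']))
HeatMap(heat_data).add_to(m)

# One marker per DBSCAN cluster centroid (a candidate charging site)
for _, row in charging_stations_df_db.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        icon=folium.Icon(color="red", icon="bolt", prefix="fa"),
        popup=f"Candidate charging site in {row['County']}"
    ).add_to(m)

m
```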

**Observations:**

The map reveals clear patterns of EV adoption across Washington State. It highlights urban hotspots, opportunities for suburban growth, and limited charging access in rural areas.

Seattle, Bellevue, and Tacoma emerge as high-density EV clusters, which suggests strong demand for fast-charging points.

Moderate-density clusters around Spokane, Everett, and Olympia indicate growing suburban EV adoption. These areas are good candidates for infrastructure development.

In contrast, rural communities in Eastern Washington show sparse, scattered clusters. This points to low but growing adoption that is likely constrained by limited charging access.
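The observations distinguish high-density from moderate-density clusters, but the DBSCAN loop stores only each cluster's centroid. Recording how many vehicles fall in each cluster would let those density tiers be checked directly. Below is a minimal sketch with a hypothetical helper, `summarize_clusters`, applied to toy coordinates and labels:

```python
import numpy as np
import pandas as pd

def summarize_clusters(county, coords, labels):
    """Return centroid and vehicle count for each non-noise DBSCAN cluster."""
    rows = []
    for cluster in sorted(set(labels) - {-1}):  # -1 is DBSCAN noise
        members = coords[labels == cluster]
        lat, lon = members.mean(axis=0)
        rows.append({"County": county, "Cluster": cluster,
                     "Latitude": lat, "Longitude": lon, "ev_count": len(members)})
    return pd.DataFrame(rows)

# Toy input: 6 vehicles in cluster 0, 3 in cluster 1, and 1 noise point
coords = np.array([[47.61, -122.33]] * 6 + [[47.66, -117.42]] * 3 + [[46.20, -119.10]])
labels = np.array([0] * 6 + [1] * 3 + [-1])
summary = summarize_clusters("Example", coords, labels)

# Hypothetical bin edge: clusters with more than 5 vehicles count as "high" density
summary["density_tier"] = pd.cut(summary["ev_count"], bins=[0, 5, np.inf],
                                 labels=["moderate", "high"])
print(summary)
```

In the notebook, the per-county results of `summarize_clusters(county, car_locations, labels)` could be collected with `pd.concat` in place of the `charging_stations` list. The bin edge of 5 vehicles only suits this toy example. Real thresholds would need to come from the observed distribution of cluster sizes.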