# **Electric Vehicles Data Analysis Project**

## Task - 2 : **Choropleth**

#### **Problem Statement**

Create a choropleth using plotly.express to display the charging frequency by location. Locations should be at the ZIP code level.

#### **Step 1 : Loading the dataset**

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('vehicles_dataset.csv')

In [3]:
data = data.dropna()

In [4]:
data.head(3)

| | VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | DOL Vehicle ID | Vehicle Location | Electric Utility | 2020 Census Tract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | JN1AZ0CP8B | Yakima | Yakima | WA | 98901 | 2011 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 73 | 0 | 15.0 | 218972519 | POINT (-120.50721 46.60448) | PACIFICORP | 53077001602 |
| 3 | 1G1FW6S08H | Skagit | Concrete | WA | 98237 | 2017 | CHEVROLET | BOLT EV | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 39.0 | 186750406 | POINT (-121.7515 48.53892) | PUGET SOUND ENERGY INC | 53057951101 |
| 4 | 3FA6P0SU1K | Snohomish | Everett | WA | 98201 | 2019 | FORD | FUSION | Plug-in Hybrid Electric Vehicle (PHEV) | Not eligible due to low battery range | 26 | 0 | 38.0 | 2006714 | POINT (-122.20596 47.97659) | PUGET SOUND ENERGY INC | 53061041500 |


#### **Step 2 : Setting Up Data for the Choropleth**

In [5]:
# converting the Postal Code column from int to str so that it can be
# matched against the ZIP codes in the GeoJSON file, which are strings
data['Postal Code'] = data['Postal Code'].astype('str').str.strip()

In [6]:
# checking data types of the Postal Code column
for val in data['Postal Code'][0:4]:
    print(type(val))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
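The conversion above is enough for this dataset, but a slightly more defensive version is sketched below on made-up values. It assumes the column might have been parsed as float (leaving a trailing `.0`) or might contain codes with leading zeros. `normalize_zip` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

def normalize_zip(codes: pd.Series) -> pd.Series:
    """Return postal codes as 5-character strings (hypothetical helper)."""
    return (
        codes.astype("str")
        .str.strip()                           # remove surrounding whitespace
        .str.replace(r"\.0$", "", regex=True)  # drop a trailing ".0" left by float parsing
        .str.zfill(5)                          # restore leading zeros, e.g. "1234" -> "01234"
    )

# made-up sample values covering int, float, padded string and a short code
sample = pd.Series([98901, 98237.0, " 98201 ", 1234])
normalized = normalize_zip(sample)
print(normalized.tolist())
```

Washington ZIP codes do not start with zero, so the `zfill` step only matters if the same approach is reused for other states.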


In [7]:
# grouping the dataframe by postal code (and county)
# the size of each group, i.e. its number of rows, is treated here as
# the number of charging events that occurred in that location
# the result is stored as a new dataframe 'loc_df'
loc_df = data.groupby(['Postal Code','County']).size().reset_index(name='No. of Charging Events')

In [8]:
loc_df.head()

| | Postal Code | County | No. of Charging Events |
|---|---|---|---|
| 0 | 98001 | King | 465 |
| 1 | 98002 | King | 165 |
| 2 | 98003 | King | 312 |
| 3 | 98004 | King | 2001 |
| 4 | 98005 | King | 829 |


In [9]:
loc_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 556 entries, 0 to 555
Data columns (total 3 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Postal Code             556 non-null    object
 1   County                  556 non-null    object
 2   No. of Charging Events  556 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 13.2+ KB
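To make the grouping step concrete, here is a minimal sketch of the same `groupby(...).size().reset_index(name=...)` pattern on a made-up five-row dataframe. The toy values are assumptions for illustration only. Because the grouping key includes County, a postal code listed under two counties would produce two rows, so the sketch also checks for duplicated postal codes.

```python
import pandas as pd

# made-up rows: two for 98001 and three for 98004
toy = pd.DataFrame({
    "Postal Code": ["98001", "98001", "98004", "98004", "98004"],
    "County": ["King"] * 5,
})

# one row per (Postal Code, County) pair with the number of rows in that group
counts = (
    toy.groupby(["Postal Code", "County"])
    .size()
    .reset_index(name="No. of Charging Events")
)
print(counts)

# True if any postal code appears under more than one county
has_duplicates = counts["Postal Code"].duplicated().any()
print(has_duplicates)
```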


#### **Step 4 : Collecting data from GeoJSON file**

In [10]:
# for reading the geojson file
import geopandas as gpd

I used the following link to get the GeoJSON data:
https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/wa_washington_zip_codes_geo.min.json

**What is a GeoJSON file?**

* GeoJSON is a JSON-based file format for storing geographical feature data
* once loaded, it is essentially a dictionary
* each feature object is a dictionary with 3 keys : 'type', 'geometry', 'properties'
* the 'type' key of a feature object has the value 'Feature', indicating that it is a feature object
* the 'geometry' key holds information such as the geometry type and its coordinates
* the 'properties' key contains the attributes or metadata of the feature
* in our case, the key "ZCTA5CE10" of the properties dictionary stores the ZIP code values
* this key corresponds to the column 'Postal Code' in our dataframe
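The structure described above can be sketched with the standard `json` module. The feature below is a hand-written illustration: only the "ZCTA5CE10" key comes from the actual file, and the coordinates are made up.

```python
import json

# a hand-written FeatureCollection with a single illustrative feature
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                # made-up ring of [longitude, latitude] pairs; first and last points match
                "coordinates": [[[-120.5, 47.6], [-120.4, 47.6],
                                 [-120.4, 47.7], [-120.5, 47.6]]],
            },
            "properties": {"ZCTA5CE10": "98822"},
        }
    ],
}

# round-trip through text to mimic reading the file from disk or the web
parsed = json.loads(json.dumps(feature_collection))

# collect the ZIP code and geometry type of every feature
zip_codes = [f["properties"]["ZCTA5CE10"] for f in parsed["features"]]
geometry_types = [f["geometry"]["type"] for f in parsed["features"]]
print(zip_codes, geometry_types)
```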

In [11]:
# reading the geojson file into a geodataframe
geo_df = gpd.read_file('https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/wa_washington_zip_codes_geo.min.json')

In [12]:
# what does the geodataframe look like?
geo_df.head(3)

| | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 53 | 98822 | 5398822 | B5 | G6350 | S | 1131837710 | 5582389 | 47.9019257 | -120.5504512 | N | POLYGON ((-120.47985 47.68373, -120.48008 47.6... |
| 1 | 53 | 98821 | 5398821 | B5 | G6350 | S | 4754899 | 198324 | 47.5497169 | -120.5586129 | N | POLYGON ((-120.57188 47.55317, -120.57191 47.5... |
| 2 | 53 | 98357 | 5398357 | B5 | G6350 | S | 110004759 | 462073 | 48.3338551 | -124.635404 | N | MULTIPOLYGON (((-124.74255 48.39175, -124.7424... |


In [13]:
geo_df.columns[1]

'ZCTA5CE10'

#### **Step 5 : Merging data to the GeoDataFrame**

In [14]:
merged_df = geo_df.merge(loc_df, left_on="ZCTA5CE10", right_on="Postal Code")

In [15]:
merged_df.shape

(553, 15)
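`loc_df` has 556 rows while the merged frame has 553. Since `merge` performs an inner join by default, this suggests that three rows of `loc_df` had no matching ZIP code polygon and were dropped. A minimal sketch of how such rows could be listed with `indicator=True` is shown below on made-up frames. The postal code "99999" is a hypothetical unmatched value.

```python
import pandas as pd

# made-up stand-ins for geo_df and loc_df
geo = pd.DataFrame({"ZCTA5CE10": ["98001", "98002"]})
loc = pd.DataFrame({
    "Postal Code": ["98001", "98002", "99999"],  # "99999" has no polygon
    "No. of Charging Events": [465, 165, 3],
})

# a left join keeps every row of loc; the _merge column records where each row was found
check = loc.merge(geo, left_on="Postal Code", right_on="ZCTA5CE10",
                  how="left", indicator=True)

# rows present only on the left side have no matching polygon
unmatched = check.loc[check["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

Running the same check with the real `loc_df` and `geo_df` would show which postal codes are missing from the map.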

In [16]:
merged_df.head(3)

| | STATEFP10 | ZCTA5CE10 | GEOID10 | CLASSFP10 | MTFCC10 | FUNCSTAT10 | ALAND10 | AWATER10 | INTPTLAT10 | INTPTLON10 | PARTFLG10 | geometry | Postal Code | County | No. of Charging Events |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 53 | 98822 | 5398822 | B5 | G6350 | S | 1131837710 | 5582389 | 47.9019257 | -120.5504512 | N | POLYGON ((-120.47985 47.68373, -120.48008 47.6... | 98822 | Chelan | 15 |
| 1 | 53 | 98357 | 5398357 | B5 | G6350 | S | 110004759 | 462073 | 48.3338551 | -124.635404 | N | MULTIPOLYGON (((-124.74255 48.39175, -124.7424... | 98357 | Clallam | 7 |
| 2 | 53 | 98663 | 5398663 | B5 | G6350 | S | 11134084 | 70154 | 45.6573955 | -122.6631613 | N | POLYGON ((-122.67066 45.64994, -122.67066 45.6... | 98663 | Clark | 171 |


#### **Step 6 : Creating the Choropleth**

In [17]:
import plotly.express as px # for creating the choropleth
import plotly.io as pio # for visualizing the choropleth in the web browser
# rendering the choropleth inside the notebook can make the notebook file heavier

In [18]:
# to view the choropleth directly in default web browser
pio.renderers.default = 'browser'

In [19]:
# Create the choropleth map
fig = px.choropleth(merged_df,
                    geojson=merged_df.geometry,  # Use the geometry column 
                    locations=merged_df.index,   # Use index after merge
                    color="No. of Charging Events", # Color based on Number of EV charging events 
                    projection="miller",       # Projection for the map
                    hover_name='Postal Code',      # Display zip code on hover
                    hover_data={'County':True},
                    title="EV CHARGING HOTSPOTS IN WASHINGTON STATE",
                    width=1500,
                    height=800)

**Why use df.geometry?**

* The geojson parameter specifies the geographic boundaries that will be drawn on the choropleth map.
* The geometry column of merged_df contains the geometries (shapes) of these boundaries, here polygons and multipolygons.
* By passing this column to the geojson parameter, we ask Plotly to use these geometries as the base for the map; locations=merged_df.index then links each row to its shape.

In [20]:
# formatting the title for better visibility
fig.update_layout(
    title_font=dict(
        size=30,
        family="Cooper Black",
        color="teal"
    ),
    title_x=0.5,  # Center the title horizontally
    title_y=0.95,  # Adjust the vertical position of the title
)

In [21]:
# Fit the map to the bounds of the selected zip codes and hide extra layers
fig.update_geos(fitbounds="locations", visible=False)
# Show the map
fig.show()