## AI-Driven Optimal Placement of Electric Vehicle Charging Stations in Kenya

### Business Understanding 

#### Background Information & Overview

Kenya is undergoing a transportation and energy transformation, with electric vehicle (EV) adoption increasing due to rising fuel costs, government incentives, and a global push for sustainability. However, the absence of a data-driven approach to charging station placement is slowing adoption. Charging station deployment is currently largely arbitrary, reactive, or limited to a few locations, which leads to underutilization, range anxiety, and inefficient infrastructure investment.

#### Problem Statement

The adoption of electric vehicles (EVs) in Kenya is increasing, but the absence of a well-planned, optimized EV charging infrastructure remains a major barrier to widespread adoption. Current charging stations are placed without data-driven insights, leading to low utilization rates, inconvenient locations, and poor return on investment for operators.

* Current number of EV charging stations (figure still to be added)

#### Proposed Solution

By integrating machine learning, geospatial analytics, and optimization models, this AI-driven platform will revolutionize EV infrastructure planning in Kenya. The solution ensures that charging stations are placed where they are most needed, cost-effective, and energy-efficient, paving the way for a sustainable and profitable EV ecosystem.

* Use K-Means Clustering, DBSCAN, and Hierarchical Clustering to map out the best possible station locations based on geography and infrastructure constraints.
* Use graph-based routing and Dijkstra’s algorithm to ensure stations are placed within an acceptable travel distance for EV users, for example ensuring that no driver needs to travel more than 5 km to find a charging station.
* Use Random Forest Regression, XGBoost, and Gradient Boosting Machines (GBM) to identify the key drivers of charging station demand, such as traffic volume, population density, nearby commercial hubs, weather conditions, and charging station accessibility.
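
The 5 km travel-distance idea in the list above can be illustrated with a small multi-source Dijkstra search. This is only a sketch under stated assumptions: the road graph, node names, edge lengths, and the `distance_to_nearest_station` helper are all hypothetical and are not part of the project's data or code.

```python
import heapq


def distance_to_nearest_station(graph, stations):
    """Multi-source Dijkstra: shortest road distance (km) from each node to its nearest station."""
    dist = {node: float("inf") for node in graph}
    heap = []
    for station in stations:
        dist[station] = 0.0
        heapq.heappush(heap, (0.0, station))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbour, length_km in graph[node]:
            candidate = d + length_km
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical undirected road graph: node -> [(neighbour, edge length in km)]
road_graph = {
    "A": [("B", 2.0), ("C", 4.5)],
    "B": [("A", 2.0), ("D", 2.5)],
    "C": [("A", 4.5), ("D", 1.0)],
    "D": [("B", 2.5), ("C", 1.0), ("E", 6.0)],
    "E": [("D", 6.0)],
}
stations = ["A"]   # hypothetical existing station
MAX_KM = 5.0       # travel-distance target from the text

nearest = distance_to_nearest_station(road_graph, stations)
uncovered = sorted(node for node, d in nearest.items() if d > MAX_KM)
print(nearest)
print("Nodes farther than 5 km from a station:", uncovered)
```

Nodes that end up in `uncovered` are the places where an additional station would be needed to meet the distance target.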

#### Objectives

This project aims to address the problem above by developing an AI-powered platform that leverages machine learning, geospatial data, and predictive analytics to identify optimal locations for EV charging stations.

The platform will enable:

* EV charging network planners to maximize utilization and profitability by selecting high-demand locations.
* Government agencies to accelerate green mobility initiatives through data-backed decision-making.
* Investors to make informed funding decisions, ensuring high ROI.
* EV users to access conveniently located charging stations, improving the overall user experience.

#### Metrics of Success

1. The model should correctly predict at least 90% of high-demand locations, minimizing false positives and negatives when identifying optimal sites. 
2. The model should achieve an R² score of at least 0.85, meaning it explains at least 85% of the variance in actual charging demand.
3. At least 80% of the suggested locations should be within 500 meters of a power grid connection, ensuring practical deployment feasibility.
4. The model should maintain an accuracy above 85% when tested on new urban areas, ensuring adaptability as Nairobi’s EV market grows.
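
Metric 3 can be checked with a great-circle distance calculation. The sketch below is a minimal illustration, assuming grid connection points are available as latitude/longitude pairs; the coordinates and the `share_within` helper are hypothetical placeholders, not project data.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def share_within(sites, grid_points, max_m=500.0):
    """Fraction of sites whose nearest grid point is at most max_m metres away."""
    hits = sum(
        1 for site in sites
        if min(haversine_m(*site, *point) for point in grid_points) <= max_m
    )
    return hits / len(sites)


# Hypothetical coordinates, for illustration only
grid_points = [(-1.2864, 36.8172), (-1.3000, 36.8000)]
candidate_sites = [(-1.2870, 36.8180), (-1.2990, 36.8010), (-1.2500, 36.9000)]

share = share_within(candidate_sites, grid_points)
print(f"Share of sites within 500 m of the grid: {share:.0%}")
print("Meets the 80% target:", share >= 0.80)
```

With real data, the nearest-point search would normally use a spatial index rather than the brute-force `min` used here.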





### Data Understanding

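The data comes from the Alternative Fueling Stations dataset published by the U.S. Department of Transportation, Bureau of Transportation Statistics, which includes electric vehicle charging stations in the United States. The source is described as current to January 2020, although the extract loaded below contains records last confirmed as recently as February 2025. Link - https://data-usdot.opendata.arcgis.com/datasets/alternative-fueling-stations/explore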

### Importing the Relevant Libraries

In [199]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

import Functions as Mf  # helper module (assumed to be local to this project)

In [176]:
import pandas as pd

# Load Alternative_Fueling_Stations.csv (a CSV file, not a spreadsheet) into a DataFrame
df = pd.read_csv('Alternative_Fueling_Stations.csv')

# Display the first few rows to understand the structure
df.head()




| | OBJECTID | access_code | access_days_time | access_detail_code | cards_accepted | date_last_confirmed | expected_date | fuel_type_code | groups_with_access_code | id | ... | bd_blends_fr | groups_with_access_code_fr | ev_pricing_fr | federal_agency_id | federal_agency_code | federal_agency_name | ev_network_ids_station | ev_network_ids_posts | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | private | | | | 10/11/2024 12:00:00 AM | | CNG | Private | 17.0 | ... | | Privé | | | | | | | -86.267021 | 32.367916 |
| 1 | 2 | private | | GOVERNMENT | | 2/12/2024 12:00:00 AM | | CNG | Private - Government only | 45.0 | ... | | Privé - Réservé au gouvernement | | | | | | | -84.367461 | 33.821911 |
| 2 | 3 | private | | | | 12/13/2023 12:00:00 AM | | CNG | Private | 64.0 | ... | | Privé | | | | | | | -84.543822 | 33.760256 |
| 3 | 4 | public | 24 hours daily | CREDIT_CARD_ALWAYS | CREDIT M V Voyager | 4/14/2024 12:00:00 AM | | CNG | Public - Credit card at all times | 73.0 | ... | | Public - Carte de crédit en tout temps | | | | | | | -94.375338 | 35.362213 |
| 4 | 5 | public | 24 hours daily; call 866-809-4869 for Clean En... | CREDIT_CARD_ALWAYS | A CleanEnergy Comdata D FuelMan M V Voyager Wr... | 12/10/2024 12:00:00 AM | | CNG | Public - Credit card at all times | 81.0 | ... | | Public - Carte de crédit en tout temps | | | | | | | -71.026549 | 42.374706 |


In [177]:
df.shape

(97882, 80)

The dataset contains information about alternative fueling stations, including EV charging stations. Key columns relevant for EV Charging Station Placement Analysis include:

* access_code – Whether the station is public or private.
* access_days_time – Availability (e.g., 24-hour access).
* fuel_type_code – Identifies the fuel type (e.g., EV for Electric Vehicles).
* groups_with_access_code – Specifies access permissions.
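* ev_pricing_fr – French-language version of the EV pricing information (the `_fr` columns hold French translations, e.g. "Privé" for "Private").
* ev_network_ids_station – Station identifier(s) assigned by the charging network.
* x, y – Longitude and latitude coordinates of the station (for mapping).
* federal_agency_name – Name of the federal agency associated with the station, where applicable.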

In [178]:
df['fuel_type_code'].value_counts()

ELEC    85398
E85      4719
LPG      2865
BD       1793
RD       1519
CNG      1372
LNG       112
HY        103
Name: fuel_type_code, dtype: int64

* From the `value_counts` of `fuel_type_code` above, we are interested in electric vehicles, so we filter the DataFrame for the `ELEC` fuel type.
* Some of the features are irrelevant to our analysis. We first determine the features of interest and then filter the DataFrame accordingly.

In [179]:
df = df[df['fuel_type_code'] == 'ELEC']
df

| | OBJECTID | access_code | access_days_time | access_detail_code | cards_accepted | date_last_confirmed | expected_date | fuel_type_code | groups_with_access_code | id | ... | bd_blends_fr | groups_with_access_code_fr | ev_pricing_fr | federal_agency_id | federal_agency_code | federal_agency_name | ev_network_ids_station | ev_network_ids_posts | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | 126 | private | Fleet use only | | | 9/14/2023 12:00:00 AM | | ELEC | Private | 1517.0 | ... | | Privé | | | | | | | -118.387971 | 34.248319 |
| 126 | 127 | public | 5:30am-9pm; pay lot | | | 1/10/2023 12:00:00 AM | | ELEC | Public | 1523.0 | ... | | Public | | | | | | | -118.271387 | 34.040539 |
| 127 | 128 | private | For fleet and employee use only | | | 9/14/2023 12:00:00 AM | | ELEC | Private | 1525.0 | ... | | Privé | | | | | | | -118.248589 | 34.059133 |
| 128 | 129 | private | Fleet use only | | | 1/9/2024 12:00:00 AM | | ELEC | Private | 1531.0 | ... | | Privé | | | | | | | -118.096665 | 33.759802 |
| 129 | 130 | private | Fleet use only | | | 1/9/2024 12:00:00 AM | | ELEC | Private | 1552.0 | ... | | Privé | | | | | | | -118.265628 | 33.770508 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97877 | 94193 | public | 24 hours daily | | | 2/10/2025 12:00:00 AM | | ELEC | Public | 383892.0 | ... | | Public | | | | | ["USCPIL17039901"] | ["26885821"] | -96.987131 | 33.021393 |
| 97878 | 94194 | public | | | | 2/10/2025 12:00:00 AM | | ELEC | Public | 383893.0 | ... | | Public | | | | | ["b9364272-e5f2-11ef-ae25-42010aa40043"] | ["7a0a7f85-629d-4585-ada1-6e13dcc1a5cd_1","80b... | -80.357119 | 25.598390 |
| 97879 | 94195 | public | | | | 2/10/2025 12:00:00 AM | | ELEC | Public | 383894.0 | ... | | Public | | | | | ["d43bff62-e5f2-11ef-9155-42010aa40043"] | ["14b8de62-6110-44a7-a1e5-7f8fbfee2288_1","2a9... | -73.995734 | 40.674470 |
| 97880 | 94196 | public | | | | 2/10/2025 12:00:00 AM | | ELEC | Public | 383895.0 | ... | | Public | | | | | ["d4e30294-e5f2-11ef-96cf-42010aa40043"] | ["3ae60df2-c3c0-46dc-86c2-52ce04a25b86_1","b52... | -122.281623 | 37.783456 |


In [180]:
# Display the columns
print(df.columns)

Index(['OBJECTID', 'access_code', 'access_days_time', 'access_detail_code',
       'cards_accepted', 'date_last_confirmed', 'expected_date',
       'fuel_type_code', 'groups_with_access_code', 'id',
       'maximum_vehicle_class', 'open_date', 'owner_type_code',
       'restricted_access', 'status_code', 'funding_sources', 'facility_type',
       'station_name', 'station_phone', 'updated_at', 'geocode_status',
       'latitude', 'longitude', 'city', 'country', 'intersection_directions',
       'plus4', 'state', 'street_address', 'zip', 'bd_blends',
       'cng_dispenser_num', 'cng_fill_type_code', 'cng_has_rng', 'cng_psi',
       'cng_renewable_source', 'cng_total_compression', 'cng_total_storage',
       'cng_vehicle_class', 'e85_blender_pump', 'e85_other_ethanol_blends',
       'ev_connector_types', 'ev_dc_fast_num', 'ev_level1_evse_num',
       'ev_level2_evse_num', 'ev_network', 'ev_network_web', 'ev_other_evse',
       'ev_pricing', 'ev_renewable_source', 'ev_workplace_charging',



**Geographic Coordinates**

* **Latitude and Longitude:** These are the geographic coordinates that specify the exact location of each charging station. They are crucial for mapping and spatial analysis, allowing you to visualize where charging stations are located and identify areas that may lack coverage.

**Station Specifications**

* **Connector Types:** This refers to the types of connectors available at the charging station (e.g., Type 1, Type 2, CCS, CHAdeMO). Different EV models require different types of connectors, so knowing what is available helps ensure compatibility with various vehicles.
* **Charging Levels:** Charging stations can provide different levels of charging (e.g., Level 1, Level 2, DC Fast Charging). Level 1 is the slowest and typically uses a standard outlet, while Level 2 is faster and requires a dedicated charging unit. DC Fast Charging is the quickest option, allowing for rapid charging. Understanding these levels helps in determining the charging speed and convenience for users.

**Access and Pricing Information**

* **Access Code:** This indicates how users can access the charging station (e.g., open to the public, requires a membership, etc.). This information is important for understanding the usability of the station for potential EV users.
* **Pricing:** This includes information on how much it costs to use the charging station. Pricing can influence user behavior and the overall attractiveness of a charging station. Knowing the pricing structure helps in assessing the economic viability of placing new stations.

**Network Connectivity**

* **EV Network:** This refers to the network or service provider that operates the charging station. Some networks may offer better coverage, reliability, or user experience than others. Understanding the network connectivity helps in evaluating the overall infrastructure and support available for EV users.

These elements are essential for making informed decisions about where to place new EV charging stations, ensuring they meet user needs and promote the adoption of electric vehicles in Kenya.

In [181]:
relevant_columns = [
    'station_name', 'latitude', 'longitude', 'city', 'country', 'state',
    'street_address', 'status_code', 'access_code',
    'ev_connector_types', 'ev_dc_fast_num', 'ev_level1_evse_num', 
    'ev_level2_evse_num', 'ev_network', 'ev_pricing'
]

Ev_df = df[relevant_columns]
Ev_df

| | station_name | latitude | longitude | city | country | state | street_address | status_code | access_code | ev_connector_types | ev_dc_fast_num | ev_level1_evse_num | ev_level2_evse_num | ev_network | ev_pricing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | LADWP - Truesdale Center | 34.248319 | -118.387971 | Sun Valley | US | CA | 11797 Truesdale St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 2.0 | | 57.0 | SHELL_RECHARGE | |
| 126 | Los Angeles Convention Center | 34.040539 | -118.271387 | Los Angeles | US | CA | 1201 S Figueroa St | E | public | ["J1772"] | | | 7.0 | Non-Networked | Free; parking fee |
| 127 | LADWP - John Ferraro Building | 34.059133 | -118.248589 | Los Angeles | US | CA | 111 N Hope St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 12.0 | | 338.0 | Non-Networked | |
| 128 | LADWP - Haynes Power Plant | 33.759802 | -118.096665 | Long Beach | US | CA | 6801 E 2nd St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 1.0 | | 19.0 | Non-Networked | |
| 129 | LADWP - Harbor Generating Station | 33.770508 | -118.265628 | Wilmington | US | CA | 161 N Island Ave | E | private | ["J1772"] | | | 10.0 | Non-Networked | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97877 | SHOP CHARGING S SHOP 2 | 33.021393 | -96.987131 | Lewisville | US | TX | 1547 S Stemmons Fwy | E | public | ["J1772"] | | | 1.0 | ChargePoint Network | |
| 97878 | Bay Point Rentals | 25.598390 | -80.357119 | Miami | US | FL | 18412 Homestead Ave | E | public | ["J1772"] | | | 7.0 | CHARGELAB | |
| 97879 | Out of Service - Energy Conservation & Supply,... | 40.674470 | -73.995734 | Brooklyn | US | NY | 53 9th St | E | public | ["J1772"] | | | 10.0 | CHARGELAB | |
| 97880 | Out of Service - Dignity Moves | 37.783456 | -122.281623 | Alameda | US | CA | 2350 5th St | E | public | ["J1772"] | | | 2.0 | CHARGELAB | |


#### EV Charging Station Data Dictionary

##### **Location Information**
- **`station_name`**: The name of the EV charging station, typically identifying the location or brand.
- **`latitude`**: The north-south geographic coordinate of the station (in degrees).
- **`longitude`**: The east-west geographic coordinate of the station (in degrees).
- **`city`**: The city where the EV charging station is located.
- **`state`**: The state or region where the station is located (if applicable).
- **`country`**: The country where the EV charging station is located.
- **`street_address`**: The street address, including building number and street name.

##### **Operational Status & Access**
- **`status_code`**: A short code for the station's current status. The sample rows all show `E`, and the column has three distinct values; in the source's coding scheme, `E` means available, `P` planned, and `T` temporarily unavailable.
- **`access_code`**: Whether the station is open to the public or restricted (`public` or `private`).

##### **Charging Equipment & Capabilities**
- **`ev_connector_types`**: The list of connector types available at the station (e.g., `J1772`, `J1772COMBO`, `CHADEMO`).
- **`ev_dc_fast_num`**: The number of **DC fast chargers**, which provide rapid charging.
- **`ev_level1_evse_num`**: The number of **Level 1 EVSE units**, which offer slow charging (typically for residential use).
- **`ev_level2_evse_num`**: The number of **Level 2 EVSE units**, which offer faster charging and are commonly found in public networks.

##### **Network & Pricing**
- **`ev_network`**: The name of the network operating the station (e.g., ChargePoint Network, Blink Network, Tesla), or `Non-Networked`.
- **`ev_pricing`**: Free-text information on the station's cost structure (e.g., "Free; parking fee").
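
If `ev_connector_types` is kept for analysis, each cell first has to be turned into a Python list. The sketch below assumes the in-memory values are JSON-style strings such as `["J1772"]`, as the sample output suggests, and that missing cells are not strings; `parse_connectors` is a hypothetical helper, not part of the notebook.

```python
import json


def parse_connectors(raw):
    """Return the connector names in one cell, or [] when the cell is missing or empty."""
    if not isinstance(raw, str) or not raw:
        return []  # covers None, NaN (a float), and empty strings
    return json.loads(raw)


# Sample cell values copied from the output above, plus one missing value
samples = ['["CHADEMO","J1772","J1772COMBO"]', '["J1772"]', None]

parsed = [parse_connectors(cell) for cell in samples]
all_types = sorted({name for cell in parsed for name in cell})
print(parsed)
print("Distinct connector types:", all_types)
```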


In [182]:
(Ev_df.isna().sum() / len(Ev_df) * 100).sort_values(ascending=False)

ev_level1_evse_num    99.241200
ev_dc_fast_num        84.649523
ev_pricing            81.877796
ev_level2_evse_num    18.306049
ev_connector_types     4.359587
ev_network             4.334996
street_address         0.037472
city                   0.003513
station_name           0.001171
access_code            0.000000
status_code            0.000000
state                  0.000000
country                0.000000
longitude              0.000000
latitude               0.000000
dtype: float64

* Drop any column with more than 20% missing values.
Rationale: the column `ev_level2_evse_num` (about 18.3% missing) is critical for our analysis, so the threshold is set high enough to retain it.
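
The 20% rule can also be applied programmatically instead of listing the columns by hand. This is a minimal sketch on a toy frame; `drop_sparse_columns` is a hypothetical helper and the values are illustrative only.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame, max_missing_pct=20.0):
    """Keep only the columns whose share of missing values is at most max_missing_pct."""
    missing_pct = frame.isna().mean() * 100
    keep = missing_pct[missing_pct <= max_missing_pct].index
    return frame[keep]


# Toy data: one column about 17% missing (kept), one about 83% missing (dropped)
toy = pd.DataFrame({
    "latitude": [34.2, 34.0, 34.1, 33.8, 33.7, 25.6],
    "ev_level2_evse_num": [57.0, 7.0, np.nan, 19.0, 10.0, 7.0],
    "ev_pricing": [np.nan, "Free; parking fee", np.nan, np.nan, np.nan, np.nan],
})

filtered = drop_sparse_columns(toy)
print(list(filtered.columns))
```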

In [183]:
# Drop the columns with more than 20% missing values (columns= already implies axis=1)
df_Filtered = Ev_df.drop(columns=['ev_level1_evse_num', 'ev_dc_fast_num', 'ev_pricing'])
df_Filtered.head()

| | station_name | latitude | longitude | city | country | state | street_address | status_code | access_code | ev_connector_types | ev_level2_evse_num | ev_network |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 125 | LADWP - Truesdale Center | 34.248319 | -118.387971 | Sun Valley | US | CA | 11797 Truesdale St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 57.0 | SHELL_RECHARGE |
| 126 | Los Angeles Convention Center | 34.040539 | -118.271387 | Los Angeles | US | CA | 1201 S Figueroa St | E | public | ["J1772"] | 7.0 | Non-Networked |
| 127 | LADWP - John Ferraro Building | 34.059133 | -118.248589 | Los Angeles | US | CA | 111 N Hope St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 338.0 | Non-Networked |
| 128 | LADWP - Haynes Power Plant | 33.759802 | -118.096665 | Long Beach | US | CA | 6801 E 2nd St | E | private | ["CHADEMO","J1772","J1772COMBO"] | 19.0 | Non-Networked |
| 129 | LADWP - Harbor Generating Station | 33.770508 | -118.265628 | Wilmington | US | CA | 161 N Island Ave | E | private | ["J1772"] | 10.0 | Non-Networked |


In [184]:
df_Filtered['status_code'].nunique()

3

In [185]:
df_Filtered['ev_network'].value_counts()[:10]

ChargePoint Network    41946
Non-Networked           9705
Blink Network           7441
Tesla Destination       4961
SHELL_RECHARGE          2707
Tesla                   2529
EV Connect              1789
eVgo Network            1155
Electrify America       1040
AMPUP                   1032
Name: ev_network, dtype: int64

#### Considerations

* The US dataset covers all states. *QUESTION:* To keep the data comparable to our Nairobi scenario, should we restrict it to one major state and then to a specific city, for example the city with the highest number of charging stations?

* How should we handle the street addresses? We cannot one-hot encode all 55,896 unique values.

Rationale:
Street addresses are unlikely to be useful for clustering EV charging stations by location, because every station already has geographic coordinates (latitude and longitude), which give a more precise and direct indication of its physical position. Addresses would also add complexity, since their format varies. They are mainly helpful for field application, where a predicted EV station coordinate can be mapped to the nearest street (reverse geocoding). Street addresses do not directly affect the geographic clustering of EV stations when latitudes and longitudes are available.

* `station_name`
Drop this column. Rationale: station names are surface labels for EV stations. The dataset has 78,629 different stations, so encoding them for clustering would only increase the sparsity of our data. Moreover, the stations are already precisely represented by their geographic coordinates.


* `city`
There are 7,206 different cities in the dataset. Within these cities, the charging stations are referenced by geographic coordinates, station names, and street addresses. Encoding this column would increase sparsity while contributing little to clustering accuracy.

* `state`
The `state` column has 52 distinct values, which suggests it includes jurisdictions beyond the 50 US states. We need to consider the effect of encoding this column and its influence on the cluster model. *QUESTION:* Would the different states be good representations of different regions of Nairobi? Better results may be achieved by filtering for the state with the highest station count, then for specific cities within it, and mapping the EV stations to clusters. State-level differences might capture regional variation, but that variation is likely subtle compared with what the geographic coordinates already capture.


* How should we handle `ev_connector_types`, which contains a list of connectors?
*Opt to drop this column. Rationale: the project aims to find the best locations for chargers based on geographic and other operational data, so this column is not significant to our model.* One-hot encoding this feature would only make the dataset sparser, which would affect the clustering algorithm.

* `ev_level2_evse_num`
Drop this column. Rationale: this column is not relevant to our analysis, since it describes the number of units at a charging station rather than the strategic position of the station.

* `ev_network`
Drop this column. Rationale: `ev_network` reflects the suppliers or managers of the charging stations. Encoding this column would only increase the sparsity of the data without meaningfully improving the accuracy of the cluster model.
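
The state and city question raised in the considerations above could be explored by narrowing the data to the state with the most stations and then to its busiest city. This is only a sketch on a toy frame with illustrative values; on the real data the same steps would run on `df_Filtered`.

```python
import pandas as pd

# Toy frame with the same column names as the notebook; the rows are illustrative
toy = pd.DataFrame({
    "state": ["CA", "CA", "CA", "TX", "NY", "CA"],
    "city": ["Los Angeles", "Los Angeles", "Alameda", "Lewisville", "Brooklyn", "Los Angeles"],
    "latitude": [34.04, 34.06, 37.78, 33.02, 40.67, 34.05],
    "longitude": [-118.27, -118.25, -122.28, -96.99, -74.00, -118.26],
})

# State with the highest number of stations
top_state = toy["state"].value_counts().idxmax()
in_state = toy[toy["state"] == top_state]

# City with the highest number of stations inside that state
top_city = in_state["city"].value_counts().idxmax()
city_df = in_state[in_state["city"] == top_city]

print(top_state, top_city, len(city_df))
```

Restricting to one city keeps the coordinate range at a scale comparable to a single metropolitan area, which is closer to the Nairobi use case than the whole of the United States.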



###### NB: consider only the features that have a significant influence on the locations of the EV charging stations

* Label-encode the remaining categorical columns.
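
Label encoding can be done with pandas alone, as in the minimal sketch below. It assumes `status_code` and `access_code` are the categorical columns that remain; the toy values are illustrative, and scikit-learn's encoders would be an equally valid choice.

```python
import pandas as pd

# Toy frame; the column names follow the notebook, the values are illustrative
toy = pd.DataFrame({
    "status_code": ["E", "E", "T", "P"],
    "access_code": ["private", "public", "public", "private"],
    "latitude": [34.25, 34.04, 33.76, 25.60],
})

categorical_cols = ["status_code", "access_code"]  # assumed remaining categoricals

encoded = toy.copy()
for col in categorical_cols:
    # Categories are sorted alphabetically, then mapped to integer codes 0, 1, 2, ...
    encoded[col] = encoded[col].astype("category").cat.codes

print(encoded)
```

The integer codes carry no real ordering, so they suit tree-based models better than distance-based clustering, where one-hot encoding or dropping the column may be safer.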