# EV Charging Station Siting Analysis
### Modified by Dr. Harry Patria at Patria & Co.

## Table of contents
* [Introduction: Business Problem Outline and Target Audience](#introduction)
* [Data Sources](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction: Business Problem Outline and Target Audience <a name="introduction"></a>

_Background:_ How to select the best location for optimizing the use of EV charging station?.<br>

_Business Problem Outline_ <br>

Electric Vehicles (EVs) are gaining popularity due to absence of local emissions, and certain technical advantages that EVs have over conventional vehicles (faster acceleration, no need for periodic service requirements etc.). This is not to say that there are no barriers to the widespread acceptance of EVs. One of the primary drawbacks is the lack of access to charging equipments at public locations.<br>

In this project, we look at the city of Raleigh in North Carolina, where I currently reside. The objective is try to come up with a siting analysis for public EV charging station installations - preferably close to local shops and restaurants so that people can charge their cars while getting groceries or sharing a meal with their loved ones. One of the indirect benefits of EV charging that is often talked about is contribution to the local economy - i.e. people tend to spend money on nearby shops when waiting on charging their EVs <br>

To be specific, the area of interest in this case is within a 100 km radius of downtown Raleigh in North Carolina. This approximately covers the towns of Durham, Raleigh and Chapel Hill. As of 2020, the population of the larger Raleigh-Durham-Chapel Hill Combined Statistical Area (CSA) is estimated at 2.03 million. Hailed as a technological hub within the state, the mean age of residents range between between 26 to 35 years. This area may be considered a suitable case study location for testing out algorithms ranking retail facilities that could benefit from EV charger installations. <br>

Note that the approach that I am taking is pretty simplistic - I am aiming to identify retail locations (primarily grocery shops, restauraunts and some service industries) that have relatively few EV chargers in the vicinity. A list of suitable retail locations that could benefit from an EV charging installation in its neighborhood is provided as input to a clustering algorithm. This algorithm determines locations that could improve access to EV charging in our region of interest. <br>

Outputs include a map with clusters of retail locations that could benefit from EV charger installations and a list of suitable locations for EV charging installations.

_Target Audience_ <br> 

This project is primarily aimed at any one interested in Data Science applications in the Energy sector, but I hope that other Data Science enthusiasts will also find ways to repurpose the code. Data Science techniques and/or Machine Learning Algorithms often continue to remain abstract ideas until we apply them to problems. I hope this project provides some insight into translating abstract ideas into specific outputs that could aid decision making in business ventures. 

## Data Sources <a name="data"></a>

The following publicly available data-sets have been used to perform the analysis
1. <a href = "https://foursquare.com/"> API Foursquare - This database provides information on locations of popular restaurants </a>
2. <a href = "https://afdc.energy.gov/fuels/electricity_locations.html#/find/nearest?fuel=ELEC"> AFDC Database - This database provides information on preexisting EV charging locations </a>


## Methodology <a name="methodology"></a>

__Step 0__: Import relevant Python libraries

In [1]:
# Following line needs to be uncommented if geopy is not installed
#!conda install -c conda-forge geopy --yes 
#!pip3.9 install geopy
# Following line needs to be uncommented if geopy is not installed
#!conda install -c conda-forge folium=0.5.0 --yes
#!pip3.9 install folium
from functools import partial
#import geopy.geocoders
from geopy.geocoders import Nominatim
#install openpyxl
#!pip3.9 install openpyxl

In [2]:
import pandas as pd # this library offers data structures and operations for manipulating numerical tables 
import numpy as np # this library is used for Scientific Computing

import requests # library to handle requests
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

# Import libraries for clustering analysis
import sklearn.neighbors
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # plotting library

print('Libraries imported.')

Libraries imported.


__Step 1__: Read in the data from AFDC and Foursquare API

In [3]:
# Importing locations of existing public EVSE chargers from AFDC
EVSE = pd.read_excel(open('AFDC_Data.xlsx','rb'), sheet_name='AFDC_Master_Data_NC')
EVSE.head()

Unnamed: 0,Station Name,Street Address,City,ZIP,EV Level1 EVSE Num,EV Level2 EVSE Num,EV DC Fast Count,Latitude,Longitude,Date Last Confirmed,ID
0,DUKE ENERGY,410 S Mint St,Charlotte,28202,,1.0,,35.226914,-80.850182,2020-05-17,38892
1,City of Raleigh - Municipal Building,285 W Hargett St,Raleigh,27601,,2.0,,35.778416,-78.64347,2019-11-08,39016
2,City of Raleigh - Downtown,215 W Cabarrus St,Raleigh,27601,,1.0,,35.77435,-78.642287,2019-11-08,39017
3,Modern Nissan - Concord,967 Concord Pkwy S,Concord,28027,,1.0,1.0,35.392063,-80.622777,2019-09-09,40066
4,Fred Anderson Nissan,4559 Raeford Rd,Fayetteville,28304,,1.0,1.0,35.042419,-78.956747,2019-09-09,40067


In [4]:
# Use Foursquare API to download places of interest within the RDU-Chapel Hill Area

Following cell contains foursquare API credentials and has been removed from public view

In [5]:
CLIENT_ID = '3LV40XKAW4DLMFQYMKV4V2EU0L1TCFVLCQ5PMVIGX44YT4GP' # your Foursquare ID
CLIENT_SECRET = 'ES2DAIMEHEO4KSY2YENFMUHCPMOVDLXPA1PVME3XQZQATFIB' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 500
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 3LV40XKAW4DLMFQYMKV4V2EU0L1TCFVLCQ5PMVIGX44YT4GP
CLIENT_SECRET:ES2DAIMEHEO4KSY2YENFMUHCPMOVDLXPA1PVME3XQZQATFIB


In [6]:
# Testing connection with Foursquare API

address = '4242 Six Forks Rd, Raleigh, NC 27609'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

35.8378253 -78.640808


Next up - identification of trending spots in the Raleigh... <br> There are a few different ways of identifying trending spots. For purposes of this project we consider two specific Foursquare categories:<br> 
(a) 'Food', and (b) 'Shops & Services'. <br> Refer to <a href = "https://developer.foursquare.com/docs/build-with-foursquare/categories/"> API Foursquare List of Venue Categories </a> for full list of categories

URI to search for a specific venue category
> `https://api.foursquare.com/v2/venues/`**search**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&query=`**QUERY**`&radius=`**RADIUS**`&limit=`**LIMIT**

For reference, here is the <a href = "https://developer.foursquare.com/docs/build-with-foursquare/categories/"> link </a> to Foursquare API's webpage that lists all parameter definitions for URI 

In [7]:
#
search_query = 'Shops & Services'
radius = 100000 # radius in meters
LIMIt = 500 # Limit on number of venues
print(search_query + ' .... OK!')

Shops & Services .... OK!


In [8]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=3LV40XKAW4DLMFQYMKV4V2EU0L1TCFVLCQ5PMVIGX44YT4GP&client_secret=ES2DAIMEHEO4KSY2YENFMUHCPMOVDLXPA1PVME3XQZQATFIB&ll=35.8378253,-78.640808&v=20180604&query=Shops & Services&radius=100000&limit=500'

In [9]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '62ff8f09fc714a54d1e215c9'},
 'response': {'venues': [{'id': '4ad4c00af964a52037ed20e3',
    'name': 'Shops of Cameron Village',
    'location': {'address': '2034 Cameron St',
     'crossStreet': 'at Daniels St',
     'lat': 35.791233,
     'lng': -78.660793,
     'labeledLatLngs': [{'label': 'display',
       'lat': 35.791233,
       'lng': -78.660793}],
     'distance': 5491,
     'postalCode': '27605',
     'cc': 'US',
     'city': 'Raleigh',
     'state': 'NC',
     'country': 'United States',
     'formattedAddress': ['2034 Cameron St (at Daniels St)',
      'Raleigh, NC 27605',
      'United States']},
    'categories': [{'id': '5744ccdfe4b0c0459246b4dc',
      'name': 'Shopping Plaza',
      'pluralName': 'Shopping Plazas',
      'shortName': 'Shopping Plaza',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/mall_',
       'suffix': '.png'},
      'primary': True}],
    'venuePage': {'id': '61850785'},
    'referralId': '

In [10]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
Retail_df = json_normalize(venues)
Retail_df.head()

  Retail_df = json_normalize(venues)


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,venuePage.id
0,4ad4c00af964a52037ed20e3,Shops of Cameron Village,"[{'id': '5744ccdfe4b0c0459246b4dc', 'name': 'S...",v-1660915465,False,2034 Cameron St,at Daniels St,35.791233,-78.660793,"[{'label': 'display', 'lat': 35.791233, 'lng':...",5491,27605.0,US,Raleigh,NC,United States,"[2034 Cameron St (at Daniels St), Raleigh, NC ...",61850785.0
1,4ddbfc573151ee0807532b88,Shops of Cameron Village CAT Bus Stop,"[{'id': '4bf58dd8d48988d1fe931735', 'name': 'B...",v-1660915465,False,Cameron at Daniels,,35.791208,-78.660974,"[{'label': 'display', 'lat': 35.79120760073099...",5499,,US,Raleigh,NC,United States,"[Cameron at Daniels, Raleigh, NC, United States]",
2,4b5eebaff964a520f79d29e3,Park Shops,"[{'id': '4bf58dd8d48988d198941735', 'name': 'C...",v-1660915465,False,101 Current Dr,at North Carolina State University,35.785491,-78.667355,"[{'label': 'display', 'lat': 35.7854910218201,...",6299,27695.0,US,Raleigh,NC,United States,[101 Current Dr (at North Carolina State Unive...,
3,5c3fcfaf35811b002cdeccf1,Shops At Mcneil Pointe,"[{'id': '5744ccdfe4b0c0459246b4dc', 'name': 'S...",v-1660915465,False,,,35.812685,-78.627096,"[{'label': 'display', 'lat': 35.81268494095342...",3060,27608.0,US,Raleigh,NC,United States,"[Raleigh, NC 27608, United States]",
4,500552e8e4b0b02ea225c44d,Park Shops 201,"[{'id': '4bf58dd8d48988d1a0941735', 'name': 'C...",v-1660915465,False,Stinson Drive,,35.785371,-78.667727,"[{'label': 'display', 'lat': 35.78537071485849...",6324,27607.0,US,Raleigh,NC,United States,"[Stinson Drive, Raleigh, NC 27607, United States]",


In [11]:
# Tracking down restaurants in the area
search_query = 'Food'
radius = 100000 # radius in meters
LIMIt = 500 # Limit on number of venues
print(search_query + ' .... OK!')

Food .... OK!


In [12]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=3LV40XKAW4DLMFQYMKV4V2EU0L1TCFVLCQ5PMVIGX44YT4GP&client_secret=ES2DAIMEHEO4KSY2YENFMUHCPMOVDLXPA1PVME3XQZQATFIB&ll=35.8378253,-78.640808&v=20180604&query=Food&radius=100000&limit=500'

In [13]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '62ff8f0afc729374e980c980'},
 'response': {'venues': [{'id': '4b22b0e6f964a520e04b24e3',
    'name': 'Food Lion Grocery Store',
    'location': {'address': '5426 Six Forks Rd',
     'lat': 35.854909950587164,
     'lng': -78.64072713192202,
     'labeledLatLngs': [{'label': 'display',
       'lat': 35.854909950587164,
       'lng': -78.64072713192202}],
     'distance': 1901,
     'postalCode': '27609',
     'cc': 'US',
     'city': 'Raleigh',
     'state': 'NC',
     'country': 'United States',
     'formattedAddress': ['5426 Six Forks Rd',
      'Raleigh, NC 27609',
      'United States']},
    'categories': [{'id': '52f2ab2ebcbc57f1066b8b46',
      'name': 'Supermarket',
      'pluralName': 'Supermarkets',
      'shortName': 'Supermarket',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/food_grocery_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1660915466',
    'hasPerk': False},
   {'id': '4ed01

In [14]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
Restaurants_Groceries_df = json_normalize(venues)
Restaurants_Groceries_df.head()

  Restaurants_Groceries_df = json_normalize(venues)


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,venuePage.id,location.neighborhood
0,4b22b0e6f964a520e04b24e3,Food Lion Grocery Store,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",v-1660915466,False,5426 Six Forks Rd,35.85491,-78.640727,"[{'label': 'display', 'lat': 35.85490995058716...",1901,27609,US,Raleigh,NC,United States,"[5426 Six Forks Rd, Raleigh, NC 27609, United ...",,,
1,4ed01e2b775bbb5f32900c8a,Target - Food Avenue,"[{'id': '4bf58dd8d48988d120951735', 'name': 'F...",v-1660915466,False,North Hills,35.838319,-78.641821,"[{'label': 'display', 'lat': 35.8383193183181,...",106,27609,US,Raleigh,NC,United States,"[North Hills, Raleigh, NC 27609, United States]",,,
2,4b071d63f964a52087f722e3,Food Lion Grocery Store,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",v-1660915466,False,4317 Fall Of The Neuse Rd,35.841043,-78.613129,"[{'label': 'display', 'lat': 35.84104260746239...",2523,27609,US,Raleigh,NC,United States,"[4317 Fall Of The Neuse Rd, Raleigh, NC 27609,...",,,
3,4ba15048f964a520a9ab37e3,Food Bank of Central & Eastern NC,"[{'id': '50328a8e91d4c4b30a586d6c', 'name': 'N...",v-1660915466,False,3808 Tarheel Dr,35.831524,-78.607639,"[{'label': 'display', 'lat': 35.83152395695958...",3074,27609,US,Raleigh,NC,United States,[3808 Tarheel Dr (btwn Wolfpack Lane & St Alba...,btwn Wolfpack Lane & St Albans Dr,,
4,4b2a335bf964a52003a624e3,Food Lion Grocery Store,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",v-1660915466,False,1121 Falls River Ave Ste 101,35.906078,-78.589293,"[{'label': 'display', 'lat': 35.90607806421702...",8906,27614,US,Raleigh,NC,United States,"[1121 Falls River Ave Ste 101, Raleigh, NC 276...",,,


Now that we have a list of retail services, groceries and restaurants in the area, let's combine the two data-sets 

In [15]:
Retail_df = Retail_df[['name', 'location.lat', 'location.lng']]

In [16]:
Restaurants_Groceries_df = Restaurants_Groceries_df[['name', 'location.lat', 'location.lng']]

In [17]:
Locations_Venue = pd.concat([Retail_df, Restaurants_Groceries_df], ignore_index=True)
Locations_Venue.rename(columns={'name':'Venue Name'}, 
                 inplace=True)
Locations_Venue.head()

Unnamed: 0,Venue Name,location.lat,location.lng
0,Shops of Cameron Village,35.791233,-78.660793
1,Shops of Cameron Village CAT Bus Stop,35.791208,-78.660974
2,Park Shops,35.785491,-78.667355
3,Shops At Mcneil Pointe,35.812685,-78.627096
4,Park Shops 201,35.785371,-78.667727


__Step 2:__ Now that we have the geographic coordinates of businesses in the area, let us determine the distance between each venue obtained from the foursquare API and each EVSE outlet location: <br>
Procedure to calculate the distance has been obtained from this article in medium: <a href = "https://medium.com/@danalindquist/finding-the-distance-between-two-lists-of-geographic-coordinates-9ace7e43bb2f"> Finding the distance between two lists of geographic coordinates </a> by Dana Lindquist

In [18]:
# Create two dataframes with names of EVSE outlets and retail venues, and lat-long in degrees

Locations_EVSE = EVSE[['Station Name','Latitude', 'Longitude']].copy()

Locations_EVSE.rename(columns={'Station Name':'EVSE Station Name'}, 
                 inplace=True)


In [19]:
# add columns with radians for latitude and longitude
Locations_EVSE[['Latitude', 'Longitude']] = (
    np.radians(Locations_EVSE.loc[:,['Latitude', 'Longitude']])
)
Locations_Venue[['location.lat', 'location.lng']] = (
    np.radians(Locations_Venue.loc[:,['location.lat', 'location.lng']])
)

The distance computed here is a haversine distance. This assumes the earth is a true sphere which makes for a relatively fast computation. The sklearn computation assumes the radius of the sphere is 1, so to get the distance in miles we multiply the output of the sklearn computation by 3959 miles, the average radius of the earth. To get the distance in kilometers this number would be 6371 km.

In [20]:
dist = sklearn.neighbors.DistanceMetric.get_metric('haversine')
dist_matrix = (dist.pairwise
    (Locations_EVSE[['Latitude', 'Longitude']],
     Locations_Venue[['location.lat', 'location.lng']])*6371
)
# Note that 6371 is the radius of the earth in km
df_dist_matrix = (
    pd.DataFrame(dist_matrix,index=Locations_EVSE['EVSE Station Name'], 
                 columns=Locations_Venue['Venue Name'])
)



In [21]:
df_dist_matrix.head()

Venue Name,Shops of Cameron Village,Shops of Cameron Village CAT Bus Stop,Park Shops,Shops At Mcneil Pointe,Park Shops 201,Bass Pro Shops,Shops at Town Station,Shops at Brennan Station,Flip Flop Shops,Shops at Oberlin Court,...,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,Food Lion Grocery Store,The Streets at Southpoint Food Court
EVSE Station Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
DUKE ENERGY,207.864559,207.848091,207.112806,211.470816,207.076829,199.776821,194.649437,209.861712,207.744806,208.165049,...,232.303378,191.718493,226.209711,187.869197,227.394111,193.736678,211.074442,198.580637,222.376446,188.320491
City of Raleigh - Municipal Building,2.114912,2.125133,2.293743,4.08669,2.320789,12.843331,14.743878,13.894064,7.738003,2.628716,...,23.798727,26.582474,22.262171,32.346114,26.716136,31.519437,20.274784,36.351329,23.468931,30.377019
City of Raleigh - Downtown,2.512168,2.520966,2.578477,4.477435,2.601618,13.146853,14.834826,14.358494,8.191014,3.067138,...,23.733678,26.904986,21.979695,32.663094,26.385087,31.897069,19.813195,36.774346,23.091992,30.683432
Modern Nissan - Concord,182.871199,182.854637,182.147432,186.388969,182.11171,174.371684,169.568757,184.191349,182.41182,183.129647,...,207.648152,165.707766,202.169735,161.643572,203.704491,167.38197,187.535261,171.937466,198.707459,162.203175
Fred Anderson Nissan,87.476766,87.469043,86.687799,90.708159,86.664865,89.215991,82.219775,98.474145,92.196897,88.215571,...,98.694247,95.379101,87.377717,97.695319,84.575383,102.51852,71.979064,110.67489,80.791157,95.908444


__Step 3:__ Now let's count the number of EVSE Stations that are at distance which is less than or equal to 5 km from the prospective venues

In [22]:
# Defining an indicator matrix

df_indicator_matrix = df_dist_matrix.le(5).astype(int)
df_indicator_matrix.shape

(640, 100)

In [23]:
# Obtain the sum of each venue (column) in the indicator matrix - this gives the total number of EVSEs within 5 km of the said venue

Num_EVSE = df_indicator_matrix.sum(axis = 0)
Num_EVSE.head(30)

Venue Name
Shops of Cameron Village                       34
Shops of Cameron Village CAT Bus Stop          34
Park Shops                                     34
Shops At Mcneil Pointe                         30
Park Shops 201                                 34
Bass Pro Shops                                 11
Shops at Town Station                          11
Shops at Brennan Station                        5
Flip Flop Shops                                13
Shops at Oberlin Court                         42
Shops                                          12
shops of Lafayette Village                      5
Suzio's Boutique (at Shops of Baileywick)       5
Shopsmith Repair/Woodworking Academy           15
Rainbow Shops                                   4
Rainbow Shops                                   4
Fun and Fabulous (at Shops of Baileywick)       5
Fancy That by A&E (at Shops of Baileywick)      5
The Shops at Preston                           15
Seaboard Ace Hardware                  

In [24]:
# Select venues that have less than 2 EVSE outlets within 5 kms
venue_names = Num_EVSE[(Num_EVSE <= 2)]

# Use previously selected venue names and identify the corresponding coordinates, 
# based on dataframe previously geneerated from Foursquare API data 
Venue_Selected = Locations_Venue.loc[Locations_Venue['Venue Name'].isin(venue_names.index)]
# Also convert the latitudes and longitudes back to degrees from radians
Venue_Selected[['location.lat', 'location.lng']] = (
    np.degrees(Locations_Venue.loc[:,['location.lat', 'location.lng']])
)

Venue_Selected.head()


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[k1] = value[k2]


Unnamed: 0,Venue Name,location.lat,location.lng
14,Rainbow Shops,35.798592,-78.579493
15,Rainbow Shops,35.873154,-78.582456
21,The Shops at Garner Plaza,35.704787,-78.611581
24,Rainbow Shops,35.798753,-78.507796
27,Rainbow Shops,35.719979,-78.656921


__Step 4:__ Let us now __cluster__ those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.

In [25]:
number_of_clusters = 7
Venue_Selected_xys = Venue_Selected[['location.lat', 'location.lng']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(Venue_Selected_xys)

cluster_centers = kmeans.cluster_centers_

print(cluster_centers)
print(kmeans.labels_)

[[ 35.88019589 -78.6387405 ]
 [ 35.71896359 -78.61450202]
 [ 35.68726115 -78.77987294]
 [ 35.97169682 -78.88259607]
 [ 35.74420874 -78.42264179]
 [ 35.9615835  -78.48583398]
 [ 35.479688   -79.178982  ]]
[1 0 1 4 1 1 1 1 5 0 2 0 1 3 3 2 6 0 0 0 0 3 0 0 0 1 1 0 0 0 5 4 1 1 0 0 5
 1 2 1 2 2 2 5 2 3 4 4 3 4 3 4 3 1 3 4]


## Results <a name="results"></a>

In [26]:
# add clustering labels

Venue_labels = Venue_Selected

Venue_labels.insert(0, 'Cluster Labels', kmeans.labels_)



In [27]:
Venue_labels.head()

Unnamed: 0,Cluster Labels,Venue Name,location.lat,location.lng
14,1,Rainbow Shops,35.798592,-78.579493
15,0,Rainbow Shops,35.873154,-78.582456
21,1,The Shops at Garner Plaza,35.704787,-78.611581
24,4,Rainbow Shops,35.798753,-78.507796
27,1,Rainbow Shops,35.719979,-78.656921


In [28]:
# create map

map_clusters = folium.Map(location=[35.78, -78.64], zoom_start=11)

# set color scheme for the clusters
x = np.arange(number_of_clusters)
ys = [i + x + (i*x)**2 for i in range(number_of_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Venue_labels['location.lat'], Venue_labels['location.lng'], Venue_labels['Venue Name'], Venue_labels['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Finally, let's reverse geocode those k-means clustering centers to get the addresses which can be presented to stakeholders.

In [29]:
geolocator = Nominatim(user_agent="foursquare_agent")

lat = 35.74137968
long = -78.59448015

for lat, long in cluster_centers:
    location = geolocator.reverse([lat, long])
    print(location.address, "\n")

165, Galloway Court, Newton Parish, Raleigh, Wake County, North Carolina, 27615, United States 

242, Weston Road, School Acres, Garner, Wake County, North Carolina, 27529, United States 

4008, Saint Edmunds Lane, Surry Point, Wake County, North Carolina, 27539, United States 

Newton Ind. & Engineering Tech Center, Cooper Street, Durham, Durham County, North Carolina, 27703, United States 

Meadow Loop Trail, Wake County, North Carolina, United States 

516, Jones Dairy Road, Wake Forest, Wake County, North Carolina, 27587, United States 

The Steele Pig, 133, South Steele Street, Sunset Terrace, Sanford, Lee County, North Carolina, 27330, United States 



## Discussion <a name="discussion"></a>

While this project does a good job demonstrating how data science can be used to inform energy infrastructure siting decisions, the analysis could be improved in many different ways. A few are listed here but should not be considered an exhaustive list:<br> <br> (a) Socio-economic as well as demographic factors need to be considered when prioritizing siting locations. Here is an <a href = "https://theicct.org/sites/default/files/publications/Expanding-access-electric-mobility_ICCT-Briefing_06122017_vF.pdf"> excellent article </a> summarizing policies and actions being taken to expand access to electric transporation among low income groups and apartment dwellers <br> <br>
(b) Siting analysis may need to take into considerations the practicality of EV charging installations - zoning permits, transmission capacity to support supply of electricity for EV charging etc. are some factors that could affect EV charging installations <br> <br>
(c) Demand for EV charging is another really important factor. If there are not enough EVs driving through the region that need to stop and charge - there may not be any incentive for installation of EV chargers. North Carolina DMV has recently started releasing <a href = "https://www.ncdot.gov/initiatives-policies/environmental/climate-change/Pages/zev-registration-data.aspx"> EV registration data in NC counties </a>. This could indicate how many EV owners currently reside in NC. Market research could also be performed to explore the out-of-state EV traffic passing through NC. 


## Conclusion <a name="conclusion"></a>

This project provides a simple example of utilization of k-means clustering technique in the clean energy sector. Data cleaning and manipulation, application of algorithms, and subsequent data visualization are the primary steps involved in any problem that needs to be solved using data analytics. I hope that this notebook provides a source of reference for those starting to explore data-driven solutions to their business problems.