# 1. Introduction/Business Problem

My wife loves Fort Collins, Colorado. It is about a 45-minute drive from where we live in Cheyenne, Wyoming, and we may move there in the future. However, we do not want to restrict ourselves to a single city in Colorado; for example, we may not both find employment in Fort Collins. It would help to have a list of places similar to Fort Collins where we could both consider applying for jobs. This capstone therefore segments the cities of Colorado and highlights which ones are most similar to Fort Collins, to guide our future career planning and life goals. It uses Foursquare venue data for the most populous cities in Colorado (the 41 cities with an estimated 2014 population of at least 12,000) to group those cities into clusters.

# 2. Data

I'll be using several data sources. First, Foursquare, of course, to get the venues in each city under consideration. This helps characterize each city by the types of commercial venues present in it.
Second, the Wikipedia article <a href="https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_Colorado">list of cities and towns in Colorado</a> provides the cities to consider, along with their population estimates and land areas. Finally, I'll use the geopy library to get the latitude and longitude of each city. With all this information, I'll be able to segment the cities in Colorado.

#### Imports

In [1]:
#!conda install -c anaconda beautifulsoup4
#!conda install -c conda-forge geocoder
#!conda install -c conda-forge folium=0.5.0 --yes

import pandas as pd
import numpy as np
import geocoder
from sklearn.cluster import KMeans
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
import requests

import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

In [2]:
page = requests.get("https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_Colorado")

soup = BeautifulSoup(page.content, 'html.parser')


My_table = soup.find('tbody')
#print(My_table.prettify())

nameList = []
popEstList = []
areaList = []

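for row in My_table.find_all('tr'):
    try:
        rowValue = row.find_all('td')
        # column positions reflect the Wikipedia table layout at the time of writing
        nameList.append(rowValue[0].text[:-1])                     # city name, trailing newline stripped
        popEstList.append(rowValue[7].text[:-1].replace(',', ''))  # 2014 population estimate
        areaList.append((rowValue[13].text)[:3])                   # land area in sq mi (first 3 characters only)
    except IndexError:
        pass  # skip rows without enough <td> cells, such as the header row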

df = pd.DataFrame()
df['Name'] = nameList
df['popEst2014'] = popEstList
df['popEst2014'] = df['popEst2014'].astype(int)
df['area(SqMi)'] = areaList
df['area(SqMi)'] = df['area(SqMi)'].astype(float)
df['Radius'] = df['area(SqMi)'].apply(lambda x: (x // 10)*1000)
df['Radius'] = np.where(df['Radius'] > 10.0, df['Radius'],1000).astype(int)
df = df[df.Name !='Fountain' ]

df_topCities = df.sort_values(by=['popEst2014'], ascending=False).copy()
#df_topCities
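The `Radius` column turns each city's land area into a search radius in meters: 1,000 m for every full 10 square miles, with a floor of 1,000 m for cities smaller than 10 square miles (the `> 10.0` test only catches a radius of 0, since every value is a multiple of 1,000). A standalone sketch of the same rule, using a hypothetical helper name `radius_from_area`, reproduces the values in the `df_topCities` output for Denver (15000), Fort Collins (5000) and Centennial (2000):

```python
import numpy as np

def radius_from_area(area_sq_mi):
    """Hypothetical helper mirroring the notebook's heuristic:
    1,000 m of radius per full 10 sq mi of area, with a minimum of 1,000 m."""
    area = np.asarray(area_sq_mi, dtype=float)
    radius = (area // 10) * 1000                     # e.g. 154 sq mi -> 15 * 1000 = 15000 m
    return np.where(radius > 10.0, radius, 1000).astype(int)  # areas under 10 sq mi get 1000 m

print(radius_from_area([154.0, 55.0, 29.0, 8.5]))
```

Note that the later Foursquare requests use a fixed `radius = 3000`, so this column is computed but not used in the venue queries.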

##### Limit to cities with a population of at least 12,000

In [3]:
df_topCities = df_topCities[df_topCities.popEst2014 >= 12000].reset_index(drop=True)
#df_topCities

Let's get the latitude and longitude of each city.

In [4]:
latList = []
longList = []

geolocator = Nominatim(user_agent="explorer")  # create the geocoder once, outside the loop

for index, row in df_topCities.iterrows():
    city = row['Name']
    location = geolocator.geocode('{}, CO'.format(city))
    latList.append(location.latitude)
    longList.append(location.longitude)

df_topCities['Latitude'] = latList
df_topCities['Longitude'] = longList

df_topCities

| | Name | popEst2014 | area(SqMi) | Radius | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Denver | 663862 | 154.0 | 15000 | 39.739236 | -104.984862 |
| 1 | Colorado Springs | 445830 | 195.0 | 19000 | 38.833958 | -104.825349 |
| 2 | Aurora | 353108 | 154.0 | 15000 | 39.729432 | -104.83192 |
| 3 | Fort Collins | 156480 | 55.0 | 5000 | 40.550853 | -105.066808 |
| 4 | Lakewood | 149643 | 44.0 | 4000 | 39.631109 | -105.110058 |
| 5 | Thornton | 130307 | 36.0 | 3000 | 39.869552 | -104.985182 |
| 6 | Arvada | 113574 | 38.0 | 3000 | 39.821123 | -105.220743 |
| 7 | Westminster | 112090 | 33.0 | 3000 | 39.836653 | -105.037205 |
| 8 | Pueblo | 108423 | 54.0 | 5000 | 38.254447 | -104.609141 |
| 9 | Centennial | 107201 | 29.0 | 2000 | 39.568064 | -104.977831 |
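The geocoding loop raises an `AttributeError` if Nominatim cannot find a city, because `geocode` then returns `None`. It also sends requests back to back, while Nominatim's usage policy asks for at most one request per second. A more defensive sketch, with a hypothetical `geocode_cities` helper that accepts any geocoding callable (such as `Nominatim(user_agent="explorer").geocode`) and records misses as NaN:

```python
import time

def geocode_cities(names, geocode, state='CO', delay=1.0):
    """Hypothetical helper: return parallel lists of latitudes and longitudes.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    Failed lookups are stored as NaN instead of raising AttributeError.
    """
    lats, lngs = [], []
    for name in names:
        location = geocode('{}, {}'.format(name, state))
        if location is None:
            lats.append(float('nan'))
            lngs.append(float('nan'))
        else:
            lats.append(location.latitude)
            lngs.append(location.longitude)
        time.sleep(delay)  # stay within roughly one request per second
    return lats, lngs
```

For example: `lats, lngs = geocode_cities(df_topCities['Name'], Nominatim(user_agent="explorer").geocode)`. Passing the function in keeps the helper testable without network access; geopy also ships a `RateLimiter` wrapper in `geopy.extra.rate_limiter` for the same purpose.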


#### Use the geopy library to get the latitude and longitude of Colorado Springs, which is centrally located.

In [5]:
address = 'Colorado Springs, CO'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Colorado Springs, CO are 38.8339578, -104.8253485.


#### Create a map of Colorado with the cities superimposed on top.

In [6]:
# create map of Colorado using latitude and longitude values
map_co = folium.Map(location=[latitude, longitude], zoom_start=8)

# add markers to map
for lat, lng, name in zip(df_topCities['Latitude'], df_topCities['Longitude'], df_topCities['Name']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_co)
    
map_co

#### Required info for the Foursquare API

In [7]:
import os

# Keep credentials out of the notebook: read them from environment variables
# (the variable names below are arbitrary) or paste your own values locally
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20190505' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET))  # never print the secret itself


#### Let's explore Fort Collins.

Get Fort Collins's latitude and longitude values.

In [8]:
foco_latitude = df_topCities.loc[3, 'Latitude'] # latitude value
foco_longitude = df_topCities.loc[3, 'Longitude'] # longitude value
#foco_radius = df_topCities.loc[3, 'Radius'] # radius value

foco_name = df_topCities.loc[3, 'Name'] # name

print('Latitude and longitude values of {} are {}, {}.'.format(foco_name, 
                                                               foco_latitude, 
                                                               foco_longitude))

Latitude and longitude values of Fort Collins are 40.5508527, -105.0668085.


#### Now, let's get the top 50 venues in Fort Collins within a radius of 3,000 meters.

In [9]:
LIMIT = 50 # limit of number of venues returned by Foursquare API
radius = 3000 # search radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    foco_latitude, 
    foco_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()

We know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [10]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the flattened JSON column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [11]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Spring Creek Trail | Trail | 40.550484 | -105.072643 |
| 1 | Star of India | Indian Restaurant | 40.549033 | -105.076043 |
| 2 | Sprouts Farmers Market | Grocery Store | 40.55066 | -105.059595 |
| 3 | Cinemark XD Bistro | Movie Theater | 40.541727 | -105.07232 |
| 4 | Maxline Brewing | Brewery | 40.549659 | -105.07927 |
| 5 | Wilbur's Total Beverage | Liquor Store | 40.559032 | -105.0787 |
| 6 | Maza Kabob | Mediterranean Restaurant | 40.555105 | -105.078081 |
| 7 | Larkburger | Burger Joint | 40.552987 | -105.077409 |
| 8 | La Creperie | Breakfast Spot | 40.549127 | -105.076612 |
| 9 | Bad Daddy's Burger Bar | American Restaurant | 40.542056 | -105.072962 |
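To see what `json_normalize` and the column renaming do, here is the same cleaning applied to a tiny hand-made list shaped like the `items` list above (the venue names and coordinates are invented for illustration, not real Foursquare data). Nested keys become dotted column names, the first category name is kept (or `None` when the list is empty, as in `get_category_type`), and `split('.')[-1]` keeps only the last part of each name:

```python
import pandas as pd

# Illustrative fragment shaped like results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Example Trail',
               'categories': [{'name': 'Trail'}],
               'location': {'lat': 40.55, 'lng': -105.07}}},
    {'venue': {'name': 'Unlabeled Spot',
               'categories': [],
               'location': {'lat': 40.56, 'lng': -105.08}}},
]

flat = pd.json_normalize(items)   # nested dicts become dotted column names; lists stay as-is
print(sorted(flat.columns))       # ['venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.name']

flat = flat[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']].copy()
# keep the first category's name, or None when the venue has no category
flat['venue.categories'] = flat['venue.categories'].apply(lambda cats: cats[0]['name'] if cats else None)
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```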


In [12]:
nearby_venues.groupby('categories').count().sort_values(by=['name'], ascending=False)

| categories | name | lat | lng |
|---|---|---|---|
| Coffee Shop | 5 | 5 | 5 |
| Cosmetics Shop | 3 | 3 | 3 |
| American Restaurant | 2 | 2 | 2 |
| Grocery Store | 2 | 2 | 2 |
| Fried Chicken Joint | 2 | 2 | 2 |
| Pizza Place | 2 | 2 | 2 |
| Seafood Restaurant | 2 | 2 | 2 |
| Mexican Restaurant | 2 | 2 | 2 |
| Vietnamese Restaurant | 2 | 2 | 2 |
| Brewery | 2 | 2 | 2 |


And how many venues were returned by Foursquare?

In [13]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.


Let's write a function to repeat the process for the other cities.

In [14]:
def getNearbyVenues(names, latitudes, longitudes):
    # Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION, radius and LIMIT defined above
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (a venue with no category gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
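`getNearbyVenues` builds its request URL with string formatting. A sketch of the same request built with the standard library's `urllib.parse.urlencode`, which escapes each parameter value, is below; the helper name `build_explore_url` is hypothetical, while the endpoint and parameter names are the ones used above:

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=3000, limit=50):
    """Hypothetical helper: build the venues/explore URL used in the notebook,
    letting urlencode escape the parameter values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

print(build_explore_url('ID', 'SECRET', '20190505', 40.55, -105.07))
```

With `requests`, passing the dictionary as `requests.get(EXPLORE_URL, params=params)` has the same effect. `urlencode` escapes the comma in `ll` as `%2C`, which a standard query-string parser decodes back to a comma.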

In [15]:

colorado_venues = getNearbyVenues(names=df_topCities['Name'],
                                   latitudes=df_topCities['Latitude'],
                                   longitudes=df_topCities['Longitude']
                                  )

Denver
Colorado Springs
Aurora
Fort Collins
Lakewood
Thornton
Arvada
Westminster
Pueblo
Centennial
Boulder
Greeley
Longmont
Loveland
Broomfield
Grand Junction
Castle Rock
Commerce City
Parker
Littleton
Northglenn
Brighton
Englewood
Wheat Ridge
Lafayette
Windsor
Erie
Evans
Golden
Louisville
Montrose
Durango
Cañon City
Greenwood Village
Sterling
Lone Tree
Johnstown
Superior
Fruita
Steamboat Springs
Federal Heights


In [16]:
colorado_venues.head()

| | City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Denver | 39.739236 | -104.984862 | Sassafras American Eatery | 39.739949 | -104.982756 | Breakfast Spot |
| 1 | Denver | 39.739236 | -104.984862 | City, O' City | 39.736724 | -104.984669 | Vegetarian / Vegan Restaurant |
| 2 | Denver | 39.739236 | -104.984862 | Denver Art Museum | 39.736479 | -104.988712 | Art Museum |
| 3 | Denver | 39.739236 | -104.984862 | Civic Center Park | 39.73937 | -104.988776 | Park |
| 4 | Denver | 39.739236 | -104.984862 | History Colorado Center | 39.735565 | -104.986971 | History Museum |


Oddly, Arvada returned only 9 venues.

In [17]:
colorado_venues.groupby('City').count()

| City | City Latitude | City Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Arvada | 9 | 9 | 9 | 9 | 9 | 9 |
| Aurora | 50 | 50 | 50 | 50 | 50 | 50 |
| Boulder | 50 | 50 | 50 | 50 | 50 | 50 |
| Brighton | 12 | 12 | 12 | 12 | 12 | 12 |
| Broomfield | 50 | 50 | 50 | 50 | 50 | 50 |
| Castle Rock | 50 | 50 | 50 | 50 | 50 | 50 |
| Cañon City | 50 | 50 | 50 | 50 | 50 | 50 |
| Centennial | 50 | 50 | 50 | 50 | 50 | 50 |
| Colorado Springs | 50 | 50 | 50 | 50 | 50 | 50 |
| Commerce City | 28 | 28 | 28 | 28 | 28 | 28 |
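Cities that return fewer venues than `LIMIT` stand out in these counts (Arvada with 9, Brighton with 12, Commerce City with 28). One possible, unverified explanation is that the geocoded point for such a city falls in a residential or park area rather than a commercial center. Because their frequency vectors are built from only a handful of venues, their cluster assignments deserve some caution. A small sketch for listing these cities, with the hypothetical helper `sparse_cities`:

```python
import pandas as pd

def sparse_cities(venues, limit=50, city_col='City'):
    """Hypothetical helper: venue counts for cities that returned fewer than
    `limit` venues, sorted from fewest to most."""
    counts = venues.groupby(city_col).size()
    return counts[counts < limit].sort_values()
```

Applied to the notebook's data this would be `sparse_cities(colorado_venues, limit=LIMIT)`.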


# 3. Analyze Each City

In [18]:
# one hot encoding
colorado_onehot = pd.get_dummies(colorado_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
colorado_onehot['City'] = colorado_venues['City'] 

# move city column to the first column
fixed_columns = [colorado_onehot.columns[-1]] + list(colorado_onehot.columns[:-1])
colorado_onehot = colorado_onehot[fixed_columns]

colorado_onehot.head()

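| | City | ATM | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Vietnamese Restaurant | Warehouse Store | Water Park | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Denver | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Denver | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Denver | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Denver | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Denver | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |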


#### Next, let's group rows by city, taking the mean of the frequency of occurrence of each category

In [19]:
colorado_grouped = colorado_onehot.groupby('City').mean().reset_index()
colorado_grouped

| | City | ATM | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Vietnamese Restaurant | Warehouse Store | Water Park | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arvada | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Aurora | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 |
| 2 | Boulder | 0.0 | 0.02 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Brighton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Broomfield | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | ... | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.02 |
| 5 | Castle Rock | 0.0 | 0.1 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Cañon City | 0.02 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Centennial | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Colorado Springs | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Commerce City | 0.0 | 0.035714 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
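The one-hot encoding followed by a per-city mean turns raw venue rows into category frequencies: each value is the share of that city's returned venues in a given category, so every city's row sums to 1. A toy example with two made-up cities shows the arithmetic (`.astype(int)` is added because recent pandas versions return boolean dummy columns):

```python
import pandas as pd

# Two made-up cities: A has 4 venues, B has 2
toy = pd.DataFrame({
    'City': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Coffee Shop', 'Trail', 'Coffee Shop', 'Coffee Shop'],
})

# one column per category, 1 where the venue belongs to that category
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot['City'] = toy['City']

# mean of 0/1 indicators per city = fraction of the city's venues in each category
freq = onehot.groupby('City').mean()
print(freq)
```

City A has 2 parks out of 4 venues, so its `Park` frequency is 0.5; every venue in city B is a coffee shop, so its `Coffee Shop` frequency is 1.0.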


#### Let's print each city along with the top 5 most common venues

In [20]:
num_top_venues = 5

for hood in colorado_grouped['City']:
    print("----"+hood+"----")
    temp = colorado_grouped[colorado_grouped['City'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arvada----
           venue  freq
0   Soccer Field  0.22
1           Park  0.22
2          Trail  0.11
3      Disc Golf  0.11
4  Garden Center  0.11


----Aurora----
                  venue  freq
0    Mexican Restaurant  0.10
1           Coffee Shop  0.08
2    Chinese Restaurant  0.06
3        Sandwich Place  0.06
4  Fast Food Restaurant  0.06


----Boulder----
               venue  freq
0     Sandwich Place  0.08
1     Ice Cream Shop  0.04
2  French Restaurant  0.04
3        Coffee Shop  0.04
4     Breakfast Spot  0.04


----Brighton----
               venue  freq
0  Convenience Store  0.25
1       Home Service  0.17
2     Farmers Market  0.08
3        Bus Station  0.08
4                Bar  0.08


----Broomfield----
                venue  freq
0    Sushi Restaurant  0.08
1       Grocery Store  0.08
2  Chinese Restaurant  0.06
3  Mexican Restaurant  0.06
4         Pizza Place  0.04


----Castle Rock----
                  venue  freq
0   American Restaurant  0.10
1           Coffee

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
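`return_most_common_venues` relies on `sort_values`, whose default quicksort is not stable, so the order of categories with equal frequencies (common when a city has at most 50 venues) is not guaranteed. A sketch of an alternative using `Series.nlargest`, which with its default `keep='first'` breaks ties by column position (alphabetical here, since `get_dummies` sorts the categories); the helper name `top_venues` is hypothetical:

```python
import pandas as pd

def top_venues(row, n):
    """Hypothetical helper: the n most frequent venue categories in one row
    of the frequency table, whose first entry is the city name."""
    freqs = row.iloc[1:].astype(float)   # drop 'City'; the row itself has mixed types
    return list(freqs.nlargest(n).index)

row = pd.Series({'City': 'A', 'Coffee Shop': 0.25, 'Park': 0.5, 'Trail': 0.1})
print(top_venues(row, 2))   # ['Park', 'Coffee Shop']
```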

Now let's create the new dataframe and display the top 10 venues for each city.

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['City'] = colorado_grouped['City']

for ind in np.arange(colorado_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(colorado_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted.head()

| | City | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arvada | Soccer Field | Park | Home Service | Trail | Garden Center | Disc Golf | Dog Run | Fish Market | Flea Market | Flower Shop |
| 1 | Aurora | Mexican Restaurant | Coffee Shop | Chinese Restaurant | Fast Food Restaurant | Sandwich Place | Burger Joint | American Restaurant | Vietnamese Restaurant | Hotel | Big Box Store |
| 2 | Boulder | Sandwich Place | Ice Cream Shop | Sporting Goods Shop | Spa | New American Restaurant | Pizza Place | Breakfast Spot | Coffee Shop | French Restaurant | Café |
| 3 | Brighton | Convenience Store | Home Service | Gas Station | Golf Course | Park | Farmers Market | Bar | Pizza Place | Bus Station | Food & Drink Shop |
| 4 | Broomfield | Grocery Store | Sushi Restaurant | Mexican Restaurant | Chinese Restaurant | Park | Burger Joint | Pizza Place | Golf Course | Pharmacy | Noodle House |
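The `indicators` list with its `IndexError` fallback yields correct suffixes for 1 through 20, but it would print "21th", "22th" and "23th" if `num_top_venues` were raised past 20. A general ordinal helper (hypothetical name `ordinal`) that builds the same column names:

```python
def ordinal(n):
    """Hypothetical helper: n with its English ordinal suffix (1st, 2nd, 3rd, 4th, 11th, 21st, ...)."""
    if 10 <= n % 100 <= 20:          # 11th, 12th, 13th are irregular
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['City'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)]
print(columns[1], '|', columns[-1])   # 1st Most Common Venue | 10th Most Common Venue
```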


# 4. Cluster Cities

In [23]:
sample = colorado_grouped.drop(columns='City')
sample.head()

| | ATM | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | ... | Vietnamese Restaurant | Warehouse Store | Water Park | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | ... | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 |
| 2 | 0.0 | 0.02 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.02 |


Run *k*-means to cluster the cities into 6 clusters.

In [24]:
# set number of clusters
kclusters = 6

colorado_grouped_clustering = colorado_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(colorado_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 2, 5, 2, 2, 0, 2, 1, 0], dtype=int32)
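Each city is a point whose coordinates are its venue-category frequencies, one dimension per category. k-means looks for `kclusters` centroids such that each city is assigned to its nearest centroid and each centroid is the mean of its cities. A minimal NumPy sketch of Lloyd's algorithm, the standard iteration behind k-means, makes the two alternating steps explicit. It is an illustration only (hypothetical helper `kmeans_sketch`), not scikit-learn's implementation, which uses k-means++ initialisation by default and can run several restarts, so its labels will not match `kmeans.labels_`; cluster numbers are arbitrary in either case.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm on the rows of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as starting centroids
    for _ in range(n_iter):
        # assignment step: squared Euclidean distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids

# two well-separated toy groups
X = [[0, 0], [0, 0.1], [0.1, 0], [10, 10], [10, 10.1], [10.1, 10]]
labels, centroids = kmeans_sketch(X, 2)
print(labels)
```

On the notebook's data the call would be `kmeans_sketch(colorado_grouped_clustering.values, kclusters)`.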

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each city.

In [25]:
# add clustering labels
city_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

colorado_merged = df_topCities.copy()

# join city_venues_sorted onto df_topCities to add the cluster label and top venues for each city
colorado_merged = colorado_merged.join(city_venues_sorted.set_index('City'), on='Name')

colorado_merged.head() # check the last columns!


| | Name | popEst2014 | area(SqMi) | Radius | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Denver | 663862 | 154.0 | 15000 | 39.739236 | -104.984862 | 2 | Hotel | Sandwich Place | American Restaurant | Vegetarian / Vegan Restaurant | Mexican Restaurant | Cocktail Bar | Yoga Studio | Breakfast Spot | Brewery | Juice Bar |
| 1 | Colorado Springs | 445830 | 195.0 | 19000 | 38.833958 | -104.825349 | 1 | Coffee Shop | Brewery | Bar | Hotel | Juice Bar | Steakhouse | Italian Restaurant | Pizza Place | Breakfast Spot | Mexican Restaurant |
| 2 | Aurora | 353108 | 154.0 | 15000 | 39.729432 | -104.83192 | 2 | Mexican Restaurant | Coffee Shop | Chinese Restaurant | Fast Food Restaurant | Sandwich Place | Burger Joint | American Restaurant | Vietnamese Restaurant | Hotel | Big Box Store |
| 3 | Fort Collins | 156480 | 55.0 | 5000 | 40.550853 | -105.066808 | 1 | Coffee Shop | Cosmetics Shop | Brewery | Mexican Restaurant | American Restaurant | Breakfast Spot | Pizza Place | Fried Chicken Joint | Vietnamese Restaurant | Grocery Store |
| 4 | Lakewood | 149643 | 44.0 | 4000 | 39.631109 | -105.110058 | 2 | Grocery Store | Mexican Restaurant | Athletics & Sports | Sushi Restaurant | Sandwich Place | Hardware Store | Coffee Shop | Furniture / Home Store | Fast Food Restaurant | Big Box Store |


In [26]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=7.5)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(colorado_merged['Latitude'], colorado_merged['Longitude'], colorado_merged['Name'], colorado_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [27]:
kmeans.labels_

array([4, 2, 2, 5, 2, 2, 0, 2, 1, 0, 2, 1, 2, 1, 0, 0, 1, 3, 1, 2, 1, 2, 0,
       1, 2, 1, 2, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 2, 2, 1, 0], dtype=int32)

In [28]:
colorado_merged

| | Name | popEst2014 | area(SqMi) | Radius | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Denver | 663862 | 154.0 | 15000 | 39.739236 | -104.984862 | 2 | Hotel | Sandwich Place | American Restaurant | Vegetarian / Vegan Restaurant | Mexican Restaurant | Cocktail Bar | Yoga Studio | Breakfast Spot | Brewery | Juice Bar |
| 1 | Colorado Springs | 445830 | 195.0 | 19000 | 38.833958 | -104.825349 | 1 | Coffee Shop | Brewery | Bar | Hotel | Juice Bar | Steakhouse | Italian Restaurant | Pizza Place | Breakfast Spot | Mexican Restaurant |
| 2 | Aurora | 353108 | 154.0 | 15000 | 39.729432 | -104.83192 | 2 | Mexican Restaurant | Coffee Shop | Chinese Restaurant | Fast Food Restaurant | Sandwich Place | Burger Joint | American Restaurant | Vietnamese Restaurant | Hotel | Big Box Store |
| 3 | Fort Collins | 156480 | 55.0 | 5000 | 40.550853 | -105.066808 | 1 | Coffee Shop | Cosmetics Shop | Brewery | Mexican Restaurant | American Restaurant | Breakfast Spot | Pizza Place | Fried Chicken Joint | Vietnamese Restaurant | Grocery Store |
| 4 | Lakewood | 149643 | 44.0 | 4000 | 39.631109 | -105.110058 | 2 | Grocery Store | Mexican Restaurant | Athletics & Sports | Sushi Restaurant | Sandwich Place | Hardware Store | Coffee Shop | Furniture / Home Store | Fast Food Restaurant | Big Box Store |
| 5 | Thornton | 130307 | 36.0 | 3000 | 39.869552 | -104.985182 | 2 | Mexican Restaurant | American Restaurant | Breakfast Spot | Fast Food Restaurant | Gym / Fitness Center | Coffee Shop | Vietnamese Restaurant | Sandwich Place | Hardware Store | Food |
| 6 | Arvada | 113574 | 38.0 | 3000 | 39.821123 | -105.220743 | 4 | Soccer Field | Park | Home Service | Trail | Garden Center | Disc Golf | Dog Run | Fish Market | Flea Market | Flower Shop |
| 7 | Westminster | 112090 | 33.0 | 3000 | 39.836653 | -105.037205 | 2 | Mexican Restaurant | Fast Food Restaurant | Sandwich Place | Hotel | Gym | Grocery Store | Sushi Restaurant | American Restaurant | Pet Store | Music Store |
| 8 | Pueblo | 108423 | 54.0 | 5000 | 38.254447 | -104.609141 | 2 | Mexican Restaurant | Italian Restaurant | Bar | Pizza Place | Sushi Restaurant | Deli / Bodega | Burger Joint | Sandwich Place | Coffee Shop | Bakery |
| 9 | Centennial | 107201 | 29.0 | 2000 | 39.568064 | -104.977831 | 2 | Gym / Fitness Center | Grocery Store | Pizza Place | BBQ Joint | Burger Joint | Gym | Mexican Restaurant | Vietnamese Restaurant | Movie Theater | Spa |
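Since the goal is to find cities similar to Fort Collins, the natural last step is to list the cities that share its cluster label. Among the rows shown, Colorado Springs is the only other city in cluster 1. A sketch with a hypothetical helper `same_cluster_as`, which can be applied to the full `colorado_merged` table:

```python
import pandas as pd

def same_cluster_as(df, city, name_col='Name', label_col='Cluster Labels'):
    """Hypothetical helper: names of the other cities assigned to the same cluster as `city`."""
    label = df.loc[df[name_col] == city, label_col].iloc[0]
    mask = (df[label_col] == label) & (df[name_col] != city)
    return df.loc[mask, name_col].tolist()
```

On the notebook's data: `same_cluster_as(colorado_merged, 'Fort Collins')`.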
