# Coursera Capstone week 5

This notebook contains the Capstone Project, The Battle of Neighborhoods. In this project I will try to find the best borough of Miami in which to open a new Italian restaurant, using the coordinates of the boroughs together with venue data from the Foursquare API.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

I will scrape the table on the Wikipedia page listing the neighborhoods of Miami (referred to as boroughs in this notebook). The table contains the coordinates of each borough, which makes it easy to plot them on a map.
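A minimal, offline sketch of the same scraping step, assuming (as the notebook's own code does) that the borough name is in the first cell of each row and the coordinates are a comma-separated "lat, lng" string in the sixth cell. The HTML below is a hypothetical stand-in for the Wikipedia page and `parse_coordinates_table` is a hypothetical helper; rows without coordinates are skipped because they cannot be plotted.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the Wikipedia table (only the name and coordinate cells are filled in)
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Name</th><th>A</th><th>B</th><th>C</th><th>D</th><th>Coordinates</th></tr>
  <tr><td>Allapattah</td><td></td><td></td><td></td><td></td><td>25.815, -80.224</td></tr>
  <tr><td>Brickell</td><td></td><td></td><td></td><td></td><td>25.758, -80.193</td></tr>
  <tr><td>No Coordinates</td><td></td><td></td><td></td><td></td><td></td></tr>
</table>
"""

def parse_coordinates_table(html, name_col=0, coord_col=5):
    """Return a DataFrame with Borough, Latitude and Longitude from the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = tr.find_all(["th", "td"])
        if len(cells) <= coord_col:
            continue  # malformed row
        name = cells[name_col].get_text(strip=True)
        coords = cells[coord_col].get_text(strip=True)
        if not coords:
            continue  # no coordinates, so the borough cannot be mapped
        lat_str, lng_str = coords.split(",")
        rows.append({"Borough": name, "Latitude": float(lat_str), "Longitude": float(lng_str)})
    return pd.DataFrame(rows, columns=["Borough", "Latitude", "Longitude"])

df = parse_coordinates_table(SAMPLE_HTML)
print(df)
```

On the real page the coordinate cell may be formatted differently, in which case the split on "," would need adjusting.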

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Miami"
html_data = requests.get(url).text

In [3]:
soup = BeautifulSoup(html_data, features="html.parser")
table = soup.find('table')

In [4]:
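rows = []

# skip the header row, then read the borough name (1st cell) and coordinates (6th cell)
for items in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    col = items.find_all(['th', 'td'])
    borough = col[0].text.strip()
    coordinates = col[5].text.strip()
    rows.append({"Borough": borough, "Coordinates": coordinates})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list in one go
df_br = pd.DataFrame(rows, columns=["Borough", "Coordinates"])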

In [5]:
df_br[['Latitude','Longitude']] = df_br['Coordinates'].str.split(',',expand=True)

nan_value = float("NaN")
df_br.replace("", nan_value, inplace=True)
df_br.dropna(subset = ["Coordinates"], inplace=True)
df_br = df_br.drop(columns=['Coordinates'])

df_br['Latitude'] = df_br['Latitude'].astype(float)
df_br['Longitude'] = df_br['Longitude'].astype(float)
df_br

,Borough,Latitude,Longitude
0,Allapattah,25.815,-80.224
1,Arts & Entertainment District,25.799,-80.19
2,Brickell,25.758,-80.193
3,Buena Vista,25.813,-80.192
4,Coconut Grove,25.712,-80.257
5,Coral Way,25.75,-80.283
6,Design District,25.813,-80.193
7,Downtown,25.774,-80.193
8,Edgewater,25.802,-80.19
9,Flagami,25.762,-80.316


In [6]:
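# centre the map on the first borough (by position, since dropna may have removed index 0)
latitude = df_br["Latitude"].iloc[0]
longitude = df_br["Longitude"].iloc[0]
map_miami = folium.Map(location=[latitude, longitude], zoom_start=10)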

# add markers to map
for lat, lng, borough in zip(df_br['Latitude'],
                             df_br['Longitude'],
                             df_br['Borough']):
    label = folium.Popup(borough, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_miami)
    
map_miami

I will use the Foursquare API to get venue data for the boroughs: first the venues of every category around each borough, and later some extra information on the Italian restaurants in the boroughs.
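The explore request used in `getNearbyVenues` can be sketched on its own, without calling the API. This sketch assumes the endpoint and query parameters the notebook itself uses (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`); `build_explore_url` is a hypothetical helper, and `urllib.parse.urlencode` takes care of escaping the values.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=2500, limit=100):
    """Build the explore request URL for one borough centre (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude and longitude as one comma-separated value
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

url = build_explore_url("xxx", "xxx", "20180605", 25.815, -80.224)
print(url)
```

Passing the same dictionary as `requests.get(EXPLORE_URL, params=...)` produces an equivalent request.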

In [7]:
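# Foursquare credentials (replace the placeholders with your own)
CLIENT_ID = 'xxx'
CLIENT_SECRET = 'xxx'
ACCESS_TOKEN = 'xxx'  # not used by the requests below
VERSION = '20180605'  # API version date, sent as the `v` parameter
LIMIT = 100           # maximum number of venues returned per request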

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    """Query the Foursquare explore endpoint around each borough; return one row per venue."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # build the query string; requests URL-encodes the parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame(venues_list, columns=['Borough',
                                                       'Borough Latitude',
                                                       'Borough Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues

In [9]:
miami_venues = getNearbyVenues(names=df_br['Borough'],
                               latitudes=df_br['Latitude'],
                               longitudes=df_br['Longitude'])

Allapattah
Arts & Entertainment District
Brickell
Buena Vista
Coconut Grove
Coral Way
Design District
Downtown
Edgewater
Flagami
Grapeland Heights
Liberty City
Little Haiti
Little Havana
Lummus Park
Midtown
Overtown
Park West
The Roads
Upper Eastside
Venetian Islands
Virginia Key
West Flagler
Wynwood


The miami_venues table contains every venue and its category for every borough. We will one-hot encode it, creating one column per venue category, then group the rows by borough and take the mean of each category column. The result is the share of each borough's venues that fall in each category.
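A toy example of the one-hot encoding and per-borough mean described above. The boroughs `A` and `B` and their venues are made up; the point is that each row of the result gives the share of a borough's venues in each category, so every row sums to 1.

```python
import pandas as pd

# Made-up venues for two hypothetical boroughs
venues = pd.DataFrame({
    "Borough": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Italian Restaurant", "Coffee Shop", "Italian Restaurant",
                       "Hotel", "Hotel", "Coffee Shop"],
})

# one 0/1 column per category, with the borough name as the first column
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Borough", venues["Borough"])

# mean of each 0/1 column = fraction of the borough's venues in that category
freq = onehot.groupby("Borough").mean()
print(freq)
```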

In [10]:
miami_venues

,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allapattah,25.815,-80.224,Club Tipico Dominicano,25.809557,-80.218593,Nightclub
1,Allapattah,25.815,-80.224,Plaza Seafood Market,25.805638,-80.223992,Seafood Restaurant
2,Allapattah,25.815,-80.224,Snappers Fish & Chicken,25.824110,-80.224870,Seafood Restaurant
3,Allapattah,25.815,-80.224,Subs On The Run,25.802749,-80.207111,Sandwich Place
4,Allapattah,25.815,-80.224,Papo Llega y Pon,25.803466,-80.223886,Cuban Restaurant
...,...,...,...,...,...,...,...
2282,Wynwood,25.804,-80.199,Joey's,25.800917,-80.199253,Italian Restaurant
2283,Wynwood,25.804,-80.199,Salsa Fiesta Grill,25.804980,-80.189173,Mexican Restaurant
2284,Wynwood,25.804,-80.199,Rácket,25.799776,-80.197884,Cocktail Bar
2285,Wynwood,25.804,-80.199,Basani's,25.807375,-80.191035,Italian Restaurant


In [11]:
miami_venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allapattah,79,79,79,79,79,79
Arts & Entertainment District,100,100,100,100,100,100
Brickell,100,100,100,100,100,100
Buena Vista,100,100,100,100,100,100
Coconut Grove,100,100,100,100,100,100
Coral Way,100,100,100,100,100,100
Design District,100,100,100,100,100,100
Downtown,100,100,100,100,100,100
Edgewater,100,100,100,100,100,100
Flagami,100,100,100,100,100,100


In [12]:
print('There are {} unique categories.'.format(miami_venues['Venue Category'].nunique()))

There are 225 unique categories.


In [13]:
# one-hot encoding: one 0/1 column per venue category
# (dtype=int keeps 0/1 values; pandas 2.x would otherwise return booleans)
miami_onehot = pd.get_dummies(miami_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add the Borough column back to the dataframe
miami_onehot['Borough'] = miami_venues['Borough']

# move the Borough column to the first position
fixed_columns = [miami_onehot.columns[-1]] + list(miami_onehot.columns[:-1])
miami_onehot = miami_onehot[fixed_columns]

miami_onehot.head()

,Borough,Accessories Store,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Warehouse Store,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Allapattah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Allapattah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Allapattah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Allapattah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Allapattah,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
miami_grouped = miami_onehot.groupby('Borough').mean().reset_index()
miami_grouped.head()

,Borough,Accessories Store,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Warehouse Store,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Allapattah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arts & Entertainment District,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,Brickell,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
3,Buena Vista,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,...,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0
4,Coconut Grove,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0


We will focus on the three most common venue categories in each borough, printing each category together with its frequency.
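The top-3 selection can also be written with `Series.nlargest`, shown here on a made-up frequency table (boroughs `A` and `B` and their values are hypothetical). `top_categories` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Made-up category frequencies: one row per borough, one column per category
freq = pd.DataFrame(
    {
        "Hotel": [0.10, 0.02],
        "Italian Restaurant": [0.07, 0.05],
        "Art Gallery": [0.01, 0.08],
        "Seafood Restaurant": [0.05, 0.00],
    },
    index=pd.Index(["A", "B"], name="Borough"),
)

def top_categories(freq, n=3):
    """Return the n most frequent categories per borough, most frequent first."""
    return {borough: row.nlargest(n).index.tolist() for borough, row in freq.iterrows()}

top = top_categories(freq)
print(top)
```

When two categories tie, `nlargest` keeps the one that appears first, which matters here because many categories share the same rounded frequency.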

In [15]:
num_top_venues = 3

for borough in miami_grouped['Borough']:
    print("----"+borough+"----")
    temp = miami_grouped[miami_grouped['Borough'] == borough].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allapattah----
                  venue  freq
0  Fast Food Restaurant  0.10
1   Fried Chicken Joint  0.08
2           Gas Station  0.06


----Arts & Entertainment District----
            venue  freq
0     Art Gallery  0.09
1  Ice Cream Shop  0.08
2      Restaurant  0.06


----Brickell----
                venue  freq
0               Hotel  0.10
1  Italian Restaurant  0.07
2  Seafood Restaurant  0.05


----Buena Vista----
                venue  freq
0         Art Gallery  0.08
1      Ice Cream Shop  0.06
2  Italian Restaurant  0.05


----Coconut Grove----
                     venue  freq
0            Women's Store  0.05
1              Coffee Shop  0.04
2  New American Restaurant  0.04


----Coral Way----
                 venue  freq
0     Cuban Restaurant  0.04
1   Spanish Restaurant  0.04
2  Japanese Restaurant  0.03


----Design District----
                venue  freq
0         Art Gallery  0.08
1      Ice Cream Shop  0.06
2  Italian Restaurant  0.05


----Downtown----
           

In [16]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

We will bundle this information into a table containing the 3 most common venue categories for every borough. We then filter it down to the boroughs that have 'Italian Restaurant' among their top 3, and use the result to create a flag column that marks, for every borough, whether this is the case.
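A vectorized sketch of the flag column described above, on a hypothetical three-borough table that uses the same column names as `neighborhoods_venues_sorted`. It tests for an exact match on 'Italian Restaurant', whereas the notebook's `str.contains` would also match any category name that merely contains that text.

```python
import pandas as pd

# Hypothetical top-3 table with the notebook's column names
top3 = pd.DataFrame({
    "Borough": ["A", "B", "C"],
    "1st Most Common Venue": ["Hotel", "Art Gallery", "Italian Restaurant"],
    "2nd Most Common Venue": ["Italian Restaurant", "Ice Cream Shop", "Gym"],
    "3rd Most Common Venue": ["Seafood Restaurant", "Restaurant", "Coffee Shop"],
})

# 1 if 'Italian Restaurant' appears in any of the ranked columns, else 0
rank_cols = [c for c in top3.columns if c.endswith("Most Common Venue")]
top3["Italian Restaurant"] = top3[rank_cols].eq("Italian Restaurant").any(axis=1).astype(int)
print(top3[["Borough", "Italian Restaurant"]])
```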

In [17]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = miami_grouped['Borough']

for ind in np.arange(miami_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(miami_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Allapattah,Fast Food Restaurant,Fried Chicken Joint,Gas Station
1,Arts & Entertainment District,Art Gallery,Ice Cream Shop,Restaurant
2,Brickell,Hotel,Italian Restaurant,Seafood Restaurant
3,Buena Vista,Art Gallery,Ice Cream Shop,Italian Restaurant
4,Coconut Grove,Women's Store,Coffee Shop,New American Restaurant
5,Coral Way,Cuban Restaurant,Spanish Restaurant,Japanese Restaurant
6,Design District,Art Gallery,Ice Cream Shop,Italian Restaurant
7,Downtown,Hotel,Seafood Restaurant,Italian Restaurant
8,Edgewater,Ice Cream Shop,Art Gallery,Coffee Shop
9,Flagami,Cuban Restaurant,Chinese Restaurant,Pizza Place


In [18]:
df1 = neighborhoods_venues_sorted[neighborhoods_venues_sorted["1st Most Common Venue"].str.contains('Italian Restaurant')]
df2 = neighborhoods_venues_sorted[neighborhoods_venues_sorted["2nd Most Common Venue"].str.contains('Italian Restaurant')]
df3 = neighborhoods_venues_sorted[neighborhoods_venues_sorted["3rd Most Common Venue"].str.contains('Italian Restaurant')]
df  = pd.concat([df1, df2, df3]).reset_index(drop=True)
df

,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Little Haiti,Italian Restaurant,Coffee Shop,Art Gallery
1,The Roads,Italian Restaurant,Hotel,Argentinian Restaurant
2,Upper Eastside,Italian Restaurant,Gym,Café
3,Brickell,Hotel,Italian Restaurant,Seafood Restaurant
4,Buena Vista,Art Gallery,Ice Cream Shop,Italian Restaurant
5,Design District,Art Gallery,Ice Cream Shop,Italian Restaurant
6,Downtown,Hotel,Seafood Restaurant,Italian Restaurant
7,Lummus Park,Seafood Restaurant,Hotel,Italian Restaurant


In [20]:
restaurant_list = df['Borough'].tolist()
# 1 if the borough has 'Italian Restaurant' in its top 3, else 0
df_br["Italian Restaurant"] = df_br["Borough"].isin(restaurant_list).astype(int)
df_br.head()

,Borough,Latitude,Longitude,Italian Restaurant
0,Allapattah,25.815,-80.224,0
1,Arts & Entertainment District,25.799,-80.19,0
2,Brickell,25.758,-80.193,1
3,Buena Vista,25.813,-80.192,1
4,Coconut Grove,25.712,-80.257,0


To see which boroughs have 'Italian Restaurant' in their top 3 most common venues, we plot the boroughs on a map again, this time colored by whether or not they do.

In [21]:
# create map centred on the first borough
latitude = df_br["Latitude"].iloc[0]
longitude = df_br["Longitude"].iloc[0]
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one color per flag value: index 0 = not in top 3, index 1 = in top 3
colors_array = cm.rainbow(np.linspace(0, 1, 2))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, bor, rest in zip(df_br['Latitude'],
                               df_br['Longitude'],
                               df_br['Borough'],
                               df_br['Italian Restaurant']):
    label = folium.Popup(str(bor) + ' Italian Restaurant ' + str(rest), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[rest],
        fill=True,
        fill_color=rainbow[rest],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Now it's time to take a deeper dive into the boroughs we are interested in. Using the Foursquare API, we will search for Italian restaurants in the boroughs that have this category in their top 3.
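The response handling in `getVenueSearch` can be checked without network access. This sketch assumes the payload shape the notebook reads (`response.venues`, each venue with `id` and `name`); the sample payload and the helper `venues_from_search_response` are hypothetical. A payload without a `venues` list, such as an error response, yields an empty table instead of raising a KeyError.

```python
import pandas as pd

COLUMNS = ["Borough", "Borough Latitude", "Borough Longitude", "Venue id", "Venue name"]

def venues_from_search_response(payload, borough, lat, lng):
    """Flatten one venues/search payload into rows (assumes payload['response']['venues'])."""
    venues = payload.get("response", {}).get("venues", [])
    rows = [(borough, lat, lng, v["id"], v["name"]) for v in venues]
    return pd.DataFrame(rows, columns=COLUMNS)

# Hypothetical payload containing only the fields the notebook reads
sample = {"response": {"venues": [
    {"id": "abc123", "name": "Trattoria Example"},
    {"id": "def456", "name": "Pizzeria Example"},
]}}

df = venues_from_search_response(sample, "Brickell", 25.758, -80.193)
print(df)
```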

In [22]:
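def getVenueSearch(names, latitudes, longitudes, radius=200, category_id=None):
    """Search for venues around each borough centre, optionally restricted to one category."""
    url = 'https://api.foursquare.com/v2/venues/search'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # build the query string; requests URL-encodes the parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        if category_id is not None:
            params['categoryId'] = category_id  # only return venues in this category

        # make the GET request
        results = requests.get(url, params=params).json()['response']['venues']

        # keep only the fields needed later: venue id and name
        venues_list.extend([(name, lat, lng, v['id'], v['name']) for v in results])

    df_venues = pd.DataFrame(venues_list, columns=['Borough',
                                                   'Borough Latitude',
                                                   'Borough Longitude',
                                                   'Venue id',
                                                   'Venue name'])
    return df_venues

In [23]:
df_italian = df_br[(df_br['Italian Restaurant'] == 1)].reset_index(drop=True)
df_italian

,Borough,Latitude,Longitude,Italian Restaurant
0,Brickell,25.758,-80.193,1
1,Buena Vista,25.813,-80.192,1
2,Design District,25.813,-80.193,1
3,Downtown,25.774,-80.193,1
4,Little Haiti,25.824,-80.191,1
5,Lummus Park,25.777,-80.201,1
6,The Roads,25.756,-80.207,1
7,Upper Eastside,25.83,-80.183,1


By specifying a category ID, passed to the search endpoint as the `categoryId` parameter, we can make the API return only Italian restaurant venues. Note that the venues in the output below (for example St. Jude's Catholic Church, Google Miami and Hermès Miami) are not Italian restaurants, which indicates that the category filter was not applied when that output was generated; the likes computed from it further down therefore cover all nearby venues, not only Italian restaurants.

In [24]:
category = '4bf58dd8d48988d110941735'  # category ID intended to select Italian restaurants
LIMIT = 10

df_venue_names = getVenueSearch(names=df_italian['Borough'],
                                latitudes=df_italian['Latitude'],
                                longitudes=df_italian['Longitude'],
                                category_id=category)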

In [25]:
df_venue_names

,Borough,Borough Latitude,Borough Longitude,Venue id,Venue name
0,Brickell,25.758,-80.193,4c2765f7a852c9287122e86c,St. Jude's Catholic Church
1,Brickell,25.758,-80.193,528e9e3611d204ff966db242,Echo Brickell
2,Brickell,25.758,-80.193,5f00ab6a4196de5b8d863521,Rosetta Bakery
3,Brickell,25.758,-80.193,5888a923d9705c4cabe4abb6,Google Miami
4,Brickell,25.758,-80.193,4dbeb63a815439392fc2b57d,Bilzin Sumberg
...,...,...,...,...,...
75,Upper Eastside,25.830,-80.183,4bc334a0dce4eee13f7d719d,Seven Seas Motel
76,Upper Eastside,25.830,-80.183,5ade2614418686177a8c26d9,Midway
77,Upper Eastside,25.830,-80.183,4cb9c42c4495721e303f4c7a,Dubose Nursery
78,Upper Eastside,25.830,-80.183,4b7ef128f964a5201d0b30e3,Sudies.com


In [26]:
for index, row in df_venue_names.iterrows():
    print(row['Borough'],row['Venue id'], row['Venue name'])

Brickell 4c2765f7a852c9287122e86c St. Jude's Catholic Church
Brickell 528e9e3611d204ff966db242 Echo Brickell
Brickell 5f00ab6a4196de5b8d863521 Rosetta Bakery
Brickell 5888a923d9705c4cabe4abb6 Google Miami
Brickell 4dbeb63a815439392fc2b57d Bilzin Sumberg
Brickell 4f1c26dfe4b04ae083d20d1e Brickell Place Tennis Court
Brickell 5a706177ccad6b38fc044a99 JOE & THE JUICE
Brickell 4eab36f1b6347a596a63ff22 St Jude Melkite Catholic Church
Brickell 4c14d4b377cea593da28d160 Equinox Brickell Heights
Brickell 575702a9498ee5708be60740 MVP Luxury Suites Miami Brickell
Buena Vista 50c13dd6498e487e0df27f69 Hermès Miami
Buena Vista 5e0d5b68f717320008d614f8 Night Owl Cookies
Buena Vista 58754383dad2632841e55d9c OTL
Buena Vista 50805437e4b00dc4a08b83f6 Louis Vuitton
Buena Vista 4f917f94e4b079cca438e9c9 Cartier
Buena Vista 56fe9308498ea565c5e4717b Harry Winston
Buena Vista 50c3bc23e4b065e8ec0821be Prada
Buena Vista 4f5a83c4e4b008b1557bad2d Atlas Plaza
Buena Vista 4bcdff3b68f976b0b43c6583 Design and Architect

We then use the Foursquare API to fetch the number of likes for each of these venues.
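A sketch of reading the like count defensively, assuming the `response.likes.count` shape that `getVenueLikes` reads. The helpers `likes_count` and `likes_table` and the payloads are hypothetical. Returning `None` rather than 0 for a missing count keeps failed requests distinguishable from venues that genuinely have no likes; pandas skips those missing values when summing.

```python
import pandas as pd

def likes_count(payload):
    """Return payload['response']['likes']['count'], or None if it is missing."""
    likes = payload.get("response", {}).get("likes")
    return None if likes is None else likes.get("count")

def likes_table(payloads):
    """Build a Venue id / Venue Likes table from a {venue_id: payload} mapping."""
    rows = [(venue_id, likes_count(p)) for venue_id, p in payloads.items()]
    return pd.DataFrame(rows, columns=["Venue id", "Venue Likes"])

# Hypothetical payloads: one complete, one without a likes field
t = likes_table({
    "v1": {"response": {"likes": {"count": 46}}},
    "v2": {"response": {}},
})
print(t)
```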

In [27]:
def getVenueLikes(v_id):
    """Return a DataFrame with the number of likes for each venue id in v_id."""
    likes_list = []  # local list, so the function no longer depends on a global
    for venue_id in v_id:

        # build the request for this venue's likes endpoint
        url = 'https://api.foursquare.com/v2/venues/{}/likes'.format(venue_id)
        params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION}

        # make the GET request and read the like count
        count = requests.get(url, params=params).json()['response']['likes']['count']
        likes_list.append((venue_id, count))

    df_likes = pd.DataFrame(likes_list, columns=['Venue id', 'Venue Likes'])
    return df_likes

In [28]:
likes_list = []
df_likes = getVenueLikes(v_id=df_venue_names['Venue id'])
df_likes.head()

,Venue id,Venue Likes
0,4c2765f7a852c9287122e86c,46
1,528e9e3611d204ff966db242,1
2,5f00ab6a4196de5b8d863521,2
3,5888a923d9705c4cabe4abb6,3
4,4dbeb63a815439392fc2b57d,0


We will merge these likes back into our previous table and group them by borough, taking the sum of the venue likes.
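A toy version of the merge-and-sum step, with hypothetical boroughs, venue ids and like counts. Selecting the `Venue Likes` column before `.sum()` keeps pandas from also trying to sum the string column `Venue id`.

```python
import pandas as pd

# Hypothetical venues per borough and their like counts
venue_names = pd.DataFrame({"Borough": ["A", "A", "B"], "Venue id": ["v1", "v2", "v3"]})
likes = pd.DataFrame({"Venue id": ["v1", "v2", "v3"], "Venue Likes": [46, 1, 2]})

# attach likes to each venue, then total them per borough
likes_per_borough = (
    venue_names.merge(likes, on="Venue id")
    .groupby("Borough", as_index=False)["Venue Likes"]
    .sum()
)
print(likes_per_borough)
```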

In [29]:
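# sum only the likes column; summing the whole frame would also touch the string 'Venue id' column
likes_per_venue = (pd.merge(df_venue_names[["Borough", "Venue id"]], df_likes, on="Venue id")
                   .groupby("Borough", as_index=False)["Venue Likes"].sum())
likes_per_venue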

,Borough,Venue Likes
0,Brickell,184
1,Buena Vista,238
2,Design District,787
3,Downtown,127
4,Little Haiti,33
5,Lummus Park,91
6,The Roads,40
7,Upper Eastside,79


Using the miami_grouped table from earlier, we run the k-means algorithm to cluster the boroughs based on their venue category frequencies. The resulting cluster labels are added to df_br.
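A small illustration of k-means on venue-frequency profiles, using four made-up boroughs and three categories. The first two rows have similar profiles, as do the last two, so with `n_clusters=2` each pair should end up in its own cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency profiles: rows = boroughs, columns = venue categories
X = np.array([
    [0.50, 0.50, 0.00],
    [0.45, 0.55, 0.00],
    [0.00, 0.10, 0.90],
    [0.05, 0.00, 0.95],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels)
```

The cluster numbers themselves are arbitrary; only which rows share a label is meaningful.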

In [30]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 7

miami_grouped_clustering = miami_grouped.drop(columns='Borough')

# run k-means clustering (n_init=10 was the scikit-learn default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(miami_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 1, 2, 1, 6, 0, 1, 2, 1, 0])

In [31]:
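# attach each borough's cluster label by borough name rather than relying on row order
cluster_labels = pd.Series(kmeans.labels_, index=miami_grouped['Borough'])
df_br.insert(1, 'Cluster', df_br['Borough'].map(cluster_labels))
df_br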

,Borough,Cluster,Latitude,Longitude,Italian Restaurant
0,Allapattah,4,25.815,-80.224,0
1,Arts & Entertainment District,1,25.799,-80.19,0
2,Brickell,2,25.758,-80.193,1
3,Buena Vista,1,25.813,-80.192,1
4,Coconut Grove,6,25.712,-80.257,0
5,Coral Way,0,25.75,-80.283,0
6,Design District,1,25.813,-80.193,1
7,Downtown,2,25.774,-80.193,1
8,Edgewater,1,25.802,-80.19,0
9,Flagami,0,25.762,-80.316,0


Again, we will use the coordinates to plot the data on a map. This time the boroughs have a color depending on their cluster.
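A hypothetical helper for the cluster color scheme: it samples one color per cluster from matplotlib's rainbow colormap, so cluster label `k` maps directly to `palette[k]` with no offset.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(n_clusters):
    """One hex color per cluster label 0..n_clusters-1, sampled evenly from 'rainbow'."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]

palette = cluster_palette(7)
print(palette)
```

With this palette a marker's color would be `palette[clust]`; indexing with `clust - 1` would send cluster 0 to the last entry of the list.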

In [33]:
# create map centred on the first borough
latitude = df_br["Latitude"].iloc[0]
longitude = df_br["Longitude"].iloc[0]
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label 0..kclusters-1
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, bor, clust in zip(df_br['Latitude'],
                                df_br['Longitude'],
                                df_br['Borough'],
                                df_br['Cluster']):
    label = folium.Popup(str(bor) + ' Cluster ' + str(clust), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(clust)],
        fill=True,
        fill_color=rainbow[int(clust)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

This table adds to each borough its cluster label, the flag showing whether 'Italian Restaurant' is among its top 3 most common venues, and the sum of likes of the (up to 10) venues the search returned for that borough.
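A toy example of the left merge below, with hypothetical boroughs and like counts: boroughs without likes data keep their row and get 0 instead of NaN. Filling only the `Venue Likes` column, rather than calling `.fillna(0)` on the whole frame, leaves any other missing values untouched.

```python
import pandas as pd

# Hypothetical boroughs and per-borough likes (borough B has no likes data)
boroughs = pd.DataFrame({"Borough": ["A", "B", "C"], "Cluster": [1, 2, 1]})
likes_per_borough = pd.DataFrame({"Borough": ["A", "C"], "Venue Likes": [10, 20]})

# keep every borough; missing likes become NaN, then 0
merged = boroughs.merge(likes_per_borough, on="Borough", how="left")
merged["Venue Likes"] = merged["Venue Likes"].fillna(0)
print(merged)
```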

In [34]:
df_br = pd.merge(df_br, likes_per_venue, on="Borough", how='left').fillna(0)
df_br

,Borough,Cluster,Latitude,Longitude,Italian Restaurant,Venue Likes
0,Allapattah,4,25.815,-80.224,0,0.0
1,Arts & Entertainment District,1,25.799,-80.19,0,0.0
2,Brickell,2,25.758,-80.193,1,184.0
3,Buena Vista,1,25.813,-80.192,1,238.0
4,Coconut Grove,6,25.712,-80.257,0,0.0
5,Coral Way,0,25.75,-80.283,0,0.0
6,Design District,1,25.813,-80.193,1,787.0
7,Downtown,2,25.774,-80.193,1,127.0
8,Edgewater,1,25.802,-80.19,0,0.0
9,Flagami,0,25.762,-80.316,0,0.0


The last table shows, for each cluster, the number of boroughs that have 'Italian Restaurant' in their top 3 most common venues and the sum of their venue likes.
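A sketch of the same per-cluster summary using named aggregation, on made-up data. Besides the two sums it also counts the boroughs in each cluster (the `boroughs` column, which is not in the notebook's table), making it easier to compare a cluster where 2 of 2 boroughs are flagged with one where 2 of 5 are.

```python
import pandas as pd

# Made-up per-borough rows: cluster label, Italian flag, venue likes
df = pd.DataFrame({
    "Cluster": [1, 1, 2, 2, 0],
    "Italian Restaurant": [1, 1, 1, 0, 0],
    "Venue Likes": [10.0, 20.0, 5.0, 0.0, 0.0],
})

summary = df.groupby("Cluster").agg(
    boroughs=("Italian Restaurant", "count"),         # boroughs in the cluster
    italian_boroughs=("Italian Restaurant", "sum"),   # boroughs with the flag set
    venue_likes=("Venue Likes", "sum"),               # total likes in the cluster
).reset_index()
print(summary)
```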

In [35]:
df_total = df_br[["Cluster", "Italian Restaurant", "Venue Likes"]].groupby("Cluster").sum().reset_index()
df_total

,Cluster,Italian Restaurant,Venue Likes
0,0,0,0.0
1,1,2,1025.0
2,2,4,442.0
3,3,0,0.0
4,4,0,0.0
5,5,0,0.0
6,6,2,112.0
