# Capstone Project - Sao Paulo overview


## Introduction/Business Problem

<!---
São Paulo is the largest and most populous Brazilian municipality, the capital of the state of the same name, and the corporate and financial center of South America. It is a city that welcomes internal migrants (from other cities and states of the country) as well as foreign communities, which gives it a multicultural and cosmopolitan character.
--->

## Data

In order to build our analysis, three data sources are required:

**1. Postcode, City, Neighborhood**

Greater São Paulo is a vast metropolis formed by the conurbation of a large number of municipalities, with a population of approximately 20 million people. The city is organized at several levels, with many sub-municipalities, administrative regions and neighborhoods; some compiled lists contain more than 1500 neighborhoods.

The list used in this work presents the 89 main neighborhoods in the city and was obtained from the website:

https://www.estadosecapitaisdobrasil.com/listas/lista-dos-bairros-de-sao-paulo/

**2. Geospatial data**

All geospatial coordinates were obtained through the ArcGIS provider, called via the `geocoder` package in the code below. ArcGIS was chosen because it does not require an access key for simple lookups of a location's latitude and longitude.

**3. Foursquare API**

A GET request to the 'explore' endpoint is used. For each venue returned, the following fields are kept, together with the name and coordinates of the neighborhood that was queried:

    - Neighborhood
    - Neighborhood Latitude
    - Neighborhood Longitude
    - Venue (name of the venue, e.g. the name of a store or restaurant)
    - Venue Latitude
    - Venue Longitude
    - Venue Category
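
As a rough sketch of how such a request can be assembled, the snippet below builds the 'explore' URL with the standard library only. The query parameter names follow the request URL used later in this notebook; the `build_explore_url` helper and the credential strings are hypothetical placeholders, not part of the original code.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Return the explore URL for one coordinate pair (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials, for illustration only
url = build_explore_url(-23.55337, -46.58027, 'YOUR_ID', 'YOUR_SECRET')
print(url)
```

Building the query from a dictionary avoids hand-written separators and makes it easy to change a single parameter such as the radius.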

## Part 1 - Retrieving, cleaning and organizing data

In [195]:
import numpy as np
import pandas as pd

To speed up loading the neighborhood data, the list obtained from the website was stored in CSV format:



In [196]:
df = pd.read_csv('bairros_puro.csv')
df.head()

Unnamed: 0,Bairro
0,Água Rasa‎
1,Alto de Pinheiros‎
2,Anhanguera‎
3,Aricanduva‎
4,Artur Alvim‎


Like many languages of Latin origin, Portuguese uses a considerable set of accented and special characters. All of them were converted to plain ASCII to avoid failures in later processing and visualization.

In [197]:
from unicodedata import normalize

def removeSpecialChar(text):
    # NFKD decomposes accented letters; encoding to ASCII with 'ignore' drops the accents
    return normalize('NFKD', text).encode('ASCII', 'ignore').decode('ASCII')
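
To make the effect of this normalization concrete, here is a minimal standalone sketch applied to a few neighborhood names. The function is restated under the illustrative name `remove_special_char` so the snippet runs on its own.

```python
from unicodedata import normalize

def remove_special_char(text):
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with errors='ignore' then drops the marks.
    return normalize('NFKD', text).encode('ASCII', 'ignore').decode('ASCII')

for name in ['Água Rasa', 'Consolação', 'Freguesia do Ó', 'Sé']:
    print(name, '->', remove_special_char(name))
```

Note that any character without an ASCII decomposition is dropped entirely rather than transliterated, which is acceptable for Portuguese names but may not be for other alphabets.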

In [198]:
df['Bairro'] = df['Bairro'].apply(removeSpecialChar)
df.rename(columns={'Bairro':'Neighborhood'}, inplace=True)
df.head()

Unnamed: 0,Neighborhood
0,Agua Rasa
1,Alto de Pinheiros
2,Anhanguera
3,Aricanduva
4,Artur Alvim
...,...
84,Vila Mariana
85,Vila Matilde
86,Vila Medeiros
87,Vila Prudente


Helper function that wraps the call to the geocoding API and returns the coordinates of a neighborhood:

In [214]:
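import geocoder

def get_latlgn(Neighborhood):
    # Query the ArcGIS provider; returns [latitude, longitude] when the lookup succeeds
    g = geocoder.arcgis('{}, Sao Paulo, BR'.format(Neighborhood))
    return g.latlng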

In [200]:
aux_dict = {}
df_size = len(df['Neighborhood'])
# geocode each neighborhood and report progress
for count, elem in enumerate(df['Neighborhood'], start=1):
    aux_dict[elem] = get_latlgn(elem)
    print('{} : done - {}/{}'.format(elem, count, df_size))

Agua Rasa : done - 1/89
Alto de Pinheiros : done - 2/89
Anhanguera : done - 3/89
Aricanduva : done - 4/89
Artur Alvim : done - 5/89
Barra Funda : done - 6/89
Bela Vista : done - 7/89
Belem : done - 8/89
Bom Retiro : done - 9/89
Brasilandia : done - 10/89
Butanta : done - 11/89
Cachoeirinha : done - 12/89
Cambuci : done - 13/89
Campo Belo : done - 14/89
Campo Grande : done - 15/89
Campo Limpo : done - 16/89
Cangaiba : done - 17/89
Capao Redondo : done - 18/89
Carrao : done - 19/89
Casa Verde : done - 20/89
Cidade Ademar : done - 21/89
Cidade Dutra : done - 22/89
Cidade Lider : done - 23/89
Cidade Tiradentes : done - 24/89
Consolacao : done - 25/89
Cursino : done - 26/89
Ermelino Matarazzo : done - 27/89
Freguesia do O : done - 28/89
Grajau : done - 29/89
Guaianases : done - 30/89
Iguatemi : done - 31/89
Ipiranga : done - 32/89
Itaim Bibi : done - 33/89
Itaim Paulista : done - 34/89
Itaquera : done - 35/89
Jabaquara : done - 36/89
Jacana : done - 37/89
Jaguara : done - 38/89
Jaguare : do

Creation of a new dataframe with data from the Geospatial API:

In [1]:
df_geolocation = pd.DataFrame.from_dict(aux_dict, orient = 'index', columns = ['Latitude', 'Longitude'])
df_geolocation.reset_index(inplace = True)
df_geolocation = df_geolocation.rename(columns = {'index':'Neighborhood'})
df_geolocation.head()


Unification of data from dataframes with neighborhood names and their respective geo-coordinates:

In [218]:
df_sp = pd.concat([df,df_geolocation[['Latitude','Longitude']]], axis = 1)
df_sp.head()

In [220]:
print('The dataframe has {} neighbourhoods'.format(df_sp['Neighborhood'].nunique()))

The dataframe has 89 neighbourhoods


In [206]:
from geopy.geocoders import Nominatim
import folium

In [207]:
sp_geolocation = get_latlgn('Sao Paulo')
print('The geographical coordinates of Sao Paulo are {}, {}.'.format(sp_geolocation[0], sp_geolocation[1]))

The geographical coordinates of Sao Paulo are -23.562869999999975, -46.654679999999985.


In [221]:
map_sp = folium.Map(location=[sp_geolocation[0], sp_geolocation[1]], zoom_start = 11)

for lat, lng, neighborhood in zip(df_sp['Latitude'], df_sp['Longitude'], df_sp['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup= label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_sp)
    

map_sp

Exploring São Paulo using the Foursquare API:

In [224]:
import json
import requests
from pandas import json_normalize # Convert JSON into pandas dataframe (top-level function since pandas 1.0)

from foursquare_credentials import CLIENT_ID, CLIENT_SECRET

VERSION = '20180605' # API version
LIMIT = 100 # Limit on the number of venues returned per request
RADIUS = 500 # Search radius in meters
df_sp['Neighborhood'].unique()

array(['Agua Rasa', 'Alto de Pinheiros', 'Anhanguera', 'Aricanduva',
       'Artur Alvim', 'Barra Funda', 'Bela Vista', 'Belem', 'Bom Retiro',
       'Brasilandia', 'Butanta', 'Cachoeirinha', 'Cambuci', 'Campo Belo',
       'Campo Grande', 'Campo Limpo', 'Cangaiba', 'Capao Redondo',
       'Carrao', 'Casa Verde', 'Cidade Ademar', 'Cidade Dutra',
       'Cidade Lider', 'Cidade Tiradentes', 'Consolacao', 'Cursino',
       'Ermelino Matarazzo', 'Freguesia do O', 'Grajau', 'Guaianases',
       'Iguatemi', 'Ipiranga', 'Itaim Bibi', 'Itaim Paulista', 'Itaquera',
       'Jabaquara', 'Jacana', 'Jaguara', 'Jaguare', 'Jaragua',
       'Jardim Angela', 'Jardim Helena', 'Jardim Paulista',
       'Jardim Sao Luis', 'Lapa', 'Liberdade', 'Limao', 'Mandaqui',
       'Marsilac', 'Moema', 'Mooca', 'Morumbi', 'Parelheiros', 'Pari',
       'Parque do Carmo', 'Penha', 'Perdizes', 'Pinheiros', 'Ponte Rasa',
       'Raposo Tavares', 'Republica', 'Rio Pequeno', 'Sacoma',
       'Santa Cecilia', 'Santana', 'Sa

The function below, provided by the IBM Data Science course, sends the venue requests to the Foursquare API and collects the results in a dataframe:

In [225]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
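
The function above indexes `categories[0]` and `groups[0]` directly, which raises an error if a venue has no category or a response has no groups. A more defensive variant of the extraction step could look like the sketch below. The `extract_venue_rows` helper is hypothetical, and the sample payload is hand-written to mimic the response shape accessed above; it is not real API output.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore response into rows (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-written payload for illustration only
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Padaria Carillo',
               'location': {'lat': -23.553214, 'lng': -46.578554},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': -23.55, 'lng': -46.58},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Agua Rasa', -23.55337, -46.58027, sample)
print(rows)
```

Rows with a `None` category can then be dropped or labeled before one-hot encoding, depending on how they should count in the analysis.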

In [228]:
sp_venues = getNearbyVenues(df_sp['Neighborhood'], df_sp['Latitude'], df_sp['Longitude'])

Agua Rasa
Alto de Pinheiros
Anhanguera
Aricanduva
Artur Alvim
Barra Funda
Bela Vista
Belem
Bom Retiro
Brasilandia
Butanta
Cachoeirinha
Cambuci
Campo Belo
Campo Grande
Campo Limpo
Cangaiba
Capao Redondo
Carrao
Casa Verde
Cidade Ademar
Cidade Dutra
Cidade Lider
Cidade Tiradentes
Consolacao
Cursino
Ermelino Matarazzo
Freguesia do O
Grajau
Guaianases
Iguatemi
Ipiranga
Itaim Bibi
Itaim Paulista
Itaquera
Jabaquara
Jacana
Jaguara
Jaguare
Jaragua
Jardim Angela
Jardim Helena
Jardim Paulista
Jardim Sao Luis
Lapa
Liberdade
Limao
Mandaqui
Marsilac
Moema
Mooca
Morumbi
Parelheiros
Pari
Parque do Carmo
Penha
Perdizes
Pinheiros
Ponte Rasa
Raposo Tavares
Republica
Rio Pequeno
Sacoma
Santa Cecilia
Santana
Santo Amaro
Sao Domingos
Sao Lucas
Sao Mateus
Sao Miguel Paulista
Sao Rafael
Sapopemba
Saude
Se
Tatuape
Tremembe
Tucuruvi
Vila Andrade
Vila Curuca
Vila Formosa
Vila Guilherme
Vila Jacui
Vila Leopoldina
Vila Maria
Vila Mariana
Vila Matilde
Vila Medeiros
Vila Prudente
Vila Sonia


In [229]:
print(sp_venues.shape)
sp_venues.head()

(2317, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agua Rasa,-23.55337,-46.58027,Padaria Carillo,-23.553214,-46.578554,Bakery
1,Agua Rasa,-23.55337,-46.58027,Portuga Bar e Restaurante,-23.553846,-46.579573,Brazilian Restaurant
2,Agua Rasa,-23.55337,-46.58027,Chama Supermercados,-23.554178,-46.58112,Market
3,Agua Rasa,-23.55337,-46.58027,Bona's Carnes,-23.552434,-46.583091,Steakhouse
4,Agua Rasa,-23.55337,-46.58027,Padaria Santa Branca,-23.553953,-46.583706,Bakery


In [230]:
sp_venues.groupby(by='Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agua Rasa,25,25,25,25,25,25
Alto de Pinheiros,20,20,20,20,20,20
Anhanguera,3,3,3,3,3,3
Aricanduva,11,11,11,11,11,11
Artur Alvim,10,10,10,10,10,10
...,...,...,...,...,...,...
Vila Mariana,51,51,51,51,51,51
Vila Matilde,36,36,36,36,36,36
Vila Medeiros,21,21,21,21,21,21
Vila Prudente,24,24,24,24,24,24


In [313]:
sp_venues['Venue Category'].unique()

array(['Bakery', 'Brazilian Restaurant', 'Market', 'Steakhouse',
       'Deli / Bodega', 'Sushi Restaurant', 'Café', 'Optical Shop',
       'Furniture / Home Store', 'Burger Joint', 'Soccer Field',
       'Brewery', 'Farmers Market', 'Arts & Crafts Store', 'Diner',
       'Pharmacy', 'Gym / Fitness Center', 'BBQ Joint', 'Garden Center',
       'Pet Store', 'Restaurant', 'Bar', 'Plaza', 'Supermarket',
       'Dog Run', 'Flea Market', 'Trail', 'Convenience Store',
       'Fast Food Restaurant', 'Bookstore', 'Flower Shop', 'IT Services',
       'Tennis Court', 'Lake', 'Dessert Shop', 'Food & Drink Shop',
       'Bank', 'Grocery Store', 'Cafeteria', 'Pizza Place', 'Food Truck',
       'Athletics & Sports', 'Lounge', 'Beer Bar', 'Nightclub', 'Museum',
       'Theater', 'Indoor Play Area', 'Sandwich Place',
       'Portuguese Restaurant', 'Music Venue', 'Indie Movie Theater',
       'Cultural Center', 'Northeastern Brazilian Restaurant', 'Hotel',
       'Martial Arts School', 'Automotive Sho

In [231]:
print('There are {} unique categories.'.format(sp_venues['Venue Category'].nunique()))

There are 269 unique categories.


In [232]:
# Applying one hot encoding
sp_onehot = pd.get_dummies(sp_venues[['Venue Category']], prefix='', prefix_sep='')

# add neighborhood column back to dataframe
sp_onehot['Neighborhood'] = sp_venues['Neighborhood']

# move neighborhood column to first column
fixed_columns = ['Neighborhood'] +[col for col in sp_onehot.columns if col != 'Neighborhood' ]
sp_onehot = sp_onehot[fixed_columns]

sp_onehot.head()



Unnamed: 0,Neighborhood,Acai House,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,...,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Agua Rasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Agua Rasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Agua Rasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Agua Rasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Agua Rasa,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [234]:
sp_onehot.shape

(2317, 270)

In [235]:
sp_grouped = sp_onehot.groupby('Neighborhood').mean().reset_index()
sp_grouped.head(10)

Unnamed: 0,Neighborhood,Acai House,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,...,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Agua Rasa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alto de Pinheiros,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Anhanguera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aricanduva,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Artur Alvim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Barra Funda,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Belem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bom Retiro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.055556,0.0,0.0
8,Brasilandia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Butanta,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [311]:
print(sp_grouped.shape)
sp_grouped[sp_grouped['Neighborhood'] == 'Agua Rasa']

(80, 270)


Unnamed: 0,Neighborhood,Acai House,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,...,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Agua Rasa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
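
Each value in the grouped table is the mean of a one-hot column, that is, the share of a neighborhood's venues that belong to a category. Agua Rasa has 25 venues, so its single 'Arts & Crafts Store' gives 1/25 = 0.04. The same computation can be sketched in plain Python; `category_shares` is an illustrative name, and the input uses only the first five Agua Rasa venues listed earlier, so the shares differ from the full table.

```python
from collections import Counter, defaultdict

def category_shares(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }

# (neighborhood, category) pairs from the first five rows of sp_venues
venues = [
    ('Agua Rasa', 'Bakery'),
    ('Agua Rasa', 'Brazilian Restaurant'),
    ('Agua Rasa', 'Market'),
    ('Agua Rasa', 'Steakhouse'),
    ('Agua Rasa', 'Bakery'),
]

shares = category_shares(venues)
print(shares['Agua Rasa'])
```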


In [237]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [302]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sp_grouped['Neighborhood']

for ind in np.arange(sp_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sp_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agua Rasa,Bakery,Farmers Market,Brazilian Restaurant,Furniture / Home Store,Garden Center,Market,Optical Shop,Soccer Field,Steakhouse,Gym / Fitness Center
1,Alto de Pinheiros,Plaza,Pharmacy,Convenience Store,Bar,Dog Run,Flea Market,Flower Shop,Trail,Restaurant,Market
2,Anhanguera,Brazilian Restaurant,Restaurant,Lake,Zoo,Food,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop
3,Aricanduva,Gym / Fitness Center,Dessert Shop,Brazilian Restaurant,Cafeteria,Food & Drink Shop,Grocery Store,Bank,Bakery,Fast Food Restaurant,Pet Store
4,Artur Alvim,Farmers Market,Bakery,Pharmacy,Pizza Place,Food Truck,Supermarket,Flower Shop,Food,Fast Food Restaurant,Fish Market
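
The column labels above combine a three-element suffix list with a 'th' fallback, which is correct for the ten columns used here but would label a 21st column as '21th'. If more columns were ever needed, a small helper along the following lines could generate the suffixes; `ordinal` is a hypothetical name introduced only for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are exceptions to the st/nd/rd rule
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```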


### Clustering Neighborhoods
Let's run k-means to cluster the neighborhoods into 3 clusters.

In [239]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [241]:
sp_grouped_clustering = sp_grouped.drop(columns='Neighborhood')

In [301]:
# set number of clusters
kclusters = 3

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(sp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30]

array([2, 0, 2, 2, 2, 0, 0, 0, 2, 0, 2, 0, 0, 2, 2, 0, 0, 2, 0, 2, 0, 0,
       2, 2, 0, 2, 0, 2, 0, 0], dtype=int32)
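
A quick way to judge whether the clusters are balanced is to count how many neighborhoods fall in each one. The sketch below does this for the 30 labels printed above, copied by hand; since that is only a slice of `kmeans.labels_`, the counts are indicative rather than the full distribution.

```python
from collections import Counter

# First 30 cluster labels, copied from the output above
labels = [2, 0, 2, 2, 2, 0, 0, 0, 2, 0, 2, 0, 0, 2, 2, 0, 0, 2, 0, 2, 0, 0,
          2, 2, 0, 2, 0, 2, 0, 0]

sizes = Counter(labels)
for cluster in range(3):
    # Counter returns 0 for labels that never occur
    print('cluster {}: {} neighborhoods'.format(cluster, sizes[cluster]))
```

In this slice, cluster 1 does not appear at all, which suggests it may be a small cluster worth inspecting separately once the full label array is counted.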

In [303]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sp_merged = df_sp

# join df_sp with neighborhoods_venues_sorted to add the cluster label and top venues to each neighborhood

sp_merged = sp_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')

sp_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agua Rasa,-23.55337,-46.58027,2.0,Bakery,Farmers Market,Brazilian Restaurant,Furniture / Home Store,Garden Center,Market,Optical Shop,Soccer Field,Steakhouse,Gym / Fitness Center
1,Alto de Pinheiros,-23.55273,-46.70916,0.0,Plaza,Pharmacy,Convenience Store,Bar,Dog Run,Flea Market,Flower Shop,Trail,Restaurant,Market
2,Anhanguera,-23.42097,-46.78517,2.0,Brazilian Restaurant,Restaurant,Lake,Zoo,Food,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop
3,Aricanduva,-23.56771,-46.51025,2.0,Gym / Fitness Center,Dessert Shop,Brazilian Restaurant,Cafeteria,Food & Drink Shop,Grocery Store,Bank,Bakery,Fast Food Restaurant,Pet Store
4,Artur Alvim,-23.55105,-46.48,2.0,Farmers Market,Bakery,Pharmacy,Pizza Place,Food Truck,Supermarket,Flower Shop,Food,Fast Food Restaurant,Fish Market


In [304]:
sp_merged.dropna(inplace = True)
sp_merged['Cluster Labels'] = sp_merged['Cluster Labels'].astype('int32')

Let's visualize the resulting clusters

In [249]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [354]:
# create map
map_cluster = folium.Map(location=[sp_geolocation[0], sp_geolocation[1]], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lng, neighborhood, cluster in zip(sp_merged['Latitude'], sp_merged['Longitude'], sp_merged['Neighborhood'], sp_merged['Cluster Labels']):
    label = folium.Popup(str(neighborhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_cluster)

map_cluster