# IBM Data Science Capstone Project: Problem Definition and Data

## Problem definition

#### This project addresses a hypothetical business problem. The question to be answered is the following.

     A successful owner of several mid-range to high-end restaurants has decided to open a new restaurant in São Paulo, Brazil. Having visited the city many times in recent years, he could not ignore its boom in gastronomy. He is keen to open a new unit focused on Italian cuisine.
     Taking into account the price level at which the restaurant will operate, the goal is to find an optimal location in an area where gastronomy is booming and which is easily accessible to tourists as well as to wealthier local residents.

## Assumptions, business logic

    The assumption behind the analysis is that unsupervised machine learning can group districts into clusters, which will give us a list of candidate areas for the restaurant. The intent is for the restaurant to be situated close to one of the gastronomic centres.
    
## Data

    To perform this analysis, we will need the following data:
    
    1. List of the districts of São Paulo
    2. Geo-coordinates of the districts of São Paulo
    3. Top venues in each district
    
    The list of districts will be obtained from Wikipedia:

        https://pt.wikipedia.org/wiki/Lista_dos_distritos_de_São_Paulo_por_população


    Geo-coordinates of districts will be obtained with the help of the geocoder tool in the notebook.
    Top venues data will be obtained from Foursquare through an API.
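A geocoder may return no result for a query it does not recognise, so it can help to record such districts instead of letting the lookup fail. The following is a minimal sketch under stated assumptions: `geocode_districts` and the `lookup` callable are hypothetical names, and the dictionary stands in for a real geocoder so the example runs offline. The coordinates are illustrative values, not geocoder output.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Coordinates = Tuple[float, float]


def geocode_districts(
    names: Iterable[str],
    lookup: Callable[[str], Optional[Coordinates]],
) -> Tuple[List[dict], List[str]]:
    """Return (rows, missing): one row per resolved district, plus unresolved names."""
    rows, missing = [], []
    for name in names:
        coords = lookup("{}, São Paulo, SP".format(name))
        if coords is None:
            # No match for this query: remember the name and move on
            missing.append(name)
            continue
        rows.append({"Distrito": name, "Latitude": coords[0], "Longitude": coords[1]})
    return rows, missing


# Stand-in lookup used only for illustration (assumed values, not real results)
fake_index = {"Sé, São Paulo, SP": (-23.55, -46.63)}
rows, missing = geocode_districts(["Sé", "Nowhere"], fake_index.get)
print(rows)
print(missing)
```

In a real run, `lookup` would wrap the geocoder call and return a `(latitude, longitude)` pair or `None`.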

## Use of Data and Methodology

    After tidying up and exploring the data, we will apply the K-means machine learning technique to create clusters of districts. We will use the silhouette score to choose the optimal number of clusters.
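Choosing the number of clusters with the silhouette score can be sketched as follows. This is an illustration under assumptions, not the notebook's actual run: the helper `best_k_by_silhouette` is a hypothetical name, and the data is synthetic (three well-separated blobs from `make_blobs`) rather than the district venue frequencies.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(features, k_values, random_state=0):
    """Fit K-means for each k and return the k with the highest mean silhouette score."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data with three clearly separated groups (assumed, for illustration only)
X, _ = make_blobs(
    n_samples=150,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.5,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X, range(2, 7))
print(best_k)
for k, score in scores.items():
    print(k, round(score, 3))
```

The silhouette score is only defined for at least two clusters, which is why the candidate range starts at 2.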

In [24]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


In [25]:
wiki = 'https://pt.wikipedia.org/wiki/Lista_dos_distritos_de_São_Paulo_por_população'
wiki_page = requests.get(wiki)

In [26]:
soup = BeautifulSoup(wiki_page.content, 'html.parser')

In [27]:
table_contents=[]
table=soup.find('table', attrs={'class':'wikitable sortable'}).tbody
rows=table.find_all('tr')
# header cells of the table, kept in a list so that the column order is preserved
columns_names = [v.text.replace('\n','') for v in rows[0].find_all('th')]

for i in range(1, len(rows)):
    tds=rows[i].find_all('td')
    cells ={}
    
    if len(tds)==3:
        values = [td.text.replace('\n',"") for td in tds]
        cells['Posição']=values[0]
        cells['Distrito']=values[1]
        cells['População']=values[2]
        table_contents.append(cells)

df=pd.DataFrame(table_contents)
df=df.drop(["Posição","População"],axis=1)

In [28]:
result=pd.DataFrame()
district = df['Distrito']
location = None
latitude = None
longitude = None

In [29]:
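# create the geocoder once rather than on every iteration
geolocator = Nominatim(user_agent="foursquare_agent")
records = []

for dt in district:
    # view box was used to limit the area of search
    location = geolocator.geocode(' {},São Paulo, SP'.format(dt),
                                  viewbox=((-23.15,-47.17),(-23.59,-46.15)))
    if location is None:
        # geocode() returns None when nothing is found; skip this district
        print('No result for', dt)
        continue
    latitude = location.latitude
    longitude = location.longitude
    print(latitude, longitude)
    records.append({'Distrito': dt, 'Latitude': latitude, 'Longitude': longitude})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
result = pd.DataFrame(records)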

-23.78590725 -46.66519713060974
-23.7125278 -46.7687195
-23.6043265 -46.5098851
-23.6719026 -46.779435420915036
-23.683573250000002 -46.737762089371984
-23.6730116 -46.6552806
-23.4482715 -46.69026927092207
-23.6012824 -46.6025552
-23.5017648 -46.3996091
-23.6520656 -46.650037329076994
-23.5824973 -46.4092065
-23.632557650000003 -46.759666126372395
-23.5360799 -46.4555099
-23.4482881 -46.6029761
-23.71865045 -46.7010388456856
-23.446592000000003 -46.73617751601424
-23.53624775 -46.41002184740884
-23.485533 -46.7219385
-23.5982995 -46.4817046
-23.510151 -46.41789278409091
-22.741347 -46.894846
-23.627159 -46.45324064213981
-23.449511450000003 -46.66366119497354
-23.500294 -46.458717352058315
-23.594946 -46.545899798292474
-23.4874636 -46.6951317
-23.5058996 -46.5314253
-23.48228355 -46.423410226982504
-23.8326395 -46.70985686242149
-23.615177950000003 -46.643393343146286
-23.5837 -46.632740824206934
-23.487707 -46.5844955
-23.523683 -46.5437815
-23.6182115 -46.418977389793156
-23.625687

In [31]:
result


               Distrito   Latitude  Longitude
0                Grajaú -23.785907 -46.665197
1         Jardim Ângela -23.712528 -46.768720
2             Sapopemba -23.604326 -46.509885
3         Capão Redondo -23.671903 -46.779435
4       Jardim São Luís -23.683573 -46.737762
5         Cidade Ademar -23.673012 -46.655281
6           Brasilândia -23.448272 -46.690269
7                Sacomã -23.601282 -46.602555
8        Itaim Paulista -23.501765 -46.399609
9             Jabaquara -23.652066 -46.650037
10    Cidade Tiradentes -23.582497 -46.409207
11          Campo Limpo -23.632558 -46.759666
12             Itaquera -23.536080 -46.455510
13             Tremembé -23.448288 -46.602976
14         Cidade Dutra -23.718650 -46.701039
15              Jaraguá -23.446592 -46.736178
16              Lajeado -23.536248 -46.410022
17             Pirituba -23.485533 -46.721939
18           São Mateus -23.598299 -46.481705
19          Vila Curuçá -23.510151 -46.417893
21           São Rafael -23.627159

In [30]:
# Some outliers were observed. The simplest solution for these cases was to drop the rows
result=result[result.Distrito != 'Pedreira']
result=result[result.Distrito != 'Marsilac']
result=result[result.Distrito != 'Parelheiros']
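The cell above drops the outliers by name. An alternative, shown here only as a sketch and not as the method used in this notebook, is to filter on coordinates: any district whose geocoded point falls outside an expected bounding box is discarded. The helper name `within_bounds`, the sample rows, and the box limits are all assumptions chosen for illustration. They are not official city limits.

```python
import pandas as pd


def within_bounds(frame, lat_range, lon_range):
    """Keep only rows whose Latitude/Longitude fall inside the given inclusive ranges."""
    lat_ok = frame["Latitude"].between(*lat_range)
    lon_ok = frame["Longitude"].between(*lon_range)
    return frame[lat_ok & lon_ok]


# Toy data: district "B" is geocoded far to the north of the other two
sample = pd.DataFrame({
    "Distrito": ["A", "B", "C"],
    "Latitude": [-23.60, -22.74, -23.50],
    "Longitude": [-46.51, -46.89, -46.40],
})

# Assumed bounding box, for illustration only
kept = within_bounds(sample, lat_range=(-24.0, -23.3), lon_range=(-46.9, -46.3))
print(kept["Distrito"].tolist())
```

A coordinate filter catches wrongly geocoded points automatically, whereas dropping by name makes the decision explicit for each district.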

In [12]:
map_saopaulo = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, district in zip(result['Latitude'], result['Longitude'], result['Distrito']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_saopaulo)  
    
map_saopaulo

In [16]:
import os

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # Credentials are read from the environment instead of being hard-coded;
    # the variable names below are an assumption, set them before running the notebook
    CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
    CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
    VERSION = '20180605' # Foursquare API version
    LIMIT = 100 # A default Foursquare API limit value

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
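The function above indexes straight into the JSON response, so a venue without any category, or a response without groups, would raise an exception. The sketch below shows one defensive way to flatten a response. `extract_venues` is a hypothetical helper, and the payload is a hand-written stand-in that mirrors only the keys the function above reads (`response`, `groups`, `items`, `venue`). It is not real API output.

```python
def extract_venues(payload, district, lat, lng):
    """Flatten one explore-style response into rows, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        # Skip venues that lack a category or coordinates
        if not categories or "lat" not in location or "lng" not in location:
            continue
        rows.append((
            district, lat, lng,
            venue.get("name"),
            location["lat"], location["lng"],
            categories[0]["name"],
        ))
    return rows


# Hand-written stand-in payload (assumed values, for illustration only)
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cantina X",
               "location": {"lat": -23.56, "lng": -46.65},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "No Category",
               "location": {"lat": -23.57, "lng": -46.66},
               "categories": []}},
]}]}}

rows = extract_venues(sample_payload, "Bela Vista", -23.56, -46.64)
print(rows)
print(extract_venues({}, "Bela Vista", -23.56, -46.64))
```

Districts for which the function returns an empty list simply contribute no rows, which matches how districts without venues end up with no cluster label later on.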

In [17]:
saopaulo_venues = getNearbyVenues(names=result.Distrito,
                                   latitudes=result.Latitude,
                                   longitudes=result.Longitude
                                  )

Grajaú
Jardim Ângela
Sapopemba
Capão Redondo
Jardim São Luís
Cidade Ademar
Brasilândia
Sacomã
Itaim Paulista
Jabaquara
Cidade Tiradentes
Campo Limpo
Itaquera
Tremembé
Cidade Dutra
Jaraguá
Lajeado
Pirituba
São Mateus
Vila Curuçá
São Rafael
Cachoeirinha
Vila Jacuí
São Lucas
Freguesia do Ó
Cangaíba
Jardim Helena
Saúde
Vila Mariana
Vila Medeiros
Penha
Iguatemi
Vila Andrade
Cidade Líder
José Bonifácio
Santana
Rio Pequeno
Ermelino Matarazzo
Vila Maria
Perdizes
Cursino
Vila Sônia
Mandaqui
Ipiranga
Artur Alvim
Vila Matilde
Vila Prudente
Guaianases
Campo Grande
Raposo Tavares
Tucuruvi
Vila Formosa
Jaçanã
Ponte Rasa
Itaim Bibi
São Miguel Paulista
Tatuapé
Aricanduva
Jardim Paulista
Casa Verde
Água Rasa
São Domingos
Santa Cecília
Moema
Carrão
Limão
Perus
Santo Amaro
Bela Vista
Liberdade
Parque do Carmo
Anhanguera
Lapa
Pinheiros
Mooca
Campo Belo
Consolação
Vila Guilherme
Butantã
República
Jaguaré
Morumbi
Belém
Alto de Pinheiros
Vila Leopoldina
Socorro
Cambuci
Bom Retiro
Brás
Jaguara
Sé
Pari
Barra F

In [18]:
saopaulo_venues


                 District  District Latitude  District Longitude  \
0           Jardim Ângela         -23.712528          -46.768720   
1           Jardim Ângela         -23.712528          -46.768720   
2           Jardim Ângela         -23.712528          -46.768720   
3               Sapopemba         -23.604326          -46.509885   
4               Sapopemba         -23.604326          -46.509885   
5               Sapopemba         -23.604326          -46.509885   
6               Sapopemba         -23.604326          -46.509885   
7               Sapopemba         -23.604326          -46.509885   
8               Sapopemba         -23.604326          -46.509885   
9               Sapopemba         -23.604326          -46.509885   
10              Sapopemba         -23.604326          -46.509885   
11              Sapopemba         -23.604326          -46.509885   
12              Sapopemba         -23.604326          -46.509885   
13          Capão Redondo         -23.671903    

In [19]:
saopaulo_venues.shape

(2645, 7)

In [21]:
# one hot encoding
saopaulo_onehot = pd.get_dummies(saopaulo_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
saopaulo_onehot['District'] = saopaulo_venues['District'] 

# move district column to the first column
fixed_columns = [saopaulo_onehot.columns[-1]] + list(saopaulo_onehot.columns[:-1])
saopaulo_onehot = saopaulo_onehot[fixed_columns]

saopaulo_onehot.head()


        District  Acai House  Accessories Store  African Restaurant  \
0  Jardim Ângela           0                  0                   0   
1  Jardim Ângela           0                  0                   0   
2  Jardim Ângela           0                  0                   0   
3      Sapopemba           0                  0                   0   
4      Sapopemba           0                  0                   0   

   American Restaurant  Antique Shop  Arcade  Argentinian Restaurant  \
0                    0             0       0                       0   
1                    0             0       0                       0   
2                    0             0       0                       0   
3                    0             0       0                       0   
4                    0             0       0                       0   

   Art Gallery  Art Museum  Art Studio  Arts & Crafts Store  \
0            0           0           0                    0   
1            0

In [23]:
saopaulo_grouped = saopaulo_onehot.groupby('District').mean().reset_index()
saopaulo_grouped


               District  Acai House  Accessories Store  African Restaurant  \
0     Alto de Pinheiros    0.000000           0.000000            0.000000   
1            Anhanguera    0.000000           0.000000            0.000000   
2            Aricanduva    0.000000           0.000000            0.000000   
3           Artur Alvim    0.000000           0.000000            0.000000   
4           Barra Funda    0.000000           0.000000            0.000000   
5            Bela Vista    0.000000           0.000000            0.000000   
6                 Belém    0.000000           0.000000            0.000000   
7            Bom Retiro    0.000000           0.000000            0.000000   
8           Brasilândia    0.000000           0.000000            0.000000   
9                  Brás    0.000000           0.000000            0.000000   
10              Butantã    0.000000           0.000000            0.000000   
11              Cambuci    0.000000           0.000000          
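To see what the one-hot encoding and the per-district mean produce, here is a toy example with two made-up districts and two categories (the data is an assumption for illustration). Each row of the grouped frame holds the share of the district's venues that fall into each category, so every row sums to 1.

```python
import pandas as pd

# Toy venue list: district "A" has three venues, district "B" has one
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B"],
    "Venue Category": ["Bakery", "Bakery", "Plaza", "Plaza"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "District", venues["District"])

# Averaging the indicators per district gives category frequencies
freq = onehot.groupby("District").mean().reset_index()
print(freq)
```

These frequency vectors are the features that K-means clusters on.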

In [240]:
num_top_venues = 5

# the grouped dataframe is keyed by the 'District' column
for hood in saopaulo_grouped['District']:
    print("----"+hood+"----")
    temp = saopaulo_grouped[saopaulo_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alto de Pinheiros----
                      venue  freq
0                     Plaza   0.4
1              Tennis Court   0.1
2  Bike Rental / Bike Share   0.1
3                       Spa   0.1
4                   Dog Run   0.1


----Anhanguera----
               venue  freq
0        Pizza Place  0.14
1      Grocery Store  0.14
2  Convenience Store  0.14
3          Pet Store  0.14
4              Plaza  0.14


----Aricanduva----
                  venue  freq
0                Bakery  0.33
1  Gym / Fitness Center  0.17
2        Clothing Store  0.17
3           Candy Store  0.17
4         Grocery Store  0.17


----Artur Alvim----
              venue  freq
0  Department Store  0.11
1            Bakery  0.11
2    Cosmetics Shop  0.11
3       Pizza Place  0.11
4  Recording Studio  0.05


----Barra Funda----
                  venue  freq
0           Music Venue  0.10
1  Brazilian Restaurant  0.08
2                  Café  0.08
3            Restaurant  0.08
4        Chocolate Shop  0.05


----

In [241]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
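When a district has fewer distinct categories than `num_top_venues`, a ranking by frequency fills the remaining slots with categories whose frequency is zero. The sketch below is a hypothetical variant, not the function used in this notebook, that keeps only categories with a positive frequency. The name `top_nonzero_venues` and the sample row are assumptions for illustration.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Rank categories by frequency, dropping those that never occur in the district."""
    freqs = row.iloc[1:].astype(float)  # first entry is the district name
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    return list(freqs.index[:num_top_venues])


# Toy row: only two of the four categories occur in this district
row = pd.Series({"District": "A", "Bakery": 0.6, "Plaza": 0.0, "Café": 0.4, "Park": 0.0})
print(top_nonzero_venues(row, 3))
```

With this variant the result can be shorter than `num_top_venues`, so any code that writes it into a fixed number of columns would need to pad the list.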

In [289]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only the first three ordinals have a special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = saopaulo_grouped['District']

for ind in np.arange(saopaulo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(saopaulo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alto de Pinheiros,Plaza,Tennis Court,Bike Rental / Bike Share,Spa,Dog Run,Trail,Gym / Fitness Center,Pastelaria,Park,Paper / Office Supplies Store
1,Anhanguera,Pizza Place,Grocery Store,Convenience Store,Pet Store,Plaza,Bus Station,Gym / Fitness Center,Pastelaria,Park,Paper / Office Supplies Store
2,Aricanduva,Bakery,Gym / Fitness Center,Clothing Store,Candy Store,Grocery Store,Acai House,Pedestrian Plaza,Pastry Shop,Pastelaria,Park
3,Artur Alvim,Department Store,Bakery,Cosmetics Shop,Pizza Place,Recording Studio,Pharmacy,Pet Store,Chocolate Shop,Pastelaria,Sports Bar
4,Barra Funda,Music Venue,Brazilian Restaurant,Café,Restaurant,Chocolate Shop,Japanese Restaurant,Sandwich Place,Bookstore,Food & Drink Shop,Motel


In [290]:
# set number of clusters
kclusters = 10
# drop the district name so that only the numeric frequency columns remain
saopaulo_grouped_clustering = saopaulo_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(saopaulo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

saopaulo_merged = result

# join the cluster labels and top venues onto the district coordinates
saopaulo_merged = saopaulo_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Distrito')

saopaulo_merged.head() # check the last columns!

Unnamed: 0,Distrito,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Grajaú,-23.785907,-46.665197,,,,,,,,,,,
1,Jardim Ângela,-23.712528,-46.76872,9.0,Health & Beauty Service,Pastelaria,Bakery,Noodle House,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Perfume Shop
2,Sapopemba,-23.604326,-46.509885,5.0,Breakfast Spot,Gym,Pastelaria,Market,Pharmacy,Grocery Store,Plaza,Gastropub,Falafel Restaurant,Northern Brazilian Restaurant
3,Capão Redondo,-23.671903,-46.779435,0.0,Electronics Store,Plaza,Park,Flea Market,Acai House,Paella Restaurant,Pastry Shop,Pastelaria,Paper / Office Supplies Store,Outdoors & Recreation
4,Jardim São Luís,-23.683573,-46.737762,2.0,Pizza Place,Japanese Restaurant,Playground,Department Store,Organic Grocery,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant


In [291]:
saopaulo_merged

Unnamed: 0,Distrito,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Grajaú,-23.785907,-46.665197,,,,,,,,,,,
1,Jardim Ângela,-23.712528,-46.76872,9.0,Health & Beauty Service,Pastelaria,Bakery,Noodle House,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Perfume Shop
2,Sapopemba,-23.604326,-46.509885,5.0,Breakfast Spot,Gym,Pastelaria,Market,Pharmacy,Grocery Store,Plaza,Gastropub,Falafel Restaurant,Northern Brazilian Restaurant
3,Capão Redondo,-23.671903,-46.779435,0.0,Electronics Store,Plaza,Park,Flea Market,Acai House,Paella Restaurant,Pastry Shop,Pastelaria,Paper / Office Supplies Store,Outdoors & Recreation
4,Jardim São Luís,-23.683573,-46.737762,2.0,Pizza Place,Japanese Restaurant,Playground,Department Store,Organic Grocery,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant
5,Cidade Ademar,-23.673012,-46.655281,1.0,BBQ Joint,Bar,Gymnastics Gym,Paella Restaurant,Pedestrian Plaza,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Acai House
6,Brasilândia,-23.448272,-46.690269,8.0,Big Box Store,Food Truck,Pizza Place,Farmers Market,Northern Brazilian Restaurant,Office,Northeastern Brazilian Restaurant,Optical Shop,Noodle House,Pedestrian Plaza
7,Sacomã,-23.601282,-46.602555,5.0,Pharmacy,Department Store,Brazilian Restaurant,Bar,Bus Station,Chocolate Shop,Farmers Market,Cosmetics Shop,Diner,Clothing Store
8,Itaim Paulista,-23.501765,-46.399609,5.0,Dessert Shop,Japanese Restaurant,Gym / Fitness Center,Bakery,Bowling Alley,Pizza Place,Grocery Store,Food Truck,Chocolate Shop,Park
9,Jabaquara,-23.652066,-46.650037,5.0,Pizza Place,Convenience Store,Soccer Field,Miscellaneous Shop,Candy Store,Bakery,Brazilian Restaurant,Breakfast Spot,Office,Optical Shop


In [292]:
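# create map
# keep only districts that received a cluster label (districts with no venues have NaN)
test = saopaulo_merged.dropna(subset=['Cluster Labels'])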

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(test['Latitude'], test['Longitude'], test['Distrito'], test['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [293]:
test.loc[test['Cluster Labels'] == 0, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Capão Redondo,Plaza,Park,Flea Market,Acai House,Paella Restaurant,Pastry Shop,Pastelaria,Paper / Office Supplies Store,Outdoors & Recreation


In [294]:
test.loc[test['Cluster Labels'] == 1, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Cidade Ademar,Bar,Gymnastics Gym,Paella Restaurant,Pedestrian Plaza,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Acai House
13,Tremembé,Bar,Gym,Bakery,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Outdoors & Recreation
21,São Rafael,Health & Beauty Service,Circus,Park,Brewery,Plaza,Pool,Nightclub,Noodle House,Northeastern Brazilian Restaurant
63,São Domingos,Grocery Store,Gym / Fitness Center,Burger Joint,Diner,Snack Place,Chinese Restaurant,Pastry Shop,Pastelaria,Park
72,Parque do Carmo,Brazilian Restaurant,IT Services,Planetarium,Acai House,Outdoors & Recreation,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store
76,Mooca,Burger Joint,Mexican Restaurant,Japanese Restaurant,Gym / Fitness Center,Food Truck,Acai House,Drugstore,Snack Place,Chinese Restaurant


In [296]:
test.loc[test['Cluster Labels'] == 2, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Jardim São Luís,Japanese Restaurant,Playground,Department Store,Organic Grocery,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant
29,Saúde,Pizza Place,Gym / Fitness Center,Pet Store,Pharmacy,Restaurant,Farmers Market,Sandwich Place,Bar,Bagel Shop
46,Artur Alvim,Bakery,Cosmetics Shop,Pizza Place,Recording Studio,Pharmacy,Pet Store,Chocolate Shop,Pastelaria,Sports Bar


In [297]:
test.loc[test['Cluster Labels'] == 3, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Cachoeirinha,Acai House,Perfume Shop,Pedestrian Plaza,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant,Outdoors & Recreation


In [298]:
test.loc[test['Cluster Labels'] == 4, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,José Bonifácio,General Entertainment,Comfort Food Restaurant,Acai House,Outdoors & Recreation,Pedestrian Plaza,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store


In [299]:
test.loc[test['Cluster Labels'] == 5, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Sapopemba,Gym,Pastelaria,Market,Pharmacy,Grocery Store,Plaza,Gastropub,Falafel Restaurant,Northern Brazilian Restaurant
7,Sacomã,Department Store,Brazilian Restaurant,Bar,Bus Station,Chocolate Shop,Farmers Market,Cosmetics Shop,Diner,Clothing Store
8,Itaim Paulista,Japanese Restaurant,Gym / Fitness Center,Bakery,Bowling Alley,Pizza Place,Grocery Store,Food Truck,Chocolate Shop,Park
9,Jabaquara,Convenience Store,Soccer Field,Miscellaneous Shop,Candy Store,Bakery,Brazilian Restaurant,Breakfast Spot,Office,Optical Shop
10,Cidade Tiradentes,Furniture / Home Store,Wings Joint,Bus Station,Clothing Store,Pharmacy,Electronics Store,Acai House,Pastry Shop,Pastelaria
11,Campo Limpo,Plaza,Dessert Shop,Diner,Cosmetics Shop,Ice Cream Shop,Big Box Store,Fried Chicken Joint,Thrift / Vintage Store,Street Art
12,Itaquera,Clothing Store,Café,Pizza Place,Pharmacy,Convenience Store,Martial Arts School,Pastelaria,Persian Restaurant,Pet Store
14,Cidade Dutra,Plaza,Grocery Store,Pool,Convenience Store,Ice Cream Shop,Bus Station,Gym / Fitness Center,Snack Place,Park
16,Lajeado,Grocery Store,Martial Arts School,Pharmacy,Pizza Place,Asian Restaurant,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant
17,Pirituba,Pharmacy,Comfort Food Restaurant,Chocolate Shop,Fast Food Restaurant,Tea Room,Gym / Fitness Center,Rest Area,Hot Dog Joint,Bar


In [300]:
test.loc[test['Cluster Labels'] == 6, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Jaraguá,Grocery Store,Italian Restaurant,Convenience Store,Brazilian Restaurant,Ice Cream Shop,Pet Store,Pharmacy,Gym / Fitness Center,Paella Restaurant
26,Cangaíba,Café,Pharmacy,Gym / Fitness Center,Chocolate Shop,Optical Shop,Organic Grocery,Outdoors & Recreation,Paella Restaurant,Perfume Shop
42,Cursino,Brazilian Restaurant,Gym / Fitness Center,Arts & Crafts Store,Pharmacy,Market,Food & Drink Shop,Farmers Market,Furniture / Home Store,Outdoors & Recreation
50,Campo Grande,Dessert Shop,Brazilian Restaurant,Convenience Store,Gym / Fitness Center,Chocolate Shop,Farmers Market,Candy Store,Pharmacy,Gym
59,Aricanduva,Gym / Fitness Center,Clothing Store,Candy Store,Grocery Store,Acai House,Pedestrian Plaza,Pastry Shop,Pastelaria,Park
68,Perus,Chocolate Shop,Convenience Store,Dessert Shop,Gymnastics Gym,Gym / Fitness Center,Pastelaria,Perfume Shop,Performing Arts Venue,Pedestrian Plaza


In [301]:
test.loc[test['Cluster Labels'] == 7, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Iguatemi,Gay Bar,Fast Food Restaurant,Outdoors & Recreation,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Paella Restaurant,Acai House
91,Jaguara,Gym,Seafood Restaurant,Electronics Store,Pedestrian Plaza,Pastry Shop,Pastelaria,Park,Paper / Office Supplies Store,Acai House


In [302]:
test.loc[test['Cluster Labels'] == 8, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Brasilândia,Food Truck,Pizza Place,Farmers Market,Northern Brazilian Restaurant,Office,Northeastern Brazilian Restaurant,Optical Shop,Noodle House,Pedestrian Plaza


In [303]:
test.loc[test['Cluster Labels'] == 9, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Jardim Ângela,Pastelaria,Bakery,Noodle House,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Perfume Shop


In [304]:
test.loc[test['Cluster Labels'] == 10, test.columns[[0] + list(range(5, test.shape[1]))]]

Unnamed: 0,Distrito,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
