<h1 align=center><font size = 5>Starbucks in Quito, EC</font></h1>
<h1 align=center><font size = 4>Finding the Best Location for the First Starbucks in Quito </font></h1>
<h1 align=center><font size = 3>by Andrés Borja</font></h1>


## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Introduction</a>

2. <a href="#item2">Data</a>

    2.1 <a href="#item3">Data sources</a>
    
    2.2 <a href="#item4">Download and explore dataset</a>
    
    2.3 <a href="#item5">Data cleaning</a>
    
3. <a href="#item6">Data analysis</a>
    
    3.1 <a href="#item7">Exploring the neighborhoods in Quito</a>
    
    3.2 <a href="#item8">Analyzing each neighborhood’s venue categories</a>
    
    3.3 <a href="#item9">Clustering neighborhoods</a>
    
    3.4 <a href="#item10">Examining resulting clusters</a>

  
</font>
</div>

## 1. Introduction 
In spite of being one of Latin America's fastest-growing coffee producers, Ecuador does not have a Starbucks store yet, not even in its capital city, Quito. However, Quito is a big city, so to maximize its chances of success, Starbucks should plan carefully and strategically where exactly to open its first store in Ecuador’s capital. It should pick the best area or neighborhood based on the kinds of venues that already exist there, favoring venues that are economically tied to coffee shops. **By classifying the neighborhoods of Quito in terms of their existing venues, Starbucks could make a better-informed decision about where to open its first store in Ecuador’s capital, thus improving its likelihood of success.** Fortunately, several data science and machine learning techniques can help Starbucks explore and cluster Quito’s neighborhoods in order to better understand the city, its structure and its dynamics, so that it can accomplish its business goals. 

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means and some associated libraries for model evaluation for the clustering stage
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.preprocessing import StandardScaler

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


## 2. Data 

### 2.1 Data sources
In order to classify Quito’s neighborhoods on a map, we will need a database that includes all of the **city’s neighborhoods with their geographical coordinates**. The city’s municipality website has several georeferenced databases of the city’s geopolitical and administrative divisions. We picked the database named “Barrio - Sector” from their [website](http://gobiernoabierto.quito.gob.ec/?page_id=1122). We downloaded the data, converted it to a JSON file using ArcGIS and stored it in our GitHub repository. 

The classification parameter will be the kinds of venues that exist in each neighborhood. This information will be obtained from Foursquare, a location data provider with information about all manner of venues and events within an area of interest, including venue names, categories, locations, menus, ratings and even pictures. We will use the [Foursquare API](https://developer.foursquare.com/docs/places-api/) to explore the **venues surrounding the coordinates of each neighborhood**, and then classify the neighborhoods based on the categories of those venues. 

### 2.2 Download and Explore Dataset

In [3]:
!wget -q -O 'uio_barrios_urbanos.json' https://raw.githubusercontent.com/andresborja42/Coursera_Capstone/master/uio_barrios_urbanos.json
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

In [4]:
with open('uio_barrios_urbanos.json') as json_data:
    barrios_uio = json.load(json_data)

In [None]:
barrios_uio

All the relevant data seems to be in the *features* key, which is a list of the neighborhoods. So, let's define a new variable that holds this list.
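As an alternative to the row-by-row loop used further on, pandas can flatten a list of nested dictionaries in a single call with `pd.json_normalize`. The minimal sketch below runs on a hypothetical two-item sample shaped like the first feature shown below (geometry omitted); on the real data you would pass `barrios_uio['features']` instead of `sample_features`.

```python
import pandas as pd

# Hypothetical sample shaped like the 'features' list of uio_barrios_urbanos.json
sample_features = [
    {'attributes': {'FID': 0, 'NOMBRE': 'NUEVA VIDA', 'BARRIO_ID': '01050014',
                    'Lat_y': -0.273928072625, 'Long_x': -78.5637147392}},
    {'attributes': {'FID': 1, 'NOMBRE': 'VENCEREMOS', 'BARRIO_ID': '01050026',
                    'Lat_y': -0.272159, 'Long_x': -78.564992}},
]

# Nested keys become dotted column names such as 'attributes.NOMBRE'
flat = pd.json_normalize(sample_features)

# Rename the columns used in the analysis and keep only those, in a fixed order
neighborhoods_sketch = flat.rename(columns={
    'attributes.BARRIO_ID': 'ID',
    'attributes.NOMBRE': 'Neighborhood',
    'attributes.Lat_y': 'Latitude',
    'attributes.Long_x': 'Longitude',
})[['ID', 'Neighborhood', 'Latitude', 'Longitude']]

print(neighborhoods_sketch)
```

Because `BARRIO_ID` is stored as a string in the JSON, the leading zero of an ID such as `'01050014'` is preserved.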

In [5]:
barrios_data = barrios_uio['features']

Let's take a look at the first item in this list.

In [7]:
barrios_data[0]

{'attributes': {'FID': 0,
  'NOMBRE': 'NUEVA VIDA',
  'BARRIO_ID': '01050014',
  'Lat_y': -0.273928072625,
  'Long_x': -78.5637147392},
 'geometry': {'rings': [[[492803.3856000006, 9969745.8419],
    [492867.42530000024, 9969782.9017],
    [493008.8746000007, 9969655.3123],
    [493001.6347000003, 9969645.5424],
    [492990.3047000002, 9969639.2724],
    [492982.8848000001, 9969635.9524],
    [492955.59490000084, 9969627.6625],
    [492919.9650999997, 9969613.6425],
    [492831.6054999996, 9969701.2221],
    [492803.3856000006, 9969745.8419]]]}}

#### Transform the data into a *pandas* dataframe

The next task is to transform this nested dictionary data into a *pandas* dataframe. Let's start by creating an empty dataframe.

In [6]:
# Define the dataframe columns
column_names = ['ID', 'Neighborhood', 'Latitude', 'Longitude'] 

# Instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

We take a look at the dataframe to make sure that the columns are as intended.

In [9]:
neighborhoods

| ID | Neighborhood | Latitude | Longitude |
|---|---|---|---|


Now let's loop through the data, collect one row per neighborhood, and then build the dataframe from those rows.

In [7]:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []
for data in barrios_data:
    attributes = data['attributes']
    rows.append({'ID': attributes['BARRIO_ID'],
                 'Neighborhood': attributes['NOMBRE'],
                 'Latitude': attributes['Lat_y'],
                 'Longitude': attributes['Long_x']})

# Build the dataframe in one step, keeping the column order defined above
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [11]:
neighborhoods.head()

| | ID | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 1050014 | NUEVA VIDA | -0.273928 | -78.563715 |
| 1 | 1050026 | VENCEREMOS | -0.272159 | -78.564992 |
| 2 | 4070005 | AVIACION CIVIL | -0.155464 | -78.487854 |
| 3 | 1050017 | S.MARTHA ALT CHIL | -0.275713 | -78.561793 |
| 4 | 2010005 | LA RAYA A | -0.255895 | -78.543519 |


In [12]:
print(neighborhoods.shape)

(517, 4)


### 2.3 Data cleaning
The municipality’s neighborhoods database includes all of the neighborhoods that make up the Metropolitan District of Quito, which is divided into urban and rural boroughs. There are a total of 1268 neighborhoods in the MD of Quito. We decided to work with the urban neighborhoods only, because rural areas are remote and isolated, with very low population density. Almost no venues could be found around the rural neighborhoods when exploring them through Foursquare, so it wouldn’t make sense to include them in our analysis. The urban area is made up of 517 neighborhoods, as the shape of the dataframe above shows. 

We now check for missing values. By exploring the dataframe, we realized that all values in the "Neighborhood" column that start with the string "SIN NOMBRE" (Spanish for "no name") are effectively missing values. Since we don't have names for those neighborhoods, we had better get rid of them. So we first replace the values that start with "SIN NOMBRE" with `np.nan`, the value pandas uses to represent missing data.

In [8]:
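# Cleaning the dataframe: mark neighborhoods named "SIN NOMBRE..." ("no name") as missing
# (np.NaN was removed in NumPy 2.0; np.nan is the canonical spelling)
neighborhoods.loc[neighborhoods['Neighborhood'].str.startswith("SIN NOMBRE"), 'Neighborhood'] = np.nan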
neighborhoods.head()

| | ID | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 1050014 | NUEVA VIDA | -0.273928 | -78.563715 |
| 1 | 1050026 | VENCEREMOS | -0.272159 | -78.564992 |
| 2 | 4070005 | AVIACION CIVIL | -0.155464 | -78.487854 |
| 3 | 1050017 | S.MARTHA ALT CHIL | -0.275713 | -78.561793 |
| 4 | 2010005 | LA RAYA A | -0.255895 | -78.543519 |


We verify that those values are now recognized as missing.

In [9]:
neighborhoods.isna().sum()

ID               0
Neighborhood    28
Latitude         0
Longitude        0
dtype: int64

Now we drop the 28 rows with missing names and reset the index.

In [10]:
neighborhoods = neighborhoods.dropna()
neighborhoods = neighborhoods.reset_index(drop=True)

In [16]:
neighborhoods.head()

| | ID | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 1050014 | NUEVA VIDA | -0.273928 | -78.563715 |
| 1 | 1050026 | VENCEREMOS | -0.272159 | -78.564992 |
| 2 | 4070005 | AVIACION CIVIL | -0.155464 | -78.487854 |
| 3 | 1050017 | S.MARTHA ALT CHIL | -0.275713 | -78.561793 |
| 4 | 2010005 | LA RAYA A | -0.255895 | -78.543519 |


In [17]:
print(neighborhoods.shape)

(489, 4)


So, 28 of these urban neighborhoods did not have a name assigned in the database, and we excluded them from the analysis as well. In the end, we kept **489 neighborhoods from Quito’s urban area**. 
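The mark-then-drop cleaning above can be reproduced on a small hypothetical frame, which makes it easy to check that only the placeholder names are removed. This is a sketch with made-up rows, not the real municipal data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; two names start with the "SIN NOMBRE" ("no name") placeholder
toy = pd.DataFrame({
    'ID': ['01', '02', '03', '04'],
    'Neighborhood': ['NUEVA VIDA', 'SIN NOMBRE 12', 'VENCEREMOS', 'SIN NOMBRE'],
    'Latitude': [-0.27, -0.28, -0.27, -0.26],
    'Longitude': [-78.56, -78.57, -78.56, -78.55],
})

# Step 1: mark placeholder names as missing
placeholder = toy['Neighborhood'].str.startswith('SIN NOMBRE')
toy.loc[placeholder, 'Neighborhood'] = np.nan
print(toy['Neighborhood'].isna().sum())  # prints 2

# Step 2: drop rows whose name is missing and renumber the index from 0
cleaned = toy.dropna(subset=['Neighborhood']).reset_index(drop=True)
print(cleaned)
```

Here `dropna(subset=['Neighborhood'])` restricts the check to the name column. The notebook's plain `dropna()` checks every column, which gives the same result because the other columns have no missing values, as the `isna().sum()` output above shows.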

#### Use geopy library to get the latitude and longitude values of Quito.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>uio_explorer</em>, as shown below.

In [11]:
address = 'Quito, EC'

geolocator = Nominatim(user_agent="uio_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Quito are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Quito are -0.2201641, -78.5123274.


#### Create a map of Quito with neighborhoods superimposed on top.

In [12]:
# create map of Quito using latitude and longitude values
map_uio = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, ID, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['ID'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, ID)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_uio)  
    
map_uio

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'IT10XUOI4FLKLXPX1HGNZ1SORXYGADP0UHVYLG5BQ0KHM3Z2' 
CLIENT_SECRET = 'FPXPR1NXOEZPETRUTNY43TQH3ZLDNPLVIFLDOPYDZ3HTDXFX' 
VERSION = '20200619' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IT10XUOI4FLKLXPX1HGNZ1SORXYGADP0UHVYLG5BQ0KHM3Z2
CLIENT_SECRET:FPXPR1NXOEZPETRUTNY43TQH3ZLDNPLVIFLDOPYDZ3HTDXFX


#### Let's explore the first neighborhood in our dataframe.

In [14]:
neighborhoods.loc[0, 'Neighborhood']

'NUEVA VIDA'

Get the neighborhood's latitude and longitude values.

In [15]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of NUEVA VIDA are -0.273928072625, -78.5637147392.


#### Now, let's get the top 50 venues within a radius of 1000 meters of Nueva Vida.

First, let's create the GET request URL.

In [16]:
LIMIT = 50 # maximum number of venues returned by Foursquare
radius = 1000 # search radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=IT10XUOI4FLKLXPX1HGNZ1SORXYGADP0UHVYLG5BQ0KHM3Z2&client_secret=FPXPR1NXOEZPETRUTNY43TQH3ZLDNPLVIFLDOPYDZ3HTDXFX&v=20200619&ll=-0.273928072625,-78.5637147392&radius=1000&limit=50'
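Formatting the query string by hand works, but the values are inserted without any escaping. A sketch of an alternative, assuming the same endpoint and parameters as the cell above, builds the query with the standard library's `urllib.parse.urlencode`; the credentials here are placeholders, not real keys.

```python
from urllib.parse import urlencode

# Placeholder credentials: substitute your own Foursquare keys
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20200619',
    'll': '{},{}'.format(-0.273928072625, -78.5637147392),
    'radius': 1000,  # meters
    'limit': 50,     # maximum number of venues
}

# urlencode escapes each value, e.g. the comma inside 'll' becomes %2C
url_sketch = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(url_sketch)
```

This is standard URL encoding, which web servers decode transparently. With the `requests` library, calling `requests.get(base_url, params=params)` performs the same encoding for you.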

Send the GET request and examine the results.

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef529cf513bb60f51e6392f'},
  'headerLocation': 'Chillogallo',
  'headerFullLocation': 'Chillogallo, Quito',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': -0.264928063624991,
    'lng': -78.55473143157906},
   'sw': {'lat': -0.28292808162500904, 'lng': -78.57269804682093}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5340231c498e0578b618c49d',
       'name': 'Encebollado Manabita',
       'location': {'lat': -0.27782320976257324,
        'lng': -78.55779266357422,
        'labeledLatLngs': [{'label': 'display',
          'lat': -0.27782320976257324,
          'lng': -78.55779266357422}],
        'distance': 789,
        'cc': 'EC',
        'country': 'Ecuador',
        'for

In [19]:
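# function that extracts the category of the venue
def get_category_type(row):
    # the categories may sit under either column name, depending on how the row was built
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the first category name, or None if the venue has no categories
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']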

In [20]:
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Encebollado Manabita | Latin American Restaurant | -0.277823 | -78.557793 |
| 1 | Fritadas El Juncal | BBQ Joint | -0.272065 | -78.556384 |
| 2 | El Español | Breakfast Spot | -0.2743 | -78.555 |
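The cell above depends on a live API response. The sketch below reproduces the same flatten, extract and rename steps offline on hypothetical items shaped like `results['response']['groups'][0]['items']`, including a venue with an empty categories list, which is the case `get_category_type` guards against.

```python
import pandas as pd

# Hypothetical items shaped like results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Encebollado Manabita',
               'categories': [{'name': 'Latin American Restaurant'}],
               'location': {'lat': -0.277823, 'lng': -78.557793}}},
    {'venue': {'name': 'Unnamed Spot', 'categories': [],
               'location': {'lat': -0.2743, 'lng': -78.555}}},
]

# Nested keys become 'venue.name', 'venue.location.lat', ...; lists are left as-is
venues_flat = pd.json_normalize(items)
venues_flat = venues_flat[['venue.name', 'venue.categories',
                           'venue.location.lat', 'venue.location.lng']].copy()

# Keep the first category name, or None when the list is empty
venues_flat['venue.categories'] = venues_flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if len(cats) > 0 else None)

# Drop the dotted prefixes: 'venue.location.lat' -> 'lat'
venues_flat.columns = [col.split('.')[-1] for col in venues_flat.columns]
print(venues_flat)
```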


In [21]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


We now have access to both the neighborhoods and the venues datasets. We can now start exploring the neighborhoods based on their surrounding venues.

## 3. Data analysis

### 3.1 Exploring the Neighborhoods in Quito

#### We create a function to repeat the same process for all the neighborhoods in Quito

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (venues without a category get None instead of raising an IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-neighborhood lists into one row per venue;
    # passing the column names here also works when no venues were returned
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood', 
                                          'Neighborhood Latitude', 
                                          'Neighborhood Longitude', 
                                          'Venue', 
                                          'Venue Latitude', 
                                          'Venue Longitude', 
                                          'Venue Category'])

    return(nearby_venues)
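The last step of `getNearbyVenues` flattens a list of per-neighborhood lists into one row per venue. The sketch below runs that step on hypothetical, hard-coded tuples, so it needs no API access. Note that a neighborhood for which Foursquare returns no venues contributes no rows, so it may silently disappear from *uio_venues*.

```python
import pandas as pd

# Hypothetical per-neighborhood results: one inner list of venue tuples per neighborhood
venues_list = [
    [('NUEVA VIDA', -0.2739, -78.5637, 'S29C Y OE12E', -0.2723, -78.5630, 'Bus Stop'),
     ('NUEVA VIDA', -0.2739, -78.5637, 'Mariachi Fiesta Mexicana', -0.2758, -78.5654, 'Convention Center')],
    [],  # a neighborhood where no venues were returned
    [('AVIACION CIVIL', -0.1555, -78.4879, 'Cachorros', -0.1540, -78.4912, 'Gym')],
]

# Reads like two nested for-loops: for each inner list, keep every tuple in it
rows = [item for venue_list in venues_list for item in venue_list]

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
toy_venues = pd.DataFrame(rows, columns=columns)
print(toy_venues.shape)  # (3, 7)
```

Passing `columns=` at construction time, rather than assigning `.columns` afterwards, also keeps this step from failing with a length mismatch if no venues at all were returned.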

#### Now we run the above function on each neighborhood and create a new dataframe called *uio_venues*.

In [23]:
uio_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

NUEVA VIDA
VENCEREMOS
AVIACION CIVIL
S.MARTHA ALT CHIL
LA RAYA A
RUPERTO ALARCON
SAN ANTONIO
LA LIBERTAD BAJO
STA BARBARA ALTA
SOLANDA
CELAUR
EUGENIO ESPEJO
SANTIAGO 1
S.ROSA CHIL 3ETP
S.TOSPAMBA
UNIVERSI CENTRAL
LOS PEDESTALES
QUITO W
20 DE MAYO
PRESIDENCIA REPUBLICA
S. TOSPAMBA
FRANKLIN TELLO
S.FERNANDO
VISTA HERMOSA
COLINAS DEL SUR
FELIXRIVADENEIRA
DAMMER 2
STA. BARBARA BAJA
TACHINA 2
LA MERCED
ANA MARIA
ALEGRIA N 1
STA.ROSA SINGUNA
LOS CIPRESES
SANTA INES 2
CAUSAYLLACTA
EL GUABO
S.ROSA ALTA CHIL
HDA GUAPULO
STA BARBARA BAJA
LOS CIPRESES
MIRAFLORES ALTO
S.CARLOS VENCEN
EL TRANSITO
CAMILO PONCE
PROF MUNICIPALES
LA ISLA
AREA VERDE
LA FLORIDA
MALDONADO
MADRIGAL
FERROVIARIA ALTA
VIRGENPATA
S.PEDRO MONJAS
S.AGUSTIN
EL PARAISO
BELLA MARIA
LOTIZ QUINGAIZA
ESPERANZA BAR
JARDINES DEL BATAN
FERROVIARIA MEDIA
MIRAFLORES BAJO
COND.S.PICHINCHA
EL ROCIO
LA ESTANCIA
STA.LUCIA ALTA
GERMAN AVILA
ARGENTINA
JOSEFINAENRIQUEZ
EL CARMEN
PALUCO B
LIBERTAD BAJO
LOS ARRAYANES
DAMMER
EL EDEN
EL PEDREGAL
MIRA

#### Let's check the size of the resulting dataframe

In [24]:
print(uio_venues.shape)
uio_venues.head(10)

(2547, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | NUEVA VIDA | -0.273928 | -78.563715 | S29C Y OE12E | -0.272326 | -78.563025 | Bus Stop |
| 1 | NUEVA VIDA | -0.273928 | -78.563715 | Mariachi Fiesta Mexicana | -0.275757 | -78.565411 | Convention Center |
| 2 | VENCEREMOS | -0.272159 | -78.564992 | S29C Y OE12E | -0.272326 | -78.563025 | Bus Stop |
| 3 | VENCEREMOS | -0.272159 | -78.564992 | Mariachi Fiesta Mexicana | -0.275757 | -78.565411 | Convention Center |
| 4 | AVIACION CIVIL | -0.155464 | -78.487854 | Menestras del Primo | -0.155252 | -78.488654 | Restaurant |
| 5 | AVIACION CIVIL | -0.155464 | -78.487854 | Cachorros | -0.153951 | -78.491237 | Gym |
| 6 | AVIACION CIVIL | -0.155464 | -78.487854 | La Michoacana | -0.158246 | -78.490567 | Mexican Restaurant |
| 7 | AVIACION CIVIL | -0.155464 | -78.487854 | El Manglar De Las Conchas | -0.155171 | -78.488566 | Seafood Restaurant |
| 8 | AVIACION CIVIL | -0.155464 | -78.487854 | Los legitimos helados de paila de la Concepcion | -0.153525 | -78.491681 | Ice Cream Shop |
| 9 | AVIACION CIVIL | -0.155464 | -78.487854 | La tortilla | -0.153528 | -78.490682 | Arepa Restaurant |


We now check how many venues were returned for each neighborhood.

In [25]:
uio_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| 10 DE JUNIO | 4 | 4 | 4 | 4 | 4 | 4 |
| 1RA ZONA AEREA | 4 | 4 | 4 | 4 | 4 | 4 |
| 1RO MAYO MONJAS | 5 | 5 | 5 | 5 | 5 | 5 |
| 2 DE FEBRERO | 5 | 5 | 5 | 5 | 5 | 5 |
| 23 DE MAYO | 1 | 1 | 1 | 1 | 1 | 1 |
| 23 JUNIO BARRIO | 4 | 4 | 4 | 4 | 4 | 4 |
| 6 DE DICIEMBRE | 26 | 26 | 26 | 26 | 26 | 26 |
| AEREONAUTICO | 8 | 8 | 8 | 8 | 8 | 8 |
| AEROPUERTO | 14 | 14 | 14 | 14 | 14 | 14 |
| AGUA CLARA | 5 | 5 | 5 | 5 | 5 | 5 |
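Because none of the columns contain missing values, `groupby(...).count()` repeats the same number in every column. A sketch of a more compact alternative, shown on a hypothetical toy table, uses `groupby(...).size()`, which returns a single count per neighborhood.

```python
import pandas as pd

# Hypothetical long-format venues table (one row per venue)
toy_venues = pd.DataFrame({
    'Neighborhood': ['NUEVA VIDA', 'NUEVA VIDA', 'VENCEREMOS',
                     'AVIACION CIVIL', 'AVIACION CIVIL', 'AVIACION CIVIL'],
    'Venue Category': ['Bus Stop', 'Convention Center', 'Bus Stop',
                       'Restaurant', 'Gym', 'Mexican Restaurant'],
})

# One count per neighborhood instead of one identical count per column
venues_per_neighborhood = toy_venues.groupby('Neighborhood').size()
print(venues_per_neighborhood.sort_values(ascending=False))
```

The two differ only when values are missing: `count()` skips missing values column by column, while `size()` counts every row.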


#### Let's find out how many unique categories can be curated from all the returned venues

In [26]:
print('There are {} unique categories.'.format(len(uio_venues['Venue Category'].unique())))

There are 217 unique categories.


### 3.2 Analyzing each neighborhood's venue categories

**One-hot encoding** is a process that converts a categorical variable into a set of binary (0/1) columns, one per category, so that it can be used by ML algorithms that operate on numeric vectors. For the k-means clustering algorithm, all unique items under "Venue Category" are one-hot encoded.
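A toy illustration of the encoding, on three hypothetical venues with two distinct categories, shows what each row looks like. It uses the same `prefix=""` and `prefix_sep=""` trick as the notebook so that the new columns are named after the categories themselves.

```python
import pandas as pd

# Hypothetical venues: three rows, two distinct categories
toy = pd.DataFrame({
    'Neighborhood': ['NUEVA VIDA', 'NUEVA VIDA', 'VENCEREMOS'],
    'Venue Category': ['Bus Stop', 'Convention Center', 'Bus Stop'],
})

# One 0/1 column per distinct category; dtype=int gives 0/1 rather than True/False
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# Put the neighborhood name back as the first column
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])
print(onehot)
```

Each row has exactly one 1, in the column of that venue's category. `insert(0, ...)` places the neighborhood column first in a single step, as an alternative to reordering the columns afterwards.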

In [27]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.0 and later return True/False by default)
uio_onehot = pd.get_dummies(uio_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
uio_onehot['Neighborhood'] = uio_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [uio_onehot.columns[-1]] + list(uio_onehot.columns[:-1])
uio_onehot = uio_onehot[fixed_columns]

uio_onehot.head()

Unnamed: 0,Neighborhood,Airport Terminal,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bike Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Cantonese Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Entertainment Service,Event Service,Event Space,Factory,Farmers Market,Fast Food Restaurant,Field,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Funeral Home,Furniture / Home Store,Garden,Gastropub,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Other Great Outdoors,Other 
Repair Shop,Outlet Mall,Paella Restaurant,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stadium,Stationery Store,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Theater,Theme Park,Tourist Information Center,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Zoo
0,NUEVA VIDA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,NUEVA VIDA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,VENCEREMOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,VENCEREMOS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,AVIACION CIVIL,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [28]:
uio_onehot.shape

(2547, 218)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [29]:
uio_grouped = uio_onehot.groupby('Neighborhood').mean().reset_index()
uio_grouped

Unnamed: 0,Neighborhood,Airport Terminal,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Garden,Big Box Store,Bike Shop,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Cantonese Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Entertainment Service,Event Service,Event Space,Factory,Farmers Market,Fast Food Restaurant,Field,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Funeral Home,Furniture / Home Store,Garden,Gastropub,General Entertainment,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Mountain,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Other Great Outdoors,Other 
Repair Shop,Outlet Mall,Paella Restaurant,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Print Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Stadium,Stationery Store,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Theater,Theme Park,Tourist Information Center,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Zoo
0,10 DE JUNIO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1RA ZONA AEREA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1RO MAYO MONJAS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2 DE FEBRERO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,23 DE MAYO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,23 JUNIO BARRIO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,6 DE DICIEMBRE,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.307692,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.076923,0.0,0.0
7,AEREONAUTICO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,AEROPUERTO,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,AGUA CLARA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [30]:
uio_grouped.shape

(355, 218)
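Each row of `uio_grouped` holds, for one neighborhood, the share of its nearby venues that falls into each venue category. This is why a neighborhood with four venues in four different categories, such as 10 DE JUNIO, shows 0.25 four times. The stdlib sketch below reproduces that computation on hypothetical toy data. It assumes that the grouped table was built by one-hot encoding each venue's category and averaging per neighborhood, which is consistent with the values shown above.

```python
from collections import Counter

def category_frequencies(venues_by_hood, categories):
    """Return {neighborhood: [share of its venues in each category]}.

    Equivalent to one-hot encoding every venue's category and taking
    the mean per neighborhood.
    """
    table = {}
    for hood, venue_categories in venues_by_hood.items():
        counts = Counter(venue_categories)
        total = len(venue_categories)
        table[hood] = [counts[c] / total for c in categories]
    return table

# Hypothetical toy data: the list of venue categories found near each neighborhood
venues_by_hood = {
    "HOOD A": ["Restaurant", "Gym", "Shopping Mall", "Park"],
    "HOOD B": ["Hotel"],
    "HOOD C": ["Pizza Place", "Pizza Place", "Coffee Shop", "Hotel"],
}
categories = sorted({c for cats in venues_by_hood.values() for c in cats})
freqs = category_frequencies(venues_by_hood, categories)

print(categories)
for hood, row in freqs.items():
    print(hood, row)
```

The toy table has 3 rows and 7 category columns. By the same logic, the shape (355, 218) corresponds to 355 neighborhoods with one name column plus 217 venue-category columns.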

#### We now print each neighborhood along with the top 5 most common venues in it

In [31]:
num_top_venues = 5

for hood in uio_grouped['Neighborhood']:
    print("----" + hood + "----")
    # turn the neighborhood's row into a two-column (venue, frequency) table
    temp = uio_grouped[uio_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:].copy()  # skip the 'Neighborhood' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----10 DE JUNIO----
                  venue  freq
0  Gym / Fitness Center  0.25
1            Restaurant  0.25
2    Athletics & Sports  0.25
3         Shopping Mall  0.25
4      Airport Terminal  0.00


----1RA ZONA AEREA----
                venue  freq
0         Bus Station  0.25
1         Coffee Shop  0.25
2  Seafood Restaurant  0.25
3         Pizza Place  0.25
4    Airport Terminal  0.00


----1RO MAYO MONJAS----
                        venue  freq
0  Construction & Landscaping   0.2
1                        Park   0.2
2               Auto Workshop   0.2
3                   BBQ Joint   0.2
4          Seafood Restaurant   0.2


----2 DE FEBRERO----
                venue  freq
0          Restaurant   0.2
1            Pharmacy   0.2
2    Business Service   0.2
3    Asian Restaurant   0.2
4  Seafood Restaurant   0.2


----23 DE MAYO----
                  venue  freq
0                 Hotel   1.0
1      Airport Terminal   0.0
2             Pet Store   0.0
3          Noodle House   0.0
4  

#### We save that into a *pandas* dataframe

First, we write a function that sorts a neighborhood's venue categories by frequency in descending order and returns the names of the top `num_top_venues`.

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
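`return_most_common_venues` always returns exactly `num_top_venues` names. A neighborhood with fewer distinct categories than that is therefore padded with categories whose frequency is 0, and the relative order of those padded categories carries no information. This padding explains entries such as Zoo, Flea Market or Field in the tables below. The stdlib sketch shows one possible variant that keeps only categories that actually occur and breaks ties alphabetically. The name `top_venues_nonzero` is hypothetical and is not part of the notebook.

```python
def top_venues_nonzero(frequencies, num_top_venues):
    """Return up to num_top_venues category names whose frequency is > 0.

    frequencies: dict mapping venue category -> share of the neighborhood's venues.
    Ties are broken alphabetically so the output is deterministic.
    """
    present = [(cat, f) for cat, f in frequencies.items() if f > 0]
    # sort by descending frequency, then by category name
    present.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in present[:num_top_venues]]

# Toy row shaped like "10 DE JUNIO": four categories at 0.25, everything else 0
row = {"Restaurant": 0.25, "Athletics & Sports": 0.25, "Gym / Fitness Center": 0.25,
       "Shopping Mall": 0.25, "Zoo": 0.0, "Field": 0.0, "Flea Market": 0.0}
print(top_venues_nonzero(row, 10))
# ['Athletics & Sports', 'Gym / Fitness Center', 'Restaurant', 'Shopping Mall']
```

If a fixed-width table is still needed, the unused positions could be filled with an empty string or NaN instead of a zero-frequency category.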

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... all use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = uio_grouped['Neighborhood']

for ind in np.arange(uio_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(uio_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10 DE JUNIO,Restaurant,Athletics & Sports,Gym / Fitness Center,Shopping Mall,Zoo,Donut Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
1,1RA ZONA AEREA,Bus Station,Coffee Shop,Pizza Place,Seafood Restaurant,Zoo,Electronics Store,Flea Market,Field,Fast Food Restaurant,Farmers Market
2,1RO MAYO MONJAS,Park,Construction & Landscaping,Auto Workshop,BBQ Joint,Seafood Restaurant,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant
3,2 DE FEBRERO,Business Service,Restaurant,Asian Restaurant,Pharmacy,Seafood Restaurant,Zoo,Electronics Store,Flea Market,Field,Fast Food Restaurant
4,23 DE MAYO,Hotel,Zoo,Donut Shop,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
5,23 JUNIO BARRIO,Bakery,School,Tourist Information Center,Market,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
6,6 DE DICIEMBRE,Seafood Restaurant,Pizza Place,Wings Joint,South American Restaurant,Gym,Peruvian Restaurant,Rental Car Location,Restaurant,Chinese Restaurant,Café
7,AEREONAUTICO,Soccer Field,Farmers Market,Latin American Restaurant,Gym / Fitness Center,Burger Joint,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field
8,AEROPUERTO,Pizza Place,Pharmacy,Airport Terminal,Tourist Information Center,Park,Department Store,Chinese Restaurant,Seafood Restaurant,Bus Station,Bar
9,AGUA CLARA,Fast Food Restaurant,Pizza Place,Seafood Restaurant,BBQ Joint,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Farmers Market
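The `indicators` list above handles only 1st to 3rd and uses 'th' for everything else. This gives correct names up to 20, but it would produce '21th' if `num_top_venues` were raised past 20. The sketch below shows a general ordinal-suffix helper that is one way to make the column names robust. The name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column names as the cell above, for the top 10 venues
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[10])
```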


### 3.3 Clustering Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [34]:
# set number of clusters
kclusters = 5

# k-means needs only the numeric venue frequencies, so drop the name column
uio_grouped_clustering = uio_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, n_init=25, random_state=0).fit(uio_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 3, 3, 3, 4, 4, 3, 4, 4, 3], dtype=int32)
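The stdlib sketch below illustrates what `KMeans` does with these frequency vectors, using Lloyd's algorithm. Each point is assigned to its nearest centroid by squared Euclidean distance. Each centroid is then recomputed as the mean of its points, and the two steps repeat until no assignment changes. This is a simplification and not scikit-learn's implementation. It seeds the centroids with the first *k* points instead of k-means++, and it runs a single initialization instead of `n_init=25`. The two features in the toy data, a share of restaurants and a share of parks, are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding, not k-means++
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy neighborhoods: [share of restaurants, share of parks]
points = [[0.8, 0.1], [0.0, 0.9], [0.7, 0.2], [0.1, 1.0], [0.9, 0.0]]
labels, centroids = kmeans(points, 2)
print(labels)
print(centroids)
```

With `n_init=25`, scikit-learn repeats the whole procedure from 25 different seedings and keeps the run with the lowest inertia, which is the sum of squared distances to the nearest centroid. This makes the result less sensitive to an unlucky start.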

We now create a new dataframe that combines each neighborhood's coordinates with its cluster label and its top 10 venues.

In [35]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

uio_merged = neighborhoods

# join 'neighborhoods' (which holds latitude/longitude) with the cluster labels and top venues; neighborhoods without any venues get NaN
uio_merged = uio_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [36]:
uio_merged.head(10)

Unnamed: 0,ID,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1050014,NUEVA VIDA,-0.273928,-78.563715,4.0,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
1,1050026,VENCEREMOS,-0.272159,-78.564992,4.0,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
2,4070005,AVIACION CIVIL,-0.155464,-78.487854,4.0,Restaurant,Pizza Place,Pharmacy,Seafood Restaurant,Sculpture Garden,Chinese Restaurant,Mexican Restaurant,Gym,Pakistani Restaurant,Fried Chicken Joint
3,1050017,S.MARTHA ALT CHIL,-0.275713,-78.561793,4.0,Bus Stop,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
4,2010005,LA RAYA A,-0.255895,-78.543519,4.0,Restaurant,Cafeteria,Pizza Place,Latin American Restaurant,Garden,BBQ Joint,Zoo,Empanada Restaurant,Flea Market,Field
5,4060013,RUPERTO ALARCON,-0.1237,-78.514805,4.0,Electronics Store,Zoo,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
6,1050029,SAN ANTONIO,-0.28827,-78.578933,,,,,,,,,,,
7,3020006,LA LIBERTAD BAJO,-0.224198,-78.524245,,,,,,,,,,,
8,2010015,STA BARBARA ALTA,-0.262981,-78.56095,,,,,,,,,,,
9,2020010,SOLANDA,-0.268909,-78.539185,4.0,Convenience Store,Plaza,Dessert Shop,Big Box Store,Park,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field


Some neighborhoods have no venues at all within the 500-meter radius used in the venue search. These are probably marginal neighborhoods, located far away from populated areas. They never entered the clustering step, because `uio_grouped` only contains neighborhoods with at least one venue, so the join left their cluster label and venue columns empty (NaN). A group of neighborhoods without any nearby venues would not serve the purpose of our analysis anyway, so we drop the neighborhoods that contain NaN values.
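The NaN rows in the table above come from the join. `DataFrame.join` performs a left join by default: it keeps every row of `neighborhoods` and fills the new columns with NaN wherever a neighborhood has no match in `neighborhoods_venues_sorted`. The stdlib sketch below imitates that behavior with dictionaries, using `None` in place of NaN and two rows taken from the table above. It then drops the unmatched rows, which is what `dropna` does in the next cell. The helper name `left_join` is hypothetical.

```python
def left_join(rows, lookup, key, extra_columns):
    """Keep every row; copy extra columns from lookup[row[key]] or fill them with None."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        joined.append({**row, **{col: match.get(col) for col in extra_columns}})
    return joined

# Coordinates exist for every neighborhood ...
neighborhoods = [
    {"Neighborhood": "NUEVA VIDA", "Latitude": -0.273928, "Longitude": -78.563715},
    {"Neighborhood": "SAN ANTONIO", "Latitude": -0.28827, "Longitude": -78.578933},
]
# ... but cluster results exist only for neighborhoods with at least one venue
clusters = {"NUEVA VIDA": {"Cluster Labels": 4, "1st Most Common Venue": "Convention Center"}}

merged = left_join(neighborhoods, clusters, "Neighborhood",
                   ["Cluster Labels", "1st Most Common Venue"])
# equivalent of dropna(): keep only rows without missing values
complete = [row for row in merged if all(v is not None for v in row.values())]
print(len(merged), len(complete))
```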

In [37]:
# drop neighborhoods that have no venues (NaN cluster label and venue columns)
uio_merged.dropna(axis='index', inplace=True)
uio_merged.reset_index(inplace=True)
uio_merged.head(10)

Unnamed: 0,index,ID,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,1050014,NUEVA VIDA,-0.273928,-78.563715,4.0,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
1,1,1050026,VENCEREMOS,-0.272159,-78.564992,4.0,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
2,2,4070005,AVIACION CIVIL,-0.155464,-78.487854,4.0,Restaurant,Pizza Place,Pharmacy,Seafood Restaurant,Sculpture Garden,Chinese Restaurant,Mexican Restaurant,Gym,Pakistani Restaurant,Fried Chicken Joint
3,3,1050017,S.MARTHA ALT CHIL,-0.275713,-78.561793,4.0,Bus Stop,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
4,4,2010005,LA RAYA A,-0.255895,-78.543519,4.0,Restaurant,Cafeteria,Pizza Place,Latin American Restaurant,Garden,BBQ Joint,Zoo,Empanada Restaurant,Flea Market,Field
5,5,4060013,RUPERTO ALARCON,-0.1237,-78.514805,4.0,Electronics Store,Zoo,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
6,9,2020010,SOLANDA,-0.268909,-78.539185,4.0,Convenience Store,Plaza,Dessert Shop,Big Box Store,Park,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field
7,11,1050009,EUGENIO ESPEJO,-0.273179,-78.5582,4.0,Breakfast Spot,Restaurant,BBQ Joint,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant
8,12,2060012,SANTIAGO 1,-0.250375,-78.5394,4.0,Supermarket,Pizza Place,Pharmacy,Shopping Mall,Zoo,Donut Shop,Field,Fast Food Restaurant,Farmers Market,Factory
9,13,1050021,S.ROSA CHIL 3ETP,-0.272262,-78.562653,4.0,Convention Center,Electronics Store,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space


Next, we convert the 'Cluster Labels' column to integers. It became a float column because the join introduced NaN values for neighborhoods without venues, and the map we create below needs integer labels to index its list of colors.

In [38]:
uio_merged['Cluster Labels'] = uio_merged['Cluster Labels'].astype(int)

In [39]:
uio_merged.head(10)

Unnamed: 0,index,ID,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,1050014,NUEVA VIDA,-0.273928,-78.563715,4,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
1,1,1050026,VENCEREMOS,-0.272159,-78.564992,4,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
2,2,4070005,AVIACION CIVIL,-0.155464,-78.487854,4,Restaurant,Pizza Place,Pharmacy,Seafood Restaurant,Sculpture Garden,Chinese Restaurant,Mexican Restaurant,Gym,Pakistani Restaurant,Fried Chicken Joint
3,3,1050017,S.MARTHA ALT CHIL,-0.275713,-78.561793,4,Bus Stop,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
4,4,2010005,LA RAYA A,-0.255895,-78.543519,4,Restaurant,Cafeteria,Pizza Place,Latin American Restaurant,Garden,BBQ Joint,Zoo,Empanada Restaurant,Flea Market,Field
5,5,4060013,RUPERTO ALARCON,-0.1237,-78.514805,4,Electronics Store,Zoo,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
6,9,2020010,SOLANDA,-0.268909,-78.539185,4,Convenience Store,Plaza,Dessert Shop,Big Box Store,Park,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field
7,11,1050009,EUGENIO ESPEJO,-0.273179,-78.5582,4,Breakfast Spot,Restaurant,BBQ Joint,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant
8,12,2060012,SANTIAGO 1,-0.250375,-78.5394,4,Supermarket,Pizza Place,Pharmacy,Shopping Mall,Zoo,Donut Shop,Field,Fast Food Restaurant,Farmers Market,Factory
9,13,1050021,S.ROSA CHIL 3ETP,-0.272262,-78.562653,4,Convention Center,Electronics Store,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space


Finally, let's visualize the resulting clusters on a map.

In [41]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster label (0 to kclusters - 1)
for lat, lon, poi, cluster in zip(uio_merged['Latitude'], uio_merged['Longitude'], uio_merged['Neighborhood'], uio_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels start at 0, so they index the list directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
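The color scheme above assigns each cluster one evenly spaced color from matplotlib's `rainbow` colormap and looks up colors by cluster label. The sketch below illustrates the same idea without dependencies. It is not the `rainbow` colormap itself: it spreads `kclusters` hues evenly using the standard-library `colorsys` module and converts them to hex strings. Because *k*-means labels run from 0 to `kclusters - 1`, indexing the palette with the label itself gives each cluster its own color. The helper name `evenly_spaced_hex_colors` is hypothetical.

```python
import colorsys

def evenly_spaced_hex_colors(n):
    """Return n hex colors whose hues are spread evenly around the color wheel."""
    palette = []
    for i in range(n):
        # divide by n (not n - 1) so the first and last hues are not both red
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

kclusters = 5
palette = evenly_spaced_hex_colors(kclusters)
for label in range(kclusters):
    print(label, palette[label])
```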

### 3.4 Examining Resulting Clusters

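Now we can examine each cluster and identify the venue categories that distinguish it from the others. Based on these defining categories, we could then give each cluster a name. Note that *k*-means labels the clusters 0 to 4, whereas the discussion below numbers them 1 to 5 (Cluster 1 has label 0, and so on).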
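One way to find the discriminating categories is to average the venue-frequency vectors of the neighborhoods in each cluster, which is essentially what the *k*-means centroids are, and then list the categories with the highest average share. The stdlib sketch below does this on hypothetical toy data. In the notebook, similar numbers could be read from `kmeans.cluster_centers_`, whose columns follow the venue columns of `uio_grouped_clustering`. The helper name `cluster_profiles` is hypothetical.

```python
def cluster_profiles(freq_rows, labels, categories, top_n=3):
    """Average each cluster's frequency vectors and return its top categories.

    freq_rows: one list of category frequencies per neighborhood
    labels: cluster label of each row, as produced by k-means
    Only categories with a non-zero average share are reported.
    """
    profiles = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(freq_rows, labels) if lab == label]
        mean = [sum(col) / len(members) for col in zip(*members)]
        ranked = sorted(zip(categories, mean), key=lambda item: (-item[1], item[0]))
        profiles[label] = [(cat, round(share, 3)) for cat, share in ranked[:top_n] if share > 0]
    return profiles

# Hypothetical toy data: four neighborhoods, four venue categories
categories = ["Bus Station", "Gym", "Park", "Restaurant"]
freq_rows = [
    [0.0, 0.0, 1.0, 0.0],  # park-only neighborhood
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],  # bus-station-only neighborhood
    [0.5, 0.5, 0.0, 0.0],
]
labels = [0, 0, 1, 1]
profiles = cluster_profiles(freq_rows, labels, categories)
for label, top in profiles.items():
    print("Cluster", label + 1, top)  # 1-based names, as in the discussion below
```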

#### Cluster 1

In [41]:
uio_merged.loc[uio_merged['Cluster Labels'] == 0, uio_merged.columns[[2] + list(range(5, uio_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,STA. BARBARA BAJA,0,Park,Convenience Store,Nightclub,Plaza,Zoo,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
27,MIRAFLORES ALTO,0,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
109,SAN SALVADOR,0,Scenic Lookout,Park,Zoo,Donut Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
158,TOCTIUCO,0,Business Service,Park,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
199,ARGELIA INTERMEDIA,0,Burger Joint,Park,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
236,PAVON GRIJALVA,0,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
276,CONSEJO PROVINCIAL,0,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
298,CAMPO ALEGRE,0,Arts & Entertainment,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
303,LAS BROMELIAS,0,Arts & Entertainment,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
325,CAMPO ALEGRE,0,Arts & Entertainment,Park,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market


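This cluster's most common venues are parks, which are often the only venue category present. The entries that follow them (Zoo, Flea Market, Field and so on) are zero-frequency padding rather than actual venues. A cluster dominated by parks would not be a good match for Starbucks.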

#### Cluster 2

In [42]:
uio_merged.loc[uio_merged['Cluster Labels'] == 1, uio_merged.columns[[2] + list(range(5, uio_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,CAUSAYLLACTA,1,Bus Station,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
43,EL ROCIO,1,Bus Station,Fast Food Restaurant,Farmers Market,Soccer Stadium,Grocery Store,Zoo,Food & Drink Shop,Flower Shop,Flea Market,Field
68,LA TOLA,1,Brewery,Park,Bus Station,Hostel,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
71,AREA DE PROTECCION,1,Bus Station,Art Museum,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
106,ALVARO PEREZ INDEPENDIENTE,1,Convenience Store,Bus Station,Department Store,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
133,LULUNCOTO,1,Bus Station,Breakfast Spot,Art Museum,Zoo,Entertainment Service,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant
157,HUAYRALLACTA,1,Bus Station,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
165,EL COMERCIO,1,Bus Station,Pharmacy,Auto Garage,Pet Store,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field,Fast Food Restaurant
166,LOS LIBERTADORES,1,Martial Arts Dojo,Bus Station,Furniture / Home Store,Bar,Zoo,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant
211,LA LOMA,1,Bus Station,Latin American Restaurant,Park,Wine Bar,Bar,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field


Neighborhoods in cluster 2 are dominated by bus stations, along with a few miscellaneous categories that are largely unrelated to the food and beverage industry. Again, not the best match for our Starbucks shop.

#### Cluster 3

In [43]:
uio_merged.loc[uio_merged['Cluster Labels'] == 2, uio_merged.columns[[2] + list(range(5, uio_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,COLINAS DEL SUR,2,Gym,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space,Event Service
44,LA ESTANCIA,2,Gym,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space,Event Service
143,YAGUACHI,2,Gym,Fried Chicken Joint,Food & Drink Shop,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
177,LA LIBERTAD,2,Gym,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space,Event Service
314,SANTA LUCICIA 2,2,Gym,Gift Shop,Food & Drink Shop,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
320,S.FRANC HUARCAY,2,Gym,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space,Event Service
359,BUENOS AIRES,2,Gym,Bus Line,Zoo,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space


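Neighborhoods in cluster 3 are dominated by gyms, which are usually the only or main venue category present. The zoos, flea markets and flower shops listed after them are zero-frequency padding rather than actual venues. Again, not the best match for our Starbucks shop.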

#### Cluster 4

In [45]:
uio_merged.loc[uio_merged['Cluster Labels'] == 3, uio_merged.columns[[2] + list(range(5, uio_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,PRESIDENCIA REPUBLICA,3,Seafood Restaurant,Fast Food Restaurant,Pizza Place,Soccer Field,BBQ Joint,Donut Shop,Flea Market,Field,Farmers Market,Factory
35,MADRIGAL,3,Restaurant,Construction & Landscaping,Seafood Restaurant,Bed & Breakfast,Zoo,Electronics Store,Flea Market,Field,Fast Food Restaurant,Farmers Market
36,S.PEDRO MONJAS,3,Fast Food Restaurant,Restaurant,Clothing Store,Seafood Restaurant,Zoo,Donut Shop,Flea Market,Field,Farmers Market,Factory
41,MIRAFLORES BAJO,3,Burger Joint,Fast Food Restaurant,Snack Place,Seafood Restaurant,Electronics Store,Flower Shop,Flea Market,Field,Farmers Market,Factory
50,LOS ARRAYANES,3,Pizza Place,Burger Joint,Soccer Field,Seafood Restaurant,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
61,VERTIENTES SUR,3,Seafood Restaurant,BBQ Joint,Zoo,Electronics Store,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
73,PABLO ART SUAREZ,3,Construction & Landscaping,Health & Beauty Service,Seafood Restaurant,Grocery Store,Zoo,Donut Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market
80,CHIMBACALLE,3,Print Shop,Diner,Chinese Restaurant,Seafood Restaurant,Zoo,Electronics Store,Flea Market,Field,Fast Food Restaurant,Farmers Market
83,MONGE DONOSO,3,Bus Station,Motel,Clothing Store,Seafood Restaurant,Zoo,Electronics Store,Flea Market,Field,Fast Food Restaurant,Farmers Market
98,LA VICTORIA,3,Seafood Restaurant,Fried Chicken Joint,Hotel,Farmers Market,History Museum,Steakhouse,Science Museum,Donut Shop,Field,Fast Food Restaurant


This is the first cluster whose neighborhoods show a wide variety of restaurants and other food venues (seafood restaurants, fast food restaurants, pizza places, burger joints, BBQ joints and others), a category closely tied to coffee shops. This makes it a good candidate for Starbucks to consider when opening its first coffee shop in Quito.

#### Cluster 5

In [46]:
uio_merged.loc[uio_merged['Cluster Labels'] == 4, uio_merged.columns[[2] + list(range(5, uio_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,NUEVA VIDA,4,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
1,VENCEREMOS,4,Convention Center,Bus Stop,Zoo,Electronics Store,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
2,AVIACION CIVIL,4,Restaurant,Pizza Place,Pharmacy,Seafood Restaurant,Sculpture Garden,Chinese Restaurant,Mexican Restaurant,Gym,Pakistani Restaurant,Fried Chicken Joint
3,S.MARTHA ALT CHIL,4,Bus Stop,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory
4,LA RAYA A,4,Restaurant,Cafeteria,Pizza Place,Latin American Restaurant,Garden,BBQ Joint,Zoo,Empanada Restaurant,Flea Market,Field
5,RUPERTO ALARCON,4,Electronics Store,Zoo,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space
6,SOLANDA,4,Convenience Store,Plaza,Dessert Shop,Big Box Store,Park,Zoo,Empanada Restaurant,Flower Shop,Flea Market,Field
7,EUGENIO ESPEJO,4,Breakfast Spot,Restaurant,BBQ Joint,Zoo,Empanada Restaurant,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant
8,SANTIAGO 1,4,Supermarket,Pizza Place,Pharmacy,Shopping Mall,Zoo,Donut Shop,Field,Fast Food Restaurant,Farmers Market,Factory
9,S.ROSA CHIL 3ETP,4,Convention Center,Electronics Store,Food,Flower Shop,Flea Market,Field,Fast Food Restaurant,Farmers Market,Factory,Event Space


Cluster 5 is the largest and least specific cluster we obtained. Its structure is hard to interpret, because its most common venues don't seem to be related to each other, so it may or may not be a good fit for Starbucks' ideal neighborhood. On closer inspection, however, its neighborhoods are scattered all over the city, including areas far from the city center, close to the mountains and the city limits. This geographical dispersion disqualifies cluster 5 for our purposes, although further analysis would be needed to better understand its underlying structure.
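The dispersion claim could be quantified by measuring how far each neighborhood lies from the geographic center of its cluster. The sketch below makes several assumptions. It uses the haversine formula with a mean Earth radius of 6,371 km. As a simple spread score, it takes the mean distance from each neighborhood to the cluster's average latitude/longitude. Averaging coordinates in this way is only a reasonable approximation at the scale of a single city. The coordinates are four cluster 5 neighborhoods from the tables above, and the helper names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def mean_spread_km(coords):
    """Mean distance from each point to the average coordinate of the group."""
    center_lat = sum(lat for lat, _ in coords) / len(coords)
    center_lon = sum(lon for _, lon in coords) / len(coords)
    return sum(haversine_km(lat, lon, center_lat, center_lon) for lat, lon in coords) / len(coords)

# A few cluster 5 neighborhoods, coordinates taken from the tables above
cluster5 = [(-0.273928, -78.563715),  # NUEVA VIDA
            (-0.155464, -78.487854),  # AVIACION CIVIL
            (-0.1237, -78.514805),    # RUPERTO ALARCON
            (-0.268909, -78.539185)]  # SOLANDA
print(round(mean_spread_km(cluster5), 1), "km")
```

Computing the same score for every cluster, using all of its neighborhoods, would show whether cluster 5 really is more scattered than the others.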

### This concludes the data analysis we will be performing. Please refer to the attached report for an analysis and discussion of these results, as well as the conclusion and the choice of the best cluster of neighborhoods for Starbucks to open its first coffee shop in Quito, Ecuador.