# IBM Data Science Professional Certificate Capstone Project

This notebook is used mainly for the capstone project, the final requirement for earning the <b><i>IBM Data Science Professional Certificate</i></b>.

This professional certificate was created for Coursera in collaboration with IBM and includes a series of eight courses plus a capstone project.

# Analysis of the Location of a Coffee Shop Branch

## Problem
<b>Café Tango</b> is a popular coffee shop located in the Palermo neighborhood of Buenos Aires, Argentina. Following its extraordinary results over the last year, the owners decided to open a new branch in another area of the city, but they do not have a clear idea of which location is the best option.


## Objectives

The objective of this project is to determine which neighborhood of Buenos Aires is the best place to open the new branch.

To achieve this, we established the following sub-objectives:

1. Determine which group of neighborhoods is most similar to Palermo.
2. Determine which neighborhood in that group is the best option for the new branch.

## Methodology 

Determining the best place for the new branch consisted of two parts. The first part identified which neighborhoods are most similar to the one where the coffee shop is located (Palermo). To do this, a clustering technique was used to group the neighborhoods by the similarity of their most popular venues. The clustering method used was K-Means.

The second part determined which of those similar neighborhoods represents the best opportunity for a new branch. The <b>customer satisfaction level</b> of the local coffee shops was used to answer this. Customer satisfaction reflects the relationship between what customers expect to receive and what they actually receive. Low customer satisfaction implies that customers are not getting what they expect, which represents an opportunity for Café Tango. <b>The specific indicator used to estimate customer satisfaction is the average rating of the most popular coffee venues in the neighborhood</b>. The neighborhood with the lowest average rating is considered the best option for the new branch.
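The selection rule described above can be written compactly. The notation is introduced here only for illustration and is not part of the original analysis: $S$ is the set of candidate neighborhoods in Palermo's cluster, $V_n$ is the set of popular coffee venues in neighborhood $n$ that have a rating, and $r_v$ is the rating of venue $v$.

$$
\bar{r}_n = \frac{1}{|V_n|} \sum_{v \in V_n} r_v,
\qquad
n^{*} = \arg\min_{n \in S} \bar{r}_n
$$

Here $n^{*}$ is the neighborhood proposed for the new branch.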

## Data requirements and collection

The data required to meet this objective is:

* Buenos Aires' neighborhoods and their location coordinates
* The most common venues in each neighborhood
* The ratings of the most popular coffee shops in each neighborhood

The list of Buenos Aires' neighborhoods is scraped from the following Wikipedia <a href='https://es.wikipedia.org/wiki/Anexo:Barrios_de_la_ciudad_de_Buenos_Aires'>page</a>. The geolocation of each neighborhood is obtained with <b>Geopy</b>.

Information about the venues, their categories and their ratings is obtained from the Foursquare API.

#### Import the necessary libraries

In [3]:
import requests

import numpy as np
import pandas as pd
from pandas.io.json import json_normalize


import random

from bs4 import BeautifulSoup
from IPython.display import Image
from IPython.display import HTML

from geopy.geocoders import Nominatim 


!conda install -c conda-forge folium=0.5.0 --yes
import folium


### Get the Buenos Aires' neighborhoods and their location

#### Getting the list of neighborhoods

In [5]:
# This page contains information about Buenos Aires neighborhoods

url='https://es.wikipedia.org/wiki/Anexo:Barrios_de_la_ciudad_de_Buenos_Aires'

Wiki_tables=pd.read_html(url)  

Wiki_table1=Wiki_tables[0]

Neighborhood= Wiki_table1['Nombre del barrio'].tolist()

Neighborhood[:5]

['Agronomía', 'Almagro', 'Balvanera', 'Barracas', 'Belgrano']

#### Getting the coordinates for each neighborhood

In [6]:
# Create the geocoder once and reuse it for every neighborhood
geolocator=Nominatim(user_agent='Tango_cafe')

Neigh_latitude=[]
Neigh_longitude=[]

for neigh in Neighborhood:
    # Query with the full name instead of the abbreviation used in the Wikipedia table
    query_name='Villa General Mitre' if neigh=='Villa Gral. Mitre' else neigh
    print(query_name)
    address='{}, Buenos Aires, Argentina'.format(query_name)
    location=geolocator.geocode(address)
    Neigh_latitude.append(location.latitude)
    Neigh_longitude.append(location.longitude)

print('The latitudes are: ',Neigh_latitude,'\n The longitudes are: ', Neigh_longitude)

Agronomía
Almagro
Balvanera
Barracas
Belgrano
Boedo
Caballito
Chacarita
Coghlan
Colegiales
Constitución
Flores
Floresta
La Boca
La Paternal
Liniers
Mataderos
Montserrat
Monte Castro
Nueva Pompeya
Núñez
Palermo
Parque Avellaneda
Parque Chacabuco
Parque Chas
Parque Patricios
Puerto Madero
Recoleta
Retiro
Saavedra
San Cristóbal
San Nicolás
San Telmo
Vélez Sarsfield
Versalles
Villa Crespo
Villa del Parque
Villa Devoto
Villa General Mitre
Villa Lugano
Villa Luro
Villa Ortúzar
Villa Pueyrredón
Villa Real
Villa Riachuelo
Villa Santa Rita
Villa Soldati
Villa Urquiza
The latitudes are:  [-34.5915159, -34.6099883, -34.6092155, -34.6452854, -34.5613076, -34.6302518, -34.6200773, -34.5880107, -34.5599096, -34.5745154, -34.624246, -34.6282028, -34.6281055, -34.6335103, -34.5977397, -34.642403, -34.6578631, -34.6115595, -34.6188385, -34.6526296, -34.5453484, -34.5803362, -34.6494803, -34.638368, -34.5855111, -34.6374817, -34.6103764, -34.587358, -34.5916426, -34.5525291, -34.6240603, -33.3277089, -3

#### Join all the information into a dataframe

In [7]:
df_neighborhoods={'Neighborhood':Neighborhood, 'Latitude':Neigh_latitude, 'Longitude':Neigh_longitude}
df_neighborhoods=pd.DataFrame(df_neighborhoods)
df_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agronomía,-34.591516,-58.485385
1,Almagro,-34.609988,-58.422233
2,Balvanera,-34.609215,-58.40314
3,Barracas,-34.645285,-58.387562
4,Belgrano,-34.561308,-58.456545


In [8]:
df_neighborhoods.shape

(48, 3)

#### Get Buenos Aires' location

In [9]:
address='Ciudad Autonoma de Buenos Aires, Argentina'
geolocator=Nominatim(user_agent='Tango_cafe')
bs_as_location=geolocator.geocode(address)
bs_as_lat= bs_as_location.latitude
bs_as_long= bs_as_location.longitude
print('The latitude and longitude of Buenos Aires are: ', bs_as_lat, bs_as_long)

The latitude and longitude of Buenos Aires are:  -34.6075682 -58.4370894
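Geocoding by name can silently match a different place that shares the name, so it may be worth checking every neighborhood against the city center obtained above. The following is a minimal sketch of such a check, not part of the original notebook: the helper names and the 25 km threshold are assumptions chosen for illustration, and the sample rows are copied from the outputs in this notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def far_from_center(rows, center, max_km=25.0):
    """Return the names whose coordinates lie more than max_km from center.

    max_km=25.0 is an assumed threshold, not a value from the notebook.
    """
    return [name for name, lat, lon in rows
            if haversine_km(center[0], center[1], lat, lon) > max_km]

# City center and three neighborhood rows as printed in this notebook
center = (-34.6075682, -58.4370894)
sample = [
    ("Agronomía", -34.591516, -58.485385),
    ("Palermo", -34.580336, -58.424524),
    ("San Nicolás", -33.327709, -60.216591),
]

suspicious = far_from_center(sample, center)
print(suspicious)
```

With these rows the check flags San Nicolás, whose coordinates are more than 200 km from the city center, which suggests the geocoder matched a different place with the same name. In the notebook the rows could be built with `zip(df_neighborhoods['Neighborhood'], df_neighborhoods['Latitude'], df_neighborhoods['Longitude'])`.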


#### Visualize the city and its neighborhoods

In [10]:
bsas_map= folium.Map(location=[bs_as_lat, bs_as_long], zoom_start=12)


for neigh, lat, long in zip(df_neighborhoods['Neighborhood'],df_neighborhoods['Latitude'],df_neighborhoods['Longitude']):
    
    folium.features.CircleMarker( 
        [lat,long],
        radius=5, 
        color='Orange', 
        fill=True, 
        fill_color='Orange', 
        fill_opacity=0.6,  # numeric opacity rather than a string
        popup=neigh).add_to(bsas_map)
    
bsas_map

### Getting the most common venues for each neighborhood

#### Defining Foursquare credentials

In [11]:
CLIENT_ID='YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: supply your own key and keep real credentials out of shared notebooks
CLIENT_SECRET='YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder
VERSION='20180605'

#### Defining the function to get the venues


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000, LIMIT=50):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
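The function above indexes straight into the JSON payload, so a neighborhood with no results or a venue with no category would raise an exception. The sketch below shows one way to flatten a payload defensively. The helper name `explore_items_to_rows` and the sample payload are hypothetical, and the payload shape is assumed only from the keys the function above accesses.

```python
def explore_items_to_rows(neighborhood, payload):
    """Flatten one explore payload into (neighborhood, venue, lat, lng, category) tuples.

    Missing groups yield an empty list; a venue without categories gets None.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-built payload for illustration; the second venue is invented to show the fallback
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Feria del Productor al Consumidor",
                           "location": {"lat": -34.593981, "lng": -58.483098},
                           "categories": [{"name": "Farmers Market"}]}},
                {"venue": {"name": "Venue without category",
                           "location": {"lat": -34.59, "lng": -58.48},
                           "categories": []}},
            ]
        }]
    }
}

rows = explore_items_to_rows("Agronomía", sample_payload)
empty = explore_items_to_rows("Agronomía", {"response": {}})
```

The returned tuples can be passed to `pd.DataFrame` in the same way as `venues_list` above.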

#### Get the venues

In [13]:
df_venues= getNearbyVenues(df_neighborhoods['Neighborhood'], df_neighborhoods['Latitude'], df_neighborhoods['Longitude'])

Agronomía
Almagro
Balvanera
Barracas
Belgrano
Boedo
Caballito
Chacarita
Coghlan
Colegiales
Constitución
Flores
Floresta
La Boca
La Paternal
Liniers
Mataderos
Montserrat
Monte Castro
Nueva Pompeya
Núñez
Palermo
Parque Avellaneda
Parque Chacabuco
Parque Chas
Parque Patricios
Puerto Madero
Recoleta
Retiro
Saavedra
San Cristóbal
San Nicolás
San Telmo
Vélez Sarsfield
Versalles
Villa Crespo
Villa del Parque
Villa Devoto
Villa Gral. Mitre
Villa Lugano
Villa Luro
Villa Ortúzar
Villa Pueyrredón
Villa Real
Villa Riachuelo
Villa Santa Rita
Villa Soldati
Villa Urquiza


In [14]:
df_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agronomía,-34.591516,-58.485385,Feria del Productor al Consumidor,-34.593981,-58.483098,Farmers Market
1,Agronomía,-34.591516,-58.485385,Social Parrilla,-34.588955,-58.484677,BBQ Joint
2,Agronomía,-34.591516,-58.485385,Dorian Café & Bar,-34.587906,-58.493465,Café
3,Agronomía,-34.591516,-58.485385,Club Comunicaciones,-34.596538,-58.490417,Sports Club
4,Agronomía,-34.591516,-58.485385,Vivero Agronomía,-34.5917,-58.488838,Garden Center


In [15]:
df_venues.shape

(1841, 7)

### Clustering the neighborhoods

In [16]:
from sklearn.cluster import KMeans

#### Getting the proportion of each type of venue per neighborhood

In [17]:
OneHotVenues=pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")
OneHotVenues[['Neighborhood']]=df_venues[['Neighborhood']]
neworder=[OneHotVenues.columns[-1]] + list(OneHotVenues.columns[:-1])
OneHotVenues= OneHotVenues[neworder]
OneHotVenues= OneHotVenues.groupby('Neighborhood').mean().reset_index()
OneHotVenues.head(2)

Unnamed: 0,Neighborhood,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Agronomía,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Almagro,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
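Each value in this table is the share of a neighborhood's returned venues that belong to a category; for example, 0.16 under Argentinian Restaurant means that 16% of the venues returned for Almagro carry that category. The sketch below reproduces the same calculation in plain Python on a tiny invented dataset, to make explicit what the one-hot encoding followed by `groupby(...).mean()` computes. The function name and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict

def category_proportions(pairs):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }

# Invented (neighborhood, category) pairs, for illustration only
pairs = [
    ("A", "Café"), ("A", "Café"), ("A", "BBQ Joint"), ("A", "Park"),
    ("B", "Café"), ("B", "Pizza Place"),
]

proportions = category_proportions(pairs)
print(proportions["A"]["Café"])  # 2 of the 4 venues in A are cafés
```

Categories that never occur in a neighborhood are simply absent here, whereas the one-hot table stores them as 0.0.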


#### Checkpoint
There must be 48 rows, one per neighborhood.

In [18]:
OneHotVenues.shape

(48, 228)

#### Clustering the Neighborhoods by the type of venues

We are going to cluster the neighborhoods into 6 different groups.

In [26]:
ncluster=6

# keep only the feature columns (drop the neighborhood label)
OneHotData=OneHotVenues.drop('Neighborhood', axis=1)

# create and train the model
kcluster= KMeans(n_clusters=ncluster, random_state=4).fit(OneHotData)

# Get the labels
kcluster.labels_


array([0, 2, 0, 2, 0, 2, 0, 0, 2, 2, 2, 0, 2, 2, 0, 0, 0, 0, 1, 4, 0, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 5, 0,
       3, 0, 0, 2], dtype=int32)
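A quick count of how many neighborhoods fall into each cluster helps judge whether six groups is a reasonable choice. The sketch below is an illustrative addition: it copies the labels printed above into a plain list and counts them.

```python
from collections import Counter

# Cluster labels copied from the output above (one per neighborhood, in table order)
labels = [0, 2, 0, 2, 0, 2, 0, 0, 2, 2, 2, 0, 2, 2, 0, 0, 0, 0, 1, 4, 0, 1,
          0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 5, 0,
          3, 0, 0, 2]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(cluster, size)
```

For this run, cluster 0 holds 28 neighborhoods, cluster 2 holds 10, cluster 1 holds 7, and clusters 3, 4 and 5 hold a single neighborhood each. In the notebook the same count is available through `pd.Series(kcluster.labels_).value_counts()`.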

In [27]:
df_neigh_grouped=df_neighborhoods.copy()
df_neigh_grouped.insert(3, 'Cluster', kcluster.labels_)
df_neigh_grouped

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster
0,Agronomía,-34.591516,-58.485385,0
1,Almagro,-34.609988,-58.422233,2
2,Balvanera,-34.609215,-58.40314,0
3,Barracas,-34.645285,-58.387562,2
4,Belgrano,-34.561308,-58.456545,0
5,Boedo,-34.630252,-58.41879,2
6,Caballito,-34.620077,-58.442489,0
7,Chacarita,-34.588011,-58.454156,0
8,Coghlan,-34.55991,-58.474714,2
9,Colegiales,-34.574515,-58.452282,2


#### Visualize the clusters

In [28]:
grouped_neigh_map= folium.Map(location=[bs_as_lat, bs_as_long], zoom_start=12)

colors=['blue', 'green', 'red','orange', 'black', 'yellow']

for neigh, lat, long, clu in zip(df_neigh_grouped['Neighborhood'],df_neigh_grouped['Latitude'],df_neigh_grouped['Longitude'], df_neigh_grouped['Cluster']):
    color=colors[clu]
    folium.features.CircleMarker( 
        [lat,long],
        radius=5, 
        color=color, 
        fill=True, 
        fill_color=color, 
        fill_opacity=0.6,  # numeric opacity rather than a string
        popup=neigh).add_to(grouped_neigh_map)

grouped_neigh_map

### Finding the best neighborhood

We have clustered the neighborhoods, so we now know which ones are similar to <b>Palermo</b> (the neighborhood where Café Tango is located) in their types of venues. The next step is to determine which of those neighborhoods is the best option for the new branch.


#### Getting the neighborhoods similar to Palermo

In [58]:
df_neigh_grouped[df_neigh_grouped['Neighborhood']=='Palermo'] 

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster
21,Palermo,-34.580336,-58.424524,1


In [59]:
df_neigh_options= df_neigh_grouped[df_neigh_grouped['Cluster']==1]
df_neigh_options.reset_index(inplace=True, drop=True)
df_neigh_options

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster
0,Monte Castro,-34.618839,-58.505947,1
1,Palermo,-34.580336,-58.424524,1
2,Puerto Madero,-34.610376,-58.362207,1
3,Recoleta,-34.587358,-58.39157,1
4,Retiro,-34.591643,-58.373307,1
5,San Nicolás,-33.327709,-60.216591,1
6,San Telmo,-34.621401,-58.37375,1


#### Discarding some neighborhoods
The following neighborhoods are removed from the dataframe: <br>
<b>Palermo</b>: The data for Palermo is not needed because it is the neighborhood where the current shop is located. <br>
<b>Puerto Madero</b>: This is a financial district with very expensive premises.

In [75]:
df_neigh_options= df_neigh_options[df_neigh_options['Neighborhood']!= 'Palermo']
df_neigh_options= df_neigh_options[df_neigh_options['Neighborhood']!= 'Puerto Madero']
df_neigh_options.reset_index(inplace=True, drop=True)
df_neigh_options

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster
0,Monte Castro,-34.618839,-58.505947,1
1,Recoleta,-34.587358,-58.39157,1
2,Retiro,-34.591643,-58.373307,1
3,San Nicolás,-33.327709,-60.216591,1
4,San Telmo,-34.621401,-58.37375,1


#### Getting the 10 most popular venues that sell coffee in each neighborhood

In [60]:
# define the function 

def popularCoffee(names, latitudes, longitudes, radius=1000, LIMIT=10):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL to get the 10 most popular coffee venues
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&sortByPopularity={}&query={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            '1', 
            'coffee')
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'id', 
                  'Venue Category']
    
    return(nearby_venues)

In [76]:
# Use the function to get the venues

df_coffes= popularCoffee(df_neigh_options['Neighborhood'],df_neigh_options['Latitude'], df_neigh_options['Longitude'])

Monte Castro
Recoleta
Retiro
San Nicolás
San Telmo


In [77]:
df_coffes.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,id,Venue Category
Monte Castro,4,4,4,4,4
Recoleta,10,10,10,10,10
Retiro,10,10,10,10,10
San Nicolás,4,4,4,4,4
San Telmo,10,10,10,10,10


In [78]:
df_coffes

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,id,Venue Category
0,Monte Castro,-34.618839,-58.505947,The Coffee Store,577ff62f498e32d9355aa4ed,Coffee Shop
1,Monte Castro,-34.618839,-58.505947,Madrilia (Resto-Cafe),4c1913c0d4d9c928ce69f029,Café
2,Monte Castro,-34.618839,-58.505947,Café Martínez,5377669e498eda27e038b3d8,Café
3,Monte Castro,-34.618839,-58.505947,Pin Pun - Monte Castro,4c16bf4196040f47375a73a5,Pizza Place
4,Recoleta,-34.587358,-58.39157,Starbucks,4d6d1f5a4b86a0905bea5b0a,Coffee Shop
5,Recoleta,-34.587358,-58.39157,Bogotá Café,57150a75498e637e0fa84834,Café
6,Recoleta,-34.587358,-58.39157,La Biela,4b058715f964a520df7e22e3,Café
7,Recoleta,-34.587358,-58.39157,Las Esclavas,5006e72be4b086c13ec30c8e,Bakery
8,Recoleta,-34.587358,-58.39157,Gabu Deli & Coffee,55ca313f498e5a1421864c87,Café
9,Recoleta,-34.587358,-58.39157,Starbucks,4ca3152ad7c33704f3829d62,Coffee Shop


#### Get the rating for each venue

In [79]:
# We define new credentials because the limit for premium calls has been exceeded

CLIENT_ID2='YOUR_SECOND_FOURSQUARE_CLIENT_ID'  # placeholder: supply your own key and keep real credentials out of shared notebooks
CLIENT_SECRET2='YOUR_SECOND_FOURSQUARE_CLIENT_SECRET'  # placeholder

In [80]:
rating_list=[]

for ID, v in zip(df_coffes['id'],df_coffes['Venue']):
    
    print(v)
    
    # build the URL for the venue-details request
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            ID,
            CLIENT_ID2,
            CLIENT_SECRET2,
            VERSION,
            )
    # make the GET request and keep the response body
    response= requests.get(url).json()["response"]
    
    try:
        rating=response['venue']['rating']
    except KeyError:
        # 'No disponible' means 'Not available'; rows with this sentinel are dropped later
        rating='No disponible'
        # the message below means 'no rating available'
        print('.       no tiene rating disponible')
   
    # add the rating to the list
    rating_list.append(rating)

The Coffee Store
Madrilia (Resto-Cafe)
Café Martínez
Pin Pun - Monte Castro
Starbucks
Bogotá Café
La Biela
Las Esclavas
Gabu Deli & Coffee
Starbucks
La Panera Rosa
Le Blé
Croque Madame
Pani
Florida Garden
Starbucks
Dos Escudos
NEGRO. Cueva de Café
Tostado Café Club
Starbucks
Dandy Bar & Grill
Starbucks
NEGRO. Cueva de Café
Havanna
Havanna
Jazz Bar
.       no tiene rating disponible
Esquina Mitre
.       no tiene rating disponible
Nicanor
Bar El Federal
Starbucks
Café Martinez
Café Rivas
La Poesía
Coffee Town
Havanna
La Esquinita
Starbucks
Starbucks
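The loop above stores a string sentinel when a venue has no rating, which later requires filtering the rows and casting the column to float. An alternative, sketched below under stated assumptions, is to store NaN so the column stays numeric from the start. The helper name `extract_rating` is hypothetical, and the payload shape is assumed only from the keys the loop accesses.

```python
import math

def extract_rating(response):
    """Return the venue rating as a float, or NaN when the payload has none."""
    rating = response.get("venue", {}).get("rating")
    return float(rating) if rating is not None else math.nan

# Hand-built payloads for illustration
with_rating = extract_rating({"venue": {"name": "Example", "rating": 7.2}})
without_rating = extract_rating({"venue": {"name": "Example"}})

print(with_rating, without_rating)
```

With NaN in place of the sentinel, pandas skips the missing values by default when computing a mean, and `dropna(subset=['Rating'])` removes them explicitly if needed.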


In [86]:
df_rating=df_coffes.copy()
df_rating.insert(6, 'Rating', rating_list)
df_rating.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,id,Venue Category,Rating
0,Monte Castro,-34.618839,-58.505947,The Coffee Store,577ff62f498e32d9355aa4ed,Coffee Shop,6.3
1,Monte Castro,-34.618839,-58.505947,Madrilia (Resto-Cafe),4c1913c0d4d9c928ce69f029,Café,6.0
2,Monte Castro,-34.618839,-58.505947,Café Martínez,5377669e498eda27e038b3d8,Café,7.0
3,Monte Castro,-34.618839,-58.505947,Pin Pun - Monte Castro,4c16bf4196040f47375a73a5,Pizza Place,7.2
4,Recoleta,-34.587358,-58.39157,Starbucks,4d6d1f5a4b86a0905bea5b0a,Coffee Shop,8.0


In [87]:
df_rating.shape

(38, 7)

In [88]:
# Drop the venues without rating available

df_rating= df_rating[df_rating['Rating']!='No disponible']
df_rating.reset_index(drop=True, inplace=True)
df_rating.shape

(36, 7)

#### Get the neighborhood with the worst rating

In [93]:
# converting the rating column into a float
df_rating=df_rating.astype({'Rating': float})
df_rating.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,id,Venue Category,Rating
0,Monte Castro,-34.618839,-58.505947,The Coffee Store,577ff62f498e32d9355aa4ed,Coffee Shop,6.3
1,Monte Castro,-34.618839,-58.505947,Madrilia (Resto-Cafe),4c1913c0d4d9c928ce69f029,Café,6.0
2,Monte Castro,-34.618839,-58.505947,Café Martínez,5377669e498eda27e038b3d8,Café,7.0
3,Monte Castro,-34.618839,-58.505947,Pin Pun - Monte Castro,4c16bf4196040f47375a73a5,Pizza Place,7.2
4,Recoleta,-34.587358,-58.39157,Starbucks,4d6d1f5a4b86a0905bea5b0a,Coffee Shop,8.0


In [97]:
# calculate the average rating
df_raking=df_rating[['Neighborhood', 'Rating']]
df_raking=df_raking.groupby('Neighborhood').mean()
df_raking.sort_values(by='Rating', ascending=True)

Neighborhood,Rating
San Nicolás,6.35
Monte Castro,6.625
Recoleta,7.37
San Telmo,7.68
Retiro,7.87
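Because the averages rest on different numbers of venues, it can help to report the count next to each mean. The sketch below is an illustrative addition in plain Python. The function name is hypothetical, and the sample rows are limited to the five ratings visible in the `df_rating.head()` output above, so the Recoleta figure here covers a single venue and is not the neighborhood's real average.

```python
from collections import defaultdict
from statistics import mean

def rating_summary(rows):
    """Map each neighborhood to (average rating, number of rated venues)."""
    by_neighborhood = defaultdict(list)
    for neighborhood, rating in rows:
        by_neighborhood[neighborhood].append(rating)
    return {name: (mean(ratings), len(ratings))
            for name, ratings in by_neighborhood.items()}

# The five (neighborhood, rating) pairs shown in the head() output above
rows = [
    ("Monte Castro", 6.3),
    ("Monte Castro", 6.0),
    ("Monte Castro", 7.0),
    ("Monte Castro", 7.2),
    ("Recoleta", 8.0),
]

summary = rating_summary(rows)
for name, (avg, n) in summary.items():
    print(name, round(avg, 3), n)
```

The four Monte Castro ratings average 6.625, which matches the table. In the notebook, `df_rating.groupby('Neighborhood')['Rating'].agg(['mean', 'count'])` would give the full picture. The request log above suggests that both venues without a rating belong to San Nicolás, which would leave only two ratings behind its 6.35 average.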


The neighborhood with the worst average rating is <b>San Nicolás</b>.

### Result

Taking into account the similarity of the neighborhoods and the average rating of their coffee shops, the best neighborhood in which to locate the branch is <b>San Nicolás</b>, whose coffee shops have an average rating of 6.35.

In general, the coffee shops in these neighborhoods are not rated highly: no neighborhood's average exceeds 8 on a 10-point scale.

### Discussion

An important caveat is that the price of premises in each neighborhood was not taken into account. This factor could be included in a later analysis.

An interesting point to explore in a later project is that most of the neighborhoods in cluster 1 are geographic neighbors and are located near the Río de la Plata waterfront. This may indicate that geography has some influence on the type of venues, something that does not happen with the other clusters.





