<h1>Capstone Project - The Battle of the Neighborhoods (Week 2)</h1>
<h2>Applied Data Science Capstone by IBM/Coursera</h2>

<h2>Table of Contents</h2>
<ol>
    <li>Introduction: Business Problem</li>
    <li>Data</li>
    <li>Methodology</li>
    <li>Analysis</li>
    <li>Results and Discussion</li>
    <li>Conclusion</li>
</ol>

<h2 name="Introduction">1. Introduction: Business Problem</h2>

In this project, we will try to find what characterizes each neighborhood of the Sant Martí borough in Barcelona, Spain.

Specifically, this report is aimed at people who want to know the 10 most common venue categories in each neighborhood of this borough, with similar neighborhoods grouped into clusters.

We will use data science techniques to find this information and present it clearly to the stakeholders.

<h2 name="Data">2. Data</h2>

Based on the definition of our problem, the factors that will influence the conclusion are:

<ul>
    <li>the division of the neighborhoods of the borough Sant Martí</li>
    <li>the quantity of each venue category for each neighborhood</li>
</ul>

To collect the neighborhoods of the Sant Martí borough, we will use the Wikipedia page on the districts of Barcelona: https://en.wikipedia.org/wiki/Districts_of_Barcelona

The number of venues in each category for each neighborhood will be obtained programmatically using the **Foursquare API**.

<h3>Import libraries</h3>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
#!pip install BeautifulSoup4
from bs4 import BeautifulSoup

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install folium==0.5.0
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<h3>Create the DataFrame</h3>

In [2]:
column_names = ["Borough", "Neighborhood"] 
pd.set_option('max_colwidth', 300)

DataFrame = pd.DataFrame(columns=column_names)
DataFrame

Unnamed: 0,Borough,Neighborhood


<h3>Obtain and Clean the Data and Fill the DataFrame</h3>

In this step, the neighborhoods are collected as a single string; they still need to be split.

In [3]:
#Obtain
url = requests.get("https://en.wikipedia.org/wiki/Districts_of_Barcelona").text
soup = BeautifulSoup(url, "html.parser")
tabla = soup.find_all("table")[3].text.split("\n")

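#Clean
# Note: popping from the list while iterating over it skips the element that
# follows each removal, so some empty strings remain. The offsets used below
# (tabla[90:], tabla[2], tabla[6]) depend on that behaviour and on the page layout.
i = 0
for celda in tabla:
    if(celda == ""):
        tabla.pop(i)
    i = i+1
tabla = tabla[90:]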

#Fill Borough
DataFrame.loc[0, "Borough"] = tabla[2] + ", Barcelona"

#Fill Neighborhood
DataFrame.loc[0, "Neighborhood"] = tabla[6]

DataFrame

Unnamed: 0,Borough,Neighborhood
0,"Sant Martí, Barcelona","El Besòs i el Maresme, el Clot, El Camp de l'Arpa del Clot, Diagonal Mar i el Front Marítim del Poblenou, el Parc i la Llacuna del Poblenou, Poblenou, Provençals del Poblenou, Sant Martí de Provençals, La Verneda i la Pau, la Vila Olímpica del Poblenou"
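The cleaning loop above removes items from `tabla` while iterating over it, so Python skips the element that follows each removal. The sketch below reproduces that effect on a small made-up list and compares it with filtering into a new list. The `sample` list and both helper names are hypothetical, and swapping the filter into the cell above would shift the hard-coded offsets, which would then need rechecking.

```python
def pop_while_iterating(cells):
    """Mimic the loop above: pop empty strings from the list being iterated."""
    cells = list(cells)  # work on a copy so the caller's list is untouched
    i = 0
    for cell in cells:
        if cell == "":
            cells.pop(i)  # the next element shifts into position i and is skipped
        i = i + 1
    return cells


def drop_empty(cells):
    """Build a new list that keeps only the non-empty strings."""
    return [cell for cell in cells if cell != ""]


# Hypothetical table cells, not taken from the Wikipedia page
sample = ["", "", "District", "", "Sant Martí", ""]

print(pop_while_iterating(sample))  # one empty string survives
print(drop_empty(sample))           # no empty strings left
```

In this toy case the in-place loop leaves one empty string behind, while the list comprehension removes them all.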


<h3>Divide the Neighborhoods</h3>

Now the neighborhoods are split into separate rows so that the latitude and longitude of each one can be looked up.

In [4]:
# define the dataframe columns
column_names = ["Borough", "Neighborhood"] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

d = 0
i = 0
for bor in DataFrame["Borough"]:
    for data in DataFrame.loc[i, "Neighborhood"].replace(" i ", ", ").split(", "):
        neighborhoods.loc[d, "Borough"] = DataFrame.loc[i, "Borough"]
        neighborhoods.loc[d, "Neighborhood"] = data
        d = d+1
    i = i+1
    
DataFrame = neighborhoods
DataFrame

Unnamed: 0,Borough,Neighborhood
0,"Sant Martí, Barcelona",El Besòs
1,"Sant Martí, Barcelona",el Maresme
2,"Sant Martí, Barcelona",el Clot
3,"Sant Martí, Barcelona",El Camp de l'Arpa del Clot
4,"Sant Martí, Barcelona",Diagonal Mar
5,"Sant Martí, Barcelona",el Front Marítim del Poblenou
6,"Sant Martí, Barcelona",el Parc
7,"Sant Martí, Barcelona",la Llacuna del Poblenou
8,"Sant Martí, Barcelona",Poblenou
9,"Sant Martí, Barcelona",Provençals del Poblenou
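The splitting rule used in the cell above can be isolated in a small function, which makes it easier to see what it does to a given string. This is only a sketch: `split_neighborhoods` is a hypothetical helper and the sample string is a shortened version of the Wikipedia cell. Like the cell above, it treats the Catalan conjunction " i " as a separator, so compound names end up as two entries.

```python
def split_neighborhoods(text):
    """Split a comma-separated neighborhood string, also breaking on ' i '."""
    parts = text.replace(" i ", ", ").split(", ")
    # Strip stray whitespace and drop empty pieces
    return [part.strip() for part in parts if part.strip()]


sample = "El Besòs i el Maresme, el Clot, Diagonal Mar i el Front Marítim del Poblenou"
for name in split_neighborhoods(sample):
    print(name)
```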


<h3>Obtain and Fill Latitude and Longitude of each Neighborhood</h3>

In [5]:
for b in range(len(DataFrame)):
    address = DataFrame.loc[b, "Neighborhood"] + ", " + DataFrame.loc[b, "Borough"]

    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    DataFrame.loc[b, "Latitude"] = location.latitude
    DataFrame.loc[b, "Longitude"] = location.longitude
        
DataFrame

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,"Sant Martí, Barcelona",El Besòs,41.420369,2.209713
1,"Sant Martí, Barcelona",el Maresme,41.410525,2.215634
2,"Sant Martí, Barcelona",el Clot,41.409953,2.190823
3,"Sant Martí, Barcelona",El Camp de l'Arpa del Clot,41.410754,2.182816
4,"Sant Martí, Barcelona",Diagonal Mar,41.405228,2.213352
5,"Sant Martí, Barcelona",el Front Marítim del Poblenou,41.403775,2.213803
6,"Sant Martí, Barcelona",el Parc,41.400733,2.191342
7,"Sant Martí, Barcelona",la Llacuna del Poblenou,41.399359,2.197699
8,"Sant Martí, Barcelona",Poblenou,41.400527,2.201729
9,"Sant Martí, Barcelona",Provençals del Poblenou,41.411948,2.204125
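geopy's `Nominatim.geocode` returns `None` when it finds no match, in which case `location.latitude` in the loop above would raise an `AttributeError`. Below is a minimal sketch of a more defensive loop. It takes the geocoding function as an argument so that it can be tried without network access; `geocode_all`, `FakeLocation` and `fake_geocode` are hypothetical names used only for illustration.

```python
from collections import namedtuple


def geocode_all(addresses, geocode):
    """Return ({address: (lat, lon)}, [addresses with no match]).

    `geocode` is any callable returning an object with .latitude and
    .longitude, or None when nothing matches.
    """
    found, missing = {}, []
    for address in addresses:
        location = geocode(address)
        if location is None:
            missing.append(address)
        else:
            found[address] = (location.latitude, location.longitude)
    return found, missing


# Stand-in for a real geocoder, for demonstration only
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def fake_geocode(address):
    known = {"el Clot, Sant Martí, Barcelona": FakeLocation(41.409953, 2.190823)}
    return known.get(address)


found, missing = geocode_all(
    ["el Clot, Sant Martí, Barcelona", "Nowhere, Sant Martí, Barcelona"],
    fake_geocode,
)
print(found)
print(missing)
```

With the real service, the same function could be called as `geocode_all(addresses, geolocator.geocode)`.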


<h3>Create the Map</h3>

On the map, we can see the center of each neighborhood.

In [6]:
address = "Sant Martí, Barcelona"

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_barcelona = location.latitude
longitude_barcelona = location.longitude

map_barcelona = folium.Map(location=[latitude_barcelona, longitude_barcelona], zoom_start=14)

# add markers to map
for lat, lng, borough, neighborhood in zip(DataFrame["Latitude"], DataFrame["Longitude"], DataFrame["Borough"], DataFrame["Neighborhood"]):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#3186cc",
        fill_opacity=0.7,
        parse_html=False).add_to(map_barcelona)  
    
map_barcelona

<h3>Define Foursquare Credentials, Version and URL to use Foursquare API</h3>

In [7]:
CLIENT_ID = "YOUR_CLIENT_ID" # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = "YOUR_CLIENT_SECRET" # your Foursquare Secret
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN" # your Foursquare access token
VERSION = "20180604"

latitude = latitude_barcelona
longitude = longitude_barcelona
radius = 500
limit = 100

url = "https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&radius={}&limit={}".format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, radius, limit)
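Because `str.format` ignores surplus positional arguments, a value only reaches the request if the template contains a placeholder for it. One way to make this harder to get wrong is to build the query string from a dictionary. The sketch below does that with the standard library; `build_search_url` is a hypothetical helper, the credentials are placeholders, and the `oauth_token` parameter is left out for brevity.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble a venues/search URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, e.g. the comma in "ll"
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


example_url = build_search_url("ID", "SECRET", 41.4, 2.2, "20180604", 500, 100)
print(example_url)
```

Every key in `params` appears in the resulting URL, so a missing parameter shows up as a missing dictionary entry rather than as a silently ignored argument.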

<h3>Send a request through the Foursquare API</h3>

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '609be5a44332e5492dee6298'},
 'notifications': [{'type': 'notificationTray', 'item': {'unreadCount': 0}}],
 'response': {'venues': [{'id': '4dcda7d51f6eb1227066f65c',
    'name': 'Can Oliva',
    'location': {'address': 'Avenida Diagonal, 109',
     'crossStreet': 'Bac de Roda',
     'lat': 41.40688819284195,
     'lng': 2.203306870880938,
     'labeledLatLngs': [{'label': 'display',
       'lat': 41.40688819284195,
       'lng': 2.203306870880938}],
     'distance': 31,
     'postalCode': '08005',
     'cc': 'ES',
     'city': 'Barcelona',
     'state': 'Cataluña',
     'country': 'España',
     'formattedAddress': ['Avenida Diagonal, 109 (Bac de Roda)',
      '08005 Barcelona Cataluña']},
    'categories': [{'id': '4bf58dd8d48988d1c0941735',
      'name': 'Mediterranean Restaurant',
      'pluralName': 'Mediterranean Restaurants',
      'shortName': 'Mediterranean',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/mediterranean

<h3>Take the useful information of the venues</h3>

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues

DataFrame_venues = getNearbyVenues(names=DataFrame['Neighborhood'],
                                   latitudes=DataFrame['Latitude'],
                                   longitudes=DataFrame['Longitude']
                                  )
DataFrame_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,El Besòs,41.420369,2.209713,Bar Lafuente,Bar
1,El Besòs,41.420369,2.209713,La Pili,Tapas Restaurant
2,El Besòs,41.420369,2.209713,SIFON SIFON,Tapas Restaurant
3,El Besòs,41.420369,2.209713,Plaça de La Palmera,Plaza
4,El Besòs,41.420369,2.209713,La Cantonada de Prim,Tapas Restaurant
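`getNearbyVenues` indexes directly into the response (`["response"]["groups"][0]["items"]` and `["categories"][0]`), so a response without groups, or a venue without categories, would raise an exception. The following sketch shows a more tolerant way to pull out the same two fields. `extract_venues` is a hypothetical helper, and the sample payload is made up to mirror only the keys the function above reads.

```python
def extract_venues(payload):
    """Return (venue name, first category name or None) pairs."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], category))
    return rows


# Made-up payload with the same nesting as the explore response used above
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Bar Lafuente", "categories": [{"name": "Bar"}]}},
                    {"venue": {"name": "Unnamed spot", "categories": []}},
                ]
            }
        ]
    }
}

print(extract_venues(sample_payload))
print(extract_venues({"response": {}}))
```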


<h2 name="Methodology">3. Methodology</h2>

In this project we limit our analysis to a 500-meter radius around each neighborhood center.

In the first step (**Data collection**), we collected the required data: the borough, its neighborhoods, their center locations and their venues, using Wikipedia and Foursquare.

In the second step (**Analysis**), we calculate and explore the frequency of each venue category in each neighborhood.

In the third and final step, we cluster the neighborhoods by similarity and analyze each cluster.

<h2 name="Analysis">4. Analysis</h2>

Let's perform some basic exploratory data analysis and derive some additional info from our raw data.
First let's count the **number of venues in every neighborhood:**

In [10]:
DataFrame_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
Diagonal Mar,31,31,31,31
El Besòs,25,25,25,25
El Camp de l'Arpa del Clot,83,83,83,83
La Verneda,7,7,7,7
Poblenou,100,100,100,100
Provençals del Poblenou,26,26,26,26
Sant Martí de Provençals,26,26,26,26
el Clot,82,82,82,82
el Front Marítim del Poblenou,32,32,32,32
el Maresme,81,81,81,81


<h3>Encode the data by venue category so that it can be analyzed</h3>

In [11]:
# one hot encoding
DataFrame_data = pd.get_dummies(DataFrame_venues[['Venue Category']], prefix="", prefix_sep="")

# move neighborhood column to the first column
DataFrame_data.insert(0, "Neighborhood", DataFrame_venues["Neighborhood"])

DataFrame_data.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Casino,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Empanada Restaurant,Escape Room,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Football Stadium,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Gaming Cafe,Garden Center,Gastropub,Gluten-free Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health Food Store,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Moroccan Restaurant,Multiplex,Music Store,Music Venue,Nightclub,Noodle House,Optical Shop,Organic Grocery,Paella Restaurant,Park,Pastry Shop,Pawn Shop,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Post Office,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Restaurant,Road,Rock Club,Roof Deck,Salon / Barbershop,Sandwich Place,Sauna / Steam Room,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods 
Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Track,Tram Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,El Besòs,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,El Besòs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,El Besòs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,El Besòs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,El Besòs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<h4>See the size</h4>

In [12]:
DataFrame_data.shape

(796, 169)

<h3>Group rows by neighborhood, taking the mean frequency of occurrence of each category</h3>

In [13]:
DataFrame_grouped = DataFrame_data.groupby("Neighborhood").mean().reset_index()
DataFrame_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Casino,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Empanada Restaurant,Escape Room,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Football Stadium,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Gaming Cafe,Garden Center,Gastropub,Gluten-free Restaurant,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health Food Store,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Moroccan Restaurant,Multiplex,Music Store,Music Venue,Nightclub,Noodle House,Optical Shop,Organic Grocery,Paella Restaurant,Park,Pastry Shop,Pawn Shop,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Post Office,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Restaurant,Road,Rock Club,Roof Deck,Salon / Barbershop,Sandwich Place,Sauna / Steam Room,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods 
Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Track,Tram Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Diagonal Mar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.064516,0.064516,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,El Besòs,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
2,El Camp de l'Arpa del Clot,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.012048,0.0,0.096386,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.036145,0.0,0.0,0.0,0.060241,0.0,0.024096,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.060241,0.012048,0.024096,0.0,0.0,0.0,0.0,0.024096,0.024096,0.060241,0.0,0.0,0.0,0.0,0.036145,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.036145,0.0,0.0,0.024096,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.060241,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.048193,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.048193,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0
3,La Verneda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Poblenou,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.06,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.03,0.05,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0


<h4>See the new size</h4>

In [14]:
DataFrame_grouped.shape

(14, 169)
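Averaging the 0/1 indicator columns within a neighborhood gives the share of that neighborhood's venues in each category. For El Besòs, with 25 venues, a frequency of 0.16 for Tapas Restaurant corresponds to 4 of the 25 venues. The sketch below reproduces the same arithmetic in plain Python on a made-up list of categories; the helper names and the sample are hypothetical.

```python
from collections import Counter


def category_frequencies(categories):
    """Share of venues per category; equals the mean of the one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {name: count / total for name, count in counts.items()}


def top_categories(categories, n):
    """The n most frequent categories, ties broken alphabetically."""
    freqs = category_frequencies(categories)
    return sorted(freqs.items(), key=lambda pair: (-pair[1], pair[0]))[:n]


# Made-up venue categories for a single neighborhood
sample = ["Tapas Restaurant"] * 4 + ["Café"] * 2 + ["Plaza", "Bar"]

print(category_frequencies(sample))
print(top_categories(sample, 2))
```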

<h3>Obtain top 10 most common venues for each Neighborhood</h3>

In [15]:
num_top_venues = 10

for hood in DataFrame_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = DataFrame_grouped[DataFrame_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Diagonal Mar----
                      venue  freq
0  Mediterranean Restaurant  0.16
1                Restaurant  0.13
2        Italian Restaurant  0.06
3                     Hotel  0.06
4                     Beach  0.06
5                 Beach Bar  0.06
6                    Bistro  0.03
7      Fast Food Restaurant  0.03
8               Flea Market  0.03
9               Pizza Place  0.03


----El Besòs----
                           venue  freq
0               Tapas Restaurant  0.16
1                 Breakfast Spot  0.08
2                   Tram Station  0.08
3                  Metro Station  0.08
4               Pedestrian Plaza  0.08
5                           Café  0.08
6  Vegetarian / Vegan Restaurant  0.04
7                          Plaza  0.04
8                         Market  0.04
9                    Supermarket  0.04


----El Camp de l'Arpa del Clot----
                      venue  freq
0                    Bakery  0.10
1                     Hotel  0.06
2             Groc

<h3>Put it into a DataFrame</h3>

After this step, all the necessary data will be together and ready for clustering.

In [16]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = DataFrame_grouped['Neighborhood']

for ind in np.arange(DataFrame_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(DataFrame_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Diagonal Mar,Mediterranean Restaurant,Restaurant,Hotel,Beach,Italian Restaurant,Beach Bar,Park,Café,Cafeteria,Fast Food Restaurant
1,El Besòs,Tapas Restaurant,Tram Station,Metro Station,Café,Breakfast Spot,Pedestrian Plaza,Vegetarian / Vegan Restaurant,Restaurant,Plaza,Market
2,El Camp de l'Arpa del Clot,Bakery,Grocery Store,Hotel,Pizza Place,Café,Restaurant,Spanish Restaurant,Italian Restaurant,Mediterranean Restaurant,Burger Joint
3,La Verneda,Food,Coffee Shop,Fast Food Restaurant,Soccer Field,Metro Station,Athletics & Sports,Smoke Shop,Empanada Restaurant,Flea Market,Farmers Market
4,Poblenou,Spanish Restaurant,Mediterranean Restaurant,Bakery,Italian Restaurant,Restaurant,Pizza Place,Indian Restaurant,Gastropub,Café,Tapas Restaurant
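The `try`/`except` approach above gives the right column names for the top 10, but it would label a 21st column as "21th". If more columns were ever needed, the suffix rule could be written out explicitly, as in this sketch. `ordinal` is a hypothetical helper that is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```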


<h3>Cluster Neighborhoods</h3>

In [17]:
# set number of clusters
kclusters = 5

DataFrame_grouped_clustering = DataFrame_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(DataFrame_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 1, 0, 3, 0, 0, 2, 0, 4, 0], dtype=int32)
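To give an idea of what k-means does with the frequency vectors, here is a plain-Python sketch of the basic iteration (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. This is not scikit-learn's implementation, which among other things chooses its initial centroids differently. The toy points and starting centroids are made up.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it is
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def lloyd_kmeans(points, centroids, max_iter=100):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return labels, centroids


# Two well-separated toy groups in 2D
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = lloyd_kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

In the notebook, each "point" is a neighborhood's vector of category frequencies, so neighborhoods with similar venue mixes end up with the same label.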

<h3>Create a DataFrame to include the cluster and the top 10 venues for each neighborhood.</h3>

In [18]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

DataFrame_merged = DataFrame

# join the cluster labels and top venues onto DataFrame to add latitude/longitude for each neighborhood
DataFrame_merged = DataFrame_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

DataFrame_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Sant Martí, Barcelona",El Besòs,41.420369,2.209713,1,Tapas Restaurant,Tram Station,Metro Station,Café,Breakfast Spot,Pedestrian Plaza,Vegetarian / Vegan Restaurant,Restaurant,Plaza,Market
1,"Sant Martí, Barcelona",el Maresme,41.410525,2.215634,0,Café,Hotel,Italian Restaurant,Diner,Restaurant,Burger Joint,Ice Cream Shop,Cafeteria,Big Box Store,Cocktail Bar
2,"Sant Martí, Barcelona",el Clot,41.409953,2.190823,0,Spanish Restaurant,Tapas Restaurant,Mediterranean Restaurant,Café,Coffee Shop,Hotel,Restaurant,Supermarket,Plaza,Park
3,"Sant Martí, Barcelona",El Camp de l'Arpa del Clot,41.410754,2.182816,0,Bakery,Grocery Store,Hotel,Pizza Place,Café,Restaurant,Spanish Restaurant,Italian Restaurant,Mediterranean Restaurant,Burger Joint
4,"Sant Martí, Barcelona",Diagonal Mar,41.405228,2.213352,4,Mediterranean Restaurant,Restaurant,Hotel,Beach,Italian Restaurant,Beach Bar,Park,Café,Cafeteria,Fast Food Restaurant


<h3>Visualize the clusters</h3>

In [19]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(DataFrame_merged['Latitude'], DataFrame_merged['Longitude'], DataFrame_merged['Neighborhood'], DataFrame_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
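The cell above indexes the palette with `cluster-1`, so label 0 picks the last color through negative indexing. The colors are still distinct, but indexing by the label itself is easier to follow. The sketch below builds one color per label using the standard library's `colorsys` instead of matplotlib's `cm.rainbow`, so the actual colors differ from those in the notebook; `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(n_clusters):
    """Return n_clusters hex colors evenly spaced around the hue wheel."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 0.8, 0.9)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(
                int(round(r * 255)), int(round(g * 255)), int(round(b * 255))
            )
        )
    return palette


palette = cluster_colors(5)
# Index directly by label: label 0 gets the first color, label 4 the last
color_for_label = {label: palette[label] for label in range(5)}
print(color_for_label)
```

Each string in `palette` could be passed as the `color` and `fill_color` arguments of `folium.CircleMarker`.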

<h3>Identify what characterizes the clusters</h3>

<h3>Cluster 1</h3>

In [20]:
DataFrame_merged.loc[DataFrame_merged['Cluster Labels'] == 0, DataFrame_merged.columns[[1] + list(range(5, DataFrame_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,el Maresme,Café,Hotel,Italian Restaurant,Diner,Restaurant,Burger Joint,Ice Cream Shop,Cafeteria,Big Box Store,Cocktail Bar
2,el Clot,Spanish Restaurant,Tapas Restaurant,Mediterranean Restaurant,Café,Coffee Shop,Hotel,Restaurant,Supermarket,Plaza,Park
3,El Camp de l'Arpa del Clot,Bakery,Grocery Store,Hotel,Pizza Place,Café,Restaurant,Spanish Restaurant,Italian Restaurant,Mediterranean Restaurant,Burger Joint
6,el Parc,Hotel,Bar,Restaurant,Clothing Store,Spanish Restaurant,Mediterranean Restaurant,Tapas Restaurant,Coffee Shop,Music Venue,Sandwich Place
7,la Llacuna del Poblenou,Spanish Restaurant,Mediterranean Restaurant,Bakery,Italian Restaurant,Gastropub,Coffee Shop,Pizza Place,Restaurant,Empanada Restaurant,Cocktail Bar
8,Poblenou,Spanish Restaurant,Mediterranean Restaurant,Bakery,Italian Restaurant,Restaurant,Pizza Place,Indian Restaurant,Gastropub,Café,Tapas Restaurant
9,Provençals del Poblenou,Spanish Restaurant,Asian Restaurant,Pedestrian Plaza,Pizza Place,Mediterranean Restaurant,Bistro,Liquor Store,Soccer Field,Café,Recreation Center
13,la Vila Olímpica del Poblenou,Restaurant,Mediterranean Restaurant,Café,Italian Restaurant,Hookah Bar,Nightclub,Paella Restaurant,Spanish Restaurant,Lounge,Bar


Cluster 1 is characterized by having busy places to eat and some tourism (many types of restaurants, hotels, cafés, pizza places, ice cream shops, etc.).

<h3>Cluster 2</h3>

In [21]:
DataFrame_merged.loc[DataFrame_merged['Cluster Labels'] == 1, DataFrame_merged.columns[[1] + list(range(5, DataFrame_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,El Besòs,Tapas Restaurant,Tram Station,Metro Station,Café,Breakfast Spot,Pedestrian Plaza,Vegetarian / Vegan Restaurant,Restaurant,Plaza,Market
12,la Pau,Café,Supermarket,Tapas Restaurant,Fast Food Restaurant,Bar,Plaza,Pharmacy,Coffee Shop,Hardware Store,Falafel Restaurant


Cluster 2 is characterized by having relaxed places to eat or buy food and some transport stations (Café, Tapas, Supermarket, Bar, Breakfast, Tram Station, Metro Station).

<h3>Cluster 3</h3>

In [22]:
DataFrame_merged.loc[DataFrame_merged['Cluster Labels'] == 2, DataFrame_merged.columns[[1] + list(range(5, DataFrame_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Sant Martí de Provençals,Spanish Restaurant,Supermarket,Coffee Shop,Soccer Field,Grocery Store,Pizza Place,Bakery,Pedestrian Plaza,Pawn Shop,Park


Cluster 3 is characterized by having a variety of food places (Restaurants, Supermarket, Coffee, Pizza).

<h3>Cluster 4</h3>

In [23]:
DataFrame_merged.loc[DataFrame_merged['Cluster Labels'] == 3, DataFrame_merged.columns[[1] + list(range(5, DataFrame_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,La Verneda,Food,Coffee Shop,Fast Food Restaurant,Soccer Field,Metro Station,Athletics & Sports,Smoke Shop,Empanada Restaurant,Flea Market,Farmers Market


Cluster 4 is characterized by having places for sports and food (Food, Athletics & Sports, Soccer Field, Restaurants).

<h3>Cluster 5</h3>

In [24]:
DataFrame_merged.loc[DataFrame_merged['Cluster Labels'] == 4, DataFrame_merged.columns[[1] + list(range(5, DataFrame_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Diagonal Mar,Mediterranean Restaurant,Restaurant,Hotel,Beach,Italian Restaurant,Beach Bar,Park,Café,Cafeteria,Fast Food Restaurant
5,el Front Marítim del Poblenou,Mediterranean Restaurant,Restaurant,Beach Bar,Beach,Café,Athletics & Sports,Hotel,Park,Cocktail Bar,Fast Food Restaurant


Cluster 5 is characterized by having beach places where you can eat (Beach, Beach Bars, Restaurants).

This concludes our analysis. We have created 5 clusters, each representing a type of neighborhood in Sant Martí, and identified what characterizes them (based on their 10 most common venues within a 500-meter radius).
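
The five cluster cells above repeat the same filter with a different label each time. A minimal sketch of how that filter could be wrapped in a helper is shown here. The function name `show_cluster` is hypothetical, and it selects the venue columns by name rather than by position, which is a swapped-in choice and not what the cells above do. The toy frame reuses three rows from the tables above and keeps only the first two venue ranks.

```python
import pandas as pd


def show_cluster(df, label, label_col="Cluster Labels", name_col="Neighborhood"):
    """Return the neighborhood name and ranked venue columns for one cluster label."""
    # Pick the ranked venue columns by name instead of by position.
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    return df.loc[df[label_col] == label, [name_col] + venue_cols]


# Toy frame (assumption): a cut-down stand-in for DataFrame_merged.
toy = pd.DataFrame({
    "Neighborhood": ["El Besòs", "la Pau", "La Verneda"],
    "Cluster Labels": [1, 1, 3],
    "1st Most Common Venue": ["Tapas Restaurant", "Café", "Food"],
    "2nd Most Common Venue": ["Tram Station", "Supermarket", "Coffee Shop"],
})

# Label 1 corresponds to "Cluster 2" in the headings above.
cluster_2 = show_cluster(toy, 1)
print(cluster_2)
```

Selecting columns by name keeps the helper working if columns are added or reordered before the venue ranks.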

<h2 name="Results">5. Results and Discussion</h2>

Our analysis shows that there are 14 neighborhoods in Sant Martí, of different sizes and with many types of venues.
Based on the venue types found in each one, we identified that a good way to group the neighborhoods is into 5 clusters.

From the map, we can see that the clusters are characterized as follows:

 - *Cluster 1* includes most of the neighborhoods, covering the west, center, east and south of the borough. It is characterized by busy places to eat and some tourism.

 - The north packs more cluster types into less space than the rest of the borough: three of the five clusters (*Cluster 2*, *Cluster 3* and *Cluster 4*) appear there, all mainly characterized by food places.

 - Finally, *Cluster 5* groups two neighborhoods in the south-east that are characterized by being near the beach.
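
The claim that one cluster holds most of the neighborhoods can be checked by counting members per label. The sketch below rebuilds the assignments by hand from the cluster tables and the conclusion list in this report (label 0 is Cluster 1, label 4 is Cluster 5). The frame name `assignments` is hypothetical, and in the notebook itself the same count would come from the `Cluster Labels` column of `DataFrame_merged`.

```python
import pandas as pd

# Cluster label per neighborhood, as reported in this notebook's tables.
assignments = pd.DataFrame({
    "Neighborhood": [
        "El Besòs", "el Maresme", "el Clot", "El Camp de l'Arpa del Clot",
        "Diagonal Mar", "el Front Marítim del Poblenou", "el Parc",
        "la Llacuna del Poblenou", "Poblenou", "Provençals del Poblenou",
        "Sant Martí de Provençals", "La Verneda", "la Pau",
        "la Vila Olímpica del Poblenou",
    ],
    "Cluster Labels": [1, 0, 0, 0, 4, 4, 0, 0, 0, 0, 2, 3, 1, 0],
})

# Number of neighborhoods per cluster label, ordered by label.
cluster_sizes = assignments["Cluster Labels"].value_counts().sort_index()
print(cluster_sizes)
```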

<h2 name="Conclusion">6. Conclusion</h2>

The purpose of this project was to identify what characterizes each neighborhood in Sant Martí, grouping similar neighborhoods into clusters, for the benefit of anyone interested in the borough.

Using Foursquare, we first identified the neighborhoods' locations, then collected the venues that satisfy the data requirements, and used them to obtain the distribution of venue categories in every neighborhood.
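
One common way to turn a venue list into a per-neighborhood category distribution is to one-hot encode the categories and average them per neighborhood. The sketch below illustrates that idea under stated assumptions: the `venues` frame, its column names, and the neighborhoods "A" and "B" are hypothetical placeholders, not Foursquare results, and this is not necessarily the exact procedure used earlier in the notebook.

```python
import pandas as pd

# Hypothetical venue list: one row per venue, with its neighborhood and category.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Café", "Café", "Bar", "Beach", "Beach", "Café"],
})

# One-hot encode the categories, then average per neighborhood so each
# cell holds the share of that neighborhood's venues in that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
frequencies = onehot.groupby("Neighborhood").mean()

# Rank categories per neighborhood and keep the top n (n = 2 here, 10 in the report).
n = 2
top_venues = {
    name: row.sort_values(ascending=False).index[:n].tolist()
    for name, row in frequencies.iterrows()
}
print(frequencies)
print(top_venues)
```

The `frequencies` table is the kind of input a clustering step would consume, while `top_venues` corresponds to the ranked "Most Common Venue" columns shown in the cluster tables.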

From this, we draw the following conclusions:

 - There are **14 neighborhoods** in Sant Martí, Barcelona.
 - Those neighborhoods can be **grouped into 5 clusters**.
 - The **first** cluster includes "<ins>el Maresme</ins>", "<ins>el Clot</ins>", "<ins>El Camp de l'Arpa del Clot</ins>", "<ins>el Parc</ins>", "<ins>la Llacuna del Poblenou</ins>", "<ins>Poblenou</ins>", "<ins>Provençals del Poblenou</ins>" and "<ins>la Vila Olímpica del Poblenou</ins>", in the west, center, east and south of Sant Martí. It is characterized by **busy places to eat and some tourism** (many types of Restaurants, Hotels, Café, Pizza, Ice Cream, etc.).
 - The **second** cluster includes "<ins>el Besòs</ins>" and "<ins>la Pau</ins>", in the north of Sant Martí. It is characterized by **relaxed places to eat or buy food, and some transport stations** (Café, Tapas, Supermarket, Bar, Breakfast, Tram Station, Metro Station).
 - The **third** cluster includes "<ins>Sant Martí de Provençals</ins>", in the north of Sant Martí. It is characterized by **a variety of food places** (Restaurants, Supermarket, Coffee, Pizza).
 - The **fourth** cluster includes "<ins>La Verneda</ins>", in the north of Sant Martí. It is characterized by places for **sports and food** (Food, Athletics & Sports, Soccer Field, Restaurants).
 - The **fifth** cluster includes "<ins>Diagonal Mar</ins>" and "<ins>el Front Marítim del Poblenou</ins>", in the south-east of Sant Martí. It is characterized by **beach places where you can eat** (Beach, Beach Bars, Restaurants).