<a href="https://github.com/PhinanceScientist"><img src = "https://i.ibb.co/NLfc0SV/Deveaner.png" width = 100> </a>
<h1 align=center><font size = 5>Merida Neighbourhoods Clustered by Economic Vulnerability due to the COVID-19 Outbreak</font></h1>

## Introduction

For this project I use a prepared dataset built from a public postal-code web page, because official postal and geospatial data for Mérida, Yucatán, México is scarce. The goal is to estimate the economic vulnerability of Mérida's neighbourhoods from venue information retrieved through the Foursquare API. k-means is used to group the neighbourhoods, and the Folium library is used to visualize the results.
This approach is an attempt to map the main neighbourhoods of Mérida and cluster the most economically vulnerable areas as COVID-19 spreads.

Note: to render this Jupyter notebook together with its Folium maps, you can open it through https://nbviewer.jupyter.org/

## Data

The data for this project comes from a local postal-services web page, <a href="https://www.heraldo.com.mx/"> Heraldo.com.mx </a>, which lists postal codes for México. Here we focus on <a href="https://www.heraldo.com.mx/yucatan/merida/merida/">Mérida's postal codes</a>.<br>
The CSV file is based on the first 100 postal codes of Mérida (in ascending order, which by local common knowledge starts from the downtown area), each linked to a latitude and longitude obtained from a Google Maps search.<br>
The Foursquare API is used to retrieve the venues around each neighbourhood. The category of each venue serves as a proxy for how crowded it is, and therefore for the overall vulnerability of the surrounding area.

***

## Methodology


## Importing Libraries

In [1]:
#Import requests for web scraping
import pandas as pd
import requests as rq
import numpy as np
import io

import json # library to handle JSON files
from pandas import json_normalize # transform JSON data into a pandas DataFrame (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries installed')

Libraries installed


## First we retrieve our data, using the file created and hosted on my GitHub repository
CSV URL File: https://raw.githubusercontent.com/PhinanceScientist/Coursera_Capstone/master/merida_cp_1_prepared.csv

In [2]:
urlCSV = 'https://raw.githubusercontent.com/PhinanceScientist/Coursera_Capstone/master/merida_cp_1_prepared.csv' # URL of the prepared CSV
geoSpatial = pd.read_csv(urlCSV) # Load the CSV into a DataFrame
newdf = geoSpatial.rename(columns={"cp": "Postcode", "colonia": "Neighbourhood"}) # Rename the Spanish column names to descriptive English titles
newdf


Unnamed: 0,Postcode,Neighbourhood,lat,lon
0,97000,jardines de san sebastian,20.989200,-89.756400
1,97000,privada del maestro,20.982308,-89.626156
2,97000,merida centro,20.968927,-89.645942
3,97000,los cocos,20.948595,-89.630134
4,97000,privada garcia gineres c - 29,20.989226,-89.638116
5,97003,los reyes,20.978651,-89.575990
6,97050,yucatan,20.994835,-89.628827
7,97050,alcal martin,20.991941,-89.622019
8,97059,seoorial,20.990414,-89.621707
9,97060,carrillo ancona,20.985286,-89.643317


In [43]:
newdf.drop_duplicates(subset='Postcode', keep="first") # Returns a de-duplicated copy; the result is not assigned, so newdf keeps every neighbourhood

newdf.head()

Unnamed: 0,Postcode,Neighbourhood,lat,lon
0,97000,jardines de san sebastian,20.9892,-89.7564
1,97000,privada del maestro,20.982308,-89.626156
2,97000,merida centro,20.968927,-89.645942
3,97000,los cocos,20.948595,-89.630134
4,97000,privada garcia gineres c - 29,20.989226,-89.638116
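
The output above still lists postcode 97000 five times, because `drop_duplicates` returns a new DataFrame rather than modifying `newdf`. A minimal sketch of that behaviour, using a small hypothetical frame (the rows are illustrative, not the real CSV):

```python
import pandas as pd

# Hypothetical miniature of the postcode table (illustrative values only)
demo = pd.DataFrame({
    "Postcode": [97000, 97000, 97003],
    "Neighbourhood": ["merida centro", "los cocos", "los reyes"],
})

# drop_duplicates returns a new DataFrame; `demo` itself is left untouched
deduped = demo.drop_duplicates(subset="Postcode", keep="first")

print(len(demo))     # the original frame keeps all of its rows
print(len(deduped))  # the copy keeps one row per postcode
```

Keeping every neighbourhood, as this notebook does, leaves more points to cluster; assigning the result back would instead keep a single neighbourhood per postcode.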


In [42]:

newdf = newdf[:100]

### We need to import our libraries for visualization

In [8]:
!pip -q install folium
import folium
print('Folium imported')

Folium imported


## 1. Exploring the dataset


In [9]:
map_merida = folium.Map(location=[20.97537, -89.61696], zoom_start=11) # Create Map

# add markers to map
for lat, lng, postcode, neighborhood in zip(newdf['lat'], newdf['lon'], newdf['Postcode'], newdf['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, postcode)  # popup text: "neighbourhood, postcode"
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_merida)
    
map_merida


### Adding the Foursquare credentials

In [10]:
#@hidden_cell
CLIENT_ID = 'DGBSOBI1JYHOTEEC5WQBC41VJNTTUGDB0IJH4U4GI5HITY4D' # your Foursquare ID
CLIENT_SECRET = 'NDTXJZISJVIJX0J5V5RSDHXJULPWBBI2ND2EN3JH11ULSJQO' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DGBSOBI1JYHOTEEC5WQBC41VJNTTUGDB0IJH4U4GI5HITY4D
CLIENT_SECRET:NDTXJZISJVIJX0J5V5RSDHXJULPWBBI2ND2EN3JH11ULSJQO


### For this exercise we will use only the first 100 neighbourhoods, as they are the most commercial ones

In [11]:

newdf.head()

Unnamed: 0,Postcode,Neighbourhood,lat,lon
0,97000,jardines de san sebastian,20.9892,-89.7564
1,97000,privada del maestro,20.982308,-89.626156
2,97000,merida centro,20.968927,-89.645942
3,97000,los cocos,20.948595,-89.630134
4,97000,privada garcia gineres c - 29,20.989226,-89.638116


In [12]:
neighborhood_latitude = newdf.loc[1, 'lat'] # neighborhood latitude value
neighborhood_longitude = newdf.loc[1, 'lon'] # neighborhood longitude value

neighborhood_name = newdf.loc[1, 'Neighbourhood'] # neighbourhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of privada del maestro are 20.9823085, -89.62615579999999.


### Let's create the GET request URL. 

In [13]:


LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=DGBSOBI1JYHOTEEC5WQBC41VJNTTUGDB0IJH4U4GI5HITY4D&client_secret=NDTXJZISJVIJX0J5V5RSDHXJULPWBBI2ND2EN3JH11ULSJQO&v=20180605&ll=20.9823085,-89.62615579999999&radius=500&limit=100'
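
As a hedged alternative to formatting the query string by hand, the same URL can be assembled from a parameter dictionary with the standard library. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration; only the endpoint and parameter names come from the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from a parameter dict (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma inside the "ll" value instead of encoding it as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")

# Placeholder credentials, not real ones
demo_url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 20.9823085, -89.6261558)
print(demo_url)
```

Building the query from a dictionary avoids the stray `&` right after `?` and makes it harder to mix up the positional arguments of `format`.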

### Send the GET request and examine the results

In [14]:
results = rq.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e9258359388d70028088d9a'},
 'response': {'headerLocation': 'Mérida',
  'headerFullLocation': 'Mérida',
  'headerLocationGranularity': 'city',
  'totalResults': 23,
  'suggestedBounds': {'ne': {'lat': 20.986808504500004,
    'lng': -89.62134521245169},
   'sw': {'lat': 20.977808495499993, 'lng': -89.6309663875483}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5149e0bde4b008ba38e2912d',
       'name': 'Bistro Cultural',
       'location': {'address': 'C. 66 Centro',
        'crossStreet': '43',
        'lat': 20.978596857641357,
        'lng': -89.62548953628293,
        'labeledLatLngs': [{'label': 'display',
          'lat': 20.978596857641357,
          'lng': -89.62548953628293}],
        'distance': 418,
        'postalCode': 

### Function that extracts the category of the venue

In [15]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Now we clean the json and structure it into a pandas dataframe.
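
The cleaning step could look roughly like the sketch below. It runs on a small hand-written payload shaped like the response shown earlier (the second venue and the rounded coordinates are invented for illustration), so it does not call the API. The helper `first_category` is a hypothetical stand-in for `get_category_type`.

```python
import pandas as pd

# Hand-written sample shaped like results['response']['groups'][0]['items']
sample_results = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Bistro Cultural",
                   "location": {"lat": 20.9786, "lng": -89.6255},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Venue without category",  # invented edge case
                   "location": {"lat": 20.9800, "lng": -89.6200},
                   "categories": []}},
    ]}]}
}

def first_category(categories):
    """Return the first category name, or None when the list is empty."""
    return categories[0]["name"] if categories else None

items = sample_results["response"]["groups"][0]["items"]
nearby = pd.json_normalize(items)  # nested keys become 'venue.name', 'venue.location.lat', ...

# keep only the columns of interest and simplify their names
wanted = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
nearby = nearby[wanted].copy()
nearby["venue.categories"] = nearby["venue.categories"].apply(first_category)
nearby.columns = [col.split(".")[-1] for col in nearby.columns]

print(nearby)
```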

## 2. Exploring Neighbourhoods in Merida 

### Function to repeat the same process to all the neighbourhoods in Merida

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = rq.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Run the above function on each neighbourhood and create a new dataframe called *dt_Merida_venues*

In [17]:

dt_Merida_venues = getNearbyVenues(names=newdf['Neighbourhood'],
                                   latitudes=newdf['lat'],
                                   longitudes=newdf['lon']
                                  )

jardines de san sebastian
privada del maestro
merida centro
los cocos
privada garcia gineres c - 29
los reyes
yucatan
alcal martin
seoorial
carrillo ancona
itzaes
inalmbrica
dolores patron
el pedregal
garcia gineres
la huerta
santa cecilia
cupules
lourdes
waspa
itzimna 2
itzimna
rinconada itzmina
las arboledas
jesús carranza
ferrocarrileros
xaman-tan
san antonio
montebello
gran royal altabrisa
sol campestre
monte alban
privada monterreal plus
hacienda dzodzil
cordemex
gonzalo guerrero
residencial san angelo
montes de ame
san antonio cucul
san ramon norte
plan de ayala
villas del sol
villas la hacienda
benito jurez nte
gonzalo guerrero
tecnolugico
campestre
del norte
centro sct yucatn
privada nuevo mexico
montejo
buenavista
privada mediterrneo
residencial colonia mexico
privada real mexico
mexico norte
emiliano zapata nte
vista alegre
san remo
san carlos
privada vista alegre
montecarlo norte
diaz ordaz
montecarlo
privada maya
altabrisa
cumbres de altabrisa
missan ii
residencial palmeral

### Check the size of the new dataFrame


In [18]:
print(dt_Merida_venues.shape)
dt_Merida_venues.head()

(2257, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,privada del maestro,20.982308,-89.626156,Bistro Cultural,20.978597,-89.62549,Café
1,privada del maestro,20.982308,-89.626156,Centro Cultural Ibérica,20.982269,-89.628082,Concert Hall
2,privada del maestro,20.982308,-89.626156,"Restaurant Reforma ""El popular Soberanis""",20.985473,-89.624678,Restaurant
3,privada del maestro,20.982308,-89.626156,Hotel Casa Nobel,20.980803,-89.626846,Bed & Breakfast
4,privada del maestro,20.982308,-89.626156,El Tío Ricardo,20.986303,-89.627185,Mexican Restaurant


### Let's group our dataframe by Neighbourhood and count how many venues they have

In [19]:
dt_Merida_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
alcal martin,8,8,8,8,8,8
altabrisa,64,64,64,64,64,64
benito jurez nte,26,26,26,26,26,26
buenavista,15,15,15,15,15,15
camara de comercio norte,19,19,19,19,19,19
campestre,23,23,23,23,23,23
carrillo ancona,26,26,26,26,26,26
centro sct yucatn,17,17,17,17,17,17
cordemex,38,38,38,38,38,38
cumbres de altabrisa,6,6,6,6,6,6


In [20]:
# Unique venue categories
print('There are {} unique categories.'.format(len(dt_Merida_venues['Venue Category'].unique())))

There are 215 unique categories.


## 3. Analyze Each Neighbourhood

In [21]:
# one hot encoding
dt_Merida_onehot = pd.get_dummies(dt_Merida_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_Merida_onehot['Neighbourhood'] = dt_Merida_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_Merida_onehot.columns[-1]] + list(dt_Merida_onehot.columns[:-1])
dt_Merida_onehot = dt_Merida_onehot[fixed_columns]

dt_Merida_onehot.head()

Unnamed: 0,Neighbourhood,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Yucatecan Restaurant
0,privada del maestro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,privada del maestro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,privada del maestro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,privada del maestro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,privada del maestro,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
dt_Merida_onehot.shape

(2257, 216)
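
The shape follows from the encoding: one row per venue (2257) and one column per category (215) plus the `Neighbourhood` column. A small sketch of the same idea on hypothetical rows, assuming integer indicator columns are wanted (`dtype=int`):

```python
import pandas as pd

# Hypothetical miniature of dt_Merida_venues
demo_venues = pd.DataFrame({
    "Neighbourhood": ["altabrisa", "altabrisa", "cordemex"],
    "Venue Category": ["Clothing Store", "Coffee Shop", "Hotel"],
})

# one indicator column per category, one row per venue
demo_onehot = pd.get_dummies(demo_venues["Venue Category"], dtype=int)

# put the neighbourhood name back as the first column
demo_onehot.insert(0, "Neighbourhood", demo_venues["Neighbourhood"])

print(demo_onehot.shape)  # (number of venues, number of categories + 1)
```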

### Next, let's group rows by neighbourhood and take the mean of the frequency of occurrence of each category

In [23]:
dt_Merida_grouped = dt_Merida_onehot.groupby('Neighbourhood').mean().reset_index()
dt_Merida_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Yucatecan Restaurant
0,alcal martin,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
1,altabrisa,0.000000,0.000000,0.031250,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.015625,0.015625,0.000000,0.000000,0.000000,0.000000,0.0,0.015625,0.000000,0.000000
2,benito jurez nte,0.000000,0.000000,0.076923,0.000000,0.000000,0.00000,0.000000,0.000000,0.038462,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
3,buenavista,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
4,camara de comercio norte,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
5,campestre,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
6,carrillo ancona,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
7,centro sct yucatn,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.058824,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000
8,cordemex,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.026316,0.026316,0.000000,0.026316,0.000000,0.000000,0.0,0.026316,0.000000,0.026316
9,cumbres de altabrisa,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,0.000000,0.000000,...,0.166667,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000


In [24]:
dt_Merida_grouped.shape

(91, 216)
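
Taking the mean of 0/1 indicators gives, for each neighbourhood, the share of its venues that fall in each category, so every row sums to 1. A sketch on hand-made indicator rows (hypothetical values):

```python
import pandas as pd

# Hypothetical one-hot rows: four venues in one neighbourhood, one in another
demo_onehot = pd.DataFrame({
    "Neighbourhood": ["altabrisa", "altabrisa", "altabrisa", "altabrisa", "cordemex"],
    "Clothing Store": [1, 1, 0, 0, 0],
    "Coffee Shop":    [0, 0, 1, 0, 0],
    "Hotel":          [0, 0, 0, 1, 1],
})

# mean of indicators = relative frequency of each category per neighbourhood
demo_grouped = demo_onehot.groupby("Neighbourhood").mean().reset_index()
print(demo_grouped)
```

Because these are shares rather than counts, a neighbourhood with 6 venues and one with 64 are compared on the same scale, which also means sparse neighbourhoods can be dominated by a single venue.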

### Let's print each neighbourhood along with the top 5 most common venues

In [25]:
num_top_venues = 5

for hood in dt_Merida_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = dt_Merida_grouped[dt_Merida_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----alcal martin----
                venue  freq
0          Taco Place  0.25
1  Seafood Restaurant  0.25
2             Stadium  0.12
3         Coffee Shop  0.12
4  Mexican Restaurant  0.12


----altabrisa----
                 venue  freq
0       Clothing Store  0.08
1       Ice Cream Shop  0.05
2        Shopping Mall  0.05
3  Sporting Goods Shop  0.03
4          Coffee Shop  0.03


----benito jurez nte----
                 venue  freq
0        Movie Theater  0.08
1  American Restaurant  0.08
2           Steakhouse  0.08
3           Restaurant  0.08
4          Pizza Place  0.04


----buenavista----
                venue  freq
0          Restaurant  0.13
1   Convenience Store  0.13
2  Seafood Restaurant  0.07
3                 Spa  0.07
4         Bus Station  0.07


----camara de comercio norte----
                venue  freq
0      Ice Cream Shop  0.11
1  Mexican Restaurant  0.05
2         Karaoke Bar  0.05
3          Playground  0.05
4        Cocktail Bar  0.05


----campestre----
    

### Let's put that into a *pandas* dataframe

Sorting venues in descending order

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
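
To see what the function returns, here is a short usage sketch on a single hypothetical row (the frequencies are invented; the function body is repeated so the block runs on its own):

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading 'Neighbourhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical grouped row: name first, then category frequencies
demo_row = pd.Series({"Neighbourhood": "demo", "Café": 0.3, "Taco Place": 0.5, "Park": 0.2})

top_two = list(return_most_common_venues(demo_row, 2))
print(top_two)
```

The result contains category names only, ordered by descending frequency; when two categories have the same frequency, their relative order depends on the sorting algorithm.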

### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the 3rd take the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = dt_Merida_grouped['Neighbourhood']

for ind in np.arange(dt_Merida_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_Merida_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head(10)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,alcal martin,Taco Place,Seafood Restaurant,Coffee Shop,Stadium,Food Stand,Mexican Restaurant,Yucatecan Restaurant,Diner,Event Space,Event Service
1,altabrisa,Clothing Store,Ice Cream Shop,Shopping Mall,Snack Place,Multiplex,Coffee Shop,American Restaurant,Sporting Goods Shop,Shoe Store,Seafood Restaurant
2,benito jurez nte,American Restaurant,Movie Theater,Steakhouse,Restaurant,Pharmacy,Food Truck,Business Service,Café,Italian Restaurant,Ice Cream Shop
3,buenavista,Convenience Store,Restaurant,Food Truck,Seafood Restaurant,Bus Station,Spa,Taco Place,Dance Studio,Supermarket,Fast Food Restaurant
4,camara de comercio norte,Ice Cream Shop,French Restaurant,Soccer Field,Shopping Mall,Mexican Restaurant,Boutique,Karaoke Bar,Steakhouse,Motorcycle Shop,Swiss Restaurant
5,campestre,Bakery,Taco Place,Mexican Restaurant,Flower Shop,Tailor Shop,Sandwich Place,Diner,Breakfast Spot,Seafood Restaurant,Fast Food Restaurant
6,carrillo ancona,Mexican Restaurant,Coffee Shop,Convenience Store,Deli / Bodega,Clothing Store,Bakery,Burger Joint,Pharmacy,Seafood Restaurant,Sandwich Place
7,centro sct yucatn,Mexican Restaurant,Park,Gym,Pizza Place,Bar,Arts & Entertainment,Gymnastics Gym,Ice Cream Shop,Seafood Restaurant,Taco Place
8,cordemex,Clothing Store,Hotel,Ice Cream Shop,Multiplex,Cosmetics Shop,Yucatecan Restaurant,Bookstore,Sushi Restaurant,Fast Food Restaurant,Sporting Goods Shop
9,cumbres de altabrisa,Business Service,Eye Doctor,Big Box Store,Convenience Store,Restaurant,Vegetarian / Vegan Restaurant,Yucatecan Restaurant,Event Space,Event Service,Electronics Store
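
The column labels above rely on an index lookup that falls back to `th`. That works for 10 columns, but would label an 11th or 21st column incorrectly if `num_top_venues` grew. A hedged sketch of a more general suffix helper (`ordinal` is a hypothetical name, not part of the notebook):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 11 <= n % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```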


## 4. Cluster Neighbourhoods

Run *k*-means to cluster the neighbourhoods into 10 clusters. We use k=10 only to demonstrate the Foursquare API and the clustering workflow; the optimal k is not analysed here.

In [28]:
# set number of clusters
kclusters = 10

dt_Merida_grouped_clustering = dt_Merida_grouped.drop('Neighbourhood', axis=1)  # keep only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_Merida_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([6, 1, 1, 1, 1, 2, 2, 6, 1, 1], dtype=int32)
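
The value k=10 is not tuned in this notebook. For readers who want to compare candidate values, one common option is the silhouette score. The sketch below is not part of the original analysis: it uses synthetic 2-D points with three well-separated groups (an assumption made so it runs without the Foursquare data) and simply picks the k with the highest score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight blobs, purely for illustration
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# score each candidate k; a higher silhouette means better separated clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

On the real frequency matrix the same loop could be run over `dt_Merida_grouped_clustering`, although with 91 rows and 215 sparse columns the scores may not show a clear winner.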

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood.

In [29]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_Merida_merged = newdf

# join the sorted venues onto the neighbourhood data so each row has latitude/longitude and a cluster label
dt_Merida_merged = dt_Merida_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

dt_Merida_merged.head() # check the last columns!

Unnamed: 0,Postcode,Neighbourhood,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,97000,jardines de san sebastian,20.9892,-89.7564,,,,,,,,,,,
1,97000,privada del maestro,20.982308,-89.626156,1.0,Paper / Office Supplies Store,Taco Place,Hotel,Food Truck,Public Art,Men's Store,Mexican Restaurant,Café,Bed & Breakfast,Bar
2,97000,merida centro,20.968927,-89.645942,8.0,Gym,Restaurant,Performing Arts Venue,Yucatecan Restaurant,Department Store,Event Space,Event Service,Electronics Store,Donut Shop,Dog Run
3,97000,los cocos,20.948595,-89.630134,6.0,Steakhouse,Mexican Restaurant,Convenience Store,Dessert Shop,Laundromat,Park,Bar,Taco Place,Athletics & Sports,Donut Shop
4,97000,privada garcia gineres c - 29,20.989226,-89.638116,2.0,Convenience Store,Mexican Restaurant,Restaurant,Pharmacy,Fast Food Restaurant,Snack Place,Burger Joint,Seafood Restaurant,Sandwich Place,Bar


Finally, let's visualize the resulting clusters

In [30]:
#Drop NaN results
dt_Merida_merged.dropna(inplace=True)
dt_Merida_merged


Unnamed: 0,Postcode,Neighbourhood,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,97000,privada del maestro,20.982308,-89.626156,1.0,Paper / Office Supplies Store,Taco Place,Hotel,Food Truck,Public Art,Men's Store,Mexican Restaurant,Café,Bed & Breakfast,Bar
2,97000,merida centro,20.968927,-89.645942,8.0,Gym,Restaurant,Performing Arts Venue,Yucatecan Restaurant,Department Store,Event Space,Event Service,Electronics Store,Donut Shop,Dog Run
3,97000,los cocos,20.948595,-89.630134,6.0,Steakhouse,Mexican Restaurant,Convenience Store,Dessert Shop,Laundromat,Park,Bar,Taco Place,Athletics & Sports,Donut Shop
4,97000,privada garcia gineres c - 29,20.989226,-89.638116,2.0,Convenience Store,Mexican Restaurant,Restaurant,Pharmacy,Fast Food Restaurant,Snack Place,Burger Joint,Seafood Restaurant,Sandwich Place,Bar
5,97003,los reyes,20.978651,-89.575990,6.0,Seafood Restaurant,Convenience Store,Taco Place,Restaurant,Park,Mexican Restaurant,Pizza Place,Concert Hall,Comfort Food Restaurant,Event Service
6,97050,yucatan,20.994835,-89.628827,6.0,Convenience Store,Bar,Food Truck,Taco Place,Sandwich Place,Diner,Mexican Restaurant,Bagel Shop,Fast Food Restaurant,Restaurant
7,97050,alcal martin,20.991941,-89.622019,6.0,Taco Place,Seafood Restaurant,Coffee Shop,Stadium,Food Stand,Mexican Restaurant,Yucatecan Restaurant,Diner,Event Space,Event Service
8,97059,seoorial,20.990414,-89.621707,2.0,Coffee Shop,Mexican Restaurant,Restaurant,Stadium,Beer Garden,Korean Restaurant,Seafood Restaurant,Taco Place,Bar,Breakfast Spot
9,97060,carrillo ancona,20.985286,-89.643317,2.0,Mexican Restaurant,Coffee Shop,Convenience Store,Deli / Bodega,Clothing Store,Bakery,Burger Joint,Pharmacy,Seafood Restaurant,Sandwich Place
10,97068,itzaes,20.978626,-89.643890,6.0,Bakery,Convenience Store,Taco Place,Seafood Restaurant,Cajun / Creole Restaurant,Stadium,Fast Food Restaurant,Pizza Place,Skating Rink,Pharmacy
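
The cluster labels show up as floats (`1.0`, `8.0`) because the join leaves `NaN` for neighbourhoods with no venues, and a column containing `NaN` cannot stay integer-typed by default. A small sketch with hypothetical rows shows the effect and one way to restore integers after dropping the empty rows:

```python
import pandas as pd

# Hypothetical location table; the first neighbourhood has no venue data
locations = pd.DataFrame({
    "Neighbourhood": ["no venues here", "merida centro", "los cocos"],
    "lat": [20.99, 20.97, 20.95],
})
labels = pd.DataFrame({
    "Neighbourhood": ["merida centro", "los cocos"],
    "Cluster Labels": [8, 6],
})

# left join on the neighbourhood name; unmatched rows get NaN
merged = locations.join(labels.set_index("Neighbourhood"), on="Neighbourhood")
print(merged["Cluster Labels"].dtype)  # float64, because of the NaN row

# drop unmatched rows, then convert the labels back to integers
cleaned = merged.dropna().copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```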


In [31]:
# create map
map_clusters = folium.Map(location=[20.97537, -89.61696], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_Merida_merged['lat'], dt_Merida_merged['lon'], dt_Merida_merged['Neighbourhood'], dt_Merida_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
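
In the cell above the colour is looked up with `rainbow[int(cluster)-1]`, so cluster 0 picks the last colour through negative indexing. A simpler mapping indexes the palette directly by label. The sketch below swaps in the standard-library `colorsys` module in place of matplotlib's rainbow colormap, purely to stay dependency-free; `cluster_palette` is a hypothetical helper.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colours (hypothetical helper using HSV hues)."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 10
palette = cluster_palette(kclusters)

# label n maps straight to palette[n], with no offset
colour_for_cluster_3 = palette[3]
print(palette)
```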

## 5. Examining the Clusters

#### Cluster 0 - Sports-oriented venues

In [32]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 0, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,residencial san angelo,Taco Place,Diner,Athletics & Sports,Baseball Field,Fast Food Restaurant,Cosmetics Shop,Costume Shop,Falafel Restaurant,Fabric Shop,Eye Doctor


#### Cluster 1 - Venues for high-income customers and tourism

In [33]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 1, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,privada del maestro,Paper / Office Supplies Store,Taco Place,Hotel,Food Truck,Public Art,Men's Store,Mexican Restaurant,Café,Bed & Breakfast,Bar
12,dolores patron,Restaurant,Chinese Restaurant,Hotel,Convenience Store,Coffee Shop,Mexican Restaurant,Seafood Restaurant,Taco Place,Farmers Market,Deli / Bodega
15,la huerta,Hotel,Restaurant,Mexican Restaurant,Historic Site,Food,Bar,Hot Dog Joint,Ice Cream Shop,Seafood Restaurant,Miscellaneous Shop
16,santa cecilia,Hotel,Ice Cream Shop,Coffee Shop,Restaurant,Bar,Hot Dog Joint,Salsa Club,Historic Site,Falafel Restaurant,Flower Shop
24,jesús carranza,Caribbean Restaurant,Clothing Store,Optical Shop,Italian Restaurant,BBQ Joint,Furniture / Home Store,Restaurant,Big Box Store,Burger Joint,Costume Shop
25,ferrocarrileros,BBQ Joint,Convenience Store,Furniture / Home Store,Restaurant,Sporting Goods Shop,Gym,Concert Hall,Diner,Eye Doctor,Event Space
26,xaman-tan,Shopping Mall,Seafood Restaurant,Food Truck,Pizza Place,Breakfast Spot,Health & Beauty Service,Sushi Restaurant,Bar,Coffee Shop,Convenience Store
27,san antonio,BBQ Joint,Gym,Pet Store,Post Office,Convenience Store,Dessert Shop,Movie Theater,Mexican Restaurant,Soccer Field,Supermarket
28,montebello,Italian Restaurant,Gym,Food Truck,Restaurant,Seafood Restaurant,Burger Joint,Shopping Mall,Snack Place,Pie Shop,Taco Place
30,sol campestre,Convenience Store,Spa,Shopping Mall,Restaurant,Fast Food Restaurant,Breakfast Spot,Jewelry Store,Spanish Restaurant,Mexican Restaurant,Liquor Store


#### Cluster 2 - Restaurants, food venues mostly.

In [34]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 2, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,privada garcia gineres c - 29,Convenience Store,Mexican Restaurant,Restaurant,Pharmacy,Fast Food Restaurant,Snack Place,Burger Joint,Seafood Restaurant,Sandwich Place,Bar
8,seoorial,Coffee Shop,Mexican Restaurant,Restaurant,Stadium,Beer Garden,Korean Restaurant,Seafood Restaurant,Taco Place,Bar,Breakfast Spot
9,carrillo ancona,Mexican Restaurant,Coffee Shop,Convenience Store,Deli / Bodega,Clothing Store,Bakery,Burger Joint,Pharmacy,Seafood Restaurant,Sandwich Place
17,cupules,Diner,Sandwich Place,Paper / Office Supplies Store,Electronics Store,Bar,Food Truck,Sporting Goods Shop,Furniture / Home Store,Mexican Restaurant,Convenience Store
19,waspa,Theater,Dance Studio,Flea Market,Brewery,Taco Place,Bar,BBQ Joint,Mexican Restaurant,Cupcake Shop,Dog Run
20,itzimna 2,Mexican Restaurant,Bar,Italian Restaurant,Bakery,Coffee Shop,Beer Garden,Convenience Store,Monument / Landmark,Event Space,Rock Club
21,itzimna,Mexican Restaurant,Bar,Italian Restaurant,Bakery,Coffee Shop,Beer Garden,Convenience Store,Monument / Landmark,Event Space,Rock Club
22,rinconada itzmina,Mexican Restaurant,Restaurant,Electronics Store,Taco Place,Bar,Pharmacy,Sushi Restaurant,Sandwich Place,Dessert Shop,Food Truck
45,tecnolugico,Coffee Shop,Convenience Store,Pizza Place,Fast Food Restaurant,Diner,Taco Place,Tailor Shop,Bakery,Sandwich Place,Paper / Office Supplies Store
46,campestre,Bakery,Taco Place,Mexican Restaurant,Flower Shop,Tailor Shop,Sandwich Place,Diner,Breakfast Spot,Seafood Restaurant,Fast Food Restaurant


#### Cluster 3 - Big Box Stores

In [35]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 3, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,gran royal altabrisa,Big Box Store,Eye Doctor,Convenience Store,Food Truck,Yucatecan Restaurant,Discount Store,Fabric Shop,Event Space,Event Service,Electronics Store


#### Cluster 4 - Parks and recreational venues such as movie theaters and shopping malls.

In [36]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 4, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,villas del sol,Coffee Shop,Mexican Restaurant,Electronics Store,Pharmacy,Pet Store,Park,Furniture / Home Store,Taco Place,Tailor Shop,Movie Theater
52,privada mediterráneo,Pizza Place,Gym,Park,Taco Place,Gaming Cafe,Movie Theater,Shipping Store,Bakery,Music Venue,Pharmacy
60,privada vista alegre,Taco Place,Movie Theater,Park,Pizza Place,Paper / Office Supplies Store,Basketball Stadium,Shopping Mall,Mexican Restaurant,Furniture / Home Store,Sports Bar
64,privada maya,Taco Place,Movie Theater,Park,Pizza Place,Paper / Office Supplies Store,Basketball Stadium,Shopping Mall,Mexican Restaurant,Furniture / Home Store,Sports Bar
67,missan ii,Taco Place,Movie Theater,Park,Pizza Place,Paper / Office Supplies Store,Basketball Stadium,Shopping Mall,Mexican Restaurant,Furniture / Home Store,Sports Bar


#### Cluster 5 - Residential area, no parks around but many convenience stores.

In [37]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 5, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,residencial palmerales de altabrisa,Convenience Store,Yucatecan Restaurant,Diner,Fabric Shop,Eye Doctor,Event Space,Event Service,Electronics Store,Donut Shop,Dog Run


#### Cluster 6 - Restaurants close to parks or gyms

In [38]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 6, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,los cocos,Steakhouse,Mexican Restaurant,Convenience Store,Dessert Shop,Laundromat,Park,Bar,Taco Place,Athletics & Sports,Donut Shop
5,los reyes,Seafood Restaurant,Convenience Store,Taco Place,Restaurant,Park,Mexican Restaurant,Pizza Place,Concert Hall,Comfort Food Restaurant,Event Service
6,yucatan,Convenience Store,Bar,Food Truck,Taco Place,Sandwich Place,Diner,Mexican Restaurant,Bagel Shop,Fast Food Restaurant,Restaurant
7,alcalá martin,Taco Place,Seafood Restaurant,Coffee Shop,Stadium,Food Stand,Mexican Restaurant,Yucatecan Restaurant,Diner,Event Space,Event Service
10,itzaes,Bakery,Convenience Store,Taco Place,Seafood Restaurant,Cajun / Creole Restaurant,Stadium,Fast Food Restaurant,Pizza Place,Skating Rink,Pharmacy
11,inalámbrica,Taco Place,Pizza Place,Park,Cupcake Shop,Skating Rink,Restaurant,Convenience Store,Stadium,Pharmacy,Burger Joint
13,el pedregal,Mexican Restaurant,Convenience Store,Diner,Sporting Goods Shop,Park,Seafood Restaurant,Electronics Store,Furniture / Home Store,Paper / Office Supplies Store,Market
14,garcia gineres,Convenience Store,Food Truck,Restaurant,Fast Food Restaurant,Comedy Club,Dessert Shop,Bar,Seafood Restaurant,Mexican Restaurant,Optical Shop
18,lourdes,Boutique,Convenience Store,Park,Health & Beauty Service,History Museum,Hotel,Ice Cream Shop,Taco Place,Bar,Fabric Shop
23,las arboledas,Convenience Store,Baseball Field,Gym,Furniture / Home Store,Pharmacy,Restaurant,Mexican Restaurant,Taco Place,Coffee Shop,Garden


#### Cluster 7 - Stables around, outside the city.

In [39]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 7, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
98,vista alegre lotificacion,Stables,Buffet,Pool,Gym,Grocery Store,Gourmet Shop,Event Space,Event Service,Electronics Store,Donut Shop


#### Cluster 8 - Highly touristic places and gyms as the most common venue.

In [40]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 8, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,merida centro,Gym,Restaurant,Performing Arts Venue,Yucatecan Restaurant,Department Store,Event Space,Event Service,Electronics Store,Donut Shop,Dog Run


#### Cluster 9 - Outside the city with sports club as the most common venue.

In [41]:
dt_Merida_merged.loc[dt_Merida_merged['Cluster Labels'] == 9, dt_Merida_merged.columns[[1] + list(range(5, dt_Merida_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
99,san antonio cinta,Sports Club,Pharmacy,Auto Workshop,Fried Chicken Joint,Dessert Shop,Fabric Shop,Eye Doctor,Event Space,Event Service,Electronics Store
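
The cells above repeat the same `.loc` expression, changing only the cluster label. A minimal sketch of a helper that wraps that expression follows. `show_cluster` and the small `demo` frame are hypothetical names introduced here for illustration, and the column layout of `demo` (neighbourhood at position 1, venue columns from position 5 onward) is an assumption made to mirror how `dt_Merida_merged` is indexed in those cells.

```python
import pandas as pd


def show_cluster(df, label, name_col=1, first_venue_col=5):
    """Return the neighbourhood column plus the venue columns for one cluster."""
    cols = df.columns[[name_col] + list(range(first_venue_col, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, cols]


# Hypothetical stand-in for dt_Merida_merged. The column order is assumed,
# and the postal code and coordinate values are placeholders, not real data.
demo = pd.DataFrame({
    'Postal Code': ['placeholder-a', 'placeholder-b', 'placeholder-c'],
    'Neighbourhood': ['merida centro', 'gran royal altabrisa', 'san antonio cinta'],
    'Latitude': [0.0, 0.0, 0.0],
    'Longitude': [0.0, 0.0, 0.0],
    'Cluster Labels': [8, 3, 9],
    '1st Most Common Venue': ['Gym', 'Big Box Store', 'Sports Club'],
    '2nd Most Common Venue': ['Restaurant', 'Eye Doctor', 'Pharmacy'],
})

# Show every cluster present in the frame, one table per label.
for label in sorted(demo['Cluster Labels'].unique()):
    print(f'Cluster {label}')
    print(show_cluster(demo, label), end='\n\n')
```

With the real frame, `show_cluster(dt_Merida_merged, 3)` should select the same rows and columns as the Cluster 3 cell, provided the column positions match the assumption above.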


# Results and discussion

I used the first 100 postal codes of Mérida for this exercise because those areas are well represented on Foursquare: the people found there are the most likely to use the application for tips and reviews. The expected behaviour of each cluster is described next to its name.

The clusters were defined by the most common type of venue: <br>
   <li> <b>Cluster 0, Sport-oriented venues around:</b> If these places remain closed as expected, the area should show a moderate economic downturn and a low risk of contagion.<br></li>
   <li> <b>Cluster 1, Venues for high-income customers:</b> Places commonly known for nightlife. Economic downturn and an expected low risk of contagion, as all these venues are closed.<br></li>
   <li> <b>Cluster 2, Restaurants, food venues mostly:</b> High economic downturn. Most of the venues will probably take a hard hit to their operations and cash flow, and the smallest venues in this cluster are expected to shut down. Low risk of contagion.<br></li>
   <li> <b>Cluster 3, Big Box Stores:</b> Places still open because they distribute food and basic goods. The risk of contagion is moderate, as people gather to shop.<br></li>
   <li> <b>Cluster 4, Parks and recreational venues such as movie theaters and shopping malls:</b> High economic downturn, low risk of contagion.<br></li>
   <li> <b>Cluster 5, Residential area, no parks around but many convenience stores:</b> Expected economic downturn and a moderate contagion risk from people gathering at the convenience stores.<br></li>
   <li> <b>Cluster 6, Restaurants close to parks or gyms:</b> High economic downturn, low risk of contagion as the gyms remain closed.<br></li>
   <li> <b>Cluster 7, Stables around, outside the city:</b> Expected economic downturn, low risk of contagion.<br></li>
   <li> <b>Cluster 8, Highly touristic places and gyms as the most common venue:</b> Very high economic downturn caused by the lack of international tourism, with food venues almost entirely closed. Moderate risk of contagion.<br></li>
   <li> <b>Cluster 9, Outside the city with sports club as the most common venue:</b> Expected economic downturn, low risk of contagion.<br></li>
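
The qualitative assessments in the list above can be kept next to the cluster labels as a small lookup table. This makes them easy to filter or to attach to a frame that carries a `Cluster Labels` column. The sketch below transcribes the list. The names `assessment` and `demo` are hypothetical, `None` marks a value the text does not state, and `'expected'` marks a downturn that the text mentions without grading its severity.

```python
import pandas as pd

# Qualitative assessments transcribed from the list above, indexed by cluster.
# None: not stated in the text. 'expected': downturn mentioned but not graded.
assessment = pd.DataFrame(
    {
        'Economic Downturn': ['moderate', 'expected', 'high', None, 'high',
                              'expected', 'high', 'expected', 'very high', 'expected'],
        'Contagion Risk': ['low', 'low', 'low', 'moderate', 'low',
                           'moderate', 'low', 'low', 'moderate', 'low'],
    },
    index=pd.Index(range(10), name='Cluster Labels'),
)

# Clusters where people are still expected to gather.
moderate_risk = assessment.index[assessment['Contagion Risk'] == 'moderate'].tolist()
print(moderate_risk)

# Hypothetical usage: attach the assessment to rows that carry a cluster label.
demo = pd.DataFrame({'Neighbourhood': ['merida centro', 'campestre'],
                     'Cluster Labels': [8, 2]})
joined = demo.join(assessment, on='Cluster Labels')
print(joined)
```

The same `join` call could be applied to `dt_Merida_merged`, so the risk level is available when colouring the markers on the map.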
   

# Conclusion
 In conclusion, regardless of the cluster, the most frequent venues are restaurants. This deserves special attention, because this kind of venue appears to suffer the most economic damage during the pandemic.<br> In Mexico, 97% of food-related venues are classified as micro or small companies with 10 or fewer employees (CANIRAC, 2014), which makes them an economic sector highly affected by situations such as the COVID-19 outbreak.<br> As a society, we and the government should take special care of this business sector in an attempt to stop the disappearance of jobs created by restaurant entrepreneurs.
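
One way to check how often restaurants appear among the ranked venues is to stack the "Most Common Venue" columns into a single series and count the categories. The sketch below is an illustration only. The `sample` frame is a hypothetical three-row excerpt that uses the first three venue columns of rows shown in the cluster tables, and matching on the word "Restaurant" is an assumed, deliberately narrow definition that leaves out categories such as Taco Place or Bakery.

```python
import pandas as pd

# Hypothetical excerpt: three neighbourhoods, first three ranked venue columns.
sample = pd.DataFrame({
    'Neighbourhood': ['merida centro', 'campestre', 'itzimna'],
    '1st Most Common Venue': ['Gym', 'Bakery', 'Mexican Restaurant'],
    '2nd Most Common Venue': ['Restaurant', 'Taco Place', 'Bar'],
    '3rd Most Common Venue': ['Performing Arts Venue', 'Mexican Restaurant',
                              'Italian Restaurant'],
})

# Collect every ranked venue column, whatever its rank.
venue_cols = [c for c in sample.columns if c.endswith('Most Common Venue')]

# Stack the columns into one Series of categories and count each category.
stacked = sample[venue_cols].stack()
counts = stacked.value_counts()
print(counts)

# Share of entries whose category name mentions "Restaurant" (assumed definition).
restaurant_share = stacked.str.contains('Restaurant').mean()
print(f'{restaurant_share:.0%} of the ranked entries are restaurants')
```

Run on the full `dt_Merida_merged` frame, the same counting would show whether the claim holds across all ten ranked columns and not only for this excerpt.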

## Bibliography
https://molekule.science/places-to-avoid-flu-virus/ <br>
https://www.babymed.com/health-news/8-public-places-avoid-during-cold-and-flu-season <br>
https://www.nhs.uk/conditions/coronavirus-covid-19/ <br>
https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/what-you-need-to-know-about-coronavirus-covid-19 <br>
https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public <br>
https://www.healthline.com/health-news/public-places-and-the-coronavirus-what-to-know#Coronavirus-can-spread-through-contact-with-contaminated-surfaces,-too <br>
https://www.cdc.gov/coronavirus/2019-ncov/prepare/transmission.html <br>
https://www.bbc.com/future/article/20200317-covid-19-how-long-does-the-coronavirus-last-on-surfaces <br>
https://canirac.org.mx/images//files/TODO%20SOBRE%20LA%20MESA%20ESTUDIOS%20DE%20LA%20INDUSTRIA.pdf

This notebook was <b>the final Capstone</b> for week 5 of the Applied Data Science Capstone track of the IBM Professional Certificate, made by <a href='https://www.linkedin.com/in/novelo-luis/'> Luis Novelo </a>

***

