# **Looking for a place to set up a guided visit start point in Madrid**

## **Description of the problem:**

The objective of this study is to select a set of coordinates to serve as a starting point for guided visits aimed at tourists who want to explore the monuments of Madrid (Spain).

These tours will be special, though. Upon arriving at the starting point, each group of tourists will pick one of three sets of monuments to visit (the best way to group the monuments is also calculated in this notebook). Thus, before calculating the starting point for the tours, which is the final goal, we have to cluster the monuments into three groups.

Note that the particular path for each of the tours is not an objective of this study, and it will not be calculated.

## **Description of the data:**

For this study, we will need two main sources of data:

On the one hand, we will need a map of Madrid to help us visualize the results of the study and to verify by eye that they are reasonable.

On the other hand, we will need the coordinates of Madrid's most central monuments. For practical reasons, we will limit the search to the thirty monuments closest to the center of the city.

The aim is to cluster the monuments into three groups. After that, we calculate the centroid of each group to serve as a reference for the starting point. Finally, we calculate a global centroid of the three previously calculated points.

*Note that there are simpler ways to calculate the centroid; this one was selected to showcase the knowledge acquired during the course.*
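To illustrate why the two approaches can differ, here is a minimal sketch on made-up points (the coordinates and group sizes are illustrative assumptions, not the Madrid data, and `centroid` is a hypothetical helper). It compares the plain centroid of all points with the centroid of the group centroids, which gives each group the same weight regardless of how many points it contains.

```python
from statistics import mean

def centroid(points):
    """Return the arithmetic mean of a list of (lat, lng) pairs."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))

# Hypothetical groups: one crowded, one with a single point
group_a = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
group_b = [(10.0, 10.0)]

# Centroid of all points: pulled toward the crowded group
overall = centroid(group_a + group_b)

# Centroid of the two group centroids: each group counts once
of_centroids = centroid([centroid(group_a), centroid(group_b)])

print(overall)
print(of_centroids)
```

In this toy case the second point sits halfway between the two groups, while the first stays close to the crowded one.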

## **Methodology along with code:**

In [2]:
import numpy as np

import pandas as pd 

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import json
from pandas.io.json import json_normalize

import requests

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')


The first thing that we need are the coordinates of the city of Madrid. Since this only requires a single pair of coordinates, they will be provided manually.
The latitude of Madrid is 40.416775 and the longitude is -3.703790. Now let's verify that the coordinates are accurate by zooming into the world map:

In [3]:
latitude = 40.416775
longitude =  -3.703790

In [4]:
# create map of Madrid using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=12)
    
map_madrid

Since the map is centered on the city itself, we can confirm that the coordinates we selected are correct.

With the map ready, the next step is to obtain the list of monuments from Foursquare and load it into a table:

In [5]:
CLIENT_ID = 'SECRET' # your Foursquare ID
CLIENT_SECRET = 'SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + '')
print('CLIENT_SECRET:' + '')

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


A preliminary search returned too few monument results in the city of Madrid. Thus, the search was tuned to include museums in order to fill as many of the thirty spots as possible.

In [6]:

LIMIT = 30

radius = 50000 

Category = '4bf58dd8d48988d181941735' #id of the museum category

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT,
    Category)
url 


'https://api.foursquare.com/v2/venues/explore?&client_id=SECRET&client_secret=SECRET&v=20180605&ll=40.416775,-3.70379&radius=50000&limit=30&categoryId=4bf58dd8d48988d181941735'
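As an alternative to formatting the query string by hand, the same URL could be assembled from a dictionary of parameters. The following is only a sketch using the standard library. `build_explore_url` is a hypothetical helper name, and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit, category_id):
    """Assemble the venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-encodes special characters (the comma in 'll' becomes %2C)
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

example_url = build_explore_url('SECRET', 'SECRET', '20180605',
                                40.416775, -3.703790, 50000, 30,
                                '4bf58dd8d48988d181941735')
print(example_url)
```

Keeping the parameters in a dictionary makes it harder to swap two positional arguments by mistake.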

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c05451d4c1f676746a97e3d'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Madrid',
  'headerFullLocation': 'Madrid',
  'headerLocationGranularity': 'city',
  'query': 'museum',
  'totalResults': 55,
  'suggestedBounds': {'ne': {'lat': 40.86677545000045,
    'lng': -3.113836468272621},
   'sw': {'lat': 39.96677454999955, 'lng': -4.2937435317273795}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4adcda37f964a5200d3c21e3',
       'name': 'Museo Thyssen-Bornemisza',
       'location': {'address': 'P. del Prado, 8',
        'lat': 40.41635928605382,
        'lng': -3.6948908136558845,
        'distance': 755,
        'postalCode': '28014',
     

In [8]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

index,name,lat,lng
0,Museo Thyssen-Bornemisza,40.416359,-3.694891
1,Museo Nacional del Prado,40.414429,-3.692468
2,Palacio Real de Madrid,40.41794,-3.714259
3,Templo de Debod,40.423828,-3.716779
4,Museo Nacional Centro de Arte Reina Sofía (MNC...,40.408495,-3.694024
5,CaixaForum Madrid,40.411013,-3.693177
6,Santa Iglesia Catedral de Santa María la Real ...,40.415767,-3.714516
7,Círculo de Bellas Artes,40.418486,-3.696612
8,Museo Arqueológico Nacional (MAN),40.423259,-3.688432
9,Palacio de Linares - Casa de América,40.420254,-3.691827
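For readers unfamiliar with what the flattening step does, here is a rough plain-Python equivalent of the column selection. It is a sketch rather than the notebook's method: `extract_venues` is a hypothetical helper, and `sample_items` is a trimmed-down stand-in for `results['response']['groups'][0]['items']` containing only the fields used here.

```python
def extract_venues(items):
    """Pull name, latitude and longitude out of each nested 'venue' entry."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append({
            'name': venue['name'],
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows

# Stand-in for the API response, reduced to one venue
sample_items = [
    {'venue': {'name': 'Museo Thyssen-Bornemisza',
               'location': {'lat': 40.41635928605382,
                            'lng': -3.6948908136558845}}},
]

rows = extract_venues(sample_items)
print(rows)
```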


Because too few venues in Madrid are categorized as monuments, the list above comes from the museum category search. Let's plot these venues on the map:

In [9]:

map_madrid = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(nearby_venues['lat'], nearby_venues['lng'], nearby_venues['name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  
    
map_madrid

We have checked that the venue locations are accurate, so it is time to move on to the clustering:

In [10]:
kclusters = 3

madrid_clustered = nearby_venues.drop(columns='name') # keep only lat and lng for clustering

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_clustered)

kmeans.labels_[0:29] 

array([1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 2, 0, 2, 1, 0, 2, 0, 1, 1, 1, 2, 0,
       1, 0, 2, 0, 2, 1, 0], dtype=int32)
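To make the clustering step less of a black box, here is a tiny sketch of the underlying idea (Lloyd's algorithm) in plain Python. It is not what scikit-learn runs internally: `KMeans` uses a smarter initialization (k-means++) by default, whereas this sketch naively starts from the first `k` points. The sample points are made up. Like the notebook, it measures Euclidean distance directly on the coordinate values, which is only an approximation for latitude and longitude.

```python
import math

def kmeans(points, k, iterations=100):
    """Tiny Lloyd's-algorithm sketch; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical points forming three well-separated pairs
points = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0),
          (0.0, 1.0), (10.0, 11.0), (20.0, 1.0)]
labels, centroids = kmeans(points, 3)
print(labels)
print(centroids)
```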

In [11]:
monuments_final = nearby_venues.copy() # copy so that nearby_venues is not modified
monuments_final['Cluster Labels'] = kmeans.labels_

In [12]:
monuments_final

index,name,lat,lng,Cluster Labels
0,Museo Thyssen-Bornemisza,40.416359,-3.694891,1
1,Museo Nacional del Prado,40.414429,-3.692468,1
2,Palacio Real de Madrid,40.41794,-3.714259,0
3,Templo de Debod,40.423828,-3.716779,0
4,Museo Nacional Centro de Arte Reina Sofía (MNC...,40.408495,-3.694024,1
5,CaixaForum Madrid,40.411013,-3.693177,1
6,Santa Iglesia Catedral de Santa María la Real ...,40.415767,-3.714516,0
7,Círculo de Bellas Artes,40.418486,-3.696612,1
8,Museo Arqueológico Nacional (MAN),40.423259,-3.688432,1
9,Palacio de Linares - Casa de América,40.420254,-3.691827,1


Let's visualize the result on the map:

In [13]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(monuments_final['lat'], monuments_final['lng'], monuments_final['name'], monuments_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Since the result looks reasonable, let's examine each cluster to determine its centroid and then proceed to determine the starting point.

Cluster 1:

In [14]:
cluster1 = monuments_final.loc[monuments_final['Cluster Labels'] == 0]
cluster1

index,name,lat,lng,Cluster Labels
2,Palacio Real de Madrid,40.41794,-3.714259,0
3,Templo de Debod,40.423828,-3.716779,0
6,Santa Iglesia Catedral de Santa María la Real ...,40.415767,-3.714516,0
11,Espacio Fundación Telefónica,40.420117,-3.701649,0
14,Museo del Romanticismo,40.425802,-3.69885,0
16,Imprenta Municipal,40.413663,-3.705448,0
21,Museo Cerralbo,40.423884,-3.714706,0
23,Palacio de Gaviria,40.417139,-3.706044,0
25,Museo de Historia (Museo Municipal de Madrid),40.42569,-3.700972,0
28,Real Academia de Bellas Artes de San Fernando,40.417546,-3.700352,0


In [15]:
Centroid1 = [cluster1["lat"].mean(),cluster1["lng"].mean()]
Centroid1

[40.420137637277364, -3.7073574836183654]

Cluster 2:

In [16]:
cluster2 = monuments_final.loc[monuments_final['Cluster Labels'] == 1]
cluster2

index,name,lat,lng,Cluster Labels
0,Museo Thyssen-Bornemisza,40.416359,-3.694891,1
1,Museo Nacional del Prado,40.414429,-3.692468,1
4,Museo Nacional Centro de Arte Reina Sofía (MNC...,40.408495,-3.694024,1
5,CaixaForum Madrid,40.411013,-3.693177,1
7,Círculo de Bellas Artes,40.418486,-3.696612,1
8,Museo Arqueológico Nacional (MAN),40.423259,-3.688432,1
9,Palacio de Linares - Casa de América,40.420254,-3.691827,1
13,Central de Diseño / DIMAD,40.392015,-3.696827,1
17,Casa Museo Lope de Vega,40.414326,-3.697468,1
18,Palacio de Velázquez,40.415139,-3.681975,1


In [17]:
Centroid2 = [cluster2["lat"].mean(),cluster2["lng"].mean()]
Centroid2

[40.41178499298921, -3.6918059337662052]

Cluster 3:

In [18]:
cluster3 = monuments_final.loc[monuments_final['Cluster Labels'] == 2]
cluster3

index,name,lat,lng,Cluster Labels
10,Museo Sorolla,40.435248,-3.692374,2
12,Fundación Juan March,40.431018,-3.68108,2
15,Museo Real Madrid,40.453095,-3.689367,2
20,Museo de Escultura al Aire Libre de La Castellana,40.433233,-3.688734,2
24,Andén Cero,40.432334,-3.697679,2
26,Museo Lázaro Galdiano,40.436929,-3.686186,2


In [19]:
Centroid3 = [cluster3["lat"].mean(),cluster3["lng"].mean()]
Centroid3

[40.43697601205222, -3.689236749062711]
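The three per-cluster cells above repeat the same computation. As a sketch, the centroids could also be obtained in a single pass by grouping rows by label. `cluster_centroids` is a hypothetical helper, and only a few rows from the tables above are used for illustration. In the notebook itself, `monuments_final.groupby('Cluster Labels')[['lat', 'lng']].mean()` should produce the equivalent table.

```python
from collections import defaultdict

def cluster_centroids(rows):
    """Map each cluster label to the mean (lat, lng) of its rows."""
    grouped = defaultdict(list)
    for lat, lng, label in rows:
        grouped[label].append((lat, lng))
    return {
        label: (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
        for label, pts in grouped.items()
    }

# A few rows copied from the tables above: (lat, lng, cluster label)
rows = [
    (40.435248, -3.692374, 2),  # Museo Sorolla
    (40.431018, -3.68108, 2),   # Fundación Juan March
    (40.41794, -3.714259, 0),   # Palacio Real de Madrid
]
centroids = cluster_centroids(rows)
print(centroids)
```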

Now, to calculate the starting point, we will find the centroid of the three previously found points. We do so to give equal weight to each of the clusters.

In [20]:
StartingPoint = [(Centroid1[0]+Centroid2[0]+Centroid3[0])/3,(Centroid1[1]+Centroid2[1]+Centroid3[1])/3]
[round(coord, 6) for coord in StartingPoint] # rounded for display

[40.422966, -3.696133]

In [25]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(monuments_final['lat'], monuments_final['lng'], monuments_final['name'], monuments_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

folium.Marker([StartingPoint[0], StartingPoint[1]], 'Starting Point').add_to(map_clusters)
       
map_clusters

## **Results:**

As we expected, our starting point has ended up being near the very center of the city. 

Although calculating a centroid for each cluster (that is, for each tour) helped to offset the weight of the two more crowded clusters, the starting point still leaves the third cluster at a clear disadvantage compared with the other two.
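One way to put a number on this imbalance is to measure how far each cluster centroid lies from a candidate starting point. The sketch below uses the haversine formula with a mean Earth radius of 6371 km (an assumption), the three centroids as printed earlier in the notebook, and a candidate start taken as their unweighted mean. `haversine_km` is a hypothetical helper.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: assumed mean Earth radius

# Cluster centroids as printed earlier in the notebook
centroids = {
    1: (40.420137637277364, -3.7073574836183654),
    2: (40.41178499298921, -3.6918059337662052),
    3: (40.43697601205222, -3.689236749062711),
}

# Candidate start: unweighted mean of the three centroids
start = (sum(c[0] for c in centroids.values()) / 3,
         sum(c[1] for c in centroids.values()) / 3)

distances = {name: haversine_km(start, c) for name, c in centroids.items()}
for name, km in distances.items():
    print('Cluster {}: {:.2f} km'.format(name, km))
```

Comparing the three distances shows which tour group would have the longest walk to reach its cluster.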

Leaving aside the clustering step, the result is somewhat disappointing. The conclusions look in more detail at what could have been done to avoid a situation like this.

## **Conclusion:**

In retrospect, some other approaches may have yielded better results:

One of the better alternatives would have been to implement a routing algorithm to create an optimal route for each cluster, and then to pick the starting point according to the coordinates where those routes begin.

Another good alternative would have been to select (leaving aside the routing problem) the most central point of each cluster and to calculate the centroid based on those three points.
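A minimal sketch of that second idea: the "most central point" of a cluster can be read as its medoid, the member with the smallest total distance to all other members. This reading is an assumption, as are the helper name `medoid` and the made-up sample cluster.

```python
import math

def medoid(points):
    """Return the point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical cluster: four venues close together plus one outlier
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(medoid(cluster))
```

Unlike the mean, the medoid is always an actual venue, so an outlier cannot drag it to a location where there is nothing to visit. The starting point would then be the mean of the three medoids.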
    