# Cultural trips 
We run a tourism business in Madrid, and we have found that a good share of our visitors are people on business trips who have very little time to go out and enjoy the most interesting parts of the city. We are considering a service of short cultural trips so that these businesspeople can see some of the city's history and culture in their limited time. To do this, we need to find cultural sites that are close to the hotels where our customers stay, so that we can offer a varied trip in a very short amount of time.


In [1]:
# Load the libraries we are going to use
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geopy.distance # distances between pairs of coordinates

import requests # library to handle HTTP requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Obtain the data

#### Location

We are based in the centre of Madrid, so first we get the geographical coordinates of the city.

In [2]:
address = 'Madrid'

geolocator = Nominatim(user_agent="mad_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Madrid are 40.4167047, -3.7035825.


#### Define Foursquare Credentials and Version

To get information about cultural sites and hotels we use the Foursquare API, so we first enter our access credentials.

In [3]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Getting the venues of the area from Foursquare

Next we build the GET request URL for the Foursquare `explore` endpoint, centred on Madrid with a 2 km search radius.

In [4]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 2000 # search radius in metres
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180604&ll=40.4167047,-3.7035825&radius=2000&limit=100'
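Instead of formatting the query string by hand, the same request URL can be assembled with the standard library's `urllib.parse.urlencode`, which escapes every value. This is a minimal sketch assuming the same endpoint and parameters as the cell above; `YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` are placeholders, and the coordinates are the ones returned by the geocoding step.

```python
from urllib.parse import urlencode

base_url = 'https://api.foursquare.com/v2/venues/explore'

# Placeholder credentials (assumption): replace with your own Foursquare keys
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180604',                 # API version date, as in the cell above
    'll': '40.4167047,-3.7035825',   # "latitude,longitude" of the search centre
    'radius': 2000,                  # search radius in metres
    'limit': 100,                    # maximum number of venues returned
}

# urlencode percent-escapes each value and joins the pairs with '&'
url = base_url + '?' + urlencode(params)
print(url)
```

The comma inside `ll` is encoded as `%2C`, which the server decodes back. With `requests`, passing the dictionary as `requests.get(base_url, params=params)` performs the same encoding for you.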

Send the GET request and examine the results.

In [5]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ebf9e01211536001b8784ad'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Sol',
  'headerFullLocation': 'Sol, Madrid',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 231,
  'suggestedBounds': {'ne': {'lat': 40.43470471800001,
    'lng': -3.6799843833873767},
   'sw': {'lat': 40.39870468199998, 'lng': -3.7271806166126233}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4adcda37f964a5201f3c21e3',
       'name': 'Puerta del Sol',
       'location': {'address': 'Pl. Puerta del Sol',
        'lat': 40.4170267569777,
        'lng': -3.703442763596807,
        'distance': 37,
        'postalCode': '28013',
        'cc': 'ES',
    

The information we need most is each venue's category and its location, which we use to compute distances.

In [6]:
# function that extracts the primary category of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened by json_normalize use the prefixed column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [7]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# keep only the columns we need
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each list of categories with the name of the primary category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# drop the 'venue.' and 'venue.location.' prefixes from the column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|   | name | categories | lat | lng |
|---|------|------------|-----|-----|
| 0 | Puerta del Sol | Plaza | 40.417027 | -3.703443 |
| 1 | Rosi La Loca | Tapas Restaurant | 40.415821 | -3.702955 |
| 2 | La Pulpería de Victoria | Seafood Restaurant | 40.416506 | -3.701709 |
| 3 | Club del Gourmet Corte Ingles | Gourmet Shop | 40.417497 | -3.704686 |
| 4 | TOC Hostel | Hostel | 40.417264 | -3.705928 |
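`json_normalize` turns nested keys into dotted column names such as `venue.location.lat`, which is why the cell above selects those names and then keeps only the text after the last dot. The sketch below reproduces that naming rule in plain Python for one venue shaped like the response shown earlier. It illustrates the idea only and is not pandas' implementation; the `flatten` helper is hypothetical, and the sample item is trimmed to the fields we use.

```python
def flatten(record, parent_key=''):
    """Flatten nested dicts into one level, joining keys with '.'; lists stay as values."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}.{key}' if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))   # recurse into nested objects
        else:
            flat[full_key] = value
    return flat

# One item from results['response']['groups'][0]['items'], trimmed (illustrative)
item = {
    'venue': {
        'name': 'Puerta del Sol',
        'categories': [{'name': 'Plaza'}],
        'location': {'lat': 40.4170267569777, 'lng': -3.703442763596807},
    }
}

flat = flatten(item)
wanted = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

# Keep the text after the last dot, as the notebook does with its column names
row = {col.split('.')[-1]: flat[col] for col in wanted}
# Keep only the primary category name, or None when the list is empty
row['categories'] = row['categories'][0]['name'] if row['categories'] else None
print(row)
```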


We check all the unique venue categories we have.

In [8]:
nearby_venues['categories'].unique()

array(['Plaza', 'Tapas Restaurant', 'Seafood Restaurant', 'Gourmet Shop',
       'Hostel', 'Cosmetics Shop', 'Cocktail Bar', 'Electronics Store',
       'Argentinian Restaurant', 'History Museum', 'Hotel',
       'Other Nightlife', 'Restaurant', 'Theater', 'Bistro', 'Pub',
       'Spanish Restaurant', 'Ice Cream Shop', 'Indie Movie Theater',
       'Café', 'Movie Theater', 'Sushi Restaurant', 'Opera House',
       'Historic Site', 'Art Museum', 'Beer Store', 'Other Event',
       'Bookstore', 'Miscellaneous Shop', 'Food & Drink Shop',
       'American Restaurant', 'Gastropub', 'Mediterranean Restaurant',
       'Road', 'Department Store', 'Bar', 'Coffee Shop', 'Dessert Shop',
       'Market', 'Pizza Place', 'Fountain', 'Candy Store', 'Garden',
       'Palace', 'Peruvian Restaurant', 'Exhibit', 'Pie Shop',
       'Clothing Store', 'Mobile Phone Shop', 'BBQ Joint'], dtype=object)

For our new business we are going to focus first on hotels in the centre, so we build a dataset containing only them.

In [9]:
hotels = nearby_venues[nearby_venues.categories == 'Hotel']
hotels

|    | name | categories | lat | lng |
|----|------|------------|-----|-----|
| 15 | The Hat Madrid | Hotel | 40.414343 | -3.70712 |
| 18 | Gran Vía Capital | Hotel | 40.420693 | -3.70646 |
| 22 | Gran Meliá Palacio de los Duques \*\*\*\*\* | Hotel | 40.419835 | -3.709494 |
| 74 | Only YOU Hotel&Lounge | Hotel | 40.422227 | -3.695762 |
| 99 | Eric Vökel Boutique Apartments | Hotel | 40.426291 | -3.706963 |


Now we build another dataset for the cultural sites.

In [10]:
# venue categories we treat as cultural sites
cultural_categories = ['Plaza', 'Theater', 'History Museum', 'Performing Arts Venue', 'Historic Site', 'Monument / Landmark', 'Museum', 'Opera House']
cultural_sites = nearby_venues.loc[nearby_venues['categories'].isin(cultural_categories)]
cultural_sites

|    | name | categories | lat | lng |
|----|------|------------|-----|-----|
| 0  | Puerta del Sol | Plaza | 40.417027 | -3.703443 |
| 5  | Plaza de Santa Ana | Plaza | 40.414631 | -3.701033 |
| 10 | Plaza del Callao | Plaza | 40.420145 | -3.705763 |
| 13 | Plaza Mayor | Plaza | 40.415527 | -3.707506 |
| 14 | Imprenta Municipal | History Museum | 40.413663 | -3.705448 |
| 19 | Teatro de La Zarzuela | Theater | 40.417184 | -3.697055 |
| 31 | Teatro Real de Madrid | Opera House | 40.418226 | -3.711064 |
| 33 | Plaza de la Villa | Historic Site | 40.415409 | -3.710391 |
| 36 | Plaza de Oriente | Plaza | 40.418326 | -3.712196 |
| 62 | Teatro Del Barrio | Theater | 40.409666 | -3.699222 |


With all the data we need for our calculations, we show the venues on a map: the hotels in blue and the cultural sites in red.

In [11]:
# create a map of Madrid centred on its latitude and longitude
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=15)

# add hotel markers to the map
for lat, lng, name in zip(hotels['lat'], hotels['lng'], hotels['name']):
    label1 = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid)

# add cultural site markers to the map, labelled with name and category
for lat, lng, site_name, category in zip(cultural_sites['lat'], cultural_sites['lng'], cultural_sites['name'], cultural_sites['categories']):
    label2 = folium.Popup('{} ({})'.format(site_name, category), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label2,
        color='red',
        fill=True,
        fill_color='#FFB0B0',
        fill_opacity=0.7).add_to(map_madrid)

map_madrid

## 2. Calculations
Our objective now is to group the cultural sites into several clusters based on their distances to the hotels.

#### Distances
First we compute the distance, in metres, between each hotel and each cultural site.

In [12]:
# distances in metres: one row per hotel, one column per cultural site
rows = []
for _, hotel in hotels.iterrows():
    hotel_coords = (hotel.lat, hotel.lng)
    rows.append([geopy.distance.distance(hotel_coords, (site.lat, site.lng)).m
                 for _, site in cultural_sites.iterrows()])
Distances_df = pd.DataFrame(rows, columns=cultural_sites.name)
# transpose so each row is a cultural site and each column (0-4) a hotel
Distances_df = Distances_df.transpose().reset_index()
Distances_df

|   | name | 0 | 1 | 2 | 3 | 4 |
|---|------|---|---|---|---|---|
| 0 | Puerta del Sol | 431.5509 | 480.970538 | 600.85595 | 870.88781 | 1071.271376 |
| 1 | Plaza de Santa Ana | 517.667297 | 815.701918 | 921.81108 | 954.822297 | 1389.204399 |
| 2 | Plaza del Callao | 654.487561 | 84.904619 | 318.522294 | 879.704958 | 690.095862 |
| 3 | Plaza Mayor | 135.475621 | 580.560595 | 507.333451 | 1243.847724 | 1196.249062 |
| 4 | Imprenta Municipal | 160.745016 | 785.342108 | 766.587303 | 1257.039562 | 1408.144663 |
| 5 | Teatro de La Zarzuela | 910.666234 | 888.222946 | 1095.988431 | 570.665827 | 1315.191856 |
| 6 | Teatro Real de Madrid | 545.882969 | 477.275769 | 222.949848 | 1372.624287 | 960.895771 |
| 7 | Plaza de la Villa | 301.785952 | 675.043882 | 497.420077 | 1454.246928 | 1242.991444 |
| 8 | Plaza de Oriente | 617.414391 | 553.28637 | 284.060047 | 1460.49366 | 989.781027 |
| 9 | Teatro Del Barrio | 847.94652 | 1369.899474 | 1426.57555 | 1425.387382 | 1959.489567 |
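`geopy.distance.distance` measures distances on the WGS-84 ellipsoid (geodesic). A common lighter-weight approximation is the haversine formula, which treats the Earth as a sphere. The sketch below is that spherical approximation, not what geopy computes, and the mean radius of 6,371,008.8 m is an assumed constant. Over the few hundred metres separating hotels and sites here, the two should agree to within a metre or so.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical assumption)

def haversine_m(coord1, coord2):
    """Great-circle distance in metres between two (lat, lng) points on a sphere."""
    lat1, lng1 = map(radians, coord1)
    lat2, lng2 = map(radians, coord2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Coordinates taken from the hotel and cultural-site tables above
the_hat = (40.414343, -3.70712)
puerta_del_sol = (40.417027, -3.703443)
print(round(haversine_m(the_hat, puerta_del_sol), 1))
```

For this pair the geodesic table above gives 431.55 m; the haversine value comes out slightly lower because the sphere ignores the Earth's flattening.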


In [13]:
# attach each site's coordinates; rows are in the same order as cultural_sites
cultural_sites_latlng = cultural_sites.loc[:, 'lat':'lng'].reset_index(drop=True)
Distances_latlng = Distances_df.join(cultural_sites_latlng)
Distances_latlng

|   | name | 0 | 1 | 2 | 3 | 4 | lat | lng |
|---|------|---|---|---|---|---|-----|-----|
| 0 | Puerta del Sol | 431.5509 | 480.970538 | 600.85595 | 870.88781 | 1071.271376 | 40.417027 | -3.703443 |
| 1 | Plaza de Santa Ana | 517.667297 | 815.701918 | 921.81108 | 954.822297 | 1389.204399 | 40.414631 | -3.701033 |
| 2 | Plaza del Callao | 654.487561 | 84.904619 | 318.522294 | 879.704958 | 690.095862 | 40.420145 | -3.705763 |
| 3 | Plaza Mayor | 135.475621 | 580.560595 | 507.333451 | 1243.847724 | 1196.249062 | 40.415527 | -3.707506 |
| 4 | Imprenta Municipal | 160.745016 | 785.342108 | 766.587303 | 1257.039562 | 1408.144663 | 40.413663 | -3.705448 |
| 5 | Teatro de La Zarzuela | 910.666234 | 888.222946 | 1095.988431 | 570.665827 | 1315.191856 | 40.417184 | -3.697055 |
| 6 | Teatro Real de Madrid | 545.882969 | 477.275769 | 222.949848 | 1372.624287 | 960.895771 | 40.418226 | -3.711064 |
| 7 | Plaza de la Villa | 301.785952 | 675.043882 | 497.420077 | 1454.246928 | 1242.991444 | 40.415409 | -3.710391 |
| 8 | Plaza de Oriente | 617.414391 | 553.28637 | 284.060047 | 1460.49366 | 989.781027 | 40.418326 | -3.712196 |
| 9 | Teatro Del Barrio | 847.94652 | 1369.899474 | 1426.57555 | 1425.387382 | 1959.489567 | 40.409666 | -3.699222 |


The cell above attached each site's coordinates so we can plot the sites on a map later. Now we replace the numeric hotel columns with the hotel names for easier identification.

In [16]:
# map the numeric hotel columns (0-4) to the hotel names, in the same order as `hotels`
hotel_columns = dict(zip(range(len(hotels)), hotels['name']))
Distances = Distances_latlng.rename(columns=hotel_columns)
Distances

|   | name | The Hat Madrid | Gran Vía Capital | Gran Meliá Palacio de los Duques \*\*\*\*\* | Only YOU Hotel&Lounge | Eric Vökel Boutique Apartments | lat | lng |
|---|------|---|---|---|---|---|-----|-----|
| 0 | Puerta del Sol | 431.5509 | 480.970538 | 600.85595 | 870.88781 | 1071.271376 | 40.417027 | -3.703443 |
| 1 | Plaza de Santa Ana | 517.667297 | 815.701918 | 921.81108 | 954.822297 | 1389.204399 | 40.414631 | -3.701033 |
| 2 | Plaza del Callao | 654.487561 | 84.904619 | 318.522294 | 879.704958 | 690.095862 | 40.420145 | -3.705763 |
| 3 | Plaza Mayor | 135.475621 | 580.560595 | 507.333451 | 1243.847724 | 1196.249062 | 40.415527 | -3.707506 |
| 4 | Imprenta Municipal | 160.745016 | 785.342108 | 766.587303 | 1257.039562 | 1408.144663 | 40.413663 | -3.705448 |
| 5 | Teatro de La Zarzuela | 910.666234 | 888.222946 | 1095.988431 | 570.665827 | 1315.191856 | 40.417184 | -3.697055 |
| 6 | Teatro Real de Madrid | 545.882969 | 477.275769 | 222.949848 | 1372.624287 | 960.895771 | 40.418226 | -3.711064 |
| 7 | Plaza de la Villa | 301.785952 | 675.043882 | 497.420077 | 1454.246928 | 1242.991444 | 40.415409 | -3.710391 |
| 8 | Plaza de Oriente | 617.414391 | 553.28637 | 284.060047 | 1460.49366 | 989.781027 | 40.418326 | -3.712196 |
| 9 | Teatro Del Barrio | 847.94652 | 1369.899474 | 1426.57555 | 1425.387382 | 1959.489567 | 40.409666 | -3.699222 |
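One direct way to read this table for the business question is to list, for each hotel, the cultural sites within a given walking radius, nearest first. This is a minimal sketch using a subset of the distances copied from the table above; the 500 m threshold and the `sites_within` helper are hypothetical choices for illustration, not part of the analysis.

```python
# Distances in metres copied from the table above (subset: two hotels, five sites)
distances = {
    'Gran Vía Capital': {
        'Puerta del Sol': 480.970538,
        'Plaza del Callao': 84.904619,
        'Plaza Mayor': 580.560595,
        'Teatro Real de Madrid': 477.275769,
        'Teatro Del Barrio': 1369.899474,
    },
    'The Hat Madrid': {
        'Puerta del Sol': 431.5509,
        'Plaza del Callao': 654.487561,
        'Plaza Mayor': 135.475621,
        'Teatro Real de Madrid': 545.882969,
        'Teatro Del Barrio': 847.94652,
    },
}

def sites_within(hotel, max_m=500):
    """Return (site, distance) pairs within max_m metres of the hotel, nearest first."""
    nearby = [(site, d) for site, d in distances[hotel].items() if d <= max_m]
    return sorted(nearby, key=lambda pair: pair[1])

for hotel in distances:
    print(hotel, [site for site, _ in sites_within(hotel)])
```

On the full `Distances` dataframe the same filter is a boolean mask on one hotel column followed by `sort_values` on that column.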


#### Clusters

In [17]:
# set number of clusters
kclusters = 5
# cluster on the distance columns only: one row per cultural site, one column per hotel
Distances_fit = Distances_df.drop(columns=['name'])
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Distances_fit)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:15]


array([1, 1, 4, 1, 1, 2, 4, 1, 4, 3, 0, 2, 2, 0, 3])

In [18]:
# add clustering labels
Distances.insert(0, 'Cluster Labels', kmeans.labels_)
Distances

|   | Cluster Labels | name | The Hat Madrid | Gran Vía Capital | Gran Meliá Palacio de los Duques \*\*\*\*\* | Only YOU Hotel&Lounge | Eric Vökel Boutique Apartments | lat | lng |
|---|---|------|---|---|---|---|---|-----|-----|
| 0 | 1 | Puerta del Sol | 431.5509 | 480.970538 | 600.85595 | 870.88781 | 1071.271376 | 40.417027 | -3.703443 |
| 1 | 1 | Plaza de Santa Ana | 517.667297 | 815.701918 | 921.81108 | 954.822297 | 1389.204399 | 40.414631 | -3.701033 |
| 2 | 4 | Plaza del Callao | 654.487561 | 84.904619 | 318.522294 | 879.704958 | 690.095862 | 40.420145 | -3.705763 |
| 3 | 1 | Plaza Mayor | 135.475621 | 580.560595 | 507.333451 | 1243.847724 | 1196.249062 | 40.415527 | -3.707506 |
| 4 | 1 | Imprenta Municipal | 160.745016 | 785.342108 | 766.587303 | 1257.039562 | 1408.144663 | 40.413663 | -3.705448 |
| 5 | 2 | Teatro de La Zarzuela | 910.666234 | 888.222946 | 1095.988431 | 570.665827 | 1315.191856 | 40.417184 | -3.697055 |
| 6 | 4 | Teatro Real de Madrid | 545.882969 | 477.275769 | 222.949848 | 1372.624287 | 960.895771 | 40.418226 | -3.711064 |
| 7 | 1 | Plaza de la Villa | 301.785952 | 675.043882 | 497.420077 | 1454.246928 | 1242.991444 | 40.415409 | -3.710391 |
| 8 | 4 | Plaza de Oriente | 617.414391 | 553.28637 | 284.060047 | 1460.49366 | 989.781027 | 40.418326 | -3.712196 |
| 9 | 3 | Teatro Del Barrio | 847.94652 | 1369.899474 | 1426.57555 | 1425.387382 | 1959.489567 | 40.409666 | -3.699222 |
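Each cultural site is clustered by its vector of distances to the five hotels, so sites that lie near the same hotels end up together. To make the mechanics visible, here is a plain-Python sketch of Lloyd's algorithm, the classic k-means iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is a simplification of scikit-learn's `KMeans`, which uses smarter k-means++ initialisation; here the first k points are the starting centroids (an assumption), and the sample uses four rows of the distance table rounded to 0.1 m.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    # Simplified initialisation (assumption): the first k points are the centroids
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the closest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Distances (m) from four sites to the five hotels, rounded from the table above
sites = ['Imprenta Municipal', 'Plaza de Oriente', 'Plaza Mayor', 'Plaza del Callao']
vectors = [
    [160.7, 785.3, 766.6, 1257.0, 1408.1],
    [617.4, 553.3, 284.1, 1460.5, 989.8],
    [135.5, 580.6, 507.3, 1243.8, 1196.2],
    [654.5, 84.9, 318.5, 879.7, 690.1],
]

labels, _ = kmeans(vectors, k=2)
for site, label in zip(sites, labels):
    print(label, site)
```

With two clusters, Plaza Mayor groups with Imprenta Municipal (both nearest to The Hat Madrid), and Plaza del Callao groups with Plaza de Oriente, the same pairings that appear in the five-cluster result above.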


## 3. Results

We show the clusters on a map, together with the hotels in blue.

In [19]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters: one rainbow colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add cultural site markers, coloured by cluster label
for lat, lon, poi, cluster in zip(Distances['lat'], Distances['lng'], Distances['name'], Distances['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# add hotel markers to map
for lat, lng, name in zip(hotels['lat'], hotels['lng'], hotels['name']):
    label1 = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
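Colouring the markers only needs a lookup from cluster label to colour, indexed by the label itself, since labels run from 0 to `kclusters - 1`. This is a dependency-free sketch that uses the standard library's `colorsys` instead of matplotlib's `rainbow` colormap. The `cluster_colors` helper is hypothetical, and the hue spacing and the saturation/value of 0.8/0.9 are arbitrary choices.

```python
import colorsys

def cluster_colors(k):
    """Return k distinct hex colours, one per cluster label 0..k-1, evenly spaced in hue."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_colors(5)
labels = [1, 1, 4, 1, 1, 2, 4, 1, 4, 3]               # cluster labels from the table above
marker_colors = [palette[label] for label in labels]  # index by the label directly
print(palette)
```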