<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Content-based Recommendation System for Tourists in Paris</font></h1>

## Introduction

  Personalized recommender systems have become common in recent years. Tourists in an unfamiliar city often start by visiting popular places or meeting friends during their first days. Once they have visited a few places, we can recommend places that are similar to, or deliberately different from, the ones they have seen. For example, a tourist who liked a place is likely to explore similar places the next day, and a tourist who disliked it will probably prefer something different.

## Download and Explore Dataset

  The boroughs (arrondissements) and their coordinates are available from http://opendata.paris.fr, and venue information can be retrieved with the Foursquare API. With these sources we obtain the venues of every borough, the distances between boroughs, and a crime index.

In [50]:
import pandas as pd
import numpy as np
import requests
from collections import defaultdict
try:
    from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
except ImportError:
    !conda install -c conda-forge geopy --yes  # install geopy if it is missing
    from geopy.geocoders import Nominatim
import folium

In [2]:
CLIENT_ID = 'ACRKVGW24PGQSDX3UWRFY5Y2BDI4QMUQLBJGASNSFM21ZNDD' # your Foursquare ID
CLIENT_SECRET = 'D3XZKMVQSGQG5OSTO3JZVALMUG44NXBRNJZGCPSDI1BNMFOQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ACRKVGW24PGQSDX3UWRFY5Y2BDI4QMUQLBJGASNSFM21ZNDD
CLIENT_SECRET: D3XZKMVQSGQG5OSTO3JZVALMUG44NXBRNJZGCPSDI1BNMFOQ


In [3]:
api = 'https://opendata.paris.fr/api/records/1.0/search/?dataset=arrondissements&rows=100'
results = requests.get(api).json()

# collect one row per arrondissement, then build the DataFrame in one go
# (DataFrame.append was deprecated and removed in pandas 2.0)
rows = []
for neighborhood in results['records']:
    fields = neighborhood['fields']
    rows.append({'Borough': fields['l_ar'],
                 'Latitude': fields['geom_x_y'][0],    # geom_x_y holds [latitude, longitude]
                 'Longitude': fields['geom_x_y'][1]})
paris_data = pd.DataFrame(rows, columns=['Borough', 'Latitude', 'Longitude'])
paris_data.head()

|   | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | 3ème Ardt | 48.862872 | 2.360001 |
| 1 | 19ème Ardt | 48.887076 | 2.384821 |
| 2 | 14ème Ardt | 48.829245 | 2.326542 |
| 3 | 10ème Ardt | 48.87613 | 2.360728 |
| 4 | 12ème Ardt | 48.834974 | 2.421325 |


Draw a map to visualize the layout of the Paris boroughs.

In [4]:
address = 'Paris'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

In [5]:
map_paris = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(paris_data['Latitude'], paris_data['Longitude'], paris_data['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

In [10]:
CLIENT_ID = 'ACRKVGW24PGQSDX3UWRFY5Y2BDI4QMUQLBJGASNSFM21ZNDD' # your Foursquare ID
CLIENT_SECRET = 'D3XZKMVQSGQG5OSTO3JZVALMUG44NXBRNJZGCPSDI1BNMFOQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: ACRKVGW24PGQSDX3UWRFY5Y2BDI4QMUQLBJGASNSFM21ZNDD
CLIENT_SECRET:D3XZKMVQSGQG5OSTO3JZVALMUG44NXBRNJZGCPSDI1BNMFOQ


In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=20):
    """Return one row per venue found within `radius` metres of each borough centre."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # venues without a category (if any) are skipped instead of raising IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # flatten the per-borough lists into a single table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues

In [12]:
paris_venues = getNearbyVenues(names=paris_data['Borough'],
                                   latitudes=paris_data['Latitude'],
                                   longitudes=paris_data['Longitude']
                                  )

3ème Ardt
19ème Ardt
14ème Ardt
10ème Ardt
12ème Ardt
16ème Ardt
11ème Ardt
2ème Ardt
4ème Ardt
17ème Ardt
18ème Ardt
1er Ardt
5ème Ardt
7ème Ardt
20ème Ardt
8ème Ardt
9ème Ardt
13ème Ardt
15ème Ardt
6ème Ardt


In [13]:
paris_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 3ème Ardt | 48.862872 | 2.360001 | Mmmozza | 48.86391 | 2.360591 | Sandwich Place |
| 1 | 3ème Ardt | 48.862872 | 2.360001 | Square du Temple | 48.864475 | 2.360816 | Park |
| 2 | 3ème Ardt | 48.862872 | 2.360001 | Marché des Enfants Rouges | 48.862806 | 2.361996 | Farmers Market |
| 3 | 3ème Ardt | 48.862872 | 2.360001 | Chez Alain Miam Miam | 48.862781 | 2.362064 | Sandwich Place |
| 4 | 3ème Ardt | 48.862872 | 2.360001 | Chez Alain Miam Miam | 48.862369 | 2.36195 | Sandwich Place |


In [14]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

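|   | Neighborhood | African Restaurant | Art Gallery | Art Museum | Asian Restaurant | Bakery | Bar | Basque Restaurant | Beer Bar | Beer Store | ... | Thai Restaurant | Theater | Trail | Turkish Restaurant | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Wine Bar | Women's Store | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3ème Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3ème Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3ème Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3ème Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3ème Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |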


In [15]:
paris_onehot.shape

(377, 119)

In [17]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

|   | Neighborhood | African Restaurant | Art Gallery | Art Museum | Asian Restaurant | Bakery | Bar | Basque Restaurant | Beer Bar | Beer Store | ... | Thai Restaurant | Theater | Trail | Turkish Restaurant | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Wine Bar | Women's Store | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10ème Ardt | 0.1 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 11ème Ardt | 0.05 | 0.0 | 0.05 | 0.05 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 |
| 2 | 12ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 |
| 3 | 13ème Ardt | 0.0 | 0.0 | 0.0 | 0.15 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.1 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 |
| 4 | 14ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 15ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 16ème Ardt | 0.0 | 0.0 | 0.083333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.083333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 17ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 18ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15 | 0.0 | 0.0 | 0.05 | ... | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.05 | 0.0 | 0.0 |
| 9 | 19ème Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.05 | 0.0 | 0.0 | 0.0 |


We can already draw some simple conclusions about individual boroughs. For example, the venues of 12ème Ardt lean towards leisure and outdoor destinations (zoo, park, pool), and those of 13ème Ardt are mainly Asian restaurants (Vietnamese, Chinese, Thai). The large number of hotels and restaurants also reflects Paris's popularity as a travel destination.
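These conclusions rest on each borough's category frequencies: one-hot encode every venue's category, then average the rows per borough. The sketch below repeats that step on a tiny made-up venue list (not the notebook's Foursquare data) and adds `idxmax`, an addition not used in the notebook, to pick each borough's most frequent category.

```python
import pandas as pd

# hypothetical venue list for two made-up boroughs X and Y
venues = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'Y', 'Y'],
    'Venue Category': ['Park', 'Zoo', 'Park', 'Bakery', 'Café'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category']).astype(int)

# mean per borough = share of that borough's venues in each category
freq = onehot.groupby(venues['Neighborhood']).mean()

# most frequent category per borough (ties go to the first column)
top_category = freq.idxmax(axis=1)
print(freq)
print(top_category)
```

Each row of `freq` sums to 1, just like the rows of `paris_grouped`, so the columns can be read directly as "share of venues of this type".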

In [18]:
num_top_venues = 5

for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----10ème Ardt----
                venue  freq
0  African Restaurant  0.10
1   French Restaurant  0.10
2              Bistro  0.10
3                Café  0.10
4          Boxing Gym  0.05


----11ème Ardt----
                 venue  freq
0          Pastry Shop  0.10
1   African Restaurant  0.05
2  Moroccan Restaurant  0.05
3       Sandwich Place  0.05
4                Plaza  0.05


----12ème Ardt----
                 venue  freq
0                  Zoo   0.2
1  Monument / Landmark   0.2
2                 Pool   0.2
3                 Park   0.2
4          Supermarket   0.2


----13ème Ardt----
                   venue  freq
0  Vietnamese Restaurant  0.20
1       Asian Restaurant  0.15
2      French Restaurant  0.15
3     Chinese Restaurant  0.15
4        Thai Restaurant  0.10


----14ème Ardt----
                  venue  freq
0     French Restaurant  0.45
1    Italian Restaurant  0.05
2                 Hotel  0.05
3           Pizza Place  0.05
4  Fast Food Restaurant  0.05


----15ème Ard

Now let's calculate the similarity between every pair of boroughs, defined here as the dot product of their venue-category frequency vectors.

In [46]:
# similarity between two boroughs = dot product of their venue-category frequency vectors
similarity_matrix = {}
for neighborhood in paris_grouped['Neighborhood']:
    similarity_matrix.setdefault(neighborhood, {})
    # frequency vector of the current borough (every column except 'Neighborhood')
    vector1 = paris_grouped[paris_grouped['Neighborhood'] == neighborhood].iloc[:, 1:].values[0]
    for neighborhood2 in paris_grouped['Neighborhood']:
        if neighborhood2 != neighborhood:
            vector2 = paris_grouped[paris_grouped['Neighborhood'] == neighborhood2].iloc[:, 1:].values[0]
            similarity_matrix[neighborhood][neighborhood2] = np.dot(vector1, vector2)

# list every borough's neighbours, most similar first
for neighborhood, sims in similarity_matrix.items():
    print('neighborhood:', neighborhood)
    print(sorted(sims.items(), key=lambda x: x[1], reverse=True))
    print('.................')


neighborhood: 10ème Ardt
[('14ème Ardt', 0.05500000000000001), ('5ème Ardt', 0.035), ('9ème Ardt', 0.03250000000000001), ('4ème Ardt', 0.0325), ('7ème Ardt', 0.030000000000000002), ('18ème Ardt', 0.027500000000000007), ('2ème Ardt', 0.027500000000000004), ('8ème Ardt', 0.027500000000000004), ('19ème Ardt', 0.025000000000000005), ('17ème Ardt', 0.025), ('20ème Ardt', 0.025), ('11ème Ardt', 0.022500000000000006), ('6ème Ardt', 0.020000000000000004), ('13ème Ardt', 0.02), ('3ème Ardt', 0.015000000000000003), ('15ème Ardt', 0.012500000000000002), ('1er Ardt', 0.012500000000000002), ('16ème Ardt', 0.008333333333333333), ('12ème Ardt', 0.0)]
.................
neighborhood: 11ème Ardt
[('18ème Ardt', 0.0325), ('6ème Ardt', 0.030000000000000006), ('7ème Ardt', 0.025000000000000005), ('14ème Ardt', 0.025), ('10ème Ardt', 0.022500000000000006), ('16ème Ardt', 0.020833333333333336), ('17ème Ardt', 0.020000000000000004), ('19ème Ardt', 0.020000000000000004), ('1er Ardt', 0.020000000000000004), (
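The similarity above is an unnormalised dot product, so its scale depends on how concentrated each borough's venue mix is: two boroughs with identical but evenly spread profiles score lower than two boroughs with identical but concentrated profiles. Cosine similarity, which divides by the vectors' lengths, is a common alternative. This notebook does not use it; the sketch below only compares the two measures on made-up frequency vectors whose rows sum to 1, like the rows of `paris_grouped`.

```python
import numpy as np

def cosine_similarity_matrix(freq: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a frequency matrix."""
    norms = np.linalg.norm(freq, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for boroughs with no venues
    unit = freq / norms
    return unit @ unit.T

# hypothetical category shares for four boroughs over four categories
freq = np.array([
    [0.5, 0.5, 0.0, 0.0],      # borough A
    [0.5, 0.5, 0.0, 0.0],      # borough B: same profile as A
    [0.25, 0.25, 0.25, 0.25],  # borough C
    [0.25, 0.25, 0.25, 0.25],  # borough D: same profile as C
])

dot = freq @ freq.T                   # the measure used in the notebook
cos = cosine_similarity_matrix(freq)  # length-normalised alternative
print(dot[0, 1], dot[2, 3])
print(cos[0, 1], cos[2, 3])
```

Both pairs have identical profiles, yet the dot product gives 0.5 for A/B and only 0.25 for C/D, while cosine similarity gives 1 for both. Which behaviour is preferable depends on whether a borough's venue diversity should itself lower its similarity scores.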

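Now let's analyze some simple tourist preference data: each tourist has rated the boroughs they have already visited, and a higher rating means they liked the borough more.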
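In formula form, the `recommend` function in the next cell scores each candidate borough $b$ as a rating-weighted sum of its similarities to the set $V$ of boroughs the tourist has rated:

$$
\operatorname{score}(b) = \sum_{v \in V} r_v \, \operatorname{sim}(v, b),
\qquad
\operatorname{sim}(v, b) = \sum_{c} f_{v,c} \, f_{b,c}
$$

where $r_v$ is the tourist's rating of borough $v$, and $f_{v,c}$ is the share of $v$'s venues that belong to category $c$, so $\operatorname{sim}$ is the dot product computed above.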

In [47]:
prefer_data = {'Eric':{'9ème Ardt':9,'5ème Ardt':1},'Paul':{'2ème Ardt':9,'16ème Ardt':1}}

In [54]:
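def recommend(name, top_n=3):
    # score every borough the tourist has not rated yet as a
    # rating-weighted sum of its similarities to the rated boroughs
    visited = prefer_data[name]
    ranking = defaultdict(float)
    for neighborhood, rating in visited.items():
        for related_neighborhood, score in similarity_matrix[neighborhood].items():
            if related_neighborhood not in visited:  # skip places already visited
                ranking[related_neighborhood] += score * rating
    print(sorted(ranking.items(), key=lambda x: x[1], reverse=True)[:top_n])

for name in prefer_data.keys():
    recommend(name)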
    

[('14ème Ardt', 0.9525000000000001), ('7ème Ardt', 0.5900000000000001), ('8ème Ardt', 0.535)]
[('14ème Ardt', 0.8025000000000001), ('7ème Ardt', 0.6083333333333334), ('17ème Ardt', 0.5383333333333334)]
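The introduction notes that tourists may want places *unlike* the ones they disliked, but `recommend` adds every rating as a positive weight, so even a low rating pulls similar boroughs up. One possible adjustment, assumed here and not part of the original notebook, is to subtract the tourist's mean rating first, so that below-average ratings count against similar boroughs. The borough names `A` to `D`, the function name `recommend_centered` and the similarity scores are all hypothetical toy values.

```python
from collections import defaultdict

def recommend_centered(prefs, similarity, top_n=3):
    """Rank unvisited boroughs by similarity weighted with mean-centred ratings."""
    mean_rating = sum(prefs.values()) / len(prefs)
    ranking = defaultdict(float)
    for visited, rating in prefs.items():
        weight = rating - mean_rating  # below-average ratings get a negative weight
        for other, score in similarity[visited].items():
            if other not in prefs:     # only recommend places not yet visited
                ranking[other] += score * weight
    return sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# hypothetical symmetric similarity scores between four made-up boroughs
similarity = {
    'A': {'B': 0.9, 'C': 0.1, 'D': 0.5},
    'B': {'A': 0.9, 'C': 0.2, 'D': 0.4},
    'C': {'A': 0.1, 'B': 0.2, 'D': 0.6},
    'D': {'A': 0.5, 'B': 0.4, 'C': 0.6},
}

# a tourist who loved A (9) and disliked C (1)
result = recommend_centered({'A': 9, 'C': 1}, similarity)
print(result)
```

Here B, which is close to the well-liked A, comes first with a score of about 2.8, while D, which is closest to the poorly rated C, ends up negative (about -0.4). If all of a tourist's ratings are equal, every weight is zero, so this variant needs at least two distinct ratings to be informative.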


We can now recommend similar neighborhoods to tourists based on the venues in each neighborhood and the tourists' past behavior. The system would likely work better with more detailed venue information and richer behavioral data.