# Final assignment: Content-based filter to explore similar cities for a holiday
### Applied Data Science Capstone by IBM/Coursera


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we try to find a good answer to a question many people face: where am I going on my next vacation? The decision is complicated because of the huge number of destinations available today, and many people choose without even considering all the possibilities.

To solve this problem, we will create a Python script that recommends where a user should go on their next vacation, based on their opinions of the places they visited on previous vacations.

For this, we will create a content-based filter and feed it with data taken from the Foursquare API. Using this tool, we will define the characteristics of each city as the number of places it has in each category (Italian, Asian and Mediterranean restaurants, beaches, ports, mountains, parks, ...). Once we have the cities with their characteristics, we will pass the user's ratings through the algorithm, and it will tell us which cities are the most promising for that user.

## Data <a name="data"></a>

For the project we will get the coordinates of the cities through the geocoders API, and the characteristics of the cities from the Foursquare API. We could simply enter the latitude and longitude of a city and ask Foursquare to return the most interesting sites nearby, but that would give us bad results.

Why? Because the API returns a maximum of 100 sites, but these are organized into more than 500 categories. Most categories would stay empty for every city, and the model could not fit well with so little data per feature. To solve this we will take two actions.

* The first is to group all these characteristics into a reduced set of 100 categories. By doing this we group the most similar sites in the same category. For example, we will not distinguish between pasta and pizza restaurants; both will simply be Italian.

* The second is to ask the API how many places of each category there are in each city. That way it will not only return the closest or most important places, but we will also know how many Italian restaurants there are, how many Korean, how many American, how many Mediterranean, ... By asking specifically for each category, and given that each category can return 50 sites, we will take thousands of sites into account, and not only the first 100.
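
The grouping described in the first point can be sketched as a lookup table from fine-grained category names to a coarser group. The mapping entries, the sample venues and the helper name `to_group` are illustrative assumptions, not the table actually used in this project.

```python
from collections import Counter

# Hypothetical mapping from fine-grained categories to coarse groups.
FINE_TO_GROUP = {
    "Pizza Place": "Italian Restaurant",
    "Pasta Restaurant": "Italian Restaurant",
    "Trattoria": "Italian Restaurant",
    "Sushi Restaurant": "Asian Restaurant",
    "Ramen Restaurant": "Asian Restaurant",
}

def to_group(category):
    """Return the coarse group of a category, or the category itself if unmapped."""
    return FINE_TO_GROUP.get(category, category)

# Venue categories as they might come back for one city (made-up data).
venues = ["Pizza Place", "Trattoria", "Sushi Restaurant", "Beach", "Pizza Place"]

# Count venues per coarse group instead of per fine-grained category.
group_counts = Counter(to_group(category) for category in venues)
print(group_counts)
```

With the grouping applied, the three Italian venues end up in a single feature instead of being spread over two nearly empty ones.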

Once we have the data for all the cities, we will ask the user which cities they have visited previously and what grade they would give each one. From these grades we will extract the user's profile and recommend the cities that best suit their tastes.
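
As a rough illustration of this idea, the sketch below builds a user profile as the grade-weighted sum of the category shares of the visited cities, and then scores every unvisited city with a dot product against that profile. All numbers, the grading scale and the function names `build_profile` and `recommend` are assumptions made for the example; the notebook's own implementation may differ.

```python
# Made-up category shares per city (each city sums to 1).
city_features = {
    "Florence":  {"Italian Restaurant": 0.5, "Art": 0.4, "Beach": 0.1},
    "Barcelona": {"Italian Restaurant": 0.2, "Art": 0.3, "Beach": 0.5},
    "Zermatt":   {"Italian Restaurant": 0.1, "Art": 0.1, "Ski Area": 0.8},
    "Ibiza":     {"Italian Restaurant": 0.1, "Art": 0.1, "Beach": 0.8},
}

# Grades the user gave to cities already visited (hypothetical 0-10 scale).
user_ratings = {"Florence": 9, "Zermatt": 2}

def build_profile(ratings, features):
    """Weight each visited city's features by its grade and normalise to sum 1."""
    profile = {}
    for city, grade in ratings.items():
        for category, share in features[city].items():
            profile[category] = profile.get(category, 0.0) + grade * share
    total = sum(profile.values())
    return {category: weight / total for category, weight in profile.items()}

def recommend(ratings, features):
    """Rank unvisited cities by the dot product of their features with the profile."""
    profile = build_profile(ratings, features)
    scores = {
        city: sum(profile.get(category, 0.0) * share for category, share in feats.items())
        for city, feats in features.items()
        if city not in ratings
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = recommend(user_ratings, city_features)
print(ranking)
```

In this toy data the user liked Florence far more than Zermatt, so the profile leans towards Italian restaurants and art, and Barcelona ranks above Ibiza.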

### Import Libraries

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML

# transforming a JSON file into a pandas DataFrame
from pandas.io.json import json_normalize


! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

from geopy import geocoders  

!pip install jupyter_dash
!pip install dash

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
import plotly.express as px


Folium installed
Libraries imported.

### Candidate Cities

We are going to define a total of 100 candidate cities. We could define more, but this number is enough to demonstrate the worth of the project.

We put all the names in a list, and then create a DataFrame with the Name, Latitude and Longitude columns, filled with the coordinates that we get from the geocoders API.

In [3]:
List_cities=["Puerto del Rosario, Canary Islands","Cairo","Kusadasi, Turkey","Chamonix","Beijing","Cannes","Amsterdam","Bodrum","Iguazu National Park, Argentina","Courchevel","Berlin","Aberdare","Amritsar","Edimburgh","New York","Orlando", "Sydney","London","Paris","Venice","Manhattan","Cape Town","Las Vegas","Rome","Rio de Janeiro","Maldives","Hawaii","South Island, New Zealand", "Grand Canyon", "San Diego","Niagara Falls","San Francisco","Los Angeles","Dubai","Auckland","Singapore","Seychelles", "Bali","Durban","Bangkok","Iceland","Whitsunday Islands National Park","Cairns","Costa del Sol","Antigua","Melbourne","Mallorca","Lake District","Barbados","Bahamas","Abu Simbel","Bora Bora","Sharm el Sheikh", "Madrid","Algarve","Zermatt","Victoria Falls","Marbella","Masai Mara, Kenya","Chichen Itza","Disney World","Florence","Puerto Banus","Toronto","Taj Mahal","Great Wall of china", "Menorca","Monaco","Luxor","Hong Kong","Banff National Park","Sorrento","Key West","Koh Samui, Thailand","Cancun","Nice","Machu Picchu","Yosemite","Oahu","Florida Keys","Guam","Dublin","Vancouver","Ayers Rock","La Digue Island","Cayman Islands","Naples","St. Pete Beach, Florida", "Barcelona", "Ibiza","Adelaide","Airlie Beach Queensland",'Benidorm',"Buenos Aires","Prague","Cuba","Paphos","Valley of the kings","Galapagos Islands","Isle of Man"]
print(len(List_cities))

100


#### Create a Dataframe with each city name, latitude and longitude
Next we use the geocoders API to get the coordinates of each city, and we store them in a new DataFrame.

In [6]:
gn = geocoders.GeoNames(username="sergibago")  # GeoNames account name
data=[]
for city in List_cities:
  try:
    loc=gn.geocode(city, timeout=None)
  except Exception:
    # Retry once on a transient error (for example a timeout)
    loc=gn.geocode(city, timeout=None)
  print(loc)
  if loc is None:
    # No match found: skip the city instead of failing on loc.latitude
    continue
  data.append([city,loc.latitude,loc.longitude])

Cities_df=pd.DataFrame(data,columns=["Name","Latitude","Longitude"])

Puerto del Rosario, 53, ES
Cairo, 11, EG
Kusadasi, 09, TR
Chamonix, 84, FR
Beijing, 22, CN
Cannes, 93, FR
Amsterdam, 07, NL
Bodrum, 48, TR
Iguazú National Park, 14, AR
Courchevel, 84, FR
Berlin, 16, DE
Aberdare, WLS, GB
Amritsar, 23, IN
Edinburgh, SCT, GB
New York, NY, US
Orlando, FL, US
Sydney, 02, AU
London, ENG, GB
Paris, 11, FR
Venice, 20, IT
Manhattan, NY, US
Cape Town, 11, ZA
Las Vegas, NV, US
Rome, 07, IT
Rio de Janeiro, 21, BR
Maldives, 00, MV
Hawaii, HI, US
South Island, 00, NZ
Grand Cess Canyon
San Diego, CA, US
Niagara Falls, 08, CA
San Francisco, CA, US
Los Angeles, CA, US
Dubai, 03, AE
Auckland, E7, NZ
Singapore, 01, SG
Seychelles, 00, SC
Bali, 02, ID
Durban, 02, ZA
Bangkok, 40, TH
Iceland, 00, IS
Whitsunday Islands National Park, 04, AU
Cairns, 04, AU
Málaga Airport, 51, ES
Antigua Guatemala, 16, GT
Melbourne, 07, AU
Palma, 07, ES
Koroba-Lake Kopiago, 21, PG
Barbados, 00, BB
Bahamas, 00, BS
Abu Simbel Airport, 16, EG
Bora Bora, 02, PF
Sharm el Sheikh, 26, EG
Madrid, 29, E

Now we sort the DataFrame by city name.

In [33]:
Cities_df=Cities_df.sort_values(by=["Name"])
Cities_df=Cities_df.reset_index(drop=True)
display(Cities_df)

Unnamed: 0,Name,Latitude,Longitude
0,Aberdare,51.71438,-3.44918
1,Abu Simbel,22.37571,31.61170
2,Adelaide,-34.92866,138.59863
3,Airlie Beach Queensland,-20.26751,148.71471
4,Algarve,37.08367,-8.24902
...,...,...,...
95,Venice,45.43713,12.33265
96,Victoria Falls,-17.93285,25.83066
97,Whitsunday Islands National Park,-20.24872,148.98025
98,Yosemite,36.77606,-119.71903


#### Plot cities

Next we use the folium library to plot all the candidates. Let's see them.

In [7]:
map_cities = folium.Map(location=[0,0], zoom_start=3)
for lat, lon,Name in zip(Cities_df["Latitude"],Cities_df["Longitude"],Cities_df["Name"]):
    folium.Marker([lat,lon], popup=Name).add_to(map_cities)
map_cities

### Candidate Categories

As stated before, we will use a custom list of categories, which we have selected by hand. This lets us merge subcategories that don't really add new information and get a reduced set of features, so our model can fit better with less data. We will use a total of 100 categories.

In [8]:
List_categories=["Aquarium","Arcade & Bowling","Casino","Cinema","Night club","Disco","Music","Art","Stadium","Theme Park","Water Park","Zoo","American Restaurant","African Restaurant","Italian Restaurant","Asian Restaurant","Bistro","Buffet","Cafeteria","Creperie","Bodega","Fast Food Restaurant","French Restaurant","Indian Resturant","Irish Pub","Italian restaurant","Latin American Restaurant","Mediterranean Restaurant","Mexican Restaurant","Seafood Restaurant","Steakhouse","Turkish Restaurant","Nightlife Spot","Bar","Beach Bar","Cocktail Bar","Karaoke","Pub","Sport bar","Brewery","Lounge","Nightclub","Golf","Bay","Beach","Surf spot","Botanical Garden","Bridge","Canal","Castle","Dive Spot","Field","Farm","Fishing spot","Forest","Garden","Harbour","Hill","Island","Lake","Lighthouse","Mountain","National Park","Park","Pedestrian Aera","Plaza","River","Ski Area","Stables","Vineyard","Volcano","Waterfall","Windmill","Government building","Library","Observatory","Office","Social Club","Spiritual Center","Antique shop","Arts store","Clothing store","Gift shop","Massage studio","Music store","Outlet","Airport","Bike rental","Boat rental","Ferry or Boat","Bus","Hotel","Resort","Motel","Hostel","Vacation Rental","Bed & Breakfast","Metro station","Pier","RV park"]
print(len(List_categories))

100


#### Foursquare API settings
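We now set the Foursquare API settings. The credentials are not shown; you should enter your own. Note that the collection function below indexes `CLIENT_ID` and `CLIENT_SECRET`, so they are lists of credentials that it can rotate through when a quota is exhausted.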

In [38]:
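CLIENT_ID = []      # fill in: list of your Foursquare client IDs
CLIENT_SECRET = []  # fill in: matching list of your Foursquare client secrets
VERSION = '20180604'
LIMIT = 100
radius=5000         # search radius in metres around each city's coordinates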
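
These settings become the query parameters of each Foursquare search request. Below is a minimal sketch of how such a URL could be assembled with the standard library, so that category names containing `&` (such as "Arcade & Bowling") are escaped instead of splitting the query string. The helper name `build_search_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used later in this notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, lat, lon, query,
                     version="20180604", radius=5000, limit=100):
    """Assemble a search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lon),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)

# Placeholder credentials and made-up coordinates, only to show the encoding.
url = build_search_url("ID", "SECRET", 41.38, 2.17, "Arcade & Bowling")
print(url)
```

Passing the same dictionary through the `params` argument of `requests.get` performs this encoding automatically.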

#### Compute characteristics of each city
Now we will iterate through each city and compute the number of places it has in each category.

First we create a DataFrame with the attributes as columns and the city names as index, and fill it with zeros.

In [10]:
# Start with an integer zero count for every (city, category) pair
Full_cities_df=pd.DataFrame(0,columns=List_categories,index=List_cities).sort_index()
display(Full_cities_df)

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Aberdare,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Abu Simbel,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adelaide,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Airlie Beach Queensland,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Algarve,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Venice,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Victoria Falls,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Whitsunday Islands National Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Yosemite,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next we create a function that calls the API for every category of every city. Note that with 100 cities and 100 categories it makes about 10,000 calls.

Making 10,000 calls takes a long time and can return several errors (timeout, quota exceeded, ...). That is why we save the results city by city. Every time we finish a city we export a file to Google Drive with the data collected so far. If all goes well, the last exported file will have all the data. If something goes wrong and there is an error, we will only lose the data of the current city, and not all the previous ones. This is important because of the limit on the number of calls we can make to the Foursquare API.
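
The save-as-you-go idea can be sketched independently of Foursquare. In the example below, `fetch_city` stands in for the real API calls, the file naming is hypothetical, and skipping cities whose file already exists is an extra assumption that lets an interrupted run resume without spending quota twice.

```python
import csv
import os
import tempfile

def collect_with_checkpoints(cities, fetch_city, out_dir):
    """Fetch rows city by city, writing one CSV file per finished city."""
    written = []
    for number, city in enumerate(cities, start=1):
        path = os.path.join(out_dir, "checkpoint_{}.csv".format(number))
        if os.path.exists(path):
            # Collected in an earlier run: do not call the API again.
            written.append(path)
            continue
        try:
            rows = fetch_city(city)
        except Exception as error:
            # Only the current city is lost; earlier files stay on disk.
            print("Stopped at", city, "-", error)
            break
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["City", "Category", "Name"])
            writer.writerows(rows)
        written.append(path)
    return written

def fake_fetch(city):
    """Stand-in for the API; fails on one city to simulate a quota error."""
    if city == "Cairo":
        raise RuntimeError("quota exceeded")
    return [[city, "Beach", "Some beach"]]

out_dir = tempfile.mkdtemp()
cities = ["Amsterdam", "Bali", "Cairo", "Dubai"]
first_run = collect_with_checkpoints(cities, fake_fetch, out_dir)
print(len(first_run), "cities saved before the error")
```

Running the function again with a working fetcher only requests the cities that are still missing.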

In [11]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [12]:

import time

def Get_all_cities_values(Cities_df_lat_lon,List_categories):
  """Query Foursquare for every (city, category) pair and export the venues city by city."""
  global Full_cities_df, Full_Data_Locations_df
  columns=["City","Category","Name","Latitude","Longitude"]
  AllData=[]
  actual_export=1
  id_num=0  # index of the credential pair currently in use

  def search(lat, lon, query=None):
    # Passing params lets requests URL-encode queries such as "Arcade & Bowling"
    params={"client_id": CLIENT_ID[id_num], "client_secret": CLIENT_SECRET[id_num],
            "ll": "{},{}".format(lat, lon), "v": VERSION, "radius": radius, "limit": LIMIT}
    if query is not None:
      params["query"]=query
    return requests.get("https://api.foursquare.com/v2/venues/search", params=params).json()

  def export():
    # Write the venues collected since the last export to Google Drive and start a new batch
    nonlocal AllData, actual_export
    global Full_Data_Locations_df
    Full_Data_Locations_df=pd.DataFrame(AllData,columns=columns)
    Path="/content/gdrive/MyDrive/Coursera_IBM_final_Capstone"+str(actual_export)+".csv"
    print("Export: ",str(actual_export),Path)
    Full_Data_Locations_df.to_csv(Path, index = True)
    AllData=[]
    actual_export+=1

  for name, lat, lon in zip(Cities_df_lat_lon["Name"],Cities_df_lat_lon["Latitude"],Cities_df_lat_lon["Longitude"]):
    print(name)
    # Probe the city first and skip it when Foursquare returns 90 venues or fewer
    probe=search(lat, lon)
    if len(probe.get("response", {}).get("venues", []))<=90:
      continue
    for search_query in List_categories:
      result=search(lat, lon, search_query)
      time.sleep(1)
      exported=False
      # 403/429 mean the quota is exhausted: save once, wait, switch credentials, retry
      while result["meta"]["code"] in (403, 429):
        if not exported:
          export()
          exported=True
        time.sleep(30)
        id_num=(id_num+1)%len(CLIENT_ID)
        print("Changed client id to: ",str(id_num))
        result=search(lat, lon, search_query)
      results=result["response"]["venues"]
      for venue in results:
        AllData.append([name,search_query,venue["name"],venue["location"]["lat"],venue["location"]["lng"]])
      # Single .loc call so the count is written to Full_cities_df itself
      Full_cities_df.loc[name, search_query]=len(results)
    export()  # one export per completed city

We will use this cell both to export and to load the data. We export the data the first time. After that, we use the cell to load it, so we do not have to call the API every time we want to use the program.

In [189]:
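Path="/content/gdrive/MyDrive/Places IBM Capstone.csv"
# First run only: uncomment to export the collected places
#Full_Data_Locations_df.to_csv(Path, index = True)
# Later runs: load the exported places instead of calling the API again
Full_Data_Locations_df=pd.read_csv(Path)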

## Methodology <a name="methodology"></a>

Now we have all the data for all the cities. In this project we use this data to find out which cities are most similar to each other and which have the attributes that the client likes the most, so that we can recommend the best possible vacation.

To do this, we look at the percentage of sites that each city has in each category. That is, for each city we compute what percentage of its sites are beaches, what percentage are mountains, what percentage are Italian restaurants, ...

It is important to evaluate cities by the percentage of each category, and not by the number of sites in each category, since otherwise large cities would always win. For example, a client who is a fan of Italy and of the beach would surely like things such as Italian restaurants, art, beaches, beach bars and music stores. A city such as Barcelona could have many more Italian restaurants, art galleries, music stores and beaches than a small city in Italy, so with raw counts the program would recommend Barcelona, even though the small Italian city is what the user would prefer.
With percentages, although Barcelona still has many more of those places than, for example, Florence, Florence will be ranked much higher, since its percentages for these categories will be much higher than in Barcelona, where there are many Italian restaurants but many more Mediterranean, Catalan or Spanish ones, and therefore the Italian restaurants are overshadowed.
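
The effect of switching from counts to percentages can be seen on a toy table. The numbers below are invented for illustration; in the notebook the counts live in `Full_cities_df`.

```python
import pandas as pd

# Made-up venue counts for a large and a small city.
counts = pd.DataFrame(
    {
        "Italian Restaurant": [200, 60],
        "Mediterranean Restaurant": [600, 20],
        "Art": [200, 20],
    },
    index=["Barcelona", "Florence"],
)

# Divide every row by its own total so that each city sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

Barcelona has more Italian restaurants in absolute terms (200 against 60), but they make up only 20% of its venues, while in Florence they make up 60%.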

## Analysis <a name="analysis"></a>

First we look at the loaded data. We have more than 150,000 rows (that is, more than 150,000 sites), and each row has the attributes city, category, name, latitude and longitude. We will use the name, latitude and longitude of each place at the end of the program: when we recommend cities to the user, we will show on a map the sites that we think may interest them the most, so that they do not miss anything on their vacation!

In [190]:
display(Full_Data_Locations_df)

Unnamed: 0,City,Category,Name,Latitude,Longitude
0,Aberdare,Cinema,Vue,51.738475,-3.377418
1,Aberdare,Night club,Aberdare Constitutional Club,51.713557,-3.447650
2,Aberdare,Night club,Aberdare Rugby Club,51.709252,-3.432933
3,Aberdare,Night club,Aberdare golf club,51.716892,-3.429162
4,Aberdare,Night club,Cwmdare Club,51.715940,-3.469329
...,...,...,...,...,...
150771,Zermatt,Metro station,Taxi Metro,46.068560,7.776482
150772,Zermatt,Metro station,Riffelalp Station,46.004908,7.753993
150773,Zermatt,Metro station,Bahnhof Zermatt,46.023864,7.748048
150774,Zermatt,Metro station,Green Motion Charging Station,46.067318,7.775392


The next step is to place each of the sites in its corresponding city and category.

In [139]:
for index,row in Full_Data_Locations_df.iterrows():
  # Single .loc call so the increment is written to Full_cities_df itself, not to a temporary copy
  Full_cities_df.loc[row['City'], row['Category']]+=1
display(Full_cities_df.head(15))

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Aberdare,0,0,0,4,80,0,0,4,0,34,0,0,12,12,12,12,4,0,0,0,0,0,12,4,12,3,0,12,12,12,0,16,4,64,64,64,0,12,80,4,...,4,104,72,68,0,0,4,0,0,0,0,4,0,32,20,0,48,80,4,48,36,36,48,4,32,0,0,16,0,0,56,24,0,0,0,0,4,52,0,0
Abu Simbel,0,0,0,0,0,0,0,0,0,0,0,0,4,4,4,4,0,0,0,0,0,0,4,0,0,1,0,4,4,4,0,4,0,4,4,4,0,0,4,0,...,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,4,0,0,0,0,12,0,0,0,0,0,0,0,0
Adelaide,4,140,40,96,200,92,132,200,36,100,0,28,200,200,200,200,124,20,8,20,0,8,200,148,100,50,0,200,200,200,4,200,68,200,200,200,44,96,200,8,...,4,12,200,200,0,200,120,200,4,16,0,0,12,200,200,8,200,200,124,200,200,200,200,200,200,16,72,156,108,0,200,200,12,64,24,68,56,200,8,0
Airlie Beach Queensland,0,0,0,0,44,4,4,0,0,16,0,0,44,44,44,44,4,0,0,0,0,0,44,0,8,11,0,48,44,68,0,44,0,124,200,124,0,8,124,0,...,0,0,40,32,0,4,0,20,0,0,0,0,0,0,4,0,8,28,0,16,12,16,20,20,16,0,8,12,52,0,12,40,92,12,4,12,24,8,0,0
Algarve,0,0,0,4,200,16,4,52,0,28,0,8,200,200,200,200,24,8,4,0,12,0,200,36,48,50,0,200,200,200,24,200,24,200,200,200,8,40,200,0,...,4,0,56,56,0,4,4,16,0,0,0,0,12,0,0,0,16,200,32,128,36,32,132,12,36,0,8,28,28,0,20,200,84,4,32,20,8,12,4,0
Amritsar,0,0,0,20,24,0,0,16,0,20,0,0,72,68,68,68,4,0,0,0,0,20,80,16,12,17,0,68,68,68,0,68,0,60,60,60,0,12,72,0,...,0,0,44,40,0,48,4,8,0,0,0,0,0,4,0,0,72,28,32,160,72,68,168,12,68,4,4,4,0,0,8,200,56,4,28,0,4,64,0,0
Amsterdam,20,28,88,136,200,132,200,200,36,100,0,104,200,200,200,200,156,12,28,20,12,28,200,200,200,50,8,200,200,200,200,200,200,200,200,200,40,200,200,48,...,24,16,200,200,4,200,136,200,4,16,0,0,16,172,100,12,200,200,200,200,200,200,200,200,200,108,64,200,200,12,200,200,32,8,172,144,200,200,152,0
Antigua,0,0,0,0,28,0,0,88,4,4,0,4,200,200,200,200,24,0,20,0,4,0,200,0,48,50,0,200,200,200,0,200,0,200,200,200,0,44,200,0,...,0,0,20,8,0,32,0,4,0,0,4,0,0,4,0,0,12,52,12,28,20,16,28,28,16,4,0,8,8,0,16,200,8,0,100,8,4,4,4,0
Auckland,8,80,20,64,200,64,124,200,24,100,0,44,200,200,200,200,92,60,12,8,4,4,200,200,124,50,0,200,200,200,12,200,48,200,200,200,68,104,200,24,...,12,36,200,200,4,112,16,140,12,16,12,0,12,200,164,4,200,200,140,200,200,200,200,200,200,40,28,188,200,8,200,200,28,80,60,124,80,200,108,0
Ayers Rock,0,0,0,0,0,0,0,8,0,6,0,0,4,4,4,4,0,0,0,0,0,0,4,0,4,1,0,4,4,4,0,4,0,20,20,20,0,4,20,0,...,0,0,12,12,0,0,0,4,0,0,0,0,0,4,0,0,8,0,0,4,4,0,4,4,0,0,12,8,8,0,4,28,36,0,0,8,0,8,0,0


Looks good. 
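
For larger tables, the same per-city, per-category counts can be obtained without a Python loop. The sketch below uses `pandas.crosstab` on a small made-up table with the same `City` and `Category` columns; it is an alternative shown for comparison, not the code that produced the output above.

```python
import pandas as pd

# Tiny stand-in for Full_Data_Locations_df (made-up rows).
places = pd.DataFrame({
    "City": ["Aberdare", "Aberdare", "Zermatt", "Zermatt", "Zermatt"],
    "Category": ["Cinema", "Night club", "Ski Area", "Ski Area", "Cinema"],
})

# Count how many rows fall into each (city, category) pair.
counts = pd.crosstab(places["City"], places["Category"])

# Align with the full lists so cities or categories without venues get 0.
all_cities = ["Aberdare", "Hawaii", "Zermatt"]
all_categories = ["Cinema", "Night club", "Ski Area"]
counts = counts.reindex(index=all_cities, columns=all_categories, fill_value=0)
print(counts)
```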

The next step is to see whether any cities should be discarded because they have too few sites. This may happen because Foursquare has little data about a place, because we got the coordinates wrong (for example, we entered the name of a country or region instead of a city), or because the place simply has few sites. One example of a poorly chosen name is "Hawaii". Hawaii is one of the best-known states of the USA and is highly touristic. However, when we looked up the coordinates in geocoders using the name Hawaii instead of the name of its capital (Honolulu), it returned a location far from any city.

In [202]:
Lat=Cities_df[Cities_df['Name']=='Hawaii']['Latitude'].values[0]
Lon=Cities_df[Cities_df['Name']=='Hawaii']['Longitude'].values[0]
hawaii_map = folium.Map(location=[Lat+0.5,Lon-0.5], zoom_start=10)
folium.Marker([Lat,Lon], popup="Hawaii").add_to(hawaii_map)
folium.Marker([21.300150, -157.846462], popup="Honolulu").add_to(hawaii_map)
folium.Circle([Lat, Lon], radius=5000, color='red', fill=False).add_to(hawaii_map)
hawaii_map

Keep in mind that we set a 5 km radius limit (the red circle), which is why Foursquare did not return enough places to determine what type of destination it is. We could have used a bigger radius, but that would make the number of sites returned in big cities too large.
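
A quick way to check whether a geocoded point is usable is to measure how far it lies from the place we actually meant. The sketch below uses the haversine formula; the Honolulu coordinates are the ones used in the map above, while `geocoded_point` is a hypothetical stand-in, since the real value comes from `Cities_df`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

SEARCH_RADIUS_KM = 5  # the 5000 m radius used in the queries

honolulu = (21.300150, -157.846462)  # coordinates used in the map above
geocoded_point = (19.6, -155.5)      # hypothetical stand-in for the geocoded "Hawaii"

distance = haversine_km(*geocoded_point, *honolulu)
inside_radius = distance <= SEARCH_RADIUS_KM
print(round(distance), "km from Honolulu, inside search radius:", inside_radius)
```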

We then find all cities that have fewer than 100 places (our minimum threshold) and eliminate them.
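
The threshold can also be expressed as a boolean mask over the row totals. The table below is made up for illustration, and the names `MIN_PLACES`, `too_small` and `kept` are assumptions; the notebook's own cell follows.

```python
import pandas as pd

MIN_PLACES = 100  # minimum number of places a city needs to be kept

# Made-up counts standing in for Full_cities_df.
counts = pd.DataFrame(
    {"Beach": [0, 120, 4], "Hotel": [12, 200, 0], "Park": [4, 80, 0]},
    index=["Abu Simbel", "Adelaide", "Cuba"],
)

totals = counts.sum(axis=1)                         # places per city
too_small = totals[totals < MIN_PLACES].index.tolist()
kept = counts.loc[totals >= MIN_PLACES]             # cities that pass the threshold
print(too_small)
```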



In [141]:
drop_index=[]
for e in range (len(Full_cities_df.index)):
  if(sum(Full_cities_df.iloc[e,:])<100):
    drop_index.append(Full_cities_df.index[e])
Citites_to_Drop=Full_cities_df[Full_cities_df.index.isin(drop_index)]
display(Citites_to_Drop)

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Abu Simbel,0,0,0,0,0,0,0,0,0,0,0,0,4,4,4,4,0,0,0,0,0,0,4,0,0,1,0,4,4,4,0,4,0,4,4,4,0,0,4,0,...,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,4,0,0,0,0,12,0,0,0,0,0,0,0,0
Cayman Islands,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Cuba,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0
Galapagos Islands,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Grand Canyon,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Hawaii,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Iceland,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,4,0,0,0,0,0,0,4,0,0,0,0
"Iguazu National Park, Argentina",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Lake District,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Manhattan,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


We are going to check the cities that we have left. We have 78 cities left.

In [142]:
Cities_dropped_df=Full_cities_df.drop(drop_index,axis=0)
display(Cities_dropped_df)
print(Cities_dropped_df.index)

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Aberdare,0,0,0,4,80,0,0,4,0,34,0,0,12,12,12,12,4,0,0,0,0,0,12,4,12,3,0,12,12,12,0,16,4,64,64,64,0,12,80,4,...,4,104,72,68,0,0,4,0,0,0,0,4,0,32,20,0,48,80,4,48,36,36,48,4,32,0,0,16,0,0,56,24,0,0,0,0,4,52,0,0
Adelaide,4,140,40,96,200,92,132,200,36,100,0,28,200,200,200,200,124,20,8,20,0,8,200,148,100,50,0,200,200,200,4,200,68,200,200,200,44,96,200,8,...,4,12,200,200,0,200,120,200,4,16,0,0,12,200,200,8,200,200,124,200,200,200,200,200,200,16,72,156,108,0,200,200,12,64,24,68,56,200,8,0
Airlie Beach Queensland,0,0,0,0,44,4,4,0,0,16,0,0,44,44,44,44,4,0,0,0,0,0,44,0,8,11,0,48,44,68,0,44,0,124,200,124,0,8,124,0,...,0,0,40,32,0,4,0,20,0,0,0,0,0,0,4,0,8,28,0,16,12,16,20,20,16,0,8,12,52,0,12,40,92,12,4,12,24,8,0,0
Algarve,0,0,0,4,200,16,4,52,0,28,0,8,200,200,200,200,24,8,4,0,12,0,200,36,48,50,0,200,200,200,24,200,24,200,200,200,8,40,200,0,...,4,0,56,56,0,4,4,16,0,0,0,0,12,0,0,0,16,200,32,128,36,32,132,12,36,0,8,28,28,0,20,200,84,4,32,20,8,12,4,0
Amritsar,0,0,0,20,24,0,0,16,0,20,0,0,72,68,68,68,4,0,0,0,0,20,80,16,12,17,0,68,68,68,0,68,0,60,60,60,0,12,72,0,...,0,0,44,40,0,48,4,8,0,0,0,0,0,4,0,0,72,28,32,160,72,68,168,12,68,4,4,4,0,0,8,200,56,4,28,0,4,64,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Taj Mahal,0,0,0,12,8,0,0,24,0,16,0,0,200,200,200,200,4,12,0,0,0,12,200,28,4,50,4,200,200,200,0,200,0,52,52,52,0,4,56,0,...,8,0,44,32,0,32,8,4,0,0,0,0,0,0,0,4,40,8,56,32,68,60,36,0,60,0,8,4,0,0,12,200,12,0,40,4,12,32,0,0
Valley of the kings,8,12,8,0,200,60,108,200,76,100,0,36,200,200,200,200,68,32,104,0,4,20,200,164,96,50,0,200,200,200,16,200,120,200,200,200,24,92,200,8,...,8,200,200,200,4,200,100,200,0,4,0,8,0,200,52,0,200,200,200,200,200,200,200,200,200,28,36,108,64,0,200,200,16,20,4,60,20,200,8,0
Vancouver,48,12,44,72,200,120,200,200,40,100,0,36,200,200,200,200,200,20,40,16,4,36,200,168,200,50,4,200,200,200,64,200,200,200,200,200,36,200,200,48,...,16,180,200,200,12,200,140,200,28,4,4,12,0,200,200,8,200,200,200,200,200,200,200,200,200,84,28,200,200,0,200,200,48,16,32,200,108,200,84,0
Victoria Falls,0,0,0,0,8,0,0,0,0,8,0,0,24,28,24,24,0,0,0,0,0,0,24,0,8,6,0,24,24,24,0,24,0,28,28,28,0,8,28,4,...,0,0,16,16,0,0,16,0,0,0,0,0,0,0,4,0,4,8,12,0,0,0,0,0,0,0,0,0,12,0,4,72,8,0,0,0,4,4,0,0


Index(['Aberdare', 'Adelaide', 'Airlie Beach Queensland', 'Algarve',
       'Amritsar', 'Amsterdam', 'Antigua', 'Auckland', 'Ayers Rock', 'Bahamas',
       'Bali', 'Banff National Park', 'Bangkok', 'Barbados', 'Barcelona',
       'Beijing', 'Benidorm', 'Berlin', 'Bodrum', 'Bora Bora', 'Buenos Aires',
       'Cairns', 'Cairo', 'Cancun', 'Cannes', 'Cape Town', 'Chamonix',
       'Chichen Itza', 'Costa del Sol', 'Courchevel', 'Disney World', 'Dubai',
       'Dublin', 'Durban', 'Edimburgh', 'Florence', 'Florida Keys',
       'Great Wall of china', 'Guam', 'Hong Kong', 'Ibiza', 'Isle of Man',
       'Key West', 'Koh Samui, Thailand', 'Kusadasi, Turkey',
       'La Digue Island', 'Las Vegas', 'London', 'Los Angeles', 'Luxor',
       'Machu Picchu', 'Madrid', 'Maldives', 'Mallorca', 'Marbella',
       'Melbourne', 'Monaco', 'Naples', 'New York', 'Nice', 'Orlando',
       'Paphos', 'Paris', 'Prague', 'Puerto del Rosario, Canary Islands',
       'Rio de Janeiro', 'San Diego', 'San Francisco', '

### Data processing

Let's start by normalizing our data set.

First we create a function to normalize all the data.

To normalize the data, we convert each city's counts into shares: every category count is divided by the total number of places found in that city. Each row then tells us what percentage of the city's places are beaches, mountains, restaurants, and so on.

The alternative would be to scale each row by its maximum, so that the largest category gets 1 and the others a proportional value (for example, with 50 beaches and 25 mountains, the beach category gets 1 and the mountain category 0.5). Because the algorithm seeks to maximize what the user prefers, it would then work out which category the user likes most and look for the city that has the most of it.

Without percentages, the big cities would always win, because they have the most of everything. What we want is not a city with many things, but a city of the same style as the ones the user likes. With percentages, if the user wants a city that is 70% beach, 5% Italian restaurants, 10% resorts and 15% French restaurants, the algorithm will look for a city with similar percentages, and not, for example, a city like Barcelona, which may have many more beaches, Italian and French restaurants, hotels and resorts, but looks nothing like the city the user described.

In [143]:
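def Apply_normalization(df):
  # Turn each row (city) into shares that add up to 1.
  # Rows with no places at all are left as zeros to avoid dividing by zero.
  for e in range(len(df.index)):
    row_total=df.iloc[e,:].sum()
    if(row_total>0):
      df.iloc[e,:]=df.iloc[e,:]/row_total
  return df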
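
The same row-wise normalization can also be written without an explicit loop. The following is a minimal sketch on made-up counts, not the notebook's own code: `normalize_rows` and the toy city names are hypothetical. For comparison it also computes the max-scaled alternative discussed above.

```python
import pandas as pd

def normalize_rows(df):
    """Return a copy of df in which every non-empty row sums to 1."""
    totals = df.sum(axis=1)
    # Replace zero totals by 1 so that all-zero rows stay all-zero
    safe_totals = totals.where(totals > 0, 1)
    return df.div(safe_totals, axis=0)

# Made-up counts, for illustration only
toy_counts = pd.DataFrame(
    {"Beach": [50, 5, 0], "Mountain": [25, 0, 0], "Hotel": [25, 15, 0]},
    index=["Big city", "Small town", "Empty place"],
)

# Share normalization: each row becomes a distribution over categories
shares = normalize_rows(toy_counts)

# Max scaling, for comparison: the largest category in each row becomes 1
row_max = toy_counts.max(axis=1)
max_scaled = toy_counts.div(row_max.where(row_max > 0, 1), axis=0)

print(shares)
print(max_scaled)
```

With shares, the small town and the big city are compared by their mix of places rather than by their size.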

Now we call the function and display the results to check that the normalization went well.

In [172]:
Cities_grouped_nor=Cities_dropped_df.copy()
Cities_grouped_nor=Apply_normalization(Cities_grouped_nor)
display(Cities_grouped_nor.head(25))

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Aberdare,0.0,0.0,0.0,0.00273,0.054608,0.0,0.0,0.00273,0.0,0.023208,0.0,0.0,0.008191,0.008191,0.008191,0.008191,0.00273,0.0,0.0,0.0,0.0,0.0,0.008191,0.00273,0.008191,0.002048,0.0,0.008191,0.008191,0.008191,0.0,0.010922,0.00273,0.043686,0.043686,0.043686,0.0,0.008191,0.054608,0.00273,...,0.00273,0.07099,0.049147,0.046416,0.0,0.0,0.00273,0.0,0.0,0.0,0.0,0.00273,0.0,0.021843,0.013652,0.0,0.032765,0.054608,0.00273,0.032765,0.024573,0.024573,0.032765,0.00273,0.021843,0.0,0.0,0.010922,0.0,0.0,0.038225,0.016382,0.0,0.0,0.0,0.0,0.00273,0.035495,0.0,0.0
Adelaide,0.00039,0.013643,0.003898,0.009355,0.019489,0.008965,0.012863,0.019489,0.003508,0.009745,0.0,0.002729,0.019489,0.019489,0.019489,0.019489,0.012083,0.001949,0.00078,0.001949,0.0,0.00078,0.019489,0.014422,0.009745,0.004872,0.0,0.019489,0.019489,0.019489,0.00039,0.019489,0.006626,0.019489,0.019489,0.019489,0.004288,0.009355,0.019489,0.00078,...,0.00039,0.001169,0.019489,0.019489,0.0,0.019489,0.011694,0.019489,0.00039,0.001559,0.0,0.0,0.001169,0.019489,0.019489,0.00078,0.019489,0.019489,0.012083,0.019489,0.019489,0.019489,0.019489,0.019489,0.019489,0.001559,0.007016,0.015202,0.010524,0.0,0.019489,0.019489,0.001169,0.006237,0.002339,0.006626,0.005457,0.019489,0.00078,0.0
Airlie Beach Queensland,0.0,0.0,0.0,0.0,0.02317,0.002106,0.002106,0.0,0.0,0.008425,0.0,0.0,0.02317,0.02317,0.02317,0.02317,0.002106,0.0,0.0,0.0,0.0,0.0,0.02317,0.0,0.004213,0.005793,0.0,0.025276,0.02317,0.035808,0.0,0.02317,0.0,0.065298,0.105319,0.065298,0.0,0.004213,0.065298,0.0,...,0.0,0.0,0.021064,0.016851,0.0,0.002106,0.0,0.010532,0.0,0.0,0.0,0.0,0.0,0.0,0.002106,0.0,0.004213,0.014745,0.0,0.008425,0.006319,0.008425,0.010532,0.010532,0.008425,0.0,0.004213,0.006319,0.027383,0.0,0.006319,0.021064,0.048447,0.006319,0.002106,0.006319,0.012638,0.004213,0.0,0.0
Algarve,0.0,0.0,0.0,0.000816,0.0408,0.003264,0.000816,0.010608,0.0,0.005712,0.0,0.001632,0.0408,0.0408,0.0408,0.0408,0.004896,0.001632,0.000816,0.0,0.002448,0.0,0.0408,0.007344,0.009792,0.0102,0.0,0.0408,0.0408,0.0408,0.004896,0.0408,0.004896,0.0408,0.0408,0.0408,0.001632,0.00816,0.0408,0.0,...,0.000816,0.0,0.011424,0.011424,0.0,0.000816,0.000816,0.003264,0.0,0.0,0.0,0.0,0.002448,0.0,0.0,0.0,0.003264,0.0408,0.006528,0.026112,0.007344,0.006528,0.026928,0.002448,0.007344,0.0,0.001632,0.005712,0.005712,0.0,0.00408,0.0408,0.017136,0.000816,0.006528,0.00408,0.001632,0.002448,0.000816,0.0
Amritsar,0.0,0.0,0.0,0.008617,0.01034,0.0,0.0,0.006894,0.0,0.008617,0.0,0.0,0.031021,0.029298,0.029298,0.029298,0.001723,0.0,0.0,0.0,0.0,0.008617,0.034468,0.006894,0.00517,0.007324,0.0,0.029298,0.029298,0.029298,0.0,0.029298,0.0,0.025851,0.025851,0.025851,0.0,0.00517,0.031021,0.0,...,0.0,0.0,0.018957,0.017234,0.0,0.020681,0.001723,0.003447,0.0,0.0,0.0,0.0,0.0,0.001723,0.0,0.0,0.031021,0.012064,0.013787,0.068936,0.031021,0.029298,0.072383,0.00517,0.029298,0.001723,0.001723,0.001723,0.0,0.0,0.003447,0.08617,0.024128,0.001723,0.012064,0.0,0.001723,0.027574,0.0,0.0
Amsterdam,0.001605,0.002247,0.007061,0.010913,0.016049,0.010592,0.016049,0.016049,0.002889,0.008024,0.0,0.008345,0.016049,0.016049,0.016049,0.016049,0.012518,0.000963,0.002247,0.001605,0.000963,0.002247,0.016049,0.016049,0.016049,0.004012,0.000642,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.00321,0.016049,0.016049,0.003852,...,0.001926,0.001284,0.016049,0.016049,0.000321,0.016049,0.010913,0.016049,0.000321,0.001284,0.0,0.0,0.001284,0.013802,0.008024,0.000963,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.008666,0.005136,0.016049,0.016049,0.000963,0.016049,0.016049,0.002568,0.000642,0.013802,0.011555,0.016049,0.016049,0.012197,0.0
Antigua,0.0,0.0,0.0,0.0,0.007646,0.0,0.0,0.024031,0.001092,0.001092,0.0,0.001092,0.054615,0.054615,0.054615,0.054615,0.006554,0.0,0.005461,0.0,0.001092,0.0,0.054615,0.0,0.013108,0.013654,0.0,0.054615,0.054615,0.054615,0.0,0.054615,0.0,0.054615,0.054615,0.054615,0.0,0.012015,0.054615,0.0,...,0.0,0.0,0.005461,0.002185,0.0,0.008738,0.0,0.001092,0.0,0.0,0.001092,0.0,0.0,0.001092,0.0,0.0,0.003277,0.0142,0.003277,0.007646,0.005461,0.004369,0.007646,0.007646,0.004369,0.001092,0.0,0.002185,0.002185,0.0,0.004369,0.054615,0.002185,0.0,0.027307,0.002185,0.001092,0.001092,0.001092,0.0
Auckland,0.000748,0.007478,0.00187,0.005982,0.018695,0.005982,0.011591,0.018695,0.002243,0.009348,0.0,0.004113,0.018695,0.018695,0.018695,0.018695,0.0086,0.005609,0.001122,0.000748,0.000374,0.000374,0.018695,0.018695,0.011591,0.004674,0.0,0.018695,0.018695,0.018695,0.001122,0.018695,0.004487,0.018695,0.018695,0.018695,0.006356,0.009721,0.018695,0.002243,...,0.001122,0.003365,0.018695,0.018695,0.000374,0.010469,0.001496,0.013087,0.001122,0.001496,0.001122,0.0,0.001122,0.018695,0.01533,0.000374,0.018695,0.018695,0.013087,0.018695,0.018695,0.018695,0.018695,0.018695,0.018695,0.003739,0.002617,0.017573,0.018695,0.000748,0.018695,0.018695,0.002617,0.007478,0.005609,0.011591,0.007478,0.018695,0.010095,0.0
Ayers Rock,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023599,0.0,0.017699,0.0,0.0,0.011799,0.011799,0.011799,0.011799,0.0,0.0,0.0,0.0,0.0,0.0,0.011799,0.0,0.011799,0.00295,0.0,0.011799,0.011799,0.011799,0.0,0.011799,0.0,0.058997,0.058997,0.058997,0.0,0.011799,0.058997,0.0,...,0.0,0.0,0.035398,0.035398,0.0,0.0,0.0,0.011799,0.0,0.0,0.0,0.0,0.0,0.011799,0.0,0.0,0.023599,0.0,0.0,0.011799,0.011799,0.0,0.011799,0.011799,0.0,0.0,0.035398,0.023599,0.023599,0.0,0.011799,0.082596,0.106195,0.0,0.0,0.023599,0.0,0.023599,0.0,0.0
Bahamas,0.0,0.0,0.003306,0.000551,0.024793,0.006612,0.003857,0.004959,0.001102,0.00854,0.0,0.000551,0.027548,0.027548,0.027548,0.027548,0.001653,0.001653,0.001102,0.0,0.0,0.000551,0.027548,0.002204,0.012121,0.006887,0.0,0.027548,0.027548,0.027548,0.000551,0.027548,0.003306,0.027548,0.027548,0.027548,0.001102,0.01157,0.027548,0.001102,...,0.001653,0.0,0.027548,0.016529,0.0,0.01708,0.000551,0.001102,0.0,0.0,0.0,0.0,0.0,0.01157,0.006612,0.0,0.027548,0.025344,0.027548,0.027548,0.027548,0.027548,0.027548,0.011019,0.027548,0.002204,0.0,0.006612,0.010468,0.0,0.009917,0.008264,0.004959,0.000551,0.000551,0.006612,0.0,0.020386,0.001102,0.0


## Now we are going to list the top 10 most common places in each city

#### First we create the function to sort

In [173]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

### Next we create a new dataframe and fill it with the most common places of each city


In [195]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = []
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Ranks beyond the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

Cities_venues_sorted = pd.DataFrame(columns=columns,index=Cities_grouped_nor.index)
Cities_venues_sorted=Cities_venues_sorted.fillna(0).sort_index()

# Fill each city's row by label, so the result does not depend on row order
for ind,row in Cities_grouped_nor.iterrows():
    Cities_venues_sorted.loc[ind, :]=return_most_common_venues(row, num_top_venues)
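
As a compact illustration of the ranking step, the sketch below builds the same kind of table on toy data. It is not the notebook's code: `top_venues_table`, the `Top N` column labels and the toy towns are hypothetical, and the toy shares avoid ties so that the ordering is unambiguous.

```python
import pandas as pd

def top_venues_table(shares, num_top_venues):
    """Build a table with the highest-share categories of each city."""
    labels = [f"Top {rank}" for rank in range(1, num_top_venues + 1)]
    rows = {
        # Sort one city's shares from largest to smallest, keep the names
        city: row.sort_values(ascending=False).index[:num_top_venues].tolist()
        for city, row in shares.iterrows()
    }
    return pd.DataFrame.from_dict(rows, orient="index", columns=labels)

# Made-up shares, for illustration only
toy_shares = pd.DataFrame(
    {"Beach": [0.6, 0.1], "Mountain": [0.1, 0.7], "Hotel": [0.3, 0.2]},
    index=["Coast town", "Alpine town"],
)

top2 = top_venues_table(toy_shares, 2)
print(top2)
```

Because each row is keyed by the city name, the result does not depend on the order in which the cities are stored.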


Let's look at the results for our cities.

In [196]:
Cities_venues_sorted.head(25)

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Aberdare,Mountain,Social Club,Night club,Sport bar,National Park,Park,Bar,Beach Bar,Cocktail Bar,Bus
Adelaide,African Restaurant,Social Club,French Restaurant,Library,Government building,Ski Area,Plaza,Park,National Park,Mediterranean Restaurant
Algarve,French Restaurant,Turkish Restaurant,American Restaurant,African Restaurant,Italian Restaurant,Mediterranean Restaurant,Asian Restaurant,Bar,Social Club,Beach Bar
Amritsar,Hotel,Gift shop,Antique shop,French Restaurant,American Restaurant,Arts store,Sport bar,Office,Asian Restaurant,Clothing store
Amsterdam,Plaza,Music,Indian Resturant,Ski Area,Steakhouse,Park,National Park,Irish Pub,Garden,American Restaurant
Antigua,Turkish Restaurant,Hotel,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,French Restaurant,Mediterranean Restaurant,Sport bar,Mexican Restaurant
Auckland,Garden,Bar,Park,National Park,Botanical Garden,Beach,Bay,Lounge,Sport bar,Cocktail Bar
Ayers Rock,Resort,Hotel,Sport bar,Cocktail Bar,Beach Bar,Bar,Botanical Garden,Garden,Airport,Park
Bahamas,Island,National Park,Antique shop,Spiritual Center,Office,Beach Bar,Italian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Clothing store
Bali,Waterfall,Bus,Italian Restaurant,American Restaurant,Mexican Restaurant,Mediterranean Restaurant,Turkish Restaurant,French Restaurant,Asian Restaurant,African Restaurant


The results make a lot of sense. The most common place category in Aberdare is Mountain, which is logical given that Aberdare lies in a mountain range in Kenya. In Airlie Beach Queensland (a coastal town in Australia), the most common places are beaches and beach bars.

# User part
#### From here on we define a user's ratings of previously visited cities to work out where their next vacation should be

#### First we define the user's ratings for past trips

In [176]:
userInput = [
            {'City':'New York', 'rating':2},
            {'City':'Barcelona', 'rating':2.5},
            {'City':'Bora Bora', 'rating':5},
            {'City':'Melbourne', 'rating':5},
            {'City':'Bangkok', 'rating':3},
            {'City':'Barbados', 'rating':4},
            {'City':'Airlie Beach Queensland', 'rating':5},
            {'City':'Cancun', 'rating':5},
            {'City':'Berlin', 'rating':2.5},
            {'City':'Vancouver', 'rating':3},
            {'City':'San Francisco', 'rating':4},
            {'City':'Las Vegas', 'rating':3.5},
            {'City':'Cairo', 'rating':4},
         ] 
inputCities = pd.DataFrame(userInput)
inputCities

Unnamed: 0,City,rating
0,New York,2.0
1,Barcelona,2.5
2,Bora Bora,5.0
3,Melbourne,5.0
4,Bangkok,3.0
5,Barbados,4.0
6,Airlie Beach Queensland,5.0
7,Cancun,5.0
8,Berlin,2.5
9,Vancouver,3.0


### Next we match the data from the user input with the cities data
We create a new DataFrame containing only the cities that the user has visited.


In [177]:
UserCitiesPreferences=Cities_grouped_nor[Cities_grouped_nor.index.isin(inputCities["City"].tolist())]
display(UserCitiesPreferences.head(12))

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Airlie Beach Queensland,0.0,0.0,0.0,0.0,0.02317,0.002106,0.002106,0.0,0.0,0.008425,0.0,0.0,0.02317,0.02317,0.02317,0.02317,0.002106,0.0,0.0,0.0,0.0,0.0,0.02317,0.0,0.004213,0.005793,0.0,0.025276,0.02317,0.035808,0.0,0.02317,0.0,0.065298,0.105319,0.065298,0.0,0.004213,0.065298,0.0,...,0.0,0.0,0.021064,0.016851,0.0,0.002106,0.0,0.010532,0.0,0.0,0.0,0.0,0.0,0.0,0.002106,0.0,0.004213,0.014745,0.0,0.008425,0.006319,0.008425,0.010532,0.010532,0.008425,0.0,0.004213,0.006319,0.027383,0.0,0.006319,0.021064,0.048447,0.006319,0.002106,0.006319,0.012638,0.004213,0.0,0.0
Bangkok,0.003763,0.003494,0.003225,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.00672,0.0,0.008601,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.000806,0.000269,0.013439,0.013439,0.013439,0.013439,0.00336,0.0,0.013439,0.013439,0.013439,0.013439,0.013439,0.009139,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.00215,...,0.004032,0.006182,0.013439,0.013439,0.0,0.013439,0.013439,0.013439,0.0,0.001613,0.000806,0.001075,0.0,0.013439,0.013439,0.000538,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.013439,0.007795,0.013439,0.013439,0.001613,0.013439,0.013439,0.013439,0.001075,0.013439,0.011558,0.013439,0.013439,0.013439,0.0
Barbados,0.0,0.0,0.0,0.0,0.008759,0.00292,0.0,0.008759,0.0,0.00438,0.0,0.0,0.011679,0.014599,0.011679,0.014599,0.0,0.0,0.0,0.0,0.0,0.0,0.011679,0.0,0.0,0.00292,0.0,0.011679,0.011679,0.014599,0.0,0.011679,0.0,0.113869,0.119708,0.113869,0.0,0.0,0.116788,0.0,...,0.0,0.0,0.011679,0.008759,0.0,0.00292,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005839,0.005839,0.0,0.029197,0.008759,0.00292,0.029197,0.014599,0.008759,0.029197,0.005839,0.008759,0.0,0.0,0.008759,0.008759,0.0,0.005839,0.00292,0.00292,0.0,0.0,0.005839,0.0,0.017518,0.0,0.0
Barcelona,0.001291,0.001291,0.005162,0.009034,0.016132,0.016132,0.016132,0.016132,0.003226,0.008066,0.0,0.011615,0.016132,0.016132,0.016132,0.016132,0.014518,0.00613,0.016132,0.001936,0.016132,0.001936,0.016132,0.011937,0.016132,0.004033,0.000645,0.016132,0.016132,0.016132,0.002258,0.016132,0.010324,0.016132,0.016132,0.016132,0.00613,0.016132,0.016132,0.001613,...,0.0,0.000645,0.016132,0.016132,0.000323,0.016132,0.003872,0.016132,0.0,0.000323,0.000323,0.0,0.0,0.005162,0.002904,0.0,0.016132,0.016132,0.016132,0.016132,0.016132,0.016132,0.016132,0.016132,0.016132,0.016132,0.002904,0.016132,0.016132,0.000323,0.016132,0.016132,0.003549,0.002581,0.016132,0.016132,0.016132,0.016132,0.01097,0.0
Berlin,0.003848,0.003498,0.012244,0.008746,0.017492,0.017492,0.017492,0.017492,0.00105,0.008746,0.0,0.017492,0.017492,0.017492,0.017492,0.017492,0.017492,0.002799,0.017492,0.004198,0.001749,0.004898,0.017492,0.010495,0.017492,0.004373,0.0,0.017492,0.017492,0.017492,0.003498,0.017492,0.009795,0.017492,0.017492,0.017492,0.002099,0.017492,0.017492,0.00105,...,0.0007,0.001749,0.017492,0.017492,0.00035,0.013294,0.004548,0.017492,0.0,0.00035,0.0,0.0,0.0,0.005248,0.001749,0.00035,0.017492,0.017492,0.017492,0.017492,0.017492,0.017492,0.017492,0.017492,0.017492,0.010495,0.002799,0.017492,0.012244,0.0,0.017492,0.017492,0.002799,0.003848,0.017492,0.005947,0.007696,0.017492,0.008046,0.0
Bora Bora,0.002789,0.0,0.0,0.002789,0.008368,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050209,0.050209,0.050209,0.050209,0.0,0.0,0.0,0.0,0.0,0.0,0.052999,0.0,0.0,0.012552,0.0,0.050209,0.050209,0.050209,0.0,0.050209,0.0,0.039052,0.083682,0.039052,0.0,0.0,0.039052,0.0,...,0.0,0.005579,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002789,0.011158,0.025105,0.005579,0.005579,0.005579,0.005579,0.0,0.005579,0.0,0.005579,0.0,0.013947,0.0,0.0,0.027894,0.083682,0.0,0.0,0.0,0.0,0.0,0.005579,0.0
Cairo,0.001799,0.00045,0.010342,0.011691,0.022482,0.001349,0.005845,0.022482,0.003147,0.010567,0.0,0.002248,0.022482,0.022482,0.022482,0.022482,0.002698,0.00045,0.008993,0.0,0.00045,0.001799,0.022482,0.015288,0.015737,0.005621,0.0,0.022482,0.022482,0.022482,0.00045,0.022482,0.003597,0.022482,0.022482,0.022482,0.0,0.015288,0.022482,0.0,...,0.00045,0.0,0.022482,0.021133,0.00045,0.01259,0.005845,0.007644,0.0,0.0,0.001349,0.0,0.0,0.022482,0.01259,0.0,0.022482,0.022482,0.022482,0.022482,0.022482,0.022482,0.022482,0.022482,0.022482,0.002698,0.002248,0.004496,0.010342,0.0,0.022482,0.022482,0.001349,0.000899,0.016637,0.000899,0.002698,0.022482,0.000899,0.0
Cancun,0.0,0.000527,0.003161,0.002107,0.026344,0.001054,0.006322,0.026344,0.001054,0.002898,0.0,0.002107,0.026344,0.026344,0.026344,0.026344,0.004742,0.002107,0.012118,0.000527,0.014752,0.000527,0.026344,0.0,0.026344,0.006586,0.0,0.026344,0.026344,0.026344,0.003161,0.026344,0.001581,0.026344,0.026344,0.026344,0.003161,0.026344,0.026344,0.000527,...,0.000527,0.0,0.005269,0.004742,0.0,0.026344,0.002107,0.005269,0.0,0.0,0.0,0.0,0.0,0.001054,0.0,0.0,0.014752,0.026344,0.026344,0.026344,0.01844,0.017387,0.026344,0.026344,0.023182,0.011591,0.004215,0.01686,0.011591,0.000527,0.008957,0.026344,0.009484,0.003688,0.013699,0.017914,0.002634,0.008957,0.003161,0.0
Las Vegas,0.000767,0.003452,0.019175,0.001151,0.019175,0.013806,0.006903,0.019175,0.001918,0.009588,0.0,0.000767,0.019175,0.019175,0.019175,0.019175,0.005753,0.010738,0.002685,0.0,0.000384,0.003068,0.019175,0.004986,0.013806,0.004794,0.0,0.019175,0.019175,0.019175,0.010355,0.019175,0.004219,0.019175,0.019175,0.019175,0.003452,0.013806,0.019175,0.001534,...,0.000384,0.00652,0.019175,0.019175,0.001534,0.019175,0.002685,0.010355,0.0,0.000767,0.000384,0.0,0.000384,0.019175,0.006136,0.000384,0.019175,0.019175,0.019175,0.019175,0.019175,0.019175,0.019175,0.019175,0.019175,0.019175,0.000767,0.014957,0.013423,0.0,0.019175,0.019175,0.009971,0.014957,0.001534,0.016491,0.004602,0.019175,0.003835,0.0
Melbourne,0.004042,0.009817,0.009817,0.010683,0.014436,0.00895,0.014436,0.014436,0.008373,0.007218,0.0,0.006929,0.014436,0.014436,0.014436,0.014436,0.014436,0.002021,0.003465,0.004042,0.001444,0.001444,0.014436,0.014436,0.014436,0.003609,0.0,0.014436,0.014436,0.014436,0.004908,0.014436,0.014436,0.014436,0.014436,0.014436,0.006063,0.014436,0.014436,0.005197,...,0.001732,0.00462,0.014436,0.014436,0.001444,0.014436,0.014436,0.014436,0.002021,0.001444,0.000289,0.0,0.000289,0.014436,0.014436,0.000289,0.014436,0.014436,0.014436,0.014436,0.014436,0.014436,0.014436,0.014436,0.014436,0.014436,0.002887,0.014436,0.014436,0.000289,0.014436,0.014436,0.001732,0.004042,0.003753,0.013281,0.011838,0.014436,0.011838,0.0


### Now we compute the user profile based on the characteristics of the rated cities

In [178]:
UserCitiesPreferences_int=UserCitiesPreferences.copy().reset_index()
UserCitiesPreferences_int=UserCitiesPreferences_int.drop("index",axis=1)
User_profile=UserCitiesPreferences_int.transpose().dot(inputCities["rating"])
Top_15_user_preferences=User_profile.sort_values(ascending=False).head(15)
display(Top_15_user_preferences)

Beach Bar                   1.804576
Sport bar                   1.531414
Bar                         1.516815
Cocktail Bar                1.516815
Seafood Restaurant          1.006117
Asian Restaurant            0.980840
African Restaurant          0.980840
French Restaurant           0.977399
Mediterranean Restaurant    0.970454
American Restaurant         0.966242
Italian Restaurant          0.966242
Turkish Restaurant          0.966242
Mexican Restaurant          0.966242
Gift shop                   0.850035
Antique shop                0.845822
dtype: float64

According to the program, the user's favourite places are beach bars, sport bars, bars, cocktail bars and seafood restaurants. This makes a lot of sense given that the user's highest ratings went to coastal cities, which usually have a higher percentage of these establishments.
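
One detail worth checking in the profile computation: the rows of `UserCitiesPreferences` follow the order of the dataset index, while `inputCities["rating"]` follows the order in which the user typed the cities, and after `reset_index` the dot product pairs them by position. The sketch below, on made-up data, shows one way to pair each rating with its city by name instead. The variable names and numbers are hypothetical, and it assumes every rated city is present in the dataset.

```python
import pandas as pd

# Made-up shares; the index is alphabetical, like the dataset index
shares = pd.DataFrame(
    {"Beach": [0.8, 0.1], "Museum": [0.2, 0.9]},
    index=["Cancun", "New York"],
)

# Made-up ratings, entered in a different order than the index above
user_input = pd.DataFrame(
    [{"City": "New York", "rating": 2.0}, {"City": "Cancun", "rating": 5.0}]
)

# Index the ratings by city name and put them in the same order as the shares
ratings = user_input.set_index("City")["rating"].reindex(shares.index)

# Rating-weighted sum of the category shares of the visited cities
profile = shares.T.dot(ratings)
print(profile.sort_values(ascending=False))
```

Here Cancun's shares are weighted by 5 and New York's by 2, whatever order the user entered them in.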

### Find recommendations

First we remove the cities the user entered from the dataset. If the user has already visited them, they do not need us to tell them whether they will like them.

In [179]:
# Drop every city the user has already rated (cities not in the dataset are ignored)
Cities_grouped_nor=Cities_grouped_nor.drop(index=inputCities["City"].tolist(),errors="ignore")
display(Cities_grouped_nor)

Unnamed: 0,Aquarium,Arcade & Bowling,Casino,Cinema,Night club,Disco,Music,Art,Stadium,Theme Park,Water Park,Zoo,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,Bistro,Buffet,Cafeteria,Creperie,Bodega,Fast Food Restaurant,French Restaurant,Indian Resturant,Irish Pub,Italian restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant,Steakhouse,Turkish Restaurant,Nightlife Spot,Bar,Beach Bar,Cocktail Bar,Karaoke,Pub,Sport bar,Brewery,...,Lighthouse,Mountain,National Park,Park,Pedestrian Aera,Plaza,River,Ski Area,Stables,Vineyard,Volcano,Waterfall,Windmill,Government building,Library,Observatory,Office,Social Club,Spiritual Center,Antique shop,Arts store,Clothing store,Gift shop,Massage studio,Music store,Outlet,Airport,Bike rental,Boat rental,Ferry or Boat,Bus,Hotel,Resort,Motel,Hostel,Vacation Rental,Bed & Breakfast,Metro station,Pier,RV park
Aberdare,0.000000,0.000000,0.000000,0.002730,0.054608,0.000000,0.000000,0.002730,0.000000,0.023208,0.0,0.000000,0.008191,0.008191,0.008191,0.008191,0.002730,0.000000,0.000000,0.000000,0.000000,0.000000,0.008191,0.002730,0.008191,0.002048,0.000000,0.008191,0.008191,0.008191,0.000000,0.010922,0.002730,0.043686,0.043686,0.043686,0.000000,0.008191,0.054608,0.002730,...,0.002730,0.070990,0.049147,0.046416,0.000000,0.000000,0.002730,0.000000,0.000000,0.000000,0.000000,0.002730,0.000000,0.021843,0.013652,0.000000,0.032765,0.054608,0.002730,0.032765,0.024573,0.024573,0.032765,0.002730,0.021843,0.000000,0.000000,0.010922,0.000000,0.000000,0.038225,0.016382,0.000000,0.000000,0.000000,0.000000,0.002730,0.035495,0.000000,0.0
Adelaide,0.000390,0.013643,0.003898,0.009355,0.019489,0.008965,0.012863,0.019489,0.003508,0.009745,0.0,0.002729,0.019489,0.019489,0.019489,0.019489,0.012083,0.001949,0.000780,0.001949,0.000000,0.000780,0.019489,0.014422,0.009745,0.004872,0.000000,0.019489,0.019489,0.019489,0.000390,0.019489,0.006626,0.019489,0.019489,0.019489,0.004288,0.009355,0.019489,0.000780,...,0.000390,0.001169,0.019489,0.019489,0.000000,0.019489,0.011694,0.019489,0.000390,0.001559,0.000000,0.000000,0.001169,0.019489,0.019489,0.000780,0.019489,0.019489,0.012083,0.019489,0.019489,0.019489,0.019489,0.019489,0.019489,0.001559,0.007016,0.015202,0.010524,0.000000,0.019489,0.019489,0.001169,0.006237,0.002339,0.006626,0.005457,0.019489,0.000780,0.0
Algarve,0.000000,0.000000,0.000000,0.000816,0.040800,0.003264,0.000816,0.010608,0.000000,0.005712,0.0,0.001632,0.040800,0.040800,0.040800,0.040800,0.004896,0.001632,0.000816,0.000000,0.002448,0.000000,0.040800,0.007344,0.009792,0.010200,0.000000,0.040800,0.040800,0.040800,0.004896,0.040800,0.004896,0.040800,0.040800,0.040800,0.001632,0.008160,0.040800,0.000000,...,0.000816,0.000000,0.011424,0.011424,0.000000,0.000816,0.000816,0.003264,0.000000,0.000000,0.000000,0.000000,0.002448,0.000000,0.000000,0.000000,0.003264,0.040800,0.006528,0.026112,0.007344,0.006528,0.026928,0.002448,0.007344,0.000000,0.001632,0.005712,0.005712,0.000000,0.004080,0.040800,0.017136,0.000816,0.006528,0.004080,0.001632,0.002448,0.000816,0.0
Amritsar,0.000000,0.000000,0.000000,0.008617,0.010340,0.000000,0.000000,0.006894,0.000000,0.008617,0.0,0.000000,0.031021,0.029298,0.029298,0.029298,0.001723,0.000000,0.000000,0.000000,0.000000,0.008617,0.034468,0.006894,0.005170,0.007324,0.000000,0.029298,0.029298,0.029298,0.000000,0.029298,0.000000,0.025851,0.025851,0.025851,0.000000,0.005170,0.031021,0.000000,...,0.000000,0.000000,0.018957,0.017234,0.000000,0.020681,0.001723,0.003447,0.000000,0.000000,0.000000,0.000000,0.000000,0.001723,0.000000,0.000000,0.031021,0.012064,0.013787,0.068936,0.031021,0.029298,0.072383,0.005170,0.029298,0.001723,0.001723,0.001723,0.000000,0.000000,0.003447,0.086170,0.024128,0.001723,0.012064,0.000000,0.001723,0.027574,0.000000,0.0
Amsterdam,0.001605,0.002247,0.007061,0.010913,0.016049,0.010592,0.016049,0.016049,0.002889,0.008024,0.0,0.008345,0.016049,0.016049,0.016049,0.016049,0.012518,0.000963,0.002247,0.001605,0.000963,0.002247,0.016049,0.016049,0.016049,0.004012,0.000642,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.003210,0.016049,0.016049,0.003852,...,0.001926,0.001284,0.016049,0.016049,0.000321,0.016049,0.010913,0.016049,0.000321,0.001284,0.000000,0.000000,0.001284,0.013802,0.008024,0.000963,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.016049,0.008666,0.005136,0.016049,0.016049,0.000963,0.016049,0.016049,0.002568,0.000642,0.013802,0.011555,0.016049,0.016049,0.012197,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Sydney,0.002534,0.011826,0.002816,0.014079,0.014079,0.014079,0.014079,0.014079,0.003942,0.007039,0.0,0.013515,0.014079,0.014079,0.014079,0.014079,0.014079,0.001971,0.003379,0.003097,0.000845,0.001971,0.014079,0.014079,0.014079,0.003520,0.000563,0.014079,0.014079,0.014079,0.002253,0.014079,0.013515,0.014079,0.014079,0.014079,0.008447,0.014079,0.014079,0.004787,...,0.002253,0.004505,0.014079,0.014079,0.003097,0.014079,0.007039,0.014079,0.004224,0.000000,0.000563,0.000282,0.000845,0.014079,0.014079,0.002534,0.014079,0.014079,0.014079,0.014079,0.014079,0.014079,0.014079,0.014079,0.014079,0.014079,0.001971,0.014079,0.014079,0.000000,0.014079,0.014079,0.002816,0.001689,0.006195,0.012671,0.011263,0.014079,0.014079,0.0
Taj Mahal,0.000000,0.000000,0.000000,0.003814,0.002543,0.000000,0.000000,0.007629,0.000000,0.005086,0.0,0.000000,0.063573,0.063573,0.063573,0.063573,0.001271,0.003814,0.000000,0.000000,0.000000,0.003814,0.063573,0.008900,0.001271,0.015893,0.001271,0.063573,0.063573,0.063573,0.000000,0.063573,0.000000,0.016529,0.016529,0.016529,0.000000,0.001271,0.017800,0.000000,...,0.002543,0.000000,0.013986,0.010172,0.000000,0.010172,0.002543,0.001271,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.001271,0.012715,0.002543,0.017800,0.010172,0.021615,0.019072,0.011443,0.000000,0.019072,0.000000,0.002543,0.001271,0.000000,0.000000,0.003814,0.063573,0.003814,0.000000,0.012715,0.001271,0.003814,0.010172,0.000000,0.0
Valley of the kings,0.000772,0.001159,0.000772,0.000000,0.019309,0.005793,0.010427,0.019309,0.007337,0.009654,0.0,0.003476,0.019309,0.019309,0.019309,0.019309,0.006565,0.003089,0.010041,0.000000,0.000386,0.001931,0.019309,0.015833,0.009268,0.004827,0.000000,0.019309,0.019309,0.019309,0.001545,0.019309,0.011585,0.019309,0.019309,0.019309,0.002317,0.008882,0.019309,0.000772,...,0.000772,0.019309,0.019309,0.019309,0.000386,0.019309,0.009654,0.019309,0.000000,0.000386,0.000000,0.000772,0.000000,0.019309,0.005020,0.000000,0.019309,0.019309,0.019309,0.019309,0.019309,0.019309,0.019309,0.019309,0.019309,0.002703,0.003476,0.010427,0.006179,0.000000,0.019309,0.019309,0.001545,0.001931,0.000386,0.005793,0.001931,0.019309,0.000772,0.0
Victoria Falls,0.000000,0.000000,0.000000,0.000000,0.013468,0.000000,0.000000,0.000000,0.000000,0.013468,0.0,0.000000,0.040404,0.047138,0.040404,0.040404,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.040404,0.000000,0.013468,0.010101,0.000000,0.040404,0.040404,0.040404,0.000000,0.040404,0.000000,0.047138,0.047138,0.047138,0.000000,0.013468,0.047138,0.006734,...,0.000000,0.000000,0.026936,0.026936,0.000000,0.000000,0.026936,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.006734,0.000000,0.006734,0.013468,0.020202,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.020202,0.000000,0.006734,0.121212,0.013468,0.000000,0.000000,0.000000,0.006734,0.006734,0.000000,0.0


### Now we multiply the characteristics by the weights and take the weighted average

This gives, for each city, a score that estimates how much the user will like it.
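As a minimal sketch of this weighted average, the toy example below uses made-up cities, categories, and weights (none of them come from the project data). It only illustrates how multiplying a city-by-category table by a user profile and dividing by the sum of the weights produces one score per city.

```python
import pandas as pd

# Toy, made-up data: rows are cities, columns are venue categories,
# values are normalized category frequencies (hypothetical numbers).
cities = pd.DataFrame(
    {
        "Beach": [0.6, 0.1, 0.3],
        "Museum": [0.1, 0.7, 0.2],
        "Ski Area": [0.3, 0.2, 0.5],
    },
    index=["Island A", "City B", "Resort C"],
)

# Hypothetical user profile: one weight per category.
user_profile = pd.Series({"Beach": 5.0, "Museum": 1.0, "Ski Area": 4.0})

# Multiplying a DataFrame by a Series aligns the Series index with the
# DataFrame columns, so each column is scaled by its matching weight.
scores = (cities * user_profile).sum(axis=1) / user_profile.sum()

# Highest score first: the best match for this profile.
ranking = scores.sort_values(ascending=False)
print(ranking)
```

With these numbers the beach-heavy "Island A" ranks first and the museum-heavy "City B" last, which mirrors the behavior expected from the real profile.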

In [180]:

recommendationTable_df = ((Cities_grouped_nor*User_profile).sum(axis=1))/(User_profile.sum())
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
display(recommendationTable_df.head(15))

Maldives                              0.024405
La Digue Island                       0.023480
Isle of Man                           0.022069
Puerto del Rosario, Canary Islands    0.021413
Zermatt                               0.021052
Antigua                               0.020885
Courchevel                            0.020550
Chamonix                              0.020474
Sorrento                              0.020099
Luxor                                 0.019727
Machu Picchu                          0.019577
Victoria Falls                        0.019413
Algarve                               0.019077
Dubai                                 0.019050
Chichen Itza                          0.018828
dtype: float64

Looking at the recommendations, we can see that the top cities are islands and beach destinations. There are also some mountain destinations, but no large cities. This was to be expected, since large cities were the places the user scored lowest.

Next we are going to see the least recommended cities:

In [181]:
display(recommendationTable_df.tail(10))

Madrid         0.014020
Prague         0.013630
San Diego      0.013614
Amsterdam      0.013601
Los Angeles    0.013505
Paris          0.013218
Hong Kong      0.012953
Sydney         0.012801
London         0.012230
Singapore      0.011756
dtype: float64

We see that the least recommended places are large cities rather than beach or mountain destinations. Again, this was expected, and it is further evidence that the algorithm is working correctly.

# Results

Finally, we take the top 15 matches, their match scores, and their 10 most common venue types, so the user can judge whether they will like each city.

In [197]:
# Keep the 15 best matches (the table is already sorted from best to worst)
recommendationTable_df=recommendationTable_df.head(15)
# Select the venue summary of those cities; .copy() avoids pandas' SettingWithCopyWarning
Found_Cities_venues_sorted=Cities_venues_sorted[Cities_venues_sorted.index.isin(recommendationTable_df.index)].copy()
# Add the score as the first column; pandas aligns the Series on the city index
Found_Cities_venues_sorted.insert(0,"Match",recommendationTable_df)
Found_Cities_venues_sorted=Found_Cities_venues_sorted.sort_values(by=["Match"],ascending=False)
display(Found_Cities_venues_sorted)


City,Match,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Maldives,0.024405,Sport bar,Beach Bar,Bar,Antique shop,Cocktail Bar,Island,Italian Restaurant,African Restaurant,Seafood Restaurant,French Restaurant
La Digue Island,0.02348,Beach Bar,Cocktail Bar,Bar,Sport bar,Island,Italian Restaurant,American Restaurant,Mexican Restaurant,Mediterranean Restaurant,Turkish Restaurant
Isle of Man,0.022069,Bar,Beach Bar,Cocktail Bar,Sport bar,Mountain,Irish Pub,African Restaurant,Mediterranean Restaurant,Waterfall,Turkish Restaurant
"Puerto del Rosario, Canary Islands",0.021413,Sport bar,Beach Bar,Cocktail Bar,Bar,Cafeteria,French Restaurant,American Restaurant,Turkish Restaurant,Seafood Restaurant,Mexican Restaurant
Zermatt,0.021052,Asian Restaurant,Sport bar,American Restaurant,Mediterranean Restaurant,African Restaurant,Mexican Restaurant,Seafood Restaurant,Hotel,Italian Restaurant,Turkish Restaurant
Antigua,0.020885,Turkish Restaurant,Hotel,American Restaurant,African Restaurant,Italian Restaurant,Asian Restaurant,French Restaurant,Mediterranean Restaurant,Sport bar,Mexican Restaurant
Courchevel,0.02055,Sport bar,Ski Area,Hotel,Cocktail Bar,Beach Bar,Bar,French Restaurant,Asian Restaurant,Mexican Restaurant,Italian Restaurant
Chamonix,0.020474,Hotel,Sport bar,Cocktail Bar,Beach Bar,Bar,French Restaurant,Italian Restaurant,American Restaurant,Mediterranean Restaurant,Asian Restaurant
Sorrento,0.020099,Cocktail Bar,Sport bar,Bar,Beach Bar,Hotel,Italian Restaurant,Mediterranean Restaurant,Night club,African Restaurant,American Restaurant
Luxor,0.019727,African Restaurant,Hotel,Italian Restaurant,Turkish Restaurant,American Restaurant,Asian Restaurant,French Restaurant,Mediterranean Restaurant,Mexican Restaurant,Seafood Restaurant
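
When a score column such as `Match` is attached to another table, relying on both tables having the same row order is fragile. The sketch below, on hypothetical data, shows one way to let pandas align the scores on the city index instead; the names `scores`, `venues`, `top`, and `found` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical scores, already sorted from best to worst match.
scores = pd.Series(
    {"Island A": 0.43, "Resort C": 0.37, "City B": 0.20, "City D": 0.10}
)

# Hypothetical venue summary, stored in a different row order than the scores.
venues = pd.DataFrame(
    {"1st Most Common Venue": ["Museum", "Bar", "Beach Bar", "Ski Area"]},
    index=["City B", "City D", "Island A", "Resort C"],
)

# Keep the best 3 matches.
top = scores.head(3)

# Select the matching rows; .copy() avoids SettingWithCopyWarning on assignment.
found = venues.loc[venues.index.isin(top.index)].copy()

# Insert the score as the first column. Because `top` is a Series, pandas
# aligns it on the city index, so the row order of `venues` does not matter.
found.insert(0, "Match", top)

found = found.sort_values("Match", ascending=False)
print(found)
```

The design choice here is to trust the index rather than the position of each row, so sorting either table beforehand cannot silently pair a city with the wrong score.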


Next we take the DataFrame that holds every site retrieved from the Foursquare API and reduce it to the sites that are in one of the 15 recommended cities and belong to one of the user's 15 favorite categories. This way the map loads faster when we change the selected city, and it shows only sites from the 15 best categories, so irrelevant sites do not clutter it.

In [192]:
# Coordinates of the recommended cities, indexed by city name
Cities_Coordinates_df=Cities_df[Cities_df['Name'].isin(recommendationTable_df.index.tolist())].set_index("Name")
display(Cities_Coordinates_df)
# Keep only the venues that are in a recommended city and in one of the user's top 15 categories
Actual_full_Data_Locations_df=Full_Data_Locations_df[Full_Data_Locations_df['City'].isin(Cities_Coordinates_df.index.tolist())]
Actual_full_Data_Locations_df=Actual_full_Data_Locations_df[Actual_full_Data_Locations_df['Category'].isin(Top_15_user_preferences.index)].copy()
# Shift the category weights so that the lowest one becomes 2; the result is used as the marker size on the map
min_value=Top_15_user_preferences.min()
Actual_full_Data_Locations_df['preference']=Actual_full_Data_Locations_df['Category'].map(Top_15_user_preferences)-(min_value-2)
display(Actual_full_Data_Locations_df)

Unnamed: 0,City,Category,Name,Latitude,Longitude,preference
3781,Algarve,American Restaurant,American Diner,37.097617,-8.228213,2.120420
3782,Algarve,American Restaurant,American Diner,37.088901,-8.252013,2.120420
3783,Algarve,American Restaurant,American Diner II,37.091527,-8.221664,2.120420
3784,Algarve,American Restaurant,American Dinner 3,37.092317,-8.237096,2.120420
3785,Algarve,American Restaurant,Versátil Restaurant,37.090049,-8.243731,2.120420
...,...,...,...,...,...,...
150693,Zermatt,Gift shop,Viktoria Shopping Center,46.023816,7.748115,2.004213
150694,Zermatt,Gift shop,Gifthittli,45.985933,7.775201,2.004213
150695,Zermatt,Gift shop,Edelweiss Shop,45.984745,7.741286,2.004213
150696,Zermatt,Gift shop,Edelweiss Shop,45.983493,7.783521,2.004213


We can see how we went from over 140,000 places to just a few thousand.
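
The reduction described above can be sketched on a small, made-up table. The city names, categories, and weights below are hypothetical. The offset mirrors the `min_value-2` shift used in the notebook, presumably so that the values stay positive when they are used as marker sizes.

```python
import pandas as pd

# Hypothetical venue table (names and coordinates omitted for brevity).
places = pd.DataFrame(
    {
        "City": ["Island A", "Island A", "City B", "Resort C"],
        "Category": ["Beach Bar", "Gift shop", "Beach Bar", "Museum"],
    }
)

# Hypothetical recommended cities and per-category user weights.
recommended_cities = ["Island A", "Resort C"]
top_preferences = pd.Series({"Beach Bar": 1.5, "Gift shop": -0.5})

# Keep only rows that are in a recommended city AND in a preferred category.
mask = (
    places["City"].isin(recommended_cities)
    & places["Category"].isin(top_preferences.index)
)
selected = places.loc[mask].copy()

# Look up each row's category weight, then shift all weights so that the
# smallest one becomes 2 (same offset as in the notebook cell).
offset = top_preferences.min() - 2
selected["preference"] = selected["Category"].map(top_preferences) - offset
print(selected)
```

Using `map` on the `Category` column replaces the row-by-row loop with a single vectorized lookup, which matters when the table has many thousands of rows.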

In the next cell we create a Dash dashboard where the user can select one of the 15 recommended cities; the map then shows that city and its most relevant places.

In [204]:
app = JupyterDash(__name__)

recommended_city=Found_Cities_venues_sorted.index[0]
cities_list=Found_Cities_venues_sorted.index.sort_values()
JupyterDash.infer_jupyter_proxy_config()
px.set_mapbox_access_token(open("/content/gdrive/MyDrive/mapbox_token.txt").read())
#df = px.data.carshare()

def serve_layout():
    return html.Div([html.H1('Recommended cities to visit and their top places',
                            style={'textAlign': 'center', 'color': '#D7DBDE',
                             'font-size': 45}),
                             html.Div([  html.Div(
                                            [ html.H2('Select a city:' ,
                                             style={'margin-right': '1em','font-size': '30px', 'color': '#D7DBDE','margin-left': '8em'})]
                                        ), dcc.Dropdown(id='input-city', 
                                                      options=[{'label': i, 'value': i} for i in cities_list],
                                                      value=recommended_city,
                                                      placeholder="Select a City",
                                                     style={'width':'80%', 'padding':'3px', 'font-size': '30px', 'color': '#000000', 'text-align-last' : 'center','align-items': 'center',}),], 
                                style={'width': '100%', 'display': 'flex', 'align-items': 'center', 'justify-content': 'center'}),
                                html.Div(dcc.Graph(id='city-plot', style={'width':'95%','height': '90vh'})),
                                ])

app.layout = serve_layout
                     
# Callback: redraw the map whenever a different city is selected
@app.callback( Output('city-plot','figure'),
                [Input('input-city', 'value')])
def get_graph(entered_city):
     # Keep only the venues of the selected city
     Actual_Data_Locations_df=Actual_full_Data_Locations_df[Actual_full_Data_Locations_df['City']==entered_city]

     # One marker per venue, colored by category and sized by the user's preference for that category
     fig = px.scatter_mapbox(Actual_Data_Locations_df, lat="Latitude", lon="Longitude", color="Category", size="preference",text='Name',
              color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)

     return fig

if __name__ == '__main__':
    app.run_server( mode="inline",host="localhost",port=9000,debug=True )


## Results and Discussion <a name="results"></a>

The result of this project is a list of recommended destinations for a particular user. As explained above, the scores entered clearly correspond to a person with little interest in big cities, someone who much prefers relaxing vacations in quiet places, especially ones with a beach. Of the 15 recommended cities, all except Dubai are relatively quiet places, and most are beach destinations, so the algorithm seems to work quite well and the recommendations are good.


## Conclusion <a name="conclusion"></a>

In conclusion, this program is a good tool for planning a vacation. Because the range of destinations is so wide, and they change so quickly, it is difficult to know where to go and where you will find what you are looking for. You could search manually for places that look good and use applications such as Google Maps or Foursquare to check whether they really have what you want, but it is more convenient to have that work done for you, as in this case!

The final decision of where to go will be up to the client, but the recommendations are made.
