# Guide Recommendation Pipeline
The system matches tourists with guides they are likely to find interesting, according to the tourists' preferences and requests.

The guide recommendation system relies on datasets of:
- ratings given by tourists to specific guides;
- attributes of the guides;
- (optionally) attributes of the tourists.

## Define working environment

```python
# Import libraries
import numpy as np
import pandas as pd
import scipy.sparse as sps
import matplotlib.pyplot as pyplot
import ast

from tqdm import tqdm
```

## Load and preprocess data

In this part we load the previously generated datasets:
- `tourists.csv`: the id, spoken languages, and keywords of each tourist;
- `guides.csv`: the attributes of each guide;
- `ratings.csv`: the tourist-guide interactions (ratings).


```python
# Load dataframe for tourist attributes
# (passing the path lets pandas open and close the file itself)
tourist_df = pd.read_csv(
    filepath_or_buffer='Data/tourists_200.csv',
    sep=';',
    header=0,
)

# The first (unnamed) column holds the tourist id
tourist_df.rename(columns={tourist_df.columns[0]: 'id'}, inplace=True)

tourist_df
```

| | id | languages | keywords |
|---|---|---|---|
| 0 | 0 | ['bulgarian'] | ['cinema'] |
| 1 | 1 | ['deutsche'] | [] |
| 2 | 2 | ['spanish'] | ['rafting', 'museums'] |
| 3 | 3 | ['deutsche'] | ['cinema', 'history', 'food'] |
| 4 | 4 | ['chinese'] | ['wine', 'rafting'] |
| ... | ... | ... | ... |
| 195 | 195 | ['french'] | ['cinema'] |
| 196 | 196 | ['dutch'] | ['food', 'tracking', 'history'] |
| 197 | 197 | ['dutch'] | ['museums', 'art', 'history'] |
| 198 | 198 | ['italian'] | ['rafting', 'literature'] |


```python
# Load dataframe for guide attributes
guide_df = pd.read_csv(
    filepath_or_buffer='Data/guides_40.csv',
    sep=';',
    header=0,
    # Parse the list-valued columns (stored as strings) into Python lists
    converters={'languages_spoken': ast.literal_eval, 'keywords': ast.literal_eval},
)

# The first (unnamed) column holds the guide id
guide_df.rename(columns={guide_df.columns[0]: 'id'}, inplace=True)

guide_df
```

| | id | gender | name | birth_date | now_available | languages_spoken | price | education | biography | keywords | current_location | experience |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | male | Kevin Rodriguez | 1967-11-01 | True | [english] | 25 | high-school | Engineer, manufacturing | [museums] | {'lat': 40.342693584880706, 'lon': 18.16438078... | 30 |
| 1 | 1 | female | Diana Barnes | 1965-09-19 | True | [italian, dutch] | 36 | middle-school | Animal technologist | [cinema, rafting, history, wine] | {'lat': 40.3367551413547, 'lon': 18.1569120995... | 25 |
| 2 | 2 | female | Kristin Rogers | 1978-10-12 | True | [chinese, french, english] | 34 | phd | Editor, commissioning | [food, archeology, art] | {'lat': 40.362447049660204, 'lon': 18.14225129... | 14 |
| 3 | 3 | male | Jeremy Bowman | 1992-05-30 | True | [bulgarian] | 46 | bachelor | Retail banker | [countryside, rafting, art] | {'lat': 40.36584086550253, 'lon': 18.183002910... | 5 |
| 4 | 4 | male | Justin Lynch | 1976-12-24 | True | [deutsche, french] | 30 | master | Secondary school teacher | [countryside, tracking, beer] | {'lat': 40.354724889812935, 'lon': 18.20308322... | 17 |
| 5 | 5 | male | Charles Dunn | 1988-06-19 | True | [deutsche, dutch, bulgarian] | 27 | middle-school | Oncologist | [] | {'lat': 40.35854623058413, 'lon': 18.183459879... | 1 |
| 6 | 6 | male | Mitchell Duncan | 1987-12-29 | True | [english, chinese, french] | 34 | middle-school | Analytical chemist | [rafting, art, sport] | {'lat': 40.35774331848761, 'lon': 18.163273185... | 18 |
| 7 | 7 | male | Charles Clarke | 1990-07-17 | True | [spanish, dutch, french, italian] | 33 | bachelor | Associate Professor | [museums, art, countryside, wine] | {'lat': 40.365193043282794, 'lon': 18.18548169... | 11 |
| 8 | 8 | male | Scott Sawyer | 1992-06-10 | True | [bulgarian, english, chinese] | 31 | middle-school | Theatre manager | [literature, cinema, music] | {'lat': 40.35265552781134, 'lon': 18.151723396... | 2 |
| 9 | 9 | male | Joseph Bradford | 2004-04-17 | True | [chinese, bulgarian, dutch, english] | 37 | phd | Therapist, occupational | [museums] | {'lat': 40.33774657696426, 'lon': 18.182064664... | 1 |


```python
guide_df.dtypes
```

```
id                   int64
gender              object
name                object
birth_date          object
now_available         bool
languages_spoken    object
price                int64
education           object
biography           object
keywords            object
current_location    object
experience           int64
dtype: object
```

```python
# Load dataframe for ratings
rating_df = pd.read_csv(
    filepath_or_buffer='Data/ratings_200_40.csv',
    sep=';',
    header=0,
)

rating_df
```

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 32 | 5.0 |
| 1 | 0 | 10 | 4.0 |
| 2 | 1 | 20 | 5.0 |
| 3 | 1 | 15 | 4.0 |
| 4 | 2 | 34 | 4.0 |
| ... | ... | ... | ... |
| 300 | 195 | 14 | 4.0 |
| 301 | 196 | 18 | 3.0 |
| 302 | 197 | 5 | 4.0 |
| 303 | 198 | 18 | 4.0 |


```python
# Give the rating columns meaningful names
rating_df.rename(columns={rating_df.columns[0]: 'tourist_id',
                          rating_df.columns[1]: 'guide_id',
                          rating_df.columns[2]: 'rating'},
                 inplace=True)

# Remove duplicated tourist-guide interactions, keeping the first rating
rating_df.drop_duplicates(subset=['tourist_id', 'guide_id'], inplace=True)

rating_df
```

| | tourist_id | guide_id | rating |
|---|---|---|---|
| 0 | 0 | 32 | 5.0 |
| 1 | 0 | 10 | 4.0 |
| 2 | 1 | 20 | 5.0 |
| 3 | 1 | 15 | 4.0 |
| 4 | 2 | 34 | 4.0 |
| ... | ... | ... | ... |
| 300 | 195 | 14 | 4.0 |
| 301 | 196 | 18 | 3.0 |
| 302 | 197 | 5 | 4.0 |
| 303 | 198 | 18 | 4.0 |


## Print statistics

Below are some statistics about the loaded data.

```python
# Statistics about data
arr_tourists = tourist_df["id"].unique()
arr_guides = guide_df["id"].unique()

n_tourists = len(arr_tourists)
n_guides = len(arr_guides)
n_interactions = len(rating_df)

print("Number of tourists: {:d}".format(n_tourists))
print("Number of guides: {:d}".format(n_guides))
print("Number of interactions: {:d}".format(n_interactions))

print("Average interactions per tourist: {:.2f}".format(n_interactions / n_tourists))
print("Average interactions per guide: {:.2f}".format(n_interactions / n_guides))
print("Sparsity: {:.2f} %".format((1 - n_interactions / (n_guides * n_tourists)) * 100))
```

```
Number of tourists: 200
Number of guides: 40
Number of interactions: 305
Average interactions per tourist: 1.52
Average interactions per guide: 7.62
Sparsity: 96.19 %
```
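For reference, the sparsity printed above is the fraction of empty tourist-guide cells in the rating matrix. With the printed counts:

$$
\text{sparsity} = 1 - \frac{n_{\text{interactions}}}{n_{\text{tourists}} \cdot n_{\text{guides}}} = 1 - \frac{305}{200 \cdot 40} = 1 - 0.038125 \approx 96.19\,\%
$$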


## Create the URM

The **User Rating Matrix** (URM) describes the past interactions between tourists and guides: rows represent tourists, columns represent guides, and each entry is the rating (on a 1-5 scale) that a tourist gave to a guide. Ratings can be explicit or implicit.
When there is no information about a tourist's opinion of a guide, the corresponding entry is 0 (in the sparse matrix it is simply not stored).
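In SciPy, `csr_matrix((data, (row, col)))` infers the matrix shape from the largest row and column index it sees. A toy sketch, with made-up interactions, of why it can be safer to pass `shape` explicitly when some tourists or guides have no ratings yet:

```python
import numpy as np
import scipy.sparse as sps

# Toy interactions (hypothetical): 3 tourists, 4 guides, ratings on a 1-5 scale.
# Tourist 2 has not rated anyone yet, and nobody has rated guide 3.
tourist_ids = np.array([0, 0, 1])
guide_ids = np.array([0, 2, 1])
ratings = np.array([5.0, 3.0, 4.0])

# Without an explicit shape, SciPy infers it from the largest indices seen,
# so the trailing tourist and guide silently disappear.
urm_inferred = sps.csr_matrix((ratings, (tourist_ids, guide_ids)))

# Passing the shape keeps one row per tourist and one column per guide.
urm = sps.csr_matrix((ratings, (tourist_ids, guide_ids)), shape=(3, 4))

print(urm_inferred.shape)  # (2, 3)
print(urm.shape)           # (3, 4)
print(urm.toarray())       # missing opinions show up as implicit zeros
```

In both cases only the three real ratings are stored; the zeros are never materialized in the sparse format.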


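```python
# Create the User Rating Matrix: one row per tourist, one column per guide.
# Passing the shape explicitly keeps tourists/guides that have no rating.
URM_all = sps.csr_matrix(
    (rating_df["rating"].values,
     (rating_df["tourist_id"].values, rating_df["guide_id"].values)),
    shape=(n_tourists, n_guides),
)

URM_all
```

```
<200x40 sparse matrix of type '<class 'numpy.float64'>'
	with 305 stored elements in Compressed Sparse Row format>
```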

```python
# Define the portion of data used for training the model: now we are using all the available data
URM_train = URM_all
```

## Create the ICM

The **Item Content Matrix** (ICM) describes the guides through their attributes: rows represent guides and columns represent attributes. Each entry indicates how important an attribute is in characterizing a guide. Here we use the simplest form: the entry is 1 if the guide has that attribute and 0 otherwise.
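A multi-valued attribute (a guide can speak several languages or list several keywords) becomes one binary column per distinct value. A minimal sketch of this encoding on made-up guides, using the same pandas functions as the notebook (`Series.explode` and `pd.crosstab`):

```python
import pandas as pd
import scipy.sparse as sps

# Hypothetical toy guides with one multi-valued attribute each
guides = pd.DataFrame({
    'id': [0, 1, 2],
    'keywords': [['museums'], ['cinema', 'wine'], []],
})

# One row per (guide, keyword) pair; the empty list becomes NaN
s = guides['keywords'].explode()

# Guide x keyword count table; the NaN entry is dropped, so guide 2 has no row
onehot = pd.crosstab(s.index, s)

# Reindex so that guides without keywords get an all-zero row
onehot = onehot.reindex(guides.index, fill_value=0)

icm = sps.csr_matrix(onehot.values)
print(list(onehot.columns))  # ['cinema', 'museums', 'wine']
print(icm.toarray())
```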

```python
# Make a copy of the guide dataframe to build the ICM
icm_df = guide_df.copy(deep=True)
```

```python
# Convert the birth date into an age-group label:
# guides born after 1984 are labelled '20-40', the others '40+'
def replace_birth_year(x):
    if x > 1984:
        return '20-40'
    else:
        return '40+'

icm_df['birth_date'] = icm_df['birth_date'].apply(
    lambda x: replace_birth_year(pd.to_datetime(x, format="%Y-%m-%d").year)
)
```

```python
# Convert number of years of experience into categorical labels
def replace_experience(x):
    if x < 5:
        return 'junior'
    elif x < 10:
        return 'experienced'
    else:
        return 'senior'

icm_df['experience'] = icm_df['experience'].apply(replace_experience)
```

```python
# Compute the mean value of prices
icm_df[['price']].mean(axis=0)
```

```
price    29.175
dtype: float64
```

```python
# Split the price information into ranges
def replace_price(x):
    if x < 25:
        return 'low_cost'
    elif x < 35:
        return 'medium_cost'
    else:
        return 'high_cost'

icm_df['price'] = icm_df['price'].apply(replace_price)
```
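The threshold functions above are hand-written binning rules. As an alternative sketch (not what the notebook uses), `pd.cut` can express the same price bins declaratively; with `right=False` the intervals are closed on the left, which reproduces the `x < 25` / `x < 35` comparisons of `replace_price`. The prices below are made up to hit every branch:

```python
import numpy as np
import pandas as pd

# Hypothetical prices covering every branch of replace_price
prices = pd.Series([19, 25, 34, 35, 46])

# right=False gives the bins [-inf, 25), [25, 35), [35, inf),
# i.e. the same thresholds as replace_price
price_labels = pd.cut(
    prices,
    bins=[-np.inf, 25, 35, np.inf],
    right=False,
    labels=['low_cost', 'medium_cost', 'high_cost'],
).astype(str)

print(price_labels.tolist())
# ['low_cost', 'medium_cost', 'medium_cost', 'high_cost', 'high_cost']
```

The experience labels follow the same pattern with `bins=[-np.inf, 5, 10, np.inf]` and `labels=['junior', 'experienced', 'senior']`.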

```python
icm_df
```

| | id | gender | name | birth_date | now_available | languages_spoken | price | education | biography | keywords | current_location | experience |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | male | Kevin Rodriguez | 40+ | True | [english] | medium_cost | high-school | Engineer, manufacturing | [museums] | {'lat': 40.342693584880706, 'lon': 18.16438078... | senior |
| 1 | 1 | female | Diana Barnes | 40+ | True | [italian, dutch] | high_cost | middle-school | Animal technologist | [cinema, rafting, history, wine] | {'lat': 40.3367551413547, 'lon': 18.1569120995... | senior |
| 2 | 2 | female | Kristin Rogers | 40+ | True | [chinese, french, english] | medium_cost | phd | Editor, commissioning | [food, archeology, art] | {'lat': 40.362447049660204, 'lon': 18.14225129... | senior |
| 3 | 3 | male | Jeremy Bowman | 20-40 | True | [bulgarian] | high_cost | bachelor | Retail banker | [countryside, rafting, art] | {'lat': 40.36584086550253, 'lon': 18.183002910... | experienced |
| 4 | 4 | male | Justin Lynch | 40+ | True | [deutsche, french] | medium_cost | master | Secondary school teacher | [countryside, tracking, beer] | {'lat': 40.354724889812935, 'lon': 18.20308322... | senior |
| 5 | 5 | male | Charles Dunn | 20-40 | True | [deutsche, dutch, bulgarian] | medium_cost | middle-school | Oncologist | [] | {'lat': 40.35854623058413, 'lon': 18.183459879... | junior |
| 6 | 6 | male | Mitchell Duncan | 20-40 | True | [english, chinese, french] | medium_cost | middle-school | Analytical chemist | [rafting, art, sport] | {'lat': 40.35774331848761, 'lon': 18.163273185... | senior |
| 7 | 7 | male | Charles Clarke | 20-40 | True | [spanish, dutch, french, italian] | medium_cost | bachelor | Associate Professor | [museums, art, countryside, wine] | {'lat': 40.365193043282794, 'lon': 18.18548169... | senior |
| 8 | 8 | male | Scott Sawyer | 20-40 | True | [bulgarian, english, chinese] | medium_cost | middle-school | Theatre manager | [literature, cinema, music] | {'lat': 40.35265552781134, 'lon': 18.151723396... | junior |
| 9 | 9 | male | Joseph Bradford | 20-40 | True | [chinese, bulgarian, dutch, english] | high_cost | phd | Therapist, occupational | [museums] | {'lat': 40.33774657696426, 'lon': 18.182064664... | junior |


```python
# Remove the columns that we would not consider as attributes
icm_df.drop(labels=['name', 'now_available', 'current_location'], axis=1, inplace=True)
```

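```python
# One-hot encode the categorical attributes: each distinct value
# (or list element, for multi-valued attributes) becomes its own column
multiclass_attributes = ['gender', 'price', 'experience', 'birth_date', 'education',
                         'biography', 'languages_spoken', 'keywords']

for n in multiclass_attributes:
    # One row per (guide, value) pair; empty lists become NaN
    s = icm_df[n].explode()
    # Guide x value table; guides absent from it (empty lists) get 0 via fillna
    icm_df = icm_df.join(pd.crosstab(s.index, s).astype(object)).fillna(0)
    icm_df.drop(labels=n, axis=1, inplace=True)
```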



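```python
icm_df
```

| | id | female | male | high_cost | low_cost | medium_cost | experienced | junior | senior | 20-40 | ... | countryside | food | history | literature | museums | music | rafting | sport | tracking | wine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 5 | 5 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 6 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 7 | 7 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 8 | 8 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |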


```python
# Print the list of attributes
attribute_list = icm_df.columns.tolist()
attribute_list
```

```
['id',
 'female',
 'male',
 'high_cost',
 'low_cost',
 'medium_cost',
 'experienced',
 'junior',
 'senior',
 '20-40',
 '40+',
 'bachelor',
 'high-school',
 'master',
 'middle-school',
 'phd',
 'Analytical chemist',
 'Animal nutritionist',
 'Animal technologist',
 'Associate Professor',
 'Brewing technologist',
 'Chief of Staff',
 'Civil Service fast streamer',
 'Conference centre manager',
 'Designer, television/film set',
 'Editor, commissioning',
 'Engineer, manufacturing',
 'Environmental health practitioner',
 'Environmental manager',
 'Equality and diversity officer',
 'Garment/textile technologist',
 'IT trainer',
 'Industrial buyer',
 'Information officer',
 'Journalist, broadcasting',
 'Market researcher',
 'Minerals surveyor',
 'Mining engineer',
 'Network engineer',
 'Oncologist',
 'Optometrist',
 'Producer, television/film/video',
 'Product designer',
 'Production assistant, radio',
 'Production designer, theatre/television/film',
 'Psychologist, counselling',
 'Research off
```

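```python
# Convert the names of attributes into numbers (their position in attribute_list)
def convert_index(x):
    if x == 'id':
        return x
    else:
        return attribute_list.index(x)

icm_df.rename(mapper=convert_index, axis=1, inplace=True)
icm_df
```

| | id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 4 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 5 | 5 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 6 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 7 | 7 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 8 | 8 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 9 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |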


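```python
# Reshape to long format (one row per guide id, attribute index, value),
# then keep only the attributes each guide actually has
icm_df = pd.melt(icm_df, id_vars='id', var_name='label')
icm_df = icm_df[icm_df["value"] == 1]
icm_df
```

| | id | label | value |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 1 | 1 |
| 10 | 10 | 1 | 1 |
| 11 | 11 | 1 | 1 |
| 12 | 12 | 1 | 1 |
| ... | ... | ... | ... |
| 3028 | 28 | 76 | 1 |
| 3029 | 29 | 76 | 1 |
| 3041 | 1 | 77 | 1 |
| 3047 | 7 | 77 | 1 |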


```python
# Create the Item Content Matrix
ICM_all = sps.csr_matrix(
    (icm_df["value"].values,
     (icm_df["id"].values, icm_df["label"].values))
)

ICM_all
```

```
<40x78 sparse matrix of type '<class 'numpy.int64'>'
	with 412 stored elements in Compressed Sparse Row format>
```

```python
print(ICM_all.todense())
```

```
[[0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]]
```


```python
# Print each attribute with its column index in the ICM
for index, label in enumerate(attribute_list):
    print(index, label)
```

```
0 id
1 female
2 male
3 high_cost
4 low_cost
5 medium_cost
6 experienced
7 junior
8 senior
9 20-40
10 40+
11 bachelor
12 high-school
13 master
14 middle-school
15 phd
16 Analytical chemist
17 Animal nutritionist
18 Animal technologist
19 Associate Professor
20 Brewing technologist
21 Chief of Staff
22 Civil Service fast streamer
23 Conference centre manager
24 Designer, television/film set
25 Editor, commissioning
26 Engineer, manufacturing
27 Environmental health practitioner
28 Environmental manager
29 Equality and diversity officer
30 Garment/textile technologist
31 IT trainer
32 Industrial buyer
33 Information officer
34 Journalist, broadcasting
35 Market researcher
36 Minerals surveyor
37 Mining engineer
38 Network engineer
39 Oncologist
40 Optometrist
41 Producer, television/film/video
42 Product designer
43 Production assistant, radio
44 Production designer, theatre/television/film
45 Psychologist, counselling
46 Research officer, political party
47 Research scientist (maths)
48
```

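```python
# Feature engineering: assign different weights to selected attributes.
# feature_columns maps each attribute to its column index in the ICM
# (see the printed attribute list); 'language' is an inclusive index range.
feature_columns = {
    'low_cost': 4,
    'male': 2,
    'female': 1,
    'senior': 8,
    '20-40': 9,
    'language': [54, 61],
}
```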

```python
# Weight of each attribute in the ICM (the default weight is 1)
importance_weights = {
    'low_cost': 10,
    'senior': 1,
    '20-40': 1,
    'language': 1,
    'male': 1,
    'female': 1,
}
```
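The column indices in `feature_columns` are hard-coded from the printed attribute list, so they would silently point at the wrong attributes if the ICM columns changed. A sketch of looking them up by name instead, on a made-up excerpt of `attribute_list`; the `languages` set is an assumption (it could, for instance, be collected from `guide_df['languages_spoken']`):

```python
# Hypothetical excerpt of attribute_list: index 0 is 'id', the rest are ICM columns
attribute_list = ['id', 'female', 'male', 'high_cost', 'low_cost', 'medium_cost',
                  'bulgarian', 'chinese', 'dutch', 'museums', 'wine']

# Assumption: the set of language labels is known in advance
languages = {'bulgarian', 'chinese', 'dutch'}

# Look the column indices up by name instead of hard-coding them
feature_columns = {
    'low_cost': attribute_list.index('low_cost'),
    'male': attribute_list.index('male'),
    'female': attribute_list.index('female'),
    # explicit list of language columns rather than an index range
    'language': sorted(i for i, name in enumerate(attribute_list) if name in languages),
}

print(feature_columns)
# {'low_cost': 4, 'male': 2, 'female': 1, 'language': [6, 7, 8]}
```

With an explicit list, the weighting condition for languages would become `new_icm_df.label.isin(feature_columns['language'])`, which does not assume the language columns are contiguous.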

```python
new_icm_df = icm_df.copy(deep=True)
```

```python
# Overwrite the default weight (1) of the attributes listed in importance_weights
for feature, weight in importance_weights.items():
    if weight != 1:
        print(feature)
        if feature == 'language':
            # 'language' covers an inclusive range of column indices
            low, high = feature_columns[feature]
            condition = (new_icm_df.label >= low) & (new_icm_df.label <= high)
        else:
            condition = (new_icm_df.label == feature_columns[feature])

        new_icm_df.loc[condition, 'value'] = weight
```

```
low_cost
```
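Why does up-weighting a single attribute have such a strong effect? With cosine similarity, a heavily weighted shared attribute dominates both the dot product and the vector norms, so guides that share it look almost identical regardless of their other attributes. A toy sketch with three made-up attributes and no shrink term:

```python
import numpy as np


def cosine(a, b):
    """Plain cosine similarity between two attribute vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


# Hypothetical guides over 3 attributes: [low_cost, male, museums]
guide_a = np.array([1.0, 1.0, 0.0])
guide_b = np.array([1.0, 0.0, 1.0])  # shares only low_cost with guide_a

print(round(cosine(guide_a, guide_b), 3))  # 0.5

# Give low_cost weight 10, as in importance_weights
weights = np.array([10.0, 1.0, 1.0])
print(round(cosine(guide_a * weights, guide_b * weights), 3))  # 0.99
```

This is consistent with the sample recommendations at the end of this section, where all the guides recommended to tourist 54 fall in the `low_cost` range.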


```python
ICM_modified = sps.csr_matrix(
    (new_icm_df["value"].values,
     (new_icm_df["id"].values, new_icm_df["label"].values))
)

ICM_modified
```

```
<40x78 sparse matrix of type '<class 'numpy.int64'>'
	with 412 stored elements in Compressed Sparse Row format>
```

```python
print(ICM_modified.todense())
```

```
[[0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 1]
 [0 1 0 ... 0 0 0]
 ...
 [0 0 1 ... 0 0 0]
 [0 1 0 ... 0 0 0]
 [0 1 0 ... 0 0 0]]
```


```python
ICM_train = ICM_modified
```

## Build the model

### Collaborative Filtering

**Collaborative filtering** recommends guides to a tourist based on the preferences and behavior of similar users. It identifies patterns and similarities in the interactions between tourists and guides, using the dataset of tourists' feedback.

In this case the **item-based** collaborative filtering technique is used. It computes the similarity between each pair of guides from the ratings they received: two guides are similar when they were rated, and rated alike, by the same tourists.
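`Compute_Similarity_Python` comes from an external module that is not shown here. As an assumption about what it computes, a simplified sketch of item-item cosine similarity over the URM columns, with the shrink term added to the denominator and top-K pruning per column:

```python
import numpy as np
import scipy.sparse as sps


def item_cosine_similarity(urm, shrink=0.0, top_k=5):
    """Item-item cosine similarity with a shrink term and top-K pruning.

    Simplified sketch, NOT the Compute_Similarity_Python implementation.
    urm: sparse (n_users x n_items) rating matrix.
    Returns a dense (n_items x n_items) matrix with a zero diagonal.
    """
    urm = sps.csc_matrix(urm, dtype=np.float64)

    # Numerator: dot product between every pair of item columns
    dot = (urm.T @ urm).toarray()

    # Denominator: product of the column norms plus the shrink term
    # (the small epsilon avoids 0/0 for items with no ratings)
    norms = np.sqrt(np.asarray(urm.power(2).sum(axis=0))).ravel()
    sim = dot / (np.outer(norms, norms) + shrink + 1e-9)

    # An item is not its own neighbour
    np.fill_diagonal(sim, 0.0)

    # Keep only the top_k largest similarities in each column
    for j in range(sim.shape[1]):
        col = sim[:, j]  # a view: writing to it updates sim
        if top_k < len(col):
            col[np.argsort(col)[:-top_k]] = 0.0
    return sim


# Toy URM (hypothetical): 3 tourists x 3 guides
urm = sps.csr_matrix(np.array([[5, 4, 0],
                               [4, 5, 0],
                               [0, 0, 3]]))
W = item_cosine_similarity(urm, shrink=0.0, top_k=1)
print(np.round(W, 3))
```

Guides 0 and 1 were rated alike by the same tourists and end up strongly similar, while guide 2 was never co-rated and gets zero similarity. With about 1.5 ratings per tourist, as in this dataset, many guide pairs are never co-rated at all.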

```python
from Recommenders.Compute_Similarity_Python import Compute_Similarity_Python
```

```python
class ItemKNNCFRecommender(object):

    def __init__(self, URM):
        # URM must be in CSR format (filter_seen reads its indptr/indices)
        self.URM = URM

    def fit(self, topK=5, shrink=3, normalize=True, similarity="cosine"):
        # Guide-guide similarity computed on the URM columns (one column per guide)
        similarity_object = Compute_Similarity_Python(self.URM, shrink=shrink,
                                                      topK=topK, normalize=normalize,
                                                      similarity=similarity)

        self.W_sparse = similarity_object.compute_similarity()

    def recommend(self, user_id, at=None, exclude_seen=True):
        # Compute the scores as the dot product of the tourist's ratings and the similarity matrix
        user_profile = self.URM[user_id]
        scores = user_profile.dot(self.W_sparse).toarray().ravel()

        if exclude_seen:
            scores = self.filter_seen(user_id, scores)

        # Rank guides by decreasing score
        ranking = scores.argsort()[::-1]

        return ranking[:at]

    def filter_seen(self, user_id, scores):
        # Exclude the guides the tourist has already rated
        start_pos = self.URM.indptr[user_id]
        end_pos = self.URM.indptr[user_id + 1]

        user_profile = self.URM.indices[start_pos:end_pos]

        scores[user_profile] = -np.inf

        return scores
```

### Content-based Filtering

**Content-based filtering** recommends guides to tourists based on the attributes, or features, of the guides themselves and on the tourist's preferences. It analyzes the characteristics of the items and matches them to the tourist's profile.

In this case the **item-based** content-based filtering technique is used: the similarity between guides is computed from their attributes in the ICM alone, without relying on other tourists' ratings. The tourist's own ratings in the URM are then used as the profile that is scored against this similarity.
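A toy NumPy sketch of this scheme, with made-up attributes and a single tourist (no shrink or top-K): guide similarity comes only from the ICM, while the tourist's ratings act as the profile that is scored against it.

```python
import numpy as np

# Hypothetical ICM: 3 guides x 3 binary attributes [low_cost, dutch, museums]
icm = np.array([[1.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 0.0, 1.0]])

# Guide-guide similarity from attributes only (cosine between ICM rows)
norms = np.linalg.norm(icm, axis=1)
W = (icm @ icm.T) / np.outer(norms, norms)
np.fill_diagonal(W, 0.0)

# The tourist's own ratings are the profile: here they rated guide 0 with 5
profile = np.array([5.0, 0.0, 0.0])
scores = profile @ W
scores[profile > 0] = -np.inf  # exclude guides already rated
ranking = np.argsort(scores)[::-1]

print(ranking)  # [1 2 0]
```

Guide 1 ranks first because it shares two attributes with the guide the tourist rated; guide 2 shares none, and guide 0 is excluded as already seen.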

```python
class ItemKNNCBFRecommender(object):

    def __init__(self, URM, ICM):
        # URM must be in CSR format (filter_seen reads its indptr/indices)
        self.URM = URM
        self.ICM = ICM

    def fit(self, topK=50, shrink=100, normalize=True, similarity="cosine"):
        # Guide-guide similarity computed on the attributes:
        # ICM.T has one column per guide
        similarity_object = Compute_Similarity_Python(self.ICM.T, shrink=shrink,
                                                      topK=topK, normalize=normalize,
                                                      similarity=similarity)

        self.W_sparse = similarity_object.compute_similarity()

    def recommend(self, user_id, at=None, exclude_seen=True):
        # Compute the scores as the dot product of the tourist's ratings and the similarity matrix
        user_profile = self.URM[user_id]
        scores = user_profile.dot(self.W_sparse).toarray().ravel()

        if exclude_seen:
            scores = self.filter_seen(user_id, scores)

        # Rank guides by decreasing score
        ranking = scores.argsort()[::-1]

        return ranking[:at]

    def filter_seen(self, user_id, scores):
        # Exclude the guides the tourist has already rated
        start_pos = self.URM.indptr[user_id]
        end_pos = self.URM.indptr[user_id + 1]

        user_profile = self.URM.indices[start_pos:end_pos]

        scores[user_profile] = -np.inf

        return scores
```
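Both recommenders share the same scoring rule and differ only in how the guide-guide similarity matrix $W$ (`W_sparse`) is built: from the URM columns in the collaborative version, from the ICM in the content-based one. For tourist $u$ with rating row $\mathbf{r}_u$ in the URM, the score that `recommend` assigns to guide $j$ is

$$
\tilde{r}_{uj} = \sum_{i=1}^{n_{\text{guides}}} r_{ui}\, w_{ij}, \qquad \text{i.e.} \qquad \tilde{\mathbf{r}}_u = \mathbf{r}_u W,
$$

after which the guides the tourist has already rated are set to $-\infty$ and the remaining ones are sorted by decreasing score.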

## Fit the model

```python
# 'cf' for collaborative filtering, 'cbf' for content-based filtering
model_type = 'cbf'
```

```python
if model_type == 'cf':
    recommender = ItemKNNCFRecommender(URM_train)
    recommender.fit(shrink=0.5, topK=5)
```

```python
if model_type == 'cbf':
    recommender = ItemKNNCBFRecommender(URM_train, ICM_train)
    recommender.fit(shrink=0.5, topK=5)
```

```
Similarity column 40 (100.0%), 11932.58 column/sec. Elapsed time 0.00 sec
```


## Generate outputs

We now generate guide recommendations for every tourist:

```python
recommendations = []

for tourist_id in tqdm(arr_tourists):
    # Top-5 guides for each tourist, excluding the guides already rated
    rec = recommender.recommend(tourist_id, at=5, exclude_seen=True)
    rec_row = ' '.join(str(g) for g in rec)
    recommendations.append(rec_row)
```

```
200it [00:00, 4432.02it/s]
```


```python
# Print the recommendations for the first 10 tourists
for i in range(10):
    print("For tourist " + str(arr_tourists[i]) + " recommended guides: " + recommendations[i])
```

```
For tourist 0 recommended guides: 11 31 23 12 18
For tourist 1 recommended guides: 28 14 1 4 33
For tourist 2 recommended guides: 24 29 21 39 7
For tourist 3 recommended guides: 18 12 23 10 11
For tourist 4 recommended guides: 20 21 6 15 0
For tourist 5 recommended guides: 23 11 31 10 39
For tourist 6 recommended guides: 32 27 29 2 24
For tourist 7 recommended guides: 26 34 13 25 8
For tourist 8 recommended guides: 2 29 27 38 37
For tourist 9 recommended guides: 26 3 17 6 24
```


```python
result_df = pd.DataFrame(
    data={'tourist_id': arr_tourists,
          'guides': recommendations}
)

result_df
```

| | tourist_id | guides |
|---|---|---|
| 0 | 0 | 11 31 23 12 18 |
| 1 | 1 | 28 14 1 4 33 |
| 2 | 2 | 24 29 21 39 7 |
| 3 | 3 | 18 12 23 10 11 |
| 4 | 4 | 20 21 6 15 0 |
| ... | ... | ... |
| 195 | 195 | 20 6 15 0 9 |
| 196 | 196 | 12 23 31 11 10 |
| 197 | 197 | 8 17 26 9 16 |
| 198 | 198 | 12 23 31 11 10 |


Let's pick a tourist and inspect the guides recommended to them:

```python
# Show some examples: select a tourist by id to visualize the received recommendations
sample_tourist = 54
pd.DataFrame(tourist_df.loc[sample_tourist, :])
```

| | 54 |
|---|---|
| id | 54 |
| languages | ['dutch'] |
| keywords | [] |


```python
# List of recommended guides
sample_guide_list = list(map(int, recommendations[sample_tourist].split(" ")))
guide_df.loc[sample_guide_list, :]
```

| | id | gender | name | birth_date | now_available | languages_spoken | price | education | biography | keywords | current_location | experience |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 12 | female | Crystal Mitchell | 1961-01-01 | True | [dutch, french, italian] | 19 | bachelor | Designer, television/film set | [sport, music, beer] | {'lat': 40.34268462008425, 'lon': 18.187542175... | 41 |
| 23 | 23 | female | Susan Miller | 1962-11-08 | True | [english, italian, bulgarian] | 20 | middle-school | Animal nutritionist | [tracking] | {'lat': 40.364408581391295, 'lon': 18.16428124... | 18 |
| 31 | 31 | male | Kent Horton | 1958-11-06 | True | [deutsche, french, english] | 23 | master | Civil Service fast streamer | [music, countryside, food] | {'lat': 40.339171732936364, 'lon': 18.16792590... | 6 |
| 11 | 11 | female | Rhonda Hull | 1960-12-06 | True | [bulgarian] | 23 | phd | Garment/textile technologist | [cinema, beer, rafting] | {'lat': 40.367394960036485, 'lon': 18.17553586... | 25 |
| 10 | 10 | female | Elizabeth Castillo | 1987-10-02 | True | [bulgarian, deutsche] | 23 | high-school | Environmental manager | [cinema, rafting, sport, museums, countryside] | {'lat': 40.36612881009787, 'lon': 18.182042843... | 5 |


## Evaluation