# Tour Recommendation Pipeline
The system matches tourists with tours published by guides, according to the tourists' preferences and requests.

The tour recommendation system uses two datasets:
- Ratings given by tourists to specific tours;
- A set of attributes describing the tours.

## Define working environment

In [57]:
# Import libraries

import numpy as np
import pandas as pd
import scipy.sparse as sps
import matplotlib.pyplot as pyplot
import ast

from tqdm import tqdm

## Load and preprocess data

In this part we load the previously generated dataframes:
- tourists_200.csv with the id, languages spoken, and keywords of the tourists.
- tours_40.csv with various attributes of the tours.
- tours_ratings_200_40.csv, which contains the tourist-tour interactions.


In [58]:
# Load dataframe for tourist attributes

tourist_file = open('Data/tourists_200.csv')

tourist_df = pd.read_csv(
    filepath_or_buffer = tourist_file,
    sep = ';',
    header = 0
)

tourist_df.rename(columns={tourist_df.columns[0]: 'id'}, inplace=True)

tourist_df

Unnamed: 0,id,languages,keywords
0,0,['bulgarian'],"['beer', 'sport', 'cinema', 'tracking', 'rafti..."
1,1,['bulgarian'],"['history', 'food', 'tracking']"
2,2,['spanish'],"['tracking', 'cinema', 'wine', 'literature', '..."
3,3,['deutsche'],"['sport', 'tracking']"
4,4,['spanish'],"['beer', 'tracking', 'food', 'literature', 'sp..."
...,...,...,...
195,195,['italian'],"['museums', 'cinema', 'literature', 'history',..."
196,196,['dutch'],"['wine', 'countryside', 'museums', 'art', 'raf..."
197,197,['italian'],"['rafting', 'archeology', 'art', 'museums', 'b..."
198,198,['italian'],"['music', 'art', 'museums', 'history', 'food',..."


In [59]:
# Load dataframe for tour attributes

tour_file = open('Data/tours_40.csv')

tour_df = pd.read_csv(
    filepath_or_buffer = tour_file,
    sep = ';',
    header = 0,
    converters = {'languages':ast.literal_eval, 'keywords':ast.literal_eval, 'attractions':ast.literal_eval}
)

tour_df.rename(columns={tour_df.columns[0]: 'id'}, inplace=True)

tour_df

Unnamed: 0,id,guide,languages,city,attractions,keywords,price,date,duration
0,0,0,[english],Lecce,"[Church of San Matteo, Roman Theatre]","[archeology, literature]",38,2024-06-27,12
1,1,1,"[italian, dutch]",Lecce,"[Piazza Sant'Oronzo, Torre del Parco]","[art, history]",27,2024-06-02,11
2,2,2,"[chinese, french, english]",Lecce,"[Colonna di Sant'Oronzo, Celestine Convent]","[art, literature]",29,2024-06-09,12
3,3,3,[bulgarian],Lecce,[Museo Faggiano],[museums],30,2024-06-26,9
4,4,4,"[deutsche, french]",Lecce,"[Palazzo Carafa, Church of Santa Chiara]","[history, literature]",30,2024-06-03,11
5,5,5,"[deutsche, dutch, bulgarian]",Lecce,"[Villa Comunale di Lecce, Church of San Matteo...","[countryside, literature, museums]",35,2024-06-29,11
6,6,6,"[english, chinese, french]",Lecce,"[Torre del Parco, Palazzo dei Celestini]",[history],37,2024-06-17,12
7,7,7,"[spanish, dutch, french, italian]",Lecce,"[Museo Faggiano, Palazzo dei Celestini]","[history, museums]",28,2024-06-29,7
8,8,8,"[bulgarian, english, chinese]",Lecce,"[Piazza Sant'Oronzo, Villa Comunale di Lecce]","[art, countryside]",38,2024-06-28,10
9,9,9,"[chinese, bulgarian, dutch, english]",Lecce,[Celestine Convent],[literature],34,2024-07-01,7


In [60]:
# Load dataframe for ratings

rating_file = open('Data/tours_ratings_200_40.csv')

rating_df = pd.read_csv(
    filepath_or_buffer = rating_file,
    sep = ';',
    header = 0
)

rating_df

Unnamed: 0,0,1,2
0,0,1,4.0
1,1,24,5.0
2,2,20,5.0
3,2,24,3.0
4,4,30,4.0
...,...,...,...
257,197,35,4.0
258,197,1,4.0
259,198,1,5.0
260,199,29,5.0


In [61]:
# Format the rating dataframe
rating_df.rename(columns={rating_df.columns[0]: 'tourist_id',
                          rating_df.columns[1]: 'tour_id',
                          rating_df.columns[2]: 'rating',
                         }, inplace=True)

# Drop duplicated interactions, keeping the first occurrence
rating_df.drop_duplicates(subset=['tourist_id','tour_id'],inplace=True)

rating_df

Unnamed: 0,tourist_id,tour_id,rating
0,0,1,4.0
1,1,24,5.0
2,2,20,5.0
3,2,24,3.0
4,4,30,4.0
...,...,...,...
257,197,35,4.0
258,197,1,4.0
259,198,1,5.0
260,199,29,5.0


## Print statistics

Some statistics of the acquired data are shown.

In [62]:
# Statistics about data
arr_tourists = tourist_df["id"].unique()
arr_tours = tour_df["id"].unique()

n_tourists = len(arr_tourists)
n_tours = len(arr_tours)
n_interactions = len(rating_df)

print("Number of tourists: {:d}".format(n_tourists))
print("Number of tours: {:d}".format(n_tours))
print("Number of interactions: {:d}".format(n_interactions))

print("Average interaction per tourist: {:.2f}".format(n_interactions/n_tourists))
print("Average interaction per tour: {:.2f}".format(n_interactions/n_tours))
print("Sparsity: {:.2f} %".format((1-float(n_interactions)/(n_tours*n_tourists))*100))

Number of tourists: 200
Number of tours: 40
Number of interactions: 262
Average interaction per tourist: 1.31
Average interaction per tour: 6.55
Sparsity: 96.72 %


In [63]:
# Statistics about ratings

print("Average rating: {:.6f}".format(rating_df.loc[:, 'rating'].mean()))
print("Maximum rating: {:.6f}".format(rating_df.loc[:, 'rating'].max()))
print("Minimum rating: {:.6f}".format(rating_df.loc[:, 'rating'].min()))

Average rating: 4.091603
Maximum rating: 5.000000
Minimum rating: 3.000000


## Create the URM

The **User Rating Matrix** describes the interactions between tourists and organized tours, where rows represent tourists and columns represent tours. The values in the cells can be defined by an implicit or explicit approach:
- Explicit ratings are given directly by the tourists to the tours, according to a rating scale.
- Implicit ratings are obtained according to specific criteria based on tourists' behaviour, without asking for an opinion explicitly. The corresponding value of the interaction is set to 1 if we think that the tourist could be interested in the tour, otherwise 0.

In our problem, we decide to use **explicit ratings** with a rating scale from 1 to 5.
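As a minimal sketch of the difference between the two approaches, the toy example below builds both matrices from the same interactions. The ids, ratings, and variable names are hypothetical and are not taken from the dataset.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical toy interactions: (tourist_id, tour_id, rating on a 1-5 scale)
tourist_ids = np.array([0, 0, 1, 2])
tour_ids = np.array([1, 2, 0, 2])
ratings = np.array([4.0, 5.0, 3.0, 5.0])

n_tourists, n_tours = 3, 3

# Explicit URM: each stored cell holds the rating itself
urm_explicit = sps.csr_matrix(
    (ratings, (tourist_ids, tour_ids)), shape=(n_tourists, n_tours)
)

# Implicit URM: each stored cell is 1, meaning "an interaction happened"
urm_implicit = sps.csr_matrix(
    (np.ones_like(ratings), (tourist_ids, tour_ids)), shape=(n_tourists, n_tours)
)

print(urm_explicit.toarray())
print(urm_implicit.toarray())
```

Passing `shape` explicitly keeps the matrix dimensions stable even when the last tourist or tour has no interactions.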

In [64]:
# Create the User Rating Matrix
URM_all = sps.csr_matrix(
    (rating_df["rating"].values,
    (rating_df["tourist_id"].values, rating_df["tour_id"].values))
)

URM_all

<200x40 sparse matrix of type '<class 'numpy.float64'>'
	with 262 stored elements in Compressed Sparse Row format>

In [65]:
# Define the portion of data used for training the model: now we are using all the available data
URM_train = URM_all
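
If part of the data is later held out for evaluation, one possible approach is a random split of the stored interactions. The sketch below is only an illustration on synthetic data; `split_urm`, `train_fraction`, and `seed` are hypothetical names and are not part of this notebook.

```python
import numpy as np
import scipy.sparse as sps

def split_urm(urm, train_fraction=0.8, seed=42):
    """Randomly assign each stored interaction to the train or the test matrix."""
    coo = urm.tocoo()
    rng = np.random.default_rng(seed)
    train_mask = rng.random(coo.nnz) < train_fraction

    def build(mask):
        # Rebuild a matrix of the original shape from the selected interactions
        return sps.csr_matrix(
            (coo.data[mask], (coo.row[mask], coo.col[mask])), shape=urm.shape
        )

    return build(train_mask), build(~train_mask)

# Synthetic URM used only to exercise the function
toy_urm = sps.random(20, 8, density=0.3, format="csr", random_state=0)
urm_train, urm_test = split_urm(toy_urm)
print(urm_train.nnz, urm_test.nnz)
```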

## Create the ICM

The **Item Content Matrix** describes the list of tours with their attributes, with rows representing tours and columns representing attributes. Each number in the ICM indicates how important an attribute is in characterizing a tour.

In this case we started from the simplest form: the cell value is equal to 1 if the tour has that specific attribute, and 0 otherwise.
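The binary encoding can be sketched on a toy dataframe with a single list-valued attribute. The data below is hypothetical; the `explode` plus `pd.crosstab` pattern is the same one applied to the real dataframe further down.

```python
import pandas as pd

# Hypothetical toy tours with a list-valued attribute
toy_df = pd.DataFrame({
    "id": [0, 1, 2],
    "keywords": [["art", "history"], ["history"], ["museums", "art"]],
})

# One row per (tour, keyword) pair, then count occurrences per tour
exploded = toy_df["keywords"].explode()
toy_icm = pd.crosstab(exploded.index, exploded)

print(toy_icm)
```

Each row of `toy_icm` is a tour and each column a keyword; a cell is 1 when the tour has that keyword and 0 otherwise.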

In [66]:
# Make a copy of the tour dataframe to build the ICM
icm_df = tour_df.copy(deep=True)

In [67]:
icm_df

Unnamed: 0,id,guide,languages,city,attractions,keywords,price,date,duration
0,0,0,[english],Lecce,"[Church of San Matteo, Roman Theatre]","[archeology, literature]",38,2024-06-27,12
1,1,1,"[italian, dutch]",Lecce,"[Piazza Sant'Oronzo, Torre del Parco]","[art, history]",27,2024-06-02,11
2,2,2,"[chinese, french, english]",Lecce,"[Colonna di Sant'Oronzo, Celestine Convent]","[art, literature]",29,2024-06-09,12
3,3,3,[bulgarian],Lecce,[Museo Faggiano],[museums],30,2024-06-26,9
4,4,4,"[deutsche, french]",Lecce,"[Palazzo Carafa, Church of Santa Chiara]","[history, literature]",30,2024-06-03,11
5,5,5,"[deutsche, dutch, bulgarian]",Lecce,"[Villa Comunale di Lecce, Church of San Matteo...","[countryside, literature, museums]",35,2024-06-29,11
6,6,6,"[english, chinese, french]",Lecce,"[Torre del Parco, Palazzo dei Celestini]",[history],37,2024-06-17,12
7,7,7,"[spanish, dutch, french, italian]",Lecce,"[Museo Faggiano, Palazzo dei Celestini]","[history, museums]",28,2024-06-29,7
8,8,8,"[bulgarian, english, chinese]",Lecce,"[Piazza Sant'Oronzo, Villa Comunale di Lecce]","[art, countryside]",38,2024-06-28,10
9,9,9,"[chinese, bulgarian, dutch, english]",Lecce,[Celestine Convent],[literature],34,2024-07-01,7


In [68]:
# Split the price information into three ranges

def replace_price(x):
    if x < 25:
        return 'low_cost'
    elif x < 35:
        return 'medium_cost'
    else:
        return 'high_cost'

icm_df['price'] = icm_df['price'].apply(replace_price)

In [69]:
icm_df

Unnamed: 0,id,guide,languages,city,attractions,keywords,price,date,duration
0,0,0,[english],Lecce,"[Church of San Matteo, Roman Theatre]","[archeology, literature]",high_cost,2024-06-27,12
1,1,1,"[italian, dutch]",Lecce,"[Piazza Sant'Oronzo, Torre del Parco]","[art, history]",medium_cost,2024-06-02,11
2,2,2,"[chinese, french, english]",Lecce,"[Colonna di Sant'Oronzo, Celestine Convent]","[art, literature]",medium_cost,2024-06-09,12
3,3,3,[bulgarian],Lecce,[Museo Faggiano],[museums],medium_cost,2024-06-26,9
4,4,4,"[deutsche, french]",Lecce,"[Palazzo Carafa, Church of Santa Chiara]","[history, literature]",medium_cost,2024-06-03,11
5,5,5,"[deutsche, dutch, bulgarian]",Lecce,"[Villa Comunale di Lecce, Church of San Matteo...","[countryside, literature, museums]",high_cost,2024-06-29,11
6,6,6,"[english, chinese, french]",Lecce,"[Torre del Parco, Palazzo dei Celestini]",[history],high_cost,2024-06-17,12
7,7,7,"[spanish, dutch, french, italian]",Lecce,"[Museo Faggiano, Palazzo dei Celestini]","[history, museums]",medium_cost,2024-06-29,7
8,8,8,"[bulgarian, english, chinese]",Lecce,"[Piazza Sant'Oronzo, Villa Comunale di Lecce]","[art, countryside]",high_cost,2024-06-28,10
9,9,9,"[chinese, bulgarian, dutch, english]",Lecce,[Celestine Convent],[literature],medium_cost,2024-07-01,7


In [70]:
# Split the categorical attributes into separate columns
multiclass_attributes = ['languages', 'attractions', 'keywords', 'price']

for n in multiclass_attributes:
    s = icm_df[n].explode()
    icm_df = icm_df.join(pd.crosstab(s.index, s).astype(object)).fillna(0)
    icm_df.drop(labels=n,axis=1,inplace=True)

In [71]:
icm_df

Unnamed: 0,id,guide,city,date,duration,bulgarian,chinese,deutsche,dutch,english,...,Villa Comunale di Lecce,archeology,art,countryside,history,literature,museums,high_cost,low_cost,medium_cost
0,0,0,Lecce,2024-06-27,12,0,0,0,0,1,...,0,1,0,0,0,1,0,1,0,0
1,1,1,Lecce,2024-06-02,11,0,0,0,1,0,...,0,0,1,0,1,0,0,0,0,1
2,2,2,Lecce,2024-06-09,12,0,1,0,0,1,...,0,0,1,0,0,1,0,0,0,1
3,3,3,Lecce,2024-06-26,9,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,1
4,4,4,Lecce,2024-06-03,11,0,0,1,0,0,...,0,0,0,0,1,1,0,0,0,1
5,5,5,Lecce,2024-06-29,11,1,0,1,1,0,...,1,0,0,1,0,1,1,1,0,0
6,6,6,Lecce,2024-06-17,12,0,1,0,0,1,...,0,0,0,0,1,0,0,1,0,0
7,7,7,Lecce,2024-06-29,7,0,0,0,1,0,...,0,0,0,0,1,0,1,0,0,1
8,8,8,Lecce,2024-06-28,10,1,1,0,0,1,...,1,0,1,1,0,0,0,1,0,0
9,9,9,Lecce,2024-07-01,7,1,1,0,1,1,...,0,0,0,0,0,1,0,0,0,1


In [72]:
# Print the list of attributes
attribute_list = icm_df.columns.tolist()
attribute_list

['id',
 'guide',
 'city',
 'date',
 'duration',
 'bulgarian',
 'chinese',
 'deutsche',
 'dutch',
 'english',
 'french',
 'italian',
 'spanish',
 'Basilica di Santa Croce',
 'Castello di Carlo V',
 'Celestine Convent',
 'Church of San Francesco della Scarpa',
 'Church of San Matteo',
 'Church of San Niccolò e Cataldo',
 'Church of Santa Chiara',
 "Colonna di Sant'Oronzo",
 'Lecce Cathedral',
 'Museo Faggiano',
 'Palazzo Carafa',
 'Palazzo dei Celestini',
 "Piazza Sant'Oronzo",
 'Porta Napoli',
 'Roman Amphitheatre',
 'Roman Theatre',
 'San Giovanni Battista Church',
 'Torre del Parco',
 'Villa Comunale di Lecce',
 'archeology',
 'art',
 'countryside',
 'history',
 'literature',
 'museums',
 'high_cost',
 'low_cost',
 'medium_cost']

In [73]:
# Convert the names of attributes into numbers
def convert_index(x):
    if x == 'id':
        return x
    else:
        return attribute_list.index(x)

icm_df.rename(mapper=convert_index, axis=1, inplace=True)
icm_df

Unnamed: 0,id,1,2,3,4,5,6,7,8,9,...,31,32,33,34,35,36,37,38,39,40
0,0,0,Lecce,2024-06-27,12,0,0,0,0,1,...,0,1,0,0,0,1,0,1,0,0
1,1,1,Lecce,2024-06-02,11,0,0,0,1,0,...,0,0,1,0,1,0,0,0,0,1
2,2,2,Lecce,2024-06-09,12,0,1,0,0,1,...,0,0,1,0,0,1,0,0,0,1
3,3,3,Lecce,2024-06-26,9,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,1
4,4,4,Lecce,2024-06-03,11,0,0,1,0,0,...,0,0,0,0,1,1,0,0,0,1
5,5,5,Lecce,2024-06-29,11,1,0,1,1,0,...,1,0,0,1,0,1,1,1,0,0
6,6,6,Lecce,2024-06-17,12,0,1,0,0,1,...,0,0,0,0,1,0,0,1,0,0
7,7,7,Lecce,2024-06-29,7,0,0,0,1,0,...,0,0,0,0,1,0,1,0,0,1
8,8,8,Lecce,2024-06-28,10,1,1,0,0,1,...,1,0,1,1,0,0,0,1,0,0
9,9,9,Lecce,2024-07-01,7,1,1,0,1,1,...,0,0,0,0,0,1,0,0,0,1


In [74]:
# Re-organize data structure for building the ICM
icm_df = pd.melt(icm_df, id_vars='id', var_name='label')
icm_df = icm_df[icm_df["value"]==1]
icm_df

Unnamed: 0,id,label,value
1,1,1,1
163,3,5,1
165,5,5,1
168,8,5,1
169,9,5,1
...,...,...,...
1590,30,40,1
1592,32,40,1
1593,33,40,1
1594,34,40,1
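
Note that the filter `value == 1` also matches metadata columns whose value happens to be 1. In the output above, the first row (`id` 1, `label` 1) comes from the `guide` column (label 1 in the attribute list), where tour 1 has guide id 1; it is not a one-hot feature. A possible guard, sketched on a hypothetical toy frame, is to drop the metadata columns before melting. The name `metadata_columns` and the choice of which columns count as metadata are assumptions.

```python
import pandas as pd

# Hypothetical toy frame: 'guide' and 'duration' are metadata,
# 'english' and 'art' are one-hot features
toy_df = pd.DataFrame({
    "id": [0, 1, 2],
    "guide": [0, 1, 2],
    "duration": [12, 1, 7],
    "english": [1, 0, 1],
    "art": [0, 1, 1],
})

# Assumption: these columns are not meant to become ICM features
metadata_columns = ["guide", "duration"]

# Melt only the one-hot columns, then keep the active (tour, feature) pairs
long_df = pd.melt(
    toy_df.drop(columns=metadata_columns), id_vars="id", var_name="label"
)
long_df = long_df[long_df["value"] == 1]

print(long_df)
```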


In [82]:
icm_df.astype('int64', copy=False)

Unnamed: 0,id,label,value
1,1,1,1
163,3,5,1
165,5,5,1
168,8,5,1
169,9,5,1
...,...,...,...
1590,30,40,1
1592,32,40,1
1593,33,40,1
1594,34,40,1


In [87]:
icm_df.astype('int64').dtypes

id       int64
label    int64
value    int64
dtype: object

In [88]:
# Create the Item Content Matrix
ICM_all = sps.csr_matrix(
    (icm_df.astype('int64')["value"].values,
    (icm_df.astype('int64')["id"].values, icm_df.astype('int64')["label"].values))
)

ICM_all

<40x41 sparse matrix of type '<class 'numpy.int64'>'
	with 274 stored elements in Compressed Sparse Row format>

In [89]:
print(ICM_all.todense())

[[0 0 0 ... 1 0 0]
 [0 1 0 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 ...
 [0 0 0 ... 1 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]


### Feature Engineering
The importance of each feature can be modeled by weighting it in the ICM: features considered more relevant for the recommendation problem, such as the languages spoken by the guides, receive a higher value.
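
As an alternative illustration of the same idea, feature weights can be applied to a sparse ICM with a single matrix product. This is a sketch on a hypothetical 3x4 matrix, not the procedure used in the cells below, which edit the long-format dataframe instead.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical binary ICM: 3 tours x 4 features, columns 0-1 are languages
icm = sps.csr_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]))

# One weight per feature column; language columns get weight 10
feature_weights = np.array([10, 10, 1, 1])

# Right-multiplying by a diagonal matrix scales each column by its weight
icm_weighted = icm @ sps.diags(feature_weights)

print(icm_weighted.toarray())
```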

In [90]:
# Print the full list of attributes
for index, name in enumerate(attribute_list):
    print(index, name)

0 id
1 guide
2 city
3 date
4 duration
5 bulgarian
6 chinese
7 deutsche
8 dutch
9 english
10 french
11 italian
12 spanish
13 Basilica di Santa Croce
14 Castello di Carlo V
15 Celestine Convent
16 Church of San Francesco della Scarpa
17 Church of San Matteo
18 Church of San Niccolò e Cataldo
19 Church of Santa Chiara
20 Colonna di Sant'Oronzo
21 Lecce Cathedral
22 Museo Faggiano
23 Palazzo Carafa
24 Palazzo dei Celestini
25 Piazza Sant'Oronzo
26 Porta Napoli
27 Roman Amphitheatre
28 Roman Theatre
29 San Giovanni Battista Church
30 Torre del Parco
31 Villa Comunale di Lecce
32 archeology
33 art
34 countryside
35 history
36 literature
37 museums
38 high_cost
39 low_cost
40 medium_cost


In [91]:
# Index of columns in the dataset containing specific features
# (TODO: turn this lookup into a function?)
feature_columns = {
    'guide': 1,
    'language': [5,12]
}

In [92]:
# Feature engineering: attribute different weights to the parameters
# Default weights = 1
importance_weights = {
    'guide': 1,
    'language': 10
}

In [96]:
# Create a copy of the original ICM
new_icm_df = icm_df.astype('int64').copy(deep=True)

In [97]:
# Modify the cell values with respect to the weights we want to give

for feature in importance_weights:
    if importance_weights[feature] > 1:
        print(feature)
        if feature=='language':
            condition = (new_icm_df.label >= feature_columns[feature][0]) & (new_icm_df.label <= feature_columns[feature][1])
        else:
            condition = (new_icm_df.label == feature_columns[feature])
        
        new_icm_df.loc[condition,'value'] = importance_weights[feature]

language


In [98]:
# Build the modified ICM

ICM_modified = sps.csr_matrix(
    (new_icm_df["value"].values,
    (new_icm_df["id"].values, new_icm_df["label"].values))
)

ICM_modified

<40x41 sparse matrix of type '<class 'numpy.int64'>'
	with 274 stored elements in Compressed Sparse Row format>

In [99]:
print(ICM_modified.todense())

[[0 0 0 ... 1 0 0]
 [0 1 0 ... 0 0 1]
 [0 0 0 ... 0 0 1]
 ...
 [0 0 0 ... 1 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 1 0]]


In [100]:
ICM_train = ICM_modified

## Build the model

In [101]:
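def vector_similarity(urm: sps.csc_matrix, shrink: float):
    """Cosine similarity between the columns of `urm`, damped by `shrink`."""
    # Euclidean norm of each column
    item_weights = np.sqrt(
        np.asarray(urm.power(2).sum(axis=0))
    ).flatten()

    num_items = urm.shape[1]
    urm_t = urm.T
    weights = np.empty(shape=(num_items, num_items))
    for item_id in range(num_items):
        # Dot product of column `item_id` with every column
        numerator = urm_t.dot(urm[:, item_id]).toarray().flatten()
        denominator = item_weights[item_id] * item_weights + shrink + 1e-6

        weights[item_id] = numerator / denominator

    # An item must not be recommended because of its similarity to itself
    np.fill_diagonal(weights, 0.0)
    return weights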
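
The function above computes, for each pair of columns, the dot product divided by the product of the column norms plus the shrink term and a small constant. A dense re-implementation on a toy matrix can help to see the effect of `shrink`; the helper name `dense_shrunk_cosine` and the data are hypothetical.

```python
import numpy as np

def dense_shrunk_cosine(matrix: np.ndarray, shrink: float) -> np.ndarray:
    """Column-wise cosine similarity with a shrink term, on a dense array."""
    norms = np.sqrt((matrix ** 2).sum(axis=0))
    numerator = matrix.T @ matrix
    denominator = np.outer(norms, norms) + shrink + 1e-6
    weights = numerator / denominator
    np.fill_diagonal(weights, 0.0)
    return weights

# Hypothetical toy URM: 3 tourists x 3 tours
toy_urm = np.array([
    [5.0, 4.0, 0.0],
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 5.0],
])

print(dense_shrunk_cosine(toy_urm, shrink=0.0).round(3))
print(dense_shrunk_cosine(toy_urm, shrink=3.0).round(3))
```

A larger `shrink` lowers every similarity, and it penalizes most the pairs whose norms are small, that is, tours supported by few ratings.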


### Collaborative Filtering

**Collaborative filtering** recommends tours to tourists based on the preferences and behavior of similar users. It identifies patterns and similarities in the interactions between tourists and tours, using the dataset of tourist feedback.

In this case an **item-based** collaborative filtering technique is used. It computes the similarity between each pair of tours from the ratings they received: two tours are similar when the same tourists rated them in a similar way.

In [102]:
class ItemKNNCFRecommender(object):
    
    def __init__(self, URM):
        self.URM = URM
        
            
    def fit(self, shrink=3):
        self.W_sparse = vector_similarity(self.URM.tocsc(), shrink=shrink)
        with np.printoptions(threshold=np.inf):
            print(self.W_sparse)

        
    def recommend(self, user_id, at=None, exclude_seen=True):
        # compute the scores using the dot product
        user_profile = self.URM[user_id]
        scores = user_profile.dot(self.W_sparse).ravel()

        if exclude_seen:
            scores = self.filter_seen(user_id, scores)

        # rank items
        ranking = scores.argsort()[::-1]
            
        return ranking[:at]
    
    # tours already rated by the tourist are excluded
    def filter_seen(self, user_id, scores):

        start_pos = self.URM.indptr[user_id]
        end_pos = self.URM.indptr[user_id+1]

        user_profile = self.URM.indices[start_pos:end_pos]
        
        scores[user_profile] = -np.inf

        return scores
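
The scoring step of `recommend` can be followed by hand on a small dense example. The similarity values and the tourist profile below are hypothetical.

```python
import numpy as np

# Hypothetical item-item similarity matrix for 4 tours (zero diagonal)
similarity = np.array([
    [0.0, 0.6, 0.1, 0.0],
    [0.6, 0.0, 0.3, 0.2],
    [0.1, 0.3, 0.0, 0.5],
    [0.0, 0.2, 0.5, 0.0],
])

# One tourist's ratings: tour 0 rated 5, tour 2 rated 3, the others unrated
user_profile = np.array([5.0, 0.0, 3.0, 0.0])

# Score of each tour = similarity-weighted sum of the tourist's ratings
scores = user_profile @ similarity

# Exclude tours the tourist has already rated
scores[user_profile > 0] = -np.inf

# Highest scores first, keep the top 2
top_tours = scores.argsort()[::-1][:2]
print(top_tours)
```

Tour 1 ranks first because it is the most similar to tour 0, which received the highest rating from this tourist.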

### Content-based Filtering

**Content-based filtering** recommends tours to tourists based on the attributes of the tours themselves and on the tourist's preferences. It analyzes the characteristics of the items and matches them to the tourist's profile.

In this case an **item-based** content-based filtering technique is used. The similarity between tours is computed only from their attributes (the ICM), without relying on the behavior of other users; the tourist's own ratings are then used to score the tours similar to those already rated.

In [103]:
class ItemKNNCBFRecommender(object):

    def __init__(self, URM, ICM):
        self.URM = URM
        self.ICM = ICM


    def fit(self, shrink=20):
        self.W_sparse = vector_similarity(self.ICM.T, shrink=shrink)
        with np.printoptions(threshold=np.inf):
            print(self.W_sparse)


    def recommend(self, user_id, at=None, exclude_seen=True):
        # compute the scores using the dot product
        user_profile = self.URM[user_id]
        scores = user_profile.dot(self.W_sparse).ravel()

        if exclude_seen:
            scores = self.filter_seen(user_id, scores)

        # rank items
        ranking = scores.argsort()[::-1]

        return ranking[:at]


    # tours already rated by the tourist are excluded
    def filter_seen(self, user_id, scores):

        start_pos = self.URM.indptr[user_id]
        end_pos = self.URM.indptr[user_id+1]

        user_profile = self.URM.indices[start_pos:end_pos]

        scores[user_profile] = -np.inf

        return scores

## Fit the model

In [104]:
ICM_train = ICM_modified

In [143]:
model_type = 'cf'

In [144]:
if model_type == 'cf':
    recommender = ItemKNNCFRecommender(URM_train)
    recommender.fit(shrink=0.5)

[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.06084043 0.19105248 0.
  0.         0.         0.         0.         0.09222517 0.
  0.18040278 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.17521699 0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.07629568
  0.         0.         0.09741078 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.22906604 0.         0.         0.09221902
  0.         0.         0.15301338 0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.  

In [145]:
if model_type == 'cbf':
    recommender = ItemKNNCBFRecommender(URM_train, ICM_train)
    recommender.fit(shrink=0.5)

## Generate outputs

Here the recommendations of tours to tourists are generated.

In [146]:
# Set the number of tours to recommend to each tourist
n_recommendations_per_tourist = 5

In [147]:
# Generate recommendations for each tourist

recommendations = []

for i, tourist_id in tqdm(enumerate(arr_tourists)):
    # top-N recommendations for each tourist (N = n_recommendations_per_tourist)
    rec = recommender.recommend(tourist_id, at=n_recommendations_per_tourist, exclude_seen=True)
    rec_row = ' '.join(str(s) for s in rec)
    recommendations.append(rec_row)

200it [00:00, 16667.87it/s]


In [148]:
# print recommendations for the first 10 users
for i in range(10):
    print("For user " + str(arr_tourists[i]) + " recommended tours: " + recommendations[i])

For user 0 recommended tours: 32 38 8 35 5
For user 1 recommended tours: 29 11 0 20 2
For user 2 recommended tours: 18 36 29 11 0
For user 3 recommended tours: 39 38 17 16 15
For user 4 recommended tours: 27 34 2 39 19
For user 5 recommended tours: 23 35 25 1 22
For user 6 recommended tours: 38 19 35 22 0
For user 7 recommended tours: 11 0 10 20 22
For user 8 recommended tours: 16 24 34 22 15
For user 9 recommended tours: 22 33 39 34 1


In [149]:
# Show the output dataframe

result_df = pd.DataFrame(
    data = {'tourist_id': arr_tourists,
            'tours': recommendations}
)

result_df

Unnamed: 0,tourist_id,tours
0,0,32 38 8 35 5
1,1,29 11 0 20 2
2,2,18 36 29 11 0
3,3,39 38 17 16 15
4,4,27 34 2 39 19
...,...,...
195,195,10 1 25 17 27
196,196,39 38 17 16 15
197,197,32 13 38 7 29
198,198,32 38 8 35 5


Let's pick a tourist and inspect the details of the recommended tours.

In [156]:
# Show some examples: select a tourist by id to visualize the received recommendations
sample_tourist = 10
pd.DataFrame(tourist_df.loc[sample_tourist,:])

Unnamed: 0,10
id,10
languages,['dutch']
keywords,"['sport', 'tracking', 'literature', 'beer', 'w..."


In [157]:
# List of recommended tours
sample_guide_list = list(map(int, recommendations[sample_tourist].split(" ")))
tour_df.loc[sample_guide_list,:]

Unnamed: 0,id,guide,languages,city,attractions,keywords,price,date,duration
17,17,17,[dutch],Lecce,"[Museo Faggiano, San Giovanni Battista Church]","[literature, museums]",17,2024-06-09,9
22,22,22,"[chinese, spanish]",Lecce,"[Church of San Matteo, Villa Comunale di Lecce...","[countryside, history, literature]",27,2024-06-13,9
39,39,39,"[spanish, dutch]",Lecce,"[Porta Napoli, Villa Comunale di Lecce]","[art, countryside]",19,2024-06-11,5
34,34,34,"[chinese, spanish, bulgarian, italian]",Lecce,[Church of San Francesco della Scarpa],[literature],26,2024-06-16,12
1,1,1,"[italian, dutch]",Lecce,"[Piazza Sant'Oronzo, Torre del Parco]","[art, history]",27,2024-06-02,11
