<a href="https://colab.research.google.com/github/aaubs/ds-master/blob/main/notebooks/M1_Recommender_System_v5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Building a Recommender System in Python



In today's information age, we are bombarded with choices. We have access to millions of products, movies, music, and articles, but it can be overwhelming to know where to start. Recommender systems help us to navigate this information overload by providing us with personalized recommendations based on our interests and past behavior.



In [None]:
%%html

<iframe width="1280" height="720" src="https://www.youtube.com/embed/hqFHAnkSP2U" title="Netflix Quick Guide: How Does Netflix Make TV Show and Movie Suggestions? | Netflix" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

####**Question 1: What Are the Different Types of Recommender Systems?**

Recommender systems can be broadly categorized into the following types, each with its own advantages and disadvantages:

- Collaborative Filtering:
  *   User-Based: Recommends items based on similarities between users. If Alice and Bob have liked many of the same movies, then movies Alice liked that Bob has not yet seen are recommended to Bob, and vice versa.
  *   Item-Based: Recommends items that are similar to a given item, where similarity is measured from user behavior. For example, if many users who liked the movie "Inception" also liked another movie, that movie is recommended to you once you like "Inception," regardless of the content of the movies themselves.


- Content-Based Filtering:
   - Recommends items by comparing the content of the items and a user profile, with content described in terms of several descriptors or terms that are inherent to the item. The recommendation is based on the inherent characteristics (content) of the movie you liked, such as its genre and themes, rather than what other users liked.


![image.png](https://raw.githubusercontent.com/aaubs/ds-master/main/data/recommendation_sys_1.png)

##1. Data Preprocessing



###1.1 Load Data
The first step in any data analysis or machine learning project is to load the data. We'll use the Pandas library for this purpose.

Here's how you can load a CSV file using Pandas:

In [None]:
# Importing necessary libraries.
import pandas as pd

df_trips = pd.read_csv('https://sds-aau.github.io/SDS-master/M1/data/trips.csv')

##2. Build a Simple Recommender System
Now that the data is loaded, let's move on to building a recommender system. This process consists of three steps:


> - Step 1: Label Encoding and Matrix Creation
> - Step 2: Perform Dimensionality Reduction
> - Step 3: Calculate The Similarity Matrix


##**2.1. User-Based Collaborative Filtering**

Recommends items by finding users who are similar to the target user. If user A is similar to user B, then the items liked by user B can be recommended to user A.

### **2.1.1 Label Encoding and Matrix Creation**
This part label-encodes the usernames and place slugs, creating numerical IDs for each. It then builds a sparse matrix to capture the interactions between users and places.
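
As a minimal sketch of the label-encoding step, the snippet below encodes a small hypothetical DataFrame (`toy_trips`, made up for illustration and not the real `df_trips`). The encoder name `le_user` is an assumption; `le_items` follows the name used later in this notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for df_trips (hypothetical rows, not the real data)
toy_trips = pd.DataFrame({
    "username": ["@ann", "@bob", "@ann", "@cat"],
    "place_slug": ["london-united-kingdom", "ubud-bali-indonesia",
                   "ubud-bali-indonesia", "london-united-kingdom"],
})

# One encoder per column, so each can be inverted independently later
le_user = LabelEncoder()    # assumed name
le_items = LabelEncoder()

# fit_transform assigns integer IDs in sorted order of the unique labels
toy_trips["username_id"] = le_user.fit_transform(toy_trips["username"])
toy_trips["place_slug_id"] = le_items.fit_transform(toy_trips["place_slug"])

print(toy_trips)
# inverse_transform maps IDs back to the original labels
print(le_items.inverse_transform([0, 1]))
```

Keeping the fitted encoders around matters: the similarity steps work on integer IDs, and `inverse_transform` is what turns the results back into readable usernames and place slugs.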

In [None]:
# Importing necessary libraries.
import scipy.sparse as ss
import numpy as np
from sklearn.preprocessing import LabelEncoder

In [None]:
# A. Initialize label encoders


In [None]:
# A. Label encode usernames and place slugs



In [None]:
# Checking the dimensions of the dataset.
df_trips.shape

(46510, 14)

In [None]:
# Checking the dimensions of the dataset.

df_trips[['username_id','place_slug_id']].values.shape

(46510, 2)

In [None]:
# B. Matrix Creation


username_id \ place_slug_id,0,1,2,3,4,5,6,7,8,9,...,951,952,953,954,955,956,957,958,959,960
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,3,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2866,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2867,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2868,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2869,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
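
One possible way to build such a user-by-place count matrix is sketched below with `scipy.sparse`. The ID arrays are hypothetical; in the notebook they would come from the `username_id` and `place_slug_id` columns. This is an illustration under those assumptions, not necessarily the code that produced the table above.

```python
import numpy as np
import scipy.sparse as ss

# Hypothetical encoded interactions: one (username_id, place_slug_id) pair per trip
user_ids = np.array([0, 1, 0, 2, 0])
place_ids = np.array([0, 1, 1, 0, 1])

n_users = user_ids.max() + 1
n_places = place_ids.max() + 1

# Each trip contributes 1. When built from (data, (row, col)), duplicate
# (row, col) pairs are summed, so repeated visits become counts.
ones = np.ones(len(user_ids), dtype=int)
matrix_user_place = ss.csr_matrix(
    (ones, (user_ids, place_ids)), shape=(n_users, n_places)
)

print(matrix_user_place.toarray())
```

Here user 0 visited place 1 twice, so that cell holds 2. A sparse format is a natural fit because most users have visited only a handful of places.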


In [None]:
# Optional: binarize the counts so each entry records only whether a user visited a place.
# matrix[matrix > 0] = 1

####**Exercise 1: Implementing the first step using both username and country_code columns**


Implement the first step of the recommender system (label encoding and matrix creation) using the username and country_code columns.

Index(['Unnamed: 0', 'username', 'country', 'country_code', 'country_slug',
       'date_end', 'date_start', 'latitude', 'longitude', 'place',
       'place_slug', 'duration_of_stay', 'username_id', 'place_slug_id'],
      dtype='object')

249

2871

In [None]:
# Step 1: Label Encoding and Matrix Creation
# Importing necessary libraries.
from sklearn.preprocessing import LabelEncoder



username_id \ country_id,0,1,2,3,4,5,6,7,8,9,...,174,175,176,177,178,179,180,181,182,183
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2866,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2867,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2868,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2869,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### **2.1.2 Perform Dimensionality Reduction**
This section uses Truncated Singular Value Decomposition (SVD) to reduce the dimensionality of the user-item interaction matrix.

####**Question 2: Why Is Dimensionality Reduction Important in Recommender Systems?**

Dimensionality reduction is a critical aspect of building efficient and effective recommender systems for several reasons:

- **Computational Efficiency:** Reducing the number of dimensions can significantly speed up algorithmic computations and save storage space, making the system more scalable.


- **Latent Feature Discovery:** Techniques like Singular Value Decomposition (SVD) can uncover hidden features that capture the underlying structure of the data more effectively than the original high-dimensional features.


In [None]:
# User-based approach
matrix_user_place.shape

(2871, 961)

In [None]:
df_trips[df_trips['username_id'] == 1]

Unnamed: 0.1,Unnamed: 0,username,country,country_code,country_slug,date_end,date_start,latitude,longitude,place,place_slug,duration_of_stay,username_id,place_slug_id,country_id
699,699,@0chucha0,Sri Lanka,LK,sri-lanka,2018-03-18,2018-02-18,7,80,Sri Lanka,sri-lanka,28.0,1,824,95
700,700,@0chucha0,Thailand,TH,thailand,2018-02-20,2017-11-21,9,100,Ko Samui,ko-samui-thailand,91.0,1,452,159


In [None]:
# Importing necessary libraries.
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

In [None]:
df_trips.username_id.nunique()

2871

In [None]:
# Latent features capture a user's preference for certain types of places


(2871, 5)
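
A hedged sketch of the reduction step on synthetic data follows. The matrix `toy_user_place` is randomly generated for illustration, and the output name `toy_user_place_dr` is an assumption modeled on the `_dr` suffix used elsewhere in this notebook. The `TruncatedSVD` settings match the cell above.

```python
import numpy as np
import scipy.sparse as ss
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(42)

# Hypothetical sparse count matrix: 20 users x 12 places
toy_user_place = ss.csr_matrix(rng.poisson(0.3, size=(20, 12)))

# Same settings as the notebook: keep 5 latent features per user
svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)

# fit_transform learns the components and projects every user onto them
toy_user_place_dr = svd.fit_transform(toy_user_place)

print(toy_user_place_dr.shape)   # one row per user, one column per latent feature
print(svd.components_.shape)     # one row per latent feature, one column per place
```

`TruncatedSVD` accepts sparse input directly and does not center the data, which is why it is a common choice for interaction matrices like this one.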

In [None]:
df_trips.place_slug_id.nunique()

961

####**Exercise 2: Implementing the second step (Performing Dimensionality Reduction) using both username and country columns**


Implement the second step of the recommender system (dimensionality reduction) using the username and country_code columns.

2871

184

In [None]:
# Latent features capture a user's preference for certain types of countries


(2871, 5)

In [None]:
matrix_user_country_dr

array([[ 7.29196724e-02,  1.66254205e-01, -4.57704336e-02,
        -1.83673661e-02, -1.03092161e-01],
       [ 1.60759265e-01,  1.43194378e-01, -4.75427543e-01,
         8.15290240e-02, -6.31330682e-01],
       [ 1.60110688e+01, -2.35467393e-01, -2.35376450e+00,
         3.32847607e+00,  2.75745652e+00],
       ...,
       [ 2.45324228e-02,  1.02946384e-02, -3.14777425e-02,
         1.08377754e-02, -2.29527938e-02],
       [ 2.81399123e+00, -8.47864826e-01,  3.12524105e-01,
        -7.25425546e-02,  3.72400609e-01],
       [ 2.76397962e-02,  8.93782216e-03, -2.83449818e-02,
        -2.22850264e-02,  3.05026782e-03]])

### **2.1.3 Calculate The Similarity Matrix**


In [None]:
# Importing necessary libraries.
from sklearn.metrics.pairwise import cosine_distances

In [None]:
# Calculate the cosine distances between all pairs of users in the matrix_users


(2871, 2871)

In [None]:
# check index for user @cw


array([555])

In [None]:
# Displaying the first few rows of the dataset for user @cw


Unnamed: 0.1,Unnamed: 0,username,country,country_code,country_slug,date_end,date_start,latitude,longitude,place,place_slug,duration_of_stay,username_id,place_slug_id,country_id
46261,46261,@cw,Spain,ES,spain,2016-10-14,2016-09-19,36,-4,Mijas,spain,25.0,555,821,51
46262,46262,@cw,United Kingdom,UK,united-kingdom,2016-09-18,2016-08-25,51,0,London,london-united-kingdom,24.0,555,506,170
46263,46263,@cw,United Kingdom,UK,united-kingdom,2016-08-24,2016-08-16,52,-1,Lichfield,united-kingdom,8.0,555,898,170
46264,46264,@cw,United Kingdom,UK,united-kingdom,2016-08-15,2016-07-21,55,-4,Glasgow,glasgow-united-kingdom,25.0,555,319,170
46265,46265,@cw,United Kingdom,UK,united-kingdom,2016-07-20,2016-07-16,51,0,London,london-united-kingdom,4.0,555,506,170


array([1.03515366, 1.19543098, 0.93828879, ..., 0.61240899, 0.25019367,
       0.51783577])

In [None]:
# find the most similar indices using np.argsort


array([ 555, 2644,  153, 1371,  702])

In [None]:
# Find the name of the most similar indices using inverse_transform


array(['@cw', '@travpreneur', '@andrea', '@katehuentelman', '@dpashley',
       '@avizcarrondo', '@darep', '@adrianavecc', '@bneiluj', '@marsty5'],
      dtype=object)
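
The lookup above can be sketched on a tiny example. The reduced matrix `toy_users_dr` and its values are hypothetical; the pattern (pairwise cosine distances, then `np.argsort` on the target user's row) is the one the cell comments describe.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical reduced user matrix: 4 users x 2 latent features
toy_users_dr = np.array([
    [1.0, 0.0],    # user 0 (target)
    [0.9, 0.1],    # user 1: points in almost the same direction as user 0
    [0.0, 1.0],    # user 2: orthogonal to user 0
    [-1.0, 0.0],   # user 3: opposite direction
])

# Square matrix of pairwise distances; entry [i, j] compares user i with user j
toy_distance_matrix = cosine_distances(toy_users_dr)

# Sort the target user's row in ascending order: smallest distance = most similar
target = 0
closest = np.argsort(toy_distance_matrix[target, :])[:3]
print(closest)
```

The first index returned is the target user itself, because its distance to itself is 0. That is why `@cw` heads the list above, and why the second entry is the most similar other user.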

In [None]:
# Displaying the first few rows of the dataset for the most similar user, @travpreneur


Unnamed: 0.1,Unnamed: 0,username,country,country_code,country_slug,date_end,date_start,latitude,longitude,place,place_slug,duration_of_stay,username_id,place_slug_id,country_id
22884,22884,@travpreneur,India,IN,india,2018-07-20,2018-07-05,18,73,Pune,pune-india,15.0,2644,710,76
22885,22885,@travpreneur,United Kingdom,UK,united-kingdom,2018-07-05,2018-07-04,51,0,London,london-united-kingdom,1.0,2644,506,170
22886,22886,@travpreneur,United Kingdom,UK,united-kingdom,2018-07-04,2018-07-03,53,-2,Manchester,manchester-united-kingdom,1.0,2644,542,170
22887,22887,@travpreneur,United Kingdom,UK,united-kingdom,2018-07-03,2018-06-25,51,0,London,london-united-kingdom,8.0,2644,506,170
22888,22888,@travpreneur,United States,US,united-states,2018-06-24,2018-06-22,37,-122,San Francisco,san-francisco-ca-united-states,2.0,2644,765,171


In [None]:
# Find the list of places for the target user


array([821, 506, 898, 319, 300, 614, 373, 870, 407, 594, 472, 135, 839,
       559, 507, 765, 891, 167, 790, 101, 743, 694, 531, 609, 899, 193,
       480, 841, 665, 590, 761,  69])

In [None]:
# Find the list of places for the most similar user to the target user



array([710, 506, 542, 765, 764, 899, 766, 609,  34, 109, 101, 589, 916,
       415, 246, 381, 265,  91, 898, 308, 393, 888, 328,  62, 911, 743,
       865, 480,  84, 821, 665, 362, 826,  15, 258, 806, 803,  78, 430])

array([710, 542, 764, 766,  34, 109, 589, 916, 415, 246, 381, 265,  91,
       308, 393, 888, 328,  62, 911, 865,  84, 362, 826,  15, 258, 806,
       803,  78, 430])

In [None]:
# Find elements in userbased_user_places_ids that are NOT in target_user_places_ids



Is NOT in target_user_places_ids: [710 542 764 766  34 109 589 916 415 246 381 265  91 308 393 888 328  62
 911 865  84 362 826  15 258 806 803  78 430]


In [None]:
# List of recommended places


array(['pune-india', 'manchester-united-kingdom',
       'san-diego-ca-united-states', 'san-jose-ca-united-states',
       'amsterdam-netherlands', 'birmingham-united-kingdom',
       'mumbai-india', 'victoria-seychelles', 'johannesburg-south-africa',
       'delhi-india'], dtype=object)
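
The filtering step can be sketched with `np.isin`. The ID values below are a shortened, hypothetical sample; the variable names follow the cell comment above.

```python
import numpy as np

# Hypothetical place IDs, in order of first visit
target_user_places_ids = np.array([821, 506, 898, 319])
userbased_user_places_ids = np.array([710, 506, 542, 898, 765])

# True where the similar user's place has NOT been visited by the target user
mask = ~np.isin(userbased_user_places_ids, target_user_places_ids)

# Boolean indexing keeps the original order of the similar user's places
recommended_ids = userbased_user_places_ids[mask]
print(recommended_ids)  # [710 542 765]
```

`np.setdiff1d` would return the same elements but sorted, which loses the visit order. The remaining IDs can then be passed to the place encoder's `inverse_transform` to get place slugs.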

####**Exercise 3: Implementing a user-based recommender system using both username and country as input parameters**




To build a recommender system, we should proceed with the following three steps:

- Step 1: Label Encoding and Matrix Creation
- Step 2: Perform Dimensionality Reduction
- Step 3: Calculate the Similarity Matrix



(2871, 5)

(2871, 2871)

array([ 555,  374, 2174,  253,  358])

array(['@cw', '@brucemarsh', '@reboramuriel', '@austingrandt', '@brett',
       '@yrezgui', '@katekendall', '@joanna', '@chewx', '@evamamartin'],
      dtype=object)

array([ 51, 170,  56,  35,  82,   9, 171,  73,  43,  79, 135, 101,   8])

array([ 51,  56,  79,   8, 170, 135, 171])

array([ 35,  82,   9,  73,  43, 101])

array(['CN', 'JP', 'AU', 'ID', 'DE', 'MA'], dtype=object)

A cosine distance matrix of items or users is a table that shows how dissimilar each item or user is to every other item or user: each entry is one minus the cosine similarity of the corresponding pair, so a smaller value means a closer match. With this matrix in hand, we can decide which type of collaborative model to implement, based on our specific needs and goals.
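
For reference, cosine distance is defined as one minus cosine similarity. The helper below, `cosine_distance`, is a hypothetical NumPy-only illustration of that definition on made-up vectors. It is not a replacement for scikit-learn's `cosine_distances`.

```python
import numpy as np

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v). Assumes neither vector is all zeros."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a, different length
c = np.array([0.0, 0.0, 3.0])   # orthogonal to a

print(cosine_distance(a, b))    # close to 0: same direction, so maximally similar
print(cosine_distance(a, c))    # close to 1: orthogonal, so no shared preference
```

Because only direction matters, a user who visits the same places twice as often as another user is still treated as nearly identical to them.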


##**2.2. Item-Based Collaborative Filtering**

Recommends items based on their similarity to items that the target user has interacted with. This approach generally scales better, since item-item similarities can be computed ahead of time, and it can serve new users after only a few interactions, although its recommendations may be less personalized than those of user-based models.

####**Question 3: What is the main difference between item-based and user-based recommender systems?**

Item-based recommender systems recommend travel destinations based on similarities between the destinations themselves, as measured by which users visited them, while user-based recommender systems make recommendations based on similarities between users' past behaviors and preferences. Item-based systems generally work well for new users, whereas user-based systems provide more personalized recommendations but may struggle with new users.







### **2.2.1 Label Encoding and Matrix Creation**
In the item-based approach, we use items as the first index instead of users. This process label-encodes the usernames and place slugs, creating numerical IDs for each. It then builds a sparse matrix to capture the interactions between users and places.

In [None]:
# Importing necessary libraries.
import scipy.sparse as ss
import numpy as np
from sklearn.preprocessing import LabelEncoder

In [None]:
# A. Initialize label encoders


In [None]:
# A. Label encode usernames and place slugs



In [None]:
# Checking the dimensions of the dataset.


(46510, 15)

In [None]:
# Checking the number of unique users.


2871

In [None]:
# Checking the number of unique places.



961

In [None]:
# B. Matrix Creation


place_slug_id \ username_id,0,1,2,3,4,5,6,7,8,9,...,2861,2862,2863,2864,2865,2866,2867,2868,2869,2870
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
956,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
957,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
958,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
959,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# Matrix Transpose


(961, 2871)
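
If a user-by-place matrix already exists, the place-by-user matrix is simply its transpose. A minimal sketch with a hypothetical 3-user, 2-place matrix:

```python
import numpy as np
import scipy.sparse as ss

# Hypothetical user-by-place counts: 3 users (rows) x 2 places (columns)
toy_user_place = ss.csr_matrix(np.array([
    [1, 2],
    [0, 1],
    [1, 0],
]))

# Transposing swaps the roles: each row now describes a place by its visitors.
# tocsr() converts the result back to row-oriented storage for row slicing.
toy_place_user = toy_user_place.T.tocsr()

print(toy_place_user.shape)      # (2, 3)
print(toy_place_user.toarray())
```

With places as rows, the same dimensionality reduction and cosine distance steps compare places to each other instead of users.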

### **2.2.2 Perform Dimensionality Reduction**
This section uses Truncated Singular Value Decomposition (SVD) to reduce the dimensionality of the transposed (place-by-user) interaction matrix.

In [None]:
# Importing necessary libraries.
from sklearn.decomposition import TruncatedSVD


961

(961, 2871)

In [None]:
# Latent features capture the kinds of users who tend to visit each place


(961, 5)

2871

### **2.2.3 Calculate The Similarity Matrix**


In [None]:
# Importing necessary libraries.
from sklearn.metrics.pairwise import cosine_distances

In [None]:
# Calculate the cosine distances between all pairs of places in the matrix_places


(961, 961)

In [None]:
# item-based similarity matrix


array([[0.        , 0.93737257, 0.94042055, ..., 0.55101577, 0.94479183,
        0.66700793],
       [0.93737257, 0.        , 0.2219977 , ..., 0.55865144, 1.01683665,
        0.55894041],
       [0.94042055, 0.2219977 , 0.        , ..., 0.42547908, 0.90915149,
        0.12785817],
       ...,
       [0.55101577, 0.55865144, 0.42547908, ..., 0.        , 0.35913659,
        0.36126287],
       [0.94479183, 1.01683665, 0.90915149, ..., 0.35913659, 0.        ,
        0.89817704],
       [0.66700793, 0.55894041, 0.12785817, ..., 0.36126287, 0.89817704,
        0.        ]])

In [None]:
# transform and inverse_transform - 'ubud-bali-indonesia'


array([891])

In [None]:
# check index 891 for place_slug_id


place,count
Ubud,460
"Ubud, Bali",13


In [None]:
# find the most similar indices using np.argsort


array([891, 244, 450, 676, 803])

In [None]:
# check index 244 for place_slug_id


place,count
Darwin,9


Finally, we convert the indices of the most similar places back into place slugs, which gives the places to recommend alongside the target place.

In [None]:
# Find the name of the most similar indices using inverse_transform


array(['ubud-bali-indonesia', 'darwin-australia', 'ko-pha-ngan-thailand',
       'phnom-penh-cambodia', 'siem-reap-cambodia', 'gurgaon-india',
       'ho-chi-minh-city-vietnam', 'riga-latvia', 'kuala-lumpur-malaysia',
       'taipei-taiwan'], dtype=object)

In [None]:
# Define a function to perform this process
def find_similar_places(place, num_similar):
    # Convert the place name to its corresponding numerical index
    place_index = le_items.transform([place])[0]

    # Retrieve the cosine distances for this place from the matrix
    distances = cosine_distance_matrix_place_user_dr[place_index, :]

    # Get indices of num_similar smallest distances
    closest_indices = np.argsort(distances)[:num_similar]

    # Convert these indices back to place names
    closest_places = le_items.inverse_transform(closest_indices)

    return closest_places

In [None]:
# check the function
find_similar_places('ko-pha-ngan-thailand', 10)

array(['ko-pha-ngan-thailand', 'ubud-bali-indonesia', 'darwin-australia',
       'vietnam', 'ho-chi-minh-city-vietnam', 'canberra-australia',
       'siem-reap-cambodia', 'phnom-penh-cambodia', 'delhi-india',
       'taipei-taiwan'], dtype=object)
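
The three item-based steps can also be bundled so that the user and item columns become parameters, which is one way to approach the country variant in the exercise below. This is a sketch under assumptions: `build_item_similarity`, `most_similar_items`, and the toy data are all hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_distances

def build_item_similarity(df, user_col, item_col, n_components=2):
    """Hypothetical helper: encode, build an item-by-user matrix, reduce, compare."""
    le_u, le_i = LabelEncoder(), LabelEncoder()
    user_ids = le_u.fit_transform(df[user_col])
    item_ids = le_i.fit_transform(df[item_col])
    # Items are rows, users are columns; repeated visits are summed into counts
    counts = ss.csr_matrix(
        (np.ones(len(df)), (item_ids, user_ids)),
        shape=(len(le_i.classes_), len(le_u.classes_)),
    )
    items_dr = TruncatedSVD(n_components=n_components, random_state=42).fit_transform(counts)
    return le_i, cosine_distances(items_dr)

def most_similar_items(item, le_i, distance_matrix, num_similar):
    """Return the num_similar closest items (the query item is normally among them)."""
    idx = le_i.transform([item])[0]
    closest = np.argsort(distance_matrix[idx, :])[:num_similar]
    return le_i.inverse_transform(closest)

# Toy data (made up): two users favor Asia, two favor Europe
toy = pd.DataFrame({
    "username": ["@a", "@a", "@b", "@b", "@b", "@c", "@c", "@d", "@d"],
    "place_slug": ["bali", "bangkok", "bali", "bangkok", "london",
                   "london", "paris", "paris", "london"],
})

le_i, dist = build_item_similarity(toy, "username", "place_slug")
print(most_similar_items("bali", le_i, dist, 3))
```

Swapping `"place_slug"` for another column name changes what counts as an item without touching the rest of the pipeline.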

####**Exercise 4: Implementing an item-based recommender system using both username and country as input parameters**






To build a recommender system, we should proceed with the following three steps:

- Step 1: Label Encoding and Matrix Creation
- Step 2: Perform Dimensionality Reduction
- Step 3: Calculate the Similarity Matrix

In [None]:
# Step 1: Label Encoding and Matrix Creation



184

country_id \ username_id,0,1,2,3,4,5,6,7,8,9,...,2861,2862,2863,2864,2865,2866,2867,2868,2869,2870
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
179,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
180,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
181,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
182,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# Step 2: Perform Dimensionality Reduction


(184, 5)

In [None]:
# Step 3: Calculate the Similarity Matrix
from sklearn.metrics.pairwise import cosine_distances



(184, 184)

array([[0.00000000e+00, 4.04511641e-01, 6.56767853e-01, ...,
        4.59136978e-01, 4.41932065e-01, 4.51022069e-01],
       [4.04511641e-01, 0.00000000e+00, 4.72167878e-01, ...,
        6.88382891e-01, 6.77192184e-01, 4.36866662e-01],
       [6.56767853e-01, 4.72167878e-01, 0.00000000e+00, ...,
        3.74260612e-01, 3.70705809e-01, 1.60250978e-01],
       ...,
       [4.59136978e-01, 6.88382891e-01, 3.74260612e-01, ...,
        0.00000000e+00, 2.24800531e-04, 1.25824401e-01],
       [4.41932065e-01, 6.77192184e-01, 3.70705809e-01, ...,
        2.24800531e-04, 0.00000000e+00, 1.22962936e-01],
       [4.51022069e-01, 4.36866662e-01, 1.60250978e-01, ...,
        1.25824401e-01, 1.22962936e-01, 0.00000000e+00]])

In [None]:
le_country.transform(['US'])

array([171])

In [None]:
le_country.inverse_transform([171])

array(['US'], dtype=object)

In [None]:
np.argsort(cosine_distance_matrix_country_user_dr[171,:])[:5]

array([171, 145,  93,  34,  66])

In [None]:
le_country.inverse_transform(np.argsort(cosine_distance_matrix_country_user_dr[171,:])[:10])

array(['US', 'SC', 'LB', 'CM', 'GT', 'JM', 'PR', 'HN', 'HT', 'SL'],
      dtype=object)

In [None]:
df_trips[df_trips.country_code == 'SC']['country']

Unnamed: 0,country
5142,Seychelles
5155,Seychelles
5625,Seychelles
5626,Seychelles
5627,Seychelles
10127,Seychelles
20167,Seychelles
22904,Seychelles
34017,Seychelles
35187,Seychelles


In [None]:
le_country.transform(['DK'])

array([45])

In [None]:
np.argsort(cosine_distance_matrix_country_user_dr[45,:])[:5]

array([ 45,  31,  42, 122,  49])

In [None]:
le_country.inverse_transform(np.argsort(cosine_distance_matrix_country_user_dr[45,:])[:10])

array(['DK', 'CH', 'CZ', 'NO', 'EE', 'ZA', 'VE', 'IE', 'IT', 'PA'],
      dtype=object)

In [None]:
df_trips[df_trips.country_code == 'CH']['country']

Unnamed: 0,country
162,Switzerland
434,Switzerland
519,Switzerland
908,Switzerland
909,Switzerland
...,...
45736,Switzerland
46023,Switzerland
46024,Switzerland
46025,Switzerland
