<a href="https://colab.research.google.com/github/Sugam1530/Productionization-of-ML-Systems/blob/main/Travel_Recommandation_System_Streamlit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Travel Recommendation System



##### **Project Type**    - EDA/Classification
##### **Contribution**    - Individual


# **Project Summary -**

### Project Summary: Hotel Recommendation System

This project focuses on building a **Hotel Recommendation System** using collaborative filtering techniques and deploying it as an interactive web application through **Streamlit** and **ngrok**. The project was developed in **Google Colab** and leverages **Singular Value Decomposition (SVD)** from the **Surprise** library to generate personalized hotel recommendations for users based on historical data.

#### Key Features:
1. **Dataset Preparation:**
   - The dataset used contains columns such as `userCode`, `name`, `place`, `days`, `price`, `total`, `day_of_week`, and `month`.
   - Basic **Exploratory Data Analysis (EDA)** and feature engineering were performed to clean and prepare the data for model training.

2. **Collaborative Filtering with SVD:**
   - **SVD** (Singular Value Decomposition), a popular collaborative filtering algorithm, was used to generate personalized hotel recommendations.
   - The model was trained to predict user preferences for different hotels based on past interactions, using `userCode` as the user ID, `name` (the hotel) as the item ID, and `price` as the rating signal.

3. **Streamlit Web Application:**
   - The recommendation system was integrated into a **Streamlit** web application for easy interaction and visualization.
   - Users can input their `userCode` and select a destination `place` to get hotel recommendations.

4. **Filtering and Sorting Options:**
   - Additional functionality was added to allow users to filter hotels by price range and sort the recommendations by price (either from low to high or high to low).
   - This enhanced the user experience by providing more control over the results.

5. **Deployment with ngrok:**
   - The web application was deployed using **ngrok**, which exposes the local Streamlit app to the internet, making it accessible via a public URL.
   - This approach allowed seamless deployment and testing directly from Google Colab.

6. **User-Friendly Interface:**
   - The app provides a simple and intuitive interface for selecting destination places, entering user codes, and applying filters.
   - Results are displayed in a clear and concise format, allowing users to easily find relevant hotels based on their preferences.
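
To make the SVD step in the list above concrete, here is a minimal sketch of the prediction rule that Surprise documents for its `SVD` algorithm: global mean plus user bias plus item bias plus the dot product of the latent factor vectors, clipped to the rating scale. The function name `svd_estimate` and all numbers are hypothetical and chosen only for illustration; they are not learned from the hotels data.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors,
                 rating_scale=(1.0, 5.0)):
    """Estimate a rating as mu + b_u + b_i + q_i . p_u, clipped to the scale."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + dot
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical values, chosen only to illustrate the arithmetic
estimate = svd_estimate(
    global_mean=3.5, user_bias=0.25, item_bias=-0.5,
    user_factors=[0.5, -1.0], item_factors=[1.0, 0.5],
)
print(estimate)  # 3.5 + 0.25 - 0.5 + (0.5*1.0 + -1.0*0.5) = 3.25
```

Because the result is clipped, a raw estimate far above the scale (for example a price-sized value of 300 on a 1 to 5 scale) is returned as 5.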

#### Project Outcomes:
- The project successfully implemented a **collaborative filtering-based recommendation system** that provides personalized hotel recommendations.
- By integrating **Streamlit** with **ngrok**, the project demonstrates a practical approach to deploying data science models as interactive web applications directly from Google Colab.
- The addition of filtering and sorting features further enhances the utility of the system, making it adaptable to real-world scenarios.
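
The price filter and sort options described above can be sketched without any dependencies. This is an illustrative stand-in for the pandas logic in the app, not the app's own code: `filter_and_sort` is a hypothetical helper, the hotel names are made up, and the option strings mirror the ones used in the Streamlit select box.

```python
def filter_and_sort(hotels, min_price, max_price, sort_option="None"):
    """Keep hotels inside [min_price, max_price] and optionally sort by price."""
    kept = [h for h in hotels if min_price <= h["price"] <= max_price]
    if sort_option == "Low to High":
        kept.sort(key=lambda h: h["price"])
    elif sort_option == "High to Low":
        kept.sort(key=lambda h: h["price"], reverse=True)
    return kept

# Hypothetical rows; the prices fall inside the dataset's observed price range
hotels = [
    {"name": "Hotel A", "price": 313.02},
    {"name": "Hotel B", "price": 60.39},
    {"name": "Hotel C", "price": 242.88},
]
cheap_first = filter_and_sort(hotels, 50, 250, "Low to High")
print([h["name"] for h in cheap_first])  # ['Hotel B', 'Hotel C']
```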

This project showcases the application of machine learning techniques to the travel and tourism industry, offering personalized recommendations in an accessible and user-friendly format.

# **GitHub Link -**

https://github.com/Sugam1530/Productionization-of-ML-Systems

# **Problem Statement**


**Build a recommendation model to provide hotel suggestions based on user preferences and historical data. Develop a Streamlit web application to display insights and visualizations derived from the deployed travel recommendation model, offering an interactive and user-friendly interface for data exploration.**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure the following format is answered for each algorithm.


*   Explain the ML model used and its performance using an evaluation metric score chart.


*   Cross-validation & hyperparameter tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain what each evaluation metric indicates for the business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
!pip install scikit-surprise



In [None]:
# Import Libraries
from google.colab import drive
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
import joblib
import pickle
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from flask import Flask, request, jsonify
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from surprise import Dataset, Reader, SVD, KNNBasic, SVDpp
from surprise.model_selection import cross_validate
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
file_path = '/content/drive/MyDrive/Colab Notebooks/travel_capstone/hotels.csv'

In [None]:
hotels_df = pd.read_csv(file_path)

### Dataset First View

In [None]:
hotels_df.tail()

Unnamed: 0,userCode,name,place,days,price,total,day_of_week,month
40547,1339,4,1,3,247.62,742.86,3,6
40548,1339,4,1,1,247.62,247.62,3,6
40549,1339,5,2,3,60.39,181.17,3,7
40550,1339,5,2,3,60.39,181.17,3,7
40551,1339,3,4,4,242.88,971.52,3,7


In [None]:
hotels_df.nunique()

Unnamed: 0,0
travelCode,40552
userCode,1310
name,9
place,9
days,4
price,9
total,36
date,199


### Dataset Rows & Columns count

In [None]:
hotels_df.shape

(40552, 8)

### Dataset Information

In [None]:
hotels_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40552 entries, 0 to 40551
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   travelCode  40552 non-null  int64  
 1   userCode    40552 non-null  int64  
 2   name        40552 non-null  object 
 3   place       40552 non-null  object 
 4   days        40552 non-null  int64  
 5   price       40552 non-null  float64
 6   total       40552 non-null  float64
 7   date        40552 non-null  object 
dtypes: float64(2), int64(3), object(3)
memory usage: 2.5+ MB


#### Duplicate Values

In [None]:
hotels_df.duplicated().sum()

0

#### Missing Values/Null Values

In [None]:
hotels_df.isnull().sum()

Unnamed: 0,0
travelCode,0
userCode,0
name,0
place,0
days,0
price,0
total,0
date,0


### What did you know about your dataset?

The hotels dataset contains 40,552 rows and 8 columns. It has no null values and no duplicate rows, so it can be used for modeling without imputation or de-duplication.

## ***2. Understanding Your Variables***

In [None]:
hotels_df.columns

Index(['travelCode', 'userCode', 'name', 'place', 'days', 'price', 'total',
       'date'],
      dtype='object')

In [None]:
hotels_df.describe()

Unnamed: 0,travelCode,userCode,days,price,total
count,40552.0,40552.0,40552.0,40552.0,40552.0
mean,67911.794461,666.963726,2.499679,214.439554,536.229513
std,39408.199333,391.136794,1.119326,76.742305,319.331482
min,0.0,0.0,1.0,60.39,60.39
25%,33696.75,323.0,1.0,165.99,247.62
50%,67831.0,658.0,2.0,242.88,495.24
75%,102211.25,1013.0,4.0,263.41,742.86
max,135942.0,1339.0,4.0,313.02,1252.08


### Check Unique Values for each variable.

In [None]:
hotels_df.nunique()

Unnamed: 0,0
travelCode,40552
userCode,1310
name,9
place,9
days,4
price,9
total,36
date,199


In [None]:
label_encoders = {}
for column in ['name', 'place']:
    le = LabelEncoder()
    hotels_df[column] = le.fit_transform(hotels_df[column])
    label_encoders[column] = le
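
scikit-learn's `LabelEncoder` assigns integer codes to the distinct labels in sorted order, and the `label_encoders` dictionary kept above is what allows the codes to be mapped back to hotel and place names later. The dependency-free sketch below mimics that behavior; `fit_label_codes` and the hotel names are hypothetical and used only for illustration.

```python
def fit_label_codes(values):
    """Map each distinct label to an integer code in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}

names = ["Hotel K", "Hotel A", "Hotel K", "Hotel BD"]  # hypothetical hotel names
codes = fit_label_codes(names)
encoded = [codes[n] for n in names]

# Reverse mapping, the plain-Python counterpart of inverse_transform
decoded = {code: label for label, code in codes.items()}

print(encoded)              # [2, 0, 2, 1]
print(decoded[encoded[0]])  # Hotel K
```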

In [None]:
hotels_df['date'] = pd.to_datetime(hotels_df['date'])
hotels_df['day_of_week'] = hotels_df['date'].dt.dayofweek
hotels_df['month'] = hotels_df['date'].dt.month
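
As a quick check on what the two derived columns mean, the sketch below computes the same features for a single date with the standard library. `dt.dayofweek` in pandas counts Monday as 0, which matches `date.weekday()`. The helper name `date_features` and the example date are assumptions made for illustration only.

```python
from datetime import date

def date_features(d):
    """Return (day_of_week, month) with Monday=0, matching pandas' dt.dayofweek."""
    return d.weekday(), d.month

print(date_features(date(2019, 9, 26)))  # (3, 9): a Thursday in September
```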

In [None]:
hotels_df = hotels_df.drop(columns=['date'])

In [None]:
hotels_df = hotels_df.drop(columns=['travelCode'])
# hotels_df = hotels_df.drop(columns=['userCode'])

In [None]:
hotels_df.head()

Unnamed: 0,userCode,name,place,days,price,total,day_of_week,month
0,0,0,3,4,313.02,1252.08,3,9
1,0,7,7,2,263.41,526.82,3,10
2,0,7,7,3,263.41,790.23,3,11
3,0,7,7,4,263.41,1053.64,3,12
4,0,0,3,1,313.02,313.02,3,12


In [None]:
train_df, test_df = train_test_split(hotels_df, test_size=0.2, random_state=42)

In [None]:
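# `price` is used as the rating signal, so the rating scale must span the observed prices.
# With rating_scale=(1, 5), Surprise clips every prediction to 5, which is what produces RMSE values near 223.
reader = Reader(rating_scale=(hotels_df['price'].min(), hotels_df['price'].max()))
data = Dataset.load_from_df(hotels_df[['userCode', 'name', 'price']], reader)
data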

<surprise.dataset.DatasetAutoFolds at 0x78df5ade2110>

In [None]:
# Build and evaluate the SVD model
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

# Train the model on the entire dataset
trainset = data.build_full_trainset()
algo.fit(trainset)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1    Fold 2    Fold 3    Fold 4    Fold 5    Mean      Std
RMSE (testset)    223.4350  223.3715  222.9544  222.2830  223.2361  223.0560  0.4204
MAE (testset)     209.7316  209.7747  209.7826  208.5395  209.3693  209.4395  0.4755
Fit time          1.15      1.81      1.59      1.12      1.29      1.39      0.27
Test time         0.25      0.32      0.07      0.17      0.09      0.18      0.09


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x78df5ade2500>
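
The scores recorded above (mean RMSE 223.0560, mean MAE 209.4395) match what one would expect if every prediction were clipped to 5, the top of a `rating_scale=(1, 5)`, while the `price` target ranges from 60.39 to 313.02. The check below reproduces both numbers from the `describe()` statistics alone. It assumes a constant prediction of 5 and uses the sample standard deviation as a close stand-in for the population value, which is a negligible difference at 40,552 rows.

```python
import math

# Summary statistics of `price`, copied from the describe() output
price_mean = 214.439554
price_std = 76.742305
price_min = 60.39
upper_bound = 5  # top of rating_scale=(1, 5)

# If every prediction is clipped to 5 and every price exceeds 5,
# the absolute error for each row is simply price - 5.
assert price_min > upper_bound
mae = price_mean - upper_bound

# Mean squared error = (mean error)^2 + variance of the target
rmse = math.sqrt((price_mean - upper_bound) ** 2 + price_std ** 2)

print(round(mae, 2), round(rmse, 2))  # 209.44 223.06
```

In other words, these scores describe the distance between the prices and the constant 5 rather than the quality of the learned factors.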

In [None]:
with open('model.pkl', 'wb') as f:
    pickle.dump(algo, f)

In [None]:
from google.colab import files

# Download the model.pkl file
files.download('model.pkl')



In [None]:
!pip install pyngrok streamlit


In [None]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip


--2024-09-08 18:06:51--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 54.237.133.81, 52.202.168.65, 54.161.241.46, ...
Connecting to bin.equinox.io (bin.equinox.io)|54.237.133.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13921656 (13M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip’


2024-09-08 18:06:51 (58.2 MB/s) - ‘ngrok-stable-linux-amd64.zip’ saved [13921656/13921656]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


In [None]:
# Register the ngrok auth token; read it from the NGROK_AUTH_TOKEN environment variable rather than hard-coding it
!./ngrok authtoken "$NGROK_AUTH_TOKEN"

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


In [None]:
# Start the Streamlit app in the background on port 8501 (app.py must already exist)
!nohup streamlit run --server.port 8501 app.py > output.log &


nohup: redirecting stderr to stdout


In [None]:
# Remove any existing ngrok installations
!rm -f /usr/local/bin/ngrok
!pip uninstall -y pyngrok ngrok


Found existing installation: pyngrok 7.2.0
Uninstalling pyngrok-7.2.0:
  Successfully uninstalled pyngrok-7.2.0

In [None]:
# Reinstall pyngrok
!pip install pyngrok

# Download the latest version of ngrok directly
!wget -q -O ngrok.zip https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.zip
!unzip -o ngrok.zip -d /usr/local/bin/


Collecting pyngrok
  Using cached pyngrok-7.2.0-py3-none-any.whl.metadata (7.4 kB)
Using cached pyngrok-7.2.0-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.0
Archive:  ngrok.zip
  inflating: /usr/local/bin/ngrok    


In [None]:
# Download the latest version of ngrok
!wget -q -O ngrok.zip https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.zip

# Unzip the downloaded file
!unzip -o ngrok.zip

# Move ngrok to /usr/local/bin to make it executable from anywhere
!mv ngrok /usr/local/bin/ngrok

# Verify the installation
!ngrok version


Archive:  ngrok.zip
  inflating: ngrok                   
ngrok version 3.15.1


In [None]:
from pyngrok import ngrok

import os

# Read the ngrok auth token from the environment instead of hard-coding it in the notebook
NGROK_AUTH_TOKEN = os.environ['NGROK_AUTH_TOKEN']
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Start ngrok tunnel to expose port 8501
public_url = ngrok.connect(8501)
print('Public URL:', public_url)

streamlit_script = f"""
import streamlit as st
import pandas as pd
import pickle

# Load the SVD model (it was saved earlier with pickle.dump)
with open('model.pkl', 'rb') as f:
    svd_model = pickle.load(f)

# Load the dataset
hotels_df = pd.read_csv('{file_path}')

# Define your recommendation function with filters
def get_recommendations(user_code, place, model, data, min_price, max_price, sort_option):
    # Filter hotels based on the input place and price range
    place_filtered_df = data[(data['place'].str.lower() == place.lower()) &
                             (data['price'] >= min_price) & (data['price'] <= max_price)]

    if place_filtered_df.empty:
        return pd.DataFrame()  # Return an empty DataFrame if no hotels are found for the place

    # The model was trained with label-encoded hotel names as item IDs.
    # LabelEncoder assigns codes in sorted order, so rebuild that mapping from the full dataset.
    all_names = sorted(data['name'].unique())
    hotel_names = place_filtered_df['name'].unique()

    predictions = []
    for hotel_name in hotel_names:
        # Predict the rating for each hotel using the model
        prediction = model.predict(uid=user_code, iid=all_names.index(hotel_name))
        predictions.append((hotel_name, prediction.est))  # 'est' is the estimated rating

    # Sort the hotels by predicted rating
    top_predictions = sorted(predictions, key=lambda x: x[1], reverse=True)[:5]  # Top 5 recommendations

    # Get the top hotels' details
    top_hotel_names = [pred[0] for pred in top_predictions]
    top_hotels = place_filtered_df[place_filtered_df['name'].isin(top_hotel_names)]

    # Apply sorting based on the selected option
    if sort_option == "Low to High":
        top_hotels = top_hotels.sort_values(by='price', ascending=True)
    elif sort_option == "High to Low":
        top_hotels = top_hotels.sort_values(by='price', ascending=False)

    # Define the columns we want to display, checking if they exist
    columns_to_display = ['name', 'place', 'price', 'days', 'total']

    if 'day_of_week' in top_hotels.columns:
        columns_to_display.append('day_of_week')
    if 'month' in top_hotels.columns:
        columns_to_display.append('month')

    # Return the selected columns for display
    return top_hotels[columns_to_display]

# Streamlit app interface
st.title("Hotel Recommendation System")

# Create a dropdown for the available places
available_places = hotels_df['place'].unique().tolist()
place = st.selectbox("Select Destination Place:", available_places)

# Create user input for recommendations (user codes in the dataset start at 0)
user_code = st.number_input("Enter User Code:", min_value=0, max_value=int(hotels_df['userCode'].max()))

# Price range filter (floats, so the default maximum is not truncated below the highest price)
min_price = st.number_input("Minimum Price:", min_value=0.0, value=float(hotels_df['price'].min()))
max_price = st.number_input("Maximum Price:", min_value=0.0, value=float(hotels_df['price'].max()))

# Sorting option
sort_option = st.selectbox("Sort by Price:", ["None", "Low to High", "High to Low"])

if st.button("Recommend Hotels"):
    recommendations = get_recommendations(user_code, place, svd_model, hotels_df, min_price, max_price, sort_option)
    if recommendations.empty:
        st.write("No recommendations found for the specified place.")
    else:
        st.write("Recommended Hotels:")
        st.write(recommendations)
"""

with open('app.py', 'w') as f:
    f.write(streamlit_script)

# Run your Streamlit app on port 8501
!streamlit run app.py --server.port 8501 &

# Keep the cell running to keep the tunnel alive
import time
while True:
    time.sleep(100)




Public URL: NgrokTunnel: "https://84ca-34-138-152-109.ngrok-free.app" -> "http://localhost:8501"

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.138.152.109:8501
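
Because Streamlit is started in the background, the tunnel can come up before the app is listening. One way to guard against that is to poll the port first. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, not part of pyngrok or Streamlit, and the timeout values are arbitrary assumptions.

```python
import socket
import time

def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# Usage sketch: only open the tunnel once Streamlit is listening
# if wait_for_port(8501):
#     public_url = ngrok.connect(8501)
```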


# **Conclusion**

The Hotel Recommendation System project demonstrates how collaborative filtering can be used to provide personalized hotel recommendations. Using Singular Value Decomposition (SVD), the system generates predictions from users' past interactions with hotels. The recorded 5-fold cross-validation reports a mean RMSE of about 223.06 and a mean MAE of about 209.44 on the `price` target, so prediction accuracy still has clear room for improvement.

The integration of the recommendation model into a Streamlit web application, deployed via ngrok, showcases a practical approach to building and deploying data-driven applications in a cloud-based environment like Google Colab. The user-friendly interface, combined with the added features of filtering and sorting, makes the application versatile and adaptable to different user needs.

This project highlights the importance of combining machine learning models with intuitive front-end interfaces to create powerful, accessible tools that can enhance user experiences. Overall, the project is a valuable demonstration of deploying real-world recommendation systems in the travel and tourism industry, with potential applications extending to various domains where personalized recommendations are key to user satisfaction.