# **Project Name - Travel Recommendation System**

Project Type - EDA/Classification

Contribution - Individual

Name - Lokesh Todi

# Project Summary -


**Project Summary: Hotel Recommendation System**

This initiative presents the development and implementation of a sophisticated Hotel Recommendation System.

## Objective

The primary objective of this investigation is to construct and deploy a robust recommendation engine capable of generating personalized hotel suggestions through the analysis of historical user preference patterns.

## Development Environment

The project was executed within Google Colab, leveraging cloud-based computational resources to facilitate iterative development and experimentation. This environment provided seamless integration with Python-based machine learning libraries while ensuring reproducibility and collaborative accessibility.

## Technical Implementation

The system architecture demonstrates the practical application of recommendation systems theory within a production-ready deployment context. Streamlit facilitates rapid prototyping of interactive data applications. The integration of ngrok enables seamless transition from development to demonstration phases, supporting real-time user engagement and system validation.

## Significance and Applications

This project contributes to the broader understanding of recommendation system deployment in the travel and tourism context, demonstrating the feasibility of integrating machine learning models with contemporary web application frameworks.

The project serves as a comprehensive case study in end-to-end recommendation system development, from algorithmic implementation to user-facing application deployment.


**Project Outcomes:**

## Primary Achievements

The investigation yielded several significant outcomes that advance both theoretical understanding and practical implementation of recommendation systems within the hospitality domain:

1. Algorithmic Validation and Performance

The research successfully validated the efficacy of collaborative filtering methodologies in generating personalized accommodation recommendations. The approach demonstrated robust performance in identifying latent user preferences and producing contextually relevant suggestions, confirming the theoretical foundations of matrix factorization techniques in hospitality applications.

2. Technical Innovation in Deployment Paradigms

A notable contribution of this work lies in the demonstration of seamless integration between cloud-based development environments and real-time web deployment frameworks. The Streamlit-ngrok integration represents a methodological advancement in rapid prototyping and deployment of machine learning applications, eliminating traditional barriers between model development and user accessibility.

3. Enhanced User Experience Through Interactive Features

The implementation incorporated advanced filtering and sorting functionalities that extend beyond basic recommendation generation. These features demonstrate the practical adaptability of academic recommendation algorithms to real-world user requirements, bridging the gap between theoretical model performance and operational utility.

4. Industry-Academic Knowledge Transfer

This project exemplifies the successful application of machine learning methodologies to address practical challenges in the travel and tourism sector. The project serves as a comprehensive framework for translating collaborative filtering theory into accessible, user-centric applications that can inform industry practices and academic directions.

## Implications for Future Research

The outcomes establish a foundation for subsequent work on recommendation system deployment strategies, particularly on the scalability and generalizability of cloud-based model serving architectures across diverse hospitality contexts.


# **GitHub Link -**

https://github.com/LokeCoder11/CapstonProject1_Travel_ML_System_Productionized

# **Problem Statement**

The contemporary hospitality industry faces a critical challenge in effectively matching diverse user preferences with an increasingly vast inventory of accommodation options. Traditional search and filtering mechanisms prove inadequate in addressing the complexity of personalized recommendation generation, particularly when considering the multidimensional nature of user preferences, historical booking patterns, and contextual travel requirements. Furthermore, the deployment of sophisticated recommendation algorithms often remains confined to development environments, limiting their practical utility and accessibility to end-users.

This research addresses two interconnected challenges:

(1) the development of a robust recommendation engine capable of leveraging collaborative filtering techniques to generate personalized hotel suggestions based on comprehensive analysis of user preferences and historical behavioral data, and

(2) the creation of an accessible, interactive web-based platform that effectively communicates model insights and recommendations through intuitive visualizations and user-friendly interfaces, thereby facilitating seamless data exploration and decision-making processes.
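
As a rough illustration of challenge (1), user-based collaborative filtering can be sketched on a user-by-hotel booking matrix. This is a minimal sketch, not the method used in this notebook: the toy `bookings` frame and the `recommend` helper are hypothetical, and only the `userCode` and `name` column names are borrowed from `hotels.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy bookings that mimic the userCode/name columns of hotels.csv
bookings = pd.DataFrame({
    "userCode": [0, 0, 0, 1, 1, 2, 2, 2],
    "name": ["Hotel A", "Hotel K", "Hotel K", "Hotel K", "Hotel Z",
             "Hotel A", "Hotel A", "Hotel Z"],
})

# User-item matrix: rows are users, columns are hotels, values are booking counts
user_item = pd.crosstab(bookings["userCode"], bookings["name"])


def recommend(user, k=1):
    """Rank hotels the user has not booked by similarity-weighted booking counts."""
    m = user_item.to_numpy(dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit.T                  # cosine similarity between users
    idx = user_item.index.get_loc(user)
    weights = sims[idx].copy()
    weights[idx] = 0.0                    # ignore the user's own row
    scores = weights @ m                  # weighted votes from the other users
    scores[m[idx] > 0] = -np.inf          # mask hotels the user already booked
    order = np.argsort(scores)[::-1][:k]
    return [user_item.columns[i] for i in order if np.isfinite(scores[i])]


print(recommend(0))
```

Cosine similarity on raw booking counts is only one possible choice; a real system would also need to handle users with no booking history.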

# Let's Begin !

## Import libraries

In [None]:
import random
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    precision_recall_curve,
    r2_score,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [None]:
# Load the hotel data from the CSV file
hotel_path = "hotels.csv"
hotel = pd.read_csv(hotel_path)

In [None]:
hotel.head()

| | travelCode | userCode | name | place | days | price | total | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | Hotel A | Florianopolis (SC) | 4 | 313.02 | 1252.08 | 09/26/2019 |
| 1 | 2 | 0 | Hotel K | Salvador (BH) | 2 | 263.41 | 526.82 | 10/10/2019 |
| 2 | 7 | 0 | Hotel K | Salvador (BH) | 3 | 263.41 | 790.23 | 11/14/2019 |
| 3 | 11 | 0 | Hotel K | Salvador (BH) | 4 | 263.41 | 1053.64 | 12/12/2019 |
| 4 | 13 | 0 | Hotel A | Florianopolis (SC) | 1 | 313.02 | 313.02 | 12/26/2019 |


In [None]:
hotel.shape

(40552, 8)

In [None]:
hotel.describe()

| | travelCode | userCode | days | price | total |
|---|---|---|---|---|---|
| count | 40552.0 | 40552.0 | 40552.0 | 40552.0 | 40552.0 |
| mean | 67911.794461 | 666.963726 | 2.499679 | 214.439554 | 536.229513 |
| std | 39408.199333 | 391.136794 | 1.119326 | 76.742305 | 319.331482 |
| min | 0.0 | 0.0 | 1.0 | 60.39 | 60.39 |
| 25% | 33696.75 | 323.0 | 1.0 | 165.99 | 247.62 |
| 50% | 67831.0 | 658.0 | 2.0 | 242.88 | 495.24 |
| 75% | 102211.25 | 1013.0 | 4.0 | 263.41 | 742.86 |
| max | 135942.0 | 1339.0 | 4.0 | 313.02 | 1252.08 |


In [None]:
hotel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40552 entries, 0 to 40551
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   travelCode  40552 non-null  int64  
 1   userCode    40552 non-null  int64  
 2   name        40552 non-null  object 
 3   place       40552 non-null  object 
 4   days        40552 non-null  int64  
 5   price       40552 non-null  float64
 6   total       40552 non-null  float64
 7   date        40552 non-null  object 
dtypes: float64(2), int64(3), object(3)
memory usage: 2.5+ MB


In [None]:
hotel.isnull().sum()

travelCode    0
userCode      0
name          0
place         0
days          0
price         0
total         0
date          0
dtype: int64

In [None]:
# Handling date format inconsistencies
hotel['date'] = pd.to_datetime(hotel['date'], errors='coerce')
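
With `errors='coerce'`, any value that cannot be parsed silently becomes `NaT`, so it can help to state the expected format and count the failures. The sketch below assumes the `MM/DD/YYYY` layout seen in `hotel.head()`; the three-value `dates` series is made up for illustration.

```python
import pandas as pd

# Made-up sample: two valid MM/DD/YYYY strings and one malformed value
dates = pd.Series(["09/26/2019", "10/10/2019", "not a date"])

# An explicit format avoids ambiguous day/month guessing; bad values become NaT
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

# Count how many values failed to parse instead of letting them pass unnoticed
n_bad = int(parsed.isna().sum())
print(f"{n_bad} value(s) could not be parsed")
```

Applied to the real column, a non-zero count would point to rows that need cleaning before any date-based feature is derived.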

In [None]:
# Encode categorical columns
label_encoder_name = LabelEncoder()
hotel['name'] = label_encoder_name.fit_transform(hotel['name'])

label_encoder_place = LabelEncoder()
hotel['place'] = label_encoder_place.fit_transform(hotel['place'])
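
`LabelEncoder` assigns integer codes in the sorted order of the labels it was fitted on, and the fitted encoder is what later maps predicted codes back to strings. A small round trip on three hotel names (an illustrative subset, not the full label set of the dataset) shows the mapping:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative subset of hotel names; the real column has more labels
names = ["Hotel K", "Hotel A", "Hotel Z", "Hotel A"]

encoder = LabelEncoder()
codes = encoder.fit_transform(names)

# classes_ holds the labels in sorted order; the position is the integer code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the codes
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because decoding depends on `classes_`, the same fitted encoder has to be kept for prediction time rather than refitted on a different list of labels.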

In [None]:
# Select features and target variables.
# Note: 'price' and 'total' are features here while 'price' is also a target, and
# the hotel name and place may be recoverable from the price alone, so the scores
# reported for these models can be inflated by target leakage.
X = hotel[['travelCode', 'userCode', 'days', 'price', 'total']]
y_name = hotel['name']
y_place = hotel['place']
y_price = hotel['price']

# Split the features and all three targets in one call so every array shares the same rows
(X_train, X_test,
 y_name_train, y_name_test,
 y_place_train, y_place_test,
 y_price_train, y_price_test) = train_test_split(
    X, y_name, y_place, y_price, test_size=0.2, random_state=42)
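
Since `price` is both a feature and a target, it is worth checking whether a price value already identifies the hotel. The sketch below runs that check on the five rows shown by `hotel.head()`; whether the same one-to-one pattern holds for the full dataset is an assumption that should be verified on `hotel` itself.

```python
import pandas as pd

# The five rows shown by hotel.head(), reduced to the relevant columns
sample = pd.DataFrame({
    "name": ["Hotel A", "Hotel K", "Hotel K", "Hotel K", "Hotel A"],
    "place": ["Florianopolis (SC)", "Salvador (BH)", "Salvador (BH)",
              "Salvador (BH)", "Florianopolis (SC)"],
    "price": [313.02, 263.41, 263.41, 263.41, 313.02],
})

# Number of distinct hotel names observed for each price value
names_per_price = sample.groupby("price")["name"].nunique()

# True when every price maps to exactly one hotel, i.e. price leaks the label
price_identifies_hotel = bool((names_per_price == 1).all())
print(price_identifies_hotel)
```

If the check is also true on the full data, near-perfect classification scores would reflect a lookup on `price` rather than learned user preferences, and dropping `price` and `total` from `X` would give a more honest estimate.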

In [None]:
# Train the model for hotel name prediction
model_name = RandomForestClassifier()
model_name.fit(X_train, y_name_train)

# Train the model for hotel place prediction
model_place = RandomForestClassifier()
model_place.fit(X_train, y_place_train)

# Train the model for hotel price prediction
from sklearn.ensemble import RandomForestRegressor
model_price = RandomForestRegressor()
model_price.fit(X_train, y_price_train)

# Make predictions
y_name_pred = model_name.predict(X_test)
y_place_pred = model_place.predict(X_test)
y_price_pred = model_price.predict(X_test)

# Evaluate the models: classification reports for name and place, mean squared error for price
print("Hotel Name Prediction Report:\n", classification_report(y_name_test, y_name_pred))
print("Hotel Place Prediction Report:\n", classification_report(y_place_test, y_place_pred))
print("Hotel Price Prediction MSE:\n", mean_squared_error(y_price_test, y_price_pred))


Hotel Name Prediction Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       655
           1       1.00      1.00      1.00      1006
           2       1.00      1.00      1.00       894
           3       1.00      1.00      1.00       969
           4       1.00      1.00      1.00       841
           5       1.00      1.00      1.00       896
           6       1.00      1.00      1.00       997
           7       1.00      1.00      1.00      1025
           8       1.00      1.00      1.00       828

    accuracy                           1.00      8111
   macro avg       1.00      1.00      1.00      8111
weighted avg       1.00      1.00      1.00      8111

Hotel Place Prediction Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       828
           1       1.00      1.00      1.00       841
           2       1.00      1.00      1.00       896
           3   
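
The price model is a regressor, so it is scored with error metrics rather than a classification report. The notebook prints only the mean squared error; as a sketch, the other regression metrics imported at the top can be reported together. The `y_true` and `y_pred` arrays below are made-up values, not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up prices, for illustration only
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

mae = mean_absolute_error(y_true, y_pred)   # average absolute error, in price units
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = float(np.sqrt(mse))                  # back in price units
r2 = r2_score(y_true, y_pred)               # share of variance explained

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Replacing the toy arrays with `y_price_test` and `y_price_pred` would give the corresponding figures for `model_price`.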

In [None]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Reuse label_encoder_name and label_encoder_place as fitted on the full dataset
# in the encoding cell above. Refitting them here on a three-label sample would
# leave each encoder with only three classes, so a predicted index such as 3
# would fail the range check further down.

# Example: Making a prediction
sample_data = pd.DataFrame({
    'travelCode': [0],
    'userCode': [0],
    'days': [4],
    'price': [313.02],
    'total': [1252.08]
})

# Assuming model_name, model_place, and model_price are already trained
predicted_name = model_name.predict(sample_data)
predicted_place = model_place.predict(sample_data)
predicted_price = model_price.predict(sample_data)

# Check if the predicted labels are within the range of the fitted labels
print(f"Predicted name index: {predicted_name[0]}")
print(f"Predicted place index: {predicted_place[0]}")

# Ensure the predictions are within the range of labels known to the encoder
if max(predicted_name) < len(label_encoder_name.classes_) and max(predicted_place) < len(label_encoder_place.classes_):
    # Inverse transform to get original labels
    print("Predicted Hotel Name:", label_encoder_name.inverse_transform(predicted_name))
    print("Predicted Hotel Place:", label_encoder_place.inverse_transform(predicted_place))
else:
    print("Error: Predicted labels are out of the known range of the encoder")

print("Predicted Hotel Price:", predicted_price)


Predicted name index: 0
Predicted place index: 3
Error: Predicted labels are out of the known range of the encoder
Predicted Hotel Price: [313.02]


In [None]:
def predict_hotel(travelCode, userCode, days, price, total):
    sample_data = pd.DataFrame({
        'travelCode': [travelCode],
        'userCode': [userCode],
        'days': [days],
        'price': [price],
        'total': [total]
    })

    predicted_name = model_name.predict(sample_data)
    predicted_place = model_place.predict(sample_data)
    predicted_price = model_price.predict(sample_data)

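    # Decode each prediction with its own encoder. Decoding a hotel-name index
    # with an encoder fitted on places returns a place string as the hotel name.
    return {
        'name': label_encoder_name.inverse_transform(predicted_name)[0],
        'place': label_encoder_place.inverse_transform(predicted_place)[0],
        'price': float(predicted_price[0])
    }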

# Example prediction
print(predict_hotel(0, 0, 4, 313.02, 1252.08))


{'name': 'Aracaju (SE)', 'place': 'Florianopolis (SC)', 'price': 313.020000000007}


In [None]:
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


In [None]:
# Save the models and encoders with joblib (relative paths resolve against the current working directory)
joblib.dump(model_name, 'model_name.joblib')
joblib.dump(model_place, 'model_place.joblib')
joblib.dump(model_price, 'model_price.joblib')
joblib.dump(label_encoder_name, 'label_encoder_name.joblib')
joblib.dump(label_encoder_place, 'label_encoder_place.joblib')

['/content/drive/My Drive/ML/Travel_capstone_project/Hotelpredict/label_encoder_place.joblib']
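
A deployed app would load these artifacts back with `joblib.load` and use the restored encoder to decode predictions. The round trip below is a minimal sketch that uses a temporary directory and a three-label encoder as a stand-in for the saved files; the file name matches the one used above, everything else is illustrative.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Stand-in encoder fitted on an illustrative subset of hotel names
encoder = LabelEncoder().fit(["Hotel A", "Hotel K", "Hotel Z"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "label_encoder_name.joblib")
    joblib.dump(encoder, path)       # persist the fitted encoder
    restored = joblib.load(path)     # load it back, as a serving app would

# The restored encoder decodes integer predictions exactly like the original
decoded = restored.inverse_transform([1])[0]
print(decoded)
```

The models are loaded the same way; the scikit-learn version used for loading should match the one used for saving, since pickled estimators are not guaranteed to be compatible across versions.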