<a href="https://colab.research.google.com/github/Ayushx29/Voyage-Analytics-Integrating-MLOps-in-Travel-Productionization-of-ML-Systems/blob/main/Hotel_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name** - Voyage Analytics: Integrating MLOps in Travel Productionization of ML Systems

* **Project Type** - Regression/Classification

* **Contribution** - Individual

* **Name** - Ayush Bhagat

# **Project Summary -**

This project is an end-to-end machine learning application that predicts a hotel's name, location, and price from booking attributes such as travel code, user code, and length of stay. It uses Random Forest models: classifiers for the hotel name and place, and a regressor for the price. The models are deployed as a Flask web application with a styled HTML form and Ngrok integration for public access.

# **GitHub Link -**

https://github.com/Ayushx29/Voyage-Analytics-Integrating-MLOps-in-Travel-Productionization-of-ML-Systems

# **Problem Statement**


This capstone project explores the intersection of data analytics and machine learning in the travel and tourism industry, using datasets on users, flights, and hotels. The goal is to develop predictive models for flight price forecasting, hotel recommendations, and gender classification to enhance travel personalization and decision-making. The project also incorporates MLOps practices such as model deployment, automation, and scalability using Flask, Docker, Kubernetes, Jenkins, Apache Airflow, and MLflow, with the aim of a seamless and efficient machine learning pipeline.



# ***Let's Begin !***

In [43]:
import random
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.decomposition import PCA
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    RandomForestRegressor,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    precision_recall_curve,
    r2_score,
)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [44]:
# Load the hotel data from the CSV file
hotel_path = "/content/drive/MyDrive/AlmaBetter Masters Projects/Voyage Analytics/travel_capstone/hotels.csv"
hotel = pd.read_csv(hotel_path)

In [45]:
hotel.head(10)

Unnamed: 0,travelCode,userCode,name,place,days,price,total,date
0,0,0,Hotel A,Florianopolis (SC),4,313.02,1252.08,09/26/2019
1,2,0,Hotel K,Salvador (BH),2,263.41,526.82,10/10/2019
2,7,0,Hotel K,Salvador (BH),3,263.41,790.23,11/14/2019
3,11,0,Hotel K,Salvador (BH),4,263.41,1053.64,12/12/2019
4,13,0,Hotel A,Florianopolis (SC),1,313.02,313.02,12/26/2019
5,15,0,Hotel BD,Natal (RN),2,242.88,485.76,01/09/2020
6,22,0,Hotel Z,Aracaju (SE),2,208.04,416.08,02/27/2020
7,29,0,Hotel AU,Recife (PE),4,312.83,1251.32,04/16/2020
8,32,0,Hotel AF,Sao Paulo (SP),2,139.1,278.2,05/07/2020
9,33,0,Hotel K,Salvador (BH),4,263.41,1053.64,05/14/2020
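
The preview suggests that `total` is simply `days * price`. A minimal sketch of that check follows, using the first five rows shown above as a stand-in frame (the name `sample` is hypothetical; on the full data the same comparison would be run on `hotel`):

```python
import numpy as np
import pandas as pd

# First five rows copied from the hotel.head(10) preview above.
sample = pd.DataFrame({
    "days":  [4, 2, 3, 4, 1],
    "price": [313.02, 263.41, 263.41, 263.41, 313.02],
    "total": [1252.08, 526.82, 790.23, 1053.64, 313.02],
})

# Compare with a tolerance, since these are floating-point values.
matches = np.isclose(sample["days"] * sample["price"], sample["total"])
print(f"{matches.sum()} of {len(sample)} rows satisfy total == days * price")
```

If the relationship holds for every row of the full dataset, `total` carries no information beyond `days` and `price`, which matters when choosing model features.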


In [46]:
hotel.shape

(40552, 8)

In [47]:
hotel.describe()

Unnamed: 0,travelCode,userCode,days,price,total
count,40552.0,40552.0,40552.0,40552.0,40552.0
mean,67911.794461,666.963726,2.499679,214.439554,536.229513
std,39408.199333,391.136794,1.119326,76.742305,319.331482
min,0.0,0.0,1.0,60.39,60.39
25%,33696.75,323.0,1.0,165.99,247.62
50%,67831.0,658.0,2.0,242.88,495.24
75%,102211.25,1013.0,4.0,263.41,742.86
max,135942.0,1339.0,4.0,313.02,1252.08


In [48]:
hotel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40552 entries, 0 to 40551
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   travelCode  40552 non-null  int64  
 1   userCode    40552 non-null  int64  
 2   name        40552 non-null  object 
 3   place       40552 non-null  object 
 4   days        40552 non-null  int64  
 5   price       40552 non-null  float64
 6   total       40552 non-null  float64
 7   date        40552 non-null  object 
dtypes: float64(2), int64(3), object(3)
memory usage: 2.5+ MB


In [49]:
hotel.isnull().sum()

Unnamed: 0,0
travelCode,0
userCode,0
name,0
place,0
days,0
price,0
total,0
date,0


In [50]:
# Parse the date strings into datetimes; values that cannot be parsed become NaT
hotel['date'] = pd.to_datetime(hotel['date'], errors='coerce')
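
Because `errors='coerce'` silently turns bad values into `NaT`, it can help to count how many rows failed to parse. The sketch below assumes the `MM/DD/YYYY` layout seen in the preview and adds one deliberately malformed, hypothetical value to show the behavior:

```python
import pandas as pd

# Two dates from the preview plus one hypothetical malformed value.
raw = pd.Series(["09/26/2019", "10/10/2019", "not a date"])

# An explicit format avoids ambiguity between day-first and month-first dates.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Values that could not be parsed are NaT instead of raising an error.
n_failed = int(parsed.isna().sum())
print(parsed)
print("unparseable rows:", n_failed)
```

On the notebook's data, the equivalent check would be `hotel['date'].isna().sum()` right after the conversion.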

In [51]:
# Encode categorical columns
label_encoder_name = LabelEncoder()
hotel['name'] = label_encoder_name.fit_transform(hotel['name'])

label_encoder_place = LabelEncoder()
hotel['place'] = label_encoder_place.fit_transform(hotel['place'])
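
To read the classification reports later on, it is useful to know which integer stands for which hotel. The following is a small sketch of how `LabelEncoder` assigns codes, using only the hotel names visible in the preview (not necessarily the full set of classes in the dataset):

```python
from sklearn.preprocessing import LabelEncoder

# Hotel names taken from the preview above.
names = ["Hotel A", "Hotel K", "Hotel BD", "Hotel Z", "Hotel AU", "Hotel AF"]

encoder = LabelEncoder()
codes = encoder.fit_transform(names)

# classes_ is sorted, and each label's code is its position in classes_.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns codes back into the original labels.
decoded = list(encoder.inverse_transform(codes))
print(decoded)
```

The same dictionary can be built from `label_encoder_name.classes_` and `label_encoder_place.classes_` in the notebook.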

In [52]:
# Selecting features and target variables
# Note: 'price' is used both as a feature and as the regression target, and
# 'total' appears to equal days * price, so the scores below largely reflect
# this overlap rather than genuine predictive power.
X = hotel[['travelCode', 'userCode', 'days', 'price', 'total']]
y_name = hotel['name']
y_place = hotel['place']
y_price = hotel['price']

# Train-test split: one call keeps the features and all three targets aligned
(X_train, X_test,
 y_name_train, y_name_test,
 y_place_train, y_place_test,
 y_price_train, y_price_test) = train_test_split(
    X, y_name, y_place, y_price, test_size=0.2, random_state=42
)
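
Before training, it is worth checking whether any feature already determines a target, since that would make the evaluation trivially perfect. The sketch below uses a hypothetical helper, `is_determined_by`, on a few rows copied from the preview; whether the same pattern holds across the full dataset is an assumption that would need to be verified on `hotel` before label encoding or with the encoded columns:

```python
import pandas as pd

def is_determined_by(df, feature, target):
    """Return True if every value of `feature` maps to exactly one `target`."""
    return bool((df.groupby(feature)[target].nunique() == 1).all())

# Rows copied from the hotel.head(10) preview above.
sample = pd.DataFrame({
    "name":  ["Hotel A", "Hotel K", "Hotel K", "Hotel BD", "Hotel Z"],
    "place": ["Florianopolis (SC)", "Salvador (BH)", "Salvador (BH)",
              "Natal (RN)", "Aracaju (SE)"],
    "days":  [4, 2, 3, 2, 2],
    "price": [313.02, 263.41, 263.41, 242.88, 208.04],
})

print("price -> name :", is_determined_by(sample, "price", "name"))
print("price -> place:", is_determined_by(sample, "price", "place"))
print("days  -> name :", is_determined_by(sample, "days", "name"))
```

A feature that determines the target is usually dropped from `X`, or the task is reframed, so that the reported metrics measure something the model actually had to learn.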

In [53]:
# Train models (fixed random_state so repeated runs build the same forests)
model_name = RandomForestClassifier(random_state=42)
model_name.fit(X_train, y_name_train)

model_place = RandomForestClassifier(random_state=42)
model_place.fit(X_train, y_place_train)

model_price = RandomForestRegressor(random_state=42)
model_price.fit(X_train, y_price_train)

# Predictions
y_name_pred = model_name.predict(X_test)
y_place_pred = model_place.predict(X_test)
y_price_pred = model_price.predict(X_test)

# Evaluation
print("Hotel Name Prediction Report:\n", classification_report(y_name_test, y_name_pred))
print("Hotel Place Prediction Report:\n", classification_report(y_place_test, y_place_pred))
print("Hotel Price Prediction MSE:\n", mean_squared_error(y_price_test, y_price_pred))

Hotel Name Prediction Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       655
           1       1.00      1.00      1.00      1006
           2       1.00      1.00      1.00       894
           3       1.00      1.00      1.00       969
           4       1.00      1.00      1.00       841
           5       1.00      1.00      1.00       896
           6       1.00      1.00      1.00       997
           7       1.00      1.00      1.00      1025
           8       1.00      1.00      1.00       828

    accuracy                           1.00      8111
   macro avg       1.00      1.00      1.00      8111
weighted avg       1.00      1.00      1.00      8111

Hotel Place Prediction Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       828
           1       1.00      1.00      1.00       841
           2       1.00      1.00      1.00       896
           3   
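
The evaluation cell reports only MSE for the price model, although `mean_absolute_error` and `r2_score` are imported. A short sketch of how the three metrics relate is given here, on hypothetical values chosen to keep the arithmetic easy to follow (they are not results from this notebook):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical values, not taken from the notebook's models.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 300.0])

mse = mean_squared_error(y_true, y_pred)
rmse = float(np.sqrt(mse))  # same units as the price column
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

In the notebook, the same calls would take `y_price_test` and `y_price_pred` as arguments.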

In [54]:
# Prediction function
def predict_hotel(travelCode, userCode, days, price, total):
    sample_data = pd.DataFrame({
        'travelCode': [travelCode],
        'userCode': [userCode],
        'days': [days],
        'price': [price],
        'total': [total]
    })

    predicted_name = model_name.predict(sample_data)
    predicted_place = model_place.predict(sample_data)
    predicted_price = model_price.predict(sample_data)

    return {
        'name': label_encoder_name.inverse_transform(predicted_name)[0],
        'place': label_encoder_place.inverse_transform(predicted_place)[0],
        'price': round(float(predicted_price[0]), 2)
    }

# Example Prediction
print(predict_hotel(0, 0, 4, 313.02, 1252.08))

{'name': 'Hotel A', 'place': 'Florianopolis (SC)', 'price': 313.02}
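
`predict_hotel` returns only the top class. If a confidence estimate is also wanted, `predict_proba` exposes the per-class probabilities of a fitted `RandomForestClassifier`. The sketch below trains a tiny stand-in classifier on preview rows (the names `X_small`, `y_small`, and `clf` are hypothetical and the feature set is reduced for brevity):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative training set built from the preview rows above.
X_small = pd.DataFrame({
    "days":  [4, 2, 3, 4, 1, 2],
    "price": [313.02, 263.41, 263.41, 263.41, 313.02, 242.88],
})
y_small = ["Hotel A", "Hotel K", "Hotel K", "Hotel K", "Hotel A", "Hotel BD"]

clf = RandomForestClassifier(n_estimators=25, random_state=42)
clf.fit(X_small, y_small)

query = pd.DataFrame({"days": [4], "price": [313.02]})
proba = clf.predict_proba(query)[0]

# Pair each class label with its estimated probability.
for label, p in zip(clf.classes_, proba):
    print(f"{label}: {p:.2f}")
```

With the notebook's models, the equivalent call would be `model_name.predict_proba(sample_data)`, with the columns mapped back to names through `label_encoder_name.classes_`.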


In [56]:
# joblib is the only new dependency here; the other names were imported earlier
import joblib

In [57]:
# Save models and encoders
joblib.dump(model_name, 'model_name.joblib')
joblib.dump(model_place, 'model_place.joblib')
joblib.dump(model_price, 'model_price.joblib')
joblib.dump(label_encoder_name, 'label_encoder_name.joblib')
joblib.dump(label_encoder_place, 'label_encoder_place.joblib')

['label_encoder_place.joblib']
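
A serving app would need to load these files back before making predictions. The following sketch shows the dump and load round trip with a stand-in encoder written to a temporary directory; in the notebook the same `joblib.load` call would be applied to the five `.joblib` files saved above:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Stand-in encoder; in the notebook this would be label_encoder_name.
encoder = LabelEncoder().fit(["Hotel A", "Hotel K", "Hotel Z"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "label_encoder_name.joblib")
    joblib.dump(encoder, path)
    restored = joblib.load(path)

# The restored object should decode codes exactly like the original.
print(restored.inverse_transform([0, 2]))
```

Loading works most reliably when the scikit-learn version at load time matches the one used for saving, so pinning that version in the deployment environment is a reasonable precaution.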

# **Streamlit App -**