# **Project Name**    - Hotel Booking Analysis 



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - Akash V

# **Project Summary -**

The hotel booking dataset provides valuable information for analyzing hotel bookings in terms of when to book a hotel room, the optimal length of stay, and whether or not a hotel is likely to receive a disproportionately high number of special requests. This data set includes booking information for a city hotel and a resort hotel and includes data such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces.

The data set is useful for anyone interested in the hotel industry, including hotel managers, marketing teams, and data analysts. By analyzing the data, it is possible to gain insights into factors that govern hotel bookings, which can help inform business decisions, such as pricing strategies and marketing campaigns.

One important factor to consider is the timing of bookings. The data set can be used to analyze trends in booking behavior, such as the best time of year to book a hotel room. For example, it may be found that booking a hotel room in the off-season results in lower rates, while booking during peak season results in higher rates. Additionally, the data can be used to analyze the optimal length of stay in order to get the best daily rate. This information can help inform pricing strategies and can be used to attract guests who are looking for value for their money.

Another important factor to consider is the type of guest who is making the booking. The data set includes information on the number of adults, children, and/or babies, as well as the number of available parking spaces. This information can be used to analyze the types of guests who are most likely to book a hotel room, and can help inform marketing campaigns and promotions that target specific types of guests.

Finally, the data set can be used to analyze the likelihood of a hotel receiving a disproportionately high number of special requests. Special requests can include anything from room upgrades to special accommodations for guests with disabilities. By analyzing the data, it may be possible to identify trends in special request behavior, which can help inform hotel policies and procedures.

In conclusion, the hotel booking dataset is a valuable resource for anyone interested in the hotel industry. By analyzing the data, it is possible to gain insights into factors that govern hotel bookings, which can help inform business decisions, such as pricing strategies and marketing campaigns. Ultimately, the data set can be used to improve the guest experience and increase revenue for hotels.

# **GitHub Link -**

 GitHub Link

 
 https://github.com/Akash1141/Python-Projects-EDA.git

# **Problem Statement**


The problem statement for the hotel booking dataset is to analyze the data to discover key factors that affect hotel bookings, such as the best time of year to book a room, the optimal length of stay, and the likelihood of a hotel receiving special requests. The goal is to gain insights that can inform pricing strategies and marketing campaigns to improve the guest experience and increase revenue for hotels.


#### **Define Your Business Objective?**

The objectives for the hotel booking dataset are:

*   To identify patterns and trends within the data to gain insights into important factors that influence hotel bookings, such as booking timing, length of stay, and special requests.

*   To use these insights to inform business decisions, such as pricing strategies and marketing campaigns.

*   To improve the guest experience and increase revenue for hotels.

*   Ultimately, to optimize hotel bookings and increase profitability for hotel businesses.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
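
The UBM hint above can be sketched with pandas summaries, each of which is the kind of table a chart would be drawn from. This is only an illustration under stated assumptions: the toy DataFrame and its values are invented, and only the column names (`hotel`, `market_segment`, `adr`) come from the dataset.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "market_segment": ["Online TA", "Direct", "Online TA", "Direct"],
    "adr": [100.0, 80.0, 120.0, 60.0],
})

# U - Univariate: distribution of a single variable.
univariate = toy["hotel"].value_counts()

# B - Bivariate (Numerical - Categorical): mean adr per hotel.
bivariate = toy.groupby("hotel")["adr"].mean()

# M - Multivariate: mean adr broken down by hotel and market segment.
multivariate = toy.pivot_table(index="hotel", columns="market_segment",
                               values="adr", aggfunc="mean")

print(univariate, bivariate, multivariate, sep="\n\n")
```

Each summary maps naturally onto a chart type, for example a count plot for the univariate table and a grouped bar chart or heatmap for the multivariate one.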





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing all the required Libraries

import numpy as np # This imports the NumPy library and gives it the alias "np" for convenience.
import pandas as pd # This imports the Pandas library and gives it the alias "pd" for convenience.
import math # This imports Python's standard math module (the numpy.math alias was removed in NumPy 2.0), which provides mathematical functions and constants like pi and e.
from numpy import loadtxt # This imports the loadtxt function from NumPy, which can be used to load data from text files.
from sklearn.model_selection import train_test_split # this is used to split a dataset into training and testing subsets
from sklearn.model_selection import GridSearchCV # This is used to perform hyperparameter tuning by exhaustively searching over specified parameter values for an estimator.
import seaborn as sns # This imports the Seaborn library, which provides high-level interface for creating attractive statistical graphics.  

%matplotlib inline 
# This allows plots to be displayed in the notebook itself.
import matplotlib.pyplot as plt
sns.set_style("whitegrid",{'grid.linestyle': '--'}) # This sets the plotting style of Seaborn library to "whitegrid" with dashed grid lines.
import warnings #  This line imports the warnings module, which provides a way to control warning messages.
warnings.filterwarnings("ignore") # This line sets the warning filter to "ignore", which suppresses all warning messages.


### Dataset Loading

In [None]:
data = "https://raw.githubusercontent.com/Akash1141/Python-Projects-EDA/main/Hotel%20Bookings.csv"

In [None]:
# Loading the dataset
df = pd.read_csv(data, encoding = "ISO-8859-1") # The CSV file may contain characters that are not valid UTF-8, which is the default encoding used by pandas.
# Passing encoding = "ISO-8859-1" when reading the file with pandas ensures that the text is decoded without errors and can be read correctly by the program.
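
Since the guidelines ask for exception handling, the loading step could be wrapped as in the following sketch. `load_bookings` is a hypothetical helper, not part of the notebook, and the temporary CSV with invented rows stands in for the real file so the example runs without network access.

```python
import os
import tempfile

import pandas as pd


def load_bookings(source, encoding="ISO-8859-1"):
    """Read a CSV into a DataFrame; return None if it cannot be read.

    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_csv(source, encoding=encoding)
    except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        # OSError covers missing files and failed URL requests.
        print(f"Could not load {source!r}: {exc}")
        return None


# Demonstration on a small temporary file with invented rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_bookings.csv")
    with open(path, "w", encoding="ISO-8859-1") as fh:
        fh.write("hotel,is_canceled\nCity Hotel,0\nResort Hotel,1\n")

    loaded = load_bookings(path)
    missing = load_bookings(os.path.join(tmp, "does_not_exist.csv"))
```

Returning `None` is one possible design; re-raising the error after logging it would be equally reasonable when the rest of the notebook cannot continue without the data.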

### Dataset First View

In [None]:
# Dataset First Look
df.head() # This gives the first 5 rows of the dataset

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape # This gives the total number of rows and columns present in the dataset

### Dataset Information

In [None]:
# Dataset Info
df.info() # This shows all the columns of the dataset and the data type of each column

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

# check for duplicates
if df.duplicated().any():
    print("There are duplicates in the dataset.")
else:
    print("There are no duplicates in the dataset.")

# Check for duplicates count
duplicates = df.duplicated()
print('\nDuplicates:\n', duplicates.sum())


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values

# Create a heatmap of missing/null values in the DataFrame
sns.heatmap(df.isnull(), cmap='coolwarm')

# Show the plot
plt.show()


In [None]:
# Visualizing the missing values, if heat map is difficult to understand.

# create a bar chart of the null values in the dataframe
df.isnull().sum().plot(kind='bar')

# Show the plot
plt.show()

### What did you know about your dataset?

The given dataset contains hotel booking records.

**Generally!!!**

The data comes from 2 hotels, namely City Hotel and Resort Hotel, together with the facilities requested by each customer.
The dataset also contains customer-side information, such as whether the booking was canceled, the arrival date and duration of the stay, and whether the guest is a repeated customer, all of which is helpful for drawing insights.

**Technically!!!**

From the data operations above we learn that:

The dataset has **31,994 duplicates.**

The column children has 4 missing values.
The column country has 488 missing values.
The column agent has 16,340 missing values.
The column company has 112,593 missing values.

From this information, we can see that there are a significant number of missing values in the country, agent, and company columns. Additionally, there are some missing values in the children column. It would be important to handle these missing values appropriately depending on the goals of the analysis. We can also see that there are many duplicates in the dataset which may need to be removed or handled appropriately to avoid any bias in the analysis.
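
One way to act on these observations is sketched below: drop exact duplicate rows, then express the missing values as a percentage of rows so that the columns are easier to compare. The toy DataFrame and its values are invented for illustration; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "country": ["PRT", "PRT", np.nan, "GBR"],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

# Drop exact duplicate rows, keeping the first occurrence of each.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

# Percentage of missing values per column, largest first.
missing_pct = (deduplicated.isnull().mean() * 100).sort_values(ascending=False)

print(f"Rows before: {len(toy)}, after: {len(deduplicated)}")
print(missing_pct.round(1))
```

Whether duplicates should be dropped at all depends on the analysis: identical rows may be genuine separate bookings rather than recording errors.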

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns) # This prints all the columns present in the dataset

In [None]:
# Dataset Describe
df.describe(include='all') # This includes all columns of the DataFrame and provides basic statistics: mean, standard deviation, minimum, maximum, and quartiles for numeric columns, and count, unique, top, and freq for non-numeric columns.

In [None]:
# Here is a FUNCTION to fetch the unique values present in a column.

def get_unique_values(df, column_name): # Returns an array of the unique values in the specified column of a pandas DataFrame, in the order in which they first appear (not sorted).

    unique_values = df[column_name].unique()
    return unique_values

In [None]:
# Calling the function by specifying the name of the column whose unique values we want to fetch.

unique_names = get_unique_values(df, 'hotel')

# Print the unique values
print(unique_names)


In [None]:
# Now, to reduce the effort, we shall print the number of unique values along with the values themselves

#Function 2

def print_unique_values(df, column_name):

    unique_values = df[column_name].unique()
    num_unique_values = len(unique_values)
    print(f"The column '{column_name}' has {num_unique_values} unique values.")
    return unique_values

In [None]:
# We shall call the Unique functions with its count and values

unique_names = print_unique_values(df, 'hotel')

# Print the unique values
print(unique_names)

In [None]:
# Now we shall print all the unique values present in every column in one go using a for loop

for column in df.columns:
    unique_values = df[column].unique()
    num_unique_values = len(unique_values)
    print(f"The column '{column}' has {num_unique_values} unique values:")
    print(unique_values)
    print()
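
As a compact alternative to the loop above, `DataFrame.nunique` returns the number of distinct values for every column in one call. The sketch below uses an invented toy DataFrame; note that `len(Series.unique())` counts NaN as a value, so `dropna=False` is needed to reproduce the same counts.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "children": [0.0, np.nan, 2.0],
})

# One call gives the number of distinct values in every column.
# dropna=False counts NaN as a value, matching len(Series.unique()).
counts_with_nan = toy.nunique(dropna=False)

# The default (dropna=True) ignores missing values.
counts_without_nan = toy.nunique()

print(counts_with_nan)
print(counts_without_nan)
```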



## **By using the above results, we can easily describe the columns and their values**

### Variables Description 

**Here is the description of all the columns** 

There are 32 columns in total  

 **1   hotel :** - The column 'hotel' has 2 unique values:
['Resort Hotel' 'City Hotel'] 

 **2   is_canceled :** - The column 'is_canceled' has 2 unique values:
[0 1], where 1 means the booking was canceled and 0 means it was not
 
 **3   lead_time :** - This is the duration in days between the booking date and the arrival date

 **4   arrival_date_year :** - The column 'arrival_date_year' has 3 unique values:
[2015 2016 2017], hence we can conclude that, all of these data are from the specified 3 years only.

 **5   arrival_date_month :** - This is just the Arrival Month of the customer

 **6   arrival_date_week_number :** - The week of the year (1 to 53) in which the customer arrived
           
 **7   arrival_date_day_of_month :** - Day of the month on which the customer arrived 
 
 **8   stays_in_weekend_nights :** - Number of weekend nights the guest stayed or booked
        
 **9   stays_in_week_nights :** - Number of week nights the guest stayed or booked
    
 **10   adults :** - Number of Adults 
                 
 **11  children :** - Number of Children      
                
 **12 babies :** - Number of Babies       

 **13  meal :** - The column 'meal' has 5 unique values:
['BB' 'FB' 'HB' 'SC' 'Undefined']

Which are :      

*   BB: Bed and Breakfast

*   FB: Full Board (includes breakfast, lunch, and dinner)

*   HB: Half Board (includes breakfast and one other meal, usually dinner)

*   SC: Self Catering (no meals included, guests are responsible for their own food)

*   Undefined: This may indicate that the meal plan was not specified or recorded for some bookings.
               

**14  country :** - This column contains the country code of the guest. The column 'country' has 178 unique values.      
                
 **15  market_segment :** - This specifies the market segment to which the customer belongs.

There are 8 unique values in this column, and they are 'Direct', 'Corporate', 'Online TA', 'Offline TA/TO', 'Complementary', 'Groups', 'Undefined', and 'Aviation'.

*   Direct: The booking was made directly with the hotel, for example through their website or over the phone.

*   Corporate: The booking was made through a corporate account or business travel agency.

*   Online TA: The booking was made through an online travel agency, such as Expedia or Booking.com.

*   Offline TA/TO: The booking was made through a traditional travel agency or tour operator.

*   Complementary: The booking is for a complimentary (free) stay, typically offered to reward loyalty or as part of a promotion.

*   Groups: The booking is for a group of travelers, such as a tour group or conference attendees.

*   Undefined: This may indicate that the market segment was not specified or recorded for some bookings.

*   Aviation: The booking is for airline crew members or other aviation-related personnel. 
           
 **16  distribution_channel :** - This specifies the channel through which the booking was made.

The column 'distribution_channel' has 5 unique values:
['Direct' 'Corporate' 'TA/TO' 'Undefined' 'GDS']

*   GDS: The hotel sells its rooms through a global distribution system, which is a computerized network used by travel agents and online travel agencies to book flights, hotels, and other travel services.
       
**17  is_repeated_guest :** - The column 'is_repeated_guest' has 2 unique values:
[0 1], where 1 means the guest has stayed before and 0 means they have not 
       
 **18  previous_cancellations :** - The number of previous bookings that the customer canceled before the current booking, given as an integer.

        
 **19  previous_bookings_not_canceled :** - The number of previous bookings that the customer did not cancel before the current booking, given as an integer.

 **20  reserved_room_type :** - A distinct code is given to each room type according to its level of luxury. The column 'reserved_room_type' has 10 unique values:
['C' 'A' 'D' 'E' 'G' 'F' 'H' 'L' 'P' 'B']
         
 **21  assigned_room_type :** - A distinct code is given to each room type according to its level of luxury. The column 'assigned_room_type' has 12 unique values:
['C' 'A' 'D' 'E' 'G' 'F' 'I' 'B' 'H' 'P' 'L' 'K']
      
 **22  booking_changes :** - Whether the booking was changed and, if so, the number of changes made.
                
 **23  deposit_type :**- The column 'deposit_type' has 3 unique values:
['No Deposit' 'Refundable' 'Non Refund']

 **24  agent :** - If the room was booked externally through an agent, the agent ID is given.    

 **25  company :** - If the booking was made through a company, the company ID is given.

 **26  days_in_waiting_list :** - The total number of days the booking spent on the waiting list

 **27  customer_type :** - The column 'customer_type' has 4 unique values:
['Transient' 'Contract' 'Transient-Party' 'Group']

*   Transient: A transient customer is one who is not part of a group and is not under a contract. These customers usually make individual reservations and stay for a short period of time.

*   Contract: A contract customer is one who has a pre-negotiated agreement with the hotel for a certain period of time. These customers are usually businesses or organizations that have frequent and/or long-term stays at the hotel.

*   Transient-Party: A transient-party customer is a group of people who are not part of a contract but are traveling together, such as a family or a group of friends. These customers usually make individual reservations but are staying at the hotel for the same reason.

*   Group: A group customer is one who has a pre-arranged agreement with the hotel for a certain period of time and has a minimum number of rooms reserved. These customers are usually organizations, clubs, or other groups that are traveling together.

**28  adr :** This is Average Daily Rate, which is a key performance metric used in the hotel industry to measure the average rate paid for each occupied room per day. It is calculated by dividing the total room revenue by the number of occupied rooms on a given day. 

 **29  required_car_parking_spaces :** - The number of car parking spaces required by the customer (0 if none).

 **30  total_of_special_requests :** - The number of special requests made by the customer (0 if none).


 **31  reservation_status :** - The column 'reservation_status' has 3 unique values:
['Check-Out' 'Canceled' 'No-Show']

*   Check-Out: This reservation status indicates that the guest has checked out of the hotel and their stay is complete.

*   Canceled: This reservation status indicates that the guest has canceled their reservation before their scheduled arrival date. The room that was reserved for them may be available for other guests to book.

*   No-Show: This reservation status indicates that the guest did not show up for their reservation and did not cancel it. In this case, the room that was reserved for them may go unused, resulting in lost revenue for the hotel.

 **32  reservation_status_date :** - The date on which the last reservation status was set.
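
The `adr` definition in item 28 amounts to a single division. The following is a minimal sketch with invented revenue and occupancy figures; `average_daily_rate` is a hypothetical helper name, not something the dataset or any library provides.

```python
def average_daily_rate(room_revenue, rooms_sold):
    """Return room revenue divided by the number of occupied rooms.

    Hypothetical helper for illustration; returns 0.0 when no rooms
    were occupied to avoid dividing by zero.
    """
    if rooms_sold == 0:
        return 0.0
    return room_revenue / rooms_sold


# Invented figures: 5,000 in room revenue from 40 occupied rooms.
adr_example = average_daily_rate(5000.0, 40)
print(adr_example)
```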
 

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:
    unique_values = df[column].unique()
    num_unique_values = len(unique_values)
    print(f"The column '{column}' has {num_unique_values} unique values:")
    print(unique_values)
    print()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Creating a copy of the Dataset to keep the original data safe.

dff = df.copy()

In [None]:
print(dff.isnull().sum())

In [None]:
# As a large share of the values in 'agent' and 'company' is missing, we drop both columns.

# Drop the columns 'agent' and 'company'
dff = dff.drop(['agent', 'company'], axis=1)

# Print the first 5 rows of the resulting dataframe
dff.head()

In [None]:
print(dff.isnull().sum())

In [None]:
# As the country column has few null values, we fill its 488 nulls with "Unknown".
# As the children column has just 4 null values, we assume that those bookings had ZERO children.

dff['country'] = dff['country'].fillna('Unknown') # Filling "Unknown" for the 488 missing country codes; assigning back avoids the unreliable chained inplace fill
dff['children'] = dff['children'].fillna(0) # Filling 0 for the 4 missing children counts
print(dff.isnull().sum())

## Now We have ZERO NULL values in the DATASET

### We shall check how many Numeric and Categorical Data do we have




In [None]:
numeric_cols = dff.select_dtypes(include=['int64', 'float64']).columns # This gets all the numeric columns
categorical_cols = dff.select_dtypes(include=['object', 'category']).columns # This gets all the categorical columns

print(f"Number of numeric columns: {len(numeric_cols)}")
print()
print(f"Numeric columns: {numeric_cols}")
print()
print()
print(f"Categorical columns: {categorical_cols}")
print()
print(f"Number of categorical columns: {len(categorical_cols)}")

### What all manipulations have you done and insights you found?




### Here are the data manipulations made and the insights gained, with an example.

1. The data is cleaned and prepared for analysis under the **UBM Rule**.

2. The variables are understood by examining the unique values that each column contains.

3. The NULL values are counted.

4. With the unique values of each column known, it is easy to understand their properties and their relationships with the other variables.

5. The columns with a high number of null values are dropped so that the results are more accurate.

6. The columns with fewer than 500 null values are filled using the simplest method: a constant placeholder for the categorical column and zero for the count column. For a continuous numerical column, other methods such as the mean, median, or mode could have been used.

7. A copy of the data is assigned to a new variable named "dff" so that the original data is not modified.

8. The data is fit to proceed with the visualization for all the insights that we have to produce.

9. As the data is ready, we can now visualize the data and the relationships within it. For example, the 'stays_in_weekend_nights' and 'stays_in_week_nights' columns can be combined to create a new feature called 'total_stays', which gives the total number of nights stayed.

10. With these insights, the variables, and the relationships among them, it is easy to find the answers required by hotel management as well as by customers.
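
The 'total_stays' feature mentioned in item 9 can be sketched as a simple column sum. The toy DataFrame and its values are invented for illustration; the two source column names come from the dataset, and 'total_stays' is the name suggested in the list above.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1],
    "stays_in_week_nights": [3, 5, 0],
})

# Total nights per booking is the sum of weekend and week nights.
toy["total_stays"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

print(toy)
```

The same assignment applied to `dff` would add the feature to the working copy without touching the original `df`.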

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***