# **Project Name**    -



## **Exploratory Data Analysis of Hotel Bookings (2015-2016)**
##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 - Aniket Itankar**


# **Project Summary -**

This project involves conducting an exploratory data analysis (EDA) of hotel bookings from **2015 to 2016**, leveraging a dataset with 36 columns capturing various aspects of reservations. The objectives encompass in-depth analyses, including investigating cancellation factors, exploring booking trends over time, examining guest demographics, analyzing room preferences and changes, and segmenting customers based on market segments. The methodology employs a systematic EDA approach, incorporating statistical and visual techniques, along with potential machine learning applications for predictive analysis. Anticipated outcomes include the identification of cancellation contributors, insights into booking trends, an understanding of guest demographics, improved comprehension of room preferences, and customer segmentation insights. This comprehensive analysis aims to provide actionable insights for the hotel industry, enabling informed decisions to enhance operational efficiency, guest experience, and overall business performance.
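Two of the analyses listed above, room changes and market-segment comparison, come down to simple rates. Below is a minimal sketch. It assumes the dataset's `reserved_room_type`, `assigned_room_type`, `market_segment` and `is_canceled` columns. The helper names and the four sample rows are hypothetical and are only meant to show the idea.

```python
import pandas as pd


def room_change_rate(bookings: pd.DataFrame) -> float:
    """Share of bookings whose assigned room type differs from the reserved one."""
    changed = bookings["reserved_room_type"] != bookings["assigned_room_type"]
    return float(changed.mean())


def cancellation_rate_by_segment(bookings: pd.DataFrame) -> pd.Series:
    """Mean of the 0/1 is_canceled flag per market segment, highest first."""
    return (
        bookings.groupby("market_segment")["is_canceled"]
        .mean()
        .sort_values(ascending=False)
    )


# Hypothetical sample rows, not taken from the real dataset
sample = pd.DataFrame({
    "reserved_room_type": ["A", "A", "D", "E"],
    "assigned_room_type": ["A", "C", "D", "E"],
    "market_segment": ["Online TA", "Online TA", "Direct", "Groups"],
    "is_canceled": [1, 0, 0, 1],
})

print(room_change_rate(sample))            # 0.25
print(cancellation_rate_by_segment(sample))
```

On the full dataset, the same calls would be `room_change_rate(df)` and `cancellation_rate_by_segment(df)`.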



# **GitHub Link -**



Provide your GitHub Link here.

# **Problem Statement**


**Problem Statement for EDA of Hotel Booking:**

The hotel industry faces several challenges that hinder operational efficiency and strategic decision-making. Booking cancellations are difficult to predict accurately, which leads to revenue loss and operational disruptions. Hotels also lack comprehensive insight into temporal booking patterns, and this makes resource management less effective. Incomplete analysis of guest demographics limits personalized service, which in turn affects guest satisfaction and loyalty. A weak understanding of room preferences and room changes results in inefficient room allocation and potential guest dissatisfaction. The absence of a robust customer segmentation strategy further complicates targeted marketing and service customization. The overarching goal of this project is to address these challenges through a thorough Exploratory Data Analysis (EDA) of hotel bookings, using a dataset with features such as 'hotel', 'is_canceled', 'lead_time', 'arrival_date_year', and others.

#### **Define Your Business Objective?**

The outcomes aim to provide actionable insights for the hotel industry, facilitating improved operational efficiency, enhanced guest experiences, and strategic decision-making for optimized overall business performance.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     Students who earn these additional credits will have an advantage during Star Student selection.
       
             [ Note: - Deployment Ready Code means that the whole .ipynb notebook can be executed in one go
                       without a single error being logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that every chart follows the format below and answers each of these questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
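Here is a minimal sketch of the UBM rule. The function below draws one chart for each level on a single figure. It assumes the column names `hotel`, `is_canceled`, `lead_time` and `adr` from this dataset. The function name, the chart choices and the sample rows are illustrative only and are not a required layout.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def ubm_overview(bookings: pd.DataFrame):
    """Draw one univariate, one bivariate and one multivariate chart side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # U: distribution of a single categorical variable
    sns.countplot(data=bookings, x="hotel", ax=axes[0])
    axes[0].set_title("Univariate: bookings per hotel")

    # B: numerical vs categorical, lead time split by cancellation status
    sns.boxplot(data=bookings, x="is_canceled", y="lead_time", ax=axes[1])
    axes[1].set_title("Bivariate: lead_time vs is_canceled")

    # M: three variables at once, mean adr per hotel with bars split by is_canceled
    sns.barplot(data=bookings, x="hotel", y="adr", hue="is_canceled", ax=axes[2])
    axes[2].set_title("Multivariate: adr by hotel and is_canceled")

    fig.tight_layout()
    return fig


# Hypothetical sample rows for demonstration only
sample = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "City Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0, 1, 1, 0],
    "lead_time": [10, 120, 35, 200, 90, 5],
    "adr": [95.0, 110.5, 80.0, 130.0, 105.0, 70.0],
})

fig = ubm_overview(sample)
```

Each of the 20 required charts would still get its own cell. This sketch only shows how the three levels differ in the number of variables each chart encodes.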





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
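# Import Libraries
from google.colab import drive      # Colab helper for mounting Google Drive
import numpy as np                  # numerical operations
import pandas as pd                 # tabular data handling
import matplotlib.pyplot as plt     # base plotting
import seaborn as sns               # statistical plots built on matplotlib
%matplotlib inline

# Mount Google Drive so the dataset CSV can be read from it
drive.mount('/content/drive')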


### Dataset Loading

In [None]:
# Load Dataset
# Absolute path under the mount point created by drive.mount('/content/drive')
filepath = '/content/drive/My Drive/EDA Capstone Hotel Booking/Hotel Bookings.csv'
df = pd.read_csv(filepath, header=0)
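Guideline 2 rewards exception handling and a notebook that runs end to end. One way to apply that to the loading step is to wrap `pd.read_csv`, so that a wrong path or an empty file fails with a clear message. `load_bookings` is a hypothetical helper, not part of pandas. It is a sketch, not the only valid approach.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path) -> pd.DataFrame:
    """Read the bookings CSV, failing early with a readable message if it is missing or empty."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {csv_path}. Check that Google Drive is mounted "
            "and that the path is correct."
        )
    try:
        bookings = pd.read_csv(csv_path, header=0)
    except pd.errors.EmptyDataError as exc:
        # pandas raises EmptyDataError when the file has no columns to parse
        raise ValueError(f"{csv_path} exists but contains no data") from exc
    return bookings
```

In the notebook, this would replace the direct call with `df = load_bookings(filepath)`.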


### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

In [None]:
# Drop exact duplicate rows, keeping the first occurrence of each
df.drop_duplicates(inplace=True)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values: null count per feature, largest first
null = (df[['country', 'agent', 'company', 'children']].isnull().sum()
        .reset_index()
        .rename(columns={'index': 'Features', 0: 'Null count'})
        .sort_values(by='Null count', ascending=False))
print(null)

sns.barplot(x=null['Features'], y=null['Null count'])

In [None]:
# handling the missing value in children column by imputing 0 to the nan
df['children'] = df['children'].fillna(value = 0)

In [None]:
# Confirm that no nulls remain in the children column
df['children'].isnull().sum()

### What did you know about your dataset?

**Basic Summary of the Dataset**


* The dataset contains 36 fields (columns) and 119,390 entries (rows).
* Of the 119,390 entries, 31,994 are duplicates. Duplicate rows add no information and make the data noisy, so they were dropped.
* The country, agent, and company fields contain null values; the company field has the largest number of missing values.
* The children field has only 4 null values, so they were replaced with 0.
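The counts in this summary come from `df.duplicated().sum()` and `df.isnull().sum()`. A small helper can also report the share of missing values per column, which makes it easier to compare `company` with the other fields. `data_quality_report` below is a hypothetical helper, and the sample rows are made up.

```python
import pandas as pd


def data_quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Null count and null percentage per column, keeping only columns that have nulls."""
    nulls = frame.isnull().sum()
    report = pd.DataFrame({
        "null_count": nulls,
        "null_pct": (nulls / len(frame) * 100).round(2),
    })
    return report[report["null_count"] > 0].sort_values("null_count", ascending=False)


# Hypothetical rows; the last row repeats the first one exactly
sample = pd.DataFrame({
    "country": ["PRT", None, "GBR", "PRT"],
    "agent": [9.0, None, None, 9.0],
    "company": [None, None, None, None],
    "children": [0.0, 1.0, 0.0, 0.0],
})

print("duplicate rows:", sample.duplicated().sum())
print(data_quality_report(sample))
```

`duplicated()` compares whole rows and treats missing values in the same position as equal. That is why the first and last sample rows count as one duplicate.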


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description



1. **is_canceled:**
   - This binary variable indicates whether a booking was canceled (1) or not (0).
   - It provides insights into the cancellation patterns within the dataset.

2. **lead_time:**
   - Represents the number of days between the booking date and the arrival date.
   - Offers information on the planning horizon for reservations.

3. **arrival_date_year:**
   - Specifies the year of arrival for the booking.
   - Helps in analyzing booking trends over different years.

4. **stays_in_weekend_nights:**
   - Indicates the number of weekend nights (Saturday and Sunday) the guest stays.
   - Offers insights into the weekend stay patterns of guests.

5. **stays_in_week_nights:**
   - Represents the number of weekday nights (Monday to Friday) the guest stays.
   - Provides information on the duration of the stay during the weekdays.

6. **adults, children, babies:**
   - These columns represent the number of adults, children, and babies included in the booking.
   - Help in understanding the composition of guests in terms of age groups.

7. **is_repeated_guest:**
   - A binary variable indicating whether the guest is a repeated guest (1) or not (0).
   - Offers insights into the proportion of repeat guests.

8. **previous_cancellations, previous_bookings_not_canceled:**
   - Reflect the count of previous cancellations and bookings not canceled by the guest.
   - Contribute to understanding the guest's historical booking behavior.

9. **booking_changes:**
   - Represents the number of changes made to the booking before arrival.
   - Provides insights into the flexibility and adaptability of bookings.

10. **agent, company:**
    - These columns contain numerical identifiers for the booking agent and the company.
    - Help in identifying the entities associated with the booking.

11. **days_in_waiting_list:**
    - Indicates the number of days the booking was in the waiting list before it was confirmed.
    - Provides insights into the demand and waiting times for reservations.

12. **adr:**
    - Stands for Average Daily Rate and represents the average rental income per paid occupied room per day.
    - Offers insights into the pricing strategy and revenue generation.

13. **required_car_parking_spaces:**
    - Specifies the number of car parking spaces requested by the guest.
    - Provides information on the parking needs of guests.

14. **total_of_special_requests:**
    - Represents the total number of special requests made by the guest (e.g., extra beds, specific room preferences).
    - Offers insights into guest preferences and customization requirements.

These descriptions aim to provide a brief overview of the numerical fields in the dataset, highlighting their significance for exploratory data analysis.
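The stay, party-size and rate fields above combine into a few derived measures. Below is a minimal sketch. It assumes that booking revenue can be approximated as `adr` × total nights, which is only an estimate because ADR is an average per night. The columns `total_nights`, `total_guests` and `estimated_revenue` are hypothetical names that do not exist in the original dataset.

```python
import pandas as pd


def add_stay_features(bookings: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical helper columns derived from the documented fields."""
    out = bookings.copy()
    # Total length of stay = weekend nights + week nights
    out["total_nights"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    # Party size = adults + children + babies
    out["total_guests"] = out["adults"] + out["children"] + out["babies"]
    # Rough booking value: average daily rate times the number of nights
    out["estimated_revenue"] = out["adr"] * out["total_nights"]
    return out


# Hypothetical sample rows for demonstration only
sample = pd.DataFrame({
    "is_canceled": [0, 1, 0],
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 0],
    "adults": [2, 1, 2],
    "children": [1, 0, 0],
    "babies": [0, 0, 1],
    "adr": [100.0, 80.0, 120.0],
})

enriched = add_stay_features(sample)
print(enriched[["total_nights", "total_guests", "estimated_revenue"]])
# Because is_canceled is 0/1, its mean is the cancellation rate
print("cancellation rate:", sample["is_canceled"].mean())
```

Canceled bookings do not generate the estimated revenue. Filtering on `is_canceled == 0` before summing `estimated_revenue` therefore gives a fairer picture.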

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Unique value for 'hotel' field
df.hotel.unique()

In [None]:
# Unique value for 'is_canceled' field
df['is_canceled'].unique()

In [None]:
# Unique value for 'lead_time' field
df['lead_time'].unique()

In [None]:
# Unique value for 'arrival_date_year' field
df['arrival_date_year'].unique()

In [None]:
# Unique value for 'arrival_date_month' field
df['arrival_date_month'].unique()

In [None]:
# Unique value for 'arrival_date_week_number' field
df['arrival_date_week_number'].unique()

In [None]:
# Unique value for 'arrival_date_day_of_month' field
df['arrival_date_day_of_month'].unique()

In [None]:
# Unique value for 'stays_in_weekend_nights' field
df['stays_in_weekend_nights'].unique()

In [None]:
# Unique value for 'stays_in_week_nights' field
df['stays_in_week_nights'].unique()

In [None]:
# Unique value for 'adults' field
df['adults'].unique()

In [None]:
# Unique value for 'children' field
df['children'].unique()

In [None]:
# Unique value for 'babies' field
df['babies'].unique()

In [None]:
# Unique value for 'meal' field
df['meal'].unique()

In [None]:
# Unique value for 'country' field
df['country'].unique()

In [None]:
# Unique value for 'market_segment' field
df['market_segment'].unique()

In [None]:
# Unique value for 'distribution_channel' field
df['distribution_channel'].unique()

In [None]:
# Unique value for 'is_repeated_guest' field
df['is_repeated_guest'].unique()

In [None]:
# Unique value for 'previous_cancellations' field
df['previous_cancellations'].unique()

In [None]:
# Unique value for 'previous_bookings_not_canceled' field
df['previous_bookings_not_canceled'].unique()

In [None]:
# Unique value for 'reserved_room_type' field
df['reserved_room_type'].unique()

In [None]:
# Unique value for 'assigned_room_type' field
df['assigned_room_type'].unique()

In [None]:
# Unique value for 'booking_changes' field
df['booking_changes'].unique()

In [None]:
# Unique value for 'deposit_type' field
df['deposit_type'].unique()

In [None]:
# Unique value for 'agent' field
df['agent'].unique()

In [None]:
# Unique value for 'company' field
df['company'].unique()

In [None]:
# Unique value for 'days_in_waiting_list' field
df['days_in_waiting_list'].unique()

In [None]:
# Unique value for 'customer_type' field
df['customer_type'].unique()

In [None]:
# Unique value for 'adr' field
df['adr'].unique()

In [None]:
# Unique value for 'required_car_parking_spaces' field
df['required_car_parking_spaces'].unique()

In [None]:
# Unique value for 'total_of_special_requests' field
df['total_of_special_requests'].unique()

In [None]:
# Unique value for 'reservation_status' field
df['reservation_status'].unique()

In [None]:
# Unique value for 'reservation_status_date' field
df['reservation_status_date'].unique()
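The cells above print each column's unique values one at a time. A more compact alternative builds a single table with the distinct-value count and a short preview for each column. It is sketched here with a hypothetical helper, `unique_value_summary`. On the real data it would be called as `unique_value_summary(df)`.

```python
import pandas as pd


def unique_value_summary(frame: pd.DataFrame, max_values: int = 10) -> pd.DataFrame:
    """One row per column: number of distinct values and a preview of the first few."""
    rows = []
    for column in frame.columns:
        values = frame[column].dropna().unique()
        rows.append({
            "column": column,
            "n_unique": frame[column].nunique(),  # NaN is excluded by default
            "preview": list(values[:max_values]),
        })
    return pd.DataFrame(rows).set_index("column")


# Hypothetical sample rows for demonstration only
sample = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "meal": ["BB", "HB", None],
    "is_canceled": [0, 1, 0],
})

print(unique_value_summary(sample))
```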

In [None]:
# Re-check the column names before wrangling
df.columns

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
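# Write your code to make your dataset analysis ready.
# Parse reservation_status_date from text into a proper datetime
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])
# children was stored as float because of its NaNs; after filling them with 0 it can be an integer
df['children'] = df['children'].astype('int64')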



In [None]:
df


In [None]:
df.info()
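The wrangling above leaves the nulls in `country`, `agent` and `company` in place. The arrival date is also split across three columns, which makes time-based charts harder. The sketch below shows one way to handle both. Its choices are assumptions, not the author's method:

* A missing `agent`/`company` ID is treated as "no agent/company" and coded as 0.
* A missing country becomes `"Unknown"`.
* `arrival_date_month` holds English month names such as `"July"`.

`finish_wrangling` and `arrival_date` are hypothetical names.

```python
import pandas as pd


def finish_wrangling(bookings: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with remaining nulls filled and a single arrival_date column."""
    out = bookings.copy()
    # Assumption: a missing agent/company ID means the booking had none, encoded as 0
    out[["agent", "company"]] = out[["agent", "company"]].fillna(0).astype("int64")
    # Assumption: keep rows with an unknown country under an explicit label
    out["country"] = out["country"].fillna("Unknown")
    # arrival_date_month is assumed to hold month names such as "July", so parse with %B
    out["arrival_date"] = pd.to_datetime(
        out["arrival_date_year"].astype(str)
        + "-" + out["arrival_date_month"]
        + "-" + out["arrival_date_day_of_month"].astype(str),
        format="%Y-%B-%d",
    )
    return out


# Hypothetical sample rows for demonstration only
sample = pd.DataFrame({
    "agent": [9.0, None],
    "company": [None, 40.0],
    "country": ["PRT", None],
    "arrival_date_year": [2015, 2016],
    "arrival_date_month": ["July", "January"],
    "arrival_date_day_of_month": [1, 15],
})

clean = finish_wrangling(sample)
print(clean[["agent", "company", "country", "arrival_date"]])
```

A single `arrival_date` column lets booking trends be resampled by week or month directly, for example with `clean.set_index("arrival_date").resample("MS").size()`.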

### What all manipulations have you done and insights you found?

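The following manipulations were made:

* Dropped 31,994 exact duplicate rows.
* Replaced the 4 null values in `children` with 0 and cast the column to `int64`.
* Converted `reservation_status_date` from text to a datetime type.

The `country`, `agent`, and `company` columns still contain null values at this stage.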

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here
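A correlation heatmap needs numeric input only. Since pandas 2.0, `DataFrame.corr()` raises an error on text columns such as `hotel` unless they are excluded, so the sketch below selects the numeric columns first. The function name, colour map and sample rows are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(frame: pd.DataFrame):
    """Pearson correlation of the numeric columns, drawn as an annotated heatmap."""
    corr = frame.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(12, 8))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation heatmap of numeric features")
    return corr, fig


# Hypothetical sample rows for demonstration only
sample = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 120, 35, 200],
    "is_canceled": [0, 1, 0, 1],
    "adr": [95.0, 110.5, 80.0, 130.0],
})

corr, fig = plot_correlation_heatmap(sample)
print(corr.round(2))
```

On the real data, `agent` and `company` are numeric identifiers rather than quantities. Their correlations carry little meaning, so it may be clearer to drop them before calling `plot_correlation_heatmap(df)`.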

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***