# **Project Name**    -



## **Exploratory Data Analysis of Hotel Bookings (2015-2017)**
##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 - Aniket Itankar**


# **Project Summary -**

This project involves conducting an exploratory data analysis (EDA) of hotel bookings from **2015 to 2017**, leveraging a dataset with 36 columns capturing various aspects of reservations. The objectives encompass in-depth analyses, including investigating cancellation factors, exploring booking trends over time, examining guest demographics, analyzing room preferences and changes, and segmenting customers based on market segments. The methodology employs a systematic EDA approach, incorporating statistical and visual techniques, along with potential machine learning applications for predictive analysis. Anticipated outcomes include the identification of cancellation contributors, insights into booking trends, an understanding of guest demographics, improved comprehension of room preferences, and customer segmentation insights. This comprehensive analysis aims to provide actionable insights for the hotel industry, enabling informed decisions to enhance operational efficiency, guest experience, and overall business performance.



# **GitHub Link -**



Provide your GitHub Link here.

# **Problem Statement**


**Problem Statement for EDA of Hotel Booking:**

The hotel industry faces several challenges that hinder operational efficiency and strategic decision-making. These challenges include difficulties in accurately predicting booking cancellations, leading to revenue loss and operational disruptions. Additionally, there is a lack of comprehensive insights into temporal booking patterns, hindering effective resource management. Incomplete guest demographic analysis limits personalized services, impacting guest satisfaction and loyalty. Challenges in understanding room preferences and changes result in inefficient room allocation strategies and potential guest dissatisfaction. The absence of a robust customer segmentation strategy further complicates targeted marketing and service customization. The overarching goal of this project is to address these challenges through a thorough Exploratory Data Analysis (EDA) of hotel bookings, utilizing a dataset with features such as 'hotel', 'is_canceled', 'lead_time', 'arrival_date_year', and others.

#### **Define Your Business Objective?**

The outcomes aim to provide actionable insights for the hotel industry, facilitating improved operational efficiency, enhanced guest experiences, and strategic decision-making for optimized overall business performance.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each and every chart, the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
from google.colab import drive
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

drive.mount('/content/drive')


### Dataset Loading

In [None]:
# Load Dataset
filepath ='drive/My Drive/EDA Capstone Hotel Booking/Hotel Bookings.csv'
df = pd.read_csv(filepath,header = 0)
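The guidelines reward exception handling, and the load step is where a wrong path or an unmounted Drive usually fails first. A minimal sketch, assuming a hypothetical helper `load_bookings` (not part of pandas or Colab), that checks the path before reading and turns an empty file into a clear error:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_bookings(filepath):
    """Read the bookings CSV, raising a clear error if it is missing or empty.

    Hypothetical helper for illustration; not part of pandas or Colab.
    """
    path = Path(filepath)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at '{path}'. Check that Google Drive is mounted "
            "and that the folder and file names match."
        )
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        # read_csv raises EmptyDataError when the file has no columns to parse
        raise ValueError(f"'{path}' exists but contains no data") from exc


# Usage on a throwaway CSV, so the sketch also runs outside Colab
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.csv"
    sample_path.write_text("hotel,is_canceled\nCity Hotel,0\nResort Hotel,1\n")
    sample_df = load_bookings(sample_path)

print(sample_df.shape)  # (2, 2)
```

In the notebook itself the call would be `df = load_bookings(filepath)`, so a bad path stops the run with a readable message instead of a long traceback.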


### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

In [None]:
df.drop_duplicates(inplace = True)
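The duplicate cells above can be wrapped so that they also report how many rows were removed. The sketch below uses a hypothetical helper `drop_duplicate_rows` and a made-up three-row frame; on the full dataset, the 31,994 duplicates reported in the dataset summary would leave 119,390 - 31,994 = 87,396 rows.

```python
import pandas as pd


def drop_duplicate_rows(frame):
    """Return a de-duplicated copy and the number of rows removed (hypothetical helper)."""
    before = len(frame)
    deduped = frame.drop_duplicates()  # keeps the first occurrence of each repeated row
    return deduped, before - len(deduped)


# Toy data: rows 0 and 1 are identical
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 10, 3],
})
deduped, removed = drop_duplicate_rows(toy)
print(len(deduped), removed)  # 2 1
```

Note that the surviving rows keep their original index labels (here 0 and 2); `reset_index(drop=True)` can renumber them if a contiguous index is needed later.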

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
null = pd.DataFrame(df[['country','agent','company','children']].isnull().sum()).reset_index().rename(columns = {'index':'Features',0:'Null count'}).sort_values(by = 'Null count',ascending = False)
print(null)

sns.barplot(x = null['Features'],y  = null['Null count'])

In [None]:
# handling the missing value in children column by imputing 0 to the nan
df['children'] = df['children'].fillna(value = 0)

In [None]:
df['children'].isnull().sum()
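Raw null counts are easier to act on when expressed as a share of all rows: a column with only a handful of gaps (such as `children`) can be imputed, while a column with a large share of gaps (the notebook shows `company` has the most) may be better dropped or flagged. A sketch with a hypothetical `null_summary` helper on toy data:

```python
import numpy as np
import pandas as pd


def null_summary(frame):
    """Count and percentage of missing values per column, largest first (hypothetical helper)."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "Null count": counts,
        "Null %": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have missing values
    return summary[summary["Null count"] > 0].sort_values("Null count", ascending=False)


# Toy frame loosely shaped like the columns discussed above
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "country": ["PRT", None, "GBR", "FRA"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "children": [0.0, np.nan, 1.0, 0.0],
})
summary = null_summary(toy)
print(summary)
```

With the real data the call is simply `null_summary(df)`.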

### What did you know about your dataset?

**Basic Summary of the Dataset**


* The dataset contains 36 fields (columns) and 119,390 entries (rows).
* Of the 119,390 entries, 31,994 are duplicates. Duplicate rows add no information and make the data noisy, so they were dropped, leaving 87,396 rows.
* The country, agent and company fields contain null values; the company field has the most missing values.
* The children field has only 4 null values, which were replaced with 0.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description



1. **is_canceled:**
   - This binary variable indicates whether a booking was canceled (1) or not (0).
   - It provides insights into the cancellation patterns within the dataset.

2. **lead_time:**
   - Represents the number of days between the booking date and the arrival date.
   - Offers information on the planning horizon for reservations.

3. **arrival_date_year:**
   - Specifies the year of arrival for the booking.
   - Helps in analyzing booking trends over different years.

4. **stays_in_weekend_nights:**
   - Indicates the number of weekend nights (Saturday and Sunday) the guest stays.
   - Offers insights into the weekend stay patterns of guests.

5. **stays_in_week_nights:**
   - Represents the number of weekday nights (Monday to Friday) the guest stays.
   - Provides information on the duration of the stay during the weekdays.

6. **adults, children, babies:**
   - These columns represent the number of adults, children, and babies included in the booking.
   - Help in understanding the composition of guests in terms of age groups.

7. **is_repeated_guest:**
   - A binary variable indicating whether the guest is a repeated guest (1) or not (0).
   - Offers insights into the proportion of repeat guests.

8. **previous_cancellations, previous_bookings_not_canceled:**
   - Reflect the count of previous cancellations and bookings not canceled by the guest.
   - Contribute to understanding the guest's historical booking behavior.

9. **booking_changes:**
   - Represents the number of changes made to the booking before arrival.
   - Provides insights into the flexibility and adaptability of bookings.

10. **agent, company:**
    - These columns contain numerical identifiers for the booking agent and the company.
    - Help in identifying the entities associated with the booking.

11. **days_in_waiting_list:**
    - Indicates the number of days the booking was in the waiting list before it was confirmed.
    - Provides insights into the demand and waiting times for reservations.

12. **adr:**
    - Stands for Average Daily Rate and represents the average rental income per paid occupied room per day.
    - Offers insights into the pricing strategy and revenue generation.

13. **required_car_parking_spaces:**
    - Specifies the number of car parking spaces requested by the guest.
    - Provides information on the parking needs of guests.

14. **total_of_special_requests:**
    - Represents the total number of special requests made by the guest (e.g., extra beds, specific room preferences).
    - Offers insights into guest preferences and customization requirements.

These descriptions aim to provide a brief overview of the numerical fields in the dataset, highlighting their significance for exploratory data analysis.
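Several of these variables are usually read together: the two stay columns add up to the length of stay, and the three age-group columns add up to the party size. A sketch on toy rows; the derived column names `total_nights` and `total_guests` are illustrative, not part of the dataset:

```python
import pandas as pd

# Toy rows using the dataset's column names
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 1, 0],
    "adults": [2, 1, 2],
    "children": [1, 0, 0],
    "babies": [0, 0, 1],
})

# Total length of stay = weekend nights + weekday nights
toy["total_nights"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]
# Total party size across the three age-group columns
toy["total_guests"] = toy[["adults", "children", "babies"]].sum(axis=1)
print(toy[["total_nights", "total_guests"]])
```

Derived totals like these can also be used to flag rows worth inspecting, such as bookings with zero nights or zero guests.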

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Unique value for 'hotel' field
df.hotel.unique()

In [None]:
# Unique value for 'is_canceled' field
df['is_canceled'].unique()

In [None]:
# Unique value for 'lead_time' field
df['lead_time'].unique()

In [None]:
# Unique value for 'arrival_date_year' field
df['arrival_date_year'].unique()

In [None]:
# Unique value for 'arrival_date_month' field
df['arrival_date_month'].unique()

In [None]:
# Unique value for 'arrival_date_week_number' field
df['arrival_date_week_number'].unique()

In [None]:
# Unique value for 'arrival_date_day_of_month' field
df['arrival_date_day_of_month'].unique()

In [None]:
# Unique value for 'stays_in_weekend_nights' field
df['stays_in_weekend_nights'].unique()

In [None]:
# Unique value for 'stays_in_week_nights' field
df['stays_in_week_nights'].unique()

In [None]:
# Unique value for 'adults' field
df['adults'].unique()

In [None]:
# Unique value for 'children' field
df['children'].unique()

In [None]:
# Unique value for 'babies' field
df['babies'].unique()

In [None]:
# Unique value for 'meal' field
df['meal'].unique()

In [None]:
# Unique value for 'country' field
df['country'].unique()

In [None]:
# Unique value for 'market_segment' field
df['market_segment'].unique()

In [None]:
# Unique value for 'distribution_channel' field
df['distribution_channel'].unique()

In [None]:
# Unique value for 'is_repeated_guest' field
df['is_repeated_guest'].unique()

In [None]:
# Unique value for 'previous_cancellations' field
df['previous_cancellations'].unique()

In [None]:
# Unique value for 'previous_bookings_not_canceled' field
df['previous_bookings_not_canceled'].unique()

In [None]:
# Unique value for 'reserved_room_type' field
df['reserved_room_type'].unique()

In [None]:
# Unique value for 'assigned_room_type' field
df['assigned_room_type'].unique()

In [None]:
# Unique value for 'booking_changes' field
df['booking_changes'].unique()

In [None]:
# Unique value for 'deposit_type' field
df['deposit_type'].unique()

In [None]:
# Unique value for 'agent' field
df['agent'].unique()

In [None]:
# Unique value for 'company' field
df['company'].unique()

In [None]:
# Unique value for 'days_in_waiting_list' field
df['days_in_waiting_list'].unique()

In [None]:
# Unique value for 'customer_type' field
df['customer_type'].unique()

In [None]:
# Unique value for 'adr' field
df['adr'].unique()

In [None]:
# Unique value for 'required_car_parking_spaces' field
df['required_car_parking_spaces'].unique()

In [None]:
# Unique value for 'total_of_special_requests' field
df['total_of_special_requests'].unique()

In [None]:
# Unique value for 'reservation_status' field
df['reservation_status'].unique()

In [None]:
# Unique value for 'reservation_status_date' field
df['reservation_status_date'].unique()
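The per-column `unique()` cells above can be condensed into a single table. A sketch using a hypothetical `unique_value_summary` helper, demonstrated on a toy frame; on the real data it would be called as `unique_value_summary(df)`:

```python
import pandas as pd


def unique_value_summary(frame, max_examples=5):
    """One row per column: number of distinct values and a few examples (hypothetical helper)."""
    rows = []
    for col in frame.columns:
        values = frame[col].dropna().unique()
        rows.append({
            "column": col,
            "n_unique": frame[col].nunique(),  # nunique ignores NaN by default
            "examples": list(values[:max_examples]),
        })
    return pd.DataFrame(rows).set_index("column")


toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0],
    "agent": [9.0, None, 9.0],
})
summary = unique_value_summary(toy)
print(summary)
```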


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])
df['children'] = df['children'].astype('int64')



### What all manipulations have you done and insights you found?

* Changed the datatype of reservation_status_date field from object to datetime.
* Changed the datatype of children field from float to int64.
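One optional extra step, not part of the original wrangling, is to make the month column calendar-aware. `arrival_date_month` holds month names as text, so value counts are ordered by frequency and plots by first appearance. Converting it to an ordered `pd.Categorical` fixes the order once; seaborn generally follows the category order of a categorical column. The sketch assumes full English month names (for example `'July'`) and uses a toy frame:

```python
import calendar

import pandas as pd

# Full month names, January..December (calendar.month_name[0] is an empty string);
# assumes the default English locale
MONTH_ORDER = list(calendar.month_name)[1:]

toy = pd.DataFrame({"arrival_date_month": ["July", "January", "August", "July"]})
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=MONTH_ORDER, ordered=True
)

# sort_index now follows the calendar rather than the alphabet
monthly = toy["arrival_date_month"].value_counts().sort_index()
print(monthly[monthly > 0])
```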

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

print(df.hotel.value_counts())
sns.countplot(x = df['hotel'])  # pass the column as x; positional data is treated differently in recent seaborn

##### 1. Why did you pick the specific chart?

**Answer** : A count plot (a bar chart of category frequencies) gives a straightforward comparison of the number of bookings for each hotel type over 2015 to 2017, making it easy to see which type is in higher demand.

##### 2. What is/are the insight(s) found from the chart?

**Answer :** City hotels have the most bookings, around 52,000, while resort hotels have approximately 35,000 over the whole 2015 to 2017 period. City hotels were therefore in noticeably higher demand than resort hotels.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer :** The higher booking volume of city hotels presents an opportunity for targeted strategies and resource allocation. The lower booking count of resort hotels calls for measures to prevent negative growth, such as focused marketing and distinctive offerings for resort destinations.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
print(df.arrival_date_year.value_counts())
sns.countplot(x = df.arrival_date_year)

##### 1. Why did you pick the specific chart?

**Answer** : The bar graph presents a clear comparison of annual bookings for the years 2015, 2016, and 2017. Each bar corresponds to the respective year, showcasing the total number of bookings.

##### 2. What is/are the insight(s) found from the chart?

**Answer** : The data reveals that 2016 had the highest number of bookings, followed by 2017, while 2015 had the fewest.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer** : The gained insights can help create a positive business impact. Recognizing the strongest booking years (2016, followed by 2017) provides an opportunity to identify and reuse the strategies that worked in those periods. Before treating the low 2015 count as weak performance, however, check how many months of each year the data actually covers: if 2015 (or 2017) includes only part of the year, the annual totals are not directly comparable. If a gap remains after accounting for coverage, investigate its causes and adjust marketing efforts to avoid negative growth.
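A quick way to run that coverage check is to count the distinct arrival months per year and divide each year's bookings by that number. A sketch on toy rows (the real check would use `df`):

```python
import pandas as pd

# Toy rows standing in for df
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2016, 2017],
    "arrival_date_month": ["July", "December", "January", "June", "June", "December", "August"],
})

# Number of distinct arrival months observed in each year
months_per_year = toy.groupby("arrival_date_year")["arrival_date_month"].nunique()
# Total bookings per year, in year order
bookings_per_year = toy["arrival_date_year"].value_counts().sort_index()
# Average bookings per observed month, a fairer comparison across partial years
avg_per_month = (bookings_per_year / months_per_year).round(2)

print(pd.DataFrame({
    "months_covered": months_per_year,
    "bookings": bookings_per_year,
    "avg_per_month": avg_per_month,
}))
```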

#### Chart - 3

In [None]:
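# Chart - 3 visualization code
# Calendar order for the month axis (months are stored as full English names)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

print(df['arrival_date_month'].value_counts().reindex(month_order))
sns.countplot(x = df['arrival_date_month'], order = month_order)
plt.xticks(rotation = 45)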

##### 1. Why did you pick the specific chart?

**Answer** : The bar graph compares the number of bookings in each month, aggregated over 2015, 2016 and 2017. Each bar represents one month, allowing a quick visual assessment of seasonal booking trends.

##### 2. What is/are the insight(s) found from the chart?

**Answer** :
* The data reveals a seasonal pattern in bookings, with August and July experiencing the highest demand.
* Bookings also show roughly linear growth from March to May. January records the fewest bookings, and November and December are also relatively quiet.
* This information highlights the importance of considering seasonal variations when analyzing booking trends.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer** :
To optimize staffing levels, increase hospitality staff during peak booking periods to meet high demand. During off-seasons, consider offering special discounts to attract customers and compensate for lower booking volumes. This adaptive approach ensures efficient resource allocation and enhances customer engagement throughout the year.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.countplot( x = df['meal'])

##### 1. Why did you pick the specific chart?

**Answer** : The bar graph compares customer demand for each meal plan over the whole 2015 to 2017 period. Each bar represents one meal plan, which makes it quick to see which options are most and least popular.

##### 2. What is/are the insight(s) found from the chart?

**Answer** :
* BB (Bed & Breakfast) has the highest demand, indicating a strong preference among guests.
* HB (Half Board) and SC (no meal package) show roughly equal demand, both lower than BB.
* FB (Full Board) has very low demand, and bookings with an Undefined meal plan are minimal.
**Overall, BB meals appear to be the most popular, while FB and undefined choices have limited appeal.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* The insight that BB meals have the highest demand presents a positive business opportunity for growth.
* However, the very low demand for FB meals suggests a potential negative impact, urging the business to assess and address issues within that category to prevent negative growth.

**Maximize overall sales by focusing on the high-demand BB category, optimizing FB meals based on customer feedback, implementing effective marketing, diversifying the menu, ensuring efficient operations, competitive pricing, and leveraging technology for convenience and loyalty programs.**

#### Chart - 5

In [None]:
# Chart - 5 visualization code
temp_df = df[['country']].value_counts().reset_index().head(15)
temp_df.columns  = ['Country','Count']
print(temp_df)
sns.barplot(x = temp_df['Country'], y = temp_df['Count'])

##### 1. Why did you pick the specific chart?

**Answer** : A bar graph makes it easy to compare the 15 countries with the most hotel bookings over 2015 to 2017. It shows which countries generate the most bookings, although not why.

##### 2. What is/are the insight(s) found from the chart?

**Answer**
* PRT (Portugal) has the highest number of hotel bookings, at around 27,000.
* GBR (United Kingdom), FRA (France), ESP (Spain) and DEU (Germany) follow in descending order.
* ITA (Italy) and IRL (Ireland) have approximately 4,000 bookings each.
* BEL (Belgium), BRA (Brazil), NLD (Netherlands) and USA share a similar booking count of around 2,500 each.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer**
* PRT's leading position presents a positive business opportunity for targeted marketing in that market.
* The descending order of GBR, FRA, ESP and DEU is a ranking by total bookings, not a decline over time, so it does not by itself signal negative growth; a per-year breakdown would be needed to detect a shrinking market.
* ITA and IRL offer growth potential, while BEL, BRA, NLD and USA show similar volumes that can be maintained or developed further.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
temp_df = df['market_segment'].value_counts().reset_index()
temp_df.columns = ['Market Segment','Count']
print(temp_df)

In [None]:
plt.figure(figsize= [10,5])
sns.barplot(x = temp_df['Market Segment'], y = temp_df['Count'])

##### 1. Why did you pick the specific chart?

**Answer** : A bar plot makes it easy to compare the market segments customers belong to by total bookings from 2015 to 2017.

##### 2. What is/are the insight(s) found from the chart?

**Answer** :
* The Online TA (travel agent) segment dominates with over 50,000 bookings.
* Offline TA/TO (travel agents/tour operators) and Direct follow with around 12,000 and 11,000 bookings, respectively.
* Groups and Corporate segments both hover around 5,000 bookings.
* Complementary, Aviation, and Undefined segments show minimal booking activity.
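Raw counts make it hard to say how dominant a segment really is; normalized value counts turn them into percentages of all bookings. A sketch on a made-up six-booking series (the real call would be `df['market_segment'].value_counts(normalize=True)`):

```python
import pandas as pd

# Toy bookings, not real data
toy = pd.Series(
    ["Online TA", "Online TA", "Offline TA/TO", "Direct", "Online TA", "Groups"],
    name="market_segment",
)

# normalize=True returns fractions of the total; scale to percent and round
shares = (toy.value_counts(normalize=True) * 100).round(1)
print(shares)
```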

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer :** The high Online TA volume (50,000+ bookings) is a positive business opportunity, but it also shows a heavy dependence on a single segment. The lower totals for Offline TA/TO and Direct do not by themselves indicate negative growth, since the chart shows totals rather than trends over time; they mark segments with room to grow, and addressing their challenges is important for business optimization.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

temp_df = df['distribution_channel'].value_counts().reset_index()
temp_df.columns = ['Distribution Channel','Count']
print(temp_df)
sns.barplot(x = temp_df['Distribution Channel'],y =temp_df['Count'])

##### 1. Why did you pick the specific chart?

**Answer** : Bar charts facilitate a straightforward comparison of booking distribution channels across the years 2015, 2016, and 2017.

##### 2. What is/are the insight(s) found from the chart?

**Answer** :
* The TA/TO distribution channel leads with nearly 70,000 bookings, showcasing a high demand.
* The Direct channel follows with approximately 12,000 bookings, while Corporate has around 5,000.
* In contrast, GDS has minimal bookings, and the undefined distribution channel is almost negligible, with only 5 bookings that might not even be noticeable on a bar chart.

**This data emphasizes the dominance of TA/TO and highlights potential areas for strategic focus and improvement in other channels.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer** : The TA/TO distribution channel stands out with nearly 70,000 bookings, offering a positive business opportunity. However, the Direct and Corporate channels, with approximately 12,000 and 5,000 bookings respectively, showcase areas for growth. In contrast, the GDS and undefined channels have minimal bookings, signaling potential challenges that need strategic attention to avoid negative growth and optimize overall business performance.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

print(df['deposit_type'].value_counts())
sns.countplot(x = df['deposit_type'])

##### 1. Why did you pick the specific chart?

**Answer  :**
Utilizing a bar chart makes it easier to visually illustrate customer preferences regarding deposit types. The chart clearly shows a significant inclination towards the "No Deposit" option, with a notably higher count, suggesting a strong customer preference for bookings without deposit requirements. Meanwhile, the lower counts for both refundable and non-refundable deposit types underscore the relatively lesser popularity of these options. This visual representation provides a quick and effective means to grasp and communicate customer preferences regarding deposit conditions.

##### 2. What is/are the insight(s) found from the chart?

**Answer**
* The key insight derived from the bar chart is the substantial preference for bookings without any deposit ("No Deposit"), as evidenced by the highest count of approximately 86,000.
* This suggests that a significant portion of customers opts for flexibility in reservation conditions.
* In contrast, the small number of bookings with non-refundable deposits shows that this option is less favored, which is expected because the guest loses the deposit on cancellation.
* Additionally, the scarcity of bookings with refundable deposits highlights a low demand for this particular reservation condition.

**Overall, the insight emphasizes the prominence of flexibility in booking terms among the observed data.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer  :** The insights gained from the summary can indeed contribute to positive business impact. The significant preference for "No Deposit" bookings highlights an opportunity for the business to attract more customers by offering flexible reservation terms. This could lead to increased customer satisfaction and loyalty.

However, potential negative impacts may arise from the limited popularity of refundable and non-refundable deposit options. If the business heavily relies on revenue generated from deposit-based bookings, the lower counts for these options could indicate a missed revenue opportunity. It's essential to assess the business model and financial strategies to ensure that the predominant preference for "No Deposit" does not adversely affect revenue goals. Adjusting marketing strategies or offering incentives for other deposit types could be considered to balance customer preferences with business objectives.

#### Chart - 9

In [None]:
# Chart - 9 visualization code

plt.figure(figsize = [12,4])
temp_df_reserved_room = df['reserved_room_type'].value_counts().reset_index()
temp_df_reserved_room.columns = ['Room Type','No. of Bookings']
plt.subplot(1,2,1)
plt.title('Type of room Reserved by the Customer')
sns.barplot(x = temp_df_reserved_room['Room Type'] , y = temp_df_reserved_room['No. of Bookings'])


temp_df_assigned_room = df['assigned_room_type'].value_counts().reset_index()
temp_df_assigned_room.columns = ['Room Type','No. of Bookings']
plt.subplot(1,2,2)
plt.title('Type of room Assigned to the Customer')
sns.barplot(x = temp_df_assigned_room['Room Type'] , y = temp_df_assigned_room['No. of Bookings'])

##### 1. Why did you pick the specific chart?

**Answer :**  Using a bar plot simplifies the comparison between customers' preferred room types at the time of reservation and the room types actually assigned to them.

##### 2. What is/are the insight(s) found from the chart?

**Answer**
* Approximately 55,000 customers reserved A-type rooms, but only about 45,000 bookings were assigned an A-type room.
* D-type rooms were reserved by around 17,000 customers but assigned to around 22,000, so at least roughly 5,000 customers received a D-type room they had not reserved.
* For E-type rooms, nearly 7,000 customers who preferred that type ended up with it.
* Because A falls short while D gains, some A-type bookings were probably moved to D- and E-type rooms; the aggregate counts alone cannot confirm which types were substituted.
* Preferences for L and P-type rooms were minimal.
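Two side-by-side count charts cannot show which reserved type was replaced by which assigned type. A cross-tabulation of `reserved_room_type` against `assigned_room_type` answers that directly, and the share of rows where the two differ gives an overall mismatch rate. A sketch on toy rows; note that a different assigned room is not necessarily a bad outcome (it may be an upgrade), so the table is a starting point for the dissatisfaction question rather than proof of it:

```python
import pandas as pd

# Toy bookings, not real data
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "D", "E", "A"],
    "assigned_room_type": ["A", "D", "E", "D", "E", "A"],
})

# Rows: room type reserved; columns: room type actually assigned
transitions = pd.crosstab(toy["reserved_room_type"], toy["assigned_room_type"])
print(transitions)

# Share of bookings whose assigned type differs from the reserved type
mismatch_rate = (toy["reserved_room_type"] != toy["assigned_room_type"]).mean()
print(f"Mismatch rate: {mismatch_rate:.1%}")
```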

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer :** Insights indicate room assignment mismatches, with customer preferences not consistently aligning with actual allocations. While opportunities for improvement exist to enhance satisfaction, the current disparities, especially for A, D, and E types, pose a risk of negative growth due to potential customer dissatisfaction and its impact on business reputation. Adjustments in the assignment process are essential to mitigate these risks and foster positive customer experiences.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
temp_df = df['customer_type'].value_counts().reset_index()
temp_df.columns = ['Customer Type',"Count"]
print(temp_df)
sns.barplot(x = temp_df['Customer Type'], y = temp_df['Count'])


##### 1. Why did you pick the specific chart?

**Answer  :** The bar chart succinctly displays booking trends for various customer types, including transient, transient party, contract, and group. This visual insight facilitates a quick understanding of the distribution, enabling businesses to adapt strategies to meet specific customer segment preferences effectively.

##### 2. What is/are the insight(s) found from the chart?

**Answer  :**
* Transient customers lead with around 72,000 bookings, indicating the highest demand.
* Transient parties follow with approximately 12,000 bookings.
* Contract bookings are relatively low at around 3,000.
* Group bookings are the least among all categories.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer :**
The insights from the summary can guide positive business impact by highlighting the popularity of transient customers, indicating potential areas for targeted marketing and service enhancement.
However, the lower counts for contract and group bookings may pose challenges. If the business heavily relies on revenue from these segments, the lower demand could impact growth negatively. Addressing the reasons behind the lower bookings for contract and group customers, such as adjusting offerings or marketing strategies, may be necessary to mitigate potential negative effects and create a more balanced revenue stream.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
print(df['required_car_parking_spaces'].value_counts())
sns.histplot(df['required_car_parking_spaces'], discrete = True)  # one bar per integer value

##### 1. Why did you pick the specific chart?

**Answer :** Utilizing a histogram allows for a visual representation of the distribution of customer requirements for parking space. The histogram will display the frequency or count of different levels of parking space requirements, providing a clear overview of the demand or preference among customers.

##### 2. What is/are the insight(s) found from the chart?

**Answer :**
* The histogram illustrates the parking space requirements of customers, revealing that the majority, around 80,000, do not have any parking space needs.
*  Approximately 7,000 customers specifically require parking space for a single vehicle, while the rest exhibit negligible demand for parking.
* This distribution provides a clear understanding of the varying levels of parking space requirements among customers.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Answer :**
Knowing that the majority of customers (around 80,000) do not require parking space suggests that, for this segment, providing or advertising parking may not be a significant factor in attracting bookings. Businesses can allocate resources more efficiently based on this information, potentially improving overall customer satisfaction.

However, there might be a potential negative impact on revenue if there are services or charges associated with providing parking spaces. With only around 7,000 customers needing parking for a single vehicle, the business needs to balance resources and investments in parking facilities to align with actual demand. Overcommitting resources to parking for a small fraction of customers could result in inefficiencies and negative growth in terms of return on investment. Therefore, careful consideration of parking-related services and infrastructure is essential to optimize business outcomes.







#### Chart - 12

In [None]:
# Chart - 12 visualization code
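plt.figure(figsize = [22,5])
# reservation_status_date is the date a booking's last status (Check-Out, Canceled, No-Show) was set
temp_df = df['reservation_status_date'].value_counts().reset_index().head(20)
temp_df.columns = ['Reservation Status Date','No. of Bookings']
print(temp_df)
plt.title('Figure 1')
# Plot dates as 'YYYY-MM-DD' strings so the bars keep their descending-count order
sns.barplot(x = temp_df['Reservation Status Date'].dt.strftime('%Y-%m-%d'), y = temp_df['No. of Bookings'])
plt.xticks(rotation = 45)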


In [None]:
plt.figure(figsize = [22,5])
plt.title('Figure 2 ')
sns.histplot(df['reservation_status_date'])

##### 1. Why did you pick the specific chart?

**Answer :**
* Fig.1, a bar graph, shows the 20 dates in 2015, 2016 and 2017 with the most reservation status updates. Note that `reservation_status_date` is the date on which a booking's last status was set (check-out, cancellation or no-show), not the date the booking was made.
* Fig.2, a histogram, shows how these status dates are distributed over the same period. Together, the two charts highlight peak days and the overall pattern of check-outs and cancellations across the years.

##### 2. What is/are the insight(s) found from the chart?

**Answer :**
* Fig. 2 shows a roughly linear increase from August 2015 to October 2015, followed by a stable level from February 2016 to November 2016.
* Ten dates stand out as peaks with the highest number of bookings: 2016-02-14, 2017-05-25, 2015-10-21, 2016-10-06, 2016-03-28, 2017-05-05, 2016-11-21, 2017-02-15, 2017-03-06, and 2017-02-12.
* These specific dates stand out as periods of exceptionally high booking activity, offering insights into potential factors influencing customer demand during those times.
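Aggregating by calendar month makes the growth-then-plateau reading of Fig. 2 easier to check than a histogram with default bins. The following is a sketch that assumes `reservation_status_date` holds `YYYY-MM-DD` strings. If the column is already a datetime, `pd.to_datetime` returns it unchanged. The dates below are made up for illustration:

```python
import pandas as pd

# Made-up dates standing in for df['reservation_status_date']
dates = pd.Series(["2015-08-03", "2015-08-17", "2015-09-02", "2015-09-09",
                   "2015-09-30", "2015-10-21", "2015-10-21", "2015-10-22"])

# Parse strings into datetimes so they can be grouped by month
parsed = pd.to_datetime(dates)

# Count records per calendar month; to_period("M") maps each date to its month
monthly = parsed.dt.to_period("M").value_counts().sort_index()
print(monthly)

# Busiest individual dates, the same ranking used for Fig. 1
top_days = parsed.value_counts().head(3)
print(top_days)
```

Plotting `monthly` as a line chart on the full column would show the trend with one point per month.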

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :** The gained insights can potentially contribute to a positive business impact. Understanding the linear growth and stable trends during specific periods allows businesses to align resources, marketing efforts, and promotions to capitalize on high-demand intervals. Identifying peak booking days, such as those mentioned, enables targeted strategies to enhance customer engagement during these times, potentially leading to increased revenue and customer satisfaction.

However, potential negative impacts may arise if the business fails to adapt to or capitalize on the observed patterns. For instance, not adjusting staff levels or marketing strategies during high-demand periods could result in missed opportunities for revenue growth. It's crucial for the business to use these insights to inform proactive and strategic decisions, ensuring that they align with customer behavior and maximize positive outcomes.

#### Chart - 13

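```python
# Chart - 13 visualization code
# Calendar order for the x-axis (assumes full English month names, as in the public dataset)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

temp = df.copy()
# Cast the year to str so seaborn treats it as a categorical hue rather than a numeric scale
temp['arrival_date_year'] = temp['arrival_date_year'].astype(str)
plt.figure(figsize=[15, 5])
sns.countplot(data=temp, x='arrival_date_month', hue='arrival_date_year', order=month_order)
plt.show()
```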

##### 1. Why did you pick the specific chart?

**Answer :** The bar graph visually compares the number of bookings in each month for the years 2015, 2016, and 2017. The distinctive colors for each year enable a quick and efficient comparison of booking trends across the months, offering insights into patterns and variations over the three years.

##### 2. What is/are the insight(s) found from the chart?

**Answer :** The booking trends in the specified years reveal distinct patterns.

* In 2015, bookings were lower overall than in 2016 and 2017, partly because the dataset's 2015 arrivals only start in July. September had the highest count that year, followed by October and August.
* In 2016, August and October marked the highest bookings, while July, September, March, April, and May had relatively similar counts.
* In 2017, May and July witnessed the maximum bookings, with April, June, and August following suit.

**These insights provide a clear understanding of the monthly variations in booking activity across the three years.**
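The same month-by-year comparison can be read as a table, with months sorted by calendar position rather than alphabetically. The following is a sketch on made-up rows. It assumes the `arrival_date_month` and `arrival_date_year` columns hold full month names and integer years, as in the public dataset:

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Made-up arrivals standing in for df[['arrival_date_month', 'arrival_date_year']]
arrivals = pd.DataFrame({
    "arrival_date_month": ["July", "August", "August", "September", "May", "May", "July"],
    "arrival_date_year": [2015, 2015, 2016, 2016, 2017, 2017, 2017],
})

# An ordered categorical makes months sort by calendar position
arrivals["arrival_date_month"] = pd.Categorical(
    arrivals["arrival_date_month"], categories=MONTHS, ordered=True
)

# Rows: month, columns: year, cells: number of bookings
table = pd.crosstab(arrivals["arrival_date_month"], arrivals["arrival_date_year"])
print(table)
```

Because the table has one column per year, it is also easy to compare a month across years numerically rather than by eye.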

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :** The insights on monthly booking trends across 2015, 2016, and 2017 can have a positive business impact. Identifying peak months allows businesses to optimize marketing strategies, allocate resources efficiently, and cater to increased demand during specific periods. However, potential challenges may arise from lower booking months, especially in 2015, requiring targeted efforts to boost reservations during those times. Strategic planning around peak and off-peak months can mitigate negative impacts and enhance overall business performance.

#### Chart - 14

```python
# Chart - 14 visualization code
# Total bookings and repeated-guest bookings per country
temp_df = (df.groupby('country')['is_repeated_guest']
             .agg(total='count', repeated='sum')
             .reset_index()
             .rename(columns={'country': 'Country'}))
sorted_temp_df = temp_df.sort_values(by=['total', 'repeated'], ascending=False).head(15)
print(sorted_temp_df)

plt.figure(figsize=(15, 5))
# dodge=False keeps one centred bar per country; the colour encodes the repeated count
sns.barplot(data=sorted_temp_df, x='Country', y='total', hue='repeated', dodge=False)
plt.show()
```

##### 1. Why did you pick the specific chart?

**Answer :**
The bar graph, incorporating hue color, effectively visualizes the top 15 countries with total bookings in the years 2015, 2016, and 2017. The additional hue component illustrates the number of repeated customers within the total bookings for each country. This comprehensive visualization provides insights into both the overall booking trends and the prevalence of repeated customers across different countries over the specified years.

##### 2. What is/are the insight(s) found from the chart?

**Answer :**  
* Portugal (PRT) leads with approximately 27,000 bookings, featuring a substantial number of repeated customers exceeding 2,500.
* GBR, FRA, ESP, and DEU follow, with bookings decreasing steadily in that order.
* Italy (ITA) and Ireland (IRL) each have around 4,000 bookings.
* Belgium (BEL), Brazil (BRA), Netherlands (NLD), and the United States (USA) share a similar booking count, approximately 2,500 each.

**Remarkably, besides Portugal, no other country records more than 200 repeated bookings. These insights provide a clear understanding of the distribution of hotel bookings and repeated customer behavior across different nations.**
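Raw repeated-guest counts favour countries with many bookings. A repeat rate (repeated divided by total) makes countries of different sizes comparable. The following is a sketch using pandas named aggregation on made-up rows, with the column names taken from the chart code above:

```python
import pandas as pd

# Made-up bookings standing in for df[['country', 'is_repeated_guest']]
bookings = pd.DataFrame({
    "country": ["PRT", "PRT", "PRT", "PRT", "GBR", "GBR", "FRA", "FRA", "FRA"],
    "is_repeated_guest": [1, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Named aggregation: count rows and sum the 0/1 repeat flag per country
summary = bookings.groupby("country")["is_repeated_guest"].agg(total="count", repeated="sum")

# Repeat rate normalises for country size, unlike raw repeated counts
summary["repeat_rate"] = summary["repeated"] / summary["total"]
print(summary.sort_values("total", ascending=False))
```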

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :**
The insights suggest positive opportunities in countries like Portugal, which combines high bookings with many repeated customers. GBR, FRA, ESP, and DEU rank progressively lower. Because the chart is sorted by total bookings and aggregates all years, this ordering reflects relative market size rather than a decline over time, but it does show where growth potential remains. In addition, apart from Portugal, most countries have few repeated bookings, which emphasizes the need for targeted customer retention strategies and improved marketing efforts. Overall, these insights can guide positive business impact through targeted actions.

#### Chart - 15

```python
# Chart 15 visualization code
plt.figure(figsize=(22, 5))

plt.subplot(1, 2, 1)
plt.title('Fig. 1: Market Segment')
sns.countplot(data=df, x='market_segment', hue='is_repeated_guest')

plt.subplot(1, 2, 2)
plt.title('Fig. 2: Distribution Channel')
sns.countplot(data=df, x='distribution_channel', hue='is_repeated_guest')
plt.show()
```

##### 1. Why did you pick the specific chart?

**Answer :**
Fig. 1 uses a bar plot to compare market segments by total bookings from 2015 to 2017, which makes differences between customer segments easy to see.

Fig. 2 uses a bar plot to compare booking distribution channels over 2015, 2016, and 2017. In both figures, the hue parameter separates repeated guests from first-time guests. This shows how common repeat customers are in each market segment and booking channel.

##### 2. What is/are the insight(s) found from the chart?

**Answer :**

**In Fig. 1**,
* the dominance of the online TA market segment is evident, with over 50,000 bookings and few repeated guests.
* Offline TA/TO and Direct segments have fewer bookings, around 12,000 and 11,000, respectively.
* The Corporate segment, despite having only around 5,000 bookings, has the highest number of repeated guests, more than even the much larger online TA segment.
* Complementary, Aviation, and Undefined segments show minimal booking activity.

**In Fig. 2**,
* the TA/TO distribution channel leads with nearly 70,000 bookings, highlighting high demand.
* Direct follows with around 12,000 bookings, and Corporate has around 5,000.
* The Corporate distribution channel features the maximum number of repeated customers, followed by almost equal numbers of repeated guests for the Direct and TA/TO distribution channels.
* GDS has minimal bookings, and the undefined distribution channel is negligible with only 5 bookings.
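The bars compare absolute counts, so a small segment with many loyal guests can look minor next to online TA. A row-normalised cross-tabulation shows what fraction of each segment consists of repeated guests. The following is a sketch on made-up rows; the same call works for `distribution_channel`:

```python
import pandas as pd

# Made-up bookings standing in for df[['market_segment', 'is_repeated_guest']]
bookings = pd.DataFrame({
    "market_segment": ["Online TA"] * 5 + ["Corporate"] * 2 + ["Direct"] * 3,
    "is_repeated_guest": [0, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})

# normalize="index" converts each row into proportions that sum to 1
shares = pd.crosstab(bookings["market_segment"], bookings["is_repeated_guest"],
                     normalize="index")
print(shares)
```

Column `1` of `shares` is the repeat share per segment, which is a direct way to back the loyalty-program point made for Corporate guests.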

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :**
The gained insights can contribute to a positive business impact. The dominance of the online TA market segment and the high demand in the TA/TO distribution channel provide opportunities for targeted marketing and resource allocation. The Corporate segment, with a notable number of repeated guests, signals potential for customer loyalty programs and enhanced service offerings.

However, challenges may arise in the minimal booking activity observed in segments like Complementary, Aviation, and Undefined, which could impact overall growth. It's essential to assess whether these segments align with business objectives or if there's potential for improvement through targeted strategies. Adjusting marketing efforts and services for the less active segments can mitigate negative impacts and promote overall positive growth.

#### Chart - 16


```python
# Chart 16 visualization code
# One shared, sorted order so the two panels line up room type by room type
room_order = sorted(set(df['reserved_room_type']) | set(df['assigned_room_type']))

plt.figure(figsize=(20, 5))
plt.subplot(1, 2, 1)
plt.title('Fig. 1: Room Type Reserved by Customers')
sns.countplot(data=df, x='reserved_room_type', hue='is_repeated_guest', order=room_order)

plt.subplot(1, 2, 2)
plt.title('Fig. 2: Room Type Assigned to Customers')
sns.countplot(data=df, x='assigned_room_type', hue='is_repeated_guest', order=room_order)
plt.show()
```


##### 1. Why did you pick the specific chart?

**Answer :**
In Fig. 1, the bar plot simplifies the comparison of customers' preferred room types at the time of reservation.

Fig. 2 presents a bar plot illustrating the room types actually assigned to them.

Both figures include a hue parameter, indicating the presence of repeated customers. These visualizations provide a clear overview of room type preferences versus actual assignments, with the added dimension of identifying repeated customers within each category.


##### 2. What is/are the insight(s) found from the chart?

**Answer :** Approximately 55,000 customers reserved A-type rooms, of whom almost 3,000 were repeated customers. However, only about 45,000 were assigned A-type rooms, including around 2,000 repeated customers. For D-type rooms the pattern reverses: around 17,000 customers reserved them, but about 22,000 were assigned one, and repeated customers also appear among these D-type assignments. Nearly 7,000 customers who reserved E-type rooms ended up with that type. Because A-type assignments fall short of A-type reservations while D-type assignments exceed D-type reservations, some customers who reserved A-type rooms were likely moved to D- and E-type rooms. Demand for L- and P-type rooms was minimal.
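The two bar charts only show totals, so they cannot confirm who was moved where. A cross-tabulation of reserved against assigned room type shows this directly, with off-diagonal cells counting room changes. The following is a sketch on made-up rows, using the column names from the chart code:

```python
import pandas as pd

# Made-up rows standing in for df[['reserved_room_type', 'assigned_room_type']]
rooms = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "A", "D", "D", "E", "A"],
    "assigned_room_type": ["A", "D", "A", "E", "D", "D", "E", "D"],
})

# Rows: room reserved, columns: room assigned; off-diagonal cells are room changes
moves = pd.crosstab(rooms["reserved_room_type"], rooms["assigned_room_type"])
print(moves)

# Share of bookings whose assigned room differs from the reserved one
mismatch_rate = (rooms["reserved_room_type"] != rooms["assigned_room_type"]).mean()
print(f"Mismatch rate: {mismatch_rate:.1%}")
```

On the full data, the `A` row of `moves` would show whether A-type reservations are in fact being reassigned to D- and E-type rooms.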

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :**
The gained insights can potentially help create a positive business impact. Understanding the discrepancies between customer preferences and actual room assignments, especially the mismatch in A-type rooms, provides an opportunity to enhance the accuracy of room allocations. Improving this alignment can lead to increased customer satisfaction and loyalty.

However, the negative impact may arise from instances where customers do not receive their preferred room types, potentially affecting their overall experience and leading to dissatisfaction. This misalignment might influence repeat bookings and customer retention negatively. Therefore, addressing and minimizing such discrepancies is crucial to ensure positive growth and maintain a favorable reputation among customers.

#### Chart - 17

```python
# Chart 17 visualization code
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='customer_type', hue='is_repeated_guest')
plt.show()
```

##### 1. Why did you pick the specific chart?

**Answer :**
The bar chart effectively illustrates booking trends for different customer types—transient, transient party, contract, and group. Enhanced with a hue parameter indicating repeated customers, the visualization offers a quick and clear understanding of the distribution within each segment. This insight is valuable for businesses to tailor strategies and adapt services to meet specific preferences within various customer segments effectively.

##### 2. What is/are the insight(s) found from the chart?

**Answer :**

* Transient customers lead with around 70,000 bookings, showing the highest demand, though nearly all of them are non-repeated customers.
* Transient-Party customers follow with approximately 12,000 non-repeated bookings and minimal repeated bookings.
* Contract bookings are relatively low at around 3,000.
* Group bookings have the lowest count among all categories, although they include some repeated customers.
* This insight provides a clear understanding of the distribution of bookings and repeated customer patterns across different customer types.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**Answer :**
The gained insights can contribute to a positive business impact. The high demand from transient customers, together with the presence of repeated customers within this segment, offers opportunities for targeted marketing and personalized services. However, the relatively low number of repeated bookings in the transient-party and contract segments may pose challenges. Strategies to improve customer retention within these segments may be necessary to avoid negative growth. Adapting services to the unique needs of each customer type can optimize business outcomes.

#### Chart - 18 - Correlation Heatmap

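```python
# Correlation heatmap visualization code
plt.figure(figsize=(14, 10))
# numeric_only=True skips text columns; pandas >= 2.0 raises an error on them otherwise
sns.heatmap(df.corr(numeric_only=True))
plt.show()
```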

##### 1. Why did you pick the specific chart?

**Answer :**
A heatmap displays the correlation matrix as a grid of shaded cells, one for each pair of numerical features. With seaborn's default colormap, lighter shades indicate stronger positive correlations and darker shades indicate negative correlations. This gives a quick, intuitive view of the relationships between the features in the dataset.

##### 2. What is/are the insight(s) found from the chart?

**Answer :** The heatmap shows a positive correlation between stays on weeknights and on weekend nights. It also shows a positive relationship between the number of adults and stays on both weekdays and weekends. Repeated guests tend to cancel less often, whereas guests with a history of previous cancellations are more likely to cancel again, which aligns with expectations. Finally, the negative correlation between required car parking spaces and cancellations indicates that guests who request parking cancel less often. Overall, these insights provide valuable information for understanding guest behavior and improving service offerings.
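With many numerical columns, it helps to list the strongest correlations explicitly rather than scanning shades. The following is a sketch that keeps only the upper triangle of the matrix, so each pair appears once, and then ranks pairs by absolute correlation. The column names follow the public dataset, and the values are made up:

```python
import numpy as np
import pandas as pd

# Made-up values standing in for numerical columns of df
data = pd.DataFrame({
    "stays_in_weekend_nights": [0, 1, 2, 2, 1, 0],
    "stays_in_week_nights": [1, 2, 5, 4, 3, 1],
    "is_canceled": [1, 0, 0, 0, 1, 1],
    "required_car_parking_spaces": [0, 1, 1, 0, 0, 0],
})

corr = data.corr(numeric_only=True)

# Upper triangle above the diagonal: each feature pair once, self-correlations dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)
print(pairs.head(5))
```

The sign of each value gives the direction. For example, a negative value for `is_canceled` against `required_car_parking_spaces` means bookings that request parking are cancelled less often.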

#### Chart - 19 - Pair Plot

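```python
# Pair plot visualization code
# Plotting every row of the full dataset is slow, so draw a reproducible random sample
sample = df.sample(n=min(5000, len(df)), random_state=42)
sns.pairplot(sample)
plt.show()
```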

##### 1. Why did you pick the specific chart?

**Answer :**
* Utilizing a pairplot for the hotel booking dataset proves beneficial, offering a robust visualization tool for exploring relationships among multiple numerical variables.
* The pairplot displays scatterplots and histograms, providing a comprehensive view of correlations and distributions. This aids in identifying patterns, trends, and potential outliers efficiently.
* The visualization's quick and insightful nature makes it a valuable asset for deriving meaningful insights and informing subsequent analyses or decision-making during data exploration and interpretation.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the insights gathered from the data:

1. **Targeted Marketing and Service Enhancement:** Focus on targeted marketing strategies and service enhancements for transient customers, as they represent the highest demand. Implement personalized services to attract and retain transient customers.

2. **Addressing Room Assignment Mismatches:** Address discrepancies between customer preferences and actual room assignments. Implement measures to ensure accurate room allocations, improving overall customer satisfaction and loyalty.

3. **Optimizing Booking Channels:** Given the dominance of online TA and high demand in TA/TO distribution channels, optimize marketing efforts and resource allocation for these segments. Consider loyalty programs for corporate customers to enhance retention.

4. **Flexible Booking Terms:** Capitalize on the popularity of "No Deposit" bookings by promoting flexible reservation terms. Adjust marketing strategies to leverage this preference and potentially increase customer satisfaction.

5. **Strategic Business Growth in Countries:** Leverage insights on hotel bookings by country to identify growth opportunities. Focus on markets with high bookings like Portugal, explore growth potential in countries like Italy and Ireland, and maintain stability in countries with consistent booking counts.

6. **Adaptive Seasonal Strategies:** Implement adaptive seasonal strategies, such as optimizing staffing levels during peak months and offering special discounts during off-seasons. Align resources with seasonal variations to maximize efficiency.

7. **Catering to Preferred Meal Types:** Maximize overall sales by focusing on the high-demand BB meal category. Optimize FB meals based on customer feedback, implement effective marketing, diversify the menu, ensure competitive pricing, and leverage technology for convenience and loyalty programs.

8. **Parking Space Allocation:** Optimize resource allocation for parking space based on customer demand. Consider potential impacts on revenue and ensure efficient infrastructure utilization.

9. **Capitalizing on Peak Booking Periods:** Align marketing strategies and promotions with observed peak booking periods. Proactively adapt staffing levels and promotions during high-demand intervals to capitalize on increased customer activity.

10. **Strategic Focus on Customer Segments:** Tailor strategies and services to the preferences of different customer segments, especially transient customers. Consider targeted promotions and loyalty programs to enhance customer retention within specific segments.

By implementing these strategies, the client can work towards achieving their business objectives, including increasing customer satisfaction, optimizing revenue, and fostering positive growth in various aspects of their operations.

# **Conclusion**

In conclusion, leveraging the insights from the data enables the client to tailor strategic approaches for targeted marketing, service enhancements, and resource optimization. Addressing specific areas, such as transient customer engagement, room assignment accuracy, and channel optimization, will contribute to increased customer satisfaction, loyalty, and overall business growth. The adaptive strategies proposed align with observed trends, ensuring a proactive and customer-centric approach for positive business impact.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***