# 🏨 **Mission: Save the Hotel Industry with Data!**

## **Introduction**  
Alright, data scientists-in-training! You’ve just been hired as the *Chief Data Analyst* for a group of luxury hotels, and they’ve got a big problem. Guests are booking rooms, but then canceling at the last minute or simply not showing up. The result? Empty rooms, lost revenue, and very unhappy hotel managers.

Your challenge: **Can you analyze the booking data and figure out the patterns behind cancellations?** If you can predict whether a reservation will be honored or canceled, you’ll not only save the hotel chain but also impress your bosses with your data wizardry! No pressure, right?  

The dataset in your hands contains details about hotel reservations—how many nights were booked, whether car parking was requested, the price of rooms, and even special requests. Your task is to clean, analyze, and visualize this data to uncover the secrets behind cancellations. Are you ready to dive into the numbers and help the hotels keep their rooms full? Let’s get to work!  

---

## **Dataset**  
This dataset contains detailed information about hotel reservations. Each row represents a booking, and the columns describe various attributes related to the reservation. Here’s a breakdown of what you’ll find:

- **Booking_ID**: Unique identifier for each booking.  
- **no_of_adults**: Number of adults included in the booking.  
- **no_of_children**: Number of children included in the booking.  
- **no_of_weekend_nights**: Number of weekend nights (Saturday or Sunday) booked.  
- **no_of_week_nights**: Number of week nights (Monday to Friday) booked.  
- **type_of_meal_plan**: The meal plan selected by the customer.  
- **required_car_parking_space**: Whether the guest requested a car parking space (0 = No, 1 = Yes).  
- **room_type_reserved**: Type of room reserved, encoded for privacy.  
- **lead_time**: Number of days between booking and arrival.  
- **arrival_year**: Year of the booking arrival date.  
- **arrival_month**: Month of the booking arrival date.  
- **arrival_date**: Day of the month for the arrival date.  
- **market_segment_type**: The market segment the booking came from (e.g., online, corporate).  
- **repeated_guest**: Whether the guest is a returning customer (0 = No, 1 = Yes).  
- **no_of_previous_cancellations**: Number of previous bookings canceled by the customer.  
- **no_of_previous_bookings_not_canceled**: Number of previous bookings that were not canceled.  
- **avg_price_per_room**: Average price per day for the reservation (in euros).  
- **no_of_special_requests**: Number of special requests made by the guest (e.g., high floor, extra pillows).  
- **booking_status**: Whether the booking was honored or canceled. This is the target you will predict.  
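
Before any analysis, it can help to confirm that the file you loaded really matches the column list above. The snippet below is a minimal sketch: `inspect_bookings` is a hypothetical helper name, and it only assumes the column names are spelled exactly as listed here.

```python
import pandas as pd

# Column names copied from the list above.
EXPECTED_COLUMNS = [
    "Booking_ID", "no_of_adults", "no_of_children",
    "no_of_weekend_nights", "no_of_week_nights", "type_of_meal_plan",
    "required_car_parking_space", "room_type_reserved", "lead_time",
    "arrival_year", "arrival_month", "arrival_date",
    "market_segment_type", "repeated_guest",
    "no_of_previous_cancellations", "no_of_previous_bookings_not_canceled",
    "avg_price_per_room", "no_of_special_requests", "booking_status",
]


def inspect_bookings(df: pd.DataFrame) -> dict:
    """Compare a DataFrame against the documented columns and count nulls."""
    actual = set(df.columns)
    expected = set(EXPECTED_COLUMNS)
    return {
        # Documented columns that are absent from the DataFrame.
        "missing_columns": sorted(expected - actual),
        # Columns in the DataFrame that the description does not mention.
        "unexpected_columns": sorted(actual - expected),
        # Number of missing values per column.
        "null_counts": df.isna().sum().to_dict(),
    }
```

After loading the data with `pd.read_csv(...)` (the file name depends on what you were given), calling `inspect_bookings(df)` should ideally return two empty lists and a dictionary of null counts.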

---

## **Timeline**  

Here’s your step-by-step plan for tackling this dataset. Follow it closely and make sure you show your skills at every stage:

1. **Load the dataset into a pandas DataFrame** *(17:15 - 17:20)*  
   - Load the dataset into your notebook and take a look at the first few rows.  
   - Check the data types and identify any missing values.  

2. **Introductory EDA (Exploratory Data Analysis)** *(17:20 - 17:40)*  
   - Perform an initial exploration of the dataset.  
   - Check the distribution of key variables like `avg_price_per_room`, `lead_time`, and `booking_status`.  
   - Look for patterns or outliers that could impact your analysis.  

3. **Data Treatment and Training Preparation** *(17:45 - 18:00)*  
   - Handle missing values by filling or dropping them.  
   - Encode categorical variables (e.g., `type_of_meal_plan`, `room_type_reserved`, `market_segment_type`).  
   - Normalize or scale numerical features like `avg_price_per_room` and `lead_time` if necessary.  
   - Split the data into training and testing sets.  

4. **Baseline Model** *(18:00 - 18:15)*  
   - Create a baseline model to predict cancellations using a simple classifier (e.g., Logistic Regression, Decision Tree).  
   - This will give you a starting point for model evaluation.  

5. **Model Evaluation** *(18:15 - 18:30)*  
   - Evaluate your baseline model using appropriate metrics such as accuracy, precision, recall, and F1-score.  
   - Create confusion matrices to visualize the performance of your model.  
   - Compare the model's performance on the training and test sets to check for overfitting or underfitting.  

6. **Iterate and Improve** *(18:30 - 19:00)*  
   - Try different models (e.g., Random Forest, Gradient Boosting, or XGBoost) to improve performance.  
   - Fine-tune the hyperparameters of your models to get the best results.  
   - Compare the models using ROC curves and AUC scores.  

7. **Final Insights and Presentation** *(19:00 - 19:15)*  
   - Summarize your findings: Which factors are most important for predicting cancellations?  
   - Create compelling visualizations (e.g., bar charts, heatmaps) to present your results.  
   - Prepare a final report with insights and recommendations for the hotel managers.    
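
If you are unsure how steps 3 to 6 fit together, here is one possible sketch using scikit-learn. It is not the only way to do it. Several things are assumptions: the helper names `build_baseline` and `evaluate` are hypothetical, canceled bookings are assumed to be labeled `"Canceled"` in `booking_status` (check your data and adjust `canceled_label`), missing values are simply dropped, and the 80/20 split is an arbitrary choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Categorical columns named in step 3 of the timeline.
CATEGORICAL = ["type_of_meal_plan", "room_type_reserved", "market_segment_type"]


def build_baseline(df, classifier=None, canceled_label="Canceled",
                   test_size=0.2, seed=42):
    """Fit a preprocessing + classifier pipeline; return it with the split data."""
    # Simplest treatment of missing values: drop incomplete rows.
    df = df.dropna()
    # Assumption: `canceled_label` is how canceled bookings are marked.
    y = (df["booking_status"] == canceled_label).astype(int)
    # Booking_ID is an identifier, not a feature.
    X = df.drop(columns=["booking_status", "Booking_ID"])
    numeric = X.select_dtypes(include="number").columns.tolist()

    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", StandardScaler(), numeric),
    ])
    if classifier is None:
        classifier = LogisticRegression(max_iter=1000)
    model = Pipeline([("preprocess", preprocess), ("classifier", classifier)])

    # Stratify so both sets keep the same share of cancellations.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    return model, X_train, X_test, y_train, y_test


def evaluate(model, X, y):
    """Compute the metrics from step 5 (requires a model with predict_proba)."""
    pred = model.predict(X)
    proba = model.predict_proba(X)[:, 1]
    return {
        "accuracy": accuracy_score(y, pred),
        "precision": precision_score(y, pred, zero_division=0),
        "recall": recall_score(y, pred, zero_division=0),
        "f1": f1_score(y, pred, zero_division=0),
        "roc_auc": roc_auc_score(y, proba),
        "confusion_matrix": confusion_matrix(y, pred, labels=[0, 1]),
    }
```

Calling `evaluate` once on the training split and once on the test split gives you the overfitting check from step 5. For step 6, you can pass another estimator, for example `classifier=RandomForestClassifier()`, and compare the resulting `roc_auc` values.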

**Submit your code no later than 19:25.**

---

## **Your Challenge**  
Can you spot the key factors that influence cancellations? Are guests with more special requests more likely to cancel? Does lead time play a role? By the end of this analysis, you should be able to **predict cancellations** and help the hotel chain keep its rooms full.  
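
A quick way to start on these questions is to compare cancellation rates across groups. The sketch below is one option, not a required approach: `cancellation_rate_by` is a hypothetical helper, the `"Canceled"` label is an assumption about how `booking_status` is encoded, and any bin edges you pass for `lead_time` are your own choice.

```python
import pandas as pd


def cancellation_rate_by(df, column, canceled_label="Canceled", bins=None):
    """Return the share of canceled bookings per value (or bin) of `column`."""
    # Assumption: `canceled_label` is how canceled bookings are marked.
    canceled = df["booking_status"] == canceled_label
    # Continuous columns such as lead_time are easier to read when binned.
    keys = pd.cut(df[column], bins=bins) if bins is not None else df[column]
    # The mean of a boolean Series is the fraction of True values.
    return canceled.groupby(keys, observed=True).mean()
```

For example, `cancellation_rate_by(df, "no_of_special_requests")` gives one rate per request count, and `cancellation_rate_by(df, "lead_time", bins=[0, 30, 90, 365])` groups bookings into illustrative lead-time ranges.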

Good luck, and may the pandas (library) be with you! 🐼
