# **üìå An√°lisis de Revenue Management - Hotel Booking Demand**

---

# **üìñ Descripci√≥n del Proyecto**

Este proyecto tiene como objetivo analizar y optimizar la gesti√≥n de ingresos (Revenue Management) en hoteles utilizando datos de reservas hist√≥ricas. Se trabajar√° con el [dataset](https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand) Hotel Booking Demand, el cual contiene informaci√≥n sobre reservas de dos tipos de hoteles (City Hotel y Resort Hotel).

El an√°lisis se divide en dos partes clave:


---



1Ô∏è‚É£ Segmentaci√≥n de clientes basada en patrones de reserva

Identificaci√≥n de grupos de clientes seg√∫n comportamiento de reserva.
An√°lisis del Lead Time, ADR (Tarifa Promedio), duraci√≥n de estancia y canal de reserva.
Creaci√≥n de estrategias para optimizar la ocupaci√≥n y la rentabilidad.


---


2Ô∏è‚É£ Impacto de la estacionalidad en la tarifa promedio (ADR)

Evaluaci√≥n de c√≥mo var√≠a el ADR seg√∫n la temporada del a√±o.
An√°lisis comparativo entre hoteles urbanos y resorts.
Estrategias de pricing basadas en la demanda estacional.

---
# ‚öôÔ∏è **Proceso del An√°lisis**
‚úÖ ETL (Extracci√≥n, Transformaci√≥n y Carga) ‚Üí Limpieza y preparaci√≥n de datos.
‚úÖ Exploraci√≥n y visualizaci√≥n ‚Üí Identificaci√≥n de patrones y tendencias.
‚úÖ Modelos predictivos (Opcional) ‚Üí Predicci√≥n de tarifas √≥ptimas seg√∫n temporada y segmento de clientes.

üìå Este proyecto permite entender c√≥mo los factores temporales y de comportamiento del cliente afectan la rentabilidad de los hoteles, ayudando a optimizar estrategias de precios y ocupaci√≥n. üöÄüî•


In [None]:
import pandas as pd
import numpy as np

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**PASOS DEL AN√ÅLISIS**

En primer lugar, subimos nuestro dataset. En este caso, es un dataset relacionado a la demanda de reservas en hoteles. Fue descargado de Kaggle, aqu√≠: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-*demand*

In [None]:
# subimos nuestro dataset
data = pd.read_csv('/content/drive/MyDrive/Data Analysis/hotel_bookings.csv', sep=';')

Ahora, *debemos preparar los datos*

**ETL para el An√°lisis de Booking Demand**

**1. EXTRACCI√ìN (E)**

In [None]:
#revisamos el head y la estructura de los datos que acabamos de subir
data.head()

Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,342,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,1/07/2015
1,Resort Hotel,0,737,2015,July,27,1,0,0,2,...,No Deposit,,,0,Transient,0.0,0,0,Check-Out,1/07/2015
2,Resort Hotel,0,7,2015,July,27,1,0,1,1,...,No Deposit,,,0,Transient,75.0,0,0,Check-Out,2/07/2015
3,Resort Hotel,0,13,2015,July,27,1,0,1,1,...,No Deposit,304.0,,0,Transient,75.0,0,0,Check-Out,2/07/2015
4,Resort Hotel,0,14,2015,July,27,1,0,2,2,...,No Deposit,240.0,,0,Transient,98.0,0,1,Check-Out,3/07/2015


In [None]:
len(data)

119390

In [None]:
#tambi√©n revisamos la estructura y el tipo de datos del dataset que subimos
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal            

**2. TRANSFORMACI√ìN (T)** ‚õë

In [None]:
#revisamos si hay valores nulos
print(data.isnull().sum())

hotel                                  0
is_canceled                            0
lead_time                              0
arrival_date_year                      0
arrival_date_month                     0
arrival_date_week_number               0
arrival_date_day_of_month              0
stays_in_weekend_nights                0
stays_in_week_nights                   0
adults                                 0
children                               4
babies                                 0
meal                                   0
country                              488
market_segment                         0
distribution_channel                   0
is_repeated_guest                      0
previous_cancellations                 0
previous_bookings_not_canceled         0
reserved_room_type                     0
assigned_room_type                     0
booking_changes                        0
deposit_type                           0
agent                              16340
company         

**solo hay una (1) variable que es n√∫merica, que es el n√∫mero de hijos "children" entonces reemplazaremos esos valores nulos (4) con la moda**

In [39]:
data["children"].fillna(data["children"].mean(), inplace=True)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  data["children"].fillna(data["children"].mean(), inplace=True)


**Como los dem√°s datos nulos son variables categ√≥ricas, vamos a reemplazar estos valores nulos con "N/A"**

In [31]:
# Rellenar nulos en 'agent', 'company' y "country" con "Sin agencia", "Sin empresa" y "na", respectivamente:
data["agent"].fillna("Sin agencia", inplace=True)
data["company"].fillna("Sin empresa", inplace=True)
data["country"].fillna("na", inplace=True)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  data["country"].fillna("na", inplace=True)


In [32]:
#verificamos que las actualizaciones se hayan hecho correctamente
print(data.isnull().sum())

hotel                             0
is_canceled                       0
lead_time                         0
arrival_date_year                 0
arrival_date_month                0
arrival_date_week_number          0
arrival_date_day_of_month         0
stays_in_weekend_nights           0
stays_in_week_nights              0
adults                            0
children                          4
babies                            0
meal                              0
country                           0
market_segment                    0
distribution_channel              0
is_repeated_guest                 0
previous_cancellations            0
previous_bookings_not_canceled    0
reserved_room_type                0
assigned_room_type                0
booking_changes                   0
deposit_type                      0
agent                             0
company                           0
days_in_waiting_list              0
customer_type                     0
adr                         