# Trabalho de casa 3

## Data Dictionary

|variable                       |description |
|:------------------------------|:-----------|
|hotel                          | Hotel (H1 = Resort Hotel or H2 = City Hotel) |
|is_canceled                    | Value indicating if the booking was canceled (1) or not (0) |
|lead_time                      | Number of days that elapsed between the entering date of the booking into the PMS (Property Management System) and the arrival date |
|arrival_date_year              | Year of arrival date|
|arrival_date_month             | Month of arrival date|
|arrival_date_week_number       | Week number of year for arrival date|
|arrival_date_day_of_month      | Day of arrival date|
|stays_in_weekend_nights        | Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel |
|stays_in_week_nights           |  Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel|
|adults                         | Number of adults|
|children                       | Number of children|
|babies                         |Number of babies |
|meal                           | Type of meal booked. Categories are presented in standard hospitality meal packages: <br> Undefined/SC – no meal package;<br>BB – Bed & Breakfast; <br> HB – Half board (breakfast and one other meal – usually dinner); <br> FB – Full board (breakfast, lunch and dinner) |
|country                        | Country of origin. Categories are represented in the ISO 3155–3:2013 format |
|market_segment                 | Market segment designation. In categories, the term "TA" means "Travel Agents" and "TO" means "Tour Operators" |
|distribution_channel           | Booking distribution channel. The term "TA" means "Travel Agents" and "TO" means "Tour Operators" |
|is_repeated_guest              | Value indicating if the booking name was from a repeated guest (1) or not (0) |
|previous_cancellations         | Number of previous bookings that were cancelled by the customer prior to the current booking |
|previous_bookings_not_canceled | Number of previous bookings not cancelled by the customer prior to the current booking |
|reserved_room_type             | Code of room type reserved. Code is presented instead of designation for anonymity reasons |
|assigned_room_type             | Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons |
|booking_changes                | Number of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellation|
|deposit_type                   | Indication on if the customer made a deposit to guarantee the booking. This variable can assume three categories:<br>No Deposit – no deposit was made;<br>Non Refund – a deposit was made in the value of the total stay cost;<br>Refundable – a deposit was made with a value under the total cost of stay. |
|agent                          | ID of the travel agency that made the booking |
|company                        | ID of the company/entity that made the booking or responsible for paying the booking. ID is presented instead of designation for anonymity reasons |
|days_in_waiting_list           | Number of days the booking was in the waiting list before it was confirmed to the customer |
|customer_type                  | Type of booking, assuming one of four categories:<br>Contract - when the booking has an allotment or other type of contract associated to it;<br>Group – when the booking is associated to a group;<br>Transient – when the booking is not part of a group or contract, and is not associated to other transient booking;<br>Transient-party – when the booking is transient, but is associated to at least other transient booking|
|adr                            | Average Daily Rate as defined by dividing the sum of all lodging transactions by the total number of staying nights |
|required_car_parking_spaces    | Number of car parking spaces required by the customer |
|total_of_special_requests      | Number of special requests made by the customer (e.g. twin bed or high floor)|
|reservation_status             | Reservation last status, assuming one of three categories:<br>Canceled – booking was canceled by the customer;<br>Check-Out – customer has checked in but already departed;<br>No-Show – customer did not check-in and did inform the hotel of the reason why |
|reservation_status_date        | Date at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotel|

# 47275 - Joel Tapia<br>47817 - João Mendonça

## Setup and Configuration for Data Analysis and Visualization

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
import seaborn as sns
import plotly.express as px
!pip install pycountry
from pandas.api.types import CategoricalDtype

# Importing necessary libraries:
# - pandas: for data manipulation and analysis
# - numpy: for numerical operations
# - matplotlib.pylab: for plotting and visualization
# - seaborn: for statistical data visualization based on matplotlib
# - plotly.express: for interactive plotting
# - pycountry: for accessing country and subdivision data
# - CategoricalDtype from pandas.api.types: for defining categorical data types

# Configurations for pandas and matplotlib
pd.set_option('display.max_columns', None)  # Setting pandas to display all columns when outputting dataframes
pd.set_option('display.max_rows', None)     # Setting pandas to display all rows when outputting dataframes
plt.style.use('ggplot')                     # Setting the plotting style of matplotlib to 'ggplot' for better aesthetics

data = pd.read_csv("reserva_hotel.csv")



## Data understanding & cleaning

## Handling Missing Values

In [2]:
# Before starting the changes, we will rename all the columns to have consistent naming
data = data.rename(columns={'hotel':'Hotel', 
                     'is_canceled': 'Is_canceled', 
                     'lead_time' : 'Lead_time', 
                     'arrival_date_year' : 'Arrival_date_year',
                     'arrival_date_month' : 'Arrival_date_month', 
                     'arrival_date_week_number' : 'Arrival_date_week_number',
                     'arrival_date_day_of_month' : 'Arrival_date_day_of_month', 
                     'stays_in_weekend_nights' : 'Stays_in_weekend_nights',
                     'stays_in_week_nights' : 'Stays_in_week_nights', 
                     'adults' : 'Adults', 
                     'children' : 'Children', 
                     'babies' : 'Babies', 
                     'meal' : 'Meal',
                     'country' : 'Country', 
                     'market_segment' : 'Market_segment', 
                     'distribution_channel' : 'Distribution_channel',
                     'is_repeated_guest' : 'Is_repeated_guest', 
                     'previous_cancellations' : 'Previous_cancellations',
                     'previous_bookings_not_canceled' : 'Previous_bookings_not_canceled', 
                     'reserved_room_type' : 'Reserved_room_type',
                     'assigned_room_type' :  'Assigned_room_type', 
                     'booking_changes' : 'Booking_changes', 
                     'deposit_type' : 'Deposit_type', 
                     'agent' : 'Agent',
                     'company' : 'Company', 
                     'days_in_waiting_list' : 'Days_in_waiting_list', 
                     'customer_type' : 'Customer_type', 
                     'adr' : 'Adr',
                     'required_car_parking_spaces' : 'Required_car_parking_spaces', 
                     'total_of_special_requests' : 'Total_of_special_requests',
                     'reservation_status' : 'Reservation_status', 
                     'reservation_status_date' : 'Reservation_status_date'})

#### Filling Missing Values in Specific Columns

In [3]:
# Fill null points and confirm the action worked
data_children = data['Children'].fillna(0)
data['Children'] = data_children

data_country = data['Country'].fillna('XXX')
data['Country'] = data_country

data_agent = data['Agent'].fillna('0')
data['Agent'] = data_agent

data_company = data['Company'].fillna('0')
data['Company'] = data_company

# Rechecking the number of missing values in each column
nan_count = data.isna().sum()
print(nan_count)

Hotel                             0
Is_canceled                       0
Lead_time                         0
Arrival_date_year                 0
Arrival_date_month                0
Arrival_date_week_number          0
Arrival_date_day_of_month         0
Stays_in_weekend_nights           0
Stays_in_week_nights              0
Adults                            0
Children                          0
Babies                            0
Meal                              0
Country                           0
Market_segment                    0
Distribution_channel              0
Is_repeated_guest                 0
Previous_cancellations            0
Previous_bookings_not_canceled    0
Reserved_room_type                0
Assigned_room_type                0
Booking_changes                   0
Deposit_type                      0
Agent                             0
Company                           0
Days_in_waiting_list              0
Customer_type                     0
Adr                         

## Cleaning Inconsistent Data

In [4]:
# Finding the reason for negative values in 'Adr'
# Filtering the dataframe for rows where 'Adr' is less than 0
data[data['Adr'] < 0]


Unnamed: 0,Hotel,Is_canceled,Lead_time,Arrival_date_year,Arrival_date_month,Arrival_date_week_number,Arrival_date_day_of_month,Stays_in_weekend_nights,Stays_in_week_nights,Adults,Children,Babies,Meal,Country,Market_segment,Distribution_channel,Is_repeated_guest,Previous_cancellations,Previous_bookings_not_canceled,Reserved_room_type,Assigned_room_type,Booking_changes,Deposit_type,Agent,Company,Days_in_waiting_list,Customer_type,Adr,Required_car_parking_spaces,Total_of_special_requests,Reservation_status,Reservation_status_date
14969,Resort Hotel,0,195,2017,March,10,5,4,6,2,0.0,0,BB,GBR,Groups,Direct,1,0,2,A,H,2,No Deposit,273.0,0,0,Transient-Party,-6.38,0,0,Check-Out,2017-03-15


#### Fixing Negative Values in 'Adr' Column

In [5]:
# Fixing the problem by applying absolute value to 'Adr'
data['Adr'] = data['Adr'].abs()

# Confirming the solution worked by displaying summary statistics again
data.describe()

Unnamed: 0,Is_canceled,Lead_time,Arrival_date_year,Arrival_date_week_number,Arrival_date_day_of_month,Stays_in_weekend_nights,Stays_in_week_nights,Adults,Children,Babies,Is_repeated_guest,Previous_cancellations,Previous_bookings_not_canceled,Booking_changes,Days_in_waiting_list,Adr,Required_car_parking_spaces,Total_of_special_requests
count,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0,119390.0
mean,0.370416,104.011416,2016.156554,27.165173,15.798241,0.927599,2.500302,1.856403,0.103886,0.007949,0.031912,0.087118,0.137097,0.221124,2.321149,101.831228,0.062518,0.571363
std,0.482918,106.863097,0.707476,13.605138,8.780829,0.998613,1.908286,0.579261,0.398555,0.097436,0.175767,0.844336,1.497437,0.652306,17.594721,50.535575,0.245291,0.792798
min,0.0,0.0,2015.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,18.0,2016.0,16.0,8.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,69.29,0.0,0.0
50%,0.0,69.0,2016.0,28.0,16.0,1.0,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,94.575,0.0,0.0
75%,1.0,160.0,2017.0,38.0,23.0,2.0,3.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,126.0,0.0,1.0
max,1.0,737.0,2017.0,53.0,31.0,19.0,50.0,55.0,10.0,10.0,1.0,26.0,72.0,21.0,391.0,5400.0,8.0,5.0


In [6]:
# Changing data types of columns, such as 'Children', since there are no more null values and error data has been resolved
data['Children'] = data['Children'].astype(int)
# Verifying the changes in data types
data.dtypes

Hotel                              object
Is_canceled                         int64
Lead_time                           int64
Arrival_date_year                   int64
Arrival_date_month                 object
Arrival_date_week_number            int64
Arrival_date_day_of_month           int64
Stays_in_weekend_nights             int64
Stays_in_week_nights                int64
Adults                              int64
Children                            int64
Babies                              int64
Meal                               object
Country                            object
Market_segment                     object
Distribution_channel               object
Is_repeated_guest                   int64
Previous_cancellations              int64
Previous_bookings_not_canceled      int64
Reserved_room_type                 object
Assigned_room_type                 object
Booking_changes                     int64
Deposit_type                       object
Agent                             

#### Dealing with Outliers

In [7]:
# Dealing with outliers
# In statistics, it is often best to ignore such data if there are few occurrences

# Counting occurrences of different values in 'Adults', 'Children', and 'Babies'
outliers_adults = data['Adults'].value_counts()
outliers_children = data['Children'].value_counts()
outliers_babies = data['Babies'].value_counts()

# Printing the counts to identify outliers
print(outliers_adults)
print(outliers_children)
print(outliers_babies)

Adults
2     89680
1     23027
3      6202
0       403
4        62
26        5
5         2
27        2
20        2
40        1
55        1
50        1
6         1
10        1
Name: count, dtype: int64
Children
0     110800
1       4861
2       3652
3         76
10         1
Name: count, dtype: int64
Babies
0     118473
1        900
2         15
10         1
9          1
Name: count, dtype: int64


In [8]:
# Removing rows with extreme values in 'Children' and 'Babies'
outliers = data[data['Children'] >= 10].index
data = data.drop(outliers)

outliers = data[data['Babies'] >= 9].index
data = data.drop(outliers)

# Rechecking the counts after removing outliers
outliers_adults = data['Adults'].value_counts()
outliers_children = data['Children'].value_counts()
outliers_babies = data['Babies'].value_counts()

print(outliers_adults)
print(outliers_children)
print(outliers_babies)
# Ensuring that only the intended data has been eliminated

Adults
2     89678
1     23026
3      6202
0       403
4        62
26        5
5         2
27        2
20        2
40        1
55        1
50        1
6         1
10        1
Name: count, dtype: int64
Children
0    110798
1      4861
2      3652
3        76
Name: count, dtype: int64
Babies
0    118472
1       900
2        15
Name: count, dtype: int64


## Columns Control

#### Ensuring Data Consistency Using Value Counts<br>Handling Redundant Data in 'Meal' Column

In [9]:
# Columns Control
# Ensuring all data is as described in the data dictionary
# Using value_counts to see every possible existing data within our columns

# Checking unique values in the 'Hotel' column
values_Hotel = data['Hotel'].value_counts()

# Checking unique values in the 'Lead_time' column to ensure no negative values
values_Lead_time = data[ data['Lead_time'] < 0]

# Handling redundant data in the 'Meal' column
# Combining 'Undefined' and 'SC' as they have the same meaning in the data dictionary
data['Meal'] = data['Meal'].replace('Undefined', 'SC')

# Checking unique values in the 'Meal' column after replacement
values_Meal_Hotel = data['Meal'].value_counts()

# Printing the unique values in the 'Meal' column
print(values_Meal_Hotel)


Meal
BB    92307
HB    14463
SC    11819
FB      798
Name: count, dtype: int64


#### Checking and Correcting Country Codes

In [10]:
# Second encounter with data issues
# Checking unique values in the 'Country' column to find problematic entries
values_Country_Hotel = data['Country'].value_counts().sort_index()
print(values_Country_Hotel)

# Replacing incorrect country code 'bbbbbbbbbbbbbbb' with 'XXX'
data['Country'] = data['Country'].replace('bbbbbbbbbbbbbbb', 'XXX')

# Rechecking unique values in the 'Country' column after replacement
values_Country_Hotel = data['Country'].value_counts().sort_index()
print(values_Country_Hotel)

Country
ABW                    2
AGO                  362
AIA                    1
ALB                   12
AND                    7
ARE                   51
ARG                  214
ARM                    8
ASM                    1
ATA                    2
ATF                    1
AUS                  426
AUT                 1263
AZE                   17
BDI                    1
BEL                 2342
BEN                    3
BFA                    1
BGD                   12
BGR                   75
BHR                    5
BHS                    1
BIH                   13
BLR                   26
BOL                   10
BRA                 2224
BRB                    4
BWA                    1
CAF                    5
CHE                 1730
CHL                   65
CHN                  999
CIV                    6
CMR                   10
CN                  1279
COL                   71
COM                    2
CPV                   24
CRI                   19
CUB              

#### Verifying Room Type Data

In [11]:
# Checking unique values in the 'Reserved_room_type' column
values_A_Hotel = data['Reserved_room_type'].value_counts().sort_index()
print(values_A_Hotel)

# Checking unique values in the 'Assigned_room_type' column
values_B_Hotel = data['Assigned_room_type'].value_counts().sort_index()
print(values_B_Hotel)

# Note: In a work environment, it is good to annotate and draw attention to specific issues such as the room type 'I' having no data.

Reserved_room_type
A    85993
B     1118
C      932
D    19199
E     6535
F     2897
G     2094
H      601
L        6
P       12
Name: count, dtype: int64
Assigned_room_type
A    74053
B     2162
C     2375
D    25320
E     7806
F     3751
G     2553
H      712
I      363
K      279
L        1
P       12
Name: count, dtype: int64


## Converting Date Columns

#### Converting 'Reservation_status_date' to DateTime Format

In [12]:
# Converting 'Reservation_status_date' to datetime format with error handling (fill with NaN if conversion fails)
data['Reservation_status_date'] = pd.to_datetime(data['Reservation_status_date'], errors='coerce')

# Identifying rows with invalid dates after conversion
invalid_dates = data[data['Reservation_status_date'].isna()]
print(invalid_dates)

Empty DataFrame
Columns: [Hotel, Is_canceled, Lead_time, Arrival_date_year, Arrival_date_month, Arrival_date_week_number, Arrival_date_day_of_month, Stays_in_weekend_nights, Stays_in_week_nights, Adults, Children, Babies, Meal, Country, Market_segment, Distribution_channel, Is_repeated_guest, Previous_cancellations, Previous_bookings_not_canceled, Reserved_room_type, Assigned_room_type, Booking_changes, Deposit_type, Agent, Company, Days_in_waiting_list, Customer_type, Adr, Required_car_parking_spaces, Total_of_special_requests, Reservation_status, Reservation_status_date]
Index: []


It works!

From now on we will use this table to work with and avoid modifying it as much as possible. 

# Automated learning

In [2]:
#ALL IMPORTS
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix



In [66]:
# Displaying the first 20 rows of the dataframe
data_train = pd.read_csv("train.csv")

data_train.dtypes

#data_train.head(10)

Hotel                              object
Is_canceled                         int64
Lead_time                           int64
Arrival_date_year                   int64
Arrival_date_month                 object
Arrival_date_week_number            int64
Arrival_date_day_of_month           int64
Stays_in_weekend_nights             int64
Stays_in_week_nights                int64
Adults                              int64
Children                            int64
Babies                              int64
Meal                               object
Country                            object
Market_segment                     object
Distribution_channel               object
Is_repeated_guest                   int64
Previous_cancellations              int64
Previous_bookings_not_canceled      int64
Reserved_room_type                 object
Assigned_room_type                 object
Booking_changes                     int64
Deposit_type                       object
Agent                             

In [29]:
#We create the test csv file
data_test = pd.read_csv("test.csv")

data_test.head(20)

Unnamed: 0,Hotel,Lead_time,Arrival_date_year,Arrival_date_month,Arrival_date_week_number,Arrival_date_day_of_month,Stays_in_weekend_nights,Stays_in_week_nights,Adults,Children,...,Deposit_type,Agent,Company,Days_in_waiting_list,Customer_type,Adr,Required_car_parking_spaces,Total_of_special_requests,Reservation_status,Reservation_status_date
0,Resort Hotel,342,2015,July,27,1,0,0,2,0,...,No Deposit,0.0,0.0,0,Transient,0.0,0,0,Check-Out,2015-07-01
1,Resort Hotel,737,2015,July,27,1,0,0,2,0,...,No Deposit,0.0,0.0,0,Transient,0.0,0,0,Check-Out,2015-07-01
2,Resort Hotel,7,2015,July,27,1,0,1,1,0,...,No Deposit,0.0,0.0,0,Transient,75.0,0,0,Check-Out,2015-07-02
3,Resort Hotel,13,2015,July,27,1,0,1,1,0,...,No Deposit,304.0,0.0,0,Transient,75.0,0,0,Check-Out,2015-07-02
4,Resort Hotel,14,2015,July,27,1,0,2,2,0,...,No Deposit,240.0,0.0,0,Transient,98.0,0,1,Check-Out,2015-07-03
5,Resort Hotel,14,2015,July,27,1,0,2,2,0,...,No Deposit,240.0,0.0,0,Transient,98.0,0,1,Check-Out,2015-07-03
6,Resort Hotel,0,2015,July,27,1,0,2,2,0,...,No Deposit,0.0,0.0,0,Transient,107.0,0,0,Check-Out,2015-07-03
7,Resort Hotel,9,2015,July,27,1,0,2,2,0,...,No Deposit,303.0,0.0,0,Transient,103.0,0,1,Check-Out,2015-07-03
8,Resort Hotel,85,2015,July,27,1,0,3,2,0,...,No Deposit,240.0,0.0,0,Transient,82.0,0,1,Canceled,2015-05-06
9,Resort Hotel,75,2015,July,27,1,0,3,2,0,...,No Deposit,15.0,0.0,0,Transient,105.5,0,0,Canceled,2015-04-22


In [5]:
#Not ever run, just one time
data_train.to_csv('train.csv', index=False)
data_test.to_csv('test.csv', index=False)

In [79]:
# Datos de entrenamiento y prueba
y = data_train["Is_canceled"]
#features = ["Lead_time", "Country" , "Agent", "Is_repeated_guest", "Assigned_room_type", "Customer_type","Arrival_date_week_number"]

features = ["Lead_time","Country","Is_repeated_guest","Total_of_special_requests", "Assigned_room_type","Previous_cancellations", "Customer_type"]
X = pd.get_dummies(data_train[features])


In [80]:
## RANDOM FOREST
# Dividir datos en entrenamiento y prueba
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1)

# Definir el modelo
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)

# Entrenar el modelo
model.fit(X_train, y_train)

# Predecir en el conjunto de validación
y_pred = model.predict(X_valid)

# Medir la precisión
accuracy = accuracy_score(y_valid, y_pred)
print(f'Accuracy: {accuracy}')

# Reporte de clasificación
print(classification_report(y_valid, y_pred))

# Matriz de confusión
print(confusion_matrix(y_valid, y_pred))


Accuracy: 0.7671915570818326
              precision    recall  f1-score   support

           0       0.74      0.99      0.84     15129
           1       0.95      0.39      0.55      8749

    accuracy                           0.77     23878
   macro avg       0.84      0.69      0.70     23878
weighted avg       0.81      0.77      0.74     23878

[[14932   197]
 [ 5362  3387]]


In [18]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)

{'max_depth': 15, 'min_samples_split': 5, 'n_estimators': 300}


In [81]:
best_params = grid_search.best_params_

best_model = RandomForestClassifier(**best_params, random_state=1)
best_model.fit(X_train, y_train)

y_pred = best_model.predict(X_valid)

accuracy = accuracy_score(y_valid, y_pred)
print(f'Precisión del modelo con los mejores parámetros: {accuracy}')

# Reporte de clasificación
print(classification_report(y_valid, y_pred))

# Matriz de confusión
print(confusion_matrix(y_valid, y_pred))

Precisión del modelo con los mejores parámetros: 0.8030404556495518
              precision    recall  f1-score   support

           0       0.79      0.95      0.86     15129
           1       0.86      0.56      0.67      8749

    accuracy                           0.80     23878
   macro avg       0.82      0.75      0.77     23878
weighted avg       0.81      0.80      0.79     23878

[[14316   813]
 [ 3890  4859]]


In [82]:
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV

# Definir el modelo
xgb_model = XGBClassifier(random_state=1)

# Definir el grid de parámetros
param_dist = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

# Realizar la búsqueda aleatoria
random_search = RandomizedSearchCV(xgb_model, param_distributions=param_dist, n_iter=10, cv=3, scoring='accuracy', random_state=1)
random_search.fit(X_train, y_train)

# Evaluar el modelo con los mejores parámetros
best_xgb_model = random_search.best_estimator_
y_pred = best_xgb_model.predict(X_valid)

# Medir la precisión
accuracy = accuracy_score(y_valid, y_pred)
print(f'Accuracy: {accuracy}')


Accuracy: 0.815813719742022
