<a href="https://colab.research.google.com/github/DANCAR1969/programacion/blob/master/hyper_pram_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Import the required libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

# Load the Titanic dataset (available through seaborn)
df = sns.load_dataset('titanic')

# Select the columns of interest and clean the data.
# Features: pclass, sex, age, fare; target: survived
data = df[['survived', 'pclass', 'sex', 'age', 'fare']].copy()
data = data.dropna(subset=['survived'])  # Drop any row that lacks the target value

# Fill missing values in 'age' and 'fare' with the column median
data['age'] = data['age'].fillna(data['age'].median())
data['fare'] = data['fare'].fillna(data['fare'].median())

# Encode the 'sex' column numerically: male -> 0, female -> 1
data['sex'] = data['sex'].map({'male': 0, 'female': 1})

# Define the predictor variables (X) and the target variable (y)
X = data[['pclass', 'sex', 'age', 'fare']]
y = data['survived']

# Split the dataset into training and test sets (80%/20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

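# Note: Random Forest does not strictly need scaled features (tree splits depend
# only on the ordering of values), but scaling can be applied if desired.
# The scaler is fitted on the training set only and then reused on the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)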

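# Define the hyperparameter grid to search.
# 'auto' is deliberately not listed for max_features: it was deprecated in
# scikit-learn 1.1 and removed in 1.3 (for classifiers it was an alias of 'sqrt').
# Including it makes every candidate that uses it fail to fit, which is what the
# 540 failed fits in the log below show.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2']
}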

# Initialize the Random Forest classifier
rf = RandomForestClassifier(random_state=42)

# Configure the grid search with 5-fold cross-validation (cv=5)
grid_search = GridSearchCV(estimator=rf,
                           param_grid=param_grid,
                           cv=5,
                           scoring='accuracy',
                           n_jobs=-1,    # Use all available cores
                           verbose=2)

# Run the grid search on the training set
grid_search.fit(X_train, y_train)

# Show the best hyperparameters found
print("Best hyperparameters:")
print(grid_search.best_params_)

# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test-set accuracy:", accuracy)
print("\nClassification report:\n", classification_report(y_test, y_pred))


Fitting 5 folds for each of 324 candidates, totalling 1620 fits
Best hyperparameters:
{'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 50}
Test-set accuracy: 0.8100558659217877

Classification report:
               precision    recall  f1-score   support

           0       0.80      0.90      0.85       105
           1       0.83      0.68      0.75        74

    accuracy                           0.81       179
   macro avg       0.82      0.79      0.80       179
weighted avg       0.81      0.81      0.81       179
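
The per-class figures in this report can be cross-checked by hand. The confusion-matrix counts used below are not printed anywhere in the notebook; they are inferred as the integer counts that reproduce the supports (105 and 74), the accuracy (145/179) and the rounded precision and recall values, so treat them as an assumption. A minimal sketch in plain Python, with `class_metrics` as an illustrative helper name:

```python
# Confusion-matrix counts inferred from the report (assumption, not notebook output).
# Rows are true classes, columns are predicted classes.
confusion = [
    [95, 10],   # true class 0: 95 predicted as 0, 10 predicted as 1
    [24, 50],   # true class 1: 24 predicted as 0, 50 predicted as 1
]


def class_metrics(matrix, cls):
    """Return (precision, recall, f1, support) for one class."""
    true_pos = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)   # column total
    support = sum(matrix[cls])                    # row total
    precision = true_pos / predicted
    recall = true_pos / support
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, support


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total

for cls in range(len(confusion)):
    p, r, f1, n = class_metrics(confusion, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
print(f"accuracy={accuracy:.2f} on {total} samples")
```

Read this way, the weaker recall for class 1 (survivors) corresponds to 24 of the 74 survivors in the test set being predicted as non-survivors, under the inferred counts.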



540 fits failed out of a total of 1620.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
154 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Default.DESKTOP-PEHND4S\anaconda3\envs\manim\Lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Default.DESKTOP-PEHND4S\anaconda3\envs\manim\Lib\site-packages\sklearn\base.py", line 1382, in wrapper
    estimator._validate_params()
  File "c:\Users\Default.DESKTOP-PEHND4S\anaconda3\envs\manim\Lib\site-packages\sklearn\base.py", line 436, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Default.DESKTOP-PEHND4S\anaconda3\envs\manim\Lib\
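
The counts in this warning are consistent with the grid itself. The traceback is cut off before the error message, so the cause is an inference: the failure happens in parameter validation, and the number of failed fits equals the number of fits that use `max_features='auto'`. A sketch with `itertools.product`, assuming the grid exactly as it was searched in the logged run (three `max_features` values, including `'auto'`); the variable names are illustrative:

```python
from itertools import product

# Grid as searched in the logged run (includes the 'auto' value).
logged_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['auto', 'sqrt', 'log2'],
}
cv_folds = 5

# Every combination of one value per hyperparameter is one candidate.
names = list(logged_grid)
candidates = [dict(zip(names, values)) for values in product(*logged_grid.values())]

# Each candidate is fitted once per cross-validation fold.
total_fits = len(candidates) * cv_folds

# Candidates that use 'auto' are the ones expected to fail validation.
failing = [c for c in candidates if c['max_features'] == 'auto']
failed_fits = len(failing) * cv_folds

print(len(candidates), total_fits, failed_fits)
```

One third of the candidates use `'auto'`, which matches 540 failed fits out of 1620. Removing that value leaves 216 candidates and 1080 fits.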

In [None]:
%pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp312-cp312-win_amd64.whl.metadata (15 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.6.1-cp312-cp312-win_amd64.whl (11.1 MB)
Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.6.1 threadpoolctl-3.6.0
Note: you may need to restart the kernel to use updated packages.
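
Whether `max_features='auto'` is accepted depends on the installed scikit-learn release: it was deprecated in 1.1 and removed in 1.3, so the 1.6.1 release installed above rejects it. A small sketch of a version guard follows; `auto_is_accepted` is a hypothetical helper, not a scikit-learn function, and it assumes the version string starts with `major.minor`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def auto_is_accepted(sklearn_version):
    """Return True if this scikit-learn version still accepts max_features='auto'."""
    match = re.match(r'(\d+)\.(\d+)', sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # 'auto' was removed in 1.3, so only earlier releases accept it.
    return (major, minor) < (1, 3)


try:
    installed = version('scikit-learn')
    print(installed, "accepts 'auto':", auto_is_accepted(installed))
except PackageNotFoundError:
    print('scikit-learn is not installed in this environment')
```

Since `'auto'` meant the same as `'sqrt'` for classifiers, leaving it out of the grid on every version loses nothing and avoids the need for a guard like this.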
