___
<h3 style="text-align:center;">⚠️ <u>Model</u> training ⚠️
</h3>

---

<h3 style="text-align:center;">📚🔧
</h3>

In [30]:
# Import libraries:
import pandas as pd
import numpy as np

from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

import pickle
import yaml

---
**<h3 style="text-align:center;">📚🔧: Split: <u>X_train</u> , <u>y_train</u> , <u>X_test</u> , <u>y_test</u></h3>**

**X_train + y_train:**

In [2]:
df_train = pd.read_csv('../data/train.csv', index_col='Unnamed: 0')

X_train = df_train.drop('Diagnosis', axis=1)
y_train = df_train['Diagnosis']

**X_test + y_test:**

In [3]:
df_test = pd.read_csv('../data/test.csv', index_col='Unnamed: 0')

X_test = df_test.drop('Diagnosis', axis=1)
y_test = df_test['Diagnosis']

**☑️ Dimension check:**

In [4]:
# Shapes of the training and evaluation variables:
print('Shape of X_train:', X_train.shape)
print('Shape of y_train:', y_train.shape)
print('Shape of X_test:', X_test.shape)
print('Shape of y_test:', y_test.shape)

Shape of X_train: (1125, 10)
Shape of y_train: (1125,)
Shape of X_test: (375, 10)
Shape of y_test: (375,)


---
**<h3 style="text-align:center;">🤖¹ : <u>DecisionTreeClassifier</u>🌳 +  <u>RandomizedSearchCV</u>🔄</h3>**

**🏋️‍♂️ Training:**

In [12]:
modelo_dtc = DecisionTreeClassifier()


params_dtc = {
    'max_depth': np.arange(1, 20),
    'min_samples_split': np.arange(2, 20),
    'min_samples_leaf': np.arange(1, 20),
    'criterion': ['gini', 'entropy']
}

rs_dtc = RandomizedSearchCV(
    modelo_dtc, param_distributions=params_dtc,
    n_iter=50, scoring='recall', cv=5, random_state=42, n_jobs=-1
)

rs_dtc.fit(X_train, y_train)

print("Best parameters:", rs_dtc.best_params_)
print("Best validation score:", rs_dtc.best_score_)

Best parameters: {'min_samples_split': np.int64(7), 'min_samples_leaf': np.int64(5), 'max_depth': np.int64(15), 'criterion': 'entropy'}
Best validation score: 0.8168674698795181
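To get a sense of how much of the grid this search explores, the sketch below counts the combinations in `params_dtc` and compares them with `n_iter`. It uses plain `range` objects in place of `np.arange` so it runs without NumPy. The names `grid_sizes`, `total_combinations`, `sampled_fraction` and `total_fits` are illustrative and do not appear in the notebook.

```python
from math import prod

# Number of values in each hyperparameter list of params_dtc;
# range(a, b) yields the same integers as np.arange(a, b).
grid_sizes = {
    "max_depth": len(range(1, 20)),          # 19 values
    "min_samples_split": len(range(2, 20)),  # 18 values
    "min_samples_leaf": len(range(1, 20)),   # 19 values
    "criterion": len(["gini", "entropy"]),   # 2 values
}

n_iter, cv = 50, 5  # same settings as the RandomizedSearchCV call

total_combinations = prod(grid_sizes.values())
sampled_fraction = n_iter / total_combinations
total_fits = n_iter * cv  # one fit per sampled candidate and fold

print("Combinations in the grid:", total_combinations)
print(f"Fraction sampled: {sampled_fraction:.2%}")
print("Model fits during the search:", total_fits)
```

Because only a small share of the grid is visited, a different `random_state` may well select different best parameters.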


**🔮 Predictions:**

In [13]:
best_dtc = rs_dtc.best_estimator_

pred_dtc = best_dtc.predict(X_test)

df_predicciones_dtc = pd.DataFrame({'y_real': y_test,  
                                'y_pred': pred_dtc})

df_predicciones_dtc

Unnamed: 0,y_real,y_pred
1116,1,1
1368,1,1
422,0,0
413,0,0
451,0,0
...,...,...
155,1,1
1151,0,0
1141,0,0
974,0,1


**✔️ Evaluation:**

In [14]:
print('accuracy:', accuracy_score(y_test, pred_dtc))
print('recall:', recall_score(y_test, pred_dtc))
print('precision:', precision_score(y_test, pred_dtc))

print('\n----------------------------------------------\n')

print('confusion_matrix:\n', confusion_matrix(y_test, pred_dtc))

accuracy: 0.8826666666666667
recall: 0.795774647887324
precision: 0.8828125

----------------------------------------------

confusion_matrix:
 [[218  15]
 [ 29 113]]
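The three scores above follow directly from the confusion matrix. As a minimal sketch, assuming scikit-learn's usual layout (rows are true labels, columns are predicted labels, class 1 is the positive class), they can be recomputed by hand:

```python
# Confusion matrix printed above: rows = true label, columns = predicted label.
confusion = [[218, 15],
             [29, 113]]

(tn, fp), (fn, tp) = confusion

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
precision = tp / (tp + fp)                  # share of predicted positives that are correct

print("accuracy:", accuracy)
print("recall:", recall)
print("precision:", precision)
```

The 29 false negatives are what hold recall below precision here, which matters because the search was tuned with `scoring='recall'`.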


In [15]:
print(classification_report(y_test, pred_dtc))

              precision    recall  f1-score   support

           0       0.88      0.94      0.91       233
           1       0.88      0.80      0.84       142

    accuracy                           0.88       375
   macro avg       0.88      0.87      0.87       375
weighted avg       0.88      0.88      0.88       375
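The per-class rows of this report can be reproduced from the DecisionTreeClassifier confusion matrix `[[218, 15], [29, 113]]` by treating each label in turn as the positive class. The helper `class_scores` below is a hypothetical name used only for this sketch.

```python
# DecisionTreeClassifier confusion matrix: rows = true label, columns = predicted label.
confusion = [[218, 15],
             [29, 113]]

def class_scores(matrix, label):
    """Return (precision, recall, f1, support), treating `label` as the positive class."""
    tp = matrix[label][label]
    fn = sum(matrix[label]) - tp                 # true `label`, predicted as the other class
    fp = sum(row[label] for row in matrix) - tp  # predicted `label`, actually the other class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1, tp + fn

scores = {label: class_scores(confusion, label) for label in (0, 1)}
for label, (p, r, f1, support) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")

# The macro average is the unweighted mean over classes.
macro_f1 = sum(s[2] for s in scores.values()) / len(scores)
print(f"macro avg f1={macro_f1:.2f}")
```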



**💾 Save model + config:**

In [16]:
with open('../models/DecisionTree_RandomizedSearchCV/model_DTCRSCV.pkl', 'wb') as archivo:
    pickle.dump(best_dtc, archivo)

In [17]:
configuracion = {
    'modelo': 'DecisionTreeClassifier',
    'mejores_parametros': rs_dtc.best_params_,
    'mejor_puntuacion': rs_dtc.best_score_,
    'configuracion_busqueda': {
        'n_iter': 50,
        'scoring': 'recall',
        'cv': 5,
        'random_state': 42
    }
}

with open('../models/DecisionTree_RandomizedSearchCV/config_DTCRSCV.yaml', 'w') as archivo:
    yaml.dump(configuracion, archivo)
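One caveat with this config: `best_params_` holds NumPy scalars (`np.int64`), and PyYAML's default `yaml.dump` serializes such objects with Python-specific tags instead of plain numbers, so the file may not load with `yaml.safe_load`. A possible workaround, sketched below, is to convert the values to built-in types first. `to_builtin` and `ScalarStandIn` are hypothetical names. The stand-in class only mimics the `.item()` method of NumPy scalars so that the sketch runs without NumPy installed.

```python
def to_builtin(value):
    """Recursively replace NumPy-style scalars with plain Python objects."""
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    # NumPy scalars such as np.int64 and np.float64 expose .item(),
    # which returns the equivalent built-in int or float.
    if callable(getattr(value, "item", None)):
        return value.item()
    return value


class ScalarStandIn:
    """Hypothetical stand-in for np.int64 / np.float64 (assumption for this sketch)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


# Same structure as the notebook's config, with stand-ins for the NumPy values.
configuracion = {
    "modelo": "DecisionTreeClassifier",
    "mejores_parametros": {
        "min_samples_split": ScalarStandIn(7),
        "min_samples_leaf": ScalarStandIn(5),
        "max_depth": ScalarStandIn(15),
        "criterion": "entropy",
    },
    "mejor_puntuacion": ScalarStandIn(0.8168674698795181),
}

configuracion_limpia = to_builtin(configuracion)
print(configuracion_limpia)
```

The cleaned dictionary contains only `int`, `float` and `str` values and can be passed to `yaml.dump` in place of the original.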

---
**<h3 style="text-align:center;">🤖² : <u>RandomForestClassifier</u>🌳 +  <u>RandomizedSearchCV</u>🔄</h3>**

**🏋️‍♂️ Training:**

In [8]:
modelo_rf = RandomForestClassifier()

params_rf = {
    'n_estimators': np.arange(25,75),
    'max_depth': np.arange(1, 20),
    'min_samples_split': np.arange(2, 20),
    'min_samples_leaf': np.arange(1, 20),
    'criterion': ['gini', 'entropy']
}

rs_rf = RandomizedSearchCV(
    modelo_rf, param_distributions=params_rf,
    n_iter=50, scoring='recall', cv=5, random_state=42, n_jobs=-1
)

rs_rf.fit(X_train, y_train)

print("Best parameters:", rs_rf.best_params_)
print("Best validation score:", rs_rf.best_score_)

Best parameters: {'n_estimators': np.int64(65), 'min_samples_split': np.int64(8), 'min_samples_leaf': np.int64(1), 'max_depth': np.int64(16), 'criterion': 'gini'}
Best validation score: 0.8313253012048193


**🔮 Predictions:**

In [9]:
best_rf = rs_rf.best_estimator_

pred_rf = best_rf.predict(X_test)

df_predicciones_rf = pd.DataFrame({'y_real': y_test,  
                                'y_pred': pred_rf})

df_predicciones_rf

Unnamed: 0,y_real,y_pred
1116,1,1
1368,1,1
422,0,0
413,0,0
451,0,0
...,...,...
155,1,1
1151,0,0
1141,0,0
974,0,1


**✔️ Evaluation:**

In [10]:
print('accuracy:', accuracy_score(y_test, pred_rf))
print('recall:', recall_score(y_test, pred_rf))
print('precision:', precision_score(y_test, pred_rf))

print('\n----------------------------------------------\n')

print('confusion_matrix:\n', confusion_matrix(y_test, pred_rf))

accuracy: 0.9173333333333333
recall: 0.8591549295774648
precision: 0.9172932330827067

----------------------------------------------

confusion_matrix:
 [[222  11]
 [ 20 122]]


**💾 Save model + config:**

In [18]:
with open('../models/RandomForest_RandomizedSearchCV/model_RFRSCV.pkl', 'wb') as archivo:
    pickle.dump(best_rf, archivo)

In [19]:
configuracion_rf = {
    'modelo': 'RandomForestClassifier',
    'mejores_parametros': rs_rf.best_params_,
    'mejor_puntuacion': rs_rf.best_score_,
    'configuracion_busqueda': {
        'n_iter': 50,
        'scoring': 'recall',
        'cv': 5,
        'random_state': 42
    }
}


with open('../models/RandomForest_RandomizedSearchCV/config_RFRSCV.yaml', 'w') as archivo:
    # Dump the RandomForest config (not the DecisionTree one defined earlier).
    yaml.dump(configuracion_rf, archivo)

---
**<h3 style="text-align:center;">🤖³ : <u>LogisticRegression</u> + <u>RandomizedSearchCV</u>🔄</h3>**

**🏋️‍♂️ Training:**

In [25]:
modelo_lr = LogisticRegression(max_iter=500)

params_lr = {
    'penalty': ['l1', 'l2', 'elasticnet', None],
    'C': np.logspace(-4, 4, 20),
    'solver': ['liblinear', 'saga']
}

rs_lr = RandomizedSearchCV(
    modelo_lr, param_distributions=params_lr,
    n_iter=50, scoring='recall', cv=5, random_state=42, n_jobs=-1
)

rs_lr.fit(X_train, y_train)

print("Best parameters:", rs_lr.best_params_)
print("Best validation score:", rs_lr.best_score_)

Best parameters: {'solver': 'liblinear', 'penalty': 'l1', 'C': np.float64(545.5594781168514)}
Best validation score: 0.7903614457831326


80 fits failed out of a total of 250.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
30 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Python312\Lib\site-packages\sklearn\model_selection\_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Python312\Lib\site-packages\sklearn\base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Python312\Lib\site-packages\sklearn\linear_model\_logistic.py", line 1194, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Python312\Lib\
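The failed fits reported in this warning most likely come from `penalty`/`solver` pairs that `LogisticRegression` rejects. The sketch below assumes the compatibility rules described in scikit-learn's documentation for these two solvers: `liblinear` accepts only `l1` and `l2`, and `elasticnet` requires `saga` together with an `l1_ratio`. It lists the pairs of `params_lr` expected to fail and proposes a list-of-dicts search space that avoids them. `SUPPORTED`, `is_valid`, `params_lr_valid` and the `l1_ratio` values are assumptions for illustration, not part of the notebook.

```python
from itertools import product

# Penalties each solver accepts (assumption based on the scikit-learn docs).
SUPPORTED = {
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet", None},
}

penalties = ["l1", "l2", "elasticnet", None]  # as in params_lr
solvers = ["liblinear", "saga"]               # as in params_lr


def is_valid(penalty, solver, l1_ratio=None):
    """Return True if this penalty/solver pair is expected to fit without error."""
    if penalty not in SUPPORTED[solver]:
        return False
    # elasticnet additionally needs l1_ratio, which params_lr never sets.
    if penalty == "elasticnet" and l1_ratio is None:
        return False
    return True


invalid = [(p, s) for p, s in product(penalties, solvers) if not is_valid(p, s)]
print("Pairs expected to fail:", invalid)

# 80 failed fits over 5 folds correspond to 16 of the 50 sampled candidates.
failed_candidates = 80 // 5
print("Failed candidates:", failed_candidates)

# 20 log-spaced values from 1e-4 to 1e4, mirroring np.logspace(-4, 4, 20).
C_values = [10 ** (-4 + 8 * i / 19) for i in range(20)]

# Hypothetical search space containing only compatible combinations.
params_lr_valid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": C_values},
    {"solver": ["saga"], "penalty": ["l1", "l2", None], "C": C_values},
    {"solver": ["saga"], "penalty": ["elasticnet"],
     "l1_ratio": [0.25, 0.5, 0.75], "C": C_values},
]
```

`RandomizedSearchCV` accepts a list of dictionaries for `param_distributions`, so `params_lr_valid` could be passed in place of `params_lr`. The best parameters and scores would then need to be recomputed.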

**🔮 Predictions:**

In [26]:
best_lr = rs_lr.best_estimator_

pred_lr = best_lr.predict(X_test)

df_predicciones_lr = pd.DataFrame({'y_real': y_test,  
                                'y_pred': pred_lr})

df_predicciones_lr

Unnamed: 0,y_real,y_pred
1116,1,1
1368,1,1
422,0,0
413,0,0
451,0,0
...,...,...
155,1,1
1151,0,0
1141,0,0
974,0,0


**✔️ Evaluation:**

In [27]:
print('accuracy:', accuracy_score(y_test, pred_lr))
print('recall:', recall_score(y_test, pred_lr))
print('precision:', precision_score(y_test, pred_lr))

print('\n----------------------------------------------\n')

print('confusion_matrix:\n', confusion_matrix(y_test, pred_lr))

accuracy: 0.9013333333333333
recall: 0.8309859154929577
precision: 0.9007633587786259

----------------------------------------------

confusion_matrix:
 [[220  13]
 [ 24 118]]


**💾 Save model + config:**

In [28]:
with open('../models/LogisticRegression_RandomizedSearchCV/model_LRRSCV.pkl', 'wb') as archivo:
    pickle.dump(best_lr, archivo)

In [29]:
configuracion_lr = {
    'modelo': 'LogisticRegression',
    'mejores_parametros': rs_lr.best_params_,
    'mejor_puntuacion': rs_lr.best_score_,
    'configuracion_busqueda': {
        'n_iter': 50,
        'scoring': 'recall',  # matches the scoring used in rs_lr
        'cv': 5,
        'random_state': 42
    }
}

with open('../models/LogisticRegression_RandomizedSearchCV/config_LRRSCV.yaml', 'w') as archivo:
    yaml.dump(configuracion_lr, archivo)


---
**<h3 style="text-align:center;">🤖⁴ : SVC + RandomizedSearchCV </h3>**

**🏋️‍♂️ Training:**

In [34]:
modelo_svm = SVC()

params_svm = {
    'C': np.logspace(-3, 3, 7),
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto', 0.01, 0.1]
}

rs_svm = RandomizedSearchCV(
    modelo_svm, param_distributions=params_svm,
    n_iter=50, scoring='recall', cv=5, random_state=42, n_jobs=-1
)

rs_svm.fit(X_train, y_train)

print("Best parameters:", rs_svm.best_params_)
print("Best validation score:", rs_svm.best_score_)

Best parameters: {'kernel': 'rbf', 'gamma': 'scale', 'C': np.float64(1000.0)}
Best validation score: 0.8192771084337348


**🔮 Predictions:**

In [35]:
best_svm = rs_svm.best_estimator_
pred_svm = best_svm.predict(X_test)

df_predicciones_svm = pd.DataFrame({'y_real': y_test, 'y_pred': pred_svm})

df_predicciones_svm

Unnamed: 0,y_real,y_pred
1116,1,1
1368,1,1
422,0,0
413,0,0
451,0,0
...,...,...
155,1,1
1151,0,0
1141,0,0
974,0,0


**✔️ Evaluation:**

In [36]:
print('accuracy:', accuracy_score(y_test, pred_svm))
print('recall:', recall_score(y_test, pred_svm))
print('precision:', precision_score(y_test, pred_svm))
print('\n----------------------------------------------\n')
print('confusion_matrix:\n', confusion_matrix(y_test, pred_svm))

accuracy: 0.896
recall: 0.8169014084507042
precision: 0.8992248062015504

----------------------------------------------

confusion_matrix:
 [[220  13]
 [ 26 116]]


In [37]:
print(classification_report(y_test, pred_svm))

              precision    recall  f1-score   support

           0       0.89      0.94      0.92       233
           1       0.90      0.82      0.86       142

    accuracy                           0.90       375
   macro avg       0.90      0.88      0.89       375
weighted avg       0.90      0.90      0.89       375
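With four models evaluated on the same test set, the confusion matrices printed in the sections above are enough to compare them side by side. The sketch below ranks the models by recall, the metric used for the searches. `confusion_matrices`, `summarize` and `ranking` are illustrative names.

```python
# Test-set confusion matrices copied from the evaluation cells above
# (rows = true label, columns = predicted label).
confusion_matrices = {
    "DecisionTreeClassifier": [[218, 15], [29, 113]],
    "RandomForestClassifier": [[222, 11], [20, 122]],
    "LogisticRegression": [[220, 13], [24, 118]],
    "SVC": [[220, 13], [26, 116]],
}


def summarize(matrix):
    """Compute recall, precision and accuracy for the positive class (label 1)."""
    (tn, fp), (fn, tp) = matrix
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
    }


summary = {name: summarize(matrix) for name, matrix in confusion_matrices.items()}

# Sort model names from highest to lowest test recall.
ranking = sorted(summary, key=lambda name: summary[name]["recall"], reverse=True)

for name in ranking:
    s = summary[name]
    print(f"{name:<24} recall={s['recall']:.3f} "
          f"precision={s['precision']:.3f} accuracy={s['accuracy']:.3f}")
```

On these numbers RandomForestClassifier has the highest test recall, ahead of LogisticRegression, SVC and DecisionTreeClassifier.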



**💾 Save model + config:**

In [38]:
with open('../models/SVM_RandomizedSearchCV/model_SVMRSCV.pkl', 'wb') as archivo:
    pickle.dump(best_svm, archivo)

In [39]:
configuracion_svm = {
    'modelo': 'SVM',
    'mejores_parametros': rs_svm.best_params_,
    'mejor_puntuacion': rs_svm.best_score_,
    'configuracion_busqueda': {
        'n_iter': 50,
        'scoring': 'recall',  # matches the scoring used in rs_svm
        'cv': 5,
        'random_state': 42
    }
}

with open('../models/SVM_RandomizedSearchCV/config_SVMRSCV.yaml', 'w') as archivo:
    yaml.dump(configuracion_svm, archivo)

---
**<h3 style="text-align:center;">🤖⁵ : XGBoost 🚀 + RandomizedSearchCV</h3>**

**🏋️‍♂️ Training:**

**🔮 Predictions:**


**✔️ Evaluation:**


**💾 Save model + config:**
