# <FONT SIZE=5 COLOR="purple"> **4. Neural networks with TensorFlow and PyTorch** </FONT>

Solve the classification problem from item 1 using:

  - TensorFlow

  - PyTorch

For each one, evaluate the model. Were the results better than those of the classical models?

## <FONT SIZE=5 COLOR="purple"> **Loading the data from Kaggle: Loan_approval_data_2025** </FONT>

In [1]:
import kagglehub
import os
import pandas as pd

In [2]:
path = kagglehub.dataset_download("parthpatel2130/realistic-loan-approval-dataset-us-and-canada")
print(path)

Using Colab cache for faster access to the 'realistic-loan-approval-dataset-us-and-canada' dataset.
/kaggle/input/realistic-loan-approval-dataset-us-and-canada


In [3]:
for f in os.listdir(path):
  print(f)

Loan_approval_data_2025.csv


In [4]:
df_loan = pd.read_csv(os.path.join(path, "Loan_approval_data_2025.csv"))
df_loan.head()

Unnamed: 0,customer_id,age,occupation_status,years_employed,annual_income,credit_score,credit_history_years,savings_assets,current_debt,defaults_on_file,delinquencies_last_2yrs,derogatory_marks,product_type,loan_intent,loan_amount,interest_rate,debt_to_income_ratio,loan_to_income_ratio,payment_to_income_ratio,loan_status
0,CUST100000,40,Employed,17.2,25579,692,5.3,895,10820,0,0,0,Credit Card,Business,600,17.02,0.423,0.023,0.008,1
1,CUST100001,33,Employed,7.3,43087,627,3.5,169,16550,0,1,0,Personal Loan,Home Improvement,53300,14.1,0.384,1.237,0.412,0
2,CUST100002,42,Student,1.1,20840,689,8.4,17,7852,0,0,0,Credit Card,Debt Consolidation,2100,18.33,0.377,0.101,0.034,1
3,CUST100003,53,Student,0.5,29147,692,9.8,1480,11603,0,1,0,Credit Card,Business,2900,18.74,0.398,0.099,0.033,1
4,CUST100004,32,Employed,12.5,63657,630,7.2,209,12424,0,0,0,Personal Loan,Education,99600,13.92,0.195,1.565,0.522,1


## <FONT SIZE=5 COLOR="purple"> **Loading the required libraries** </FONT>

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_curve,
    auc,
    accuracy_score,
    f1_score,
    mean_squared_error,
    r2_score,
)

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier # Ensemble methods
from sklearn.svm import SVC, LinearSVC
from sklearn.naive_bayes import GaussianNB


import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

## <FONT SIZE=5 COLOR="purple"> **Data split** </FONT>

Before training the neural networks, the data must be fully numeric so that it can be converted to tensors. We reuse the preprocessor defined in the Classical Models section, which standardizes the numeric variables and one-hot encodes the categorical ones, so that no object-typed columns remain.

In [6]:
df_loan_final = df_loan.drop(columns=['customer_id'])


In [7]:
objetivo = 'loan_status'
lista_resultados = []

try:
    X = df_loan_final.drop(objetivo, axis=1)
    y = df_loan_final[objetivo]
except KeyError as exc:
    # Stop here: continuing without X and y would only fail later with a NameError
    raise KeyError(f"Column '{objetivo}' was not found in df_loan_final.") from exc

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=42,
                                                    stratify=y)

In [8]:
variables_num = X_train.select_dtypes(include=np.number).columns.tolist()
variables_cat = X_train.select_dtypes(include='object').columns.tolist()

preprocesador = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), variables_num),
        ('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=False), variables_cat)
    ],
    remainder='passthrough'
)

X_train_transformado = preprocesador.fit_transform(X_train)
X_test_transformado = preprocesador.transform(X_test)

nombres_transformados = (
    variables_num +
    list(preprocesador.named_transformers_['cat'].get_feature_names_out(variables_cat))
)

df_transformado = pd.DataFrame(
    X_train_transformado,
    columns=nombres_transformados
)


## <FONT SIZE=5 COLOR="purple"> **TensorFlow implementation** </FONT>

After the transformation, the input has 27 features (`input_dim=27`). This value defines the input size of the first layer in both networks.

In [9]:
# The preprocessor is already fitted, so this reproduces X_train_transformado / X_test_transformado
X_train_final = preprocesador.transform(X_train)
X_test_final = preprocesador.transform(X_test)

input_dim = X_train_final.shape[1]
print(f"Input dimension: {input_dim}")


num_clases = 1  # a single sigmoid unit is enough for binary classification

model_tf = Sequential([
    # input_shape matches the number of features after preprocessing (27)
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(32, activation='relu'),
    Dense(num_clases, activation='sigmoid')
])

model_tf.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# summary() prints the table itself and returns None, so it is not wrapped in print()
model_tf.summary()

Input dimension: 27


In [10]:
history = model_tf.fit(
    X_train_final,
    y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_test_final, y_test),
    verbose=1
)

Epoch 1/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8563 - loss: 0.3233 - val_accuracy: 0.9029 - val_loss: 0.2190
Epoch 2/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9022 - loss: 0.2189 - val_accuracy: 0.9068 - val_loss: 0.2108
Epoch 3/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9088 - loss: 0.2065 - val_accuracy: 0.9032 - val_loss: 0.2139
Epoch 4/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9060 - loss: 0.2079 - val_accuracy: 0.9098 - val_loss: 0.2054
Epoch 5/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.9136 - loss: 0.1955 - val_accuracy: 0.9083 - val_loss: 0.2058
Epoch 6/50
1094/1094 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.9120 - loss: 0.1971 - val_accuracy: 0.9129 - val_loss: 0.2004
Epoch 7/50
...
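
The training call above passes the test split as `validation_data`, so the `val_*` figures in the log and the final test score come from the same rows. One possible alternative, sketched here with random stand-in arrays (`X_demo`, `y_demo` are hypothetical, not the notebook's data), is to carve a separate validation split out of the training data and keep the test set untouched until the final evaluation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 27))    # stand-in for X_train_final
y_demo = rng.integers(0, 2, size=1000)  # stand-in for y_train

# Hold out 20% of the training rows for validation, preserving class balance
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(X_fit.shape, X_val.shape)
```

With such a split, `model_tf.fit` would receive `X_fit, y_fit` as training data and `validation_data=(X_val, y_val)`, and `X_test_final` would only be used in `model_tf.evaluate`.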

In [25]:
datos_inventados_nuevos = {
    'age': [20, 45],
    'occupation_status': ['Unemployed', 'Employed'],
    'years_employed': [0.0, 20.0],
    'annual_income': [10000, 150000],
    'credit_score': [650, 800],
    'credit_history_years': [0.5, 25.0],
    'savings_assets': [500, 50000],
    'current_debt': [0, 10000],
    'defaults_on_file': [0, 0],
    'delinquencies_last_2yrs': [0, 0],
    'derogatory_marks': [0, 0],
    'product_type': ['Credit Card', 'Line of Credit'],
    'loan_intent': ['Personal', 'Business'],
    'loan_amount': [500, 100000],
    'interest_rate': [18.0, 6.5],
    'debt_to_income_ratio': [0.0, 0.067],
    'loan_to_income_ratio': [0.05, 0.67],
    'payment_to_income_ratio': [0.05, 0.04]
}

X_nuevos_a_predecir = pd.DataFrame(datos_inventados_nuevos)

print("DataFrame 'X_nuevos_a_predecir' created successfully.")

DataFrame 'X_nuevos_a_predecir' created successfully.


In [26]:
print("Keras model evaluation")

loss, accuracy = model_tf.evaluate(X_test_final, y_test, verbose=0)
print(f"Test loss (binary cross-entropy): {loss:.4f}")
print(f"Test accuracy: {accuracy:.4f}")

# Apply the already fitted preprocessor to the invented applicants
X_nuevos_final = preprocesador.transform(X_nuevos_a_predecir)

predicciones_keras_prob = model_tf.predict(X_nuevos_final)

# Threshold the sigmoid output at 0.5 to obtain class labels
clases_predichas_keras = (predicciones_keras_prob > 0.5).astype("int32").flatten()

print(f"Approval probabilities (P(Y=1)):\n{predicciones_keras_prob.flatten()}")
print(f"Predicted classes (0=Not approved, 1=Approved):\n{clases_predichas_keras}")

Keras model evaluation
Test loss (binary cross-entropy): 0.2133
Test accuracy: 0.9107
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step
Approval probabilities (P(Y=1)):
[0.9998812 0.9999285]
Predicted classes (0=Not approved, 1=Approved):
[1 1]
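
Accuracy alone can hide how the errors are distributed between the two classes. As a minimal sketch, the same 0.5 threshold can feed a confusion matrix and an F1 score, while the raw probabilities give a ROC AUC. The labels and probabilities below are made-up illustration values (`y_true_demo`, `y_prob_demo` are hypothetical names); in the notebook they would presumably be `y_test` and `model_tf.predict(X_test_final).flatten()`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical true labels and predicted approval probabilities
y_true_demo = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob_demo = np.array([0.92, 0.30, 0.65, 0.48, 0.10, 0.55, 0.81, 0.20])

# Same decision rule as above: approve when P(Y=1) > 0.5
y_pred_demo = (y_prob_demo > 0.5).astype(int)

cm = confusion_matrix(y_true_demo, y_pred_demo)    # rows: true class, columns: predicted class
f1 = f1_score(y_true_demo, y_pred_demo)            # uses the thresholded labels
auc_demo = roc_auc_score(y_true_demo, y_prob_demo) # uses the probabilities, not the labels

print(cm)
print(f"F1: {f1:.4f}, ROC AUC: {auc_demo:.4f}")
```

These are the same metrics imported at the top of the notebook, which makes the comparison with the classical models more direct than accuracy alone.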


## <FONT SIZE=5 COLOR="purple"> **PyTorch implementation** </FONT>

PyTorch needs dense `float32` tensors as input, so in this part we reuse the scaled and one-hot-encoded arrays produced by the preprocessor and convert them to tensors.

In [15]:
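# With sparse_output=False the preprocessor already returns dense arrays;
# the toarray() branch only matters if the encoder is switched to sparse output.
if hasattr(X_train_final, 'toarray'):
    X_train_np = X_train_final.toarray()
    X_test_np = X_test_final.toarray()
else:
    X_train_np = X_train_final
    X_test_np = X_test_final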

X_train_tensor = torch.tensor(X_train_np, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_np, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)

train_data = TensorDataset(X_train_tensor, y_train_tensor)
test_data = TensorDataset(X_test_tensor, y_test_tensor) # Test dataset

train_loader = DataLoader(dataset=train_data, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=32, shuffle=False)

We check the input dimensionality again after encoding the categorical variables, and define the classifier class.

In [27]:
class ClasificadorANN(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.layer_1 = nn.Linear(input_size, 64)
        self.relu = nn.ReLU()
        self.layer_2 = nn.Linear(64, 32)
        self.layer_out = nn.Linear(32, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.layer_1(x)
        x = self.relu(x)
        x = self.layer_2(x)
        x = self.relu(x)
        x = self.layer_out(x)
        x = self.sigmoid(x)
        return x

input_dim = X_train_tensor.shape[1]
print(f"Input dimension for PyTorch: {input_dim}")

model_pt = ClasificadorANN(input_dim)
criterion = nn.BCELoss()  # expects probabilities, which the final sigmoid provides
optimizer = torch.optim.Adam(model_pt.parameters(), lr=0.001)

Input dimension for PyTorch: 27
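
For reference, both `binary_crossentropy` in Keras and `nn.BCELoss` in PyTorch (with its default mean reduction) compute the standard binary cross-entropy. The notation below is introduced here for illustration: $y_i \in \{0, 1\}$ is the true label, $\hat{p}_i$ is the sigmoid output for sample $i$, and $N$ is the number of samples in the batch.

```latex
\mathcal{L}_{\mathrm{BCE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \Bigl[\, y_i \log \hat{p}_i + (1 - y_i) \log\bigl(1 - \hat{p}_i\bigr) \Bigr]
```

One way to read a loss value: $e^{-\mathcal{L}}$ is the geometric mean of the probability assigned to the correct class, so the test loss of 0.2133 reported for the Keras model corresponds to roughly $e^{-0.2133} \approx 0.81$.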


Next, we train the model.

In [28]:
num_epochs = 50
model_pt.train()

for epoch in range(num_epochs):
    for X_batch, y_batch in train_loader:
        # Reset the gradients accumulated in the previous step
        optimizer.zero_grad()

        # Forward pass
        y_pred = model_pt(X_batch)
        loss = criterion(y_pred, y_batch)

        # Backward pass and parameter update
        loss.backward()
        optimizer.step()

    # Optional: report progress. This is the loss of the last batch of the
    # epoch only, so the printed values are noisy.
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [10/50], Loss: 0.3566
Epoch [20/50], Loss: 0.0373
Epoch [30/50], Loss: 0.3096
Epoch [40/50], Loss: 0.0745
Epoch [50/50], Loss: 0.0410


In [24]:
# Apply the already fitted preprocessor to the invented applicants
X_nuevos_final = preprocesador.transform(X_nuevos_a_predecir)

if hasattr(X_nuevos_final, 'toarray'):
    X_nuevos_np = X_nuevos_final.toarray()
else:
    X_nuevos_np = X_nuevos_final

# Inference mode: no gradient tracking is needed for prediction
model_pt.eval()
with torch.no_grad():
    X_nuevos_tensor = torch.tensor(X_nuevos_np, dtype=torch.float32)
    y_prob_pt = model_pt(X_nuevos_tensor)
    clases_predichas_pt = (y_prob_pt > 0.5).int().flatten().numpy()

print(f"Approval probabilities (P(Y=1)):\n{y_prob_pt.flatten().numpy()}")
print(f"Predicted classes (0=Not approved, 1=Approved):\n{clases_predichas_pt}")

Approval probabilities (P(Y=1)):
[0.9999974 1.       ]
Predicted classes (0=Not approved, 1=Approved):
[1 1]
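
The cells above predict on the two invented applicants but do not score the PyTorch model on the test split, although a `test_loader` was created for that purpose. A minimal sketch of such an evaluation follows. The helper name `evaluate_on_test`, the stand-in model and the random data are all hypothetical and only make the block runnable on its own; it assumes a model whose output is already a sigmoid probability, as `ClasificadorANN` is.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_on_test(model, loader, criterion):
    """Return (mean loss, accuracy) of a sigmoid-output binary classifier over a loader."""
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():
        for X_batch, y_batch in loader:
            y_prob = model(X_batch)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(y_prob, y_batch).item() * len(y_batch)
            correct += ((y_prob > 0.5).float() == y_batch).sum().item()
            n += len(y_batch)
    return total_loss / n, correct / n


# Stand-in model and random data (hypothetical, only to exercise the helper)
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(27, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
X_demo = torch.randn(100, 27)
y_demo = torch.randint(0, 2, (100, 1)).float()
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=False)

loss_demo, acc_demo = evaluate_on_test(demo_model, demo_loader, nn.BCELoss())
print(f"loss={loss_demo:.4f}, accuracy={acc_demo:.4f}")
```

In the notebook, the equivalent call would be `evaluate_on_test(model_pt, test_loader, criterion)`, which would give a test loss and accuracy directly comparable to the Keras figures.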


On the test set, the Keras model reaches an accuracy of 0.9107, which is in line with the 0.91 obtained earlier with the classical machine learning models. The values of 0.99 to 1.0 seen above are the approval probabilities that both networks assign to the two invented applicants, not accuracy scores, so they do not show that the networks classify better. The PyTorch model was not scored on the test set in this section, so its accuracy remains to be measured. With the results available here, the extra resources used by the deep learning models do not translate into a clear gain over the classical models.

As future work, the same approach could be applied to multiclass (non-binary) classification problems to see how these models behave.
