# Exercise: Should we say goodbye to that customer?

Use our ANN model to predict whether the customer with the following information will leave the bank:



- Geography: France
- Credit score: 600
- Gender: Male
- Age: 40 years
- Tenure: 3 years
- Balance: $60000
- Number of products: 2
- Does this customer have a credit card? Yes
- Is this customer an active member? Yes
- Estimated salary: $50000

So, should we say goodbye to that customer?



The solution is provided in the video at the end of this assignment, but I recommend that you try to solve it on your own first.





Enjoy deep learning!

In [8]:
import pandas as pd
import torch
import torch.optim as optim
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import torch.nn as nn

In [9]:
df = pd.read_csv('./Churn_Modelling.csv')
df.head()  # call the method; `df.head` without parentheses only returns the bound method

      RowNumber  CustomerId    Surname  CreditScore Geography  Gender  Age  \
0             1    15634602   Hargrave          619    France  Female   42   
1             2    15647311       Hill          608     Spain  Female   41   
2             3    15619304       Onio          502    France  Female   42   
3             4    15701354       Boni          699    France  Female   39   
4             5    15737888   Mitchell          850     Spain  Female   43   

## Encode Categorical Data

In [10]:
df = pd.get_dummies(df, columns=['Geography'])
gender_enc = {
    'Male': 1,
    'Female': 0,
}
df['Gender'] = df['Gender'].map(gender_enc)
print(df)

      RowNumber  CustomerId    Surname  CreditScore  Gender  Age  Tenure  \
0             1    15634602   Hargrave          619       0   42       2   
1             2    15647311       Hill          608       0   41       1   
2             3    15619304       Onio          502       0   42       8   
3             4    15701354       Boni          699       0   39       1   
4             5    15737888   Mitchell          850       0   43       2   
...         ...         ...        ...          ...     ...  ...     ...   
9995       9996    15606229   Obijiaku          771       1   39       5   
9996       9997    15569892  Johnstone          516       1   35      10   
9997       9998    15584532        Liu          709       0   36       7   
9998       9999    15682355  Sabbatini          772       1   42       3   
9999      10000    15628319     Walker          792       0   28       4   

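The following plain-Python sketch shows what `pd.get_dummies(..., columns=['Geography'])` and the `gender_enc` map do to a single record. The hypothetical helper `encode_customer` converts the exercise customer into the 11 values the model expects, in the same column order as the feature list selected in the next cell. It assumes that France, Spain and Germany are the only geography values, as in this dataset. `EstimatedSalary` is not part of that feature list, so the customer's salary is not used.

```python
# Column order used for X in this notebook (EstimatedSalary is not included).
FEATURES = [
    "CreditScore", "Gender", "Geography_France", "Geography_Spain",
    "Geography_Germany", "Age", "Tenure", "Balance",
    "NumOfProducts", "IsActiveMember", "HasCrCard",
]
GENDER_ENC = {"Male": 1, "Female": 0}         # same mapping as gender_enc
GEOGRAPHIES = ["France", "Spain", "Germany"]  # assumed to be the only values


def encode_customer(raw: dict) -> list:
    """Hypothetical helper: one-hot encode Geography, map Gender, keep numeric fields."""
    row = dict(raw)
    geography = row.pop("Geography")
    if geography not in GEOGRAPHIES:
        raise ValueError(f"unknown geography: {geography!r}")
    for name in GEOGRAPHIES:
        row[f"Geography_{name}"] = int(geography == name)  # one-hot column
    row["Gender"] = GENDER_ENC[row["Gender"]]
    return [row[col] for col in FEATURES]


customer = {
    "CreditScore": 600, "Gender": "Male", "Geography": "France", "Age": 40,
    "Tenure": 3, "Balance": 60000, "NumOfProducts": 2,
    "HasCrCard": 1, "IsActiveMember": 1,
}
print(encode_customer(customer))
# [600, 1, 1, 0, 0, 40, 3, 60000, 2, 1, 1]
```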

## Split the dataset into training and test sets

In [11]:
# Define features and target
X = df[['CreditScore','Gender', 'Geography_France','Geography_Spain','Geography_Germany','Age','Tenure', 'Balance','NumOfProducts','IsActiveMember','HasCrCard']]
y = df['Exited']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
type(X_train)

pandas.core.frame.DataFrame

## Scale variables

In [12]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set only; returns a NumPy array
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std for the test set

type(y_train.values)

numpy.ndarray
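`StandardScaler` standardizes each column as z = (x - mean) / std. `fit_transform` learns the mean and the population standard deviation from the training set only, and `transform` reuses those training statistics for the test set. The minimal pure-Python sketch below reproduces that behaviour for one column, using made-up values rather than the dataset:

```python
from statistics import mean, pstdev


def fit_standardizer(column):
    """Learn mean and population std (ddof=0, as StandardScaler does)."""
    mu = mean(column)
    sigma = pstdev(column)
    # Guard against constant columns (StandardScaler also uses a scale of 1 there).
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(column, params):
    mu, sigma = params
    return [(x - mu) / sigma for x in column]


train_col = [1.0, 2.0, 3.0]  # illustrative values, not from the dataset
test_col = [2.0, 4.0]

params = fit_standardizer(train_col)    # statistics come from the training data only
print(standardize(train_col, params))
print(standardize(test_col, params))    # the test data uses the training mean/std
```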

## Convert to tensor

In [13]:
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values.reshape(-1, 1), dtype=torch.float32) # The target is 1-D; reshape it to (n, 1) to match the model's output shape
y_test_tensor = torch.tensor(y_test.values.reshape(-1, 1), dtype=torch.float32)
print(X_train_tensor.shape[0])
len(X_train_tensor)

8000


8000

## Build ANN

In [14]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(11,6)
        self.relu = nn.ReLU()
        self.dropout1 = nn.Dropout(p=0.1) # randomly zero 10% of activations during training
        self.layer2 = nn.Linear(6, 6)
        self.dropout2 = nn.Dropout(p=0.1)
        self.output_layer = nn.Linear(6, 1)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.dropout1(x)

        x = self.layer2(x)
        x = self.relu(x)
        x = self.dropout2(x)

        x = self.output_layer(x)
        x = self.sigmoid(x)
        return x

model = NeuralNetwork()

# Define Loss function and optimizer
criterion = nn.BCELoss() # Binary Cross Entropy Loss
optimizer = optim.Adam(model.parameters())  # Adam optimizer (default learning rate 0.001)
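Each `nn.Linear(in_features, out_features)` layer holds `in_features * out_features` weights plus `out_features` biases. `ReLU`, `Dropout` and `Sigmoid` add no trainable parameters. This quick pure-Python check counts the parameters of the 11 → 6 → 6 → 1 network, and the total should agree with `sum(p.numel() for p in model.parameters())`:

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights (in x out) plus one bias per output unit, as in nn.Linear(bias=True)."""
    return in_features * out_features + out_features


layer_sizes = [11, 6, 6, 1]  # layer1, layer2, output_layer
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [72, 42, 7] 121
```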

## Train ANN

In [30]:
epochs = 100
batch_size = 10
model.train()
for epoch in range(epochs):
    # Iterate over the training set in fixed-size mini-batches (same order every epoch)
    for i in range(0, X_train_tensor.shape[0], batch_size):
        inputs = X_train_tensor[i:i+batch_size]
        labels = y_train_tensor[i:i+batch_size]

        optimizer.zero_grad()              # reset gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backpropagation
        optimizer.step()                   # update weights

    if (epoch+1) % 10 == 0:
        # loss of the last mini-batch of this epoch
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

Epoch [10/100], Loss: 0.4160
Epoch [20/100], Loss: 0.3924
Epoch [30/100], Loss: 0.3904
Epoch [40/100], Loss: 0.4698
Epoch [50/100], Loss: 0.5141
Epoch [60/100], Loss: 0.4079
Epoch [70/100], Loss: 0.4443
Epoch [80/100], Loss: 0.3943
Epoch [90/100], Loss: 0.4243
Epoch [100/100], Loss: 0.4080
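The loop above slices the training tensors in the same fixed order every epoch. With 8000 rows and `batch_size = 10`, each epoch therefore runs 800 updates over identical batches. The printed value is only the loss of the last mini-batch, which is one reason it fluctuates. The k-fold version later in this notebook uses `DataLoader(..., shuffle=True)` instead. The hypothetical helper `minibatch_indices` below sketches what that shuffling does: it draws a new permutation each epoch and then cuts it into batches.

```python
import random


def minibatch_indices(n_samples: int, batch_size: int, rng: random.Random):
    """Yield index lists covering every sample once, in a freshly shuffled order."""
    order = list(range(n_samples))
    rng.shuffle(order)  # new permutation on each call (one call per epoch)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]  # the last batch may be smaller


rng = random.Random(0)  # seeded for reproducibility
batches = list(minibatch_indices(8000, 10, rng))
print(len(batches))  # 800 updates per epoch
```

In the notebook, each `idx` list could be used as `X_train_tensor[idx]` and `y_train_tensor[idx]`.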


## Evaluate and compute final predictions

In [31]:
model.eval()  
with torch.no_grad(): 
    outputs = model(X_test_tensor)
    loss = criterion(outputs, y_test_tensor)
    print(f'Test Loss: {loss.item():.4f}')
    outputs = (outputs>0.5)
    print(outputs)

Test Loss: 0.3350
tensor([[False],
        [False],
        [False],
        ...,
        [ True],
        [False],
        [False]])


## Build the Confusion Matrix

In [32]:
from sklearn.metrics import confusion_matrix
print(y_test_tensor)
cm = confusion_matrix(y_test_tensor.numpy(), outputs.numpy()) # convert tensors to NumPy arrays for scikit-learn
print("Confusion Matrix")
print(cm)

tensor([[0.],
        [0.],
        [0.],
        ...,
        [1.],
        [1.],
        [1.]])
Confusion Matrix
[[1544   63]
 [ 213  180]]


In [33]:
print("Accuracy:")  # (true negatives + true positives) / total predictions
(cm[0][0]+cm[1][1])/cm.sum()

Accuracy:


np.float64(0.862)
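For binary labels 0/1, scikit-learn lays out the confusion matrix as `[[TN, FP], [FN, TP]]`, so the accuracy above is (1544 + 180) / 2000 = 0.862. Only 393 of the 2000 test customers actually left, so accuracy alone hides how many leavers the model misses. Precision and recall for the "Exited" class can be read off the same matrix. The sketch below uses the counts printed above:

```python
def binary_metrics(cm):
    """cm = [[TN, FP], [FN, TP]] as returned by sklearn's confusion_matrix for labels 0/1."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted leavers, share that left
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of actual leavers, share detected
    }


cm_counts = [[1544, 63], [213, 180]]  # confusion matrix printed above
for name, value in binary_metrics(cm_counts).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.862
# precision: 0.741
# recall: 0.458
```

`sklearn.metrics` also provides `precision_score` and `recall_score`, which compute these directly from the labels and predictions.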

## Prediction for the exercise customer

In [8]:
X = {
    'CreditScore': [600],
    'Gender': [1],
    'Geography_France': [1],
    'Geography_Spain': [0],
    'Geography_Germany': [0],
    'Age': [40],
    'Tenure': [3],
    'Balance': [60000],
    'NumOfProducts': [2],
    'IsActiveMember': [1],
    'HasCrCard': [1],
}

X_exercise = pd.DataFrame(X)
print(X_exercise)
print(type(X_exercise.values))
# The network was trained on standardized inputs, so the new customer must be
# transformed with the same scaler that was fitted on the training set
X_exercise_scaled = scaler.transform(X_exercise)
X_exercise_tensor = torch.tensor(X_exercise_scaled, dtype=torch.float32)

   CreditScore  Gender  Geography_France  Geography_Spain  Geography_Germany  \
0          600       1                 1                0                  0   

   Age  Tenure  Balance  NumOfProducts  IsActiveMember  HasCrCard  
0   40       3    60000              2               1          1  
<class 'numpy.ndarray'>


In [35]:
model.eval()
with torch.no_grad():
    outputs = model(X_exercise_tensor)
    outputs = (outputs > 0.5)  # True means the model predicts the customer will leave
    print(outputs)


# Exercise 2

Try to win the gold medal by reaching an accuracy above 86% using k-Fold Cross Validation.

As a reminder:

- Bronze medal: accuracy between 84% and 85%

- Silver medal: accuracy between 85% and 86%

- Gold medal: accuracy above 86%

Good luck, and go for the gold medal!
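In k-fold cross-validation the data is split into k folds of almost equal size. Each fold serves once as the validation set while a fresh model is trained on the other k - 1 folds, and the k validation scores are then averaged. The plain-Python sketch below shows the index bookkeeping that `sklearn.model_selection.KFold(n_splits=k, shuffle=True)` performs. The helper name `kfold_indices` is hypothetical, and its shuffling does not reproduce scikit-learn's exact permutation.

```python
import random


def kfold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # The first n_samples % k folds get one extra sample, as in sklearn's KFold.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = sorted(order[start:start + size])
        train_idx = sorted(order[:start] + order[start + size:])
        yield train_idx, val_idx
        start += size


for fold, (train_idx, val_idx) in enumerate(kfold_indices(10000, 5)):
    print(fold, len(train_idx), len(val_idx))  # 8000 training / 2000 validation rows
```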

In [15]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, Subset, TensorDataset
from sklearn.model_selection import KFold
import numpy as np

In [16]:
df = pd.read_csv('./Churn_Modelling.csv')
df = pd.get_dummies(df, columns=['Geography'])
gender_enc = {
    'Male': 1,
    'Female': 0,
}
df['Gender'] = df['Gender'].map(gender_enc)

X = df[['CreditScore','Gender', 'Geography_France','Geography_Spain','Geography_Germany','Age','Tenure', 'Balance','NumOfProducts','IsActiveMember','HasCrCard']]
y = df['Exited']

In [17]:
X = X.astype(float)

y_tensor = torch.tensor(y.values, dtype=torch.float32).unsqueeze(1)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_tensor = torch.tensor(X_scaled, dtype=torch.float32)

dataset = TensorDataset(X_tensor, y_tensor)
print(dataset)
for i in range(3):
    print(f"Example {i + 1}:")
    print(f"  X: {dataset[i][0]}")
    print(f"  y: {dataset[i][1]}")

<torch.utils.data.dataset.TensorDataset object at 0x7894691caa00>
Example 1:
  X: tensor([-0.3262, -1.0960,  0.9972, -0.5738, -0.5787,  0.2935, -1.0418, -1.2258,
        -0.9116,  0.9702,  0.6461])
  y: tensor([1.])
Example 2:
  X: tensor([-0.4400, -1.0960, -1.0028,  1.7427, -0.5787,  0.1982, -1.3875,  0.1174,
        -0.9116,  0.9702, -1.5478])
  y: tensor([0.])
Example 3:
  X: tensor([-1.5368, -1.0960,  0.9972, -0.5738, -0.5787,  0.2935,  1.0329,  1.3331,
         2.5271, -1.0307,  0.6461])
  y: tensor([1.])
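In this cell the `StandardScaler` is fitted on all 10000 rows before the folds are created. The statistics of each validation fold therefore leak into the scaling applied to the training data. The effect is often small for a simple scaler, but a leakage-free variant fits the scaling statistics inside the loop, using only the training indices of each fold. The sketch below shows that idea for one fold in plain Python; the helper `scale_fold` and the toy rows are hypothetical.

```python
from statistics import mean, pstdev


def scale_fold(rows, train_idx, val_idx):
    """Standardize each column using statistics from the training rows of this fold only."""
    n_cols = len(rows[0])
    stats = []
    for c in range(n_cols):
        col = [rows[i][c] for i in train_idx]
        sigma = pstdev(col)
        stats.append((mean(col), sigma if sigma > 0 else 1.0))

    def transform(idx):
        return [[(rows[i][c] - mu) / sd for c, (mu, sd) in enumerate(stats)] for i in idx]

    return transform(train_idx), transform(val_idx)


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, 1000.0]]  # toy data
train_scaled, val_scaled = scale_fold(rows, train_idx=[0, 1, 2], val_idx=[3])
print(val_scaled)  # the validation row is scaled with train-only statistics
```

With scikit-learn, the equivalent step inside the loop is to fit `StandardScaler()` on `X.iloc[train_idx]` and then call `transform` on both `X.iloc[train_idx]` and `X.iloc[val_idx]` before building the tensors.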


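Conceptually, a `TensorDataset` pairs each row of features with its label, as in this illustrative table: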
    Index | Feature1 | Feature2 | Feature3 | Label
    -------------------------------------------
        0 |    0.73  |    0.42  |    0.58  |  1.0
        1 |    0.61  |    0.73  |    0.89  |  0.0
        2 |    0.48  |    0.59  |    0.15  |  1.0
        3 |    0.34  |    0.25  |    0.76  |  0.0
        4 |    0.53  |    0.64  |    0.32  |  1.0

In [18]:
from sklearn.metrics import confusion_matrix

# For a dataset with thousands of samples, k = 5 folds is a common choice
k_folds = 5
kf = KFold(n_splits=k_folds, shuffle=True)

results = {}
accuracies = {}

epochs = 100
 
for fold, (train_idx, val_idx) in enumerate(kf.split(dataset)):
    print(f"Fold {fold}")
    print(train_idx)
    print(val_idx)
    train_subset = Subset(dataset, train_idx)
    val_subset = Subset(dataset, val_idx)

    train_loader = DataLoader(train_subset, batch_size=10, shuffle=True)
    val_loader = DataLoader(val_subset, batch_size=10, shuffle=False)

    model = NeuralNetwork()

    criterion = nn.BCELoss()  # Binary Cross Entropy Loss
    optimizer = optim.Adam(model.parameters()) 

    model.train()
    for epoch in range(epochs):
        for batch in train_loader:
            inputs, labels = batch
            optimizer.zero_grad() 
            outputs = model(inputs)
            loss = criterion(outputs, labels) 
            loss.backward() 
            optimizer.step() 

    # Evaluate the model on the validation fold
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in val_loader:
            inputs, labels = batch
            outputs = model(inputs)
            predicted = (outputs > 0.5).float()
            loss = criterion(outputs, labels)
            val_loss += loss.item()

            # Count correct predictions
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

            

    val_loss /= len(val_loader)
    accuracy = 100 * correct / total
    accuracies[fold] = accuracy
    print(f'Accuracy for fold {fold}: {accuracy}')
    print(f'Validation Loss for fold {fold}: {val_loss}')

    results[fold] = val_loss

mean_val_loss = sum(results.values()) / k_folds
mean_acc = sum(accuracies.values()) / k_folds
print(f'Mean Validation Loss: {mean_val_loss}')
print(f'Mean Accuracy: {mean_acc}')

Fold 0
[   0    1    2 ... 9996 9998 9999]
[   3    4   16 ... 9974 9976 9997]
Accuracy for fold 0: 84.6
Validation Loss for fold 0: 0.37342016171664
Fold 1
[   1    2    3 ... 9997 9998 9999]
[   0    6    9 ... 9988 9993 9994]
Accuracy for fold 1: 85.8
Validation Loss for fold 1: 0.3615429665148258
Fold 2
[   0    1    3 ... 9997 9998 9999]
[   2    8   38 ... 9987 9989 9995]
Accuracy for fold 2: 85.9
Validation Loss for fold 2: 0.3551321718096733
Fold 3
[   0    2    3 ... 9996 9997 9999]
[   1    5   10 ... 9991 9992 9998]
Accuracy for fold 3: 84.45
Validation Loss for fold 3: 0.37087736155837775
Fold 4
[   0    1    2 ... 9995 9997 9998]
[   7   15   19 ... 9990 9996 9999]
Accuracy for fold 4: 85.7
Validation Loss for fold 4: 0.3443225518986583
Mean Validation Loss: 0.36105904269963507
Mean Accuracy: 85.28999999999999
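The mean accuracy is the average of the five per-fold values: (84.6 + 85.8 + 85.9 + 84.45 + 85.7) / 5 = 85.29%, which falls in the silver-medal range (85% to 86%). Reporting the standard deviation across folds alongside the mean shows how stable that estimate is. The short sketch below computes both with Python's standard library, using the values printed above:

```python
from statistics import mean, stdev

fold_accuracies = [84.6, 85.8, 85.9, 84.45, 85.7]  # per-fold accuracies printed above

mean_acc = mean(fold_accuracies)
std_acc = stdev(fold_accuracies)  # sample standard deviation (n - 1)
print(f"Accuracy: {mean_acc:.2f}% +/- {std_acc:.2f}%")
# Accuracy: 85.29% +/- 0.70%
```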


In [19]:
X = {
    'CreditScore': [600],
    'Gender': [1],
    'Geography_France': [1],
    'Geography_Spain': [0],
    'Geography_Germany': [0],
    'Age': [40],
    'Tenure': [3],
    'Balance': [60000],
    'NumOfProducts': [2],
    'IsActiveMember': [1],
    'HasCrCard': [1],
}

X_exercise = pd.DataFrame(X)
print(X_exercise)
print(type(X_exercise.values))
# Standardize with the scaler fitted in the k-fold section before predicting
X_exercise_tensor = torch.tensor(scaler.transform(X_exercise), dtype=torch.float32)

# `model` is the network trained on the last fold of the loop above
model.eval()
with torch.no_grad():
    outputs = model(X_exercise_tensor)
    outputs = (outputs > 0.5)
    print(outputs)

   CreditScore  Gender  Geography_France  Geography_Spain  Geography_Germany  \
0          600       1                 1                0                  0   

   Age  Tenure  Balance  NumOfProducts  IsActiveMember  HasCrCard  
0   40       3    60000              2               1          1  
<class 'numpy.ndarray'>


In [None]:
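from sklearn.metrics import confusion_matrix
# Confusion matrix of the last fold's model (`model`, `val_loader` from the final loop iteration)
model.eval()
preds, targets = [], []
with torch.no_grad():
    for inputs, labels in val_loader:
        preds.append((model(inputs) > 0.5).float())
        targets.append(labels)
cm = confusion_matrix(torch.cat(targets).numpy(), torch.cat(preds).numpy())
print("Confusion Matrix")
print(cm)
print("Accuracy:")  # (true negatives + true positives) / total predictions
(cm[0][0] + cm[1][1]) / cm.sum()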