# PyTorch FCNN Churn Prediction
We want to build and save a model, ready for deployment, using PyTorch. The model should be capable of taking in data about an arbitrary customer and predicting whether or not that customer is likely to churn.

## Data loading and preprocessing
The dataset used with this model is publicly available on [Kaggle](https://www.kaggle.com/datasets/blastchar/telco-customer-churn/data) and consists of both text and numerical values. Because of this mix, a rigorous data preprocessing pipeline is necessary before the data can be used for supervised learning and later predictions.

In [1]:
from pathlib import Path
import pandas as pd
# Load the data from the local repository
#   NOTE: BASE_DIR resolves to the current working directory, so the notebook
#         should be run from the folder that contains it, on any machine
BASE_DIR = Path("FCNN.ipynb").parent.resolve()
data = pd.read_csv(BASE_DIR / "data" / "WA_Fn-UseC_-Telco-Customer-Churn.csv")


### Data Preprocessing Pipeline (NEEDS UPDATE)
The dataset contains a fair share of problematic instances. To summarize, we need to handle the following:
- **Remove redundant columns:** Some columns are irrelevant when predicting customer churn. In our case, the most obvious one is `customerID`, which should be dropped.
- **Missing values:** A few rows are missing data in one or more columns. There are several ways to handle such rows, like interpolating the data or filling in a mean value (e.g. using `sklearn.impute.SimpleImputer`). However, given the size of the dataset and the risk of adding erroneous information, the adopted strategy is to simply remove incomplete rows using `data.dropna(inplace = True)`.
- **Text values:** The dataset contains a mix of datatypes, namely numerical values and text, the latter of which is not well-suited for ML training or prediction.
    - **Binary text data:** For text data that is binary in nature, we will simply encode the two cases as 1's and 0's (e.g. in `gender`, consider _male_ -> 0 and _female_ -> 1).
    - **Non-Binary text data:** For text data that is not binary in nature, we will use a _One Hot Encoder_ from `sklearn.preprocessing.OneHotEncoder`. This is preferred over assigning arbitrary integers to the categories, since integer codes would imply an ordering and distance between categories that does not exist and could bias the final classifier.

For convenience and simplicity of use, we will define a set of transformer classes (inheriting from `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`) which are integrated into an sklearn Pipeline `sklearn.pipeline.Pipeline`.
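
The actual transformer classes live in `pipelines.Preprocessing` and are not reproduced in this notebook. To give an idea of the pattern, here is a minimal sketch of what two such transformers could look like. The class names `SketchColumnDropper` and `SketchNaNDropper`, their parameters, and the toy data are all hypothetical.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class SketchColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that drops columns irrelevant to the prediction."""

    def __init__(self, columns=("customerID",)):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; returning self keeps the sklearn API contract
        return self

    def transform(self, X):
        # errors="ignore" keeps the step usable if a column is already gone
        return X.drop(columns=list(self.columns), errors="ignore")


class SketchNaNDropper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that removes rows with missing values."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.dropna().reset_index(drop=True)


# Toy frame (made up) with one incomplete row
toy = pd.DataFrame({
    "customerID": ["a", "b", "c"],
    "tenure": [1, None, 3],
    "MonthlyCharges": [10.0, 20.0, 30.0],
})

sketch_pipeline = Pipeline([
    ("dropper", SketchColumnDropper()),
    ("nan_dropper", SketchNaNDropper()),
])
cleaned = sketch_pipeline.fit_transform(toy)
print(cleaned)
```

Inheriting from `TransformerMixin` provides `fit_transform()` for free, which is what lets each step slot into a `Pipeline`.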

In [2]:
from pipelines.Preprocessing import ColumnDropper, NaNAmputator, FeatureEncoder, DataValidator
from sklearn.pipeline import Pipeline

# Put together the final Pipeline
preprocessing = Pipeline([
    ("dropper", ColumnDropper()),
    ("amputator", NaNAmputator()),
    ("feature_encoder", FeatureEncoder()),
    ("data_validator", DataValidator())
])


In [3]:
# preprocess the data
data = preprocessing.fit_transform(data)

### Convert data to PyTorch tensors
An important step in any PyTorch application is to convert datasets to the correct datatype. In this case, we want to convert our pandas DataFrame into PyTorch tensors with datatype `torch.float32`.

Note that to do this, we need to extract the labels from the data first.

In [4]:
import torch

# Select the device that the tensors and the model will live on
#   NOTE: If one has an NVIDIA GPU, device can be set to 'cuda' for increased performance.
#         As my current dev. rig has an AMD GPU, the device is set to 'cpu'
device = "cpu"

# Extract the truth-labels from the data
labels = data[['Churn']]
data.drop(columns = ['Churn'], inplace = True)

# Convert the data into a tensor
data = torch.from_numpy(data.values).type(torch.float32).to(device)
labels = torch.from_numpy(labels.values).type(torch.float32).to(device)

## Build and train the binary classifier
We seek to build a Fully Connected Neural Network (FCNN) using PyTorch and train it on a subset of the churn data. Note that the code for the FCNN lives in `ml_models/models/FCNN.py`; here we only import the class and construct an instance of it.
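
Since the FCNN class itself is not shown in this notebook, the following is only a minimal sketch of what a fully connected binary classifier can look like in PyTorch. The class name `SketchFCNN`, the constructor arguments, and the layer sizes are assumptions and do not necessarily match the imported `FCNN`.

```python
import torch
from torch import nn


class SketchFCNN(nn.Module):
    """Hypothetical fully connected network that outputs one logit per sample."""

    def __init__(self, in_features: int, hidden_units: int = 16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            # No sigmoid here: the raw logit is what BCEWithLogitsLoss expects
            nn.Linear(hidden_units, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Push a dummy batch of 4 samples with 8 features through the network
sketch_model = SketchFCNN(in_features=8)
dummy_batch = torch.randn(4, 8)
logits = sketch_model(dummy_batch)
print(logits.shape)
```

The output has one column per sample, matching the shape of the `labels` tensor built earlier.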

### Build the FCNN

In [5]:
from ml_models.models.FCNN import FCNN

# Define an instance of the model and place it on the correct device
model = FCNN().to(device)

### Train the FCNN
To train the network, we need three things:
1. A split of the available data into a training set and a test set. This is done with the widely used function `sklearn.model_selection.train_test_split()`.
2. An optimizer and a loss function. We adopt the popular `torch.optim.Adam()` optimizer together with the loss function `torch.nn.BCEWithLogitsLoss()`, which is well suited to binary classification because it operates directly on raw logits.
3. A training loop and a test loop to train the model and evaluate its performance. The training loop consists of five steps that repeat every epoch, and the test loop mainly serves to assess improvements over time.
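
To make the choice of loss function more concrete, the sketch below works through binary cross-entropy on a single logit in plain Python. For a logit $x$ and target $y$, the loss can be written as $\max(x, 0) - x y + \log(1 + e^{-|x|})$, which is algebraically equal to applying a sigmoid followed by ordinary binary cross-entropy but avoids taking the log of values that round to 0. The function names are made up for illustration, and this is not PyTorch's internal implementation.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy computed directly from a logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def naive_bce(logit: float, target: float) -> float:
    """Sigmoid followed by plain binary cross-entropy (breaks for extreme logits)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Both formulations agree for moderate logits
print(bce_with_logits(2.0, 1.0), naive_bce(2.0, 1.0))

# For an extreme logit the stable version still returns a finite value,
# whereas naive_bce(1000.0, 0.0) would fail on log(0)
print(bce_with_logits(1000.0, 0.0))
```

This is why the training loop feeds the raw logits to the loss function and only applies the sigmoid when it needs class predictions.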

In [6]:
from sklearn.model_selection import train_test_split

# Split the data into training/test sets
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size = 0.25)
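
The split above is random and changes between runs. If reproducibility or a preserved class ratio is wanted, `train_test_split()` also accepts `random_state` and `stratify`. The sketch below shows both on made-up NumPy arrays; the array sizes and the 25 % positive rate are assumptions for illustration, not properties measured on the churn data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 2 features, 25 positive labels
features = np.arange(200, dtype=np.float32).reshape(100, 2)
targets = np.array([1] * 25 + [0] * 75, dtype=np.float32)

# random_state fixes the shuffle; stratify keeps the class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    features,
    targets,
    test_size=0.25,
    random_state=42,
    stratify=targets,
)

print(len(y_tr), len(y_te), y_te.mean())
```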

In [7]:
# Declare the desired loss function and optimizer
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(params = model.parameters(),
                            lr = 0.01)

In [8]:
from sklearn.metrics import accuracy_score

# Make sure all data is present on the right device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Define a training/test loop
epochs = 200

for epoch in range(epochs):
    # Set the model to training mode (affects layers such as dropout and batch norm)
    model.train()

    # 1. Perform a forward pass of the training data
    # NOTE: The forward pass returns raw logits, which is what loss_fn expects in step 2
    y_train_logits = model(X_train)
    y_train_preds = torch.round(torch.sigmoid(y_train_logits))

    # 2. Calculate the loss (using logits)
    loss = loss_fn(y_train_logits, y_train)
    #acc = accuracy_score(y_train, y_train_preds)

    # 3. Reset the gradients of the optimizer (Otherwise they accumulate)
    optimizer.zero_grad()

    # 4. Compute the gradients of the loss with respect to all parameters (backward pass)
    loss.backward() # Backpropagation

    # 5. Update model parameters with a step in the optimizer
    optimizer.step()

    # TEST LOOP
    # Set the model to evaluation mode; gradient tracking is turned off by torch.inference_mode()
    model.eval()
    with torch.inference_mode():
        # Perform a forward pass of the test data
        y_test_logits = model(X_test)
        y_test_preds = torch.round(torch.sigmoid(y_test_logits))

        # Calculate the loss and other performance metrics
        test_loss = loss_fn(y_test_logits, y_test)
        #test_acc = accuracy_score(y_test, y_test_preds)

    # Print out some performance metrics to track the improvement during training
    if epoch%(epochs//10) == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")


Epoch: 0 | Loss: 6.201237678527832 | Test loss: 20.52345085144043
Epoch: 20 | Loss: 1.9906054735183716 | Test loss: 2.7850170135498047
Epoch: 40 | Loss: 0.9085407853126526 | Test loss: 1.4036239385604858
Epoch: 60 | Loss: 0.46694180369377136 | Test loss: 0.44691285490989685
Epoch: 80 | Loss: 0.7888277769088745 | Test loss: 0.5469276309013367
Epoch: 100 | Loss: 0.5213155746459961 | Test loss: 2.112501859664917
Epoch: 120 | Loss: 0.47783735394477844 | Test loss: 0.5143683552742004
Epoch: 140 | Loss: 1.21821129322052 | Test loss: 0.8596041202545166
Epoch: 160 | Loss: 0.5251643061637878 | Test loss: 0.5622532367706299
Epoch: 180 | Loss: 0.5491958260536194 | Test loss: 0.6347532868385315
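
The accuracy lines in the loop above are commented out, so only the loss is tracked. One possible way to compute accuracy directly from the logits, without leaving PyTorch, is sketched below. The helper name `accuracy_from_logits` and the example tensors are made up for illustration.

```python
import torch


def accuracy_from_logits(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of samples whose thresholded sigmoid output matches the target."""
    preds = torch.round(torch.sigmoid(logits))
    return (preds == targets).float().mean().item()


# Made-up logits and labels with the same (n_samples, 1) shape used in the notebook
example_logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
example_targets = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

print(accuracy_from_logits(example_logits, example_targets))
```

Rounding the sigmoid output is equivalent to predicting churn whenever the logit is positive.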


## Save the model & Preprocessing pipeline
Finally, we save both the model and the preprocessing pipeline so that they can be deployed quickly as part of an API. The model is saved with `torch.save()` to a path built with the `pathlib` module, whilst the pipeline is serialized using the `pickle` module.
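
As a reminder of how the `pickle` round trip works, here is a minimal sketch using a plain dictionary as a stand-in for the pipeline object. The dictionary contents and file name are made up. When a real pipeline with custom transformer classes is unpickled, the module that defines those classes must be importable in the loading process, and pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object (hypothetical); the notebook pickles a fitted sklearn Pipeline instead
stand_in_pipeline = {"steps": ["dropper", "amputator"], "version": 0}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "stand_in_pipeline.pkl"

    # Serialize the object to disk
    with open(pickle_path, "wb") as f:
        pickle.dump(stand_in_pipeline, f)

    # Load it back, as a deployment service would at start-up
    with open(pickle_path, "rb") as f:
        restored_pipeline = pickle.load(f)

print(restored_pipeline)
```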

In [9]:
import pickle

# Define the directory in which to save the model
MODEL_PATH = Path(f"{BASE_DIR}/ml_models/trained_models")
MODEL_PATH.mkdir(parents = True, exist_ok = True)   # If no 'trained_models' folder -> mkdir

# Create a direct path to save the model as a .pth file
MODEL_NAME = 'FCNN_churn_V0.pth'
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Save the entire model object (not only its state_dict)
#   NOTE: Loading it later requires the FCNN class to be importable
torch.save(model, MODEL_SAVE_PATH)

# Define the directory in which to save the pipeline
PIPELINE_PATH = Path(f"{BASE_DIR}/pipelines/completed_pipelines")
PIPELINE_PATH.mkdir(parents = True, exist_ok = True)    # If no 'completed_pipelines' folder -> mkdir

# Create a direct path to save the pipeline object as a .pkl file
PIPELINE_NAME = 'churn_preprocessing_V0.pkl'
PIPELINE_SAVE_PATH = PIPELINE_PATH / PIPELINE_NAME

# Save the pipeline by dumping the object to a pickle file
with open(PIPELINE_SAVE_PATH, "wb") as f:
    pickle.dump(preprocessing, f)
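
The cell above stores the whole model object. A commonly recommended alternative is to store only the parameters via `state_dict()` and rebuild the architecture in code when loading. The sketch below shows that round trip with a small `nn.Linear` layer as a hypothetical stand-in for the FCNN; the file name is made up as well.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

# Hypothetical stand-in for the trained FCNN
stand_in_model = nn.Linear(3, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = Path(tmp_dir) / "stand_in_state.pth"

    # Save only the learned parameters
    torch.save(stand_in_model.state_dict(), state_path)

    # Rebuild the same architecture, then load the parameters into it
    restored_model = nn.Linear(3, 1)
    restored_model.load_state_dict(torch.load(state_path))

# Switch to evaluation mode before serving predictions
restored_model.eval()
print(torch.equal(stand_in_model.weight, restored_model.weight))
```

With this approach the saved file holds tensors only, at the cost of needing the class definition and its constructor arguments available at load time.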