![title](Logo.png)

Hello and welcome to the Wild Green Credit Union client prediction model! This application is trained on internal client data to predict whether a client will subscribe to a long-term deposit during a marketing campaign. To initialize the model, select the following code cell and run it with `Shift + Enter`.

# Loading the Data and Model


In [6]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import OrdinalEncoder
from sklearn.linear_model import LogisticRegression
import ipywidgets as widgets
import io
%matplotlib inline

# Set random seed and import dataset
np.random.seed(42)
dataset = pd.read_csv("banking_data.csv")

# Numerically encode dataset
enc = OrdinalEncoder()
enc_set = enc.fit_transform(dataset)
enc_set = pd.DataFrame(enc_set)
enc_set.columns=dataset.columns.values

# Split and scale dataset
X = enc_set.drop(["y"], axis=1)
y = enc_set["y"]
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and Train Logistic Regression Model
clf = LogisticRegression(class_weight="balanced")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Return classification report
print("Model has been successfully loaded!")
print(classification_report(y_test, y_pred))

Model has been successfully loaded!
              precision    recall  f1-score   support

         0.0       0.96      0.84      0.90      5798
         1.0       0.40      0.76      0.52       792

    accuracy                           0.83      6590
   macro avg       0.68      0.80      0.71      6590
weighted avg       0.89      0.83      0.85      6590
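
The figures in the report above can be sanity-checked by hand. The sketch below recomputes the F1 scores and the averaged precision from the per-class values, assuming that class `1.0` encodes clients who subscribed. Because the inputs are already rounded to two decimals, the results should only be expected to agree with the report after rounding. The helper name `f1_score_from` is hypothetical and not part of scikit-learn.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class figures copied from the classification report above.
report = {
    "0.0": {"precision": 0.96, "recall": 0.84, "support": 5798},
    "1.0": {"precision": 0.40, "recall": 0.76, "support": 792},
}

total = sum(row["support"] for row in report.values())

# F1 for each class
for label, row in report.items():
    f1 = f1_score_from(row["precision"], row["recall"])
    print(label, "f1:", round(f1, 2))

# Macro average: every class counts equally.
macro_precision = sum(row["precision"] for row in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_precision = (
    sum(row["precision"] * row["support"] for row in report.values()) / total
)

print("macro precision:", round(macro_precision, 2))
print("weighted precision:", round(weighted_precision, 2))
```

Read this way, a recall of 0.76 for class `1.0` means the model finds roughly three quarters of the clients who subscribed, while a precision of 0.40 means that about 40% of the clients it flags actually subscribed.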



# Client Predictions

Here you may upload a properly formatted CSV file to receive a list of clients that the model predicts will subscribe during the campaign. A properly formatted file contains 15 columns of data and can be generated from internal Wild Green Credit Union system records.
<br/> <br/>
Instructions:

1. Run the previous cell to initiate the model.
2. Run the next cell to initiate the widgets.
3. Click the upload widget to select and upload your CSV file.
4. Click the predict button to return a list of predictions.

The program outputs one prediction per row, listed by the row's index in the uploaded CSV file. To download a CSV file with the predictions appended, right-click the "Predictions.csv" file in the file list on the left and click Download.
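
Since the model can only score files whose columns match the training data, a quick header check before predicting can catch formatting problems early. The following is a minimal sketch of such a check. The function `check_upload_columns` and the column lists are hypothetical placeholders; in the notebook, the expected list would come from the training data (for example `dataset.columns.drop("y")`) and the uploaded list from the uploaded dataframe's `columns`.

```python
def check_upload_columns(uploaded, expected):
    """Compare uploaded CSV headers with the columns the model was trained on.

    Returns (missing, unexpected) as two lists, preserving input order.
    """
    uploaded = list(uploaded)
    expected = list(expected)
    missing = [name for name in expected if name not in uploaded]
    unexpected = [name for name in uploaded if name not in expected]
    return missing, unexpected


# Placeholder column names for illustration only (not the real schema).
expected_columns = ["age", "job", "duration"]
uploaded_columns = ["age", "duration", "client_id"]

missing, unexpected = check_upload_columns(uploaded_columns, expected_columns)
print("Missing columns:", missing)
print("Unexpected columns:", unexpected)
```

If either list is non-empty, the file likely needs to be regenerated before a prediction is attempted.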

In [7]:
uploader = widgets.FileUpload(accept='.csv', multiple=False)
button = widgets.Button(description='Predict!', disabled=False)
output = widgets.Output()

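# Fit a feature-only encoder on the training data once, so uploads (which have
# no "y" column) get the same value-to-code mapping the model was trained on.
# Unseen values are encoded as -1 instead of raising an error
# (assumes scikit-learn >= 0.24 for handle_unknown="use_encoded_value").
feature_cols = dataset.columns.drop("y")
feature_enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
feature_enc.fit(dataset[feature_cols])

def on_button_clicked(b):
    
    # Convert the uploaded file to a dataframe
    # (assumes the ipywidgets 7 API, where uploader.value is a dict)
    input_file = list(uploader.value.values())[0]
    content = input_file['content']
    content = io.StringIO(content.decode('utf-8'))
    df = pd.read_csv(content)
    
    # Transform with the already-fitted encoder and scaler (no refitting),
    # then make a prediction on the data
    data = feature_enc.transform(df[feature_cols])
    data = pd.DataFrame(data, columns=feature_cols)
    data = scaler.transform(data)
    prediction = clf.predict(data)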
    
    with output:
        # Prepare the saved output file
        df["Success"] = prediction
        df.to_csv("Predictions.csv", index=False)
        
        # Prepare the visual output
        pd.set_option('display.max_rows', None)
        p_output = pd.DataFrame(prediction)
        print(p_output)

button.on_click(on_button_clicked)
display(uploader, button, output)


# About the Data

Here we analyze trends in the training dataset. The heatmap shows that the feature most strongly correlated with whether a client subscribes is the duration of their last call during the campaign. Clients whose last call lasted longer than 250 seconds tend to subscribe more often than clients with shorter calls, and in general, the longer a client stays engaged in the last campaign call, the more likely they are to subscribe. Clients aged 20 to 60 are more likely to stay on the call for longer durations.
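
The heatmap reading can also be expressed numerically by ranking each feature by the strength of its correlation with the target column. The sketch below illustrates the idea on a small made-up table; the values are not real client data, and `rank_by_target_correlation` is a hypothetical helper. In the notebook, the same function could be applied to `enc_set`.

```python
import pandas as pd

# Toy stand-in for the encoded training set; values are illustrative only.
toy = pd.DataFrame({
    "age":      [25, 52, 33, 41, 30, 60],
    "duration": [90, 400, 120, 650, 300, 60],
    "campaign": [1, 1, 2, 4, 3, 2],
    "y":        [0, 1, 0, 1, 1, 0],
})


def rank_by_target_correlation(frame, target="y"):
    """Return features sorted by absolute Pearson correlation with `target`."""
    corr = frame.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)


ranking = rank_by_target_correlation(toy)
print(ranking)
```

Taking the absolute value keeps strongly negative correlations near the top of the ranking, which matches how a heatmap is usually read.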

In [8]:
Output0 = widgets.Output()
Output1 = widgets.Output()
Output2 = widgets.Output()

tab = widgets.Tab(children=[Output0, Output1, Output2])
tab.set_title(0, 'Heatmap')
tab.set_title(1, 'Boxplot')
tab.set_title(2, 'Histogram')
display(tab)

with Output0:
    plt.figure(figsize=(11,10))
    Heatmap = sns.heatmap(enc_set.corr(), annot=True, 
                          linewidths=.5, cmap="crest_r", fmt=".2f")
    plt.show()
    
with Output1:
    graphset = dataset.drop("y", axis=1)
    graphset["success"] = dataset["y"]

    sns.set_theme(rc={'figure.figsize':(10,4)})
    sns.set_palette("crest")
    Boxplot = sns.boxplot(x="duration", y="success", data=graphset, showfliers = False)
    plt.show()

with Output2:
    plt.figure(figsize=(8,10))
    sns.set_palette("crest")
    Histogram = sns.histplot(x="age", y="duration", data=graphset, bins=20, cbar=True)
    plt.show()
