# Classification-based Collaborative Filtering Systems

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

This bank marketing dataset is open source and available for download from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing#). In the version used here, only some of the categorical variables have been replaced with dummy variables, for conceptual demonstration purposes. Other variable transformations, along with the inclusion of more descriptive variables, might be more suitable.

The data relate to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

It was originally created by: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

In [None]:
bank_clients = pd.read_csv('../data/bank_full_w_dummy_vars.csv')
with pd.option_context('display.max_rows', 10, 'display.max_columns', None):
    display(bank_clients)

Extract the binary features from columns 18 to 36 and the binary target from column 17 (all transformed variables are in columns 17 to 36).

In [None]:
train_features = bank_clients.iloc[0:40000,18:]
val_features = bank_clients.iloc[40000:,18:]
train_targets = bank_clients.iloc[0:40000,17]
#train_targets = pd.get_dummies(train_targets)
val_targets = bank_clients.iloc[40000:,17]
#val_targets = pd.get_dummies(val_targets)
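
The positional slicing above can be checked on a small stand-in frame. The sketch below is a minimal illustration, not the notebook's actual data: `toy` and `split_by_position` (with its parameters) are hypothetical names, and the frame is filled with random 0/1 values. It assumes a 37-column frame (positions 0 to 36), as the description above implies, and shows that `iloc` is zero-based with half-open ranges, so `iloc[:, 18:]` yields 19 feature columns, matching the `input_dim=19` used by the models.

```python
import numpy as np
import pandas as pd


def split_by_position(df, n_train, target_col=17, first_feature_col=18):
    """Split rows at n_train and columns by zero-based position (hypothetical helper)."""
    features = df.iloc[:, first_feature_col:]
    targets = df.iloc[:, target_col]
    return (features.iloc[:n_train], targets.iloc[:n_train],
            features.iloc[n_train:], targets.iloc[n_train:])


# Toy stand-in for the real data: 10 rows, 37 columns of random 0/1 values.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 2, size=(10, 37)))

X_train, y_train, X_val, y_val = split_by_position(toy, n_train=8)
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
# (8, 19) (8,) (2, 19) (2,)
```

Because the split is a plain row cut, it preserves the file's row order; whether that is appropriate depends on how the rows are ordered in the source file.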

## Simple neural network classification

In [8]:
model = keras.models.Sequential([
    keras.layers.Dense(50, input_dim=19, activation=tf.nn.relu, name='Input'),
    keras.layers.Dense(100, activation=tf.nn.relu, name='layer_2'),
    keras.layers.Dense(50, activation=tf.nn.relu, name='layer_3'),
    keras.layers.Dense(2, activation=tf.nn.softmax, name='layer_output')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
logger = keras.callbacks.TensorBoard(
    log_dir='keras_log',
    write_graph=True,
    histogram_freq=1
)

In [10]:
model.fit(
    train_features,
    train_targets,
    epochs=20,
    shuffle=True,
    validation_data=(val_features, val_targets),
    callbacks=[logger]
    )

Train on 40000 samples, validate on 5211 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x25ed815a940>

## Random forest classification

In [16]:
rf = RandomForestClassifier(random_state=42)  # random_state seeds the random number generator
# Fit the model
model_rf = rf.fit(train_features, train_targets)

In [17]:
model_rf.score(val_features, val_targets)  # mean accuracy on the validation set

0.5828056035309921
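
A single accuracy value is easier to interpret next to a majority-class baseline and a confusion matrix. The following is a minimal sketch on made-up labels, not results from this dataset; `majority_baseline_accuracy` and `confusion_matrix_2x2` are hypothetical helpers written in plain NumPy, and they assume the labels are the integers 0 and 1. With the fitted model, `model_rf.predict(val_features)` would supply the predictions.

```python
import numpy as np


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    y_true = np.asarray(y_true)
    counts = np.bincount(y_true, minlength=2)
    return counts.max() / counts.sum()


def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Made-up labels for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 0])

accuracy = (y_true == y_pred).mean()
baseline = majority_baseline_accuracy(y_true)
cm = confusion_matrix_2x2(y_true, y_pred)
print(accuracy, baseline)  # 0.75 0.625
print(cm)
```

If the model's accuracy is close to or below the baseline, the classifier is adding little beyond predicting the most common class, and the confusion matrix shows which class it is getting wrong.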

## Logistic regression with Keras

In [18]:
model = keras.models.Sequential([
    keras.layers.Dense(2, input_dim=19, activation=tf.nn.softmax)
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

In [21]:
model.fit(
    train_features,
    train_targets,
    epochs=20,
    shuffle=True,
    validation_data=(val_features, val_targets)
    )

Train on 40000 samples, validate on 5211 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x25edd35a240>
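
A single `Dense` layer with a two-unit softmax output is a form of logistic regression: the probability of class 1 depends only on the difference between the two logits, which is exactly a sigmoid of that difference. The sketch below checks this numerically with NumPy; the weights `W`, biases `b`, and inputs `x` are random placeholders, not values from the trained model.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(5, 19)).astype(float)  # 5 rows of 19 binary features
W = rng.normal(size=(19, 2))                        # placeholder weights
b = rng.normal(size=2)                              # placeholder biases

logits = x @ W + b
p_softmax = softmax(logits)[:, 1]                 # P(class 1) from the 2-unit softmax
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])  # same value from the logit difference

print(np.allclose(p_softmax, p_sigmoid))  # True
```

The two-unit softmax therefore carries redundant parameters compared with a one-unit sigmoid output, but both describe the same family of decision functions.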