In the previous section, we explored several ways to address class imbalance (averaging the predictions of multiple models, boosting, etc.). In this part, we'll try another popular method: assigning different weights to the terms of the **loss function for different classes**, depending on whether an example belongs to the majority or the minority class.

Again we'll train a simple logistic regression, this time with the **softmax function** (multinomial) rather than the **sigmoid function**. We'll use TensorFlow to control the weighting of the different terms in the loss function.

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score

Load the data into a DataFrame:

In [2]:
df = pd.read_csv("creditcard.csv", header=0, sep=",")
df.shape

(284807, 31)

In [3]:
df.columns

Index([u'Time', u'V1', u'V2', u'V3', u'V4', u'V5', u'V6', u'V7', u'V8', u'V9',
       u'V10', u'V11', u'V12', u'V13', u'V14', u'V15', u'V16', u'V17', u'V18',
       u'V19', u'V20', u'V21', u'V22', u'V23', u'V24', u'V25', u'V26', u'V27',
       u'V28', u'Amount', u'Class'],
      dtype='object')

The training data and target labels:

In [4]:
target = df["Class"].values
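# All columns except the last one ("Class") are features.
# `.ix` has been removed from pandas; `.iloc` does the same positional slicing.
data = df.iloc[:, :-1].values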

1. Split all data into training and test sets
2. Scale the **X** of the training and test sets (the scaler is fit on the training set only)
3. Turn the training labels into a one-hot representation for the softmax function

In [5]:
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=123, stratify=target)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_train = OneHotEncoder().fit_transform(y_train[:, np.newaxis]).toarray()
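
To make the preprocessing concrete, here is a minimal NumPy-only sketch of what the scaling and one-hot steps compute. The toy arrays and the names `toy_train`, `toy_test` and `toy_labels` are made up for illustration; the notebook itself relies on scikit-learn's `StandardScaler` and `OneHotEncoder`.

```python
import numpy as np

# Toy data (hypothetical values, not taken from creditcard.csv).
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[2.0, 40.0]])
toy_labels = np.array([0, 1, 0])

# Standardize with statistics of the training set only,
# then reuse the same mean and std on the test set.
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # population std (ddof=0)
train_scaled = (toy_train - mean) / std
test_scaled = (toy_test - mean) / std

# One-hot encode the labels: row i has a 1 in column toy_labels[i].
one_hot = np.eye(2)[toy_labels]

print(train_scaled)
print(test_scaled)
print(one_hot)
```

Reusing the training statistics on the test set keeps information about the test distribution out of the model.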

Define the one-layer neural network (just the output layer with softmax function):

$y_{2,} = X_{batch, 30} \cdot W_{30, 2} + b_{2,}$, where, for each example in the batch, $y_{2,}$ is a 1-D tensor with two components (the logits of the negative and the positive class) to which the softmax function will be applied.

In [6]:
X = tf.placeholder(tf.float32, [None, 30]) 
W = tf.Variable(tf.zeros([30, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.matmul(X, W) + b

y_ = tf.placeholder(tf.float32, [None, 2])

In [7]:
X.shape

TensorShape([Dimension(None), Dimension(30)])

In [8]:
y.shape

TensorShape([Dimension(None), Dimension(2)])

In [9]:
y_.shape

TensorShape([Dimension(None), Dimension(2)])
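
As a small illustration of how the two logits relate to the sigmoid formulation, the NumPy sketch below uses made-up logits and two hypothetical helpers, `softmax` and `sigmoid`. It checks that, with two classes, the softmax probability of class 1 equals the sigmoid of the logit difference, and that comparing raw logits gives the same prediction as comparing probabilities.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up logits for three examples, columns = [class 0, class 1].
logits = np.array([[2.0, -1.0], [0.5, 0.7], [-3.0, 4.0]])
probs = softmax(logits)

# Two-class softmax probability of class 1 vs. sigmoid of the logit difference.
p1_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])
print(np.allclose(probs[:, 1], p1_sigmoid))

# The larger logit always corresponds to the larger probability.
print(probs.argmax(axis=1), logits.argmax(axis=1))
```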

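We'll optionally weight $y_{2,}$ so that the majority class is down-weighted and the minority class is up-weighted, because we want to assign a much bigger cost when a positive example is classified as negative. With `ratio = 0.1`, the weights below are 1 for the negative (majority) class and 9 for the positive (minority) class. Note that they multiply the logits before the softmax cross-entropy is computed.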

In [10]:
ratio = 0.1
class_weight = tf.constant(np.array([ratio, 1 - ratio]) * 10, dtype=tf.float32)

y_weighted = tf.multiply(y, class_weight)
y_weighted.shape

TensorShape([Dimension(None), Dimension(2)])
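
The cell above multiplies the logits by the class weights before the softmax. A different, commonly used formulation keeps the logits unchanged and instead multiplies each example's cross-entropy term by the weight of its true class. The NumPy sketch below shows that alternative on made-up numbers; `log_softmax` and `weighted_cross_entropy` are hypothetical helpers, not part of this notebook or of TensorFlow.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def weighted_cross_entropy(logits, one_hot_labels, class_weight):
    """Mean cross-entropy with each example weighted by its true class."""
    per_example = -(one_hot_labels * log_softmax(logits)).sum(axis=1)
    example_weight = (one_hot_labels * class_weight).sum(axis=1)
    return (example_weight * per_example).mean()

logits = np.array([[2.0, 0.0], [1.0, 0.0]])  # made-up model outputs
labels = np.array([[1.0, 0.0], [0.0, 1.0]])  # one negative, one positive example
class_weight = np.array([1.0, 9.0])          # same values as ratio = 0.1 gives

unweighted = weighted_cross_entropy(logits, labels, np.ones(2))
weighted = weighted_cross_entropy(logits, labels, class_weight)
print(unweighted, weighted)
```

With these weights, an error on a positive example contributes nine times as much to the loss as the same error on a negative example.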

We'll run the process with unweighted and weighted loss function:

In [11]:
sess = tf.InteractiveSession()
Epochs = 50
learning_rate = 0.5
splits = 10 # Split training set into `splits` for training in each epoch

for logits, name in zip((y, y_weighted), ("Unweighted", "Weighted")):
    print("Loss function: %s" % name)
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
    # Re-initialize W and b so that both runs start from the same state
    tf.global_variables_initializer().run()
    
    for _ in range(Epochs):
        # Mini-batches: split the training indices into `splits` chunks
        index_list = np.array_split(np.arange(len(X_train)), splits)
        for index in index_list:
            sess.run(train_step, feed_dict={X: X_train[index, :], y_: y_train[index, :]})
            
    # Predict from the unscaled logits `y`: class 1 if its logit is the larger one
    y_pred = sess.run(y, feed_dict={X: X_test})
    y_pred = np.where(y_pred[:, 1] > y_pred[:, 0], 1, 0)
    
    # Rows and columns follow the label order 0, 1, so "T" is class 0 (majority)
    # and "F" is class 1 (minority)
    cm = pd.DataFrame(confusion_matrix(y_test, y_pred))
    cm.index = ["T", "F"]
    cm.index.name = "Target"
    cm.columns = ["T", "F"]
    cm.columns.name = "Predicted"
    
    print("Confusion matrix:")
    print(cm)
    print("Precision: %f" % precision_score(y_test, y_pred))
    print("Recall: % f" % recall_score(y_test, y_pred))
    print("")

Loss function: Unweighted
Confusion matrix:
Predicted      T   F
Target              
T          56855   9
F             44  54
Precision: 0.857143
Recall:  0.551020

Loss function: Weighted
Confusion matrix:
Predicted      T   F
Target              
T          56848  16
F             34  64
Precision: 0.800000
Recall:  0.653061



As we can see, when the loss function is weighted depending on the majority and minority class, recall increased from 0.551 to 0.653 (a relative increase of about 18.5%), while precision only decreased from 0.857 to 0.800 (a relative decrease of about 6.7%).

This suggests that the weighting pushes the algorithm to pay more attention to the minority class.
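
The precision and recall figures, and the relative changes between the two runs, can be reproduced from the two confusion matrices printed above. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes, class 1 is the positive class); `precision_recall` is a hypothetical helper.

```python
import numpy as np

def precision_recall(cm):
    """cm rows = true class, columns = predicted class, class 1 = positive."""
    tp = cm[1, 1]
    fp = cm[0, 1]
    fn = cm[1, 0]
    return tp / (tp + fp), tp / (tp + fn)

# Confusion matrices copied from the output above.
unweighted_cm = np.array([[56855, 9], [44, 54]])
weighted_cm = np.array([[56848, 16], [34, 64]])

p_u, r_u = precision_recall(unweighted_cm)
p_w, r_w = precision_recall(weighted_cm)

# Relative changes with respect to the unweighted run.
recall_gain = (r_w - r_u) / r_u
precision_drop = (p_u - p_w) / p_u
print("recall: %.3f -> %.3f, relative gain %.1f%%" % (r_u, r_w, 100 * recall_gain))
print("precision: %.3f -> %.3f, relative drop %.1f%%" % (p_u, p_w, 100 * precision_drop))
```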

### Multi-Layer Perceptron
Next, we add a hidden layer of 8 units:

$$O_{8,} = \mathrm{sigmoid}(X_{batch, 30} \cdot W_{30, 8} + b_{8,})$$
$$y_{2,} = O_{8,} \cdot W_{8, 2} + b_{2,}$$

In [12]:
X = tf.placeholder(tf.float32, [None, 30]) 
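# Note: with all-zero weights every hidden unit receives the same gradient,
# so the 8 units stay identical during training; the commented-out
# truncated_normal initializers are the usual way to break this symmetry.
#W_0 = tf.Variable(tf.truncated_normal([30, 8], stddev=0.05))
W_0 = tf.Variable(tf.zeros([30, 8]))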
#b_0 = tf.Variable(tf.truncated_normal([8], stddev=0.05))
b_0 = tf.Variable(tf.zeros([8]))
O_0 = tf.nn.sigmoid(tf.matmul(X, W_0) + b_0)

#W_1 = tf.Variable(tf.truncated_normal([8, 2], stddev=0.05))
W_1 = tf.Variable(tf.zeros([8, 2]))
#b_1 = tf.Variable(tf.truncated_normal([2], stddev=0.05))
b_1 = tf.Variable(tf.zeros([2]))
y = tf.matmul(O_0, W_1) + b_1

y_ = tf.placeholder(tf.float32, [None, 2])
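
One property of the all-zero initialization used in this cell is worth illustrating: every hidden unit starts out identical and receives the same gradient, so the hidden units never differentiate. The NumPy sketch below reproduces this on made-up data with hand-written gradients for the same architecture (sigmoid hidden layer, softmax cross-entropy). It is an illustration under these assumptions, not the notebook's TensorFlow graph.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
X = rng.randn(16, 30)                             # made-up inputs
labels = np.eye(2)[np.array([1] * 4 + [0] * 12)]  # made-up imbalanced targets

# Same shapes and zero initialization as in the cell above.
W_0, b_0 = np.zeros((30, 8)), np.zeros(8)
W_1, b_1 = np.zeros((8, 2)), np.zeros(2)
learning_rate = 0.5

for _ in range(20):
    # Forward pass.
    O_0 = sigmoid(X @ W_0 + b_0)
    logits = O_0 @ W_1 + b_1
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Backward pass for the mean softmax cross-entropy.
    d_logits = (probs - labels) / len(X)
    d_W_1 = O_0.T @ d_logits
    d_b_1 = d_logits.sum(axis=0)
    d_hidden = (d_logits @ W_1.T) * O_0 * (1.0 - O_0)
    d_W_0 = X.T @ d_hidden
    d_b_0 = d_hidden.sum(axis=0)

    # Plain gradient descent step.
    W_0 -= learning_rate * d_W_0
    b_0 -= learning_rate * d_b_0
    W_1 -= learning_rate * d_W_1
    b_1 -= learning_rate * d_b_1

# Compare every column of W_0 (one per hidden unit) with the first column.
print(np.allclose(W_0, W_0[:, :1]))
```

In this sketch the weights do move away from zero, but all 8 columns of `W_0` remain equal, so the hidden layer behaves like a single unit. Starting from small random values, as in the commented-out `truncated_normal` lines, avoids this.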

In [13]:
ratio = 0.1
class_weight = tf.constant(np.array([ratio, 1 - ratio]) * 10, dtype=tf.float32)

y_weighted = tf.multiply(y, class_weight)
y_weighted.shape

TensorShape([Dimension(None), Dimension(2)])

In [14]:
sess = tf.InteractiveSession()
Epochs = 50
learning_rate = 0.5
splits = 10 # Split training set into `splits` for training in each epoch

In [15]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_weighted))
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
tf.global_variables_initializer().run()
    
for _ in range(Epochs):
    index_list = np.array_split(range(len(X_train)), splits)
    for index in index_list:
        sess.run(train_step, feed_dict={X: X_train[index, :], y_: y_train[index, :]})
            
y_pred = sess.run(y, feed_dict={X: X_test})
y_pred = np.where(y_pred[:, 1] > y_pred[:, 0], 1, 0)            
    
cm = pd.DataFrame(confusion_matrix(y_test, y_pred))
cm.index = ["T", "F"]
cm.index.name = "Target"
cm.columns = ["T", "F"]
cm.columns.name = "Predicted"

In [16]:
cm

Predicted      T   F
Target              
T          56837  27
F             26  72


In [17]:
precision_score(y_test, y_pred)

0.72727272727272729

In [18]:
recall_score(y_test, y_pred)

0.73469387755102045

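The new two-layer neural network achieved an arguably better trade-off between precision and recall: recall rose to 0.735 (from 0.653 for the weighted single-layer model), while precision fell to 0.727 (from 0.800). Ideally, we want the recall to be in the range of 0.95 to 1.0 while maintaining precision at an acceptable level.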