## Building Neural Networks with Gluon Library

Using the Stanford Open Policing data, we predict the race of the subject of each stop.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
#Importing data
openpolicing_path="C:/Users/SwetaMankala/Desktop/Assignments/EAI6000/ma_statewide_2020.csv"

data_read=pd.read_csv(openpolicing_path,low_memory=False)
print('The shape of the dataset is:', data_read.shape)

The shape of the dataset is: (3416238, 24)


In [3]:
data_read.head(10)

Unnamed: 0,raw_row_number,date,location,county_name,subject_age,subject_race,subject_sex,type,arrest_made,citation_issued,...,contraband_weapons,contraband_alcohol,contraband_other,frisk_performed,search_conducted,search_basis,reason_for_stop,vehicle_type,vehicle_registration_state,raw_Race
0,1,1.181088e+18,MIDDLEBOROUGH,Plymouth County,33.0,white,male,vehicular,False,True,...,,False,,,False,,Speed,Passenger,MA,White
1,2,1.181174e+18,SEEKONK,Bristol County,36.0,white,male,vehicular,False,False,...,False,False,False,False,True,other,,Commercial,MA,White
2,3,1.181174e+18,MEDFORD,Middlesex County,56.0,white,female,vehicular,False,False,...,,False,,,False,,,Passenger,MA,White
3,4,1.181174e+18,MEDFORD,Middlesex County,37.0,white,male,vehicular,False,False,...,,False,,,False,,,Commercial,MA,White
4,5,1.181174e+18,EVERETT,Middlesex County,22.0,hispanic,female,vehicular,False,True,...,,False,,,False,,,Commercial,MA,Hispanic
5,6,1.181174e+18,MEDFORD,Middlesex County,34.0,white,male,vehicular,False,True,...,,False,,,False,,Speed,Commercial,MA,White
6,7,1.181174e+18,SOMERVILLE,Middlesex County,54.0,hispanic,male,vehicular,False,True,...,,False,,,False,,,Commercial,MA,Hispanic
7,8,1.181174e+18,HOPKINTON,Middlesex County,31.0,hispanic,female,vehicular,False,True,...,,False,,,False,,,Passenger,MA,Hispanic
8,9,1.181174e+18,SOMERVILLE,Middlesex County,21.0,white,male,vehicular,False,True,...,,False,,,False,,,Passenger,MA,White
9,10,1.181088e+18,BARNSTABLE,Barnstable County,56.0,white,male,vehicular,False,True,...,,False,,,False,,Speed,Passenger,MA,White


We use a neural network as the classifier for this prediction. Apache MXNet, an open-source deep learning library, lets us build a sequential network that stacks layers of logistic-regression-like units, with the aim of producing accurate predictions.

In [4]:
!pip install mxnet



When creating the network, we use the non-linear activation function tanh, with 12 artificial neurons in the hidden layer.

To reduce overfitting, we use dropout: during training, each node's output is dropped with a fixed probability, which keeps the network from relying too heavily on any single node. In the following network, we apply 40% dropout to the single hidden layer.
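
As a rough illustration of what dropout does during training, here is a minimal pure-Python sketch of "inverted" dropout, the variant that rescales the surviving activations by 1/(1-p) so that their expected value is unchanged. The helper `dropout` and the sample activations are hypothetical and not part of MXNet; `nn.Dropout` takes care of this internally.

```python
import random

def dropout(activations, p, rng):
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * keep_scale for a in activations]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
hidden = [0.5, -0.2, 0.9, 0.1]  # made-up outputs of a hidden layer
dropped = dropout(hidden, p=0.4, rng=rng)
print(dropped)
```

Each printed value is either zero or the original activation divided by 0.6. At prediction time dropout is switched off and the activations pass through unchanged.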

In [5]:
from mxnet.gluon import nn

net = nn.Sequential()

net.add(nn.Dense(12,                    # Dense layer-1 with 12 units
                 in_units=3,            # Input size of 3 is expected
                 activation='tanh'),    # Tanh activation is applied
        nn.Dropout(.4),                 # Apply random 40% drop-out to layer_1
        
        nn.Dense(1))                    # Output layer with single unit

print(net)

Sequential(
  (0): Dense(3 -> 12, Activation(tanh))
  (1): Dropout(p = 0.4, axes=())
  (2): Dense(None -> 1, linear)
)


In [10]:
import time
import mxnet as mx
from mxnet import gluon, autograd
import mxnet.ndarray as nd
from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss

In [11]:
from mxnet import init
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(4, in_units=2, activation='relu'),
        nn.Dense(1, activation='sigmoid'))
net.initialize(init=init.Xavier())

The network defined above has only one hidden layer. Since the data seems easily separable, a small network with 4 units in the hidden layer should be enough.
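
To make the shape of this small network concrete, the following sketch runs one forward pass through a 2-input, 4-hidden-unit, 1-output network in plain Python, using ReLU in the hidden layer and a sigmoid on the output. All weights, biases, and the sample input are made up for illustration; Gluon initializes and learns the real values.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]

def sigmoid(value):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))

# Made-up parameters: 4 hidden units with 2 inputs each, then 1 output unit.
hidden_w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]
hidden_b = [0.0, 0.1, -0.1, 0.0]
output_w = [[0.2, -0.3, 0.4, 0.1]]
output_b = [0.05]

x = [0.5, 1.0]                                   # one sample, two features
hidden = relu(dense(x, hidden_w, hidden_b))      # 4 hidden activations
probability = sigmoid(dense(hidden, output_w, output_b)[0])
print(hidden, probability)
```

The single output is a value between 0 and 1 that can be read as the predicted probability of the positive class.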

In [12]:
batch_size = 4           # How many samples to use for each weight update 
epochs = 50              # Number of full passes over the training data
learning_rate = 0.01     # Learning rate
context = mx.cpu()       # Using CPU resource

# Define the loss. As we used sigmoid in the last layer, use from_sigmoid=True
binary_cross_loss = SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)

# Define the trainer, SGD with learning rate
trainer = gluon.Trainer(net.collect_params(),
                        'sgd',
                        {'learning_rate': learning_rate}
                       )

__Binary Cross-entropy Loss:__ A common loss function for binary classification. It is given by: 
$$
\mathrm{BinaryCrossEntropyLoss} = -\sum_{examples}{(y\log(p) + (1 - y)\log(1 - p))}
$$
where p is the prediction (between 0 and 1, e.g. 0.831) and y is the true class (either 1 or 0).
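
As a small worked example of this formula, the sketch below evaluates the loss for three examples in plain Python. The function name `binary_cross_entropy` and the sample labels and predictions are illustrative only and are not taken from the dataset.

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Sum of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred))

y_true = [1, 0, 1]             # true classes
p_pred = [0.831, 0.10, 0.60]   # illustrative predicted probabilities
loss = binary_cross_entropy(y_true, p_pred)
print(round(loss, 4))
```

Confident correct predictions (p close to y) contribute little to the sum, while confident wrong predictions are penalized heavily.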

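In Gluon, binary cross-entropy is available as `SigmoidBinaryCrossEntropyLoss`. By default it applies the sigmoid function to the predictions itself, so p is always between 0 and 1. When the network already ends in a sigmoid layer, pass `from_sigmoid=True` so that the sigmoid is not applied a second time.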


```python
from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss
loss = SigmoidBinaryCrossEntropyLoss()
```

The `mxnet.gluon.Trainer` class provides the optimization algorithms needed to train a neural network. The following sets up training with stochastic gradient descent and a learning rate of 0.001.

```python
from mxnet import gluon

trainer = gluon.Trainer(net.collect_params(),
                        'sgd', 
                        {'learning_rate': 0.001}
                       )
```
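
For intuition about what a single stochastic gradient descent update does, here is a minimal sketch in plain Python. It assumes the gradients have been summed over a mini-batch and are averaged by dividing by the batch size before the update, which is how `trainer.step(batch_size)` is commonly described. The function `sgd_step` and all numbers are hypothetical.

```python
def sgd_step(weights, summed_grads, learning_rate, batch_size):
    """Return the weights after one SGD update with the batch-averaged gradient."""
    return [w - learning_rate * g / batch_size
            for w, g in zip(weights, summed_grads)]

weights = [0.5, -1.0]
summed_grads = [4.0, -2.0]   # made-up gradients summed over a batch of 4
new_weights = sgd_step(weights, summed_grads, learning_rate=0.001, batch_size=4)
print(new_weights)
```

Each weight moves a small step against its gradient; a smaller learning rate makes the steps smaller and the training slower but more stable.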

In [13]:
data = pd.DataFrame(data_read)

In [14]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
data['subject_race'] = le.fit_transform(data['subject_race'])
data['outcome'] = le.fit_transform(data['outcome'])

We use a label encoder to turn the categorical values into integers. As an experiment, we chose the variables `outcome` and `subject_age` to predict the subject's race, which lets us tune the neural network while tracking the loss function.
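
The idea behind label encoding can be sketched without scikit-learn: collect the distinct labels, sort them, and assign each one its position as an integer code. The helper `fit_label_mapping` and the sample labels are illustrative assumptions, not the library's implementation.

```python
def fit_label_mapping(values):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}

races = ["white", "hispanic", "white", "black", "hispanic"]  # sample labels
mapping = fit_label_mapping(races)
encoded = [mapping[r] for r in races]
print(mapping)
print(encoded)
```

Note that the resulting integers are arbitrary codes: the ordering carries no meaning, and a column with more than two distinct labels produces more than two codes.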

In [16]:
data.shape

(3416238, 24)

In [15]:
#numerical features
numerical_features = ['subject_age']

#categorical features
categorical_features = ['outcome']

model_features = numerical_features + categorical_features
model_target = 'subject_race'

print('Model Features:', model_features)
print('Model Target:', model_target)

Model Features: ['subject_age', 'outcome']
Model Target: subject_race


We hold out 10% of the dataset as test data and split the remaining 90% into 70% training and 30% validation data, so validation makes up 27% and training 63% of the full dataset. The validation data is not used to update the weights; it lets us check after each epoch how well the loss carries over to unseen data. Training uses stochastic gradient descent, which adjusts the weights of every node in the network to minimize the loss function. We use a learning rate of 0.01, with the goal of moving toward a minimum of the loss.
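
The arithmetic behind these proportions can be checked in a few lines. The variable names are illustrative; the two fractions are the `test_size` values passed to the two `train_test_split` calls.

```python
test_fraction = 0.1          # first split: share held out for testing
val_fraction_of_rest = 0.3   # second split: share of the remaining 90%

val_fraction = (1 - test_fraction) * val_fraction_of_rest
train_fraction = (1 - test_fraction) * (1 - val_fraction_of_rest)

print(f"test:  {test_fraction:.2f}")
print(f"val:   {val_fraction:.2f}")
print(f"train: {train_fraction:.2f}")
```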

In [17]:
from sklearn.model_selection import train_test_split

dataset, test_data = train_test_split(data, test_size=0.1, shuffle=True, random_state=23)

In [18]:
train_data, val_data = train_test_split(dataset, test_size=0.3, shuffle=True, random_state=23)

In [19]:
X_train = train_data[model_features]
y_train = train_data[model_target]

X_val = val_data[model_features]
y_val = val_data[model_target]

X_test = test_data[model_features]
y_test = test_data[model_target]

A DataLoader creates mini-batches of samples from a Dataset and provides a convenient iterator interface for looping over these batches. It is typically much more efficient to pass a mini-batch of data through a neural network than a single sample at a time, because the computation can be performed in parallel. A required parameter of DataLoader is the size of the mini-batches you want to create, called batch_size.

Another benefit of using DataLoader is the ability to easily load data in parallel using multiprocessing. Set the num_workers parameter to the number of CPUs available on your machine for maximum performance.
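
Conceptually, mini-batching is just slicing the data into consecutive chunks. The following plain-Python sketch shows the idea with a hypothetical generator `iterate_minibatches` and made-up rows; it keeps the final, smaller batch rather than discarding it.

```python
def iterate_minibatches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size rows."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], labels[start:stop]

# Ten made-up rows with two features each, and one label per row.
features = [[float(age), age % 2] for age in range(20, 30)]
labels = [age % 2 for age in range(20, 30)]

batches = list(iterate_minibatches(features, labels, batch_size=4))
for batch_features, batch_labels in batches:
    print(len(batch_features), batch_labels)
```

With 10 rows and a batch size of 4, the loop produces two full batches and one final batch of 2 rows.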

In [20]:
# Convert to ND arrays for gluon
X_train = nd.array(X_train)
X_val = nd.array(X_val)
y_train = nd.array(y_train)
y_val = nd.array(y_val)

# Using Gluon Data loaders to load the data in batches
train_dataset = gluon.data.ArrayDataset(X_train, y_train)
train_loader = gluon.data.DataLoader(train_dataset, batch_size=batch_size)

Let's start the training process. We use the training and validation sets and print both losses after each epoch.

In [None]:
import time

train_losses = []
val_losses = []
for epoch in range(epochs):
    start = time.time()
    training_loss = 0
    # Training loop, train the network
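    # Loop variables are named batch_* so they do not shadow the `data` DataFrame
    for idx, (batch_data, batch_target) in enumerate(train_loader):

        batch_data = batch_data.as_in_context(context)
        batch_target = batch_target.as_in_context(context)
        
        with autograd.record():
            output = net(batch_data)
            L = binary_cross_loss(output, batch_target)
        # Accumulate the summed batch loss outside the recorded scope
        training_loss += nd.sum(L).asscalar()
        L.backward()
        trainer.step(batch_data.shape[0])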
    
    # Get validation predictions
    val_predictions = net(X_val.as_in_context(context))
    # Calculate validation loss
    val_loss = nd.sum(binary_cross_loss(val_predictions, y_val)).asscalar()
    
    # Let's take the average losses
    training_loss = training_loss / len(y_train)
    val_loss = val_loss / len(y_val)
    
    train_losses.append(training_loss)
    val_losses.append(val_loss)
    
    end = time.time()
    print("Epoch %s. Train_loss %f Validation_loss %f Seconds %f" % \
          (epoch, training_loss, val_loss, end-start))

With 50 epochs of training as an experiment, the network's loss fell from 71% to 0.7% on the training set and from 70% to 0.7% on the validation set.