# Notebook 13: Using Deep Learning to Study SUSY with PyTorch

## Learning Goals
The goal of this notebook is to introduce the powerful PyTorch framework for building neural networks and use it to analyze the SUSY dataset. After this notebook, the reader should understand the mechanics of PyTorch and how to construct DNNs using this package. In addition, the reader is encouraged to explore the GPU backend available in PyTorch on this dataset.

## Overview
In this notebook, we use deep neural networks (DNNs) to classify the supersymmetry (SUSY) dataset, first introduced by Baldi et al. in [Nature Communications (2014)](https://www.nature.com/articles/ncomms5308). The SUSY dataset consists of 5,000,000 Monte Carlo samples of supersymmetric and non-supersymmetric collisions with $18$ features. The signal process is the production of electrically charged supersymmetric particles, which decay to $W$ bosons and an electrically neutral supersymmetric particle that is invisible to the detector.

The first $8$ features are "raw" kinematic features that can be directly measured from collisions. The final $10$ features are "hand constructed" features that have been chosen using physical knowledge and are known to be important in distinguishing supersymmetric and non-supersymmetric collision events. More specifically, they are given by the column names below.

In this notebook, we study this dataset using PyTorch.

In [1]:
from __future__ import print_function, division
import os,sys
import numpy as np
import torch # pytorch package, allows using GPUs
# fix seed
seed=17
np.random.seed(seed)
torch.manual_seed(seed)

<torch._C.Generator at 0x109991770>

## Structure of the Procedure

Constructing a Deep Neural Network to solve ML problems is a multiple-stage process. Quite generally, one can identify the key steps as follows:

* ***step 1:*** Load and process the data
* ***step 2:*** Define the model and its architecture
* ***step 3:*** Choose the optimizer and the cost function
* ***step 4:*** Train the model 
* ***step 5:*** Evaluate the model performance on the *unseen* test data
* ***step 6:*** Modify the hyperparameters to optimize performance for the specific data set

Below, we sometimes combine some of these steps together for convenience.

Notice that we take a rather different approach compared to the simpler MNIST Keras notebook. We first define a set of classes and functions, and run the actual computation only at the very end.

### Step 1: Load and Process the SUSY Dataset

The supersymmetry dataset can be downloaded from the UCI Machine Learning Repository at [https://archive.ics.uci.edu/ml/machine-learning-databases/00279/SUSY.csv.gz](https://archive.ics.uci.edu/ml/machine-learning-databases/00279/SUSY.csv.gz). The dataset is quite large. Download the dataset and unzip it into a directory (the `load_data` function below expects the file at `./datasets/SUSY.csv`).

Loading data in PyTorch is done by creating a user-defined class, which we name `SUSY_Dataset`, and which is a child of the `torch.utils.data.Dataset` class. This ensures that all attributes required for processing the data during the training and test stages are inherited. The `__init__` method of our custom data class should contain the usual code for loading the data, which is problem-specific, and has been discussed for the SUSY data set in Notebook 5. More importantly, the user-defined data class must override the `__len__` and `__getitem__` methods of the parent `Dataset` class. The former returns the size of the data set, while the latter allows the user to access a particular data point from the set by specifying its index.

In [2]:
from torchvision import datasets # load data

class SUSY_Dataset(torch.utils.data.Dataset):
    """SUSY pytorch dataset."""

    def __init__(self, data_file, root_dir, dataset_size, train=True, transform=None, high_level_feats=None):
        """
        Args:
            data_file (string): Name of the csv file with the data, appended to `root_dir`.
            root_dir (string): Directory containing the csv file.
            dataset_size (int): Number of rows to read from the csv file.
            train (bool, optional): If set to `True`, load the training data (first 80% of the rows);
                                        otherwise load the test data (remaining 20%).
            transform (callable, optional): Optional transform to be applied on a sample.
            high_level_feats (bool, optional): If set to `True`, working with high-level features only. 
                                        If set to `False`, working with low-level features only.
                                        Default is `None`: working with all features
        """

        import pandas as pd

        features=['SUSY','lepton 1 pT', 'lepton 1 eta', 'lepton 1 phi', 'lepton 2 pT', 'lepton 2 eta', 'lepton 2 phi', 
                'missing energy magnitude', 'missing energy phi', 'MET_rel', 'axial MET', 'M_R', 'M_TR_2', 'R', 'MT2', 
                'S_R', 'M_Delta_R', 'dPhi_r_b', 'cos(theta_r1)']

        low_features=['lepton 1 pT', 'lepton 1 eta', 'lepton 1 phi', 'lepton 2 pT', 'lepton 2 eta', 'lepton 2 phi', 
                'missing energy magnitude', 'missing energy phi']

        high_features=['MET_rel', 'axial MET', 'M_R', 'M_TR_2', 'R', 'MT2','S_R', 'M_Delta_R', 'dPhi_r_b', 'cos(theta_r1)']


        #Number of datapoints to work with
        df = pd.read_csv(root_dir+data_file, header=None,nrows=dataset_size,engine='python')
        df.columns=features
        Y = df['SUSY']
        X = df[[col for col in df.columns if col!="SUSY"]]

        # set training and test data size
        train_size=int(0.8*dataset_size)
        self.train=train

        if self.train:
            X=X[:train_size]
            Y=Y[:train_size]
            print("Training on {} examples".format(train_size))
        else:
            X=X[train_size:]
            Y=Y[train_size:]
            print("Testing on {} examples".format(dataset_size-train_size))


        self.root_dir = root_dir
        self.transform = transform

        # make datasets using all features, only the 10 high-level features, or only the 8 low-level features
        # (labels are cast to 64-bit integers, the type expected by the nll loss)
        if high_level_feats is None:
            self.data=(X.values.astype(np.float32),Y.values.astype(np.int64))
            print("Using both high and low level features")
        elif high_level_feats is True:
            self.data=(X[high_features].values.astype(np.float32),Y.values.astype(np.int64))
            print("Using high-level features only.")
        elif high_level_feats is False:
            self.data=(X[low_features].values.astype(np.float32),Y.values.astype(np.int64))
            print("Using low-level features only.")


    # override __len__ and __getitem__ of the Dataset() class

    def __len__(self):
        return len(self.data[1])

    def __getitem__(self, idx):

        sample=(self.data[0][idx,...],self.data[1][idx])

        if self.transform:
            sample=self.transform(sample)

        return sample
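
To see the two overridden methods in isolation, here is a minimal sketch that does not need the SUSY file. The class `ToyDataset` and its random data are hypothetical stand-ins introduced only for illustration; they mimic the `(features, labels)` tuple stored in `self.data` above.

```python
import numpy as np
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for SUSY_Dataset, filled with random numbers."""

    def __init__(self, n_samples=10, n_features=18, seed=0):
        rng = np.random.RandomState(seed)
        # float32 features and integer class labels, stored as a tuple
        self.data = (rng.randn(n_samples, n_features).astype(np.float32),
                     rng.randint(0, 2, size=n_samples))

    def __len__(self):
        # size of the data set
        return len(self.data[1])

    def __getitem__(self, idx):
        # a single (features, label) pair
        return self.data[0][idx], self.data[1][idx]


toy = ToyDataset()
x0, y0 = toy[0]            # indexing calls __getitem__(0)
print(len(toy), x0.shape)  # len() calls __len__
```

Anything that implements these two methods can be handed to a `DataLoader`, which is why the loading logic can stay problem-specific.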

Last, we define a helper function `load_data()` that accepts as a required argument the set of parameters `args`, and returns two data loaders, `train_loader` and `test_loader`, which yield mini-batches when iterated over.

In [3]:
def load_data(args):

    data_file='/SUSY.csv'
    root_dir='./datasets'

    kwargs = {} # CUDA arguments, if enabled
    # load train and test data (no normalisation is applied here)
    train_loader = torch.utils.data.DataLoader(
        SUSY_Dataset(data_file,root_dir,args.dataset_size,train=True,high_level_feats=args.high_level_feats),
        batch_size=args.batch_size, shuffle=True, **kwargs)

    test_loader = torch.utils.data.DataLoader(
        SUSY_Dataset(data_file,root_dir,args.dataset_size,train=False,high_level_feats=args.high_level_feats),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    return train_loader, test_loader
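
As an illustration of what such a loader yields, the sketch below wraps synthetic tensors (an assumption made here so that it runs without the SUSY file) in a `TensorDataset` and iterates over the mini-batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# synthetic stand-in data: 250 samples with 18 features and binary labels
X = torch.randn(250, 18)
Y = torch.randint(0, 2, (250,))

loader = DataLoader(TensorDataset(X, Y), batch_size=100, shuffle=True)

# each iteration returns one (data, label) mini-batch
batch_shapes = [tuple(data.shape) for data, label in loader]
print(len(loader), batch_shapes)
```

Note that the last mini-batch is smaller whenever the data set size is not a multiple of `batch_size`.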

### Step 2: Define the Neural Net and its Architecture

To construct neural networks with PyTorch, we make another class called `model` as a child of PyTorch's `nn.Module` class. The `model` class initializes the types of layers needed for the deep neural net in its `__init__` method, while the DNN is assembled in a method called `forward`, which accepts an input tensor and returns the output layer. (In older PyTorch versions the input had to be wrapped in an `autograd.Variable`; since version 0.4 tensors and Variables are merged.) Following this convention, PyTorch automatically records the structure of the DNN, and the `autograd` module computes the gradients using backpropagation.

Our code below is constructed in such a way that one can choose whether to use the high-level and low-level features separately or together. This choice determines the size of the fully-connected input layer `fc1`. Therefore the `__init__` method accepts the optional argument `high_level_feats`.

In [4]:
import torch.nn as nn # construct NN
import torch.nn.functional as F # activation functions, dropout and log-softmax used in forward

class model(nn.Module):
    def __init__(self,high_level_feats=None):
        # inherit attributes and methods of nn.Module
        super(model, self).__init__()

        # an affine operation: y = Wx + b
        if high_level_feats is None:
            self.fc1 = nn.Linear(18, 200) # all features
        elif high_level_feats:
            self.fc1 = nn.Linear(10, 200) # high-level only
        else:
            self.fc1 = nn.Linear(8, 200) # low-level only


        self.batchnorm1=nn.BatchNorm1d(200, eps=1e-05, momentum=0.1)
        self.batchnorm2=nn.BatchNorm1d(100, eps=1e-05, momentum=0.1)

        self.fc2 = nn.Linear(200, 100) # see forward function for dimensions
        self.fc3 = nn.Linear(100, 2)

    def forward(self, x):
        '''Defines the feed-forward function for the NN.

        A backward function is automatically defined using `torch.autograd`

        Parameters
        ----------
        x : torch.Tensor
            input data

        Returns
        -------
        torch.Tensor
            output layer of NN

        '''

        # apply rectified linear unit
        x = F.relu(self.fc1(x))
        # apply dropout
        #x=self.batchnorm1(x)
        x = F.dropout(x, training=self.training)


        # apply rectified linear unit
        x = F.relu(self.fc2(x))
        # apply dropout
        #x=self.batchnorm2(x)
        x = F.dropout(x, training=self.training)


        # apply affine operation fc3
        x = self.fc3(x)
        # log-softmax layer: returns log-probabilities, as expected by the nll loss
        x = F.log_softmax(x,dim=1)

        return x
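
A quick way to check a network of this kind is to pass a synthetic mini-batch through it and inspect the output. The sketch below uses a deliberately smaller, hypothetical class `TinyNet` (not the `model` class above) and random inputs, so the numbers carry no physical meaning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical two-layer network used only for this check."""

    def __init__(self, n_in=18, n_hidden=20):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)  # active only in training mode
        return F.log_softmax(self.fc2(x), dim=1)


net = TinyNet()
net.eval()                  # evaluation mode switches dropout off
x = torch.randn(5, 18)      # mini-batch of 5 synthetic events
log_probs = net(x)
probs = log_probs.exp()     # log-probabilities -> probabilities
print(log_probs.shape, probs.sum(dim=1))
```

The output has one row per event and one column per class, and exponentiating the log-probabilities gives rows that sum to one.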

### Steps 3+4+5: Choose the Optimizer and the Cost Function. Train and Evaluate the Model

Next, we define the function `evaluate_model`. The first argument, `args`, contains all hyperparameters needed for the DNN (see below). The second and third arguments are the `train_loader` and the `test_loader` objects, returned by the function `load_data()` we defined in Step 1 above. The `evaluate_model` function returns the final `test_loss` and `test_accuracy` of the model.

First, we initialize a `model` and call the object `DNN`. In order to define the loss function and the optimizer, we use the modules `torch.nn.functional` (imported here as `F`) and `torch.optim`. As a loss function we choose the negative log-likelihood and store it under the variable `criterion`. As usual, we can choose from a variety of SGD-based optimizers, but we focus on traditional SGD with momentum.

Next, we define two functions: `train()` and `test()`. They are called at the end of `evaluate_model` where we loop over the training epochs to train and test our model. 

The `train` function accepts an integer called `epoch`, which is only used when printing the training progress. We first set the `DNN` to training mode using the `train()` method inherited from `nn.Module`. Then we loop over the mini-batches in `train_loader`. For each mini-batch we reset the gradients held by the `optimizer`, perform the forward step by calling the `DNN` model on the `data`, and compute the `loss` with `criterion`. Backpropagation is then done by calling the `backward()` method of the `loss` tensor. We use `optimizer.step()` to update the weights of the `DNN`. Last, we print the loss every `args.log_interval` mini-batches. `train` returns the loss on the last mini-batch of the epoch.

The `test` function is similar to `train` but its purpose is to test the performance of a trained model. Once we set the `DNN` model in `eval()` mode, the following steps are similar to those in `train`. We then compute the `test_loss` and the number of `correct` predictions, print the results and return them.  

In [5]:
import torch.nn.functional as F # implements forward and backward definitions of an autograd operation
import torch.optim as optim # different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc

def evaluate_model(args,train_loader,test_loader):

    # create model
    DNN = model(high_level_feats=args.high_level_feats)
    # negative log-likelihood (nll) loss for training: takes class labels NOT one-hot vectors!
    criterion = F.nll_loss
    # define SGD optimizer
    optimizer = optim.SGD(DNN.parameters(), lr=args.lr, momentum=args.momentum)
    #optimizer = optim.Adam(DNN.parameters(), lr=0.001, betas=(0.9, 0.999))


    ################################################

    def train(epoch):
        '''Trains a NN using minibatches.

        Parameters
        ----------
        epoch : int
            Training epoch number.

        '''

        # set model to training mode (affects Dropout and BatchNorm)
        DNN.train()
        # loop over training data
        for batch_idx, (data, label) in enumerate(train_loader):
            # zero gradient buffers
            optimizer.zero_grad()
            # compute output of final layer: forward step
            output = DNN(data)
            # compute loss
            loss = criterion(output, label)
            # run backprop: backward step
            loss.backward()
            # update weights of NN
            optimizer.step()
            
            # print loss at current epoch
            if batch_idx % args.log_interval == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item() ))
            

        return loss.item()

    ################################################

    def test():
        '''Tests NN performance.

        '''

        # evaluate model
        DNN.eval()

        test_loss = 0 # loss function on test data
        correct = 0 # number of correct predictions
        # loop over test data; no gradients are needed for evaluation
        with torch.no_grad():
            for data, label in test_loader:
                # compute model prediction: log-softmax probabilities
                output = DNN(data)
                # compute test loss, summed over the batch
                # (`reduction='sum'` replaces the deprecated `size_average=False`)
                test_loss += criterion(output, label, reduction='sum').item()
                # find most likely prediction: index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                # update number of correct predictions
                correct += pred.eq(label.view_as(pred)).sum().item()

        # print test loss
        test_loss /= len(test_loader.dataset)
        
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.3f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))
        

        return test_loss, correct / len(test_loader.dataset)


    ################################################


    train_loss=np.zeros((args.epochs,))
    test_loss=np.zeros_like(train_loss)
    test_accuracy=np.zeros_like(train_loss)

    epochs=range(1, args.epochs + 1)
    for epoch in epochs:

        train_loss[epoch-1] = train(epoch)
        test_loss[epoch-1], test_accuracy[epoch-1] = test()



    return test_loss[-1], test_accuracy[-1]
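
The core of the training loop is a single optimization step. The following sketch isolates that step on a synthetic mini-batch; the single `nn.Linear` layer standing in for the DNN and the random data are assumptions made only to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

net = nn.Linear(18, 2)                    # stand-in for the DNN
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.8)

data = torch.randn(32, 18)                # synthetic mini-batch
label = torch.randint(0, 2, (32,))        # integer class labels, not one-hot

weights_before = net.weight.detach().clone()

optimizer.zero_grad()                     # clear gradients from the previous step
logits = net(data)                        # forward step
output = F.log_softmax(logits, dim=1)
loss = F.nll_loss(output, label)          # mean loss over the mini-batch
loss.backward()                           # backprop: fills .grad of every parameter
optimizer.step()                          # update the weights

# nll_loss on log-softmax outputs equals cross_entropy on the raw logits
same_loss = torch.allclose(loss, F.cross_entropy(logits, label))
weights_changed = not torch.equal(weights_before, net.weight.detach())
print(loss.item(), same_loss, weights_changed)
```

The comparison with `F.cross_entropy` shows why the network ends in `log_softmax` when paired with `F.nll_loss`: together they form the usual cross-entropy loss.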

### Step 6: Modify the Hyperparameters to Optimize Performance of the Model

To study the performance of the model for a variety of different `dataset_sizes` and `learning_rates`, we do a grid search.

Let us define a function `grid_search`, which accepts the `args` variable containing all hyperparameters needed for the problem. After choosing the `dataset_sizes` and logarithmically spaced `learning_rates`, we first loop over all `dataset_sizes`, update the `args` variable, and call the `load_data` function. We then loop once again over all `learning_rates`, update `args` and call `evaluate_model`.

In [6]:
def grid_search(args):


    # perform grid search over learning rate and data set size
    dataset_sizes=[1000, 10000, 100000, 200000] #np.logspace(2,5,4).astype('int')
    learning_rates=np.logspace(-5,-1,5)

    # pre-allocate data
    test_loss=np.zeros((len(dataset_sizes),len(learning_rates)),dtype=np.float64)
    test_accuracy=np.zeros_like(test_loss)

    # do grid search
    for i, dataset_size in enumerate(dataset_sizes):
        # update data set size parameters
        args.dataset_size=dataset_size
        args.batch_size=int(0.01*dataset_size)

        # load data
        train_loader, test_loader = load_data(args)

        for j, lr in enumerate(learning_rates):
            # update learning rate
            args.lr=lr

            print("\n training DNN with %5d data points and SGD lr=%0.6f. \n" %(dataset_size,lr) )

            test_loss[i,j],test_accuracy[i,j] = evaluate_model(args,train_loader,test_loader)


    plot_data(learning_rates,dataset_sizes,test_accuracy)
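
To make the size of this grid concrete, the short sketch below reproduces only the bookkeeping of `grid_search` (no training is run): the learning rates, the mini-batch size chosen for each data set size, and the shape of the result table.

```python
import numpy as np

dataset_sizes = [1000, 10000, 100000, 200000]
learning_rates = np.logspace(-5, -1, 5)   # 5 values from 1e-5 to 1e-1

# mini-batch size used for each data set size (1% of the data set)
batch_sizes = [int(0.01 * n) for n in dataset_sizes]

# result table: rows index data set sizes, columns index learning rates
test_accuracy = np.zeros((len(dataset_sizes), len(learning_rates)))
n_runs = test_accuracy.size               # number of separate trainings

print(learning_rates)
print(batch_sizes, test_accuracy.shape, n_runs)
```

Each of these runs trains for `args.epochs` epochs, so the full grid search can take a while on the larger data set sizes.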

Last, we use the function `plot_data`, defined below, to plot the results. 

In [7]:
import matplotlib.pyplot as plt

def plot_data(x,y,data):

    # plot results
    fontsize=16


    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)
    
    cbar=fig.colorbar(cax)
    cbar.ax.set_ylabel('accuracy (%)',rotation=90,fontsize=fontsize)
    cbar.set_ticks([0,.2,.4,0.6,0.8,1.0])
    cbar.set_ticklabels(['0%','20%','40%','60%','80%','100%'])

    # put text on matrix elements
    for i, x_val in enumerate(np.arange(len(x))):
        for j, y_val in enumerate(np.arange(len(y))):
            c = "${0:.1f}\\%$".format( 100*data[j,i])  
            ax.text(x_val, y_val, c, va='center', ha='center')

    # convert axis values to string labels
    x=[str(i) for i in x]
    y=[str(i) for i in y]

    # place one tick per matrix column/row and label it
    ax.set_xticks(np.arange(len(x)))
    ax.set_yticks(np.arange(len(y)))
    ax.set_xticklabels(x)
    ax.set_yticklabels(y)

    ax.set_xlabel('$\\mathrm{learning\\ rate}$',fontsize=fontsize)
    ax.set_ylabel('$\\mathrm{data\\ set\\ size}$',fontsize=fontsize)

    plt.tight_layout()

    plt.show()

## Run Code

As we mentioned at the beginning of the notebook, all functions and classes discussed above only specify the procedure but do not actually perform any computations. This allows us to re-use them for different problems.

Actually running the training and testing for every point in the grid search is done below. The `argparse` module allows us to conveniently keep track of all hyperparameters, stored in the variable `args`, which enters most of the functions we defined above.

To run the simulation, we call the function `grid_search`. 
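
As a minimal sketch of how `argparse` stores hyperparameters, the example below defines two options and parses an explicit argument list. Passing a list to `parse_args` is an alternative way (assumed here, not the one used in this notebook) to keep the parser from reading the notebook's own command line.

```python
import argparse

parser = argparse.ArgumentParser(description='argparse sketch')
parser.add_argument('--batch-size', type=int, default=100)
parser.add_argument('--lr', type=float, default=0.05)

# an explicit list is parsed instead of sys.argv
args = parser.parse_args([])                      # all defaults
overridden = parser.parse_args(['--lr', '0.001'])  # override one value

# dashes in option names become underscores in attribute names
print(args.batch_size, args.lr, overridden.lr)

# attributes can be reassigned later, as a grid search over args would do
args.lr = 0.01
```

This is why an option declared as `--batch-size` is read in the code as `args.batch_size`.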

## Exercises

* One of the advantages of PyTorch is that it can use the CUDA library for fast performance on GPUs. For the sake of clarity, we have omitted this in the above notebook. Go online to check how to put the CUDA commands back into the code above. _Hint:_ study the [PyTorch MNIST tutorial](https://github.com/pytorch/examples/blob/master/mnist/main.py) to see how this works in practice.
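
As a starting point for this exercise, here is a minimal sketch of one common device-handling pattern, shown on a stand-in layer and synthetic data rather than on the notebook's `DNN`. The linked tutorial may differ in detail, and the sketch falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

# pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = nn.Linear(18, 2).to(device)     # move the model parameters to the device
data = torch.randn(4, 18).to(device)  # move each mini-batch to the same device

output = net(data)                    # computed on the chosen device
print(device, output.device)
```

The model and every mini-batch (data and labels) must live on the same device, so the `.to(device)` calls belong both where the model is created and inside the training and test loops.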


In [None]:
import argparse # handles arguments
import sys; sys.argv=['']; del sys # required to use parser in jupyter notebooks

# Training settings
parser = argparse.ArgumentParser(description='PyTorch SUSY Example')
parser.add_argument('--dataset_size', type=int, default=100000, metavar='DS',
                help='size of data set (default: 100000)')
parser.add_argument('--high_level_feats', type=bool, default=None, metavar='HLF',
                help='toggles high level features (default: None)')
parser.add_argument('--batch-size', type=int, default=100, metavar='N',
                help='input batch size for training (default: 100)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
                help='number of epochs to train (default: 10)')
parser.add_argument('--lr', type=float, default=0.05, metavar='LR',
                help='learning rate (default: 0.05)')
parser.add_argument('--momentum', type=float, default=0.8, metavar='M',
                help='SGD momentum (default: 0.8)')
parser.add_argument('--no-cuda', action='store_true', default=False,
                help='disables CUDA training')
parser.add_argument('--seed', type=int, default=2, metavar='S',
                help='random seed (default: 2)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                help='how many batches to wait before logging training status')
args = parser.parse_args()

# set seed of random number generator
torch.manual_seed(args.seed)

grid_search(args)


Training on 800 examples
Using both high and low level features
Testing on 200 examples
Using both high and low level features

 training DNN with  1000 data points and SGD lr=0.000010. 


Test set: Average loss: 0.6872, Accuracy: 109/200 (54.500%)






Test set: Average loss: 0.6868, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6865, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6862, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6859, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6855, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6852, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6850, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6847, Accuracy: 109/200 (54.500%)


Test set: Average loss: 0.6844, Accuracy: 110/200 (55.000%)


 training DNN with  1000 data points and SGD lr=0.000100. 


Test set: Average loss: 0.6949, Accuracy: 98/200 (49.000%)


Test set: Average loss: 0.6939, Accuracy: 97/200 (48.500%)


Test set: Average loss: 0.6930, Accuracy: 97/200 (48.500%)


Test set: Average loss: 0.6920, Accuracy: 98/200 (49.000%)


Test set: Average loss: 0.6911, Accuracy: 101/200 (50.500%)


Test set: Average loss: 0.6902, Accuracy: 107/200 (53.500%)


Test set: Av


Test set: Average loss: 0.6787, Accuracy: 113/200 (56.500%)


Test set: Average loss: 0.6707, Accuracy: 128/200 (64.000%)


Test set: Average loss: 0.6629, Accuracy: 132/200 (66.000%)


Test set: Average loss: 0.6534, Accuracy: 136/200 (68.000%)


Test set: Average loss: 0.6444, Accuracy: 137/200 (68.500%)


Test set: Average loss: 0.6361, Accuracy: 142/200 (71.000%)


Test set: Average loss: 0.6284, Accuracy: 140/200 (70.000%)


Test set: Average loss: 0.6192, Accuracy: 142/200 (71.000%)


Test set: Average loss: 0.6115, Accuracy: 141/200 (70.500%)


 training DNN with  1000 data points and SGD lr=0.010000. 


Test set: Average loss: 0.6101, Accuracy: 141/200 (70.500%)


Test set: Average loss: 0.5575, Accuracy: 151/200 (75.500%)


Test set: Average loss: 0.5326, Accuracy: 150/200 (75.000%)


Test set: Average loss: 0.5235, Accuracy: 153/200 (76.500%)


Test set: Average loss: 0.5019, Accuracy: 150/200 (75.000%)


Test set: Average loss: 0.5199, Accuracy: 150/200 (75.000%)


Test set


Test set: Average loss: 0.6040, Accuracy: 134/200 (67.000%)


Test set: Average loss: 0.5913, Accuracy: 140/200 (70.000%)


Test set: Average loss: 0.5773, Accuracy: 135/200 (67.500%)


Test set: Average loss: 0.5735, Accuracy: 144/200 (72.000%)


Test set: Average loss: 0.5836, Accuracy: 139/200 (69.500%)


Test set: Average loss: 0.5456, Accuracy: 149/200 (74.500%)


Test set: Average loss: 0.6226, Accuracy: 129/200 (64.500%)


Test set: Average loss: 0.6003, Accuracy: 131/200 (65.500%)


Test set: Average loss: 0.5639, Accuracy: 141/200 (70.500%)

Training on 8000 examples
Using both high and low level features
Testing on 2000 examples
Using both high and low level features

 training DNN with 10000 data points and SGD lr=0.000010. 


Test set: Average loss: 0.7069, Accuracy: 991/2000 (49.550%)


Test set: Average loss: 0.7065, Accuracy: 986/2000 (49.300%)


Test set: Average loss: 0.7062, Accuracy: 984/2000 (49.200%)


Test set: Average loss: 0.7059, Accuracy: 979/2000 (48.950%)




Test set: Average loss: 0.6927, Accuracy: 1019/2000 (50.950%)


Test set: Average loss: 0.6908, Accuracy: 1043/2000 (52.150%)


Test set: Average loss: 0.6891, Accuracy: 1089/2000 (54.450%)


Test set: Average loss: 0.6875, Accuracy: 1123/2000 (56.150%)


Test set: Average loss: 0.6860, Accuracy: 1161/2000 (58.050%)


Test set: Average loss: 0.6847, Accuracy: 1191/2000 (59.550%)


Test set: Average loss: 0.6832, Accuracy: 1215/2000 (60.750%)


Test set: Average loss: 0.6819, Accuracy: 1231/2000 (61.550%)


Test set: Average loss: 0.6807, Accuracy: 1257/2000 (62.850%)


Test set: Average loss: 0.6793, Accuracy: 1277/2000 (63.850%)


 training DNN with 10000 data points and SGD lr=0.001000. 


Test set: Average loss: 0.6833, Accuracy: 1246/2000 (62.300%)


Test set: Average loss: 0.6717, Accuracy: 1391/2000 (69.550%)


Test set: Average loss: 0.6623, Accuracy: 1444/2000 (72.200%)


Test set: Average loss: 0.6525, Accuracy: 1472/2000 (73.600%)


Test set: Average loss: 0.6432, Accuracy: 


Test set: Average loss: 0.6010, Accuracy: 1498/2000 (74.900%)


 training DNN with 10000 data points and SGD lr=0.010000. 


Test set: Average loss: 0.5936, Accuracy: 1541/2000 (77.050%)


Test set: Average loss: 0.5347, Accuracy: 1553/2000 (77.650%)


Test set: Average loss: 0.5035, Accuracy: 1534/2000 (76.700%)


Test set: Average loss: 0.4879, Accuracy: 1567/2000 (78.350%)


Test set: Average loss: 0.4802, Accuracy: 1560/2000 (78.000%)


Test set: Average loss: 0.4778, Accuracy: 1553/2000 (77.650%)


Test set: Average loss: 0.4669, Accuracy: 1577/2000 (78.850%)


Test set: Average loss: 0.4643, Accuracy: 1577/2000 (78.850%)


Test set: Average loss: 0.4611, Accuracy: 1587/2000 (79.350%)


Test set: Average loss: 0.4670, Accuracy: 1565/2000 (78.250%)


 training DNN with 10000 data points and SGD lr=0.100000. 


Test set: Average loss: 0.4694, Accuracy: 1580/2000 (79.000%)


Test set: Average loss: 0.4629, Accuracy: 1570/2000 (78.500%)


Test set: Average loss: 0.4606, Accuracy: 156


Test set: Average loss: 0.4540, Accuracy: 1593/2000 (79.650%)


Test set: Average loss: 0.4555, Accuracy: 1581/2000 (79.050%)


Test set: Average loss: 0.4496, Accuracy: 1584/2000 (79.200%)

Training on 80000 examples
Using both high and low level features
Testing on 20000 examples
Using both high and low level features

 training DNN with 100000 data points and SGD lr=0.000010. 


Test set: Average loss: 0.6880, Accuracy: 11181/20000 (55.905%)


Test set: Average loss: 0.6879, Accuracy: 11199/20000 (55.995%)


Test set: Average loss: 0.6878, Accuracy: 11221/20000 (56.105%)


Test set: Average loss: 0.6876, Accuracy: 11230/20000 (56.150%)


Test set: Average loss: 0.6875, Accuracy: 11258/20000 (56.290%)


Test set: Average loss: 0.6874, Accuracy: 11270/20000 (56.350%)


Test set: Average loss: 0.6873, Accuracy: 11289/20000 (56.445%)


Test set: Average loss: 0.6872, Accuracy: 11313/20000 (56.565%)


Test set: Average loss: 0.6870, Accuracy: 11341/20000 (56.705%)


Test set: Average lo


Test set: Average loss: 0.7018, Accuracy: 8836/20000 (44.180%)


Test set: Average loss: 0.7003, Accuracy: 8879/20000 (44.395%)


Test set: Average loss: 0.6989, Accuracy: 8955/20000 (44.775%)


Test set: Average loss: 0.6976, Accuracy: 9031/20000 (45.155%)


Test set: Average loss: 0.6963, Accuracy: 9099/20000 (45.495%)


 training DNN with 100000 data points and SGD lr=0.001000. 


Test set: Average loss: 0.6860, Accuracy: 11350/20000 (56.750%)


Test set: Average loss: 0.6774, Accuracy: 12285/20000 (61.425%)


Test set: Average loss: 0.6699, Accuracy: 12938/20000 (64.690%)


Test set: Average loss: 0.6629, Accuracy: 13423/20000 (67.115%)


Test set: Average loss: 0.6562, Accuracy: 13745/20000 (68.725%)


Test set: Average loss: 0.6497, Accuracy: 13936/20000 (69.680%)


Test set: Average loss: 0.6432, Accuracy: 14090/20000 (70.450%)


Test set: Average loss: 0.6368, Accuracy: 14201/20000 (71.005%)


Test set: Average loss: 0.6302, Accuracy: 14335/20000 (71.675%)


Test set: Average 


Test set: Average loss: 0.5082, Accuracy: 15446/20000 (77.230%)


Test set: Average loss: 0.4902, Accuracy: 15545/20000 (77.725%)


Test set: Average loss: 0.4816, Accuracy: 15615/20000 (78.075%)


Test set: Average loss: 0.4755, Accuracy: 15666/20000 (78.330%)


Test set: Average loss: 0.4702, Accuracy: 15673/20000 (78.365%)


Test set: Average loss: 0.4667, Accuracy: 15721/20000 (78.605%)


Test set: Average loss: 0.4649, Accuracy: 15724/20000 (78.620%)


Test set: Average loss: 0.4622, Accuracy: 15739/20000 (78.695%)


 training DNN with 100000 data points and SGD lr=0.100000. 


Test set: Average loss: 0.4766, Accuracy: 15490/20000 (77.450%)


Test set: Average loss: 0.4593, Accuracy: 15736/20000 (78.680%)


Test set: Average loss: 0.4543, Accuracy: 15803/20000 (79.015%)


Test set: Average loss: 0.4528, Accuracy: 15811/20000 (79.055%)


Test set: Average loss: 0.4491, Accuracy: 15858/20000 (79.290%)


Test set: Average loss: 0.4476, Accuracy: 15900/20000 (79.500%)


Test set: Ave


Test set: Average loss: 0.7014, Accuracy: 15839/40000 (39.597%)


Test set: Average loss: 0.7013, Accuracy: 15857/40000 (39.642%)


Test set: Average loss: 0.7012, Accuracy: 15873/40000 (39.682%)


Test set: Average loss: 0.7011, Accuracy: 15899/40000 (39.748%)


Test set: Average loss: 0.7010, Accuracy: 15920/40000 (39.800%)


Test set: Average loss: 0.7009, Accuracy: 15948/40000 (39.870%)


Test set: Average loss: 0.7008, Accuracy: 15966/40000 (39.915%)


Test set: Average loss: 0.7007, Accuracy: 15986/40000 (39.965%)


Test set: Average loss: 0.7005, Accuracy: 16005/40000 (40.013%)

