In [17]:
import sys
sys.path.insert(0, '..') #This line adds '..' to the path so we can import the net_framework python file
from net_framework import *
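import numpy as np
import torch #Used directly in the cells below; made explicit rather than relying on the star import above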
from IPython.display import clear_output, display
import matplotlib.pyplot as plt
import scipy.stats as sp

The following cell allows you to toggle warnings on and off. Hiding them makes the results easier to read, but warnings are important in general and should not be ignored.

In [19]:
from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a href="javascript:code_toggle_err()">here</a>.''')

Hi! This notebook serves as a tutorial that walks through the basics of this project. In it, we will model the truthfulness of a class of mathematical statements that amount to the addition of two integers between -50 and 50. We need to limit the range of integers because there are infinitely many integer sums, and no finite amount of compute could cover the addition of any two integers with no bounds.

To model the information content of the chosen set of statements, we will go through the following procedure:

1. Defining a "sub-language" to constrain the set of statements. This requires formalizing which statements you want to model and how to interpret them.
2. Finding an efficient representation of the characters in the language and of the statements you want to model, one that can be fed into neural networks. For the most part, we will be using one-hot vectors for this task (https://en.wikipedia.org/wiki/One-hot).
3. Generating training and validation datasets for our statements. These datasets should consist of input vectors that represent legal statements in the "sub-language" and the associated truth values for each of these statements.
4. Using the data to train a set of neural networks with different structures and neural complexities. The ones with training error below a certain threshold will be validated. We will be using a threshold value of 0.05 for this notebook. The networks with validation error below that threshold will be classified as "good" networks, and the "good" network with minimal neural complexity will be used as the final model to represent the statements.
5. (optional) Finding the information complexity (the amount of data it takes to train the network) of that network structure by training it with different amounts of training data and seeing how much training data is required for it to have training and validation errors below the threshold.
6. Saving the statistics from the final network. This includes the final neural complexity (the number of hidden layer nodes in the network), the information complexity (the amount of data it takes to train the network), the final validation/training error, the final weights of the network, and plots of the training error as the number of training cycles increases.

## Step 1: Defining a Sub-Language

This is a purely conceptual step but is easily the most important and difficult part of this project. We need to define a language to constrain the set of statements we are working with. To do this, we need to define three things:

1. The characters in the language.
2. The way these characters are legally allowed to be combined.
3. How to determine the truthfulness of a given statement.

In our case, the set of characters we are using can be split into two groups. 

1. The set of integers from -50 to 50, inclusive. We will call this set $Z$.
2. The symbols '+' and '='.

These characters can only legally be combined in a single way.

Let $z_1$, $z_2$ and $z_3$ be three elements of $Z$. Then, we define a legal 'sentence' in this language as the following combination of characters: 

$$z_1 + z_2 = z_3.$$

The veracity of these statements is understood in the usual way: a statement is true if and only if the first two integers add up to the third.
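
As a minimal sketch, the three parts of this definition (characters, legal combinations, truth) can be written down directly in Python. The helper names `is_legal` and `is_true` are hypothetical and are not part of `net_framework`; a sentence is assumed here to be a five-element list.

```python
# Hypothetical helpers for illustration; they are not part of net_framework.
Z = range(-50, 51)  # the integers -50..50, inclusive


def is_legal(sentence):
    """Return True if `sentence` has the form [z1, '+', z2, '=', z3] with each z in Z."""
    if len(sentence) != 5:
        return False
    z1, plus, z2, equals, z3 = sentence
    return plus == '+' and equals == '=' and all(
        isinstance(z, int) and z in Z for z in (z1, z2, z3)
    )


def is_true(sentence):
    """A legal sentence is true if and only if z1 + z2 equals z3."""
    if not is_legal(sentence):
        raise ValueError('not a legal sentence in this sub-language')
    z1, _, z2, _, z3 = sentence
    return z1 + z2 == z3


print(is_true([21, '+', -24, '=', -3]))   # a true statement
print(is_true([21, '+', -24, '=', 7]))    # legal but false
print(is_legal([30, '+', 40, '=', 70]))   # 70 is outside Z, so this is not legal
```

Note that legality and truth are separate questions: `30 + 40 = 70` is arithmetically correct but is not a sentence of this sub-language, because 70 is not in $Z$.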

At first glance, it might seem unnecessary to model the '+' and '=' symbols since they are in the same positions every time, but this is just a very simple example, and modeling these types of symbols will be necessary when dealing with more complex syntaxes in the future.

## Step 2: Finding a Representation

We now need to represent our language in a way that can be understood by the neural network. To do this, we will be using one-hot vectors (https://en.wikipedia.org/wiki/One-hot) to represent each character in the language. Each vector will have a dimensionality of 103 to represent each of the 101 integers as well as the '+' and '=' symbols.

We will use the following mapping: 

'+' maps to [1,0,0,...,0,0]  
'=' maps to [0,1,0,...,0,0]  
'-50' maps to [0,0,1,...,0,0]  
...  
'49' maps to [0,0,0,...,1,0]  
and  
'50' maps to [0,0,0,...,0,1]  

Each sentence in the language can be thought of as a sequence of five of these vectors stacked end to end.
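
The mapping can be sketched as a small index calculation. The helper names `symbol_index` and `hot_positions` are assumptions made for illustration; the offset of 52 is simply what places -50 at index 2 and 50 at index 102, right after the two symbols.

```python
# Illustrative helpers; the names are assumptions, not part of the notebook.
VOCAB_SIZE = 103       # '+', '=', and the 101 integers from -50 to 50
SENTENCE_LENGTH = 5    # z1, '+', z2, '=', z3


def symbol_index(symbol):
    """Position of the 1 inside a single 103-dimensional one-hot vector."""
    if symbol == '+':
        return 0
    if symbol == '=':
        return 1
    return int(symbol) + 52  # -50 -> 2, ..., 50 -> 102


def hot_positions(sentence):
    """Positions of the five 1s in the stacked (5 * 103 = 515)-dimensional vector."""
    return [k * VOCAB_SIZE + symbol_index(s) for k, s in enumerate(sentence)]


print(symbol_index(-50), symbol_index(50))       # 2 102
print(hot_positions([21, '+', -24, '=', -3]))    # [73, 103, 234, 310, 461]
```

Every stacked vector therefore contains exactly five 1s, one per block of 103 entries.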

## Step 3: Generating Training/Validation Data

Now, we have to write functions that will generate training or validation datasets on command. 

In [20]:
def one_hot(symbols):
    '''
    Converts symbol ('+', '=', int. between -50 to 50) array into 'stacked' one-hot vectors as specified above. 
    '''
    vector_stack = []
    for symbol in symbols:
        vector = np.zeros(103)
        if symbol == '+':
            vector[0] = 1
        elif symbol == '=':
            vector[1] = 1
        else:
            idx = int(symbol) + 52
            vector[idx] = 1
        vector_stack = np.concatenate((vector_stack, vector))
    return np.asarray(vector_stack)
    
    
def gen_data(num_examples):
    
    '''
    Generates statements in this language as well as their veracity.
    The first half of the examples are random statements (almost always false);
    the second half are constructed to be true so the dataset is not skewed towards False.
    
    Params
    ------
    num_examples : int
        The number of examples in the dataset. 
        
    Returns
    -------
    X : 2D numpy array
        Matrix of inputs where each row is a vector that represents a single sentence.
    Y : 1D numpy array
        Each element is 1 if the corresponding statement in X is true and 0 otherwise. 
    sentences : 2D numpy array
        The symbols of each sentence, stored as strings. 
    '''
    X = []
    Y = []
    sentences = []
    i = 0
    while i < num_examples:
        if i < num_examples/2:
            #Randomly Choosing three integers between -50 and 50
            z = np.random.randint(-50, 51, 3)
            sentence = np.array([z[0], '+', z[1], '=', z[2]])
            sentences.append(sentence)
            X.append(one_hot(sentence))
            if z[0] + z[1] == z[2]:
                Y.append(1)
            else:
                Y.append(0)
        else:
            #Choosing values such that the output is true to ensure that our training dataset
            #is not skewed with False results.
            z = np.random.randint(-50, 51, 2)
            z = np.append(z, z[0] + z[1])
            #Ensuring the sentence is legal
            if z[2] in np.arange(-50, 51, 1):
                sentence = np.array([z[0], '+', z[1], '=', z[2]])
                sentences.append(sentence)
                X.append(one_hot(sentence))
                if z[0] + z[1] == z[2]:
                    Y.append(1)
                else:
                    Y.append(0)
            else:
                i -= 1
        i += 1
        
    return np.asarray(X), np.asarray(Y), np.asarray(sentences)

In [3]:
#There might be repeats in the training/validation data. This isn't ideal, but 
#hopefully it won't be a big deal. 
data = gen_data(10000)
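
To get a feel for how likely repeats are, the following sketch counts the statements of the sub-language by brute force. It uses only the standard library and does not call `gen_data`.

```python
# Counting the statements of the sub-language by brute force (standard library only).
Z = range(-50, 51)

# True statements: pairs (z1, z2) whose sum is itself in Z.
num_true = sum(1 for z1 in Z for z2 in Z if z1 + z2 in Z)
# All legal statements: any triple (z1, z2, z3) drawn from Z.
num_legal = len(Z) ** 3

print(num_true)               # 7651
print(num_legal)              # 1030301
print(num_true / num_legal)   # about 0.0074
```

Since `gen_data(10000)` draws 5000 true statements from only 7651 possibilities, repeats among the true statements are very likely. The count also shows that a uniformly random statement is true less than 1% of the time, which is why the second half of each dataset is constructed to be true.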

We can now inspect one example from the generated data.

In [21]:
X_train = data[0]
Y_train = data[1]
sentences = data[2]

print('A typical sentence is: ' + str(sentences[6000]))
print('The truth value of this sentence is: ' + str(Y_train[6000]))
print('Its one-hot representation is: ' + str(X_train[6000]))

A typical sentence is: ['21' '+' '-24' '=' '-3']
The truth value of this sentence is: 1
Its one-hot representation is: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0
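
Because the printed vector is hard to read by eye, it can help to decode a stacked one-hot vector back into symbols as a sanity check. The sketch below is an illustrative inverse of `one_hot`; the name `decode` is hypothetical, and the example vector is built by hand rather than taken from `X_train`.

```python
# Illustrative inverse of the one-hot encoding; `decode` is a hypothetical helper name.
VOCAB_SIZE = 103


def decode(stacked):
    """Turn a stacked one-hot vector (length 5 * 103) back into a list of symbols."""
    values = list(stacked)
    symbols = []
    for start in range(0, len(values), VOCAB_SIZE):
        chunk = values[start:start + VOCAB_SIZE]
        idx = chunk.index(1)          # position of the single 1 in this chunk
        if idx == 0:
            symbols.append('+')
        elif idx == 1:
            symbols.append('=')
        else:
            symbols.append(idx - 52)  # undo the +52 offset used by one_hot
    return symbols


# Build the example sentence 21 + -24 = -3 by hand and decode it again.
vector = [0] * (5 * VOCAB_SIZE)
for position in (73, 103, 234, 310, 461):
    vector[position] = 1
print(decode(vector))   # [21, '+', -24, '=', -3]
```

In the notebook, calling `decode(X_train[6000])` should recover the sentence printed above, assuming `X_train` was produced by `gen_data` as shown.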

## Step 4: Training and Validating the Networks

Now, we need to define and train several neural networks of various shapes. We will train for 50 iterations on 100 training samples. For harder tasks, you will probably have to use much larger values for both.

The neural networks we are using have an input size of 5 * 103 = 515, since that is the size of five one-hot vectors stacked end to end. The output size is 1. We will try three networks with one hidden layer of size 1, 2 or 3, and two networks with two hidden layers of sizes [3,2] and [2,1]. We will use a learning rate of 0.1, but other learning rates work as well.
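
For reference, the sketch below computes the neural complexity of each shape as this notebook defines it (the number of hidden layer nodes). It also gives a weight count, which assumes fully connected layers without bias terms; that is an assumption, since the internals of `Neural_Network` are defined in `net_framework` and not shown here.

```python
# Sketch: neural complexity as defined in this notebook (number of hidden nodes),
# plus a weight count that ASSUMES fully connected layers without bias terms.
network_shapes = [(515, [1], 1), (515, [2], 1), (515, [3], 1), (515, [3, 2], 1), (515, [2, 1], 1)]


def neural_complexity(shape):
    """Total number of hidden-layer nodes."""
    _, hidden_sizes, _ = shape
    return sum(hidden_sizes)


def weight_count(shape):
    """Number of weights between consecutive layers (assumption: dense, no biases)."""
    input_size, hidden_sizes, output_size = shape
    layer_sizes = [input_size] + list(hidden_sizes) + [output_size]
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


for shape in network_shapes:
    print(shape, neural_complexity(shape), weight_count(shape))
```

Under this definition the shapes `[3]` and `[2,1]` have the same neural complexity of 3, even though they are wired differently.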

### Training

In [22]:
#Number of training iterations
num_iters = 50
#Size of training dataset
num_examples = 100
#Acquiring Training Data
training_data = gen_data(num_examples)
X = training_data[0]
Y = training_data[1]
#Converting training data into pytorch format
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()
#Listing out the shapes of each model
network_shapes = [(515, [1], 1), (515, [2], 1), (515, [3], 1), (515, [3,2], 1), (515, [2,1], 1)]
#Learning rate of the network
rate = 0.1
#Training each network in turn and displaying the loss as it updates
for net_num, shape in enumerate(network_shapes):
    print('Network Shape: ' + str(shape), flush = True)
    NN = Neural_Network(inputSize = shape[0], outputSize = shape[2], hiddenSize = shape[1] , learning_rate = rate)
    for i in range(num_iters):
        loss = torch.mean((Y - NN(X))**2).item()
        if i == 0: 
            dh = display("#" + str(i) + " Loss: " + str(loss), display_id=True)
        else:
            dh.update("#" + str(i) + " Loss: " + str(loss))
        
        NN.train(X, Y)
        
    #Saves the training results in a file named "Net 0", "Net 1", etc. 
    NN.saveWeights(model = NN, path = "saved_networks/Net " + str(net_num))

Network Shape: (515, [1], 1)


'#49 Loss: 0.0010861869668588042'

Network Shape: (515, [2], 1)


'#49 Loss: 0.0003228409623261541'

Network Shape: (515, [3], 1)


'#49 Loss: 0.0003011675726156682'

Network Shape: (515, [3, 2], 1)


'#49 Loss: 0.0003451051888987422'

Network Shape: (515, [2, 1], 1)


'#49 Loss: 0.0005270301480777562'
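
One thing worth checking when reading these losses: the loss is computed as `torch.mean((Y - NN(X))**2)` with `Y` of shape `(N,)`. Whether that is the intended mean squared error depends on the output shape of `Neural_Network`, which is defined in `net_framework` and not shown here. If the network returns shape `(N, 1)`, broadcasting turns the difference into an `(N, N)` matrix. The sketch below illustrates the effect with NumPy, whose broadcasting semantics PyTorch follows; the arrays are made up for illustration.

```python
import numpy as np

# Made-up targets and predictions for illustration only.
targets = np.array([0.0, 1.0, 1.0])            # shape (3,), like Y in this notebook
predictions = np.array([[0.0], [1.0], [1.0]])  # shape (3, 1), a common network output shape

# Broadcasting pairs every target with every prediction: the result is 3 x 3.
diff_broadcast = targets - predictions
print(diff_broadcast.shape)        # (3, 3)
print(np.mean(diff_broadcast**2))  # nonzero even though the predictions are perfect

# Flattening the predictions first gives the elementwise difference.
diff_elementwise = targets - predictions.reshape(-1)
print(diff_elementwise.shape)        # (3,)
print(np.mean(diff_elementwise**2))  # 0.0
```

If `NN(X)` already has shape `(N,)`, none of this applies and the loss is the ordinary mean squared error. Printing `NN(X).shape` once is a cheap way to find out.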

### Validation

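Due to the way we are storing and loading these networks, the validation dataset has to be the same size as the training dataset. We will validate each network 100 times and display the resulting mean and standard deviation. Because the same fixed validation dataset and trained weights are used in every repetition, the 100 evaluations are identical and the reported standard deviation is 0. We will be using a threshold error of 0.05 for validation.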

In [23]:
#Generating validation data
validation_data = gen_data(num_examples)
X = validation_data[0]
Y = validation_data[1]
#Converting validation data into pytorch format
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()
#Array of validation error means
validation_error_means = []
#Array of validation error stds
validation_error_stds = []
for net_num, shape in enumerate(network_shapes):
    print('Network Shape: ' + str(shape), flush = True)
    #Loading the network we trained in the prev. section
    NN = torch.load("saved_networks/Net " + str(net_num))
    validation_errors = []
    for i in range(100):
        validation_err = torch.mean((Y - NN(X))**2).item()
        validation_errors.append(validation_err)
    mean, var = sp.describe(validation_errors)[2:4]
    std = np.sqrt(var)
    validation_error_means.append(mean)
    validation_error_stds.append(std)
    print("The validation error is: mean = " + str(mean) + " and std = " + str(std), flush = True)  

Network Shape: (515, [1], 1)
The validation error is: mean = 0.019764089956879616 and std = 0.0
Network Shape: (515, [2], 1)
The validation error is: mean = 0.019605722278356552 and std = 0.0
Network Shape: (515, [3], 1)
The validation error is: mean = 0.019608907401561737 and std = 0.0
Network Shape: (515, [3, 2], 1)
The validation error is: mean = 0.01960284821689129 and std = 0.0
Network Shape: (515, [2, 1], 1)
The validation error is: mean = 0.019607605412602425 and std = 0.0


Clearly, because this task is so simple, all of our networks are below the 0.05 validation error threshold. The smallest of them has a single hidden node, so we can say that the "information content" (the minimal neural complexity) of adding two integers between -50 and 50 is 1. This value is meaningless on its own but can become interesting when compared across statements and languages.
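
The selection rule from the procedure at the top of this notebook can be sketched as follows, using the validation means printed above. `select_minimal_network` is a hypothetical helper, not part of `net_framework`, and the sketch assumes every listed network has already passed the training error threshold.

```python
# Sketch of the selection rule, using the validation means printed above.
# `select_minimal_network` is a hypothetical helper, not part of net_framework.
threshold = 0.05
results = [
    ((515, [1], 1), 0.019764089956879616),
    ((515, [2], 1), 0.019605722278356552),
    ((515, [3], 1), 0.019608907401561737),
    ((515, [3, 2], 1), 0.01960284821689129),
    ((515, [2, 1], 1), 0.019607605412602425),
]


def select_minimal_network(results, threshold):
    """Among networks below the threshold, return the one with the fewest hidden nodes."""
    good = [(shape, err) for shape, err in results if err < threshold]
    if not good:
        return None
    return min(good, key=lambda item: sum(item[0][1]))


best_shape, best_err = select_minimal_network(results, threshold)
print(best_shape, sum(best_shape[1]))   # (515, [1], 1) 1
```

The rule picks the smallest "good" network rather than the one with the lowest error, so `(515, [1], 1)` wins even though `(515, [3, 2], 1)` has a marginally lower validation error.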

## (Optional) Step 5: Attempt to Determine Information Complexity

In this step, we will quantify the information complexity of modeling this addition by figuring out how many training examples are required by our minimal neural complexity network to reach the threshold validation error of 0.05. To do this, we will retrain our example network with one hidden node and varying amounts of training data.

In [24]:
#List of varying amounts of training examples
num_examples_arr = np.arange(1, 6, 1)
#Validation Error Threshold
threshold = 0.05
#Winning Network Shape (defined before printing so the label matches the network trained below)
shape = (515, [1], 1)
#Learning rate of the network
rate = 0.1
print('Network Shape: ' + str(shape), flush = True)
for num_examples in num_examples_arr:
    #Acquiring Training Data
    training_data = gen_data(num_examples)
    X = training_data[0]
    Y = training_data[1]
    #Converting training data into pytorch format
    X = torch.from_numpy(X).float()
    Y = torch.from_numpy(Y).float()
    print('Number of Examples: ' + str(num_examples), flush = True)
    NN = Neural_Network(inputSize = shape[0], outputSize = shape[2], hiddenSize = shape[1] , learning_rate = rate)
    validation_err = 1
    i = 0
    #Training until the validation error reaches the threshold (or for at most 10000 iterations)
    while validation_err > threshold and i < 10000:       
        #Averaging validation error over 250 freshly generated datasets (so outliers do not skew results)
        validation_errors = []
        for j in range(250):
            #Generating validation data
            validation_data = gen_data(num_examples)
            X_val = validation_data[0]
            Y_val = validation_data[1]
            #Converting validation data into pytorch format
            X_val = torch.from_numpy(X_val).float()
            Y_val = torch.from_numpy(Y_val).float()
        
            err = torch.mean((Y_val - NN(X_val))**2).item()
            validation_errors.append(err)
        validation_err = sp.describe(validation_errors)[2]
        if i == 0: 
            dh = display("#" + str(i) + " Validation Error: " + str(validation_err), display_id=True)
        else:
            dh.update("#" + str(i) + " Validation Error: " + str(validation_err))
        NN.train(X, Y)
        i += 1

Network Shape: (515, [1], 1)
Number of Examples: 1


'#178 Validation Error: 0.04905067577958107'

Number of Examples: 2


'#109 Validation Error: 0.049917827785015106'

Number of Examples: 3


'#39 Validation Error: 0.04678682877123356'

Number of Examples: 4


'#45 Validation Error: 0.04785569289326668'

Number of Examples: 5


'#39 Validation Error: 0.04965037626028061'

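In this run, a single training example was enough for the network to reach the validation threshold. Keep in mind, however, that the validation sets here are generated with the same `num_examples`, so with one example both the training set and each validation set contain just one randomly drawn statement, which is almost always false.

We also see a rough trend where the network reaches the threshold in fewer iterations with a higher number of training examples, which is expected.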
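
When reading these numbers, it helps to know how `gen_data` splits very small datasets between its two branches. The sketch below mirrors the `i < num_examples/2` test in `gen_data`; `split_counts` is a hypothetical helper written for illustration.

```python
# Mirrors the branching in gen_data: index i takes the random branch when i < num_examples / 2,
# otherwise the always-true branch. `split_counts` is a hypothetical helper for illustration.
def split_counts(num_examples):
    """Return (random statements, guaranteed-true statements) for a dataset of this size."""
    random_part = sum(1 for i in range(num_examples) if i < num_examples / 2)
    return random_part, num_examples - random_part


for n in range(1, 6):
    print(n, split_counts(n))
# 1 (1, 0)
# 2 (1, 1)
# 3 (2, 1)
# 4 (2, 2)
# 5 (3, 2)
```

A dataset of size 1 therefore never contains a guaranteed-true statement, and a uniformly random statement is true less than 1% of the time. Larger training and validation sets would give a more informative estimate of the information complexity.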

## Step 6: Save and Discuss Results

Since this notebook really only takes around a minute to run and is a tutorial, there is no reason for me to save all the results. However, I expect future results to be saved for this project and displayed in the form of graphics or a discussion section which you can use to present your results to me and/or the groups. Formatting code in a Jupyter Notebook the way I have here is preferred, but may not always be possible for more complicated systems. 

From these results, we see that addition is an extremely simple and easy task for a neural network to learn. In the future, I would want to explore how this compares with the results from learning other arithmetic operations as well as learning these operations simultaneously. 