# HW7 V2: Neural Net on Phoneme Data for Speech Recognition

__Objective:__ The aim of a neural network is to extract linear combinations of the inputs as derived features and then model the target as a nonlinear function of those features, so that predictions can be made for new data [1]. A neural net takes a set of inputs, weights and biases them, and runs them through a series of hidden layers. These hidden layers are composed of nodes that each contain a primitive function; each node adds together the weighted inputs it receives and applies the primitive function to the sum. These primitive functions are called 'activation functions', and they are usually the sigmoid/logistic function (the ReLU and hyperbolic tangent functions are two others in use) [2] [3]. After traversing the network of hidden layers, the inputs are transformed into a set of outputs that serve as predictions for new data [1]. Given a set of data with known labels/targets, the optimal network weights and biases are estimated using backpropagation. For this assignment, a data set of 5 phoneme classes taken from continuous speech data of 50 male speakers was used.

The output of one neuron can serve as the input of another.

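__Forward Propagation:__ calculate the net input of each hidden layer node and each output layer node, and pass it through the activation function to obtain that node's output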

__Backpropagation:__ update each existing weight in the network so that the current output value moves closer to the target/true output, which is achieved by minimizing the error for each output neuron

__Variables__
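- x: input values fed to the input layer
- y: target output (the known label)
- y_hat: output computed by the network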

__Equations__
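- Sum of Squares Error Function/Loss Function: Error = 1/2 * sum_j (target_j - output_j)^2
- Sigmoid Function: σ(v) = 1/(1 + e^(−v))
- Weight Update Rule for Single Output Node for Hidden-to-Output Weights: w_ij = w_ij − α * ∂Error/∂w_ij, where ∂Error/∂w_ij = −(target_j − output_j) * output_j * (1 − output_j) * input_i and α is the learning rate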
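
The hidden-to-output weight update follows from the chain rule. The block below is a sketch of that derivation using the notation of the comments in the `train` method: $t_j$ is the target, $y_j$ the output of output node $j$, $z_j$ its net input, and $x_i$ the $i$-th value feeding into the node. The symbol $\alpha$ for the learning rate and $b$ for the shared bias are naming assumptions.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_j (t_j - y_j)^2, \qquad y_j = \sigma(z_j), \qquad z_j = \sum_i w_{ij}\,x_i + b \\
\frac{\partial E}{\partial w_{ij}}
  &= \frac{\partial E}{\partial y_j}\cdot\frac{d y_j}{d z_j}\cdot\frac{\partial z_j}{\partial w_{ij}}
   = -(t_j - y_j)\; y_j (1 - y_j)\; x_i \\
w_{ij} &\leftarrow w_{ij} - \alpha\,\frac{\partial E}{\partial w_{ij}}
\end{aligned}
```

The middle factor uses the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is why the sigmoid derivative can be computed from the node's output alone.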


__General Algorithm__

_Assumptions_
- binary classification 
- the hidden layer & output layer use the same activation function (this is due to doing binary classification)

_Forward Propagation through the Network_
- traverse the network forwards from the input layer nodes --> output layer nodes:
    - calculate the net input for each hidden layer node and each output layer node
    - "squash" each net input with the activation function

_Backward Propagation through the Network_
- traverse the network backwards from the output layer nodes --> input layer nodes:
    - calculate the error term (delta) for each output layer node: delta = (computed_output(y_hat) - target_output(y)) * y_hat * (1 - y_hat)
    - calculate the error term (delta) for each hidden layer node: delta = actv_output(o) * (1 - actv_output(o)) * sum(weights * delta of the next layer)
    - calculate the change in each weight: weight_change = -learning_rate * delta * input, and add it to the weight
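
The forward and backward passes described above can be sketched in plain Python for a tiny 2-2-2 network. This is a minimal illustration under stated assumptions, not the notebook's class-based implementation: the function names `forward`, `total_error`, and `train_step` are hypothetical, biases are shared per layer and left fixed, and the numbers are the ones used in the commented-out example call further down.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(inputs, w_hidden, b_hidden, w_output, b_output):
    # Net input of every node, squashed by the sigmoid
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b_hidden) for ws in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b_output) for ws in w_output]
    return hidden, output

def total_error(targets, outputs):
    # Sum-of-squares error over the output nodes
    return sum(0.5 * (t - y) ** 2 for t, y in zip(targets, outputs))

def train_step(inputs, targets, w_hidden, b_hidden, w_output, b_output, lr=0.5):
    hidden, output = forward(inputs, w_hidden, b_hidden, w_output, b_output)
    # Output deltas: dE/dz = -(t - y) * y * (1 - y)
    delta_out = [-(t - y) * y * (1 - y) for t, y in zip(targets, output)]
    # Hidden deltas use the output weights as they were before this update
    delta_hid = [h * (1 - h) * sum(d * ws[j] for d, ws in zip(delta_out, w_output))
                 for j, h in enumerate(hidden)]
    # Gradient descent: w <- w - lr * delta * (value feeding the weight)
    new_w_output = [[w - lr * d * h for w, h in zip(ws, hidden)]
                    for d, ws in zip(delta_out, w_output)]
    new_w_hidden = [[w - lr * d * x for w, x in zip(ws, inputs)]
                    for d, ws in zip(delta_hid, w_hidden)]
    return new_w_hidden, new_w_output

inputs, targets = [0.05, 0.1], [0.01, 0.99]
w_hidden = [[0.15, 0.2], [0.25, 0.3]]
w_output = [[0.4, 0.45], [0.5, 0.55]]
b_hidden, b_output = 0.35, 0.6

_, out_before = forward(inputs, w_hidden, b_hidden, w_output, b_output)
error_before = total_error(targets, out_before)
w_hidden, w_output = train_step(inputs, targets, w_hidden, b_hidden, w_output, b_output)
_, out_after = forward(inputs, w_hidden, b_hidden, w_output, b_output)
error_after = total_error(targets, out_after)
print(error_before, error_after)
```

Both sets of deltas are computed before any weight changes, so the hidden-layer update sees the same output weights that produced the forward pass.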
    
    
The algorithm terminates when the value of the error function is sufficiently small. This value is usually ... ?

__References:__
1. Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2002. Retrieved from: http://web.stanford.edu/~hastie/ElemStatLearn/main.html
2. Raul Rojas, Neural Networks: A Systematic Introduction, 1996. Retrieved from: http://page.mi.fu-berlin.de/rojas/neural/neuron.pdf
3. Aurelien Geron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Sebastopol, CA: O'Reilly Media, 2017.


- https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
- https://brilliant.org/wiki/backpropagation/
- https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/07/04/how-to-implement-the-backpropagation-using-python-and-numpy/

- http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm
- http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
- https://en.wikipedia.org/wiki/Backpropagation

__The Sigmoid / Logistic Function [3]:__ σ(v) = 1/(1 + e^(−v))

This function is used because its derivative is nonzero everywhere, which gives gradient descent a usable gradient. We want such a function so that, as we iterate over the parameters, each step can move smoothly toward convergence. Conversely, if we were to use a function made up only of flat segments, e.g. the step function, we would not know whether we were making progress, because the gradient would be zero wherever it is defined.
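
The contrast between the two functions can be checked numerically. The sketch below estimates the slope of each function with a central difference; `step` and `numerical_gradient` are hypothetical helper names, and the sample points are arbitrary values chosen away from the jump of the step function at 0.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def step(v):
    # Heaviside-style step: flat everywhere except the jump at 0
    return 1.0 if v >= 0 else 0.0

def numerical_gradient(f, v, h=1e-6):
    # Central-difference estimate of df/dv
    return (f(v + h) - f(v - h)) / (2 * h)

points = [-2.0, -0.5, 0.5, 2.0]
sigmoid_grads = [numerical_gradient(sigmoid, v) for v in points]
step_grads = [numerical_gradient(step, v) for v in points]

for v, gs, gt in zip(points, sigmoid_grads, step_grads):
    print("v = {:5.1f}   sigmoid slope = {:.4f}   step slope = {:.1f}".format(v, gs, gt))
```

The sigmoid slopes are positive at every sample point, while the step slopes are exactly zero, so gradient descent would receive no signal from the step function.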

More specifically, this equation squashes the total net input, the value calculated by summing all of the weighted inputs that go into a node. The term 'squashing' refers to the fact that we take values from the whole number line and bound them to the range 0 to 1. The ReLU activation function, by contrast, does not squash to this range: it maps negative inputs to 0 and leaves positive inputs unchanged, so its output is unbounded above. As a second example, the hyperbolic tangent function squashes to the range -1 to 1.
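
A quick way to see the output ranges is to evaluate each activation on a few inputs of increasing magnitude. This is an illustrative sketch only; the `relu` helper and the sample values are assumptions made for the example.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def relu(v):
    # Negative inputs become 0, positive inputs pass through unchanged
    return max(0.0, v)

values = [-10.0, -1.0, 0.0, 1.0, 10.0]
sigmoid_out = [sigmoid(v) for v in values]
tanh_out = [math.tanh(v) for v in values]
relu_out = [relu(v) for v in values]

for v, s, t, r in zip(values, sigmoid_out, tanh_out, relu_out):
    print("v = {:6.1f}   sigmoid = {:.5f}   tanh = {:+.5f}   relu = {:.1f}".format(v, s, t, r))
```

The sigmoid values stay strictly between 0 and 1 and the tanh values strictly between -1 and 1, while the ReLU output grows with the input.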

__Total Net Input:__ net = w1 x i1  +  w2 x i2 + ... + wN x iN + bias1 x 1

This function sums all of the inputs for a given node. This summation is composed of products of weights and the values of the input nodes, including bias nodes.
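
As a small worked instance of this sum, the sketch below computes the net input of one node and then squashes it. The helper name `total_net_input` is hypothetical; the weights, inputs, and bias are taken from the commented-out example call later in the notebook.

```python
import math

def total_net_input(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (the bias node always outputs 1)
    return sum(w * i for w, i in zip(weights, inputs)) + bias * 1

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

inputs = [0.05, 0.1]
weights = [0.15, 0.2]
bias = 0.35

net = total_net_input(inputs, weights, bias)   # 0.15*0.05 + 0.2*0.1 + 0.35
out = sigmoid(net)
print(net, out)
```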


In [1]:
import math
import random
import numpy as np
import pandas as pd
import sklearn
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

### FUNCTIONS 

In [2]:
class Neuron:
    def __init__(self, bias):
        self.bias = bias
        self.weights = []
        
    def calc_total_net_input(self):
        total = 0
        for i in range(len(self.inputs)):
            total += self.inputs[i] * self.weights[i]
        return total + self.bias
        
    def calc_output(self, inputs):
        self.inputs = inputs
        self.output = self.sigmoid(self.calc_total_net_input())
        return self.output
        
    def sigmoid(self, total_net_input):
        return 1.0 / (1.0 + np.exp(-total_net_input))
    
    def sqr_error(self, target):
        return 0.5 * (target - self.output) ** 2
        
    def error_wrt_output(self):
        return self.output * (1 - self.output)
    
    # ∂zⱼ/∂wᵢ = xᵢ; `index` is the position of the weight, passed in by the weight-update loops in NeuralNetwork.train
    def pd_total_net_input_wrt_weight(self, index):
        return self.inputs[index]     
    
    # = ∂E/∂yⱼ = -(tⱼ - yⱼ)
    def pd_error_wrt_output(self, target):
        return -(target - self.output)
    
    # dyⱼ/dzⱼ = yⱼ * (1 - yⱼ)
    def pd_total_net_input_wrt_input(self):
        return self.output * (1 - self.output)
    
    # δ = ∂E/∂zⱼ = ∂E/∂yⱼ * dyⱼ/dzⱼ
    def pd_error_wrt_total_net_input(self, target):
        print("PD Error Target: {}".format(target))
        print("PD Total Net Input: {}\n".format(self.pd_total_net_input_wrt_input()))
        return self.pd_error_wrt_output(target) * self.pd_total_net_input_wrt_input()
    
    

In [3]:
class NeuronLayer:
    
    def __init__(self, num_neurons, bias):
        
        # every neuron in a layer shares the same bias
        self.bias = bias if bias is not None else random.random()
        self.neurons = []
        for i in range(num_neurons):
            self.neurons.append(Neuron(self.bias))
            
    def inspect(self):
        print('Neurons:', len(self.neurons))
        for n in range(len(self.neurons)):
            print(' Neuron', n)
            for w in range(len(self.neurons[n].weights)):
                print('  Weight:', self.neurons[n].weights[w])
            print('  Bias:', self.bias)
            
    def feed_forward(self, inputs):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.calc_output(inputs))
        return outputs
            
    def get_outputs(self):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.output)
        return outputs

In [4]:
class NeuralNetwork:
    LEARNING_RATE = 0.5
    
    def __init__(self, num_inputs, num_hidden, num_outputs, hidden_layer_weights=None, hidden_layer_bias=None, output_layer_weights=None, output_layer_bias=None):
        self.num_inputs = num_inputs
        self.hidden_layer = NeuronLayer(num_hidden, hidden_layer_bias)
        self.hidden_layer1 = NeuronLayer(num_hidden, hidden_layer_bias)
        self.output_layer = NeuronLayer(num_outputs, output_layer_bias)
        
        self.init_weights_from_inputs_to_hidden_layer_neurons(hidden_layer_weights)
        self.init_weights_from_hidden_layer_neurons_to_output_layer_neurons(output_layer_weights)
        
    def init_weights_from_inputs_to_hidden_layer_neurons(self, hidden_layer_weights):
        weight_num = 0 
        for h in range(len(self.hidden_layer.neurons)):
            for i in range(self.num_inputs):
                if not hidden_layer_weights:
                    self.hidden_layer.neurons[h].weights.append(random.random())
                else:
                    self.hidden_layer.neurons[h].weights.append(hidden_layer_weights[weight_num])
                weight_num += 1
                
    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, output_layer_weights):
        weight_num = 0 
        for o in range(len(self.output_layer.neurons)):
            for h in range(len(self.hidden_layer.neurons)):
                if not output_layer_weights:
                    self.output_layer.neurons[o].weights.append(random.random())
                else:
                    self.output_layer.neurons[o].weights.append(output_layer_weights[weight_num])
                weight_num += 1
    '''            
    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, next_hidden_layer_weights):
        weight_num = 0
        for h1 in range(len(self.hidden_layer1.neurons)):
            for h0 in range(len(self.hidden_layer.neurons)):
                if not hidden_layer1_weights:
                    self.hidden_layer1.neurons[h1].weights.append(random.random())
                else:
                    self.hidden_layer1.neurons[h1].weights.append(hidden_layer1_weights[weight_num])
    '''            
    def inspect(self):
        print('------')
        print('* Inputs: {}'.format(self.num_inputs))
        print('------')
        print('Hidden Layer')
        self.hidden_layer.inspect()
        print('------')
        print('* Output Layer')
        self.output_layer.inspect()
        print('------')
        
    def feed_forward(self, inputs):
        hidden_layer_outputs = self.hidden_layer.feed_forward(inputs)
        # ADD HIDDEN LAYER 2
        return self.output_layer.feed_forward(hidden_layer_outputs)
    
    def train(self, training_inputs, training_outputs):
        self.feed_forward(training_inputs)
        
        # 1. Calculate deltas of output neurons
        pd_errors_wrt_output_neuron_total_net_input = [0] * len(self.output_layer.neurons)
        print("PD ERROR/OUTPUT NEURON TOTAL: {}".format(pd_errors_wrt_output_neuron_total_net_input))
        for o in range(len(self.output_layer.neurons)):
            
            # ∂E/∂zⱼ
            print("Index of output layer neuron: {}".format(o))
            print("Value of training output: {}".format(training_outputs[o]))
            #print("Value of PD Error / PD net input: {}".format(neurons[o].pd_error_wrt_total_net_input(training_outputs[o])))
            print("Value of output layer neuron: {}\n".format(self.output_layer.neurons[o].pd_error_wrt_total_net_input(training_outputs[o])))
            
            pd_errors_wrt_output_neuron_total_net_input[o] = self.output_layer.neurons[o].pd_error_wrt_total_net_input(training_outputs[o])

        
        # 2. Calculate deltas of hidden neurons
        pd_errors_wrt_hidden_neuron_total_net_input = [0] * len(self.hidden_layer.neurons)
        for h in range(len(self.hidden_layer.neurons)):
            
            # dE/dyⱼ = Σ ∂E/∂zⱼ * ∂z/∂yⱼ = Σ ∂E/∂zⱼ * wᵢⱼ
            # the sum runs over the neurons of the next layer (here the output layer) and starts at zero
            d_error_wrt_hidden_neuron_output = 0
            for o in range(len(self.output_layer.neurons)): # CHANGE TO HIDDEN LAYER 2 NOT OUTPUT ???
                                                            # DOES THIS MEAN I NEED ANOTHER FUNCTION FOR HL WRT HL ???
                d_error_wrt_hidden_neuron_output += pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].weights[h]
        
            # ∂E/∂zⱼ = dE/dyⱼ * dyⱼ/dzⱼ
            pd_errors_wrt_hidden_neuron_total_net_input[h] = d_error_wrt_hidden_neuron_output * self.hidden_layer.neurons[h].pd_total_net_input_wrt_input()
            
        # COMPUTE DELTA FOR HIDDEN LAYER 2 
        #for i in range(len(self.hidden_layer1.neurons)):
            
            #d_error_wrt_hidden_neuron_output = 0
            #for p in range(len(self.))
        
            
        # 3. Update weights of output neurons
        for o in range(len(self.output_layer.neurons)):
            for w_ho in range(len(self.output_layer.neurons[o].weights)):
                
                # ∂Eⱼ/∂wᵢⱼ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢⱼ
                pd_error_wrt_weight = pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].pd_total_net_input_wrt_weight(w_ho)
                
                # Δw = α * ∂Eⱼ/∂wᵢ
                self.output_layer.neurons[o].weights[w_ho] -= self.LEARNING_RATE * pd_error_wrt_weight
                
        # 4. Update hidden neuron weights
        for h in range(len(self.hidden_layer.neurons)):
            for w_ih in range(len(self.hidden_layer.neurons[h].weights)):
                
                # ∂Eⱼ/∂wᵢ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢ
                pd_error_wrt_weight = pd_errors_wrt_hidden_neuron_total_net_input[h] * self.hidden_layer.neurons[h].pd_total_net_input_wrt_weight(w_ih)
                
                # Δw = α * ∂Eⱼ/∂wᵢ
                self.hidden_layer.neurons[h].weights[w_ih] -= self.LEARNING_RATE * pd_error_wrt_weight
                
            # UPDATE HIDDEN LAYER 2
                
                
        
    def calculate_total_error(self, training_sets):
        total_error = 0
        for t in range(len(training_sets)):
            training_inputs, training_outputs = training_sets[t]
            self.feed_forward(training_inputs)
            for o in range(len(training_outputs)):
                total_error += self.output_layer.neurons[o].sqr_error(training_outputs[o])
        return total_error

In [5]:
'''
nn = NeuralNetwork(2, 2, 2, 
                   hidden_layer_weights=[0.15, 0.2, 0.25, 0.3], 
                   hidden_layer_bias=0.35, 
                   output_layer_weights=[0.4, 0.45, 0.5, 0.55], 
                   output_layer_bias=0.6)

for i in range(10000):
    nn.train([0.05, 0.1], [0.01, 0.99])
    
print(i, round(nn.calculate_total_error([[[0.05, 0.1], [0.01, 0.99]]]), 9))
'''


(num_inputs, 
num_hidden, 
num_outputs, 
hidden_layer_weights=None, 
hidden_layer_bias=None, 
output_layer_weights=None, 
output_layer_bias=None)

#### Load phoneme data set 

In [6]:
data = pd.read_csv('five_phonemes.txt', sep=',')

In [7]:
data.shape

(4509, 259)

In [8]:
print(data[0:5])

   row.names       x.1       x.2       x.3       x.4       x.5       x.6  \
0          1   9.85770   9.20711   9.81689   9.01692   9.05675   8.92518   
1          2  13.23079  14.19189  15.34428  18.11737  19.53875  18.32726   
2          3  10.81889   9.07615   9.77940  12.20135  12.59005  10.53364   
3          4  10.53679   9.12147  10.84621  13.92331  13.52476  10.27831   
4          5  12.96705  13.69454  14.91182  18.22292  18.45390  17.25760   

        x.7       x.8       x.9         ...              x.249     x.250  \
0  11.28308  11.52980  10.79713         ...           12.68076  11.20767   
1  17.34169  17.16861  19.63557         ...            8.45714   8.77266   
2   8.54693   9.46049  11.96755         ...            5.00824   5.51019   
3   8.97459  11.57109  12.35839         ...            5.85688   5.40324   
4  17.79614  17.76387  18.99632         ...            8.00151   7.58624   

      x.251     x.252     x.253     x.254     x.255    x.256    g  \
0  13.69394  13.7

#### Convert Data Frame Into Numpy Array

In [9]:
# .values returns the underlying NumPy array (as_matrix() is no longer available in pandas >= 1.0)
data_set = data.values

In [11]:
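# Parse data values: columns 1-256 (x.1 ... x.256) for rows 1-4508
# (the row slice starts at 1, so row 0 of data_set is not included)
X_phonemes = data_set[1:4509, 1:257]

# Parse labels: column 257 (g) for the same rows; the final column (index 258) is left out
y_phonemes = data_set[1:4509, 257]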

In [12]:
print("Data: {}".format(X_phonemes.shape))
print("Labels: {}".format(y_phonemes.shape))

Data: (4508, 256)
Labels: (4508,)


In [13]:
print(X_phonemes[0:4])

[[13.230789999999999 14.191889999999999 15.34428 ... 5.38504 9.43063
  8.59328]
 [10.81889 9.07615 9.7794 ... 6.584160000000001 6.270580000000001
  3.8504199999999997]
 [10.53679 9.12147 10.846210000000001 ... 3.63384 3.22823 4.63123]
 [12.96705 13.69454 14.91182 ... 7.036 7.01278 8.52197]]


In [14]:
print(y_phonemes[0:4])

['iy' 'dcl' 'dcl' 'aa']


#### Generate Test & Training Sets

In [15]:
# Allocate 2/3 of the data set as training & 1/3 as testing
X_train, X_test, y_train, y_test = train_test_split(X_phonemes, y_phonemes, test_size=0.33)

In [16]:
# print data & label set dimensionality for verification
print("Phoneme Training Data: {}".format(X_train.shape))
print("Phoneme Training Labels: {}".format(y_train.shape))
print("Phoneme Testing Data: {}".format(X_test.shape))
print("Phoneme Testing Labels: {}".format(y_test.shape))

Phoneme Training Data: (3020, 256)
Phoneme Training Labels: (3020,)
Phoneme Testing Data: (1488, 256)
Phoneme Testing Labels: (1488,)


In [17]:
print(X_train[0:4])

[[11.702960000000001 14.042620000000001 16.26874 ... 5.6684
  7.952719999999999 7.72456]
 [8.10171 15.73148 15.967529999999998 ... 3.94521 6.4263900000000005
  5.12735]
 [13.88137 16.639570000000003 17.941570000000002 ... 10.183689999999999
  9.45748 6.642580000000001]
 [12.787 10.38592 10.02926 ... 15.405170000000002 14.671560000000001
  14.60944]]


In [18]:
print(X_test[0:4])

[[10.79055 10.34958 11.25291 ... 5.7674900000000004 5.99896
  7.174919999999999]
 [13.51566 9.73408 16.74912 ... 10.370510000000001 10.78328
  11.339830000000001]
 [12.40045 15.39192 18.76651 ... 8.49279 9.160210000000001 9.4957]
 [12.42842 12.4803 17.085810000000002 ... 7.233639999999999 9.25057
  4.22591]]


In [19]:
print(y_train[0:4])

['ao' 'ao' 'aa' 'sh']


In [20]:
print(y_test[0:4])

['dcl' 'iy' 'aa' 'iy']


#### Convert the Phoneme Classifiers from Strings to Numbers

In [21]:
def convert_string_class_to_int_class(y):
    for i in range(len(y)):
    
        if y[i] == 'aa':
            y[i] = [0]
        elif y[i] == 'ao':
            y[i] = [1]
        elif y[i] == 'dcl':
            y[i] = [2]
        elif y[i] == 'iy':
            y[i] = [3]
        elif y[i] == 'sh':
            y[i] = [4]
            
    return y

In [22]:
y_int_train = convert_string_class_to_int_class(y_train)
y_int_test = convert_string_class_to_int_class(y_test)
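
The conversion above stores each class as a one-element list such as `[4]`. A sigmoid output lies between 0 and 1, so a single output neuron cannot reach a target like 4. One common alternative, offered here as a sketch rather than as the assignment's method, is one-hot encoding with one output value per class. The names `PHONEMES` and `one_hot` are hypothetical, and the class order simply follows the `if`/`elif` chain above.

```python
PHONEMES = ['aa', 'ao', 'dcl', 'iy', 'sh']

def one_hot(labels, classes=PHONEMES):
    # Build a new list of target vectors; the input labels are left untouched
    index = {c: k for k, c in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0
        encoded.append(row)
    return encoded

targets = one_hot(['iy', 'dcl', 'dcl', 'aa'])
for row in targets:
    print(row)
```

With targets of this shape, the output layer would need one neuron per class (five here), and the predicted class is the index of the largest output.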

#### Standardize Data to Obtain Similar Inputs & Weight Magnitudes

In [23]:
# set axis to 1 to standardize by sample/vector, rather than by feature 
X_train = preprocessing.scale(X_train, axis=1)
X_test = preprocessing.scale(X_test, axis=1)
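
To make the effect of `axis=1` concrete, the sketch below standardizes each row of a small array by hand with NumPy. It assumes the population standard deviation (`ddof=0`) and is meant as an illustration of per-sample scaling, not as a replacement for `preprocessing.scale`; `standardize_rows` is a hypothetical helper, and the sample values are the first four features of the first two rows printed earlier.

```python
import numpy as np

def standardize_rows(X):
    # Subtract each row's mean and divide by each row's standard deviation
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # leave constant rows at zero instead of dividing by zero
    return (X - mean) / std

sample = np.array([[9.85770, 9.20711, 9.81689, 9.01692],
                   [13.23079, 14.19189, 15.34428, 18.11737]])
scaled = standardize_rows(sample)
print(scaled.mean(axis=1))
print(scaled.std(axis=1))
```

Each row ends up with mean 0 and standard deviation 1 (up to floating-point error), so every sample is scaled independently of the others.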



#### After preprocessing the data matrices, add in their corresponding labels

In [24]:
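# build a list of (inputs, label) pairs; random.choice needs a sequence, and zip() alone is an iterator in Python 3
Xy_train = list(zip(X_train, y_train))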

In [25]:
len(X_train)

3020

In [26]:
len(y_train)

3020

In [30]:
y_train

array([list([1]), list([1]), list([0]), ..., list([3]), list([3]),
       list([2])], dtype=object)

In [27]:
nn = NeuralNetwork(len(X_train), 10, len(y_train))

In [29]:
for i in range(10000):
    training_inputs, training_outputs = random.choice(Xy_train)
    print("Index of training data: {}".format(i))
    print("Value training outputs: {}".format(training_outputs))
    print("Value training inputs: {} \n".format(training_inputs))
    nn.train(training_inputs, training_outputs) # CAN'T GRAB 2nd ELEMENT in t_o
    print(i, nn.calculate_total_error(Xy_train))

Index of training data: 0
Value training outputs: [2]
Value training inputs: [ 2.13944358  2.20378961  3.75827809  3.54194155  2.93383709  4.15060479
  4.28956833  2.99426625  3.54235789  3.62781112  2.51959242  1.22994925
  0.24503212  0.27828432  0.65431361  0.82057463 -0.08280065  1.28999495
  1.34962431  0.69211265 -0.47025182  0.57990975  0.8220099   1.1865406
  1.16878601  0.68540743 -0.79128623  0.72400628  0.19097948 -0.38067356
 -0.26268575  0.71417853 -0.23900383  0.51720168  0.48853467  0.18762687
 -0.47324287  0.14436614  0.59682619 -1.38507637  0.36699156  0.3354814
 -0.65281573 -2.3849159  -0.21261571  0.28711506 -0.50984769 -1.93163961
 -0.09469913  0.57529717  1.00439303  0.9284279  -0.57920072  0.20226989
 -0.32548147 -0.89556777 -0.25793622  0.73836992  0.41672744 -0.63752629
 -0.29682541  0.01125324 -0.12918391 -0.15464622 -0.04288704 -0.60641603
  0.34397249  0.29855338 -1.35770219  0.02354067  0.60072661  0.38290551
  0.35703233  0.48243752  0.56934245  0.14369233 

IndexError: list index out of range
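
The `IndexError` above appears consistent with how the network was sized: `NeuralNetwork(len(X_train), 10, len(y_train))` passes the number of samples (3020) as both the input count and the output count, while each target is a one-element list, so `training_outputs[o]` runs out of entries once `o` reaches 1. A possible way to derive the sizes from the data instead is sketched below. It assumes one input per feature and one-hot targets with one entry per class; `network_sizes` is a hypothetical helper and the sample data is made up for the illustration.

```python
def network_sizes(samples, targets):
    """Return (num_inputs, num_outputs) implied by the feature and target vectors."""
    num_inputs = len(samples[0])
    num_outputs = len(targets[0])
    # Every sample and every target must agree with those sizes
    if any(len(s) != num_inputs for s in samples):
        raise ValueError("samples have inconsistent feature counts")
    if any(len(t) != num_outputs for t in targets):
        raise ValueError("targets have inconsistent lengths")
    return num_inputs, num_outputs

# Stand-in data: two samples with 256 features and one-hot targets over 5 classes
samples = [[0.1] * 256, [0.2] * 256]
targets = [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]]

num_inputs, num_outputs = network_sizes(samples, targets)
print(num_inputs, num_outputs)
# These values would then be passed as NeuralNetwork(num_inputs, 10, num_outputs),
# using the class defined earlier in this notebook.
```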