## Optimization of NN code

### ToDo-Steps
- Problem statement
  - original code
  - what we chose based on profiling
- Implementation
  - parallel computing impl
  - gpu acceleration
  - compiling or cython
- Results & analysis
  - profiling serial vs parallel
  - performance
  - memory?
  - improvements
- Critical analysis
  - bottlenecks discussed
  - how we optimized
  - future improvements

### Profiling
1. We first profile the whole program with cProfile. The `scipy.optimize` call (`fmin_cg`) accounts for most of the runtime (27.2 s of 32.1 s cumulative), and most of that is spent in the `gradient` function of the given code (22.7 s cumulative, 11.6 s own time). We therefore look more closely at `main` and `gradient`.
We set `maxiter=50` to keep the profiling wall time reasonable, which is why the per-iteration entries (the line search `_line_search_wolfe12` and the callback) show ncalls = 50; `fmin_cg` itself is called once.
command: `python -m cProfile -s cumulative .\artificialneuralnetwork.py`

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    673/1    0.010    0.000   32.086   32.086 {built-in method builtins.exec}
        1    0.003    0.003   32.086   32.086 artificialneuralnetwork.py:1(<module>)
        1    0.178    0.178   30.336   30.336 artificialneuralnetwork.py:223(main)
        1    0.000    0.000   27.184   27.184 _optimize.py:1512(fmin_cg)
        1    0.002    0.002   27.183   27.183 _optimize.py:1695(_minimize_cg)
       50    0.002    0.000   24.471    0.489 _optimize.py:1139(_line_search_wolfe12)
       50    0.004    0.000   24.464    0.489 _linesearch.py:37(line_search_wolfe1)
       50    0.001    0.000   24.460    0.489 _linesearch.py:100(scalar_search_wolfe1)
       50    0.002    0.000   24.459    0.489 _dcsrch.py:201(__call__)
       97    0.001    0.000   22.751    0.235 _differentiable_functions.py:303(_update_grad)
       96    0.002    0.000   22.751    0.237 _differentiable_functions.py:39(wrapped)
       96   11.625    0.121   22.748    0.237 artificialneuralnetwork.py:100(gradient)
       95    0.006    0.000   22.572    0.238 _linesearch.py:86(derphi)
       96    0.001    0.000   22.566    0.235 _differentiable_functions.py:329(grad)
   518400    2.474    0.000    5.423    0.000 shape_base.py:220(vstack)
       81    0.005    0.000    3.552    0.044 __init__.py:1(<module>)
   778202    3.453    0.000    3.453    0.000 artificialneuralnetwork.py:19(g)
      998    3.046    0.003    3.054    0.003 shape_base.py:295(hstack)
        2    1.393    0.696    2.904    1.452 _npyio_impl.py:1747(genfromtxt)
      198    0.441    0.002    2.713    0.014 artificialneuralnetwork.py:61(cost_function)
   518401    1.452    0.000    2.619    0.000 shape_base.py:80(atleast_2d)
       50    0.000    0.000    2.500    0.050 _util.py:1058(_call_callback_maybe_halt)
       50    0.001    0.000    2.500    0.050 _optimize.py:105(wrapped_callback)
       50    0.366    0.007    2.499    0.050 artificialneuralnetwork.py:165(callbackF)
       97    0.001    0.000    1.884    0.019 _differentiable_functions.py:293(_update_fun)
       96    0.267    0.003    1.883    0.020 _differentiable_functions.py:16(wrapped)
       95    0.003    0.000    1.879    0.020 _linesearch.py:82(phi)
       96    0.001    0.000    1.876    0.020 _differentiable_functions.py:323(fun)

```
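
The same cumulative-time report can also be produced from inside Python, which makes it easier to profile a single call instead of the whole script. The following is a minimal sketch using only the standard library; `busy_work` and `profile_call` are hypothetical names introduced for illustration and are not part of artificialneuralnetwork.py.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Stand-in workload (hypothetical); replace with the call to profile."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=10):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering as `-s cumulative` on the command line,
    # restricted to the first `top` entries.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(busy_work, 100_000)
print(report)
```

Limiting the report to the first few entries keeps the output focused on the functions that dominate the cumulative time.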

2. Looking further into the code with line_profiler gives the following output for maxiter=50 (stored in the artificialneuralnetwork.py.lprof file).
We comment out all plt calls, since they are only used for plotting and are not part of the actual NN calculation.
`optimize.fmin_cg` is called only once in `main`, but the cProfile output shows that it iterates internally until it converges or maxiter is reached, calling `gradient` 96 times in this run.

```
Timer unit: 1e-06 s

Total time: 22.9732 s
File: .\artificialneuralnetwork.py
Function: gradient at line 95

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    95                                           @profile
    96                                           def gradient(theta, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    97                                                  """ Neural net cost function gradient for a three layer classification network.
    98                                                  Input:
    99                                                    theta               flattened vector of neural net model parameters
   100                                                    input_layer_size    size of input layer
   101                                                    hidden_layer_size   size of hidden layer
   102                                                    num_labels          number of labels
   103                                                    X                   matrix of training data
   104                                                    y                   vector of training labels
   105                                                    lmbda               regularization term
   106                                                  Output:
   107                                                    grad                flattened vector of derivatives of the neural network
   108                                                  """
   109
   110                                                  # unflatten theta
   111        96       1276.3     13.3      0.0         Theta1, Theta2 = reshape(theta, input_layer_size, hidden_layer_size, num_labels)
   112
   113                                                  # number of training values
   114        96        111.7      1.2      0.0         m = len(y)
   115
   116                                                  # Backpropagation: calculate the gradients Theta1_grad and Theta2_grad:
   117
   118        96        698.7      7.3      0.0         Delta1 = np.zeros((hidden_layer_size,input_layer_size+1))
   119        96        137.2      1.4      0.0         Delta2 = np.zeros((num_labels,hidden_layer_size+1))
   120
   121    259296     129343.6      0.5      0.6         for t in range(m):
   122
   123                                                          # forward
   124    259200     329572.6      1.3      1.4                 a1 = X[t,:].reshape((input_layer_size,1))
   125    259200    4276086.8     16.5     18.6                 a1 = np.vstack((1, a1))   #  +bias
   126    259200    1065144.2      4.1      4.6                 z2 = Theta1 @ a1
   127    259200    1361719.6      5.3      5.9                 a2 = g(z2)
   128    259200    3857658.1     14.9     16.8                 a2 = np.vstack((1, a2))   #  +bias
   129    259200    1826285.1      7.0      7.9                 a3 = g(Theta2 @ a2)
   130
   131                                                          # compute error for layer 3
   132    259200     241024.6      0.9      1.0                 y_k = np.zeros((num_labels,1))
   133    259200     721563.8      2.8      3.1                 y_k[y[t,0].astype(int)] = 1
   134    259200     324440.5      1.3      1.4                 delta3 = a3 - y_k
   135    259200     799152.8      3.1      3.5                 Delta2 += (delta3 @ a2.T)
   136
   137                                                          # compute error for layer 2
   138    259200    2878449.2     11.1     12.5                 delta2 = (Theta2[:,1:].T @ delta3) * grad_g(z2)
   139    259200    5151844.1     19.9     22.4                 Delta1 += (delta2 @ a1.T)
   140
   141        96       1159.4     12.1      0.0         Theta1_grad = Delta1 / m
   142        96        238.3      2.5      0.0         Theta2_grad = Delta2 / m
   143
   144                                                  # add regularization
   145        96       4407.2     45.9      0.0         Theta1_grad[:,1:] += (lmbda/m) * Theta1[:,1:]
   146        96       1066.1     11.1      0.0         Theta2_grad[:,1:] += (lmbda/m) * Theta2[:,1:]
   147
   148                                                  # flatten gradients
   149        96       1483.2     15.5      0.0         grad = np.concatenate((Theta1_grad.flatten(), Theta2_grad.flatten()))
   150
   151        96        299.5      3.1      0.0         return grad

Total time: 97.0647 s
File: .\artificialneuralnetwork.py
Function: main at line 218

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   218                                           @profile
   219                                           def main():
   220                                                  """ Artificial Neural Network for classifying galaxies """
   221
   222                                                  # set the random number generator seed
   223         1         31.8     31.8      0.0         np.random.seed(917)
   224
   225                                                  # Load the training and test datasets
   226         1   54975339.3    5e+07     56.6         train = np.genfromtxt('train.csv', delimiter=',')
   227         1   13901090.6    1e+07     14.3         test = np.genfromtxt('test.csv', delimiter=',')
   228
   229                                                  # get labels (0=Elliptical, 1=Spiral, 2=Irregular)
   230         1         17.8     17.8      0.0         train_label = train[:,0].reshape(len(train),1)
   231         1          3.5      3.5      0.0         test_label = test[:,0].reshape(len(test),1)
   232
   233                                                  # normalize image data to [0,1]
   234         1       8045.8   8045.8      0.0         train = train[:,1:] / 255.
   235         1       2432.4   2432.4      0.0         test = test[:,1:] / 255.
   236
   237                                                  # Construct our data matrix X (2700 x 5000)
   238         1          1.5      1.5      0.0         X = train
   239
   240                                                  # Construct our label vector y (2700 x 1)
   241         1          0.5      0.5      0.0         y = train_label
   242
   243                                                  # Two layer Neural Network parameters:
   244         1         23.3     23.3      0.0         m = np.shape(X)[0]
   245         1          4.2      4.2      0.0         input_layer_size = np.shape(X)[1]
   246         1          0.4      0.4      0.0         hidden_layer_size = 8
   247         1          0.4      0.4      0.0         num_labels = 3
   248         1          0.9      0.9      0.0         lmbda = 1.0    # regularization parameter
   249
   250                                                  # Initialize random weights:
   251         1        150.5    150.5      0.0         Theta1 = np.random.rand(hidden_layer_size, input_layer_size+1) * 0.4 - 0.2
   252         1         10.0     10.0      0.0         Theta2 = np.random.rand(num_labels, hidden_layer_size+1) * 0.4 - 0.2
   253
   254                                                  # flattened initial guess
   255         1         59.5     59.5      0.0         theta0 = np.concatenate((Theta1.flatten(), Theta2.flatten()))
   256         1      15086.8  15086.8      0.0         J = cost_function(theta0, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)
   257         1        181.0    181.0      0.0         print('initial cost function J =', J)
   258         1      19496.0  19496.0      0.0         train_pred = predict(Theta1, Theta2, train)
   259         1        483.8    483.8      0.0         print('initial accuracy on training set =', np.sum(1.*(train_pred==train_label))/len(train_label))
   260                                                  global Js_train
   261                                                  global Js_test
   262         1         10.2     10.2      0.0         Js_train = np.array([J])
   263         1       3895.9   3895.9      0.0         J_test = cost_function(theta0, input_layer_size, hidden_layer_size, num_labels, test, test_label, lmbda)
   264         1          5.5      5.5      0.0         Js_test = np.array([J_test])
   265
   266                                                  # prep figure
   267                                                  # fig = plt.figure(figsize=(6,6), dpi=80)
   268
   269                                                  # Minimize the cost function using a nonlinear conjugate gradient algorithm
   270         1          0.7      0.7      0.0         args = (input_layer_size, hidden_layer_size, num_labels, X, y, lmbda)  # parameter values
   271         1          3.0      3.0      0.0         cbf = partial(callbackF, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda, test, test_label)
   272         1   28119811.0    3e+07     29.0         theta = optimize.fmin_cg(cost_function, theta0, fprime=gradient, args=args, callback=cbf, maxiter=50)
   273
   274                                                  # unflatten theta
   275         1         11.6     11.6      0.0         Theta1, Theta2 = reshape(theta_best, input_layer_size, hidden_layer_size, num_labels)
   276
   277                                                  # Make predictions for the training and test sets
   278         1      13722.0  13722.0      0.0         train_pred = predict(Theta1, Theta2, train)
   279         1       4037.0   4037.0      0.0         test_pred = predict(Theta1, Theta2, test)
   280
   281                                                  # Print accuracy of predictions
   282         1        404.6    404.6      0.0         print('accuracy on training set =', np.sum(1.*(train_pred==train_label))/len(train_label))
   283         1        304.7    304.7      0.0         print('accuracy on test set =', np.sum(1.*(test_pred==test_label))/len(test_label))
   284
   285                                                  # Save figure
   286                                                  # plt.savefig('artificialneuralnetwork.png',dpi=240)
   287                                                  # plt.show()
```
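
The `@profile` decorator in the listing above is injected into builtins by `kernprof -l`, so the decorated script raises a NameError when it is run with plain `python`. A common workaround, sketched here under the assumption that the decorators should stay in the file, is a no-op fallback; `square_sum` is a hypothetical stand-in for `gradient` or `main`.

```python
import builtins

# `kernprof -l` adds `profile` to builtins; outside kernprof the name is undefined.
if not hasattr(builtins, "profile"):
    def profile(func):
        """No-op stand-in so that @profile does not raise NameError."""
        return func


@profile
def square_sum(values):
    """Hypothetical example function standing in for gradient/main."""
    return sum(v * v for v in values)


print(square_sum([1, 2, 3]))
```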
3. Memory profiling with memory_profiler for n=50 and n=500; the outputs are in /01_profiling.
command: `python -m mprof run artificialneuralnetwork.py`

4. Based on these outputs, we choose to optimize the inputs of the `scipy.optimize` call, i.e. the functions passed to `optimize.fmin_cg(cost_function, theta0, fprime=gradient, args=args, callback=cbf, maxiter=50)`, above all `gradient`, which dominates the runtime.
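
The line profile attributes almost all of the time in `gradient` to the per-sample loop (the two `np.vstack` calls and the outer products). One possible direction, shown here only as a sketch and not as the implementation we settled on, is to process all samples in one batch of matrix operations. Assumptions: `g` is the logistic sigmoid with derivative `g(z) * (1 - g(z))` (the original `g` and `grad_g` are not shown in the profile), and `theta` uses the row-major layout implied by how `theta0` is built. No speedup has been measured for this sketch.

```python
import numpy as np


def g(z):
    """Logistic sigmoid (assumed activation; the original g is not shown)."""
    return 1.0 / (1.0 + np.exp(-z))


def gradient_vectorized(theta, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    """Batched backpropagation with the same signature as the profiled gradient."""
    # Unflatten theta (assumes the row-major layout used to build theta0).
    split = hidden_layer_size * (input_layer_size + 1)
    Theta1 = theta[:split].reshape(hidden_layer_size, input_layer_size + 1)
    Theta2 = theta[split:].reshape(num_labels, hidden_layer_size + 1)
    m = len(y)

    # Forward pass for all m samples at once; each bias column is added once.
    A1 = np.hstack((np.ones((m, 1)), X))          # (m, input+1)
    G2 = g(A1 @ Theta1.T)                         # (m, hidden)
    A2 = np.hstack((np.ones((m, 1)), G2))         # (m, hidden+1)
    A3 = g(A2 @ Theta2.T)                         # (m, labels)

    # One-hot encode the labels.
    Y = np.zeros((m, num_labels))
    Y[np.arange(m), y[:, 0].astype(int)] = 1.0

    # Backward pass; the matrix products sum the per-sample outer products.
    D3 = A3 - Y                                   # (m, labels)
    D2 = (D3 @ Theta2[:, 1:]) * G2 * (1.0 - G2)   # (m, hidden)
    Theta1_grad = (D2.T @ A1) / m
    Theta2_grad = (D3.T @ A2) / m

    # Regularization, skipping the bias column.
    Theta1_grad[:, 1:] += (lmbda / m) * Theta1[:, 1:]
    Theta2_grad[:, 1:] += (lmbda / m) * Theta2[:, 1:]
    return np.concatenate((Theta1_grad.ravel(), Theta2_grad.ravel()))


# Small synthetic example (made-up sizes, not the galaxy data).
rng = np.random.default_rng(0)
X_demo = rng.random((5, 4))
y_demo = rng.integers(0, 3, size=(5, 1)).astype(float)
theta_demo = rng.random(2 * (4 + 1) + 3 * (2 + 1)) * 0.4 - 0.2
grad_demo = gradient_vectorized(theta_demo, 4, 2, 3, X_demo, y_demo, 1.0)
print(grad_demo.shape)
```

Because the signature matches, such a function could be passed as `fprime` in place of `gradient`, provided its output is first checked against the original loop on the real data.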
