# * Kossi Neroma
# * Marin Bouthemy

<h1 style="text-align: center;">Non-Convex Finite-Sum Optimization Via SCSG Methods</h1>

>**The paper** : [Non-Convex Finite-Sum Optimization Via SCSG Methods](https://arxiv.org/abs/1706.09156)

>Here, we apply [SCSG](https://arxiv.org/abs/1706.09156) to convex and non-convex optimization problems, mainly neural network loss functions. <br> <br> 
The first example, although modeled as a neural network, is just a logistic regression (a two-layer perceptron), whose loss function is convex.<br> <br>
The second one concerns a deeper neural network, whose cost function is no longer convex. Can our algorithm still find the optimal solution and thus reach the best possible accuracy? Let's see ...
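
The convexity claim for the first example can be checked directly. As a sketch, assume the model is a softmax regression with logits $z = Wx$, a one-hot label $y$ and a cross-entropy loss (the symbols $W$, $z$, $p$ and $v$ are introduced here for illustration and do not come from the notebook):

$$
\ell(W) = -\sum_k y_k \log p_k = \log\sum_k e^{z_k} - y^\top z,
\qquad p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}
$$

$$
\nabla_z^2 \ell = \operatorname{diag}(p) - p p^\top,
\qquad
v^\top \left(\operatorname{diag}(p) - p p^\top\right) v
= \sum_k p_k v_k^2 - \Big(\sum_k p_k v_k\Big)^2 \ge 0
$$

The Hessian with respect to $z$ is positive semidefinite and $z$ is linear in $W$, so $\ell$ is convex in $W$. With a hidden layer and a nonlinearity, $z$ is no longer linear in the weights and this argument no longer applies.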

In [17]:
import importlib

import numpy as np
import tensorflow as tf

import models
importlib.reload(models)  # reload so that local edits to models.py are picked up

# Two objects that implement, respectively, stochastic gradient descent (SGD)
# and the stochastically controlled stochastic gradient method (SCSG)
from models import SGD, SCSG

# 1. Load the data
> Our toy dataset is **MNIST**, a classical database of *70,000 handwritten digits*. Recall that this dataset is natively available in TensorFlow, our main library for gradient computation (**automatic differentiation**) and neural network modeling.

In [14]:
mnist = tf.keras.datasets.mnist

# Load the images and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def one_hot(x):
    """Convert integer labels to one-hot row vectors."""
    v = np.zeros((len(x), x.max() + 1))
    v[np.arange(len(x)), x] = 1
    return v


y_train, y_test = one_hot(y_train), one_hot(y_test)
# Flatten each 28x28 image into a 784-dimensional vector
x_train, x_test = x_train.reshape((-1, 28*28)), x_test.reshape((-1, 28*28))

x_train.shape, x_test.shape, y_train.shape, y_test.shape

((60000, 784), (10000, 784), (60000, 10), (10000, 10))

In [15]:
model = SGD(0.5, 50, 10)
model.archi = [28*28, 256, 128, 10]

model.fit(x_train, y_train, x_test, y_test)

Epoch: 1 cost = 0.31529 acuracy:  0.9604
Epoch: 2 cost = 0.10463 acuracy:  0.9629
Epoch: 3 cost = 0.07484 acuracy:  0.96416664
Epoch: 4 cost = 0.05787 acuracy:  0.9669
Epoch: 5 cost = 0.04567 acuracy:  0.96666
Epoch: 6 cost = 0.03752 acuracy:  0.9676167
Epoch: 7 cost = 0.02949 acuracy:  0.9675429
Epoch: 8 cost = 0.02593 acuracy:  0.96895
Epoch: 9 cost = 0.02545 acuracy:  0.97003335
Epoch: 10 cost = 0.02046 acuracy:  0.97071


In [18]:
model = SCSG(0.5, 50, 10)
model.archi = [28*28, 256, 128, 10]

model.fit(x_train, y_train, x_test, y_test, eta=0.5, B=50, b=50)

Epoch: 1 cost: 0.24301 acuracy:  0.1028
Epoch: 2 cost: 0.36431 acuracy:  0.1004
Epoch: 3 cost: 0.09005 acuracy:  0.1207
Epoch: 4 cost: 0.05360 acuracy:  0.143275
Epoch: 5 cost: 0.04405 acuracy:  0.13422
Epoch: 6 cost: 0.03613 acuracy:  0.14115
Epoch: 7 cost: 0.03053 acuracy:  0.13802858
Epoch: 8 cost: 0.02518 acuracy:  0.1489125
Epoch: 9 cost: 0.02324 acuracy:  0.17042223
Epoch: 10 cost: 0.01870 acuracy:  0.18746
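
The `eta`, `B` and `b` arguments above mirror the step size, batch size and mini-batch size of the SCSG paper. The NumPy sketch below shows one way the loop could look. It is not the code of `models.SCSG`, which is not shown here. The names `scsg`, `grad` and `outer_iters`, and the toy least-squares problem, are assumptions made for illustration. As simplifications, the step size and batch sizes are held constant and the last iterate is returned, whereas the paper returns a randomly chosen one.

```python
import numpy as np

def scsg(grad, x0, n, eta, B, b, outer_iters, seed=0):
    """Sketch of SCSG; grad(x, idx) returns the mean gradient over samples idx."""
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_iters):
        # Anchor gradient on a random batch of size B (not on the full dataset).
        batch = rng.choice(n, size=B, replace=False)
        g = grad(x_tilde, batch)
        # Number of inner steps: geometric on {0, 1, ...} with mean B / b.
        n_inner = int(rng.geometric(b / (B + b))) - 1
        x = x_tilde.copy()
        for _ in range(n_inner):
            # Mini-batch of size b, drawn here from the whole dataset.
            mini = rng.choice(n, size=b, replace=False)
            # Mini-batch gradient corrected by the anchor (variance reduction).
            nu = grad(x, mini) - grad(x_tilde, mini) + g
            x -= eta * nu
        x_tilde = x
    return x_tilde

# Hypothetical toy problem: noise-free least squares.
rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
y = A @ x_true

def grad(x, idx):
    """Mean gradient of 0.5 * (a_i . x - y_i)^2 over the samples in idx."""
    residual = A[idx] @ x - y[idx]
    return A[idx].T @ residual / len(idx)

def loss(x):
    return 0.5 * np.mean((A @ x - y) ** 2)

x_hat = scsg(grad, np.zeros(d), n, eta=0.05, B=50, b=10, outer_iters=200)
initial_loss, final_loss = loss(np.zeros(d)), loss(x_hat)
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

In this sketch, setting `B = b = 50` as in the call above gives an average of `B / b = 1` inner step per outer iteration. Choosing `B` larger than `b` makes each anchor gradient serve several cheaper mini-batch steps.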
