# PS1: Your first library-free neural network!  

Advanced Learning 2024


For SUBMISSION:   

Please upload the complete and executed `ipynb` to your git repository. Verify that all of your output can be viewed directly from github, and provide a link to that git file below.

~~~
STUDENT ID: 208066118  
~~~

~~~
STUDENT GIT LINK: https://github.com/atarporat/Adv.-computational-learning-and-data-analysis---52025
~~~
In Addition, don't forget to add your ID to the files:    
  
`PS1_Part2_HelloNN_2024_ID_[000000000].html`   


In [None]:
import numpy as np # You are allowed to use  only numpy.
import time


**Welcome**.   

In this part of the problem set you are set to build a complete and flexible neural network.  
This neural network will be library free (in the sense that we won't use PyTorch/Tensorflow/etc.).   

Let's do a quick review of the basic neural-network components:  


*   *Layer* - can be fully connected (dense/hidden), convolution, etc.
  * Forward propagation- the layer outputs the next layer's input
  * Backward propagation- the layer also outputs the gradient descent update
*   *Activation* Layer (e.g. ReLU) - there are no parameters, only gradients with respect to the input. We want to compute both the gradient w.r.t the parameters of the layer and to create the gradient with respect to the layer's inputs
   * *Forward propagation*- the layer outputs the next layer's input
   * *Backward propagation*- the layer also outputs the gradient descent update
*   *Loss Function* : how our model  quantifies the difference between the predicted outputs the actual (target) values  
*   *Network Wrapper*-  wraps our components together as a trainable model.






Useful resource:  
* Gradient descent for neural networks [cheat sheet](https://moodle4.cs.huji.ac.il/hu23/mod/resource/view.php?id=402297).
* Neural network architecture [cheat sheet](https://moodle4.cs.huji.ac.il/hu23/mod/url/view.php?id=402298).

### 0. Loading data

You are going to test and evaluate your home-made network on the `mnist` dataset.   
The MNIST dataset is a large dataset of handwritten digits that is commonly used for training various image and vision models.

In [None]:
!pip install keras



In [None]:
!pip install tensorflow


Collecting tensorflow
  Downloading tensorflow-2.18.0-cp312-cp312-macosx_12_0_arm64.whl.metadata (4.0 kB)
Collecting astunparse>=1.6.0 (from tensorflow)
  Downloading astunparse-1.6.3-py2.py3-none-any.whl.metadata (4.4 kB)
Collecting flatbuffers>=24.3.25 (from tensorflow)
  Downloading flatbuffers-24.3.25-py2.py3-none-any.whl.metadata (850 bytes)
Collecting gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 (from tensorflow)
  Downloading gast-0.6.0-py3-none-any.whl.metadata (1.3 kB)
Collecting google-pasta>=0.1.1 (from tensorflow)
  Downloading google_pasta-0.2.0-py3-none-any.whl.metadata (814 bytes)
Collecting libclang>=13.0.0 (from tensorflow)
  Downloading libclang-18.1.1-1-py2.py3-none-macosx_11_0_arm64.whl.metadata (5.2 kB)
Collecting opt-einsum>=2.3.2 (from tensorflow)
  Downloading opt_einsum-3.4.0-py3-none-any.whl.metadata (6.3 kB)
Collecting termcolor>=1.1.0 (from tensorflow)
  Downloading termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Collecting grpcio<2.0,>=1.24.3 (from tensorflow)
  

In [None]:

from keras.datasets import mnist
from keras.utils import to_categorical
# load MNIST from server
# Using a standard library (keras.datasets) to load the mnist data
(x_train, y_train), (x_test, y_test) = mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 0us/step


#### Data transformations





In [None]:
# training data : 60000 samples
# reshape and normalize input data
x_train = x_train.reshape(x_train.shape[0], 1, 28*28)
x_train = x_train.astype('float32')
x_train /= 255
# One-hot encoding of the output.
# Currently a number in range [0,9]; Change into a vector of size 10
# e.g. number 3 will become [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = to_categorical(y_train)
# same for test data : 10000 samples
x_test = x_test.reshape(x_test.shape[0], 1, 28*28)
x_test = x_test.astype('float32')
x_test /= 255
y_test = to_categorical(y_test)

### 1. Network's Components

Please fill-in the missing code in the code boxes below (only where  `#### SOLUTION REQUIRED ####` is specified).   

In [None]:

# This class is a general layer primitive, defining that each instance must
# have an (input,output) parameters, and 2 functions: forward+backward propogation
class Layer_Primitive:
    def __init__(self):
        self.input = None
        self.output = None

    # computes the output Y of a layer for a given input X
    def forward_propagation(self, input):
        raise NotImplementedError

    # computes dE/dX for a given dE/dY (and update parameters if any)
    def backward_propagation(self, output_error, learning_rate):
        raise NotImplementedError

#### Fully Connected Layer

A fully-connected layer (a.k.a. affine, dense,linear layer) connects every input neuron to every output neuron.   
It has 2 parameters: (input, output).   
You need to define (code) the following:
* its initialization weights with random weights.
* the forward propogation calculation (as shown in class).
* the backward propogation gradients calculation (given output, as shown in class).

Parameters must be intitialized with some values. There are many ways to initialize the weights, and you are encouraged to do a quick research about the common methods. Any commonly used method will be accepted.  

1.1 (20 pts)

In [None]:
#### SOLUTION REQUIRED ####
# inherit from base class Layer
class Affine_Layer(Layer_Primitive):
    # input_size = number of input neurons
    # output_size = number of output neurons
    def __init__(self, input_size, output_size):
        self.weights = np.random.randn(input_size, output_size) * 0.01
        self.bias =  np.random.randn(1, output_size)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # returns output for a given input
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.input, self.weights) + self.bias  # Linear combination
        return self.output

    # computes dE/dW, dE/dB for a given output_error=dE/dY. Returns input_error=dE/dX.
    def backward_propagation(self, output_grad, learning_rate):
        input_error = np.dot(output_grad, self.weights.T) #dE/dX = dE/dY * dY/dX
        bias_error = output_grad.sum(axis=0, keepdims=True)  # dE/dB = sum(dE/dY)
        weights_error = self.input.T @ output_grad  # dE/dW = X^T * dE/dY

        # update parameters
        self.weights -= learning_rate * weights_error
        self.bias -= learning_rate * bias_error
        return input_error



In [None]:
# Tests

# dimensions
layer = Affine_Layer(input_size=4, output_size=2)
assert layer.forward_propagation(np.zeros(4)).shape == (2,)
assert layer.forward_propagation(np.zeros(shape=(10,4))).shape == (10,2)
layer.weights = np.zeros((4,2))
layer.bias = np.zeros(2)
assert (layer.forward_propagation(np.zeros(shape=(10,4))) == np.zeros((10,2))).all()

# d/dx ((Ax + b - y) ^ 2)
# z = Ax + b
# d/dz((Ax + b - y) ^ 2) = d/dz((z-y)^2) = 2(z-y)
layer = Affine_Layer(input_size=4, output_size=2)

x = np.array(((1,2,3,4),))
y = np.array((2,3))
z = layer.forward_propagation(x)
output_grad = 2 * (z - y)

print(f"{layer.weights.shape=}")
print(f"{output_grad.shape=}")
print(f"{x.shape=}")

print(layer.backward_propagation(output_grad, learning_rate=0))

epsilon = np.array(((0.001,0,0,0)))
norm1 = np.linalg.norm(layer.forward_propagation(x) - y)
norm2 =  np.linalg.norm(layer.forward_propagation(x + epsilon) - y)

print((norm2 - norm1) / 0.001)



layer.weights.shape=(4, 2)
output_grad.shape=(1, 2)
x.shape=(1, 4)
[[ 8.73375033 33.67817073 -9.30931243  6.53762313]]
0.5365056374753152


#### Activation layers

Activation functions are often a non-linear functions that aid in how well the network model adapts to and learns  the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.  



In [None]:
# inherit from base class Layer
class ActivationLayer(Layer_Primitive):
    def __init__(self, activation, activation_grad):
        self.activation = activation
        self.activation_grad = activation_grad

    # returns the activated input
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = self.activation(self.input)
        return self.output

    # Returns input_error=dE/dX for a given output_grad=dE/dY.
    # learning_rate is not used because there is no "learnable" parameters.
    def backward_propagation(self, output_grad, learning_rate):
        return self.activation_grad(self.input) * output_grad



You need to define (code) the following via different functions:
* the forward propogation calculation (as shown in class).
* the backward propogation gradients calculation (given output, as shown in class).

1.2 (20 pts)

In [None]:
#### SOLUTION REQUIRED ####

# activation functions and their derivatives:

def tanh(x):
    return np.tanh(x)

def tanh_grad(x):
    return 1 - np.tanh(x)**2

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return np.where(x > 0, 1, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)

def grad_softmax(x):
  sm = softmax(x)
  return sm * (1 - sm)

#### Loss function

1.3 (10 pts)

In [None]:
#### SOLUTION REQUIRED ####

# loss function and its derivative

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)

def mse_grad(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.shape[0]

#### Putting everything together

1.4 (10 pts)

In [None]:
#### SOLUTION REQUIRED (in `predict`) ####

class MyNetwork:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_grad = None

    # add layer to network
    def add(self, layer):
        self.layers.append(layer)

    # set loss to use
    def use_loss(self, loss, loss_grad):
        self.loss = loss
        self.loss_grad = loss_grad


    # train the network
    def fit(self, x_train, y_train, epochs, learning_rate):
        # sample dimension first
        samples = len(x_train)

        # training loop
        for i in range(epochs):
            err = 0
            for j in range(samples):
                # forward propagation
                output = x_train[j]
                for layer in self.layers:
                    output = layer.forward_propagation(output)

                # compute loss (for display purpose only)
                err += self.loss(y_train[j], output)

                # backward propagation
                grad = self.loss_grad(y_train[j], output)
                for layer in reversed(self.layers):
                    grad = layer.backward_propagation(grad, learning_rate)

            # calculate average error on all samples
            err /= samples
            print('Training epoch %d/%d   error=%f' % (i+1, epochs, err))


    # predict output for given input
    def predict(self, x_test,y_test=np.array([])):
        if y_test.size:
           assert len(x_test)==len(y_test) # if Y is given
        # sample dimension first
        samples = len(x_test)
        result = []
        loss = 0
        correct = 0
        # run network over all samples
        for i in range(samples):
            # forward propagation
            output = x_test[i]
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)
            # ONLY IF LABELS ARE GIVEN (Y):
            if y_test.size:
                # Evaluate the output against Y,
                # calculate loss against Y, add to `loss`:
                loss += self.loss(y_test[i], output)
                target = y_test[i]
                # Evaluate the label of the output against real, and if identical,
                # add +1 to `correct`:
                if np.argmax(output) == np.argmax(target): # if the model's predicted class is the same as the true class
                   correct += 1
        if y_test.size:
            mean_loss = loss/samples

            print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.
                  format(mean_loss, correct, samples,100. * correct / samples))

        return result


## 2. Testing Your Neural Network

### Defining our main neural network architecture

Define your network's architecture:  
(Please rationalize your choice of activation funciton.)
* first affine layer that takes your input and outputs 128 nodes
* `tanh/relu/sigmoid` activation layer following the first affine layer
* second affine layer that takes the first layer's input and outputs 64 nodes
* `tanh/relu/sigmoid` activation layer following the second affine layer
* third affine layer that takes your second layer's input and outputs nodes in the size of the Y labels.
* `tanh/relu/sigmoid` activation layer following the last affine layer


2.1 (5 pts)

In [None]:
#### SOLUTION REQUIRED (in `predict`) ####

# Network Architecture
net = MyNetwork()
net.add(Affine_Layer(input_size=28*28, output_size=128))
net.add(ActivationLayer(relu, relu_grad))
net.add(Affine_Layer(input_size=128, output_size=64))
net.add(ActivationLayer(relu, relu_grad))
net.add(Affine_Layer(input_size=64, output_size=10)) #the MNIST dataset contains 10 classes, representing digits from 0 to 9
net.add(ActivationLayer(sigmoid, sigmoid_grad))


### Training!

In [None]:

# While developing, it is recommended to train your model on a subset of the data... / or low epochs.
# as we didn't implemented mini-batch GD, training will be pretty slow if we update at each iteration on 60000 samples...
net.use_loss(mse, mse_grad)
epoch_num = 20
lr = 0.01
t1 = time.time()
net.fit(x_train, y_train, epochs=epoch_num, learning_rate=lr)

print(f"Total process time: {round(time.time() - t1,3)}")


Training epoch 1/20   error=0.090148
Training epoch 2/20   error=0.081068
Training epoch 3/20   error=0.059314
Training epoch 4/20   error=0.035949
Training epoch 5/20   error=0.023287
Training epoch 6/20   error=0.018422
Training epoch 7/20   error=0.015877
Training epoch 8/20   error=0.013913
Training epoch 9/20   error=0.012355
Training epoch 10/20   error=0.011132
Training epoch 11/20   error=0.010146
Training epoch 12/20   error=0.009311
Training epoch 13/20   error=0.008598
Training epoch 14/20   error=0.007984
Training epoch 15/20   error=0.007454
Training epoch 16/20   error=0.007001
Training epoch 17/20   error=0.006605
Training epoch 18/20   error=0.006259
Training epoch 19/20   error=0.005954
Training epoch 20/20   error=0.005678
Total process time: 5624.806


### Evaluation

Exciting! Now is the time to test your model.   

    May the gradients be always in your favor.

In [None]:
output = net.predict(x_test ,y_test )


Test set: Avg. loss: 0.0065, Accuracy: 9614/10000 (96%)



## 3. Benchmarking against PyTorch

How well your model performs against a similar-architecture PyTorch model?   
It is time to find out:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset

#### Prepare the data as tensors using PyTorch DataLoader:

In [None]:
t_train =  TensorDataset(torch.Tensor(x_train),torch.Tensor(y_train))
t_test =  TensorDataset(torch.Tensor(x_test),torch.Tensor(y_test))
train_loader = torch.utils.data.DataLoader(dataset=t_train, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=t_test, batch_size=64, shuffle=False)

Define a `PyTorchNet` class with an identical architecture you used in your home-made network.

3.1 (10 pts)

In [None]:
#### SOLUTION REQUIRED  ####

class PyTorchNet(nn.Module):
    def __init__(self):
        super(PyTorchNet, self).__init__()
        input_size = x_train.shape[2]
        num_classes = y_test.shape[1]

        self.fc1 = nn.Linear(input_size, 128)  # First fully connected layer
        self.activ1 = nn.ReLU()  # First ReLU activation
        self.fc2 = nn.Linear(128, 64)  # Second fully connected layer
        self.activ2 = nn.ReLU()  # Second ReLU activation
        self.fc3 = nn.Linear(64, num_classes)  # Third fully connected layer
        self.activ3 = nn.ReLU()  # Third ReLU activation

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = self.activ1(x)
        x = self.fc2(x)
        x = self.activ2(x)
        x = self.fc3(x)
        x = self.activ3(x)
        return x

In [None]:

# Train the model
num_epochs = 20
pt_learning_rate = 0.01
pt_network = PyTorchNet()
optimizer = torch.optim.Adam(pt_network.parameters(), lr=pt_learning_rate)
criterion = nn.MSELoss()

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Forward pass
        outputs = pt_network(images)
        loss = criterion(outputs, labels)
        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # A handy printout:
        if (i + 1) % 500 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')


Epoch [1/20], Step [500/938], Loss: 0.0123
Epoch [2/20], Step [500/938], Loss: 0.0131
Epoch [3/20], Step [500/938], Loss: 0.0113
Epoch [4/20], Step [500/938], Loss: 0.0234
Epoch [5/20], Step [500/938], Loss: 0.0149
Epoch [6/20], Step [500/938], Loss: 0.0193
Epoch [7/20], Step [500/938], Loss: 0.0120
Epoch [8/20], Step [500/938], Loss: 0.0173
Epoch [9/20], Step [500/938], Loss: 0.0154
Epoch [10/20], Step [500/938], Loss: 0.0220
Epoch [11/20], Step [500/938], Loss: 0.0175
Epoch [12/20], Step [500/938], Loss: 0.0224
Epoch [13/20], Step [500/938], Loss: 0.0175
Epoch [14/20], Step [500/938], Loss: 0.0187
Epoch [15/20], Step [500/938], Loss: 0.0206
Epoch [16/20], Step [500/938], Loss: 0.0214
Epoch [17/20], Step [500/938], Loss: 0.0215
Epoch [18/20], Step [500/938], Loss: 0.0162
Epoch [19/20], Step [500/938], Loss: 0.0104
Epoch [20/20], Step [500/938], Loss: 0.0122


Evaluation:

In [None]:
pt_network.eval()
test_losses = []
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        output = pt_network(data)
        test_loss += criterion(output, target,)
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.max(1,keepdim=True)[1]).sum()

test_loss /= len(test_loader.dataset)
test_losses.append(test_loss)
print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
  test_loss, correct, len(test_loader.dataset),
  100. * correct / len(test_loader.dataset)))


Test set: Avg. loss: 0.0003, Accuracy: 8618/10000 (86%)



3.2 (10 pts)

Time for some questions:
1. Which one of the models performed better? Why?
2. Which one of the models performed faster? Why?  
3. What would you change in your network's architecture?   
4. What would you change in your model's solution algorithm?

Write your solutions here:

1. PyTorch performed worse with 86% accuracy compared with 96% of the frist model. Initialization of Weights could be one of the causes of difference:
The way weights are initialized can impact convergence and final accuracy. PyTorch uses kaiming or xavier initialization by default. My model ran on all samples which is why it took it over 5000s so there is probably overfitting if it is 96% accurate and had to run on all samples.
2. PyTorch was faster. PyTorch's underlying implementation is optimized with CUDA (for GPU acceleration) and highly efficient tensor operations, significantly speeding up computation compared to custom implementations like the first model. Also my model ran on all samples.
3. Add more layers or neurons in each layer to allow the network to learn more complex features.
4. Replace basic optimizers like SGD with advanced optimizers like Adam for faster convergence.


## 4. The Network Wars!

Here is your chance to play with your model's architecture in order to break your own benchmark set eariler.  
You can add/remove layers, play with their sizes, types, etc.   
You can add a new loss if you wish, or anything else that will fairly give your model an advantage over base.  

4.1 (15 pts)

I made the following changes to the layers:
1. Reduced Layer Sizes: Smaller architecture with layers 64 → 32 → 10 for faster computation
2. Added a Softmax in the Output Layer
3. Added data shuffling at the beginning of each epoch to ensure Mini-batches contain varied samples across epochs and better generalization to prevent overfitting
4. Changed the loss function: Replaced MSE with Cross-Entropy Loss.
5. The fit method now uses mini-batches (fixed size of 64 samples) instead of using it on all the 60,000 samples in one time

What eneded up happening is that there were not enough batches or not enough neurons because the model came to a local minumum and did not learn from there, it remained stuck there. It did take considerably less time 460 seconds but the accuracy rate was 11% which is almost as if the model did not learn. 
I did not break the benchmark from ealier because in my previous attempts it took too long and I think the point of the exercise it to play with the network and see its effects so I left the model as is (despite previous attempts which did improve the model).




In [None]:
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))  # Subtract max for numerical stability
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def softmax_grad(x):
    # The gradient of softmax is usually handled alongside cross-entropy loss.
    # When combined with cross-entropy, the gradient simplifies to:
    return softmax(x)  # Directly use softmax probabilities


In [None]:
# Updated Network Architecture
net = MyNetwork()
net.add(Affine_Layer(input_size=28 * 28, output_size=64))  # Reduced size for faster computation
net.add(ActivationLayer(relu, relu_grad))  # First hidden layer
net.add(Affine_Layer(input_size=64, output_size=32))  # Smaller hidden layer
net.add(ActivationLayer(relu, relu_grad))  # Second hidden layer
net.add(Affine_Layer(input_size=32, output_size=10))  # Output layer with 10 classes
net.add(ActivationLayer(softmax, softmax_grad))  # Softmax for multi-class probabilities


In [None]:
def fit(self, x_train, y_train, epochs, learning_rate):
    samples = len(x_train)
    batch_size = 64  # Fixed batch size for mini-batches

    for epoch in range(epochs):
        err = 0
        # Shuffle data
        indices = np.arange(samples)
        np.random.shuffle(indices)
        x_train = x_train[indices]
        y_train = y_train[indices]

        # Process mini-batches
        for start_idx in range(0, samples, batch_size):
            end_idx = min(start_idx + batch_size, samples)
            batch_x = x_train[start_idx:end_idx]
            batch_y = y_train[start_idx:end_idx]

            # Forward and backward pass for the mini-batch
            output = batch_x
            for layer in self.layers:
                output = layer.forward_propagation(output)
            err += self.loss(batch_y, output)
            
            grad = self.loss_grad(batch_y, output)
            for layer in reversed(self.layers):
                grad = layer.backward_propagation(grad, learning_rate)

        # Log epoch error
        err /= (samples / batch_size)
        print(f"Epoch {epoch+1}/{epochs}, Error: {err}")


In [None]:
def cross_entropy_loss(y_true, y_pred):
    sy_pred = softmax(y_pred)
    return -np.sum(y_true * np.log(sy_pred + 1e-12)) / y_true.shape[0]

def cross_entropy_grad(y_true, y_pred):
    return softmax(y_pred) - y_true


In [100]:
# Training parameters
epoch_num = 20
lr = 0.01

# Set the new loss function
net.use_loss(cross_entropy_loss, cross_entropy_grad)

# Train the network
t1 = time.time()
net.fit(x_train, y_train, epochs=epoch_num, learning_rate=lr)
print(f"Total process time: {round(time.time() - t1, 3)} seconds")


Training epoch 1/20   error=0.230122
Training epoch 2/20   error=0.230122
Training epoch 3/20   error=0.230122
Training epoch 4/20   error=0.230122
Training epoch 5/20   error=0.230122
Training epoch 6/20   error=0.230122
Training epoch 7/20   error=0.230122
Training epoch 8/20   error=0.230122
Training epoch 9/20   error=0.230122
Training epoch 10/20   error=0.230122
Training epoch 11/20   error=0.230122
Training epoch 12/20   error=0.230122
Training epoch 13/20   error=0.230122
Training epoch 14/20   error=0.230122
Training epoch 15/20   error=0.230122
Training epoch 16/20   error=0.230122
Training epoch 17/20   error=0.230122
Training epoch 18/20   error=0.230122
Training epoch 19/20   error=0.230122
Training epoch 20/20   error=0.230122
Total process time: 459.048 seconds


In [101]:
output = net.predict(x_test ,y_test )


Test set: Avg. loss: 0.2301, Accuracy: 1135/10000 (11%)



In [102]:
import os
print(os.getcwd())
!jupyter nbconvert --to html "/Users/atark/Documents/Asvanced Comutational Learning and Data Analysis/PS1_Part2_Hello_NN_2024_ID_000000000.ipynb"


/Users/atark/Documents/Asvanced Comutational Learning and Data Analysis
[NbConvertApp] Converting notebook /Users/atark/Documents/Asvanced Comutational Learning and Data Analysis/PS1_Part2_Hello_NN_2024_ID_000000000.ipynb to html
[NbConvertApp] Writing 379573 bytes to /Users/atark/Documents/Asvanced Comutational Learning and Data Analysis/PS1_Part2_Hello_NN_2024_ID_000000000.html
