The purpose of this notebook is to get a practical understanding of how neural networks work in code, without relying on PyTorch/TensorFlow.
I chose this small project because I understand neural networks, convolutional layers, and attention heads conceptually, but I always rely on libraries to implement them.
Hopefully, this will help me gradually build a deeper understanding of neural networks, so that I can eventually take on custom tasks by myself.

Expectations:
- Learn granular details about neural networks and document them in this notebook
- Primarily use numpy for most of the code
- No PyTorch or TF
- Use simple datasets such as the MNIST digits dataset to evaluate the model

Future Work:
- Create a simple CNN from scratch (for the same reason mentioned above)

In [4]:
import numpy as np
import matplotlib

### Starting with a simple neuron


Thinking about how it looks, a single neuron should be very similar to linear regression.

- we have **y_hat** which is the predicted value
- we have all the **features**: x1, x2, x3, x4, x5... xn 
- each **feature** is multiplied by its corresponding **weight**: w1, w2, ... wn.
- Lastly, we also have the **bias**, which shifts the output and makes the linear regression more flexible when finding the best-fitting line.

##### With 5 features, we will have something similar to this equation:
y_hat = x1w1 + x2w2 + x3w3 + x4w4 + x5w5 + b

In practice, to make our lives easier, neural networks use separate matrices to represent the features, weights, and biases.

This results in something that looks like: 

y_hat = X * W.T + B

In [None]:
#to code a simple neuron, I will start by choosing arbitrary values to figure out a simple way to write my code.
#once again, this is strictly for me to learn how to code a neuron given no examples or libraries outside of numpy or matplotlib

#let's create a simple input for our neuron
simple_input = np.array([1.1, 2.6, 6.3, 3.1, 1.9])

#let's create the weights for our neuron
simple_weights = np.array([0.5, -0.2, 0.01, -0.3, 0.4])

#choosing a random bias value
b = 2.85

#referencing my notes above, we should transpose the weight vector to be able to multiply simple_input by simple_weights
#transposing does nothing to a 1-dimensional vector since the shapes already align, but in 2D and above it matters, so let's build a good habit
y_hat = simple_input * simple_weights.T + b
print(y_hat)

[3.4   2.33  2.913 1.92  3.61 ]


The result above is an obvious mistake: we should be getting a scalar value instead. The `*` operator multiplies element-wise, and the bias was then added to each of the 5 products separately.
My understanding is that we should do the element-wise multiplication and then add all of the values together, collapsing them into one scalar value.

Looking at documentation from numpy: https://numpy.org/doc/stable/reference/generated/numpy.dot.html

The function I should use is `np.dot()`, which does exactly what we're looking for.

Weighted sum: 0.55 + (-0.52) + 0.063 + (-0.93) + 0.76 = -0.077, and adding the bias gives -0.077 + 2.85 = 2.773
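The arithmetic above can be checked with a short sketch. It reuses the input, weight, and bias values from the earlier cell; the names `products`, `weighted_sum`, `y_hat_manual`, and `y_hat_dot` are introduced here only for illustration.

```python
import numpy as np

simple_input = np.array([1.1, 2.6, 6.3, 3.1, 1.9])
simple_weights = np.array([0.5, -0.2, 0.01, -0.3, 0.4])
b = 2.85

# element-wise multiplication keeps one product per feature
products = simple_input * simple_weights

# summing the products collapses them into a single scalar
weighted_sum = products.sum()

# adding the bias gives the neuron's output
y_hat_manual = weighted_sum + b

# np.dot performs the multiply-and-sum in one call
y_hat_dot = np.dot(simple_input, simple_weights) + b

print(products)
print(y_hat_manual, y_hat_dot)
```

Both routes should agree up to floating-point rounding, which is why `np.dot()` can replace the manual multiply-then-sum.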

In [None]:
#fixing the code, we get
y_hat = np.dot(simple_input, simple_weights.T) + b
print(y_hat)

2.773


In an attempt to build something that resembles a neural network, I will create a simple example of a layer of 5 neurons feeding their outputs to a single neuron in the following layer.

*insert equation eventually*

Notes: 
- As I was reading the `np.dot()` documentation, I saw their examples contain `np.arange()` and `np.reshape()` which can definitely help me with writing this code
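- The disadvantage of `np.arange()` is that there's no randomness, so I did some further digging and found `np.random.rand()` and `np.random.randn()`
- `np.random.rand()` - gives us random numbers drawn uniformly from [0, 1).
- `np.random.randn()` - gives us random numbers drawn from the standard normal distribution (mean 0, standard deviation 1). Most values fall between -1 and 1, but they are not bounded to that range.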
- To make our code easily written without much clutter, I am thinking about creating a large 2D matrix hosting each of our neurons in layer 1 as a row

Intuitively, I am thinking about using `np.random.rand()` for feature inputs to simulate normalized data, while using `np.random.randn()` for weights since weights can be both positive and negative. I know that in practice weights are not always between -1 and 1 (and `np.random.randn()` itself can return values outside that range), but starting from small values centered on zero makes sense for initialization.
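A quick way to check what range each sampler actually produces is to draw many values and look at the extremes. This is only a sketch: the seed and the sample size of 10,000 are arbitrary choices made for this illustration, and the variable names are not used elsewhere in the notebook.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

uniform_samples = np.random.rand(10_000)   # uniform over [0, 1)
normal_samples = np.random.randn(10_000)   # standard normal: mean 0, std 1

print('rand  min/max:', uniform_samples.min(), uniform_samples.max())
print('randn min/max:', normal_samples.min(), normal_samples.max())
print('randn mean/std:', normal_samples.mean(), normal_samples.std())

# fraction of randn samples that land inside [-1, 1] (roughly 68% in theory)
inside = np.mean(np.abs(normal_samples) <= 1.0)
print('fraction of randn samples within [-1, 1]:', inside)
```

The `rand` samples stay inside [0, 1), while a noticeable share of the `randn` samples land outside [-1, 1]. That matches the weight matrix printed further down, which contains values such as 2.31 and -2.57.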

In [None]:
#create 5 random 'normalized' inputs
layer1_inputs = np.random.rand(5)

#creating our layer1 2D matrix, each neuron is a row
layer1_neurons = np.random.randn(25).reshape(5,5)

#biases for the neurons in layer 1
layer1_biases = np.random.rand(5)

#print our variables
print('our input:\n', layer1_inputs)

print('layer1 neurons:\n', layer1_neurons)

print('layer1 biases:', layer1_biases)

our input:
 [0.20571639 0.75373995 0.52000017 0.61077481 0.92073805]
layer1 neurons:
 [[ 0.07893618 -0.32123298 -0.53707044  0.23327483 -0.52434425]
 [ 0.81547935 -1.31076576  0.30052341 -0.04702089  2.31393213]
 [ 0.9023737  -0.30687239 -0.07008626 -0.88086836  0.73075758]
 [-0.53679917 -1.88899984 -1.50660695 -2.56780042 -1.74279219]
 [ 0.44676928  0.70542972 -1.51599587 -0.66946737 -0.62196441]]
layer1 biases: [0.6482886  0.03822854 0.21075823 0.28566933 0.67445388]


Now that we set up all that we need, let's emulate a small feedforward logic (without activation yet).

1. Given 5 input features, we will use `np.dot()` to multiply the feature vector by the neuron matrix in `layer 1`.
2. We will then create `layer 2 neuron` with `np.random.randn()` - 5 different weights once again (one per output of layer 1).
3. Now, to emulate the feedforward concept, we will take the result of step 1 and treat it as the input feature vector to `layer 2 neuron`.
4. Repeat the usage of `np.dot()` on the new input vector and `layer 2 neuron`, and we should get a single output value.

NOTE:
- To use an activation function, we would pass the output vector of layer 1 through it before continuing to step 4.
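As a rough sketch of where the activation would go, the snippet below applies ReLU to the layer 1 output before it reaches the layer 2 neuron. ReLU is an assumption here, since this notebook has not picked an activation function yet. The `relu` helper, the seed, and the freshly generated arrays are illustrative stand-ins, not the values printed in this notebook.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable

def relu(x):
    # hypothetical helper: replaces negative values with 0, keeps positives
    return np.maximum(0, x)

# stand-in values with the same shapes as the setup cell above
layer1_inputs = np.random.rand(5)
layer1_neurons = np.random.randn(5, 5)   # each neuron is a row
layer1_biases = np.random.rand(5)

layer2_neuron1 = np.random.randn(5)
layer2_bias = np.random.rand(1)

# step 1: weighted sums of layer 1
layer1_output = np.dot(layer1_inputs, layer1_neurons.T) + layer1_biases

# the activation goes here, before the result is fed to layer 2
layer1_activated = relu(layer1_output)

# steps 3 and 4: the activated vector is the input of the layer 2 neuron
final_output = np.dot(layer1_activated, layer2_neuron1) + layer2_bias

print('before activation:', layer1_output)
print('after activation: ', layer1_activated)
print('final output:', final_output)
```

Any other element-wise function (sigmoid, tanh) could be dropped in at the same spot, since the activation only changes the values of the vector and not its shape.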

In [66]:
#compute the weighted sum of the inputs for each neuron in layer 1
layer1_output = np.dot(layer1_inputs, layer1_neurons.T) + layer1_biases

#create the neuron weights and bias for layer 2
layer2_neuron1 = np.random.randn(5)
layer2_bias = np.random.rand(1)

#print neuron 1 in layer 2 for visuals
print('layer 2 neuron weights:', layer2_neuron1)

#calculate and print the final output
final_output = np.dot(layer1_output, layer2_neuron1.T) + layer2_bias

print('the final output of our small feedforward experiment:', final_output)

layer 2 neuron weights: [-0.17380856 -2.47304544 -0.14746777 -2.26227928  1.88420965]
the final output of our small feedforward experiment: [7.44955699]


I had to debug a simple issue when trying to implement many neurons in layer 2 instead of just one (as in the earlier cell).

Here is what I learned. When creating many neurons, I can lay out the weight matrix in two ways:
1. (num of neurons, weights per neuron)
2. (weights per neuron, num of neurons)

This choice affects my future architecture because of how matrix multiplication works.
In my example, if I were to code my layer 2 as (8 neurons, 5 weights) and take the dot product with (5 inputs), I'd have to transpose the weight matrix to make the shapes compatible.

Following a standard of (weights, neurons) and `np.dot(input, weights)` removes the need to transpose, which keeps the code simpler. In numpy, `.T` only returns a view of the same data, so the gain is mostly in clarity, not in computational cost.
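The two layouts can be compared side by side. This is a minimal sketch with made-up values; `inputs`, `weights_nw`, and `weights_wn` are hypothetical names used only here.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the sketch is repeatable

inputs = np.random.rand(5)            # 5 input features

# layout 1: (num of neurons, weights per neuron) -> shape (8, 5)
weights_nw = np.random.randn(8, 5)

# layout 2: (weights per neuron, num of neurons) -> shape (5, 8)
# built from the same numbers so the two results can be compared
weights_wn = weights_nw.T

# layout 1 needs a transpose so the inner dimensions line up: (5) . (5, 8)
output_nw = np.dot(inputs, weights_nw.T)

# layout 2 already lines up: (5) . (5, 8)
output_wn = np.dot(inputs, weights_wn)

print(output_nw.shape, output_wn.shape)
print(np.allclose(output_nw, output_wn))

# without the transpose, layout 1 fails because (5) and (8, 5) do not align
try:
    np.dot(inputs, weights_nw)
except ValueError as err:
    print('shape mismatch:', err)

# .T returns a view onto the same memory rather than a copy
print(np.shares_memory(weights_nw, weights_nw.T))
```

Either layout gives the same 8 outputs. The choice mainly decides whether a `.T` appears in every forward pass.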

In [None]:
#choosing the (weights, neurons) layout for the layer 2 weight matrix
layer2_neurons = np.random.randn(5,8)
#create biases
layer2_biases = np.random.rand(8)

#calculate the output with the layer2_neurons shape in mind: (5) dot (5,8) -> (8)
layer2_output = np.dot(layer1_output, layer2_neurons) + layer2_biases

#printing variables
print("neurons in layer 2:\n",layer2_neurons)
print("biases in layer 2:", layer2_biases)
print("final output of layer2:", layer2_output)

neurons in layer 2:
 [[-0.7624961  -0.67240077 -1.44925195 -0.50618838 -1.7259303  -0.0101107
   0.58947539 -1.00829823]
 [-0.83022386  0.32935717 -0.88348662  0.75813921  1.06812995 -0.0294345
  -1.3966351  -0.13097041]
 [ 0.46194396 -1.80321943 -0.00970695  0.241571    0.59000734 -0.45725993
   0.17636135 -1.06666958]
 [-0.29449943 -0.68545972  0.09589121 -0.42960172  0.82950902 -1.58536818
  -0.83079874 -0.21816754]
 [-0.29447958  0.22413771  0.19727941 -0.02606043 -0.07032696  0.95250137
   0.54078732 -0.35324619]]
biases in layer 2: [0.06934986 0.21186273 0.684796   0.96590163 0.25501542 0.04391858
 0.85572842 0.60971052]
final output of layer2: [ 0.78773442  3.81759283 -0.92829285  4.49681662 -1.95698579  7.68445591
  2.79357561  1.63639858]
