[[Neural Networks from Scratch]]

##### What are batches and why do we use them?
A batch is a subset of the full training dataset that is processed in a single forward/backward pass through the neural network. The dataset is split into batches to make training more computationally efficient and to stabilise gradient updates, since each update is averaged over several samples rather than driven by a single one.

A dataset of 10,000 samples with a batch size of 32 gives 10,000 / 32 = 312.5, which rounds up to 313 batches to process: 312 full batches plus a final batch of 16 samples.
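The batch count can be checked with a short sketch. The helper name `count_batches` is hypothetical, and the sketch assumes the final, smaller batch is kept rather than dropped.

```python
import math

def count_batches(n_samples, batch_size):
    """Batches needed to cover every sample once; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

n_batches = count_batches(10_000, 32)
# Samples left over for the final batch after all the full batches
last_batch_size = 10_000 - (n_batches - 1) * 32

print(n_batches, last_batch_size)  # 313 16
```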

##### How do we compute the dot product when dealing with multiple input samples?
If you try to take the dot product of inputs and weights that both have shape `(3,4)`, NumPy raises an error because the matrix dimensions are incompatible.

For `np.dot(A, B)` to work on two 2-D arrays, `A.shape[1]` must equal `B.shape[0]`.

Therefore, if the shape of the inputs is `(3,4)`, the shape of the weights must be `(4,3)`. To get that shape we transpose the weights matrix (B in the rule above), switching its rows and columns, using `.T` on an `np.array`.
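The shape rule can be seen in a minimal sketch, assuming NumPy is already installed. The arrays `A` and `B` are placeholder data filled with ones, not the real inputs and weights.

```python
import numpy as np

A = np.ones((3, 4))  # 3 samples, 4 features each
B = np.ones((3, 4))  # 3 neurons, 4 weights each

try:
    np.dot(A, B)     # A.shape[1] is 4 but B.shape[0] is 3
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

result = np.dot(A, B.T)  # B.T has shape (4, 3), so the inner dimensions match

print(mismatch_raised, result.shape)  # True (3, 3)
```

The result has one row per sample and one column per neuron.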

In [None]:
import micropip
await micropip.install("numpy")

import numpy as np

inputs = [[1.0, 2.0, 3.0, 2.5],
			[2.0, 5.0, -1.0, 2.0],
			[-1.5, 2.7, 3.3, -0.8]] # MATRIX OF VECTORS

weights = [[0.2, 0.8, -0.5, 1.0],
			[0.5, -0.91, 0.26, -0.5],
			[-0.26, -0.27, 0.17, 0.87]] # MATRIX OF VECTORS

biases = [2, 3, 0.5] # VECTOR

# layer_outputs = the dot product of inputs and transpose of weights + biases
layer1_outputs = np.dot(inputs, np.array(weights).T) + biases
print(layer1_outputs)

##### The second layer

In [None]:
weights2 = [[0.1, -0.14, 0.5],
			[-0.5, 0.12, -0.33],
			[-0.44, 0.73, -0.13]]

biases2 = [-1, 2, -0.5]

layer2_outputs = np.dot(layer1_outputs, np.array(weights2).T) + biases2
print(layer2_outputs)


##### Converting the layers into an object
By convention, the input data is named `X`.

In [None]:
X = [[1.0, 2.0, 3.0, 2.5],
	[2.0, 5.0, -1.0, 2.0],
	[-1.5, 2.7, 3.3, -0.8]]

np.random.seed(0)

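class Layer_Dense:
	def __init__(self, n_inputs, n_neurons):
		# Small random weights, shaped (n_inputs, n_neurons) so forward() needs no transpose
		self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
		# One bias per neuron, initialised to zero
		self.biases = np.zeros((1, n_neurons))

	def forward(self, inputs):
		# (batch_size, n_inputs) . (n_inputs, n_neurons) + (1, n_neurons) -> (batch_size, n_neurons)
		self.output = np.dot(inputs, self.weights) + self.biases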

layer1 = Layer_Dense(4,5)
layer2 = Layer_Dense(5,2)

layer1.forward(X)
#print(f"Layer 1 Output: {layer1.output}")
layer2.forward(layer1.output)
print(f"Layer 2 Output: {layer2.output}")


##### Why are we multiplying the random weights by `0.10`?
The random weights are scaled by a factor of 0.10 to reduce their initial magnitude, which prevents excessively large outputs early in training.
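
As a rough illustration of the scaling, the sketch below pushes the same inputs through unscaled and scaled weights and compares the largest output magnitude. The names `X_demo`, `raw_weights` and `scaled_weights` are hypothetical and exist only for this comparison.

```python
import numpy as np

X_demo = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])

np.random.seed(0)
raw_weights = np.random.randn(4, 5)   # unscaled standard-normal weights
scaled_weights = 0.10 * raw_weights   # the same weights, scaled as in Layer_Dense

raw_output = np.dot(X_demo, raw_weights)
scaled_output = np.dot(X_demo, scaled_weights)

# The scaled outputs are exactly one tenth the size of the unscaled ones
print(np.abs(raw_output).max())
print(np.abs(scaled_output).max())
```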
##### Why make biases `np.zeros((1, n_neurons))` where there are two sets of brackets?
Biases are shaped `(1, n_neurons)` so that they broadcast against a layer output shaped `(batch_size, n_neurons)`. This ensures each neuron's bias is added to every sample in the batch.
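
The broadcasting behaviour can be sketched with placeholder values. The zero-filled `layer_output` and the bias values here are made up for illustration.

```python
import numpy as np

layer_output = np.zeros((3, 2))      # (batch_size, n_neurons), before biases are added
biases = np.array([[0.5, -1.0]])     # shape (1, n_neurons)

# NumPy repeats the single bias row across all 3 samples
with_biases = layer_output + biases

print(with_biases.shape)  # (3, 2)
print(with_biases)
```

A 1-D array of shape `(n_neurons,)` would broadcast the same way; the 2-D shape simply makes the row orientation explicit.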
##### Why must the shape of `self.weights` in `Layer_Dense` be `(n_inputs, n_neurons)` and not the reverse?
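This ensures the correct orientation for matrix multiplication with the inputs array `X`. With `X` shaped `(batch_size, n_inputs)` and weights shaped `(n_inputs, n_neurons)`, `np.dot(X, self.weights)` works directly, without the transpose needed earlier.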
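
A quick shape check makes the orientation concrete. This is a minimal sketch with placeholder inputs; `X_demo` and the sizes chosen are assumptions for illustration only.

```python
import numpy as np

batch_size, n_inputs, n_neurons = 3, 4, 5

X_demo = np.ones((batch_size, n_inputs))               # (3, 4)
weights = 0.10 * np.random.randn(n_inputs, n_neurons)  # (4, 5), no transpose needed

# Inner dimensions match: X_demo.shape[1] == weights.shape[0] == n_inputs
output = np.dot(X_demo, weights)

print(output.shape)  # (3, 5)
```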