In [4]:
import numpy as np

# Notes on Neural Networks from Scratch

## Chapters 1 + 2

### Terms

### Data
* Feature - A single measurable property of a phenomenon that can be used for predictions or classifications.
    - Usually Numeric. Features usually must be converted/encoded to numeric values to be used with ML algorithms.
    - Can sometimes be combined to create new features. 
    - Should be informative and relevant to the task.
    - Deep neural networks are able to learn complex relationships between features, including interactions and non-linear relationships, which often reduces the need for manual feature engineering.
* Preprocessing Data - Adjusting feature data to be suitable for an algorithm.
    - Encoding
    - Normalization
    - Scaling
* Feature Set - Group of features. 
    - Represented as vectors.
* Sample - An instance of a feature set. One "Row", one observation.
* Batch - Group of samples.
    - Splitting the data into batches lets training run with parallel processing.
    - Training on batches rather than single samples can help the model generalize. See [this](https://nnfs.io/vyu/) animation.
* In-Sample Data - Training Data
* Out-Of-Sample Data - Testing/Validation Data
* Generalization - "Generalizing" the model to work well on never before seen input. 
* Overfitting - When a model "memorizes" the training data and latches onto "too-specific" features instead of patterns that generalize.
* Training - Slowly adjusting the weights/biases of a model to obtain the desired output.
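
The preprocessing and batching terms above can be sketched with NumPy. This is only an illustration under stated assumptions: the feature matrix `X`, the batch size, and the choice of min-max scaling and standardization are made up for the example and are not taken from the book.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows) x 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Min-max scaling: map each feature (column) to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: shift each feature to mean 0 and scale it to standard deviation 1
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

# Batching: split the samples into groups of batch_size rows
batch_size = 2
batches = [X_minmax[i:i + batch_size] for i in range(0, len(X_minmax), batch_size)]
print(len(batches), batches[0].shape)  # 2 (2, 2)
```

Both scalings work per column, so each feature ends up on a comparable scale regardless of its original units.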

### Model Types
* Classification - Predict discrete labels/targets
* Regression - Predict numeric values    

### Layers
* Input Layer - Actual input data, typically preprocessed.
* Output Layer - What the neural network returns
    - For classification, it often has as many neurons as the training dataset has classes.
* Dense Layer - A layer in which every neuron is connected to every neuron of the next layer.
    - Each connection has a weight associated with it.
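
As a rough illustration of "one weight per connection", the sketch below counts the trainable parameters of a single dense layer. The sizes `n_inputs` and `n_neurons` are assumed values chosen to match the 4-input, 3-neuron examples later in these notes; the zero-filled arrays are placeholders, not real weights.

```python
import numpy as np

n_inputs = 4   # values arriving from the previous layer (assumed size)
n_neurons = 3  # neurons in this dense layer (assumed size)

# One weight per connection: each neuron holds one weight for every input
weights = np.zeros((n_neurons, n_inputs))
# One bias per neuron
biases = np.zeros(n_neurons)

n_parameters = weights.size + biases.size
print(n_parameters)  # 4 * 3 weights + 3 biases = 15
```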

### Networks
* Fully Connected Network - Every neuron in the current layer is connected to every neuron from the previous layer 


### Activation
* ```Input * Weight + Bias = Output```
* Inputs * Weights (Connections) + bias (per neuron)
* Weight - A trainable factor of how much of the input to use. 
    - Multiplied by the input value to determine how much of that input signal contributes to the output.
    - Present in each *connection* between neurons.
    - Weights are initially random, then are adjusted based on how far the actual output is from the desired output.
    - Each connection in a neural network has a unique weight.
    - Weights can be positive or negative.
    - The input * weight products are summed, and the bias is then added to that sum.
* Bias - A constant added to the weighted sum of inputs which changes a neuron's activation threshold.
    - Present in each neuron itself.
    - Allows control of neuron activation threshold independent of the input values.
    - Because the bias is added to the weighted sum, a higher (more positive) bias lets the neuron fire on a weaker signal, while a lower (more negative) bias requires a stronger signal.
    - Initially random or 0, then adjusted based on how far the actual output is from the desired output.
* Activation Function - Applied to the result of `input * weight + bias`; determines whether the neuron fires and how strong the resulting signal is.
    - Sigmoid: Outputs a value between 0 and 1, representing a probability-like activation.
    - ReLU (Rectified Linear Unit): Outputs the input directly if it's positive, otherwise outputs 0.
    - Tanh (Hyperbolic Tangent): Outputs a value between -1 and 1, similar to sigmoid but with a steeper slope around zero.
    - Activation functions introduce non-linearity into neural networks, which is crucial for them to learn complex patterns and relationships in data. Without non-linearity, networks would only be able to learn linear relationships, which is insufficient for most real-world tasks.
    - Different activation functions have different properties like differentiability, vanishing/exploding gradient problems, and computational efficiency.
    - Some activation functions work better for specific types of tasks or network architectures.
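
The three activation functions listed above can be sketched in a few lines of NumPy. The function names `sigmoid`, `relu`, and `tanh` and the sample input `z` are chosen here for illustration only.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive values through unchanged, clips everything else to 0
    return np.maximum(0.0, x)

def tanh(x):
    # Squashes any real value into the open interval (-1, 1)
    return np.tanh(x)

# Hypothetical neuron outputs before activation
z = np.array([-2.0, 0.0, 2.0])

print(sigmoid(0.0))  # 0.5
print(relu(z))       # [0. 0. 2.]
print(tanh(0.0))     # 0.0
```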

In [1]:
# Example Inputs
inputs_vector = [1.0, 2.0, 3.0, 2.5]

# Random Weights
weights_vector = [0.2, 0.8, -0.5, 1.0]

# Random Bias
bias = 2.0

# Get output for inputs to a single neuron
def single_neuron_output(inputs, weights, bias):
    weighted_sum = sum(inputs[i] * weights[i] for i in range(len(inputs)))
    output = weighted_sum + bias
    return output
    
single_neuron_output(inputs_vector, weights_vector, bias)

4.8

In [2]:
# Example Inputs
inputs_vector = [1.0, 2.0, 3.0, 2.5]

# Random Weights
weights_matrix = [[0.2, 0.8, -0.5, 1.0],
                  [0.5, -0.91, 0.26, -0.5],
                  [-0.26, -0.27, 0.17, 0.87]]

# Random Biases
biases = [2.0, 3.0, 0.5]

# Get output for inputs of a single layer
def single_layer_output(inputs, weights, biases):
    outputs = [single_neuron_output(inputs, weights[i], biases[i]) for i in range(len(weights))]
    return outputs

single_layer_output(inputs_vector, weights_matrix, biases)

[4.8, 1.21, 2.385]

### Vectors, Matrices, Tensors

* Array - Homologous container (homo = same, logos = proportion/ratio): every row has the same length along each dimension.
* Vector - Linear array; One dimensional array
    - [1, 2, 3]
* Row Vector - An array with the shape (1, n)

In [12]:
a = [1, 2, 3]

In [13]:
# Make a into a 2d row array
np.expand_dims(a, axis=0)

array([[1, 2, 3]])

In [14]:
# Make a into a 2d row array
np.array([a])

array([[1, 2, 3]])

* Column Vector - An array with the shape (n, 1)

In [15]:
# Make a into a 2d column array
np.expand_dims(a, axis=1)

array([[1],
       [2],
       [3]])

In [16]:
# Transpose a into a column array
np.array([a]).T

array([[1],
       [2],
       [3]])

* Matrix - Two dimensional Array

In [19]:
np.array([[1, 2, 3],
          [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

* Three dimensional array

In [20]:
np.array([[[1, 2, 3],
          [4, 5, 6]],
         [[7, 8, 9],
          [10, 11, 12]],
         [[13, 14, 15],
          [16, 17, 18]]])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]]])

* Tensor - Object that can be represented as an array
* Dot Product - Sum of the products of corresponding vector elements
    - ```sum(v1[i] * v2[i] for i in range(len(v1)))```
    - ```sum(inputs[i] * weights[i] for i in range(len(inputs)))```
    - ```np.dot(inputs, weights)```
    - Both vectors must be the same size

In [21]:
# Example Inputs
inputs_vector = [1.0, 2.0, 3.0, 2.5]

# Random Weights
weights_vector = [0.2, 0.8, -0.5, 1.0]

# Random Bias
bias = 2.0

# Using NumPy dot product to get single neuron output
output = np.dot(inputs_vector, weights_vector) + bias
output

4.8

* Matrix + vector = the vector is added to every row of the matrix (row vector) or to every column (column vector)
    - A column vector must have as many elements as the matrix has rows
    - A row vector must have as many elements as the matrix has columns

In [47]:
a = [1, 2, 3]
a

[1, 2, 3]

In [52]:
B = np.zeros([3, 3])
B

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [53]:
B + np.asarray([a])

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [54]:
B + np.asarray([a]).T

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.]])

* Matrix Product - The dot products of all combinations of rows of a first matrix (A) and columns of a second matrix (B).
    - For a matrix A with shape (x, y) and a matrix B with shape (n, m), y must equal n for the matrix product to be defined.
    - A(x, y) & B(n, m): y == n
    - Results in a matrix C(x, m)
    - The dot product of two vectors equals the matrix product of a row vector and a column vector.
* Transposition - Changing a matrix A(x, y) into a matrix A(y, x) by flipping its rows into columns and its columns into rows.
    - Transposing the weights matrix makes its shape line up with the inputs so the matrix product can be taken
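
A small shape check may make the rules above more concrete. The matrices `A` and `B` and the vectors `v1` and `v2` are made-up values used only for this sketch.

```python
import numpy as np

A = np.ones((3, 4))  # shape (x, y) = (3, 4)
B = np.ones((4, 2))  # shape (n, m) = (4, 2); y == n, so the product is defined
C = np.dot(A, B)
print(C.shape)  # (3, 2), i.e. (x, m)

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([4.0, 5.0, 6.0])
row = v1.reshape(1, -1)     # row vector, shape (1, 3)
column = v2.reshape(-1, 1)  # column vector, shape (3, 1)

print(np.dot(v1, v2))       # 32.0
print(np.dot(row, column))  # [[32.]]

# A weights matrix of shape (3, 4) cannot be multiplied by inputs of shape (3, 4);
# its transpose has shape (4, 3), which satisfies y == n
weights = np.ones((3, 4))
print(np.dot(A, weights.T).shape)  # (3, 3)
```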

### Chapter Two Final Code

In [55]:
# Using NumPy to get the matrix product of a batch of inputs and the weights
# The weights are transposed so that the shapes line up: (3, 4) x (4, 3)
inputs_matrix = [[1.0, 2.0, 3.0, 2.5],
                 [2.0, 5.0, -1.0, 2.0],
                 [-1.5, 2.7, 3.3, -0.8]]

# Random Weights
weights_matrix = [[0.2, 0.8, -0.5, 1.0],
                  [0.5, -0.91, 0.26, -0.5],
                  [-0.26, -0.27, 0.17, 0.87]]

# Random Biases
biases = [2.0, 3.0, 0.5]

# NumPy converts the biases list to an array and adds it to every row of the product
outputs = np.dot(inputs_matrix, np.asarray(weights_matrix).T) + biases
outputs

array([[ 4.8  ,  1.21 ,  2.385],
       [ 8.9  , -1.81 ,  0.2  ],
       [ 1.41 ,  1.051,  0.026]])
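
As a sanity check, the batched matrix product should agree with computing each sample and each neuron one at a time. The sketch below reuses the values from the cell above; the names `batched` and `looped` are introduced here for illustration.

```python
import numpy as np

inputs_matrix = np.array([[1.0, 2.0, 3.0, 2.5],
                          [2.0, 5.0, -1.0, 2.0],
                          [-1.5, 2.7, 3.3, -0.8]])

weights_matrix = np.array([[0.2, 0.8, -0.5, 1.0],
                           [0.5, -0.91, 0.26, -0.5],
                           [-0.26, -0.27, 0.17, 0.87]])

biases = np.array([2.0, 3.0, 0.5])

# Whole batch at once: one matrix product plus the bias added to every row
batched = np.dot(inputs_matrix, weights_matrix.T) + biases

# Same computation one sample and one neuron at a time
looped = np.array([[np.dot(sample, neuron_weights) + neuron_bias
                    for neuron_weights, neuron_bias in zip(weights_matrix, biases)]
                   for sample in inputs_matrix])

print(np.allclose(batched, looped))  # True
```

Each row of the result holds one sample's outputs, and each column corresponds to one neuron.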