# Chapter 1: Weights and Biases

- neural networks are often called **"black boxes"** because we frequently cannot say why they reach a particular conclusion, even though we understand how they compute it
---
- each connection between neurons has a **weight** associated with it, and this weight gets multiplied by the input value
- once all the $inputs * weights$ flow into our neuron, they are summed and a **bias** is added
- the purpose of the bias is to offset the output either positively or negatively 
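
The weighted sum described above can be sketched in a few lines of plain Python. The input, weight, and bias values and the helper name `neuron_output` are made up for illustration; they do not come from the notes.

```python
# Hypothetical example values (assumptions, not from the notes).
inputs = [1.0, 2.0, 3.0]
weights = [0.2, 0.8, -0.5]
bias = 2.0


def neuron_output(inputs, weights, bias):
    """Multiply each input by its weight, sum the products, then add the bias."""
    return sum(i * w for i, w in zip(inputs, weights)) + bias


output = neuron_output(inputs, weights, bias)
print(output)  # approximately 2.3 (0.2 + 1.6 - 1.5 + 2.0)
```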
---
- the weights and biases serve as "knobs" that we can tweak and tune to fit to our data
- in a neural network, there are often thousands or even millions of these parameters that get tweaked by the **optimizer** during training
- weights and biases, as just discussed, are both tunable parameters that will impact the output of neurons, but they do so in *different ways*
- because weights are *multiplied* by the inputs, they can only scale an input's magnitude or flip its sign
---
- to understand the concept of weights, consider a value of -0.5 and a weight of 0.7
- again, since weights get multiplied, we get -0.5 * 0.7 = -0.35
- now consider a bias of the same value, 0.7
- the bias is *summed* with the value, which yields -0.5 + 0.7 = 0.2
---
- in this example, the bias has offset the final value, so much so that it flipped the sign from negative to positive
- a positive bias is always going to offset values in the positive direction
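
The arithmetic from the example above can be checked directly. This is a minimal sketch using the same numbers (-0.5 and 0.7); the variable names are illustrative, and `round` is used only to hide floating-point noise.

```python
value = -0.5   # the example value from the notes
weight = 0.7   # the example weight
bias = 0.7     # a bias of the same size

scaled = value * weight   # a weight multiplies: magnitude changes, sign is kept
shifted = value + bias    # a bias adds: the value is offset, here past zero

print(round(scaled, 2))   # -0.35
print(round(shifted, 2))  # 0.2
```

Note that the weight left the value negative, while the bias moved it across zero.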

### Activation Functions
- after these calculations (weights and biases), the resulting value is passed through an **activation function**
- the activation function is meant to mimic a neuron in the brain that is either "firing" or not - like an on-off switch
- in programming, an on-off switch as a function would be referred to as a **step function** because it looks like a step if we graph it
---
- for a step function, if the value of $sum(inputs * weights) + bias$ is greater than some threshold, the neuron fires and outputs 1; otherwise, it does not fire and outputs 0
- while you can use a step function for your activation function, we tend to use something slightly more advanced
- the formula for a single neuron might look something like *this*: $output = sum(inputs * weights) + bias$
- we then apply an activation function to the output: $output = activation(output)$
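
As a rough sketch, the two steps above (weighted sum plus bias, then activation) might look like this with a step function. The threshold of 0, the function names, and the sample numbers are all assumptions chosen for illustration.

```python
def step(x, threshold=0.0):
    """Step activation: 1 if x is greater than the threshold, otherwise 0.

    The default threshold of 0.0 is an assumption for this sketch.
    """
    return 1 if x > threshold else 0


def fire(inputs, weights, bias):
    """Compute sum(inputs * weights) + bias, then pass it through step()."""
    pre_activation = sum(i * w for i, w in zip(inputs, weights)) + bias
    return step(pre_activation)


# Hypothetical inputs and weights; only the bias differs between the two calls.
quiet = fire([1.0, -2.0], [0.5, 0.5], bias=0.1)   # 0.5 - 1.0 + 0.1 is below 0
firing = fire([1.0, -2.0], [0.5, 0.5], bias=1.0)  # 0.5 - 1.0 + 1.0 is above 0

print(quiet, firing)  # 0 1
```

With identical inputs and weights, the larger bias is enough to push the neuron over the threshold.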

### Example Network
- a neuron's output can serve as part of the final output layer or as an input to another neuron
- while the full function of a neural network can get very large, let's start with a simple example with 2 hidden layers of 4 neurons each
- in addition to the 2 hidden layers, there are 2 more layers: an **input layer** and an **output layer**
- the input layer represents actual input data, such as pixel values from an image or perhaps data from a temperature sensor
- while this input data can be in the exact form in which it was collected ("raw" data), you will typically **preprocess** it with functions like **normalization** and **scaling**; in any case, the input data needs to be in *numeric* form
- the output layer is whatever the neural network returns; for classification, it often has as many neurons as the training set has classes, but it can also have a single output neuron for binary (two-class) classification
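
As one possible illustration of the scaling step mentioned above, here is a minimal min-max scaling sketch. The notes do not say which scaling method is used, so min-max scaling, the helper name `min_max_scale`, and the pixel values are all assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


pixels = [0, 64, 128, 255]  # hypothetical 8-bit pixel intensities
scaled_pixels = min_max_scale(pixels)
print(scaled_pixels)
```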
---
- for right now, however, **we will focus on a classifier that uses separate outputs for each class** (3 different classes = 3 outputs)
- for example, if our task is to classify images as either a "dog" or a "cat", then we will have two total classes, and therefore, two output neurons (one associated with "dog" and the other, with "cat")
- as just discussed, we could instead use a single output neuron whose value indicates either "dog" or "not dog"
- for each image passed through this network, the final output will have a calculated value in the "cat" output neuron, and a calculated value in the "dog" output neuron
- the output neuron that receives the highest score becomes the official class prediction for the image used as input
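
Picking the class with the highest score can be sketched as follows. The scores are invented for illustration; in a real network they would come from the output layer.

```python
class_names = ["cat", "dog"]
scores = [0.3, 0.9]  # hypothetical output-neuron values, one per class

# Find the index of the largest score, then look up the matching class name.
predicted_index = max(range(len(scores)), key=lambda i: scores[i])
predicted_class = class_names[predicted_index]

print(predicted_class)  # dog
```

The same lookup works unchanged for three or more classes, as long as `class_names` and `scores` are in the same order.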

### Overfitting
- the end goal for a neural network is to adjust its weights and biases (the parameters) so that, when applied to an *unseen* example, it produces the desired output
- when supervised machine learning algorithms are trained, we show the algorithm examples of inputs and their associated, desired outputs
- one major issue is **overfitting**, which is when the algorithm just learns to fit the training data and fails to gain an understanding of the underlying input-output dependencies (basically just memorizes the training data) 
- to check for overfitting, we use "in-sample" data to train the algorithm, and then "out-of-sample" data to validate the algorithm
- to do this, a certain percentage of data is set aside
---
- for example, if there is a dataset of 100,000 samples of data and labels, you should immediately take 10,000 samples of data and labels and set them aside as out-of-sample or "validation" data
- you will train your model with the 90,000 in-sample data points and then evaluate your model on the 10,000 out-of-sample data points that the model has not yet seen
- the goal is for our model not only to make accurate predictions on the training data, but also to be similarly accurate on unseen data
- this is called **generalization**
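
The 90,000 / 10,000 split described above could be sketched like this, using only the standard library. The function name, the shuffling step, the fixed seed, and the synthetic dataset are assumptions for illustration; the notes only say that a percentage of the data is set aside.

```python
import random


def train_validation_split(samples, labels, validation_fraction=0.1, seed=0):
    """Set aside a fraction of (sample, label) pairs as out-of-sample data.

    Shuffling before the split is an assumption of this sketch; it keeps the
    validation set from being just the first rows of the dataset.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    n_validation = round(len(indices) * validation_fraction)
    validation_idx = indices[:n_validation]
    train_idx = indices[n_validation:]

    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    validation = ([samples[i] for i in validation_idx],
                  [labels[i] for i in validation_idx])
    return train, validation


# Synthetic stand-in for a dataset of 100,000 samples with 3 classes.
samples = list(range(100_000))
labels = [s % 3 for s in samples]

(train_x, train_y), (val_x, val_y) = train_validation_split(samples, labels)
print(len(train_x), len(val_x))  # 90000 10000
```

The model would then be fitted on `train_x` and `train_y` only, with `val_x` and `val_y` held back to measure how well it generalizes.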