
# 2.1.1 Neural networks intuition

## 2.1.1.1 Welcome

1. Week 1: Neural Networks, inference (prediction)
2. Week 2: Neural Networks, training
3. Week 3: Practical advice for building machine learning systems
4. Week 4: Decision Trees


## 2.1.1.2 Neurons and the brain

- **Neural networks** are sometimes called **artificial neural networks**
- Origins: Algorithms that try to mimic the brain
- Started in the 1950s and then fell out of favour
- Gained popularity again in the 1980s and early 1990s
    - mostly for handwritten digit recognition
        - e.g. reading postal codes on mail and amounts on handwritten checks
- Resurgence from around 2005
    - rebranded as **deep learning**
    - speech recognition
    - computer vision
    - ImageNet (2012): https://image-net.org/
    - text, natural language processing
    - now they are used in a wide variety of application areas
- Even though today's neural networks have almost nothing to do with how the brain learns, the early motivation was to build software that mimics the brain

### How does the brain work?

<img src='https://drive.google.com/uc?export=view&id=10jGJcVuQOS7r-j0Jc1N2n_Rq4MFcqvLd'>

- All of human thought comes from neurons sending electrical impulses to one another and sometimes forming new connections with other neurons
- A neuron receives electrical impulses from other neurons, carries out some computation, and sends its output to other neurons
- The output of one neuron becomes the input to other neurons
- A neuron aggregates inputs from multiple other neurons and then may send its output on to yet other neurons

<img src='https://drive.google.com/uc?export=view&id=1CdedwBQZBQIFrCZxAN5B7c9KYKO9zA3Q'>

- The input "wires" are called **dendrites**
- Neurons occasionally send electrical impulses to other neurons via the output wire, which is called the **axon**
<hr/>
- In an artificial neural network, a neuron takes one or more inputs which are just numbers
- It does some computation, outputs a number which can become input to another neuron
- Often many neurons are simulated at the same time
- **BIG CAVEAT**: the analogy between artificial neural networks and biological neurons is a loose one. We have almost no idea how the human brain works





### Why Now?

- The ideas of neural networks have been around for decades
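- The answer is best explained with the diagram below
- The amount of data we have is increasing
    - as the amount of data grows, the performance of traditional algorithms such as linear regression and logistic regression levels off, while larger neural networks keep improving
- We have faster computer processors
- We have **GPU**s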

<img src='https://drive.google.com/uc?export=view&id=1FNRmsRKkVFbI9SK3z-cW8fvtKmvyqfn8'>



## 2.1.1.3 Demand Prediction

### Will this product be a top seller or not?

- selling t-shirts
- you have collected data of different t-shirts that were sold at different prices, as well as which ones became a top seller
- this type of application is used by retailers today for inventory planning and marketing campaigns
- previously, we tackled problems like this with logistic regression
- to set this up to **build a neural network**, we switch the terminology a bit and use $a$ to denote the output rather than $f(x)$
    - $a$ stands for **activation**
    - activation is a term from neuroscience and it refers to how much a neuron is sending a high output to other neurons downstream from it
- it turns out that this little logistic regression algorithm can be thought of as a very simplified model of a single neuron in the brain
    1. the neuron takes as input the price, $x$
    2. it computes the formula $a = g(wx + b) = \frac{1}{1 + e^{-(wx + b)}}$
    3. its output is $a$, which is the probability of the t-shirt being a top seller
- can think of a neuron as being a tiny computer whose only job is to take a number (or a few numbers), such as price, then to output one number, such as the probability of being a top seller
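To make the "neuron = logistic regression unit" idea concrete, here is a minimal Python sketch of a single neuron that takes the price $x$ and outputs $a = g(wx + b)$, where $g$ is the sigmoid function. The function name `neuron` is hypothetical, and the weight `w` and bias `b` are made-up values for illustration, not parameters learned from any data set.

```python
import math

def neuron(x, w, b):
    """A single artificial neuron: logistic regression on one input.

    Computes a = g(w*x + b), where g is the sigmoid function.
    """
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters: the negative weight means that higher prices
# lower the probability of the t-shirt being a top seller.
w, b = -0.2, 4.0
for price in (10.0, 20.0, 30.0):
    a = neuron(price, w, b)  # price = 20 gives z = 0, so a = 0.5
    print(f"price={price:5.1f}  a={a:.3f}")
```

With a negative weight, a lower price pushes $a$ towards 1 (likely to be a top seller) and a higher price pushes it towards 0.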

<img src='https://drive.google.com/uc?export=view&id=1mYnBF_ptr5kSBYtlyLtK4dWz2FbQO64O'>

- Building a neural network just requires taking a bunch of these neurons and "wiring" them together



### A more complex example of demand prediction

Now we have four input features:

- price of the t-shirt
- shipping cost
- marketing amount
- material quality

You might suspect that whether or not a t-shirt becomes a best seller depends on a few factors:

- affordability
- awareness
- perceived quality (whether potential buyers think of it as a high-quality t-shirt)

<img src='https://drive.google.com/uc?export=view&id=1-j2ke6ibyXIncdwDGr1ToMwh9e-p6VMO'>

- here, we manually decided which neurons should take which features as inputs (e.g. affordability is a function of price and shipping cost)
- in practice, each neuron in a given layer, say the middle layer, has access to every feature, i.e. to every value from the previous layer

<img src='https://drive.google.com/uc?export=view&id=1oasWyw01avy6Bm_JvNnxl9gwhD7BAoSE'>

- the neuron for "affordability" may learn to ignore marketing and material and focus only on the features that are most relevant to affordability
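A small sketch of the "learn to ignore" idea, assuming the feature order price, shipping cost, marketing, material quality and hand-picked, hypothetical weights. When the weights on marketing and material are zero, the affordability neuron's output does not change no matter what values those two features take. A trained network would more likely end up with small rather than exactly zero weights; exact zeros just make the effect easy to see.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed feature order: price, shipping cost, marketing, material quality
x1 = np.array([15.0, 5.0, 100.0, 0.8])
x2 = np.array([15.0, 5.0, 900.0, 0.2])  # same price and shipping, different marketing and material

# Hypothetical "affordability" neuron: it sees all four features,
# but its weights on marketing and material are zero.
w_afford = np.array([-0.1, -0.3, 0.0, 0.0])
b_afford = 3.0

a1 = sigmoid(np.dot(w_afford, x1) + b_afford)
a2 = sigmoid(np.dot(w_afford, x2) + b_afford)
print(a1, a2)  # identical: marketing and material have no effect on this neuron
```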

<img src='https://drive.google.com/uc?export=view&id=1VH1Vk9hd_WXUH_ZsE_xVLsYmiMx5w27Z'>

- further simplifying the notation, the four input features are written as a vector, $\vec{x}$, which is the **input layer**
- this feature vector is fed to the layer in the middle, called the **hidden** layer, which then computes three **activation values**
- these 3 numbers become another vector, $\vec{a}$
- the vector, $\vec{a}$ is fed to the **output layer**, which finally outputs the probability of this t-shirt becoming a top seller, $a$
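The flow from input layer to hidden layer to output layer can be sketched in Python with NumPy. This is an illustrative sketch, not the course's implementation: `dense_layer` is a hypothetical helper in which each column of `W` (together with the matching entry of `b`) defines one logistic unit, and all parameter values are invented so that the three hidden neurons roughly play the roles of affordability, awareness and perceived quality.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(a_in, W, b):
    """One layer of logistic units: column W[:, j] and b[j] define neuron j."""
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        a_out[j] = sigmoid(np.dot(W[:, j], a_in) + b[j])
    return a_out

# Input layer: price, shipping cost, marketing, material quality (hypothetical values)
x = np.array([15.0, 5.0, 2.0, 0.8])

# Hypothetical parameters; a trained network would learn these from data.
W1 = np.array([[-0.1, 0.0, 0.0],
               [-0.3, 0.0, 0.0],
               [ 0.0, 0.9, 0.0],
               [ 0.0, 0.0, 2.0]])     # shape (4, 3): 4 inputs, 3 hidden neurons
b1 = np.array([3.0, -1.0, -1.0])
W2 = np.array([[1.5], [1.0], [2.0]])  # shape (3, 1): 3 activations, 1 output neuron
b2 = np.array([-2.0])

a_hidden = dense_layer(x, W1, b1)      # the vector of 3 activations
a_out = dense_layer(a_hidden, W2, b2)  # a length-1 vector holding the output a
print(a_hidden.shape, a_out.shape)     # (3,) (1,)
print("probability of top seller:", a_out[0])
```

The output neuron is the same kind of logistic unit as the hidden neurons; the only difference is that its inputs are the activations rather than the original features.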








### Another way of thinking about neural networks

<img src='https://drive.google.com/uc?export=view&id=1hfL7alVXwIJrW1V_Fs-aj99XeXN6hG0D'>

- cover up the left half of the diagram
- what remains is a **logistic regression algorithm** (or **logistic regression unit**) that takes affordability, awareness and perceived quality as input and uses these 3 features to estimate the probability of the t-shirt being a top seller
- this is just logistic regression, but instead of using the original features (price, shipping cost, marketing and material), it uses what may be a better set of features, which are hopefully more predictive of whether the t-shirt will be a top seller
    - it is a version of logistic regression that can learn its own features, making it easier to make accurate predictions
    - in the previous course, we multiplied the lot frontage by its depth to engineer a new feature (the lot area); that was manual feature engineering

### Summary of what a neural network does

- the input layer holds a vector of features, $\vec{x}$ (four numbers in our example)
- this vector of length 4 is input to the hidden layer, which outputs a vector of length 3
    - this is the vector of activations, $\vec{a}$
- this vector of length 3 is input to the output layer, which outputs one number, $a$

### Important

- In our example, we explicitly decided that the neural network should compute affordability, awareness and perceived quality
- In practice, the neural network decides what should be computed for its hidden layer(s)

### A neural network can have multiple hidden layers

<img src='https://drive.google.com/uc?export=view&id=10yryONtvApbaU9ibDXr_mt37tScqSa2J'>

- When building a neural network, one of the decisions you need to make is how many hidden layers you want and how many neurons in each layer
- This is a question of the architecture of the neural network
- Some tips for choosing an appropriate architecture of the neural network are covered later in this course
    - the choice of architecture has an impact on the performance of the neural network
- A neural network with multiple layers is called a **multilayer perceptron**
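An architecture can be written down as a list of layer sizes. The sketch below assumes the sigmoid function in every layer and random, untrained weights, so its output is not a meaningful prediction; `init_params` and `forward` are hypothetical helper names. It builds a network with 4 input features, two hidden layers of 3 and 2 neurons, and one output neuron, and feeds the output of each layer into the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, seed=0):
    """Random (untrained) parameters for an architecture given as layer sizes.

    layer_sizes[0] is the number of input features; each later entry is
    the number of neurons in that layer.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=0.5, size=(n_in, n_out))  # one column per neuron
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(x, params):
    """Pass x through every layer; each layer's output is the next layer's input."""
    a = x
    for W, b in params:
        a = sigmoid(a @ W + b)
    return a

# Architecture: 4 input features, hidden layers of 3 and 2 neurons, 1 output neuron
params = init_params([4, 3, 2, 1])
x = np.array([15.0, 5.0, 2.0, 0.8])
print([W.shape for W, _ in params])  # [(4, 3), (3, 2), (2, 1)]
print(forward(x, params))
```

Trying a different architecture only means changing the list, for example `[4, 25, 15, 1]` for two larger hidden layers.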

## 2.1.1.4 Example: Recognizing Images

<img src='https://drive.google.com/uc?export=view&id=1aiJWM9A0lXWJb74zrpNZPowdAl4BiGwd'>

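- the values in the matrix are pixel brightness (intensity) values; to use the image as input to a neural network, the matrix is unrolled into one long feature vector, $\vec{x}$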
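Unrolling the pixel matrix into a feature vector is a single reshape in NumPy. This sketch assumes a hypothetical 1000 by 1000 grayscale image filled with random 8-bit brightness values (0 to 255) in place of a real photo.

```python
import numpy as np

# Hypothetical 1000 x 1000 grayscale image with 8-bit brightness values (0-255)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(1000, 1000), dtype=np.uint8)

# Unroll the pixel matrix row by row into one feature vector x
x = image.reshape(-1).astype(np.float64)
print(x.shape)  # (1000000,)
```

The first 1000 entries of `x` are the first row of pixels, the next 1000 the second row, and so on, so one image becomes an input vector of a million numbers.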



<img src='https://drive.google.com/uc?export=view&id=18oTKdunNp9nI53oKHaNZLE-g2rfb5GU1'>

- Suppose a neural network has been trained on a lot of images of faces
- If you peek at the different neurons in the hidden layers to figure out what they may be computing, this is what you might find:
    - the first layer could be finding very short lines or edges in the image
    - in the second layer, the neurons might learn to group together lots of little short edge segments in order to look for parts of faces
        - the first neuron appears to be detecting the presence or absence of an eye in a certain position of the image
        - the second neuron, the corner of a nose
        - the bottom of an ear
        - ...
    - in the next layer, the neural network aggregates different parts of faces to detect the presence or absence of larger, coarser face shapes
    - this helps the output layer detect the identity of the person
- Each successive layer looks at larger and larger windows of the image
    - these little neuron visualizations actually correspond to differently sized regions in the image

### Car Classification

<img src='https://drive.google.com/uc?export=view&id=1YSA66tkk-0Pg8TMsdDMS2sNgDUrNpjt2'>

- Just by feeding it different data, the neural network learns to detect very different features