### Learning very non-linear features with neural networks
- Linear classifiers
 - $\text{Score}(x) = w_0 + w_1x_1 + w_2x_2 + \cdots + w_dx_d$
   - $\text{Score}(x) > 0$: predict the positive class
   - $\text{Score}(x) < 0$: predict the negative class
- Graph representation of classifier: useful for defining neural networks
<img src="./figures/w5-f1.png" width=400>
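
The score computation above can be sketched in a few lines of Python. The names `score` and `predict`, the example weights, and the choice to map a score of exactly 0 to the negative class are illustrative assumptions, not part of the original notes.

```python
def score(w, x):
    """Return w[0] + w[1]*x[0] + ... + w[d]*x[d-1]; w[0] is the intercept."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def predict(w, x):
    """Predict 1 when the score is positive, else 0 (tie handling is an assumption)."""
    return 1 if score(w, x) > 0 else 0


# Hypothetical weights for a classifier with two features.
w = [-1.5, 1.0, 1.0]
print(score(w, [1, 1]), predict(w, [1, 1]))  # 0.5 1
print(score(w, [1, 0]), predict(w, [1, 0]))  # -0.5 0
```

Each weight corresponds to one edge in the graph representation, with the intercept $w_0$ attached to a constant input of 1.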

- What can a linear classifier represent?
<img src="./figures/w5-f2.png" width=400>

- What can't a simple linear classifier represent?
 - XOR, for example, is impossible with a single linear classifier: no one line separates its two classes
<img src="./figures/w5-f3.png" width=400>
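
As a rough illustration of the two questions above, the sketch below searches a small grid of weights for a single linear classifier that reproduces a truth table. OR is used as an assumed example of a representable function, and XOR as the one that is not. The helper names `fits` and `search` and the grid itself are hypothetical, and a failed grid search is only an illustration, not a proof.

```python
from itertools import product


def fits(w, table):
    """True if 'w0 + w1*x1 + w2*x2 > 0' matches the label in every row of the table."""
    w0, w1, w2 = w
    return all((w0 + w1 * x1 + w2 * x2 > 0) == bool(y)
               for (x1, x2), y in table.items())


def search(table, values):
    """Return the first (w0, w1, w2) from the grid that fits the table, or None."""
    for w in product(values, repeat=3):
        if fits(w, table):
            return w
    return None


OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [v / 2 for v in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
print(search(OR, grid))   # some weight triple that represents OR
print(search(XOR, grid))  # None
```

XOR fails for any weights, not just those on the grid: rows (0,1) and (1,0) together require $2w_0 + w_1 + w_2 > 0$, while rows (0,0) and (1,1) together require $2w_0 + w_1 + w_2 \le 0$.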

- Solving the XOR problem: Adding a layer
<img src="./figures/w5-f4.png" width=400>

`x1 AND NOT x2`
```
x1 |  x2 | z1
0  |  0  | 0
1  |  0  | 1
0  |  1  | 0
1  |  1  | 0
```

`NOT x1 AND x2`
```
x1 |  x2 | z2
0  |  0  | 0
1  |  0  | 0
0  |  1  | 1
1  |  1  | 0
```

`z1 OR z2`
```
z1 |  z2 | y
0  |  0  | 0
1  |  0  | 1
0  |  1  | 1
0  |  0  | 0
```
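
The three truth tables above can be wired together as a tiny two-layer network of threshold units. This is a minimal sketch: the specific weights are one assumed choice among many that realize these gates, and the names `step` and `xor_net` are hypothetical.

```python
def step(score):
    """Threshold unit: output 1 when the score is positive, else 0."""
    return 1 if score > 0 else 0


def xor_net(x1, x2):
    """Two hidden units followed by one output unit; returns (z1, z2, y)."""
    z1 = step(-0.5 + x1 - x2)   # x1 AND NOT x2
    z2 = step(-0.5 - x1 + x2)   # NOT x1 AND x2
    y = step(-0.5 + z1 + z2)    # z1 OR z2
    return z1, z2, y


for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
# 0 0 (0, 0, 0)
# 1 0 (1, 0, 1)
# 0 1 (0, 1, 1)
# 1 1 (0, 0, 0)
```

Each unit is itself a linear classifier; XOR becomes representable only because the output unit sees the hidden outputs `z1` and `z2` instead of the raw inputs.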


**quiz hints:**
- https://www.coursera.org/learn/ml-foundations/discussions/weeks/6/threads/AAIUurrtEeWGphLhfbPAyQ

### A neural network
- Layers and layers and layers of linear models and non-linear transformations
- Around for about 50 years
 - Fell out of favor in the 1990s
- In the last few years, big resurgence
 - Impressive accuracy on several benchmark problems
 - Powered by huge datasets, GPUs, and improvements in modeling and learning algorithms

### Application of deep learning to computer vision
- Image features
 - Features = local detectors
   - Combined to make prediction
   - (in reality, features are more low-level)
   - e.g., nose detector, eye detector, mouth detector, ... -> face
- Typical local detectors look for locally "interesting points" in the image
 - Image features: collections of locally interesting points
   - Combined to build classifiers
 - Many hand-created features exist for finding interest points
 - Standard image classification approach
   - Input -> Extract features (hand-created features) -> Use simple classifier (e.g., logistic regression, SVMs) -> Face?
 - But hand-designing these features is very painful
- **Deep learning: implicitly learns features**
  - Input -> Layer 1 -> Layer 2 -> Layer 3 -> Prediction
  - Each layer learns interest points to detect
- **Deep learning performance**
 - Sample results using deep neural networks
   - German traffic sign recognition benchmark ( 99.5% accuracy )
   - House number recognition ( 97.8% accuracy )
 - ImageNet 2012 competition: 1.2M training images, 1000 categories
   - SuperVision ( deep-learning classifier ) won
   - 8 layers, 60M parameters

### Challenges of deep learning
- Pros
 - Enables learning of features rather than hand tuning
 - Impressive performance gains
   - Computer vision
   - Speech recognition
   - Some text analysis
 - Potential for more impact
- Deep learning workflow
 - Lots of labeled data
   - Training set -> Learn deep neural net
   - Validation set -> Validate -> Adjust parameters, network architecture, ...
- Many tricks are needed for it to work well
  - Different types of layers, connections, ... are needed for high accuracy
- Cons
 - Requires a lot of data for high accuracy
 - Computationally really expensive
 - Extremely hard to tune
   - Choice of architecture
   - Parameter types
   - Hyperparameters
   - Learning algorithm
   - ...
- Computational cost + so many choices = incredibly hard to tune

### Deep features: Deep learning + Transfer learning
- Can we learn features from data, even when we don't have much data or time?
- **Transfer learning: Use data from one task to help learn on another**
 - Old idea, explored for deep learning by Donahue
 - Flow:
   - Lots of data (cat, dog) -> Learn neural net -> Great accuracy on cat vs. dog
   - Some data (many items) -> Neural net as feature extractor + Simple classifier -> Great accuracy on 101 categories
   
**What's learned in a neural net**
- Neural net trained for Task 1: cat vs. dog
  - input -> `Layer 1` -> `Layer 2` -> `Layer ...` -> `End part layers` -> output
  - `Layer 1 ~ Layer ...`: more generic; can be used as a feature extractor
  - `End part layers`: very specific to Task 1; should be ignored for other tasks
- For Task 2, predicting 101 categories, learn only the end part of the neural net
  - `Layer 1 ~ Layer ...`: Keep weights fixed!
  - `End part layers`: Replace with a simple classifier (e.g., logistic regression, SVMs, nearest neighbor, ...)
  
**Transfer learning with deep features workflow**
- Some labeled data -> Extract features with neural net trained on different task
 - Training set -> Learn simple classifier
 - Validation set -> Validate
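
The workflow above can be sketched with a frozen feature extractor and a nearest-neighbor classifier on top. Everything concrete here is an assumption for illustration: `FROZEN_WEIGHTS` stands in for layers learned on a different task, the data points and labels are made up, and a real feature extractor would be a trained deep network rather than a single linear map with a ReLU.

```python
import math

# Stand-in for the early layers of a network trained on another task.
# These made-up weights are kept fixed and are never updated below.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Apply the frozen layer (linear map + ReLU) to obtain 'deep features'."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def nearest_neighbor_predict(train_feats, train_labels, feats):
    """Simple classifier: return the label of the closest training example."""
    dists = [math.dist(f, feats) for f in train_feats]
    return train_labels[dists.index(min(dists))]


# Hypothetical labeled data for the new task, split into training and validation.
train = [([2.0, 0.0], "a"), ([3.0, 1.0], "a"), ([0.0, 2.0], "b"), ([1.0, 3.0], "b")]
valid = [([2.5, 0.5], "a"), ([0.5, 2.5], "b")]

# Training set -> "learn" the simple classifier (for 1-NN: store the features).
train_feats = [extract_features(x) for x, _ in train]
train_labels = [y for _, y in train]

# Validation set -> validate.
correct = sum(
    nearest_neighbor_predict(train_feats, train_labels, extract_features(x)) == y
    for x, y in valid
)
accuracy = correct / len(valid)
print(accuracy)
```

Only the small classifier on top depends on the new task's labels, which is why this approach can work with relatively little labeled data.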

### Deep learning ML block diagram
- `Training Data` ( images, labels ) -> `Feature extraction` -> $x$ ( deep features )
- $x$ -> `ML model(logistic regression)` ( $\hat{w}$: weights of features) -> $\hat{y}$ ( predicted labels )
- `y` is true label
- `Quality metric` -> `ML algorithm` -> classification accuracy -> $\hat{w}$
 - This is a loop: $\hat{w}$ is updated repeatedly to maximize accuracy
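
The loop in this block diagram can be sketched as a small logistic regression trained on precomputed features. This is a sketch under stated assumptions: the feature vectors and labels are made up, the function names and the `steps` and `lr` parameters are hypothetical, and the update rule is stochastic gradient ascent on the log-likelihood, a smooth stand-in for the classification accuracy that the notes name as the quality metric.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def linear_score(w, x):
    """w[0] is the intercept; w[1:] are the weights of the features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def predict(w, x):
    """y_hat: 1 if the score is positive, else 0."""
    return 1 if linear_score(w, x) > 0 else 0


def accuracy(w, X, y):
    """Quality metric: fraction of examples where y_hat equals the true label y."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


def train(X, y, steps=200, lr=0.5):
    """ML algorithm: repeatedly nudge w_hat to better fit the labeled data."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_score(w, xi))  # true label minus P(y=1 | x)
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w


# Hypothetical deep features x and true labels y.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]
y = [0, 0, 1, 1]

w_hat = train(X, y)
print(accuracy(w_hat, X, y))
```

Accuracy is not differentiable in the weights, which is one reason gradient-based training typically optimizes a surrogate such as the likelihood and reports accuracy separately.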