# Notes for the Machine Learning class by DeepLearning.AI & Stanford University
# Terminology

| Term              | Definition             |
|-------------------|------------------------|
| Training set      | Data used to train the model |
| Lower case *x*    | Input variable / feature |
| Lower case *y*    | Output variable / target variable |
| Lower case *m*    | Number of training examples (rows in the training set) |
| Lower case *$\hat{y}$* | Prediction of *y* (the output of the model) |
| Lower case *f*    | Function, model or hypothesis |
| Lower case *w, b* | Parameters, also called coefficients or weights |
| *(x<sup>(i)</sup>, y<sup>(i)</sup>)* | The i<sup>th</sup> training example (row *i* of the training set) |
| *f<sub>w,b</sub>(x) = wx + b* | Linear regression model |
| Cost function     | Measures the error by comparing the predictions *$\hat{y}$<sup>(i)</sup>* with the targets *y<sup>(i)</sup>* |
| Gradient descent  | An algorithm that finds the values of *w* and *b* that minimize the cost function |

# Types of machine learning

## Supervised learning

In supervised learning you give the algorithm examples to learn from, each including the right answer, or **label**. The algorithm learns a mapping from *input **X*** to *output label **Y***. In other words, you provide examples of the correct output labels Y, and the algorithm learns to predict Y for new inputs X.

| Input X           | Output Y               | Application         |
|-------------------|------------------------|---------------------|
| email             | spam                   | spam filter         |
| audio             | text transcript        | speech recognition  |
| English           | Spanish                | machine translation |
| ad, user info     | click (0/1)            | online advertising  |
| image, radar info | position of other cars | self-driving car    |
| Image of phone    | defect? (0/1)          | visual inspection   |
#### Types of supervised learning

##### **Regression**

For example, if you have data on house sizes and prices, you can have the algorithm fit a regression line (a straight line or a curve) to that data. It is important to choose the curve that fits the data most appropriately, not the one that gives the answer you want. **The goal of regression is to predict a number from infinitely many possible outputs.**
  
![example of supervised learning](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/supervised_learning.PNG)  
  
For this you use the linear model *f<sub>w,b</sub>(x) = wx + b*, where *w* is the slope (the number that multiplies *x*) and *b* is the point where the line crosses the y-axis (the y-intercept).
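A minimal NumPy sketch of this model is below. The house sizes (in 1000 sq ft), the parameter values *w* = 200 and *b* = 100, and the helper name `predict` are all made up for illustration:

```python
import numpy as np

def predict(x: np.ndarray, w: float, b: float) -> np.ndarray:
    """Linear regression model f_wb(x) = w*x + b, applied to every x at once."""
    return w * x + b

# Hypothetical house sizes (1000 sq ft) and parameters, for illustration only.
x_train = np.array([1.0, 2.0])
w, b = 200.0, 100.0
y_hat = predict(x_train, w, b)
print(y_hat)  # [300. 500.]
```

Changing *w* tilts the line and changing *b* shifts it up or down.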
   
![example of linear model](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/linear_model.PNG)

###### Cost function
To find values of *w* and *b* for which the prediction *$\hat{y}$<sup>(i)</sup>* is close to the true target *y<sup>(i)</sup>* for many or all training examples *(x<sup>(i)</sup>, y<sup>(i)</sup>)*, you need to construct a cost function. For linear regression this is usually the squared error cost function, which measures the error:

$$J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

The goal is to find the *w* and *b* that minimize it, written $\min_{w,b} J(w,b)$. To build intuition, you can simplify the model to $f_w(x) = wx$ by setting $b = 0$. The cost function then becomes

$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_w(x^{(i)}) - y^{(i)}\right)^2$$

and the goal becomes $\min_{w} J(w)$.
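The squared error cost can be sketched in NumPy as below. The function name `compute_cost` and the toy training set (where *y* = *x* exactly) are assumptions for illustration:

```python
import numpy as np

def compute_cost(x: np.ndarray, y: np.ndarray, w: float, b: float) -> float:
    """Squared error cost J(w,b) = 1/(2m) * sum((f_wb(x_i) - y_i)^2)."""
    m = x.shape[0]          # number of training examples
    y_hat = w * x + b       # predictions f_wb(x^(i)) for every example
    errors = y_hat - y      # f_wb(x^(i)) - y^(i)
    return float(np.sum(errors ** 2) / (2 * m))

# Assumed toy training set where y = x exactly.
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])

print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # 0.0, a perfect fit
print(compute_cost(x_train, y_train, w=1.0, b=1.0))  # every prediction is off by 1, so 3/6 = 0.5
```

Passing `b=0.0` gives the simplified cost *J(w)*.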

![example of cost function](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/cost_function.PNG)  
Cost function  
![setting w to 1](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/cost_function_w_1.PNG)  
To test the simplified cost function, we first set *w* to 1.  
![setting w to 0.5](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/cost_function_w_0.5.PNG)  
This is how it looks with *w* set to 0.5.  
![Final cost function](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/cost_function_w_final.PNG)  
Lastly, you can see all the cost functions plotted together here.  
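To reproduce the idea behind these plots numerically, here is a sketch that evaluates the simplified *J(w)* for a few values of *w*. The toy training set (1, 1), (2, 2), (3, 3) and the helper name `cost_w` are assumptions, not necessarily the data shown in the images:

```python
import numpy as np

def cost_w(x: np.ndarray, y: np.ndarray, w: float) -> float:
    """Simplified squared error cost J(w) for the model f_w(x) = w*x (b fixed at 0)."""
    m = x.shape[0]
    return float(np.sum((w * x - y) ** 2) / (2 * m))

# Assumed toy training set where y = x, so w = 1 should give zero cost.
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])

for w in [0.0, 0.5, 1.0, 1.5]:
    print(f"w = {w:.1f}  J(w) = {cost_w(x_train, y_train, w):.3f}")
```

With this data, *J(w)* is 0 at *w* = 1 and rises on both sides (about 0.583 at both *w* = 0.5 and *w* = 1.5, and about 2.333 at *w* = 0), tracing the U-shaped curve.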
  
If you don't simplify the function and keep both *w* and *b*, you instead get a 3D plot of *J(w,b)*, which looks something like this:  
![Complex cost function plot](https://github.com/DouweHorsthuis/machine-learning-cousera/blob/main/images/cost_function_w_not_simplified.PNG)    
  
The algorithm that can find the best values of *w* and *b* (the ones that minimize *J(w,b)*) is called **gradient descent**.
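These notes don't spell out the algorithm yet, so the following is only a sketch of batch gradient descent for the linear model, using the standard partial derivatives of the squared error cost. The learning rate `alpha`, the number of iterations and the toy data are assumptions:

```python
import numpy as np

def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=10_000):
    """Batch gradient descent for f_wb(x) = w*x + b with the squared error cost."""
    # alpha (learning rate) and num_iters are assumed values, not from the course notes.
    m = x.shape[0]
    for _ in range(num_iters):
        errors = (w * x + b) - y          # f_wb(x^(i)) - y^(i)
        dj_dw = np.sum(errors * x) / m    # partial derivative of J with respect to w
        dj_db = np.sum(errors) / m        # partial derivative of J with respect to b
        # Update both parameters simultaneously, using gradients from the old values.
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b

# Assumed toy training set where y = x.
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])
w, b = gradient_descent(x_train, y_train)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Each step moves *w* and *b* a little in the direction that lowers *J(w,b)*; with this data they end up very close to *w* = 1 and *b* = 0. If `alpha` is too large, the updates can overshoot and diverge instead.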

##### **Classification**

Classification algorithms try to sort data into one of several possible outputs. These outputs are called classes or categories (the two terms are used interchangeably), and they are what the algorithm predicts. You can use multiple inputs. **The goal is to predict a category from a small number of possible outputs.**

## Unsupervised learning

*Data only comes with inputs X, but not output labels Y. The algorithm has to find structure in the data.* The algorithm won't have a right answer to work from, so it looks for *"something interesting"* in the data, such as a structure or pattern. Clustering is one way it does that.

#### Types of unsupervised learning

##### Clustering

Here the algorithm groups data points that are similar into clusters. For example, Google News might look at the words in an article's headline and group together all the articles that share some of those key words.
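The notes don't name a clustering algorithm; k-means is one common choice, sketched here purely as an illustration. The 2D points, the number of clusters `k` and the initialization (the first `k` points) are assumptions:

```python
import numpy as np

def k_means(X: np.ndarray, k: int, num_iters: int = 10):
    """Minimal k-means: alternate assigning points to the nearest centroid and moving centroids."""
    # Assumption: initialize with the first k points; real implementations usually pick random points.
    centroids = X[:k].copy()
    for _ in range(num_iters):
        # Squared distance from every point to every centroid, shape (n_points, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)  # index of the nearest centroid for each point
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Made-up points forming two obvious groups, one near (1, 1) and one near (8.5, 8.5).
X = np.array([[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 8.5], [1.0, 0.5], [8.5, 9.0]])
labels, centroids = k_means(X, k=2)
print(labels)  # [0 1 0 1 0 1]
```

No labels are given; the algorithm discovers the two groups from the inputs alone.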

##### Anomaly detection

Here the algorithm looks for unusual data points. This is often used for fraud detection.
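One simple way to look for unusual points, chosen here only as an illustration, is to learn the mean and standard deviation of normal data and flag new values that lie more than a few standard deviations away. The transaction amounts, the helper names and the threshold of 3 are all made up:

```python
import numpy as np

def fit_normal(data: np.ndarray) -> tuple[float, float]:
    """Estimate the mean and standard deviation of data assumed to be normal (non-anomalous)."""
    return float(data.mean()), float(data.std())

def is_anomaly(x: np.ndarray, mu: float, sigma: float, threshold: float = 3.0) -> np.ndarray:
    """Flag values more than `threshold` standard deviations away from the mean."""
    # threshold=3.0 is an assumed cut-off, not a value from the course.
    return np.abs(x - mu) / sigma > threshold

# Hypothetical amounts from past, legitimate transactions.
normal_amounts = np.array([20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 18.0, 22.0])
mu, sigma = fit_normal(normal_amounts)

new_amounts = np.array([21.0, 500.0, 24.0])
print(is_anomaly(new_amounts, mu, sigma))
```

Only the 500.0 transaction is flagged; 24.0 is a little high but still within three standard deviations of the normal data.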

##### Dimensionality reduction

Compress data using fewer numbers.
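Principal component analysis (PCA) is one common dimensionality-reduction technique, used here only as an illustration. The sketch below, which assumes a tiny made-up dataset whose points lie exactly on a line, compresses each 2D point to a single number using NumPy's SVD:

```python
import numpy as np

def pca_compress(X: np.ndarray, n_components: int):
    """Project centered data onto its top principal directions (found with SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by how much variance they explain.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    Z = Xc @ components.T              # compressed data: n_components numbers per example
    X_approx = Z @ components + mean   # reconstruction back in the original space
    return Z, X_approx

# Made-up data: every point lies exactly on the line y = 2x.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
Z, X_approx = pca_compress(X, n_components=1)
print(Z.shape)                   # (4, 1)
print(np.allclose(X, X_approx))  # True
```

Because every point lies on the same line, one number per point is enough to reconstruct the data; real data usually loses a little information when compressed.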

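```python
import numpy as np
# Interactive Jupyter plots; this magic needs the ipympl package (pip install ipympl),
# otherwise it fails with "ModuleNotFoundError: No module named 'ipympl'".
%matplotlib widget
import matplotlib.pyplot as plt
# lab_utils_uni.py and deeplearning.mplstyle are helper files from the course lab;
# they must be in the notebook's working directory.
from lab_utils_uni import plt_intuition, plt_stationary, plt_update_onclick, soup_bowl
plt.style.use('./deeplearning.mplstyle')
```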