Notes taken from [here](https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch01.html).

---

### What is Machine Learning?

ML Definition: Machine learning gives computers the ability to learn from data without being explicitly programmed. Such systems typically get better at a task as they gain experience.

---

### Why use Machine Learning?

- Problems that need long lists of hand-tuned rules, e.g. a spam filter. It is easier to have a computer find the patterns than to program them all.
- Problems without a known algorithmic solution, e.g. speech recognition.
- Helping humans learn. ML can surface patterns in large datasets that humans would likely miss, which can be illuminating (data mining).
- Changing data, e.g. adjusting spam rules over time.

---

### Types of Machine Learning

Can be classified as:

- supervised vs. unsupervised (whether the training data is labeled)
- online vs. batch (whether they can learn incrementally, on the fly)
- instance-based vs. model-based (compare new data to known examples, or build a predictive model)

#### Supervision

4 types:
- Supervised
- Unsupervised
- Semi-Supervised
- Reinforcement


##### Supervised

The training data includes the desired solutions (labels), so the algorithm knows what output is expected for each example. Typical tasks are classification (hotdog/not-hotdog) and regression (predicting a numeric value from attributes, e.g. a real estate price from location, size, etc.).

Some regression algorithms can be used for classification. Logistic regression, for example, outputs the probability that an instance belongs to a given class.
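A minimal sketch of how a regression-style model can output a class probability, assuming the logistic (sigmoid) function is applied to a linear score. The weights and bias below are hand-picked, hypothetical values, not parameters learned from real data.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Linear score passed through the sigmoid: estimated P(class = 1)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical, hand-picked parameters (assumption: not learned from data).
weights = [0.8, -0.4]
bias = -0.1

p = predict_proba([2.0, 1.0], weights, bias)  # probability of the positive class
label = 1 if p >= 0.5 else 0                  # threshold the probability to classify
```

The 0.5 threshold is a common default, but it is a choice, not part of the model itself.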

Common algorithms:

- k-nearest neighbors
- linear regression
- logistic regression
- support vector machines (SVMs)
- decision trees / random forests
- neural networks

##### Unsupervised

Data is unlabeled. The system tries to find patterns on its own.


Common algorithms:

- clustering
  - k-means
  - hierarchical cluster analysis (HCA)
  - expectation maximization
- visualization and dimensionality reduction
  - principal component analysis (PCA)
  - kernel PCA
  - locally-linear embedding (LLE)
  - t-distributed stochastic neighbor embedding (t-SNE)
- association rule learning
  - Apriori
  - Eclat

*Clustering* can find large groups that share characteristics you may never have considered, e.g. 50% of all users are from a given city and of a certain age.
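As a rough illustration of clustering, here is a sketch of k-means on one-dimensional data. The ages and the starting centroids are made-up values; a real implementation would typically pick initial centroids randomly and handle more dimensions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical user ages with two visible groups.
ages = [18, 19, 21, 22, 45, 47, 50]
centroids, clusters = kmeans_1d(ages, centroids=[18, 50])
```

No labels are supplied anywhere: the two groups emerge only from the distances between points.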

*Visualization* takes a high-dimensional dataset and reduces it to 2D or 3D so that humans can plot it and look for patterns.

*Dimensionality reduction* is related to visualization. It simplifies the data without losing too much information, for example by merging highly correlated attributes into one. This can speed up model training and sometimes improves accuracy.

*Anomaly detection* is when the algorithm detects unusual data points. Useful for fraud prevention etc.
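One simple way to flag unusual points is a z-score test: measure how many standard deviations each value lies from the mean. This is only a sketch under stated assumptions (roughly bell-shaped data, a hypothetical threshold of 2.0, made-up transaction amounts); it is not the only or the most robust approach.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [10, 12, 11, 13, 12, 11, 95]
anomalies = find_anomalies(amounts)
```

Here only the value 95 is flagged. Note that a large outlier also inflates the mean and standard deviation, which is why the threshold has to be fairly low on small samples.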

*Association rule learning* discovers interesting relations between attributes, e.g. a driver who drives over 100 km/day often speeds.
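Association rules are usually scored with support (how often the items occur together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both for the driving example; the records and attribute names are invented for illustration.

```python
def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    matches = sum(1 for record in records if itemset <= record)
    return matches / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) over the records.
    Assumes the antecedent appears in at least one record."""
    return support(antecedent | consequent, records) / support(antecedent, records)

# Hypothetical driver records, each a set of observed attributes.
drivers = [
    {"over_100km", "speeds"},
    {"over_100km", "speeds"},
    {"over_100km"},
    {"speeds"},
    {"under_100km"},
]

rule_support = support({"over_100km", "speeds"}, drivers)
rule_confidence = confidence({"over_100km"}, {"speeds"}, drivers)
```

Algorithms such as Apriori search for all rules above chosen support and confidence thresholds instead of scoring one rule by hand.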

##### Semi-supervised

Data is mostly unlabeled, with a few labels. These algorithms usually combine supervised and unsupervised techniques. For example, you label a few faces in your pictures and a photo app identifies the same people in the rest.

##### Reinforcement

An agent observes an environment, performs actions, and is rewarded or penalized based on those actions. Over time it learns a policy (a strategy for choosing actions) that maximizes rewards and minimizes penalties, e.g. AlphaGo.
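The reward-driven loop can be sketched with a multi-armed bandit, one of the simplest reinforcement learning settings. This example assumes an epsilon-greedy policy and two hypothetical actions with fixed reward probabilities; systems like AlphaGo are far more elaborate.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # estimated average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))                 # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment hands out a reward of 1 or 0 (penalty-free toy setup).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
```

The learned policy here is simply "pick the action with the highest estimated reward", with occasional random exploration so that a better action is not missed.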

#### Batch vs Online learning

Some machine learning algorithms can learn incrementally, some cannot.

2 types:
- Batch
- Online

##### Batch

The system is trained using all available data at once, which can be resource intensive. To take new data points into account, a new version of the system must be trained from scratch on the full dataset (old and new data).

##### Online

Data is fed to the system incrementally, or in small batches. Learning is fast, so the system can adapt to changing conditions on the fly.

Online learning algorithms can also be used 'out of core', when the training set doesn't fit in memory. The training data is split into batches that are loaded and trained on one at a time.
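The out-of-core pattern can be sketched as a generator that yields one small batch at a time and a model whose state is updated per batch. `RunningMean` and its `partial_fit` method are hypothetical names for a toy "model"; a real learner would update weights instead of a mean.

```python
from itertools import islice

def iter_batches(stream, batch_size):
    """Yield lists of up to `batch_size` items without loading the whole stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

class RunningMean:
    """Toy 'model' whose state is updated one batch at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def partial_fit(self, batch):
        for x in batch:
            self.count += 1
            self.mean += (x - self.mean) / self.count

model = RunningMean()
n_batches = 0
# range() stands in for a file or database cursor too large for memory.
for batch in iter_batches(range(1, 11), batch_size=3):
    model.partial_fit(batch)
    n_batches += 1
```

Only one batch is held in memory at any time, so the size of the full dataset no longer matters.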

The 'learning rate' controls how quickly an online model adapts to change. Too high and the model quickly forgets older patterns; too low and it is slow to adjust to new patterns.
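The effect of the learning rate can be seen in a tiny online estimator that moves its estimate toward each new value by a fixed fraction. The data stream (a level that jumps from 10 to 20) and both learning rates are made up for illustration.

```python
def track(stream, learning_rate):
    """Online estimate nudged toward each new value by `learning_rate`."""
    estimate = stream[0]
    for value in stream[1:]:
        estimate += learning_rate * (value - estimate)
    return estimate

# Hypothetical stream: the underlying level shifts halfway through.
stream = [10.0] * 20 + [20.0] * 20

fast = track(stream, learning_rate=0.5)   # adapts quickly, forgets the old level
slow = track(stream, learning_rate=0.01)  # still close to the old level
```

After the shift, the fast learner ends up very close to 20, while the slow learner has moved less than a fifth of the way there.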

Bad data can corrupt the model, and may require rolling back to a known 'good' model.

#### Instance-Based Versus Model-Based Learning

Describes how the system generalizes from the training examples to new instances.

2 types:
- Instance-Based
- Model-Based

##### Instance-Based

Samples are memorized by the system, and new data is compared to them using a similarity measure. The prediction is based on the most similar samples.
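k-nearest neighbors is a typical instance-based method. The sketch below assumes Euclidean distance as the similarity measure and a majority vote among the `k` closest stored samples; the sample points and labels are invented.

```python
import math
from collections import Counter

def knn_predict(samples, query, k=3):
    """Label `query` by majority vote among its k nearest stored samples."""
    by_distance = sorted(samples, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical memorized samples: (features, label).
samples = [
    ((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"), ((0.9, 1.1), "spam"),
    ((5.0, 5.0), "ham"),  ((5.2, 4.8), "ham"),  ((4.9, 5.1), "ham"),
]

near_spam = knn_predict(samples, (1.1, 1.0))
near_ham = knn_predict(samples, (4.8, 5.0))
```

There is no training step: all the work happens at prediction time, and the stored samples must be kept around.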

##### Model-Based

The system builds a model from the samples, and that model is used to make predictions. The original samples can then be discarded.
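As a contrast with instance-based learning, here is a sketch of a model-based learner: a straight line fitted by ordinary least squares. The data points are made up, and the closed-form formula applies only to this single-feature case.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training samples.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# The two parameters are all that is needed to predict; xs and ys can be dropped.
prediction = slope * 5.0 + intercept
```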

---



### Main Challenges of Machine Learning

- bad algorithms
- bad data

#### Insufficient Quantity of Training Data

A lot of data is required: typically thousands of samples for simple problems, and sometimes millions for more complicated ones.

#### Nonrepresentative Training Data

If the training data is not representative of the cases you want to predict, the model will not generalize to new data outside the scope of what it was trained on.

#### Poor Quality Data

Errors and noise in the data will corrupt the model. Much time needs to be spent on 'data cleaning' before training a model.

#### Irrelevant Features

Good features must be selected, aka 'feature engineering'. This process consists of:

- Feature Selection: Selecting only useful features
- Feature Extraction: Combining existing features to produce a more useful one (dimensionality reduction algorithms can help).
- Creating new features by collecting more data.

#### Overfitting the Training Data

The model performs well on the training data but doesn't generalize well. This can happen when the model is too complex relative to the amount and noisiness of the training data.

Possible solutions include:

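- Simplify the model: choose one with fewer parameters (e.g. linear instead of polynomial), reduce the number of features, or constrain the model. Constraining a model to make it simpler is called 'regularization'.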
- Gather more training data
- Reduce noise in training data
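One common way to constrain a model is an L2 penalty on its weights (ridge regression). The sketch below uses a one-feature model without an intercept so that the closed form stays short; the data and the `alpha` values are hypothetical.

```python
def ridge_weight(xs, ys, alpha):
    """Weight w minimizing sum((y - w * x)**2) + alpha * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# Hypothetical noisy samples of roughly y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

unregularized = ridge_weight(xs, ys, alpha=0.0)
mild = ridge_weight(xs, ys, alpha=1.0)
strong = ridge_weight(xs, ys, alpha=10.0)
```

A larger `alpha` shrinks the weight toward zero, trading a slightly worse fit on the training data for a simpler model. `alpha` is a hyperparameter: it is set before training, not learned.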

#### Underfitting the Training Data

Occurs when the model is too simple to learn the underlying structure of the data.

Solutions include:

- Using a more powerful model with more parameters.
- Better feature engineering.
- Reducing constraints on the model (reduce regularization hyperparameters).

---

### Testing and Validation

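To know whether a model generalizes, it needs to be tested on data it has not seen. The data is split into a training set and a test set: the model is trained on the training set, and its error on the test set estimates how it will perform on new cases. An 80% training / 20% test split is common. When comparing models or tuning hyperparameters, a separate validation (holdout) set is carved out of the training data so that the test set stays untouched. To avoid giving up training data to a fixed validation set, cross-validation can be used: the training set is split into complementary subsets, and the model is trained on different combinations of them and validated on the remaining part, so every sample is used for both training and validation.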
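A sketch of both ideas in plain Python: a shuffled 80/20 split and the index sets for k-fold cross-validation. The function names, the fixed seed, and the interleaved fold layout are illustrative choices, not a standard API.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out `test_ratio` of it for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    for fold in range(k):
        validation = indices[fold::k]  # every k-th index, offset by the fold number
        training = [i for i in indices if i % k != fold]
        yield training, validation

train, test = train_test_split(range(10))
folds = list(k_fold_splits(n_samples=10, k=5))
```

Each sample lands in exactly one validation fold, so across the five rounds every sample is used for validation once and for training four times.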

---

## Exercises

**Q1**: How would you define Machine Learning?

Machine learning is the ability of a computer to learn to solve problems without being explicitly programmed how to do so. ML systems can improve given more experience.

**Q2:** Can you name 4 types of problems where it shines?

- Problems with too many rules to hand code (spam filter)
- Problems with no algorithmic solution (translation, voice recognition etc.)
- Problems that need to adapt while running to changing environments.
- Allowing humans to learn (data mining).

**Q3:** What is a labeled training set?

A training set in which every instance comes with its known label (the desired solution). The model is trained to assign such labels to future, unseen data.

**Q4:** What are the two most common supervised tasks?

- Classification
- Regression

**Q5:** Can you name 4 common unsupervised tasks?

- Clustering
- Visualization
- Dimensionality Reduction
- Anomaly Detection
- Association Rule Learning

**Q6:** What type of ML algorithm would you use to allow a robot to walk in various unknown terrains?

Reinforcement learning. It would learn a policy given positive rewards as it navigates its environment.

**Q7**: What type of algorithm would you use to segment your customers into multiple groups?

Clustering

**Q8**: Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

Supervised learning, assuming you already have examples of spam/not-spam.

**Q9**: What is an online learning system?

A system that learns incrementally while being exposed to new data. Capable of adapting to changing conditions.

**Q10:** What is out-of-core learning?

It is online learning applied to a dataset that doesn't fit in system memory. The data is broken up into smaller batches that are learned from one at a time.

**Q11:** What type of learning algorithm relies on a similarity measure to make predictions?

Instance-based learning.

**Q12:** What is the difference between a model parameter and a learning algorithm’s hyperparameter?

A hyperparameter is a setting of the learning algorithm itself, such as the learning rate or the amount of regularization; it is fixed before training. A model parameter is a value learned during training that defines the model's predictions.

**Q13**: What do model-based learning algorithms search for? What is the most common strategy they use to succeed? How do they make predictions?

They search for the parameter values that let the model generalize best from the training data to new instances. They usually do this by minimizing a cost function that measures how badly the model predicts the training data. To make a prediction, a new instance's features are fed into the model using the parameters found during training.
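The search for parameters can be sketched with gradient descent on a mean squared error cost. This assumes a linear model `y = w * x + b`; the data, learning rate, and step count are hypothetical values chosen so that the loop converges.

```python
def mse(w, b, xs, ys):
    """Cost function: mean squared prediction error on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Repeatedly move w and b a small step against the cost gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)  # learned model parameters
final_cost = mse(w, b, xs, ys)
prediction = w * 5.0 + b         # predict for a new instance
```

Here `w` and `b` are model parameters found by training, while `learning_rate` and `steps` are hyperparameters of the learning algorithm.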

**Q14:** Can you name four of the main challenges in Machine Learning?

- overfitting
- underfitting
- unrepresentative data
- insufficient data
- errors in data (bad data)
- useless features

**Q15:** If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

It's overfitting the training data. You could regularize or simplify the model, gather more training data, or reduce noise in the training data. Cross-validation helps detect the problem but does not fix it by itself.

**Q16:** What is a test set and why would you want to use it?

A test set is data held out from training. You use it to estimate the generalization error the model will make on new instances before it is deployed.

**Q17:** What is the purpose of a validation set?

It is a holdout set that the algorithm doesn't see during training. It is used to compare models and tune hyperparameters, so that the best-generalizing model can be selected without touching the test set.

**Q18:** What can go wrong if you tune hyperparameters using the test set?

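You may overfit the test set: the model and hyperparameters end up tuned to that particular data, so the measured generalization error is too optimistic and the model performs worse than expected on new data.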

**Q19**: What is cross-validation and why would you prefer it to a validation set?

It trains and validates on several different training/validation splits, so effectiveness is measured multiple times. It lets you compare models and tune hyperparameters without setting aside a separate validation set, which leaves more data for training and gives a more reliable measure of the model's effectiveness.