# CHAPTER 1: THE MACHINE LEARNING LANDSCAPE

---

## What is Machine Learning?

[Machine Learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959

A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.
—Tom Mitchell, 1997

---

## Why Use Machine Learning?

- A traditional approach requires a long list of hand-written rules that work only for specific cases. A machine learning system instead learns automatically which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples.
- Machine learning can reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem.

---

## Types of Machine Learning Systems

- Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and reinforcement learning).
- Whether or not they can learn incrementally on the fly (online versus batch learning).
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning).

### Supervised/Unsupervised Learning

#### Supervised learning

- The training data includes the desired solutions, called labels.
- Types: Classification (categorical) and Regression (numerical)
- In machine learning, an attribute is a data type (e.g., "Mileage"), while a feature has several meanings depending on the context but generally means an attribute plus its value (e.g., "Mileage = 15,000").
- Some of the most important supervised learning algorithms:
    - k-Nearest Neighbors
    - Linear Regression
    - Logistic Regression
    - Support Vector Machines (SVMs)
    - Decision Trees and Random Forests
    - Neural Networks

#### Unsupervised learning

- The training data is unlabeled.
- Some of the most important unsupervised learning algorithms:
    - Clustering: 
        - Cluster data points with similar features
        - Make groups with high similarity
    - Visualization and dimensionality reduction: 
        - Feed in complex data and get a 2D or 3D representation
        - Simplify the data without losing too much information
        - Feature extraction: merge features that are highly correlated with each other, so we end up with fewer features
    - Anomaly detection:
        - Detect unusual instances, e.g., to prevent fraud or to remove outliers
    - Association rule learning
        - Dig into large amounts of data and discover interesting relations between attributes
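
As an illustration of the clustering idea above, here is a minimal sketch of k-means in plain Python. The notes do not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, and the points and starting centroids are made up.

```python
import math

def kmeans(points, centroids, n_iters=10):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    labels = []
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        centroids = new_centroids
    return centroids, labels

# Made-up unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

No labels are given to the algorithm; the group indices come only from how close the points are to each other.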

#### Semisupervised learning

- The training data is partially labeled: typically a lot of unlabeled data and a little labeled data
- Most semisupervised algorithms are combinations of unsupervised and supervised algorithms

#### Reinforcement learning

- The learning system (the agent) observes the environment, selects and performs actions, and gets rewards or penalties in return. It must learn by itself the best strategy (the policy) to get the most reward over time.
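
The agent, action, reward, and policy loop can be sketched with a very small example. This is an epsilon-greedy agent on a multi-armed bandit, which is an assumption chosen for brevity (the notes do not name a specific algorithm); the reward values, `epsilon`, and `run_bandit` are hypothetical.

```python
import random

def run_bandit(rewards, n_steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; `rewards` holds a fixed, made-up reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # agent's current value estimate per action
    counts = [0] * len(rewards)
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore: try a random action
        else:
            # Exploit: pick the action with the highest estimated reward.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # the environment responds with a reward
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    # The learned policy here is simply "always take the best-looking action".
    policy = max(range(len(rewards)), key=lambda a: estimates[a])
    return policy, estimates, total

policy, estimates, total = run_bandit([0.2, 0.8, 0.5])
print(policy, estimates)
```

Nobody tells the agent which action is best; it finds out only through the rewards it collects.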

### Batch and Online Learning

#### Batch learning

- The system is incapable of learning incrementally: it must be trained using all the available data. This is also called offline learning.
- To learn from new data, a new version of the system must be trained from scratch on the full dataset (old and new data).
- Training on the full dataset requires a lot of computing resources.

#### Online learning

- We train the system incrementally by feeding it data instances sequentially, either individually or in mini-batches.
- The learning rate controls how fast the system adapts to changing data: a high learning rate means the system will rapidly adapt to new data, but it will also tend to quickly forget the old data.
- If bad data is fed to the system, its performance will gradually decline.
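
A minimal sketch of incremental training, assuming a 1-D linear model updated by stochastic gradient descent one instance at a time. The model, the data stream, and the value `learning_rate=0.1` are all hypothetical choices for illustration.

```python
def sgd_step(w, b, x, y, learning_rate):
    """Update a 1-D linear model y ~ w*x + b from a single instance."""
    error = (w * x + b) - y
    # Gradient of the squared error 0.5 * error**2 with respect to w and b.
    w -= learning_rate * error * x
    b -= learning_rate * error
    return w, b

def stream():
    """Hypothetical data stream that follows y = 2x + 1."""
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    for _ in range(400):
        for x in xs:
            yield x, 2.0 * x + 1.0

w, b = 0.0, 0.0
for x, y in stream():
    # Each instance is used once and can then be discarded.
    w, b = sgd_step(w, b, x, y, learning_rate=0.1)
print(round(w, 3), round(b, 3))
```

Each update moves the parameters toward the latest instance by an amount proportional to `learning_rate`, which is why a large value makes the model follow recent data closely and lets older data fade.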

### Instance-Based vs Model-Based Learning

#### Instance-based learning

- Example: a spam filter flags not only emails identical to known spam, but also new emails that are very similar to known spam.
- The system learns the examples by heart, then generalizes to new cases using a similarity measure.
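
A minimal sketch of this idea is a k-nearest-neighbors classifier, shown here in plain Python. Euclidean distance as the similarity measure, `k=3`, and the tiny two-feature dataset are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances."""
    # Similarity measure: Euclidean distance (smaller distance = more similar).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up training set of (features, label) pairs, stored as-is ("by heart").
train = [
    ((1.0, 1.0), "ham"), ((1.2, 0.8), "ham"), ((0.9, 1.3), "ham"),
    ((5.0, 5.2), "spam"), ((5.5, 4.8), "spam"), ((4.9, 5.1), "spam"),
]
print(knn_predict(train, (5.1, 5.0)))  # spam
print(knn_predict(train, (1.1, 1.0)))  # ham
```

There is no training step beyond storing the examples; all the work happens at prediction time.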

#### Model-based learning

- Build a model of these examples, then use that model to make predictions
- Model selection: explore the data, look for insights, and choose the type of model that best represents the data
- To measure performance, we can define a utility function (or fitness function) that measures how good the model is, or a cost function that measures how bad it is.
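
As a minimal sketch of model-based learning, the following fits a straight line by least squares and uses mean squared error as the cost function. The linear model, the MSE cost, and the data points are assumptions for illustration, not something the notes prescribe.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    """Cost function: mean squared error of the line on the given data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)          # training: minimize the cost
prediction = slope * 6.0 + intercept         # inference on a new case
print(slope, intercept, prediction)
print(mse(xs, ys, slope, intercept) <= mse(xs, ys, 1.0, 0.0))  # True
```

Once the parameters are learned, the training examples are no longer needed to make predictions.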

---

## Main Challenges of Machine Learning

### Insufficient Quantity of Training Data

- For very simple problems we typically need thousands of examples, and for complex problems such as image or speech recognition we may need millions of examples.

### Nonrepresentative Training Data

- The training data must be representative of the new cases we want to generalize to.
- A model trained on a nonrepresentative training set is unlikely to make accurate predictions.
- Sources of failure: sampling noise (the sample is too small, so it is nonrepresentative by chance) and sampling bias (the sampling method is flawed).

### Poor-Quality Data

- If the data is full of errors, outliers, and noise, the system will have a harder time detecting the underlying patterns.
- It is better to clean the data up first.
- Garbage in, garbage out
- If some instances are outliers, the choices are to discard them or try to fix the errors manually.
- If some instances are missing a few features, the choices are to ignore this attribute, ignore these instances, fill in the missing values, or train one model with the feature and one model without it.
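
The first three options for missing values can be sketched in plain Python. The rows and the attribute names `rooms` and `area` are made up, and filling with the median is only one possible choice of fill value.

```python
from statistics import median

# Made-up dataset; None marks a missing value.
rows = [
    {"rooms": 3, "area": 70.0},
    {"rooms": 2, "area": None},
    {"rooms": 4, "area": 95.0},
    {"rooms": 3, "area": 80.0},
]

# Option 1: ignore the attribute altogether.
without_attribute = [{k: v for k, v in row.items() if k != "area"} for row in rows]

# Option 2: ignore the instances that are missing it.
complete_rows = [row for row in rows if row["area"] is not None]

# Option 3: fill in the missing values (here with the median of the known ones).
area_median = median(row["area"] for row in complete_rows)
filled = [
    dict(row, area=row["area"] if row["area"] is not None else area_median)
    for row in rows
]
print(area_median, filled[1])
```

If a fill value is used, it should be computed on the training data and reused later for new data.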

### Irrelevant Features

- The training data should contain enough relevant features and not too many irrelevant ones.
- Feature engineering: coming up with a good set of features that represents the data without losing any important information
    - Feature selection: selecting the most useful features
    - Feature extraction: combining existing features to produce a more useful one
    - Creating new features by gathering new data

### Overfitting the Training Data

- Overfitting: the model performs well on the training data, but it does not generalize well
- If the data is too noisy, the model will also pick up small, irrelevant patterns that may exist in the noise itself.
- Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. Possible solutions:
    - Simplify the model, e.g., by selecting one with fewer parameters or by reducing the number of attributes
    - Gather more training data
    - Reduce the noisiness in the training data (fix data errors, remove outliers)
- Regularization: constraining a model to make it simpler and reduce the risk of overfitting. For a linear model, this pushes the fit toward a smaller slope. The amount of regularization is controlled by a hyperparameter.
- Hyperparameter: a parameter of the learning algorithm, not of the model. It must be set prior to training and remains constant during training. With a very large regularization hyperparameter, we get an almost flat model.
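
The effect of the regularization hyperparameter on the slope can be sketched with a 1-D linear model. This assumes an L2 (ridge-style) penalty on the slope only, which is one common form of regularization rather than the only one; the name `alpha` and the data are hypothetical.

```python
def fit_ridge_line(xs, ys, alpha):
    """Fit y ~ slope * x + intercept while penalizing alpha * slope**2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # alpha = 0 gives ordinary least squares; larger alpha shrinks the slope.
    slope = sxy / (sxx + alpha)
    intercept = mean_y - slope * mean_x  # the intercept is not penalized
    return slope, intercept

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# alpha is a hyperparameter: fixed before training, constant during it.
slopes = [fit_ridge_line(xs, ys, alpha)[0] for alpha in (0.0, 10.0, 1000.0)]
print(slopes)
```

As `alpha` grows, the slope shrinks toward zero and the fitted line approaches a flat line at the mean of `ys`.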

### Underfitting the Training Data

- The opposite of overfitting: the model is too simple to learn the underlying structure of the data.
- Solutions:
    - Selecting a more powerful model, with more parameters
    - Feeding better features to the learning algorithm
    - Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)

---

## Testing and Validating

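- One option is to put the model in production and monitor how well it performs.
- A better option is to split the data into a training set and a test set.
- The error rate on new cases is called the generalization error (or out-of-sample error); evaluating the model on the test set gives an estimate of it.
- It is common to use 80% of the data for training and hold out 20% for testing.
- The training set is used to train the model, a validation set to select hyperparameters, and the test set to estimate the generalization performance. Cross-validation can be used instead of a separate validation set.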
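
The 80/20 split and the test-set estimate of the generalization error can be sketched as follows. The helper names, the seed, the made-up dataset, and the threshold "model" are all hypothetical and only meant to show the workflow.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out `test_ratio` of it for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def error_rate(predict, examples):
    """Fraction of examples that `predict` labels incorrectly."""
    wrong = sum(1 for x, label in examples if predict(x) != label)
    return wrong / len(examples)

# Made-up labeled data: the label is 1 when x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train_set, test_set = train_test_split(data)

# "Training": choose the threshold with the lowest error on the training set only.
best_threshold = min(
    range(101),
    key=lambda t: error_rate(lambda x: int(x >= t), train_set),
)

def predict(x):
    return int(x >= best_threshold)

# The test set is touched only once, to estimate the generalization error.
generalization_error = error_rate(predict, test_set)
print(len(train_set), len(test_set), generalization_error)
```

The test set plays no part in choosing `best_threshold`, so the measured error is a fair estimate of how the model behaves on unseen cases.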