# Hands-On Machine Learning by Géron

Part I. The Fundamentals of Machine Learning 

# 1. The Machine Learning Landscape

This chapter surveys the Machine Learning map, covering its main regions and most notable landmarks:
- supervised versus unsupervised learning
- online versus batch learning
- instance-based versus model-based learning

Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system. 

## What is Machine Learning? 

Machine Learning is the science (and art) of programming computers so they can *learn from data*.

## Why Use Machine Learning? 

Machine Learning can help humans learn. ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). 
Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called *data mining*. 

To summarize, Machine Learning is great for: 
- Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.

- Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution. 

- Fluctuating environments: a Machine Learning system can adapt to new data. 

- Getting insights about complex problems and large amounts of data. 

## Types of Machine Learning Systems


### Supervised/Unsupervised Learning 

Machine Learning systems can be classified according to the amount and type of supervision they get during training. 
There are four major categories: 
1. supervised learning
2. unsupervised learning
3. semisupervised learning
4. reinforcement learning

#### Supervised Learning

In *supervised learning*, the training set you feed to the algorithm includes the desired solutions, called *labels*. 

A typical supervised learning task is *classification*: the system is trained on many example instances along with their *class*, and it must learn to assign a class to new instances.
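
As a rough illustration of classification, the sketch below uses k-nearest neighbors (one of the algorithms listed later in this section) in plain Python. The email features, the labels, and the `knn_predict` helper are hypothetical and exist only for this example.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: (number of links, number of exclamation marks) per email.
emails = [(0, 0), (1, 0), (0, 1), (7, 9), (8, 6), (9, 8)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

prediction = knn_predict(emails, labels, query=(6, 7))
print(prediction)  # spam
```

The labels in the training set are what make this supervised: the algorithm never has to guess what the classes are, only which one a new instance belongs to.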

Another typical task is to predict a *target* numeric value given a set of *features*, called *predictors*. This sort of task is called *regression*. To train the system, you need to give it many example instances, each including both its predictors and its label.
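
A minimal sketch of regression with a single predictor, fitted by ordinary least squares in plain Python. The car ages and prices are made-up values, and `fit_line` is a hypothetical helper; a library implementation would normally be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: car age in years (predictor) and price in thousands (label).
ages = [1, 2, 3, 4, 5]
prices = [18.0, 16.0, 14.0, 12.0, 10.0]

intercept, slope = fit_line(ages, prices)
predicted_price = intercept + slope * 6  # prediction for a 6-year-old car
print(predicted_price)  # 8.0
```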

Some regression algorithms can be used for classification as well, and vice versa. For example, *Logistic Regression* is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class. 
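
To make the "probability of belonging to a given class" concrete, here is a small sketch of Logistic Regression with one feature, trained by batch gradient descent in plain Python. The study-hours data, the learning rate, and the number of epochs are assumptions picked for illustration, not values from the book.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
probability = sigmoid(w * 3.8 + b)       # estimated probability of passing
predicted_class = int(probability >= 0.5)  # thresholding turns it into a classifier
```

The model outputs a number (as in regression), and a threshold on that number yields the class.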

> Note: In Machine Learning, an *attribute* is a data type (e.g., "mileage"), while a *feature* has several meanings depending on the context, but generally means an attribute plus its value (e.g., "mileage = 15,000").
Many people use the words *attribute* and *feature* interchangeably. 

Some of the most important supervised learning algorithms: 
- k-nearest neighbors
- linear regression
- logistic regression 
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests 
- Neural Networks 

Some neural network architectures can be unsupervised, and others can be semisupervised, for example when they rely on unsupervised pretraining.


#### Unsupervised Learning

In *unsupervised learning* the training data is unlabeled. The system tries to learn without a teacher. 
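
As one illustration of learning without labels, the following sketch implements a bare-bones version of K-Means clustering in plain Python. Using the first `k` points as initial centroids is a simplification made here to keep the example deterministic, and the points are hypothetical.

```python
import math

def k_means(points, k, iterations=10):
    """Minimal K-Means: alternate between assigning points and updating centroids."""
    centroids = list(points[:k])  # simplification: first k points as initial centroids
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled 2D points forming two visible groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied anywhere: the grouping comes only from the distances between instances.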

Some of the most important unsupervised learning algorithms (see Chapters 8 and 9):
* Clustering 
    - K-Means
    - DBSCAN
    - Hierarchical Cluster Analysis (HCA)
* Anomaly detection and novelty detection
    - One-class SVM
    - Isolation Forest
* Visualization and dimensionality reduction 
    - Principal Component Analysis (PCA) 
    - Kernel PCA
    - Locally Linear Embedding (LLE)
    - t-Distributed Stochastic Neighbor Embedding (t-SNE)
* Association rule learning 
    - Apriori
    - Eclat

*Visualization* algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.
These algorithms try to preserve as much structure as they can so that you can understand how the data is organized and perhaps identify unsuspected patterns. 

A related task is *dimensionality reduction*, in which the goal is to simplify the data without losing too much information.
One way to do this is to merge several correlated features into one. 
This is called *feature extraction*. 
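
One possible way to merge two correlated features is to project them onto their direction of greatest variance, which is what PCA does in two dimensions. The sketch below assumes hypothetical car data (age and mileage) and a hypothetical helper name; it is an illustration of the idea, not the only form of feature extraction.

```python
import math

def merge_correlated_features(xs, ys):
    """Project centered 2D points onto their direction of greatest variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Entries of the 2x2 sample covariance matrix.
    var_x = sum(d * d for d in dx) / (n - 1)
    var_y = sum(d * d for d in dy) / (n - 1)
    cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Angle of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    # One merged value per instance: its coordinate along the principal axis.
    merged = [a * direction[0] + b * direction[1] for a, b in zip(dx, dy)]
    return direction, merged

# Hypothetical data: car age in years and mileage in units of 10,000 km.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
mileages = [1.1, 2.0, 2.9, 4.2, 4.8]

direction, wear = merge_correlated_features(ages, mileages)
```

Here `wear` is a single feature that stands in for both age and mileage, so each instance is described by one number instead of two.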

> Note: It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). That algorithm will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.

Yet another important unsupervised task is *anomaly detection*. The system is shown mostly normal instances during training, so it learns to recognize them; then, when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly. 
A very similar task is *novelty detection*: it aims to detect new instances that look different from all instances in the training set. 
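
The idea of learning what "normal" looks like can be sketched with a very simple statistical profile. The example below is a stand-in based on z-scores, not One-class SVM or Isolation Forest; the transaction amounts and the threshold of 3 standard deviations are assumptions chosen for illustration.

```python
import statistics

def fit_normal_profile(values):
    """Summarize mostly normal training values by their mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomaly(value, profile, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mean, std = profile
    return abs(value - mean) / std > threshold

# Hypothetical training data: typical transaction amounts.
training_amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 22.0]
profile = fit_normal_profile(training_amounts)

print(is_anomaly(21.5, profile))   # False
print(is_anomaly(250.0, profile))  # True
```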

Finally, another common unsupervised task is *association rule learning*, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. 
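
A brute-force sketch of the idea behind association rules, using the usual support and confidence measures on pairs of items. It is not an implementation of Apriori or Eclat, and the shopping baskets and thresholds are hypothetical.

```python
from itertools import permutations

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (a, b, support, confidence) meaning 'a in basket implies b in basket'."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_both = sum(1 for t in transactions if a in t and b in t)
        support = count_both / n           # fraction of baskets containing both items
        confidence = count_both / count_a  # fraction of baskets with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

# Hypothetical supermarket baskets.
baskets = [
    {"barbecue sauce", "chips", "steak"},
    {"barbecue sauce", "steak"},
    {"chips", "soda"},
    {"barbecue sauce", "chips", "steak"},
]

rules = association_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b} (support={support:.2f}, confidence={confidence:.2f})")
```

With this data, only barbecue sauce and steak pass both thresholds, in both directions.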


#### Semisupervised Learning 


#### Reinforcement Learning 


### Batch and Online Learning 


#### Batch Learning 


#### Online Learning


### Instance-Based versus Model-Based Learning 


#### Instance-based Learning 

#### Model-based Learning 


## Main Challenges of Machine Learning 


### Insufficient Quantity of Training Data 


### Nonrepresentative Training Data 


### Poor-Quality Data 


### Irrelevant Features 


### Overfitting the Training Data 


### Underfitting the Training Data 


### Stepping Back 




## Testing and Validating


### Hyperparameter Tuning and Model Selection 


### Data Mismatch 



## Exercise 

