# Chapter-1

# THE MACHINE LEARNING LANDSCAPE

### Chapter Overview

- What is ML?
- Why use ML?
- Types of ML systems
    - Supervised v/s Unsupervised
    - Online v/s Batch
    - Instance-based v/s Model-based
- Workflow of a typical ML project
- Main challenges of ML
- Testing & Validating (evaluate & fine-tune an ML system)

## What is Machine Learning?
(ML Definitions)

1. ML is the science (art) of programming computers so they can learn from data.

2. ML is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

3. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

- **Training set**: The examples that the system uses to learn.
- Each training example is called a **training instance** (or **sample**).


- In Tom Mitchell's definition above:  
**T** = task  
**E** = experience = training data  
**P** = performance measure to be defined (e.g: Accuracy)
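
As a rough illustration of T, E, and P, here is a minimal sketch of a toy spam filter. The messages, the word-matching rule, and all function names are invented for this example; they are assumptions, not a real spam-filtering method.

```python
# Hypothetical toy data: (message, is_spam) pairs invented for illustration.
TRAINING_SET = [
    ("win free money now", True),
    ("meeting at noon", False),
    ("free prize claim now", True),
    ("lunch with the team", False),
]
TEST_SET = [
    ("claim your prize", True),
    ("team meeting tomorrow", False),
    ("win money", True),
    ("noon lunch", False),
]

def train(examples):
    """Experience E: keep the words seen in spam but never in non-spam."""
    spam_words, ham_words = set(), set()
    for text, is_spam in examples:
        (spam_words if is_spam else ham_words).update(text.split())
    return spam_words - ham_words

def predict(spam_words, text):
    """Task T: flag a message as spam if it contains a learned spam word."""
    return any(word in spam_words for word in text.split())

def accuracy(spam_words, examples):
    """Performance measure P: fraction of correctly classified messages."""
    correct = sum(predict(spam_words, text) == label for text, label in examples)
    return correct / len(examples)

few = train(TRAINING_SET[:2])   # little experience
more = train(TRAINING_SET)      # more experience

print(accuracy(few, TEST_SET))   # 0.75
print(accuracy(more, TEST_SET))  # 1.0
```

With more training instances (E), accuracy (P) on the spam-flagging task (T) improves on this toy test set, which is what the definition means by "learning".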

## Why use Machine Learning?

- With ML techniques, a program for a task (like a spam filter) is much shorter, easier to maintain, and most likely more accurate than one written with traditional programming techniques.
- ML techniques automatically adapt to change.
- ML shines for problems that are either too complex for traditional approaches or have no known algorithm, e.g., speech recognition.
- ML can help humans learn (using data mining)

- **Data Mining**: Applying ML techniques to dig into large amounts of data in order to discover patterns that were not immediately apparent.

#### In summary, ML is great for:
1. Problems for which existing solutions require a lot of hand-tuning or long lists of rules. One ML algorithm can often simplify the code and perform better.
2. Complex problems for which there is no good solution at all using a traditional approach. The best ML techniques can find a solution in such cases.
3. Fluctuating environments: an ML system can adapt to new data.
4. Getting insights about complex problems and large amounts of data.

## Types of Machine Learning Systems

ML systems can be broadly classified based on:
1. whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised and reinforcement learning)

2. whether or not they can learn incrementally on the fly (online v/s batch learning)

3. whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based v/s model-based learning)

- These criteria are not exclusive; they can be combined in any way you like.

### Supervised / Unsupervised Learning

- ML systems can be classified based on the **amount and type of supervision they get during training**
- There are four major categories:

#### 1. Supervised Learning

- The training data you feed to the algorithm includes the desired solutions, called **Labels**.
- Given a set of **features**, the typical supervised learning tasks are:
> - Classification: to predict a class (category), e.g., a spam filter that labels each email as spam or not spam
> - Regression: to predict a target numeric value, e.g., the price of a car
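
The two tasks can be sketched in plain Python. This is a minimal illustration under stated assumptions: the features (link count, exclamation-mark count, mileage), the data values, and the function names are all hypothetical, and a 1-nearest-neighbour rule and a least-squares line stand in for the algorithms listed in these notes.

```python
def nearest_neighbor_classify(train_X, train_y, x):
    """Classification: return the label of the closest training instance."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Classification: features = (number of links, number of "!"), label = class.
emails = [[8, 5], [7, 6], [0, 1], [1, 0]]
labels = ["spam", "spam", "ham", "ham"]
print(nearest_neighbor_classify(emails, labels, [6, 4]))  # spam

# Regression: feature = mileage (thousand km), label = price (thousand $).
mileage = [10, 20, 30, 40]
price = [18, 16, 14, 12]
slope, intercept = fit_line(mileage, price)
predicted_price = slope * 25 + intercept  # numeric prediction for 25,000 km
```

In both cases the training data carries the labels (the class or the price), which is what makes the task supervised.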

#### Difference between 'attribute' and 'feature' in ML

- An **attribute** is a data type, e.g: Mileage
- A **feature** generally means an attribute plus its value, e.g: Mileage = 15,000
- But these two words are used interchangeably by many people

Some important supervised learning algorithms
- k-Nearest Neighbours
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural Networks (most architectures; see the exceptions below)

> Exceptions:  
> Unsupervised Neural Network architectures
> - Autoencoders
> - Restricted Boltzmann machines


> Semisupervised Neural Network architectures
> - Deep Belief networks
> - Unsupervised pretraining

#### 2. Unsupervised Learning

- The training data is **unlabeled**.
- The system tries to learn without a teacher.

Some important unsupervised learning algorithms
- Clustering
> - K-Means
> - DBSCAN
> - Hierarchical Cluster Analysis (HCA)


- Anomaly detection and novelty detection
> - One-class SVM
> - Isolation forest


- Visualization and dimensionality reduction
> - Principal Component Analysis (PCA)
> - Kernel PCA
> - Locally-Linear Embedding (LLE)
> - t-distributed Stochastic Neighbor Embedding (t-SNE)


- Association rule learning
> - Apriori
> - Eclat

- A **clustering algorithm** tries to detect groups of similar observations.
- A **hierarchical clustering algorithm** may also subdivide each group into smaller groups.
- **Visualization algorithms** take a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.
- In **Dimensionality Reduction**, the goal is to simplify the data without losing too much information.
> - One way to do this is **Feature Extraction**: to merge several correlated features into one.
> - If we reduce dimensions of training data before feeding it into an ML algorithm, it will run much faster, data will take up less disk and memory space, and the algorithm may perform better.

- **Anomaly Detection**: For example, detecting unusual credit card transactions to prevent fraud. The system is shown mostly normal instances during training, so it learns to recognize them. When it sees a new instance, it can tell whether it looks normal or is a likely anomaly.

- **Novelty Detection**: A closely related task. Novelty detection algorithms expect to see only normal data during training, while anomaly detection algorithms are usually more tolerant of a few outliers in the training set.

- **Association Rule Learning**: the goal here is to dig into large amounts of data and discover interesting relations between attributes.
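
The anomaly detection flow described above (learn what normal looks like, then judge new instances) can be sketched as follows. This is a simplified stand-in that uses a mean and standard deviation threshold rather than the One-class SVM or Isolation Forest listed earlier; the transaction amounts, the function names, and the threshold of 3 standard deviations are assumptions made for illustration.

```python
from statistics import mean, stdev

def fit_normal_profile(amounts):
    """Training: summarize normal instances by their mean and spread."""
    return mean(amounts), stdev(amounts)

def is_anomaly(profile, amount, threshold=3.0):
    """Flag an instance that lies more than `threshold` std devs from the mean."""
    mu, sigma = profile
    return abs(amount - mu) > threshold * sigma

# Hypothetical "normal" card transactions (amounts in dollars).
normal_amounts = [20, 25, 22, 30, 27, 24, 26, 23]
profile = fit_normal_profile(normal_amounts)

print(is_anomaly(profile, 28))   # False: looks like the training data
print(is_anomaly(profile, 950))  # True: far outside the learned range
```

No labels are used anywhere: the system only sees examples of normal behaviour, which is why this counts as unsupervised learning.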

#### 3. Semisupervised Learning

- Learning with algorithms that can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
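- Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. e.g., ***Deep Belief Networks (DBNs)*** are based on unsupervised components called ***restricted Boltzmann machines (RBMs)*** stacked on top of one another. The RBMs are trained in an unsupervised manner, and the whole system is then fine-tuned using supervised learning techniques.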

#### 4. Reinforcement Learning

- The learning system, called an **agent**, can observe the environment, select and perform actions, and get **rewards** in return (or **penalties** in the form of negative rewards).
- It must then learn by itself what the best strategy is, called a **policy**, to get the most reward over time.
- A **policy** defines what action the **agent** should choose when it is in a given situation.
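
A heavily simplified sketch of these ideas follows. The environment, its states, actions, and reward values are hypothetical, and the agent only considers immediate rewards with exhaustive exploration; real reinforcement learning algorithms also handle delayed rewards and uncertain environments.

```python
# Hypothetical deterministic environment: reward for each (state, action).
REWARDS = {
    ("hungry", "eat"): 1.0,
    ("hungry", "sleep"): -1.0,
    ("tired", "eat"): -0.5,
    ("tired", "sleep"): 1.0,
}
STATES = ["hungry", "tired"]
ACTIONS = ["eat", "sleep"]

def learn_policy(states, actions, episodes=3):
    """Try every action in every state, then keep the best one per state."""
    totals = {}
    for _ in range(episodes):
        for state in states:
            for action in actions:
                reward = REWARDS[(state, action)]  # observe reward or penalty
                key = (state, action)
                totals[key] = totals.get(key, 0.0) + reward
    # The policy maps each situation (state) to the action the agent chooses.
    return {s: max(actions, key=lambda a: totals[(s, a)]) for s in states}

policy = learn_policy(STATES, ACTIONS)
print(policy)  # {'hungry': 'eat', 'tired': 'sleep'}
```

The returned dictionary is the policy: given a state, it tells the agent which action to take.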

### Batch and Online Learning

- ML systems can also be classified based on **whether or not the system can learn incrementally from a stream of incoming data**

#### 1. Batch Learning (or Offline Learning)

- In batch learning, the system is **incapable** of learning incrementally.
- It must be trained using all the available data
- This takes a lot of time and computing resources, so it is typically done offline.
- First the system is trained > then launched into production and runs without learning anymore > it just applies what it has learned.

- To update a batch learning system > you need to train a new version of the system from scratch on the full dataset > then stop the old system and replace it
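
The update cycle above can be sketched as follows. `MeanModel` is a hypothetical stand-in for any batch-trained model (it simply predicts the mean of its training data), and the data values are invented; the point is only the train, launch, retrain from scratch, and replace sequence.

```python
class MeanModel:
    """Hypothetical batch-trained model: predicts the mean of its training data."""

    def __init__(self, data):
        # Batch training: the whole dataset must be available up front.
        self.prediction = sum(data) / len(data)

    def predict(self):
        # In production the model only applies what it learned.
        return self.prediction

# 1. Train on all available data, then launch into production.
full_dataset = [10.0, 12.0, 14.0]
production_model = MeanModel(full_dataset)
old_prediction = production_model.predict()

# 2. New data arrives; the running model cannot absorb it incrementally.
new_data = [20.0, 24.0]

# 3. Train a new version from scratch on old + new data, then swap it in.
full_dataset = full_dataset + new_data
production_model = MeanModel(full_dataset)

print(old_prediction)              # 12.0
print(production_model.predict())  # 16.0
```

Note that step 3 reprocesses the old data as well, which is why retraining cost grows with the size of the full dataset.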

#### Limitations of Batch Learning

- If your system needs to adapt to rapidly changing data, then you need a more reactive solution than batch learning.

- If the amount of data is huge, it may even be impossible to use a batch learning algorithm.

- If the system needs to learn autonomously with limited resources (e.g., a smartphone app), it is impractical to carry around large amounts of training data and spend hours of computation on retraining every day.

> The solution is to use algorithms that can learn incrementally (online learning).

#### 2. Online Learning (or Incremental Learning)