# Chapter 1 - The Machine Learning Landscape
This chapter introduces many of the fundamental concepts and jargon that every ML practitioner should know.

## 1.1 What is Machine Learning

- Machine Learning is the science (and art) of programming computers so they can *learn from data*
- The set of data that a machine learning system learns from is called the *training set*

## 1.2 Why Use Machine Learning

- It is great for problems for which existing solutions require a lot of fine-tuning or long lists of rules
    - Often, a ML algorithm can simplify code and perform better than the traditional approach
- ML techniques may find a solution to complex problems for which a traditional approach yields no good solution
- ML systems can easily adapt to new data in fluctuating environments
- ML can provide insights about complex problems and large amounts of data
    - Applying ML techniques to dig into large amounts of data to discover patterns that were not immediately apparent
    is called *data mining*

## 1.3 Examples of Applications
- *Image Classification* is the analysis of images in an attempt to automatically identify them as particular objects/shapes/etc.
    - Typically performed using convolutional neural networks (CNN)
- *Semantic Segmentation* is the analysis of an image where each pixel is classified
    - Typically performed using convolutional neural networks
    - Used to determine the exact location and shape of tumors
- *Text Classification*
    - Automatically flagging offensive comments on discussion forums
    - Is a part of Natural Language Processing (NLP)
- *Text Summarization*
    - Automatically summarizing long documents
    - Also NLP
- *Chatbot*
    - Involves NLP components including understanding and question-answering modules
- *Regression*
    - Forecasting company revenue next year based on performance metrics
    - Can be performed using Linear Regression, Polynomial Regression, Support Vector Machine (SVM), Random Forest,
    Neural Network
- *Speech Recognition*
    - Audio samples are processed to recognize the spoken words
    - Typically uses Recurrent Neural Networks (RNNs), CNN, or transformers
- *Anomaly Detection*
    - Detecting fraud
- *Clustering*
    - Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
- *Dimensionality Reduction*
- *Recommender Systems*
    - Usually done with the use of Neural Networks
- *AI Bots for Games*
    - Usually done through reinforcement learning (RL)

## 1.4 Types of Machine Learning Systems

- Machine Learning systems can be classified along these broad criteria:
    - Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and reinforcement learning)
    - Whether or not they can learn incrementally on the fly (online versus batch learning)
    - Whether they work by simply comparing new data points to known data points or instead by detecting patterns in the training data
    and building a predictive model (instance-based versus model-based learning)
    
### 1.4.1 Supervised/Unsupervised Learning

- There are 4 major categories, based on the amount and type of supervision the system gets during training:
    - Supervised learning
    - Unsupervised learning
    - Semisupervised learning
    - Reinforcement learning
    
#### 1.4.1.1 Supervised Learning

- In *supervised learning*, the training set you feed to the algorithm includes the desired solutions, called *labels*
- A typical supervised learning task is *classification*
- Another typical task, called *regression*, is to predict a *target* numeric value given a set of *features* called *predictors*
- An *attribute* is a data type (e.g., "mileage")
- A *feature* generally means an attribute plus its value (e.g., "mileage = 15,000")
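
As a rough illustration of the vocabulary above, the sketch below builds a tiny labeled training set and fits a one-feature threshold classifier to it. The data, the feature (number of links in an email), and the function names `fit_threshold` and `predict` are all made up for illustration; real classifiers are considerably more involved.

```python
# Hypothetical labeled training set: each instance is (feature value, label).
# Feature: number of links in an email; label: 1 = spam, 0 = ham.
training_set = [(0, 0), (1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (9, 1), (12, 1)]


def fit_threshold(data):
    """Pick the threshold that classifies the most training instances correctly."""
    values = sorted(x for x, _ in data)
    # Candidate thresholds: midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def accuracy(threshold):
        return sum((x > threshold) == bool(label) for x, label in data) / len(data)

    return max(candidates, key=accuracy)


def predict(threshold, x):
    """Classify a new instance: 1 (spam) if the feature exceeds the threshold."""
    return int(x > threshold)


threshold = fit_threshold(training_set)
print(threshold)               # learned decision boundary
print(predict(threshold, 10))  # classify an unseen instance
```

The labels are what make this supervised: the threshold is chosen by comparing the model's answers with the desired ones.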

#### 1.4.1.2 Unsupervised Learning

- In *unsupervised learning*, the training data is unlabeled
- The system tries to learn without a teacher
- Some important unsupervised learning tasks include:
    - Dimensionality reduction
    - Anomaly detection
    - Novelty detection
    - Association rule learning
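
To make one of these tasks concrete, here is a minimal anomaly detection sketch on unlabeled data. It uses a modified z-score based on the median and the median absolute deviation (MAD), which is one common rule of thumb rather than the method the chapter prescribes. The transaction amounts, the 3.5 cutoff, and the function name `find_anomalies` are assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical unlabeled data: transaction amounts with no "fraud" labels.
amounts = [20.0, 22.5, 19.0, 21.0, 23.5, 20.5, 18.5, 22.0, 250.0, 21.5]


def find_anomalies(values, cutoff=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds the cutoff."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]


anomalies = find_anomalies(amounts)
print(anomalies)
```

No instance is ever marked as normal or abnormal in advance; the system infers what "typical" looks like from the data alone.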
    
#### 1.4.1.3 Semisupervised Learning

- This type of learning system deals with training data that is only partially labeled
- Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms

#### 1.4.1.4 Reinforcement Learning

- This system involves an *agent* that can observe the "environment", select and perform actions, and get *rewards* in return
    - These rewards can be positive or negative (penalties)
- The system then learns by itself the best strategy, called a *policy*, to maximize the reward over time
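
The agent/environment/reward loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption made for illustration: the five-cell corridor, the reward of 1 at the rightmost cell, the purely random exploration, and the values of `alpha` and `gamma`. Q-learning is only one of many reinforcement learning algorithms.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # move left or move right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: apply the action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values (Q-table) from observed rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(len(ACTIONS))  # explore with random actions
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# The policy picks, in each non-goal cell, the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:GOAL]]
print(policy)
```

After training, the learned policy should settle on moving right (`+1`) in every cell, since that reaches the reward soonest.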

### 1.4.2 Batch and Online Learning

#### 1.4.2.1 Batch Learning

- In *batch learning*, the system is incapable of learning incrementally, meaning it must be trained using all the available data
- The system is trained first, and then it is launched into production and runs without learning anymore; it just applies what
it has learned. This is called *offline learning*
    - If you want a batch learning system to know about new data, you need to train a new version of the system from scratch
    on the full dataset and replace the old one with the new one
    
#### 1.4.2.2 Online Learning

- In *online learning*, you train the system incrementally by feeding it data instances sequentially, either individually 
or in small groups called *mini-batches*
- *Online learning* is great for systems that need to adapt rapidly to change
- *Online learning* can also be used to train systems on huge datasets that cannot fit in one machine's main memory
(called *out-of-core* learning)
- An important parameter of online learning systems is how fast they should adapt to changing data: this is called the 
*learning rate*
    - High learning rates mean the system will rapidly adapt to new data but will also quickly forget what it has learned
    - Low learning rates mean the system will remember longer but also adapt to new data more slowly
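
A minimal sketch of incremental training, assuming a linear model updated by gradient descent on one mini-batch at a time. The data stream (`stream_minibatches`, which generates points on the line y = 2x + 1), the batch size, and the learning rate of 0.5 are hypothetical choices for illustration.

```python
def stream_minibatches(n_batches, batch_size=5):
    """Hypothetical data stream: yields mini-batches of (x, y) with y = 2x + 1."""
    i = 0
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = (i % 10) / 10
            batch.append((x, 2 * x + 1))
            i += 1
        yield batch


def online_fit(batches, learning_rate):
    """Update a linear model y = w*x + b one mini-batch at a time."""
    w, b = 0.0, 0.0
    for batch in batches:
        # Gradient of half the mean squared error on this mini-batch only.
        grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = online_fit(stream_minibatches(2000), learning_rate=0.5)
print(round(w, 3), round(b, 3))
```

Each mini-batch is discarded after its update, so the full dataset never has to sit in memory. The `learning_rate` argument controls how strongly each new batch moves the parameters.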

### 1.4.3 Instance-Based Versus Model-Based Learning

- One more way to categorize ML systems is by how they *generalize*
    - How well does the system adapt to new (unseen) data?
- Two main approaches to generalization:
    - Instance-based learning
    - Model-based learning
    
#### 1.4.3.1 Instance-Based Learning

- The most trivial form of instance-based learning is "learning by heart": the system memorizes existing examples and flags new
examples only when they are identical to previous examples
- A more general form uses a *measure of similarity*: new examples are compared to stored examples and, if they are
similar enough, are given the same label
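
One widely used instance-based method is k-nearest neighbors, sketched below with Euclidean distance as the similarity measure. The stored examples, their two numeric features, and the choice of `k=3` are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical stored examples: ((feature_1, feature_2), label).
examples = [
    ((1.0, 1.0), "ham"), ((1.5, 2.0), "ham"), ((2.0, 1.0), "ham"),
    ((6.0, 6.5), "spam"), ((7.0, 6.0), "spam"), ((6.5, 7.5), "spam"),
]


def predict(new_point, k=3):
    """Label a new instance by majority vote among its k most similar examples."""
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(predict((1.2, 1.4)))
print(predict((6.8, 6.9)))
```

There is no training step beyond storing the examples; all the work happens at prediction time.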

#### 1.4.3.2 Model-Based Learning

- Model-based learning is the method of generalizing from a set of examples by building a model from those examples, and then
using the model to make *predictions*
- In order to do model-based learning, you need to specify a performance measure
    - This is often done by defining a *utility function* (or *fitness function*) that measures how good the model is
    - Alternatively, you can define a *cost function* that measures how bad the model is
    - For Linear Regression, the cost function typically measures the distance between the model's predictions and
    the actual values, and training works to minimize this distance
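
A small sketch of this idea for a one-feature linear model, assuming mean squared error as the cost function and the closed-form least-squares solution as the training step. The data points and the names `mse` and `fit` are invented for illustration.

```python
# Hypothetical training data, roughly following y = 2x + 1 with a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]


def mse(w, b):
    """Cost function: mean squared distance between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit():
    """Closed-form least squares for one feature: the (w, b) that minimizes mse."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return w, mean_y - w * mean_x


w, b = fit()
print(w, b, mse(w, b))
# The fitted parameters cost no more than a hand-picked guess.
print(mse(w, b) <= mse(2.0, 1.0))
```

Once `w` and `b` are fixed, predicting for a new `x` is just `w * x + b`; the training examples themselves are no longer needed.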
 
    

## 1.5 Main Challenges of Machine Learning

- The two things that can go wrong are "bad algorithm" and "bad data"

### 1.5.1 Insufficient Quantity of Training Data

- In general, given enough data, simple ML models can perform as well as (or better than) more complex models
- There is an inherent trade-off between spending time and money on algorithm development and spending it on
corpus development (training data)

### 1.5.2 Nonrepresentative Training Data

- In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to
    - If the sample is too small, you will have *sampling noise* (nonrepresentative data as a result of chance)
    - Even a very large sample can be nonrepresentative if the sampling method is flawed; this is called *sampling bias*
    
### 1.5.3 Poor-Quality Data

- If the training data is full of errors, outliers, and noise, it will be harder for the algorithm to detect patterns, and thus
your system is highly likely to perform poorly
    - Severe outliers can be discarded or fixed manually
    - When a feature is missing a lot of values, you must decide whether to:
        - Ignore the feature
        - Fill in the missing values
        - Ignore the instances of missing values
        - Train two models with and without the feature
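
The first three options can be sketched on a toy dataset. The rows, the `(age, income)` layout, and the use of the median as the fill value are assumptions for illustration; other fill strategies (mean, constant, model-based) are equally possible.

```python
from statistics import median

# Hypothetical dataset: rows of (age, income); None marks a missing value.
rows = [(25, 40_000), (32, None), (47, 82_000), (51, None), (38, 61_000)]

# Option 1: ignore the feature entirely (drop the income column).
without_feature = [(age,) for age, _ in rows]

# Option 2: fill in missing values, here with the median of the known values.
known = [income for _, income in rows if income is not None]
fill_value = median(known)
imputed = [(age, income if income is not None else fill_value)
           for age, income in rows]

# Option 3: ignore the instances that have missing values.
complete_only = [row for row in rows if row[1] is not None]

print(fill_value)
print(len(without_feature), len(imputed), len(complete_only))
```

Whatever fill value is computed on the training set would also need to be reused for new data, so it is worth keeping around.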
        
### 1.5.4 Irrelevant Features 

- One of the most important parts of a successful ML project is *feature engineering*, or coming up with a good set of features
to train on. This process involves the following steps:
    - *Feature selection*: selecting the most useful features to train on among existing features
    - *Feature extraction*: combining existing features to produce a more useful one
    - Creating new features by gathering new data
    
### 1.5.5 Overfitting the Training Data

- Overfitting is when the model performs well on the training data, but does not generalize well to new data
    - Complex models such as deep neural networks can detect subtle patterns in data; because of this, if a dataset
    is noisy or too small, the model is likely to detect patterns in the noise itself
- Overfitting can be addressed by:
    - Simplifying the model
        - Selecting a model with fewer parameters
        - Reducing the number of attributes in the training data
        - Constraining the model
    - Gathering more training data
    - Reducing the noise in the training data
        - Fixing errors
        - Removing outliers
- Constraining a model to make it simpler and reduce the risk of overfitting is called *regularization*
- The ultimate objective is to find the right balance between fitting the training data and keeping the model simple so that 
it can generalize well
- The amount of regularization is typically controlled through a *hyperparameter*
    - *Hyperparameters* are parameters of the learning algorithm, not of the model itself; they are set before training
    and remain constant during training
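
As a hedged illustration, the sketch below applies L2 (ridge) regularization to a one-weight linear model, where the solution has a simple closed form. Ridge is just one form of regularization; the data, the no-intercept model, and the name `fit_ridge` are assumptions. Here `alpha` plays the role of the regularization hyperparameter: it is chosen before training and is not learned from the data.

```python
# Hypothetical 1-D data; the model is y ~ w * x (no intercept, for brevity).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.2, 3.9, 6.1, 8.3]


def fit_ridge(alpha):
    """Minimize sum((w*x - y)^2) + alpha * w^2; closed form for a single weight."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


# Larger alpha constrains the model more, shrinking the learned weight.
for alpha in (0.0, 10.0, 100.0):
    print(alpha, round(fit_ridge(alpha), 3))
```

With `alpha = 0` the fit is ordinary least squares; as `alpha` grows, the weight is pulled toward zero and the model becomes simpler.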
    
### 1.5.6 Underfitting the Training Data

- This occurs when the model is too simple to learn the underlying structure of the data
- The main options for solving this problem are:
    - Selecting a more powerful model, with more parameters
    - Feeding better features to the learning algorithm (feature engineering)
    - Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
    
### 1.5.7 Stepping Back

- ML is about making machines get better at some task by learning from data instead of having to explicitly code rules
- There are many different types of ML systems: supervised or not, batch or online, instance-based or model-based
- In an ML project, you gather data in a training set and feed the training set to a learning algorithm:
    - If the algorithm is model-based, it tunes some parameters to fit the model to the training data
    - If the algorithm is instance-based, it just learns the examples by heart and generalizes to new instances by using
     a similarity metric
- The system will not perform well if your training set:
    - Is too small
    - Is not representative
    - Is noisy
    - Is polluted with irrelevant features

## 1.6 Testing and Validating

- The only way to know how well a model will generalize to new cases is to actually try it out on new cases
- This is accomplished by splitting your data into two sets:
    - The *training set*
    - The *test set*
    - It is very common to use 80% of the data for training and 20% for testing
- The error rate on new cases is called the *generalization error*; it can be estimated by evaluating the model
on the *test set*
- If the training error is low, but the generalization error is high, it means that your model is overfitting the training
data
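
A minimal sketch of an 80/20 split, assuming the instances can simply be shuffled (no time ordering or grouping to respect). The function name `train_test_split` and the fixed `seed` are illustrative choices; the seed keeps the split reproducible across runs.

```python
import random


def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and split it into (training set, test set)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))  # stand-in for 100 instances
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```

The test set should be put aside and not consulted while training, so that the error measured on it stays an honest estimate.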

### 1.6.1 Hyperparameter Tuning and Model Selection

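- Evaluating a model is done on the *test set*, but if you use the test set repeatedly to compare models and tune
hyperparameters, they become adapted to that particular set and the generalization error estimate becomes too optimistic
- A common solution is *holdout validation*: hold out part of the training set as a *validation set*
    - You can use this set to evaluate several candidate models and select the best one
    - This is also the time to experiment with different hyperparameters
    - The best model is then retrained on the full training set (including the validation set) and evaluated once on the *test set*
- The pitfalls of using a single validation set can be addressed through repeated *cross-validation*
    - This uses many small *validation sets*
    - Each model is evaluated once per validation set after it is trained on the rest of the data
    - Averaging the evaluations of a model across all validation sets gives a more accurate measure of its performance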
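
The cross-validation procedure can be sketched as follows, assuming plain k-fold splitting without shuffling and a deliberately trivial baseline model that always predicts the mean of its training targets. The names `k_fold_indices` and `cross_validate` and the target values are hypothetical.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_instances))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(targets, k=5):
    """Average validation MSE of a baseline that predicts the training mean."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(targets), k):
        prediction = sum(targets[i] for i in train_idx) / len(train_idx)
        fold_mse = sum((targets[i] - prediction) ** 2 for i in val_idx) / len(val_idx)
        scores.append(fold_mse)
    return sum(scores) / len(scores)


targets = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 3.5, 5.0, 6.5, 4.0]
print(cross_validate(targets))
```

Every instance is used for validation exactly once, at the cost of training the model `k` times.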

### 1.6.2 Data Mismatch

- This is when the training data is not perfectly representative of the data that will be used in production