```python
# Global imports and settings
%matplotlib inline
from preamble import *
plt.rcParams['savefig.dpi'] = 100 # Use 300 for PDF, 100 for slides
HTML('''<style>html, body{overflow-y: visible !important} .CodeMirror{min-width:105% !important;} .rise-enabled .CodeMirror, .rise-enabled .output_subarea{font-size:140%; line-height:1.2; overflow: visible;} .output_subarea pre{width:110%}</style>''') # For slides
```

# Introduction
In this notebook, we will:
    
- Explain the main machine learning concepts
- Use scikit-learn to build a first model
- Learn our first algorithm (kNN)

# Types of machine learning
We often distinguish three _types_ of machine learning:

- __Supervised Learning__: learn a model from labeled _training data_, then make predictions
- __Unsupervised Learning__: explore the structure of the data to extract meaningful information
- __Reinforcement Learning__: develop an agent that improves its performance based on interactions with the environment 

Note:

- Semi-supervised methods combine the first two.
- A single ML system can combine several of these types.

## Supervised Machine Learning

- Learn a model from labeled training data, then make predictions
- Supervised: we know the correct/desired outcome (label)

2 subtypes:
- Classification: predict a _class label_ (category), e.g. spam/not spam
    - Many classifiers can also return a _confidence_ per class
- Regression: predict a continuous value, e.g. temperature
    - Some algorithms can return a _confidence interval_

Most supervised algorithms that we will see can do both.
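
As a minimal sketch of both subtypes, the example below uses scikit-learn's k-nearest neighbors estimators on a tiny made-up dataset (the numbers are illustrative assumptions, not data from this course). It shows a class prediction, a per-class confidence, and a continuous prediction from the same kind of algorithm.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy one-feature dataset (made up for illustration)
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])                  # class labels
y_value = np.array([1.5, 2.5, 3.5, 10.5, 11.5, 12.5])   # continuous targets

# Same algorithm family, once as a classifier and once as a regressor
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_value)

X_new = np.array([[2.5], [10.5]])
class_pred = clf.predict(X_new)          # predicted class labels
class_proba = clf.predict_proba(X_new)   # one confidence per class; each row sums to 1
value_pred = reg.predict(X_new)          # predicted continuous values
```

Note that `predict_proba` is only available on classifiers that support confidence estimates; not every scikit-learn classifier provides it.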

<img src="images/01_supervised.png" alt="ml" style="width: 500px;"/>

### Classification

- Class labels are discrete, unordered
- Can be _binary_ (2 classes) or _multi-class_ (e.g. letter recognition)
- A dataset can have any number of predictive variables (predictors)
    - This number is also known as the dimensionality of the dataset
- The predictions of the model yield a _decision boundary_ separating the classes

![classification](../images/01_classification.png)
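
The points above can be sketched in code. The example below is one possible illustration, assuming the Iris data bundled with scikit-learn and a kNN classifier with `n_neighbors=5` (both choices are assumptions made for this sketch). It keeps only two predictors so the decision boundary lives in 2D, then predicts on a grid of points.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, :2]      # keep 2 predictors so the boundary can be drawn in 2D
y = iris.target           # 3 classes: a multi-class problem

n_classes = len(np.unique(y))
n_dimensions = X.shape[1]

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Predict on a grid of points. The decision boundary lies wherever the
# predicted class changes between neighboring grid points.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min(), X[:, 0].max(), 50),
    np.linspace(X[:, 1].min(), X[:, 1].max(), 50),
)
grid = np.c_[xx.ravel(), yy.ravel()]
grid_pred = clf.predict(grid).reshape(xx.shape)
```

Passing `xx`, `yy`, and `grid_pred` to `plt.contourf` would color each region by its predicted class, which is one common way to draw such a boundary.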

### Regression

- Target variable is numeric
- Find the relationship between predictors and the target.
    - E.g. relationship between hours studied and final grade
- Example: Linear regression (fits a straight line)

![regression](../images/01_regression2.png)
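
A minimal sketch of the hours-studied example, using scikit-learn's `LinearRegression`. The data is hypothetical and deliberately constructed to lie exactly on a straight line, so the fitted slope and intercept are easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied (predictor) and final grade (target).
# The grades are constructed as exactly 47 + 5 * hours, with no noise.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
grade = np.array([52.0, 57.0, 62.0, 67.0, 72.0])

model = LinearRegression().fit(hours, grade)
slope = model.coef_[0]            # change in grade per extra hour studied
intercept = model.intercept_      # predicted grade at 0 hours
predicted = model.predict(np.array([[6.0]]))[0]   # prediction for 6 hours
```

Real data would contain noise, in which case the line is the least-squares fit rather than an exact match.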

## Reinforcement Learning

- Develop an agent that improves its performance based on interactions with the environment
    - Example: games like chess, Go, ...
- A _reward function_ defines how well an action (or series of actions) works
- Learn a series of actions that maximizes reward through exploration

![reinforcement learning](../images/01_rl2.png)
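
To make the explore-and-improve loop concrete, here is a deliberately simplified sketch: an epsilon-greedy agent on a multi-armed bandit, which is an environment with a single state and no sequences of actions. This is far simpler than chess or Go and is only meant to illustrate the idea. The function name `run_bandit`, the reward means, and the value of `epsilon` are all assumptions chosen for this example.

```python
import random

def run_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (a one-state environment)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # agent's current estimate of each action's reward
    counts = [0] * n_actions        # how often each action has been tried

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment returns a noisy reward for the chosen action
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Update the running average of rewards for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit(true_means=[0.2, 0.5, 1.5])
best_action = max(range(len(counts)), key=lambda a: counts[a])
```

The agent never sees `true_means`; it only observes rewards, and over time it favors the action that paid off best.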

## Unsupervised Machine Learning

- Unlabeled data, or data with unknown structure
- Explore the structure of the data to extract information
- There are many types; we'll discuss just two.

### Clustering

- Organize information into meaningful subgroups (clusters)
- Objects in a cluster share a certain degree of similarity (and are dissimilar to objects in other clusters)
- Example: distinguish different types of customers

![clustering](../images/01_cluster2.png)
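
As a rough sketch of the customer example, the code below clusters synthetic data with scikit-learn's `KMeans`. The two customer groups, their feature meanings, and the choice of k-means with `n_clusters=2` are all assumptions made for illustration; other clustering algorithms would work as well.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Hypothetical customers: [visits per month, average basket value]
occasional = rng.normal(loc=[2.0, 20.0], scale=1.0, size=(50, 2))
frequent = rng.normal(loc=[12.0, 60.0], scale=1.0, size=(50, 2))
X = np.vstack([occasional, frequent])   # no labels are given to the algorithm

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_             # cluster index assigned to each customer
centers = kmeans.cluster_centers_   # one center per cluster
```

The cluster indices themselves are arbitrary: the algorithm finds the groups, but it is up to us to interpret what each group means.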

### Dimensionality reduction

- Data can be very high-dimensional and difficult to understand, learn from, store,...
- Dimensionality reduction can compress the data into fewer dimensions, while retaining most of the information
- Unlike feature selection, the new features lose their (original) meaning
- Often useful for visualization (e.g. compress to 2D)

![dimred](../images/01_dimred.png)
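
A minimal sketch of compressing to 2D, assuming PCA as the dimensionality reduction technique (one option among many) and the Iris data bundled with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                 # 150 samples, 4 original features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)          # 150 samples, 2 new features

# Fraction of the original variance kept by the 2 components
retained = pca.explained_variance_ratio_.sum()

# Each new feature is a weighted combination of all 4 original features,
# which is why it no longer has a direct physical meaning.
weights = pca.components_            # shape: (2 components, 4 original features)
```

`X_2d` can be passed directly to a scatter plot, with `retained` indicating how much information the 2D view preserves.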

## Basic Terminology (on Iris dataset)
![terminology](../images/01_terminology.png)
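
The sketch below shows how commonly used terms map onto the Iris data as loaded by scikit-learn. It assumes the usual vocabulary (samples as rows, features as columns, and a target holding the class labels), which may differ slightly from the wording in the figure.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

n_samples, n_features = X.shape        # rows are samples, columns are features
feature_names = iris.feature_names     # names of the measured features
class_names = list(iris.target_names)  # names of the class labels
first_sample = X[0]                    # feature values of one flower
first_label = class_names[y[0]]        # the class label of that flower
```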

## Building machine learning systems
A typical machine learning system has multiple components:
    
- Preprocessing: Raw data is rarely ideal for learning
    - Feature scaling: bring values into the same range
    - Encoding: make categorical features numeric
    - Discretization: make numeric features categorical
    - Feature selection: remove uninteresting/correlated features
    - Dimensionality reduction can also make data easier to learn
    

- Learning and model selection
    - Every algorithm has its own biases
    - No single algorithm is always best (No Free Lunch)
    - _Model selection_ compares and selects the best models
        - Different algorithms
        - Every algorithm has different options (hyperparameters)
    - Split data into training and test sets

- Together they form a _workflow_ or _pipeline_

![pipelines](../images/01_ml_systems.png)
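
The components above can be sketched with scikit-learn as follows. This is one possible arrangement, not a prescribed recipe: the choice of `StandardScaler` for preprocessing, kNN as the learner, the candidate values for `n_neighbors`, and 5-fold cross-validation are all assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a test set that is never used during model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Preprocessing and learning chained into a single pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),       # feature scaling
    ("knn", KNeighborsClassifier()),   # learning algorithm
])

# Model selection: compare hyperparameter settings with cross-validation
search = GridSearchCV(
    pipe, param_grid={"knn__n_neighbors": [1, 3, 5, 7, 9]}, cv=5
)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)   # accuracy on unseen data
```

Putting the scaler inside the pipeline means it is refit on each cross-validation training fold, so no information from the validation fold leaks into preprocessing.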