# Introduction

This notebook is a short introduction to machine learning, previewing what we will encounter during our machine learning journey.

## Chapter objectives

Understand
- What is, and what is not, machine learning
- The components of a machine learning algorithm
    - data
    - models
    - criteria
    - optimisation
- basic model regularisation
- basic hyperparameter tuning
- the mathematics describing what machine learning is doing, and why
- the polychotomy of learning algorithms
    - supervised, unsupervised and reinforcement learning
    - parametric and non-parametric models
    - classification vs regression algorithms
- how the most popular traditional (non-neural) models work and what they are good for
    - Linear regression
    - Logistic regression
    - Support vector machines
    - K-nearest neighbours
    - K-means clustering
    - DBSCAN
    - Classification and regression trees (CARTs)
- ensembles

## So what is machine learning?

A machine learning algorithm is one that learns from data to improve its performance at some task.

"A computer program is said to learn from experience **E** with respect to some class of tasks **T** and performance measure **P**, if its performance at tasks in **T**, as measured by **P**, improves with experience **E**." - [Tom Mitchell](http://profsite.um.ac.ir/~monsefi/machine-learning/pdf/Machine-Learning-Tom-Mitchell.pdf)

## The task, T

- regression - predicting a continuous output
- classification - predicting which of several classes an example belongs to
- clustering - grouping sets of examples together
- lots of others (some of which we might later see)

The word "predicting" describes all of these examples well.

> The task can always be framed as solving a prediction problem, where we have some input and want to predict some output.

Because we will be letting our computers do all of the hard work, our inputs and outputs must be represented numerically.

> Once we have numerical inputs and outputs, we can write mathematical functions that perform this input-output mapping.

As such, the bulk of this chapter will be focused on different types of functions, their advantages and disadvantages.

Our aim is not to rigidly define these functions ourselves, but to let our machines find the transformation that should be applied to the input in order to produce the output.

![](images/inp-out.jpg)
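
As a rough illustration of the difference between defining the mapping by hand and letting the machine find it, the sketch below fits a straight line with NumPy's `np.polyfit`. The data values and the names `years`, `salary`, `hand_written_mapping` and `learned_mapping` are made up for this example; they are not part of the course material.

```python
import numpy as np

# Hypothetical numerical inputs (years of experience) and outputs (salary in thousands).
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([25.0, 30.0, 35.0, 40.0, 45.0])


def hand_written_mapping(x):
    """A rigidly defined input-output mapping: we guessed the numbers ourselves."""
    return 4.0 * x + 20.0


# Letting the machine find the mapping: fit a degree-1 polynomial (a line) to the data.
slope, intercept = np.polyfit(years, salary, deg=1)


def learned_mapping(x):
    """The same kind of function, but with parameters found from the data."""
    return slope * x + intercept


print(hand_written_mapping(6.0))
print(learned_mapping(6.0))
```

Both functions have the same form; the only difference is where the numbers inside them come from.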

## The performance, P

Given a task, we need to be able to measure how well we are doing. 

> There are many ways of doing this, and the one you should choose often depends on the context of what problem you are trying to solve. 

Examples of performance measures:
- accuracy - how many predictions were correct
- squared error - how far off our continuous predictions are
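
The two measures above can be sketched with NumPy as follows. The labels and values are invented for illustration. Accuracy is computed here as the fraction of correct predictions, and the squared errors are averaged into a single number (mean squared error), which is one common choice.

```python
import numpy as np

# Classification: hypothetical true and predicted class labels.
true_classes = np.array([0, 1, 2, 1, 0])
predicted_classes = np.array([0, 1, 1, 1, 0])

# Accuracy: the fraction of predictions that match the true class.
accuracy = np.mean(predicted_classes == true_classes)

# Regression: hypothetical true and predicted continuous values.
true_values = np.array([30.0, 45.0, 80.0])
predicted_values = np.array([28.0, 50.0, 80.0])

# Squared error per example, then averaged (mean squared error).
squared_errors = (predicted_values - true_values) ** 2
mean_squared_error = squared_errors.mean()

print(accuracy)
print(mean_squared_error)
```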

These performance metrics are easy to evaluate, but in reality they may not convert into the results that you really want. 

We will talk more about performance measures a few notebooks from now.

## The experience, E

> Experience is the data that a machine learning algorithm processes and uses to update itself

## Supervised vs Unsupervised

These are the two most commonly used machine learning subfields, and they differ with respect to:
- task (T)
- experience (E)
- performance measures (P)

### Supervised

> In supervised machine learning our task (T) is to predict some output, given some input

For example, given years of work experience, predict salary.

> In supervised machine learning our performance (P) metrics are based on predicted outputs and real data

Say our model predicted `22k` when in reality it should be `30k`. We made a mistake of `8k`!

> In supervised machine learning our experience (E) is predicting on inputs, comparing the predicted outputs to the real ones, and improving based on those mistakes

We made a mistake of `8k`, and now we would like to __minimize it__ so that our model fits the real data better.
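
The arithmetic behind that mistake, as a minimal sketch. The variable names are illustrative; the numbers come from the example above.

```python
# Values taken from the example in the text (in thousands).
predicted_salary = 22.0
true_salary = 30.0

# Signed error: negative means the model predicted too little.
signed_error = predicted_salary - true_salary

# The size of the mistake, regardless of direction.
absolute_error = abs(signed_error)

print(signed_error)    # -8.0
print(absolute_error)  # 8.0
```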

### Unsupervised

> In unsupervised machine learning our task (T) is finding structure in data

Say we have a shop with `10,000` shirts. We would like to group those shirts into the following sizes:
- large (L)
- medium (M)
- small (S)

> In unsupervised machine learning our performance (P) metrics are based on predicted outputs __and we have no real "targets"__

If our model predicts that `9,900` shirts are large, we intuitively feel there is something wrong. There are ways to quantify this intuition, which we will talk about later.

> In unsupervised machine learning our experience is iterating over the unstructured data in order to improve pattern finding
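
One simple sanity check on a grouping is to look at how many examples land in each group. The sketch below does this for a handful of hypothetical shirt assignments. It is only a quick inspection under made-up data, not one of the formal measures covered later.

```python
import numpy as np

# Hypothetical group assignments for 10 shirts: 0 = small, 1 = medium, 2 = large.
# In practice these would come from a clustering algorithm, not be typed by hand.
assignments = np.array([2, 2, 2, 2, 2, 2, 2, 2, 1, 0])

# Count how many shirts landed in each group.
group_sizes = np.bincount(assignments, minlength=3)

# Share of all shirts in each group; one dominant group is a hint to look closer.
group_shares = group_sizes / group_sizes.sum()

for name, share in zip(["small", "medium", "large"], group_shares):
    print(f"{name}: {share:.0%}")
```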

# The general framework for machine learning algorithms

Almost all machine learning algorithms consist of 4 components:

1. __data__
2. __model__
3. __loss__
4. __optimiser__

The next two modules will introduce you to all of them.

By the end, we will have used all of them to implement our first machine learning algorithm - linear regression.
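
To give a feel for how the four components fit together, here is a minimal sketch of one-dimensional linear regression trained with plain gradient descent. The data, learning rate, number of steps and function names are assumptions chosen for this illustration. The implementation built in the following modules may look different.

```python
import numpy as np

# 1. Data: hypothetical inputs x and targets y that follow y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0


# 2. Model: a straight line with two learnable parameters, w and b.
def model(x, w, b):
    return w * x + b


# 3. Loss: mean squared error between predictions and targets.
def loss(predictions, targets):
    return np.mean((predictions - targets) ** 2)


# 4. Optimiser: plain gradient descent on w and b.
w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    error = model(x, w, b) - y
    grad_w = 2.0 * np.mean(error * x)  # derivative of the loss with respect to w
    grad_b = 2.0 * np.mean(error)      # derivative of the loss with respect to b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = loss(model(x, w, b), y)
print(w, b, final_loss)
```

After training, `w` and `b` should be close to the values `2` and `1` that were used to generate the data.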

## 1. Data

The data represents the input-output relationship that our algorithm will learn.

Our aim is to produce a mathematical function that:
- takes in a sample
- makes a prediction about it

The data will determine the meaning and shape of our function and what it outputs as a prediction.

Below you can see data about people and their salaries (the salary is what we want to predict).

| Age   | Height  | KnowsML | KnowsProgramming  | Salary  |
|---|---|---|---|---|
| 30  | 175.2  | 0  | 1  |  85.15 |
| 25  |  182 |  1 |  1 | 120.2  |
| 17  | 165  | 0  | 0  | 40.5  |

### Firstly, how can we represent these examples numerically?

> To start with, we need to separate the **label** (regression target, `Salary` above) from the **features** (input to the algorithm, all other columns above). So for each example (row), we will have a single scalar label $y$.

In our case, each example has several features. We can group these together mathematically as a vector, $x$. 

It will have as many rows as there are features in the example. Let's call this number of features per example $n$. 

![](images/single_data_point.jpg)

We have $m$ examples, and indicate an arbitrary example's index with an $i$.

We can stack these examples in columns to produce a **design matrix**, $X$, which will then contain all of our data, as shown below.

![](images/design_matrix.jpg)

The scalar labels for each example can also be arranged into a single vector.

![image](images/labels.jpg)
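
The same construction can be sketched in NumPy using the three rows of the table above. This assumes the column-stacked layout described in the text, with one example per column; the variable names `x_1`, `x_2`, `x_3` and `X_rows` are chosen for this example only.

```python
import numpy as np

# Each example from the table as a feature vector x with n = 4 entries:
# (Age, Height, KnowsML, KnowsProgramming). Salary is held out as the label.
x_1 = np.array([30.0, 175.2, 0.0, 1.0])
x_2 = np.array([25.0, 182.0, 1.0, 1.0])
x_3 = np.array([17.0, 165.0, 0.0, 0.0])

# Stack the m = 3 examples as columns, giving a design matrix of shape (n, m).
X = np.column_stack([x_1, x_2, x_3])

# One scalar label per example, arranged into a vector of length m.
y = np.array([85.15, 120.2, 40.5])

n, m = X.shape
print(X.shape, y.shape)

# Many libraries expect the transposed layout, with one example per row.
X_rows = X.T
```

Which layout to use is a convention, so it is worth checking the shape a given library expects before passing data to it.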

Again, please note that this is just one specific example, and that other problems may have wildly different input-output formats and relations. 

As long as __inputs__ and __targets__ can be represented mathematically, it will be possible to create a model that predicts __targets__ with its __outputs__.

> Can you think of any data type that might be hard to represent mathematically?

## Exercise

Create a pandas `DataFrame` with two columns:
- Years of Experience (`float`)
- Salary (in thousands) (`float`)

Create `5` to `10` data points (manually or via some function, your choice).

In [None]:
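import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = pd.DataFrame(
    {
        "Years of Experience": [1.0, 1.5, 5.5, 3.5, 7.5, 25.5],
        "Salary (in thousands)": [22.5, 30.0, 125.0, 80.0, 95.0, 110.0],
    }
)

# Display the DataFrame (the last expression in a notebook cell is shown).
data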
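
The exercise also allows generating the data points with a function. One possible sketch, assuming a made-up rule in which salary grows linearly with experience plus random noise (the rule, the seed and the name `generated` are all illustrative):

```python
import numpy as np
import pandas as pd

# Seeded generator so the same data points are produced on every run.
rng = np.random.default_rng(0)

# Hypothetical rule: salary grows with experience, plus some random noise.
years = rng.uniform(0.0, 20.0, size=8)
salary = 20.0 + 4.0 * years + rng.normal(0.0, 5.0, size=8)

generated = pd.DataFrame(
    {
        "Years of Experience": years,
        "Salary (in thousands)": salary,
    }
)

print(generated.dtypes)
```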

## Classification vs regression

Most machine learning problems are supervised learning problems, which are further split into:

- regression problems (continuous target)
- classification problems (categorical target)

For example, __predicting a house's price from its features is a regression problem__ because the price can take any floating point value, for example:
- $35.45$ (in thousands of dollars)
- $120$ (in thousands of dollars)

On the other hand, __predicting a house's type from its features is a CLASSification problem__, because a house can be __of only one type__, for example:
- detached (class `0`)
- semi-detached (class `1`)
- terraced (class `2`)

It would not make sense to have continuous values here (what would `0.3` mean? Mostly detached but partially semi-detached?)
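
In code, the difference shows up in what the targets look like. The sketch below uses the class numbering from the list above; the list of houses and the prices are invented for illustration.

```python
# Mapping from house type to an integer class label, as listed in the text.
class_of = {"detached": 0, "semi-detached": 1, "terraced": 2}

# Hypothetical houses.
house_types = ["terraced", "detached", "detached", "semi-detached"]

# Classification targets: each house belongs to exactly one class.
class_targets = [class_of[house_type] for house_type in house_types]

# Regression targets: prices (in thousands) can be any floating point value.
price_targets = [35.45, 120.0, 98.7, 64.2]

print(class_targets)  # [2, 0, 0, 1]
```

The integers here are only names for the classes. Their order and spacing carry no meaning, which is why a value like `0.3` has no sensible interpretation.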

## The Model

> A model is a less detailed representation of something. For example, a globe is a model of the world. 

It's not perfect, and there is a lot that we can't find out about the world by looking at our globe, but it can be useful for showing foreigners where your country is located.

> A model takes in input data, internalizes knowledge about it, and is then able to predict something on new (unseen) data

In the next lesson we will use our first models.
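
As a rough sketch of that idea, here is a deliberately trivial model written as a Python class. The class `MeanModel` and its data are hypothetical. The `fit`/`predict` method names are borrowed from the convention used by scikit-learn and are an assumption here, not something this course prescribes.

```python
import numpy as np


class MeanModel:
    """A deliberately simple model: always predicts the mean of the labels it saw."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        # "Internalise knowledge" about the data: here, just the average label.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Predict the stored mean for every new (unseen) example.
        return np.full(len(X), self.mean_)


# Hypothetical data: years of experience and salary in thousands.
X_train = np.array([[1.0], [3.0], [5.0]])
y_train = np.array([20.0, 40.0, 60.0])

model = MeanModel().fit(X_train, y_train)
predictions = model.predict(np.array([[2.0], [10.0]]))
print(predictions)
```

Like the globe, this model ignores almost every detail (it does not even look at its input), yet it already has the shape of something that learns from data and then predicts on unseen examples.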

## Challenges

- Represent a few tasks as machine learning tasks. Are those regression or classification? Are those supervised or unsupervised?
- What other machine learning disciplines are there? Check out [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning), [semi-supervised learning](https://en.wikipedia.org/wiki/Semi-supervised_learning), [active learning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning))
- What other supervised subfields can you find besides classification and regression?

## Summary

- Machine learning algorithms use __experience__ to improve their __performance__ on a defined __task__
- ML is canonically split into `unsupervised` and `supervised`
- Supervised machine learning is mainly __learning mathematical functions that represent the input-output relationships__ (features $\rightarrow$ labels)
- Unsupervised machine learning __learns a mathematical function to represent inherent structure in data__
- `supervised` is usually either `regression` or `classification`
- A model is a __mathematical function__ which tries to model the __input-output relationship__ (usually implemented as a `class`)