# Chapter Intro & Overview: ML Foundations

## Chapter Content
- ML fundamentals
    - Linear regression & a 4-part framework for ML algorithms
    - Gradient descent
    - Bias, variance and generalisation
    - Regularisation
    - Hyperparameters
    - Maximum likelihood estimation (MLE)
- Other traditional learning algorithms
    - Linear regression
    - Logistic regression
    - Support vector machines
    - K-nearest neighbours
    - K-means clustering
    - DBSCAN
    - Classification and regression trees (CARTs)
- Ensembles

## Chapter objectives
Understand
- What is, and what is not, machine learning
- The components of a machine learning algorithm
    - data
    - models
    - criteria
    - optimisation
- basic model regularisation
- basic hyperparameter tuning
- the mathematics describing what machine learning is doing, and why
- the main ways of categorising learning algorithms
    - supervised, unsupervised and reinforcement learning
    - parametric and non-parametric models
    - classification vs regression algorithms
- how the most popular traditional (non-neural) models work and what they are good for
    - Linear regression
    - Logistic regression
    - Support vector machines
    - K-nearest neighbours
    - K-means clustering
    - DBSCAN
    - Classification and regression trees (CARTs)
- ensembles
- boosting

## So what is machine learning?

A machine learning algorithm is one that is able to learn from data to improve its performance at some task.

"A computer program is said to learn from experience **E** with respect to some class of tasks **T** and performance measure **P**, if its performance at tasks in **T**, as measured by **P**, improves with experience **E**." - [Tom Mitchell](http://profsite.um.ac.ir/~monsefi/machine-learning/pdf/Machine-Learning-Tom-Mitchell.pdf)

### The task, T
- regression - predicting a continuous output
- clustering - grouping sets of examples together
- machine translation - translating between one language and another
- transcription - predicting the words from an audio input
- density estimation - predicting a probability distribution
- anomaly detection - predicting which examples fall outside the usual distribution of the data
- structured output - predicting an output whose values are strongly interrelated, so that their arrangement matters as well as their individual values. For example, generated images, audio or text must look like realistic examples, which depends not only on the values they contain but also on how those values are arranged.

The word "predicting" describes the formulation of all of these examples well.
Each task can be framed as a prediction problem: we have some input and want to predict some output.

Because we will be letting our computers do all of the hard work, our inputs and outputs must be represented numerically. 
Once we have numerical inputs and outputs, we can write mathematical functions that perform this input-output mapping.
As such, the bulk of this chapter will be focused on different types of functions, their advantages and disadvantages.

Our aim will not be to rigidly define these functions ourselves, but to let our machines find, for themselves, the transformation that must be applied to the input to produce the correct output.

![](images/inp-out.jpg)
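
As a minimal sketch of this idea, the snippet below uses a made-up dataset of floor areas and prices (hypothetical numbers, chosen only for illustration) and a one-parameter function `predict`. Rather than setting the parameter by hand, `fit_weight` computes it from the data. The fitting rule used here, the closed-form least-squares solution for a line through the origin, is an assumption for this example and only one of many possible ways to find the parameter.

```python
# Hypothetical toy data: floor area (square metres) -> price (thousands).
areas = [50.0, 80.0, 120.0]
prices = [150.0, 240.0, 360.0]


def predict(area, weight):
    """A one-parameter model: the output is the input scaled by `weight`."""
    return weight * area


def fit_weight(xs, ys):
    """Least-squares estimate of w for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# The parameter is found from the data instead of being hard-coded.
learned_weight = fit_weight(areas, prices)
print("learned weight:", learned_weight)
print("prediction for 100 square metres:", predict(100.0, learned_weight))
```

The function `predict` defines the family of input-output mappings we are willing to consider; the data decides which member of that family is used.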

### The performance

Given a task, we need to be able to measure how well we are doing. There are many ways of doing this, and the right choice often depends on the problem you are trying to solve.

Examples of performance measures:
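- accuracy - the fraction of predictions that were correct
- error rate - 1 - accuracy, i.e. the fraction of predictions that were incorrect
- mean squared error - the average squared difference between predictions and targets, used for continuous predictions
- KL-divergence - dissimilarity between two probability distributions
- cross entropy - closely related to KL-divergence: it equals the KL-divergence plus the entropy of the target distribution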
- BLEU score - measures the quality of translated sentences
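
Several of these measures are simple enough to write out directly. The following is a rough sketch using only the Python standard library; the function names are chosen for this example, and the distribution-based measures assume discrete distributions given as lists of probabilities, with `q` non-zero wherever `p` is non-zero. BLEU is omitted because it is considerably more involved.

```python
import math


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def error_rate(predictions, labels):
    """Fraction of predictions that do not match the labels."""
    return 1.0 - accuracy(predictions, labels)


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def entropy(p):
    """Entropy of a discrete distribution p, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy of q relative to the target distribution p, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL-divergence from q to the target distribution p, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example values, for illustration only.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))

p, q = [0.5, 0.5], [0.9, 0.1]
# Cross entropy equals entropy plus KL-divergence (up to rounding).
print(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
```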

These performance metrics are easy to evaluate, but in practice they may not translate into the results you really want. For example, a prediction algorithm that performs perfectly on every example in your dataset may still generalise poorly to examples it has not seen. The real challenge is to build algorithms that do what you want, not just what you told them to do.
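
To make the gap between dataset performance and generalisation concrete, here is a deliberately extreme sketch: a "model" that simply memorises its training examples in a lookup table. The task (deciding whether a number is even), the data split and the fallback prediction of `0` are all assumptions made up for this illustration.

```python
def fit_memoriser(inputs, labels):
    """'Train' by storing every example in a lookup table."""
    return dict(zip(inputs, labels))


def predict(table, x, default=0):
    """Return the stored label, or a fixed default for unseen inputs."""
    return table.get(x, default)


def accuracy(table, inputs, labels):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(predict(table, x) == y for x, y in zip(inputs, labels))
    return correct / len(labels)


# Hypothetical task: the label is 1 if the number is even, otherwise 0.
train_inputs, train_labels = [1, 2, 3, 4], [0, 1, 0, 1]
test_inputs, test_labels = [5, 6, 7, 8], [0, 1, 0, 1]

table = fit_memoriser(train_inputs, train_labels)
train_accuracy = accuracy(table, train_inputs, train_labels)
test_accuracy = accuracy(table, test_inputs, test_labels)
print("accuracy on seen examples:  ", train_accuracy)
print("accuracy on unseen examples:", test_accuracy)
```

The memoriser scores perfectly on the examples it was given, yet it has learned nothing about evenness, so on unseen numbers it is only right when the fallback happens to match.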

### The experience

You've probably heard that machine learning algorithms need data. 
This is the experience that they process. 

Some datasets contain examples that have labels. Learning from such labelled data is called **supervised** learning.
Learning from datasets whose examples have no labels is called **unsupervised** learning.
Another paradigm of machine learning, called **reinforcement learning**, considers the case where the algorithm can interact with its environment and collect more experience as it goes. 

The lines between supervised, unsupervised and reinforcement learning are blurred, and the definitions are loose. A single algorithm can combine more than one of them.
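
The three kinds of experience described above can be sketched as plain Python data. Everything here is hypothetical and only meant to show the shape of the data: the feature values, the label names and the toy `environment` function are invented for this example.

```python
# Labelled data: each example pairs input features with a label.
labelled = [
    ([5.1, 3.5], "class_a"),
    ([6.7, 3.0], "class_b"),
    ([5.0, 3.4], "class_a"),
]
features = [x for x, _ in labelled]
labels = [y for _, y in labelled]

# Unlabelled data: the same kind of inputs, but with no labels attached.
unlabelled = [[5.9, 3.2], [4.8, 3.1]]


def environment(action):
    """Hypothetical environment: gives a reward of 1.0 for action 1."""
    return 1.0 if action == 1 else 0.0


# Reinforcement-style experience: collected by acting, not given up front.
experience = []
for action in [0, 1, 1, 0]:
    reward = environment(action)
    experience.append((action, reward))

print(len(features), len(labels), len(unlabelled))
print(experience)
```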

## Summary
- Machine learning algorithms use experience to improve their performance on some task
- Machine learning mostly consists of learning mathematical functions that capture the input-output relationship between the features and the labels of some data
- ML is canonically split into unsupervised, supervised and reinforcement learning 

## Next steps
- [Linear regression and a framework for machine learning algorithms]()