# Machine Learning - An Introduction

## What is Machine Learning?

Machine Learning is the field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959


A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.
—Tom Mitchell, 1997

### Examples of machine learning

* Email spam filter
* Image recognition
* Character recognition
* Sentiment analysis
* Stock market prediction
* Credit risk evaluation
* Product recommendations (Amazon, YouTube, etc.)
* Web search
* Self-driving cars

## Why should we use machine learning?

For many problems, it is practically impossible to write down by hand the function that returns the best results from large amounts of data (e.g., we couldn't possibly write an algorithm comprising all the rules that might decide whether an email is spam).
<center>

<img src="images/geron1_1.png" align="center" width="800" />
</center>
(Traditional approach, Geron, Fig 1-1)

Solution: Instead, choose only the method used to solve the problem and use data to learn its parameters.

<center>

<img src="images/geron1_2.png" align="center" width="800" />
</center>


(Machine Learning approach, Geron, Fig 1-2)
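
As a minimal sketch of this idea, the example below fixes the method by hand ("flag an email as spam if it contains at least `threshold` exclamation marks") and lets the data determine the parameter `threshold`. The toy data, the single feature, and the function names are all assumptions made for illustration; a real spam filter would use far richer features.

```python
# Hypothetical toy data: (number of exclamation marks in an email, is_spam label).
training_data = [(0, False), (1, False), (2, False), (4, True), (5, True), (7, True)]


def predict(threshold, n_exclamations):
    """The chosen method: spam if the email has at least `threshold` exclamation marks."""
    return n_exclamations >= threshold


def learn_threshold(data):
    """Learn the parameter: pick the candidate threshold with the fewest training errors."""
    candidates = sorted({x for x, _ in data})

    def n_errors(t):
        return sum(predict(t, x) != label for x, label in data)

    return min(candidates, key=n_errors)


threshold = learn_threshold(training_data)
print(threshold)              # learned from the data, not written by hand
print(predict(threshold, 6))  # prediction for a new email
```

If the data changes, rerunning `learn_threshold` updates the parameter without anyone rewriting the rule.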

## Types of Machine Learning

There are three types of machine learning: __supervised learning__, __unsupervised learning__, and __reinforcement learning__. We will mostly be concerned with supervised learning.

## Supervised Learning

<center>

<img src="images/01_02.png" align="center" width="600" />
</center>
(Raschka, ch 1)

In supervised learning, we use __training data__ containing __labels__ to learn the __predictive model__.

A __label__ is the outcome of interest that we want to predict.

__Training data__ is used only to determine the predictive model. We already know the labels for this data set.


Each data point is called a __sample__ (equivalent to what would be called an observation in Econometrics).


The data contains __features__, which are used to make the prediction. The term `feature` in ML corresponds to the term `predictor` in regression models in Statistics or Econometrics.

The __predictive model__ is then used to make predictions for new data. This new data is usually called __test data__.
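
The terminology above can be illustrated with a small sketch. It assumes a nearest-centroid classifier as the predictive model, which is only one of many possible choices, and all data and names (`X_train`, `fit`, `predict`, and so on) are hypothetical.

```python
from math import dist

# Hypothetical toy data: each sample has two features and one label.
X_train = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (4.0, 4.2), (4.1, 3.8), (3.9, 4.0)]
y_train = ["a", "a", "a", "b", "b", "b"]
X_test = [(1.1, 1.0), (4.2, 4.1)]  # new samples the model has not seen
y_test = ["a", "b"]                # known here only so we can check the predictions


def fit(X, y):
    """Learn the predictive model: one centroid (mean feature vector) per label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(model, X):
    """Predict, for each sample, the label whose centroid is closest."""
    return [min(model, key=lambda label: dist(model[label], x)) for x in X]


model = fit(X_train, y_train)    # uses the training data and its labels
y_pred = predict(model, X_test)  # predictions for the test data
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)
```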


## Types of prediction in Supervised Learning

There are two different prediction tasks for which we use supervised learning: __classification__ and __regression__ (not to be confused with regression in Statistics and Econometrics). We will later see that most machine learning algorithms can be used for either type of prediction task.

## Classification
Classification is the task of predicting a categorical label. That is, the label takes one of two or more discrete values, and we would like to
* predict which category an instance most likely belongs to, or
* determine the probabilities for each category.

<center>
<img src="images/01_03.png" align="center" width="600" />
</center>
(Raschka, ch 1)
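
Both goals, predicting the most likely category and estimating the probability of each category, can be sketched with a k-nearest-neighbors rule. This is an illustrative choice of method, and the data and function names below are assumptions.

```python
from collections import Counter
from math import dist

# Hypothetical toy data: one feature per sample and a binary label.
X_train = [(1.0,), (1.5,), (2.0,), (3.0,), (3.5,), (4.0,)]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]


def predict_proba(x, k=3):
    """Estimate class probabilities as label frequencies among the k nearest samples."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: dist(pair[0], x))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: counts[label] / k for label in sorted(set(y_train))}


def predict_class(x, k=3):
    """Return the category with the highest estimated probability."""
    proba = predict_proba(x, k)
    return max(proba, key=proba.get)


print(predict_proba((2.4,)))  # probability for each category
print(predict_class((2.4,)))  # most likely category
```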

## Regression

Regression is the task of predicting a numerical label. That is, the label can take on a continuum of values, and we would like to predict its expected value.

<center>

<img src="images/01_04.png" align="center" width="800" />
</center>
(Raschka, ch 1)
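
A minimal sketch of a regression task, assuming a straight line fitted by least squares as the method (one possible choice among many). The data and names are hypothetical.

```python
# Hypothetical toy data: one feature x and a numerical label y.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(x_train, y_train)
print(intercept + slope * 6.0)  # predicted label for a new sample with x = 6
```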


## Reinforcement learning
In reinforcement learning, the algorithm improves its performance based on a __reward signal__ it receives in response to its actions. Reinforcement learning is used, e.g., in chess engines and self-driving cars. We won't look at reinforcement learning in any detail in this course.

<center>

<img src="images/01_05.png" align="center" width="1000" />
</center>
(Raschka, ch 1)
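
To make the idea of a reward signal concrete, here is a deliberately simplified sketch: an agent chooses among three actions, observes only the reward for the action it took, and keeps a running average per action. The environment (`TRUE_REWARDS`), the fixed exploration schedule, and all names are assumptions for illustration; practical reinforcement learning problems involve states, noisy rewards, and randomized exploration.

```python
# Hypothetical environment: three actions with fixed rewards unknown to the agent.
TRUE_REWARDS = [1.0, 2.0, 5.0]


def run_agent(n_steps=100, explore_every=10):
    """Learn action values from rewards only, exploring on a fixed schedule."""
    n_actions = len(TRUE_REWARDS)
    estimates = [0.0] * n_actions  # current estimate of each action's reward
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0.0
    for step in range(n_steps):
        if step % explore_every == 0:
            # Explore: try the actions in round-robin order.
            action = (step // explore_every) % n_actions
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = TRUE_REWARDS[action]  # reward signal from the environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward


estimates, total_reward = run_agent()
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, estimates)
```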


## Unsupervised learning
In unsupervised learning, we want to learn something about the structure of data that doesn't contain labels.

### Clustering
One popular unsupervised machine learning task is clustering, which groups the data into clusters containing samples that are similar to one another and different from those in other clusters.
<center>

<img src="images/01_06.png" align="center" width="800" />
</center>
(Raschka, ch 1)
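
One common clustering method is k-means, sketched below in plain Python. The data is hypothetical, and the initial centers are picked by hand to keep the example deterministic, which is a simplification.

```python
from math import dist

# Hypothetical unlabeled data: two features per sample.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]


def k_means(points, initial_centers, n_iter=10):
    """Plain k-means: alternate between assigning samples and updating centers."""
    centers = list(initial_centers)
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned samples.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Assumption: initial centers chosen by hand, one from each visible group.
labels, centers = k_means(X, initial_centers=[X[0], X[3]])
print(labels)
```

Note that no labels are used anywhere: the cluster indices are arbitrary names for the groups found in the data.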

### Dimensionality reduction
Another important unsupervised learning task is dimensionality reduction, i.e., taking high-dimensional data and reducing it to a lower-dimensional data set containing most of the information in the higher-dimensional data.
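
As a hedged sketch, the example below reduces two correlated features to a single number per sample by projecting onto the direction of largest variance (the first principal component), found here by power iteration. Principal component analysis is one possible technique for this task; the data and function names are assumptions.

```python
from math import sqrt

# Hypothetical data: two features that are strongly correlated.
X = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.0)]


def first_principal_component(points, n_iter=100):
    """Direction of largest variance, via power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(points, means, v):
    """Reduce each sample to one number: its coordinate along direction v."""
    return [sum((p[j] - means[j]) * v[j] for j in range(len(v))) for p in points]


means, component = first_principal_component(X)
X_reduced = project(X, means, component)  # one value per sample instead of two
print(component, X_reduced)
```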

### Anomaly detection
Anomaly detection tries to identify outliers in the data. Applications include detecting fraudulent transactions by bank customers and spotting simple errors in the data we use.
<center>

<img src="images/geron1_10.png" align="center" width="1000" />
</center>
(Geron, Fig 1-10)
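
A simple way to flag outliers in a single numerical feature is a robust z-score based on the median and the median absolute deviation (MAD). The sketch below assumes this rule, a cutoff of 3.5, and hypothetical transaction amounts; other methods and cutoffs are equally possible.

```python
from statistics import median

# Hypothetical data: transaction amounts, one of which looks unusual.
amounts = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]


def find_anomalies(values, threshold=3.5):
    """Flag values whose robust z-score (median and MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread in the data, so this simple rule flags nothing
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


print(find_anomalies(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself.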

### Overview of terminology
<center>

<img src="images/01_08.png" align="center" width="800" />
</center>
(Raschka, ch 1)

## Building a machine learning system
<center>

<img src="images/01_09.png" align="center" width="800" />
</center>
(Raschka, ch 1)