## 1. Introduction

### 1.1 Definitions and Terminology

#### What is machine learning?

Machine learning (ML) is a **field of artificial intelligence** that uses statistical techniques to give computer systems the ability to **learn (...) from data**, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel.

[Wikipedia ML](https://en.wikipedia.org/wiki/Machine_learning)
<img alt="ML AI venn diagram" src="../img/1/ML_AI_venn.png" width=200>

#### What is statistical learning?
- Machine learning is a subfield of computer science / artificial intelligence that focuses on prediction accuracy and big data.
- **Statistical learning** is a subfield of statistics that focuses on model interpretability.
- Both machine learning and statistical learning cover supervised and unsupervised learning, and they overlap considerably.



#### What is AI?
In computer science, AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
[Wikipedia AI](https://en.wikipedia.org/wiki/Artificial_intelligence)

#### Neural Networks
Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
[Wikipedia NN](https://en.wikipedia.org/wiki/Artificial_neural_network) 

A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers.
[Wikipedia Deep Neural Networks](https://en.wikipedia.org/wiki/Deep_learning#Neural_networks)

### 1.2 Machine Learning Concepts

#### Difference between traditional programming and Machine Learning
![ML vs traditional programming](../img/1/ML_vs_trad.png)



#### A Simple Example
<table>
    <tr>
        <td  style="text-align:left" width=300>
            When looking at data, we try to identify patterns such as the one shown here.
            We then ask what causes that pattern.
            We can build a <em>model</em> to try to explain the pattern we observe.
            <ul>
                <li> Y is the dependent variable (outcome, response, target)</li>
                <li> X is the independent variable (features)</li>
            </ul>
        </td>
        <td>
            <img alt="simple example" src="../img/1/simple_example.png" width=400>
        </td>
    </tr>
</table>



We believe there is a true mechanism ($\color{red}{\text{target function } f(x)}$) that describes the relationship between our variables. The aim of machine learning is to find a model ($\color{blue}{\text{model } \hat{f}(x)}$) that estimates this function from the data.

![estimating the target function](../img/1/target_function.jpg)
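
To make the idea of estimating a target function concrete, here is a minimal sketch. It assumes a made-up linear target function (`target_function`, which in a real problem is unknown), adds random noise to simulate observed data, and fits a straight line as the model $\hat{f}(x)$. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_function(x):
    # Hypothetical "true" mechanism f(x); unknown in a real problem
    return 2.0 * x + 1.0

# Observed data: the target function plus random noise
x_obs = rng.uniform(0, 10, size=200)
y_obs = target_function(x_obs) + rng.normal(0, 1.0, size=200)

# Estimate the function with a least-squares line (np.polyfit returns
# the coefficients from the highest degree down: slope, then intercept)
slope, intercept = np.polyfit(x_obs, y_obs, deg=1)

def f_hat(x_new):
    # The fitted model, our estimate of the target function
    return slope * x_new + intercept

print(f"estimated slope: {slope:.2f}, estimated intercept: {intercept:.2f}")
```

The estimated slope and intercept should land close to the values used in `target_function`, but not exactly on them, because the model only ever sees noisy data.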

At the highest level, machine learning can be subdivided into *supervised* and *unsupervised learning*.

The criterion is the presence or absence of **labels**.

The code below generates a simulated data set of random numbers.

Rows are **observations**. Columns 0-4 are the **features**. The second table, printed from `y`, holds the **label** of each observation.

In [1]:
from sklearn.datasets import make_classification
import pandas as pd
# Create a simulated feature matrix X (100 observations x 5 features)
# and a label vector y with 3 imbalanced classes
X, y = make_classification(n_samples=100, n_features=5, n_informative=3, n_redundant=2,
                           n_classes=3, weights=[.2, .3, .5])

# View the first 5 observations
print(pd.DataFrame(X).head())

# View the first 5 labels
print(pd.DataFrame(y).head())

          0         1         2         3         4
0  1.002135 -2.130391  0.498463  1.291054 -1.938559
1 -2.171791 -0.314611  1.607718 -1.782366 -1.912394
2  0.748225  1.018423 -1.234628  0.470358  1.902838
3  0.649036  0.848954  0.933063 -0.054601  0.363106
4 -0.307742 -1.937922  2.617822 -0.183965 -3.469640
   0
0  1
1  1
2  2
3  2
4  0


#### Supervised Learning
You start with:
- y = outcome measurement (label, dependent variable, response, target)
	- in a regression problem, y is continuous (e.g. 5, 3.456, 1023)
	- in a classification problem, y is discrete (e.g. yes/no or A, B, C)
- X = vector of predictor measurements (independent variables, features)
- training data: observed pairs of measurements $(x_1,y_1), (x_2,y_2), (x_3,y_3), \dots, (x_n,y_n)$


e.g. Classification, Regression

You will learn more about [supervised learning](2_supervised_learning.ipynb) in the next class.

#### Unsupervised Learning
You start with:
- no outcome variable
- X = vector of measurements (independent variables, features)
- training data: observations of measurements $x_1, x_2, x_3, \dots, x_n$

The objective is fuzzier than in supervised learning. Using the training data, we want to:
- find samples that behave similarly
- find features that behave similarly

The results are more difficult to evaluate, but training data is easier to obtain because no labels are needed.

e.g. Clustering
<img src="../img/1/cats_and_dogs.png" alt="cats and dogs" width=500>
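
As a small illustration of clustering, the sketch below groups unlabeled points with k-means from scikit-learn. The simulated data (`make_blobs`), the choice of three clusters, and the variable names are assumptions made for this example; k-means is only one of many clustering methods.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulate 150 unlabeled observations with 2 features each.
# make_blobs also returns the true group of each point; we discard it
# because unsupervised learning does not get to see labels.
X_blobs, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Ask k-means to find 3 groups of similar samples
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_blobs)

# Each observation is assigned a cluster number (0, 1 or 2)
print(cluster_ids[:10])
```

The cluster numbers are arbitrary identifiers, not labels with a meaning; deciding what each group represents is left to the analyst.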

You will learn more about [unsupervised learning](3_unsupervised_learning.ipynb) later today.

Unsupervised vs Supervised Learning

<img src="../img/1/cluster_vs_classify.png" alt="clustering vs classification" width=500>

source: https://deepcast.ai/media/article3/

#### Concept Map
<img src="../img/1/ML_concept_map.png" alt="ML concept map" width=500>
<img src="../img/1/machine_learning_cmap.jpg" alt="ML concept map" width=500>


#### Regression vs Classification
Supervised learning problems can be further divided into regression and classification problems.


Regression covers situations where y is continuous/numerical, e.g.:
- Predicting the value of the Dow in 6 months.
- Predicting the price of a given house based on various inputs.

Classification covers situations where y is categorical, e.g.:
- Will the Dow be up (U) or down (D) in 6 months?
- Is this email spam or not?
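
The difference shows up directly in the predictions a model returns. The following sketch fits one regression model and one classification model on simulated data; the data sets, the choice of `LinearRegression` and `LogisticRegression`, and all variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: y is continuous
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=10.0,
                               random_state=0)
reg_model = LinearRegression().fit(X_reg, y_reg)
print("regression predictions:", reg_model.predict(X_reg[:3]))

# Classification: y is categorical (here the two classes 0 and 1)
X_cls, y_cls = make_classification(n_samples=100, n_features=4,
                                   n_informative=2, n_redundant=0,
                                   random_state=0)
cls_model = LogisticRegression().fit(X_cls, y_cls)
print("classification predictions:", cls_model.predict(X_cls[:3]))
```

The regression model returns real numbers, while the classifier returns one of the class labels it saw during training.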


#### Why do we estimate f(x)? There are 2 reasons:
1. Prediction
2. Inference

##### Prediction
If we can produce a good estimate for f, we can make accurate predictions for the response, Y, based on a new value of X.

E.g. predicting how much money someone will earn based on education level and age.

##### Inference
Alternatively, we may also be interested in **understanding** the type of relationship between Y and the X's. For example, 
- Which particular predictors actually affect the response? 
- Is the relationship positive or negative? 
- Is the relationship a simple linear one, or is it more complicated?
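
One way to approach such questions is to inspect the coefficients of a fitted linear model. The sketch below assumes simulated data in which, by construction, the first feature has a positive effect, the second a negative effect, and the third no effect at all. The data and the names `X_inf` and `y_inf` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated predictors and a response built from a known relationship
X_inf = rng.normal(size=(200, 3))
y_inf = 3.0 * X_inf[:, 0] - 2.0 * X_inf[:, 1] + rng.normal(scale=0.5, size=200)

inf_model = LinearRegression().fit(X_inf, y_inf)

# The sign of each coefficient indicates the direction of the relationship,
# and a coefficient near zero suggests the predictor has little effect
for name, coef in zip(["x0", "x1", "x2"], inf_model.coef_):
    print(f"{name}: {coef:+.2f}")
```

This kind of reading is only straightforward because the model is simple; for a more flexible model the same questions are much harder to answer.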


##### There is a Trade-off: Prediction vs Inference
Simple models are often advantageous for both prediction and inference.
- Simple models are easier to interpret and understand.
- A simple model is also often easier to train, whereas a flexible model is harder to fit. Only really complex tasks require more flexible models such as neural networks.

Take home message: Always start with a simple model.
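
The following sketch compares a simple and a flexible model on the same small data set. It assumes a hypothetical linear relationship with noise, and it uses a straight line and a degree-8 polynomial as stand-ins for "simple" and "flexible"; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Hypothetical linear truth plus random noise
    x = rng.uniform(-1, 1, size=n)
    y = 1.5 * x + rng.normal(scale=0.3, size=n)
    return x, y

x_tr, y_tr = sample(20)    # small set used to fit the models
x_te, y_te = sample(200)   # separate set used only for evaluation

def mse(coeffs, x, y):
    # Mean squared error of a polynomial with the given coefficients
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_tr, y_tr, deg=1)     # straight line
flexible = np.polyfit(x_tr, y_tr, deg=8)   # degree-8 polynomial

for name, coeffs in [("simple (degree 1)", simple), ("flexible (degree 8)", flexible)]:
    print(f"{name}: fit MSE = {mse(coeffs, x_tr, y_tr):.3f}, "
          f"evaluation MSE = {mse(coeffs, x_te, y_te):.3f}")
```

The flexible model cannot fit the 20 fitting points worse than the line, because a line is a special case of a degree-8 polynomial. Whether that advantage carries over to unseen data is what the evaluation error shows.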

#### Assessing Model Quality

How do we know if the model we build is close enough to the target function?

How good are our predictions?

We need to test our model. One way to do this is to split our data set into:
- a training set (train), used to fit the model
- a test set (test), held back to evaluate the fitted model


In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train.shape, y_train.shape)

### 1.3 Technology
Have a look at the technology used in this course [here](../md/technologies.md).


scikit-learn is a collection of ML methods that can be used throughout any data science project.

Here is an example of a simple classification model. The procedure is always consistent:

- create a model --> fit (learn from data) --> predict (use the model to make predictions) --> evaluate predictions


First, build a classifier.

In [None]:
from sklearn import svm
clf = svm.SVC(gamma='scale')

Next, fit the model

In [None]:
print(X_train.shape, y_train.shape)
clf.fit(X_train, y_train)

Now you can use the model to make predictions and compare with the truth.

In [None]:
y_pred = clf.predict(X_test)
print("test:", y_test)
print("pred:", y_pred)
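
The final step of the procedure, evaluating the predictions, can be sketched as follows. To keep it self-contained, the block repeats the earlier steps under new variable names and with a fixed `random_state` for the simulated data, which is an assumption added here for reproducibility. Accuracy is used as one possible metric; it is not the only choice.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulate data as before (random_state fixed for this sketch)
X_all, y_all = make_classification(n_samples=100, n_features=5, n_informative=3,
                                   n_redundant=2, n_classes=3,
                                   weights=[.2, .3, .5], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.33,
                                          random_state=42)

# create a model --> fit --> predict
eval_clf = svm.SVC(gamma='scale')
eval_clf.fit(X_tr, y_tr)
y_te_pred = eval_clf.predict(X_te)

# evaluate: fraction of test observations that were predicted correctly
accuracy = accuracy_score(y_te, y_te_pred)
print(f"accuracy on the test set: {accuracy:.2f}")
```

Because one class makes up about half of the simulated data, always predicting that class would already give an accuracy of roughly 0.5, which is a useful baseline to compare against.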