# Table of Contents
- [1. Introduction](#1)
    - [1.1 Definitions and Terminology](#1_1)
    - [1.2 Machine Learning Concepts](#1_2)
        - [1.2.1 Supervised and Unsupervised Learning](#1_2_1)
    - [1.3 Technology](#1_3)
    - [1.4 Exercise](#1_4)
    - [1.5 Learning Material](#1_5)


# 1. Introduction<a name="1"></a>

## 1.1 Definitions and Terminology<a name="1_1"></a>

### What is AI?
In computer science, AI research is defined as:

>The study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.


Andreas Kaplan and Michael Haenlein offer another definition. They define artificial intelligence as:
>A system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.

[Wikipedia AI](https://en.wikipedia.org/wiki/Artificial_intelligence)

### What is machine learning?
Machine learning (ML) is a **field of artificial intelligence** that uses statistical techniques to give computer systems the ability to **learn (...) from data**, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel.

[Wikipedia ML](https://en.wikipedia.org/wiki/Machine_learning)

<img alt="ML_AI_venn_diagram" src="../img/1/ML_AI_venn.png" width="200">

### What is statistical learning?
- **Machine Learning** is a *subfield of computer science / artificial intelligence* focussing on prediction accuracy and big data 
- **Statistical Learning** is a *subfield of statistics* focussing on model interpretability
- Both machine learning and statistical learning deal with supervised and unsupervised learning, and the two fields overlap substantially.

### Neural Networks
>Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
[Wikipedia NN](https://en.wikipedia.org/wiki/Artificial_neural_network) 

>A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers.
[Wikipedia Deep Neural Networks](https://en.wikipedia.org/wiki/Deep_learning#Neural_networks)

Artificial neural networks are used for various supervised and unsupervised machine learning problems.

<img alt="ML_AI_venn_diagram" src="../img/1/ML_AI_ANN_DNN_venn.png" width="250">

## 1.2 Machine Learning Concepts<a name="1_2"></a>

### Difference between traditional programming and Machine Learning


<img src="../img/1/ML_vs_trad.png" width="400" alt="ML vs traditional programming">


### General Concept
The learning problem is characterized by **observations** consisting of **input data (X)** and **output data (y)**, and some unknown but **coherent relationship** between the two.


In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4), columns=["x1", "x2", "x3", "y"])
print("5 observations,  3 features,  one output variable")
print(df)

The goal of the learning system is to **learn a generalized mapping** between input and output data such that skillful predictions can be made for new instances drawn from the domain where the output variable is unknown.

![ML mapping input to output](../img/1/ml_in_out.png)


In statistical learning the problem is framed as the learning of a **target function ($f$)** given **input data (X)** and associated **output data (y)**.

$$y = f(X) + \text{noise}$$

We have a sample of X and y and do our best to come up with a **function  $\hat{f}$ that approximates $f$**, so that we can make **predictions ($\hat{y}$)** given **new examples ($X$)** in the future.

$$\hat{y} = \hat{f}(X)$$

$\hat{y}, \hat{f}$ are pronounced **y-hat** and **f-hat**

[machinelearningmastery](https://machinelearningmastery.com/applied-machine-learning-as-a-search-problem/)
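
The idea of estimating $\hat{f}$ from a sample can be sketched on simulated data. This is only an illustration: the linear `true_f`, the noise level, and the sample size are assumptions made up for this example, and in a real problem $f$ is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" target function f (assumed here, unknown in practice)
def true_f(x):
    return 2.0 * x + 1.0

# Sample of observations: y = f(X) + noise
X_sample = rng.uniform(0, 10, size=200)
y_sample = true_f(X_sample) + rng.normal(0, 1.0, size=200)

# Estimate f_hat as a straight line fitted by least squares
slope, intercept = np.polyfit(X_sample, y_sample, deg=1)

def f_hat(x):
    return slope * x + intercept

# Predictions y_hat for new examples
X_new = np.array([2.5, 7.0])
y_hat = f_hat(X_new)
print("estimated slope and intercept:", slope, intercept)
print("y_hat:", y_hat)
```

The estimated slope and intercept should land close to the values used in `true_f`, but not exactly on them, because of the noise term.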

#### A Simple Example
<table>
    <tr>
        <td  style="text-align:left; font-size:120%" width=40%>
            When looking at data one tries to <strong>identify patterns</strong> in the data as shown here. 
            We wonder what the <strong>cause</strong> of that pattern is.
            We can build a <em>model</em> to try to explain the pattern we observe.
            <ul>
                <li> Y is the dependent variable, outcome, response, target</li>
                <li> X is the independent variable, features</li>
            </ul>
        </td>
        <td>
            <img alt="simple example" src="../img/1/simple_example.png" width="60%" />
        </td>
    </tr>
</table>



We believe there is a true mechanism (target function $\color{red}{ f(x)}$) that describes the relationship between our variables. The aim of machine learning is to find a model ($\color{blue}{ \hat{f}(x)}$) that estimates this function from the data.

<img alt="estimating_the_target_function" src="../img/1/target_function.jpg" width="300">

source: [ISLR](http://www-bcf.usc.edu/~gareth/ISL/)


### 1.2.1 Supervised and Unsupervised Learning<a name="1_2_1"></a>

At the highest level, machine learning can be subdivided into *supervised* and *unsupervised learning*.

The criterion is the presence or absence of **labels**.

The code below generates a simulated feature matrix `X` and a label vector `y`.

Rows are **observations** or **samples**. Columns 0-3 of `X` are **features**. `y` is the **class label**.

In [None]:
from sklearn.datasets import make_classification

# Create a simulated feature matrix and output vector
X, y = make_classification(n_samples=100, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, weights=[.2, .3, .5])

In [None]:
# View the first 10 observations
print(type(X), X.shape)
print(X[0:10,:], "\n")

# View the first 10 labels
print(type(y), y.shape)
print(y[0:10])
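
The `weights` argument controls the approximate share of samples assigned to each class. A minimal sketch for checking the resulting class balance follows. The names `X_demo` and `y_demo` and the fixed `random_state=0` are assumptions added here so the example is reproducible and does not overwrite `X` and `y`.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same settings as above, plus a fixed seed (assumption for reproducibility)
X_demo, y_demo = make_classification(n_samples=100, n_features=4, n_informative=3,
                                     n_redundant=0, n_classes=3, weights=[.2, .3, .5],
                                     random_state=0)

# Count how many samples fall into each class
labels, counts = np.unique(y_demo, return_counts=True)
for label, count in zip(labels, counts):
    print(f"class {label}: {count} samples")
```

The counts only roughly follow the requested proportions, since `make_classification` also flips a small fraction of labels at random by default.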

In [None]:
# Plot the first two features, colored by class label
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(X[:,0], X[:,1], c=y)

#### Supervised Learning
You start with:
- y = outcome measurement (label, dependent variable, response, target)
    - regression problem: y is continuous (5, 3.456, 1023)
    - classification problem: y is discrete (yes/no; A, B, C)
- X = vector of predictor measurements (independent variables, features)
- training data: observations of measurements: $(x_1,y_1), (x_2,y_2), (x_3,y_3), \dots, (x_n,y_n)$


e.g. Classification, Regression

You will learn more about [supervised learning](2_supervised_learning.ipynb) in the next class.

#### Unsupervised Learning
You start with:
- no label (outcome variable)
- X = vector of measurements (independent variables, features)
- observations of measurements: $x_1, x_2, x_3, \dots, x_n$

The objective is fuzzier than in supervised learning. Using the training data we want to:
- find samples that behave similarly
- find features that behave similarly

Results are more difficult to evaluate because there are no labels to compare against, but training data is easier to obtain.

e.g. Clustering
<img src="../img/1/cats_and_dogs.png" alt="cats and dogs" width="500">

You will learn more about [unsupervised learning](3_unsupervised_learning.ipynb) later today.
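
As a small preview, grouping similar samples can be sketched with k-means clustering from scikit-learn. The simulated blobs, the choice of three clusters, and the names `X_blobs` and `cluster_ids` are assumptions for this illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Simulated, unlabeled data: the generated group labels are discarded on purpose
X_blobs, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)

# Ask for 3 clusters; choosing this number is itself an assumption
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_blobs)

print("cluster sizes:", np.bincount(cluster_ids))
print("cluster centers:\n", kmeans.cluster_centers_)
```

Note that the algorithm only sees `X_blobs`. The cluster numbers it returns are arbitrary identifiers, not class labels.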

Unsupervised vs Supervised Learning

<img src="../img/1/cluster_vs_classify.png" alt="clustering vs classification" width="500">

source: https://deepcast.ai/media/article3/

#### Concept Map
<table>
    <tr>
        <td>
            <img src="../img/1/machine_learning_cmap.jpg" alt="ML concept map" width="500">
        </td>
        <td>
            <img src="../img/1/ML_concept_map.png" alt="ML concept map" width="250">
        </td>
    </tr>
</table>


#### Regression vs Classification

<div style="font-size:110%;padding:25px; margin:25px; background:#ffff81; border:#000000 2px dashed;">
Supervised learning problems can be further divided into <b>regression and classification</b> problems.<br>
This depends on the <b>data type of the target variable y</b>.
</div>

1. Regression covers situations where y is **numerical**. e.g.
    - Predicting the **value** of the DAX next week.
    - Predicting the **price** of a given house based on various inputs.


2. Classification covers situations where y is **categorical-nominal** e.g.
    - Will the DAX be **up or down** next week?
    - Is this email **spam or not**?


- Either regression or classification could be used if y is **categorical-ordinal** e.g.
    - Predict the grade in an exam (A, B, C, D, F) based on hours studied for the exam.
    - Predict the bank credit rating based on bank features.

  Regression is usually more appropriate here because it preserves the order of the categories, although it treats them as evenly spaced.

<img src="../img/1/categorical_numerical_vars.jpg" width="350">
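
The difference can be sketched with the same feature and two kinds of target. All data here is made up: `hours`, `score`, and the pass mark of 60 are assumptions for illustration, and the two linear models are just one possible choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=(200, 1))  # single hypothetical feature

# Regression: y is numerical (a made-up exam score)
score = 40 + 5 * hours[:, 0] + rng.normal(0, 5, size=200)
reg = LinearRegression().fit(hours, score)
print("predicted score for 6 hours:", reg.predict([[6.0]]))

# Classification: y is categorical (pass/fail derived from the score)
passed = (score >= 60).astype(int)
clf_demo = LogisticRegression().fit(hours, passed)
print("predicted class for 6 hours:", clf_demo.predict([[6.0]]))
print("class probabilities:", clf_demo.predict_proba([[6.0]]))
```

The regressor returns a number on a continuous scale, while the classifier returns one of the known classes together with a probability for each class.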


#### Why do we estimate $f(x)$? 

There are 2 reasons:
1. Prediction
2. Inference

##### 1. Prediction
If we can produce a good estimate for f, we can make accurate predictions for the response Y, based on a new value of X.

E.g. predicting how much money someone will earn based on education level and age.

##### 2. Inference
Alternatively, we may also be interested in **understanding** the type of relationship between Y and the X's. For example, 
- Which particular features actually affect the response?
- Is the relationship positive or negative?
- Is the relationship a simple linear one, or is it more complicated?
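
For a linear model, one way to approach these questions is to inspect the fitted coefficients. The sketch below uses simulated data in which income depends on education and age but not on shoe size. All variables and numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 500
education = rng.uniform(8, 20, size=n)    # years of education (made up)
age = rng.uniform(20, 65, size=n)         # age in years (made up)
shoe_size = rng.uniform(36, 47, size=n)   # irrelevant by construction

# Simulated income: depends on education and age, not on shoe size
income = 1500 * education + 300 * age + rng.normal(0, 5000, size=n)

X_inf = np.column_stack([education, age, shoe_size])
model = LinearRegression().fit(X_inf, income)

# The sign of a coefficient indicates the direction of the relationship
for name, coef in zip(["education", "age", "shoe_size"], model.coef_):
    print(f"{name:>10}: {coef:10.1f}")
```

The coefficients for education and age should come out positive and near the values used in the simulation, while the one for shoe size should be small by comparison. Comparing raw coefficients across features is only meaningful when the feature scales are taken into account.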


##### There is a Trade-off: Prediction vs Inference
Simple models are often advantageous for both prediction and inference.
- Simple models are easier to interpret and understand.
- A simple model is also often easier to train, whereas a flexible model is harder to fit. Only really complex tasks require more flexible models such as neural networks.

Take home message: Always start with a simple model.

## 1.3 Technology<a name="1_3"></a>
Have a look at the technology used in this course [technologies.md](../md/technologies.md).


**Scikit-learn** is a collection of ML methods that can be used throughout any data science project.

Here is an example of a simple classification model. The procedure is always the same:

1. split the data into train and test
2. create a model
3. fit (learning from train data)
4. predict (using the model to make predictions on test data)
5. evaluate predictions


#### 1. split the data into train and test
##### Assessing Model Quality
- How do we know if the model we built is close enough to the target function?
- How good are our predictions?

We need to test our model. One way to do this is to **split our data set** into:
- train (used to fit the model)
- test (held back to evaluate the predictions)


In [None]:
from sklearn.model_selection import train_test_split
THE_ANSWER=42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=THE_ANSWER)
print("Train data X and y: ", X_train.shape, y_train.shape)
print("Test data X and y: ", X_test.shape, y_test.shape)

#### 2. create a model

In [None]:
from sklearn import svm
# our model is a classifier clf
clf = svm.SVC(gamma="auto")
print(clf)

#### 3. fit the model

In [None]:
print(X_train.shape, y_train.shape)
print(X_train.dtype, y_train.dtype)
print(type(X_train), type(y_train))
clf.fit(X_train, y_train)

#### 4. Predict
Now you can use the model to make predictions and compare with the truth.

In [None]:
y_pred = clf.predict(X_test)
print("test:", y_test)
print("pred:", y_pred)


#### 5. Evaluate

In [None]:
print("sum of correct predictions", sum(y_test == y_pred))
print("length of predictions", len(y_pred))

print("accuracy: ", sum(y_test == y_pred) / len(y_pred))
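
The same accuracy can also be computed with `accuracy_score` from `sklearn.metrics`, and `confusion_matrix` shows which classes get confused with each other. The sketch below repeats the steps above in a self-contained form. The names with `_eval`, `_tr`, and `_te` suffixes and the fixed `random_state=42` in `make_classification` are assumptions added for this example.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Simulated data with the same settings as above, plus a fixed seed
X_eval, y_eval = make_classification(n_samples=100, n_features=4, n_informative=3,
                                     n_redundant=0, n_classes=3, weights=[.2, .3, .5],
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_eval, y_eval, test_size=0.3, random_state=42)

# Create, fit, and predict
model = svm.SVC(gamma="auto").fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Fraction of correct predictions
acc = accuracy_score(y_te, y_hat)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat, labels=[0, 1, 2])
print("accuracy:", acc)
print("confusion matrix:\n", cm)
```

The diagonal of the confusion matrix counts the correct predictions, so its sum divided by the number of test samples equals the accuracy.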

## 1.4 Exercise<a name="1_4"></a>
- Change the arguments passed to **make_classification**. [API doc on make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)
    What happens to the accuracy if you change the values for
    n_samples, n_features, n_informative, n_redundant, n_classes, weights?
- Change the random_state in the function **train_test_split**. [API doc on train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
    What does this do?

## 1.5 Learning Material<a name="1_5"></a>

- [machinelearningmastery](https://machinelearningmastery.com/applied-machine-learning-as-a-search-problem/)
- [ISLR](http://www-bcf.usc.edu/~gareth/ISL/)
- [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)
- [make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)