## 1. Introduction

### 1.1 Definitions and Terminology

#### What is machine learning?

Machine learning (ML) is a **field of artificial intelligence** that uses statistical techniques to give computer systems the ability to **learn (...) from data**, without being explicitly programmed.
The name machine learning was coined in 1959 by Arthur Samuel.

[Wikipedia ML](https://en.wikipedia.org/wiki/Machine_learning)

![ML AI venn diagram](../img/1/ML_AI_venn.png)

#### What is statistical learning?
- Machine learning is a subfield of computer science / artificial intelligence that focuses on prediction accuracy and big data.
- **Statistical learning** is a subfield of statistics that focuses on model interpretability.
- Both machine learning and statistical learning cover supervised and unsupervised learning, and they overlap considerably.

#### What is AI?
In computer science, AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
[Wikipedia AI](https://en.wikipedia.org/wiki/Artificial_intelligence)

#### Neural Networks
Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
[Wikipedia NN](https://en.wikipedia.org/wiki/Artificial_neural_network) 

A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers.
[Wikipedia Deep Neural Networks](https://en.wikipedia.org/wiki/Deep_learning#Neural_networks)

### 1.2 Machine Learning Concepts

#### Difference between traditional programming and Machine Learning
![ML vs traditional programming](../img/1/ML_vs_trad.png)



#### A Simple Example
![](../img/1/simple_example.png) 

When observing data, we wonder how the variables in that data relate to each other. What is the underlying cause? We assume there is a true mechanism that describes the relationship between our variables, which can be written as f(x) (blue). The aim of machine learning is to estimate this function from data with f_hat(x) (blue).

![estimating the target function](../img/1/target_function.jpg)
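As a minimal sketch of this idea, the snippet below invents a "true" function `f`, draws noisy observations from it, and estimates it with a straight line `f_hat`. The linear form of `f`, the noise level, and the sample size are assumptions made purely for illustration; in a real problem f is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def f(x):
    # Hypothetical "true" mechanism; in practice f is unknown
    return 2.0 * x + 1.0


# Observations are the true function plus random noise
x = rng.uniform(0, 10, size=200)
y = f(x) + rng.normal(0, 1.0, size=200)

# Estimate f with a straight line fitted by least squares
slope, intercept = np.polyfit(x, y, deg=1)


def f_hat(x_new):
    # Our estimate of f, learned from the data only
    return slope * x_new + intercept


print("estimated slope:", slope, "estimated intercept:", intercept)
print("f(4.0) =", f(4.0), " f_hat(4.0) =", f_hat(4.0))
```

The estimated slope and intercept should land close to, but not exactly on, the values used in `f`, because the estimate only ever sees noisy data.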

#### Supervised Learning
You start with:
- Y = outcome measurement (dependent variable, response, target)
	- regression problem: Y is continuous (5, 3.456, 1023)
	- classification problem: Y is discrete (yes/no, A/B/C)
- X = vector of predictor measurements (independent variables, features)
- training data: observations of these measurements: (x1,y1), (x2,y2), (x3,y3) ... (xn,yn)


e.g. Classification, Regression

#### Unsupervised Learning
You start with:
- no outcome variable
- X = vector of measurements (independent variables, features)
- the objective is fuzzier: group samples, find features
- results are more difficult to evaluate, but training data is easier to get
- observations of measurements: x1, x2, x3, ... xn


Using the training data, we want to:
- find samples that behave similarly
- find features that behave similarly

The quality of the result is difficult to assess, because there is no outcome variable to compare against.

e.g. Clustering
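A small sketch of clustering with scikit-learn's `KMeans` is shown here. The two synthetic groups, their positions, and the choice of `n_clusters=2` are assumptions for illustration; note that no outcome variable y is passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two synthetic, well-separated groups of samples; there is no outcome variable y
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Ask for two clusters and assign every sample to one of them
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels[:5], labels[-5:])
```

The cluster numbers themselves carry no meaning: which group ends up as cluster 0 or 1 is arbitrary, which is one reason unsupervised results are harder to evaluate.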


Unsupervised vs Supervised Learning

![](../img/1/cluster_vs_classify.png)
source: https://deepcast.ai/media/article3/

#### Concept Map
![ISLR figure](../img/1/ML_concept_map.png "ML concept map")

#### Regression vs Classification
Supervised learning problems can be further divided into regression and classification problems.


Regression covers situations where Y is continuous/numerical, e.g.
- Predicting the value of the Dow in 6 months.
- Predicting the value of a given house based on various inputs.

Classification covers situations where Y is categorical, e.g.
- Will the Dow be up (U) or down (D) in 6 months?
- Is this email spam or not?
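The difference can be sketched with the same feature matrix and two kinds of target. The data below is synthetic, and the choice of `LinearRegression` and `LogisticRegression` is an assumption made for illustration; the "U"/"D" labels only borrow the notation from the Dow example and are not real market data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # one synthetic feature

# Regression: Y is continuous
y_continuous = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=100)
regressor = LinearRegression().fit(X, y_continuous)
print(regressor.predict([[4.0]]))  # a real number

# Classification: Y is categorical ("U" or "D")
y_categorical = np.where(X[:, 0] > 5.0, "U", "D")
classifier = LogisticRegression().fit(X, y_categorical)
print(classifier.predict([[1.0], [9.0]]))  # one label per sample
```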


#### Why do we estimate f(x)? There are 2 reasons:
1. Prediction
2. Inference

##### Prediction
If we can produce a good estimate for f, we can make accurate predictions for the response, Y, based on a new value of X.

E.g. predicting how much money someone will earn based on education level and age.

##### Inference
Alternatively, we may also be interested in the type of relationship between Y and the X's. For example, 
- Which particular predictors actually affect the response? 
- Is the relationship positive or negative? 
- Is the relationship simply linear, or is it more complicated?
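With a linear model, these questions can be approached by looking at the fitted coefficients. The sketch below reuses the income example, but all numbers are invented: the effect sizes, the value ranges, and the extra `noise_feature` column are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors
education = rng.uniform(8, 20, size=n)   # years of education
age = rng.uniform(20, 65, size=n)
noise_feature = rng.normal(size=n)       # has no effect on income by construction

# Hypothetical response: depends on education and age only
income = 2.0 * education + 0.5 * age + rng.normal(0, 1.0, size=n)

X = np.column_stack([education, age, noise_feature])
model = LinearRegression().fit(X, income)

# Sign and size of each coefficient describe the estimated relationship
for name, coef in zip(["education", "age", "noise_feature"], model.coef_):
    print(f"{name}: {coef:.2f}")
```

A coefficient near zero suggests the predictor has little effect on the response, and the sign tells whether the estimated relationship is positive or negative. This reading only makes sense if a linear model is a reasonable description of the data.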


##### There is a Trade-off: Prediction vs Inference
Simple models are often advantageous for both prediction and inference.
- Simple models are easier to interpret and to understand.
- A simple model is also often easier to train, whereas a flexible model is harder to fit. Only really complex tasks require more flexible models such as neural networks.
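One way to get a feeling for this is to fit a simple and a flexible model to the same small data set. The sketch below assumes a linear true relationship and uses polynomials of degree 1 and 10 as stand-ins for "simple" and "flexible"; the sample size, noise level, and degrees are illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    # Hypothetical true relationship, assumed linear for this sketch
    return 1.5 * x


# Small training set, larger test set drawn the same way
x_train = rng.uniform(-1, 1, size=20)
y_train = f(x_train) + rng.normal(0, 0.3, size=20)
x_test = rng.uniform(-1, 1, size=1000)
y_test = f(x_test) + rng.normal(0, 0.3, size=1000)


def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the data (x, y)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


simple = np.polyfit(x_train, y_train, deg=1)     # 2 parameters
flexible = np.polyfit(x_train, y_train, deg=10)  # 11 parameters

for name, coeffs in [("simple", simple), ("flexible", flexible)]:
    print(name,
          "train MSE:", mse(coeffs, x_train, y_train),
          "test MSE:", mse(coeffs, x_test, y_test))
```

The flexible model always fits the training data at least as well as the simple one. On the test data it will typically do worse in a setting like this, because its extra parameters also adapt to the noise.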

### 1.3 Technology
Have a look at the technology used in this course [here](../md/technologies.md).


The most important library is scikit-learn.

Here is an example of a simple classification model. The procedure is always consistent:

- create a model --> fit (learn from data) --> predict (use the model to make predictions)


First build a classifier and load some data

In [None]:
from sklearn import datasets, svm

# Create the model: a support vector classifier
clf = svm.SVC(gamma='scale')
# Load the iris dataset (150 flowers, 4 features, 3 species)
iris = datasets.load_iris()

Next, prepare the data and fit the model

In [None]:
# X holds the features, y the class labels
X, y = iris.data, iris.target
print(X.shape, y.shape)  # (150, 4) (150,)
# Fit: learn from the data
clf.fit(X, y)

Now you can use the model to make predictions

In [None]:
# Predict the class of two new flowers (4 feature values each)
y_pred = clf.predict([[2, 2, 1, 1], [1, 1, 2, 2]])
print(y_pred)
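The same create, fit, predict pattern carries over to other scikit-learn estimators. As a sketch, the snippet below swaps in `KNeighborsClassifier`; the choice of this estimator and of `n_neighbors=5` is an assumption for illustration and not part of the example above. It also maps the numeric predictions back to species names via `iris.target_names`.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

knn = KNeighborsClassifier(n_neighbors=5)            # create a model
knn.fit(X, y)                                        # fit: learn from data
y_pred = knn.predict([[2, 2, 1, 1], [1, 1, 2, 2]])   # predict for new samples

print(y_pred)                     # numeric class labels
print(iris.target_names[y_pred])  # the corresponding species names
```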