# Introduction to Machine Learning

- **Machine learning**: the science and art of programming computers so they can learn from data.

- In *classical programming* humans input rules (a program) and data to be processed according to these rules, and the outputs are *answers*. 

- In **machine learning**, humans input data and the expected answers, and the outputs are the rules.

<img src="img_ML/Figure_1_ML.png"/>

<sub>Adapted from ‘Deep Learning with Python’ by Francois Chollet.</sub>

What is **machine learning** useful for?

- Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
- Complex problems for which there is no solution using a traditional approach.
- Fluctuating environments, since a machine learning system can adapt to new data.
- Getting insights from large amounts of data.

### Machine Learning: Email Spam Filtering

- Machine learning algorithms used for the spam filtering problem include, e.g., the multilayer perceptron and the C4.5 decision tree.

<img src="img_ML/Figure_2_spam_filter.png"/>

### Machine Learning: Autonomous Driving

- Driver assistance; partial, conditional, high or complete automation. 

<img src="img_ML/Figure_3_autonomous_driving.png"/>

<sub>By Dllu - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64517567
By Bcschneider53 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=63494208</sub>

### Machine Learning: Video Surveillance

- Track unusual behavior of people, such as stumbling or not moving for a long time.
- Detection of intruders.

<img src="img_ML/Figure_4_video_surveillance.png"/>

<sub>By Hustvedt - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3466036
By Jimmy answering questions.jpg: Wikimania2009 Beatrice Murch; derivative work: Sylenius (talk) - Jimmy answering questions.jpg, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=11309460</sub>

### Machine Learning: Data Security

- Malware (a program or file harmful to a computer user): every new version differs from the previous one by between 2 and 10%.

<img src="img_ML/Figure_5_data_security.png"/>

<sub>By Keministi - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=65840376</sub>

### Machine Learning: Virtual Personal Assistants

- Siri, Alexa: find information when asked by voice.

<img src="img_ML/Figure_6_virtual_personal_assistants.png"/>

<sub>By Maurizio Pesce from Milan, Italia - Android Assistant on the Google Pixel XL smartphone, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=52110130
By Frmorrison at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=47040540
https://support.apple.com/lv-lv/HT204389</sub>

### Machine Learning: Financial Trading

- Predict stock market moves to improve profits.

<img src="img_ML/Figure_7_financial_trading.png"/>

<sub>By Katrina.Tuliao - https://www.tradergroup.org, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12262407</sub>

### Machine Learning: Healthcare

- Disease detection.
- Assessing risk factors.
- Drug discovery.
- Predict readmissions.

<img src="img_ML/Figure_8_healthcare.png"/>

<sub>By Unknown - http://www.dodmedia.osd.mil/Assets/1991/Army/DA-ST-91-01841.JPEG, Public Domain, https://commons.wikimedia.org/w/index.php?curid=780720
By Seattle Municipal Archives from Seattle, WA - Doctors with patient, 1999, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=9842204</sub>

### Machine Learning: Personalized Marketing

- Understand customers’ behavior to deliver individualized messages and product offerings to current or prospective customers.

<img src="img_ML/Figure_9_personalized_marketing.png"/>

<sub>By Econ5470team3 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48541946</sub>

### Machine Learning: Fraud Detection

- Detect potential fraud cases, e.g. for insurance companies and banks.

<img src="img_ML/Figure_10_fraud_detection.png"/>

<sub>https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd</sub>

### Machine Learning: Recommender Systems

- Predict the rating or preference a user would give to an item, e.g. a movie, song, research article, search query or restaurant.

<img src="img_ML/Figure_11_recommender_systems.png"/>

<sub>https://www.offerzen.com/blog/how-to-build-a-content-based-recommender-system-for-your-product</sub>

### What do you need for **machine learning**?

- **Input data points**: e.g. pictures, sound files.
- **Examples of the expected output (answers)**: e.g. tags such as ‘dogs’, ‘cats’; human-generated transcripts of sound files.
- **A way to measure whether the algorithm is doing a good job**: to determine the distance between the algorithm’s output and its expected output. 

This is a feedback signal to adjust the way the algorithm works. 

<img src="img_ML/Figure_12_needs_ML.png"/>

<sub>Adapted from ‘Deep Learning with Python’ by Francois Chollet.</sub>
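The "distance" between the algorithm's output and the expected output is commonly expressed as a loss value. The following is a minimal sketch, assuming mean squared error as the measure (the slides do not name a specific one) and using made-up numbers:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared distance between expected and predicted outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

expected = [3.0, 5.0, 7.0]       # hypothetical expected answers
good_guess = [2.9, 5.1, 7.0]     # output of a well-adjusted algorithm
bad_guess = [0.0, 0.0, 0.0]      # output of a poorly adjusted algorithm

loss_good = mean_squared_error(expected, good_guess)
loss_bad = mean_squared_error(expected, bad_guess)
print(loss_good, loss_bad)       # the smaller value signals the better fit
```

A learning algorithm would use such a value as the feedback signal: changes that lower the loss are kept.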

**Machine Learning**: learn useful representations of the input data to get us closer to the expected output.   

<img src="img_ML/Figure_13_data_representations.png"/>

<sub>Adapted from ‘Deep Learning with Python’ by Francois Chollet.</sub>

- **ML algorithms** automatically find transformations that turn data into useful representations for a given task.

- To find these transformations, they search within a predefined set of operations, the **hypothesis space**.

- **ML** searches for useful **representations** of input data, within a predefined space of possibilities, using guidance from a **feedback signal**.

<img src="img_ML/Figure_14_ML_steps.png"/>

### Data Acquisition

- Understand the problem.
- Identify data sources.
- Spot possible problems with the data.

### Data Types

<img src="img_ML/Figure_15_data_types.png"/>

<img src="img_ML/Figure_16_data_types.png"/>

### Preprocessing: Data

- Separate relevant data from noise.
- Establish the available data types and identify missing data.
- Handle missing values.
- Resolve redundancy problems.
- Transform the data.
- Data reduction: e.g. aggregation.
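Some of the preprocessing steps above can be sketched with pandas. This is only an illustration on a hypothetical table (the column names `sensor` and `reading` are invented), and mean imputation and standardization are just one possible choice for each step:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one duplicated row and one missing value.
raw = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "reading": [1.0, 1.0, np.nan, 4.0, 6.0],
})

# Redundancy: drop exact duplicate rows.
deduplicated = raw.drop_duplicates()

# Missing values: fill with the column mean (one simple strategy among many).
filled = deduplicated.fillna({"reading": deduplicated["reading"].mean()})

# Transformation: standardize to zero mean and unit standard deviation.
filled = filled.assign(
    reading_scaled=(filled["reading"] - filled["reading"].mean()) / filled["reading"].std()
)

# Reduction: aggregate to one mean value per sensor.
aggregated = filled.groupby("sensor")["reading"].mean()
print(aggregated)
```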

### Processing: Train a Model  

- Build a machine learning model of data to solve problems such as classification and regression. 
- The different parameters of this model can then be tweaked. 

### Machine Learning Algorithms

<img src="img_ML/Figure_17_ML_algorithms.png"/>

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

### Types of Machine Learning Systems

*Are they trained with human supervision?*

- Supervised
- Unsupervised
- Semisupervised
- Reinforcement Learning

*Can they learn incrementally on the fly?*

- Online
- Batch Learning

*Do they compare new input data to known data points or do they detect patterns?*

- Instance-based
- Model-based

**These criteria can be combined.**

### Supervised Learning

The training data fed to the algorithm includes the desired solutions, called labels.

In classification, the labels are discrete categories, e.g. spam detection.
In regression, the labels are continuous quantities, e.g. predicting the price of a car given the brand, mileage, etc.

Algorithms:
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forest
- Neural Networks

<img src="img_ML/Figure_18_supervised_learning.png"/>

<sub>From ‘Python Data Science Handbook’ by Jake VanderPlas.</sub>

### Supervised Learning: Linear Regression

- In regression, the labels are continuous quantities.
- A linear model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias term or intercept term.

A simple regression model:

$$y = ax + b$$

<img src="img_ML/Figure_19_linear_regression.png"/>

- A linear model can be used to fit nonlinear data: polynomial regression.

- Pass powers of each feature as new features and train the linear model on this set of features.

$$y = a_0 + a_1x_1 + a_2x_2 + a_3x_3 + \dots$$

to

$$y = a_0 + a_1x + a_2x^2 + a_3x^3 + \dots$$
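Passing powers of a feature to a linear model can be sketched with scikit-learn's `PolynomialFeatures`. The data below is synthetic (a noisy quadratic chosen for illustration), so the scores only show the idea:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data: y = 0.5 x^2 + x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(scale=0.1, size=50)

# Plain linear model: y = a x + b.
linear = LinearRegression().fit(x, y)

# Same linear model, trained on the features [x, x^2].
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(x, y)

r2_linear = linear.score(x, y)        # coefficient of determination R^2
r2_quadratic = quadratic.score(x, y)
print(r2_linear, r2_quadratic)
```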

### Supervised Learning: Linear Regression
#### Regularization

- A way to reduce overfitting is to regularize the model (i.e. constrain it) so that it has fewer degrees of freedom.

- For example, in a polynomial model: reduce the degree of the polynomial.

- Regularization of a linear model is usually achieved by constraining the weights of the model. 
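One common way to constrain the weights is an L2 penalty (ridge regression). The sketch below assumes that technique, synthetic data, and an arbitrary penalty strength `alpha=1.0`; it compares the size of the learned weights with and without the penalty:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small synthetic dataset that a degree-10 polynomial can easily overfit.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, size=20)).reshape(-1, 1)
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.2, size=20)

def fit(regressor):
    """Fit a degree-10 polynomial model using the given linear regressor."""
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        regressor,
    )
    return model.fit(x, y)

plain = fit(LinearRegression())   # unconstrained weights
ridge = fit(Ridge(alpha=1.0))     # weights penalized by their squared size

norm_plain = np.linalg.norm(plain[-1].coef_)
norm_ridge = np.linalg.norm(ridge[-1].coef_)
print(norm_plain, norm_ridge)
```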

### Supervised Learning: Logistic Regression

- Some regression algorithms can be used for **classification** as well.
- **Logistic regression**: estimates the probability that an instance belongs to a particular class (e.g. what is the probability that an email is spam?).
- If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (positive class, labeled ‘1’); otherwise it predicts that it does not (negative class, labeled ‘0’).
- **Logistic Regression** can be generalized to support multiple classes directly.

<img src="img_ML/Figure_20_logistic_regression.png"/>

<sub>Adapted from ‘Deep Learning with Python’ by Francois Chollet.</sub>
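The 50% rule can be checked directly with scikit-learn's `LogisticRegression`. This is a sketch on synthetic data generated with `make_classification` (the dataset parameters are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=4,
                           n_clusters_per_class=1, random_state=0)
clf = LogisticRegression().fit(X, y)

# Estimated probability of the positive class ('1') for the first five instances.
proba_positive = clf.predict_proba(X[:5])[:, 1]

# Applying the 50% threshold by hand reproduces the model's own predictions.
manual = (proba_positive > 0.5).astype(int)
predicted = clf.predict(X[:5])
print(proba_positive, manual, predicted)
```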

### Supervised Learning: Support Vector Machines (SVM)

- **SVMs** can perform linear and non-linear classification, regression and outlier detection.
- **SVM** classifiers do not output probabilities for each class.
- An **SVM model** is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a gap as wide as possible.
- New examples are then mapped into that space and predicted to belong to a category based on which side of the gap they fall.

<img src="img_ML/Figure_21_SVM.png"/>

<sub>From ‘Python Data Science Handbook’ by Jake VanderPlas.</sub>

- **Linear SVM**: the line that maximizes the margin is chosen as the optimal model. Support vectors: the training points that touch the margin.

- **Kernel SVM**: when the data is not linearly separable, it is projected into a higher-dimensional space so that a linear classifier can fit the non-linear relationships.
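The difference between a linear and a kernel SVM shows up on data that is not linearly separable. The sketch below uses scikit-learn's `SVC` on synthetic concentric circles; the RBF kernel is one possible kernel choice, not one prescribed by the slides:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)      # implicit higher-dimensional projection

acc_linear = linear_svm.score(X, y)
acc_rbf = rbf_svm.score(X, y)
n_support = rbf_svm.support_vectors_.shape[0]   # training points that define the margin
print(acc_linear, acc_rbf, n_support)
```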

### Supervised Learning: Decision Trees

- **Decision Trees** ask a series of questions, and perform a sequence of branching operations based on comparisons of some quantities.
- Predict the value of a target variable by learning simple decision rules inferred from the data features. 
- They can perform classification and regression tasks, including multiclass tasks (with more than two classes) and multioutput tasks.
- Estimate the probability that an instance belongs to a particular class k.

<img src="img_ML/Figure_22_decision_trees.png"/>

<sub>From ‘Hands-On Machine Learning with Scikit-Learn and TensorFlow’ by Aurélien Géron.</sub>

- Select the feature that best splits the data (produces the purest partitions).
- Remove the chosen feature from the feature list and repeat the previous step until leaf nodes are reached.
- Attribute selection measures: 
	- Information gain.
	- Gini index: impurity measure. 
- **CART (Classification and Regression Trees) algorithm**: produces binary trees (used by scikit-learn).
    - **Classification**: at each step, picks a feature and splits the dataset into two parts so as to best reduce the impurity of the nodes at the next lower level.
    - **Regression**: the predicted value is the average target value of the training instances in the leaf node; the dataset is split in a way that minimizes the MSE.
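A shallow tree makes the learned decision rules visible. The sketch below uses scikit-learn's `DecisionTreeClassifier` with the Gini criterion on the iris dataset; the depth limit of 2 is an arbitrary choice to keep the printout short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Binary tree, splits chosen to reduce Gini impurity.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules as nested if/else conditions.
print(export_text(tree, feature_names=iris.feature_names))

# Class probabilities = class shares among training instances in the reached leaf.
class_probabilities = tree.predict_proba(iris.data[:1])
print(class_probabilities)
```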

### Supervised Learning: Random Forest

- **Random Forest**: a group of Decision Tree classifiers (or regressors), each trained on a different random subset of the training set. 
- It obtains the predictions of all individual trees and then predicts the class that gets the most votes (or the average for regression).

- Instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features.

- **Random Forest** builds multiple CART models with different samples and different initial variables. Each tree gives a ‘vote’ for a certain class, and the forest chooses the class that gets the most votes. 
- In **regression**, it takes the average of the outputs of the different trees. 



<img src="img_ML/Figure_23_random_forest.png"/>

<sub>From https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d</sub>

**Advantages**

- Both training and prediction are very fast.
- The multiple trees allow for a probabilistic classification.
- The nonparametric model is extremely flexible and can perform well on tasks that are under-fit by other estimators.


**Disadvantages**

- The results are not easily interpretable.
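A minimal sketch of a random forest with scikit-learn's `RandomForestClassifier` on the iris dataset follows. The number of trees and the `"sqrt"` feature subset size are assumptions made for this example. Note that scikit-learn combines trees by averaging their predicted class probabilities, which is a soft variant of the majority vote described above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees; each split considers a random subset of sqrt(n_features) features.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

test_accuracy = forest.score(X_test, y_test)
vote_shares = forest.predict_proba(X_test[:3])   # averaged over all trees
print(test_accuracy)
print(vote_shares)
```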

### Unsupervised Learning

- The training data is unlabeled.
- **Clustering**: grouping similar entities together. 
- **Visualization**: output a 2D or 3D representation of complex data that can easily be plotted. 

**Algorithms**

*Clustering* 
- k-Means
- Hierarchical Cluster Analysis

*Visualization*
- Principal Component Analysis (PCA)
- t-distributed Stochastic Neighbor Embedding (t-SNE)

### Unsupervised Learning: k-Means

- k-Means algorithm searches for a pre-determined number of clusters within an unlabeled multidimensional dataset.
- The cluster center is the arithmetic mean of all the points belonging to the cluster.
- The partitions try to minimize the within-cluster sum of squares (inertia).
- Each point is closer to its own cluster center than to other cluster centers.

<img src="img_ML/Figure_24_k_means.png"/>

<sub>From ‘Python Data Science Handbook’ by Jake VanderPlas.</sub>
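The two properties listed above (centers are cluster means, points belong to the nearest center) can be inspected on a fitted model. This sketch uses scikit-learn's `KMeans` on synthetic blobs, with the number of clusters assumed to be known in advance:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled synthetic data with four groups (the true labels are discarded).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_

# Each center should (approximately) equal the mean of its assigned points.
means = np.array([X[labels == k].mean(axis=0) for k in range(4)])

# Each point should be assigned to its nearest center.
distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
nearest = distances.argmin(axis=1)

print(kmeans.inertia_)   # within-cluster sum of squares
```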

### Unsupervised Learning: Principal Component Analysis (PCA)

- **Dimensionality reduction**: having many features (e.g. thousands or millions) for each training instance makes training slow and can make it hard to find a good solution.

- **PCA**: 
	- identifies the hyperplane that lies closest to the data 
	- projects the data onto the hyperplane
	- selects the projection that preserves the maximum amount of variance

<img src="img_ML/Figure_25_PCA.png"/>

<sub>From ‘Hands-On Machine Learning with Scikit-Learn and TensorFlow’ by Aurélien Géron.</sub>
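As an illustration, the sketch below projects the four-dimensional iris data onto a two-dimensional plane with scikit-learn's `PCA` and reports how much variance each axis preserves (the choice of two components is an assumption for plotting purposes):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)          # data projected onto the best-fitting plane

# Share of the total variance preserved by each principal component.
explained = pca.explained_variance_ratio_
print(X_2d.shape, explained)
```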

### Unsupervised Learning: t-Distributed Stochastic Neighbor Embedding (t-SNE)

- Dimensionality reduction: tries to keep similar instances close and dissimilar instances apart.
- It is useful for visualization.

<img src="img_ML/Figure_26_tsne.png"/>

<sub>From https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1</sub>
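A minimal sketch with scikit-learn's `TSNE` follows, embedding a subset of the handwritten digits dataset into two dimensions. The subset size and the perplexity value are arbitrary choices to keep the example fast:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 64 features per image
X_small = X[:300]                      # small subset to keep the run short

# Embed into 2D; similar digits should end up close to each other.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_small)
print(embedding.shape)
```

The resulting coordinates could then be scattered with matplotlib, colored by `y[:300]`, to inspect the clusters visually.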

### Semisupervised Learning

The training data is **partially labeled**: a lot of unlabeled data with only a little labeled data.

Labeling massive amounts of data:
- is time-consuming
- is expensive
- introduces human biases into the model

**Example**: webpage classification. Labeled data is used to identify specific groups of webpage types; the algorithm is then trained on unlabeled data to define the boundaries of those webpage types, and may even identify new types of webpages that were not specified in the existing human-provided labels.
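The idea of learning from a few labels plus many unlabeled points can be sketched with scikit-learn's `LabelSpreading`. This is a swapped-in illustration on synthetic two-moons data, not the webpage example itself; the number of labeled points and the k-nearest-neighbors kernel settings are assumptions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)

# Keep only 5 labels per class; -1 marks an instance as unlabeled.
y_partial = np.full_like(y_true, -1)
labeled_idx = np.concatenate([np.flatnonzero(y_true == c)[:5] for c in (0, 1)])
y_partial[labeled_idx] = y_true[labeled_idx]

# Labels are propagated to unlabeled points through a nearest-neighbor graph.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)

# Compare the inferred labels with the hidden true labels.
accuracy = (model.transduction_ == y_true).mean()
print(accuracy)
```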

### Reinforcement Learning

- The learning system (**agent**) observes the environment, selects and performs **actions**, and gets **rewards** or **penalties** in return.
- It learns by itself what the best strategy (**policy**) is to maximize the rewards over time.

**Algorithms**

- Monte Carlo
- Q-learning
- Proximal Policy Optimization (PPO)

<img src="img_ML/Figure_27_reinforcement_learning.png"/>

<sub>From https://pixabay.com/en/robot-machine-technology-science-312566/</sub>
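The agent, action and reward loop can be sketched with tabular Q-learning. The environment below is a hypothetical five-cell corridor invented for this example, and the values of `alpha`, `gamma` and `epsilon` are arbitrary:

```python
import numpy as np

n_states, n_actions = 5, 2      # corridor cells; actions: 0 = left, 1 = right
goal = n_states - 1             # reaching the last cell ends an episode
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    """Environment: move one cell; reward 1 only when the goal is reached."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), goal)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for _ in range(200):            # episodes
    state, done = 0, False
    while not done:
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))   # explore
        else:
            best = np.flatnonzero(q_table[state] == q_table[state].max())
            action = int(rng.choice(best))          # exploit, ties broken randomly
        next_state, reward, done = step(state, action)
        # Move the estimate toward reward + discounted best future value.
        target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state

# Learned policy: best action in every non-goal cell.
policy = np.argmax(q_table[:goal], axis=1)
print(policy)
```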

### Batch Learning

- The system is trained using **all the available data**; it does not learn incrementally. 
- It is typically done **offline**, since it takes time and computing resources.
- The system is trained and then launched into production running without learning any more.
- The data can be updated and then the system is trained again from scratch.

### Online Learning

- The system is **trained incrementally** by feeding it data instances sequentially, either individually or in mini-batches. 
- Each learning step is fast and cheap.
- **Learning rate**: how fast the system should adapt to new data. 
- Can be used to train systems on huge datasets that don’t fit in one machine’s main memory. This is usually done offline, so it is better thought of as **incremental learning**.
- Good for systems that receive data as a continuous flow (e.g. stock prices).
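Incremental training can be sketched with scikit-learn's `SGDClassifier.partial_fit`, which updates the model one mini-batch at a time. The data is synthetic and the batch size and the constant learning rate `eta0` are assumptions for this example:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Synthetic two-class data standing in for a stream of instances.
X, y = make_blobs(n_samples=1000, centers=2, n_features=10, random_state=0)
classes = np.unique(y)   # partial_fit needs the full set of classes up front

# Constant learning rate: controls how strongly each batch changes the model.
model = SGDClassifier(learning_rate="constant", eta0=0.01, random_state=0)

for start in range(0, len(X), 100):        # feed mini-batches of 100 instances
    batch = slice(start, start + 100)
    model.partial_fit(X[batch], y[batch], classes=classes)

accuracy = model.score(X, y)
print(accuracy)
```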

### Instance-based Learning

The system learns the examples by heart, then generalizes to new cases using a similarity measure.

<img src="img_ML/Figure_28_instance_based_learning.png"/>

<sub>Adapted from ‘Hands-On Machine Learning with Scikit-Learn and TensorFlow’ by Aurélien Géron.</sub>
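k-nearest neighbors is a typical instance-based method: it stores the training examples and classifies a new case by its most similar stored examples. A minimal sketch with scikit-learn, assuming Euclidean distance as the similarity measure and three neighbors:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" mostly means storing the examples.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X_train, y_train)

# The three stored examples most similar to the first test instance.
distances, neighbor_idx = knn.kneighbors(X_test[:1])
accuracy = knn.score(X_test, y_test)
print(neighbor_idx, accuracy)
```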

### Model-based Learning

From a set of examples, build a model of these examples and then use it to make predictions.

<img src="img_ML/Figure_29_model_based_learning.png"/>

<sub>Adapted from ‘Hands-On Machine Learning with Scikit-Learn and TensorFlow’ by Aurélien Géron.</sub>

### Model Evaluation

The performance of a model must be evaluated according to the learning problem: unsupervised learning problems use different metrics from supervised learning ones.

- *Supervised learning metrics*
	- Accuracy
	- Confusion matrix
	- ROC curve

- *Unsupervised learning metrics*
	- Silhouette score
	- Adjusted Rand index
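The metrics listed above are available in `sklearn.metrics`. The sketch below computes them on synthetic data; the models used (logistic regression, k-means) are placeholders chosen only to produce predictions to score:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, adjusted_rand_score, confusion_matrix,
                             roc_auc_score, silhouette_score)
from sklearn.model_selection import train_test_split

# Supervised: compare predictions with held-out labels.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])   # area under the ROC curve

# Unsupervised: score a clustering.
X_blobs, blob_labels = make_blobs(n_samples=300, centers=3, random_state=0)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_blobs)

silhouette = silhouette_score(X_blobs, clusters)    # needs no reference labels
ari = adjusted_rand_score(blob_labels, clusters)    # needs reference labels to compare against
print(accuracy, auc, silhouette, ari)
```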

### Challenges of Machine Learning

- Insufficient **quantity** of training data.
- **Non-representative** training data: 
	- if the sample is too small, there is sampling **noise** as a result of chance 
	- if the sample is large, it can still be non-representative if the sampling method is flawed (sampling **bias**). 
- Poor **quality** of data: if training data is full of outliers, errors and noise.
- Irrelevant features.

- **Overfitting** the training data

	- the model performs well on training data but does not generalize well 
	- the model is too complex given the underlying data

- **Underfitting** the training data

	- the model is too simple to learn the underlying structure

<img src="img_ML/Figure_30_over_underfitting.png"/>

**High-bias models**: the performance of the model on the validation set is similar to the performance on the training set.

**High-variance models**: the performance of the model on the validation set is far worse than the performance on the training set.

<img src="img_ML/Figure_31_bias_variance.png"/>
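Comparing training and validation error for models of increasing complexity makes under- and overfitting visible. The sketch below uses synthetic data and polynomial degrees 1, 4 and 15, all chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples of a smooth curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.3, size=60)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.5, random_state=0)

errors = {}
for degree in (1, 4, 15):   # too simple, moderate, very flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    errors[degree] = (
        mean_squared_error(y_train, model.predict(x_train)),   # training error
        mean_squared_error(y_val, model.predict(x_val)),       # validation error
    )
    print(degree, errors[degree])
```

A high error on both sets points to high bias; a low training error combined with a much higher validation error points to high variance.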

### Deep Learning

- **Deep learning** is a subfield of machine learning: learning successive layers of meaningful representations of the data.
- **Depth**: number of layers contributing to the model of the data.
- These layered representations are (almost always) learned via models called **neural networks**.
- **Neural networks**: structured in layers stacked on top of each other. 

<img src="img_ML/Figure_32_deep_learning.png"/>

<sub>Adapted from ‘Deep Learning with Python’ by Francois Chollet.</sub>

**Deep learning** achieved breakthroughs in:

- image classification
- speech recognition
- handwriting transcription
- machine translation
- text to speech conversion
- digital assistants
- autonomous driving
- ad targeting
- search results on the web
- superhuman Go playing

### Discussion

**Which problems would you like to solve with machine learning (use your imagination)?**
- Work.
- Everyday life.
- Holidays.

**Discuss ethical implications of using machine learning:**
- DNA data.
- Recommendation systems: recommend only similar items in case of extreme content?
- Algorithmic bias: over and underrepresentation of groups.
- Who is legally responsible for a robot’s action?


# scikit-learn
*Machine Learning in Python*

![scikit-learn-website](img_ML/scikit-learn-website.png)

* "Simple and efficient tools for data mining and data analysis
* Accessible to everybody, and reusable in various contexts
* Built on NumPy, SciPy, and matplotlib
* Open source, commercially usable - BSD license"

See https://scikit-learn.org/stable/ for the official website and documentation, and https://github.com/scikit-learn/scikit-learn for the GitHub repository.


## Data Representation
* features (`X`)
  * rows = samples
  * columns = features
* labels (`y`)
  * one entry per sample

```python
import numpy as np
import pandas as pd

# Feature matrix: 7 samples (rows) x 5 features (columns) of random integers in [0, 10).
x = pd.DataFrame(
    np.random.randint(low=0, high=10, size=(7, 5)),
    index=['sample_0', 'sample_1', 'sample_2', 'sample_3',
           'sample_4', 'sample_5', 'sample_6'],
    columns=['feature_0', 'feature_1', 'feature_2', 'feature_3', 'feature_4'],
)
x
```

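| | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 |
|---|---|---|---|---|---|
| sample_0 | 6 | 6 | 8 | 0 | 4 |
| sample_1 | 3 | 5 | 1 | 0 | 8 |
| sample_2 | 2 | 8 | 0 | 4 | 9 |
| sample_3 | 3 | 2 | 8 | 4 | 8 |
| sample_4 | 8 | 8 | 1 | 9 | 9 |
| sample_5 | 4 | 0 | 2 | 8 | 9 |
| sample_6 | 5 | 5 | 4 | 5 | 1 |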


```python
import pandas as pd

# Label vector: one boolean label per sample.
y = pd.Series([True, False, True, False, False, False, True], name='labels')
y
```

```
0     True
1    False
2     True
3    False
4    False
5    False
6     True
Name: labels, dtype: bool
```

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 5 samples with 2 features each, and one label per sample.
X, y = np.arange(10).reshape((5, 2)), range(5)

print("Full data")
print("X: \n{}".format(X))
print("y: {}\n".format(list(y)))

# Hold out 33% of the samples for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

print("Train and test data")
print("X_train: \n{}".format(X_train))
print("X_test: \n{}".format(X_test))
print("y_train: {}".format(y_train))
print("y_test: {}".format(y_test))
```

```
Full data
X: 
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
y: [0, 1, 2, 3, 4]

Train and test data
X_train: 
[[4 5]
 [0 1]
 [6 7]]
X_test: 
[[2 3]
 [8 9]]
y_train: [2, 0, 3]
y_test: [1, 4]
```


## scikit-learn cheat sheet

![scikit-learn-cheat-sheet](img_ML/scikit-learn-cheat-sheet.png)

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

# TensorFlow

![tensorflow](img_ML/tensorflow.png)

"TensorFlow makes it easy for beginners and experts to create machine learning models."

See https://www.tensorflow.org/overview and https://github.com/tensorflow/tensorflow.

# PyTorch

![pytorch](img_ML/pytorch.png)

"An open source deep learning platform that provides a seamless path from research prototyping to production deployment."

https://pytorch.org/

"PyTorch is a Python package that provides two high-level features:
* Tensor computation (like NumPy) with strong GPU acceleration
* Deep neural networks built on a tape-based autograd system"

https://github.com/pytorch/pytorch


# Keras
![keras](img_ML/keras.png)

"Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. *Being able to go from idea to result with the least possible delay is key to doing good research*.

Use Keras if you need a deep learning library that:
* Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
* Supports both convolutional networks and recurrent networks, as well as combinations of the two.
* Runs seamlessly on CPU and GPU."

See https://keras.io/ and https://github.com/keras-team/keras/.