# Session 11

## Machine Learning

![Course Hero](images/hero.png)

## What is Artificial Intelligence and Machine Learning?

> Machine Learning is like sex in high school. Everybody wants to do it, lots of people say they are doing it, only a few know how, and a very select few actually do it.

## Computing Machinery and Intelligence

[![Alan Turing Photo](images/alan_turing.jpg)](https://en.wikipedia.org/wiki/Alan_Turing)

In 1950 [Alan Turing](https://en.wikipedia.org/wiki/Alan_Turing) published a paper in Mind (a journal published by Oxford University Press) called ["Computing Machinery and Intelligence"](https://en.wikipedia.org/wiki/Computing_Machinery_and_Intelligence). In it he asks whether machines can think and proposes the "imitation game", now known as the Turing test.

## What is Intelligence?

**Computational or Structural Intelligence**: A System that perceives itself and its environment and takes action to maximize its goals.

**Executive Intelligence**: The System that decides the goals that should be pursued, controlling and guiding the Computational Intelligence.

Recommendation: Read ["La inteligencia fracasada. Teoría y práctica de la estupidez"](https://es.wikipedia.org/wiki/La_inteligencia_fracasada._Teor%C3%ADa_y_pr%C3%A1ctica_de_la_estupidez), José Antonio Marina, 2004.

[![Book Cover](images/inteligencia_fracasada_book.jpg)](https://es.wikipedia.org/wiki/La_inteligencia_fracasada._Teor%C3%ADa_y_pr%C3%A1ctica_de_la_estupidez)

## What is Machine Learning?

Machine Learning is a part of Artificial Intelligence. We train a **model** on the available data, using statistics and numerical methods, so that it can reach conclusions about new data automatically.


## How can we categorize Machine Learning Algorithms?

There are four big "Families" of Machine Learning Algorithms:

- Supervised: In which we have examples where we know the right answer (**labeled data**).
  - Regression: We want to obtain a continuous value. A number.
  - Classification: We want to obtain a discrete value. A category.
- Unsupervised: In which we don't know the right answer (**unlabeled data**).
  - Clustering: Separate the observations into discrete groups.
  - Dimensionality reduction (generalization): Find the relevant characteristics.
  - Association: Which observations frequently go together.
- Ensemble Methods: A bunch of "stupid" models, often decision trees, combined so that they correct each other's errors.
  - Stacking: Run several models in parallel, and a final model makes the decision from their outputs.
  - Bagging: Train several models on random samples of the data and average their results.
  - Boosting: Train models sequentially, each one learning to correct the errors of the previous one.
- Reinforcement Learning: We decide the next action depending on the response (reward) to the previous ones.
  - Neural Networks: They try to "mimic" how we think neurons work.
  - Deep Learning: Huge neural networks, made possible by the advent of cheap computing power.
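
The difference between the supervised and unsupervised families can be sketched with scikit-learn. This is only an illustration under some assumptions: the iris data is loaded with `load_iris` from `sklearn.datasets`, and `DecisionTreeClassifier` and `KMeans` (with their parameters) are example choices, not the only algorithms in each family.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# X holds the four measurements per flower, y holds the species labels
X, y = load_iris(return_X_y=True)

# Supervised (classification): the labels y are used during training
classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
classifier.fit(X, y)
predicted_labels = classifier.predict(X[:5])

# Unsupervised (clustering): only X is used, the labels are never seen
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)

print("Predicted species codes:", predicted_labels)
print("Cluster ids of the same rows:", cluster_ids[:5])
```

Note that the cluster ids are arbitrary numbers: the clustering algorithm finds groups, but it has no idea which species each group corresponds to.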

![Machine Learning Kinds](images/machine_learning.jpeg)

![Deep Learning Image](images/neural_networks.png)

![Clustering using DBScan](images/clustering_dbscan.gif)


## Is it working? How to evaluate a Machine Learning Model.

By its nature, a Machine Learning Model never gives results with complete certainty. There will always be the possibility of error. Almost always our target is to **minimize** that error.

![AUROC Graph](images/roc_auc.png)

To know how well our model training went, we separate our data into two sets.

- Train Data: The data we are going to use to train the model. Normally 75% to 80% of the total.
- Test Data: The rest of the data, which the model has not seen. We feed it to the trained model to see how it behaves.
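
A minimal sketch of this split using scikit-learn's `train_test_split`. The classifier (`KNeighborsClassifier`), the 25% test size, and the `random_state` value are assumptions chosen for illustration, and the iris data is loaded with `load_iris` from `sklearn.datasets`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keep 75% of the rows for training and hold out 25% for testing.
# stratify=y keeps the same proportion of each species in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # the model only ever sees the train data

# Accuracy = fraction of rows whose species is predicted correctly
train_accuracy = knn.score(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)

print(f"Rows: {len(X_train)} train, {len(X_test)} test")
print(f"Train accuracy: {train_accuracy:.3f}")
print(f"Test accuracy:  {test_accuracy:.3f}")
```

The test accuracy is the number to trust: it estimates how the model behaves on data it has never seen.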

![Training Result](images/training_result.png)


## Machine Learning in Python - SciPy

Besides NumPy and pandas, we use the SciPy package, which provides fundamental algorithms for scientific computing.
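
As a small taste of what SciPy offers, here is a sketch that fits a straight line with `scipy.stats.linregress`. The data points are made up for the example (they lie exactly on y = 2x + 1), so the fitted slope and intercept should recover those two numbers.

```python
import numpy as np
from scipy import stats

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a straight line through the points
result = stats.linregress(x, y)

print("slope:", result.slope)
print("intercept:", result.intercept)
print("correlation coefficient:", result.rvalue)
```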

![SciPy Logo](images/scipy_logo.png)

## Machine Learning Libraries

There are many different Machine Learning Libraries, for example:

- [TensorFlow](https://www.tensorflow.org)
- [Theano](https://pypi.org/project/Theano/)
- [Keras](https://keras.io)
- [PyTorch](https://pytorch.org)
- [Orange3](https://orangedatamining.com)
- [Scikit-learn](https://scikit-learn.org/stable/)

![ML Libraries](images/ml-libraries.jpg)


## Scikit-learn

Simple and efficient tools for predictive data analysis

- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable - BSD license

![Scikit-learn logo](images/scikit_learn_logo.png)


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')

sns.pairplot(iris, hue='species', height=3)

![Iris dataset flowers](images/iris_dataset.jpg)


In [None]:
iris.sample(5)

In [None]:
petal_width = iris["petal_width"]
petal_length = iris["petal_length"]

plt.scatter(petal_width, petal_length)

In [None]:
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept = True)


Scikit-learn expects the features as a two-dimensional array of size [n_samples, n_features], but petal_width is a one-dimensional Series. We need to reshape it by adding a second axis. It can be done as follows.


In [None]:
petal_width_array = petal_width.to_numpy()[:, np.newaxis]
# A cell only displays its last expression, so print both values
print(petal_width_array.shape)
print(type(petal_width_array))
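
The same reshape can be written in a few equivalent ways. The sketch below uses a tiny made-up Series called `widths` (a hypothetical stand-in for the real column) to compare them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the petal_width column
widths = pd.Series([0.2, 1.3, 2.1], name="petal_width")

as_newaxis = widths.to_numpy()[:, np.newaxis]  # add a new axis by indexing
as_reshape = widths.to_numpy().reshape(-1, 1)  # -1 means "infer this size"
as_frame = widths.to_frame()                   # one-column DataFrame

print(widths.shape)      # (3,)   one dimension: not accepted as features
print(as_newaxis.shape)  # (3, 1) three samples, one feature
print(as_reshape.shape)  # (3, 1)
print(as_frame.shape)    # (3, 1)
```

All three results have the [n_samples, n_features] layout, so which one to use is mostly a matter of taste.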

Then we can fit the model


In [None]:
model.fit(petal_width_array, petal_length)

In [None]:
print(model.coef_, model.intercept_)
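
These two numbers define the fitted line: predicted length = intercept + coefficient × width. The sketch below checks this by hand. It refits an equivalent model on the iris data loaded with `load_iris` from `sklearn.datasets` (an assumption, so the block runs on its own), and the width of 1.5 is an arbitrary example value.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

data = load_iris().data
width = data[:, [3]]  # petal width, kept two-dimensional: shape (150, 1)
length = data[:, 2]   # petal length, shape (150,)

line = LinearRegression(fit_intercept=True).fit(width, length)
slope = line.coef_[0]        # one coefficient per feature; we have one
intercept = line.intercept_  # value of the line where width is 0

# Prediction computed by hand versus computed by the model
new_width = 1.5
by_hand = intercept + slope * new_width
by_model = line.predict(np.array([[new_width]]))[0]

print("by hand: ", by_hand)
print("by model:", by_model)
```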

We create some predictions, for 50 evenly spaced petal widths between 0 and 2.6


In [None]:
xfit = np.linspace(0, 2.6)
Xfit = xfit[:, np.newaxis]
yfit = model.predict(Xfit)
print(yfit)

In [None]:
plt.scatter(petal_width, petal_length)
plt.plot(xfit, yfit);
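
The plot suggests the line fits well, but we can also put a number on it. The sketch below is one possible way to do so, not part of the original session: it holds out test data and computes the mean squared error and the R² score with `sklearn.metrics`. The data loading via `load_iris`, the 25% test size, and the `random_state` are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

data = load_iris().data
width, length = data[:, [3]], data[:, 2]  # petal width (2-D), petal length

# Hold out 25% of the flowers to evaluate the fitted line
w_train, w_test, l_train, l_test = train_test_split(
    width, length, test_size=0.25, random_state=0
)

reg = LinearRegression().fit(w_train, l_train)
predicted = reg.predict(w_test)

mse = mean_squared_error(l_test, predicted)  # average squared error, lower is better
r2 = r2_score(l_test, predicted)             # 1.0 would be a perfect fit

print(f"Mean squared error: {mse:.3f}")
print(f"R2 score: {r2:.3f}")
```

For regression models, `reg.score(w_test, l_test)` returns the same R² value.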