# 6 The universal workflow of machine learning

## Define the task

### Frame the problem

What are you trying to predict?

And is the data (samples and labels) available?

What is the type of the problem?

Binary classification, single-label multiclass classification, multi-label multiclass classification, or regression?

At this stage you are making two hypotheses:

- the outputs can be predicted from the inputs;
- the available data is sufficiently informative.

The hypotheses might not be met even if your problem is well defined.

For example, an attempt to predict stock prices from the recent past will likely fail, because recent prices contain too little predictive information.

### Note on Ethics

This probably should go without saying, but when a project seems fishy, *use your judgement*!

A project to classify people as trustworthy or not on the basis of their portrait is **clearly a non-starter**.

We do encode our own biases in all sorts of ways, as individuals and societies, when we build models. Beware claims of **objectivity** and **authority** linked to these technologies.

Technology is not neutral, and has an impact on our societies. It is important to think about these issues and be open about them.

#### Example: detecting criminality based on face images

<!-- ![Criminality paper](images/criminality.cnns.png) -->
![Criminality paper](https://github.com/jchwenger/DLWP/blob/main/lectures/04/images/criminality.cnns.png?raw=true)

<small>[Xiaolin Wu, Xi Zhang, "Automated Inference on Criminality using Face Images"](https://arxiv.org/abs/1611.04135v1)</small>

<!-- ![Criminality paper](images/criminality.cnns.2.png) -->
![Criminality paper](https://github.com/jchwenger/DLWP/blob/main/lectures/04/images/criminality.cnns.2.png?raw=true)

<small>[Mahdi Hashemi, Margeret Hall, "Criminal tendency detection from facial images and the gender bias effect"](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0282-4)</small>

**References**

[Kevin W. Bowyer, Michael King, Walter Scheirer, Kushal Vangara, "The Criminality From Face Illusion"](https://arxiv.org/abs/2006.03895)

#### Historical example of pseudoscience: phrenology

[Phrenology, Wikipedia](https://en.wikipedia.org/wiki/Phrenology)

<!-- ![Phrenology husband](images/phrenology.husband.jpg) -->
![Phrenology husband](https://github.com/jchwenger/DLWP/blob/main/lectures/04/images/phrenology.husband.jpg?raw=true)


<small>[33 Absurd Phrenology Diagrams From A Century Ago](https://allthatsinteresting.com/phrenology-charts)</small>

<!-- ![Phrenology husband](images/phrenology.wife.jpg) -->
![Phrenology wife](https://github.com/jchwenger/DLWP/blob/main/lectures/04/images/phrenology.wife.jpg?raw=true)

<small>[Phrenology, University of Missouri](https://library.missouri.edu/news/special-collections/phrenology)</small>

## Nonstationarity / Concept drift

### Handling change in our data / prediction task

Much of the data in the world changes constantly.

Nonstationary problems have a time ordering.

Chollet's example is a recommender system: you can't use the same system now as a few years ago! Whether it recommends clothing, music, or anything else, a good recommendation is time-dependent.

There are three ways to deal with this:

- gather data from periods when the problem is stationary (a succession of summers);
- train the model on recent trends only (the past few weeks);
- train on all the data, but include the time of year as an input.
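
A minimal sketch of these three options in plain Python, on a made-up purchase log. The helper names (`same_season`, `recent_window`, `time_of_year_features`) and the sin/cos encoding of the day of the year are assumptions chosen for illustration, not something the book prescribes.

```python
import math
from datetime import date, timedelta


def same_season(samples, months=(6, 7, 8)):
    """Keep only samples from the given months (here: a succession of summers)."""
    return [(day, value) for day, value in samples if day.month in months]


def recent_window(samples, today, days=28):
    """Keep only samples from the last `days` days (recent trends)."""
    cutoff = today - timedelta(days=days)
    return [(day, value) for day, value in samples if cutoff < day <= today]


def time_of_year_features(day):
    """Encode the day of the year as a point on a circle (sin, cos),
    so that 31 December and 1 January end up close together."""
    angle = 2 * math.pi * (day.timetuple().tm_yday - 1) / 365.25
    return math.sin(angle), math.cos(angle)


# Hypothetical purchase log: one (date, item) pair per day, 2023-01-01 to 2024-12-30.
log = [(date(2023, 1, 1) + timedelta(days=i), f"item_{i % 7}") for i in range(730)]
today = date(2024, 12, 30)

summers = same_season(log)                  # option 1: stationary periods only
recent = recent_window(log, today, days=28) # option 2: recent trends only
features = [time_of_year_features(day) for day, _ in log]  # option 3: extra inputs

print(len(summers), "summer samples,", len(recent), "recent samples,", len(log), "in total")
```

The first two options throw data away, the third keeps everything and lets the model learn the seasonal pattern itself. Which one works best depends on how much data you have and how fast the problem drifts.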

A third hypothesis: **the future is like the past**.

The kind of machine learning we study now **only spots patterns in collected data** – data that lies in the past.

We assume, when we use a trained model, that the past is relevant today.

---

## Collect a dataset

### Beware of non-representative data

You should strive to make sure that your **training data** and the **real-world (unseen) data** your model will be used on come from the **same distribution**.

#### Examples

If you train on very clean, well-lit pictures and then run your model on real-world pictures from social media in production, you're in for trouble!

The same goes for text: if your chatbot is trained only on clean, well-edited text, it will struggle to chat with real people who use slang, abbreviations, spelling mistakes, etc.
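
One rough way to spot this kind of mismatch for text is to measure how many tokens in the production data never appeared in training (the out-of-vocabulary rate). The sketch below uses toy, made-up sentences and a deliberately naive tokenizer; `tokenize` and `oov_rate` are hypothetical helper names, and a high rate is only a warning sign, not a full test of distribution shift.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(train_texts, new_texts):
    """Fraction of tokens in `new_texts` that never occur in `train_texts`."""
    vocabulary = {tok for text in train_texts for tok in tokenize(text)}
    new_tokens = [tok for text in new_texts for tok in tokenize(text)]
    if not new_tokens:
        return 0.0
    unseen = sum(tok not in vocabulary for tok in new_tokens)
    return unseen / len(new_tokens)


# Toy corpora, invented for this example.
train_texts = ["Thank you for your message.", "I will see you tomorrow."]
edited_texts = ["I will see you.", "Thank you for tomorrow."]
casual_texts = ["thx c u tmrw", "omg u will luv it"]

print(oov_rate(train_texts, edited_texts))  # 0.0: every token was seen in training
print(oov_rate(train_texts, casual_texts))  # 8 of the 9 tokens are unseen
```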

#### The problem of sampling bias

If you based your estimate of who won the election that night on this one newspaper alone, you would be wrong!
<!-- <img style="float:right;height:550px" src="images/chollet/figure6.1.png"> -->
<img  style="height:550px" src="https://github.com/jchwenger/DLWP/blob/main/lectures/04/images/chollet/figure6.1.png?raw=true">

[DLWP](https://deeplearningwithpython.io/chapters/chapter06_universal-workflow-of-ml/#beware-of-nonrepresentative-data), Figure 6.1
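
Sampling bias is easy to reproduce in a small simulation. The electorate below is entirely hypothetical (the 20% / 70% / 40% figures are invented for the example): a poll that only reaches one subgroup can be confidently wrong, while a random poll of the same size lands close to the truth.

```python
import random

rng = random.Random(0)  # fixed seed so that the run is repeatable


def make_voter():
    """Hypothetical voter: 20% own a phone; owners favour A (70%), others do not (40%)."""
    owns_phone = rng.random() < 0.2
    votes_a = rng.random() < (0.7 if owns_phone else 0.4)
    return owns_phone, votes_a


def share_for_a(voters):
    """Fraction of the given voters who vote for candidate A."""
    return sum(votes_a for _, votes_a in voters) / len(voters)


population = [make_voter() for _ in range(100_000)]
true_share = share_for_a(population)

# Representative poll: 2,000 voters drawn uniformly from the whole population.
random_poll = rng.sample(population, 2000)

# Biased poll: 2,000 voters drawn from phone owners only.
phone_owners = [voter for voter in population if voter[0]]
phone_poll = rng.sample(phone_owners, 2000)

print(f"true share for A:      {true_share:.3f}")
print(f"random poll estimate:  {share_for_a(random_poll):.3f}")
print(f"phone-only estimate:   {share_for_a(phone_poll):.3f}")
```

In this setup candidate A loses the election (the true share is around 46%), yet the phone-only poll predicts a comfortable win. Collecting more phone-only answers would not help: the error comes from who is sampled, not from how many.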

## Types of problems: summary table

| Problem type | Last layer activation | Loss function | Metric |
|:---|:---|:---|:---|
| Binary classification | `sigmoid` | `binary_crossentropy` | (binary) `accuracy`, [ROC AUC](https://keras.io/api/metrics/classification_metrics/) |
| Multiclass, single-label classification | `softmax` | `categorical_crossentropy` | (categorical) `accuracy`, top-k (categorical) `accuracy`, [ROC AUC](https://keras.io/api/metrics/classification_metrics/) |
| Multiclass, multi-label classification | `sigmoid` | `binary_crossentropy` | (binary) `accuracy`, [ROC AUC](https://keras.io/api/metrics/classification_metrics/) |
| Regression to arbitrary values | `None` | `mse` | `mae` |
| Regression to values in [0, 1] | `sigmoid` | `mse` or `binary_crossentropy` | `mae` |
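
To see why single-label and multi-label problems call for different activations, here is a small pure-Python sketch. The dictionary keys are hypothetical short names for the rows of the table, and `sigmoid` and `softmax` are hand-written here for illustration rather than taken from Keras.

```python
import math

# The table above as a lookup: problem type -> (last layer activation, loss).
# The keys are made-up names; the values are the identifiers from the table.
OUTPUT_CONFIG = {
    "binary": ("sigmoid", "binary_crossentropy"),
    "multiclass_single_label": ("softmax", "categorical_crossentropy"),
    "multiclass_multi_label": ("sigmoid", "binary_crossentropy"),
    "regression": (None, "mse"),
    "regression_0_1": ("sigmoid", "mse"),
}


def sigmoid(scores):
    """Squash each score independently into (0, 1)."""
    return [1 / (1 + math.exp(-s)) for s in scores]


def softmax(scores):
    """Turn the scores into one probability distribution over the classes."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, -1.0]  # raw outputs of a hypothetical 3-unit last layer

# Softmax: the classes compete, the probabilities sum to 1 (single label).
print([round(p, 3) for p in softmax(scores)], "sum =", round(sum(softmax(scores)), 3))

# Sigmoid: each class gets its own yes/no probability (several labels can be on).
print([round(p, 3) for p in sigmoid(scores)], "sum =", round(sum(sigmoid(scores)), 3))
```

In Keras, the two strings for a given row would typically be passed as the `activation` argument of the last `Dense` layer and the `loss` argument of `model.compile`.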
