# Introduction to Machine Learning (ML)

by [@barbara_plank](https://twitter.com/barbara_plank)

[with parts inspired by many, among them Anders Johannsen, Malvina Nissim, and the sklearn tutorial. Thanks!]

## Machine learning = learning from data

learning what? 

to make **predictions**

* is today a good day to get an ice cream?
* what is the sentiment of this tweet?
* what will the weather be like 24 hours from now?

## What do you do in front of a zebra crossing?

<img src="pics/zebracrossing.jpg">
[Example inspired by the traffic light example by M. Nissim]

## Zebra crossing

**STOP** or **GO**

How can we teach someone this behavior?

* create ad hoc **rules** (as exhaustive as possible)
* collect a set of real **examples** of what people do at a zebra crossing

### Examples

collect **examples** (cases) of zebra crossings and people's behavior (stop or go)

* zebra crossing $\rightarrow$ **features** (characteristics)
* result $\rightarrow$ **label** (category: stop, go)

With these examples we can use machine learning to **induce** a classifier (= **build a predictor**) that **generalizes** from the observed examples.
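
As a minimal sketch of what "inducing a classifier from examples" can look like, the snippet below learns a single rule from a handful of made-up observations. The feature names (`car_approaching`, `raining`), the data, and the function `induce_one_rule` are all hypothetical; the learner simply picks the one feature whose values best predict the label on the training examples.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (features of the situation, what the person did).
examples = [
    ({"car_approaching": True,  "raining": False}, "stop"),
    ({"car_approaching": True,  "raining": True},  "stop"),
    ({"car_approaching": False, "raining": False}, "go"),
    ({"car_approaching": False, "raining": True},  "go"),
    ({"car_approaching": False, "raining": False}, "go"),
    ({"car_approaching": True,  "raining": True},  "stop"),
]

def induce_one_rule(examples):
    """Pick the single feature whose values best predict the label."""
    best_feature, best_rule, best_correct = None, None, -1
    for feature in examples[0][0]:
        # Count how often each label occurs for each value of this feature.
        counts = defaultdict(Counter)
        for features, label in examples:
            counts[features[feature]][label] += 1
        # For each feature value, predict its most frequent label.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(rule[f[feature]] == label for f, label in examples)
        if correct > best_correct:
            best_feature, best_rule, best_correct = feature, rule, correct
    return best_feature, best_rule

feature, rule = induce_one_rule(examples)

def predict(features):
    """Apply the induced rule to a new situation."""
    return rule[features[feature]]

print(feature, rule)
print(predict({"car_approaching": True, "raining": False}))
```

On this toy data the learner settles on `car_approaching`, because that feature alone explains every observed decision. A feature value that never occurred in the examples would raise a `KeyError`, which is one reason real learners need a fallback.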

## Why can't we just build a predictor by coding it up?

* can't be exhaustive enough
* often we don't know how
* trade-off between cost of obtaining **data** versus **knowledge**

### Machine Learning versus traditional programming

<img src="pics/prog-vs-ml.png" width=600>

## How do we know that our model generalized?

We want to build a classifier that generalizes, i.e., that works *beyond* the training data.

It generalizes reasonably well if it can predict well on new **unseen** test cases.
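
One common way to check this is to hold out some examples and measure accuracy on them separately. The sketch below uses invented data (temperature in degrees Celsius and whether it was a good day for ice cream) and two hypothetical predictors: a `memorizer` that only stores the training pairs, and a `threshold_rule` placed halfway between the class means.

```python
def accuracy(classifier, examples):
    """Fraction of examples for which the classifier predicts the true label."""
    correct = sum(classifier(x) == y for x, y in examples)
    return correct / len(examples)

# Made-up data: temperature in degrees Celsius -> good day for ice cream?
train = [(30, "yes"), (25, "yes"), (28, "yes"), (10, "no"), (5, "no"), (12, "no")]
test = [(27, "yes"), (8, "no"), (31, "yes"), (11, "no")]  # unseen during training

# The memorizer stores the training pairs and answers "no" for anything else.
memory = dict(train)

def memorizer(temperature):
    return memory.get(temperature, "no")

# The threshold rule sits halfway between the mean "yes" and mean "no" temperature.
yes_temps = [x for x, y in train if y == "yes"]
no_temps = [x for x, y in train if y == "no"]
threshold = (sum(yes_temps) / len(yes_temps) + sum(no_temps) / len(no_temps)) / 2

def threshold_rule(temperature):
    return "yes" if temperature > threshold else "no"

print("memorizer      train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold rule train/test:", accuracy(threshold_rule, train), accuracy(threshold_rule, test))
```

Both predictors are perfect on the training data, but only the threshold rule stays accurate on the unseen test cases. Training accuracy alone says little about generalization.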

## Machine Learning is ubiquitous

* recommended books in online book stores
* your spam classifier
* automatic machine translation
* Netflix movie recommendations


## ML is the future, and you know it
*Name one thing that computers cannot do today but might be able to accomplish in five years.*



- "Make interesting conversational partners"
- "Flawless object recognition (when objects are shown from an unfamiliar angle)"
- "Cook food via robots?"
- "Having AI similar to humans ... Strong AI."
- "Summarize the plot of a movie by visual analysis."

[examples from AJ]

## Classification

In classification we assign a *discrete* label to an object.

<img src="pics/running.jpg">


For instance, **what kind of food is passing on the running belt**?

In programming terms, a classifier is an algorithm for deciding which category the object belongs to.
In math terms, a classifier is a function that maps the object to a set of discrete categories.

### Function notation

$$f: \mathbb{R} \mapsto \mathbb{R}$$

```python
def triple(a_number):
    """Map a real number to a real number."""
    return 3 * a_number
```

$$f: \mathbb{R} \mapsto \{-1, 1\}$$

```python
def is_expensive_house(house_price):
    """Map a real number (the price) to one of two classes, 1 or -1."""
    if house_price > 1000000:
        return 1
    else:
        return -1
```

### Classifier as a function

Formally, we can think of a classifier as a mathematical function $h$ that maps the input to one of $k$ output categories. Often the input is a vector of $d$ real numbers.

$$h: \mathbb{R}^d \mapsto \{1, 2, \ldots, k\} $$

In some cases our instances can be represented by a binary vector, where each of the $d$ positions records whether a property is present:

$$h: \{0, 1\}^d \mapsto \{1, 2, \ldots, k\}$$

```python
# `instance` is a set of properties, e.g. {'feathered', 'extinct'}
def classify_animal(instance):
    """Map a set of properties to one of three categories."""
    if 'extinct' in instance and 'feathered' in instance:
        return 'dinosaur'
    elif 'feathered' in instance:
        return 'bird'
    else:
        return 'mammal'
```
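
To make the link between a set of properties and a binary vector explicit, here is a small sketch. The feature list `FEATURES` (including the extra property `'furry'`) and the helper names are assumptions for illustration; the decision rule mirrors `classify_animal`.

```python
# Assumed, fixed feature order; each position answers one yes/no question.
FEATURES = ['extinct', 'feathered', 'furry']

def to_binary_vector(instance):
    """Encode a set of properties as a binary vector of length len(FEATURES)."""
    return [1 if feature in instance else 0 for feature in FEATURES]

def classify_animal_vector(vector):
    """Same decision rule as classify_animal, applied to the binary vector."""
    extinct, feathered, _furry = vector
    if extinct and feathered:
        return 'dinosaur'
    elif feathered:
        return 'bird'
    else:
        return 'mammal'

print(to_binary_vector({'feathered'}))
print(classify_animal_vector(to_binary_vector({'extinct', 'feathered'})))
```

Here $d = 3$, so the set `{'feathered'}` becomes the vector `[0, 1, 0]`.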

## Machine learning algorithms

Machine learning algorithms are a special kind of algorithm: they take data as input and return a new algorithm (for instance, a classifier) as output. For example:

$$f: \mathcal{D} \mapsto \left(\mathbb{R}^d \mapsto \{1, 2, \ldots, k\}\right)$$
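
In Python terms, such an $f$ is a function that returns a function. The sketch below uses a nearest-centroid learner purely as an illustration; the name `train_nearest_centroid` and the toy data are assumptions, and many other learners would fit the same signature.

```python
from collections import defaultdict

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_nearest_centroid(data):
    """Take labeled data and return a classifier h from vectors to labels."""
    # Group the training vectors by label.
    by_label = defaultdict(list)
    for vector, label in data:
        by_label[label].append(vector)
    # Average each group component-wise to get one centroid per label.
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }

    def h(vector):
        # Predict the label whose centroid is closest to the input.
        return min(centroids, key=lambda label: squared_distance(vector, centroids[label]))

    return h

# Made-up 2-dimensional training data with labels 1 and 2.
data = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([8.0, 9.0], 2), ([9.0, 8.0], 2)]
h = train_nearest_centroid(data)

print(h([0.0, 0.0]), h([10.0, 10.0]))
```

The returned `h` is an ordinary function: it no longer needs the training data, only the centroids it captured during training.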


Machine learning classification algorithms differ with respect to:

- What kind of input they can learn from (labeled, partly labeled, unlabeled).
- How the hypothesis function $h$ is represented.
- How well the hypothesis $h$ generalizes to new data.

## What we need

1. Data
  * what your data looks like: the input $X$ and the output (labels) $Y$
2. Features
  * how to represent your data (the actual features): how to decompose $X$ into its parts by $\phi$
3. Model/Algorithm
  * the machine learning algorithm used 
4. Evaluation
  * how to measure how good your model is 
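
The four ingredients can be sketched end to end on a toy sentiment task. Everything here is illustrative: the tweets are invented, `phi` is a plain bag-of-words feature function, and the word-voting model is a deliberately simple stand-in for a real learning algorithm.

```python
from collections import Counter

# 1. Data: inputs X (tweets) and labels Y. These examples are made up.
train_X = ["I love this movie", "what a great day", "I hate rain", "this is awful"]
train_Y = ["pos", "pos", "neg", "neg"]
test_X = ["I love this day", "awful rain"]
test_Y = ["pos", "neg"]

# 2. Features: phi decomposes an input into lowercase word counts.
def phi(text):
    return Counter(text.lower().split())

# 3. Model: count how often each word occurs with each label, then let the
#    words of a new input vote for the label they were seen with most.
def train(X, Y):
    word_counts = {label: Counter() for label in set(Y)}
    for text, label in zip(X, Y):
        word_counts[label].update(phi(text))

    def classify(text):
        scores = {
            label: sum(counts[word] * n for word, n in phi(text).items())
            for label, counts in word_counts.items()
        }
        # Ties are broken alphabetically so the result is deterministic.
        return max(sorted(scores), key=scores.get)

    return classify

# 4. Evaluation: fraction of test inputs that receive the correct label.
def accuracy(classify, X, Y):
    return sum(classify(x) == y for x, y in zip(X, Y)) / len(Y)

classify = train(train_X, train_Y)
print(accuracy(classify, test_X, test_Y))
```

Swapping any one ingredient (more data, different features in `phi`, another model, another metric) leaves the other three steps unchanged.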

To visualize the whole:

<img src="pics/learning.png" width=800>

## Classification vs Regression


The goal of machine learning is to find a function $f$ that, given some input $x$, produces a prediction $y$ for that input.

In **supervised machine learning** the $y$'s are given and are called the labels. They can be categorical, like "sports" or "news", or numerical, e.g. 7, 8, 10. If the labels are categorical, we speak of classification; if they are numerical, the task is regression.
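
The difference shows up only in how the labels are combined. As a rough sketch, the two functions below share the same nearest-neighbor lookup; one takes a majority vote over categorical labels, the other averages numerical labels. The data (temperature as input) and all function names are assumptions for illustration.

```python
from collections import Counter

def nearest_labels(train, x, k):
    """Return the labels of the k training inputs closest to x."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [label for _, label in ranked[:k]]

def knn_classify(train, x, k=3):
    # Categorical labels: take a majority vote among the neighbors.
    return Counter(nearest_labels(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    # Numerical labels: average the neighbors' values.
    labels = nearest_labels(train, x, k)
    return sum(labels) / len(labels)

# Made-up data with temperature (degrees Celsius) as the input.
ice_cream_day = [(10, "no"), (12, "no"), (15, "no"), (25, "yes"), (28, "yes"), (30, "yes")]
ice_creams_sold = [(10, 2.0), (12, 4.0), (15, 6.0), (25, 20.0), (28, 24.0), (30, 28.0)]

print(knn_classify(ice_cream_day, 27))   # a category
print(knn_regress(ice_creams_sold, 27))  # a number
```

For an input of 27 the classifier returns the category `"yes"`, while the regressor returns the number `24.0`, the mean of its three nearest neighbors.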

# References

* [sklearn: Working with text data](http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)
* Malvina Nissim and Johannes Bjerva. Learning from data, [ESSLLI 2016 lecture 1](http://esslli2016.unibz.it/wp-content/uploads/2015/10/lecture1.pdf)