## MNIST (handwritten digits dataset)
___
Every machine learning student knows about the MNIST dataset. Fewer know what the acronym stands for. In fact, we had to look it up to be able to tell you that the M stands for Modified, and NIST stands for National Institute of Standards and Technology. Now you probably know something that an average machine learning expert doesn’t!

## The types of ML
___
**Supervised learning**: We are given an input, for example a photograph with a traffic sign, and the task is to predict the correct output or label, for example which traffic sign is in the picture (speed limit, stop sign, etc.). In the simplest cases, the answers are in the form of yes/no (we call these binary classification problems).

**Unsupervised learning**: There are no labels or correct outputs. The task is to discover the structure of the data: for example, grouping similar items to form “clusters”, or reducing the data to a small number of important “dimensions”. Data visualization can also be considered unsupervised learning.

**Reinforcement learning**: Commonly used in situations where an AI agent like a self-driving car must operate in an environment and where feedback about good or bad choices is available with some delay. Also used in games where the outcome may be decided only at the end of the game.

The categories overlap and their boundaries are fuzzy, so a particular method can sometimes be hard to place in one category. For example, as the name suggests, so-called **semi-supervised learning** is partly supervised and partly unsupervised.

## Humans teaching machines
___
Instead of manually writing down exact rules for the classification, the point of supervised machine learning is to take a number of examples, label each one with the correct label, and use them to “train” an AI method to automatically recognize the correct label for the training examples as well as (at least hopefully) for any other examples, such as new images.
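
As a rough illustration of learning from labeled examples, here is a minimal sketch of one of the simplest supervised methods, a nearest-neighbor classifier. The feature values and the `training_examples` and `predict` names are made up for this sketch; a real traffic-sign system would work on image data and use a far more capable model.

```python
import math

# Hypothetical labeled training examples: (features, label).
# The two numbers stand in for features extracted from a photograph.
training_examples = [
    ((0.9, 0.1), "stop sign"),
    ((0.8, 0.2), "stop sign"),
    ((0.1, 0.9), "speed limit"),
    ((0.2, 0.8), "speed limit"),
]


def predict(features, examples=training_examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# A new, unlabeled example is given the label of its most similar neighbor.
print(predict((0.85, 0.15)))
```

No classification rule was written by hand here: the behavior comes entirely from the labeled examples.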

## Regression Example
___
Suppose we have a data set consisting of apartment sales data. For each purchase, we would have the price that was paid, together with the size of the apartment in square meters (or square feet, if you like), the number of bedrooms, the year of construction, and the condition (on a scale from “disaster” to “spick and span”). We could then use machine learning to train a **regression model** that predicts the selling price based on these features.
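
To make this concrete, here is a minimal sketch of a regression model fitted by ordinary least squares. It is deliberately simplified: it uses only one feature (size) instead of all the features listed above, and the sales figures are invented for illustration.

```python
# Made-up sales records: (size in square meters, selling price).
sales = [(30, 95_000), (45, 140_000), (60, 185_000), (80, 245_000), (100, 305_000)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


sizes = [size for size, _ in sales]
prices = [price for _, price in sales]
slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict the selling price of an apartment from its size."""
    return slope * size + intercept


print(round(predict_price(70)))
```

Unlike a classifier, the model outputs a number (a price) rather than a label, which is what makes this a regression problem.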

## Watch Out!!!
___
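* Split your data set into two parts: the **training data**, which is used to build the model, and the **test data**, which is held back to evaluate it.
* A model may be a very good predictor on the training data, but this is no proof that it can **generalize** to any other data. This is where the test data comes in handy: it shows how the model does on data it has not seen.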
* ML methods are especially prone to **overfitting** because they can try a huge number of different “rules” until one that fits the training data perfectly is found.
* Learning to avoid overfitting and choose a model that is **neither too restricted nor too flexible** is one of the most essential skills of a data scientist.
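
The points above can be sketched in a few lines. The data below is invented (roughly `y = 2x` plus some noise), and the two models are stand-ins chosen for this illustration: a very flexible one that simply memorizes the training data (nearest neighbor) and a restricted one (a straight line).

```python
# Made-up noisy data, roughly y = 2x.
data = [(0, 0.5), (1, 2.1), (2, 3.5), (3, 5.8), (4, 8.6),
        (5, 10.2), (6, 11.4), (7, 13.9), (8, 16.5)]

# Split into training data and test data (here: every other point).
train, test = data[::2], data[1::2]


def memorizing_model(points):
    """Very flexible: predict the y value of the nearest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def line_model(points):
    """Restricted: least-squares straight line through the training points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


memorizer = memorizing_model(train)
line = line_model(train)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name,
          "train error:", round(mean_squared_error(model, train), 3),
          "test error:", round(mean_squared_error(model, test), 3))
```

On this toy data the memorizing model fits the training data perfectly yet does much worse than the line on the test data, which is the signature of overfitting. Only the test error reveals the difference.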

## Unsupervised Learning
___
**In unsupervised learning, the correct answers are not provided**. This makes the situation quite different since we can't build the model by making it fit the correct answers on training data. It also makes the evaluation of performance more complicated since there are no correct answers against which to check whether the learned model is doing well.

Typical unsupervised learning methods attempt to learn some kind of “structure” underlying the data. This can mean, for example, **visualization** where similar items are placed near each other and dissimilar items further away from each other. It can also mean **clustering** where we use the data to identify groups or “clusters” of items that are similar to each other but dissimilar from items in other clusters.

## Unsupervised Example
___
As a concrete example, grocery store chains collect data about their customers' shopping behavior (that's why you have all those loyalty cards). To better understand their customers, the store can visualize the data using a graph where each customer is represented by a dot and customers who tend to buy the same products are placed nearer each other than customers who buy different products. Alternatively, the store could apply clustering to obtain a set of customer groups such as ‘low-budget health food enthusiasts’, ‘high-end fish lovers’, ‘soda and pizza 6 days a week’, and so on. Note that the machine learning method would only group the customers into clusters, but it wouldn't automatically generate the cluster labels (‘fish lovers’ and so on). This task would be left to the user.
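
The clustering idea can be sketched with k-means, one common clustering method (the text above does not say which method a store would actually use). The customer data is invented, and initializing the centroids with the first `k` points is a simplification made for this sketch.

```python
import math

# Made-up customers: (weekly spend on fish, weekly spend on soda and pizza).
customers = [(1.0, 9.0), (9.0, 1.0), (1.5, 8.0), (8.5, 1.5), (0.5, 8.5), (9.5, 0.5)]


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = list(points[:k])  # naive initialization (an assumption of this sketch)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

The output contains only cluster numbers and centroid coordinates. Deciding that one group should be called ‘fish lovers’ is still up to the person reading the result.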