# Supervised Learning

In supervised learning, we are given training data $\{(x_i, y_i)\}$, where each feature (also called explanatory variable or attribute) in $x_i$ is measured on one of the following scales:
+ nominal -- set membership (e.g. has windows, doors)
+ ordinal -- allows ordering (e.g. cold, hot)
+ interval -- measures differences (e.g. centigrade scale)
+ ratio -- interval with absolute/meaningful zero (e.g. Kelvin scale)

and the $y_i$ are called labels (for classification tasks) or targets/outputs (for regression tasks). The goal of a supervised learning task is to extract rules that make accurate predictions of $y$ given $x$.

## One-nearest neighbour classification

One-nearest-neighbour (1NN) classification is conceptually equivalent to constructing a Voronoi diagram of the training set, where each Voronoi region belongs to a single training instance. The label of a test instance can therefore be predicted from the Voronoi region it falls in. However, storing the Voronoi regions explicitly is expensive in multi-dimensional space, and computing the decision boundary of the Voronoi diagram is likewise inefficient. Instead, 1NN is usually implemented by finding the nearest neighbour of a query and predicting the label of that neighbour.
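The nearest-neighbour lookup described above can be sketched as a linear scan over the training set. This is a minimal illustration, assuming Euclidean distance and small in-memory data; the function name `predict_1nn` and the toy points are hypothetical and not part of the original notes.

```python
import math

def predict_1nn(train_x, train_y, query):
    """Return the label of the training instance closest to `query`."""
    # Linear scan: evaluate the Euclidean distance to every stored instance
    # and keep the index of the smallest one (ties go to the earliest index).
    nearest = min(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )
    return train_y[nearest]

# Toy data, made up for illustration: two classes in two dimensions.
train_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]

label_near_origin = predict_1nn(train_x, train_y, (0.5, 0.2))
label_far_away = predict_1nn(train_x, train_y, (5.5, 4.0))
print(label_near_origin, label_far_away)
```

No Voronoi region is ever stored here: the region a query falls in is determined implicitly by whichever training instance wins the distance comparison.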

So in 1NN, we predict the class membership of a query based on the label of its nearest neighbour. Because the prediction depends on distances, the scaling of the axes matters. One disadvantage of 1NN is that as we go to higher dimensions, practically all observations contribute to the decision boundary, creating artifacts such as 'islands'. Further properties:
- low bias: low error on the training set (averaged over multiple classifiers trained on datasets generated from the same distribution)
- high variance: the decision boundary will be very different across multiple classifiers trained on datasets generated from the same distribution
- lazy strategy: zero compute at train time
- $\mathcal{O}(n)$ effort for each prediction, when the nearest neighbour is found by scanning all $n$ training instances
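
Two of the points above, sensitivity to axis scaling and low training error, can be checked on a tiny example. This is a sketch under stated assumptions: Euclidean distance, a brute-force scan, and invented height/weight data; the names `predict_1nn` and `to_centimetres` are hypothetical.

```python
import math

def predict_1nn(train_x, train_y, query):
    # Brute-force scan: one distance evaluation per stored instance, O(n).
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[nearest]

# Invented data: (height in metres, weight in kilograms).
train_x = [(1.60, 80.0), (1.90, 70.0)]
train_y = ["short", "tall"]
query = (1.88, 79.0)

# In raw units the weight axis dominates the distance,
# so the query is matched to the instance with similar weight.
raw_prediction = predict_1nn(train_x, train_y, query)

def to_centimetres(point):
    # Rescale only the height axis; the data itself is unchanged.
    return (point[0] * 100.0, point[1])

# After rescaling, height differences dominate and the prediction changes.
scaled_prediction = predict_1nn(
    [to_centimetres(p) for p in train_x], train_y, to_centimetres(query)
)

# Each training point is its own nearest neighbour (distance 0), so the
# training error is zero as long as no duplicated point has conflicting labels.
train_errors = sum(
    predict_1nn(train_x, train_y, x) != y for x, y in zip(train_x, train_y)
)

print(raw_prediction, scaled_prediction, train_errors)
```

There is no training step in this sketch: the "model" is just the stored data, which is what makes the strategy lazy and moves all the cost to prediction time.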