## Topics covered in this notebook:
1. What does K-Nearest Neighbors (kNN) mean?
2. Implementation.
3. How to choose K?
4. Common Issues & Fixes.
5. Where can kNN fail?
6. References.

## 1. K - Nearest Neighbors:

1. The idea is to make predictions using the closest known data points.
2. Look at the image below:
    1. There are 2 categories: Star & Triangle.
    2. Consider a new test data point -> Green square. What is this point classified as?
        1. K = 3 -> 3-nearest neighbor -> Pick star.
        2. K = 5 -> 5-nearest neighbor -> Pick triangle.
        3. We classify by calculating the Euclidean distance to the known points, picking the K nearest, and taking the majority class among them.
3. Forms complex decision boundaries; adapts to data density.
4. These types of models are called non-parametric models.
5. These types of classifiers are also called lazy classifiers.
    1. train(X,Y) doesn't do anything. Just stores X & Y.
    2. predict(X') does all the work by looking through stored X & Y.
<img src="Images/knn.png" alt="Drawing" style="width: 300px;"/>
<br>
6. A few assumptions:
    1. The output varies smoothly with the input.
    2. The data occupies a subspace of the high-dimensional input space.
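
The lazy train/predict split described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the class name `LazyKNN`, the method `predict_one`, and the toy coordinates are all assumptions made up for this example, chosen so that the result mirrors the star/triangle figure.

```python
import math
from collections import Counter


class LazyKNN:
    """Minimal kNN classifier: fit() only stores data, predict_one() does the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" just memorises the data set.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Brute force: sort every stored point by Euclidean distance to x.
        by_distance = sorted(zip(self.X, self.y),
                             key=lambda pair: math.dist(pair[0], x))
        nearest_labels = [label for _, label in by_distance[:self.k]]
        # Majority vote among the k nearest labels.
        return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two stars close to the query, three triangles a bit
# further away, and one distant star.
X = [(1.0, 0.0), (0.0, 1.0), (1.5, 1.5), (-2.0, 0.0), (0.0, -2.2), (5.0, 5.0)]
y = ["star", "star", "triangle", "triangle", "triangle", "star"]
query = (0.0, 0.0)

print(LazyKNN(k=3).fit(X, y).predict_one(query))  # -> star
print(LazyKNN(k=5).fit(X, y).predict_one(query))  # -> triangle
```

Sorting all N distances costs O(N log N) per query, which is fine for a sketch but is not how a tuned implementation would do it.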
    

## 2. Implementation:
1. The idea is simple, but the implementation can be tricky.
2. Keeping track of an arbitrary number of distances is not so easy.
3. First, we need to look through all of the training data -> O(N).
4. Then, for each training point, we need to look through the closest distances stored so far -> O(K).
5. The distance between two points $x^{a}$ and $x^{b}$ with $d$ features is the Euclidean distance:
\begin{align}
\large \lVert x^{a} - x^{b} \rVert_2 &= \large \sqrt{\sum_{j=1}^{d} \left(x_j^{a} - x_j^{b}\right)^2}
\end{align}
6. Total: O(NK).
    1. Keeping the stored distances in a sorted list lets us search them in O(log K) instead of O(K), a little better.
    2. Even better: Ball Tree, K-D Tree, etc.
7. Once we have the k nearest neighbors, we need to turn them into votes, which means we need to store the class as well.
    1. {dist1: class1, dist2: class2, ...}
8. Count up the values:
    1. {class1: num_class1, class2: num_class2, ...}
9. Pick the class that has the most votes.
    1. What if there is a tie?
        1. Use whatever argmax(votes) outputs.
        2. Pick one at random.
        3. Weight by distance to neighbors.
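
The steps above can be sketched as follows. Note the swap: instead of the sorted list mentioned in the notes, this sketch keeps the K best candidates in a max-heap (Python's `heapq` with negated distances), which also gives O(log K) work per update. The function names `k_nearest` and `vote` and the toy data are assumptions for illustration, and ties are resolved with the distance-weighting option from the list.

```python
import heapq
import math
from collections import Counter


def k_nearest(train_X, train_y, query, k):
    """Return the k closest (distance, label) pairs, nearest first."""
    heap = []  # max-heap on distance, emulated by storing negated distances
    for i, (x, label) in enumerate(zip(train_X, train_y)):
        d = math.dist(x, query)  # Euclidean distance
        item = (-d, i, label)    # index i breaks ties, so labels are never compared
        if len(heap) < k:
            heapq.heappush(heap, item)     # O(log K)
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, item)  # replace the current farthest, O(log K)
    return sorted(((-neg_d, label) for neg_d, _, label in heap),
                  key=lambda pair: pair[0])


def vote(neighbors):
    """Majority vote; ties are broken by total inverse-distance weight."""
    counts = Counter(label for _, label in neighbors)
    best = max(counts.values())
    tied = [label for label, count in counts.items() if count == best]
    if len(tied) == 1:
        return tied[0]
    weights = Counter()
    for d, label in neighbors:
        if label in tied:
            weights[label] += 1.0 / (d + 1e-12)  # small constant avoids division by zero
    return max(tied, key=lambda label: weights[label])


# Hypothetical toy data.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 4.0)]
train_y = ["a", "b", "b", "a"]
query = (0.2, 0.0)

print(vote(k_nearest(train_X, train_y, query, k=2)))  # 1-1 tie, the closer point wins -> a
print(vote(k_nearest(train_X, train_y, query, k=3)))  # -> b
```

The scan over the training set is still O(N) per query; tree structures such as Ball Trees or K-D Trees are what reduce that part.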
        
## 3. How to choose K?
1. No easy answer.
2. K is a hyperparameter.
3. Use cross-validation.
4. A larger k averages over more neighbors, which smooths out noise and may lead to better performance.
5. But if we set k too large, we may look at samples that are not really neighbors.
6. Rule of thumb: k < sqrt(n), where n is the number of training examples.
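
One way to combine cross-validation with the rule of thumb is sketched below, using leave-one-out cross-validation in plain Python. Everything here is an assumption for illustration: the helper names (`knn_predict`, `loo_accuracy`, `choose_k`), the toy data with one deliberately mislabeled point, the choice to try only odd k (which avoids ties in two-class problems), and capping the candidates at about sqrt(n).

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Brute-force kNN majority vote."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out CV: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def choose_k(X, y):
    """Score odd k up to about sqrt(n) and return the best one with all scores."""
    max_k = max(1, math.isqrt(len(X)))
    scores = {k: loo_accuracy(X, y, k) for k in range(1, max_k + 1, 2)}
    return max(scores, key=scores.get), scores


# Hypothetical data: two clean clusters plus one noisy "B" inside cluster "A".
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
     (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5),
     (0.4, 0.6)]
y = ["A"] * 5 + ["B"] * 5 + ["B"]

best_k, scores = choose_k(X, y)
print(best_k, scores)
```

On this toy set k=1 is misled by the noisy point, while k=3 outvotes it, so the larger k scores higher. With real data, k-fold cross-validation is usually cheaper than leave-one-out.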

## 4. Common Issues & Fixes:
1. If some attributes have larger ranges, they are treated as more important:
    1. Normalize scale:
        1. Linear transformation to the range [0,1].
        2. Scale to have 0 mean and 1 variance.
        3. Caution: Sometimes scale matters.
2. Irrelevant attributes can add noise to the distance measure.
    1. Remove attributes.
    2. Adapt weights using regularization techniques (using other types of classifiers).
3. Computation: O(NK)
    1. Use a subset of dimensions.
    2. Pre-sort training examples into fast data structures (e.g., kd-trees) - Need to read about it.
    3. Compute only approximate distances (e.g., LSH).
    4. Remove redundant data (e.g., condensing).
4. High Dimensional Data: 'Curse of dimensionality'
    1. The required amount of data increases exponentially with the dimension.
    2. The computational cost also increases.
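
To illustrate the first issue, here is a small sketch of both normalization options in plain Python, showing how the nearest neighbor changes once the features are on comparable scales. The helper names and the (age, income) rows are hypothetical and exist only for this example.

```python
import math
import statistics


def min_max_scale(column):
    """Linearly map a list of numbers onto [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # a constant column maps to all zeros
    return [(v - lo) / span for v in column]


def standardize(column):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0
    return [(v - mean) / std for v in column]


def scale_features(X, scaler):
    """Apply `scaler` to every feature (column) of the row-major data X."""
    columns = [scaler(list(col)) for col in zip(*X)]
    return list(zip(*columns))


def nearest_index(X, query_index):
    """Index of the row closest to row `query_index` (excluding itself)."""
    others = [i for i in range(len(X)) if i != query_index]
    return min(others, key=lambda i: math.dist(X[i], X[query_index]))


# Hypothetical rows of (age in years, income in dollars).
X = [(25, 50_000), (26, 52_000), (60, 50_500), (40, 90_000)]

print(nearest_index(X, 0))                                 # income dominates -> 2
print(nearest_index(scale_features(X, min_max_scale), 0))  # -> 1
print(nearest_index(scale_features(X, standardize), 0))    # -> 1
```

On the raw data the 60-year-old with a similar income looks closest to the 25-year-old, purely because income has the larger range. After either rescaling, the 26-year-old becomes the nearest neighbor.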

## 5. Where can kNN fail?
1. Grid of alternating dots.
    1. If you choose K=3, there will always be a 2/3 vote for the wrong class.
        1. Can fix by choosing K = 1.
        2. Can also fix by weighting each point by its distance.
<img src="Images/knn_fail.png" alt="Drawing" style="width: 500px;"/>
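
The failure and both fixes can be reproduced with a small sketch. The 5x5 grid, the query location, and the `predict` helper are assumptions for illustration; the weighted variant uses inverse-distance weights, which is one common way to weight by distance.

```python
import math
from collections import defaultdict

# Training set: a 5x5 grid of dots whose class alternates like a checkerboard.
grid = [((i, j), (i + j) % 2) for i in range(5) for j in range(5)]


def predict(query, k, weighted=False):
    """kNN vote over the grid, optionally weighting each vote by 1/distance."""
    nearest = sorted(grid, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = math.dist(point, query)
        votes[label] += 1.0 / (d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)


query = (2.1, 2.05)  # sits right next to the dot at (2, 2), whose class is 0

print(predict(query, k=1))                 # -> 0
print(predict(query, k=3))                 # -> 1 (two of the three neighbors are class 1)
print(predict(query, k=3, weighted=True))  # -> 0
```

With K=3 the one very close dot is outvoted by two dots of the opposite class that are roughly nine times further away; K=1 or distance weighting lets the close dot decide.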

## 6. References:

1. An Introduction to Statistical Learning Textbook by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
2. University of Michigan EECS 445 - Machine Learning Course (https://github.com/eecs445-f16/umich-eecs445-f16).<br>
3. University of Toronto CSC 411 - Intro. to Machine Learning (http://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/CSC411_Fall16.html).<br>
4. Stanford CS109 - Intro. to probability for computer scientists (https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/). <br>
5. A few online courses on Udemy, Coursera, etc.