# kNN (k Nearest Neighbors)

## Instance Based Learning   
---
Previous algorithms used the features $X$ to describe $y$ by learning a general function $f(x)$, for example a linear model:

$f(x) = wx+b$

Instance-based learning instead stores a database of all the $(x, y)$ pairs; upon receiving a new value $x$, it simply looks up $x$ to find $y$.

$f(x) = \textrm{lookup}(x)$
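
A minimal Python sketch of the pure lookup idea, for illustration only (the `train` and `lookup` names are hypothetical). Keeping every $y$ for a repeated $x$ is just one possible answer to the duplicate-$x$ question raised under Disadvantages, and the last call shows the lack of generalization:

```python
def train(pairs):
    """Instance-based 'learning': store every (x, y) pair, no fitting at all."""
    table = {}
    for x, y in pairs:
        # The same x may appear more than once; this sketch keeps every y seen for it.
        table.setdefault(x, []).append(y)
    return table

def lookup(table, x):
    """f(x) = lookup(x): exact match only, so an unseen x returns None."""
    return table.get(x)

table = train([(1, 2), (2, 4), (2, 5)])
print(lookup(table, 1))    # [2]
print(lookup(table, 2))    # [4, 5]  (duplicate x: both y's come back)
print(lookup(table, 1.5))  # None    (never seen, so no answer)
```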

### Advantages:
- remembers (exact recording of the $x$ to $y$ relationship)
- fast (does not have to perform learning)
- simple (low complexity)

### Disadvantages:
- No generalization (can't return an answer for an $x$ it has never seen)
- Models noise exactly
- Overfitting
    - believes the data too much
    - what if the same $x$ is stored multiple times with different $y$ values? Return both $y$'s?

## kNN Algorithm
---
Find the $k$ nearest neighbors of the query point and return their majority label (for classification) or their mean value (for regression).

### Pseudocode 
#### 1. Given:
- Training Data: $D = \{ (x_i, y_i) \}$
- Distance Metric: $d(q, x)$
    - Or some similarity metric
    - note this is domain knowledge (up to designer)
- Number of Neighbors: $k$
    - note this is domain knowledge (up to designer)
- Query Point: $q$

#### 2. Find:
- A set of nearest neighbors, $NN$, containing the $k$ training points closest to the query point:  
$NN = \{ i : d(q, x_i) \textrm{ is among the } k \textrm{ smallest distances} \}$

#### 3. Return:
- Classification
    - plurality vote of the $y_i$ for $i \in NN$ (the most common label wins)
        - could weight each vote by how far away the neighbor is
        - note this is domain knowledge (up to designer)
    - ties: pick randomly, take the label most common in the whole data set, or ...
        - note this is domain knowledge (up to designer)

- Regression
    - mean of the $y_i$ for $i \in NN$
        - could use a weighted average
        - note this is domain knowledge (up to designer)
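
The pseudocode above can be sketched in plain Python. This is a minimal illustration, not an optimized implementation: the training data `D` is assumed to be a list of `(x, y)` pairs, Euclidean distance is an assumed default for `d`, the random tie-break is just one of the options listed above, and all function names are hypothetical. Weighted votes and weighted averages are not shown.

```python
import heapq
import math
import random
from collections import Counter

def euclidean(q, x):
    """Assumed default metric d(q, x); any function of two points could be swapped in."""
    return math.sqrt(sum((qj - xj) ** 2 for qj, xj in zip(q, x)))

def nearest_neighbors(D, q, k, d=euclidean):
    """Step 2 (Find): indices of the k training points with the smallest d(q, x_i)."""
    return heapq.nsmallest(k, range(len(D)), key=lambda i: d(q, D[i][0]))

def knn_classify(D, q, k, d=euclidean):
    """Step 3 (Return, classification): plurality vote of the neighbors' labels."""
    votes = Counter(D[i][1] for i in nearest_neighbors(D, q, k, d))
    top = max(votes.values())
    # Tie-break by picking randomly among the most-voted labels (one design choice).
    return random.choice([label for label, n in votes.items() if n == top])

def knn_regress(D, q, k, d=euclidean):
    """Step 3 (Return, regression): unweighted mean of the neighbors' y values."""
    ys = [D[i][1] for i in nearest_neighbors(D, q, k, d)]
    return sum(ys) / len(ys)

D = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_classify(D, (1, 0), 3))  # a

R = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 100.0)]
print(knn_regress(R, (1.2,), 3))   # 2.0
```

Note that "learning" here is nothing more than keeping `D` around; all the work happens inside the query functions.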

### Domain Knowledge
All of these parameters (distance metric, $k$, vote or average weighting, tie-breaking) are design choices, so different choices can produce completely different answers for the same query.

## Computation and Space Complexity (Big-O)
---
Assuming the data is sorted (e.g., 1-D data, so a query can use binary search):

| Model | Task Type| Running Time | Space  |
| :------ | :------: | :-----: |  :-----: |
|  **1-NN** | learning   |  $1$  |   $n$ |
|  **1-NN** | query  |  $\log{n}$  | $1$ |
|  **k-NN** | learning  |  $1$  |  $n$  |
|  **k-NN** | query  |  $\log{n} + k$   |  $1$  |
|  **linear reg.** | learning  | $n$ |   $1$ ($w$, $b$) |
|  **linear reg.** | query  |  $1$  |  $1$  |
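
For the 1-D case where the sorted-data assumption makes sense, the $\log{n} + k$ query time can be sketched with Python's `bisect` module: binary search finds where $q$ would sit in the sorted list ($O(\log n)$), then two pointers walk outward collecting the $k$ closest values ($O(k)$). The `knn_1d` name is hypothetical, and ties between the left and right sides go to the left.

```python
import bisect

def knn_1d(xs, q, k):
    """Return the k values in the sorted list xs closest to q, nearest first."""
    hi = bisect.bisect_left(xs, q)  # first index with xs[hi] >= q, found in O(log n)
    lo = hi - 1                     # last index with xs[lo] < q
    out = []
    while len(out) < k and (lo >= 0 or hi < len(xs)):
        # Take whichever side is closer to q; the left side wins ties.
        if hi >= len(xs) or (lo >= 0 and q - xs[lo] <= xs[hi] - q):
            out.append(xs[lo])
            lo -= 1
        else:
            out.append(xs[hi])
            hi += 1
    return out

xs = [1, 3, 4, 7, 10, 12]
print(knn_1d(xs, 6, 3))  # [7, 4, 3]
```

The table assumes `xs` arrives already sorted; sorting it first would itself cost $O(n \log n)$.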

kNN is a lazy learner: it puts off learning until it absolutely has to (at query time).  
Linear regression is an eager learner: it learns right away (at training time).  

## kNN Domain Knowledge
---
Distance metrics, for a query $q$ and a training point $x$ with features indexed by $j$:
- *Euclidean*: $d(q, x) = \sqrt{\sum_j (q_j - x_j)^2}$
- *Manhattan*: $d(q, x) = \sum_j \lvert q_j - x_j \rvert$

Data (from $y = x_1^2 + x_2$)     

| $x_1, x_2$ | $y$ |
| :---: | :---: |
| 1, 6 | 7 |
| 2, 4 | 8 |
| 3, 7 | 16 |
| 6, 8 | 44 |
| 7, 1 | 50 |
| 8, 4 | 68 |

Query point: $q = (4, 2)$

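Predictions (mean $y$ of the $k$ nearest neighbors):

| Distance | $k$ | Prediction |
| :------ | :------: | :-----: |
| Euclidean | 1 | 8 |
| Euclidean | 3 | 42 |
| Manhattan | 1 | 29 |
| Manhattan | 3 | 35.5 |

With Manhattan distance, $(2, 4)$ and $(7, 1)$ tie at distance 4, and $(3, 7)$ and $(8, 4)$ tie at distance 6. The Manhattan rows include every point tied with the $k$-th nearest, so $k = 1$ averages two points ($\frac{8 + 50}{2} = 29$) and $k = 3$ averages four ($\frac{8 + 50 + 16 + 68}{4} = 35.5$).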

Since the data came from $y = x_1^2 + x_2$, the true value at $q = (4, 2)$ is $4^2 + 2 = 18$, and none of the four predictions is close to it.  
**kNN did not do well in this case due to its bias**
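
The prediction table can be reproduced with a short Python sketch. The function names are hypothetical, and one assumption matters: all points tied with the $k$-th distance are included in the average, which is what reproduces the Manhattan rows.

```python
import math

# Training data from the table above: ((x1, x2), y), generated by y = x1**2 + x2
DATA = [((1, 6), 7), ((2, 4), 8), ((3, 7), 16),
        ((6, 8), 44), ((7, 1), 50), ((8, 4), 68)]

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def knn_mean(q, data, k, dist):
    """Mean y of the k nearest points, keeping every point tied with the k-th distance."""
    scored = sorted((dist(q, x), y) for x, y in data)
    cutoff = scored[k - 1][0]  # distance of the k-th nearest neighbor
    ys = [y for d, y in scored if d <= cutoff]
    return sum(ys) / len(ys)

q = (4, 2)
for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan)]:
    for k in (1, 3):
        print(name, k, knn_mean(q, DATA, k, dist))
```

It prints 8.0, 42.0, 29.0 and 35.5, matching the table.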

## kNN Bias
---
### Preference Bias
Preference bias is why we prefer one hypothesis over another: our belief about what makes a good hypothesis.

- Locality
    - near points are similar to one another
    - this belief is embedded in the distance function selected
    - whether a distance function is good or bad is specific to the domain

- Smoothness
    - averaging
    - expects the data to change smoothly

- All features matter equally
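
To illustrate the "all features matter equally" bias, this sketch reruns the Euclidean 1-NN and 3-NN predictions from the example above with a hypothetical per-feature weight vector `w`. The weight of 10 on $x_1$ is an arbitrary illustrative choice, motivated only by the fact that $y$ depends on $x_1^2$; choosing such weights (or scaling features) is itself domain knowledge.

```python
import math

DATA = [((1, 6), 7), ((2, 4), 8), ((3, 7), 16),
        ((6, 8), 44), ((7, 1), 50), ((8, 4), 68)]

def weighted_euclidean(a, b, w):
    # w[j] scales feature j's squared difference; w = (1, 1) is plain Euclidean.
    return math.sqrt(sum(wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w)))

def knn_mean(q, data, k, w):
    # No distance ties occur for the weights used below, so a plain top-k is enough.
    nearest = sorted(data, key=lambda item: weighted_euclidean(q, item[0], w))[:k]
    return sum(y for _, y in nearest) / k

q = (4, 2)  # true y = 4**2 + 2 = 18
for w in [(1, 1), (10, 1)]:
    print(w, knn_mean(q, DATA, 1, w), knn_mean(q, DATA, 3, w))
```

With `w = (1, 1)` this reproduces the Euclidean rows of the table (8 and 42); with `w = (10, 1)` the predictions become 16 and about 22.7, much closer to the true value of 18.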

## The Curse of Dimensionality
---
<img src="../images/curse_of_dimensionality.png" width=600 align="left"/>   
**As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially!**    $O(2^d)$
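
Two quick calculations illustrate the claim under stated assumptions (the function names are hypothetical). First, if each feature is split into just 2 bins, there are $2^d$ cells that each need training data. Second, if the data is uniform on the unit hypercube $[0, 1]^d$ (an assumption), a sub-cube holding 10% of the points needs edge length $0.1^{1/d}$, so a neighborhood with a fixed share of the data stops being local as $d$ grows.

```python
def cells_needed(d, bins_per_feature=2):
    """Number of grid cells to populate when every feature is split into equal bins."""
    return bins_per_feature ** d

def edge_for_fraction(d, fraction=0.10):
    """Edge of a sub-cube of [0, 1]**d holding `fraction` of uniformly spread data."""
    return fraction ** (1 / d)

for d in (1, 2, 3, 10, 20):
    print(f"d={d:2d}  cells={cells_needed(d):>8d}  edge for 10% of data={edge_for_fraction(d):.2f}")
```

For $d = 10$ the edge is about 0.79: a neighborhood containing only 10% of the data already spans roughly 79% of each feature's range.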

## Summary
---
- distance metric $d(q, x)$ really matters
    - if you pick the wrong one, you get strange behavior
    - e.g., Euclidean or Manhattan, possibly weighted
    - the distance function is just a black box measuring how similar two arbitrary things are
- how you pick $k$ is important
    - what if $k = n$? A plain average would return the same value (the mean of all $y$) for every query, but a distance-weighted average or some type of regression over the region still gives local answers
    - **locally weighted regression**: curve fitting
        - locally weighted regression uses nearby or distance-weighted training examples to form a local approximation to $f$
    - can replace average with regression or classification  
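
A minimal sketch of locally weighted linear regression, assuming NumPy, a Gaussian kernel with a hypothetical bandwidth parameter `tau` (other kernels work too), and training inputs `X` given as an $n \times m$ array. Every training point contributes, but its weight decays with distance from the query, which is one way to make the $k = n$ case above still give local answers.

```python
import numpy as np

def lwr_predict(q, X, y, tau=1.0):
    """Locally weighted linear regression prediction at query q (hypothetical helper).

    X: (n, m) training inputs, y: (n,) targets, q: (m,) query point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    # Gaussian kernel: weight near 1 for points close to q, near 0 far away.
    w = np.exp(-np.sum((X - q) ** 2, axis=1) / (2 * tau ** 2))
    A = np.hstack([np.ones((len(X), 1)), X])  # intercept column + features
    sw = np.sqrt(w)
    # Weighted least squares: minimize sum_i w_i * (y_i - A_i @ theta)**2
    theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(np.concatenate(([1.0], q)) @ theta)

X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 2x + 1
print(lwr_predict([1.5], X, y))  # approximately 4.0: a line is recovered exactly
```

Because a fresh local line is fit for every query, this remains a lazy learner; a smaller `tau` makes each fit more local.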

TODO - kNN from scratch
https://github.com/xbno/Projects/blob/master/Models_Scratch/KNN%20from%20scratch.ipynb   
and locally weighted regression