## **1. KNN For Classification**
---

### Classifying Models with KNN
- If we have a space containing labelled feature vectors (i.e. each vector of features corresponds to a known label), we can classify a new point by looking at the nearest feature vectors and checking their labels
- Each new observation is compared against its K nearest labelled neighbours; the label that makes up the largest share of those neighbours is assigned to the new observation
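
The voting procedure described above can be sketched in a few lines. This is a minimal illustration rather than a production implementation: the function name `knn_classify`, the toy points and the "red"/"blue" labels are made up for the example, and Euclidean distance is assumed as the closeness measure.

```python
from collections import Counter

import numpy as np


def knn_classify(X_train, y_train, x_new, k=3):
    """Label x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every labelled point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
X_train = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.5], [6.5, 8.0]]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(X_train, y_train, [2.0, 2.0], k=3))  # red
print(knn_classify(X_train, y_train, [6.0, 7.0], k=3))  # blue
```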

### Choosing K
- Sometimes the K nearest neighbours contain an equal number of points from each label, so the classification is ambiguous
- To avoid this, we can restrict K to odd values (this rules out ties only when there are two classes), or we can weight points by their distance to the new observation
- K is just another hyperparameter we can tune to get the best model for prediction
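
As a small illustration of the tie problem, the sketch below uses scikit-learn's `weights` option on a made-up one-dimensional dataset. With K=4 the query point has two neighbours from each class, so the uniform vote is split evenly; weighting by inverse distance lets the closer neighbours decide. The data and variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D data: two points per class
X = np.array([[0.0], [1.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1])
query = np.array([[3.0]])

# Uniform voting: every neighbour counts equally
uniform = KNeighborsClassifier(n_neighbors=4, weights="uniform").fit(X, y)
# Distance weighting: each neighbour's vote is scaled by 1 / distance
weighted = KNeighborsClassifier(n_neighbors=4, weights="distance").fit(X, y)

tie_proba = uniform.predict_proba(query)[0]
weighted_label = weighted.predict(query)[0]

print(tie_proba)       # vote shares per class under uniform weights
print(weighted_label)  # class chosen once distances are taken into account
```

Here the class-1 points sit at distances 1 and 2 from the query while the class-0 points sit at distances 2 and 3, so the weighted vote favours class 1.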

### What is needed to select a KNN Model?
- We need a suitable value of K (i.e. how many nearest neighbours to consider)
- We also need a way to measure the closeness of neighbours (there are several ways to do this)


## **2. Decision Boundary for KNN**
---
- The decision boundary shows the regions of feature space in which a new point would be assigned to each class
- The decision boundary depends on our value of K
    - Choosing K=1 gives a model that predicts from whichever single training point is closest to the observation, so the boundary follows every noisy point (overfit, high variance)
    - Choosing K=(all points) gives a model that predicts the majority class for everything, since all points are considered every time (underfit, high bias)
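
The two extremes can be checked empirically. The sketch below assumes a synthetic, deliberately imbalanced dataset (60 points of class 0, 40 of class 1) generated with NumPy; the sizes and cluster centres are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data: two overlapping clusters, 60 vs 40 points
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(60, 2))
X1 = rng.normal(loc=1.5, scale=1.0, size=(40, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 40)

# K=1: each training point is its own nearest neighbour
k1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
# K=n: every prediction is a vote over the whole training set
k_all = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)

train_acc_k1 = k1.score(X, y)
preds_k_all = k_all.predict(X)

print(train_acc_k1)            # accuracy on the training data itself
print(np.unique(preds_k_all))  # distinct classes predicted with K=n
```

With K=1 the training data is reproduced perfectly, which says nothing about how the model behaves on unseen points. With K equal to the dataset size, only the majority class (class 0 here) is ever predicted.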

### Choosing the right value for K
- KNN does not provide a "correct" value for K; the right value depends on which error metric matters most to you
- A common approach is to plot the error rate (measured on validation data) as a function of K and look for the so-called "elbow point", where increasing K further stops giving a meaningful reduction in error (this is not necessarily the exact minimum of the curve)
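
One possible way to produce the error-versus-K curve is sketched below. It assumes the Iris dataset bundled with scikit-learn, 5-fold cross-validation and a range of K from 1 to 30; all three are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

k_values = list(range(1, 31))
error_rates = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    accuracy = cross_val_score(model, X, y, cv=5).mean()
    error_rates.append(1.0 - accuracy)

# The K with the lowest cross-validated error, as a simple starting point
best_k = k_values[int(np.argmin(error_rates))]
print(best_k, min(error_rates))
```

Plotting `error_rates` against `k_values` (for example with matplotlib) gives the curve to inspect by eye; the value picked by `argmin` is only one candidate.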

## **3. Measurement of Distance in KNN**
---
### Euclidean Distance
- The straight-line distance between two points (the L2 norm of their difference)
- Take the square root of the sum of the squared differences between the two points in each feature

### Manhattan Distance (L1 Norm)
- The sum of the absolute differences between the two points in each feature
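
Both distance measures from this section can be computed directly with NumPy. The two vectors below are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical feature vectors with three features each
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean (L2): sqrt(3^2 + 4^2 + 0^2)
euclidean = np.sqrt(np.sum((a - b) ** 2))
# Manhattan (L1): |3| + |4| + |0|
manhattan = np.sum(np.abs(a - b))

print(euclidean)  # 5.0
print(manhattan)  # 7.0
```

In scikit-learn's KNN estimators the default metric is Minkowski with parameter `p`, where `p=2` corresponds to Euclidean distance and `p=1` to Manhattan distance.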

### Scale for Distance Measurement
- Since the model depends entirely on distances between feature vectors, feature scale matters a lot: a feature measured on a large scale dominates the distance, and therefore the classification
- So we need to bring our features onto a comparable scale:
    - Min/max scaling: rescale each feature into the 0-1 range
    - Standard scaling: replace each value with its z-score (subtract the mean, divide by the standard deviation)
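
A short sketch of both scaling options using scikit-learn's preprocessing classes. The age/income columns are invented for the example; they stand in for any pair of features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: column 0 is age in years, column 1 is income
X = np.array([
    [25.0, 40_000.0],
    [35.0, 60_000.0],
    [45.0, 80_000.0],
    [55.0, 120_000.0],
])

# Min/max scaling: each column is mapped onto the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
# Standard scaling: each column ends up with mean 0 and unit variance
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```

In practice the scaler should be fitted on the training data only and then applied to the test data with `transform`, so that no information from the test set leaks into the model.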

### Prediction of Multiple classes
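- Very simple for KNN: the algorithm does not change, as we just look at which class is most common among the nearest neighbours, and the decision boundaries form around each class
- If we have N classes, we may choose K = aN + 1 (for an integer a) so that the votes cannot split evenly across all N classes; with more than two classes other ties are still possible (e.g. a 2-2-0 split when N=3 and K=4), so a tie-breaking rule such as distance weighting is still useful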

## **4. Regression with KNN**
---
- Also very simple: the predicted value is just the mean of the values of the K nearest neighbours
- We can also choose to weight each neighbour by its distance to the point, so that closer neighbours count more
- The value of K controls how smooth the predictions are:
    - For very large K (approaching the number of training points), we basically always predict the mean of the data for every new observation
    - For medium K, KNN acts as a smoothing function between points
    - For K=1, KNN just predicts the value of the nearest neighbour
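
The effect of K on a regression prediction can be seen on a small made-up dataset. The sketch below assumes ten points on the curve y = x² and a single query at x = 2.2; both are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical data: x = 0..9, y = x squared
X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2
query = np.array([[2.2]])

# K=1: the value of the single nearest neighbour (x=2)
pred_k1 = KNeighborsRegressor(n_neighbors=1).fit(X, y).predict(query)[0]
# K=3: the mean of the three nearest values (x=1, 2, 3)
pred_k3 = KNeighborsRegressor(n_neighbors=3).fit(X, y).predict(query)[0]
# K=n: the mean of every value in the dataset
pred_all = KNeighborsRegressor(n_neighbors=len(X)).fit(X, y).predict(query)[0]
# K=3 with distance weighting: closer neighbours count more
pred_weighted = (
    KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X, y).predict(query)[0]
)

print(pred_k1)        # 4.0
print(pred_k3)        # mean of 1, 4 and 9
print(pred_all)       # 28.5, the mean of y
print(pred_weighted)  # pulled towards the nearest neighbour's value
```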

## **5. Pros and Cons of KNN**
---
### Pros
- Simple to implement (no estimation required)
- Adapts well as new data is introduced
- Easy to interpret, very useful when leveraging machine learning for business

### Cons
- Slow to predict, because each prediction requires computing distances to many training points
- Does not generate insight into the data-generating process (no explicit model)
- Can require a lot of memory if the dataset is large (or as it grows), as the fitted model has to store the entire training set
- When there are many features, KNN accuracy can break down: in high dimensions the distances between points become less informative (the "curse of dimensionality"), so the nearest neighbours are no longer meaningfully close
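
The breakdown with many features can be illustrated by comparing the nearest and farthest distances from a query point. The sketch below assumes uniformly random points in the unit hypercube; the helper name `nearest_to_farthest_ratio`, the point count and the dimensions are illustrative choices.

```python
import numpy as np


def nearest_to_farthest_ratio(n_features, n_points=500, seed=0):
    """Ratio of the smallest to the largest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_features))
    query = rng.uniform(size=n_features)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()


ratio_low_dim = nearest_to_farthest_ratio(n_features=2)
ratio_high_dim = nearest_to_farthest_ratio(n_features=1000)

print(ratio_low_dim)   # small: the nearest point is much closer than the farthest
print(ratio_high_dim)  # much closer to 1: all points are almost equally far away
```

When the ratio approaches 1, the "nearest" neighbours are barely nearer than any other point, which is why distance-based voting becomes unreliable.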

### KNN vs Linear Regression
- Linear regression:
    - Fitting involves minimising a cost function (relatively slow)
    - Model has few parameters (memory efficient)
    - Prediction involves a simple calculation (fast)

- KNN:
    - Fitting involves storing the training data (fast)
    - Model has to keep the whole training set rather than a handful of parameters (memory intensive)
    - Prediction involves finding the closest neighbours (slow)
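
The memory contrast can be made concrete by counting what each fitted model keeps. This sketch assumes a synthetic dataset of 1000 samples with 3 features; the coefficients used to generate `y` are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

linear = LinearRegression().fit(X, y)
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

# Linear regression keeps one coefficient per feature plus an intercept
n_linear_params = linear.coef_.size + 1
# KNN keeps every training sample (here counted as stored feature values)
n_knn_stored = knn.n_samples_fit_ * X.shape[1]

print(n_linear_params)  # 4
print(n_knn_stored)     # 3000
```

The linear model's size is fixed by the number of features, while the amount KNN stores grows with every training sample added.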


## **6. Implementing KNN**
---

```python
# Import the class containing the classification method
from sklearn.neighbors import KNeighborsClassifier

# Create an instance of the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model to the training data
# (X_train, y_train and X_test are assumed to come from an earlier train/test split)
knn.fit(X_train, y_train)  # type: ignore

# Predict the response for the test dataset
y_predict = knn.predict(X_test)  # type: ignore

# Regression can also be done with:
from sklearn.neighbors import KNeighborsRegressor
```