# Week 5: Supervised Learning and Regression


### <a id='Table-of-Contents'>Table of Contents</a>

#### [1.1) KNN](#1.1-knn)

#### [1.2) Cross Validation](#1.2-cross-validation)

#### [2.1) Linear Regression Intro](#2.1-linear-regression-intro)

#### [2.2) Predictive Linear Regression](#2.2-predictive-linear-regression)

#### [3.1) Regularized Regression](#3.1-Regularized-Regression)

#### [3.2) Inferential Regression](#3.2-Inferential-Regression)

#### [4.1) Logistic Regression](#4.1-Logistic-Regression)

#### [4.2) Classification Measures of Effectiveness](#4.2-Classification-Measures-of-Effectiveness)
___
#### [Assessment 4](#Assessment-4)
___

## <a id='1.1-knn'>1.1) KNN</a>
### k-Nearest Neighbors
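* non-parametric: assumes no fixed functional form for the data; the model simply stores the training set and does its work at prediction time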

#### Key Concepts
* kNN for **classification**: predict a category, the most common label among the k neighbors (majority vote)
* kNN for **regression**: predict a number, the average label of the k neighbors
* pseudocode for predicting the label of a query point x:

```
1. Calculate the distance from each training point to the query point x
2. Order the points by distance from lowest to highest
   - The first k points are the "k nearest neighbors"
3. Return a prediction:
    If this is a classification problem (y is categorical):
        return the most common label among the k nearest neighbors
    If this is a regression problem (y is continuous):
        return the mean of the y values of the k nearest neighbors
```
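
A minimal NumPy sketch of the pseudocode above, assuming Euclidean distance and a brute-force search over every training point. The function name `knn_predict` and its `task` parameter are hypothetical, not from any library. Ties in the classification vote go to whichever tied label `Counter.most_common` lists first, i.e. the one encountered first among the distance-sorted neighbors.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Predict the label of one query point x with k nearest neighbors.

    Hypothetical helper: Euclidean distance, brute-force neighbor search.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # 1. Distance from every training point to the query point x
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))

    # 2. Order the points by distance; the first k are the nearest neighbors
    nearest = np.argsort(distances)[:k]
    neighbor_labels = y_train[nearest]

    # 3. Return a prediction
    if task == "classification":
        # Most common label among the k neighbors (majority vote)
        return Counter(neighbor_labels.tolist()).most_common(1)[0][0]
    if task == "regression":
        # Mean of the neighbors' y values
        return float(neighbor_labels.mean())
    raise ValueError("task must be 'classification' or 'regression'")


# Usage: two small, well-separated clusters in 2-D
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_class = ["a", "a", "a", "b", "b", "b"]
y_value = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]

print(knn_predict(X, y_class, [0.5, 0.5], k=3))                    # a
print(knn_predict(X, y_value, [5.5, 5.5], k=3, task="regression"))  # 10.0
```

For real workloads, scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` implement the same idea with faster neighbor search structures.
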
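* **similarity** s(p1,p2): higher means more alike; one way to convert a distance into a similarity is $s(p_1,p_2) = \frac{1}{1 + d(p_1,p_2)}$
    * Cosine Similarity
    > $s(a,b) = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2}\sqrt{\sum_{i=1}^{N} b_i^2}}$
    * Jaccard Similarity (for sets)
    > $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$
* **distance** d(p1,p2): lower means more alike; a common way to convert a similarity into a distance is $d(p_1,p_2) = 1 - s(p_1,p_2)$ (e.g. cosine distance, Jaccard distance)
    * Euclidean Distance, or L2 norm
    > $d(a,b) = \|a - b\|_2 = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}$
    * Manhattan Distance, or L1 norm
    > $d(a,b) = \|a - b\|_1 = \sum_{i=1}^{N} |a_i - b_i|$
    * Minkowski Distance, a generalized form ($p = 1$ gives Manhattan, $p = 2$ gives Euclidean)
    > $d(a,b) = \left(\sum_{i=1}^{N} |a_i - b_i|^p\right)^{1/p}$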
    
* **curse of dimensionality** (affects any ML method that relies on distance, not only kNN)
    > As the number of features (dimensions) grows, the amount of data needed to generalize accurately grows exponentially.
    * For kNN, the nearest neighbors tend to be far away in high dimensions, so they are barely closer than any other training point
    * kNN generally works well when there are fewer than about 5 dimensions
* choosing an optimal k
    * low k (e.g. k = 1, i.e. 1NN)
        * risk of overfitting
        * too much flexibility (high variance)
        * very high sensitivity to noise/outliers
    * high k
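        * overall performance decreases (underfitting): the decision boundary is over-smoothed, and at the extreme k = n every prediction is the overall majority class (or the overall mean)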
        * lack of flexibility (low variance)
        * high bias
    * cross-validation approach
        * error (1 - accuracy) can be plotted as a function of complexity (1/k)
            * For kNN, the complexity of the model decreases as k increases
        * When the test error is fairly flat around its minimum, lean toward less complexity (a higher k)
    * rule of thumb: start with $k = \sqrt{n}$, where n is the number of training points
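* limitation of kNN: accuracy tends to improve as the training set grows, but so does prediction time, since a brute-force prediction computes the distance to all n training points ($O(n \cdot d)$ work per query in d dimensions)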
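
The similarity and distance formulas in the list above translate almost line for line into NumPy. This is a minimal sketch; the function names are hypothetical helpers, not a library API (SciPy's `scipy.spatial.distance` module provides tested versions of most of these).

```python
import numpy as np


def euclidean(a, b):
    """L2 distance: sqrt(sum((a_i - b_i)^2))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    """L1 distance: sum(|a_i - b_i|)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))


def minkowski(a, b, p=2):
    """(sum(|a_i - b_i|^p))^(1/p); p=1 is Manhattan, p=2 is Euclidean."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def cosine_similarity(a, b):
    """(a . b) / (||a|| * ||b||)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def jaccard_similarity(A, B):
    """Size of the intersection divided by size of the union of two sets."""
    A, B = set(A), set(B)
    return len(A & B) / len(A | B)


def similarity_from_distance(d):
    """Convert a distance to a similarity in (0, 1]: 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


def distance_from_similarity(s):
    """Convert a similarity to a distance: 1 - s (e.g. cosine/Jaccard distance)."""
    return 1.0 - s


a, b = [0, 0], [3, 4]
print(euclidean(a, b))                               # 5.0
print(manhattan(a, b))                               # 7.0
print(minkowski(a, b, p=1))                          # 7.0
print(similarity_from_distance(euclidean(a, b)))     # 0.16666666666666666
print(cosine_similarity([1, 0], [0, 1]))             # 0.0
print(jaccard_similarity({"x", "y"}, {"y", "z"}))    # 0.3333333333333333
print(distance_from_similarity(cosine_similarity([1, 2], [2, 4])))  # close to 0: parallel vectors
```

Note that the two conversions are separate conventions: `1 / (1 + d)` is not the inverse of `1 - s`.
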
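
A quick numerical illustration of the curse-of-dimensionality bullet, assuming points drawn uniformly from the unit cube (an assumption for illustration, not real data). It measures the ratio of the nearest to the farthest distance from a random query point; as `dim` grows the ratio climbs toward 1, meaning the "nearest" neighbor is barely nearer than the farthest one. The helper name `nearest_to_farthest_ratio` is hypothetical.

```python
import numpy as np


def nearest_to_farthest_ratio(n_points, dim, rng):
    """Smallest / largest Euclidean distance from one random query point to
    n_points points drawn uniformly from the unit cube [0, 1]^dim."""
    X = rng.random((n_points, dim))
    query = rng.random(dim)
    distances = np.linalg.norm(X - query, axis=1)
    return float(distances.min() / distances.max())


rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    ratio = nearest_to_farthest_ratio(500, dim, rng)
    print(f"dim={dim:>4}  nearest/farthest distance ratio={ratio:.3f}")
```
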
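
A sketch of the cross-validation approach for choosing k, in plain NumPy so it stays self-contained. The data set is synthetic, and the helper names (`knn_classify`, `cv_error`) and the 0.01 tolerance are assumptions for illustration. It evaluates odd values of k (so a two-class vote cannot tie) on the same 5 folds, prints the error against complexity 1/k, computes the $\sqrt{n}$ starting point, and then picks the largest k whose error is within the tolerance of the minimum, following the "lean toward less complexity" note.

```python
import numpy as np


def knn_classify(X_train, y_train, X_test, k):
    """Majority-vote kNN with Euclidean distance; labels must be ints >= 0."""
    # Squared distances between every test and training point: (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest
    votes = y_train[nearest]                 # their labels: (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])


def cv_error(X, y, k, n_folds=5, seed=0):
    """Mean misclassification error (1 - accuracy) across n_folds folds."""
    # A fixed seed gives the same folds for every k, so errors are comparable
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    fold_errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_classify(X[train_idx], y[train_idx], X[test_idx], k)
        fold_errors.append(np.mean(pred != y[test_idx]))
    return float(np.mean(fold_errors))


# Synthetic data (an assumption for illustration): two overlapping 2-D blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.5, (100, 2)), rng.normal(2.0, 1.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

k_start = int(round(np.sqrt(len(X))))  # sqrt(n) starting point: 14 for n = 200

# Odd k only, so a two-class vote cannot tie
errors = {k: cv_error(X, y, k) for k in range(1, 52, 2)}
for k, err in errors.items():
    print(f"k={k:>2}  complexity 1/k={1 / k:.3f}  CV error={err:.3f}")

# Prefer the largest (least complex) k whose error is within a tolerance of the
# minimum; the 0.01 tolerance is an arbitrary choice, not a standard value.
min_err = min(errors.values())
chosen_k = max(k for k, err in errors.items() if err <= min_err + 0.01)
print("starting k:", k_start, "chosen k:", chosen_k)
```
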

[Table of Contents](#Table-of-Contents)

## <a id='1.2-cross-validation'>1.2) Cross Validation</a>

[Table of Contents](#Table-of-Contents)

## <a id='2.1-linear-regression-intro'>2.1) Linear Regression Intro</a>

[Table of Contents](#Table-of-Contents)

## <a id='2.2-predictive-linear-regression'>2.2) Predictive Linear Regression</a>

[Table of Contents](#Table-of-Contents)

## <a id='3.1-Regularized-Regression'>3.1) Regularized Regression</a>

[Table of Contents](#Table-of-Contents)

## <a id='3.2-Inferential-Regression'>3.2) Inferential Regression</a>

[Table of Contents](#Table-of-Contents)

## <a id='4.1-Logistic-Regression'>4.1) Logistic Regression</a>

[Table of Contents](#Table-of-Contents)

## <a id='4.2-Classification-Measures-of-Effectiveness'>4.2) Classification Measures of Effectiveness</a>

[Table of Contents](#Table-of-Contents)

___

## <a id='Assessment-4'>Assessment 4</a>

[Table of Contents](#Table-of-Contents)