# Machine Learning in Python
#### by: Chenshu Liu
Reference link: https://youtu.be/rLOyrWV8gmA?list=PLGenESRtZKmFRJ7ZLXbhQRygnRwQOolXi

## Table of Contents
1. [KNN](#KNN)
2. [Linear Regression](#Linear-Regression)
3. [Logistic Regression](#Logistic-Regression)
4. [Regression Refactoring](#Regression-Refactoring)
5. [Naive Bayes](#Naive-Bayes)
6. [Perceptron](#Perceptron)
7. [SVM](#SVM)
8. [Decision Tree](#Decision-Tree)
9. [Random Forest](#Random-Forest)
10. [PCA](#PCA)
11. [K-Means](#K\-Means)
12. [AdaBoost](#AdaBoost)
13. [LDA](#LDA)

## Libraries

In [8]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from collections import Counter
cmap = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

## KNN
> KNN is the abreviation of K Nearest Neighbors. 

The process of determining the classification of the sample is find `k` (_defined by user_) nearest neighbors of the sample point (**e.g. the green point**) and calculate the euclidean distance of the neighbor points (**the blue and orange points**) from the sample points. 

Since we already know the categories that the neighbor points belong to, we can use the majority of the neighbor category as the classification for the sample point (_e.g. in the case of the image shown below, since two of the three points are blue, the sample point, i.e. the green point, is classified as blue_).

![K-Nearest Neighbor](Images/KNN.jpeg)

In [17]:
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.2, 
                                                    random_state = 1234)
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum(x1 - x2)**2)

class KNN:
    def __init__(self, k = 3):
        assert (k >= 1) & (type(k) == int), f"k value {k} is invalid"
        self.k = k
        
    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
    
    # to predict sample cases
    def predict(self, X):
        predicted_labels = [self._predict(x) for x in X]
        return np.array(predicted_labels)
        
    def _predict(self, x):
        # compute the distances
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # get k nearest samples and labels
        k_indices = np.argsort(distances)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # majority vote --> most common class label
        # the Counter function returns a tuple(value, # of occurrence)
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]
    
clf = KNN(k = 3)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = np.sum(predictions == y_test) / len(y_test)
print(f"The prediction accuracy is {accuracy}.")

The prediction accuracy is 0.9333333333333333.


## Linear Regression

## Logistic Regression

## Regression Refactoring

## Naive Bayes

## Perceptron

## SVM

## Decision Tree

## Random Forest

## PCA

## K-Means

## AdaBoost

## LDA