### Machine Learning Basics
<img src="pics/canada.jpg" width="800" height="400">

### Agenda
1. What is Machine Learning.
2. Types of Data.
3. Types of Machine Learning.
4. Data preprocessing.

### 1. What is Machine Learning
Machine learning is a subfield of artificial intelligence that aims to create algorithms that learn from experience or data to solve problems, instead of being explicitly programmed by humans to do so.

<img src="pics/AI-vs-ML-vs-Deep-Learning.png" width="700" height="350">

### 2. Types of Data
Data is the most important component of a machine learning project, so let's first look at the types of data used in machine learning. There are two types of data, quantitative and qualitative, and each is handled differently.
#### 2.1. Quantitative data
Quantitative data is anything we can represent as a number or measure objectively (it can be counted or measured). There are two types of quantitative data, continuous and discrete. For instance, height, width, length, temperature, humidity and price are continuous data, while number of people and age in whole years are discrete data.
#### 2.2. Qualitative data
Qualitative data describes characteristics and descriptors that cannot be easily measured (cannot be counted). Examples include gender, color, nationality, texture, taste, smell, level of pain and mood.
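
As a rough illustration of the two data types, the sketch below splits the columns of a small table by dtype. The table, its column names and its values are made up for this example.

```python
import pandas as pd

# Hypothetical toy records; column names and values are invented for illustration.
people = pd.DataFrame({
    "height_cm": [170.2, 158.9, 181.5],   # quantitative, continuous
    "num_siblings": [1, 0, 3],            # quantitative, discrete
    "color": ["red", "blue", "red"],      # qualitative
})

# Numeric columns are treated as quantitative, everything else as qualitative.
quantitative_cols = people.select_dtypes(include="number").columns.tolist()
qualitative_cols = people.select_dtypes(exclude="number").columns.tolist()

print(quantitative_cols)
print(qualitative_cols)
```

A numeric dtype is only a hint: codes such as postal codes are stored as numbers but are still qualitative.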

### 3. Types of Machine Learning and applications
Machine learning can be divided into 3 main types according to how the model learns.

<img src="pics/types_of_ML.png" width="700" height="350">

#### 3.1. Supervised Learning
Supervised learning is a type of machine learning that learns to predict a target output from given input features by **learning from example pairs of input and output data**. For example, if you want a model to predict the *age* of people from their *images*, you must have images of people paired with their *ages*, and we call these *ages* the **labels**.

There are two types of supervised learning models, **regression** and **classification**. The only difference is that regression outputs a **quantitative** value while classification outputs a **qualitative** value. Is the previous example regression or classification?

Examples of Supervised learning:
- K-Nearest Neighbors (KNN)  
KNN is one of the simplest supervised learning algorithms. Imagine that you want to sell your phone to buy a new one. At first you have no idea what price to set, so you search the internet to see what price other people set for the same phone model, and you set your price to be the same as or close to theirs. This is exactly what KNN does: it searches for the K closest samples, then outputs the average of their values (for regression) or their most common class (for classification).

<img src="pics/knn.png" height="800" width="400">
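
The phone-price idea can be sketched in a few lines of NumPy. This is a minimal sketch, not a full KNN implementation: the function name `knn_predict`, the Euclidean distance and the listing data are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Average the targets of the k training samples closest to x_query."""
    # Euclidean distance from the query to every training sample
    distances = np.linalg.norm(x_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

# Hypothetical listings: [storage in GB, age in months] -> asking price
x_train = np.array([[64, 24], [128, 12], [128, 10], [256, 6], [64, 30]], dtype=float)
y_train = np.array([150.0, 300.0, 320.0, 500.0, 120.0])

price = knn_predict(x_train, y_train, np.array([128.0, 11.0]), k=2)
print(price)  # average of the two closest listings (300 and 320)
```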

- Linear Regression  
Linear regression tries to draw a single line that best represents all of the samples we give it, by minimizing the sum of squared errors between the sample points and the line, where each error is the vertical distance from a sample point to the line. As its name suggests, linear regression is only used for regression problems.

<img src="pics/linear_regression.png" width="350">
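
A minimal sketch of fitting such a line by least squares, assuming a single input feature and made-up data that lies exactly on y = 2x + 1:

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Design matrix: one column for x and a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution minimizes the sum of squared errors ||A @ w - y||^2
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)
```

With noisy data the same call returns the line with the smallest sum of squared vertical errors.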

- Logistic Regression  
Logistic regression is similar to linear regression except that it is used for classification problems. It applies a sigmoid function to the output of the linear regression equation, so that the output range changes from (-inf, inf) to (0, 1) and can be interpreted as a probability for binary classification.

<img src="pics/logistic_regression.png" width="800">
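
The effect of the sigmoid can be sketched as follows. The weights `w` and bias `b` are hand-picked for illustration rather than learned, and the 0.5 threshold is a common but assumed choice.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, hand-picked parameters (a real model would learn them from data)
w = np.array([1.5, -0.5])
b = 0.2

x = np.array([[2.0, 1.0],
              [-1.0, 3.0]])

z = x @ w + b                     # linear part, range (-inf, inf)
p = sigmoid(z)                    # probability of the positive class
labels = (p >= 0.5).astype(int)   # threshold at 0.5 for a binary decision

print(p)
print(labels)
```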

- Decision Tree  
You can think of a decision tree as a series of if-else condition nodes. What is interesting is that you aren't the one who decides which feature and what value to use at each node; the algorithm looks at the dataset and creates this series of nodes on its own.

<img src="pics/decision_tree.png" height="600" width="500">
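
To show how an algorithm can pick the feature and value on its own, here is a sketch that searches for a single best split (a one-node tree). It scores splits by misclassification count for simplicity; tree libraries typically use criteria such as Gini impurity or entropy. The function name `best_split` and the data are assumptions for illustration.

```python
import numpy as np

def best_split(x, y):
    """Try every feature and threshold; keep the split with the fewest errors."""
    best = None
    for feature in range(x.shape[1]):
        for threshold in np.unique(x[:, feature]):
            left = y[x[:, feature] <= threshold]
            right = y[x[:, feature] > threshold]
            # Each side predicts its majority class; count the samples it gets wrong
            errors = sum(len(side) - np.bincount(side).max()
                         for side in (left, right) if len(side))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best

# Hypothetical data: feature 0 separates the classes, feature 1 does not
x = np.array([[2.0, 7.0], [3.0, 1.0], [6.0, 8.0], [7.0, 2.0]])
y = np.array([0, 0, 1, 1])

errors, feature, threshold = best_split(x, y)
print(f"if x[{feature}] <= {threshold}: class 0, else: class 1 ({errors} errors)")
```

A full tree repeats this search recursively on the left and right subsets.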

- Random Forest  
Because decision trees tend to overfit the dataset, a random forest reduces the overfitting problem by using multiple trees instead of only one. The outputs of the trees are combined into the final output by voting (for classification) or averaging (for regression).

<img src="pics/random_forest.png" height="600" width="500">
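
The voting step can be sketched on its own. The per-tree predictions below are made up; in a real forest each row would come from a tree trained on a different random sample of the data.

```python
import numpy as np

# Hypothetical class predictions from 5 trees (rows) for 4 samples (columns)
tree_predictions = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])

def majority_vote(predictions):
    """Return the most frequent class for each sample (column)."""
    return np.array([np.bincount(column).argmax() for column in predictions.T])

forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)  # [0 1 1 0]
```

Individual trees disagree on some samples, but the majority is less sensitive to the mistakes of any single tree.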

- SVM (Support Vector Machine)  
SVM is a classification model that works by drawing a line (a hyperplane) that separates the classes with the objective of maximizing the margin between them. What is interesting about SVM is that it can also separate data that is not linearly separable: it projects the data from the original space into another space in which the data can be separated by a linear hyperplane, then computes that hyperplane. Projected back into the original space, the hyperplane becomes a non-linear decision boundary.

<img src="pics/svm.png"  height="600" width="300">
<img src="pics/svm.gif"  height="600" width="300">
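
The projection idea can be illustrated with 1-D data that no single threshold can separate. In this sketch the feature map and the hyperplane (`w`, `b`) are hand-picked assumptions, not a fitted maximum-margin solution; SVM libraries usually achieve the same effect with kernels instead of computing the projection explicitly.

```python
import numpy as np

# Hypothetical 1-D data: class 1 lies on both sides of class 0
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Project each point into 2-D: x -> (x, x^2)
phi = np.column_stack([x, x ** 2])

# In the new space a straight line separates the classes: x^2 - 2 = 0
w = np.array([0.0, 1.0])  # hand-picked, not learned
b = -2.0
pred = (phi @ w + b > 0).astype(int)

print(pred)
```

Back in the original space, the rule `x ** 2 > 2` is a non-linear boundary made of two thresholds.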

#### 3.2. Unsupervised Learning
Unsupervised learning is a type of machine learning that learns to cluster data into groups or to reduce the dimension of data by learning from the data alone, without paired output samples (labels).

<img src="pics/cluster.png" height="800" width="700">

Examples of clustering algorithms:
- K-means clustering
- Hierarchical clustering
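
As a rough sketch of how K-means clustering groups unlabeled points, the code below alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The function name `kmeans`, the fixed iteration count and the data are assumptions for illustration.

```python
import numpy as np

def kmeans(x, k, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every sample
        distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its samples
        centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Hypothetical data: two well-separated groups, no labels given
x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(x, k=2)
print(labels)
```

Which group receives label 0 and which receives label 1 depends on the random start, so only the grouping itself is meaningful.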

<img src="pics/tSNE.gif" height="800" width="400">

Examples of dimensionality reduction:
- PCA (Principal Component Analysis)
- LDA (Linear Discriminant Analysis)
- GDA (Generalized Discriminant Analysis)
- t-SNE (t-Distributed Stochastic Neighbor Embedding)
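
As one example from the list above, PCA can be sketched with a singular value decomposition. The function name `pca` and the random input are assumptions for illustration.

```python
import numpy as np

def pca(x, n_components):
    """Project x onto its first n_components principal directions."""
    # Center each feature at zero
    x_centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    return x_centered @ vt[:n_components].T

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))   # hypothetical data: 100 samples, 5 features

x_reduced = pca(x, n_components=2)
print(x_reduced.shape)  # (100, 2)
```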

#### 3.3. Reinforcement Learning
Reinforcement learning is a type of machine learning that learns to take suitable actions in a given environment to maximize the cumulative reward through a trial-and-error process.

<img src="pics/Reinforcement_learning_diagram.png" height="600" width="300">

Examples of Reinforcement learning:
- Q-Learning
- Deep Q-Learning
- SARSA (State-Action-Reward-State-Action)
- DDPG (Deep Deterministic Policy Gradient)
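
To make the trial-and-error loop concrete, here is a minimal tabular Q-Learning sketch. The environment is a made-up chain of 4 states where moving right eventually reaches a rewarding terminal state, and the values of `alpha`, `gamma` and `epsilon` are arbitrary assumptions.

```python
import numpy as np

n_states, n_actions = 4, 2            # actions: 0 = left, 1 = right; state 3 is terminal
alpha, gamma, epsilon = 0.5, 0.9, 0.3  # learning rate, discount, exploration rate
q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical environment: reward 1 only when the last state is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for _ in range(200):                  # episodes
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy: explore sometimes, otherwise take the best known action
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(q[state].argmax())
        next_state, reward = step(state, action)
        # Q-Learning update: move Q(s, a) toward reward + discounted best next value
        td_target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (td_target - q[state, action])
        state = next_state

policy = q[:n_states - 1].argmax(axis=1)
print(policy)  # learned action for each non-terminal state
```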

### 4. Data preprocessing
There are various ways to preprocess data depending on the type of data and the algorithm. Some algorithms may require different processing, but the following are the most common ways to preprocess data.

#### 4.1. Quantitative data
Different quantitative features usually have different ranges; for example, the age and income of people are in completely different ranges. If we feed this data directly to the model, many models, especially those trained with gradient descent or based on distances, will have a hard time learning their parameters. For this reason, it is best practice to normalize or standardize the quantitative features so that they share the same range or the same scale.

- Standardization:  
Standardization transforms a feature to have a mean of 0 and a standard deviation of 1, using the following equations.
#### $$x_i = \frac{x_i - \bar{x}}{S} \dotsm (1)$$
#### $$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \dotsm (2)$$
#### $$S = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2} \dotsm (3)$$
Where  
$\bar{x}$: is the mean of the feature.  
$S$: is the standard deviation of the feature.
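
As a quick check of equations (1) to (3), here is a worked example on a single made-up feature. Note that `np.std` divides by N by default, which matches equation (3).

```python
import numpy as np

# Hypothetical values of one feature
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = x.mean()                      # equation (2)
std = x.std()                        # equation (3), divides by N
x_standardized = (x - mean) / std    # equation (1)

print(mean, std)          # 5.0 2.0
print(x_standardized)     # values now have mean 0 and standard deviation 1
```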

- Normalization:  
Normalization scales a feature so that its values lie in the range from 0 to 1, using the following equation.
#### $$x_i = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$
Where  
$x_{min}$: is the minimum value of the feature.  
$x_{max}$: is the maximum value of the feature.
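
A worked example of the normalization equation on a single made-up feature:

```python
import numpy as np

# Hypothetical values of one feature
x = np.array([10.0, 20.0, 25.0, 40.0])

x_min, x_max = x.min(), x.max()
x_normalized = (x - x_min) / (x_max - x_min)

print(x_normalized)  # the smallest value maps to 0 and the largest to 1
```

If every value of a feature is identical, `x_max - x_min` is zero, which is why the exercise below takes an `epsilon` argument.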

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

In [None]:
# Implement Standardization
class Standardization:
    def __init__(self, epsilon=1e-9):
        """
        epsilon is a constant value used to avoid division by zero
        """
        # Put your code here
        pass

    def fit(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        """
        # Put your code here
        pass

    def transform(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        return ndarray
        """
        # Put your code here
        pass

    def fit_transform(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        """
        self.fit(x)
        return self.transform(x)

In [None]:
# Test Standardization
example_scaler = StandardScaler()
scaler = Standardization()

# x = np.random.normal(loc=5.0, scale=2.0, size=(1, 10))
x = np.random.normal(loc=5.0, scale=2.0, size=(1000, 10))
x = pd.DataFrame(x)
print("x mean: {}".format(np.mean(x)))
print("x std: {}".format(np.std(x)))

x_example = example_scaler.fit_transform(x)
print("x_example mean: {:.4f}".format(np.mean(x_example)))
print("x_example std: {:.4f}".format(np.std(x_example)))

x_new = scaler.fit_transform(x)
print("x_new mean: {:.4f}".format(np.mean(x_new)))
print("x_new std: {:.4f}".format(np.std(x_new)))

assert x_example.shape == x_new.shape
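# Compare element-wise with a tolerance; a signed mean of differences could pass by cancellation
assert np.allclose(np.mean(x_example, axis=0), np.mean(x_new, axis=0), atol=1e-7)
assert np.allclose(np.std(x_example, axis=0), np.std(x_new, axis=0), atol=1e-7)
assert np.allclose(x_example, x_new, atol=1e-7)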
print("pass")

In [None]:
# Implement Normalization
class Normalization:
    def __init__(self, epsilon=1e-9):
        """
        epsilon is a constant value used to avoid division by zero
        """
        # Put your code here
        pass
    
    def fit(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        """
        # Put your code here
        pass
    
    def transform(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        return ndarray
        """
        # Put your code here
        pass
    
    def fit_transform(self, x):
        """
        x is DataFrame or ndarray of shape (batch_size, feature_nums)
        """
        self.fit(x)
        return self.transform(x)

In [None]:
# Test Normalization
example_scaler = MinMaxScaler()
scaler = Normalization()

# x = np.random.normal(loc=5.0, scale=2.0, size=(1, 10))
x = np.random.normal(loc=5.0, scale=2.0, size=(1000, 10))
x = pd.DataFrame(x)
print("x max: {}".format(np.max(x)))
print("x min: {}".format(np.min(x)))

x_example = example_scaler.fit_transform(x)
print("x_example max: {:.4f}".format(np.max(x_example)))
print("x_example min: {:.4f}".format(np.min(x_example)))

x_new = scaler.fit_transform(x)
print("x_new max: {:.4f}".format(np.max(x_new)))
print("x_new min: {:.4f}".format(np.min(x_new)))

assert x_example.shape == x_new.shape
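# Compare element-wise with a tolerance; a signed mean of differences could pass by cancellation
assert np.allclose(np.mean(x_example, axis=0), np.mean(x_new, axis=0), atol=1e-7)
assert np.allclose(np.std(x_example, axis=0), np.std(x_new, axis=0), atol=1e-7)
assert np.allclose(x_example, x_new, atol=1e-7)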
print("pass")

#### 4.2. Qualitative data
Qualitative data is usually represented by strings, but most machine learning algorithms require numeric data. For this reason, there are several ways to encode the data as numbers.
- Label encoding:  
The concept of label encoding is straightforward: encode each unique string with a unique number. For example, if you have an animal feature with the values "cat", "dog" and "bird", you can encode them as 0, 1 and 2 respectively.

<img src="pics/label_encoding.png" width="700">
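
A minimal sketch of label encoding with a plain dictionary. It assigns codes in sorted order of the class names, which is an assumption of this sketch, so the codes differ from the 0, 1, 2 example in the text.

```python
# Hypothetical feature values
animals = ["cat", "dog", "bird", "dog", "cat"]

# One code per unique string, assigned in sorted order of the class names
classes = sorted(set(animals))
mapping = {name: code for code, name in enumerate(classes)}

encoded = [mapping[animal] for animal in animals]

print(mapping)   # {'bird': 0, 'cat': 1, 'dog': 2}
print(encoded)   # [1, 2, 0, 2, 1]
```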

- One-hot encoding:  
Label-encoded categories are understandable to humans but can mislead a machine. For instance, when we encode "cat", "dog" and "bird" as 0, 1 and 2 respectively, we know that 0 refers to "cat" and 1 refers to "dog", and that the numbers carry no mathematical relationship: "cat" is not less than "dog", nor is it more similar to "dog" than to "bird". Unfortunately, a model does not know this and may treat the codes as ordered quantities. One-hot encoding avoids this by creating one dummy column per class, where each value is either 0 or 1, as shown in the figures below.

<img src="pics/onehot_encoder.jpg" width="700">

<img src="pics/onehot_encoder-2.png" width="900">
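
One way to sketch one-hot encoding in NumPy is to label-encode first and then pick rows of an identity matrix. The data is made up, and the column order (sorted class names) is an assumption of this sketch.

```python
import numpy as np

# Hypothetical feature values
animals = np.array(["cat", "dog", "bird", "dog"])

# classes is sorted: ['bird', 'cat', 'dog']; codes holds the class index of each sample
classes, codes = np.unique(animals, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=int)[codes]

print(classes)
print(one_hot)
```

Each row contains exactly one 1, so no class appears larger or smaller than another.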

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder as ExampleLabelEncoder
from sklearn.preprocessing import OneHotEncoder as ExampleOneHotEncoder

In [None]:
# Implement Label encoding
class LabelEncoder:
    def __init__(self):
        # Put your code here
        pass
    
    def fit(self, x):
        """
        x is Series of shape (batch_size, )
        """
        # Put your code here
        pass
    
    def transform(self, x):
        """
        x is Series of shape (batch_size, )
        """
        # Put your code here
        pass
    
    def fit_transform(self, x):
        """
        x is Series of shape (batch_size, )
        """
        self.fit(x)
        return self.transform(x)

In [None]:
# Test Label encoding
example_encoder = ExampleLabelEncoder()
encoder = LabelEncoder()

x = pd.DataFrame({"animal": ["cat", "dog", "bird", "horse", "duck", "duck", "fox", "dog", "cat", "dog"],
                  "gender": ["male", "male", "female", "male", "female", "female", "male", "female", "female", "male"]})

x_example_1 = example_encoder.fit_transform(x.iloc[:, 0])
x_example_2 = example_encoder.fit_transform(x.iloc[:, 1])
print("x_example_1:")
print(x_example_1)
print("x_example_2:")
print(x_example_2)
print()

x_encoded_1 = encoder.fit_transform(x.iloc[:, 0])
x_encoded_2 = encoder.fit_transform(x.iloc[:, 1])
print("x_encoded_1:")
print(x_encoded_1)
print("x_encoded_2:")
print(x_encoded_2)

In [None]:
# Implement One-hot encoding
class OneHotEncoder:
    def __init__(self):
        # Put your code here
        pass
    
    def fit(self, x):
        """
        x is Series of shape (batch_size, )
        """
        # Put your code here
        pass
    
    def transform(self, x):
        """
        x is Series of shape (batch_size, )
        """
        # Put your code here
        pass
    
    def fit_transform(self, x):
        """
        x is Series of shape (batch_size, )
        """
        self.fit(x)
        return self.transform(x)

In [None]:
# Test One-hot encoding
example_encoder = ExampleOneHotEncoder()
encoder = OneHotEncoder()

x = pd.DataFrame({"animal": ["cat", "dog", "bird", "horse", "duck", "duck", "fox", "dog", "cat", "dog"],
                  "gender": ["male", "male", "female", "male", "female", "female", "male", "female", "female", "male"]})

x_example_1 = example_encoder.fit_transform(x.iloc[:, 0].to_numpy().reshape(-1, 1))
x_example_2 = example_encoder.fit_transform(x.iloc[:, 1].to_numpy().reshape(-1, 1))
print("x_example_1:")
print(x_example_1.toarray())
print("x_example_2:")
print(x_example_2.toarray())
print()

x_encoded_1 = encoder.fit_transform(x.iloc[:, 0])
x_encoded_2 = encoder.fit_transform(x.iloc[:, 1])
print("x_encoded_1:")
print(x_encoded_1)
print("x_encoded_2:")
print(x_encoded_2)