Naive Bayes and Logistic Regression must be implemented from scratch.

# Naive Bayes (50)

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)}$

$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$

$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$

$\begin{aligned}P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\\Downarrow\\\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)\end{aligned}$
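
The decision rule above can be illustrated with a small sketch. The priors and per-feature likelihoods below are hypothetical toy numbers, not values from the assignment dataset. Working in log space is an optional choice made here because a product of many small probabilities can underflow; the logarithm does not change which class attains the maximum.

```python
import numpy as np

# Hypothetical toy values: 2 classes, 3 features, for a single query.
priors = np.array([0.6, 0.4])                 # P(y) for y = 0, 1
likelihoods = np.array([[0.5, 0.2, 0.1],      # P(x_i | y = 0)
                        [0.3, 0.4, 0.3]])     # P(x_i | y = 1)

# log P(y) + sum_i log P(x_i | y); the argmax is unchanged by the log.
log_scores = np.log(priors) + np.log(likelihoods).sum(axis=1)
y_hat = int(np.argmax(log_scores))            # index of the winning class
```

Here class 1 wins (0.4 · 0.3 · 0.4 · 0.3 = 0.0144 against 0.006 for class 0), so `y_hat` is 1.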

In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. They require a small amount of training data to estimate the necessary parameters. 

Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.

On the flip side, although naive Bayes is known as a decent classifier, it is known to be a bad estimator, so the probability outputs are not to be taken too seriously.

In [2]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
np.random.seed(123)

### Dataset
Load the given dataset. The last column contains the labels. 

Preprocess if needed.

In [3]:
#TODO: Load the .txt file
data = ... # Numpy array format
labels = ... # Numpy array format

Consider the values of each class. Create a dictionary for the dataset, with classes as keys and the entries of the dataset as values.

In [4]:
def create_class_dictionary(data, labels):
  #TODO
  pass
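
One possible way to group rows by class, shown on hypothetical toy arrays, is a boolean mask per distinct label. The helper name `group_rows_by_label` is illustrative only and is not part of the assignment skeleton.

```python
import numpy as np

def group_rows_by_label(data, labels):
    """Map each distinct label to the rows of `data` carrying that label."""
    return {label: data[labels == label] for label in np.unique(labels)}

# Hypothetical toy data: three rows, two classes.
toy_data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
toy_labels = np.array([0, 1, 0])
groups = group_rows_by_label(toy_data, toy_labels)
```

With these inputs, `groups[0]` holds the first and third rows and `groups[1]` holds the second row.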

For each class in the dataset dictionary, find the mean and standard deviation of every column. The output should be a list of two lists: the first holds the mean and standard deviation of the first column, and the second holds those of the second column.

In [6]:
def info(data):
  #TODO
  pass

In [7]:
def class_info(class_dictionary):
  #TODO: call the info function to return the mean and standard deviation of each column for each class
  pass
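
As a rough sketch of the per-column statistics, assuming a two-column NumPy array and the population standard deviation (`ddof=0`, an assumption; the assignment does not say which variant is expected), the computation could look like this. The name `column_stats` is hypothetical.

```python
import numpy as np

def column_stats(rows):
    """Return [[mean, std], ...], one pair per column of `rows`."""
    means = rows.mean(axis=0)
    stds = rows.std(axis=0)  # population std (ddof=0); an assumption
    return [[float(m), float(s)] for m, s in zip(means, stds)]

# Hypothetical toy rows belonging to a single class.
toy_rows = np.array([[1.0, 10.0], [3.0, 14.0]])
stats = column_stats(toy_rows)
```

Applying the same helper to each value of the class dictionary would give one such list per class.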

### Visualization
Use the imported libraries to visualize the given data. 

Why is the info step valid in this dataset? 

What type of distribution does this dataset follow? With other distribution types, what action would be needed to obtain the mean and standard deviation info?

In [None]:
#TODO

### Model Details

As explained above, to create this model, you need a prior function and a likelihood function.

In the likelihood function, you need to calculate the probability of observing the query given each class.

In [8]:
def prior(class_dictionary, labels):
  #TODO
  pass

In [9]:
def likelihood(class_dictionary, query):
  #TODO
  pass
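
A minimal sketch of the two ingredients, assuming each feature is Gaussian within a class. The helper names (`gaussian_pdf`, `class_prior`, `class_likelihood`) and the toy numbers are hypothetical; the density is written out by hand here, and it should agree with `scipy.stats.norm.pdf(x, loc=mean, scale=std)`.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated element-wise at x (std must be > 0)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def class_prior(class_rows, n_total):
    """Relative frequency of a class: rows in the class / all rows."""
    return len(class_rows) / n_total

def class_likelihood(class_rows, query):
    """Product over features of the Gaussian density of `query` under this class."""
    mean = class_rows.mean(axis=0)
    std = class_rows.std(axis=0)
    return float(np.prod(gaussian_pdf(query, mean, std)))

# Hypothetical toy class with two rows, out of five rows in the whole dataset.
toy_class = np.array([[1.0, 10.0], [3.0, 14.0]])
p = class_prior(toy_class, n_total=5)
lik = class_likelihood(toy_class, np.array([2.0, 12.0]))
```

Multiplying `p` by `lik` gives the unnormalized posterior score for that class; the class with the largest score is the prediction.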

### Predict


In [10]:
def predict(data, labels, query):
  #TODO
  pass

In [11]:
def NB(data, labels, queries):
  #TODO: call the predict function for all queries
  pass

### Test
To test the model, import a suitable dataset from the sklearn library and check the accuracy of your model. Then import GaussianNB from sklearn and compare your model's results with it.

In [12]:
def train_test_split(data, labels, test_size):
  #TODO return X_train, X_test, y_train, y_test
  pass
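
One common way to split a dataset is to shuffle the row indices and hold out a fraction of them. The sketch below assumes NumPy arrays and a fractional `test_size`; the name `shuffled_split` and the `seed` parameter are hypothetical additions for reproducibility.

```python
import numpy as np

def shuffled_split(data, labels, test_size, seed=0):
    """Shuffle row indices, then hold out a `test_size` fraction for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(round(test_size * len(data)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]

# Hypothetical toy data: 10 rows, where row i is [2i, 2i + 1] and its label is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = shuffled_split(X, y, test_size=0.3)
```

Indexing `data` and `labels` with the same index arrays keeps every row paired with its label after shuffling.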

In [14]:
def accuracy(ground_truth, predictions):
  # call the NB function for your chosen dataset
  # calculate accuracy
  pass

In [15]:
# compare with GaussianNB from sklearn

# Linear Regression (35)

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

LinearRegression from sklearn.linear_model implements ordinary least squares: it fits a linear model with coefficients $w = (w_1, \dots, w_p)$ to minimize the residual sum of squares between the observed targets in the dataset and the targets predicted by the linear approximation.
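
To make the least-squares objective concrete, here is a small NumPy-only sketch on hypothetical toy data that lies exactly on a line. It uses `np.linalg.lstsq` rather than sklearn, purely as an illustration of what minimizing the residual sum of squares means.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the intercept is fitted too.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Residual sum of squares of the fitted line (close to 0 for this noiseless data).
residual_sum_of_squares = float(np.sum((A @ np.array([slope, intercept]) - y) ** 2))
```

With noisy data the residual sum of squares would be positive, and the fitted coefficients are those that make it as small as possible.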

In this section, you will implement a simple linear regression model using sklearn. Only the first feature of the diabetes dataset is required for this part.

In [17]:
from sklearn.metrics import r2_score
from sklearn import datasets, linear_model

In [18]:
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X = ... # Change this so that your model will use only one feature (arbitrary)

Split the data into training and testing sets.

In [19]:
# you can use the train_test_split(data, labels, test_size) function from the previous section

Create the model using sklearn. Then train it using the training set.

In [None]:
model = ... #TODO

Make predictions for the test set.

In [None]:
#TODO

Visualize your predictions and compare them to ground truth using the imported libraries.

In [None]:
#TODO

# Logistic Regression (15 + 50)

This type of statistical model (also known as logit model) is often used for classification and predictive analytics. Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given dataset of independent variables.

$S(h(x)) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n)}} = \frac{1}{1 + e^{-\theta^T x}} \tag{2}$

The sigmoid function is of importance here and is defined as:

$S(x) = \frac{1}{1 + e^{-x}}$






Calculate the sigmoid function and visualize it.

In [21]:
def sigmoid(x):
  #TODO
  pass
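
A direct translation of the sigmoid formula into NumPy might look like the sketch below. The name `logistic` is hypothetical, chosen so that it does not clash with the `sigmoid` stub; for very large negative inputs `np.exp` can overflow with a warning, which this simple version does not guard against.

```python
import numpy as np

def logistic(z):
    """Element-wise 1 / (1 + exp(-z)) for a scalar or NumPy array."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

values = logistic(np.array([-2.0, 0.0, 2.0]))
```

The output is 0.5 at zero, stays strictly between 0 and 1, and satisfies S(-x) = 1 - S(x), which is a quick sanity check for any implementation.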

In [22]:
#TODO: visualize the sigmoid function for an arbitrary range of x. You can use np.linspace

### Dataset
Load the given dataset (same as Naive Bayes). Add a new column at the end of the dataset containing only 1s.

In [23]:
#TODO

### Predictions

Implement the math above to make predictions. Because the sigmoid outputs a probability rather than a class label, use a threshold of 0.5 for classification.

In [24]:
def predict(weights, x):
  #TODO
  pass
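
The thresholding step could be sketched as follows, assuming the last column of `X` is the constant 1 column and that a probability of exactly 0.5 is assigned to class 1 (a tie-breaking assumption). The name `predict_labels` and the weights are hypothetical.

```python
import numpy as np

def predict_labels(weights, X, threshold=0.5):
    """Return 1 where sigmoid(X @ weights) >= threshold, else 0."""
    probabilities = 1.0 / (1.0 + np.exp(-(X @ weights)))
    return (probabilities >= threshold).astype(int)

# Hypothetical weights; the last column of X is the constant 1 (bias) column.
weights = np.array([1.0, -1.0])
X = np.array([[3.0, 1.0], [0.0, 1.0], [1.0, 1.0]])
predicted = predict_labels(weights, X)
```

The linear scores are 2, -1 and 0, so the predicted labels are 1, 0 and 1.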

### Loss Function
Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0.

In binary classification, where the number of classes $M$ equals 2, cross-entropy can be calculated as:

$-(y \log(p) + (1 - y) \log(1 - p))$


In [25]:
def cross_entropy(y_true, y_pred):
  #TODO: calculate cross entropy using the formula above
  pass
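
A possible NumPy version of the binary cross-entropy formula, averaged over samples, is sketched below. Clipping the probabilities with a small `eps` is an added safeguard (an assumption, not part of the formula) so that `log(0)` is never evaluated; the name `mean_log_loss` is hypothetical.

```python
import numpy as np

def mean_log_loss(y_true, p_pred, eps=1e-12):
    """Average of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # keep log() away from 0
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1])
confident = mean_log_loss(y_true, np.array([0.9, 0.1, 0.9]))
poor = mean_log_loss(y_true, np.array([0.012, 0.1, 0.9]))
```

The second call predicts 0.012 for a sample whose label is 1, so `poor` comes out larger than `confident`.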

### Gradient Descent

Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. Gradient descent in machine learning is simply used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible.

$\theta := \theta - \alpha \nabla_\theta H \tag{6}$

In [None]:
def gradient_descent(X, y, weight, num_of_epochs, learning_rate = 0.005):
  #TODO: calculate gradient descent
  pass
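
For logistic regression with the mean cross-entropy loss, the gradient with respect to the weights is $\frac{1}{m} X^T (S(X\theta) - y)$, where $m$ is the number of samples. A minimal batch gradient descent sketch under that assumption follows; the name `fit_logistic` and the toy data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, weights, num_of_epochs, learning_rate=0.005):
    """Batch gradient descent on the mean cross-entropy loss."""
    weights = weights.astype(float).copy()
    for _ in range(num_of_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ weights)))  # predicted probabilities
        gradient = X.T @ (p - y) / len(y)         # gradient of the mean loss
        weights -= learning_rate * gradient       # theta := theta - alpha * grad
    return weights

# Hypothetical 1-D toy data with a trailing bias column of ones.
X = np.array([[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])
w = fit_logistic(X, y, np.zeros(2), num_of_epochs=500, learning_rate=0.1)
```

On this separable toy set the first weight grows positive, and the sign of `X @ w` matches the labels; recording the loss inside the loop would give the per-epoch curve.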

In [26]:
def LR(train_set, labels, test_set, num_of_epochs, learning_rate = 0.005):
  #TODO: simply gather all the functions you already implemented in this section to make valid and complete predictions for a given dataset
  pass

### Test

To test the model, import a suitable dataset from the sklearn library and check the accuracy of your model. Then import LogisticRegression from sklearn and compare your model's results with it.



In [27]:
def accuracy(ground_truth, predictions):
  # call the LR function for your chosen dataset
  # calculate accuracy
  pass

In [None]:
# compare with sklearn.linear_model.LogisticRegression

### Visualization

During training, save the accuracy and loss at each epoch, then plot them using the imported libraries. Explain the pattern. If the result is not satisfactory, change the learning rate, number of epochs, initial weights, etc., and observe their effects on the result.