<div style="text-align:center">
    <img src="../files/monolearn-logo.png" height="150px">
    <h1>ML course</h1>
    <h3>Session 07: Regularization, Ridge, Lasso, Naïve bayes</h3>
    <h4><a href="https://amzenterprise.ir/">Ali Momenzadeh</a></h4>
</div>

### Regularization

<img src = "../files/7/1_3XvSvKfde8u89TMwjkz3kg.png" width=60%>

#### There are three types of regularization. We will cover the first two in this session

* L1 regularization (Lasso)
* L2 regularization (Ridge)
* Dropout regularization (used in neural networks; Elastic Net, by contrast, combines the L1 and L2 penalties)

A regression model that uses the L1 regularization technique is called Lasso Regression, and a model that uses L2 is called Ridge Regression. The key difference between the two is the penalty term.

<img src = "../files/7/1_SBUK2QEfCP-zvJmKm14wGQ.png">
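
To make the difference between the two penalty terms concrete, here is a minimal sketch that computes both for the same weight vector. The weights `w` and the strength `lam` are made-up illustrative values, not taken from the course data.

```python
import numpy as np

# Hypothetical weight vector and regularization strength (illustrative values only)
w = np.array([3.0, -4.0, 0.0])
lam = 0.1

# L1 (Lasso) penalty: lambda times the sum of absolute weights
l1_penalty = lam * np.sum(np.abs(w))

# L2 (Ridge) penalty: lambda times the sum of squared weights
l2_penalty = lam * np.sum(w ** 2)

print("L1 penalty:", l1_penalty)
print("L2 penalty:", l2_penalty)
```

Because the L2 penalty squares each weight, large weights are punished much more heavily than small ones, whereas the L1 penalty grows linearly with each weight.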

#### Lasso Regression

Least absolute shrinkage and selection operator regression (usually just called lasso regression) is another regularized version of linear regression: just like ridge regression, it adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.

<img src = "../files/7/1_q0ekc8YEoIoFI7EfM2ZdfA.png">

A model that uses L1 regularization is called Lasso regression. It adds the absolute value of each coefficient's magnitude as a penalty term to the loss function.

The value of lambda must be balanced. A very small value leads back to OLS (ordinary least squares), while a very large value drives all the coefficients to zero, so the model under-fits.
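
The two extremes can be checked on a small synthetic problem. This is only a sketch: the data, the true coefficients, and the two `alpha` values are assumptions chosen for illustration (scikit-learn calls lambda `alpha`).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (hypothetical): three features with known true coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X_demo, y_demo)
tiny = Lasso(alpha=1e-4).fit(X_demo, y_demo)   # almost no penalty
huge = Lasso(alpha=100.0).fit(X_demo, y_demo)  # very strong penalty

print("OLS coefficients:        ", ols.coef_)
print("Lasso, alpha=1e-4:       ", tiny.coef_)
print("Lasso, alpha=100:        ", huge.coef_)
```

With the tiny `alpha` the Lasso coefficients are practically the OLS ones, and with the huge `alpha` every coefficient is pushed to zero, leaving only the intercept.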

#### Ridge Regression

Ridge regression is a regularized version of linear regression: a regularization term is added to the cost function. This forces the training algorithm not only to fit the data but also to keep the model weights as small as possible.
Note that the regularization term should only be added to the cost function during training. After you train the model, you want to use the unregularized performance measure to evaluate the model.

<img src = "../files/7/1_LJXbFDr8xHq72UOdkGrTDA.png">

A model that uses L2 regularization is called Ridge regression. It is one of the more widely used techniques. It adds the “squared magnitude” of the coefficients as the penalty to the loss function. As with L1 regularization, the value of lambda must be chosen carefully: a small value leads back to OLS and a large value leads to under-fitting.
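
Ridge regression has a closed-form solution, which makes the shrinkage easy to verify by hand. The sketch below assumes scikit-learn's convention, where `Ridge` minimizes the squared error plus `alpha` times the squared ℓ2 norm of the weights (no factor of one half), and it drops the intercept to keep the formula short. The data and `lam` are made up.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (hypothetical): four features, no intercept
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([2.0, 0.5, -1.0, 0.0]) + rng.normal(scale=0.1, size=100)
lam = 5.0

# Closed form: w = (X^T X + lambda * I)^(-1) X^T y
w_closed = np.linalg.solve(X_demo.T @ X_demo + lam * np.eye(4), X_demo.T @ y_demo)

# Same problem solved by scikit-learn (intercept disabled to match the formula)
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X_demo, y_demo).coef_

# Plain least squares for comparison
w_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

print("closed form:", w_closed)
print("sklearn:    ", w_sklearn)
print("norm ridge vs OLS:", np.linalg.norm(w_closed), np.linalg.norm(w_ols))
```

The ridge weight vector has a smaller norm than the OLS one, which is the "keep the weights small" effect described above.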

<img src = "../files/7/Ridge_Regression_print.png" width=50%>

##### One key point about L1 regularization (Lasso regression) is that it can shrink the coefficients of less important features exactly to zero, which makes it very useful for feature selection.
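
The feature-selection effect can be seen by fitting Lasso and Ridge on the same data and comparing coefficients. This is a sketch on synthetic data in which, by construction, only the first two of five features matter; the data and `alpha=0.5` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (hypothetical): only features 0 and 1 influence the target
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=0.5).fit(X_demo, y_demo)

print("Lasso coefficients:", lasso_demo.coef_)
print("Ridge coefficients:", ridge_demo.coef_)

# Indices of the features Lasso kept (non-zero coefficients)
selected = np.flatnonzero(lasso_demo.coef_)
print("Features kept by Lasso:", selected)
```

Lasso sets the three noise features to exactly zero, while Ridge only makes their coefficients small.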

#### Import Libraries

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

import warnings
warnings.filterwarnings('ignore')

#### Load and prepare data

In [None]:
data = pd.read_csv("advertising.csv")
data.head()

#### EDA

In [None]:
def scatter_plot(feature, target):
    plt.scatter(data[feature],
                data[target],
                c='black'
                )
    plt.xlabel("Money Spent on {} ads ($)".format(feature))
    plt.ylabel("Sales ($k)")
    plt.show()
scatter_plot("TV", "Sales")
scatter_plot("Radio", "Sales")
scatter_plot("Newspaper", "Sales")

#### Train and test

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

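# Features: every column except the target; target: Sales
xs = data.drop(["Sales"], axis=1)
y = data["Sales"].values.reshape(-1,1)
linreg = LinearRegression()
# 5-fold cross-validated R² of plain linear regression, used as a baseline
r2score = cross_val_score(linreg, xs, y, scoring="r2", cv=5)

mean_r2score = np.mean(r2score)
print(mean_r2score)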

### Ridge Regression

For the ridge regression algorithm, I will use the GridSearchCV class provided by Scikit-learn, which automatically performs 5-fold cross-validation to find the best value of alpha among the candidates we supply.

This is what the code looks like for the Ridge Regression algorithm:

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

ridge = Ridge()
parameters = {"alpha":[1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
ridge_regression = GridSearchCV(ridge, parameters, scoring='r2', cv=5)
ridge_regression.fit(xs, y)

print(ridge_regression.best_params_)
print(ridge_regression.best_score_)
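
Besides the single best value, it can help to look at the score for every candidate alpha. The sketch below is self-contained and uses synthetic data from `make_regression` rather than the advertising data; the shorter `alphas` list is an assumption. The same `cv_results_` attribute is available on any fitted `GridSearchCV` object.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem (hypothetical stand-in for the real data)
X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

alphas = [1e-3, 1e-2, 1, 5, 10, 20]
search = GridSearchCV(Ridge(), {"alpha": alphas}, scoring="r2", cv=5)
search.fit(X_demo, y_demo)

# One row per candidate alpha, with the mean and spread of the 5 fold scores
results = pd.DataFrame(search.cv_results_)[
    ["param_alpha", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))
```

If the mean scores barely differ across candidates, the choice of alpha matters little for that dataset.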

### Lasso Regression

For Lasso Regression we follow the same process as for Ridge Regression. This is what the code looks like:

In [None]:
from sklearn.linear_model import Lasso

lasso = Lasso()
parameters = {"alpha":[1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
lasso_regression = GridSearchCV(lasso, parameters, scoring='r2', cv=5)
lasso_regression.fit(xs, y)

print(lasso_regression.best_params_)
print(lasso_regression.best_score_)

<hr/>

### Naive Bayes

<img src = "../files/7/naive-bayes-classifier.jpg" width=50%>

https://www.kdnuggets.com/2020/06/naive-bayes-algorithm-everything.html

Naive Bayes is a **classification** algorithm based on Bayes' theorem. Before explaining Naive Bayes, we should first discuss Bayes' theorem, which is used to find the probability of a hypothesis given some evidence.

<img src = "../files/7/34725nv1.png" width=50%>

Using Bayes' theorem, we can find the probability of A given that B has occurred.

> A is the hypothesis and B is the evidence.

> P(B|A) is the probability of B given that A is True.

> P(A) and P(B) are the probabilities of A and B on their own, without any knowledge of the other.

<img src = "../files/7/30337nv.png" width=50%>
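
A small numeric example may help fix the roles of the terms. All three input probabilities below are made-up values for illustration: A stands for "the person drives" and B for "the person earns a high salary".

```python
# Hypothetical probabilities (illustrative only)
p_a = 0.3              # P(A): prior probability that a person drives
p_b_given_a = 0.8      # P(B|A): probability of a high salary among drivers
p_b_given_not_a = 0.1  # P(B|not A): probability of a high salary among non-drivers

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print("P(B)   =", p_b)
print("P(A|B) =", p_a_given_b)
```

Observing the evidence B raises the probability of A well above its prior of 0.3, because B is far more likely under A than under its complement.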

#### The concept behind the algorithm

Let’s understand the concept behind Naive Bayes through an example. We take a dataset of employees in a company; our aim is to create a model that predicts whether a person goes to the office by driving or by walking, using the person's salary and age.

<img src = "../files/7/75704nv3.png" width=50%>

* Note that we have taken age on the X-axis and salary on the Y-axis.

In the plot above, we can see 30 data points: the red points belong to those who walk and the green points to those who drive.

<img src = "../files/7/60483nv4.png" width=50%>

Now let’s add a new data point into it. Our aim is to find the category that the new point belongs to.

<img src = "../files/7/67866nv5.png" width=50%>
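
The same idea can be sketched in code with scikit-learn's `GaussianNB`. The ten (age, salary) points below are invented for illustration and are not the 30 points from the figures; salary is given in thousands.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical (age, salary in $k) points, not the data from the figures
walkers = np.array([[22, 25], [25, 30], [28, 28], [30, 35], [24, 27]])
drivers = np.array([[45, 80], [50, 95], [55, 90], [48, 85], [52, 100]])
X_demo = np.vstack([walkers, drivers])
y_demo = np.array(["walks"] * 5 + ["drives"] * 5)

model = GaussianNB().fit(X_demo, y_demo)

# A new employee whose category we want to estimate
new_point = np.array([[27, 32]])
proba = model.predict_proba(new_point)
prediction = model.predict(new_point)

print("classes:      ", model.classes_)
print("probabilities:", proba)
print("prediction:   ", prediction)
```

The classifier computes a posterior probability for each class and assigns the new point to the class with the larger one.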

#### Import libraries

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

import warnings
warnings.filterwarnings('ignore')

#### Load and prepare data

<img src = "../files/7/facebook_ads_instagram_ads.jpg" width=50%>

We are using the Social Network Ads dataset. It contains details of users of a social networking site and is used to predict whether a user buys a product after clicking an ad on the site, based on their salary, age, and gender.

In [None]:
df = pd.read_csv("Social_Network_Ads.csv")

#### EDA

In [None]:
df.head()

In [None]:
df.info()

In [None]:
pd.crosstab(df.Gender,df.Purchased)

In [None]:
df["Gender"].value_counts()

In [None]:
df.describe()

In [None]:
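# Feature columns at positions 1-3; .copy() lets us modify X later without touching df
X = df.iloc[:, [1, 2, 3]].copy()
# Target: the last column
y = df.iloc[:, -1]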

In [None]:
X

In [None]:
y

#### Encoding the independent variable (LabelEncoder) 

In [None]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
X['Gender'] = le.fit_transform(X['Gender'])
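
To see what the encoder actually does, here is a minimal sketch on a tiny hand-made series. The values "Male" and "Female" are an assumption about what the Gender column contains.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical gender values (assumed to match the dataset's labels)
gender = pd.Series(["Male", "Female", "Female", "Male"])

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

# classes_ is sorted, so each label's code is its position in this array
mapping = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))

print("codes:  ", codes)
print("mapping:", mapping)
print("decoded:", encoder.inverse_transform(codes))
```

`LabelEncoder` assigns integer codes in sorted label order, and `inverse_transform` recovers the original strings when they are needed for reporting.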

In [None]:
X

#### Split into train and test sets

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

#### Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
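
The scaler is fitted on the training set only and then reused on the test set, so no information from the test data leaks into preprocessing. The sketch below illustrates this on random numbers that merely resemble age and salary columns; the values are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical age-like and salary-like columns
rng = np.random.default_rng(0)
train = rng.normal(loc=[40, 70000], scale=[10, 30000], size=(80, 2))
test = rng.normal(loc=[40, 70000], scale=[10, 30000], size=(20, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the train statistics

# transform() is just (x - train mean) / train std
manual = (test - scaler.mean_) / scaler.scale_

print("train mean after scaling:", train_scaled.mean(axis=0))
print("train std after scaling: ", train_scaled.std(axis=0))
print("test mean after scaling: ", test_scaled.mean(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1, while the scaled test columns are only approximately standardized, which is expected.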

In [None]:
X_train[0]

#### Train and test

In [None]:
from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

In [None]:
y_pred

#### Evaluation

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)
print(cm)

In [None]:
sns.heatmap(cm, annot=True)

In [None]:
ac = accuracy_score(y_test,y_pred)
ac
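
The accuracy can also be read directly off the confusion matrix, which shows how the two numbers relate. This sketch uses ten invented labels rather than the notebook's `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (illustrative only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

cm_demo = confusion_matrix(y_true, y_hat)

# For binary labels, rows are true classes and columns are predicted classes
tn, fp, fn, tp = cm_demo.ravel()

# Accuracy is the share of predictions on the diagonal
accuracy = (tn + tp) / cm_demo.sum()

print(cm_demo)
print("accuracy from matrix:", accuracy)
print("accuracy_score:      ", accuracy_score(y_true, y_hat))
```

The off-diagonal cells separate the two kinds of error (false positives and false negatives), which the single accuracy figure hides.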