# **Machine Learning Algorithms** - Autism Spectrum Disorder


## What is Machine Learning?

Machine Learning is a branch of AI that uses algorithms to improve performance at analyzing data and making intelligent decisions automatically, based on previous experience (data). It derives rules of behavior by examining and comparing large datasets to find common patterns.

# Types of Machine Learning:

* Supervised Learning
* Unsupervised Learning
* Reinforcement Learning

# Types of ML Algorithms:

* **Regression** - The output we want to predict is a continuous variable. Eg: a student's score in a subject.
* **Classification** - The output we want to predict is a categorical variable. Eg: classifying emails as spam or ham.
* **Clustering** - The output is a set of groups (clusters); no predefined labels are given. Eg: customer segmentation.



Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables.

### Variables

* **Continuous Variable** - A continuous variable can take any value within a range, including every real value between any two of its values.

* **Categorical Variable** - A categorical variable takes one of two or more categories (values). Each individual is assigned to exactly one category, which places it in a group. Eg: yes/no, healthy/unhealthy.


* **Dependent Variable** - A variable that depends on other factors or variables and changes according to them. In ML, it is the data we want to predict (the target).

* **Independent Variable** - A variable that does not depend on the other factors and is not changed by the other variables we are trying to measure. In ML, it is the data given as input (the features).

* ***Easy Way to Remember:***
    
    * The **Independent Variable** causes (or explains) a change in the **Dependent Variable**, but the **Dependent Variable** does not cause a change in the **Independent Variable**.



### Supervised Learning

* As the name suggests, there is a supervisor. In Supervised Learning we train (or build) models on a well-labelled dataset of input-output pairs. The labels say what each data point represents.

* Both Regression and Classification algorithms come under Supervised Learning.

### Unsupervised Learning

* As the name suggests, there is no supervisor. In Unsupervised Learning we train (or build) models on unlabelled data. There are no labels or correct outputs, so the task is to find structure in the data (i.e. to discover groups that play the role of class labels).

* Clustering comes under Unsupervised Learning.

### Reinforcement Learning

* Reinforcement Learning uses a reward function that rewards good actions and penalizes bad ones. It is about making decisions sequentially. The output depends on the current input (state), and the next input depends on the previous output.


# Steps Involved In Machine Learning:

* STEP 0 - GET THE REQUIRED DATASET
* STEP 1 - DATA PREPROCESSING
* STEP 2 - BUILD THE MODEL
* STEP 3 - TRAIN THE MODEL
* STEP 4 - TEST THE MODEL
* STEP 5 - DEPLOY THE MODEL

## STEP 0 - GET THE REQUIRED DATASET

We obtained the dataset from the Kaggle website:
Autism Screening data for Toddlers. (Autism data for infants’ classification). July 2018.
https://www.kaggle.com/fabdelja/autism-screening-for-toddlers/version/1

## STEP 1 - DATA PREPROCESSING

What is Data Preprocessing?

Data Preprocessing is a data mining technique used to convert raw data into a clean, understandable format suitable for modelling. It helps us handle missing values, remove noisy data and outliers, and resolve inconsistencies in the data.

### Steps Involved In Data Preprocessing:

* Import the Libraries
* Import the Dataset
* Checking for Missing Values
* Encoding the Categorical Variables
* Splitting the dataset into Training Set and Test Set
* Feature Scaling

In [1]:
# Importing the Libraries
import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd

In [2]:
# Import the Dataset
dataset = pd.read_csv("data.csv")

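# Separate the independent variables (features) from the dependent variable (target).
# Columns 1-17: A1-A10, Age_Mons, Qchat-10-Score and the five categorical columns (Case_No is dropped)
x = dataset.iloc[:, 1:18].values
# Last column: Class/ASD Traits
y = dataset.iloc[:, -1].values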

In [3]:
dataset.head()

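| | Case_No | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | Age_Mons | Qchat-10-Score | Sex | Ethnicity | Jaundice | Family_mem_with_ASD | Who completed the test | Class/ASD Traits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 28 | 3 | f | middle eastern | yes | no | family member | No |
| 1 | 2 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 36 | 4 | m | White European | yes | no | family member | Yes |
| 2 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 36 | 4 | m | middle eastern | yes | no | family member | Yes |
| 3 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 24 | 10 | m | Hispanic | no | no | family member | Yes |
| 4 | 5 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 20 | 9 | f | White European | no | yes | family member | Yes |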

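The five rows above suggest how the target was constructed. In each row, Qchat-10-Score equals the sum of A1 to A10, and Class/ASD Traits is "Yes" exactly when that score is above 3. The sketch below checks this on those five rows only. The threshold of 3 is inferred from these rows, not taken from the dataset's documentation. If the rule holds for the whole file, the features kept in `x` (which include A1 to A10 and Qchat-10-Score) determine the label exactly. That would explain the perfect test scores several models reach later in this notebook.

```python
# The five rows printed by dataset.head(): answers A1..A10, Qchat-10-Score, label
rows = [
    ([0, 0, 0, 0, 0, 0, 1, 1, 0, 1], 3, "No"),
    ([1, 1, 0, 0, 0, 1, 1, 0, 0, 0], 4, "Yes"),
    ([1, 0, 0, 0, 0, 0, 1, 1, 0, 1], 4, "Yes"),
    ([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 10, "Yes"),
    ([1, 1, 0, 1, 1, 1, 1, 1, 1, 1], 9, "Yes"),
]

# Inferred rule (assumption): score = sum of the answers, label = "Yes" if score > 3
score_matches = [sum(answers) == score for answers, score, _ in rows]
label_matches = [("Yes" if score > 3 else "No") == label for _, score, label in rows]

print(all(score_matches), all(label_matches))  # True True
```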

In [4]:
print("These are the independent variables\n")
print(x)

print("\n\nThis is the dependent variable\n")
print(y)

These are the independent variables

[[0 0 0 ... 'yes' 'no' 'family member']
 [1 1 0 ... 'yes' 'no' 'family member']
 [1 0 0 ... 'yes' 'no' 'family member']
 ...
 [1 0 1 ... 'yes' 'no' 'family member']
 [1 0 0 ... 'no' 'yes' 'family member']
 [1 1 0 ... 'yes' 'yes' 'family member']]


This is the dependent variable

['No' 'Yes' 'Yes' ... 'Yes' 'No' 'Yes']


In [5]:
# Fill any missing values in Age_Mons and Qchat-10-Score (columns 10 and 11 of x) with the column mean
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = "mean")
imputer.fit(x[:, 10:12])
x[:, 10:12] = imputer.transform(x[:, 10:12])
print(x)

[[0 0 0 ... 'yes' 'no' 'family member']
 [1 1 0 ... 'yes' 'no' 'family member']
 [1 0 0 ... 'yes' 'no' 'family member']
 ...
 [1 0 1 ... 'yes' 'no' 'family member']
 [1 0 0 ... 'no' 'yes' 'family member']
 [1 1 0 ... 'yes' 'yes' 'family member']]

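`SimpleImputer(strategy="mean")` replaces each missing entry with the mean of the observed values in its column, computed during `fit`. Below is a minimal pure-Python sketch of the same idea. It uses `None` as the missing-value marker (an assumption made for the example; the notebook uses `np.nan`), and the ages are made up.

```python
def mean_impute(column):
    """Replace None entries with the mean of the non-missing values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

ages = [28, None, 36, 24, None, 20]  # hypothetical Age_Mons values
print(mean_impute(ages))  # [28, 27.0, 36, 24, 27.0, 20]
```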

In [6]:
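# Encoding the categorical variables: one-hot encode Sex, Ethnicity, Jaundice, Family_mem_with_ASD
# and Who completed the test (columns 12-16); encoded columns come first, then the passthrough columns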
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
columntrans = ColumnTransformer(transformers = [("encoder", OneHotEncoder(), [12,13,14,15,16])], remainder="passthrough")
x = np.array(columntrans.fit_transform(x))
print(x)

[[1.0 0.0 0.0 ... 1 28.0 3.0]
 [0.0 1.0 0.0 ... 0 36.0 4.0]
 [0.0 1.0 0.0 ... 1 36.0 4.0]
 ...
 [0.0 1.0 0.0 ... 1 18.0 9.0]
 [0.0 1.0 0.0 ... 1 19.0 3.0]
 [0.0 1.0 0.0 ... 0 24.0 6.0]]

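`OneHotEncoder` turns each categorical column into one 0/1 column per distinct category. `ColumnTransformer` places these encoded columns before the untouched (`remainder="passthrough"`) columns. That is why the printed rows start with 0.0/1.0 values and end with A10, Age_Mons and Qchat-10-Score. Here is a small sketch of the encoding step for a single column. It orders categories alphabetically, which matches how `OneHotEncoder` sorts the unique values by default.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator vectors."""
    categories = sorted(set(values))  # one output column per distinct category
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded

cats, rows = one_hot(["f", "m", "m", "f"])  # e.g. the Sex column
print(cats)  # ['f', 'm']
print(rows)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```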

In [7]:
# Encode the target: LabelEncoder sorts the classes, so 'No' -> 0 and 'Yes' -> 1
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
print(y)

[0 1 1 ... 1 0 1]


In [8]:
# Splitting dataset into Training set and Test Set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.35, random_state = 1)

In [9]:
# Feature Scaling: fit the scaler on the training set only, then apply the same scaling to the test set
from sklearn.preprocessing import MinMaxScaler
ss = MinMaxScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)
print(x_train)
print()
print(x_test)

[[0.         1.         0.         ... 1.         0.875      0.4       ]
 [0.         1.         0.         ... 0.         0.125      0.7       ]
 [0.         1.         0.         ... 1.         0.66666667 1.        ]
 ...
 [0.         1.         0.         ... 1.         0.125      0.1       ]
 [0.         1.         0.         ... 1.         0.58333333 0.6       ]
 [1.         0.         0.         ... 1.         0.         0.2       ]]

[[0.         1.         0.         ... 1.         1.         0.9       ]
 [0.         1.         0.         ... 1.         0.25       1.        ]
 [0.         1.         0.         ... 1.         0.75       0.2       ]
 ...
 [1.         0.         0.         ... 1.         0.66666667 1.        ]
 [0.         1.         0.         ... 1.         1.         0.4       ]
 [0.         1.         0.         ... 1.         0.41666667 0.9       ]]

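`MinMaxScaler` maps each feature to x' = (x - min) / (max - min). Here min and max come from the training set only (`fit_transform` on `x_train`, then `transform` on `x_test`), so test values outside the training range land outside [0, 1]. A pure-Python sketch for one feature, with made-up numbers. The guard for a constant column is a simplification of the library's handling.

```python
def fit_min_max(train):
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant column
    return lambda values: [(v - lo) / span for v in values]

scale = fit_min_max([12, 18, 36])  # hypothetical training Age_Mons values
print(scale([12, 18, 36]))  # [0.0, 0.25, 1.0]
print(scale([30, 40]))      # [0.75, 1.1666666666666667]
```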

## STEP 2 - BUILD THE MODEL

With the help of ML algorithms we can build our own model to predict the outcome from the data. There are many algorithms, and each falls under one of the types of Machine Learning described above. The supervised techniques used in this notebook are listed below.

### Supervised Learning Techniques

* Regression
    * Simple Linear Regression
    * Multiple Linear Regression
    * Polynomial Linear Regression
    * Support Vector Regression
    * Decision Tree Regression
    * Random Forest Regression
    * Ridge Regression
    * Lasso Regression
    * ElasticNet Regression

* Classification
    * Logistic Regression
    * K-Nearest Neighbors (KNN)
    * Support Vector Machine (SVM)
    * Kernel SVM
    * Naive Bayes
    * Decision Tree Classification
    * Random Forest Classification


## STEP 3 - TRAIN THE MODEL

Select a model from the list above, build it, and then train it on the training set that we derived from the dataset.

### Supervised Learning - Regression

* **Simple Linear Regression** - It models the relationship between the dependent variable and a single independent variable as a straight line, approximating that relationship as closely as possible.

* **Multiple Linear Regression** - It models the relationship between two or more independent input variables and a dependent (response) variable. Adding informative variables can improve the explanation and the predictions, although irrelevant variables can lead to overfitting.

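For a single independent variable, the least-squares line y = b0 + b1 * x has a closed form: b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x). `LinearRegression` solves the multi-feature generalisation of this problem. A minimal sketch with made-up data:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

b0, b1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```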
In [10]:
# we have done multiple linear regression
from sklearn.linear_model import LinearRegression
LR = LinearRegression()
LR.fit(x_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [11]:
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from math import sqrt
y_pred = LR.predict(x_test)
R2 = r2_score(y_test, y_pred)
# R^2 is shown as a percentage; the error metrics are multiplied by 100 for display
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): 60.81153733491011
MEAN SQUARED ERROR (x100): 7.742082135787179
MEAN ABSOLUTE ERROR (x100): 23.161204268292682
ROOT MEAN SQUARED ERROR (x100): 27.82459727612815

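The four regression metrics can be written out directly:

- MSE and MAE average the squared and the absolute residuals.
- RMSE is the square root of MSE.
- R^2 = 1 - SS_res / SS_tot compares the residuals with those of a model that always predicts the mean of the true values.

The notebook's cells multiply these values by 100 for display. A pure-Python sketch with made-up values:

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, MAE, RMSE) for two equal-length lists."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return r2, mse, mae, sqrt(mse)

print(regression_metrics([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2]))
```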

* **Polynomial Linear Regression** - The relationship between the independent variable and the dependent variable is modelled as an nth-degree polynomial. This lets the model fit a non-linear relationship, while the model itself stays linear in its coefficients.

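Polynomial regression is still a linear regression, fitted on expanded inputs. The sketch below generates every term up to a given degree for any number of features. This is the same set of terms `PolynomialFeatures` produces, though the column ordering shown here is an assumption.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_terms(features, degree):
    """Return the bias term 1 plus every product of up to `degree` features."""
    terms = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(features)), d):
            terms.append(prod(features[i] for i in combo))  # empty product is 1
    return terms

print(polynomial_terms([2, 3], degree=2))  # [1, 2, 3, 4, 6, 9]
```

With many input features the number of terms grows quickly (three features at degree 2 already give 10 terms).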
In [12]:
# Polynomial regression expands the features into polynomial terms (squares and
# pairwise products) and fits an ordinary linear regression on the expanded features.
# PolynomialFeatures accepts many independent variables, but the number of terms grows
# quickly with the number of features, so here it is only an outline and is not evaluated.
from sklearn.preprocessing import PolynomialFeatures
PR = PolynomialFeatures(degree = 2)
x_poly_train = PR.fit_transform(x_train)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(x_poly_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

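* **Support Vector Regression** - SVR tries to fit the data so that the errors stay within a certain threshold (epsilon). Points inside this boundary, a tube around the fitted function, contribute no error. Among candidate functions, SVR prefers the flattest one, and it penalises only the points that fall outside the boundary, in proportion to how far outside they lie.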

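The threshold described above is the epsilon-insensitive loss. Residuals smaller than epsilon cost nothing; scikit-learn's `SVR` uses `epsilon=0.1` by default, as the printed model shows. Larger residuals are penalised linearly by how far they leave the tube. This sketch covers the loss alone, not the full SVR optimisation, which also minimises the norm of the weights and uses the kernel.

```python
def epsilon_insensitive(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: max(0, |y - f(x)| - epsilon)."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

# First residual is inside the tube (no loss); the other two are outside it
print(epsilon_insensitive([1.0, 1.0, 0.0], [0.95, 0.7, 0.5]))
```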
In [13]:
from sklearn.svm import SVR
svr = SVR(kernel = 'rbf')  # lower-case name so the SVR class itself is not overwritten
svr.fit(x_train, y_train)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [14]:
y_pred = svr.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): 80.50570219706609
MEAN SQUARED ERROR (x100): 3.851298175681159
MEAN ABSOLUTE ERROR (x100): 14.524946110755163
ROOT MEAN SQUARED ERROR (x100): 19.624724649485298


* **Decision Tree Algorithm** - A decision-making tool that uses a flowchart-like tree structure. It focuses on the decisions, their outcomes and the possible results. It can be used for both regression and classification.

* **Decision Tree Regression** - It splits the data on feature values to build a tree, and each leaf predicts a continuous output: the mean target value of the training samples that reach that leaf.

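At each node, a regression tree picks the split that most reduces the squared error, which is the `'mse'` criterion shown in the printed model. Each leaf then predicts the mean target of the training rows that reach it. A one-split, one-feature sketch with made-up data:

```python
def best_split(xs, ys):
    """Find the threshold on one feature that minimises total squared error."""
    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best  # (error, threshold, left-leaf prediction, right-leaf prediction)

print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))  # (0.0, 6.5, 1.0, 5.0)
```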
In [15]:
from sklearn.tree import DecisionTreeRegressor 
DTR = DecisionTreeRegressor(random_state = 0)
DTR.fit(x_train, y_train)

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=0, splitter='best')

In [16]:
y_pred = DTR.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): 100.0
MEAN SQUARED ERROR (x100): 0.0
MEAN ABSOLUTE ERROR (x100): 0.0
ROOT MEAN SQUARED ERROR (x100): 0.0


* **Random Forest Algorithm** - An ensemble technique that performs both classification and regression tasks using multiple decision trees. The main idea is to combine many decision trees to predict the final output rather than depending on a single decision tree.

* **Ensemble Learning** - A technique that combines multiple machine learning models to obtain more accurate predictions than any of the individual models.

* **Random Forest Regression** - It constructs many decision trees at training time and outputs the mean of the individual trees' predictions.

In [17]:
from sklearn.ensemble import RandomForestRegressor
RFR = RandomForestRegressor(n_estimators=100, random_state = 0)
RFR.fit(x_train, y_train)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=0, verbose=0, warm_start=False)

In [18]:
y_pred = RFR.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): 100.0
MEAN SQUARED ERROR (x100): 0.0
MEAN ABSOLUTE ERROR (x100): 0.0
ROOT MEAN SQUARED ERROR (x100): 0.0


* When a dataset has many features, we use regularization techniques to build a less complex (parsimonious) model and to reduce overfitting. The two common ones are **L1 and L2 regularization**; the key difference between them is the penalty term.

* **Ridge Regression** - A regression model that uses L2 regularization is called Ridge Regression. It adds the squared magnitude of the coefficients as a penalty term to the loss function.

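Consider one centred feature and no intercept. Minimising ||y - w x||^2 + alpha * w^2, the form of scikit-learn's `Ridge` objective, gives w = sum(x * y) / (sum(x^2) + alpha). A larger alpha pulls the coefficient towards zero without making it exactly zero. A sketch with made-up, already-centred data:

```python
def ridge_1d(xs, ys, alpha):
    """Closed-form ridge coefficient for one centred feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [-2, -1, 0, 1, 2]
ys = [-4, -2, 0, 2, 4]  # true slope 2
for alpha in (0.0, 1.0, 10.0):
    print(alpha, ridge_1d(xs, ys, alpha))  # the coefficient shrinks as alpha grows
```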
In [19]:
from sklearn.linear_model import Ridge
RR = Ridge(alpha = 1.0)
RR.fit(x_train, y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [20]:
y_pred = RR.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): 61.09284503081555
MEAN SQUARED ERROR (x100): 7.686506919536883
MEAN ABSOLUTE ERROR (x100): 23.061449780306397
ROOT MEAN SQUARED ERROR (x100): 27.724550347186668


* **Lasso Regression** - A regression model that uses L1 regularization is called Lasso Regression (Least Absolute Shrinkage and Selection Operator). It adds the absolute value of the magnitude of the coefficients as a penalty term to the loss function, which can shrink some coefficients exactly to zero.

* **For More Information** on regularization and its techniques:
* https://towardsdatascience.com/over-fitting-and-regularization-64d16100f45c 
* https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c

In [21]:
from sklearn.linear_model import Lasso
LAR = Lasso(alpha = 1.0)
LAR.fit(x_train, y_train)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [22]:
y_pred = LAR.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): -1.7574754860551822
MEAN SQUARED ERROR (x100): 20.10323139940867
MEAN ABSOLUTE ERROR (x100): 42.210749114790424
ROOT MEAN SQUARED ERROR (x100): 44.83662721415235


* **ElasticNet Regression** - A regression model that uses both L1 and L2 regularization is called ElasticNet Regression. It linearly combines the L1 and L2 penalties of the Lasso and Ridge models.

In [23]:
from sklearn.linear_model import ElasticNet
ENR = ElasticNet(alpha=1.0, l1_ratio = 0.5)
ENR.fit(x_train, y_train)

ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, l1_ratio=0.5,
           max_iter=1000, normalize=False, positive=False, precompute=False,
           random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [24]:
y_pred = ENR.predict(x_test)
R2 = r2_score(y_test, y_pred)
print("R^2 (%): "+str(R2*100))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR (x100): "+str(MSE*100))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR (x100): "+str(MAE*100))
RMSE = sqrt(MSE)
print("ROOT MEAN SQUARED ERROR (x100): "+str(RMSE*100))

R^2 (%): -1.7574754860551822
MEAN SQUARED ERROR (x100): 20.10323139940867
MEAN ABSOLUTE ERROR (x100): 42.210749114790424
ROOT MEAN SQUARED ERROR (x100): 44.83662721415235

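The Lasso and ElasticNet results above are identical. This points to both models having shrunk every coefficient to zero, so each predicts a constant (the training mean of `y`).

- For scikit-learn's Lasso objective, (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1, with an intercept, all coefficients are zero once alpha >= max_j |sum_i (x_ij - mean_j) * (y_i - mean_y)| / n.
- For ElasticNet, the same bound applies to alpha * l1_ratio.
- With features scaled to [0, 1] and a 0/1 target, this bound is at most 0.25. Both `alpha=1.0` and `alpha * l1_ratio = 0.5` are above it.

A pure-Python sketch of the bound, with made-up data:

```python
def lasso_alpha_max(columns, y):
    """Smallest alpha at which Lasso (with intercept) sets every coefficient to 0.

    `columns` is a list of feature columns; each is a list of numbers.
    """
    n = len(y)
    y_mean = sum(y) / n
    best = 0.0
    for col in columns:
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n
        best = max(best, abs(cov))
    return best

X_cols = [[0.0, 0.5, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0]]  # two features scaled to [0, 1]
y = [0, 0, 1, 1]
print(lasso_alpha_max(X_cols, y))  # 0.1875
```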

* The **J48 algorithm** (Weka's implementation of C4.5) is similar to the CART algorithm used by scikit-learn's standard Decision Tree Classifier. Both build trees by recursive splitting, although C4.5 chooses splits by gain ratio.

### Supervised Learning - Classification

* **Logistic Regression** - It is used when the dependent variable is categorical. It predicts the probability of the target class by applying the sigmoid function to a linear combination of the features. In its basic form the target variable is dichotomous, which means it has two possible classes.

* **Types of Logistic Regression**
* **Binary Logistic Regression** - The categorical response has only two classes or outputs. Eg. spam or not.
* **Multinomial Logistic Regression** - The categorical response has three or more classes or outputs without ordering. Eg. Vegan, Non-Veg, Veg.
* **Ordinal Logistic Regression** - The categorical response has three or more classes or outputs with ordering. Eg. Movie Rating 1 to 5.

* For More: https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc

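Logistic regression computes a weighted sum z = w . x + b and passes it through the sigmoid, sigmoid(z) = 1 / (1 + e^(-z)), to get the probability of the positive class. The predicted class is 1 when that probability is above 0.5, i.e. when z > 0. A sketch with hypothetical weights, not the fitted model's:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def predict_proba(x, weights, bias):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

weights, bias = [2.0, -1.0], -0.5              # hypothetical parameters
p = predict_proba([1.0, 0.5], weights, bias)   # z = 2.0 - 0.5 - 0.5 = 1.0
print(round(p, 4), 1 if p > 0.5 else 0)        # 0.7311 1
```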
In [25]:
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(x_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [26]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from math import sqrt
y_pred = LR.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred)
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[100   0]
 [  0 269]]
ACCURACY: 1.0
PRECISION: 1.0
RECALL: 1.0
F1: 1.0
MEAN SQUARED ERROR: 0.0
MEAN ABSOLUTE ERROR: 0.0
ROOT MEAN SQUARED ERROR: 0.0


* **K-Nearest Neighbors (KNN)** - This algorithm finds the training data points nearest to a new point and assigns it the majority class among those neighbors. In other words, similar things are near each other, so it uses feature similarity to predict values.

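With `metric="minkowski"` and `p=2`, the distance d(a, b) = (sum |a_i - b_i|^p)^(1/p) is the ordinary Euclidean distance. The classifier predicts the majority label among the `n_neighbors` closest training points. A pure-Python sketch with made-up points:

```python
from collections import Counter

def minkowski(a, b, p=2):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train_x, train_y, query, k=5, p=2):
    # Sort training points by distance to the query and vote among the k nearest
    nearest = sorted(zip(train_x, train_y), key=lambda pair: minkowski(pair[0], query, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(minkowski([0, 0], [3, 4]))                   # 5.0
print(knn_predict(train_x, train_y, [4, 4], k=3))  # 1
```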
In [27]:
from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier(n_neighbors = 5, metric = "minkowski", p = 2)
KNN.fit(x_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [28]:
y_pred = KNN.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[ 96   4]
 [ 19 250]]
ACCURACY: 0.9376693766937669
PRECISION: 0.984251968503937
RECALL: 0.929368029739777
F1: 0.9560229445506692
MEAN SQUARED ERROR: 0.06233062330623306
MEAN ABSOLUTE ERROR: 0.06233062330623306
ROOT MEAN SQUARED ERROR: 0.24966101679323718

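All of these scores follow from the confusion matrix. Its rows are the true classes and its columns the predicted classes: [[TN, FP], [FN, TP]]. Plugging in the KNN matrix above reproduces the printed values. For 0/1 labels, MSE and MAE both equal the error rate, (FP + FN) / total.

```python
from math import sqrt

def scores_from_confusion(tn, fp, fn, tp):
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "error_rate": (fp + fn) / total,  # equals MSE and MAE for 0/1 labels
        "rmse": sqrt((fp + fn) / total),
    }

scores = scores_from_confusion(tn=96, fp=4, fn=19, tp=250)
print(scores)
```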

* **Support Vector Machine (SVM)** - An SVM separates the classes with a hyperplane in an N-dimensional space (N is the number of features). The objective is to find the hyperplane with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of both classes.

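A linear SVM predicts with the sign of w . x + b. The distance of a point from the hyperplane is |w . x + b| / ||w||. Training maximises the margin 2 / ||w|| between the two lines w . x + b = +1 and w . x + b = -1. A sketch with a hypothetical, hand-picked hyperplane rather than a fitted one:

```python
from math import sqrt

def svm_predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def distance_to_hyperplane(x, w, b):
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

w, b = [1.0, 1.0], -3.0  # hypothetical hyperplane x1 + x2 = 3
margin_width = 2 / sqrt(sum(wi * wi for wi in w))
print(svm_predict([2.5, 2.5], w, b), svm_predict([0.5, 1.0], w, b))  # 1 0
print(distance_to_hyperplane([2.0, 2.0], w, b), margin_width)
```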
In [29]:
from sklearn.svm import SVC
SVM = SVC(kernel = "linear", random_state = 0)
SVM.fit(x_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [30]:
y_pred = SVM.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[100   0]
 [  0 269]]
ACCURACY: 1.0
PRECISION: 1.0
RECALL: 1.0
F1: 1.0
MEAN SQUARED ERROR: 0.0
MEAN ABSOLUTE ERROR: 0.0
ROOT MEAN SQUARED ERROR: 0.0


* **Kernel SVM** - The SVM algorithm uses a set of mathematical functions called kernels. A kernel takes the data as input and transforms it into the required form, which lets the SVM find non-linear decision boundaries. Common kernel functions include linear, polynomial, radial basis function (rbf) and sigmoid. The rbf kernel is a widely used choice and is scikit-learn's default for `SVC`.

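The RBF kernel scores the similarity of two points as K(a, b) = exp(-gamma * ||a - b||^2). With `gamma='scale'`, the default shown in the printed model, scikit-learn sets gamma = 1 / (n_features * X.var()), using the variance of all values in the training matrix. A pure-Python sketch of both formulas on a made-up matrix:

```python
from math import exp
from statistics import pvariance

def rbf_kernel(a, b, gamma):
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-gamma * sq_dist)

def scale_gamma(rows):
    # Variance over every entry of the matrix, as in gamma='scale'
    values = [v for row in rows for v in row]
    return 1.0 / (len(rows[0]) * pvariance(values))

X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
gamma = scale_gamma(X)  # variance 0.25, 2 features -> gamma = 2.0
print(gamma, rbf_kernel(X[0], X[0], gamma), rbf_kernel(X[0], X[1], gamma))
```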
In [31]:
from sklearn.svm import SVC
KSVM = SVC(kernel = "rbf", random_state = 0)
KSVM.fit(x_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [32]:
y_pred = KSVM.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[ 96   4]
 [  1 268]]
ACCURACY: 0.986449864498645
PRECISION: 0.9852941176470589
RECALL: 0.9962825278810409
F1: 0.9907578558225507
MEAN SQUARED ERROR: 0.013550135501355014
MEAN ABSOLUTE ERROR: 0.013550135501355014
ROOT MEAN SQUARED ERROR: 0.1164050492949297


* **Decision Tree Classification** - It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. It has two main kinds of nodes: Decision Nodes (where the data is split) and Leaf Nodes (which give the output). The top decision node, which corresponds to the best predictor attribute, is called the Root Node. It is used when the target variable is categorical.

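With `criterion="entropy"`, each split is chosen to maximise information gain. This is the entropy H = -sum p_k * log2(p_k) of the parent node minus the size-weighted entropy of the children. A pure-Python sketch on made-up labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

parent = ["Yes", "Yes", "No", "No"]
print(entropy(parent))                                         # 1.0
print(information_gain(parent, ["Yes", "Yes"], ["No", "No"]))  # 1.0 (perfect split)
print(information_gain(parent, ["Yes", "No"], ["Yes", "No"]))  # 0.0 (useless split)
```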
In [33]:
from sklearn.tree import DecisionTreeClassifier
DTC = DecisionTreeClassifier(criterion = "entropy", random_state = 0)
DTC.fit(x_train, y_train)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='entropy',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=0, splitter='best')

In [34]:
y_pred = DTC.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[100   0]
 [  0 269]]
ACCURACY: 1.0
PRECISION: 1.0
RECALL: 1.0
F1: 1.0
MEAN SQUARED ERROR: 0.0
MEAN ABSOLUTE ERROR: 0.0
ROOT MEAN SQUARED ERROR: 0.0


* **Random Forest Classification** - It is an ensemble method that combines many individual decision trees, each trained on a different random sample of the data. Each tree produces a class prediction, and the class with the most votes becomes the model's prediction. The key is low correlation between the trees: a large number of relatively uncorrelated trees operating as a committee will outperform any of the individual trees.

* For More: https://towardsdatascience.com/understanding-random-forest-58381e0602d2

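Two ingredients make the trees in a random forest differ:

- Each tree is trained on a bootstrap sample (rows drawn with replacement).
- Each split considers only a random subset of features.

The description above uses a majority vote of the trees. Note that scikit-learn's `RandomForestClassifier` actually averages the trees' predicted class probabilities, which usually gives the same answer. This sketch covers only the bootstrap and voting steps, with hypothetical per-tree predictions:

```python
import random
from collections import Counter

def bootstrap_indices(n_rows, rng):
    """Row indices for one tree's training sample, drawn with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def majority_vote(tree_predictions):
    """Most common class among the individual trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_indices(10, rng)   # some rows repeat, others are left out
print(len(sample))                    # 10
print(majority_vote([1, 0, 1, 1, 0])) # 1
```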

In [35]:
from sklearn.ensemble import RandomForestClassifier
RFC = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
RFC.fit(x_train, y_train)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='entropy', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)

In [36]:
y_pred = RFC.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[ 99   1]
 [  2 267]]
ACCURACY: 0.991869918699187
PRECISION: 0.996268656716418
RECALL: 0.9925650557620818
F1: 0.994413407821229
MEAN SQUARED ERROR: 0.008130081300813009
MEAN ABSOLUTE ERROR: 0.008130081300813009
ROOT MEAN SQUARED ERROR: 0.09016696346674323


* **Naive Bayes Classification** - A probabilistic machine learning model used for classification, based on Bayes' Theorem. It assumes that the presence of a feature in a class is independent of the presence of any other feature in the same class. It is called naive because it assumes that one feature does not affect the others, which is rarely true in real data.

* **Types of Naive Bayes Classification:**
    * Multinomial Naive Bayes 
    * Bernoulli Naive Bayes
    * Gaussian Naive Bayes

* For More: https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c 

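Gaussian Naive Bayes models each feature, within each class, as an independent normal distribution. It predicts the class with the highest prior times the product of per-feature likelihoods; the sketch sums logs instead of multiplying.

Most columns in `x` are 0/1 indicators rather than bell-shaped values, so this assumption fits the data poorly. That may be one reason for the lower scores this model gets on the dataset; `BernoulliNB` models binary features directly.

Below is a pure-Python sketch on made-up data. The small constant added to each variance is a simplified stand-in for scikit-learn's `var_smoothing`, not its exact rule.

```python
from math import log, pi

def fit_gaussian_nb(rows, labels, eps=1e-9):
    """Per-class prior, feature means and variances (eps keeps variances > 0)."""
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == c]
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*members), means)]
        model[c] = (n / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, x):
    def log_posterior(c):
        prior, means, variances = model[c]
        log_lik = sum(-0.5 * log(2 * pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        return log(prior) + log_lik
    return max(model, key=log_posterior)

rows = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
labels = [0, 0, 1, 1]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [1.1, 2.1]), predict_gaussian_nb(model, [3.1, 3.9]))  # 0 1
```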
In [37]:
# Here we have done Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
NBC = GaussianNB()
NBC.fit(x_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

In [38]:
y_pred = NBC.predict(x_test)
CM = confusion_matrix(y_test, y_pred)
print("CONFUSION MATRIX:")
print(CM)
ACC = accuracy_score(y_test, y_pred)
print("ACCURACY: "+str(ACC))
PRE = precision_score(y_test, y_pred)
print("PRECISION: "+str(PRE))
REC = recall_score(y_test, y_pred)
print("RECALL: "+str(REC))
F1 = f1_score(y_test, y_pred.round())
print("F1: "+str(F1))
MSE = mean_squared_error(y_test, y_pred)
print("MEAN SQUARED ERROR: "+str(MSE))
MAE = mean_absolute_error(y_test, y_pred)
print("MEAN ABSOLUTE ERROR: "+str(MAE))
RMSE = sqrt(mean_squared_error(y_test, y_pred))
print("ROOT MEAN SQUARED ERROR: "+str(RMSE))

CONFUSION MATRIX:
[[ 99   1]
 [130 139]]
ACCURACY: 0.6449864498644986
PRECISION: 0.9928571428571429
RECALL: 0.516728624535316
F1: 0.6797066014669927
MEAN SQUARED ERROR: 0.35501355013550134
MEAN ABSOLUTE ERROR: 0.35501355013550134
ROOT MEAN SQUARED ERROR: 0.5958301353032602
