<a href="https://colab.research.google.com/github/Niweera/LearnDataAnalytics/blob/master/ML_With_SciKit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Steps

1. Separate the xlsx dataset file into train.csv and test.csv
2. Import the two datasets with pandas
3. Data pre-processing
4. Data visualization
5. Building the model
6. Training the model
7. Testing the model
8. Get the confusion matrix
9. Get the important metrics
    1. Accuracy
    2. Precision
    3. Sensitivity
    4. Specificity
    5. Error Rate



Kernel Support Vector Machines Classification

Data preprocessing

In [0]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
import os
from sklearn.impute import SimpleImputer 
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, GridSearchCV

simpleImputer = SimpleImputer(missing_values=np.nan, strategy='mean')
np.set_printoptions(precision=3, suppress=True,threshold=1000)

In [0]:
TRAIN_DATA_URL = "https://raw.githubusercontent.com/Niweera/LearnDataAnalytics/master/datasets/1/train.csv"
TEST_DATA_URL = "https://raw.githubusercontent.com/Niweera/LearnDataAnalytics/master/datasets/1/test.csv"

CSV_COLUMN_NAMES = ["ID","Age","Gender","TB","DB","ALK","SGPT","SGOT","TP","ALB","AG_Ratio","Class"]

LABEL_COLUMN = "Class"
LABELS = ["Yes","No"]

CATEGORICAL_FEATURES = ["Gender","Class"]
NUMERIC_FEATURES = ["Age","TB","DB","ALK","SGPT","SGOT","TP","ALB","AG_Ratio"]

train_dataset = pd.read_csv(TRAIN_DATA_URL, names=CSV_COLUMN_NAMES, header=0,index_col=False, usecols=CSV_COLUMN_NAMES[1:], na_values="?")
test_dataset = pd.read_csv(TEST_DATA_URL, names=CSV_COLUMN_NAMES, header=0,index_col=False, usecols=CSV_COLUMN_NAMES[1:], na_values="?")

In [0]:
print(train_dataset)
print(test_dataset)

In [0]:
X_train = train_dataset.iloc[:,:-1].values
X_test = test_dataset.iloc[:,:-1].values

y_train = train_dataset.iloc[:,10].values
y_test = test_dataset.iloc[:,10].values

In [0]:
print(X_train)
print()
print(X_test)


print()
print()


print(y_train)
print()
print(y_test)
# Feature column order after one-hot encoding Gender:
# ["Female","Male","Age","TB","DB","ALK","SGPT","SGOT","TP","ALB","AG_Ratio"]

In [0]:
for col in CSV_COLUMN_NAMES[1:]:
  print(f'{col}: {train_dataset[col].isnull().values.any()}')

In [0]:
# Fit the imputer on the training data only, then reuse the learned column means for the test data
imputer = simpleImputer.fit(X_train[:,2:10])

X_train[:,2:10] = imputer.transform(X_train[:,2:10])
X_test[:,2:10] = imputer.transform(X_test[:,2:10])

In [0]:
# One-hot encode Gender (column 1); fit the encoder on the training data and reuse it on the test data
ct = ColumnTransformer([("gender_onehot", OneHotEncoder(),[1])], remainder="passthrough")
X_train = ct.fit_transform(X_train)
X_test = ct.transform(X_test)

In [0]:
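# Learn the label-to-integer mapping from the training labels and apply the same mapping to the test labels
labelencoder = LabelEncoder()
y_train = labelencoder.fit_transform(y_train)
y_test = labelencoder.transform(y_test)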

In [0]:
# Standardize with the training mean and variance; the test set must reuse them, not refit
standardScalar = StandardScaler()
X_train = standardScalar.fit_transform(X_train)
X_test = standardScalar.transform(X_test)
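
The preprocessing steps above (imputation, one-hot encoding, scaling) can also be bundled into a single object so that every statistic is learned from the training data and only applied to the test data. The following is a minimal sketch of that idea, not the notebook's own method: the frames `toy_train` and `toy_test` are made-up stand-ins that borrow a few column names from the dataset, and the step names are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frames standing in for the train and test sets (illustration only)
toy_train = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "TB": [0.7, np.nan, 1.1, 0.9],
    "ALB": [3.3, 3.2, np.nan, 3.4],
})
toy_test = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "TB": [np.nan, 1.0],
    "ALB": [3.0, np.nan],
})

numeric_cols = ["TB", "ALB"]
preprocess = ColumnTransformer([
    # Fill missing numeric values with the training mean, then standardize
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # One-hot encode Gender; categories unseen during fit become all zeros
    ("gender", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
])

train_features = preprocess.fit_transform(toy_train)  # learn statistics from train only
test_features = preprocess.transform(toy_test)        # reuse them on test

print(train_features.shape, test_features.shape)
```

Because `fit_transform` is called once on the training frame and `transform` on the test frame, the missing `TB` value in `toy_test` is filled with the training mean rather than a mean computed from the test data.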

In [0]:
classifier = SVC(kernel="rbf", random_state=0)
classifier.fit(X_train,y_train)

In [0]:
y_pred = classifier.predict(X_test)
print(y_pred)

In [0]:
cm = confusion_matrix(y_test,y_pred)
print(cm)

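scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, in sorted label order. Treating the encoded label 1 as the positive class, the printed matrix reads:

|            | Predicted 0    | Predicted 1    |
|------------|----------------|----------------|
| Actual 0   | True Negative  | False Positive |
| Actual 1   | False Negative | True Positive  |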
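
The metrics listed in step 9 (accuracy, precision, sensitivity, specificity, error rate) can be derived from the four cells of a binary confusion matrix. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels) and treats the encoded label 1 as the positive class. The helper name `binary_metrics` and the toy labels are illustrative assumptions and do not come from the notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity, specificity and error rate.

    Hypothetical helper: assumes binary labels 0/1 with 1 as the positive class.
    """
    # labels=[0, 1] fixes the layout to [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "error_rate": 1 - accuracy,
    }


# Toy labels for illustration only (not taken from the dataset)
toy_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
toy_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
metrics = binary_metrics(toy_true, toy_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In the notebook, calling `binary_metrics(y_test, y_pred)` after the prediction cell would report the same quantities for the SVM classifier.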

In [0]:
accuracies = cross_val_score(estimator=classifier,X=X_train,y=y_train,cv=100,n_jobs=-1)
print(accuracies.mean())
print(accuracies.std())

In [0]:
parameters = [{
    "C": np.arange(0.001, 0.700, 0.001),
    "kernel": ["rbf"],
    "gamma":np.arange(0.001, 0.500, 0.001)
}]

grid_search = GridSearchCV(estimator=classifier,param_grid=parameters,scoring="accuracy",cv=100,n_jobs=-1)
grid_search = grid_search.fit(X_train,y_train)
grid_search

In [0]:
grid_search.best_params_
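
The grid above combines roughly 699 values of `C` with 499 values of `gamma` and evaluates each pair on 100 folds, which can take a very long time. One common alternative is to search a coarse logarithmic grid with fewer folds first and refine around the best point afterwards. The sketch below illustrates that approach on synthetic data from `make_classification`; the ranges, the fold count, and the names `X_demo`, `y_demo`, and `coarse_grid` are assumptions chosen for illustration, not tuned values for this dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the scaled training data (illustration only)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

coarse_grid = {
    "C": np.logspace(-2, 2, 5),      # 0.01, 0.1, 1, 10, 100
    "gamma": np.logspace(-3, 1, 5),  # 0.001, 0.01, 0.1, 1, 10
    "kernel": ["rbf"],
}

# 25 parameter pairs x 5 folds = 125 fits
search = GridSearchCV(SVC(random_state=0), coarse_grid, scoring="accuracy", cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After the coarse pass, a second, finer grid centred on `search.best_params_` keeps the total number of fits manageable.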