## Importing the Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
dataset = pd.read_csv('customer_data.csv')

### Checking for null values

- `dataset.head()` shows the first 5 rows of the data set.
- `dataset.info()` shows the index range, the number of columns, the data type of each column, and the count of non-null entries in each column.

In [None]:
dataset.info()

In [None]:
dataset.head()
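
As a complement to reading the `info()` output, missing entries can also be counted directly. The following is a minimal sketch on a hypothetical toy frame (`toy`, `fea_1`, and the values are made up for illustration; only `label` and `fea_2` are column names taken from this notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for customer_data.csv
toy = pd.DataFrame({
    "label": [0, 1, 0, 1],
    "fea_1": [4.0, 5.0, 6.0, 7.0],
    "fea_2": [1200.0, np.nan, 1300.0, np.nan],
})

# Count missing entries per column
null_counts = toy.isnull().sum()
print(null_counts)

# Names of the columns that contain at least one null entry
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(cols_with_nulls)
```

Running the same two lines on `dataset` should point to the column that needs imputing.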

### One column contains null entries, so we replace them with the column mean

scikit-learn's `SimpleImputer` is used to impute the null entries, so that every entry becomes non-null.
- `dataset['fea_2']` is imputed using the mean strategy.

In [None]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy = 'mean')

In [None]:
dataset['fea_2'] = imputer.fit_transform(dataset[['fea_2']])
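
To make the effect of the mean strategy concrete, here is a small sketch on a hypothetical four-row column (the values are invented; only the column name `fea_2` comes from the notebook). The mean is computed over the non-null entries and written into the gaps:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical column with two missing entries
toy = pd.DataFrame({"fea_2": [1000.0, np.nan, 2000.0, np.nan]})
column_mean = toy["fea_2"].mean()  # pandas skips NaN: (1000 + 2000) / 2

imputer_demo = SimpleImputer(strategy="mean")
# fit_transform returns a 2-D array of shape (n_samples, 1); ravel() flattens it
toy["fea_2"] = imputer_demo.fit_transform(toy[["fea_2"]]).ravel()

print(column_mean)
print(toy["fea_2"].tolist())
```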


## Stratified K-fold Cross Validation with SVC classifier

Provides train/test indices to split data in train/test sets.

This cross-validation object is a variation of KFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.
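
To see what preserving the class percentages means in practice, the sketch below compares plain `KFold` with `StratifiedKFold` on hypothetical, imbalanced labels (`X_toy`, `y_toy`, and `positive_share_per_fold` are illustrative names, not part of this notebook):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 80 samples of class 0 followed by 20 of class 1
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.zeros((100, 1))  # the feature values do not influence the split

def positive_share_per_fold(splitter):
    """Return the fraction of class-1 samples in each test fold."""
    return [float(y_toy[test_idx].mean())
            for _, test_idx in splitter.split(X_toy, y_toy)]

plain_shares = positive_share_per_fold(KFold(n_splits=5))
stratified_shares = positive_share_per_fold(StratifiedKFold(n_splits=5))

print("KFold:          ", plain_shares)
print("StratifiedKFold:", stratified_shares)
```

Because the toy labels are sorted, unshuffled `KFold` puts all positive samples into the last test fold, while every stratified fold keeps the overall 20% share.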

In [None]:
from sklearn.svm import SVC
SVC_model=SVC()

In [None]:
x=dataset.drop('label', axis=1)
y=dataset['label']

### SVC parameters used

`kernel` {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default=’rbf’
Specifies the kernel type to be used in the algorithm.

`max_iter` int, default=-1
Hard limit on iterations within solver, or -1 for no limit.

`random_state` int, RandomState instance or None, default=None
Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when `probability` is False, which is the default.

In [None]:
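from sklearn.model_selection import StratifiedKFold, cross_val_score
SVC_model=SVC(kernel='linear', max_iter=100, random_state=42)
# StratifiedKFold keeps the class proportions of y in each of the 10 folds
KFoldVal=StratifiedKFold(n_splits=10)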

In [None]:
result_KFold=cross_val_score(SVC_model,x,y,cv=KFoldVal)
print(result_KFold)

In [None]:
KFold_Score=np.mean(result_KFold)
print(KFold_Score)

## Repeated Random SubSampling with SVC classifier

Repeated random subsampling, implemented in scikit-learn as `ShuffleSplit`, is a random permutation cross-validator.

It yields indices to split data into training and test sets.
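
The sketch below shows the index pairs such a splitter produces, using 20 hypothetical samples and `test_size=0.25` for round numbers (`X_toy`, `ssplit_demo`, and these settings are assumptions for illustration, not the notebook's configuration):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X_toy = np.arange(20).reshape(-1, 1)  # 20 hypothetical samples

# Each repetition draws a fresh random permutation of the sample indices
ssplit_demo = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
splits = list(ssplit_demo.split(X_toy))

for train_idx, test_idx in splits:
    # Within one repetition, train and test indices never overlap
    print(len(train_idx), len(test_idx), sorted(test_idx.tolist()))
```

Unlike K-fold, the test sets of different repetitions are drawn independently, so the same sample may appear in several test sets or in none.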

In [None]:
from sklearn.model_selection import ShuffleSplit, cross_val_score
SVC_model=SVC(kernel='linear', max_iter=100, random_state=42)
ssplit=ShuffleSplit(n_splits=10, test_size=.15)
result_ssplit=cross_val_score(SVC_model,x,y,cv=ssplit)

In [None]:
result_ssplit

In [None]:
ssplit_score=np.mean(result_ssplit)
print(ssplit_score)

## Randomized Search with Logistic Regression

Randomized search on hyper parameters.

In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter.
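
As an illustration of sampling from a distribution rather than a fixed list, the sketch below draws `C` from a log-uniform range on synthetic data. The data from `make_classification`, the search space, and the names `X_demo`, `y_demo`, and `search` are all assumptions made for this example:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary classification data, used only for this sketch
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

param_distributions = {
    "C": loguniform(1e-2, 1e2),        # continuous distribution, sampled per candidate
    "solver": ["lbfgs", "liblinear"],  # lists are sampled uniformly
}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=5,          # exactly 5 parameter settings are tried
    cv=3,
    random_state=0,    # makes the sampled settings reproducible
)
search.fit(X_demo, y_demo)

print(len(search.cv_results_["params"]))
print(search.best_params_)
```

Only `n_iter` settings are evaluated, regardless of how large the search space is.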

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.15, random_state=42)

### Logistic Regression parameters used

`C` float, default=1.0
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

`solver` {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’
Algorithm to use in the optimization problem. Default is ‘lbfgs’.

`max_iter` int, default=100
Maximum number of iterations taken for the solvers to converge.

`random_state` int, RandomState instance, default=None
Used when solver == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data.
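
The role of `C` as an inverse regularization strength can be checked with a short experiment. The sketch below fits the same model with three values of `C` on synthetic data (the data, the chosen values, and the names `X_demo`, `y_demo`, and `coef_norms` are assumptions for illustration) and records the size of the learned coefficient vector:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, used only for this sketch
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, solver="lbfgs", max_iter=1000)
    model.fit(X_demo, y_demo)
    # L2 norm of the coefficient vector: smaller means a more heavily shrunk model
    coef_norms[C] = float(np.linalg.norm(model.coef_))

print(coef_norms)
```

The smallest `C` should yield the smallest coefficient norm, since it applies the strongest penalty.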

In [None]:
from sklearn.model_selection import RandomizedSearchCV
param={'C':[0.1,1,10,100], 'solver':['liblinear', 'saga', 'lbfgs'], 'max_iter':[100,1000,10000], 'random_state':[10, 50, 100]}

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
random_model=RandomizedSearchCV(LogisticRegression(), param, verbose=3, n_iter=3)
random_model.fit(x_train, y_train)

In [None]:
print(random_model.best_params_)
print(random_model.best_score_)

## Grid Search with SVC classifier

Exhaustive search over specified parameter values for an estimator.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
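
Because the search is exhaustive, its cost grows with the product of the list lengths. The sketch below enumerates a hypothetical grid with `ParameterGrid` to count the candidates before any model is fitted (`demo_grid`, `cv_folds`, and the values are illustrative assumptions):

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for an SVC
demo_grid = {
    "C": [1, 10, 100],
    "kernel": ["rbf", "linear"],
    "gamma": [0.1, 0.01, 0.001],
}

candidates = list(ParameterGrid(demo_grid))
n_candidates = len(candidates)          # 3 * 2 * 3 combinations

cv_folds = 2
total_fits = n_candidates * cv_folds    # one fit per candidate per fold

print(n_candidates, total_fits)
print(candidates[0])
```

With the default `refit=True`, `GridSearchCV` performs one additional fit of the best candidate on the full training data.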

### SVC parameters used

`kernel` {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default=’rbf’
Specifies the kernel type to be used in the algorithm.

`C` float, default=1.0
Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

`gamma` {‘scale’, ‘auto’} or float, default=’scale’
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

In [None]:
from sklearn.model_selection import GridSearchCV
param={'C':[1,10,100], 'kernel':['rbf', 'linear'], 'gamma':[0.1, 0.01, 0.001]}

In [None]:
grid_model=GridSearchCV(SVC(), param, verbose=2, cv=2)
grid_model.fit(x_train, y_train)

In [None]:
print(grid_model.best_params_)
print(grid_model.best_score_)

## Visualizing the results

In [None]:
from prettytable import PrettyTable
myTable=PrettyTable(['Technique used', 'Score'])
# Use the mean fold score (KFold_Score), not the array of per-fold scores
myTable.add_row(['Stratified K-fold Cross Validation with SVC classifier', KFold_Score])
myTable.add_row(['Repeated Random SubSampling with SVC classifier', ssplit_score])
myTable.add_row(['Randomized Search with Logistic Regression', random_model.best_score_])
myTable.add_row(['Grid Search with SVC classifier', grid_model.best_score_])
print(myTable)