### This dataset contains measurements of authentic and fake banknotes. We will build a machine learning model that classifies each note as Authentic [1] or Fake [0]. This notebook covers the following:

1. Data Exploration

2. Preprocessing Data 

3. Model building 

4. Predictions 

# Imports

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score, learning_curve, cross_val_predict                
from sklearn.metrics import confusion_matrix, accuracy_score      
from sklearn.svm import SVC
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
import os
os.listdir("../input/banknote-authentication-uci/")

# 1. Data Exploration

## Understanding the features
* `variance` - variance of the wavelet-transformed banknote image; it measures how widely the values spread around their mean
* `skewness` - skewness of the wavelet-transformed image; it measures how asymmetric the distribution is around its mean
* `kurtosis` - kurtosis of the wavelet-transformed image; it describes how sharp the peak and how heavy the tails of the distribution are
* `entropy` - entropy of the image; it measures disorder or uncertainty
***
!['none'](https://i.pinimg.com/originals/e1/f0/b2/e1f0b20eb0773915fc6e9b91909adfa3.jpg)
* Source - [Pinterest](https://i.pinimg.com/originals/e1/f0/b2/e1f0b20eb0773915fc6e9b91909adfa3.jpg)
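To make the four statistics concrete, here is a minimal sketch in plain Python that computes them for a small, made-up list of intensities. The function name `describe_intensities`, the sample values and the bin count are illustrative assumptions; the dataset's features were extracted from wavelet-transformed images, which this sketch does not reproduce.

```python
import math
from collections import Counter

def describe_intensities(values, bins=4):
    """Return variance, skewness, excess kurtosis and Shannon entropy (bits).

    Assumes `values` holds at least two distinct numbers (hypothetical helper).
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, dividing by n)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    variance = m2
    skewness = m3 / m2 ** 1.5        # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2 - 3.0    # excess kurtosis: 0 for a normal distribution

    # Shannon entropy of a histogram of the values, using equal-width bins
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"variance": variance, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}

# Made-up, symmetric sample used only for illustration
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]
stats = describe_intensities(sample)
print(stats)
```

Because the sample is symmetric around its mean, the skewness comes out as zero, while the negative excess kurtosis indicates a flatter peak than a normal distribution.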

In [None]:
df = pd.read_csv("../input/banknote-authentication-uci/BankNoteAuthentication.csv")
df.head()

In [None]:
df.tail()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
# Pass the column by keyword; recent seaborn versions no longer accept it positionally
sns.countplot(x='class', data=df)

In [None]:
cols = list(df.drop("class", axis=1))

fig, ax = plt.subplots(ncols = 4, figsize=(16, 4))
fig.suptitle("Distribution Plot")

for index, col in enumerate(cols):
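    # distplot is deprecated; histplot with a KDE overlay gives the equivalent view
    sns.histplot(df[col], kde=True, stat="density", ax=ax[index])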

### Correlation between the features

In [None]:
sns.heatmap(df.corr(),annot=True,cmap='mako',linewidths=0.2)
fig=plt.gcf()
fig.set_size_inches(20,12)
plt.show()

In [None]:
# `fill=True` replaces the deprecated `shade=True` option of the KDE plots
g = sns.pairplot(data=df, hue='class', palette='seismic',
                 height=2, diag_kind='kde', diag_kws=dict(fill=True), plot_kws=dict(s=10))
g.set(xticklabels=[])

# Model Building

In [None]:
X = df.drop('class', axis=1)
y = df['class']

In [None]:
X.head()

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
kfold = StratifiedKFold(n_splits = 10)

In [None]:
svc = SVC(probability = True, gamma='auto')
cv = cross_val_score(svc, X_train, y_train, cv=kfold)
print("Model :",svc,"\n",cv,'\n',"CV Score :",cv.mean())

# Model Evaluation

In [None]:
clf_svc = svc.fit(X_train, y_train)
# This is accuracy on the training data, not on the held-out test set
print("SVC training accuracy :",svc.score(X_train, y_train))

In [None]:
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=-1, train_sizes=np.linspace(.1, 1.0, 5)):
    """Generate a simple plot of the test and training learning curve"""
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()

    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="b")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="r")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="b",
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="r",
             label="Cross-validation score")

    plt.legend(loc="best")
    return plt

In [None]:
g = plot_learning_curve(svc,"SVC learning curves",X_train,y_train,cv=kfold)

# Prediction 

In [None]:
y_pred = clf_svc.predict(X_test)
y_pred

In [None]:
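# Wrap the sample in a DataFrame so the feature names match those seen during fit
clf_svc.predict(pd.DataFrame([[0.40614,1.34920,-1.4501,-0.55949]], columns=X.columns))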

### Accuracy Check 

In [None]:
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='3.0f')

In [None]:
accuracy_score(y_test,y_pred)
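Accuracy is the share of correct predictions, so it can be read straight off the confusion matrix. As a hedged illustration, the sketch below derives accuracy, precision and recall from hand-written label lists. `binary_metrics`, `y_true_demo` and `y_pred_demo` are hypothetical names, and the labels are made up rather than taken from the notebook's test split.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and basic metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Same layout as sklearn's confusion_matrix: rows = true, columns = predicted
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels used only for illustration
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true_demo, y_pred_demo)
print(metrics)
```

Looking at precision and recall alongside accuracy shows whether the errors are fake notes accepted as authentic or authentic notes rejected as fake, which a single accuracy figure hides.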

### Great! The notebook is complete for the Bank Note Authentication dataset, with an accuracy of 100% on the test set 😃
### Don't forget to **`UPVOTE`** if you liked the notebook.😊
### Suggestions are appreciated in comments below ; )

# Thank you 🙂