### Instructions

#### Goal of the Project

This project is designed for you to practice and solve the activities that are based on the concepts covered in the following lesson:

Naive Bayes Classifier




---

#### Getting Started:

1. Follow the next 3 steps to create a copy of this Colab file and start working on the project.

2. Create a duplicate copy of the Colab file as described below.

  - Click on the **File menu**. A new drop-down list will appear.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/0_file_menu.png' width=500>

  - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab in your web browser.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/1_create_colab_duplicate_copy.png' width=500>

3. After creating the duplicate copy of the notebook, please rename it in the **YYYY-MM-DD_StudentName_Project125** format.

4. Now, write your code in the prescribed code cells.

---

### Problem Statement

This project aims to detect forged banknotes accurately. The data were extracted from images of genuine and forged banknote-like specimens. For digitization, an industrial camera of the kind usually used for print inspection was used. The final images are $400 \times 400$ pixels. Owing to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.

The dataset contains the following attributes:

|Attribute|Description|
|-|-|
|variance| variance of Wavelet Transformed image (continuous).|
|skewness| skewness of Wavelet Transformed image (continuous).|
|curtosis| kurtosis of Wavelet Transformed image (continuous).|
|entropy| entropy of image (continuous).|
|class|0: fake bank note, 1: real bank note|
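The four feature columns are summary statistics of an image. The sketch below shows what each statistic measures by computing all four for any array of coefficients. `moment_features` is a hypothetical helper written for illustration. It makes several assumptions: population (not sample) variance, excess kurtosis, and Shannon entropy in bits over a 32-bin histogram. The source does not state which wavelet, normalization, or entropy definition the original authors used, so this sketch is not expected to reproduce the dataset's values.

```python
import numpy as np


def moment_features(values, bins=32):
    """Return (variance, skewness, excess kurtosis, histogram entropy) of an array.

    Hypothetical helper; assumes the input is not constant (variance > 0).
    """
    x = np.asarray(values, dtype=float).ravel()
    centered = x - x.mean()
    variance = np.mean(centered ** 2)                          # population variance
    skewness = np.mean(centered ** 3) / variance ** 1.5        # asymmetry of the distribution
    kurtosis = np.mean(centered ** 4) / variance ** 2 - 3.0    # excess (Fisher) kurtosis
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()                      # probabilities of non-empty bins
    entropy = -np.sum(p * np.log2(p))                          # Shannon entropy in bits
    return variance, skewness, kurtosis, entropy


# Synthetic 400 x 400 array standing in for wavelet coefficients (not real banknote data)
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(400, 400))
print(moment_features(coeffs))
```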


**Source:** https://archive.ics.uci.edu/ml/datasets/banknote+authentication

**Owner:** Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe, volker.lohweg '@' hs-owl.de)

**Donor:** Helene Dörksen (University of Applied Sciences, Ostwestfalen-Lippe, helene.doerksen '@' hs-owl.de).


**Citation:**
Dua, D., & Graff, C. (2017). UCI Machine Learning Repository.








---

### List of Activities

**Activity 1:** Import Modules and Read Data

  
**Activity 2:**  Perform Train-Test Split

**Activity 3:** Build Naive Bayes Classifier Model






---

#### Activity 1: Import Modules and Read Data

Import the necessary Python packages.

Read the data from a CSV file to create a Pandas DataFrame.

**Dataset Link:** https://s3-whjr-curriculum-uploads.whjr.online/f72c6f40-c536-4e0c-848c-e47c8da4685b.csv

Also, print the first five rows of the dataset. Check for null values and treat them accordingly.


In [1]:
# Import modules
import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Remove warnings
warnings.filterwarnings('ignore')

# Load the dataset
df = pd.read_csv('https://s3-whjr-curriculum-uploads.whjr.online/f72c6f40-c536-4e0c-848c-e47c8da4685b.csv')

# Print first five rows using 'head()' function
df.head()


| |variance|skewness|curtosis|entropy|class|
|-|-|-|-|-|-|
|0|3.6216|8.6661|-2.8073|-0.44699|0|
|1|4.5459|8.1674|-2.4586|-1.4621|0|
|2|3.866|-2.6383|1.9242|0.10645|0|
|3|3.4566|9.5228|-4.0112|-3.5944|0|
|4|0.32924|-4.4552|4.5718|-0.9888|0|


In [2]:
# Check if there are any null values. If any column has null values, treat them accordingly
df.isnull().sum()

|Column|Null count|
|-|-|
|variance|0|
|skewness|0|
|curtosis|0|
|entropy|0|
|class|0|
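If `isnull().sum()` reports non-zero counts, two common treatments are dropping the affected rows or filling the gaps in each feature column with a summary statistic such as the median. The sketch below shows both on a hypothetical toy DataFrame (`toy` is made-up data, not the banknote dataset). The target column is left untouched because filling labels would invent classes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with a few missing values (not the banknote data)
toy = pd.DataFrame({
    "variance": [3.6, np.nan, 0.3, -1.2],
    "skewness": [8.7, 8.2, np.nan, 1.1],
    "class": [0, 0, 1, 1],
})

print(toy.isnull().sum())  # null count per column

# Option 1: drop every row that contains at least one null value
dropped = toy.dropna()

# Option 2: fill nulls in the feature columns with each column's median
feature_cols = ["variance", "skewness"]
filled = toy.copy()
filled[feature_cols] = toy[feature_cols].fillna(toy[feature_cols].median())
```

Dropping is simplest when only a few rows are affected. Median filling keeps every row and is less sensitive to outliers than mean filling.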


**Q:** Are there any missing or null values in the dataset?

**A:**

---

#### Activity 2: Perform Train-Test Split

In this dataset, `class` is the target variable and all the other columns are feature variables.

Create two separate DataFrames, one containing the feature variables and the other containing the target variable.





In [4]:
# Split the dataset into feature variables (independent) and the target variable (dependent)
feat_df = df.drop(columns='class')
target_df = df['class']

print(feat_df.shape)
print(target_df.shape)

(1372, 4)
(1372,)


Split the dataset into a train set and a test set such that the train set contains 70% of the instances and the test set contains the remaining 30%.

In [5]:
# Split the DataFrame into the train and test sets.
# Perform train-test split using 'train_test_split' function.
from sklearn.model_selection import train_test_split

x_train1, x_test1, y_train1, y_test1 = train_test_split(feat_df, target_df, test_size=0.3, random_state=42)

# Print the shape of train and test sets.
print(x_train1.shape)
print(y_train1.shape)
print(x_test1.shape)
print(y_test1.shape)


(960, 4)
(960,)
(412, 4)
(412,)
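With a float `test_size`, `train_test_split` rounds the test-set size up. That is why a 70/30 split of 1,372 rows gives 412 test rows rather than 411.6, and 960 train rows. The sketch below reproduces the arithmetic and checks it against sklearn on zero-filled placeholder arrays of the same size (not the banknote data).

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 1372
test_size = 0.3

# Test size is rounded up; the train set gets whatever remains
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)

# Placeholder arrays with the same number of rows and features as the dataset
X = np.zeros((n_samples, 4))
y = np.zeros(n_samples)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=42)
print(X_tr.shape, X_te.shape)
```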


After this activity, you should have train and test sets that can be used to train and test the `GaussianNB` classifier.

---

#### Activity 3: Build Naive Bayes Classifier Model

Build the Naive Bayes classifier model using the steps given below:


1. Import the required library, which contains the methods and attributes needed to design a Naive Bayes classifier.

   ```python
   from sklearn.naive_bayes import GaussianNB
   ```

2. Create an object (say `nb_clf`) of the `GaussianNB` class.

3. Call the `fit()` function on the `nb_clf` object with the train features and the train target variable as inputs.

4. Get the predicted target values for both the train and test sets by calling the `predict()` function on the `nb_clf` object.

5. Get the accuracy score on both the train and test sets by calling the `score()` function on the `nb_clf` object.
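Under the hood, `GaussianNB` models each feature as an independent normal distribution within each class. It then predicts the class with the highest log prior plus summed log likelihood: $\hat{y} = \arg\max_c \left[\log P(c) + \sum_j \log \mathcal{N}(x_j \mid \mu_{cj}, \sigma^2_{cj})\right]$.

`TinyGaussianNB` below is a hypothetical, minimal re-implementation for illustration, not sklearn's source code. It mirrors sklearn's default `var_smoothing=1e-9`, which adds that fraction of the largest feature variance to every per-class variance. The usage lines compare it with `GaussianNB` on synthetic data from `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB


class TinyGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes: class priors plus one Gaussian per (class, feature)."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Small fraction of the largest feature variance, added for numerical stability
        epsilon = self.var_smoothing * X.var(axis=0).max()
        self.theta_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])  # per-class means
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + epsilon
        self.class_prior_ = np.array([np.mean(y == c) for c in self.classes_])
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        log_prior = np.log(self.class_prior_)                              # (n_classes,)
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * self.var_), axis=1)    # (n_classes,)
        # Squared, variance-scaled distance of every sample to every class mean
        sq = (X[:, None, :] - self.theta_[None, :, :]) ** 2 / self.var_[None, :, :]
        return log_prior + log_norm - 0.5 * sq.sum(axis=2)                 # (n_samples, n_classes)

    def predict(self, X):
        # Pick the class with the largest log posterior (up to a shared constant)
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Synthetic data standing in for the banknote features
X, y = make_classification(n_samples=300, n_features=4, random_state=1)
tiny = TinyGaussianNB().fit(X, y)
ours = tiny.predict(X)
ref = GaussianNB().fit(X, y).predict(X)
print("agreement with sklearn:", np.mean(ours == ref))
```

The two should agree on essentially every sample. Tiny floating-point differences could only matter for points sitting almost exactly on the decision boundary.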

In [6]:
# Create Naive Bayes Classifier model.

# Import the required library
from sklearn.naive_bayes import GaussianNB

# Create a GaussianNB object
nb_clf = GaussianNB()

# Train the NB classifier
nb_clf.fit(x_train1, y_train1)

# Perform prediction using 'predict()' function.
train_pred1 = nb_clf.predict(x_train1)
test_pred1 = nb_clf.predict(x_test1)

# Call the 'score()' function to check the accuracy score of the train set and test set.
print(nb_clf.score(x_train1, y_train1))
print(nb_clf.score(x_test1, y_test1))


0.8395833333333333
0.837378640776699
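For classifiers, sklearn's `score()` returns mean accuracy: the fraction of samples whose predicted label equals the true label. The sketch below checks this on synthetic data from `make_classification` (not the banknote dataset) by comparing `score()`, `accuracy_score`, and a hand-computed mean.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (not the banknote dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = GaussianNB().fit(X, y)
pred = clf.predict(X)

manual = np.mean(pred == y)  # fraction of correctly classified samples
print(clf.score(X, y), accuracy_score(y, pred), manual)
```

Applied to the outputs above, 0.8395833333333333 equals 806/960 and 0.837378640776699 equals 345/412. In other words, 806 of the 960 train samples and 345 of the 412 test samples were classified correctly.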


**Q:** Write down the train and test set accuracy scores for the `GaussianNB` classifier.

**A:**



Plot the confusion matrix and print the classification report to get an in-depth overview of the classifier performance for the test set.
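The cell below prints the matrix as text. To draw it as a figure, one option is a small matplotlib helper. `plot_confusion_matrix` is a hypothetical name used only in this sketch. It follows sklearn's convention of rows as true classes and columns as predicted classes. The demo call uses a made-up matrix; in the notebook you would pass `confusion_matrix(y_test1, test_pred1)`. Since the notebook already imports seaborn, `sns.heatmap(cm, annot=True, fmt='d')` is a shorter alternative.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, labels, title="Confusion matrix"):
    """Draw a square confusion matrix (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm)
    fig, ax = plt.subplots(figsize=(4, 4))
    im = ax.imshow(cm, cmap="Blues")
    threshold = cm.max() / 2
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            # White text on dark cells, black text on light cells
            ax.text(j, i, str(cm[i, j]), ha="center", va="center",
                    color="white" if cm[i, j] > threshold else "black")
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted class")
    ax.set_ylabel("True class")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
    return fig, ax


# Demo on a made-up 2 x 2 matrix; in the notebook, pass confusion_matrix(y_test1, test_pred1)
fig, ax = plot_confusion_matrix([[50, 5], [8, 37]], labels=["0 (fake)", "1 (real)"])
plt.show()
```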

In [7]:
# Obtain the confusion matrix and print classification report for the classifier
from sklearn.metrics import confusion_matrix,classification_report

print(confusion_matrix(y_test1,test_pred1))
print(classification_report(y_test1,test_pred1))


[[207  22]
 [ 45 138]]
              precision    recall  f1-score   support

           0       0.82      0.90      0.86       229
           1       0.86      0.75      0.80       183

    accuracy                           0.84       412
   macro avg       0.84      0.83      0.83       412
weighted avg       0.84      0.84      0.84       412
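The numbers in the classification report can be re-derived from the confusion matrix above. In sklearn's `confusion_matrix`, rows are true classes and columns are predicted classes. For a class k, TP is the diagonal cell, FP is the rest of column k, and FN is the rest of row k. The pure-Python sketch below copies the counts from the printed matrix. `per_class_metrics` is a hypothetical helper name.

```python
# Confusion matrix copied from the printed output above
# (sklearn convention: rows = true class, columns = predicted class)
cm = [[207, 22],
      [45, 138]]


def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix."""
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]                                # true k, predicted k
        fp = sum(cm[i][k] for i in range(n)) - tp    # predicted k, true class differs
        fn = sum(cm[k]) - tp                         # true k, predicted another class
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)             # harmonic mean of precision and recall
        metrics[k] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)
# Macro average: unweighted mean of each metric over the classes
macro = [sum(m[i] for m in metrics.values()) / len(metrics) for i in range(3)]

for k, (p, r, f) in metrics.items():
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
print(f"macro avg: precision={macro[0]:.2f} recall={macro[1]:.2f} f1={macro[2]:.2f}")
```

Run as-is, it prints 0.82/0.90/0.86 for class 0, 0.86/0.75/0.80 for class 1, an accuracy of 0.84, and macro averages of 0.84/0.83/0.83. These match the report above.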



**Q:** What is the total number of misclassified cases?

**A:**

After this activity, you should have a Naive Bayes classifier, built with the `sklearn` module, that predicts whether a banknote is real or fake.

---

### Submitting the Project:

1. After finishing the project, click on the **Share** button in the top-right corner of the notebook. A new dialog box will appear.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, make sure that the '**Anyone on the Internet with this link can view**' option is selected and then click on the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>

3. The link to the duplicate copy of the notebook (named **YYYY-MM-DD_StudentName_Project125**) will be copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.
   
   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named **YYYY-MM-DD_StudentName_Project125** in the URL box and then click on the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>

---