# Case Study - Handwritten Digit Classification Using MNIST Data

## General Description
Your task is to classify handwritten digits from the MNIST dataset using the Naive Bayes and SVM algorithms.

## About MNIST Dataset

The MNIST (Modified National Institute of Standards and Technology) dataset is a collection of grayscale images sized 28x28 pixels, containing handwritten digits ranging from 0 to 9. The dataset comprises a total of 70,000 handwritten images.
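Each 28x28 image holds 28 * 28 = 784 pixel values, and the loader used in the Helper section returns them as one flat row per image. The sketch below shows how such a row could be turned back into a 2D grid for display. It assumes the pixels are stored row by row, and the `flat` array is a hypothetical stand-in rather than a real MNIST sample.

```python
import numpy as np

IMAGE_SIDE = 28  # MNIST images are 28x28 pixels

# Hypothetical flat sample: 784 values standing in for one dataset row
flat = np.arange(IMAGE_SIDE * IMAGE_SIDE, dtype=float)

# Row-major reshape: the first 28 values become the top row of the image
image = flat.reshape(IMAGE_SIDE, IMAGE_SIDE)
print(flat.shape, "->", image.shape)
```

A grid of this shape can be passed to an image plotting function such as `matplotlib.pyplot.imshow` with a grayscale colormap.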

## Tasks

1. Check the number of data points for each digit label (0-9). Is there any class imbalance in the MNIST dataset being used? Explain your group's answer and provide evidence!

2. Display the first 15 images from the MNIST dataset along with their labels. You can refer to Job Sheet 03 for guidance.

3. Perform feature extraction on the MNIST data. In this process, you are allowed to:
   - Use the original pixel values of the images as features.
   - Perform other feature extraction methods such as histograms, PCA, or others. You are allowed to explore this process.

4. Create training and testing data using ratios of 70:30, 80:20, and 90:10.

5. Conduct classification using the Naive Bayes and SVM algorithms.
   - You are allowed to tune parameters.
   - You are allowed to explore different types of kernels for SVM.

6. Evaluate the models you have created on both the training and testing data.
   - Report the *accuracy* metric on both the training and testing data.
   - Utilize the *classification_report* function to understand the model's overall performance.
   - Use a confusion matrix to see which digits are mistaken for one another.

7. Display the testing data images along with their predicted labels. You can use Job Sheet 03 as a reference.

8. What is the best model you obtained? What is its configuration? What level of accuracy did you achieve? Explain!
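Tasks 4 to 6 can be organized as one loop over the split ratios. The following is a minimal sketch of that loop, not a reference solution. To stay small and runnable offline it uses scikit-learn's bundled 8x8 `load_digits` data as a stand-in for MNIST. The helper name `compare_models`, the choice of `GaussianNB`, the RBF kernel, and the fixed seed are all assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def compare_models(X, y, test_sizes=(0.3, 0.2, 0.1), seed=0):
    """Fit Naive Bayes and SVM on each split; return accuracies per (model, split)."""
    results = {}
    for test_size in test_sizes:
        # stratify=y keeps the digit proportions equal in the train and test parts
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        models = {"naive_bayes": GaussianNB(), "svm_rbf": SVC(kernel="rbf")}
        for name, model in models.items():
            model.fit(X_train, y_train)
            results[(name, test_size)] = {
                "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
                "test_accuracy": accuracy_score(y_test, model.predict(X_test)),
            }
    return results


# Offline stand-in: the bundled 8x8 digits (pixel range 0-16), not MNIST itself
digits = load_digits()
results = compare_models(digits.data / 16.0, digits.target)
for (name, test_size), scores in results.items():
    print(f"{name:12s} test_size={test_size:.1f} "
          f"train={scores['train_accuracy']:.3f} test={scores['test_accuracy']:.3f}")
```

With the real data, `mnist.data` and `mnist.target` would replace the stand-in arrays. Training an SVM on all 70,000 images can take a long time, so starting with a subsample may be worthwhile. For the remaining parts of Task 6, `classification_report` and `confusion_matrix` from `sklearn.metrics` accept the same true and predicted label arrays as `accuracy_score`.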


# Helper

Here is a code snippet to help you download the MNIST dataset:

In [None]:
# Download MNIST Dataset
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784')


In [None]:
# Please read MNIST dataset description
# It may help you to understand the dataset
print(mnist.DESCR)

**Author**: Yann LeCun, Corinna Cortes, Christopher J.C. Burges  
**Source**: [MNIST Website](http://yann.lecun.com/exdb/mnist/) - Date unknown  
**Please cite**:  

The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples  

It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image b

In [None]:
# Inspect dataset keys
# It may help you to understand the dataset structure
mnist.keys()

dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
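The `target` key listed above holds the digit labels, which is the input needed for the class balance check in Task 1. Below is a small sketch of one way to count labels and summarize the balance. The helper name `label_distribution` and the sample labels are hypothetical. The labels are written as strings on the assumption that this loader returns them that way, which is worth verifying on the real data.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, share of all samples)} sorted by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Tiny hypothetical label list; with the real data, pass mnist.target instead
sample_labels = ["0", "1", "1", "7", "7", "7", "9", "9"]
distribution = label_distribution(sample_labels)
for label, (n, share) in distribution.items():
    print(f"digit {label}: {n} samples ({share:.1%})")

# Largest class size divided by smallest; a value near 1 suggests balanced classes
sizes = [n for n, _ in distribution.values()]
imbalance_ratio = max(sizes) / min(sizes)
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```

A bar chart of the counts is another common way to present the same evidence.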

In [None]:
print(mnist.data)

       pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  pixel9  \
0         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
1         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
2         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
3         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
4         0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
...       ...     ...     ...     ...     ...     ...     ...     ...     ...   
69995     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
69996     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
69997     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
69998     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
69999     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0   

       pixel10  ...  pixel7