# MNIST Classification
#### Dataset : mnist_784
- **Source** : sklearn.datasets -> fetch_openml('mnist_784', version = 1)
- **Structure**
    - DESCR key - Describes the dataset
    - data key - Dataset features
    - target key - Dataset labels
- The dataset has 70,000 images, each with 784 features (28 x 28 pixels). Each instance (row) represents one image, and each feature represents one pixel's intensity, from 0 (white) to 255 (black).
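
To make the 784 = 28 x 28 layout concrete, here is a minimal sketch that reshapes one flattened row back into an image grid. It uses a synthetic row rather than real MNIST data, and the names `row` and `image` are illustrative only.

```python
import numpy as np

# Synthetic stand-in for one dataset row: 784 pixel intensities in [0, 255].
# (Not real MNIST data; the values only make the layout easy to follow.)
row = np.arange(784) % 256

# Each row is a flattened 28 x 28 image, stored row by row.
image = row.reshape(28, 28)

print(image.shape)
# The pixel at grid position (1, 0) is element 28 of the flat row.
print(image[1, 0] == row[28])
```

The same `reshape(28, 28)` call applies to any single row of the feature matrix once the real dataset is loaded.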

#### Objective
- Build a classifier that predicts the label (digit) of an image from its 784 (28 x 28) pixel features.

#### Disclaimer
<ul>This project applies the techniques shown in the book Hands-On ML 2. We will try to outperform the results of its Chapter 3 classification project. We will also use code from the Hands-On ML 2 GitHub repository.</ul>

## Table of Contents
<ol>
<li><a href = '#setup'>Setup</a></li>
<li><a href = '#gather'>Gather</a></li>
<li><a href = '#asses'>Assess</a></li>
<li><a href = '#test'>Test set</a></li>
<li><a href = '#visualize-analyze'>Visualize & Analyze</a></li>
<li><a href = '#wrangle'>Data Wrangling</a></li>
<li><a href = '#ml'>Machine Learning</a></li>
<li><a href = '#conclusion'>Conclusion</a></li>
</ol>

<a id = 'setup'></a>
### 1. Setup

In [1]:
# Python ≥ 3.5 is required
import sys
assert sys.version_info >= (3, 5)

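# Scikit-Learn ≥ 0.20 is required
# (compare version numbers numerically; a plain string comparison would accept '0.9')
import sklearn
assert tuple(int(part) for part in sklearn.__version__.split('.')[:2]) >= (0, 20)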

# Common imports
import numpy as np
import pandas as pd
import os

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize = 14)
mpl.rc('xtick', labelsize = 12)
mpl.rc('ytick', labelsize = 12)

# Where to save the figures
PROJECT_ROOT_DIR = '.'
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, 'images')
os.makedirs(IMAGES_PATH, exist_ok = True)

def save_fig(fig_id, tight_layout = True, fig_extension = 'png', resolution = 300):
    path = os.path.join(IMAGES_PATH, fig_id + '.' + fig_extension)
    print('Saving figure', fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format = fig_extension, dpi = resolution)

<a id = 'gather'></a>
### 2. Gather

In [2]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version = 1, as_frame = False)
mnist.keys()

dict_keys(['data', 'target', 'frame', 'categories', 'feature_names', 'target_names', 'DESCR', 'details', 'url'])
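
As a rough sketch of how the `data` and `target` keys are typically used, the example below unpacks features and labels from a small stand-in dictionary. `mnist_like` is a hypothetical placeholder with three rows so the code runs without downloading anything, and it assumes the labels arrive as strings, which is how this dataset usually returns them when `as_frame = False`.

```python
import numpy as np

# Hypothetical stand-in shaped like the fetched dataset (3 rows instead of 70,000).
mnist_like = {
    'data': np.zeros((3, 784)),                          # features: one row per image
    'target': np.array(['5', '0', '4'], dtype=object),   # labels, assumed to be strings
}

X, y = mnist_like['data'], mnist_like['target']

# Cast the string labels to small integers so they can be compared numerically.
y = y.astype(np.uint8)

print(X.shape, y.dtype)
```

With the real `mnist` object, the same two lookups (`mnist['data']` and `mnist['target']`) would apply.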