# Binary Classification with MNIST - Detecting the digit 5

In this exercise, we will build a *Binary Classifier* that detects whether an image from the MNIST dataset represents the digit 5 (the positive class) or any other digit (not-5).

## Installing the packages

In [1]:
!pip install scikit-learn
!pip install numpy
!pip install pandas



### Step 1

Fetching the MNIST data samples from OpenML using `fetch_openml` from `sklearn.datasets`.

In [2]:
from sklearn.datasets import fetch_openml

# as_frame=True returns pandas objects; Step 3 relies on pandas' .isin()
mnist = fetch_openml('mnist_784', version='active', as_frame=True)
X, Y = mnist["data"], mnist["target"]
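`mnist_784` contains 70,000 images, and each row of `X` holds 784 pixel intensities (0 to 255): a 28×28 grayscale image flattened row by row. The sketch below uses a synthetic row instead of downloading MNIST, just to show how a flat row maps back to a 2-D image. With the real data, and assuming `X` is a pandas DataFrame, `X.iloc[0].to_numpy().reshape(28, 28)` would give the first image.

```python
import numpy as np

# Synthetic stand-in for one MNIST row: 784 pixel intensities in the range 0-255.
# (Hypothetical data; a real row would come from X.)
rng = np.random.default_rng(0)
flat_row = rng.integers(0, 256, size=784)

# MNIST stores each 28x28 image flattened row by row, so reshaping
# with NumPy's default (C, row-major) order recovers the 2-D image.
image = flat_row.reshape(28, 28)

print(flat_row.shape)  # (784,)
print(image.shape)     # (28, 28)

# Pixel (r, c) of the image is element r * 28 + c of the flat row.
print(image[3, 5] == flat_row[3 * 28 + 5])  # True
```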

### Step 2

Splitting the data into `Train` and `Test` sets of 60,000 and 10,000 samples, respectively.

In [3]:
# Training samples
train = X[:60000]
train_labels = Y[:60000]

# Testing samples
test = X[60000:]
test_labels = Y[60000:]
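The split above is purely positional: MNIST is distributed with the 60,000 training images first and the 10,000 test images last, so slicing at index 60000 reproduces the standard split without any shuffling. The sketch below wraps that idea in a hypothetical helper, `positional_split`, and runs it on a scaled-down synthetic array (70 rows, `n_train=60`) rather than the real data.

```python
import numpy as np

def positional_split(X, y, n_train):
    """Hypothetical helper: the first n_train rows go to train, the rest to test."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# Scaled-down synthetic stand-in for MNIST: 70 "images" of 784 pixels
# (MNIST itself has 70,000 rows and uses n_train=60000).
X_demo = np.zeros((70, 784))
y_demo = np.arange(70) % 10  # fake digit labels 0-9

X_tr, X_te, y_tr, y_te = positional_split(X_demo, y_demo, n_train=60)
print(X_tr.shape, X_te.shape)                 # (60, 784) (10, 784)
print(len(y_tr) + len(y_te) == len(y_demo))   # True
```

Because no row is shuffled, the test set is exactly the tail of the data, and no sample can appear in both sets.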

### Step 3

Transforming the `Train` and `Test` labels into binary (0/1) arrays for the **5-detector**: 1 for a 5, 0 for every other digit.

In [None]:
import numpy as np

# Use isin() on the pandas Series to create the binary labels
train_labels_binary = train_labels.isin(['5']).astype(np.int64)
test_labels_binary = test_labels.isin(['5']).astype(np.int64)


print(f"Unique values in train_labels_binary: {np.unique(train_labels_binary)}")
print(f"Data type of train_labels_binary: {train_labels_binary.dtype}")
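The OpenML targets are strings, which is why the code compares against `'5'` rather than the integer `5`; `isin([5])` would mark every image as not-5. The sketch below shows the same conversion on a few hypothetical string labels, using a plain NumPy comparison instead of pandas' `isin()`.

```python
import numpy as np

# A handful of hypothetical labels, stored as strings like the OpenML targets.
labels = np.array(['5', '0', '4', '1', '9', '5', '3'])

# Elementwise comparison gives a boolean mask; casting to int64
# yields 1 where the label is "5" and 0 everywhere else.
binary = (labels == '5').astype(np.int64)

print(binary)        # [1 0 0 0 0 1 0]
print(binary.sum())  # 2 positives
```

In the real training set the positive class is small (5s make up under 10% of MNIST), which matters when interpreting the classifier's results later.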

### Step 4

Training an `SGDClassifier` and testing it on unseen data (the `Testing samples`).

In [None]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(train, train_labels_binary)

# Predicting on test samples
print(sgd_clf.predict(test))
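With its default settings (`loss='hinge'`, `penalty='l2'`), `SGDClassifier` fits a linear SVM by stochastic gradient descent: it visits training samples one at a time and nudges the weights whenever a sample falls inside the margin. The sketch below is a simplified NumPy illustration of that update on hypothetical 2-D data. It is not scikit-learn's implementation, which adds learning-rate schedules, stopping criteria, and other options; the function names `sgd_hinge` and `predict` are made up for this example.

```python
import numpy as np

def sgd_hinge(X, y, lr=0.01, alpha=1e-4, epochs=20, seed=0):
    """Simplified SGD for a linear SVM (hinge loss + L2 penalty).

    X: (n_samples, n_features); y: labels in {0, 1}, mapped to {-1, +1} internally.
    """
    rng = np.random.default_rng(seed)
    y_signed = np.where(y == 1, 1.0, -1.0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # visit samples in a new random order each epoch
            margin = y_signed[i] * (X[i] @ w + b)
            # Gradient of the L2 penalty: always shrinks the weights slightly.
            w -= lr * alpha * w
            if margin < 1:
                # Sample is inside the margin (or misclassified):
                # the hinge-loss gradient pushes the boundary away from it.
                w += lr * y_signed[i] * X[i]
                b += lr * y_signed[i]
    return w, b

def predict(X, w, b):
    """Return 1 where the decision function is positive, else 0."""
    return (X @ w + b > 0).astype(np.int64)

# Hypothetical toy data: two well-separated clusters standing in for "5" vs "not-5".
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X_toy = np.vstack([X_pos, X_neg])
y_toy = np.array([1] * 50 + [0] * 50)

w, b = sgd_hinge(X_toy, y_toy)
print("training accuracy:", (predict(X_toy, w, b) == y_toy).mean())
```

On MNIST the same idea applies with 784 features instead of 2, which is why the learned model is a single weight per pixel plus an intercept.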
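`predict` only returns raw 0/1 labels; to judge them, they need to be compared against `test_labels_binary`. Because fewer than 10% of MNIST digits are 5s, plain accuracy is misleading: a model that always answers not-5 already scores above 90%. The sketch below computes accuracy, precision, and recall from confusion counts with plain NumPy on hypothetical arrays; with this notebook's variables you would pass `test_labels_binary.to_numpy()` and `sgd_clf.predict(test)`. The helper name `binary_metrics` is made up for illustration (scikit-learn also provides `accuracy_score`, `precision_score`, and `recall_score` in `sklearn.metrics`).

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels (1 = "is a 5")."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # 5s correctly detected
    fp = np.sum((y_pred == 1) & (y_true == 0))  # not-5s wrongly flagged as 5
    fn = np.sum((y_pred == 0) & (y_true == 1))  # 5s that were missed
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(accuracy), float(precision), float(recall)

# Hypothetical labels: 1 positive in 10, roughly the share of 5s in MNIST.
y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

# A "classifier" that always predicts not-5: 90% accuracy, but it never finds a 5.
print(binary_metrics(y_true, np.zeros_like(y_true)))  # (0.9, 0.0, 0.0)

# A classifier that finds the 5 but raises one false alarm: same accuracy, recall 1.0.
y_pred = np.array([0, 1, 0, 1, 0, 0, 0, 0, 0, 0])
print(binary_metrics(y_true, y_pred))  # (0.9, 0.5, 1.0)
```

Both predictors have identical accuracy, yet only the second one is useful as a 5-detector, which is why precision and recall are worth checking alongside accuracy.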