# MNIST Data Prediction Using Daimensions

MNIST is a well-known dataset of handwritten digits and a standard benchmark for machine learning models. Here, we test how well Daimensions performs on it.

## 0. Setup

We'll download the CSV from OpenML, load it into a pandas DataFrame, and split it into training and validation sets, each saved as its own CSV file.

In [3]:
# using pandas to get csv as a dataframe and see how it looks
import pandas as pd
from sklearn.model_selection import train_test_split

dataset_url = 'https://www.openml.org/data/get_csv/52667/mnist_784.csv'
data = pd.read_csv(dataset_url)
data.describe()

statistic,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784,class
count,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,...,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0,70000.0
mean,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.099543,0.046629,0.016614,0.012957,0.001714,0.0,0.0,0.0,0.0,4.452429
std,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,4.256304,2.783732,1.561822,1.553796,0.320889,0.0,0.0,0.0,0.0,2.890195
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0
max,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,254.0,253.0,253.0,254.0,62.0,0.0,0.0,0.0,0.0,9.0
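
The summary above shows several pixel columns whose standard deviation is 0, meaning they hold the same value in every row. As a minimal sketch, such columns and the per-class counts could be listed as follows. The helper name `constant_columns` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def constant_columns(df):
    """Return the names of columns that hold a single value in every row."""
    return [col for col in df.columns if df[col].nunique(dropna=False) == 1]


# Toy stand-in for the MNIST frame (hypothetical values).
demo = pd.DataFrame({
    "pixel1": [0, 0, 0, 0],
    "pixel2": [0, 254, 0, 13],
    "class": [3, 7, 3, 1],
})

constant = constant_columns(demo)
class_counts = demo["class"].value_counts().sort_index()
print(constant)
print(class_counts)
```

On the real data, the same two calls would be applied to `data` instead of `demo`.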


In [4]:
# split data into training and validation CSVs; y is the target column ('class')
y = data['class']
X = data.drop('class', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
pd.concat([X_train, y_train], axis=1).to_csv('mnist_train.csv', index=False)
pd.concat([X_test, y_test], axis=1).to_csv('mnist_valid.csv', index=False)
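
Because the split above is random, the rows that land in each file change from run to run. A minimal sketch of a reproducible, class-stratified variant is shown below. The helper name `split_to_csv`, the seed value, and the choice to stratify are assumptions made for illustration; the original notebook uses a plain random split.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_to_csv(df, target, train_path, valid_path, test_size=0.2, seed=0):
    """Write stratified train/validation CSVs with the target as the last column."""
    y = df[target]
    X = df.drop(target, axis=1)
    # random_state fixes the shuffle; stratify keeps class proportions equal.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    train = pd.concat([X_train, y_train], axis=1)
    valid = pd.concat([X_valid, y_valid], axis=1)
    train.to_csv(train_path, index=False)
    valid.to_csv(valid_path, index=False)
    return train, valid


# Tiny synthetic stand-in for the MNIST frame: 2 "pixels", 2 balanced classes.
demo = pd.DataFrame({
    "pixel1": range(20),
    "pixel2": range(20, 40),
    "class": [0, 1] * 10,
})

with tempfile.TemporaryDirectory() as tmp:
    train, valid = split_to_csv(
        demo, "class", Path(tmp) / "train.csv", Path(tmp) / "valid.csv"
    )
    reloaded = pd.read_csv(Path(tmp) / "valid.csv")

print(len(train), len(valid), list(reloaded.columns))
```

Keeping the target as the last column matches the layout written by the cell above.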

## 1. Get Measurements

We always want to measure our data before building our predictor in order to ensure we are building the right model. For more information about how to use Daimensions and why we want to measure our data beforehand, check out the Titanic notebook.

In [5]:
! btc -measureonly mnist_train.csv 

Brainome Daimensions(tm) 0.99 Copyright (c) 2019, 2020 by Brainome, Inc. All Rights Reserved.
Licensed to: Alexander Makhratchev
Expiration date: 2021-04-30 (64 days left)
Number of threads: 1
Maximum file size: 30720MB
Running locally.



## 2. Build the Predictor

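Based on the measurements, Daimensions recommends a neural network (higher expected generalization) and more effort for this dataset. In the command below, `-f NN` requests a neural network, `-e 5` raises the effort level, and `-o mnist_predict.py` names the generated predictor file.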

In [6]:
! ./btc -vvv -f NN mnist_train.csv -o mnist_predict.py -e 5

Brainome Daimensions(tm) 0.99 Copyright (c) 2019, 2020 by Brainome, Inc. All Rights Reserved.
Licensed to: Alexander Makhratchev
Expiration date: 2021-04-30 (64 days left)
Number of threads: 1
Maximum file size: 30720MB
Running locally.

Input: mnist_train.csv

Cleaning...-

## 3. Validate the Model

Now we can validate our model on a separate set of data that wasn't used for training.

In [7]:
! python3 mnist_predict.py -validate mnist_valid.csv

python3: can't open file 'mnist_predict.py': [Errno 2] No such file or directory
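
The error above means `mnist_predict.py` did not exist when the validation cell ran. A minimal sketch of a guard that checks for the predictor file before launching validation is shown below. The helper name `validate_if_built` is a hypothetical addition; the command it runs is the same `-validate` call used in the cell above.

```python
import subprocess
import sys
from pathlib import Path


def validate_if_built(predictor="mnist_predict.py", valid_csv="mnist_valid.csv"):
    """Run the predictor's -validate mode only if the predictor file exists.

    Returns the process exit code, or None when the predictor is missing.
    """
    if not Path(predictor).is_file():
        print(f"{predictor} not found; rerun the build step before validating.")
        return None
    # Same command as the notebook cell, using the current Python interpreter.
    result = subprocess.run([sys.executable, predictor, "-validate", valid_csv])
    return result.returncode


# Demonstration with a file name that does not exist (hypothetical).
status = validate_if_built("no_such_predictor.py")
```

With a missing predictor the helper prints a hint and returns `None` instead of surfacing the interpreter's file-not-found error.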


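In the run recorded above, validation failed because `mnist_predict.py` was not found: the build log in Section 2 stops at `Cleaning...`, so the predictor file was never written. Once the build step completes, rerunning this cell validates the model's accuracy and prints the confusion matrix, which compares the actual target classes (columns) with the predicted classes (rows). Diagonal values are the correctly predicted samples.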
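
To make the confusion-matrix orientation described above concrete (actual classes as columns, predicted classes as rows), here is a minimal pure-Python sketch. The function names and the toy labels are illustrative assumptions; this is not the code Daimensions generates, and the numbers are not results from this notebook.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count samples with predicted classes as rows and actual classes as columns."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[p][a] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy labels for a 3-class problem (hypothetical values).
actual = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 0]

cm = confusion_matrix(actual, predicted, n_classes=3)
for row in cm:
    print(row)
print(accuracy(cm))
```

Here the single off-diagonal entry sits in row 2, column 1: one sample whose actual class is 1 was predicted as 2.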