### Introduction
----------------
This is a modified version of TensorFlow's tutorial [MNIST For ML Beginners](https://www.tensorflow.org/get_started/mnist/beginners).

### Data Description
----------------
* Training data comes from **THE MNIST DATABASE of handwritten digits** (60,000 image/label pairs): http://yann.lecun.com/exdb/mnist/
  * [train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz):  training set images (9912422 bytes) 
  * [train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz):  training set labels (28881 bytes)
  

* Testing data (10,000 image/label pairs):
  * [t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz): test set images (1648877 bytes) 
  * [t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz): test set labels (4542 bytes)
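The `seek(16)` and `seek(8)` calls in the code below skip the IDX file headers described on the MNIST page: a 4-byte magic number (2051 for image files, 2049 for label files) followed by one big-endian 32-bit size per dimension. That gives 4 + 3×4 = 16 bytes for images and 4 + 4 = 8 bytes for labels. As a minimal sketch, assuming you would rather not decompress the downloads first, the hypothetical helper `load_idx_gz` parses that header with `struct` and reads the `.gz` files directly through Python's `gzip` module:

```python
import gzip
import struct

import numpy as np


def load_idx_gz(path):
    """Read a gzip-compressed IDX file into a NumPy array (hypothetical helper).

    Assumes the unsigned-byte data type (code 0x08), which all four MNIST files use.
    """
    with gzip.open(path, 'rb') as f:
        # Magic number: two zero bytes, a data-type code, and the number of dimensions.
        zero, dtype_code, ndim = struct.unpack('>HBB', f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError('unexpected IDX header in %s' % path)
        # One big-endian unsigned 32-bit size per dimension, e.g. (60000, 28, 28).
        shape = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
        # Everything after the header is raw pixel or label bytes.
        data = np.frombuffer(f.read(), dtype=np.uint8)
    # np.frombuffer returns a read-only array; copy so callers can modify it.
    return data.reshape(shape).copy()
```

For example, `load_idx_gz('./MNIST-DATA/train-labels-idx1-ubyte.gz')` should return a 1-D array of 60,000 labels. Reading the shape from the header also avoids hardcoding 60000 and 10000.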

### References
----------------
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998. [[on-line version]](http://yann.lecun.com/exdb/publis/index.html#lecun-98)

### API References
----------------
The following links are the API references, in the order the functions appear in the code:
* [np.fromfile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html)
* [file.seek](https://www.tutorialspoint.com/python/file_seek.htm)
* [seek, then read with fromfile](http://stackoverflow.com/questions/14245094/how-to-read-part-of-binary-file-with-numpy)
* [np.reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html)
* [display a reshaped np.ndarray as an image](http://stackoverflow.com/questions/2659312/how-do-i-convert-a-numpy-array-to-and-display-an-image)
* [numpy data types](https://docs.scipy.org/doc/numpy/user/basics.types.html)
* [numpy.random.choice](https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html)
* [numpy.mean](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html)
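`np.random.choice` and `np.mean` are listed above but are not used in the cells shown in this section. In a training loop modeled on the TensorFlow tutorial they would plausibly serve to draw random mini-batches and to compute classification accuracy. The two helpers below (`sample_batch` and `accuracy`, both hypothetical names) are a sketch of that usage, not the notebook's actual code:

```python
import numpy as np


def sample_batch(images, labels, batch_size):
    """Draw a random mini-batch without replacement (hypothetical helper)."""
    idx = np.random.choice(len(images), size=batch_size, replace=False)
    return images[idx], labels[idx]


def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the true labels (mean of a boolean array)."""
    return np.mean(predicted_labels == true_labels)
```

For example, `batch_x, batch_y = sample_batch(train_images, train_labels, 100)` would return 100 images with their matching labels, because the same index array selects from both.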

```python
import numpy as np
import tensorflow as tf


def read_data(train_data_path, train_label_path,
              test_data_path, test_label_path):
    """Load the four uncompressed MNIST IDX files into NumPy arrays."""
    with open(train_data_path, 'rb') as f:  # train images
        f.seek(16)  # skip the 16-byte header (magic number + 3 dimension sizes)
        train_data = np.fromfile(f, dtype=np.uint8)
        train_data = np.reshape(train_data, (60000, 28, 28))
    with open(train_label_path, 'rb') as f:  # train labels
        f.seek(8)  # skip the 8-byte header (magic number + item count)
        train_label = np.fromfile(f, dtype=np.uint8)  # IDX labels are unsigned bytes
    with open(test_data_path, 'rb') as f:  # test images
        f.seek(16)
        test_data = np.fromfile(f, dtype=np.uint8)
        test_data = np.reshape(test_data, (10000, 28, 28))
    with open(test_label_path, 'rb') as f:  # test labels
        f.seek(8)
        test_label = np.fromfile(f, dtype=np.uint8)

    return train_data, train_label, test_data, test_label

# Expects the decompressed files (gunzip the .gz downloads) in this directory.
prefix = './MNIST-DATA/'
train_images, train_labels, test_imgs, test_labels = read_data(prefix+'train-images-idx3-ubyte',
                                                               prefix+'train-labels-idx1-ubyte',
                                                               prefix+'t10k-images-idx3-ubyte',
                                                               prefix+'t10k-labels-idx1-ubyte')
```
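Since NumPy 1.17, `np.fromfile` also accepts an `offset` argument (in bytes) for binary files, which can replace the separate `seek` call. The hypothetical `read_idx` helper below is a sketch of that variant; it also infers the item count with `-1` instead of hardcoding 60000 or 10000:

```python
import numpy as np


def read_idx(path, header_bytes, item_shape=()):
    """Read an uncompressed MNIST IDX file, skipping its header (hypothetical helper).

    header_bytes is 16 for image files and 8 for label files.
    """
    data = np.fromfile(path, dtype=np.uint8, offset=header_bytes)
    # -1 lets NumPy infer the number of items from the file size.
    return data.reshape((-1,) + tuple(item_shape))
```

With this helper, `read_idx(prefix + 'train-images-idx3-ubyte', 16, (28, 28))` and `read_idx(prefix + 'train-labels-idx1-ubyte', 8)` should yield the same shapes as `read_data`.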

```python
print(np.shape(train_images))
print(np.shape(train_labels))
print(np.shape(test_imgs))
print(np.shape(test_labels))
```

```
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)
```
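The TensorFlow beginners tutorial feeds each image to the model as a flattened vector of 28 × 28 = 784 numbers with pixel intensities between 0 and 1. Assuming the later cells follow that setup, here is a sketch of converting the `(N, 28, 28)` `uint8` arrays above into that form (the function name `to_feature_matrix` is hypothetical):

```python
import numpy as np


def to_feature_matrix(images):
    """Flatten (N, 28, 28) uint8 images to (N, 784) float32 values in [0, 1]."""
    flat = images.reshape(images.shape[0], -1)
    # Divide by the maximum uint8 value so 0 maps to 0.0 and 255 maps to 1.0.
    return flat.astype(np.float32) / 255.0
```

For example, `to_feature_matrix(train_images)` would produce a `(60000, 784)` array.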


```python
import time

# To measure how long a cell takes:
# start = time.time()
# end = time.time()
# print(end - start)

def gen_one_hot(labels, _class):
    """Convert integer labels into one-hot rows of length _class."""
    one_hot_labels = np.zeros(shape=(labels.shape[0], _class), dtype=np.int64)
    for index, label in enumerate(labels):
        one_hot_labels[index, label] = 1  # set only the column of the true class
    return one_hot_labels

test = gen_one_hot(test_labels, 10)
```
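The loop in `gen_one_hot` visits every label in Python. A vectorized sketch (the name `gen_one_hot_fast` is hypothetical) uses the labels to index rows of an identity matrix, since row `i` of the identity is exactly the one-hot code for class `i`:

```python
import numpy as np


def gen_one_hot_fast(labels, num_classes):
    """Vectorized one-hot encoding: pick row `label` of the identity matrix for each label."""
    return np.eye(num_classes, dtype=np.int64)[labels]
```

For example, `gen_one_hot_fast(test_labels, 10)` returns a `(10000, 10)` array. Inside a TensorFlow graph, `tf.one_hot(labels, depth=10)` performs the same encoding, with a float32 result by default.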