# Build and train a logistic regression model with MNIST dataset 

## Table of contents
1. [MNIST database](#MNIST)
2. [Dataset, loss, and accuracy](#data)
3. [Build and train a logistic regression model with MNIST dataset](#logre)

<div class="alert alert-block alert-info"> <b>

### 1. MNIST database  <a name="MNIST"></a>

</b></div>

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It was constructed from NIST's Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), both of which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.

Introduced by [LeCun et al. in Gradient-based learning applied to document recognition](https://arxiv.org/pdf/1102.0183.pdf)

Source: http://yann.lecun.com/exdb/mnist/ 
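
The centering step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the original NIST/MNIST preprocessing code: the function name `center_by_mass`, the rounding of the shift to whole pixels, and the toy 20x20 input are all assumptions made for this example. The image is assumed to contain at least one non-zero pixel.

```python
def center_by_mass(digit, field=28):
    """Place a small grey-level image on a field x field canvas so that its
    center of mass lands (up to rounding) at the center of the canvas."""
    total = sum(sum(row) for row in digit)
    # Intensity-weighted mean row index and column index.
    com_r = sum(r * v for r, row in enumerate(digit) for v in row) / total
    com_c = sum(c * v for row in digit for c, v in enumerate(row)) / total
    center = (field - 1) / 2  # 13.5 for a 28x28 field
    dr, dc = round(center - com_r), round(center - com_c)
    canvas = [[0] * field for _ in range(field)]
    for r, row in enumerate(digit):
        for c, v in enumerate(row):
            # Pixels shifted outside the canvas are dropped.
            if 0 <= r + dr < field and 0 <= c + dc < field:
                canvas[r + dr][c + dc] = v
    return canvas

# Toy 20x20 image: a 4x4 bright blob in the top-left corner.
digit = [[255 if r < 4 and c < 4 else 0 for c in range(20)] for r in range(20)]
canvas = center_by_mass(digit)
```

In this toy case the blob's center of mass is at row and column 1.5, so the blob is shifted by 12 pixels in each direction and ends up in the middle of the 28x28 canvas.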


<div class="alert alert-block alert-info"> <b>

### 2. Dataset, loss, and accuracy  <a name="data"></a>

</b></div>

### Training dataset and test dataset
Define a dataset as $D = \{(x_j, y_j)\}$ and $D=D_1 \cup D_2$, where:
* $D_1$ is the training dataset and $|D_1|$ is the number of examples in $D_1$.
* $D_2$ is the test dataset and $|D_2|$ is the number of examples in $D_2$.


### Training loss and test loss
For a dataset of $N$ examples, define the loss function:
$$L(\theta) :=\frac{1}{N} \sum_{j=1}^N\ell(y_j, h(x_j; \theta)).$$
Here $\ell(y_j,h(x_j; \theta))$ measures the discrepancy between the true label $y_j$ and the prediction, and $h(x_j; \theta)$ is the predicted probability distribution over the classes for the input $x_j$.
* Training loss is the loss evaluated on $D_1$: $\frac{1}{|D_1|} \sum_{(x_j, y_j) \in D_1}\ell(y_j, h(x_j; \theta)).$
* Test loss is the loss evaluated on $D_2$: $\frac{1}{|D_2|} \sum_{(x_j, y_j) \in D_2}\ell(y_j, h(x_j; \theta)).$
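
As a small illustration of the definition, the sketch below evaluates the average loss when $\ell$ is taken to be the cross-entropy loss. That choice is an assumption made here, since the text leaves $\ell$ general. The function names and the toy probabilities are also hypothetical.

```python
import math

def cross_entropy(label, probs):
    """Per-example loss l(y, h(x; theta)): negative log-probability of the true class."""
    return -math.log(probs[label])

def average_loss(labels, predictions):
    """L(theta): mean of the per-example losses over a dataset."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, predictions)) / len(labels)

# Toy dataset with 3 classes; each row of `predictions` plays the role of h(x_j; theta).
labels = [0, 2]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
loss = average_loss(labels, predictions)
print(f"average loss: {loss:.4f}")
```

Calling `average_loss` on the training examples gives the training loss, and calling it on the test examples gives the test loss.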

### Training accuracy and test accuracy
* Training accuracy $= \frac{\text{number of correct classifications in the training dataset}}{\text{total number of examples in the training dataset}}$
* Test accuracy $= \frac{\text{number of correct classifications in the test dataset}}{\text{total number of examples in the test dataset}}$

Remark: We usually classify with the arg max rule. For a given data point $x$, we first compute $h(x;\theta)$, then assign $x$ to the class $i= \arg\max_j h_j(x; \theta)$, that is, the class with the highest predicted probability.
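
The classification rule and the accuracy ratio can be sketched as follows. The helper names `predict_class` and `accuracy` and the toy predictions are hypothetical and chosen only for this example.

```python
def predict_class(probs):
    """Return the index i = argmax_j h_j(x; theta) of the largest probability."""
    return max(range(len(probs)), key=lambda j: probs[j])

def accuracy(labels, predictions):
    """Fraction of examples whose predicted class equals the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if predict_class(p) == y)
    return correct / len(labels)

labels = [0, 2, 1, 1]
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.5, 0.3, 0.2],  # misclassified: predicted class 0, true class 1
    [0.2, 0.6, 0.2],
]
acc = accuracy(labels, predictions)
print(f"accuracy: {acc:.2f}")
```

Three of the four toy examples are classified correctly, so the accuracy is 3/4.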


### Epoch vs Batch Size vs Iterations

When the dataset is too large to pass to the machine all at once, which happens all the time in machine learning, we divide it into smaller batches and update the weights of the neural network at the end of every step to fit the data it has seen.


#### Epoch

One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly once. Since one epoch is too big to feed to the computer at once, we divide it into several smaller batches.

#### Batch

A batch is a subset of the training examples that is processed together. The batch size is the total number of training examples present in a single batch.

#### Iteration

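An iteration is one forward and backward pass over a single batch, followed by a weight update. The number of iterations per epoch is the number of batches needed to complete one epoch.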


When the model goes through all 60,000 training images once, learning how to classify the digits 0-9, that is considered 1 epoch. The batch size determines how many images the model trains on at a time; here it is 100. Each time the model updates its weights (parameters) after looking at one batch of 100 images, that is considered 1 iteration. We arbitrarily set 3000 iterations here, which means the model updates its weights 3000 times.

One epoch consists of 60,000 / 100 = 600 iterations. Because we would like to go through 3000 iterations, this implies we would have 3000 / 600 = 5 epochs as each epoch has 600 iterations.  


* Total data: 60,000
* Batch size: 100 (number of examples in 1 iteration)
* Iterations: 3,000 (one iteration = one batch forward & backward pass)

$$\text{epochs} = \frac{\text{iterations}}{\text{total data} / \text{batch size}} = \frac{3000}{60000/100} = 5$$
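
The same arithmetic can be checked in code. The sketch below uses hypothetical variable names and only counts weight updates in a nested epoch/batch loop; no model is trained.

```python
total_data = 60_000
batch_size = 100
n_iterations = 3_000

iterations_per_epoch = total_data // batch_size    # 600 batches per epoch
num_epochs = n_iterations // iterations_per_epoch  # 5 epochs

# Count the weight updates that a nested epoch/batch training loop would perform.
updates = 0
for epoch in range(num_epochs):
    for start in range(0, total_data, batch_size):
        # In real training, examples start .. start + batch_size - 1 would be
        # passed forward and backward here, followed by one weight update.
        updates += 1

print(iterations_per_epoch, num_epochs, updates)
```

The loop performs one update per batch, so after 5 epochs the update count matches the 3,000 iterations chosen above.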

<div class="alert alert-block alert-info"> <b>

### 3. Build and train a logistic regression model with MNIST dataset  <a name="logre"></a>

</b></div> 

We train and test a logistic regression model on the MNIST dataset. This dataset contains 60,000 images for training and 10,000 images for testing the out-of-sample performance.

### Steps
* Step 1: Load MNIST Train and Test Datasets
* Step 2: Load Dataset into DataLoader
* Step 3: Build Model with nn.Module
* Step 4: Instantiate Model Class
* Step 5: Instantiate Loss Class
* Step 6: Instantiate Optimizer Class
* Step 7: Train Model