# Fashion MNIST
Use this notebook as a skeleton for developing your own network to solve this classification problem!
Feel free to experiment with what you've learned so far (in fact, it's encouraged). Don't be afraid to ask questions and try different architectures.
Be conscious of what you don't know so that you know what to ask/look for.

No GPU required!

The 7 basic steps for building a model are:
 1. Load Dataset
 2. Make Dataset Iterable
 3. Create Model Class
 4. Instantiate Model Class
 5. Instantiate Loss Class
 6. Instantiate Optimizer Class
 7. Train Model

I have handled steps 1 and 2 for you. Please handle the rest!

### Run the below cells until 'stop' to get your data processed and loaded

In [1]:
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import pandas as pd
import numpy as np
import re
import string


from torch.utils.data import Dataset, DataLoader

In [2]:
'''
STEP 1: LOAD DATASET
'''
test_df = pd.read_csv('fashionmnist/fashion-mnist_test.csv')
test_labels_df = test_df['label']
test_pixels_df = test_df.drop('label', axis=1)

'''
If you're curious about how I did this, see the cells below. If not, just skip to STEP 1.5.

Pandas is a library for data processing. You might run into dask.DataFrame at some point if you continue with ML.
dask.DataFrame is built on top of Pandas for concurrent and parallelized computing, which matters when
working with datasets so large that you need multiple machines to handle them. This is part of the data pipeline!
'''

# This reads the csv file into a pandas dataframe
train_df = pd.read_csv('fashionmnist/fashion-mnist_train.csv')
train_df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,6,0,0,0,0,0,0,0,5,0,...,0,0,0,30,43,0,0,0,0,0
3,0,0,0,0,1,2,0,0,0,0,...,3,0,0,0,0,1,0,0,0,0
4,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [3]:
# We create a new dataframe without the 'label' column here so we only get the pixel data
# The original dataframe train_df is unmodified
train_pixels_df = train_df.drop('label', axis=1)
train_pixels_df.head()

Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,5,0,0,...,0,0,0,30,43,0,0,0,0,0
3,0,0,0,1,2,0,0,0,0,0,...,3,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
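
The comment in the cell above says that `drop` returns a new DataFrame and leaves the original untouched. Here is a minimal sketch that checks this on a tiny made-up frame. The name `toy_df` and its values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# A tiny stand-in for train_df: one label column and two pixel columns (made-up values)
toy_df = pd.DataFrame({"label": [2, 9], "pixel1": [0, 5], "pixel2": [30, 0]})

# drop returns a new DataFrame; toy_df itself keeps its 'label' column
toy_pixels_df = toy_df.drop("label", axis=1)

print(list(toy_pixels_df.columns))
print(list(toy_df.columns))
```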


In [4]:
# Now we grab only the labels. Keep in mind that we do not change the order of either the pixel values or the labels,
# so that they stay consistent
train_labels_df = train_df['label']
train_labels_df.values

array([2, 9, 6, ..., 8, 8, 7])

In [5]:
'''
STEP 1.5: defining and instantiating Dataset subclass 
'''

'''
This is our custom Dataset class. Remember from the first meeting that we need this to pipeline our data into training our model.

The pipeline is important!!! At larger scale, machine learning can get bottlenecked at disk reads (in image classification, for example),
so understanding the various stages is important. We don't have to worry about that kind of thing now since we're just creating small
project models as opposed to complex production models.

NOTE: this is not the only way to create a dataset. An alternative is to simply pass in a dataframe that contains both pixel and label data.
Then we can index the label and pixel data inside of __getitem__ as opposed to separating labels and pixel data beforehand like I did.
'''
class FashionDataset(Dataset):
    def __init__(self, dataframe, labels):
        self.labels = torch.LongTensor(labels)
        self.df = dataframe
        
    def __getitem__(self, index):
        # I'm using .loc to access the row of the dataframe by index
        # HINT: You don't need to do this, but try normalizing your image vector before making it a torch Tensor.
        # BONUS: train your model with and without normalization and see what happens
        img = torch.Tensor(self.df.loc[index].values)
        label = self.labels[index]
        return img, label

    def __len__(self):
        return len(self.labels)
    
    
'''
This class provides image data as a (1, 28, 28) tensor as opposed to a (784,) tensor. You
need this shape for conv2d layers, which are powerful for image recognition!
'''
class Fashion2DDataset(Dataset):
    def __init__(self, dataframe, labels):
        self.labels = torch.LongTensor(labels)
        self.df = dataframe
        
    def __getitem__(self, index):
        # I'm using .loc to access the row of the dataframe by index
        a = self.df.loc[index].values
        a = np.split(a, 28)
        a = np.array([a])
        img = torch.Tensor(a)
        
        label = self.labels[index]
        return img, label

    def __len__(self):
        return len(self.labels)
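
The NOTE above mentions an alternative where a single dataframe holds both pixels and labels, and the HINT suggests normalizing the image vector. The sketch below combines the two ideas. The class name `CombinedFashionDataset`, the toy frame, and the choice of dividing by 255 are all assumptions made for illustration. This is one possible way to do it, not the only one.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class CombinedFashionDataset(Dataset):
    """Hypothetical variant: takes one DataFrame holding both 'label' and pixel columns."""

    def __init__(self, dataframe):
        self.df = dataframe.reset_index(drop=True)
        # Every column except 'label' is treated as pixel data
        self.pixel_columns = [c for c in self.df.columns if c != "label"]

    def __getitem__(self, index):
        row = self.df.iloc[index]
        # Scale raw pixel values from [0, 255] to [0, 1] (one simple normalization choice)
        pixels = row[self.pixel_columns].to_numpy(dtype=np.float32) / 255.0
        img = torch.from_numpy(pixels)
        label = torch.tensor(int(row["label"]), dtype=torch.long)
        return img, label

    def __len__(self):
        return len(self.df)


# Tiny made-up frame with 4 "pixels" per image, just to exercise the class
toy_df = pd.DataFrame(
    [[2, 0, 255, 51, 0], [9, 255, 0, 0, 102]],
    columns=["label", "pixel1", "pixel2", "pixel3", "pixel4"],
)
toy_dataset = CombinedFashionDataset(toy_df)
img, label = toy_dataset[0]
print(img, label)
```

With the real data you would pass `train_df` directly instead of `toy_df`, assuming it contains only the `label` column and the pixel columns.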

In [6]:
'''
STEP 2: MAKING DATASET ITERABLE
'''
train_dataset = FashionDataset(train_pixels_df, train_labels_df)
test_dataset = FashionDataset(test_pixels_df, test_labels_df)

'''
batch_size determines how many data samples to go through before
updating the weights of our model with SGD (stochastic gradient descent)

Currently at 100 but feel free to change this to whatever you want. You can consider
batch size a hyperparameter!
'''
batch_size = 100

# shuffle is True so that each batch mixes samples from all labels. The data is already shuffled in
# this case (you can verify this by looking through the training labels by running train_labels_df in its own cell).
# If this wasn't the case, and we had shuffle=False, we might end up training the model on label = 0 first and
# ending with label = 9. This would cause the model to 'forget' what label = 0 looked like
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

# shuffle=False because there's no reason to shuffle when testing
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

# stop
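
To make the effect of `batch_size` concrete, here is a minimal sketch that iterates over a `DataLoader` built from random stand-in data. The names `fake_images`, `fake_labels`, `fake_dataset`, and `fake_loader` are assumptions for illustration and are not the real Fashion MNIST data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 250 "images" of 784 values with labels in 0..9 (not the real dataset)
fake_images = torch.rand(250, 784)
fake_labels = torch.randint(0, 10, (250,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(dataset=fake_dataset, batch_size=100, shuffle=True)

# Each iteration yields one batch; the last batch is smaller when the
# dataset size is not a multiple of batch_size
batch_sizes = []
for images, labels in fake_loader:
    batch_sizes.append(images.shape[0])
    print(images.shape, labels.shape)

print(batch_sizes)  # [100, 100, 50]
```

One pass over the loader is one epoch, so the number of weight updates per epoch is the number of batches (here 3).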

Everything below this block is your responsibility! Best of luck.

In [None]:
# STEP 2.5: CLEANING DATA
movie_text = open('moviedialogues/movie_lines.txt', encoding='utf-8', errors='ignore').read().split('\n')
conv_lines = open('moviedialogues/movie_conversations.txt', encoding='utf-8', errors='ignore').read().split('\n')

lineToText = {}  # mapping of line number to text
# inputToOutput = {}
inputs = []
outputs = []
for line in movie_text:
    things = line.split("+++$+++")
    if len(things) == 5:
        # Strip surrounding whitespace so the line ID can be used as a dictionary key
        key = things[0].strip()
        # Remove punctuation from the spoken text
        val = things[4].strip().translate(str.maketrans('', '', string.punctuation))
        lineToText[key] = val


for conversation in conv_lines:
    things = conversation.split("+++$+++")
    if len(things) == 4:
        # things[3] is a string holding a bracketed, comma-separated list of line IDs.
        # Drop the brackets, split on commas, and strip quotes/whitespace from each ID.
        convo = [part.strip().strip("'\"") for part in things[3].strip().strip('[]').split(',')]
        # Pair each line with the line that follows it in the conversation
        for i in range(len(convo) - 1):
            inputSentenceIndex = convo[i]
            outputSentenceIndex = convo[i + 1]
            if (inputSentenceIndex in lineToText) and (outputSentenceIndex in lineToText):
                inputs.append(lineToText[inputSentenceIndex])
                outputs.append(lineToText[outputSentenceIndex])
                
            
print(len(inputs))
# for i in range(0, 10):
#     print(inputs[i])
#     print(outputs[i])
#     print("~~~~~")

In [None]:
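'''
STEP 3: CREATE MODEL CLASS
'''

# Skeleton only: the argument names below are placeholders, fill in the bodies yourself
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: define the encoder layers here

    def forward(self, x):
        raise NotImplementedError

    def hidden(self):
        raise NotImplementedError


class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: define the decoder layers here

    def forward(self, x):
        raise NotImplementedError

    def hidden(self):
        raise NotImplementedError


class Attention(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO: define the attention layers here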
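
STEP 4 instantiates a class named `FeedForwardModel`, which this notebook never defines. Below is a minimal sketch of what such a class could look like for flattened 784-value images and 10 classes. The layer sizes, the single hidden layer, and the ReLU activation are assumptions chosen for illustration, so treat this as a starting point and not as the intended solution.

```python
import torch
import torch.nn as nn


class FeedForwardModel(nn.Module):
    """A hypothetical feed-forward classifier: 784 inputs -> hidden layer -> 10 class logits."""

    def __init__(self, input_dim=28 * 28, hidden_dim=100, output_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # x has shape (batch_size, 784); the output holds one raw score (logit) per class
        out = self.fc1(x)
        out = self.relu(out)
        return self.fc2(out)


sketch_model = FeedForwardModel()
dummy_batch = torch.rand(5, 28 * 28)  # 5 random "images", only to check shapes
logits = sketch_model(dummy_batch)
print(logits.shape)  # torch.Size([5, 10])
```

The hidden size (100 here) is another hyperparameter you can experiment with.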

In [15]:
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
model = FeedForwardModel()

In [16]:
'''
STEP 5: INSTANTIATE LOSS CLASS
'''
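# Cross entropy is the usual loss for multi-class classification with integer labels (see HINT 2 at the bottom)
loss_func = torch.nn.CrossEntropyLoss()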
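
HINT 2 at the bottom of this notebook describes the loss as "softmax --> cross entropy loss". The sketch below, on made-up logits and labels, shows that `nn.CrossEntropyLoss` already applies the (log-)softmax internally, so a model trained with it should output raw logits without a softmax layer of its own. The tensors here are random and purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw model outputs for 4 samples, 10 classes
labels = torch.tensor([2, 9, 6, 0])    # integer class labels, like the 'label' column

# nn.CrossEntropyLoss takes raw logits and integer labels directly
ce = nn.CrossEntropyLoss()(logits, labels)

# Same value computed by hand: log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
```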

In [17]:
'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
"""
Most of the time I use SGD. Feel free to use another optimizer if you wish.
What hyperparameters would you use/set here?
"""
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
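
To see what the learning rate does, here is a minimal sketch of a single plain SGD step on one made-up parameter. The names `w` and `sketch_optimizer` and the toy loss are assumptions for illustration. Besides `lr`, `torch.optim.SGD` also accepts arguments such as `momentum` and `weight_decay`, which are further hyperparameters you could set.

```python
import torch

# A single made-up parameter so the update is easy to follow
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
sketch_optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()      # gradient of this loss is 2 * w
sketch_optimizer.zero_grad()
loss.backward()
sketch_optimizer.step()    # plain SGD: w <- w - lr * grad

print(w.data)  # approximately [0.8, -1.6]
```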

In [None]:
'''
STEP 7: TRAIN THE MODEL
'''
# This iteration variable keeps track of what iteration you're on so you may print out your progress during the training loop
iteration = 0

'''
Write your training loop here

HINT: You'll need two for loops: one over the epochs you wish to train and one to iterate over your train_loader
HINT 2: see the bottom of this doc if you want more hints on how to write your training loop
'''
#LOOP HERE

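for ep in range(2):
    for images, labels in train_loader:
        # Load images with gradient accumulation capabilities
        # (flattened to 784 values per image; adjust if your model expects another shape)
        images = images.view(-1, 28*28).requires_grad_()
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        # Forward pass to get outputs/logits
        outputs = model(images)
        # Calculate loss (assumes loss_func accepts logits and integer class labels, e.g. cross entropy)
        loss = loss_func(outputs, labels)
        # Getting gradients w.r.t. parameters
        loss.backward()
        # Updating parameters
        optimizer.step()
        # Keep track of how many batches have been processed
        iteration += 1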
        
    
        
        # I've left this for your use
"""
        The below code block prints out your iteration number, loss, and accuracy
        
        This may need to be modified depending on how you implemented steps 3-7
        
        If it doesn't work and you have no clue what is wrong send me your code so I may help debug!
"""
        
'''
        if iteration % YOUR_NUMBER == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images with gradient accumulation capabilities
                images = images.view(-1, 28*28).requires_grad_()
                
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))
'''
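
The accuracy check above can also be packaged as a small helper. The sketch below is one way to do that. The function name `evaluate_accuracy`, the stand-in linear model, and the random data are assumptions for illustration. It wraps the forward passes in `torch.no_grad()` since gradients are not needed when only measuring accuracy.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader):
    """Return the percentage of samples in `loader` that `model` classifies correctly."""
    correct = 0
    total = 0
    was_training = model.training
    model.eval()
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in loader:
            outputs = model(images.view(images.size(0), -1))
            predicted = outputs.argmax(dim=1)  # index of the largest logit per sample
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train(was_training)  # restore the mode the model was in before
    return 100.0 * correct / total


# Stand-in model and random data, only to show the helper running
toy_model = nn.Linear(784, 10)
toy_loader = DataLoader(
    TensorDataset(torch.rand(30, 784), torch.randint(0, 10, (30,))), batch_size=10
)
accuracy = evaluate_accuracy(toy_model, toy_loader)
print(accuracy)
```

Inside a training loop you could call a helper like this every so many iterations with your own model and `test_loader`.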

In [None]:
"""
HINT 2: for your inner for loop you need to do these steps:
    # Load images with gradient accumulation capabilities
    # Clear gradients w.r.t. parameters
    # Forward pass to get output/logits
    # Calculate Loss: softmax --> cross entropy loss
    # Getting gradients w.r.t. parameters
    # Updating parameters

HINT 3: You may look at FF NN MNIST.ipynb if you're stuck or have no clue where to start. Yes it is difficult but you're all very capable <3
"""