# Fashion MNIST
Use this notebook as a skeleton for developing your own network to solve this classification problem!
Feel free to experiment with what you've learned so far (as a matter of fact, it's encouraged). Don't be afraid to ask questions and try different architectures.
Be conscious of what you don't know so that you know what to ask about and look for.

No GPU required!

The 7 basic steps for building a model are, in general:
 1. Load Dataset
 2. Make Dataset Iterable
 3. Create Model Class
 4. Instantiate Model Class
 5. Instantiate Loss Class
 6. Instantiate Optimizer Class
 7. Train Model

I have handled steps 1 and 2 for you. Please handle the rest!
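
As a rough orientation for steps 3 to 7, the sketch below trains a placeholder network on synthetic data. Everything in it is an assumption made for illustration: the random batches stand in for `train_loader`, and `TinyNet`, the cross-entropy loss, and the learning rate are arbitrary choices, not a recommended architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for train_loader: 5 batches of 100 flat images (784 pixels), 10 classes
batches = [(torch.rand(100, 784), torch.randint(0, 10, (100,))) for _ in range(5)]

# STEP 3: create model class (a deliberately tiny placeholder)
class TinyNet(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.fc = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.fc(x)  # raw logits; the loss below applies log-softmax itself

net = TinyNet(784, 10)                                 # STEP 4: instantiate model class
criterion = nn.CrossEntropyLoss()                      # STEP 5: instantiate loss class
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # STEP 6: instantiate optimizer class

# STEP 7: train model
losses = []
for images, labels in batches:
    optimizer.zero_grad()             # clear gradients from the previous batch
    logits = net(images)              # forward pass
    loss = criterion(logits, labels)  # compare predictions with labels
    loss.backward()                   # compute gradients
    optimizer.step()                  # update parameters
    losses.append(loss.item())

print(losses)
```

Swapping the synthetic `batches` for the real `train_loader` should be the only change needed to run the same loop on the notebook's data, provided the model expects flat 784-pixel inputs.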

### Run the below cells until 'stop' to get your data processed and loaded

In [1]:
# __future__ imports must come first when this code runs as a plain script
from __future__ import unicode_literals, print_function, division
from io import open
import random
import re
import string
import unicodedata

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dsets
from torch import optim
from torch.utils.data import Dataset, DataLoader

SOS_token = '<SOS>'
EOS_token = '<EOS>'

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(torch.__version__)

1.0.1.post2


In [2]:
'''
STEP 1: LOAD DATASET
'''
test_df = pd.read_csv('fashionmnist/fashion-mnist_test.csv')
test_labels_df = test_df['label']
test_pixels_df = test_df.drop('label', axis=1)

'''
If you're curious about how I did this, see the cells below. If not, just skip to STEP 1.5.

Pandas is a library for data processing. You might run into dask.DataFrame at some point if you continue with ML.
dask.DataFrame is built on top of Pandas for concurrent and parallelized computing, for example when
working with datasets so large that you require multiple machines to handle them. This is part of the data pipeline!
'''

# This reads the csv file into a pandas dataframe
train_df = pd.read_csv('fashionmnist/fashion-mnist_train.csv')
train_df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,6,0,0,0,0,0,0,0,5,0,...,0,0,0,30,43,0,0,0,0,0
3,0,0,0,0,1,2,0,0,0,0,...,3,0,0,0,0,1,0,0,0,0
4,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [3]:
# We create a new dataframe without the 'label' column here so we only get the pixel data
# The original dataframe train_df is unmodified
train_pixels_df = train_df.drop('label', axis=1)
train_pixels_df.head()

Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,5,0,0,...,0,0,0,30,43,0,0,0,0,0
3,0,0,0,1,2,0,0,0,0,0,...,3,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [4]:
# Now we grab only the labels. Keep in mind that we do not change the order of either the pixel values or the labels,
# so that they stay consistent with each other
train_labels_df = train_df['label']
train_labels_df.values

array([2, 9, 6, ..., 8, 8, 7])

In [5]:
'''
STEP 1.5: defining and instantiating Dataset subclass 
'''

'''
This is our custom Dataset class. Recall from the first meeting that we need this to pipeline our data into training our model.

The pipeline is important!!! At larger scale, machine learning can get bottlenecked at disk reads (in image classification, for example),
so understanding the various stages is important. We don't have to worry about that kind of thing now since we're just creating small
project models as opposed to complex production models.

NOTE: this is not the only way to create a dataset. An alternative is to simply pass in a dataframe that contains both pixel and label data.
Then we can index the label and pixel data inside of __getitem__ as opposed to separating labels and pixel data beforehand like I did.
'''
class FashionDataset(Dataset):
    def __init__(self, dataframe, labels):
        self.labels = torch.LongTensor(labels)
        self.df = dataframe
        
    def __getitem__(self, index):
        # I'm using .loc to access the row of the dataframe by index
        # HINT You don't need to do this but try normalizing your image vector before making it a torch Tensor.
        # BONUS train your model with and without normalization and see what happens
        img = torch.Tensor(self.df.loc[index].values)
        label = self.labels[index]
        return img, label

    def __len__(self):
        return len(self.labels)
    
    
'''
This class provides image data as a (1, 28, 28) tensor as opposed to a (784,) tensor. You
use these with conv2d layers, which are powerful for image recognition!
'''
class Fashion2DDataset(Dataset):
    def __init__(self, dataframe, labels):
        self.labels = torch.LongTensor(labels)
        self.df = dataframe
        
    def __getitem__(self, index):
        # I'm using .loc to access the row of the dataframe by index
        a = self.df.loc[index].values
        a = np.split(a, 28)
        a = np.array([a])
        img = torch.Tensor(a)
        
        label = self.labels[index]
        return img, label

    def __len__(self):
        return len(self.labels)
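
The HINT in `FashionDataset.__getitem__` suggests normalizing the image vector. Below is a minimal sketch of one common choice, scaling raw 0-255 pixel values to [0, 1], applied to a made-up row. `normalize_pixels` is a hypothetical helper name, not part of this notebook, and other schemes (for example subtracting a dataset mean) would also count as normalization.

```python
import numpy as np

def normalize_pixels(row):
    """Scale raw 0-255 pixel values to floats in [0, 1] (hypothetical helper)."""
    return np.asarray(row, dtype=np.float32) / 255.0

# A made-up flat image of 784 pixels standing in for one dataframe row
row = np.arange(784) % 256

flat = normalize_pixels(row)     # shape (784,), the layout FashionDataset returns
image = flat.reshape(1, 28, 28)  # shape (1, 28, 28), the layout Fashion2DDataset returns

print(flat.min(), flat.max())
print(image.shape)
```

The same `reshape(1, 28, 28)` call produces the layout that `Fashion2DDataset` builds with `np.split`, so the scaling can be applied in either class before the array is turned into a tensor.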

In [6]:
'''
STEP 2: MAKING DATASET ITERABLE
'''
train_dataset = FashionDataset(train_pixels_df, train_labels_df)
test_dataset = FashionDataset(test_pixels_df, test_labels_df)

'''
Batch_size will determine how many data samples to go through before 
updating the weights of our model with SGD (stochastic gradient descent)

Currently at 100 but feel free to change this to whatever you want. You can consider
batch size a hyperparameter!
'''
batch_size = 100

# shuffle=True so that every batch mixes examples from all labels. The data is already shuffled in
# this case (you can verify this by running train_labels_df.values in its own cell).
# If it weren't, and we had shuffle=False, we might end up training the model on label = 0 first and
# label = 9 last. This would cause the model to 'forget' what label = 0 looked like.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

# shuffle=False because there's no reason to shuffle when testing
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

# stop
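
To make the role of `batch_size` and `shuffle` concrete, here is a small pure-Python sketch of the batching behaviour described in the comments above. It is not how `DataLoader` is implemented; `iterate_batches` and the sample count of 1050 are made up for illustration.

```python
import math
import random

def iterate_batches(n_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, one list per batch (illustrative only)."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

n_samples, batch_size = 1050, 100
batches = list(iterate_batches(n_samples, batch_size, shuffle=True))

# One weight update per batch, so updates per epoch = ceil(n_samples / batch_size)
print(len(batches), math.ceil(n_samples / batch_size))
# The last batch is smaller when batch_size does not divide n_samples evenly
print(len(batches[-1]))
```

A smaller `batch_size` means more, noisier updates per pass over the data; a larger one means fewer, smoother updates, which is why it is worth treating as a hyperparameter.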

Below this block is your responsibility! Best of luck

In [7]:
# STEP 2.5: CLEANING DATA
movie_text = open('moviedialogues/movie_lines.txt', encoding='utf-8', errors='ignore').read().split('\n')
conv_lines = open('moviedialogues/movie_conversations.txt', encoding='utf-8', errors='ignore').read().split('\n')

lineToText = {}  # mapping of line number to text
# inputToOutput = {}
inputs = []
outputs = []
for line in movie_text:
    things = line.split("+++$+++")
#     print(things)
    if (len(things) == 5):  
#         key = re.sub("[^0-9]", "", things[0])
        val = things[4].translate(str.maketrans('', '', string.punctuation))
#         lineToText[int(key)] = val
        lineToText[things[0].replace(" ", "")] = val

        
# print(lineToText[295])


for conversation in conv_lines:
    things = conversation.split("+++$+++")
    if (len(things) == 4):
        convo = things[3]
        convo = [x.strip() for x in convo.split(',')]
        convo[0] = convo[0].replace("[", "")
        convo[len(convo) - 1] = convo[len(convo) - 1].replace("]", "")
        for index in range(0, len(convo)):
            convo[index] = convo[index].replace("'", "")
#         print(convo)
        #convo is a string, need to split by comma, remove first [ and last ], and then do this
        for i in range(0, len(convo) - 1):
#             inputSentenceIndex = re.sub("[^0-9]", "", convo[i])
#             outputSentenceIndex = re.sub("[^0-9]", "", convo[i + 1])    
            #print(convo[i])
            inputSentenceIndex = convo[i]
            outputSentenceIndex = convo[i + 1]
            if (inputSentenceIndex in lineToText) and (outputSentenceIndex in lineToText):
                inputs.append(lineToText[inputSentenceIndex])
                outputs.append(lineToText[outputSentenceIndex])
                
            
print(len(inputs))
# for i in range(0, 10):
#     print(inputs[i])
#     print(outputs[i])
#     print("~~~~~")

221616
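
The cleaning cell above relies on each record being a set of fields joined by `+++$+++`. The sketch below walks one line and one conversation through the same steps. The two sample records are made up to match the field layout the cell expects (5 fields per line, 4 per conversation), and the extra `.strip()` on the text is an addition the cell itself does not perform.

```python
import string

SEPARATOR = "+++$+++"

# Made-up sample records in the field layout the cleaning cell expects
movie_line = "L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Can we make this quick?"
conversation = "u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L194', 'L195', 'L196']"

# Line record: the first field is the line id, the fifth is the spoken text
fields = movie_line.split(SEPARATOR)
line_id = fields[0].replace(" ", "")
text = fields[4].translate(str.maketrans("", "", string.punctuation)).strip()

# Conversation record: the fourth field lists the line ids in speaking order
ids = [x.strip(" '[]") for x in conversation.split(SEPARATOR)[3].split(",")]

# Each consecutive pair of ids becomes one (input, output) training example
pairs = list(zip(ids[:-1], ids[1:]))

print(line_id, repr(text))
print(pairs)
```

A conversation with n lines yields n - 1 pairs, which is why the number of training examples is smaller than the number of lines in the corpus.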


In [8]:
# '''
# Dataset Class
# '''

class ConvoDataset(Dataset):
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs
        
    def __getitem__(self, index):  # DataLoader indexes the dataset through __getitem__
        return self.inputs[index], self.outputs[index]
    
    def __len__(self):
        return len(self.inputs)
        

In [9]:
training_input = inputs[0:16000]
training_output = outputs[0:16000]
testing_input = inputs[16000:]
testing_output = outputs[16000:]

In [10]:
# '''
# MAKE DATA ITERABLE
# '''
params = {'batch_size' : 16,
         'shuffle': True,
         'num_workers': 1}

training_set = ConvoDataset(training_input, training_output)
training_generator = DataLoader(training_set, **params)

testing_set = ConvoDataset(testing_input, testing_output)
testing_generator = DataLoader(testing_set, **params)

In [11]:
# '''
# STEP 2.75: CREATE EMBEDDINGS
# '''
import gensim
import gensim.downloader as api
from gensim.models import Word2Vec
model = api.load("word2vec-google-news-300")
#model = Word2Vec(inputs,size=100, window=5, min_count=5, workers=4) # download dataset to replace inputs
#model = gensim.models.KeyedVectors.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)
#gensim model created

# api.load returns KeyedVectors, so the vectors are read directly (the .wv alias is deprecated there)
weights = torch.FloatTensor(model.vectors)
embedding = nn.Embedding.from_pretrained(weights)




In [None]:
model.add(['<SOS>', '<EOS>'], [np.random.rand(300), np.random.rand(300)])

In [43]:
# Rebuild the embedding so it includes the rows added for <SOS> and <EOS>
weights = torch.FloatTensor(model.vectors)
embedding = nn.Embedding.from_pretrained(weights)


In [44]:
embedding(torch.tensor(model.vocab["<EOS>"].index, dtype=torch.long))

tensor([0.1453, 0.6172, 0.6165, 0.3474, 0.1037, 0.6990, 0.0521, 0.1901, 0.3776,
        0.3883, 0.4041, 0.3574, 0.3952, 0.8900, 0.8913, 0.0297, 0.4472, 0.8572,
        0.6891, 0.3160, 0.7748, 0.5838, 0.5794, 0.7212, 0.9374, 0.0327, 0.1464,
        0.1327, 0.5570, 0.9236, 0.8850, 0.2076, 0.8011, 0.3624, 0.7049, 0.0183,
        0.1303, 0.9086, 0.4756, 0.2867, 0.7380, 0.5163, 0.9921, 0.1965, 0.4111,
        0.0532, 0.8405, 0.2368, 0.6797, 0.0812, 0.7030, 0.2844, 0.3732, 0.8184,
        0.3131, 0.0910, 0.2042, 0.5997, 0.8267, 0.5525, 0.5999, 0.4653, 0.5116,
        0.0468, 0.1375, 0.4523, 0.3250, 0.0458, 0.4220, 0.4426, 0.3989, 0.3325,
        0.6994, 0.7767, 0.6663, 0.0338, 0.1991, 0.0144, 0.6819, 0.7263, 0.5545,
        0.2266, 0.5390, 0.6937, 0.4518, 0.3735, 0.4956, 0.1749, 0.0064, 0.4725,
        0.2901, 0.5992, 0.8727, 0.0129, 0.1018, 0.2321, 0.1934, 0.9008, 0.6081,
        0.1060, 0.9450, 0.7688, 0.5177, 0.9638, 0.2092, 0.7477, 0.5586, 0.9034,
        0.4990, 0.4723, 0.3300, 0.4647, 
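
The cells above extend the pretrained vocabulary with `<SOS>` and `<EOS>` via `model.add` and then rebuild the embedding layer. The sketch below shows the same idea on a toy scale without gensim. The 3-word vocabulary, the 4-dimensional vectors, and the plain `dict` used as a word-to-index map are all stand-ins, not the real word2vec data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the pretrained vectors: 3 words, 4 dimensions each
vocab = {"hello": 0, "world": 1, "bye": 2}
pretrained = torch.randn(3, 4)

# Append one randomly initialised row per special token
special = ["<SOS>", "<EOS>"]
extra = torch.rand(len(special), pretrained.size(1))
weights = torch.cat([pretrained, extra], dim=0)
for token in special:
    vocab[token] = len(vocab)

# from_pretrained freezes the weights by default (freeze=True)
embedding = nn.Embedding.from_pretrained(weights)

# Looking up a token is just selecting its row of the weight matrix
eos_vector = embedding(torch.tensor(vocab["<EOS>"], dtype=torch.long))
print(eos_vector.shape)
```

Because the embedding is frozen by default, the random `<SOS>` and `<EOS>` rows would stay random during training unless `freeze=False` is passed; whether that matters for this model is a design question left open here.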

In [47]:
# '''
# STEP 3: CREATE MODEL CLASS
# '''

class EncoderRNN(nn.Module):
    def __init__(self, hidden_sz):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_sz
#         self.embedding = nn.Embedding(input_sz, hidden_sz)
        self.embedding = embedding
        # The GRU input size must match the pretrained embedding dimension
        self.gru = nn.GRU(embedding.embedding_dim, hidden_sz)
        
    def forward(self, _input, hidden):
        # view(1, 1, -1) reshapes to (seq_len=1, batch=1, features); the -1 infers the feature size
        output = self.embedding(_input).view(1, 1, -1)
        output, hidden = self.gru(output, hidden)
        return output, hidden
        
    def initHidden(self):  # named initHidden because train() calls encoder.initHidden()
        return torch.zeros(1, 1, self.hidden_size, device=device)
    
    
class DecoderRNN(nn.Module):
    def __init__(self, hidden_sz, output_sz):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_sz

        #self.embedding = nn.Embedding(output_sz, hidden_sz)
        self.embedding = embedding
        # The GRU input size must match the pretrained embedding dimension
        self.gru = nn.GRU(embedding.embedding_dim, hidden_sz)
        self.out = nn.Linear(hidden_sz, output_sz)
        self.softmax = nn.LogSoftmax(dim=1)
        
    def forward(self, _input, hidden):
        output = self.embedding(_input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden
    
    def initHidden(self):  # same name as Attention.initHidden for consistency
        return torch.zeros(1, 1, self.hidden_size, device=device)

    
class Attention(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=20):
        super(Attention, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
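
The encoder and decoder above feed one embedded word at a time into a GRU after `.view(1, 1, -1)`. As a shape check, the sketch below runs a single step through `nn.GRU` using a random vector in place of a real embedding. The sizes 300 and 256 are assumptions chosen to mirror a 300-dimensional word vector and the `hidden_size` used later in this notebook.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_size = 300, 256

# nn.GRU(input_size, hidden_size): the last dimension of the input must equal input_size
gru = nn.GRU(embedding_dim, hidden_size)

# One embedded word reshaped to (seq_len=1, batch=1, features=300)
embedded = torch.randn(embedding_dim).view(1, 1, -1)
# Initial hidden state of shape (num_layers=1, batch=1, hidden_size)
hidden = torch.zeros(1, 1, hidden_size)

output, hidden = gru(embedded, hidden)
print(output.shape, hidden.shape)
```

For a single-layer, single-step GRU the output and the new hidden state hold the same values, which is why the decoder can start from the encoder's final hidden state.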

In [14]:
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
input_ = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in input_:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

In [48]:
'''
STEP 4: INSTANTIATE MODEL CLASS
'''

# In the translation example, the encoder's first argument and the attention decoder's second
# argument are vocabulary sizes (the number of distinct words), not the number of words in a sentence.
# model = FeedForwardModel()
hidden_size = 256
vocab_size = len(model.vocab)  # pretrained vocabulary plus the added <SOS> and <EOS> tokens
encoder1 = EncoderRNN(hidden_size).to(device)
decoder1 = DecoderRNN(hidden_size, vocab_size).to(device)

attn_decoder1 = Attention(hidden_size, vocab_size, dropout_p=0.1).to(device)

# trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

In [16]:
'''
STEP 5: INSTANTIATE LOSS CLASS
'''
# NOTE: trainIters in STEP 7 creates its own nn.NLLLoss criterion, so loss_func is not used there.
loss_func = torch.nn.MSELoss()
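
Both decoders above end in a log-softmax, and `trainIters` pairs that with `nn.NLLLoss`. Together the two amount to cross-entropy, the "softmax --> cross entropy loss" step mentioned in HINT 2. The pure-Python sketch below works through that arithmetic for one example; the logits and the target index are made-up numbers, and the helper functions are illustrative rather than PyTorch's implementation.

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, computed stably by subtracting the max."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the correct class."""
    return -log_probs[target]

logits = [2.0, 1.0, 0.1]  # made-up raw scores for 3 classes
target = 0                # index of the correct class

log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target)

print(log_probs)
print(loss)
```

The loss shrinks toward 0 as the score of the correct class grows relative to the others, which is the behaviour a classification loss needs.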

In [18]:
'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
"""
Most of the time I use SGD. Feel free to use another optimizer if you wish.
What hyperparameters would you use/set here?
"""
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001)


In [19]:
'''
STEP 7: TRAIN THE MODEL
'''
# we want to call torch.tensor() on a list of indexes
# each sentence becomes a list of indexes --> an input tensor that we put into train()

MAX_LENGTH = 20
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[model.vocab[SOS_token].index]], device=device)

    decoder_hidden = encoder_hidden
    
    # Without teacher forcing: use its own predictions as the next input
    for di in range(target_length):
#         decoder_output, decoder_hidden, decoder_attention = decoder(
#                 decoder_input, decoder_hidden, encoder_outputs)        
        decoder_output, decoder_hidden = decoder(
                decoder_input, decoder_hidden)
        topv, topi = decoder_output.topk(1)
        decoder_input = topi.squeeze().detach()  # detach from history as input

        # NLLLoss expects a target of shape (1,) to match the (1, vocab) decoder output
        loss += criterion(decoder_output, target_tensor[di].view(1))
        if decoder_input.item() == model.vocab[EOS_token].index:
            break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length
    

import time

def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()

    def to_tensor(sentence, append_eos=False):
        # Map in-vocabulary words to their indexes, skipping unknown words, and
        # cap the length at MAX_LENGTH so encoder_outputs in train() does not overflow
        indexes = [model.vocab[word].index for word in sentence.split() if word in model.vocab]
        if append_eos:
            indexes = indexes[:MAX_LENGTH - 1] + [model.vocab[EOS_token].index]
        return torch.tensor(indexes[:MAX_LENGTH], dtype=torch.long, device=device)

    n_seen = 0
    for batch_inputs, batch_outputs in training_generator:
        # The DataLoader yields batches of sentences; train() handles one pair at a time
        for _input, _output in zip(batch_inputs, batch_outputs):
            input_tensor = to_tensor(_input)
            target_tensor = to_tensor(_output, append_eos=True)
            if len(input_tensor) == 0:
                continue  # nothing to encode

            loss = train(input_tensor, target_tensor, encoder,
                         decoder, encoder_optimizer, decoder_optimizer, criterion)
            n_seen += 1
            print_loss_total += loss
            plot_loss_total += loss

            if n_seen % print_every == 0:
                print_loss_avg = print_loss_total / print_every
                print_loss_total = 0
                print('%ds (%d %d%%) %.4f' % (time.time() - start,
                                              n_seen, n_seen / n_iters * 100, print_loss_avg))

            if n_seen % plot_every == 0:
                plot_losses.append(plot_loss_total / plot_every)
                plot_loss_total = 0

            if n_seen >= n_iters:
                return plot_losses

    # timeSince and showPlot are not defined in this notebook, so elapsed seconds are
    # printed above and the averaged losses are returned for plotting separately
    return plot_losses


In [None]:
"""
HINT 2: for your inner for loop you need to do these steps:
    # Load images with gradient accumulation capabilities
    # Clear gradients w.r.t. parameters
    # Forward pass to get output/logits
    # Calculate Loss: softmax --> cross entropy loss
    # Getting gradients w.r.t. parameters
    # Updating parameters

HINT 3: You may look at FF NN MNIST.ipynb if you're stuck or have no clue where to start. Yes it is difficult but you're all very capable <3
"""