# RNN for text generation

In this exercise, you'll unleash the hidden creativity of your computer, by letting it generate Country songs (yeehaw!). You'll train a character-level RNN-based language model, and use it to generate new songs.


### Special Note

Our Deep Learning course was packed with both theory and practice. In a short time, you've got to learn the basics of deep learning theory and get hands-on experience training and using pretrained DL networks, while learning PyTorch.
Past exercises required a lot of work, and hopefully gave you a sense of the challenges and difficulties one faces when using deep learning in the real world. While the investment you've made in the course so far is enormous, we strongly encourage you to take a stab at this exercise.

Some songs contain no lyrics (for example, they just contain the text "instrumental"). Others include non-English characters. You'll often need to preprocess your data and make decisions as to what your network should actually get as input (think - how should you treat newline characters?)

More issues will probably pop up while you're working on this task. If you face technical difficulties or find a step in the process that takes too long, please let me know. It would also be great if you share with the class code you wrote that speeds up some of the work (for example, a data loader class, a parsed dataset etc.)

### Imports

In [None]:
import pandas as pd
from collections import Counter
import re
import spacy
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
import time
import copy
import seaborn as sns
import string
from torch.autograd import Variable

In [None]:
nlp = spacy.load("en_core_web_sm")

#### Modify spaCy tokenization rules so that it allows contractions

In [None]:
nlp.tokenizer.rules = {key: value for key, value in nlp.tokenizer.rules.items() if "'" not in key and "’" not in key and "‘" not in key}

In [None]:
lyrics_df = pd.read_parquet("https://raw.githubusercontent.com/omriallouche/ydata_deep_learning_2021/master/data/metrolyrics.parquet")

### Pre-processing Data

<b>ISSUE: Some songs contain no lyrics (for example, they just contain the text "instrumental"). Others include non-English characters. You'll often need to preprocess your data and make decisions as to what your network should actually get as input (think - how should you treat newline characters?)</b>

<b>*Comments:*</b>

(Output is displayed below.)
- Lyrics that just contain the word "instrumental" are still informative: they likely point to the song's genre.
- Non-English characters are relevant as input. Even if the lyrics are not readable, certain language patterns may appear in specific genres.
- Some songs say "We are not in a position to display these lyrics due to licensing restrictions. Sorry for the inconvinience.". These should be removed from the dataset, since the lyrics carry no useful information.
- Some genres may have shorter lines, so the number and position of newline characters may be relevant to the model.
- Some "lyrics" are simply descriptions of the song. For example: "[habituation to the use of cannabis][sung by Ron from Mandrake & arclila acoustic guitar by Tony Perez]." We found no reliable way to identify these songs, so they are kept in and will probably hurt the model's performance.
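
The decisions in the list above can be sketched as a small cleaning step. This is a minimal illustration, not the notebook's actual pipeline: the helper name `clean_lyrics` is hypothetical, it works on a plain list of strings rather than on `lyrics_df`, and normalising line endings is an assumption added here so that newlines are kept in one consistent form.

```python
import re

# Exact placeholder text found in the dataset (typo included on purpose,
# since the match has to be literal)
LICENSING_NOTICE = ("We are not in a position to display these lyrics due to "
                    "licensing restrictions. Sorry for the inconvinience.")


def clean_lyrics(songs):
    """Drop licensing placeholders and normalise line endings.

    Newlines are kept, because line length may carry signal for the model.
    """
    cleaned = []
    for lyrics in songs:
        text = lyrics.strip()
        if text == LICENSING_NOTICE:
            continue  # no lyrical content at all
        # Convert \r\n and bare \r to \n, keeping the line breaks themselves
        text = re.sub(r"\r\n?", "\n", text)
        cleaned.append(text)
    return cleaned


songs = ["Line one\r\nLine two", LICENSING_NOTICE, "Instrumental"]
print(clean_lyrics(songs))
```

Songs that only say "Instrumental" pass through unchanged, in line with the comment above that they are kept.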

#### Analyzing songs with 20 words or fewer

In [None]:
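# Select songs whose lyrics contain at most 20 whitespace-separated words
short_songs = lyrics_df[lyrics_df.lyrics.apply(str.split).apply(len) <= 20]
for idx, row in short_songs.iterrows():
    print('Genre:', row.genre)
    print('Song:')
    print(row.lyrics)
    print('')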

Genre: Metal
Song:
Snorted salvation
Methamphetamine inhilation
An army of skeletal
Zombies looking for a fix
Survival of the sick

Genre: Metal
Song:
Faal I flammar
Fanga under is
Mara rir evig
Nightmare
Fall in flames
Trapped under ice
Ridden by nightmares forever

Genre: Hip-Hop
Song:
Hiding emotions, concealing the truth.
(Understand, understand)
Feelings that should be shared, held inside.

Genre: Metal
Song:
Anyone who has intelligence,
may interpret the number of the beast.
It's a man's number.
This number is 666.

Genre: Pop
Song:
Lady killerrrrrrrrrrrrrrrrrrrrrrrr...
Be warned that the physical might be killer [echoes]
Lady killerrrrrrrrrrrrrrrrrrrrrrrr...

Genre: Pop
Song:
Der Schlange Kur
Ist nur L'amour, for shure
Der Schlange Kur toujurs
Anakondamour.
BEMERKUNG: der ist wirklich nicht länger...

Genre: Country
Song:
We are not in a position to display these lyrics due to licensing restrictions. Sorry for the inconvinience.

Genre: Rock
Song:
Instrumental
Instrumental
Instr

In [None]:
lyrics_df = lyrics_df[lyrics_df.lyrics != 'We are not in a position to display these lyrics due to licensing restrictions. Sorry for the inconvinience.']

## RNN for Text Generation
In this section, we'll use an LSTM to generate new songs. You can pick any genre you like, or just use all genres. You can even try to generate songs in the style of a certain artist - remember that the Metrolyrics dataset contains the artist of each song.

For this, we’ll first train a character-based language model. In class we mostly discussed using RNNs to predict the next word given the past words, but as we mentioned, RNNs can also be used to learn sequences of characters.
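
To make the character-level setup concrete, here is a minimal sketch of how a text is turned into (input, target) pairs: at every position the model sees one character and is asked to predict the next one. The helper name `next_char_pairs` is hypothetical and is only used for illustration.

```python
def next_char_pairs(text):
    """Pair every character with the character that follows it."""
    return list(zip(text[:-1], text[1:]))


for current, target in next_char_pairs("hello"):
    print(repr(current), "->", repr(target))
```

A word-level model would be built the same way, with tokens in place of characters.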

First, please go through the [PyTorch tutorial](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html) on generating family names. You can download a .py file or a Jupyter notebook with the entire code of the tutorial.

As a reminder of topics we've discussed in class, see Andrej Karpathy's popular blog post ["The Unreasonable Effectiveness of Recurrent Neural Networks"](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). You are also encouraged to view [this](https://gist.github.com/karpathy/d4dee566867f8291f086) vanilla implementation of a character-level RNN, written in NumPy with just 100 lines of code, including the forward and backward passes.

Other tutorials that might prove useful:
1. http://warmspringwinds.github.io/pytorch/rnns/2018/01/27/learning-to-generate-lyrics-and-music-with-recurrent-neural-networks/
1. https://github.com/mcleonard/pytorch-charRNN
1. https://github.com/spro/practical-pytorch/blob/master/char-rnn-generation/char-rnn-generation.ipynb

In [None]:
text = ' '.join(lyrics_df[(lyrics_df.genre == 'Hip-Hop')&(lyrics_df.artist == 'eminem')].lyrics.values)

In [None]:
lyrics_df[(lyrics_df.genre == 'Hip-Hop')&(lyrics_df.artist == 'eminem')].describe()

| | year | num_chars | num_words |
|---|---|---|---|
| count | 209.0 | 209.0 | 209.0 |
| mean | 2008.004785 | 3496.665072 | 2291.588517 |
| std | 3.075176 | 1731.918473 | 1123.204581 |
| min | 2002.0 | 276.0 | 145.0 |
| 25% | 2006.0 | 2527.0 | 1657.0 |
| 50% | 2007.0 | 3755.0 | 2435.0 |
| 75% | 2010.0 | 4449.0 | 2942.0 |
| max | 2016.0 | 11996.0 | 6670.0 |


In [None]:
chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}
encoded = np.array([char2int[ch] for ch in text])
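
The vocabulary mapping above can be checked with a quick round trip on a toy string. This sketch uses a stand-in `text` and plain lists instead of NumPy. Sorting the character set is an assumption added here: it is not needed for correctness, but it keeps the character-to-index mapping identical across runs, whereas the iteration order of a set of strings can change between Python sessions.

```python
text = "hello world"  # stand-in for the concatenated lyrics

# Sorted vocabulary gives a reproducible mapping
chars = tuple(sorted(set(text)))
int2char = dict(enumerate(chars))
char2int = {ch: i for i, ch in int2char.items()}

# Encode to integers, then decode back to text
encoded = [char2int[ch] for ch in text]
decoded = "".join(int2char[i] for i in encoded)

print(chars)
print(encoded)
print(decoded == text)
```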

In [None]:
def one_hot_encode(arr, n_labels):
    ''' One-hot encode an integer array, adding a trailing axis of size n_labels '''
    
    # Initialize the encoded array (one row per element of arr, for any number of dimensions)
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    
    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    
    # Finally reshape it to get back to the original array
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    
    return one_hot
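
To make the shapes concrete, here is a small worked example. It uses a compact alternative based on `np.eye` rather than the function above (the name `one_hot_eye` is hypothetical); indexing an identity matrix with an integer array should give the same result for valid indices.

```python
import numpy as np


def one_hot_eye(arr, n_labels):
    """Row i of the identity matrix is the one-hot vector for label i."""
    return np.eye(n_labels, dtype=np.float32)[arr]


# A batch of 2 sequences with 2 characters each, from a 4-character vocabulary
x = np.array([[0, 2], [1, 3]])
x_one_hot = one_hot_eye(x, 4)

print(x_one_hot.shape)  # (n_seqs, n_steps, n_labels)
print(x_one_hot[0, 1])  # one-hot vector for label 2
```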

In [None]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns mini-batches of size
       n_seqs x n_steps from arr.
    '''
    
    batch_size = n_seqs * n_steps
    n_batches = len(arr)//batch_size
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * batch_size]
    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
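        try:
            # The last target in each row is the first character of the next window
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+n_steps]
        except IndexError:
            # Final window: there is no next character, so wrap around to the start
            y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]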
        yield x, y
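
The batching logic is easier to follow on a toy array. The sketch below reproduces only the first window of `get_batches` by hand, with a stand-in array of 20 integers in place of `encoded`, to show how the targets are the inputs shifted by one character.

```python
import numpy as np

arr = np.arange(20)  # stand-in for `encoded`
n_seqs, n_steps = 2, 5

# Keep only full batches, then lay the data out as n_seqs long rows
n_batches = len(arr) // (n_seqs * n_steps)
arr = arr[: n_batches * n_seqs * n_steps].reshape((n_seqs, -1))

# First window: inputs, and targets shifted one step to the right
x = arr[:, 0:n_steps]
y = arr[:, 1:n_steps + 1]

print(x)
print(y)
```

Each row of `arr` is a contiguous stretch of the text, so consecutive windows continue where the previous one stopped. This is why the hidden state can be carried over from one batch to the next during training.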

In [None]:
class CharRNN(nn.Module):
    def __init__(self, tokens, n_steps=100, n_hidden=256, n_layers=2,
                               drop_prob=0.5, lr=0.001):
        super().__init__()
        self.drop_prob = drop_prob
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lr = lr
        
        self.chars = tokens
        self.int2char = dict(enumerate(self.chars))
        self.char2int = {ch: ii for ii, ch in self.int2char.items()}
        
        self.dropout = nn.Dropout(drop_prob)
        self.lstm = nn.LSTM(len(self.chars), n_hidden, n_layers, 
                            dropout=drop_prob, batch_first=True)
        self.fc = nn.Linear(n_hidden, len(self.chars))
        
        self.init_weights()
        
    def forward(self, x, hc):
        ''' Forward pass through the network '''
        
        x, (h, c) = self.lstm(x, hc)
        x = self.dropout(x)
        
        # Stack up LSTM outputs
        x = x.contiguous().view(x.size()[0]*x.size()[1], self.n_hidden)
        
        x = self.fc(x)
        
        return x, (h, c)
    
    def predict(self, char, h=None, cuda=False, top_k=None):
        ''' Given a character, predict the next character.
        
            Returns the predicted character and the hidden state.
        '''
        if cuda:
            self.cuda()
        else:
            self.cpu()
        
        if h is None:
            h = self.init_hidden(1)
        
        x = np.array([[self.char2int[char]]])
        x = one_hot_encode(x, len(self.chars))
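        inputs = torch.from_numpy(x)
        if cuda:
            inputs = inputs.cuda()
        
        # Inference only: detach the hidden state and disable gradient tracking
        # (`torch.no_grad()` replaces the removed `volatile=True` flag)
        h = tuple([each.detach() for each in h])
        with torch.no_grad():
            out, h = self.forward(inputs, h)

        # Softmax over the character dimension
        p = F.softmax(out, dim=1)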
        if cuda:
            p = p.cpu()
        
        if top_k is None:
            top_ch = np.arange(len(self.chars))
        else:
            p, top_ch = p.topk(top_k)
            top_ch = top_ch.numpy().squeeze()
        
        p = p.numpy().squeeze()
        char = np.random.choice(top_ch, p=p/p.sum())
            
        return self.int2char[char], h
    
    def init_weights(self):
        ''' Initialize weights for fully connected layer '''
        
        # Set bias tensor to all zeros
        self.fc.bias.data.fill_(0)
        # FC weights drawn uniformly from [-1, 1]
        self.fc.weight.data.uniform_(-1, 1)
        
    def init_hidden(self, n_seqs):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x n_seqs x n_hidden,
        # initialized to zero, for hidden state and cell state of LSTM.
        # `new_zeros` keeps the dtype and device of the model parameters.
        weight = next(self.parameters()).data
        return (weight.new_zeros(self.n_layers, n_seqs, self.n_hidden),
                weight.new_zeros(self.n_layers, n_seqs, self.n_hidden))

In [None]:
def train(net, data, epochs=10, n_seqs=10, n_steps=50, lr=0.001, clip=5, val_frac=0.1, cuda=False, print_every=10):
    ''' Train a network 
    
        Arguments
        ---------
        
        net: CharRNN network
        data: text data to train the network
        epochs: Number of epochs to train
        n_seqs: Number of mini-sequences per mini-batch, aka batch size
        n_steps: Number of character steps per mini-batch
        lr: learning rate
        clip: gradient clipping
        val_frac: Fraction of data to hold out for validation
        cuda: Train with CUDA on a GPU
        print_every: Number of steps for printing training and validation loss
    
    '''
    
    net.train()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    # create training and validation data
    val_idx = int(len(data)*(1-val_frac))
    data, val_data = data[:val_idx], data[val_idx:]
    
    if cuda:
        net.cuda()
    
    counter = 0
    n_chars = len(net.chars)
    for e in range(epochs):
        h = net.init_hidden(n_seqs)
        for x, y in get_batches(data, n_seqs, n_steps):
            counter += 1
            
            # One-hot encode our data and make them Torch tensors
            x = one_hot_encode(x, n_chars)
            x, y = torch.from_numpy(x), torch.from_numpy(y)
            
            # CrossEntropyLoss expects int64 class indices
            inputs, targets = x, y.long()
            if cuda:
                inputs, targets = inputs.cuda(), targets.cuda()

            # Detach the hidden state from its history, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.detach() for each in h])

            net.zero_grad()
            
            output, h = net.forward(inputs, h)
            loss = criterion(output, targets.view(n_seqs*n_steps))

            loss.backward()
            
            # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)

            opt.step()
            
            if counter % print_every == 0:
                
                # Get validation loss, with dropout disabled via eval mode
                net.eval()
                val_h = net.init_hidden(n_seqs)
                val_losses = []
                for x, y in get_batches(val_data, n_seqs, n_steps):
                    # One-hot encode our data and make them Torch tensors
                    x = one_hot_encode(x, n_chars)
                    x, y = torch.from_numpy(x), torch.from_numpy(y)
                    
                    # No gradients are needed for validation
                    with torch.no_grad():
                        inputs, targets = x, y.long()
                        if cuda:
                            inputs, targets = inputs.cuda(), targets.cuda()
                        output, val_h = net.forward(inputs, val_h)
                        val_loss = criterion(output, targets.view(n_seqs*n_steps))
                
                        val_losses.append(val_loss.item())
                
                # Back to training mode so dropout is active again
                net.train()
                
                
                print("Epoch: {}/{}...".format(e+1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.4f}...".format(loss.item()),
                      "Val Loss: {:.4f}".format(np.mean(val_losses)))

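The `train` function above relies on `nn.utils.clip_grad_norm_` to keep gradients from exploding. The idea behind norm-based clipping can be sketched in plain Python: compute the norm of all gradients taken together, and if it exceeds the threshold, scale every gradient by the same factor. The function name `clip_by_global_norm` is hypothetical and the flat list of floats is a simplification; the small epsilon mirrors the usual guard against division by zero.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale `grads` so that their joint L2 norm is at most `max_norm`.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm_before)
print(clipped)
```

Because all gradients share one scaling factor, the direction of the update is preserved and only its length changes.
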
In [None]:
if 'net' in locals():
    del net

In [None]:
net = CharRNN(chars, n_hidden=512, n_layers=2)

First, we train the model with batches of 128 sequences of 100 characters each.

In [None]:
n_seqs, n_steps = 128, 100
train(net, encoded, epochs=10, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=False, print_every=10)

Epoch: 1/10... Step: 10... Loss: 3.4113... Val Loss: 3.3921
Epoch: 1/10... Step: 20... Loss: 3.2964... Val Loss: 3.3042
Epoch: 1/10... Step: 30... Loss: 3.1943... Val Loss: 3.2047
Epoch: 1/10... Step: 40... Loss: 3.0721... Val Loss: 3.0679
Epoch: 1/10... Step: 50... Loss: 2.9469... Val Loss: 2.9047
Epoch: 2/10... Step: 60... Loss: 2.7227... Val Loss: 2.7538
Epoch: 2/10... Step: 70... Loss: 2.6298... Val Loss: 2.6556
Epoch: 2/10... Step: 80... Loss: 2.5516... Val Loss: 2.5856
Epoch: 2/10... Step: 90... Loss: 2.4924... Val Loss: 2.5293
Epoch: 2/10... Step: 100... Loss: 2.4820... Val Loss: 2.4851
Epoch: 3/10... Step: 110... Loss: 2.4042... Val Loss: 2.4468
Epoch: 3/10... Step: 120... Loss: 2.3679... Val Loss: 2.4220
Epoch: 3/10... Step: 130... Loss: 2.3676... Val Loss: 2.3941
Epoch: 3/10... Step: 140... Loss: 2.3053... Val Loss: 2.3718
Epoch: 3/10... Step: 150... Loss: 2.2994... Val Loss: 2.3451
Epoch: 4/10... Step: 160... Loss: 2.3205... Val Loss: 2.3236
Epoch: 4/10... Step: 170... Loss:
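
One way to read the numbers in this log: `nn.CrossEntropyLoss` reports the average negative log-likelihood per character in nats, so its exponential is the per-character perplexity, and a model that guesses uniformly over the vocabulary would score the log of the vocabulary size. The sketch below shows the arithmetic. The helper names are hypothetical, and the vocabulary size of 90 is an assumed value for illustration only, since the notebook does not print `len(chars)`.

```python
import math


def perplexity(cross_entropy_nats):
    """Convert a mean cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


def uniform_baseline_loss(vocab_size):
    """Cross-entropy of a model that assigns equal probability to every character."""
    return math.log(vocab_size)


# Validation loss taken from the log above (step 160)
print(perplexity(2.3236))
# Assumed vocabulary size, for illustration only
print(uniform_baseline_loss(90))
```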

In [None]:
checkpoint = {'n_hidden': net.n_hidden,
              'n_layers': net.n_layers,
              'state_dict': net.state_dict(),
              'tokens': net.chars}
with open('rnn.net', 'wb') as f:
    torch.save(checkpoint, f)

In [None]:
def sample(net, size, prime='The', top_k=None, cuda=False):
        
    if cuda:
        net.cuda()
    else:
        net.cpu()

    net.eval()
    
    # First off, run through the prime characters
    chars = [ch for ch in prime]
    h = net.init_hidden(1)
    for ch in prime:
        char, h = net.predict(ch, h, cuda=cuda, top_k=top_k)

    chars.append(char)
    
    # Now pass in the previous character and get a new one
    for ii in range(size):
        char, h = net.predict(chars[-1], h, cuda=cuda, top_k=top_k)
        chars.append(char)

    return ''.join(chars)
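
The `top_k` argument used by `sample` and `predict` restricts sampling to the most likely characters. The sketch below shows the same idea in plain Python on a list of scores. It is a stand-in, not the notebook's code: `predict` applies softmax over all characters and then renormalises the top `k` probabilities, while this version applies softmax to the top `k` scores directly, which gives the same distribution. The name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, k):
    """Sample an index from the k highest-scoring entries of `logits`."""
    # Indices of the k largest scores
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those entries (shifted by the max for stability)
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return random.choices(top, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 1.5, 0.3]
print([sample_top_k(logits, 2) for _ in range(10)])
```

A small `k` makes the output more conservative, and `k=1` always picks the most likely character.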

In [None]:
print(sample(net, 2000, prime='shit', top_k=5, cuda=False))



shit
The cols the back in the crubber the my alman that the whot they that stimes over my fram
I conl she mound a cauld of, then that and where the beet to take the fick is the stop me to say
So when the shit any held of you the shoted make you and to make your bother as mither the beat a bund as throbe, shere and shat they shees what is shat me whone would your all the for this shown they when you say my newer that mosh the cars
But you daint and motherfullin' the bettire and so be me,, I'm sicking,
And it's somesting to get throut the merstict
Somethere ain't the site to the from or the bord of thats and shit shinger all throw wolld and the shit op ond make, suck it's all ond it
We triell the sticking to some ald mind to sem it
Treeste fag on my nead then ain't out the stricked it
I drestallin' the sonce and threw mess a bordes
That's what whill the say it this bad i the wannats in the sang
Trink, that who's way then sould
I stape its this its that's a bat a be the thile
Sleamy
And w

Let’s test whether the generated lyrics make more sense when we train with various combinations of `n_steps` and `n_seqs`.
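
Since each run is slow, it can help to list in advance which combinations pass the batch-count filter. The sketch below applies the same filter without training anything. The helper name `viable_configs` is hypothetical, and the corpus length of 731,011 characters is inferred from the printed value 91.376375 = `len(encoded)` / (10 * 800) in the output of the sweep.

```python
from itertools import product


def viable_configs(n_chars, sizes, min_batches=30, max_batches=100):
    """Return (n_seqs, n_steps, n_batches) for combinations inside the batch-count window."""
    configs = []
    for n_seqs, n_steps in product(sizes, repeat=2):
        n_b = n_chars / (n_seqs * n_steps)
        if min_batches < n_b < max_batches:
            configs.append((n_seqs, n_steps, n_b))
    return configs


for n_seqs, n_steps, n_b in viable_configs(731011, [10, 50, 150, 800]):
    print("n_seqs:", n_seqs, "n_steps:", n_steps, "num of batches:", n_b)
```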

In [None]:
n_seqs_list = [10, 50, 150, 800]
n_steps_list = [10, 50, 150, 800]
max_batches = 100
min_batches = 30

for n_seqs in n_seqs_list:
    for n_steps in n_steps_list:
        # Number of mini-batches the full corpus yields for this combination
        n_b = len(encoded) / (n_seqs * n_steps)
        # Skip combinations with too few or too many batches
        if min_batches < n_b < max_batches:
            print("===============================================")
            print("n_steps:", n_steps, "n_seqs:", n_seqs, "num of batches:", n_b)
            if 'net' in locals():
                del net
            net = CharRNN(chars, n_hidden=512, n_layers=2)
            train(net, encoded, epochs=10, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=False, print_every=10)
            print(sample(net, 2000, prime='shit', top_k=5, cuda=False))

n_steps: 800 n_seqs: 10 num of batches: 91.376375
Epoch: 1/10... Step: 10... Loss: 3.3793... Val Loss: 3.4060
Epoch: 1/10... Step: 20... Loss: 3.2805... Val Loss: 3.2992
Epoch: 1/10... Step: 30... Loss: 3.2666... Val Loss: 3.2005
Epoch: 1/10... Step: 40... Loss: 3.0838... Val Loss: 3.0735
Epoch: 1/10... Step: 50... Loss: 2.9298... Val Loss: 2.9262
Epoch: 1/10... Step: 60... Loss: 2.8089... Val Loss: 2.7752
Epoch: 1/10... Step: 70... Loss: 2.7246... Val Loss: 2.6864
Epoch: 1/10... Step: 80... Loss: 2.6161... Val Loss: 2.6210
Epoch: 2/10... Step: 90... Loss: 2.5502... Val Loss: 2.5627
Epoch: 2/10... Step: 100... Loss: 2.5197... Val Loss: 2.5179
Epoch: 2/10... Step: 110... Loss: 2.4727... Val Loss: 2.4757
Epoch: 2/10... Step: 120... Loss: 2.4341... Val Loss: 2.4475
Epoch: 2/10... Step: 130... Loss: 2.4762... Val Loss: 2.4155
Epoch: 2/10... Step: 140... Loss: 2.4625... Val Loss: 2.3895
Epoch: 2/10... Step: 150... Loss: 2.3601... Val Loss: 2.3583
Epoch: 2/10... Step: 160... Loss: 2.3612... 



shit to mys, stick off alm sime sing my bedy say man as
I'm say my sick an my shit is the fent, they, I camed it's so but I was startin mar ands that a fucked and was the show mine on a carse a care take to stall a been in in a canstend,
I'm time in
They stray want you a mant
I was dence my niggas the wayn't would wanna shat in your foolle we and staped the closh
I'm to saight
I said I am that's a streed man, with a botthes back and that
Said, I'm trying a munne the with threst's and my car sack
Talk year,
I'm sing thes to the the blat its
If to tell handed me
Trust this shit to be show the same atsent my backs that I'm stilling me
All I got a lot one sitches
Whel you said I can be the botceres to be the back and shit and the wonten they stread of man and you couse
They so she was the sead, wanna get it all that
I'm tired, stillin shat that as moth mathere and send, to the cancer a sellart take me they careded my niggas, the mest they white your tasters and me
I'm same still of you cau

In [None]:
n_seqs_list = [80, 150]
n_steps_list = [80, 150]
max_batches = 100
min_batches = 30

for n_seqs in n_seqs_list:
    for n_steps in n_steps_list:
        # Number of mini-batches the full corpus yields for this combination
        n_b = len(encoded) / (n_seqs * n_steps)
        # Skip combinations with too few or too many batches
        if min_batches < n_b < max_batches:
            print("===============================================")
            print("n_steps:", n_steps, "n_seqs:", n_seqs, "num of batches:", n_b)
            if 'net' in locals():
                del net
            net = CharRNN(chars, n_hidden=512, n_layers=2)
            train(net, encoded, epochs=10, n_seqs=n_seqs, n_steps=n_steps, lr=0.001, cuda=False, print_every=10)
            print(sample(net, 2000, prime='shit', top_k=5, cuda=False))

n_steps: 150 n_seqs: 80 num of batches: 60.91758333333333
Epoch: 1/10... Step: 10... Loss: 3.4402... Val Loss: 3.4057
Epoch: 1/10... Step: 20... Loss: 3.2769... Val Loss: 3.3135
Epoch: 1/10... Step: 30... Loss: 3.1819... Val Loss: 3.2024
Epoch: 1/10... Step: 40... Loss: 3.0904... Val Loss: 3.0681
Epoch: 1/10... Step: 50... Loss: 2.9182... Val Loss: 2.9086
Epoch: 2/10... Step: 60... Loss: 2.7622... Val Loss: 2.7550
Epoch: 2/10... Step: 70... Loss: 2.6872... Val Loss: 2.6617
Epoch: 2/10... Step: 80... Loss: 2.5986... Val Loss: 2.5922
Epoch: 2/10... Step: 90... Loss: 2.4930... Val Loss: 2.5361
Epoch: 2/10... Step: 100... Loss: 2.4681... Val Loss: 2.4873
Epoch: 3/10... Step: 110... Loss: 2.4258... Val Loss: 2.4430
Epoch: 3/10... Step: 120... Loss: 2.3861... Val Loss: 2.4104
Epoch: 3/10... Step: 130... Loss: 2.3447... Val Loss: 2.3881
Epoch: 3/10... Step: 140... Loss: 2.3435... Val Loss: 2.3682
Epoch: 3/10... Step: 150... Loss: 2.3370... Val Loss: 2.3462
Epoch: 3/10... Step: 160... Loss: 2.



shit out to to getto to them take i with a bass astal of
I'm a can if the will stalkin' shat
I'm lookin' this whole
They's got me
It's to see you to dambe i some it
I don't want it's back with athar thess
I'm stakid a filling bort wast out a fuckin' but I've sand me
To the tay to my a sard the seen a can alling
I con't shoutt to throw to sell
I got's beady with them to the werly trick to stall word
Wass that walkin' be the but I wont and I'm a crown a stirle on my stint an the sholle
I'll be yaug you're all make that shad you're to bothing we go back
I seep micking they sead off at mise my as
As they's they go but he someto
And I'm andin' one mars, woll the mar the fremin'
Indo that we'll she we to so the seens and shat that the whole at me the mor the bock to mades, bothing thene shit on the well some in itser
And your ast semather that I would with at
Witches so live anound
You wonted the wordd atton mo to she sand
And your bett me in me is some to more
We're shop they was somet and 

<b>*Comments:*</b>

* We tried to improve the model by testing different `n_steps` and `n_seqs` values, but the validation loss barely changed.
* Each run used 10 epochs. More epochs would very likely produce more coherent text, but training time was the limiting factor.
* By the 10th epoch, some of the generated words are still not English, but the model clearly learned a lot of profanity, which is characteristic of Eminem.



### Final Tips

As a final tip, we do encourage you to do most of the work first on your local machine. They say that Data Scientists spend 80% of their time cleaning the data and preparing it for training (and 20% complaining about cleaning the data and preparing it). Handling these parts on your local machine usually means you will spend less time complaining. You can switch to the cloud once your code runs and your pipeline is in place, for the actual training using a GPU.

We also encourage you to use a small subset of the dataset first, so things run smoothly. The Metrolyrics dataset contains over 300k songs. You can start with a much smaller set (even 3,000 songs) and try to train a network based on it. Once everything runs properly, add more data.

Good luck!

#### This exercise was originally written by Dr. Omri Allouche.