<a href="https://colab.research.google.com/github/TerryTian21/PyTorch-Practice/blob/main/Tutorials/RNN_Basic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RNNs

Following the tutorial by [Gabriel Loye](https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/), we will explore a practical implementation of RNNs.

The goal of this notebook is to create a sentence completion recommender that works from a word or a few characters passed into the network. We will train the model on text data so that it learns the most common "sequences of words", enabling it to suggest the most likely continuation.

## First Implementation

To keep this first implementation simple, we will define a few sentences to see how much the model learns. The process goes as follows:
1. Create vocabulary dictionaries
2. Pad sentences and split them into inputs/labels
3. One-hot encode the inputs
4. Define the model
5. Train the model
6. Evaluate the model
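
To make steps 2 and 3 more concrete, here is a minimal sketch on a single toy string. It assumes the usual next-character setup, where the label sequence is the input sequence shifted by one character. The names `sentence`, `vocab`, and `char_to_id` are hypothetical and are not the variables used later in this notebook.

```python
import numpy as np

sentence = "hey"

# Toy vocabulary built from this one string (sorted only for reproducibility)
vocab = sorted(set(sentence))
char_to_id = {ch: i for i, ch in enumerate(vocab)}

# Assumed split: the input drops the last character, the label drops the first
input_seq = sentence[:-1]
target_seq = sentence[1:]

input_ids = [char_to_id[ch] for ch in input_seq]
target_ids = [char_to_id[ch] for ch in target_seq]

# One-hot encode the input: one row per character, one column per vocab entry
one_hot = np.zeros((len(input_ids), len(vocab)), dtype=np.float32)
one_hot[np.arange(len(input_ids)), input_ids] = 1.0

print(input_seq, "->", target_seq)
print(one_hot)
```

At each position the network sees one character and is asked to predict the character that follows it.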

In [1]:
# import libraries

import torch
from torch import nn
import numpy as np

## Create Vocab Dictionaries

In [3]:
text = [ "hey how are you",
        "good I am fine",
         "have a nice day",
         "the weather is good today",
         "how was your weekend",
         "It was great, thank you"]

In [4]:
# Join all the sentences together and extract unique characters from combined sentences

chars = set("".join(text))
print(chars)

int2char = dict(enumerate(chars))
print(int2char)

# Dict that maps chars to ints
char2int = {char: val for val, char in int2char.items()}
print(char2int)

{'t', 'v', 'f', 's', ',', ' ', 'y', 'i', 'c', 'o', 'u', 'a', 'I', 'd', 'k', 'g', 'h', 'r', 'e', 'n', 'w', 'm'}
{0: 't', 1: 'v', 2: 'f', 3: 's', 4: ',', 5: ' ', 6: 'y', 7: 'i', 8: 'c', 9: 'o', 10: 'u', 11: 'a', 12: 'I', 13: 'd', 14: 'k', 15: 'g', 16: 'h', 17: 'r', 18: 'e', 19: 'n', 20: 'w', 21: 'm'}
{'t': 0, 'v': 1, 'f': 2, 's': 3, ',': 4, ' ': 5, 'y': 6, 'i': 7, 'c': 8, 'o': 9, 'u': 10, 'a': 11, 'I': 12, 'd': 13, 'k': 14, 'g': 15, 'h': 16, 'r': 17, 'e': 18, 'n': 19, 'w': 20, 'm': 21}


The `char2int` dictionary holds every letter and symbol present in our sentences and maps each one to a unique integer. `int2char` is the inverse mapping, from integers back to characters.
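
As a quick illustration of how the two dictionaries work together, the sketch below encodes a string into integers and decodes it back. The `sample_*` names and the `encode`/`decode` helpers are hypothetical and not part of the notebook. Sorting the characters is also an assumption made here so the ids are reproducible: iterating over a Python `set` of strings can give a different order from run to run, so the mapping printed above may differ on your machine.

```python
# Hypothetical sample vocabulary; the notebook builds `chars` from `text` instead
sample_chars = sorted(set("hey how are you"))  # sorted so the ids are stable
sample_int2char = dict(enumerate(sample_chars))
sample_char2int = {ch: i for i, ch in sample_int2char.items()}


def encode(sentence, mapping):
    """Turn a string into a list of integer ids."""
    return [mapping[ch] for ch in sentence]


def decode(ids, mapping):
    """Turn a list of integer ids back into a string."""
    return "".join(mapping[i] for i in ids)


encoded = encode("hey you", sample_char2int)
decoded = decode(encoded, sample_int2char)
print(encoded)
print(decoded)
```

Decoding the encoded ids returns the original string, which is a handy sanity check that the two dictionaries are true inverses.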

## Padding Input Sentences

Although RNNs can take variable-sized inputs, we usually want to feed the input in batches to speed up training. To be stacked into a batch, the inputs (sequences) must all be the same length.

<br>

In our case, we will pad shorter sentences with spaces to match the length of the longest sentence.
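
One possible way to do this, shown on a hypothetical toy list rather than the notebook's `text`, is to right-pad each string with `str.ljust`. This is only a sketch of the idea; the notebook's own padding code may differ.

```python
# Hypothetical toy sentences of different lengths
sample = ["hey", "good day", "hi"]

# Length, in characters, of the longest sentence
target_len = len(max(sample, key=len))

# Right-pad every sentence with spaces up to target_len
padded = [s.ljust(target_len) for s in sample]

for s in padded:
    print(repr(s), len(s))
```

After padding, every string has the same number of characters, so the sequences can be stacked into one batch.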

In [6]:
# Find the length (in characters) of the longest sentence.
# The model works on characters, so we count characters rather than words.

maxlen = len(max(text, key=len))
print(maxlen)

# Pad sentences until each one matches the length of the longest sentence

25
