<a href="https://colab.research.google.com/github/Antony-gitau/machine_learning_playground/blob/main/Neurons_with_recurrence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I am following the [MIT 6.S191 lecture on recurrent neural networks](https://youtu.be/ySEx_Bqxvvo); these are my notes.

Sequence modelling applications:
- machine translation 
- image captioning
- sentiment classification


Neuron with recurrence
- RNN

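Pseudocode of an RNN:

1. Define the RNN: `my_rnn = RNN()`
2. Initialize the hidden state (e.g. to zeros).
3. Iterate through the inputs in order; at each step, compute the new hidden state from the current input and the previous hidden state using an activation function.
4. Generate a predicted output from the updated hidden state.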
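
The steps above can be sketched in NumPy. This is only a minimal illustration, not the lecture's implementation: the layer sizes, the random weights, and the names `rnn_step` and `run_rnn` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this sketch.
input_dim, hidden_dim, output_dim = 4, 3, 4

# The same weight matrices are reused at every time step (parameter sharing).
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))


def rnn_step(x_t, h_prev):
    """One time step: update the hidden state, then produce an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return y_t, h_t


def run_rnn(inputs):
    """Iterate over a sequence of any length, carrying the hidden state along."""
    h = np.zeros(hidden_dim)
    y = None
    for x_t in inputs:
        y, h = rnn_step(x_t, h)
    return y, h


sequence = np.eye(input_dim)[[0, 2, 1]]  # three one-hot input vectors
prediction, final_hidden = run_rnn(sequence)
print(prediction.shape, final_hidden.shape)
```

Because the loop just carries `h` forward, the same function handles sequences of any length, and the order of the inputs changes the final hidden state.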

design criteria for developing networks for sequence modelling:
- handle variable lengths
- track long dependencies
- maintain information about the order of the sequence
- share parameters across the sequence

example: 
predicting the next word.

1. represent language to a neural network
- represent words numerically.

One way to represent words as input vectors for a neural network is one-hot encoding. Each word in the vocabulary is assigned an index, and a word is encoded as a vector as long as the vocabulary, with a 1 at that word's index and 0 everywhere else. E.g. [0,1,0,0] is the one-hot vector of the word at the second position in a four-word vocabulary.
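
A small sketch of one-hot encoding in plain Python. The four-word vocabulary and the helper name `one_hot` are made up for illustration.

```python
# Hypothetical toy vocabulary; a real one would hold every known word.
vocabulary = ["I", "love", "recurrent", "networks"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word):
    """Return a vector of vocabulary length with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector


print(one_hot("love"))  # [0, 1, 0, 0]
```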

2. Training and learning through neural networks

- backpropagation through time.

challenges:
1. exploding gradients
- the gradient grows larger and larger until it becomes infeasible to compute and, by extension, training the model becomes unstable.
2. vanishing gradients
- the gradient, on the other hand, shrinks toward zero until it is too small to drive learning, which makes long-range dependencies hard to capture.
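
A rough way to see both problems: backpropagation through time multiplies the gradient by a similar factor at every step, so the result depends on whether that factor is above or below 1. The sketch below reduces this to a single scalar factor, which is a simplifying assumption (a real RNN multiplies by matrices), and `gradient_after` is a name invented for the example.

```python
def gradient_after(steps, factor):
    """Size of a gradient after being multiplied by `factor` once per time step."""
    gradient = 1.0
    for _ in range(steps):
        gradient *= factor
    return gradient


for factor in (0.5, 1.0, 1.5):
    # Below 1 the gradient vanishes, above 1 it explodes.
    print(factor, gradient_after(50, factor))
```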

tricks to overcome the challenges:
1. changing activation functions
- e.g. ReLU, whose derivative is 1 for positive inputs, which helps keep the gradient from shrinking.
2. parameter initialization
3. introducing gated cells
- gates selectively control the flow of information through the network, as in LSTMs.
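
To make the activation-function trick concrete, here is a small comparison of the derivatives of sigmoid and ReLU. It assumes, purely for illustration, that the same pre-activation value `z = 1.0` is seen at each of 20 time steps.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # never larger than 0.25


def relu_derivative(z):
    return 1.0 if z > 0 else 0.0


steps = 20
z = 1.0  # hypothetical pre-activation value, the same at every step

# Product of the per-step derivatives: tiny for sigmoid, exactly 1 for ReLU.
print(sigmoid_derivative(z) ** steps)
print(relu_derivative(z) ** steps)
```

Note that ReLU only helps for positive inputs; for negative inputs its derivative is 0.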

applications and limitations of RNN

Music generation
- Design an RNN that can predict the next musical note.

limitations
- encoding bottleneck
- no easy parallelization techniques
- limited long-term memory: struggles with very long sequences, e.g. tens of thousands of words

Attention is all you need:
- attend to the most important parts of an input example.
- extract the features that deserve the highest attention.
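
One common way to turn "attend to the most important parts" into code is scaled dot-product attention: score each key against a query, turn the scores into weights with a softmax, and take the weighted sum of the values. The notes above do not spell this out, so treat the sketch below as an assumption about what is meant; the toy `keys`, `values`, and `query` are invented.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def attention(query, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores)
    return weights @ values, weights


# Toy data: three input positions with 2-dimensional keys and 1-dimensional values.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([[1.0, 1.0]])

output, weights = attention(query, keys, values)
print(weights)  # the third position matches the query best, so it gets the largest weight
print(output)
```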


Let's jump into a practical section, drawing inspiration from the [music generation with RNN lab](https://github.com/aamini/introtodeeplearning/blob/master/lab1/Part2_Music_Generation.ipynb) from MIT's Introduction to Deep Learning course.

The goal is to train a model to generate new music from learning the patterns in raw sheet music.



In [1]:
%%capture
# ensure we are using a TensorFlow 2.x release (Colab-only magic)
%tensorflow_version 2.x
import tensorflow as tf

# the data we are using lives in mitdeeplearning package
!pip install mitdeeplearning
import mitdeeplearning as mdl





Data:
- the mitdeeplearning package includes an Irish folk song dataset with hundreds of songs in ABC notation.


In [3]:
songs = mdl.lab1.load_training_data()
first_song = songs[0]
print("This is just an example\n ", first_song)

Found 817 songs in text
This is just an example
  X:1
T:Alexander's
Z: id:dc-hornpipe-1
M:C|
L:1/8
K:D Major
(3ABc|dAFA DFAd|fdcd FAdf|gfge fefd|(3efe (3dcB A2 (3ABc|!
dAFA DFAd|fdcd FAdf|gfge fefd|(3efe dc d2:|!
AG|FAdA FAdA|GBdB GBdB|Acec Acec|dfaf gecA|!
FAdA FAdA|GBdB GBdB|Aceg fefd|(3efe dc d2:|!


In [5]:
#converting the abc notation of the songs to audio file
play_first_song = mdl.lab1.play_song(first_song)
play_first_song



Important question:

How does the number of distinct characters in the text file affect the complexity of the learning problem?



In [6]:
#join the songs leaving a blank line between them
joined_songs = "\n\n".join(songs)

# get the sorted unique characters from the joined songs string
vocab = sorted(set(joined_songs))
print("we have ", len(vocab), "unique characters in the irish folk songs dataset")

we have  83 unique characters in the irish folk songs dataset
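
One way to make the question above about the number of characters more concrete: in a character-level model, the vocabulary size sets how many classes the model must choose between at every step. The sketch below assumes the model ends in one output per character and is trained with cross-entropy; the variable names are invented for the example.

```python
import math

vocab_size = 83  # number of unique characters reported above

# Assumption: the final layer has one output (logit) per character.
output_units = vocab_size

# Cross-entropy of guessing uniformly over the vocabulary, a rough
# baseline that a model which has learned nothing would sit near.
uniform_loss = -math.log(1.0 / vocab_size)

print(output_units, uniform_loss)
```

A larger vocabulary means more output classes and a higher baseline loss, so the prediction problem gets harder.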


In [7]:
vocab



['\n',
 ' ',
 '!',
 '"',
 '#',
 "'",
 '(',
 ')',
 ',',
 '-',
 '.',
 '/',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 ':',
 '<',
 '=',
 '>',
 'A',
 'B',
 'C',
 'D',
 'E',
 'F',
 'G',
 'H',
 'I',
 'J',
 'K',
 'L',
 'M',
 'N',
 'O',
 'P',
 'Q',
 'R',
 'S',
 'T',
 'U',
 'V',
 'W',
 'X',
 'Y',
 'Z',
 '[',
 ']',
 '^',
 '_',
 'a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z',
 '|']

preprocessing:

- we are asking the model: given a sequence of characters, what is the most probable next one? This is the goal of this model development.

- the data we have is ABC notation, and we want the RNN to learn the pattern.


so,

1. we need to vectorize the text.
- creating a numerical representation of the musical text. 
- we will therefore generate two lookup tables: one to map the characters to numbers and the other will map numbers back to characters.
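
A toy round trip shows how the two lookup tables work together. The snippet of ABC notation and the names `char_to_index` and `index_to_char` are made up for this sketch; the notebook's own tables are built from the full vocabulary.

```python
text = "ABc|dAF"  # hypothetical snippet of ABC notation

toy_vocab = sorted(set(text))
char_to_index = {ch: i for i, ch in enumerate(toy_vocab)}
index_to_char = toy_vocab  # a list: position i holds the character with index i

# Characters -> numbers, then numbers -> characters again.
encoded = [char_to_index[ch] for ch in text]
decoded = "".join(index_to_char[i] for i in encoded)

print(encoded)
print(decoded)
```

If the two tables are consistent, decoding the encoded text gives back the original string.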


Notes on the dictionary comprehension below:

Its equivalent for loop is:

      char2indx = {}
      for ind, ch in enumerate(vocab): 
         char2indx[ch] = ind


In [15]:
import numpy as np

# mapping characters to unique index
char2indx = {ch:indx for indx, ch in enumerate(vocab)}

# inverse mapping: the character at position i of this array has index i
indx2char = np.array(vocab)



In [16]:
char2indx

{'\n': 0,
 ' ': 1,
 '!': 2,
 '"': 3,
 '#': 4,
 "'": 5,
 '(': 6,
 ')': 7,
 ',': 8,
 '-': 9,
 '.': 10,
 '/': 11,
 '0': 12,
 '1': 13,
 '2': 14,
 '3': 15,
 '4': 16,
 '5': 17,
 '6': 18,
 '7': 19,
 '8': 20,
 '9': 21,
 ':': 22,
 '<': 23,
 '=': 24,
 '>': 25,
 'A': 26,
 'B': 27,
 'C': 28,
 'D': 29,
 'E': 30,
 'F': 31,
 'G': 32,
 'H': 33,
 'I': 34,
 'J': 35,
 'K': 36,
 'L': 37,
 'M': 38,
 'N': 39,
 'O': 40,
 'P': 41,
 'Q': 42,
 'R': 43,
 'S': 44,
 'T': 45,
 'U': 46,
 'V': 47,
 'W': 48,
 'X': 49,
 'Y': 50,
 'Z': 51,
 '[': 52,
 ']': 53,
 '^': 54,
 '_': 55,
 'a': 56,
 'b': 57,
 'c': 58,
 'd': 59,
 'e': 60,
 'f': 61,
 'g': 62,
 'h': 63,
 'i': 64,
 'j': 65,
 'k': 66,
 'l': 67,
 'm': 68,
 'n': 69,
 'o': 70,
 'p': 71,
 'q': 72,
 'r': 73,
 's': 74,
 't': 75,
 'u': 76,
 'v': 77,
 'w': 78,
 'x': 79,
 'y': 80,
 'z': 81,
 '|': 82}

In [19]:
indx2char

array(['\n', ' ', '!', '"', '#', "'", '(', ')', ',', '-', '.', '/', '0',
       '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', '<', '=', '>',
       'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
       'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
       '[', ']', '^', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
       'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
       'w', 'x', 'y', 'z', '|'], dtype='<U1')

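A comment on dtype='<U1', seen above in the output of indx2char:

This dtype says that each array element is a Unicode string of at most one character (`U1`), stored in little-endian byte order (`<`). It is a string type, not an integer type.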
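
A quick check of what this dtype means, using a small made-up array rather than the full vocabulary:

```python
import numpy as np

chars = np.array(["a", "b", "|"])  # toy array of single-character strings

print(chars.dtype)           # <U1 on little-endian machines
print(chars.dtype.kind)      # U, i.e. a Unicode string type
print(chars.dtype.itemsize)  # 4, since NumPy stores each character as 4 bytes
```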

In [None]:
# vectorize the song strings
def vectorize_string(string):
  '''
  Take a string (i.e. a song),
  convert each character to its index,
  then store the indexes in an array for easier storage and manipulation.
  '''