# Baby Name Generator at the Character Level

### Reference 


<a href="https://www.youtube.com/watch?v=6ORnRAz3gnA&t=59s"> YouTube Channel on LSTM in keras</a>

<a href="https://keras.io/api/layers/recurrent_layers/lstm/"> Keras official documentation</a>             

<a href="https://kite.com/python/docs/keras.layers.LSTM">Kite LSTM Documentation</a>

# RNN

**Recurrent Neural Networks (RNNs)** suffer from short-term memory. If a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones. So if you are processing a paragraph of text to make predictions, an RNN may lose important information from the beginning.
The cause is the vanishing gradient problem: during backpropagation through time, the gradient shrinks as it is passed back through each time step, so the weights that depend on early inputs receive almost no update.
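
To make the vanishing gradient concrete, here is a toy sketch that is not part of the original notebook. It assumes the gradient is scaled by the same factor at every time step; the factor 0.5 is an arbitrary illustrative value, not something measured from a real network.

```python
# Toy illustration of the vanishing gradient (assumed per-step factor, not measured)
def gradient_after(steps, factor=0.5):
    """Scale a gradient of 1.0 by `factor` once per time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

# The gradient reaching earlier steps shrinks exponentially with distance
for steps in (1, 5, 10, 20):
    print(steps, gradient_after(steps))
```

After 20 steps the gradient is already below one millionth of its starting value, which is why a plain RNN learns little from the start of a long sequence.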


# LSTM

*LSTM stands for long short-term memory. It is an architecture that extends the memory of recurrent neural networks. A plain RNN has only 'short-term memory': at each step it sees the current input and a hidden state that summarizes the previous steps, and that summary fades quickly. The network does not have a list of all the previous information available to it.*

**How does it work**

*LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, where the network stops learning because the updates to its weights become smaller and smaller. It does this with a series of 'gates' that control what is written to, kept in, and read from the cell state. The gates are contained in memory blocks, which are connected through layers.*


                                          
**A simple LSTM cell has three gates**

**Forget Gate:**
Given the previous hidden state h(t-1) and the current input x(t), the forget gate decides which parts of the previous cell state c(t-1) to discard, keeping only what is still relevant. It uses a sigmoid function, which squashes each value into the range [0, 1]: 0 means forget completely, 1 means keep completely.

**Input Gate:**

The input gate decides how much new information from the present input to add to the present cell state. A sigmoid gate scales a set of tanh candidate values before they are added.


**Output Gate:** 
Finally, the output gate decides what to output from the cell state. A sigmoid gate is multiplied by the tanh of the cell state to produce the new hidden state h(t).
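
The three gates can be sketched as a single LSTM time step in NumPy. This is a minimal illustration of the standard LSTM equations, not the Keras implementation: the function name `lstm_step`, the gate ordering inside the weight matrices, and the random weights are all assumptions made for the example. The sizes (54 input features, 50 units) are chosen to match the model built later in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate values
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g              # forget old content, add new content
    h_t = o * np.tanh(c_t)                # expose part of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 54, 50                 # sizes used later in this notebook
W = rng.normal(scale=0.1, size=(4 * units, input_dim))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)

x_t = np.zeros(input_dim)
x_t[3] = 1.0                              # a one-hot encoded character
h_t, c_t = lstm_step(x_t, np.zeros(units), np.zeros(units), W, U, b)
print(h_t.shape, c_t.shape)
```

Because the cell state is updated by addition (`f * c_prev + i * g`) rather than by repeated matrix multiplication, information and gradients can survive across many time steps.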


**Common LSTM variants**

**Vanilla LSTM** - A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

**Stacked LSTM** -  Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.
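An LSTM layer requires a three-dimensional input, and by default it produces a two-dimensional output: its interpretation at the end of the sequence. To stack layers, set `return_sequences=True` on every LSTM layer except the last, so that each one passes a three-dimensional sequence to the next.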

**Bidirectional LSTM** - On some sequence prediction problems, it can be beneficial to let the LSTM model learn the input sequence both forwards and backwards and concatenate both interpretations. We can implement a Bidirectional LSTM, for example for univariate time series forecasting, by wrapping the first hidden layer in a wrapper layer called `Bidirectional`.

**CNN LSTM** - A convolutional neural network, or CNN for short, is a type of neural network developed for working with two-dimensional image data.
A CNN can also be very effective at automatically extracting and learning features from one-dimensional sequence data such as univariate time series data.
A CNN model can be used in a hybrid model with an LSTM backend where the CNN is used to interpret subsequences of input that together are provided as a sequence to an LSTM model to interpret. This hybrid model is called a CNN-LSTM.

**ConvLSTM** - A type of LSTM related to the CNN-LSTM is the ConvLSTM, where the convolutional reading of input is built directly into each LSTM unit.
The ConvLSTM was developed for reading two-dimensional spatial-temporal data, but can be adapted for use with univariate time series forecasting.
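The layer expects input as a sequence of two-dimensional images, so the input data must be five-dimensional: `[samples, timesteps, rows, columns, channels]` (Keras `ConvLSTM2D` with the default channels-last data format).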

In [1]:
import pandas as pd
import numpy as np
import string
from string import digits
import matplotlib.pyplot as plt
%matplotlib inline
import re
from sklearn.model_selection import train_test_split

In [3]:
data=pd.read_csv('baby-names.csv')

In [4]:
data.shape

(258000, 4)

In [24]:
data.head()

Unnamed: 0,name,target
0,\tJohn,John\n
1,\tWilliam,William\n
2,\tJames,James\n
3,\tCharles,Charles\n
4,\tGeorge,George\n


In [26]:
#Finding duplicate rows

data.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
6777    False
6778    False
6779    False
6780    False
6781    False
Length: 6782, dtype: bool

In [5]:
# Create a dataframe of unique names
data = pd.DataFrame({'name': data.name.unique()})

# Prepend '\t' as a start-of-name marker to every input
data['name'] = data.name.apply(lambda x: '\t' + x)

# The target is the input shifted left by one character (the next char at each step),
# with '\n' appended to mark the end of the name
data['target'] = data.name.apply(lambda x: x[1:] + '\n')
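
The shift between input and target can be checked on a couple of names in plain Python. The two sample names and the variable names below are only for illustration; the notebook itself applies the same transformation to the `name` column.

```python
# Illustrative names only; the notebook uses the full dataset
names = ["John", "Ann"]

inputs = ["\t" + n for n in names]          # start marker + name
targets = [s[1:] + "\n" for s in inputs]    # name + end marker

# At every position t the target is the character that follows the input character
for inp, tgt in zip(inputs, targets):
    for t, (current_char, next_char) in enumerate(zip(inp, tgt)):
        print(t, repr(current_char), "->", repr(next_char))
```

Input and target always have the same length, so they can share one `(names, max_len, vocab)` array shape.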

In [6]:
# Build the vocabulary: every character in the names (including '\t') plus '\n'
all_chars = set()
for name in data.name:
    all_chars.update(name)
all_chars.add('\n')

# Map each character to an index and back
# (the longest name has 11 characters, 12 with the leading '\t')
char_to_ix = {ch: i for i, ch in enumerate(sorted(all_chars))}
ix_to_char = {i: ch for i, ch in enumerate(sorted(all_chars))}

In [7]:
# Allocate one-hot arrays of shape (number of names, max name length, vocabulary size).
# The vocabulary size is 54 here.

max_len = max(len(name) for name in data.name)
input_data = np.zeros((len(data.name), max_len, len(all_chars)), dtype='float32')
output_data = np.zeros((len(data.name), max_len, len(all_chars)), dtype='float32')

In [8]:
# One-hot encode the inputs and targets; positions past the end of a name stay all zeros
for i, x in enumerate(data.name):
    for t, ch in enumerate(x):
        input_data[i, t, char_to_ix[ch]] = 1.
for i, x in enumerate(data.target):
    for t, ch in enumerate(x):
        output_data[i, t, char_to_ix[ch]] = 1.
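
As a sanity check on the encoding, here is a small self-contained sketch with a toy four-character vocabulary. The helpers `one_hot` and `decode` and the `toy_` names are hypothetical, introduced only for this example so they do not overwrite the notebook's own variables.

```python
import numpy as np

# Toy vocabulary built the same way as above: sorted unique characters
toy_vocab = sorted(set("\tAnn\n"))
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_vocab)}
toy_ix_to_char = {i: ch for ch, i in toy_char_to_ix.items()}

def one_hot(text, length, mapping):
    """Encode `text` as a (length, vocab) array; unused rows stay zero (padding)."""
    arr = np.zeros((length, len(mapping)), dtype="float32")
    for t, ch in enumerate(text):
        arr[t, mapping[ch]] = 1.0
    return arr

def decode(arr, inverse):
    """Turn a one-hot array back into text, skipping all-zero padding rows."""
    return "".join(inverse[int(row.argmax())] for row in arr if row.sum() > 0)

encoded = one_hot("\tAnn", 6, toy_char_to_ix)
print(encoded.shape)
print(repr(decode(encoded, toy_ix_to_char)))
```

Each encoded position holds exactly one 1, and the rows after the end of the name are all zeros.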

# Model Building

In [9]:

from keras.layers import Input, LSTM, Embedding, Dense, Activation, Dropout, TimeDistributed
from keras.models import Model, Sequential
from keras.callbacks import LambdaCallback
from keras.optimizers import RMSprop
from keras.utils import plot_model

Using TensorFlow backend.


### Model Description: 

**1. TimeDistributed** - Applies the same layer to every time step independently, which keeps a one-to-one relation between input and output time steps. Say you have 100 samples with 60 time steps each, and the LSTM returns an output for every time step. Wrapping `Dense(200)` in `TimeDistributed` applies the same fully connected layer to each of the 60 time steps separately, giving a 100 x 60 x 200 tensor in which the time steps are never mixed. In this model it turns the LSTM output of shape (None, 12, 50) into (None, 12, 54): one score per character at each position.

**2. RMSprop** - Is used as the optimizer, here with a learning rate of 0.01.

**3. Parameter `return_sequences`** in the first layer (LSTM) - If `return_sequences` is set to False in a Keras RNN layer, the layer returns only the last hidden state. Here it is set to True, so the layer returns the hidden state at every time step, which the model needs in order to predict the next character at each position.
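
What a time-distributed dense layer computes can be sketched in NumPy, using the 100 samples, 60 time steps and 200 outputs from the example above. The input feature size of 8 and the random weights are assumptions for illustration; this mimics the arithmetic only and does not call Keras.

```python
import numpy as np

rng = np.random.default_rng(0)
samples, timesteps, features, units = 100, 60, 8, 200   # 8 features is an arbitrary choice

x = rng.normal(size=(samples, timesteps, features))     # stands in for an LSTM output sequence
W = rng.normal(size=(features, units))                  # one shared weight matrix
b = rng.normal(size=units)                              # one shared bias

# Apply the same dense weights to each time step separately...
per_step = np.stack([x[:, t, :] @ W + b for t in range(timesteps)], axis=1)

# ...which equals a single matrix product broadcast over the time axis
broadcast = x @ W + b

print(per_step.shape)
print(np.allclose(per_step, broadcast))
```

The weights are shared across time steps, so the layer's parameter count does not depend on the sequence length.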


In [10]:
model = Sequential()
model.add(LSTM(50, input_shape=(max_len, len(all_chars)), return_sequences=True))
model.add(TimeDistributed(Dense(len(all_chars))))
model.add(TimeDistributed(Activation('softmax')))
# `lr` is the argument name in the Keras version used here; newer releases call it `learning_rate`
optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)



In [11]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 12, 50)            21000     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 54)            2754      
_________________________________________________________________
time_distributed_2 (TimeDist (None, 12, 54)            0         
Total params: 23,754
Trainable params: 23,754
Non-trainable params: 0
_________________________________________________________________
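
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an LSTM layer and a dense layer; the helper names `lstm_params` and `dense_params` are made up for this example.

```python
def lstm_params(input_dim, units):
    # Four blocks (three gates plus the candidate values), each with
    # input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

lstm_count = lstm_params(input_dim=54, units=50)
dense_count = dense_params(input_dim=50, units=54)
print(lstm_count, dense_count, lstm_count + dense_count)  # 21000 2754 23754
```

The softmax activation layer adds no parameters, which is why its row shows 0.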


In [23]:
# Sampling function: generates new names at the end of every second epoch
def onend(epoch, logs):
    if epoch % 2 == 0 and epoch != 0:
        print('----- Generating text after Epoch: %d' % epoch)
        chars = sorted(all_chars)
        for i in range(10):
            stop = False
            ch = '\t'  # every name starts with the start marker
            counter = 1
            target_seq = np.zeros((1, max_len, len(chars)))
            target_seq[0, 0, char_to_ix[ch]] = 1.
            # cap generated names at 9 characters
            while not stop and counter < 10:
                # predicted distribution over the next character
                # (predict_proba was removed from newer Keras; predict returns the same softmax output)
                probs = model.predict(target_seq, verbose=0)[0, counter - 1, :]
                # sample a character instead of always taking the most likely one
                c = np.random.choice(chars, p=probs)
                if c == '\n':
                    stop = True
                else:
                    ch = ch + c
                    target_seq[0, counter, char_to_ix[c]] = 1.
                    counter = counter + 1
            print(ch)
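
The function samples the next character from the predicted distribution rather than taking the arg max. The difference can be seen in a small NumPy sketch; the three-character alphabet and its probabilities are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
chars = ["a", "b", "c"]
probs = np.array([0.6, 0.3, 0.1])   # made-up next-character probabilities

# Greedy decoding always returns the same character
greedy = chars[int(np.argmax(probs))]

# Sampling returns each character in proportion to its probability
samples = rng.choice(chars, size=1000, p=probs)
counts = {c: int((samples == c).sum()) for c in chars}

print(greedy, counts)
```

With greedy decoding every generated name would be identical, because each name starts from the same `'\t'` marker. Sampling is what makes the ten names in each batch differ.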

In [16]:
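# Fit the model; the callback prints sample names during training.
# The `epoch` passed to the callback is zero-based, so 'after Epoch: 2' appears once Epoch 3/50 finishes.
print_callback = LambdaCallback(on_epoch_end=onend)
Model_LSTM = model.fit(input_data, output_data, batch_size=32, epochs=50, callbacks=[print_callback])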

Epoch 1/50
Epoch 2/50
Epoch 3/50
----- Generating text after Epoch: 2
	Albreda
	Sone
	Isham
	Ander
	Gennie
	Buzle
	Viola
	Shetta
	Mealisa
	Frona
Epoch 4/50
Epoch 5/50
----- Generating text after Epoch: 4
	Arah
	Fricly
	Dannielle
	Eus
	Javion
	Letal
	Minnit
	Erna
	Nichole
	Alisha
Epoch 6/50
Epoch 7/50
----- Generating text after Epoch: 6
	Barretta
	Louis
	Son
	Alford
	Maxulla
	Treitha
	Fedy
	Haylee
	Romona
	Queea
Epoch 8/50
Epoch 9/50
----- Generating text after Epoch: 8
	Dana
	Rovon
	Ewing
	Taviona
	Welden
	Kimp
	Willa
	Dyan
	Micayla
	Oren
Epoch 10/50
Epoch 11/50
----- Generating text after Epoch: 10
	Janelle
	Speckeyli
	Izya
	Contine
	Yvonne
	Delison
	Benjiman
	Margo
	Raynor
	Georgia
Epoch 12/50
Epoch 13/50
----- Generating text after Epoch: 12
	Rili
	Selan
	Erolfa
	Kathlyn
	Glinda
	Lucian
	Lakle
	Jearahino
	Isaylai
	Dustyn
Epoch 14/50
Epoch 15/50
----- Generating text after Epoch: 14
	Pawel
	Adit
	Lita
	Daisley
	Mariano
	Edd
	Colen
	Cera
	Trayvon
	Adell
Epoch 16/50
Epoch 17/50
----- 