### This tutorial is inspired by the Coursera Deep Learning Specialization
#### We will use Faker to generate fake dates

In [0]:
"https://jscriptcoder.github.io/date-translator/Machine%20Translation%20with%20Attention%20model.html"

In [1]:
!pip -q install faker
from faker import Faker
import numpy as np
import random
from babel.dates import format_date


In [0]:
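faker = Faker()
Faker.seed(5)       # seed through the class; recent Faker releases reject seeding through an instance
random.seed(5)      # random_date() uses random.choice(), so seed Python's `random` as well
np.random.seed(5)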

In [20]:
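# Date formats to sample from; a format listed several times is picked proportionally more often.
# Note: in CLDR patterns 'Y' is the week-numbering year and 'y' the calendar year,
# so the 'YYY' patterns can be off by one year for dates close to New Year.
FORMATS = ['short','medium','medium','medium','long','long','long','long','long','full','full','full','d MMM YYY','d MMMM YYY','d MMMM YYY',
           'd MMMM YYY','d MMMM YYY','d MMMM YYY','dd/MM/YYY','EE d, MMM YYY','EEEE d, MMMM YYY','MMM d, YYY','MMMM d, YYY','YYY, d MMM','YYY, d MMMM']
for fmt in FORMATS:   # `fmt` avoids shadowing the built-in format()
    print('%s => %s' % (fmt, format_date(faker.date_object(), format=fmt, locale='en')))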

short => 12/10/72
medium => Mar 27, 1973
medium => Jan 29, 1983
medium => Jun 21, 2017
long => July 26, 1972
long => February 27, 2005
long => July 8, 1989
long => September 28, 1976
long => May 31, 2019
full => Tuesday, October 8, 2002
full => Thursday, May 12, 2005
full => Thursday, December 29, 2005
d MMM YYY => 22 Jan 1997
d MMMM YYY => 5 September 1998
d MMMM YYY => 5 February 1987
d MMMM YYY => 15 July 1996
d MMMM YYY => 9 October 1970
d MMMM YYY => 22 May 2013
dd/MM/YYY => 07/01/1995
EE d, MMM YYY => Tue 25, Oct 1983
EEEE d, MMMM YYY => Saturday 6, August 2016
MMM d, YYY => Jan 10, 2006
MMMM d, YYY => July 18, 1997
YYY, d MMM => 1974, 15 Oct
YYY, d MMMM => 2002, 25 July
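
The pattern strings above spell the year as `YYY`. In CLDR date patterns an upper-case `Y` denotes the week-numbering year, while a lower-case `y` denotes the calendar year. The two differ only for a few days around New Year. The sketch below uses only the standard library and ISO week rules as a stand-in (the helper name `week_year` is hypothetical, and a locale's own week rules may differ from ISO) to show which dates are affected:

```python
from datetime import date

def week_year(d: date) -> int:
    """ISO week-numbering year of `d` (a stand-in for what a 'Y' pattern refers to)."""
    return d.isocalendar()[0]

# Two dates near New Year and one mid-January date for comparison
for d in (date(2019, 12, 30), date(2021, 1, 2), date(1997, 1, 22)):
    print(d.isoformat(), "calendar year:", d.year, "week year:", week_year(d))
```

If the training pairs should always carry the calendar year, writing the patterns with lower-case `y` (for example `d MMM y`) is the usual choice.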


In [21]:
def random_date():
    """Return a (human_readable, machine_readable) pair for a random date."""
    dt = faker.date_object()
    try:
        date = format_date(dt, format=random.choice(FORMATS), locale='en')
        human_readable = date.lower().replace(',', '')   # lower-case, commas removed
        machine_readable = dt.isoformat()                # YYYY-MM-DD
    except AttributeError:
        return None, None   # same number of values as the normal return
    return human_readable, machine_readable
random_date()

('dec 27 1972', '1972-12-27')

In [22]:
human_vocab = set()
machine_vocab = set()
dataset = []
m = 50000   # number of training examples
for i in range(m):
  hd,md = random_date()
  dataset.append((hd,md))
  human_vocab.update( hd )     # collect every character seen in the inputs
  machine_vocab.update( md )   # collect every character seen in the targets

human_vocab.add('<pad>')
# Sort before indexing so the character -> index mapping is identical on every run
human_vocab = { ch:i for i,ch in enumerate(sorted(human_vocab)) }   # char -> index

machine_vocab = dict(enumerate(sorted(machine_vocab)))              # index -> char
inv_machine_vocab = { v:i for i,v in machine_vocab.items()}         # char -> index

print(len(dataset),len(human_vocab),len(machine_vocab))
dataset[:10]

50000 35 11


[('may 4 1997', '1997-05-04'),
 ('9 april 1980', '1980-04-09'),
 ('march 6 2018', '2018-03-06'),
 ('oct 19 1989', '1989-10-19'),
 ('wednesday september 12 1979', '1979-09-12'),
 ('wednesday may 22 2013', '2013-05-22'),
 ('1973 24 january', '1973-01-24'),
 ('5 april 2013', '2013-04-05'),
 ('11/07/2002', '2002-07-11'),
 ('dec 3 1970', '1970-12-03')]
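
To make the vocabulary step concrete, here is a minimal, self-contained sketch on two of the pairs shown above. The helper `build_vocab` is hypothetical and is not part of the notebook. It sorts the characters before numbering them, which is one way to keep the mapping stable between runs, for example when weights are saved and reloaded later.

```python
# Two (human, machine) pairs taken from the dataset preview above
pairs = [('may 4 1997', '1997-05-04'), ('11/07/2002', '2002-07-11')]

def build_vocab(strings, extra_tokens=()):
    """Return (char -> index, index -> char) maps over all characters in `strings`."""
    chars = set()
    for s in strings:
        chars.update(s)
    tokens = sorted(chars) + list(extra_tokens)   # special tokens get the last indices
    char_to_idx = {ch: i for i, ch in enumerate(tokens)}
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char

human_c2i, _ = build_vocab((h for h, _ in pairs), extra_tokens=['<pad>'])
machine_c2i, machine_i2c = build_vocab(m for _, m in pairs)
print(len(human_c2i), len(machine_c2i))
```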

In [23]:
HUMAN_VOCAB = len(human_vocab)
MACHINE_VOCAB = len(machine_vocab)
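Tx = 30   # fixed (padded) length of the human-readable input
Ty = 10   # length of the machine-readable output, e.g. '1997-05-04'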
print( HUMAN_VOCAB, MACHINE_VOCAB )

35 11


#### 1. Converting human-readable dates to character vectors
#### 2. Converting machine-readable dates to character vectors

In [0]:
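def string_to_ohe( string, T, vocab ):
  """One-hot encode `string` into shape (T, len(vocab)); also return the index list."""
  string = string.lower()
  arr = [ vocab[ch] for ch in string[:T] ]         # characters beyond T are dropped
  if len(arr) < T:                                 # pad only when needed ('<pad>' exists only in human_vocab)
    arr += [ vocab['<pad>'] ] * ( T - len(arr) )

  onehot = np.zeros( (T,len(vocab)) )
  for i in range(T):
    onehot[ i, arr[i] ] = 1
  return onehot, arr

def output_to_date( out, vocab ):
  """Decode a (T, vocab_size) array of scores into a string; `vocab` maps index -> char."""
  arr = np.argmax(out,axis=-1)
  string = ''
  for i in arr:
    string += vocab[ i ]
  return string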

In [25]:
X = []
Y = []
for x,y in dataset:
  X.append( string_to_ohe(x, Tx, human_vocab)[0] )
  Y.append( string_to_ohe(y, Ty, inv_machine_vocab)[0] )
X,Y = np.array(X), np.array(Y)
X.shape, Y.shape

((50000, 30, 35), (50000, 10, 11))
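
As a quick sanity check of the encoding, the sketch below round-trips one machine-readable date through a one-hot matrix and back. It is a simplified, self-contained variant of the idea rather than the notebook's own functions: `encode`, `decode`, and the fixed toy vocabulary are assumptions made for illustration, and padding is left out because machine dates always have exactly `Ty = 10` characters.

```python
import numpy as np

# Toy vocabulary in a fixed order (assumption for this sketch)
vocab = {ch: i for i, ch in enumerate("-0123456789")}   # char -> index
inv_vocab = {i: ch for ch, i in vocab.items()}          # index -> char

def encode(text, T, vocab):
    """One-hot encode the first T characters of `text` into shape (T, len(vocab))."""
    idx = [vocab[ch] for ch in text[:T]]
    onehot = np.zeros((T, len(vocab)))
    onehot[np.arange(len(idx)), idx] = 1
    return onehot

def decode(onehot, inv_vocab):
    """Take the argmax of every row and map it back to a character."""
    return "".join(inv_vocab[int(i)] for i in np.argmax(onehot, axis=-1))

y = encode("1997-05-04", 10, vocab)
print(y.shape)               # (10, 11)
print(decode(y, inv_vocab))  # 1997-05-04
```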

## Defining Attention Model

| Overall | Attention Mechanism |
|-------------|------------------------|
| ![alt text](https://github.com/adityajn105/Coursera-Deep-Learning-Specialization/raw/26cf7da29b2f1cb32799e045cc9cdfab99ad0757/4.%20Sequence%20Models/Week%203/Machine%20Translation/images/attn_model.png) | ![alt text](https://raw.githubusercontent.com/adityajn105/Coursera-Deep-Learning-Specialization/26cf7da29b2f1cb32799e045cc9cdfab99ad0757/4.%20Sequence%20Models/Week%203/Machine%20Translation/images/attn_mechanism.png) |

* The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next.
* In this model the post-attention LSTM at time $t$ does not take the previously generated $y^{\langle t-1 \rangle}$ as input; it only takes $s^{\langle t\rangle}$ and $c^{\langle t\rangle}$ as input.
* We use $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}; \overleftarrow{a}^{\langle t \rangle}]$ to represent the concatenation of the activations of the forward and backward directions of the pre-attention Bi-LSTM.
* The diagram on the right uses a RepeatVector node to copy $s^{\langle t-1 \rangle}$'s value $T_x$ times, and then Concatenation to concatenate $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ to compute $e^{\langle t, t' \rangle}$, which is then passed through a softmax to compute $\alpha^{\langle t, t' \rangle}$. The code below shows how RepeatVector and Concatenate are used in Keras.
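
The steps in the last bullet can be traced for a single example in plain NumPy. This is a rough sketch under stated assumptions, not the Keras model itself: the weights `W1` and `W2` are random and untrained, biases are omitted, and the hidden size of 10 and the ReLU on the energies simply mirror the small network used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a2, n_s, n_hidden = 30, 64, 64, 10   # n_a2 = 2 * n_a because the pre-attention LSTM is bidirectional

a = rng.normal(size=(Tx, n_a2))        # stand-in for the Bi-LSTM activations a^<t'>
s_prev = rng.normal(size=(n_s,))       # stand-in for the previous decoder state s^<t-1>
W1 = rng.normal(size=(n_a2 + n_s, n_hidden)) * 0.1   # random, untrained weights
W2 = rng.normal(size=(n_hidden, 1)) * 0.1

def softmax(x, axis):
    z = np.exp(x - x.max(axis=axis, keepdims=True))   # subtract the max for numerical stability
    return z / z.sum(axis=axis, keepdims=True)

s_rep = np.tile(s_prev, (Tx, 1))                  # RepeatVector: (Tx, n_s)
concat = np.concatenate([a, s_rep], axis=-1)      # (Tx, n_a2 + n_s)
e = np.maximum(np.tanh(concat @ W1) @ W2, 0.0)    # energies e^<t,t'>, shape (Tx, 1)
alphas = softmax(e, axis=0)                       # attention weights, normalized over the Tx positions
context = (alphas * a).sum(axis=0)                # weighted sum of activations, shape (n_a2,)

print(alphas.shape, context.shape)
```

The softmax has to run over the $T_x$ axis. Taken over the trailing axis of size 1, every weight would come out as exactly 1 and the context would reduce to an unweighted sum of the activations.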



In [0]:
from keras.layers import RepeatVector, Concatenate, Dense, Dot, Softmax
# Heart of the attention model: combines the Bi-LSTM hidden states with the previous state of the
# post-attention LSTM cell to decide how much attention each input timestep receives.
# Note: the layers are created inside the function, so every call (every output step) gets its own weights.
def one_step_attention( a, s_prev ):
  x = RepeatVector(Tx)(s_prev)             #repeat s_prev Tx times -> (batch, Tx, n_s)
  x = Concatenate(axis=-1)( [ a, x ] )     #concat each copy of s_prev with each timestep hidden state
  e = Dense(10, activation='tanh')(x)      #pass each concatenated vector through Dense Layer to get intermediate energies
  energy = Dense(1, activation='relu')(e)  #get energy -> (batch, Tx, 1)
  alphas = Softmax(axis=1)(energy)         #softmax over the Tx axis; over the last axis (size 1) every weight would be 1
  context = Dot(axes=1)([alphas,a])        #weighted sum of the timestep hidden states -> context vector (batch, 1, 2*n_a)
  return context

In [0]:
from keras.layers import Input, Bidirectional, LSTM
from keras.models import Model

n_a = 32 #units per direction in the pre-attention Bi-LSTM (concatenated output size = 64)
n_s = 64 #hidden state size of the post-attention LSTM

inp = Input( (Tx, HUMAN_VOCAB ) )
s0 = Input( (n_s,) )
c0 = Input( (n_s,) )

outputs = []

s=s0
c=c0
a = Bidirectional( LSTM( n_a, return_sequences=True ) )(inp) #generate hidden state for every timestep

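# return_sequences vs. return_state: https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/
postLSTM = LSTM( n_s, return_state = True) #one LSTM layer, shared by all Ty output steps

output = Dense( MACHINE_VOCAB, activation='softmax') #our final output layer

for _ in range(Ty): #iterate for every output step
  context = one_step_attention(a, s) #get context
  s,_,c = postLSTM(context, initial_state=[s,c]) #returns [output, hidden state, cell state]; the output equals the hidden state
  out = output(s) #character probabilities for this output step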
  outputs.append(out)
  
model = Model( [inp,s0,c0], outputs )
#model.summary()

In [28]:
from keras.optimizers import Adam
model.compile( optimizer=Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01), loss='categorical_crossentropy', metrics=['accuracy'] )

s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))

Y = list(Y.swapaxes(0,1))

history = model.fit( [X,s0,c0], Y, epochs=30, batch_size=128, verbose=1)
model.save_weights('attention_weights.h5')

Epoch 1/30
...
Epoch 30/30
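
The training cell converts `Y` with `list(Y.swapaxes(0,1))` because the model has `Ty` separate outputs, one per output character, so the targets are passed as a list of `Ty` arrays of shape `(m, MACHINE_VOCAB)`. The sketch below reproduces only that reshaping on a small dummy array; the sizes and the dummy contents are assumptions for illustration.

```python
import numpy as np

m, Ty, vocab_size = 4, 10, 11             # tiny dummy sizes; the notebook uses m = 50000
Y = np.zeros((m, Ty, vocab_size))
Y[:, :, 0] = 1                            # dummy one-hot targets (every character is index 0)

targets = list(Y.swapaxes(0, 1))          # (m, Ty, V) -> (Ty, m, V) -> list of Ty arrays
print(len(targets), targets[0].shape)     # 10 (4, 11)

# Stacking the list and swapping the axes back recovers the original array
restored = np.array(targets).swapaxes(0, 1)
print(np.array_equal(restored, Y))        # True
```

The same swap, applied in reverse, is what turns the list of per-step predictions back into one `(Ty, MACHINE_VOCAB)` matrix at inference time.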


In [32]:
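model.load_weights('attention_weights.h5')
def getTranslation(date,model):
  date = date.lower().replace(',','')
  source = np.array(string_to_ohe(date, Tx, human_vocab)[0])
  source = np.expand_dims(source,axis=0)            # add the batch dimension -> (1, Tx, HUMAN_VOCAB)
  s_init = np.zeros((1, n_s))                       # zero initial states whose batch size matches `source`
  c_init = np.zeros((1, n_s))
  prediction = np.array(model.predict([source, s_init, c_init]))   # (Ty, 1, MACHINE_VOCAB)
  prediction = np.squeeze(prediction.swapaxes(0,1))                # (Ty, MACHINE_VOCAB)
  return output_to_date(prediction,machine_vocab)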

EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', 
            '1 March 2001','jun 10 2017','11/07/2002']

for example in EXAMPLES:
    print(f"{example} -> {getTranslation(example,model)}")

3 May 1979 -> 1979-05-03
5 April 09 -> 2009-04-04
21th of August 2016 -> 2016-08-22
Tue 10 Jul 2007 -> 2007-07-10
Saturday May 9 2018 -> 2018-05-09
March 3 2001 -> 2001-03-03
March 3rd 2001 -> 2001-03-03
1 March 2001 -> 2001-03-01
jun 10 2017 -> 2017-06-10
11/07/2002 -> 2002-07-17
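
The per-output `accuracy` metric used during training scores each character position separately, so a date with one wrong digit still counts as mostly correct. A stricter, whole-string check can be sketched in plain Python. The helpers `exact_match` and `char_accuracy` are hypothetical, the expected dates are written by hand for the unambiguous examples, and the predictions are copied from the output above.

```python
# (input, expected ISO date, model output recorded above)
records = [
    ("3 May 1979",   "1979-05-03", "1979-05-03"),
    ("March 3 2001", "2001-03-03", "2001-03-03"),
    ("1 March 2001", "2001-03-01", "2001-03-01"),
    ("jun 10 2017",  "2017-06-10", "2017-06-10"),
    ("11/07/2002",   "2002-07-11", "2002-07-17"),   # the last digit is wrong
]

def exact_match(records):
    """Fraction of examples whose predicted string equals the expected string."""
    hits = [pred == expected for _, expected, pred in records]
    return sum(hits) / len(hits)

def char_accuracy(records):
    """Fraction of character positions that are correct, pooled over all examples."""
    total = sum(len(expected) for _, expected, _ in records)
    correct = sum(p == e for _, expected, pred in records for p, e in zip(pred, expected))
    return correct / total

print(exact_match(records))    # 0.8
print(char_accuracy(records))  # 0.98
```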


In [33]:
done = False
while not done:
  dt = input("Enter Date : ")
  print(f"Translation : {getTranslation(dt,model)}     Continue('y/n') :",end="")
  done = input() == 'n'

Enter Date : 21 july 1976
Translation : 1976-07-21     Continue('y/n') :y
Enter Date : 11/07/2002
Translation : 2002-07-17     Continue('y/n') :n


In [0]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, 30, 35)       0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 64)           0                                            
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 30, 64)       17408       input_4[0][0]                    
__________________________________________________________________________________________________
repeat_vector_11 (RepeatVector) (None, 30, 64)       0           input_5[0][0]                    
__________________________________________________________________________________________________
concatenat