# Basic Attention Operation: Ungraded Lab

As you've learned, attention allows a seq2seq decoder to use information from each encoder step instead of just the final encoder hidden state. In the attention operation, the encoder outputs are weighted based on the decoder hidden state, then combined into one context vector. This vector is then used as input to the decoder to predict the next output step.
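
As a rough illustration of the weighting-and-combining step described above, the sketch below builds one context vector from made-up scores and random encoder outputs. The names `scores`, `weights`, and `context` and all sizes are assumptions chosen for illustration, not part of the lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 5 encoder steps, hidden size 16
encoder_outputs = rng.standard_normal((5, 16))

# Made-up relevance scores, one per encoder step
scores = np.array([0.1, 2.0, -1.0, 0.5, 0.0])

# Softmax turns the scores into positive weights that sum to 1
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# The context vector is the weighted sum of the encoder outputs
context = weights @ encoder_outputs

print(weights.shape, context.shape)  # (5,) (16,)
```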

In this ungraded lab, you'll implement a basic attention operation as described in [Bahdanau et al. (2014)](https://arxiv.org/abs/1409.0473) using NumPy.

This is a practice notebook where you can practice writing the code yourself. All of the solutions are provided at the end of the notebook.

In [3]:
import numpy as np

def softmax(x, axis=0):
    """Compute the softmax of x along the given axis.

    axis=0 calculates softmax across rows, which means each column sums to 1.
    axis=1 calculates softmax across columns, which means each row sums to 1.

    Args:
        x (np.ndarray): Input array of scores.
        axis (int, optional): Axis along which to normalize. Defaults to 0.

    Returns:
        np.ndarray: Array with the same shape as x whose entries sum to 1 along `axis`.
    """
    return np.exp(x) / np.expand_dims(np.sum(np.exp(x), axis=axis), axis)
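
To make the `axis` behavior concrete, here is a small sketch that applies the same formula to a 2 x 3 array and checks which direction sums to 1. The `keepdims=True` form and the array `x` are choices made for this example, not part of the lab code.

```python
import numpy as np

def softmax(x, axis=0):
    # Same formula as the cell above, written with keepdims for broadcasting
    exps = np.exp(x)
    return exps / np.sum(exps, axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, -1.0]])

col_wise = softmax(x, axis=0)  # normalizes down each column
row_wise = softmax(x, axis=1)  # normalizes along each row

print(col_wise.sum(axis=0))  # each column sums to 1
print(row_wise.sum(axis=1))  # each row sums to 1
```

For large inputs, `np.exp` can overflow. Subtracting the maximum along `axis` before exponentiating is a common remedy and leaves the result unchanged.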

## 1: Calculating alignment scores

The first step is to calculate the alignment scores. This is a measure of similarity between the decoder hidden state and each encoder hidden state. From the paper, this operation looks like

$$
\large e_{ij} = v_a^\top \tanh{\left(W_a s_{i-1} + U_a h_j\right)}
$$

where $W_a \in \mathbb{R}^{n\times m}$, $U_a \in \mathbb{R}^{n \times m}$, and $v_a \in \mathbb{R}^m$
are the weights, $n$ is the hidden state size, and $m$ is the size of the layers in the alignment network. With these shapes, each hidden state is treated as a row vector that multiplies the weight matrix from the left, which matches the NumPy code in this lab. In practice, this is implemented as a feedforward neural network with two layers. It looks something like:

![alignment model](./images/alignment_model_3.jpg)

Here $h_j$ are the encoder hidden states for each input step $j$ and $s_{i - 1}$ is the decoder hidden state of the previous step. The first layer corresponds to $W_a$ and $U_a$, while the second layer corresponds to $v_a$.
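
One way to see why a single first layer can stand in for both $W_a$ and $U_a$: stacking the two matrices and concatenating the two states gives the same result as the two separate products. The sketch below checks this numerically under the row-vector convention. The sizes, the random values, and the names `stacked_weights` and `concatenated` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 16, 10  # hidden size and alignment-layer size (illustrative values)

# Row-vector convention: states are rows, weights have shape (n, m)
W_a = rng.standard_normal((n, m))
U_a = rng.standard_normal((n, m))
s_prev = rng.standard_normal((1, n))  # decoder state s_{i-1}
h_j = rng.standard_normal((1, n))     # one encoder state h_j

# Two separate products, as written in the formula
separate = s_prev @ W_a + h_j @ U_a

# One product on the concatenated input with the stacked weights
stacked_weights = np.vstack([U_a, W_a])                # shape (2n, m)
concatenated = np.concatenate([h_j, s_prev], axis=1)   # shape (1, 2n)
combined = concatenated @ stacked_weights

print(np.allclose(separate, combined))  # True
```

The order matters: because the encoder state comes first in the concatenation, $U_a$ must come first in the stacked weights.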

To implement this, first concatenate the encoder and decoder hidden states to produce an array with size $K \times 2n$, where $K$ is the number of encoder states/steps. For this, use `np.concatenate` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)) along the feature axis (`axis=1`); the default `axis=0` would stack the decoder state as an extra row instead. Note that there is only one decoder state, so you'll need to expand it to $K$ rows before concatenating. The easiest way is to use `decoder_state.repeat` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.repeat.html#numpy.repeat)) to match the encoder hidden state array size.
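
The concatenation step described above can be sketched as follows. This is a minimal example on freshly generated random arrays, so the values differ from the lab's. The sizes `K` and `n` and the name `decoder_repeated` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 5, 16  # illustrative: 5 encoder steps, hidden size 16

encoder_states = rng.standard_normal((K, n))
decoder_state = rng.standard_normal((1, n))

# Repeat the single decoder state along axis 0: one copy per encoder step
decoder_repeated = decoder_state.repeat(K, axis=0)  # shape (K, n)

# Join along the feature axis (axis=1): one row of length 2n per step
inputs = np.concatenate([encoder_states, decoder_repeated], axis=1)

print(decoder_repeated.shape, inputs.shape)  # (5, 16) (5, 32)
```

Leaving out `axis=1` makes `np.concatenate` fall back to `axis=0`, which would give a `(K + 1, n)` array rather than the `(K, 2n)` array needed here.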

Then, apply the first layer as a matrix multiplication between the weights and the concatenated input. Use the tanh function to get the activations. Finally, compute the matrix multiplication of the second layer weights and the activations. This returns the alignment scores.
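
Putting the steps above together, one possible sketch of the score computation looks like this. It is not necessarily the notebook's own solution: the helper name `alignment_scores`, the explicit weight arguments, and the random inputs are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
K, n, m = 5, 16, 10  # illustrative sizes

encoder_states = rng.standard_normal((K, n))
decoder_state = rng.standard_normal((1, n))
layer_1 = rng.standard_normal((2 * n, m))  # stands in for W_a and U_a
layer_2 = rng.standard_normal((m, 1))      # stands in for v_a

def alignment_scores(encoder_states, decoder_state, layer_1, layer_2):
    """Hypothetical helper: returns one score per encoder step, shape (K, 1)."""
    repeated = decoder_state.repeat(encoder_states.shape[0], axis=0)
    inputs = np.concatenate([encoder_states, repeated], axis=1)
    activations = np.tanh(inputs @ layer_1)  # first layer, shape (K, m)
    return activations @ layer_2             # second layer, shape (K, 1)

scores = alignment_scores(encoder_states, decoder_state, layer_1, layer_2)
print(scores.shape)  # (5, 1)
```

Because `tanh` keeps every activation in $[-1, 1]$, each score is bounded in magnitude by the sum of the absolute values of the second-layer weights.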

In [5]:
hidden_size = 16
attention_size = 10
input_length = 5

np.random.seed(4)

encoder_states = np.random.randn(input_length, hidden_size)
decoder_state = np.random.randn(1, hidden_size)

layer_1 = np.random.randn(2 * hidden_size, attention_size)
layer_2 = np.random.randn(attention_size, 1)

In [7]:
test = np.concatenate([encoder_states, decoder_state])
test

array([[ 0.05056171,  0.49995133, -0.99590893,  0.69359851, -0.41830152,
        -1.58457724, -0.64770677,  0.59857517,  0.33225003, -1.14747663,
         0.61866969, -0.08798693,  0.4250724 ,  0.33225315, -1.15681626,
         0.35099715],
       [-0.60688728,  1.54697933,  0.72334161,  0.04613557, -0.98299165,
         0.05443274,  0.15989294, -1.20894816,  2.22336022,  0.39429521,
         1.69235772, -1.11281215,  1.63574754, -1.36096559, -0.65122583,
         0.54245131],
       [ 0.04800625, -2.35807363, -1.10558404,  0.83783635,  2.08787087,
         0.91484096, -0.27620335,  0.7965119 , -1.14379857,  0.50991978,
        -1.3474603 , -0.0093601 , -0.13070464,  0.80208661, -0.30296397,
         1.20200259],
       [-0.19674528,  0.8365287 ,  0.78660228, -1.84087587,  0.03754749,
         0.03592805, -0.77873992,  0.17941071, -1.45553433,  0.55618522,
         0.50977885,  0.30044554,  2.47658416,  0.3523434 ,  0.067471  ,
        -0.7322647 ],
       [ 0.29714121, -0.9617768 ,  1

In [None]:

def alignment(encoder_states, decoder_state):
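    # First, concatenate the encoder states and the decoder state.
    # Repeat the single decoder state once per encoder step, then join along axis=1.
    inputs = np.concatenate([encoder_states, decoder_state.repeat(input_length, axis=0)], axis=1)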
    assert inputs.shape == (input_length, 2 * hidden_size)