<a href="https://colab.research.google.com/github/Natural-Language-Processing-YU/Exercises/blob/main/M9_attention_network_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.0 Introduction to Attention Network Exercise

## Overview:
In this exercise, we explore attention in neural networks by building a small Attention Network by hand. Attention mechanisms are highly effective on sequential data because they let the model focus on the important parts of the input while down-weighting less relevant information.

The Attention Network uses "queries," "keys," and "values." A query is compared against every key to produce attention scores that indicate how relevant each element of the input sequence is. The scores are then normalized into weights and used to form a weighted sum of the value vectors, which captures the information most relevant to the query.

## Formula for Attention Scores:
To calculate the attention scores, we take the dot product between the query vector ($Q$) and each key vector ($K_i$) in the input sequence, giving one score per element:

$A_i = Q \cdot K_i$

The softmax function is then applied to the attention scores to obtain the attention weights ($W$), which are positive, sum to 1, and represent the importance of each element in the input sequence:

$W_i = \text{softmax}(A)_i = \frac{e^{A_i}}{\sum_j e^{A_j}}$

Finally, the weighted sum ($S$) is computed by multiplying each value vector ($V_i$) by its attention weight and summing over the sequence:

$S = \sum_i W_i V_i$



## Objective:
In this exercise, we will implement a simple Attention Network on a small input sequence and print the attention weights to see how the mechanism assigns importance to each element. By the end of the exercise, you will have a deeper understanding of how attention works in neural networks and why it matters for processing sequential data.

Let's proceed with the implementation and exploration of the Attention Network!


## Step 1: Define the Input Sequence
Let's start with a small input sequence represented as a list of numbers. Each number acts as an ID that is looked up in the embedding table in Step 2.

In [9]:
input_sequence = [1, 2, 3, 4, 5]


## Step 2: Convert Input Sequence to Embeddings
We'll map each element of the input sequence to a 3-dimensional embedding vector using a simple hand-written embedding table.

In [10]:
embedding_table = {
    1: [0.1, 0.2, 0.3],
    2: [0.4, 0.5, 0.6],
    3: [0.7, 0.8, 0.9],
    4: [1.0, 1.1, 1.2],
    5: [1.3, 1.4, 1.5]
}

embedded_sequence = [embedding_table[num] for num in input_sequence]


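## Step 3: Define Key and Value Vectors
Now, we'll define the key and value vectors for each element in the input sequence. For simplicity, we'll use the embeddings themselves as both the key and the value vectors. (In a trained model, keys and values are typically produced by separate learned projections of the embeddings.)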

In [11]:
key_vectors = embedded_sequence
value_vectors = embedded_sequence

## Step 4: Calculate Attention Scores
Next, we'll calculate the attention scores using the dot product between the query vector and each key vector.

In [12]:
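import numpy as np  # needed for np.dot, np.array, and np.exp in the cells below

query = [0.5, 0.5, 0.5]  # The query vector for attention calculation

attention_scores = np.dot(query, np.array(key_vectors).T)  # one score per key, shape (5,)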
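
To make the vectorized call less opaque, the following minimal sketch recomputes the scores with one explicit dot product per key and checks that both forms agree. It assumes the keys and query defined in the steps above; the names `vectorized_scores` and `loop_scores` are introduced here for illustration only.

```python
import numpy as np

# Keys and query copied from the steps above.
key_vectors = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]
query = [0.5, 0.5, 0.5]

# Vectorized form: (3,) query times (3, 5) transposed key matrix -> (5,) scores.
vectorized_scores = np.dot(query, np.array(key_vectors).T)

# Explicit form: one dot product per key vector.
loop_scores = np.array([
    sum(q * k for q, k in zip(query, key))
    for key in key_vectors
])

print(np.round(vectorized_scores, 2))
print(np.allclose(vectorized_scores, loop_scores))
```

The scores come out at approximately 0.3, 0.75, 1.2, 1.65, and 2.1, so they grow steadily along the sequence because each embedding is larger than the previous one.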

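## Step 5: Apply Softmax to Get Attention Weights
We'll apply the softmax function to turn the scores into attention weights that are positive and sum to 1. Subtracting the maximum score before exponentiating does not change the result but avoids numerical overflow.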

In [13]:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

attention_weights = softmax(attention_scores)
print(attention_weights)

[0.06695687 0.10500927 0.16468731 0.25828112 0.40506543]
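
The `x - np.max(x)` shift in `softmax` matters once scores get large. As a rough illustration, the sketch below compares a softmax without the shift against the shifted version on some deliberately large, made-up scores. The names `naive_softmax`, `stable_softmax`, and `large_scores` are hypothetical and not part of the exercise.

```python
import numpy as np

def naive_softmax(x):
    # No max subtraction: np.exp overflows to inf for large inputs.
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=0)

def stable_softmax(x):
    # Same formula as the softmax above: shift by the max before exponentiating.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

# Hypothetical large scores, chosen only to trigger overflow.
large_scores = np.array([1000.0, 1001.0, 1002.0])

# Silence the overflow/invalid warnings so the nan result is easy to see.
with np.errstate(over="ignore", invalid="ignore"):
    naive_result = naive_softmax(large_scores)  # inf / inf -> nan

stable_result = stable_softmax(large_scores)

print("naive: ", naive_result)
print("stable:", stable_result)
```

The naive version returns `nan` for every entry, while the shifted version returns valid weights that sum to 1. The shift is safe because softmax depends only on the differences between scores.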


## Step 6: Calculate Weighted Sum
The weighted sum is the final output of the Attention Network. It is the result of multiplying each value vector with its corresponding attention weight and summing up the results.

In [14]:
weighted_sum = np.dot(attention_weights, np.array(value_vectors))
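
The single `np.dot` call hides the "multiply, then sum" described above. The sketch below spells it out as a loop and compares the two. It assumes the value vectors from Step 3 and uses the attention weights as printed in Step 5 (rounded to 8 decimals); `weighted_sum_loop` and `weighted_sum_dot` are names introduced here for illustration.

```python
import numpy as np

# Value vectors from Step 3 (identical to the embeddings).
value_vectors = np.array([
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
])

# Attention weights as printed in Step 5 (rounded to 8 decimals).
attention_weights = np.array(
    [0.06695687, 0.10500927, 0.16468731, 0.25828112, 0.40506543]
)

# Explicit form: scale each value vector by its weight and accumulate.
weighted_sum_loop = np.zeros(value_vectors.shape[1])
for weight, value in zip(attention_weights, value_vectors):
    weighted_sum_loop += weight * value

# Vectorized form used in the exercise: (5,) weights times (5, 3) values -> (3,).
weighted_sum_dot = np.dot(attention_weights, value_vectors)

print(weighted_sum_loop)
print(np.allclose(weighted_sum_loop, weighted_sum_dot))
```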


## Step 7: Final Output
The final output of the Attention Network is the weighted sum, which represents the aggregated information from the input sequence, taking into account the attention weights.

In [15]:
print("Input Sequence:", input_sequence)
print("Attention Weights:", attention_weights)
print("Weighted Sum:", weighted_sum)

Input Sequence: [1, 2, 3, 4, 5]
Attention Weights: [0.06695687 0.10500927 0.16468731 0.25828112 0.40506543]
Weighted Sum: [0.9488467 1.0488467 1.1488467]
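
For reference, Steps 1 through 7 can be collected into one small function. This is a minimal sketch under the same simplifications as the exercise (one query, embeddings reused as keys and values, no learned parameters); `simple_attention` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def simple_attention(query, keys, values):
    """Dot-product attention for a single query (hypothetical helper)."""
    query = np.asarray(query, dtype=float)
    keys = np.asarray(keys, dtype=float)
    values = np.asarray(values, dtype=float)

    scores = keys @ query                        # one score per key, shape (n,)
    exp_scores = np.exp(scores - scores.max())   # shifted for numerical stability
    weights = exp_scores / exp_scores.sum()      # softmax, sums to 1
    output = weights @ values                    # weighted sum, shape (d,)
    return weights, output

# Same data as in Steps 1 to 4.
input_sequence = [1, 2, 3, 4, 5]
embedding_table = {
    1: [0.1, 0.2, 0.3],
    2: [0.4, 0.5, 0.6],
    3: [0.7, 0.8, 0.9],
    4: [1.0, 1.1, 1.2],
    5: [1.3, 1.4, 1.5],
}
embedded_sequence = [embedding_table[num] for num in input_sequence]

weights, output = simple_attention(
    [0.5, 0.5, 0.5], embedded_sequence, embedded_sequence
)
print("Attention Weights:", weights)
print("Weighted Sum:", output)
```

Run on the exercise data, this should reproduce the attention weights and weighted sum printed above.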


# So, what does this mean?
The weighted sum output in an Attention Network provides a way to focus on the most important parts of the input sequence. It calculates a new representation of the input sequence, where each element in the sequence is given a weight or importance based on its relevance to a specific query.

Imagine the input sequence as a list of information, such as words in a sentence or numbers in a dataset. The query represents a specific question or target we want to focus on.

The Attention Network looks at the query and the input sequence together. It evaluates how much attention each element in the input sequence deserves with respect to the query. The attention mechanism assigns higher weights to elements that are more relevant to the query and lower weights to less relevant elements.
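
One way to see how the query drives the weights is to try several queries against the same keys. The sketch below assumes the keys from this exercise; `attention_weights_for` is a hypothetical helper name, and the zero and negated queries are made up for illustration.

```python
import numpy as np

def attention_weights_for(query, keys):
    """Softmax of the dot products between one query and every key."""
    scores = np.dot(query, np.array(keys).T)
    e_x = np.exp(scores - np.max(scores))
    return e_x / e_x.sum()

# Keys from the exercise (identical to the embeddings).
keys = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]

uniform = attention_weights_for([0.0, 0.0, 0.0], keys)      # all scores equal
positive = attention_weights_for([0.5, 0.5, 0.5], keys)     # query from Step 4
negative = attention_weights_for([-0.5, -0.5, -0.5], keys)  # negated query

print("zero query:    ", np.round(uniform, 3))
print("positive query:", np.round(positive, 3))
print("negative query:", np.round(negative, 3))
```

Because every key in this toy example points in roughly the same direction, the positive query favors the largest embedding and the negated query favors the smallest, while a zero query gives every element the same weight of 0.2.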

The weighted sum output is the final result, and it's calculated by combining the values from the input sequence, each multiplied by its respective attention weight. This means that elements with higher attention weights will contribute more to the weighted sum, while elements with lower attention weights will contribute less.

The output, or weighted sum, is a condensed and focused representation of the input sequence. It highlights the essential information that relates to the query, allowing the model to concentrate on what matters the most for the specific task at hand.

By using the attention mechanism and generating the weighted sum, the Attention Network enables more effective and targeted processing of sequential data, making it a powerful tool for natural language processing, machine translation, sentiment analysis, and other tasks involving sequences of data.





