
EXERCISE 1

Exercise 1: Identifying Limitations of Traditional Language Models
Objective: Recognize the challenges faced by traditional language models in handling long-range dependencies and complex contexts.
Instructions:
Consider the following sentence: “The scientist, who had been working on the project for years, finally made a breakthrough discovery.”
Analyze how a traditional language model, which processes text sequentially, might struggle to capture the relationship between “scientist” and “discovery” due to the intervening words.
Explain how this limitation can affect the model’s ability to accurately understand the sentence’s meaning and perform tasks like question answering or summarization.
Discuss how the attention mechanism addresses this challenge by allowing the model to focus on relevant words regardless of their position in the sentence.

Traditional language models that process text sequentially, such as recurrent neural networks (RNNs), often struggle to capture long-range dependencies in sentences. They carry context forward in a fixed-size hidden state that is updated at every word, so information about early words tends to fade as more words are read. This weakens the relationship between words that are far apart in a sentence.

Let's break down the example sentence: “The scientist, who had been working on the project for years, finally made a breakthrough discovery.”

Sequential Processing Challenge:

In traditional models, the information about the "scientist" is processed at the beginning of the sentence. As the model processes each subsequent word, the initial context about the scientist can become less influential.
By the time the model reaches the word "discovery," the direct connection to the "scientist" may be weakened due to the intervening words and clauses ("who had been working on the project for years, finally made a breakthrough").
This can lead to a diminished understanding of who is making the discovery, potentially impacting tasks like question answering or summarization where identifying the subject and their actions is crucial.
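The fading described in the points above can be illustrated with a toy calculation. This is a minimal sketch, not a real recurrent network: it assumes a hypothetical state that keeps a fixed fraction (`decay`, set to an illustrative 0.8) of its previous content at every step, so the share of an early token's signal that survives shrinks geometrically with distance.

```python
# Toy illustration (an assumption for teaching purposes, not a real RNN):
# the state keeps a fraction `decay` of its previous content at each step.
def remaining_influence(steps, decay=0.8):
    """Fraction of an early token's signal left after `steps` state updates."""
    return decay ** steps

tokens = ("The scientist , who had been working on the project for years , "
          "finally made a breakthrough discovery").split()

# Number of state updates between the subject and the final noun
steps = tokens.index("discovery") - tokens.index("scientist")

for n in (1, 4, steps):
    print(f"after {n:2d} steps: {remaining_influence(n):.3f} of the signal remains")
```

Real RNNs and LSTMs learn their update rules, so the fading is neither this regular nor this severe, but the example shows why distance alone can weaken a dependency under sequential processing.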
Impact on Understanding and Tasks:

Question Answering: If asked, "Who made the breakthrough discovery?" the model might struggle to accurately identify the "scientist" as the subject due to the weakened connection.
Summarization: In summarizing the sentence, the model might fail to emphasize the relationship between the scientist and the discovery, leading to a less coherent summary.
Attention Mechanism:

The attention mechanism addresses this challenge by allowing the model to dynamically focus on different parts of the input sequence, regardless of their position.
It assigns weights to each word in the sentence, determining the importance of each word relative to the current word being processed. This means that even if "scientist" and "discovery" are far apart, the model can still strongly associate them.
For example, when processing the word "discovery," the attention mechanism can assign a high weight to the word "scientist," reinforcing the connection between the two.
This ability to focus on relevant words helps the model maintain a coherent understanding of the sentence, improving performance in tasks that require comprehension of long-range dependencies.
In summary, while traditional language models may struggle with long-range dependencies due to their sequential processing nature, the attention mechanism enhances the model's ability to capture and maintain relationships between distant words, leading to better understanding and task performance.
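The weighting step described above can be sketched with scaled dot-product attention, the form used in Transformer models. The vectors below are hand-picked illustrative assumptions, not learned values, and the helper names `softmax` and `attention_weights` are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hand-picked 2-d vectors (assumptions for illustration only)
tokens = ["scientist", "project", "years", "finally", "discovery"]
keys = [[2.0, 0.1], [0.3, 1.0], [0.0, 1.2], [0.1, 0.4], [1.5, 0.2]]
query = [2.0, 0.0]  # query vector for the token "discovery"

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token:>10}: {weight:.2f}")
```

Because every token is scored against the query directly, the weight given to "scientist" depends on the vectors, not on how many words separate it from "discovery".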

Exercise 2: Exploring the Impact of Attention in Transformers
Objective: Understand how the attention mechanism enhances the capabilities of transformer models in various NLP tasks.

Instructions:

Choose an NLP task, such as machine translation, text summarization, or question answering.
Research how transformer models, like BERT or GPT, utilize the attention mechanism to achieve state-of-the-art results in the chosen task.
Provide specific examples of how attention helps the model capture long-range dependencies, resolve ambiguities, and handle complex contexts.
Compare the performance of transformer models with and without attention mechanisms on the chosen task, highlighting the improvements achieved through attention.


Transformer models, such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), use the attention mechanism to achieve state-of-the-art results across many natural language processing (NLP) tasks. Let's explore how attention enhances the capabilities of these models in the context of machine translation, one of the fundamental NLP tasks and the one on which the original Transformer was first evaluated.

Machine Translation

How Attention is Utilized

Capturing Long-Range Dependencies:

In machine translation, capturing the relationship between words across the source and target languages is crucial. Attention mechanisms allow the model to focus on relevant parts of the input sentence, regardless of the distance between related words.
For example, when translating the sentence "The cat, which was sitting on the mat, is black" from English to French, the model needs to associate "cat" with "black" despite the intervening words, because the French adjective must agree in gender and number with the noun it describes ("chat" is masculine, so "black" becomes "noir" rather than "noire"). The attention mechanism helps maintain this association by assigning higher weights to these words during the translation process.
Resolving Ambiguities:

Attention helps resolve ambiguities by allowing the model to consider the context of each word in the sentence. For instance, the word "bank" can refer to a financial institution or the side of a river. The attention mechanism enables the model to focus on surrounding words that provide context, such as "money" or "river," to determine the correct translation.
Handling Complex Contexts:

In complex sentences with multiple clauses or nested structures, attention mechanisms help the model keep track of the relationships between different parts of the sentence. This is particularly important in languages with different grammatical structures, where the order of words may vary significantly between the source and target languages.
Performance Comparison

With Attention:

Transformer models with attention mechanisms have achieved significant improvements in machine translation tasks. For example, the original Transformer model, introduced in the paper "Attention Is All You Need" by Vaswani et al. (2017), reported state-of-the-art BLEU scores on the WMT 2014 English-to-German and English-to-French translation tasks.
Attention allows the model to generate more accurate and fluent translations by capturing the nuances and dependencies in the source language and effectively mapping them to the target language.
Without Attention:

Traditional models, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, process text sequentially and struggle with long-range dependencies. These models often rely on hidden states to capture context, which can be insufficient for complex sentences with multiple clauses or nested structures.
Without attention, these models may produce translations that lack coherence and accuracy, particularly in sentences with ambiguous or complex contexts. The sequential processing nature of RNNs and LSTMs can lead to a loss of important contextual information, resulting in poorer performance on machine translation tasks.
Examples and Improvements

Example Sentence:

Consider the sentence "The agreement on the bank account was signed yesterday." In this sentence, the word "bank" is ambiguous. The attention mechanism helps the model focus on the word "account," which provides context and indicates that "bank" refers to a financial institution.
Improvements Achieved:

Transformer models with attention have shown significant improvements in translation quality, as measured by metrics such as BLEU (Bilingual Evaluation Understudy) scores. For instance, the large ("big") Transformer configuration achieved a BLEU score of 28.4 on the WMT 2014 English-to-German translation task, improving on the previous best results, including ensembles, by more than 2 BLEU.
Attention can also help the model handle rare words more effectively, since it can draw on contextual information from the rest of the sentence to infer their meaning.
In summary, the attention mechanism plays a crucial role in enhancing the capabilities of transformer models in machine translation. By capturing long-range dependencies, resolving ambiguities, and handling complex contexts, attention enables these models to achieve state-of-the-art results and generate more accurate and fluent translations.
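To make the BLEU metric mentioned above more concrete, here is a simplified sketch of how such a score can be computed. It is an assumption-laden illustration rather than the evaluation used in the paper: it handles a single sentence with a single reference, applies no smoothing, and returns a value between 0 and 1, whereas published scores such as 28.4 are computed over a whole test corpus and reported on a 0 to 100 scale. The function names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on a 0-1 scale, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one unmatched n-gram order gives 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat that was sitting on the mat is black"
print(simple_bleu("the cat that was sitting on the mat is black", reference))
print(simple_bleu("the cat sitting on the mat is black", reference))
```

An exact match scores highest, while the shorter candidate is penalized both for missing n-grams and for its length.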

EXERCISE 3


Step 1: Install the Required Libraries

In [None]:
%pip install transformers torch faiss-cpu

Step 2: Load and Preprocess the Knowledge Source

Choose a knowledge source, such as a Wikipedia article or a collection of text documents. For this example, let's assume you have a list of text documents.

In [None]:
documents = [
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
    "It is named after the engineer Gustave Eiffel, whose company designed and built the tower.",
    "Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair, it was initially criticized by some of France's leading artists and intellectuals for its design, but it has become a global cultural icon of France and one of the most recognizable structures in the world.",
    "The tower is 330 meters tall, about the same height as an 81-storey building, and the tallest structure in Paris."
]
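The list above is already clean, but a real knowledge source usually needs some preprocessing before it is embedded. The following is a minimal sketch of two common steps, whitespace normalization and splitting long text into word-limited chunks. The helper names `normalize_whitespace` and `chunk_text` and the `max_words` limit are hypothetical choices for illustration, not part of any library.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return " ".join(text.split())

def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

raw = """The Eiffel Tower is a wrought-iron lattice tower
    on the Champ de Mars   in Paris, France."""

clean = normalize_whitespace(raw)
chunks = chunk_text(raw, max_words=8)
print(clean)
print(chunks)
```

Chunking by word count is a rough stand-in for the model's 512-token limit. Splitting on sentence boundaries is often preferable when the source text allows it.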

EXERCISE 4

In [1]:
from transformers import BertTokenizer, BertModel
import torch

# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Function to generate embeddings
def generate_embeddings(texts):
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings

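# Generate embeddings for the documents
# (the `documents` list from Exercise 3 must already be defined in this session;
# otherwise this line raises NameError)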
document_embeddings = generate_embeddings(documents)

NameError: name 'documents' is not defined

EXERCISE 5

In [2]:
from transformers import BertTokenizer, BertModel
import torch

# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Function to generate embeddings
def generate_embeddings(texts):
    # Tokenize the input texts
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)

    # Generate embeddings
    with torch.no_grad():
        outputs = model(**inputs)

    # Use the mean of the last hidden states as the embedding
    # (averaged over all token positions, including any padding)
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings

# Example list of documents
documents = [
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
    "It is named after the engineer Gustave Eiffel, whose company designed and built the tower.",
    "Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair, it was initially criticized by some of France's leading artists and intellectuals for its design, but it has become a global cultural icon of France and one of the most recognizable structures in the world.",
    "The tower is 330 meters tall, about the same height as an 81-storey building, and the tallest structure in Paris."
]

# Generate embeddings for the documents
document_embeddings = generate_embeddings(documents)

# Print the embeddings
print(document_embeddings)

tensor([[-0.2696,  0.2138,  0.0747,  ..., -0.0606,  0.1561,  0.2414],
        [-0.0578,  0.2224,  0.1793,  ...,  0.0750,  0.0988,  0.2229],
        [-0.2990,  0.2202, -0.1587,  ..., -0.1633,  0.0174,  0.1159],
        [-0.2566,  0.0132,  0.3361,  ...,  0.0572,  0.3264,  0.1562]])
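The embeddings above average every position of `last_hidden_state`, including the padding positions that `padding=True` adds to shorter texts. One possible refinement is to average only the real tokens, using the `attention_mask` that the tokenizer returns alongside the input IDs. The sketch below shows the idea on plain Python lists with made-up numbers rather than on BERT tensors. `masked_mean_pool` is a hypothetical helper, and whether the difference matters in practice depends on how much padding a batch contains.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Toy sequence: two real tokens followed by two padding positions
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]
attention_mask = [1, 1, 0, 0]

# Plain mean over all positions, padding included
plain_mean = [sum(column) / len(token_vectors) for column in zip(*token_vectors)]
masked_mean = masked_mean_pool(token_vectors, attention_mask)

print("plain mean: ", plain_mean)
print("masked mean:", masked_mean)
```

The plain mean is pulled toward the padding vectors, while the masked mean reflects only the two real tokens.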


EXERCISE 6

Role of BERT in the Retrieval Component

Generating Embeddings:

Document Embeddings: BERT is used to generate embeddings for documents in the knowledge source. Each document is converted into a fixed-size vector that represents its semantic content. This is done by feeding the document text into BERT and pooling the token-level outputs into a single vector, for example by averaging the last hidden states as in the code from Exercise 5.
Query Embeddings: Similarly, when a user submits a query, BERT generates an embedding for the query. This embedding captures the semantic meaning of the query and is used to retrieve relevant documents.
Similarity Search:

The retrieval component uses the embeddings generated by BERT to perform a similarity search. The query embedding is compared against the document embeddings using a similarity metric, such as cosine similarity. Documents with embeddings that are most similar to the query embedding are retrieved as the most relevant documents.
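The similarity search described above can be sketched in a few lines. The three-dimensional vectors below are made-up stand-ins for BERT embeddings, which have 768 dimensions for `bert-base-uncased`, and the helper names `cosine` and `top_k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k most similar documents."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up vectors standing in for document and query embeddings
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.7, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, doc_vecs)
for index, score in results:
    print(f"document {index}: similarity {score:.3f}")
```

Cosine similarity compares only the direction of the vectors, so documents of different lengths can still be ranked on the same scale.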
Advantages of Using BERT Embeddings

Contextual Understanding:

BERT generates contextual embeddings, meaning that the embedding for a word or sentence takes into account the context in which it appears. This is in contrast to traditional methods like TF-IDF (Term Frequency-Inverse Document Frequency), which scores documents by term counts, or static word embeddings (e.g., Word2Vec, GloVe), which assign each word a single vector regardless of the sentence it appears in.
Contextual embeddings allow BERT to understand the nuances and complexities of language, leading to more accurate and relevant retrieval results.
Semantic Similarity:

BERT embeddings capture semantic similarity, meaning that documents and queries with similar meanings will have similar embeddings, even if they use different words. This is particularly useful in question answering, where the same concept can be expressed in various ways.
Traditional methods like TF-IDF rely on exact word matches, which can lead to less accurate retrieval results when dealing with synonyms or paraphrases.
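The exact-match limitation can be seen in a small experiment. The sketch below builds simple TF-IDF vectors by hand, using a smoothed IDF chosen for illustration, and compares three short texts. The function names `tfidf_vectors` and `cosine` are hypothetical, and the tokenization is a plain lowercase split.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build simple TF-IDF vectors: term count times a smoothed IDF."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(texts)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            counts[w] * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
            for w in vocab
        ])
    return vectors

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

texts = ["car repair", "automobile maintenance", "car maintenance"]
vectors = tfidf_vectors(texts)

print("car repair vs automobile maintenance:", cosine(vectors[0], vectors[1]))
print("car repair vs car maintenance:       ", round(cosine(vectors[0], vectors[2]), 3))
```

The first pair means nearly the same thing but shares no words, so its TF-IDF similarity is zero. The second pair gets a positive score only because the word "car" appears in both.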
Handling Long-Range Dependencies:

BERT's attention mechanism allows it to capture long-range dependencies in text, meaning that it can understand the relationships between words and sentences that are far apart. This is important in question answering, where relevant information may be spread across different parts of a document.
Traditional methods like TF-IDF or word embeddings struggle with long-range dependencies, as they do not have a mechanism to capture the relationships between distant words.
Improved Performance:

Using BERT embeddings in the retrieval component can improve the performance of RAG systems. For example, in question answering, BERT-based retrieval can lead to more accurate and relevant answers when it captures the semantic meaning of the query and the documents better than keyword matching does.
Analysis of BERT's Contribution

The effectiveness of BERT in RAG systems can be attributed to its ability to generate high-quality contextual embeddings. These embeddings capture the semantic meaning of text, allowing the retrieval component to identify relevant documents based on their content rather than just keyword matches. This leads to more accurate and relevant retrieval results, which in turn improves the performance of the generation component.

In question answering, the use of BERT embeddings ensures that the retrieved documents are semantically similar to the query, providing the generation component with the most relevant information to generate an accurate answer. This is particularly important in complex question answering tasks, where the query may involve multiple concepts or require a deep understanding of the context.

Conclusion

BERT's ability to generate contextual embeddings makes it a powerful tool for the retrieval component of RAG systems. Its advantages over traditional methods like TF-IDF or word embeddings, such as contextual understanding, semantic similarity, and handling long-range dependencies, contribute to the effectiveness of RAG systems in various applications, including question answering. By leveraging BERT embeddings, RAG systems can retrieve more relevant context and, as a result, provide more accurate answers.

To implement a basic Retrieval-Augmented Generation (RAG) system, we'll need to set up a few components: a knowledge source, a method to generate embeddings using BERT, a vector database to store and retrieve these embeddings, and a generative model to produce answers based on retrieved information.

Below is a simplified example of how you might implement such a system using Python. This example uses the Hugging Face Transformers library for BERT, scikit-learn for cosine similarity, and a simple in-memory array in place of a vector database.

In [3]:
from transformers import BertTokenizer, BertModel
import torch
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Sample knowledge source
documents = [
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
    "It is named after the engineer Gustave Eiffel, whose company designed and built the tower.",
    "Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair.",
    "The tower is 330 meters tall, about the same height as an 81-storey building."
]

# Function to generate embeddings
def generate_embeddings(texts):
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).numpy()

# Generate embeddings for documents
document_embeddings = generate_embeddings(documents)

# Function to retrieve relevant documents
def retrieve_relevant_documents(query, document_embeddings, documents, top_k=1):
    query_embedding = generate_embeddings([query])
    similarities = cosine_similarity(query_embedding, document_embeddings)
    relevant_indices = np.argsort(similarities[0])[-top_k:][::-1]
    return [documents[i] for i in relevant_indices]

# Example query
query = "Who designed the Eiffel Tower?"
relevant_docs = retrieve_relevant_documents(query, document_embeddings, documents)

# Simple generator function
def generate_answer(query, relevant_docs):
    context = " ".join(relevant_docs)
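    # Placeholder: the answer text below is hard-coded for this example query.
    # In a real application, you would pass the context and query to a generative model here.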
    answer = f"Based on the context: {context}, the answer to '{query}' is: Gustave Eiffel's company designed and built the tower."
    return answer

# Generate an answer
answer = generate_answer(query, relevant_docs)
print(answer)

Based on the context: It is named after the engineer Gustave Eiffel, whose company designed and built the tower., the answer to 'Who designed the Eiffel Tower?' is: Gustave Eiffel's company designed and built the tower.
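The `generate_answer` function above returns a hard-coded sentence. In a fuller system, the retrieved text would be handed to a generative model, usually as part of a prompt. The sketch below covers only that prompt-assembly step and leaves the model call out. `build_prompt` and the template wording are hypothetical choices for illustration, not a fixed format required by any model.

```python
def build_prompt(query, relevant_docs):
    """Assemble retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in relevant_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

# Passage retrieved for the example query in the cell above
relevant_docs = [
    "It is named after the engineer Gustave Eiffel, whose company designed and built the tower."
]
prompt = build_prompt("Who designed the Eiffel Tower?", relevant_docs)
print(prompt)
```

Keeping prompt construction in its own function makes it easy to swap in a different template or to pass more than one retrieved passage by raising `top_k`.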


In [None]:
%pip install transformers torch scikit-learn
