
In [1]:
# Install required libraries:
# - `transformers`: for using pretrained models from Hugging Face
# - `datasets`: for loading benchmark NLP datasets (not used here but often needed)
# - `faiss-cpu`: Facebook AI Similarity Search library for vector search (CPU version)
!pip install -q transformers datasets faiss-cpu


Ingestion

In [2]:
# Define a multi-line string `sample_text` containing information about Albert Einstein.
sample_text = """
Albert Einstein was a theoretical physicist who developed the theory of relativity,
one of the two pillars of modern physics (alongside quantum mechanics). His work is also
known for its influence on the philosophy of science. He is best known to the general public
for his mass-energy equivalence formula E = mc².
"""

Embedding

In [6]:
# Import the tokenizer and model loading tools from Hugging Face Transformers
from transformers import AutoTokenizer, AutoModel

# Import PyTorch, a deep learning library used to run the model and manage tensors
import torch

# Import NumPy, a library used for numerical operations and array manipulations
import numpy as np

# Define the name of the pretrained model to be used for creating sentence embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Load the tokenizer for the model (used to convert text into tokens/numbers)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the pretrained transformer model itself (used to generate embeddings)
model = AutoModel.from_pretrained(model_name)


# Define a function to convert text into vector embeddings using the loaded model
def get_embeddings(text):
  # Tokenize the input text and return PyTorch tensors, with padding and truncation enabled
  tokens = tokenizer(text, return_tensors='pt', truncation=True, padding=True)

  # Disable gradient computation (no training, only inference)
  with torch.no_grad():
    # Pass the tokenized input into the model to get the output
    outputs = model(**tokens)

  # Take the mean of all token embeddings across the sequence (dimension 1) to get a single vector
  return outputs.last_hidden_state.mean(dim=1).squeeze().numpy()
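`get_embeddings` averages over every token position. For a single string that is harmless, because nothing is padded. If a list of strings with different lengths were passed instead, the padded positions would also enter the mean. The sketch below shows a mask-aware mean on toy NumPy arrays that stand in for `outputs.last_hidden_state` and `tokens["attention_mask"]`; `masked_mean_pool` is a hypothetical helper, not part of the notebook or of the Transformers API.

```python
import numpy as np


def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    # Add a trailing axis so the mask broadcasts over the hidden dimension.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for a row that is entirely padding.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy stand-ins for the model output and the tokenizer's attention mask.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],      # no padding
], dtype=np.float32)
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled)               # mask-aware mean
print(hidden.mean(axis=1))  # plain mean, skewed by the padded position
```

In the first row the plain mean is pulled toward the padding vector, while the masked mean uses only the two real tokens.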

Retrieval

In [8]:
# Import FAISS, a library used for fast similarity search over vectors
import faiss

# Create a list of chunks — here only one: the sample text about Einstein
chunks = [sample_text]

# Generate vector embeddings for each text chunk
embeddings = [get_embeddings(chunk) for chunk in chunks]

# Determine the dimensionality of the embeddings (i.e., length of one embedding vector)
dim = len(embeddings[0])

# Initialize a FAISS index for flat (brute-force) L2 distance search
index = faiss.IndexFlatL2(dim)

# Add all generated embeddings to the FAISS index (FAISS expects a 2-D float32 array)
index.add(np.array(embeddings, dtype="float32"))
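To make the flat L2 search concrete, here is a small brute-force sketch in NumPy of the comparison such an index performs: compute the squared Euclidean distance from the query to every stored vector and keep the `k` smallest. `l2_search` and the toy arrays are hypothetical and only illustrate the idea; this is not how FAISS is implemented internally.

```python
import numpy as np


def l2_search(vectors, query, k):
    """Return (squared distances, indices) of the k rows of `vectors` nearest to `query`."""
    diffs = vectors - query              # broadcasts (n, d) - (d,)
    sq_dists = (diffs ** 2).sum(axis=1)  # squared Euclidean distance per row
    k = min(k, len(vectors))             # never ask for more rows than exist
    order = np.argsort(sq_dists, kind="stable")[:k]
    return sq_dists[order], order


toy_vectors = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]], dtype=np.float32)
toy_query = np.array([1.0, 2.0], dtype=np.float32)

distances, indices = l2_search(toy_vectors, toy_query, k=2)
print(indices)
print(distances)
```

One difference worth noting: this sketch clamps `k` to the number of stored vectors, whereas `index.search` pads missing results with the index `-1` when `k` is larger than the number of vectors in the index.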

In [10]:
# Import a pipeline from Hugging Face to use a text2text model for answering questions
from transformers import pipeline

# Create a QA pipeline using Google's FLAN-T5 model (text2text format)
qa_pipeline = pipeline("text2text-generation", model="google/flan-t5-small")

# Define a function that takes a query, retrieves relevant text, and generates an answer
def retrive_and_answer(query, top_k=1):  # top_k: number of most relevant chunks to retrieve

  # Convert the query into an embedding vector
  query_embedding = get_embeddings(query).reshape(1, -1)

  # Perform vector similarity search in FAISS to get the closest chunks
  _, indices = index.search(query_embedding, top_k)

  # Retrieve the most relevant text chunks based on the index
  retrieved_texts = [chunks[i] for i in indices[0]]

  # Concatenate all retrieved texts into a single context string
  context = " ".join(retrieved_texts)

  # Create a prompt that combines the context with the question
  prompt = f'Context: {context}\nQuestion: {query}\nAnswer:'

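  # Pass the prompt to the QA model to generate an answer.
  # max_new_tokens is used instead of max_length: the model's generation config already
  # sets max_new_tokens, which would take precedence over max_length and trigger a warning.
  result = qa_pipeline(prompt, max_new_tokens=100, do_sample=False)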

  # Return only the generated answer text
  return result[0]['generated_text']

Device set to use cpu
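With only one chunk in the index, calling the function with `top_k` greater than 1 can return `-1` as a placeholder index, and Python would read `chunks[-1]` as the last chunk instead of raising an error. The sketch below shows one way to build the prompt while skipping such placeholders. `build_prompt` and `toy_chunks` are hypothetical names used for illustration, and the prompt layout simply mirrors the Context/Question/Answer format used above.

```python
def build_prompt(query, chunks, indices):
    """Join the retrieved chunks into a context string and format a QA prompt."""
    # Keep only valid positions; a negative index is treated as "no result".
    valid = [i for i in indices if 0 <= i < len(chunks)]
    # Collapse internal newlines and extra spaces in each chunk before joining.
    context = " ".join(" ".join(chunks[i].split()) for i in valid)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


toy_chunks = ["Einstein developed\n the theory of relativity."]
prompt = build_prompt("Who developed relativity?", toy_chunks, [0, -1])
print(prompt)
```

Here the `-1` entry is dropped, so the single chunk appears in the context only once.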


In [11]:
question = "What is Albert Einstein known for?"
answer = retrive_and_answer(question)
print("Q:", question)
print("A:", answer)


Q: What is Albert Einstein known for?
A: his mass-energy equivalence formula E = mc2
