### Install Dependencies

In [None]:
!pip install langchain_community
!pip install docarray rapidocr-onnxruntime pypdf transformers sentence-transformers chromadb pypdf2 beautifulsoup4
!pip install langchain-chroma



##### Document Loader

In [None]:
## Data Ingestion (Text File Loader)
# Load Text file
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
text_documents = loader.load()


In [None]:
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='The world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, no dominion. We seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. We are but one of the champions of the rights of mankind. We shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them.\n\nJust because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.\n\n…\n\nIt will be all the easier for us to conduct ourselves as belligerents in a high spirit of right and fairness be

### 3. Data Ingestion (Web-Based Loader)

In [None]:
# Load Web-Based content
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("post-title", "post-content", "post-header")))
)

web_documents = loader.load()



In [None]:
print(web_documents)

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [None]:
# Load PDF file
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/1706.03762.pdf", extract_images=True)
pdf_pages = loader.load()

print(f"Text documents: {text_documents}")
print(f"PDF pages: {len(pdf_pages)}")

1 Introduction
Recurrent neural networks, long short-term memory [ 13] and gated recurrent [ 7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [ 35,2,5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved
significant improvements in computational efficiency through factor

In [None]:
pdf_pages

[Document(metadata={'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architect

In [None]:
len(pdf_pages)

15

##### 5. Split Documents into Chunks

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
    separators=["\n\n", "\n", " "]
)

# Split PDF pages into chunks
doc_chunks = text_splitter.split_documents(pdf_pages)

# Display total chunks and one example chunk
print(f"Total Document Chunks: {len(doc_chunks)}")
print(doc_chunks[10])


Total Document Chunks: 266
page_content='best models from the literature. We show that the Transformer generalizes well to
other tasks by applying it successfully to English constituency parsing both with
large and limited training data.' metadata={'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}


In [None]:
len(doc_chunks)

266

In [None]:
doc_chunks[10]

Document(metadata={'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}, page_content='best models from the literature. We show that the Transformer generalizes well to\nother tasks by applying it successfully to English constituency parsing both with\nlarge and limited training data.')
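
To make the effect of `chunk_size` and `chunk_overlap` more concrete, here is a minimal sliding-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on the listed separators, so its chunks are usually shorter than `chunk_size`). The helper name `sliding_window_chunks` is hypothetical and exists only for illustration.

```python
def sliding_window_chunks(text, chunk_size=200, chunk_overlap=50):
    """Cut text into fixed-size character windows that overlap.

    Illustrative only: ignores separators, unlike the LangChain splitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    # Each new window starts (chunk_size - chunk_overlap) characters later
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "x" * 500
demo_chunks = sliding_window_chunks(sample)
print([len(c) for c in demo_chunks])  # [200, 200, 200]
```

With a 500-character input, each window shares its last 50 characters with the start of the next one, which is why three windows of 200 characters are enough to cover the text.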

##### 6. Initialize Hugging Face Embeddings and Chroma Vector Store

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Define embedding model
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Create a Chroma vector store
vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embedding_model,
    persist_directory="./chroma_langchain_db"  # Set directory for persistence
)







### Adding Documents to the Vector Store

In [None]:
# Prepare document texts and metadata
texts = [chunk.page_content for chunk in doc_chunks]
metadatas = [{"source": f"Document Chunk {i}"} for i in range(len(doc_chunks))]

# Add document chunks and metadata to vector store
vector_store.add_texts(texts=texts, metadatas=metadatas)
print("Documents added to vector store.")

['bb91b240-1154-429d-8082-e1832512f1fa',
 'f1898818-323f-416a-865d-d1804d682fda',
 '1034557d-22db-49d8-b9b4-4d5768977960',
 'c268209e-71cf-4c61-8c96-0b697b93c699',
 'db0c54a9-1b33-4b3a-87d4-acd0a00c6217',
 'dea443ed-fe19-4f5d-99e1-2a4a0dfddf8d',
 'b65a533a-856c-4be3-9b63-95150cbcd587',
 '942305cc-fc37-4ba2-ada0-ee4586f94de0',
 'b96ad8a7-cfd2-48fb-8758-9054282cfd0d',
 'c824f0a9-3cb5-402b-a4e7-6af23479bc44',
 '095af5d9-a991-4a59-8086-604edc97986e',
 '33f7529f-4040-4bf7-bbcc-70f3734b157a',
 '9d858e5d-2cd3-42a3-8ff2-3fec0d3885aa',
 '68957e89-a9a7-40bc-9329-bfc4a911394c',
 '32a480f5-0f9c-4fa5-830f-dd85684d69f1',
 '46192446-ad0f-44b4-9221-4f4c6cc76302',
 '1604a067-9f71-4565-ab86-5584de2cf3a3',
 'd3530188-43d6-4ada-b937-0649ad8e3e53',
 '9df7ea63-1192-4a7a-b315-d8f2a574f13a',
 '0a4ca294-d82e-4c4e-8ce0-603473d00d7c',
 '67b5c291-aa39-404a-90a9-c491cd1e79a7',
 '333d9426-9068-42cb-a7b1-9e141b6714b4',
 'cc2e61e0-708b-4840-84c0-fe94f1f982e7',
 '12945e37-5c4b-4a82-9ba1-3304cd4e9652',
 '9f674861-73dc-
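
The cell above replaces each chunk's metadata with a generic `"Document Chunk i"` label, so the original `source` and `page` fields are not stored. If those fields are useful at query time, one option is to copy them and add an index. The sketch below uses plain dictionaries; the helper name `build_metadatas` and the `chunk_index` key are assumptions for illustration, not part of the original notebook.

```python
def build_metadatas(chunk_metadatas):
    """Copy each chunk's metadata and add a sequential chunk index."""
    return [{**meta, "chunk_index": i} for i, meta in enumerate(chunk_metadatas)]


# Toy metadata standing in for [chunk.metadata for chunk in doc_chunks]
example = [
    {"source": "paper.pdf", "page": 0},
    {"source": "paper.pdf", "page": 1},
]
print(build_metadatas(example))
```

In the notebook, the input would be `[chunk.metadata for chunk in doc_chunks]`, and the result could be passed as the `metadatas` argument of `add_texts`.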

### Initialize FLAN-T5 Model for Generation

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load FLAN-T5 tokenizer and model
tokenizer = T5Tokenizer.from_pretrained('google/flan-t5-base')
generation_model = T5ForConditionalGeneration.from_pretrained('google/flan-t5-base')


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


### Querying Chroma Vector Store

In [None]:
def retrieve_documents(query, n_results=5):
    # Embed the query
    query_embedding = embedding_model.embed_query(query)

    # Perform similarity search in Chroma using query embedding
    results = vector_store.similarity_search_by_vector(query_embedding, k=n_results)

    # Retrieve the most relevant document chunks
    return [result.page_content for result in results]
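
As a rough mental model of what a vector similarity search does, the sketch below ranks a few toy vectors against a query by cosine similarity using brute force. This is an assumption-laden illustration: Chroma uses its own index and a configurable distance metric, which may not be cosine, and the names `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], toy_docs, k=2))  # [0, 2]
```

The query points in the same direction as the first vector and almost the same direction as the third, so those two are returned and the orthogonal second vector is dropped.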


## RAG Response Generation

In [None]:
def generate_rag_response(query):
    # Retrieve relevant document chunks
    relevant_docs = retrieve_documents(query)

    # Concatenate the relevant document chunks
    context = ' '.join(relevant_docs)

    # Create the prompt with the context and query
    prompt = f"Using the following context, answer the question:\n{context}\n\n{query}"

    # Tokenize the prompt for FLAN-T5, truncating long prompts to 512 tokens
    inputs = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=512)

    # Generate response from FLAN-T5 (passes input_ids and attention_mask)
    outputs = generation_model.generate(**inputs, max_length=200, num_return_sequences=1)

    # Decode and return the response
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response
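
Because the retrieved chunks are concatenated straight into the prompt, a larger `n_results` or `chunk_size` can push the prompt past what the model handles well. One possible guard is to cap the context before building the prompt. The sketch below is a minimal illustration using a character budget as a rough proxy for tokens; the helper name `build_prompt`, the `max_context_chars` parameter, and the 1500-character default are assumptions, not values from the original notebook.

```python
def build_prompt(query, docs, max_context_chars=1500):
    """Join retrieved chunks into a context block, stopping at a character budget."""
    selected, used = [], 0
    for doc in docs:
        # Stop before adding a chunk that would exceed the budget
        if used + len(doc) > max_context_chars:
            break
        selected.append(doc)
        used += len(doc)

    context = "\n".join(selected)
    return (
        "Using the following context, answer the question:\n"
        f"{context}\n\nQuestion: {query}"
    )


docs = ["a" * 1000, "b" * 600, "c" * 100]
prompt = build_prompt("What is attention?", docs)
print(len(prompt))
```

Here only the first chunk fits within the budget, so the second and third are left out. Counting tokens with the tokenizer would be more precise than counting characters.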


### Testing the RAG System

In [None]:
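# Example query
query = "Explain the attention mechanism in AI."

# Show the retrieved chunks that will be used as context
print("Retrieved Documents:")
for i, doc in enumerate(retrieve_documents(query), start=1):
    print(f"Doc {i}: {doc}")

response = generate_rag_response(query)
print(f"Generated Response: {response}")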


Retrieved Documents:
Doc 1: attention and the parameter-free position representation and became the other person involved in nearly every
Doc 2: the approach we take in our model.
As side benefit, self-attention could yield more interpretable models. We inspect attention distributions
Doc 3: depicted in Figure 2.
Multi-head attention allows the model to jointly attend to information from different representation
Doc 4: during training.
4 Why Self-Attention
In this section we compare various aspects of self-attention layers to the recurrent and convolu-
Doc 5: described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
Generated Response: Self-attention is a mechanism relating different positions to different representations during training.
