# RAG with LangChain

## Step 1. Environment Setup

This section sets up the Python environment for the project in a Jupyter Notebook. We use the `pip` package manager to install the libraries that the later steps depend on.

### 1.1 Python Package Installation

We are using the following command to install the necessary packages:

In [1]:
!pip install langchain langchain_core langchain_community langchain_chroma pypdf faiss-cpu

Defaulting to user installation because normal site-packages is not writeable


## Step 2. Indexing your file

### 2.1 Document Loading and Text Splitting for Indexing

In this section, we outline the process of loading a PDF document and splitting its text content into manageable chunks for indexing in a local knowledge base.

#### 2.1.1 Load the PDF Document

We begin by creating an instance of `PyPDFLoader` with the path to the PDF document "IHI0050G_amba_chi_architecture_spec.pdf". This loader will be used to read the document's content. You can replace the document name or path as needed to index different documents in your local knowledge base.

```python
loader = PyPDFLoader("IHI0050G_amba_chi_architecture_spec.pdf")  # You can change documentation here to index your local knowledge base
```

#### 2.1.2 Load Document Pages
Next, we call the `load` method on the loader instance to retrieve the pages of the document. It returns a list of documents, one per PDF page, each carrying the page text and its metadata.

```python
pages = loader.load()
```
#### 2.1.3 Initialize Text Splitter
We then initialize a `RecursiveCharacterTextSplitter`, which splits text on a list of separators, tried in the order given, until the pieces fit the chunk size. The splitter is configured with line breaks and periods as separators, a chunk size of 1024 characters, and an overlap of 100 characters to preserve continuity between adjacent chunks.

```python
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n", "\n\n", "."], chunk_size=1024, chunk_overlap=100
)
```
#### 2.1.4 Split the Document into Chunks
Finally, we use the text splitter to divide the document's pages into smaller text chunks. These chunks are suitable for embedding and indexing, facilitating efficient retrieval operations.

```python
docs = text_splitter.split_documents(pages)
```

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
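# These integrations live in langchain_community, which Step 1 installs;
# the older langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama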

loader = PyPDFLoader("IHI0050G_amba_chi_architecture_spec.pdf") # You can change documentation here to index your local knowledge base
pages = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n", "\n\n", "."], chunk_size=1024, chunk_overlap=100
)
docs = text_splitter.split_documents(pages)


### 2.2 Document Embedding and Similarity Search with Ollama Model

In this section, we demonstrate how to perform document embedding and similarity search using models served by Ollama. Ollama is a tool for running and serving text embedding models and large language models (LLMs) over an HTTP API.

#### 2.2.1 Initialize Embedding and Language Models

First, we initialize an `OllamaEmbeddings` model for document embedding, setting the model to `nomic-embed-text`. Then, we create an `Ollama` instance that uses the `llama3:8b` model served by the remote LLM service, with a context length (`num_ctx`) of 8192 tokens.

**NOTICE**: `http://241.4.117.29:11434` is the URL of the LLM service. Do not change it unless you have another LLM API endpoint.
```python
embedding_model = OllamaEmbeddings(base_url="http://241.4.117.29:11434", model="nomic-embed-text")
llm = Ollama(base_url="http://241.4.117.29:11434", model="llama3:8b", num_ctx=8192)
```
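Before building the index, it can help to confirm that the service is reachable and has the models we need. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists the models available on the server. The helper names `tags_url` and `list_models` are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"


def list_models(base_url: str, timeout: float = 5.0) -> list:
    """Return the names of the models the Ollama server reports (needs network access)."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_models("http://241.4.117.29:11434")` should return a list of model names. Names may carry a tag suffix such as `:latest`, so compare by prefix when checking for `nomic-embed-text`.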
#### 2.2.2 Create FAISS Index
Next, we construct a FAISS index from the document chunks and the embedding model created above. FAISS is a library for efficient similarity search and clustering of dense vectors.

```python
faiss_index = FAISS.from_documents(docs, embedding_model)
```

#### 2.2.3 Perform Similarity Search
We then use the FAISS index to search for the document fragments most relevant to the query "What is amba chi?". Here, we retrieve the top two most relevant results (`k=2`).

```python
relevant_chunk = faiss_index.similarity_search("What is amba chi?", k=2)
```
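Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The following is a minimal brute-force sketch of that idea using Euclidean distance, which, to our understanding, is the default metric of LangChain's FAISS wrapper. The three-dimensional vectors and chunk labels are made up for illustration; real embeddings have hundreds of dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=2):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = sorted((l2_distance(query_vec, vec), text) for text, vec in indexed)
    return scored[:k]


# Toy "index": (chunk text, embedding) pairs with invented values.
toy_index = [
    ("AMBA glossary entry", [0.9, 0.1, 0.0]),
    ("Transaction classification", [0.1, 0.8, 0.2]),
    ("Release information", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]

results = top_k(toy_query, toy_index, k=2)
for distance, text in results:
    print(f"{distance:.3f}  {text}")
```

FAISS performs the same kind of comparison, but with index structures optimized for large collections of vectors.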

#### 2.2.4 Print Search Results
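Finally, we print the retrieved information. For each retrieved document fragment, we print its page index from the loader metadata, followed by its content. Note that `PyPDFLoader` counts pages from zero, so this index may differ from the page number printed in the PDF.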

```python
print("Here are the retrieved relevant pieces of information:")
for doc in relevant_chunk:
    print("#" * 20)
    print(f"Page {doc.metadata['page']}: {doc.page_content}")
```


In [4]:
embedding_model = OllamaEmbeddings(base_url="http://241.4.117.29:11434", model="nomic-embed-text")
llm = Ollama(base_url="http://241.4.117.29:11434", model="llama3:8b", num_ctx=8192)

faiss_index = FAISS.from_documents(docs, embedding_model)
relevant_chunk = faiss_index.similarity_search("What is amba chi?", k=2)

print("there are relevant documentation chunks")
for doc in relevant_chunk:
    print("#"*20)
    print(str(doc.metadata["page"]) + ":", doc.page_content)

there are relevant documentation chunks
####################
577: Chapter D1. Glossary
Advanced Microcontroller Bus Architecture, AMBA
The AMBA family of protocol speciﬁcations is the Arm open standard for on-chip buses. AMBA provides
solutions for the interconnection and management of functional blocks that make up a System-on-Chip (SoC).
Applications include the development of embedded systems with one or more processors or signal processors and
multiple peripherals.
Aligned
A data item stored at an address that is divisible by the highest power of 2 that divides into its size in bytes.
Aligned halfwords, words and doublewords therefore have addresses that are divisible by 2, 4 and 8 respectively.
An aligned access is one where the address of the access is aligned to the size of each element of the access.
AMBA
See Advanced Microcontroller Bus Architecture.
At approximately the same time
Two events occur at approximately the same time if a remote observer might not be able to determi

## Step 3. Build RAG Chain

### 3.1 Define System Prompt
For guidance on writing a good prompt, see https://www.coursera.org/articles/how-to-write-chatgpt-prompts.
You can structure a prompt in the following steps:
  1. The role the LLM should play
  2. Relevant information (context)
  3. What you expect the LLM to do
  4. Additional instructions, e.g. generate the answer in JSON format, or answer in Chinese
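The four steps above can be sketched as a small helper that assembles a prompt from its parts. This is only an illustration in plain Python; the function `build_prompt` and the example strings are our own and are not part of LangChain.

```python
def build_prompt(role, context, task, extra_instructions):
    """Join the four prompt parts, separating the context with divider lines."""
    divider = "-" * 21
    sections = [role, divider, context, divider, task, *extra_instructions]
    return "\n".join(sections)


example_prompt = build_prompt(
    role="You are an AMBA CHI architecture expert.",       # step 1: role
    context="{context_str}",                                # step 2: placeholder for retrieved text
    task="Answer the user's question using only the context above.",  # step 3: expectation
    extra_instructions=["Cite page numbers.", "Keep the answer concise."],  # step 4: extras
)
print(example_prompt)
```

The `{context_str}` placeholder is left unfilled on purpose, so the resulting string can serve as a template.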

In [11]:
from langchain_core.prompts import PromptTemplate

template = (
    "You are an amba chi architecture expert. Please answer and provide guidance based on the user's question according to the product manual.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Please answer based on the content of the architecture spec manual.\n"
    "When answering, provide the page number(s) in the product manual where the relevant information can be found.\n"
    "If the question goes beyond the scope of the manual, clearly inform the user that the question is out of the manual's scope.\n"
    "The answer should be accurate and concise.\n"
    "---------------------\n"
    "User's question: {query_str}\n"
    "---------------------\n"
    "Answer: "
)

prompt_template = PromptTemplate.from_template(
    template,
)

print(prompt_template.format(
    context_str="The CHI architecture is primarily topology-independent. However, certain topology-dependent optimizations are included in this specification to make implementation more efficient. Figure B1.1 shows three examples of topologies selected to show the range of interconnect bandwidth and scalability options that are available.",
    query_str="What is Topology of CHI architecture?"
))

You are an amba chi architecture expert. Please answer and provide guidance based on the user's question according to the product manual.
---------------------
The CHI architecture is primarily topology-independent. However, certain topology-dependent optimizations are included in this specification to make implementation more efficient. Figure B1.1 shows three examples of topologies selected to show the range of interconnect bandwidth and scalability options that are available.
---------------------
Please answer based on the content of the architecture spec manual.
When answering, provide the page number(s) in the product manual where the relevant information can be found.
If the question goes beyond the scope of the manual, clearly inform the user that the question is out of the manual's scope.
The answer should be accurate and concise.
---------------------
User's question: What is Topology of CHI architecture?
---------------------
Answer: 


In [12]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

retrieval = faiss_index.as_retriever()

rag_chain = (
    {"context_str": retrieval | format_docs, "query_str": RunnablePassthrough()}
    | prompt_template | llm | StrOutputParser()
)
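The `|` operator in the chain above passes each step's output to the next step. As a rough mental model, the sketch below reproduces that data flow with ordinary functions and stand-in components. `make_chain`, `fake_retrieve`, and `fake_llm` are hypothetical names, and no real retriever or model is called.

```python
def make_chain(retrieve, format_docs, render_prompt, llm, parse):
    """Compose the RAG steps: retrieve -> format -> prompt -> LLM -> parse."""
    def invoke(question):
        context = format_docs(retrieve(question))            # context_str branch
        prompt = render_prompt(context_str=context, query_str=question)
        return parse(llm(prompt))
    return invoke


def fake_retrieve(question):
    # Stand-in for the retriever: always returns the same two chunks.
    return ["CHI is topology-independent.", "AMBA is the Arm open standard for on-chip buses."]


def fake_llm(prompt):
    # Stand-in for the LLM: echoes the prompt with stray whitespace.
    return "  ECHO: " + prompt + "  "


toy_chain = make_chain(
    retrieve=fake_retrieve,
    format_docs=lambda chunks: "\n\n".join(chunks),
    render_prompt="Context:\n{context_str}\nQuestion: {query_str}\nAnswer:".format,
    llm=fake_llm,
    parse=str.strip,
)
answer = toy_chain("What is AMBA?")
print(answer)
```

The question reaches the prompt twice: once unchanged as `query_str` (the role of `RunnablePassthrough`) and once through the retriever as `context_str`.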

## Step 4. Test your RAG system

In [13]:
user_question = "What is Transaction classification?"
output = rag_chain.invoke(user_question)
print(output)

According to Chapter B1. Introduction, Section B1.4. Transaction classiﬁcation, transaction classification refers to the grouping of different transaction types together.

Please refer to Table B1.3 for a comprehensive list of transaction classifications, which represents collectively:

* CleanSharedPersist*: CleanSharedPersist and CleanSharedPersistSep
* SnpStash*: SnpStashUnique and SnpStashShared
* DBIDResp*: DBIDResp and DBIDRespOrd

You can find this information on page [36] in the product manual.


In [14]:
user_question = "how can i get more amba chi information?"
output = rag_chain.invoke(user_question)
print(output)

To obtain more AMBA CHI information, I recommend reviewing the following sections in the product manual:

* The "Intended audience" section (page xix) provides an overview of who this specification is written for and what kind of information it covers.
* The "Release information" section (pages ii-iii) lists the various releases of the AMBA CHI architecture specification, including the date and version number. This can help you identify the most recent release or find information specific to a particular version.

Additionally, you may want to explore Arm's official website or documentation portal for more comprehensive resources on AMBA CHI, such as whitepapers, technical briefs, and application notes.

Note: Since this manual is primarily focused on the specification itself, it does not contain additional information beyond what is presented here.
