#### RAG Pipeline With LlamaIndex

In [34]:
pwd

'C:\\Users\\Dipjyoti\\PycharmProjects\\GenerativeAI-projects\\RAG-practice-examples'

In [None]:
# RAG pipeline to process PDF documents

In [1]:
!pip install llama-index

Collecting llama-index
  Obtaining dependency information for llama-index from https://files.pythonhosted.org/packages/1f/8f/6dab0520d0d4e355928a705ccab2ca30162343811b5cfe1acf5e0fa5b98f/llama_index-0.9.34-py3-none-any.whl.metadata
  Using cached llama_index-0.9.34-py3-none-any.whl.metadata (8.4 kB)
Collecting openai>=1.1.0 (from llama-index)
  Obtaining dependency information for openai>=1.1.0 from https://files.pythonhosted.org/packages/f1/d8/590a68d390501faf48f4e57b098076df02afd003ac880f50d3b0704f7773/openai-1.8.0-py3-none-any.whl.metadata
  Using cached openai-1.8.0-py3-none-any.whl.metadata (18 kB)
Using cached llama_index-0.9.34-py3-none-any.whl (15.8 MB)
Using cached openai-1.8.0-py3-none-any.whl (222 kB)
Installing collected packages: openai, llama-index
  Attempting uninstall: openai
    Found existing installation: openai 0.28.1
    Uninstalling openai-0.28.1:
      Successfully uninstalled openai-0.28.1
Successfully installed llama-index-0.9.34 openai-1.8.0


In [5]:
import os
from llama_index import ServiceContext, LLMPredictor, OpenAIEmbedding, PromptHelper
from llama_index.llms import OpenAI
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import set_global_service_context
import openai
import tiktoken

In [6]:
# Get the OpenAI API key from a config file:
from configparser import ConfigParser
config_object = ConfigParser()
config_object.read("../../config_genai.ini")
openai.api_key = config_object["OPENAI"]["openai_api_key"]
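
The cell above expects an INI file with an `OPENAI` section and an `openai_api_key` entry. The following is a minimal sketch of that layout and of a slightly more defensive loader. The helper name `load_api_key`, the placeholder key value, and the temporary file are assumptions made for illustration and are not part of the original notebook.

```python
import os
import tempfile
from configparser import ConfigParser

# Hypothetical file contents; only the section and key names come from the cell above.
SAMPLE_INI = "[OPENAI]\nopenai_api_key = sk-placeholder\n"


def load_api_key(path):
    """Return the OpenAI API key stored in the INI file at `path`."""
    config = ConfigParser()
    # ConfigParser.read() skips missing files silently and returns the
    # list of files it actually parsed, so an empty list means "not found".
    if not config.read(path):
        raise FileNotFoundError(f"config file not found: {path}")
    return config["OPENAI"]["openai_api_key"]


# Demonstrate with a throwaway file instead of a real secret.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "config_genai.ini")
    with open(demo_path, "w") as handle:
        handle.write(SAMPLE_INI)
    demo_key = load_api_key(demo_path)

print(demo_key)  # sk-placeholder
```

Checking the return value of `read()` turns a wrong relative path into an immediate error rather than a later `KeyError` on the missing section.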

In [10]:
!pip install pypdf

Collecting pypdf
  Obtaining dependency information for pypdf from https://files.pythonhosted.org/packages/12/cd/12fa4ca4262f78a984ba9cce95be824e9ce7f9dbcc1529ff4966591caef6/pypdf-4.0.0-py3-none-any.whl.metadata
  Downloading pypdf-4.0.0-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-4.0.0-py3-none-any.whl (283 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.0.0


In [11]:
import pypdf

In [None]:
# Save the PDF document (Tesla Q3 2023 Earnings Report) in the data folder
# https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q3-2023-Update-3.pdf

In [12]:
# Load documents
documents = SimpleDirectoryReader(input_dir='data/tesla-docs').load_data()

In [13]:
documents

[Document(id_='60134482-5c91-47cd-8032-9c6107eaba09', embedding=None, metadata={'page_label': '1', 'file_name': 'TSLA-Q3-2023-Update-3.pdf', 'file_path': 'data\\TSLA-Q3-2023-Update-3.pdf', 'file_type': 'application/pdf', 'file_size': 3455728, 'creation_date': '2024-01-21', 'last_modified_date': '2024-01-21', 'last_accessed_date': '2024-01-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, hash='1f4298130083056ddcafa776290c3e88473688a5222256c0aca9c621adebff45', text='Q3 2023 Update\n1', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='370b7a5a-f615-438a-b5e0-704df0213b2e', embedding=None, metadata={'page_label': '2', 'file_name': 'TSLA-Q3-2023-Upd

#### Creating Text Chunks
* Smaller text chunks tend to produce more focused embeddings, which in turn improves retrieval accuracy.
* Precise context: narrower chunks give the LLM more relevant context, which helps it produce more precise answers.
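
To make the idea of fixed-size chunks with overlap concrete, here is a minimal sketch in plain Python. It is not LlamaIndex's `TokenTextSplitter` implementation: whitespace-separated words stand in for tokenizer tokens, and the function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Words stand in for tokens in this toy example.
words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
# ['a', 'b', 'c', 'd']
# ['d', 'e', 'f', 'g']
# ['g', 'h', 'i', 'j']
```

The overlap repeats the last item of one chunk at the start of the next, so a sentence that straddles a boundary is less likely to lose its context.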

In [28]:
# LlamaIndex has built-in tools for chunking text
text_splitter = TokenTextSplitter(
  separator=" ",
  chunk_size=1024,
  chunk_overlap=20,
  backup_separators=["\n"],
  tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)


In [29]:
node_parser = SimpleNodeParser.from_defaults()

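SimpleNodeParser creates nodes out of text chunks. Note that from_defaults() is called without arguments here, so the TokenTextSplitter configured above is not passed to it, and the parser chunks the text with its own default settings.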

In [None]:
# Alternate Sentence Splitter
# text_splitter = SentenceSplitter(
#   separator=" ",
#   chunk_size=1024,
#   chunk_overlap=20,
#   paragraph_separator="\n\n\n",
#   secondary_chunking_regex="[^,.;。]+[,.;。]?",
#   tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
# )

#### Building Knowledge Bases

In [30]:
# LlamaIndex ships wrappers for popular embedding models, such as OpenAI’s Ada, Cohere, and Sentence Transformers.
# To customize the LLM and the embedding model, we bundle them in a ServiceContext; PromptHelper controls how text is packed into the context window.
llm = OpenAI(model='gpt-3.5-turbo', temperature=0, max_tokens=256)
embed_model = OpenAIEmbedding()
prompt_helper = PromptHelper(
    context_window=4096, 
    num_output=256, 
    chunk_overlap_ratio=0.1, 
    chunk_size_limit=None
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    node_parser=node_parser,
    prompt_helper=prompt_helper
)
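
The `context_window` and `num_output` values above define a token budget. As a rough, simplified sketch of the arithmetic (not the library's actual code), the space left for retrieved text is approximately the context window minus the tokens reserved for the answer and the tokens used by the prompt template. The function name and the prompt size of 100 tokens are assumptions chosen for illustration.

```python
def available_context_tokens(context_window, num_output, prompt_tokens):
    """Tokens left for retrieved chunks after reserving the answer and the prompt."""
    remaining = context_window - num_output - prompt_tokens
    if remaining < 0:
        raise ValueError("prompt and reserved output exceed the context window")
    return remaining


# 4096 and 256 come from the PromptHelper cell; 100 is a hypothetical prompt size.
budget = available_context_tokens(context_window=4096, num_output=256, prompt_tokens=100)
print(budget)          # 3740
print(budget // 1024)  # 3 full chunks of 1024 tokens would fit
```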

#### Vector Database

* Why can’t we use traditional databases? Technically, we can store vectors in a database like SQLite or MySQL and compare the embedding of the query text against every stored embedding. The problem, as you might have guessed, is that this linear search has O(n) time complexity per query. A GPU-equipped machine can handle a few thousand data points without trouble, but the approach becomes impractically slow at the hundreds of millions of embeddings found in real-world applications.

* So, how do we solve this? The answer is to index the embeddings with an approximate nearest neighbor (ANN) algorithm such as HNSW (Hierarchical Navigable Small World). HNSW is a graph-based algorithm that can efficiently handle billions of embeddings. Its average query complexity is roughly O(log n).

* Apart from HNSW, there are a few other indexing techniques, such as product quantization, scalar quantization, and inverted file indexing. However, HNSW is the default indexing algorithm in many vector databases.
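
To illustrate the linear search that the first bullet describes, here is a minimal brute-force sketch using only the standard library. It scores the query against every stored vector, which is exactly the O(n) cost an ANN index avoids. The function names and the two-dimensional toy vectors are assumptions for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_search(query, embeddings, top_k=2):
    """Return indices of the `top_k` most similar vectors by scanning all of them."""
    # One similarity computation per stored vector: O(n) work for every query.
    scored = ((cosine_similarity(query, vec), idx) for idx, vec in enumerate(embeddings))
    return [idx for _, idx in heapq.nlargest(top_k, scored)]


# Toy 2-D "embeddings".
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = linear_search([1.0, 0.0], embeddings, top_k=2)
print(nearest)  # [0, 2]
```

An HNSW index replaces the full scan with a walk through a layered neighbor graph, trading a small amount of recall for far fewer similarity computations per query.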

In [31]:
# We will use LlamaIndex’s default in-memory vector store.
index = VectorStoreIndex.from_documents(
    documents, 
    service_context = service_context
    )

#### Query Index

* LlamaIndex provides a query engine for one-off questions and a chat engine for chat-like conversations. The difference between the two is that the chat engine preserves the history of the conversation, while the query engine does not.
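
The stateless versus stateful distinction can be sketched with two toy classes. These are not LlamaIndex classes: `ToyQueryEngine`, `ToyChatEngine`, and `echo_with_turn_count` are hypothetical names, and the answer function is a stand-in for the retrieval and LLM call.

```python
class ToyQueryEngine:
    """Stateless: every question is answered with an empty history."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn

    def query(self, question):
        return self._answer_fn(question, history=[])


class ToyChatEngine:
    """Stateful: earlier turns are stored and passed along with each new message."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn
        self.history = []

    def chat(self, message):
        reply = self._answer_fn(message, history=list(self.history))
        self.history.append((message, reply))
        return reply


def echo_with_turn_count(question, history):
    # Stand-in for retrieval plus an LLM call; it only reports how much history it saw.
    return f"turn {len(history) + 1}: {question}"


toy_query_engine = ToyQueryEngine(echo_with_turn_count)
toy_chat_engine = ToyChatEngine(echo_with_turn_count)

print(toy_query_engine.query("first"))   # turn 1: first
print(toy_query_engine.query("second"))  # turn 1: second
print(toy_chat_engine.chat("first"))     # turn 1: first
print(toy_chat_engine.chat("second"))    # turn 2: second
```

Because the chat engine sees earlier turns, a follow-up such as "and the quarter before?" can be resolved against the previous question, which a query engine cannot do.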

In [32]:
query_engine = index.as_query_engine(service_context=service_context)
response = query_engine.query("What is HNSW?")
print(response)

I'm sorry, but I cannot answer the query as there is no information provided about HNSW in the given context.


In [33]:
response = query_engine.query("What is total revenue of Tesla in Q3-2023 ?")
print(response)

The total revenue of Tesla in Q3-2023 is $23.4 billion.


#### End