In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


# Retrieval Augmented Generation (RAG)

[Meta AI introduced the RAG method](https://ai.meta.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), emphasizing its potential for knowledge-intensive tasks.

## 🔍 **1. What is RAG?**

- RAG, or Retrieval Augmented Generation, boosts large language models (LLMs) by tapping into external knowledge sources.

- Meta AI pioneered RAG to tackle knowledge-heavy tasks efficiently.

- Combines info retrieval with text generation, enabling LLMs to access fresh, reliable info.

- Ideal for tasks needing accurate, current data.

## 🤔 **2. Why was RAG developed?**

- LLMs excel in mimicking human text but face limitations.

- High training/fine-tuning costs.

- Knowledge is static, outdated post-training.

- "Hallucinating" issue: confidently giving wrong info.

- RAG overcomes these by merging LLM prowess with real-time data access.

## 🛠️ **3. How Does RAG Work?**

- On receiving a query (like a question), RAG fetches relevant documents/passages from external sources (like Wikipedia).

- Blends these retrieved docs with the query to create an enriched context. This is then processed by a text generator (e.g., GPT-3) to generate the final answer.

<img src="https://docs.aws.amazon.com/images/sagemaker/latest/dg/images/jumpstart/jumpstart-fm-rag.jpg">
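
To make the retrieve-then-generate flow concrete, here is a minimal stdlib-only sketch. It is not how LangChain or any production retriever works: the `DOCS` list, the word-overlap scoring, and the prompt wording are all assumptions chosen for illustration, and the final generation step is left out.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base standing in for an external source.
DOCS = [
    "YOLO-NAS is an object detection model.",
    "DeciCoder is a code generation model.",
    "FAISS is a library for vector similarity search.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(query, doc):
    """Count how many query tokens also occur in the document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())

def retrieve(query, docs, k=1):
    """Return the k documents sharing the most tokens with the query."""
    return sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]

def build_prompt(query, context_docs):
    """Blend the retrieved passages with the query into one enriched prompt."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

question = "What is DeciCoder?"
top_docs = retrieve(question, DOCS)
augmented_prompt = build_prompt(question, top_docs)
print(augmented_prompt)  # this string is what a text generator would receive
```

In a real system the word-overlap score would be replaced by embedding similarity, and the printed prompt would be sent to an LLM.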

## 🌟 **4. Key Features of RAG:**

- RAG stays current, accessing the latest info, unlike static-knowledge LLMs.

- Integrates fresh info without the high cost of retraining the whole LLM.

- Sources reliable info, reducing wrong answers or "hallucinations."

## 🛠 **5. Practical Implementations:**

- Answering evolving topic questions.

- Useful in domains needing real-time accuracy (e.g., medical, legal).

- Boosts chatbots/virtual assistants with factual, updated replies.

## 📌 **In Summary:**

- RAG: Marrying vast LLM knowledge with the latest real-world info.

- Keeps models knowledgeable, up-to-date, and accurate.

# 🛠️ **RAG Implementation in LangChain**

1. 🧠 **LLM**: The brain of the system, generating human-like text.

2. 🌐 **Vector Store**: The heart of retrieval - stores text embeddings for quick, efficient access.

3. 🔍 **Vector Store Retriever**: The system's "search engine," finding relevant documents via vector similarities.

4. 🔄 **Embedder**: Transforms text into vectors, making it readable for the system.

5. 💬 **Prompt**: Captures the initial user query or statement, kicking off the process.

6. 📚 **Document Loader**: Manages the import and preparation of documents for processing.

7. 🧩 **Document Chunker**: Breaks down large documents into smaller segments for better efficiency.

8. 👤 **User Input**: The starting point, where the user's query activates the RAG workflow.


# 🌐 **The RAG System and Its Subsystems**

### 1. 🗂️ **Index Subsystem**
<!-- <img src="https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png"> -->

   - **Components**: Embedder, Vector Store, Document Loader, Document Chunker.

   - **Function**: Processes and organizes data into an accessible format.

   - **Role**: Creates a searchable database of vectorized information.

### 2. 🔎 **Retrieval Subsystem**:

   - **Components**: User Input, Prompt, Vector Store Retriever.

   - **Function**: Matches user queries with relevant data.

   - **Role**: Fetches the most pertinent information from the index based on user input.

### 3. 🤖 **Augment Subsystem**:

   - **Components**: LLM, User Input, Retrieved Data.

   - **Function**: Integrates user queries with retrieved data.

   - **Role**: Generates accurate and context-rich responses, blending human-like text generation with factually correct information.

<!-- <img src="https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png"> -->

Together, these subsystems form a seamless flow, transforming user queries into comprehensive and reliable responses.

# Load documents
There are SO MANY document loaders in LangChain.

I won't go over every single one in this notebook, but you can check out [the documentation](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/document_loaders) to see just how many are available to you.

## 📂 **Understanding Document Loaders in LangChain**

- 📚 LangChain document loaders load data from various sources into Document objects.

- 📄 A Document is text with metadata.

- 🌐 Loaders fetch data from text files, web pages, video transcripts, etc.

- 🔄 Main role: Retrieve data for further processing.

- 🛠️ Method: Call `load()` to fetch the data; it returns a list of Document objects.

- 🧠 Some loaders support lazy loading (data loads into memory only when needed).
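
The ideas in the list above (text plus metadata, eager `load`, lazy loading) can be sketched without any library. `SimpleDocument` and `InMemoryLoader` below are hypothetical names made up for this example; they only mimic the shape of the LangChain classes and are not part of its API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

class InMemoryLoader:
    """Hypothetical loader that reads from a dict of {source: text}."""

    def __init__(self, sources: dict):
        self.sources = sources

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield one document at a time, so nothing is built until requested.
        for source, text in self.sources.items():
            yield SimpleDocument(page_content=text, metadata={"source": source})

    def load(self) -> List[SimpleDocument]:
        # Eager variant: materialize every document into a list.
        return list(self.lazy_load())

loader = InMemoryLoader({"a.txt": "first file", "b.txt": "second file"})
docs = loader.load()
first_lazy = next(loader.lazy_load())
print(docs[0].metadata, first_lazy.page_content)
```

The lazy variant matters when the source is large: the caller can stop after the first few documents without paying for the rest.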

## 🔧 **How to Use Document Loaders**

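1. 📥 Import the loader class from `langchain_community.document_loaders` (older releases exposed the same loaders under `langchain.document_loaders`).

2. 🏗️ Create an instance of your chosen class with the source it expects, such as a file path, a directory path, or a URL.

3. 🚀 Call `load()` to read that source into a list of Document objects.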


In [None]:
pip install langchain-community

In [4]:
from langchain_community.document_loaders import WebBaseLoader

yolo_nas_loader = WebBaseLoader("https://deci.ai/blog/pose-estimation-yolo-nas-pose/").load()

decicoder_loader = WebBaseLoader("https://deci.ai/blog/decicoder-6b-the-best-multi-language-code-generation-llm-in-its-class/").load()

yolo_newsletter_loader = WebBaseLoader("https://deeplearningdaily.substack.com/p/unleashing-the-power-of-yolo-nas").load()



In [5]:
yolo_newsletter_loader[0]

Document(page_content="\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nUnleashing the Power of YOLO-NAS: A New Era in Object Detection and Computer Vision\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSubscribeSign inShare this postUnleashing the Power of YOLO-NAS: A New Era in Object Detection and Computer Visiondeeplearningdaily.substack.comCopy linkFacebookEmailNoteOtherUnleashing the Power of YOLO-NAS: A New Era in Object Detection and Computer VisionThe Future of Computer Vision is HereDeep Learning Daily CommunityMay 05, 20236Share this postUnleashing the Power of YOLO-NAS: A New Era in Object Detection and Computer Visiondeeplearningdaily.substack.comCopy linkFacebookEmailNoteOtherShareWhat does it take to make a mark in the fiercely competitive world of object detection?¬†In this newsletter edition, I want to take you on a behind-the-scenes journey of how YOLO-NAS, a novel, groundbreaking object dete

# Chunk documents

🔢 **Exploring Text Splitters in LangChain**

- 📖 Text splitters divide long texts into smaller, meaningful parts.

- 🧩 Aim: Make large texts easier to handle for analysis or processing.

### How Text Splitters Work:

1. ✂️ Split text into small, meaningful chunks (like sentences).

2. 📏 Combine these chunks into a larger one until a certain size is reached.

3. 📌 Once the size is reached, start a new chunk with some overlap for context.
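
The three steps above can be sketched as a small greedy chunker. This is a simplification written for this notebook, not LangChain's actual algorithm: the function name `split_with_overlap`, the sentence regex, and measuring overlap in whole sentences are all assumptions.

```python
import re

def split_with_overlap(text, chunk_size, overlap_sentences=1):
    """Greedy sentence-level chunker (a simplification, not LangChain's algorithm)."""
    # Step 1: split into small units, here sentences ending in ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        # Step 2: keep adding sentences while the chunk stays within chunk_size.
        if current and len(candidate) > chunk_size:
            chunks.append(" ".join(current))
            # Step 3: start the next chunk with the trailing sentence(s) as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = (
    "RAG retrieves documents. It augments the prompt. "
    "The LLM then answers. Sources can be returned."
)
chunks = split_with_overlap(sample, chunk_size=50)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk after the first begins with the last sentence of the previous one, which is the "overlap for context" described in step 3. Note that this sketch does not re-check the size limit after adding the overlap, so a very long sentence can still produce an oversized chunk.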

### Customization Axes:

1. 🛠️ How the text is split.

2. 📐 How chunk size is measured.

## Getting Started with Text Splitters

- 🚀 Default choice: `RecursiveCharacterTextSplitter`.

- 📋 Works by: Splitting text based on a list of characters.

- 🔄 If a chunk is still too large after splitting on one separator, it splits that chunk again on the next separator in the list.

- 📌 Default split characters: `["\n\n", "\n", " ", ""]`.

### Additional Controls:

- 📏 `length_function`: Defines how chunk length is measured (default: `len`, i.e. character count; a token counter is a common alternative).

- 🔍 `chunk_size`: Sets the maximum chunk size.

- 🔀 `chunk_overlap`: Determines overlap between chunks for continuity.

- 📊 `add_start_index`: Option to include each chunk's start position in the original document in metadata.
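
To see how `chunk_size`, `chunk_overlap`, and a recorded start index interact, here is a rough sketch using plain fixed-size character windows. The helper `window_chunks` is hypothetical and much cruder than `RecursiveCharacterTextSplitter`, which prefers to break on separators; it only illustrates the arithmetic of the three settings, with length measured in characters.

```python
def window_chunks(text, chunk_size, chunk_overlap):
    """Fixed-size character windows with overlap; records each start index."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({"page_content": piece, "metadata": {"start_index": start}})
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = window_chunks(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk["metadata"]["start_index"], chunk["page_content"])
```

Consecutive windows share exactly `chunk_overlap` characters, and the stored start index lets you map any chunk back to its position in the original text.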

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50,
    length_function = len
)

yolo_nas_chunks = text_splitter.transform_documents(yolo_nas_loader)

decicoder_chunks = text_splitter.transform_documents(decicoder_loader)

yolo_newsletter_chunks = text_splitter.transform_documents(yolo_newsletter_loader)

# Index System

- 🎯 **Purpose:** Efficiently organize data for easy retrieval.

### Steps in the Index System:
1. 📚 **Load Documents (Document Loader):**
   - Import and read large amounts of data.
2. 🧩 **Chunk Documents (Document Chunker):**
   - Break down documents into smaller parts for better handling.
3. 🌐 **Embed Documents (Embedder):**
   - Convert text chunks into vector formats for searchability.
4. 💾 **Store Embeddings (Vector Store):**
   - Keep embeddings and their textual counterparts for retrieval.
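
As a rough illustration of the embed and store steps, the sketch below uses a toy word-count embedding and an in-memory list. Everything here is an assumption made for the example: `embed` and `TinyVectorStore` are hypothetical, and a real setup would use a learned embedding model and a vector index such as FAISS.

```python
import math
import re
from collections import Counter

def embed(text, vocabulary):
    """Toy embedder: L2-normalized word-count vector over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    vector = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vector]

class TinyVectorStore:
    """Hypothetical in-memory store keeping each vector next to its text."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.entries = []  # list of (vector, text) pairs

    def add_texts(self, texts):
        for text in texts:
            self.entries.append((embed(text, self.vocabulary), text))

chunks = ["YOLO-NAS detects objects", "DeciCoder generates code"]
vocabulary = sorted({w for c in chunks for w in re.findall(r"[a-z0-9]+", c.lower())})
store = TinyVectorStore(vocabulary)
store.add_texts(chunks)
print(len(store.entries), len(store.entries[0][0]))
```

Storing the text alongside its vector is what lets the retrieval step hand readable passages, not just numbers, back to the LLM.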

In [None]:
pip install langchain-openai faiss-gpu

In [11]:
from langchain_openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.storage import LocalFileStore

store = LocalFileStore("./cache/")

# create an embedder
core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    core_embeddings_model,
    store,
    namespace = core_embeddings_model.model
)

# store embeddings in vector store
vectorstore = FAISS.from_documents(yolo_nas_chunks, embedder)

vectorstore.add_documents(decicoder_chunks)

vectorstore.add_documents(yolo_newsletter_chunks)

['36bb42dd-e7cb-4d99-8ba6-f83890a13967',
 '24b05637-82f0-45f0-bc12-7e574e6fcfd9',
 'bddea5f4-3546-48e7-85ea-dba46ad73388',
 'aa3b4318-365d-440a-b253-6214dbe396ab',
 'b66914c4-a9e8-4475-8ee6-bcdef9d2a714',
 '11a02043-f6c8-4c04-bb90-54a6ee3779a3',
 'f89db972-3c2f-4593-8c1e-f1aced5bad4f',
 '3a02524e-29f4-4d0d-bc39-ecd764008d15',
 '92e9e8b6-ef77-4829-89a5-c9c0d2ed870f',
 '73ce1e9f-c4b0-44d0-9e72-8270b1288ad7',
 '751d2b60-2fe0-4051-890d-063cb042cfcc',
 '4725f0bd-052f-4ade-b273-9f60828f17bb',
 '1802154a-a3cb-4100-8e4f-9afba2153606',
 'feb8ab33-c003-418f-b65c-01ae7574f666',
 'c2634498-3478-4d35-b97a-51a36032b9dd',
 'cf87c96b-5cdf-4941-af49-82575d444f6a',
 '07d543a0-d9f6-43be-b7d2-a2d82c6c8bc9',
 '29633fbc-0c4f-420c-a2cc-93e88905d78a',
 'e167cf25-8db8-4493-a6ba-dbe4e18951b0']

# 🔍 **Retrieval System**

- 🎯 **Purpose:** Fetch relevant information based on user queries.

### Steps in the Retrieval System:

1. 💬 **Obtain User Query (User Input):**
   - Capture the user's question or statement.

2. 🔄 **Embed User Query (Embedder):**
   - Convert the user's query into a vector format, aligning with indexed documents.

3. 🔍 **Vector Search (Vector Store Retriever):**
   - Search for document embeddings in the Vector Store that closely match the user query.

4. 📄 **Return Relevant Documents:**
   - Provide the top matching documents, ensuring pertinence to the query.
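
The vector search step boils down to scoring every stored vector against the query vector and keeping the best few. The sketch below uses cosine similarity, which is one common choice and an assumption here; the vector store used in this notebook may rank by a different metric. The three-dimensional vectors and chunk labels are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed, k=2):
    """Rank (vector, text) pairs by similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query_vector, vec), text) for vec, text in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings, far smaller than real ones.
indexed = [
    ([1.0, 0.0, 0.0], "chunk about object detection"),
    ([0.0, 1.0, 0.0], "chunk about code generation"),
    ([0.7, 0.7, 0.0], "chunk about both topics"),
]
query_vector = [0.9, 0.1, 0.0]
results = top_k(query_vector, indexed, k=2)
for score, text in results:
    print(round(score, 3), text)
```

This brute-force scan is fine for a handful of chunks; dedicated indexes exist to avoid comparing the query against every stored vector.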



In [12]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain import hub

In [13]:
# instantiate a retriever
retriever = vectorstore.as_retriever()

In [14]:
llm = ChatOpenAI(model="gpt-4-0125-preview")

# 🔍 **Augment System**

- 🚀 **Purpose:** Improve LLM's input with additional context.

### Steps in the Augment System:

1. 🌟 **Create Initial Prompt (Prompt):**
   - Begin with the user's initial question or statement.

2. 🧩 **Augment Prompt with Retrieved Context (Context Integration):**
   - Blend the initial prompt with context from the Vector Store for a richer input.

3. ⚡ **Send Augmented Prompt to LLM (Input Enhancement):**
   - Pass the enhanced prompt to the LLM.

4. 📬 **Receive LLM's Response (Output Reception):**
   - Obtain the LLM's comprehensive response after processing the augmented prompt.
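
The four steps above can be sketched with plain string formatting. The template wording, the `augment` helper, and `fake_llm` are all hypothetical; `fake_llm` is only a placeholder so the example runs offline, and a real chain would call an actual model there.

```python
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)

def augment(question, retrieved_chunks):
    """Steps 1 and 2: merge the user's question with the retrieved context."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

def fake_llm(prompt):
    """Placeholder for a real model call; it only reports the prompt size."""
    return f"(model output for a prompt of {len(prompt)} characters)"

question = "What is AutoNAC?"
retrieved = ["AutoNAC is Deci's Neural Architecture Search engine."]

prompt_text = augment(question, retrieved)  # step 2: augmented prompt
response = fake_llm(prompt_text)            # steps 3 and 4: send, then receive
print(prompt_text)
print(response)
```

The model never sees the vector store directly; all it receives is this one string, which is why the quality of the retrieved chunks matters so much.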

In [None]:
pip install langchainhub

In [17]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = hub.pull("rlm/rag-prompt")

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
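
The `|` operator in the chain above composes steps from left to right, and the leading dict runs two branches on the same input. A rough stdlib-only analogy is sketched below; the `Step` class and `fan_out` helper are hypothetical and far simpler than LangChain's runnables, and the lambdas stand in for the retriever, prompt, and model.

```python
class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self | other: run self first, then feed its output into other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

# Stand-ins for the real components.
retrieve = Step(lambda q: [f"doc about {q}"])
format_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda q: q)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: prompt.upper())

chain = fan_out(context=retrieve | format_docs, question=passthrough) | fill_prompt | fake_llm
result = chain.invoke("nas")
print(result)
```

Reading the chain this way, the question flows into both the retrieval branch and the passthrough branch, the prompt merges the two, and the model consumes the merged text.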

In [18]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [19]:
prompt.input_variables

['context', 'question']

In [20]:
# This is the entire augment system!
rag_chain.invoke("What does Neural Architecture Search have to do with how Deci creates its models?")

'Deci creates its models using its advanced Neural Architecture Search (NAS) engine, AutoNAC, which automates the process of searching through vast architecture spaces efficiently. This technology allows Deci to intelligently and efficiently find optimal architectures for their models, such as DeciCoder-6B and YOLO-NAS, by navigating through potentially trillions of possible architectures. AutoNAC significantly reduces the necessary computational resources compared to traditional NAS methods, making the development of high-performing neural networks more practical and accessible.'

In [21]:
rag_chain.invoke("What is DeciCoder")

'DeciCoder is a cost-effective, highly efficient, and accurate code generation application, notable for its 6-billion parameter model, DeciCoder-6B. It is designed for performance at scale, outperforming competitors like CodeGen and StarCoder in terms of both speed and accuracy, particularly in Python. DeciCoder-6B achieves significant computational efficiency and low latency, making it a superior choice for scaling code generation tasks.'

## Return sources

In [22]:
from langchain_core.runnables import RunnableParallel

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_source = RunnableParallel({"context": retriever, "question": RunnablePassthrough()}).assign(answer=rag_chain_from_docs)
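
Conceptually, the chain above first builds a dict holding the retrieved documents and the question, then adds an `answer` key while keeping the other keys, so the caller gets the sources back too. A plain-Python sketch of that data flow follows; the helper functions and the `example.txt` source are hypothetical, and dicts stand in for Document objects.

```python
def retrieve(question):
    # Hypothetical retriever returning dicts instead of Document objects.
    return [{"page_content": f"notes on {question}", "metadata": {"source": "example.txt"}}]

def format_docs(docs):
    return "\n\n".join(doc["page_content"] for doc in docs)

def answer_from(state):
    # Stand-in for the prompt, model, and output parser.
    context = format_docs(state["context"])
    return f"answer based on: {context}"

def rag_with_sources(question):
    # Parallel step: compute context and question from the same input.
    state = {"context": retrieve(question), "question": question}
    # Assign step: add the answer without discarding the existing keys.
    state["answer"] = answer_from(state)
    return state

output = rag_with_sources("nas")
print(sorted(output))
print(output["context"][0]["metadata"]["source"])
```

Because the raw documents stay in the result, each answer can be shown alongside the URLs or files it was drawn from.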

In [23]:
rag_chain_with_source.invoke("What does Neural Architecture Search have to do with how Deci creates its models?")

{'context': [Document(page_content='which is a product of Deci’s cutting-edge Neural Architecture Search-based AutoNAC engine.', metadata={'source': 'https://deci.ai/blog/decicoder-6b-the-best-multi-language-code-generation-llm-in-its-class/', 'title': 'Introducing DeciCoder-6B: Code LLM Engineered for Accuracy & Cost Efficiency At Scale', 'description': 'DeciCoder-6B, a multi-language code LLM in the 7B parameter class that supports a sequence length of up to 4096 tokens and excels in 8 code languages.', 'language': 'en-US'}),
  Document(page_content='Neural Architecture Search is define the architecture search space. For YOLO-NAS, our researchers took inspiration from the basic blocks of YOLOv6 and YOLOv8. With the architecture and training regime in place, our researchers harnessed the power of AutoNAC. It intelligently searched a vast space of ~10^14 possible architectures, ultimately zeroing in on three final networks that promised outstanding results. The result is a family of ar