# **what is RAG?**

RAG is a technique for augmenting LLM knowledge with additional data.


LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs. The process of bringing the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG)

# **Installing all dependencies**

In [5]:
!pip install langchain # to build LLM APPs easily
!pip install langchain-groq # GROQ-Langchain integration for using LPU of GROQ
!pip install langchain-community # 3rd party integration with langchain
!pip install langchain-core # to parse output of LLM i.e give proper structured output to us or human readable
!pip install langchain-text-splitter # Splitting & chunkinf
!pip install langchain-astradb # VectorDB/store
!pip install langchainhub

!pip install sentence_transformers # huggingface's models(text embedding models)
!pip install Unstructured # for loading data via markdown

Collecting langchain
  Downloading langchain-0.2.1-py3-none-any.whl (973 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/973.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m256.0/973.5 kB[0m [31m7.6 MB/s[0m eta [36m0:00:01[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━[0m [32m778.2/973.5 kB[0m [31m11.2 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m973.5/973.5 kB[0m [31m10.0 MB/s[0m eta [36m0:00:00[0m
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain)
  Downloading langchain_text_splitters-0.2.0-py3-none-any.whl (23 kB)
Installing collected packages: langchain-text-splitters, langchain
Successfully installed langchain-0.2.1 langchain-text-splitters-0.2.0
Collecting langchain-groq
  Downloading langchain_groq-0.1.4-py3-none-any.whl (11 kB)
Collecting groq<1,>=0.4.1 (from langchain-gr

In [None]:
!pip install langchain-core

# **load environment variables**

In [6]:
import getpass
import os

os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass.getpass('HUGGINGFACEHUB_API_TOKEN')
os.environ['GROQ_API_KEY'] = getpass.getpass('GROQ_API_KEY')
os.environ['LANGCHAIN_API_KEY'] = getpass.getpass('LANGCHAIN_API_KEY')
# etc api keys

HUGGINGFACEHUB_API_TOKEN··········
GROQ_API_KEY··········
LANGCHAIN_API_KEY··········


# **Loading Data : WebBaseLoader**

In [None]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(["link1", "link2"])
docs = loader.load()
docs # will contain text

# or

loader = WebBaseLoader(
    web_paths=(["link1","link2"]),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()
docs # will contain text

# can also get other IMP info too like fetch all urls


 WebBaseLoader API Reference :

 [WebBaseLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html)

# **Loading Data: Markdown**

API Reference:

[UnstructuredMarkdownLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html)

In [7]:
from langchain_community.document_loaders import UnstructuredMarkdownLoader

markdown_path = "/content/sklearn_user_guide.md"
loader = UnstructuredMarkdownLoader(markdown_path)

docs = loader.load()
docs # will contain text

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.




# **Splitting & chunking**

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# **Vector Embedding: HuggingFaceEmbeddings**

In [9]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# create an instance of embedding to use later while creating vectorstore
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# or
# embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-V2")

  from tqdm.autonotebook import tqdm, trange
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Also can use langchain & huggingface integration library :

***!pip install langchain-huggingface***

# **Indexing: AstraDB**
Storing ***Splits*** in form of vectors into a ***VectorDB*** of ***AstraDB***, we ***Embedd*** & store at same time (one after other)

In [None]:
# AstraDB Configuration
ASTRA_DB_API_ENDPOINT = input("ASTRA_DB_API_ENDPOINT = ")


In [10]:
ASTRA_DB_APPLICATION_TOKEN = getpass.getpass("ASTRA_DB_APPLICATION_TOKEN = ")

ASTRA_DB_APPLICATION_TOKEN = ··········


In [11]:
ASTRA_DB_ID = getpass.getpass("ASTRA_DB_ID = ")

ASTRA_DB_ID = ··········


In [12]:
# connection of the AstraDB

from langchain.vectorstores.cassandra import Cassandra
import cassio
from langchain_astradb import AstraDBVectorStore

# Replace with your actual AstraDB credentials
# ASTRA_DB_ID = "8610ebf1-45ab-4bb5-8b91-65788fcbd547"
# ASTRA_DB_REGION = "us-east-2"

In [13]:
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN,database_id=ASTRA_DB_ID)



In [14]:
## Convert Data Into Vectors and store in AstraDB

astra_vector_store=Cassandra(
    embedding=embeddings,
    table_name="sklearn_user_guide_markdown_v2",
    session=None,
    keyspace=None

)

In [15]:
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

astra_vector_store.add_documents(splits)
print("Inserted %i headlines." % len(splits))

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

# astra_vector_index, we can store this as a variable named 'retriever'

Inserted 1234 headlines.


In [18]:
# Define the retriever
retriever = astra_vector_index.query

# **LLM to use: ChatGroq**

* we will be using ***llama3-8b-8192*** model provided by ***GROQ Cloud API***.

* by using ***GROQ*** we get ***LPU***(Language Processing Unit), fastest inference(quick response) for LLM

In [20]:
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama3-8b-8192",
    temperature=0.0,
    # etc params
)

In [21]:
# query or question

query = "what is task decomposition ?"
response = retriever(query,llm=llm)
print(response)



Based on the provided context, it appears that the term "task decomposition" is not explicitly mentioned. However, the context discusses various algorithms and techniques for dimensionality reduction, regression, and matrix factorization, which are all related to machine learning and data analysis.

Task decomposition, in the context of machine learning and data analysis, refers to the process of breaking down a complex task or problem into smaller, more manageable sub-tasks or components. This can involve decomposing a large dataset into smaller subsets, identifying key features or variables, or applying different algorithms or techniques to specific parts of the data.

In the context of the provided text, task decomposition might involve decomposing a complex data analysis problem into smaller sub-tasks, such as:

1. Dimensionality reduction using techniques like PCA or PLS
2. Feature selection or extraction
3. Matrix factorization or decomposition
4. Regression or classification mod

In [23]:
query = "what is essemble? answer in full detail"
response = retriever(query,llm=llm)
print(response)



Ensemble learning is a machine learning technique that combines the predictions or decisions of multiple models or algorithms to improve the overall performance and accuracy of the system. In other words, ensemble methods combine the strengths of multiple models to create a more robust and accurate prediction or decision-making system.

Ensemble methods are particularly useful when:

1. **Individual models are not accurate enough**: When a single model is not accurate enough to make a reliable prediction or decision, ensemble methods can combine the predictions of multiple models to improve the overall accuracy.
2. **Models have different strengths and weaknesses**: Ensemble methods can combine models that have different strengths and weaknesses to create a more robust system.
3. **Data is noisy or imbalanced**: Ensemble methods can help to reduce the impact of noisy or imbalanced data by combining the predictions of multiple models.

Types of Ensemble Methods:

1. **Bagging (Bootstrap

In [24]:
query = "what is football ?"
response = retriever(query,llm=llm)
print(response)



I don't know the answer to that question. The context provided is about machine learning and scoring functions, and it doesn't mention football.


***VectorStore (or DB)*** created on ***AstraDB*** & vector's for this embedding store in a ***DB*** (DB_ID = "...") with a ***name***(collection_name) to this group

# **Retrive Embedded Vectors for use : Retriever**

No special import needed , part of ***VectorStore*** i.e part of ***langchain-astradb*** (here)

# **Prompts**

instruction provided to LLM on what task it will perform on question as well as on custom dataset

In [32]:
from langchain_core.prompts import ChatPromptTemplate


prompt_template = """
Answer the following question based only on the provided context.
Think step by step before providing a detailed answer.
I will tip you $1000 if the user finds the answer helpful.
<context>
{context}
</context>

Question: {input}
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

In [33]:
from langchain import hub

# prompt = hub.pull("rlm/rag-prompt")
# prompt = hub.pull("rlm/rag-prompt") # eg

# this code is giving error , so don't use this for now


# **IMPORTANT NOTE**
- Define the format_docs function to format retrieved documents
- this function will be different for different method of data loading
- as i scraped data with online tool, then converted it into ***.md(markdown file)*** , so my logic will be different from the data scraped from website

In [34]:
# Define the format_docs function to format retrieved documents
def format_docs(docs):
    if isinstance(docs, list):
        return "\n\n".join(docs)  # Join list of text chunks with double newlines
    return str(docs)  # Convert to string if not a list

# **Chain**:

- create a chain(from retrieval to llm response)
- so, less code to write and every component at one place , so easy to understand & later to modify

In [35]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

from langchain_core.messages.human import HumanMessage

In [36]:
# Define the chain
def rag_chain(query):
    # Retrieve documents(query) & pass LLM to use
    retrieved_docs = retriever(query, llm=llm)

    formatted_docs = format_docs(retrieved_docs)  # Format the documents

    # Construct the input for the prompt
    input_data = {
        "context": formatted_docs,
        "input": query  # Ensure to use the correct key 'input'
    }

    # Format the prompt correctly using the input data
    prompt_result = prompt.format(**input_data)

    # Ensure the input to the LLM is in the correct format
    human_message = HumanMessage(content=prompt_result)

    # Pass the message in a list to the LLM
    llm_response = llm([human_message])  # Get the LLM response

    # Create an instance of StrOutputParser
    output_parser = StrOutputParser()

    # Parse the LLM response using the instance
    final_output = output_parser.parse(llm_response)

    return final_output


# **Sample Ex. to test our RAG-LLM Chatbot**

In [37]:
query = "what is Task Decompostion?"
response = rag_chain(query)
print(response)



content='Based on the provided context, Task Decomposition is a technique used to break down a complex task into smaller, more manageable sub-tasks that can be executed in parallel.' response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 298, 'total_tokens': 332, 'completion_time': 0.02708508, 'prompt_time': 0.13665724, 'queue_time': None, 'total_time': 0.16374232000000002}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop', 'logprobs': None} id='run-0ce6758b-bcf7-48c3-94b8-119c03dc4202-0'


In [39]:
query = "what is cross-validation? give a proper example to explain this topic"
response = rag_chain(query)
response



AIMessage(content="Based on the provided context, I will provide a detailed answer.\n\nCross-validation is a technique used to evaluate the performance of a machine learning model by splitting the available data into training and testing sets. The goal is to get a more accurate estimate of the model's performance on unseen data.\n\nHere's an example to illustrate the concept:\n\nSuppose we have a dataset of 100 images of cats and dogs, and we want to train a model to classify them. We split the dataset into 5 folds:\n\nFold 1: 20 images (10 cats, 10 dogs)\nFold 2: 20 images (10 cats, 10 dogs)\nFold 3: 20 images (10 cats, 10 dogs)\nFold 4: 20 images (10 cats, 10 dogs)\nFold 5: 20 images (10 cats, 10 dogs)\n\nWe train our model on 4 folds (80 images) and test it on the remaining fold (20 images). We calculate the accuracy of the model on the test set. Let's say the accuracy is 90%.\n\nWe repeat this process for each fold, training the model on 4 folds and testing it on the remaining fold

# **Making output prettier**

In [41]:
## fuck this , i am done coding for now

### ***This Tutorial to develop RAG-Based LLM chatbot Application is created by Abhishek Vidhate***
[GITHUB profile](https://github.com/Abhishekvidhate)

# **Next Task**: create UI for this RAG-LLM app using STREAMLIT