<a href="https://colab.research.google.com/github/T-DevH/LLM_RAG_NIM/blob/main/NIM_mistral_7b_instruct_v0_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM-RAG using NVIDIA NIM with `mistral-7b-instruct-v0.2`

Installing the `langchain_nvidia_ai_endpoints` package.

This package provides the LangChain integrations for NVIDIA AI endpoints (NIM), so that NVIDIA-hosted chat and embedding models can be used as ordinary LangChain components.
LangChain is a framework for building applications that connect large language models to external data sources and tools.

The command below installs the package with pip, Python's package installer, which downloads packages from the Python Package Index (PyPI). The leading `!` tells the notebook to run the line as a shell command.



In [4]:
!pip install langchain_nvidia_ai_endpoints

Collecting langchain_nvidia_ai_endpoints
  Downloading langchain_nvidia_ai_endpoints-0.1.2-py3-none-any.whl (31 kB)
Collecting langchain-core<0.3,>=0.1.27 (from langchain_nvidia_ai_endpoints)
  Downloading langchain_core-0.2.9-py3-none-any.whl (321 kB)
Collecting pillow<11.0.0,>=10.0.0 (from langchain_nvidia_ai_endpoints)
  Downloading pillow-10.3.0-cp310-cp310-manylinux_2_28_x86_64.whl (4.5 MB)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3,>=0.1.27->langchain_nvidia_ai_endpoints)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.2.0,>=0.1.75 (from langchain-core<0.3,>=0.1.27->langchain_nvidia_ai_endpoints)
  Downloading langsmith-0.1.80-py3-none-any.whl (125 kB)

**Importing the `NVIDIAEmbeddings` and `ChatNVIDIA` classes from the `langchain_nvidia_ai_endpoints` package**

The `NVIDIAEmbeddings` class generates embeddings using NVIDIA-hosted embedding models.
Embeddings are numerical representations of text that can be used for NLP tasks such as similarity search, clustering, and classification.

The `ChatNVIDIA` class sends prompts to NVIDIA-hosted chat models and returns their responses, which is the basis for conversational and question-answering applications.

Together, these two classes cover both halves of the RAG pipeline built below: `NVIDIAEmbeddings` is used to index and search the source documents, and `ChatNVIDIA` generates answers from the retrieved text.




In [1]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA

**Installing additional packages**

The `langchain-community` package contains community-maintained integrations for LangChain, including the document loaders (`WebBaseLoader`, `PyPDFLoader`) and the `FAISS` vector store used in this notebook.

The `langchain-text-splitters` package provides utilities for splitting long texts into smaller chunks.
This matters for retrieval because each chunk is embedded and searched separately, and because LLM prompts have a limited context length.

The `faiss-cpu` package is the CPU build of FAISS, a library for efficient similarity search and clustering of dense vectors.
Here it performs the nearest-neighbor search that finds the chunks most similar to a query, the same operation that underlies recommendation systems and information retrieval in general.
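FAISS's simplest index, `IndexFlatL2`, performs exact (brute-force) nearest-neighbor search by squared Euclidean distance; its other index types trade some exactness for speed. The pure-Python sketch below shows only that exact search on a few made-up vectors, to make "nearest neighbor" concrete. It is not FAISS code, and `knn_search` is a hypothetical helper.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; FAISS's IndexFlatL2 also reports squared L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(index: List[Sequence[float]], query: Sequence[float],
               k: int) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbor search: compare the query with every stored vector.

    Returns (position, distance) pairs, closest first.
    """
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(index))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


# Toy 2-D vectors; real embedding vectors have hundreds or thousands of dimensions.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(knn_search(vectors, [1.0, 1.0], k=2))  # positions 1 and 3 are closest
```

Brute force is linear in the number of stored vectors, which is perfectly fast for a single Wikipedia article's worth of chunks.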

The installation command below uses pip to install these packages from the Python Package Index (PyPI).


In [2]:
!pip install langchain-community langchain-text-splitters faiss-cpu

Collecting langchain-community
  Downloading langchain_community-0.2.5-py3-none-any.whl (2.2 MB)
Collecting langchain-text-splitters
  Downloading langchain_text_splitters-0.2.1-py3-none-any.whl (23 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
Collecting langchain<0.3.0,>=0.2.5 (from langchain-community)
  Downloading langchain-0.2.5-py3-none-any.whl (974 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0

# Importing the WebBaseLoader class from the langchain_community.document_loaders package

The `WebBaseLoader` class loads web pages as LangChain `Document` objects.
It fetches the page at the given URL and extracts its text, recording the URL as the document's `source` metadata.

The URL in this example points to the Wikipedia article "Pepsi-Cola Made with Real Sugar".
The loader fetches this page, which serves as the knowledge source for the questions asked later in the notebook.
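`WebBaseLoader` itself downloads the page with `requests` and parses it with BeautifulSoup. To illustrate roughly what "extracting the text" involves, here is a stdlib-only sketch that strips markup from an HTML string and keeps the visible text, skipping `<script>` and `<style>` contents. The `html_to_text` helper is hypothetical, not the loader's implementation, and it works on an inline HTML string so it runs without network access.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects non-empty text fragments, ignoring <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the page's visible text, one fragment per line."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = """<html><head><title>Example page</title><style>p {color: red}</style></head>
<body><h1>Pepsi-Cola Made with Real Sugar</h1><script>track()</script>
<p>Example paragraph.</p></body></html>"""
print(html_to_text(page))
```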

To load a PDF file instead of a web page, swap in `PyPDFLoader`:

Import the `PyPDFLoader` class from the `langchain_community.document_loaders` package.

`PyPDFLoader` reads PDF files with the `pypdf` library and returns one `Document` per page, ready for further processing.

`from langchain_community.document_loaders import PyPDFLoader`

Specify the path to your PDF file:
`pdf_file_path = "path/to/your/file.pdf"`

Initialize the `PyPDFLoader` with that path:
`loader = PyPDFLoader(pdf_file_path)`

Call the `load` method to read the PDF and return its content as documents:
`docs = loader.load()`

Note: a relative path is resolved against the current working directory, so either place the PDF there or provide its full path.
`PyPDFLoader` depends on the `pypdf` package. If it is not already installed, add a cell to install it:
`!pip install pypdf`




In [3]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://en.wikipedia.org/wiki/Pepsi-Cola_Made_with_Real_Sugar")

docs = loader.load()



# Importing the userdata module from google.colab to access user-specific data

The `google.colab.userdata` module gives access to secrets stored in Google Colab's Secrets panel, such as API keys and other sensitive information.
This keeps credentials out of the notebook's code and outputs.
`from google.colab import userdata`

# Importing the os module to interact with the operating system

The `os` module provides access to operating-system-dependent functionality, such as the file system and the process environment.
Here, it is used to set an environment variable.


# Setting the `NVIDIA_API_KEY` environment variable using userdata

`userdata.get('NVIDIA_API_KEY')` returns the value of the Colab secret named `NVIDIA_API_KEY`.
That value is then stored in the `NVIDIA_API_KEY` environment variable, where the `langchain_nvidia_ai_endpoints` classes look for it when authenticating with NVIDIA's services.


*Note: Ensure that the NVIDIA API key is stored as a Colab secret named `NVIDIA_API_KEY`, with notebook access enabled, for this to work.*
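The cell below only works inside Colab. A slightly more portable variant, sketched here as an assumption rather than part of the original notebook, first reuses an existing environment variable, then tries Colab's `userdata`, and finally prompts for the key with `getpass` so it is not echoed. The function name `resolve_api_key` is hypothetical.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_api_key(name: str = "NVIDIA_API_KEY",
                    prompt: Callable[[str], str] = getpass) -> str:
    """Hypothetical helper: find an API key and export it as an environment variable."""
    key = os.environ.get(name)
    if not key:
        try:
            from google.colab import userdata  # only importable inside Colab
            key = userdata.get(name)
        except Exception:
            # Not running in Colab, or the secret is missing / access was not granted.
            key = None
    if not key:
        key = prompt(f"Enter {name}: ")  # getpass does not echo the typed key
    os.environ[name] = key
    return key
```

Calling `resolve_api_key()` once before creating `NVIDIAEmbeddings` or `ChatNVIDIA` leaves the key in the same environment variable the notebook sets.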


In [4]:
from google.colab import userdata
import os
os.environ['NVIDIA_API_KEY'] = userdata.get('NVIDIA_API_KEY')

# Creating an instance of the `NVIDIAEmbeddings` class

The `NVIDIAEmbeddings` class, imported from the `langchain_nvidia_ai_endpoints` package, is used to generate embeddings from text data.
Embeddings are dense vector representations of text, which can be used for various natural language processing (NLP) tasks such as similarity search, clustering, and classification.
By creating an instance of the NVIDIAEmbeddings class, we can utilize NVIDIA's AI models to generate these embeddings.

*Note: Ensure that the NVIDIA API key is correctly set as an environment variable before creating this instance.
The NVIDIAEmbeddings class will use the API key for authentication with NVIDIA's services.*
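Similarity search compares embedding vectors; cosine similarity is the most common measure (the LangChain FAISS store used below defaults to Euclidean distance, which gives the same ranking when vectors are normalized to unit length). The sketch uses tiny made-up vectors in place of real embeddings, which would come from `embeddings.embed_query(text)` or `embeddings.embed_documents(texts)`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Made-up 3-D vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]            # e.g. a question about sodium content
chunk_nutrition = [0.8, 0.2, 0.1]  # a chunk about nutrition facts
chunk_history = [0.1, 0.2, 0.9]    # a chunk about the drink's history
print(cosine_similarity(query, chunk_nutrition) > cosine_similarity(query, chunk_history))  # True
```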


In [5]:
embeddings = NVIDIAEmbeddings()

# Importing necessary classes from langchain_community and langchain_text_splitters packages

FAISS is a library for efficient similarity search and clustering of dense vectors.
It is particularly useful for tasks involving large-scale information retrieval.

`RecursiveCharacterTextSplitter` is a utility for splitting large texts into smaller chunks.
This is important for processing long documents in natural language processing (NLP) tasks.


# Initializing a text splitter with specified chunk size and overlap

- `chunk_size`: the maximum size of each chunk, measured in characters by default.
- `chunk_overlap`: the number of characters shared by consecutive chunks. This helps preserve context across chunk boundaries.

`text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)`

Splitting the loaded documents into smaller chunks using the text splitter

The `split_documents` method takes the loaded documents (`docs`, from the loader cell above) and splits them into smaller chunks, copying each source document's metadata to its chunks.
This makes long documents easier to embed and search.
`documents = text_splitter.split_documents(docs)`
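`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, line breaks, spaces, then individual characters) so that chunks tend to end at natural boundaries. The sketch below deliberately ignores separators and shows only the simpler sliding-window idea behind `chunk_size` and `chunk_overlap`; `split_with_overlap` is a hypothetical helper, not part of LangChain.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window's overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```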

# Creating a FAISS vector store from the documents and embeddings

The `from_documents` method creates a FAISS index from the provided documents and their corresponding embeddings.
This allows for efficient similarity search and retrieval of relevant documents based on query vectors.
`vector = FAISS.from_documents(documents, embeddings)`

# Creating a retriever from the FAISS vector store

The `as_retriever` method converts the FAISS vector store into a retriever object.
This retriever can be used to perform similarity search and retrieve relevant documents for a given query.
`retriever = vector.as_retriever()`

*Note: Ensure that the embeddings instance is properly created and functional before using it to generate vectors.*
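Conceptually, `FAISS.from_documents` embeds every chunk once and stores the vectors, and `as_retriever()` wraps the store's similarity search in an object with an `invoke(query)` method (LangChain's default is to return the top 4 matches). The toy classes below mimic that flow on plain strings, using a bag-of-words "embedding" and brute-force cosine search. `ToyVectorStore`, `ToyRetriever`, and `toy_embed` are hypothetical stand-ins, not LangChain or FAISS APIs.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence

SparseVector = Dict[str, float]


def toy_embed(text: str) -> SparseVector:
    """Bag-of-words stand-in for an embedding model: word -> weight, L2-normalized."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def dot(a: SparseVector, b: SparseVector) -> float:
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """Hypothetical in-memory store mimicking FAISS.from_documents / as_retriever."""

    def __init__(self, texts: Sequence[str], embed: Callable[[str], SparseVector]):
        self.texts = list(texts)
        self.embed = embed
        self.vectors = [embed(t) for t in self.texts]  # embed every chunk once, up front

    @classmethod
    def from_documents(cls, texts: Sequence[str], embed: Callable[[str], SparseVector]):
        return cls(texts, embed)

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        q = self.embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scores = [dot(q, v) for v in self.vectors]
        ranked = sorted(range(len(self.texts)), key=lambda i: scores[i], reverse=True)
        return [self.texts[i] for i in ranked[:k]]

    def as_retriever(self, k: int = 4) -> "ToyRetriever":
        return ToyRetriever(self, k)


class ToyRetriever:
    def __init__(self, store: ToyVectorStore, k: int):
        self.store, self.k = store, k

    def invoke(self, query: str) -> List[str]:
        return self.store.similarity_search(query, k=self.k)


chunks = [
    "Nutrition facts: sodium, calories and sugar per can.",
    "History: the drink was introduced as a throwback edition.",
    "Packaging: glass bottles and aluminum cans.",
]
retriever = ToyVectorStore.from_documents(chunks, toy_embed).as_retriever(k=1)
print(retriever.invoke("sodium per can"))  # the nutrition chunk
```

The real store differs in the important places: it calls the NVIDIA embedding model instead of counting words, and FAISS performs the vector search.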


In [6]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()

# Importing necessary classes from langchain_core package

`ChatPromptTemplate` is used to define the template for prompts that will be used in chat interactions.
This helps in standardizing the format of prompts that are sent to the chat model.

`StrOutputParser` extracts the text content of the chat model's response message and returns it as a plain string.
This lets later steps work with ordinary strings rather than message objects.


# Creating an instance of the ChatNVIDIA class with a specified model

The `ChatNVIDIA` class, imported from the `langchain_nvidia_ai_endpoints` package, facilitates interaction with NVIDIA's AI-powered chat models.
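The `model` parameter selects the model to use. Here the short name `"mistral_7b"` is passed, which resolves to `mistralai/mistral-7b-instruct-v0.2`, the Mistral 7B Instruct v0.2 model served through NVIDIA's endpoints (as the output of the next cell shows).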
By creating an instance of `ChatNVIDIA`, we can send prompts and receive responses from this model, enabling conversational AI capabilities.


*Note: Ensure that the NVIDIA API key is correctly set as an environment variable before creating this instance.
The ChatNVIDIA class will use the API key for authentication with NVIDIA's services.*


In [7]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


model = ChatNVIDIA(model="mistral_7b")



Displaying the model object shows the full ID of the model in use, `mistral-7b-instruct-v0.2`:



In [8]:
model

ChatNVIDIA(model='mistralai/mistral-7b-instruct-v0.2')

# Defining a hypothetical answer template

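This template instructs the model to write a one-paragraph hypothetical answer to a given question, even if it does not know the full answer.
This is the first step of HyDE (Hypothetical Document Embeddings): the generated answer, not the raw question, is embedded and used as the search query. The idea is that an answer-shaped passage tends to lie closer in embedding space to the real passages containing the answer than a short question does.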


# Creating a ChatPromptTemplate from the defined template

The `from_template` method of `ChatPromptTemplate` is used to create a prompt template object from the provided string template.
This object standardizes the format of prompts that will be sent to the chat model, ensuring consistency in interactions.

# Creating a query transformer pipeline

The query transformer pipeline is a sequence of operations that transforms an input question into a hypothetical answer.
This is done by passing the question through the `hyde_prompt`, then the `model`, and finally parsing the output using `StrOutputParser`.

- `hyde_prompt`: applies the template to the input question.
- `model`: uses the `ChatNVIDIA` instance to generate a response from the templated input.
- `StrOutputParser`: extracts the text of the model's response as a plain string.

*Note: Ensure that all components (`ChatPromptTemplate`, `ChatNVIDIA`, and `StrOutputParser`) are properly imported and functional.*
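The `|` operator is LangChain Expression Language (LCEL) syntax: each component is a runnable, and `a | b` builds a new runnable that feeds `a`'s output into `b`. The sketch below reimplements just that composition rule with a hypothetical `Step` class and a fake model that echoes the question, so the data flow of `hyde_prompt | model | StrOutputParser()` can be traced without an API key. It is not LangChain's implementation.

```python
from types import SimpleNamespace
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) == b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))


hyde_template = """Even if you do not know the full answer, generate a one-paragraph hypothetical answer to the below question:

{question}"""

# Stand-ins for hyde_prompt, ChatNVIDIA and StrOutputParser.
fake_prompt = Step(lambda inputs: hyde_template.format(**inputs))
fake_model = Step(lambda prompt: SimpleNamespace(  # chat models return a message object
    content="HYPOTHETICAL ANSWER to: " + prompt.splitlines()[-1]))
to_str = Step(lambda message: message.content)

fake_hyde_query_transformer = fake_prompt | fake_model | to_str
print(fake_hyde_query_transformer.invoke({"question": "What is the sodium content of Pepsi?"}))
# HYPOTHETICAL ANSWER to: What is the sodium content of Pepsi?
```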


In [9]:
hyde_template = """Even if you do not know the full answer, generate a one-paragraph hypothetical answer to the below question:

{question}"""
hyde_prompt = ChatPromptTemplate.from_template(hyde_template)
hyde_query_transformer = hyde_prompt | model | StrOutputParser()

# Importing the chain decorator from langchain_core.runnables

The `chain` decorator converts a plain Python function into a LangChain runnable, giving it methods such as `invoke` and `stream` and letting it be composed with other runnables.
This makes it possible to wrap a multi-step workflow in a single callable component.

# Defining a chained function for hypothetical document retrieval

The hyde_retriever function takes a question as input and follows these steps:
1. Generates a hypothetical document using the hyde_query_transformer.
2. Uses the retriever to find and return relevant documents based on the hypothetical document.

The `@chain` decorator turns the function into a runnable, so it can be called with `.invoke()` like the other components in this notebook; the two steps inside the function run in order on each call.

```
@chain
def hyde_retriever(question):
```
Generating a hypothetical document using `hyde_query_transformer`

The `invoke` method of `hyde_query_transformer` applies the transformation pipeline to the input question.
This involves formatting the question with `hyde_prompt`, passing it through the model, and parsing the output.
`hypothetical_document = hyde_query_transformer.invoke({"question": question})`

# Using the retriever to find relevant documents
The `invoke` method of `retriever` is then called with the hypothetical document rather than the original question.
The retriever embeds that text and runs a similarity search over the FAISS vector store, returning the most relevant chunks.
`return retriever.invoke(hypothetical_document)`

*Note: Ensure that all components (`hyde_query_transformer` and `retriever`) are properly initialized and functional before defining the chained function.*
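To see why HyDE can help, the sketch below wires a fake "LLM" that returns an answer-shaped passage into a toy word-overlap retriever. The short question shares more words with an irrelevant chunk, while the hypothetical answer shares more with the relevant one. Everything here (`fake_llm`, `word_overlap_search`, `hyde_search`, the chunks) is hypothetical; in the notebook these roles are played by `hyde_query_transformer` (backed by `ChatNVIDIA`) and the FAISS `retriever`.

```python
import re
from typing import Callable, List, Set


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def word_overlap_search(chunks: List[str], query: str, k: int = 1) -> List[str]:
    """Toy retriever: rank chunks by how many distinct words they share with the query."""
    q = words(query)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:k]


def fake_llm(question: str) -> str:
    """Stands in for hyde_query_transformer: an answer-shaped guess (hard-coded here)."""
    return "The soda label lists sodium per can, along with calories."


def hyde_search(chunks: List[str], question: str, generate: Callable[[str], str]) -> List[str]:
    hypothetical_document = generate(question)                # step 1: hypothetical answer
    return word_overlap_search(chunks, hypothetical_document)  # step 2: search with it


chunks = [
    "Nutrition: each can lists sodium and calories on the label.",
    "Marketing: the salty snack pairing campaign ran in stores.",
]
question = "How salty is the soda?"
print(word_overlap_search(chunks, question))    # plain query: the marketing chunk
print(hyde_search(chunks, question, fake_llm))  # hypothetical answer: the nutrition chunk
```

The flip side is that a confidently wrong hypothetical answer steers retrieval toward the wrong chunks, which is one reason the final answer is still generated from the retrieved context rather than from the guess.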


In [10]:
from langchain_core.runnables import chain

@chain
def hyde_retriever(question):
    hypothetical_document = hyde_query_transformer.invoke({"question": question})
    return retriever.invoke(hypothetical_document)


# Defining a template for answering questions based on provided context

This template instructs the model to answer a question using only the provided context.
The template includes placeholders for the context and the question, which will be filled in at runtime.

# Creating a `ChatPromptTemplate` from the defined template

The `from_template` method of `ChatPromptTemplate` creates a prompt template object from the provided string template.
This object standardizes the format of prompts sent to the chat model, ensuring consistency across interactions.
`prompt = ChatPromptTemplate.from_template(template)`

# Creating an answer chain pipeline

The answer chain pipeline is a sequence of operations that transforms an input question and context into an answer.
This is done by passing the formatted input through the prompt, then the model, and finally parsing the output using StrOutputParser.

1. `prompt`: applies the template to the input context and question.
2. `model`: uses the `ChatNVIDIA` instance to generate a response from the formatted input.
3. `StrOutputParser`: extracts the text of the model's response as a plain string.

*Note: Ensure that all components (`ChatPromptTemplate`, `ChatNVIDIA`, and `StrOutputParser`) are properly imported and functional.*
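In this notebook, `final_chain` passes the list of retrieved `Document` objects straight into `{context}`, so the prompt receives the list's Python representation, metadata included. A common alternative is to join just the page text first. The sketch below shows such a helper; `format_docs` and the `Doc` stand-in class are hypothetical (real LangChain `Document` objects expose the same `page_content` attribute).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join chunk texts with blank lines, dropping metadata, for use as {context}."""
    return "\n\n".join(doc.page_content for doc in docs)


template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

docs = [Doc("First chunk.", {"source": "https://example.com"}), Doc("Second chunk.")]
print(template.format(context=format_docs(docs), question="What does the context say?"))
print(template.format(context=docs, question="What does the context say?"))  # repr, metadata and all
```

In the notebook this would mean passing `"context": format_docs(documents)` to `answer_chain.stream` instead of the raw list.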


In [11]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
answer_chain = prompt | model | StrOutputParser()

# Defining a chained function for retrieving and answering questions

The `final_chain` function takes a question as input and follows these steps:
1. Uses `hyde_retriever` to generate a hypothetical document and retrieve relevant documents.
2. Uses `answer_chain` to generate an answer from the retrieved documents and the original question.

The `@chain` decorator wraps the function as a runnable; because the function is a generator, `final_chain.stream(...)` can return the answer incrementally, as the streaming cell below does.

# Retrieve relevant documents using the hyde_retriever

The `invoke` method of `hyde_retriever` is called with the input question. This generates a hypothetical document and retrieves relevant documents based on it.

# Stream answers based on the retrieved documents and the original question

The `stream` method of `answer_chain` generates the answer in a streaming fashion.
It takes a dictionary with the question and the context (the retrieved documents) as input.
The function yields each text chunk as the model produces it, rather than waiting for the complete answer.

*Note: Ensure that all components (`hyde_retriever` and `answer_chain`) are properly initialized and functional before defining the chained function.*
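`final_chain` finishes retrieval first and only then yields answer text chunk by chunk. The stdlib sketch below mimics that shape with hypothetical stand-ins (`fake_retrieve`, `fake_answer_stream`) for `hyde_retriever.invoke` and `answer_chain.stream`; `yield from` is equivalent to the `for s in ...: yield s` loop used in the notebook.

```python
from typing import Iterator, List


def fake_retrieve(question: str) -> List[str]:
    """Stands in for hyde_retriever.invoke: returns context chunks."""
    return ["Chunk about sodium.", "Chunk about sugar."]


def fake_answer_stream(question: str, context: List[str]) -> Iterator[str]:
    """Stands in for answer_chain.stream: yields the answer a few characters at a time."""
    answer = f"Based on {len(context)} chunks: see the nutrition section."
    for i in range(0, len(answer), 8):
        yield answer[i:i + 8]


def final_chain(question: str) -> Iterator[str]:
    documents = fake_retrieve(question)  # runs once, before the first chunk is yielded
    yield from fake_answer_stream(question, documents)


for s in final_chain("what is the value of sodium in Pepsi drink"):
    print(s, end="", flush=True)  # flush so each chunk appears immediately
print()
```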


In [12]:
@chain
def final_chain(question):
    documents = hyde_retriever.invoke(question)
    for s in answer_chain.stream({"question": question, "context": documents}):
        yield s

# Streaming the output from the final_chain function for a specific question

This code block calls the final_chain function with a specific question about the sodium content in Pepsi.
It streams the generated answers and prints them as they are produced.

The `final_chain` function retrieves relevant documents and generates an answer to the input question, using the `hyde_retriever` and `answer_chain` components.

The `stream` method of `final_chain` yields the answer in chunks as they are generated.
Each chunk is printed with `end=""`, so the chunks join into continuous text instead of appearing one per line.

In [13]:
for s in final_chain.stream("what is the value of sodium in Pepsi drink"):
    print(s, end="")

 According to the context provided, the sodium content in a 12 oz (355 mL) serving of Pepsi is 30 mg. However, it's important to note that there might be slight variations in sodium content depending on specific batches or individual bottles. The context does not indicate any difference in sodium content between Pepsi Throwback and Pepsi-Cola Made with Real Sugar.

In [14]:
!pip install gradio



In [15]:
import gradio as gr

# Define a function that uses the final_chain to answer questions
def answer_question(question):
    response = ""
    for s in final_chain.stream(question):
        response += s
    return response

# Create a Gradio interface
iface = gr.Interface(
    fn=answer_question,              # Function to call
    inputs="text",                   # Input type
    outputs="text",                  # Output type
    title="LLM-RAG Question Answering",  # Title of the interface
    description="Ask a question based on the documents loaded in the system and get contextually relevant answers."  # Description
)

# Launch the Gradio interface
iface.launch()



Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://f88361e08b1c53fbc3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
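`answer_question` above collects the whole stream before returning, so the Gradio UI shows nothing until the answer is complete. Gradio also accepts generator functions and updates the output with each yielded value, so a streaming variant can yield the growing response instead. The sketch below shows only that accumulation logic, checked against a fake stream; `answer_question_streaming` and `fake_stream` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


def answer_question_streaming(question: str,
                              stream: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield the cumulative answer after each chunk, so a UI can redraw it in place."""
    response = ""
    for s in stream(question):
        response += s
        yield response  # each yield replaces the text shown in the output box


def fake_stream(question: str) -> Iterator[str]:
    """Hypothetical stand-in for final_chain.stream."""
    yield from ["Sodium ", "is listed ", "on the label."]


for partial in answer_question_streaming("sodium?", fake_stream):
    print(partial)
```

In the notebook, a one-argument generator such as `def answer_question(question): yield from answer_question_streaming(question, final_chain.stream)` could then be passed as `fn=` to `gr.Interface`. Writing it as a `def` containing `yield`, rather than a lambda that returns a generator, keeps it a true generator function.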


