In [1]:
!pip install nbformat




In [2]:
import nbformat

nb = nbformat.read("Groq_RAG_App_with_Gradio.ipynb", as_version=4)

# Ensure widget metadata has the 'state' key
if "widgets" in nb["metadata"]:
    nb["metadata"]["widgets"].setdefault("state", {})

nbformat.write(nb, "Groq_RAG_App_with_Gradio_fixed.ipynb")


### RETRIEVAL AUGMENTED GENERATION [RAG]



### What is RAG?
One of the most powerful applications enabled by LLMs is the question-answering (Q&A) chatbot: an application that answers questions about specific source information. These applications use retrieval-augmented generation (RAG), a technique for augmenting an LLM's knowledge with additional data, including your own.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data available when they were trained. If you want to build AI applications that can reason about private data or data introduced after a model’s cut-off date, you must augment the model's knowledge with the specific information it needs. Retrieving the appropriate information and inserting it into the model prompt is what RAG does.

LangChain provides several components designed to help build Q&A applications and, more generally, RAG applications.


### RAG architecture
A typical RAG application has two main components:

* **Indexing**: A pipeline for ingesting and indexing data from a source. This usually happens offline.

* **Retrieval and generation**: The actual RAG chain takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.
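The two components above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper names (`index_chunks`, `retrieve`, `build_prompt`) are hypothetical, and word overlap stands in for the embedding-based similarity that a real pipeline would use.

```python
def index_chunks(chunks):
    """Indexing (offline): store each chunk alongside its set of lowercase words."""
    return [(chunk, set(chunk.lower().split())) for chunk in chunks]

def retrieve(index, query, k=1):
    """Retrieval (run time): rank chunks by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(index, key=lambda entry: len(query_words & entry[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(context_chunks, question):
    """Generation input: place the retrieved chunks in front of the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

index = index_chunks(["Holmes plays the violin", "Watson is a doctor"])
top = retrieve(index, "Who plays the violin", k=1)
prompt = build_prompt(top, "Who plays the violin")
print(prompt)
```

The prompt built at the end is what would be sent to the model; the model call itself is left out of this sketch.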

### Importing Required Libraries

In [1]:
!pip install -U langchain-community
!pip install langchain-groq
!pip install wget


In [2]:
# You can use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

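# Loaders, vector stores, and embeddings live in langchain-community (installed above)
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings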
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_groq import ChatGroq

import wget

### Preprocessing

Loading the document

In [3]:
file_name = 'The_Adventures_of_Sherlock_Holmes.txt'
url = 'https://www.gutenberg.org/ebooks/1661.txt.utf-8'

wget.download(url, out=file_name)
print('Book Downloaded')


Book Downloaded


In [4]:
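# Read the book as UTF-8 (the download URL serves UTF-8 text) and preview the start
with open(file_name, 'r', encoding='utf-8') as file:
    content = file.read()
    print(content[:500])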

The Project Gutenberg eBook of The Adventures of Sherlock Holmes
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before usin


### Splitting Document into Chunks

In [5]:
loader = TextLoader(file_name)
document = loader.load()
text_splitter = CharacterTextSplitter(chunk_size = 1000, chunk_overlap = 30)
texts = text_splitter.split_documents(document)
print(len(texts))



687
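To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `CharacterTextSplitter` works internally (that class splits on a separator first and then merges pieces); the function `chunk_text` is a hypothetical helper used only to show how the two parameters interact.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

Each chunk repeats the last character of the previous one, so text that straddles a chunk boundary still appears in at least one chunk together with some of its surrounding context.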


### Embedding and storing
This step covers the embed and store stages of indexing: each chunk is converted to an embedding vector and saved in the Chroma vector store.

In [6]:
!pip install chromadb
embeddings = HuggingFaceEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
print('document ingested')


document ingested
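Once chunks are stored as vectors, retrieval comes down to finding the stored vectors closest to the query vector. The sketch below uses cosine similarity over hand-written three-dimensional vectors as an assumption for illustration; real embedding models produce hundreds of dimensions, and the metric and index structure that Chroma actually uses may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, keyed by a label for the chunk they represent.
store = {
    "chunk about violins": [0.9, 0.1, 0.0],
    "chunk about medicine": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Pick the stored chunk whose vector points in the direction closest to the query.
best = max(store, key=lambda name: cosine_similarity(store[name], query_vector))
print(best)  # chunk about violins
```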


In [7]:
model_id = 'llama-3.1-8b-instant'

In [None]:
llama_llm = ChatGroq(
    api_key = "YOUR_GROQ_API_KEY",
    model_name=model_id,
    temperature=0.5,
    max_tokens=256
)
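Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below is an assumption about how you might structure this: `get_groq_api_key` is a hypothetical helper, and the variable name `GROQ_API_KEY` is a choice made for this example.

```python
import os

def get_groq_api_key(env=os.environ):
    """Return the API key from the environment, failing loudly if it is missing."""
    key = env.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("Set the GROQ_API_KEY environment variable first.")
    return key
```

The returned value could then be passed as `api_key=get_groq_api_key()` in the `ChatGroq(...)` call above, which keeps the secret out of the saved notebook.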


### RetrievalQA

In [9]:
qa = RetrievalQA.from_chain_type(llm = llama_llm,
                                 chain_type = 'stuff',
                                 retriever = docsearch.as_retriever(),
                                 return_source_documents = False
                                 )
query = 'Can you summarize the document?'
qa.invoke(query)

{'query': 'Can you summarize the document?',
 'result': 'The document analyzed appears to be a personality profile or assessment of a person. It lists various aspects of their life and skills, including:\n\n- Philosophy: 0 (implying a lack of interest or knowledge)\n- Astronomy: 0\n- Politics: 0\n- Botany: variable\n- Geology: profound (with a focus on mud-stains within 50 miles of town)\n- Chemistry: eccentric\n- Anatomy: unsystematic\n- Sensational literature and crime records: unique\n- Violin-player: (implying they play the violin)\n- Boxer: (implying they are a skilled boxer)\n- Swordsman: (implying they are skilled with a sword)\n- Lawyer: (implying they have a background or interest in law)\n- Self-poisoner (by cocaine and tobacco): (implying they have a history of substance abuse)\n\nThe profile also mentions that the person is to copy out the first volume of the Encyclopædia Britannica.'}

In [10]:
qa = RetrievalQA.from_chain_type(llm = llama_llm,
                                 chain_type = 'stuff',
                                 retriever = docsearch.as_retriever(),
                                 return_source_documents = False
                                 )
query = 'Give me the important Character names?'
qa.invoke(query)

{'query': 'Give me the important Character names?',
 'result': "The important character names mentioned in the context are:\n\n1. Sherlock Holmes\n2. Lestrade (friend of Holmes and a detective)\n3. Mr. John Turner (a landed proprietor in Herefordshire)\n4. Mr. Charles McCarthy (Mr. Turner's tenant and an ex-Australian)\n5. Mr. Godfrey Norton (a Mr. Turner's male visitor and a lawyer from the Inner Temple)"}

In [11]:
qa = RetrievalQA.from_chain_type(llm = llama_llm,
                                 chain_type = 'stuff',
                                 retriever = docsearch.as_retriever(),
                                 return_source_documents = False
                                 )
query = 'Is there a movie based on the book?'
qa.invoke(query)

{'query': 'Is there a movie based on the book?',
 'result': 'Yes, there is a movie based on the book. The story is from a short story called "The Stone of Evil" but more commonly referred to as \'The Case of the Missing Canary\'  by none other than Sir Arthur Conan Doyle, the creator of Sherlock Holmes, but it was later included in the collection "The Case-Book of Sherlock Holmes"  and was adapted into a film called "The Case of the Missing Canary"'}

As you can see, the query asks about something that is not in the document, and the LLM responds with information that is not true. To prevent this, give the LLM a prompt that tells it to answer only from the retrieved text and to admit when it does not know.


### Using Prompt Template

In [12]:
# 1. Define custom prompt template
prompt_template = '''Use the information from the document to answer the question at the end.
If you don't know the answer,
just say that you don't know; definitely do not try to make up an answer.

{context}

Question: {question}
'''

# 2. Create a PromptTemplate object from the string
prompt = PromptTemplate(template=prompt_template,
                        input_variables=['context', 'question'])  # the 'stuff' chain fills in both variables

# 3. Create the dictionary to pass the custom prompt
chain_type_kwargs = {'prompt': prompt}
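To see what the model actually receives, it can help to fill a template by hand. The sketch below uses plain `str.format` and made-up passages; it assumes the retrieved chunks are joined with blank lines before being placed into `{context}`, which is a simplification of what the `'stuff'` chain does.

```python
template = (
    "Use the information from the document to answer the question at the end.\n"
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)

# Hypothetical retrieved passages standing in for the retriever's results.
retrieved_chunks = ["First retrieved passage.", "Second retrieved passage."]
context = "\n\n".join(retrieved_chunks)

filled = template.format(context=context, question="Is there a movie based on the book?")
print(filled)
```

Because the instruction sits above the retrieved text, the model reads the rule about admitting ignorance before it sees the context or the question.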



In [13]:
# 4. Use the dictionary when initializing your RetrievalQA chain
qa = RetrievalQA.from_chain_type(llm = llama_llm,
                                 chain_type = 'stuff',
                                 retriever = docsearch.as_retriever(),
                                 chain_type_kwargs = chain_type_kwargs,
                                 return_source_documents = False
                                 )
query = 'Is there a movie based on the book?'
qa.invoke(query)

{'query': 'Is there a movie based on the book?',
 'result': 'I don\'t know. The provided text does not mention any information about a movie adaptation of the book. However, I can suggest that the book "The Hound of the Baskervilles" by Sir Arthur Conan Doyle is a famous mystery novel featuring Sherlock Holmes, and it has been adapted into numerous films and other media. But the text provided does not give any specific information about a movie adaptation.'}

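From the answer, you can see that the model now responds with "I don't know" and states that the provided text does not mention a movie adaptation. Note that it still volunteers outside knowledge about "The Hound of the Baskervilles", so the prompt reduces fabricated answers but does not fully restrict the model to the document.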


In [14]:
# --- Gradio Application Functions ---
# These functions will be called by the Gradio interface.
def answer_question(query):
    """
    This function processes the user's query and returns an answer.
    """
    if not query:
        return "Please enter a question."

    try:
        response = qa.invoke({'query': query})
        return response['result']
    except Exception as e:
        return f"An error occurred: {e}"
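The function's three paths (empty input, normal answer, error) can be checked without calling Groq by passing in a stand-in chain. This is a sketch under assumptions: `answer_with`, `StubChain`, and `FailingChain` are hypothetical names, and `answer_with` mirrors `answer_question` except that the chain is a parameter instead of the global `qa`.

```python
class StubChain:
    """Hypothetical stand-in for the RetrievalQA chain; returns a canned result."""
    def invoke(self, inputs):
        return {"query": inputs["query"], "result": f"stub answer to: {inputs['query']}"}

class FailingChain:
    """Hypothetical chain that always raises, to exercise the error path."""
    def invoke(self, inputs):
        raise RuntimeError("service unavailable")

def answer_with(chain, query):
    """Same logic as answer_question, with the chain passed in explicitly."""
    if not query:
        return "Please enter a question."
    try:
        return chain.invoke({"query": query})["result"]
    except Exception as e:
        return f"An error occurred: {e}"

print(answer_with(StubChain(), ""))
print(answer_with(StubChain(), "Who is Watson?"))
print(answer_with(FailingChain(), "Who is Watson?"))
```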

In [16]:
# --- Gradio Interface ---
# Define the Gradio web interface
import gradio as gr
rag_application = gr.Interface(
    fn=answer_question,
    allow_flagging='never',
    inputs=gr.Textbox(label='Input Query', lines=5, placeholder="Enter your question here"),
    outputs=gr.Textbox(label="Answer"),
    title="Groq-powered RAG Chatbot for Sherlock Holmes",
    description="Ask a question and get an answer based on 'The Adventures of Sherlock Holmes'.",
)

if __name__ == "__main__":
    rag_application.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://ace2ba77e635e6aa32.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
