# Project Title: [ End-to-End Document Q&A Chatbot Using Google Gemma Open-Source Models and Groq API ]

- This project demonstrates how to build a scalable, high-performance document Q&A system using Google's open-source Gemma models and Groq's inference technology.

# Objective:

- This project aims to develop a document Q&A chatbot leveraging "Google’s Gemma" open-source language models and "Groq’s" high-speed inferencing engine. 

- The goal is to create an efficient and responsive system that can handle large language model (LLM) tasks in real-time.

# 1. Gemma Models:

"Gemma" is a family of lightweight, open-source models built from the same research and technology used to create "Google's Gemini models".

- Multiple Gemma variants exist, each tailored to a different use case:

- ( a ) Gemma 2: the most recent release at the time of writing.
- ( b ) Gemma 1: the first generation of Gemma models.
- ( c ) RecurrentGemma: a variant designed to improve memory efficiency.
- ( d ) PaliGemma: an open vision-language model.
- ( e ) CodeGemma: the variant to use when we want a model that provides strong code assistance.


- In this project, the [ "Gemma-1" ] model is used for practical implementation.

# 2. Groq Engine:

- "Groq Engine" is an inference platform designed to deliver faster inference than traditional GPUs.
- It uses a Language Processing Unit (LPU), which addresses two GPU limitations for LLM workloads, compute density and memory bandwidth, and thereby significantly speeds up LLM tasks.

# About "GROQ"

- What makes Groq's inference engine so special? Let's read about [ "Groq" ].

- Definition: Groq describes its mission as setting the standard for "GenAI" inference speed, helping real-time AI applications come to life today.

- One problem when working with LLMs is inference speed: how quickly we get a response.

- The "Groq" platform addresses this with what it calls the LPU Inference Engine.

- LPU stands for Language Processing Unit: a new type of end-to-end processing unit system that provides faster inference for computationally intensive applications with a sequential component, such as AI language applications.

- Its main advantage is inference speed: for inference workloads it is much faster than GPUs.

- Question: Why is it so much faster than a GPU for LLMs and GenAI?

- Answer:

- The LPU is designed to overcome the two LLM bottlenecks: compute density and memory bandwidth.

- For LLMs, an LPU has greater compute capacity than a GPU or CPU. This reduces the time needed to calculate each word, so sequences of text are generated much faster. Additionally, according to Groq, eliminating external memory bottlenecks enables the LPU Inference Engine to deliver orders of magnitude better performance on LLMs compared to GPUs.

- That is why the LPU matters. There is also a research paper on it that we can refer to for a deeper dive.

- "Groq" also provides an API, and its platform hosts many popular open-source models that we can call through that API.
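The "time per word" argument can be made concrete with back-of-the-envelope arithmetic, assuming output is measured in tokens and generated at a steady rate: total time is roughly the time to the first token plus the number of tokens divided by tokens per second. The helper `generation_time` and both throughput figures below are hypothetical illustrations, not Groq or GPU benchmarks.

```python
# Illustrative arithmetic only: the throughput figures are hypothetical,
# not measured Groq or GPU numbers.

def generation_time(num_tokens: int, tokens_per_second: float,
                    time_to_first_token: float = 0.0) -> float:
    """Approximate seconds to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + num_tokens / tokens_per_second

answer_tokens = 300  # a medium-length answer
for label, rate in [("slower backend", 30.0), ("faster backend", 300.0)]:
    print(f"{label}: {generation_time(answer_tokens, rate):.1f} s")
```

A tenfold increase in tokens per second cuts the generation part of the latency tenfold, which is why per-token speed dominates how responsive a chatbot feels.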


# 3. Technology Stack:

- [ LangChain ] : Used for setting up the Q&A framework.

- [ FAISS ] : A vector similarity search library from Meta, used as the vector store that indexes the document embeddings and retrieves the chunks most similar to a question.

- [ Streamlit ] : Used for building the interactive chatbot UI.
    
- [ PyPDF ] : For reading and processing PDF documents.
    
- [ Google Generative AI Embeddings ] : Used for embedding text into vectors.
    
- [ Groq Cloud ] : Hosts the Gemma model and runs inference on Groq's LPU engine through the Groq API.

# 4. Project Workflow:

# ( A ) Environment Setup:

- Create a Python virtual environment named [ GEMMA ].

- Install the required libraries listed below from a [ requirements.txt ] file (pip install -r requirements.txt):


- faiss-cpu
- groq
- langchain-groq
- PyPDF2
- langchain-google-genai
- langchain
- streamlit
- langchain-community
- python-dotenv
- pypdf
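A small sanity check can report which of these packages are still missing from the GEMMA environment. This is a sketch under assumptions: `missing_packages` is a hypothetical helper, and the import name mapped to each pip name (for example `faiss` for faiss-cpu, `dotenv` for python-dotenv) is assumed. It relies only on the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found.

```python
import importlib.util

# pip name -> import name (assumed); a few differ from their pip names.
REQUIRED_MODULES = {
    "faiss-cpu": "faiss",
    "groq": "groq",
    "langchain-groq": "langchain_groq",
    "PyPDF2": "PyPDF2",
    "langchain-google-genai": "langchain_google_genai",
    "langchain": "langchain",
    "streamlit": "streamlit",
    "langchain-community": "langchain_community",
    "python-dotenv": "dotenv",
    "pypdf": "pypdf",
}

def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```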

# ( B ) Embedding and Data Ingestion:

- Load PDF documents from a local directory.

- Convert the text from these documents into chunks and embed them using "Google Generative AI Embeddings".

- Store the embedded text in "FAISS", a vector store optimized for "semantic search".
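To show what "semantic search" in a vector store means mechanically, here is a toy sketch: each chunk is turned into a vector once, the question is turned into a vector, and chunks are ranked by cosine similarity. The bag-of-words `embed` function is a deliberately simplified stand-in for Google Generative AI Embeddings (which produce dense neural vectors), the sample chunks are made up, and the brute-force ranking only imitates what FAISS does far more efficiently with its indexes.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real embeddings are dense model outputs.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank stored chunks by cosine similarity to the query (exact, brute force)."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # "store" vectors once
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Made-up sample chunks for illustration
chunks = [
    "health insurance coverage in the united states",
    "population estimates by county",
    "uninsured rate by state in 2022",
]
print(top_k("uninsured rate by state", chunks, k=1))
```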

# ( C ) Q&A System:

- Utilize LangChain to build the document Q&A system.

- Define custom prompt templates to guide the chatbot's responses based on the document context.

- Implement the chatbot using the Groq inferencing engine with the [ Gemma-7b-it model ], which handles the question-answering tasks.
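The "stuff" strategy used by the Q&A chain can be sketched in plain Python: all retrieved chunks are concatenated into the `{context}` slot of a single prompt and the question fills `{input}`. This is a simplified illustration that only builds the prompt text and calls no model; the helper `stuff_prompt` is hypothetical, and the blank-line separator between chunks is an assumption about LangChain's default, not a confirmed detail of `create_stuff_documents_chain`.

```python
# Same wording as the project's prompt template, with a proper closing tag.
TEMPLATE = """Answer the questions based on the provided context only.
Please provide the most accurate response based on the question.
<context>
{context}
</context>
Questions: {input}"""

def stuff_prompt(retrieved_chunks: list[str], question: str,
                 separator: str = "\n\n") -> str:
    """Concatenate ("stuff") every retrieved chunk into one prompt string."""
    context = separator.join(retrieved_chunks)
    return TEMPLATE.format(context=context, input=question)

print(stuff_prompt(["Chunk one text.", "Chunk two text."],
                   "What is health insurance coverage?"))
```

Stuffing is the simplest strategy: it works well when the few retrieved chunks fit comfortably in the model's context window.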

# ( D ) Deployment:

- The chatbot is deployed using Streamlit, providing an interactive interface where users enter questions and receive answers based on the provided documents.

# 5. Key Features:

- Fast Inference : The project leverages Groq's LPU engine for faster inference times, making the Q&A system highly responsive.

- Scalable Models : The integration of Gemma models allows for flexibility in scaling up to larger models ( For Example : Gemma-7b) for more complex tasks.

- Document-Based Q&A : The system handles documents efficiently, retrieves the relevant information, and answers user queries based only on that retrieved context.

# Below is the complete code, followed by a detailed line-by-line explanation

In [None]:
import os
import time

import streamlit as st
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings

load_dotenv()

## Load the GROQ and Google API keys from the .env file
groq_api_key = os.getenv("GROQ_API_KEY")
google_api_key = os.getenv("GOOGLE_API_KEY")
if not groq_api_key or not google_api_key:
    st.error("Set GROQ_API_KEY and GOOGLE_API_KEY in the .env file.")
    st.stop()
os.environ["GOOGLE_API_KEY"] = google_api_key  # read by GoogleGenerativeAIEmbeddings

st.title("Google_Gemma_Model : Document_Q&A")

# Initialize the ChatGroq model (Gemma 7B, instruction-tuned, served by Groq)
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Gemma-7b-it")

# Define the prompt template
prompt = ChatPromptTemplate.from_template(
    """
    Answer the questions based on the provided context only.
    Please provide the most accurate response based on the question.
    <context>
    {context}
    </context>
    Questions: {input}
    """
)

# Build the vector store once and keep it in session state across reruns
def vector_embedding():
    if "vectors" not in st.session_state:
        st.session_state.embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
        st.session_state.loader = PyPDFDirectoryLoader("./us_census")  ## Data ingestion
        st.session_state.docs = st.session_state.loader.load()  ## Document loading
        st.session_state.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  ## Chunk creation
        st.session_state.final_documents = st.session_state.text_splitter.split_documents(st.session_state.docs[:20])  # split the first 20 documents
        st.session_state.vectors = FAISS.from_documents(st.session_state.final_documents, st.session_state.embeddings)  # Google embeddings stored in FAISS

# Input for the user's question
prompt1 = st.text_input("Enter Your Question From Documents")

# Button to initialize document embeddings
if st.button("Initialize Document Embeddings"):
    vector_embedding()
    st.success("Vector Store DB is Ready")

if prompt1:
    # The vector store only exists after the button above has been clicked
    if "vectors" not in st.session_state:
        st.warning("Click 'Initialize Document Embeddings' first.")
        st.stop()
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = st.session_state.vectors.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    start = time.perf_counter()  # wall-clock timer, includes waiting for Groq
    response = retrieval_chain.invoke({"input": prompt1})
    print("Response time :", time.perf_counter() - start)
    st.write(response["answer"])

    # With a Streamlit expander, show the retrieved chunks used as context
    with st.expander("Document Similarity Search"):
        for i, doc in enumerate(response["context"]):
            st.write(doc.page_content)
            st.write("--------------------------------")

In [None]:
import os
import streamlit as st

In [89]:
# Since we are using "Groq", we import ChatGroq, LangChain's chat model wrapper for the Groq API, so that we can build the chatbot.

In [None]:
from langchain_groq import ChatGroq 

In [91]:
# Once the documents are loaded, we need to split them into chunks. For that we import
# [ RecursiveCharacterTextSplitter ] from ( langchain.text_splitter ).

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter 

In [None]:
# Along with this, we import [ create_stuff_documents_chain ] from [ langchain.chains.combine_documents ].
# This chain takes the relevant documents retrieved for a question, "stuffs" them into the prompt as context,
# and sends that prompt to the LLM. It is what sets up the context for document Q&A.

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain

In [None]:
# We also import [ ChatPromptTemplate ] from [ langchain_core.prompts ] so that we can create our own custom prompt template.
# We also import [ create_retrieval_chain ] from [ langchain.chains ], which later connects the retriever (the vector store) with the document chain.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain

In [None]:
# Next, we import the vector store DB that we are going to use:
# [ FAISS ] from [ langchain_community.vectorstores ].

# "FAISS" is a library created by Meta for efficient similarity search over vectors. Here it stores the chunk vectors and
# performs similarity (semantic) search to return the chunks most relevant to the question we ask.

In [None]:
from langchain_community.vectorstores import FAISS

In [None]:
# Next, we import [ PyPDFDirectoryLoader ] from [ langchain_community.document_loaders ], because our main aim is to read
# all the PDF files in a folder. The loaded documents are then divided into chunks with the RecursiveCharacterTextSplitter.

In [None]:
from langchain_community.document_loaders import PyPDFDirectoryLoader


In [4]:
# Finally, we need an embedding technique. For that we import [ GoogleGenerativeAIEmbeddings ] from
# [ langchain_google_genai ], Google's GenAI embedding integration. The author uses it for free with just a "GOOGLE_API_KEY"
# (subject to Google's usage limits).
# [ GoogleGenerativeAIEmbeddings ] converts each chunk of text into a vector.
# This is our vector embedding technique.

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

In [6]:
# Next : Step code 2 :

In [None]:
# Now we import load_dotenv from dotenv.
# We use this library so that we can load all our environment variables from the .env file.

In [None]:
from dotenv import load_dotenv
load_dotenv()

In [None]:
# Next : Step code 3 :  

In [None]:
# Now, load the GROQ and Google API keys from the [ .env ] file (environment variables).
# To load the Groq key we write [ groq_api_key = os.getenv("GROQ_API_KEY") ].
# (I am using Amazon Q as my code assistant here, so I get these suggestions automatically.)
# Both environment variables below are defined in my [ .env ] file.

In [None]:
groq_api_key = os.getenv("GROQ_API_KEY")
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")
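If either key is missing from the .env file, `os.getenv` returns `None`, and assigning `None` to `os.environ[...]` raises a `TypeError` whose message does not mention the .env file. A fail-fast check gives a clearer error. This is a minimal sketch; `require_env` is a hypothetical helper, not part of python-dotenv or LangChain.

```python
import os

def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)} (check the .env file)"
        )
    return values

# Possible usage in the app, after load_dotenv():
# keys = require_env("GROQ_API_KEY", "GOOGLE_API_KEY")
# groq_api_key = keys["GROQ_API_KEY"]
```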

In [None]:
# Next : Step code 4 :  

In [None]:
# Let's set the app title with st.title("Google_Gemma_Model : Document_Q&A"). The model behind this app is served by "Groq".

In [None]:
st.title("Google_Gemma_Model : Document_Q&A")

In [None]:
# Here we create the LLM. We use "ChatGroq", passing our GROQ_API_KEY and the model_name,
# which is [ Gemma-7b-it ].
# The GROQ_API_KEY is what lets us call this model through the Groq API.

In [None]:
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Gemma-7b-it")

In [None]:
# Next : Step code 5 :  

In [None]:
# Now we set up the prompt template. For that we use [ ChatPromptTemplate.from_template ]
# and write the prompt as:

# """
# Answer the questions based on the provided context only.
# Please provide the most accurate response based on the question.
# <context>
# {context}    # filled with the retrieved document chunks
# </context>
# Questions: {input}   # filled with the user's question
# """
# With both pieces of information, the LLM answers the question from the documents only.


In [None]:
prompt = ChatPromptTemplate.from_template(
"""
Answer the questions based on the provided context only.
Please provide the most accurate response based on the question.
<context>
{context}
</context>
Questions: {input}
"""
)

In [None]:
# Next : Step code 6 

In [None]:
# Now we create a function called vector_embedding. It does the following:
# - reads all the documents from our PDF files,
# - splits them into chunks and applies embeddings ("Google Generative AI Embeddings"),
# - stores the resulting vectors in a vector store DB, "FAISS".
# We keep the vector store DB in Streamlit's session state so that we can use it wherever it is required.
# First, we need some PDFs to load. I create a new folder named [ us_census ],
# put 4 PDF files inside it, and save it in the project directory,
# where all my code and files are stored.
# This is the directory on my system: [ C:\Users\Public\Music\GEMMA> ]

# We use these 4 PDFs in this project.
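Why the `"vectors" not in st.session_state` guard matters: Streamlit re-executes the whole script on every widget interaction, while `st.session_state` persists for the browser session. A plain dict can stand in for session state to show the effect. This is a simplified sketch: `build_vectors` is a hypothetical placeholder for the expensive load, split, embed and index work.

```python
calls = {"build": 0}

def build_vectors() -> str:
    # Placeholder for loading PDFs, splitting, embedding and FAISS.from_documents(...)
    calls["build"] += 1
    return "faiss-index"

def vector_embedding(session_state: dict) -> None:
    # Same guard as the app: only build when no vector store is cached yet
    if "vectors" not in session_state:
        session_state["vectors"] = build_vectors()

session_state: dict = {}
for _ in range(3):  # simulate three script reruns in one session
    vector_embedding(session_state)
print(calls["build"])  # -> 1
```

Without the guard, every rerun (including each new question) would re-embed all the chunks and spend embedding API calls again.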

In [None]:
def vector_embedding():
    ...  # the body is built up line by line in the cells below

In [None]:
# First, the vector store DB will be kept in a session-state variable,
# so let's write the check for it.

- Next : Line 1 -

In [None]:
# Since we keep everything in session state, we check whether "vectors" is already in st.session_state; if it is, we skip rebuilding it.

In [None]:
if "vectors" not in st.session_state:
    ...  # build the embeddings and the vector store (next lines)

In [None]:
# To keep the code simple, everything is saved in session state under separate variables.

# If "vectors" is not in the session state, the first thing we do is define our embeddings
# in st.session_state.embeddings. For that we use [ GoogleGenerativeAIEmbeddings ] with the model
# [ models/embedding-001 ], one of the embedding models Google provides.

# The embeddings object is stored in session state under the name "embeddings".

In [None]:
st.session_state.embeddings=GoogleGenerativeAIEmbeddings(model = "models/embedding-001") 

- Next : Line 2 -

In [None]:
# Next we write st.session_state.loader, using PyPDFDirectoryLoader with the folder that holds all 4 PDF files,
# [ us_census ].
# The loader is stored in session state under this variable name.
# This is our "Data Ingestion" step.

In [None]:
st.session_state.loader=PyPDFDirectoryLoader("./us_census") 

- Next : Line 3 -

In [None]:
# Next, st.session_state.docs: we create another variable, docs, and assign it st.session_state.loader.load().
# loader.load() reads all the PDFs and returns their content as a list of documents (one document per PDF page).

In [None]:
st.session_state.docs=st.session_state.loader.load()

- Next: Line 4 –

In [None]:
# After loader.load(), the variable docs holds all the documents.
# Next we create st.session_state.text_splitter using [ RecursiveCharacterTextSplitter ],
# which splits these documents into chunks, here with [ chunk_size=1000, chunk_overlap=200 ].
# In short, RecursiveCharacterTextSplitter splits the documents into chunks of up to about chunk_size characters,
# and chunk_overlap is the number of characters consecutive chunks share, so text cut at a chunk boundary
# still appears together with its surrounding context.
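To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is an illustration under stated assumptions, not how RecursiveCharacterTextSplitter works internally: the real splitter also prefers to break on separators such as paragraphs, lines and spaces. `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks

print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last 2 characters of the previous one; with the app's settings, each chunk would repeat up to 200 characters of its neighbour.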

In [None]:
st.session_state.text_splitter=RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)

In [None]:
# Finally, we create st.session_state.final_documents using
# [ st.session_state.text_splitter.split_documents ], applied to [ st.session_state.docs[:20] ].
# The slice [:20] keeps only the first 20 loaded documents (the first 20 pages), presumably to keep
# the embedding step quick.
# We save all of this in session state so that we can use it wherever it is required.

In [None]:
st.session_state.final_documents=st.session_state.text_splitter.split_documents(st.session_state.docs[:20])

- Next : Line 5 – 

In [None]:
# After that, we create our vector store with the Google embeddings:
# st.session_state.vectors = FAISS.from_documents(...), passing [ st.session_state.final_documents ]
# and the same embedding technique we initialized above, [ st.session_state.embeddings ]
# (Google Generative AI Embeddings).
# The embeddings convert all the final documents into vectors, and
# "FAISS" stores those vectors and indexes them for similarity search.
# This function performs all of these steps.

In [None]:
st.session_state.vectors=FAISS.from_documents(st.session_state.final_documents,st.session_state.embeddings) 

- Next : Step code 7 -

In [None]:
# After the function, we create the input field:
# prompt1 = st.text_input("Enter Your Question From Documents").
# This is where we type the question we want to ask about the documents.

In [None]:
prompt1 = st.text_input("Enter Your Question From Documents") 

In [None]:
# Here we create a button, "Initialize Document Embeddings".
# Clicking it runs the entire vector embedding process,
# because it calls vector_embedding().
# When the embeddings are created, we show the message "Vector Store DB is Ready",
# since every question is answered by querying this vector store.
# The vector store (st.session_state.vectors) must exist before we ask a question,
# so we click this button first, and it creates the vector embeddings automatically.

In [None]:
if st.button("Initialize Document Embeddings"):
    vector_embedding()
    st.success("Vector Store DB is Ready")   

- Next : Step code 8 -

In [None]:
# Here, let's also measure the response "time".

In [None]:
import time

if prompt1:
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = st.session_state.vectors.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    start = time.perf_counter()  # wall-clock timer; process_time() would ignore time spent waiting on Groq
    response = retrieval_chain.invoke({'input': prompt1})
    print("Response time :", time.perf_counter() - start)
    st.write(response['answer'])

In [None]:
# I am claiming that Groq's LPU is really fast,
# so let's record the "time" as well; that will help us see why it matters.
# So, if prompt1:
# when we type a question and press Enter, we take that input and create our document chain.
# For that we use [ create_stuff_documents_chain ],
# which takes 2 parameters, (llm, prompt): the LLM, which here is [ Gemma-7b-it ],
# and the prompt template [ prompt ]. Both are already defined.
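Which clock to use matters here: `time.process_time()` counts only the CPU time of the current process, so time spent waiting for Groq's response over the network is largely not counted, while `time.perf_counter()` measures elapsed wall-clock time. In the sketch below, `time.sleep` is an assumed stand-in for waiting on the API, and `measure` is a hypothetical helper.

```python
import time

def measure(seconds_waiting: float) -> tuple[float, float]:
    """Time a call that mostly waits (like a network request) with two clocks."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    time.sleep(seconds_waiting)  # stands in for waiting on the Groq API response
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

wall, cpu = measure(0.2)
print(f"perf_counter: {wall:.2f} s, process_time: {cpu:.2f} s")
```

The wall-clock figure reflects the waiting time, while the CPU-time figure stays close to zero, so `perf_counter()` is the appropriate clock for reporting the chatbot's response time.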

In [None]:
import time

if prompt1:
    document_chain = create_stuff_documents_chain(llm, prompt)  

- Next : Step 9 -

In [None]:
# Now we create a [ retriever ]: [ st.session_state.vectors.as_retriever() ].
# What does this retriever do?
# st.session_state.vectors is our vector database. as_retriever() wraps it in an interface
# that takes a question and returns the most similar document chunks from the database,
# which are then used to produce the answer for the end user. That is why we use the "Retriever".

In [None]:
retriever = st.session_state.vectors.as_retriever()

- Next : Step 10 -

In [None]:
# After creating the retriever, we need to run it as a chain: the retriever and the document chain
# must be combined. So we create a retrieval_chain with create_retrieval_chain(retriever, document_chain).

In [None]:
retrieval_chain = create_retrieval_chain(retriever, document_chain) 

- Next : Step 11 -

In [None]:
# Here we start the timer with [ time.perf_counter() ], which measures elapsed (wall-clock) time,
# including the time spent waiting for Groq's response.

In [None]:
start = time.perf_counter()

In [None]:
# Then we get the response by calling invoke on the "retrieval_chain".
# We pass a dictionary, ({'input': prompt1}), so the key 'input' holds our question, prompt1.
# Whatever question we type is sent through the entire chain (retrieval, then the LLM),
# and we get back the response.

In [None]:
response = retrieval_chain.invoke({'input': prompt1})

- Next : Step 12 -

In [None]:
# Once we have the response, we display the answer in the Streamlit app.
# Note:
# Besides the answer, the response returned by the retrieval chain also contains a "context" key holding
# the document chunks that were retrieved and passed to the Gemma model. We display that context in the next step.

In [None]:
print("Response time :", time.perf_counter() - start)
st.write(response['answer'])

# Next : Step 13 : [ Streamlit ]

In [None]:
# Below is the final Streamlit code for displaying the retrieved context, explained step by step:
# Using a Streamlit expander

In [None]:
with st.expander("Document Similarity Search"):
    for i, doc in enumerate(response["context"]):
        st.write(doc.page_content)
        st.write("--------------------------------")

In [None]:
# This line creates an expander in the Streamlit app. An expander is a collapsible container that the user can expand or
# collapse. It starts collapsed, showing the heading "Document Similarity Search", and the user clicks it to expand it.
# Streamlit context: everything written inside the with block is placed inside the expander. The code still runs on every
# rerun, whether or not the expander is open; the expander only controls whether its content is shown.
# This is similar to how we use st.sidebar or st.form as containers.

In [None]:
with st.expander("Document Similarity Search"):
    ...  # the expander's contents are explained in the next cells

- Next : Line 14 - 

In [None]:
# This line is a for loop over the elements of response["context"].
# enumerate: adds a counter to the iteration, so the variable i holds the index (starting from 0) of each doc in
# the response["context"] list.
# response["context"]: the list of document chunks retrieved by the similarity search for our question.
# Each doc is one retrieved chunk, and doc.page_content holds its text, which we display.

# We use enumerate because it yields 2 values per item: the index i and the document doc.
# (In this code i is not used; it could be used to number the chunks.)
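Since the index from `enumerate` is unused in the app, here is one way it could be used: numbering the retrieved chunks. This sketch assumes a minimal `Doc` dataclass as a hypothetical stand-in for LangChain's Document (only `page_content` is modelled), and uses `print` where the app would call `st.write`.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document; only page_content is modelled.
    page_content: str

def render_chunks(context: list[Doc]) -> list[str]:
    """Label each retrieved chunk with the counter that enumerate provides."""
    lines = []
    for i, doc in enumerate(context, start=1):  # start=1 gives human-friendly numbering
        lines.append(f"Chunk {i}: {doc.page_content}")
    return lines

for line in render_chunks([Doc("first retrieved text"), Doc("second retrieved text")]):
    print(line)
```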

In [None]:
for i, doc in enumerate(response["context"]):
    ...  # the loop body is explained in the next cells

- Next : Line 15 -

In [None]:
# This line writes the content of each document (doc.page_content) to the Streamlit app. The st.write function can display text, markdown, or
# even more complex objects.
# doc.page_content: holds the actual text of the retrieved chunk being displayed.

In [None]:
st.write(doc.page_content)

- Next : Line 16 -

In [None]:
# This line writes a visual separator (a row of dashes) between the different documents being displayed. This helps to visually distinguish between different chunks or
# documents in the app.
# Why a separator? After each document's content is displayed, a line of dashes is added to separate it from the next document,
# making the output easier to read and understand.

In [None]:
st.write("--------------------------------") 

# Terminal

- Next : Line 17 - [ Go to the "Terminal" section to see the final outcome ]

In [None]:
# Finally, let's run the entire app and see the final outcome in "Streamlit".

In [None]:
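# Shell command: run it in a terminal from the project folder.
# Inside a Jupyter cell, the leading "!" sends it to the shell.
!streamlit run app.py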

# Project_Deployed_on_Streamlit

In [None]:
# When we run this command, the app opens in the browser, and we can see that it has been deployed successfully.

- After successfully deploying the project, I searched for 2 titles from a single PDF file. Overall, I have 4 PDF files in my directory.

- I searched for these 2 titles:

1. [ WHAT IS HEALTH INSURANCE COVERAGE ]
2. [ WHAT IS DIFFERENCES IN THE UNINSURED RATE BY STATE IN 2022 ]