<a href="https://colab.research.google.com/github/Tiru28/ModernAIPro/blob/main/Q%26A_over_documents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modern AI Pro: Enterprise chat over documents

## 1. Basics

In [None]:
! pip install -q -U langchain-groq langchain langchain-community langchain-text-splitters pypdf gradio chromadb langdetect indic-transliteration


In [None]:
# We will use a simple utility to make the text wrap properly when printing.
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
from google.colab import userdata
import os
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")
from langchain_groq import ChatGroq
llm_groq = ChatGroq(model_name="llama3-70b-8192")

## 2. Load and Manage Data

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


def load_and_chunk_pdf(pdf_url):
    loader = PyPDFLoader(pdf_url)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
    chunks = text_splitter.split_documents(documents)
    print(len(chunks))
    return chunks
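
To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a minimal pure-Python sketch of overlapping fixed-size windows. It is a simplification, not the library's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences. The function name `sliding_window_chunks` is made up for this illustration.

```python
def sliding_window_chunks(text, chunk_size=2000, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = sliding_window_chunks(sample, chunk_size=12, chunk_overlap=4)
for piece in pieces:
    print(len(piece), piece)
```

The last 4 characters of each window repeat at the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.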

In [None]:
import chromadb
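#chroma_client = chromadb.Client()  # in-memory alternative
chroma_client = chromadb.PersistentClient(path="./philosophy")
# get_or_create_collection lets this cell be re-run; create_collection raises an error if the collection already exists.
collection = chroma_client.get_or_create_collection(name="philosophy7")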

def add_chunks_to_vector_db(chunks):
  # chunk_no counts every chunk seen, starting at 1; only every 10th chunk is stored.
  for chunk_no, chunk in enumerate(chunks, start=1):
      if chunk_no % 10 != 0:
        continue # For now, sample only 1 in 10 chunks.
      if chunk_no % 500 == 0:
        print(f"Added {chunk_no} embeddings") # progress marker: chunks scanned so far
      collection.add(
          documents=[chunk.page_content],
          metadatas=[{"source": chunk.metadata["source"], "page_no": chunk.metadata["page"]}],
          ids=[chunk.metadata["source"] + ":" + str(chunk_no)],
      )
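
The function above calls `collection.add` once per chunk. Since `collection.add` accepts lists for `documents`, `metadatas`, and `ids`, one possible variation is to group the sampled chunks into batches first. The sketch below only builds the batches, so it runs without Chroma; `build_batches`, `batch_size`, and `sample_every` are hypothetical names introduced here, and the fake chunks stand in for LangChain documents.

```python
from types import SimpleNamespace


def build_batches(chunks, batch_size=100, sample_every=10):
    """Yield dicts of keyword arguments intended for collection.add(**batch)."""
    batch = {"documents": [], "metadatas": [], "ids": []}
    for chunk_no, chunk in enumerate(chunks, start=1):
        if chunk_no % sample_every != 0:
            continue  # same 1-in-10 sampling as the notebook
        batch["documents"].append(chunk.page_content)
        batch["metadatas"].append(
            {"source": chunk.metadata["source"], "page_no": chunk.metadata["page"]}
        )
        batch["ids"].append(f'{chunk.metadata["source"]}:{chunk_no}')
        if len(batch["ids"]) == batch_size:
            yield batch
            batch = {"documents": [], "metadatas": [], "ids": []}
    if batch["ids"]:
        yield batch  # final, partially filled batch


# Stand-in objects with the two attributes the notebook reads from each chunk.
fake_chunks = [
    SimpleNamespace(page_content=f"text {n}", metadata={"source": "demo.pdf", "page": n // 3})
    for n in range(1, 51)
]
batches = list(build_batches(fake_chunks, batch_size=2))
print([b["ids"] for b in batches])
```

In the notebook this would be used as `for batch in build_batches(chunks): collection.add(**batch)`, assuming the installed Chroma version accepts those three keyword arguments as lists.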


In [None]:
chunks = load_and_chunk_pdf('https://web.archive.org/web/20201224194654id_/http://www.bhagavatgita.ru/files/Bhagavad-gita_As_It_Is.pdf')
add_chunks_to_vector_db(chunks)

1300


/root/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:01<00:00, 67.3MiB/s]


Added 500 embeddings
Added 1000 embeddings


In [None]:
chunks = load_and_chunk_pdf('https://files.alislam.cloud/pdf/Holy-Quran-Arabic.pdf')
add_chunks_to_vector_db(chunks)

0


In [None]:
chunks = load_and_chunk_pdf('https://www.churchofjesuschrist.org/bc/content/shared/content/english/pdf/language-materials/83291_eng.pdf')
add_chunks_to_vector_db(chunks)

812
Added 500 embeddings


In [None]:
chunks = load_and_chunk_pdf('https://cdn.centerforinquiry.org/wp-content/uploads/sites/29/1996/03/22165045/p28.pdf')
add_chunks_to_vector_db(chunks)

28


In [None]:
query_text = "Duty in life"
results = collection.query(query_texts=[query_text],n_results=5)
for result in results["documents"][0]:
    print(result)

Copyright © 1998 The Bhaktivedanta Book Trust Int'l. All Rights Reserved.
thinking that the resultant ac tions will make them happy. They do not know
that no kind of material body anywhere within the universe can give life
without miseries. The miseries of life, namely birth, death, old age and diseases,
are present everywhere within the material world. But o ne who understands
his real constitutional position as the eternal servitor of the Lord, and thus
knows the position of the Personality of Godhead, engages himself in the
transcendental loving service of the Lord. Consequently he becomes qualified
to enter into the Vaikuëöha planets, where there is neither material, miserable
life nor the influence of time and death. To know one’s constitutional position
means to know also the sublime positi on of the Lord. One who wrongly thinks
that the living entity’s position and the Lord’s position are on the same level is
to be understood to be in darkness and therefore unable to engage hims
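
The query above reads `results["documents"][0]` because the result holds one inner list per query text. The following sketch pairs each passage with its metadata and distance. It uses a hand-written `mock_results` dictionary shaped like what `collection.query` returns by default (an assumption worth checking against your Chroma version); `flatten_results` is a name invented for this example.

```python
def flatten_results(results, query_index=0):
    """Turn the nested query result into one flat row per retrieved passage."""
    docs = results["documents"][query_index]
    metas = results["metadatas"][query_index]
    dists = results["distances"][query_index]
    return [
        {"text": doc, "source": meta["source"], "page_no": meta["page_no"], "distance": dist}
        for doc, meta, dist in zip(docs, metas, dists)
    ]


# Mock data for illustration only; real values come from collection.query(...).
mock_results = {
    "ids": [["a.pdf:10", "b.pdf:20"]],
    "documents": [["first passage", "second passage"]],
    "metadatas": [[{"source": "a.pdf", "page_no": 3}, {"source": "b.pdf", "page_no": 7}]],
    "distances": [[0.42, 0.57]],
}
rows = flatten_results(mock_results)
for row in rows:
    print(f'{row["distance"]:.2f}  {row["source"]} p.{row["page_no"]}  {row["text"][:60]}')
```

Smaller distances mean closer matches, so printing them next to the source makes it easier to judge whether a retrieved passage is actually relevant.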

In [None]:
query_text = "Meaning of life"
results = collection.query(query_texts=[query_text],n_results=5,where={"source":"https://www.churchofjesuschrist.org/bc/content/shared/content/english/pdf/language-materials/83291_eng.pdf"})
for result in results["documents"][0]:
    print(result)

1489 PHILIPPIANS 1:21–2:10
expec tation and my a hope, that in 
nothing I shall be ashamed, but that 
with all boldness, as always, so now 
also Christ shall be b magnified in 
my body, whether it be by life, or 
by death.
21
 For to me to live is Christ, and 
to die is gain.
22 But if I live in the flesh, this is 
the fruit of my labour: yet what I 
shall choose I 
a wot not.
23 For I am a in a strait betwixt two, 
having a desire to depart, and to 
be with Christ; which is far better:
24
 Nevertheless to a abide in the 
flesh is more needful for you.
25 And having this confidence, I 
know that I shall abide and continue 
with you all for your furtherance 
and joy of faith;
26
 That your rejoicing may be more 
abundant in Jesus Christ for me by 
my coming to you again.
27
 Only let your conversation be 
as it becometh the gospel of Christ: 
that whether I come and see you, or 
else be absent, I may hear of your 
affairs, that ye 
a stand fast in one 
spirit, with b one c mind d strivi
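
The `where` argument above restricts the search to one source. A small helper can build that filter for one or several sources. This is a sketch: it assumes your Chroma version supports the `$in` operator in metadata filters, and `source_filter` is a hypothetical helper, not part of Chroma.

```python
def source_filter(sources):
    """Build a `where` filter limiting a query to the given source URLs."""
    if isinstance(sources, str):
        sources = [sources]  # allow a single URL passed as a plain string
    sources = list(sources)
    if not sources:
        return None  # no restriction: search the whole collection
    if len(sources) == 1:
        return {"source": sources[0]}
    return {"source": {"$in": sources}}  # assumes `$in` is supported


print(source_filter("a.pdf"))
print(source_filter(["a.pdf", "b.pdf"]))
print(source_filter([]))
```

It would be passed as `collection.query(query_texts=[query_text], n_results=5, where=source_filter(urls))`, where `urls` is whatever list of source URLs you want to search.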

In [None]:
combined_text = str(results['documents'] + results['metadatas'])
combined_text



In [None]:
prompt = """You are a helpful philosophy assistant. You will be given documents and will use them for your analysis. If Sanskrit
or another language appears in transliteration, clean it up and provide the text in the original language (such as Sanskrit, if present)
as well as a cleaned-up English form. If no transliterated foreign-language text is present, continue in English.
Provide a clear analysis, using the same combination of languages as the original text, formatted in proper markdown. At the end, provide citations
from the source documents in a simplified, concise format"""

def rag(query_text):
    results = collection.query(query_texts=[query_text],n_results=5)
    combined_sentence = " ".join(result for result in results["documents"][0])
    query = prompt + ". The context is: " + \
            combined_sentence + " The question is: " + \
            query_text

    return llm_groq.invoke(query).content
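
The prompt asks for citations, but `rag` passes only the passage text to the model, so the model never sees where each passage came from. One way to address this is to label every passage with its source and page before building the query. The helpers below are an illustrative sketch; `build_context` and `build_query` are hypothetical names, and the sample documents are made up.

```python
def build_context(documents, metadatas):
    """Label each retrieved passage with a number, its source, and its page."""
    blocks = []
    for n, (text, meta) in enumerate(zip(documents, metadatas), start=1):
        header = f"[{n}] source: {meta['source']}, page: {meta['page_no']}"
        blocks.append(f"{header}\n{text.strip()}")
    return "\n\n".join(blocks)


def build_query(instructions, context, question):
    """Assemble the final prompt from instructions, labelled context, and the question."""
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"


# Made-up passages and metadata, shaped like results["documents"][0] and results["metadatas"][0].
context = build_context(
    ["  alpha  ", "beta"],
    [{"source": "a.pdf", "page_no": 1}, {"source": "b.pdf", "page_no": 2}],
)
print(build_query("Answer using only the context.", context, "What is alpha?"))
```

Inside `rag`, the context would come from `build_context(results["documents"][0], results["metadatas"][0])`, and the model can then cite passages by their `[n]` labels.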

In [None]:
print(rag("Pillars of life"))

**Analysis**

This document appears to be a collection of passages from various Christian and Hindu scriptures, including the Bible and the Bhagavad Gita. The passages are accompanied by Sanskrit transliterations and English translations.

The Christian passages are from the New Testament, specifically from the First Epistle of Peter and other books. They discuss topics such as salvation, faith, and the nature of God.

The Hindu passages are from the Bhagavad Gita, a Hindu scripture that is part of the Indian epic, the Mahabharata. They discuss topics such as the nature of the self, the importance of sacrifice, and the path to spiritual liberation.

**Sanskrit Passages with English Translations**

1. ** TEXT 31 **
नाया लोकोऽस्त्यजस्य कुतोऽन्यः कुरुसत्तम (nāyā loko'sty ajasya kuto 'nyaḥ kuru-sattama)
Translation: O best of the Kuru dynasty, without sacrifice one can never live happily on this planet.

2. ** TEXT 20 **
परस्तasmात्तु भावोऽन्योऽव्यक्तोऽव्यक्तात्सनातनः (paras tasmāt tu bhāv

## 3. Chat

In [None]:
import gradio as gr
def chat(message, history):
    return rag(message)

demo = gr.ChatInterface(
    fn=chat,
    title="Document chatbot",
    description="This is a chatbot built as part of Modern AI Pro Essentials program",
)
demo.launch(debug=True)



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://0596dda52417f22fa3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
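
The `chat` function above forwards every message straight to `rag`, so an empty message or a failed API call surfaces as an error in the UI. A thin wrapper can handle both cases. This is a sketch under assumptions: `make_chat_fn` is a hypothetical helper, and `fake_rag` stands in for `rag` so the example runs without an API key.

```python
def make_chat_fn(answer_fn):
    """Wrap an answer function in the (message, history) signature used by the chat UI."""
    def chat(message, history):
        if not message or not message.strip():
            return "Please enter a question."
        try:
            return answer_fn(message.strip())
        except Exception as exc:  # report the failure in the chat instead of crashing
            return f"Sorry, something went wrong: {exc}"
    return chat


def fake_rag(question):
    """Stand-in for rag(); raises on one input to demonstrate error handling."""
    if question == "boom":
        raise RuntimeError("backend unavailable")
    return f"Answer to: {question}"


chat = make_chat_fn(fake_rag)
print(chat("Duty in life", []))
print(chat("   ", []))
print(chat("boom", []))
```

In the notebook, `gr.ChatInterface(fn=make_chat_fn(rag), ...)` would replace `fn=chat`; the `history` argument is accepted but unused, as in the original.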
