#🏛️Indian Penal Code Chatbot: Your Legal Assistant 🗣️

📜 In this assignment, we will build a chatbot for the Indian Penal Code. We'll start by downloading the official Indian Penal Code document, and then we'll create a chatbot that can interact with it. Users will be able to ask questions about the Indian Penal Code and have a conversation with it. 🤖🗣️ This chatbot will help users understand the legal provisions easily. 🌐 It's a great tool for students, lawyers, and anyone interested in Indian law. 📘 Get ready to explore the IPC like never before! 🚀

#Step-1:Document Loading
https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/IPC/THE_INDIAN_PENAL_CODE.pdf

## Instructions

1. **Download a PDF file using wget:**
   - Import the `wget` module and use it to download the PDF file from the specified URL.

   
```python
# Download the THE_INDIAN_PENAL_CODE.pdf file
!wget https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/IPC/THE_INDIAN_PENAL_CODE.pdf
```

2. **Load the PDF file:**
   - Use the `PyPDFLoader` from `PyPDFLoader` library to load the downloaded PDF file.

3. **Extract text from the PDF:**
   - Loop through the pages of the PDF document and concatenate the text content of each page into a single string.

4. **Print the first 1000 characters of the extracted text:**
   - Use string slicing to print the first 1000 characters of the concatenated text.

5. **Print the number of lines, words, and characters in the extracted text:**
   - Use string methods to split the text by newline characters to count the number of lines.
   - Use string methods to split the text by spaces to count the number of words.
   - Use the `len` function to count the total number of characters in the text.



In [2]:
!pip install wget


Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... [?25l[?25hdone
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9655 sha256=68241e11f5f98e439e09315b4213b0fc3bf17e66e52cab66d70e400187047103
  Stored in directory: /root/.cache/pip/wheels/40/b3/0f/a40dbd1c6861731779f62cc4babcb234387e11d697df70ee97
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


In [6]:
import wget

# Download the Indian Penal Code PDF file
url = "https://raw.githubusercontent.com/venkatareddykonasani/Datasets/master/IPC/THE_INDIAN_PENAL_CODE.pdf"
filename = wget.download(url)
print(f"\nDownloaded file: {filename}")



Downloaded file: THE_INDIAN_PENAL_CODE (1).pdf


In [5]:
!pip install -U langchain-community


Collecting langchain-community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain-community)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB

In [6]:
!pip install -q langchain pypdf


In [7]:
from langchain_community.document_loaders import PyPDFLoader

# Load the IPC PDF
loader = PyPDFLoader("THE_INDIAN_PENAL_CODE.pdf")
pages = loader.load()

# Extract full text
full_text = "\n".join([page.page_content for page in pages])

# Print first 1000 characters
print("First 1000 characters of the IPC text:\n")
print(full_text[:1000])

# Count lines, words, characters
lines = full_text.split('\n')
words = full_text.split()
characters = len(full_text)

print(f"\n Number of lines: {len(lines)}")
print(f" Number of words: {len(words)}")
print(f" Number of characters: {characters}")


First 1000 characters of the IPC text:

1 
 
THE INDIAN PENAL CODE 
___________ 
ARRANGEMENT OF SECTIONS  
__________ 
CHAPTER I  
INTRODUCTION  
PREAMBLE 
SECTIONS 
1. Title and extent of operation of the Code.  
2. Punishment of offences committed within India.  
3. Punishment of offences committed beyond, but which by law may be tried within, India. 
4. Extension of Code to extra-territorial offences. 
5. Certain laws not to be affected by this Act. 
CHAPTER II 
GENERAL EXPLANATIONS 
6. Definitions in the Code to be understood subject to exceptions.  
7. Sense of expression once explained.  
8. Gender. 
9. Number.  
10. “Man”.  “Woman”.  
11. “Person”. 
12.  “Public”.  
13. [Omitted .]. 
14. “Servant of Government”.  
15. [Repealed. ]. 
16. [Repealed .] . 
17. “Government”.  
18. “India”.  
19. “Judge”.  
20. “Court of Justice”.  
21. “Public  servant”.  
22. “Moveable property”.  
23. “Wrongful gain”. 
“Wrongful loss”. 
Gaining wrongfully/ Losing wrongfully. 
24.  “Dishonestly”.  


# Step-2: Split the data into Chunks

## Instructions


1. **Install and Import Necessary Libraries:**
   - Ensure you have the `langchain` library installed. If not, install it using `pip`.
   - Import `RecursiveCharacterTextSplitter` from the `langchain.text_splitter` module.

2. **Initialize the Text Splitter:**
   - Create an instance of `RecursiveCharacterTextSplitter` with a specified `chunk_size` and `chunk_overlap`.

3. **Split the Text into Chunks:**
   - Use the `split_text` method of the text splitter instance to divide the full text into chunks.

4. **Print the Number of Chunks and the First Chunk:**
   - Print the total number of chunks created.
   - Print the content of the first chunk.


In [None]:
#Write your code here

In [8]:
!pip install -U langchain




In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


In [27]:
# Configure the splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,      # Max characters per chunk
    chunk_overlap=20     # Overlap between chunks
)


In [28]:
# Split the full IPC text into chunks
chunks = text_splitter.split_text(full_text)


In [29]:
# Print the total number of chunks and the first chunk
print(f"Total chunks created: {len(chunks)}\n")
print("First chunk:\n")
print(chunks[0])


Total chunks created: 3336

First chunk:

1 
 
THE INDIAN PENAL CODE 
___________ 
ARRANGEMENT OF SECTIONS  
__________ 
CHAPTER I  
INTRODUCTION  
PREAMBLE 
SECTIONS 
1. Title and extent of operation of the Code.


#Step-3: Creating embeddings and Storing in Vector Stores

## Instructions
### 1. Install the Required Library
First, you need to install the `ChromaDB` library, which is essential for creating and managing vector stores. Use the following command to install it:



### 2. Import Necessary Modules
Next, import the necessary modules from `langchain`. You will need `OpenAIEmbeddings` for creating embeddings and `Chroma` for managing vector stores.



### 3. Create Embeddings
Instantiate the `OpenAIEmbeddings` class to create embeddings. This class will be used to generate embeddings for your text data.



### 4. Create and Populate the Vector Store
Use the `Chroma` class to create a vector store from your text data. The `from_texts` method is used to convert your text data (`chunks`) into embeddings and store them in a persistent directory (`IPC_db`).


### 5. Persist the Vector Store
Finally, ensure that the vector store is saved by calling the `persist` method. This will save the data to the specified directory, making it available for future use.




In [None]:
#Write your code here

In [13]:
!pip install -U chromadb langchain cohere


Collecting chromadb
  Downloading chromadb-1.0.11-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting cohere
  Downloading cohere-5.15.0-py3-none-any.whl.metadata (3.4 kB)
Collecting fastapi==0.115.9 (from chromadb)
  Downloading fastapi-0.115.9-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn>=0.18.3 (from uvicorn[standard]>=0.18.3->chromadb)
  Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-4.2.0-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.22.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting opentelemetry-api>=1.2.0 (from chromadb)
  Downloading opentelemetry_api-1.33.1-py3-none-any.whl.metadata (1.6 kB)
Collecting opentelemetry-exporter-otlp-proto-grpc>=1.2.0 (from chromadb)
  Downloading opentelemetry_exporter_otlp_proto_grpc-1.33.1-py3-none-any.whl.metadata 

In [18]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import CohereEmbeddings
import os


In [25]:
os.environ["COHERE_API_KEY"] = ""


In [31]:
# Step 1: Create the Cohere Embedding instance
from langchain.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Step 2: Create the vector store and populate it
vectorstore = Chroma.from_texts(
    texts=chunks,
    embedding=embedding,
    persist_directory="IPC_db"
)


  embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [32]:
vectorstore.persist()
print("Vector store saved successfully to IPC_db")

Vector store saved successfully to IPC_db


  vectorstore.persist()


#Step-4: Conversation and Retrieval Chain

## Instructions
### 1. Set Up the Retriever
First, create a retriever from the `IPC_db` vector store. This retriever will be used to fetch relevant documents based on user queries.

### 2. Define the Query
Create a query string that you want to search in the vector store. This query should be specific to the information you are looking for.

```python
query = """
What is the section related to take part in an unlawful assembly or riot.
"""
```

### 3. Retrieve Relevant Documents
Use the retriever to get documents that are relevant to your query. The `get_relevant_documents` method will return a list of documents matching the query.

```python
result = retriever.get_relevant_documents(query)
```

### 4. Display the Results
Loop through the retrieved documents and print their content. This will display the relevant information from the documents.

### 5. Set Up the Conversational AI
Instantiate the `ChatOpenAI` class to create a language model for conversation. Set the temperature to 0 for deterministic responses.

### 6. Create a Memory Buffer
Create a memory buffer using `ConversationBufferMemory`. This buffer will store the conversation history.

### 7. Set Up the Conversational Retrieval Chain
Use the `ConversationalRetrievalChain` class to create a retrieval-augmented generation (RAG) system. This system will handle the conversation and retrieval tasks.




In [None]:
#Write your code here

In [33]:
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

retriever = Chroma(persist_directory="IPC_db", embedding_function=embedding).as_retriever()


query = "What is the section related to take part in an unlawful assembly or riot."

results = retriever.get_relevant_documents(query)
print("Retrieved Documents:")
for i, doc in enumerate(results):
    print(f"Document {i+1}:\n{doc.page_content}\n{'-'*40}")

  retriever = Chroma(persist_directory="IPC_db", embedding_function=embedding).as_retriever()
  results = retriever.get_relevant_documents(query)


Retrieved Documents:
Document 1:
158. Being hired to take part in an unlawful assembly or riot .—Whoever is engaged, or hired, or
----------------------------------------
Document 2:
unlawful assembly by which such riot was committed was likely to be held, shall not respectively use all
----------------------------------------
Document 3:
146. Rioting .—Whenever force or violence is used by an unlawful assembly, or by any member 
thereof, in prosecution of the common object of such assembly, every member of such assembly is gu ilty
----------------------------------------
Document 4:
such public servant, in endeavouring to disperse an unlawful assembly, or to suppress a riot or affray, or
----------------------------------------


In [34]:
text_gen = pipeline(
    "text-generation",
    model="google/flan-t5-small",  # Lightweight and free model, you can choose others too
    max_length=256,
    do_sample=False
)


config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Device set to use cpu
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The model 'T5ForConditionalGeneration' is not supported for text-generation. Supported models are ['PeftModelForCausalLM', 'AriaTextForCausalLM', 'BambaForCausalLM', 'BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BitNetForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'DeepseekV3ForCausalLM', 'DiffLlamaForCausalLM', 'ElectraForCausalLM', 'Emu3ForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'Gemma3ForConditionalGeneration', '

In [35]:
llm = HuggingFacePipeline(pipeline=text_gen)


  llm = HuggingFacePipeline(pipeline=text_gen)


In [36]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

  memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)


In [37]:
from langchain.chains import ConversationalRetrievalChain

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)


In [38]:
user_question = "Tell me about the punishment under this section."
response = qa_chain.run(user_question)

print("\nAI Response:\n", response)

  response = qa_chain.run(user_question)
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.



AI Response:
 Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Explanation.—The punishment in this section is in addition to the punishment for which the person to

Whoever, commits an offence punishable under sub -section (1) or sub -section (2) of section 376 and in

shall be punished with imprisonment which may extend to 6[three years], or with fine, or with both. 
7[(2) Statements creating or promoting enmity, hatred  or ill -will between classes .—Whoever

sub-section (1) shall be punished with rigorous imprisonment for a term which may extend to three years, 
or with fine, or with both.

Question: Tell me about the punishment under this section.
Helpful Answer: or promoting enmity, hatred or ill -will between classes.—Whoever sub-section (1) shall be punished with rigorous imprisonment for a term which may extend to three years, or with fine, or with both. 7[(2) Sta

#Step-5 : Conversation

### Instructions

1. **Import the Necessary Libraries**
   - Import any libraries required for conversational AI. In this case, we are assuming a function `conversational_RAG` is available for answering legal questions. If the function is part of an external library, ensure that the library is installed and imported.

2. **Create the User Input Loop**
   - Initialize a variable `user_input` to store the message input by the user.
   - Use a `while` loop to continuously prompt the user for input until they type "quit".

3. **Define the Conversational Function**
   - Define or import the function `conversational_RAG` which will take a dictionary with the key "question" and return a dictionary with the key "answer".

4. **Process User Input**
   - Inside the loop, capture user input using `input()`.
   - Pass the user input to the `conversational_RAG` function.
   - Print the response received from the function.

5. **Exit Condition**
   - The loop should terminate when the user types "quit".

6. **Sample Interactions**

```Python
# What is the section related to take part in an unlawful assembly or riot.
# What is the punishment for that
# What is the Punishment for using a false property mark.
# Which section deals with that?
# Which section deals with Counterfeiting currency-notes or bank-notes?
# tell me more about the section and the punishment for Counterfeiting currency-notes
```

In [41]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline  # Or your preferred LLM wrapper

# 1. Define a prompt template that encourages concise answers
prompt_template = """
Use the following pieces of context to answer the question. If you don't know the answer, say "I don't know."

Context:
{context}

Question:
{question}

Answer:
"""

PROMPT = PromptTemplate(input_variables=["context", "question"], template=prompt_template)

# 2. Initialize your LLM pipeline here (example using HuggingFace)
# Replace with your actual LLM or pipeline
llm = HuggingFacePipeline(pipeline=text_gen)

# 3. Setup the ConversationalRetrievalChain with the prompt and memory
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    combine_docs_chain_kwargs={"prompt": PROMPT},
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True)
)

def conversational_RAG(inputs: dict) -> dict:
    """
    Wrapper function to process input question and return an answer.
    """
    question = inputs.get("question", "")
    answer = qa_chain.run(question)
    return {"answer": answer}

def main():
    print("Welcome to the Legal Q&A Chatbot. Type 'quit' to exit.")
    while True:
        user_input = input("\nYour question: ").strip()
        if user_input.lower() == "quit":
            print("Exiting chatbot. Goodbye!")
            break

        response = conversational_RAG({"question": user_input})
        print("\nAnswer:\n", response.get("answer", "Sorry, no answer found."))


if __name__ == "__main__":
    main()


Welcome to the Legal Q&A Chatbot. Type 'quit' to exit.

Your question: What is the section related to taking part in an unlawful assembly or riot?


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.



Answer:
 
Use the following pieces of context to answer the question. If you don't know the answer, say "I don't know."

Context:
158. Being hired to take part in an unlawful assembly or riot .—Whoever is engaged, or hired, or

unlawful assembly by which such riot was committed was likely to be held, shall not respectively use all

146. Rioting .—Whenever force or violence is used by an unlawful assembly, or by any member 
thereof, in prosecution of the common object of such assembly, every member of such assembly is gu ilty

such public servant, in endeavouring to disperse an unlawful assembly, or to suppress a riot or affray, or

Question:
What is the section related to taking part in an unlawful assembly or riot?

Answer:


Your question: What is the punishment for participating in an unlawful assembly?


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.



Answer:
 
Use the following pieces of context to answer the question. If you don't know the answer, say "I don't know."

Context:
146. Rioting .—Whenever force or violence is used by an unlawful assembly, or by any member 
thereof, in prosecution of the common object of such assembly, every member of such assembly is gu ilty

158. Being hired to take part in an unlawful assembly or riot .—Whoever is engaged, or hired, or

unlawful assembly by which such riot was committed was likely to be held, shall not respectively use all

such public servant, in endeavouring to disperse an unlawful assembly, or to suppress a riot or affray, or

Question:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: What is the section related to taking part in an unlawful assembly or riot?
Assistant: 
Use the following pieces of context to answer the question. If you don't know the answer, sa