### Building a Basic RAG System for Law

In [126]:
from langchain_community.document_loaders import PyPDFLoader # For loading pdf
from langchain.text_splitter import RecursiveCharacterTextSplitter # For chunking
from langchain.chains.question_answering import load_qa_chain
from langchain_pinecone import PineconeVectorStore
from langchain_community.embeddings import OllamaEmbeddings
from langchain_groq import ChatGroq # LLM inference through Groq
from langchain_core.prompts import ChatPromptTemplate # Prompt templates
from langchain.chains.combine_documents import create_stuff_documents_chain # "Stuffs" retrieved documents into the prompt
from langchain.chains import create_retrieval_chain # Connects the retriever to the documents chain
import os # Reading environment variables

In [45]:
# PineconeVectorStore reads PINECONE_API_KEY from the environment; fail early if it is not set
assert os.getenv('PINECONE_API_KEY'), "Set the PINECONE_API_KEY environment variable first"

Defining the prompt

In [111]:
prompt = ChatPromptTemplate.from_template("""
Answer the questions based only on the provided context.
Give a precise answer.
<context>
{context}
</context>
Question: {input}
""")

Loading the law sections data (the PDF that answers will be retrieved from)

In [96]:
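lawdata = PyPDFLoader(r"R:\Projects\Langchain\lawsections\LawSections.pdf") # Raw string so the backslashes are not read as escape sequences
loader = lawdata.load() # Returns a list with one Document per PDF page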

In [97]:
loader

[Document(page_content='1 \n THE INDIAN PENAL CODE  \n___________  \nARRANGEMENT OF SECTIONS  \n__________  \nCHAPTER I  \nINTRODUCTION  \nPREAMBLE  \nSECTIONS  \n1. Title and extent of operation of the Code.  \n2. Punishment of offences committed within India.  \n3. Punishment of offences committed beyond, but which by law may be tried within, India.  \n4. Extension of Code to extra -territorial offences.  \n5. Certain laws not to be affected by this Act.  \nCHAPTER II  \nGENERAL  EXPLANATIONS  \n6. Definitions in the Code to be understood subject to exceptions.  \n7. Sense of expression once explained.  \n8. Gender.  \n9. Number.  \n10. “Man”.  “Woman”.  \n11. “Person”.  \n12.  “Public”.  \n13. [Omitted .]. \n14. “Servant of Government”.  \n15. [Repealed. ]. \n16. [Repealed .]. \n17. “Government”.  \n18. “India”.  \n19. “Judge”.  \n20. “Court of Justice”.  \n21. “Public  servant”.  \n22. “Moveable property”.  \n23. “Wrongful gain”.  \n“Wrongful loss”.  \nGainin g wrongfully/ Losing w

In [98]:
type(loader)

list

In [99]:
len(loader)

112

Performing chunking

In [100]:
chunkdata = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=0) # Up to 4000 characters per chunk, no overlap between chunks

In [101]:
splitdata = chunkdata.split_documents(loader)

In [52]:
splitdata[:5]

[Document(page_content='1 \n THE INDIAN PENAL CODE  \n___________  \nARRANGEMENT OF SECTIONS  \n__________  \nCHAPTER I  \nINTRODUCTION  \nPREAMBLE  \nSECTIONS  \n1. Title and extent of operation of the Code.  \n2. Punishment of offences committed within India.  \n3. Punishment of offences committed beyond, but which by law may be tried within, India.  \n4. Extension of Code to extra -territorial offences.  \n5. Certain laws not to be affected by this Act.  \nCHAPTER II  \nGENERAL  EXPLANATIONS  \n6. Definitions in the Code to be understood subject to exceptions.  \n7. Sense of expression once explained.  \n8. Gender.  \n9. Number.  \n10. “Man”.  “Woman”.  \n11. “Person”.  \n12.  “Public”.  \n13. [Omitted .]. \n14. “Servant of Government”.  \n15. [Repealed. ]. \n16. [Repealed .]. \n17. “Government”.  \n18. “India”.  \n19. “Judge”.  \n20. “Court of Justice”.  \n21. “Public  servant”.  \n22. “Moveable property”.  \n23. “Wrongful gain”.  \n“Wrongful loss”.  \nGainin g wrongfully/ Losing w

In [53]:
len(splitdata)

191
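RecursiveCharacterTextSplitter splits each page on the coarsest separator it can (paragraph breaks, then line breaks, then spaces) and only falls back to finer separators when a piece is still longer than `chunk_size`. Each of the 112 pages is split on its own, so a page under 4000 characters stays a single chunk while a longer page becomes several, which is why the chunk count (191) is larger than the page count (112). The function below is a simplified sketch of that idea, not LangChain's actual implementation: `recursive_split` is a hypothetical helper, and it ignores `chunk_overlap` (0 in this notebook).

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Hypothetical, simplified recursive splitter (no overlap).

    Splits on the first (coarsest) separator, merges neighbouring pieces
    while they fit in chunk_size, and recurses with finer separators for
    any piece that is still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # Keep growing the current chunk
            continue
        if current:
            chunks.append(current)  # Current chunk is full; emit it
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


page = "Section 1. Title.\n\nSection 2. Punishment of offences committed within India."
for chunk in recursive_split(page, chunk_size=40):
    print(repr(chunk))
```

The paragraph break is used first; only the second section, which is longer than 40 characters on its own, gets split further on spaces.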

Pinecone index name

In [54]:
index_name = "langchain5"

Vector Store

In [87]:
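# Embed the chunks with mxbai-embed-large and store them in Pinecone; note that only the first 15 of the 191 chunks are indexed here
vector = PineconeVectorStore.from_documents(splitdata[:15], OllamaEmbeddings(model="mxbai-embed-large"), index_name=index_name)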

Instantiating the LLM

In [31]:
groqllm = ChatGroq(model="Llama3-8B-8192", temperature=1) # Llama 3 8B served by Groq; temperature=1 allows fairly varied answers

In [108]:
query = "I want to live peacefully, what I can do"

In [109]:
db = vector.similarity_search(query)

In [110]:
db

[Document(page_content="13 \n CHAPTER XX  \nOF OFFENCES  RELATINGTO  MARRIAGE  \nSECTIONS  \n493. Cohabitation caused by a man deceitfully inducing a belief of lawful marriage.  \n494. Marrying again during life -time of husband or wife.  \n495. Same offence with concealment of former marriage from person with whom subsequent marriage is contracted.  \n496. Marriage ceremony fraudulently gone through without lawful marriage.  \n497. Adultery.  \n498. Enticing or taking away or detaining with criminal intent a married woman.  \n CHAPTER XXA  \nOF CRUELTY BY  HUSBAND OR  RELATIVES OF  HUSBAND  \n498A. Husband or relative of husband of a woman subjecting her to cruelty.  \nCHAPTER XXI  \nOF DEFAMATION  \n499. Defamation.  \nImputation of truth which public good requires to be made or published.  \nPublic conduct of public servants.  \nConduct of any person touching any public question.  \nPublication of reports of proceedings of Courts.  \nMerits of case decided in Court or conduct of wit

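This is only a raw similarity search: it returns the chunks whose embeddings are closest to the query (4 by default) but does not generate an answer, and for this loosely worded query the top match (the chapter outline on marriage and defamation) is not very useful. To make it more effective, we can pass the retrieved chunks to the LLM by combining a chain with a retriever.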
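Under the hood, `similarity_search` embeds the query with the same embedding model used for the chunks and asks Pinecone for the k nearest stored vectors. The sketch below shows that ranking step in plain Python using cosine similarity. The three-dimensional vectors are made-up stand-ins for real `mxbai-embed-large` embeddings, and cosine is assumed as the index metric (Pinecone lets you choose the metric when the index is created).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k=4):
    """Return the k (text, score) pairs from store most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings, not real model output
store = [
    ("498A. Husband or relative of husband subjecting a woman to cruelty.", [0.9, 0.1, 0.0]),
    ("499. Defamation.", [0.1, 0.9, 0.0]),
    ("1. Title and extent of operation of the Code.", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # Pretend embedding of a question about cruelty
for text, score in top_k(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```

Ranking only finds the closest chunks; it says nothing about whether they actually answer the question, which is the gap the LLM step fills.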

Chain and Retrieval

In [113]:
docs_chain = create_stuff_documents_chain(groqllm,prompt)

In [115]:
retrieval = vector.as_retriever()

In [116]:
retrieval

VectorStoreRetriever(tags=['PineconeVectorStore', 'OllamaEmbeddings'], vectorstore=<langchain_pinecone.vectorstores.PineconeVectorStore object at 0x0000027036E35550>)

In [121]:
retrieval_chain = create_retrieval_chain(retrieval,docs_chain)

In [125]:
response = retrieval_chain.invoke({"input": "I'm a women, while travelling in the bus I got abused by a person"})
response['answer']

'In that case, the relevant sections of the Indian Penal Code that may apply are:\n\n* Section 509: Word, gesture or act intended to insult the modesty of a woman.\n* Section 354: Assault or criminal force to woman with intent to outrage her modesty.\n* Section 354A: Sexual harassment and punishment for sexual harassment.\n* Section 376: Rape.\n\nPlease note that these are just preliminary findings and you should consult a lawyer or the police to determine the exact charges that can be brought against the abuser. Additionally, you may also be eligible for compensation and support through various government schemes and organizations.'

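This time the model gives a relevant answer. The retrieval chain fetches the chunks most similar to the question from Pinecone, inserts them into the {context} slot of the prompt, and the LLM answers from that filled-in prompt. Because the prompt's instructions shape the answer, changing the prompt changes how the model responds.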
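Conceptually, the retrieval chain runs three steps per question: the retriever fetches the top matching chunks, the stuff-documents chain joins them into the `{context}` slot of the prompt, and the LLM answers from the result. `create_retrieval_chain` returns a dict containing the original `input`, the retrieved `context` documents, and the `answer`. The plain-Python sketch below mirrors that flow; `fake_retriever` and `fake_llm` are hypothetical stand-ins for the Pinecone retriever and `ChatGroq`, not LangChain APIs.

```python
PROMPT = """Answer the questions based only on the provided context.
Give a precise answer.
<context>
{context}
</context>
Question: {input}"""


def fake_retriever(question):
    """Hypothetical retriever: always returns the same two chunk texts."""
    return [
        "354. Assault or criminal force to woman with intent to outrage her modesty.",
        "509. Word, gesture or act intended to insult the modesty of a woman.",
    ]


def fake_llm(prompt_text):
    """Hypothetical LLM: reports how many non-empty context lines it received."""
    context = prompt_text.split("<context>")[1].split("</context>")[0]
    n_lines = len([line for line in context.splitlines() if line.strip()])
    return f"Answer based on {n_lines} retrieved section(s)."


def retrieval_chain(question, retriever, llm):
    """Retrieve, 'stuff' the documents into the prompt, then call the LLM once."""
    docs = retriever(question)
    context = "\n\n".join(docs)  # Stuffing: join all retrieved documents with blank lines
    answer = llm(PROMPT.format(context=context, input=question))
    return {"input": question, "context": docs, "answer": answer}


response = retrieval_chain("I was abused on a bus", fake_retriever, fake_llm)
print(response["answer"])
```

Whatever the retriever returns is all the model sees as context, so retrieval quality (and how many chunks were indexed) directly limits answer quality.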

Experimenting with different chains

In [42]:
chain = load_qa_chain(groqllm,chain_type="map_reduce",verbose=True)

In [50]:
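chain.invoke({"question": query, "input_documents": splitdata})["output_text"] # .run() is deprecated; invoke() returns a dict with the answer under "output_text"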



[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text verbatim.
______________________
1 
 THE INDIAN PENAL CODE  
___________  
ARRANGEMENT OF SECTIONS  
__________  
CHAPTER I  
INTRODUCTION  
PREAMBLE  
SECTIONS  
1. Title and extent of operation of the Code.  
2. Punishment of offences committed within India.  
3. Punishment of offences committed beyond, but which by law may be tried within, India.  
4. Extension of Code to extra -territorial offences.  
5. Certain laws not to be affected by this Act.  
CHAPTER II  
GENERAL  EXPLANATIONS  
6. Definitions in the Code to be understood subject to exceptions.  
7. Sense of expression once explained.  
8. Gender.  
9. Number.  
10. “Man”.  “Woman”.  
11. “Person”.  
12.  “Public”.  
13. [Omitted .]. 
14. “Servant o

'Based on the provided text, there are several sections in the Indian Penal Code (IPC) that deal with crimes related to the abuse or harassment of women. Some of these sections include:\n\n* Section 376C: This section deals with sexual intercourse by a person in authority, which is intended to ensure the safety and security of women, particularly those in vulnerable positions.\n* Section 498A: This section criminalizes the subjecting of a married woman to cruelty by her husband or his relatives.\n* Section 354: This section addresses assault or criminal force to a woman with intent to outrage her modesty.\n* Section 354A: This section specifically criminalizes sexual harassment.\n* Section 509: This section punishes whoever intending to insult the modesty of any woman, utters any words, makes any sound or gesture, or exhibits any object.\n\nThese sections are specific to the IPC and deal with various forms of harassment and abuse towards women, including physical, sexual, and emotional

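This chain (load_qa_chain with chain_type="map_reduce") answers questions directly over the documents passed to it. No vector embeddings or retriever are involved here: in the map step the LLM reads each chunk separately and returns any text relevant to the question, and in the reduce step those extracts are combined into one final answer. Because it goes through the whole document, it can give a well-grounded response to the query, but it needs at least one LLM call per chunk (191 here), which makes it slow and expensive. This cell was just an experiment with a different chain type.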
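The map_reduce strategy can be sketched in plain Python: the map step pulls relevant text out of each chunk independently, and the reduce step combines the extracted snippets into one answer. Here `keyword_llm_extract` and `combine` are hypothetical keyword-matching stand-ins for the two LLM prompts (the real chain uses the model for both steps), and the call counter shows why the cost grows with the number of chunks.

```python
def keyword_llm_extract(chunk, question):
    """Hypothetical 'map' call: return lines of the chunk sharing a word with the question."""
    words = {w.lower().strip(".,?") for w in question.split() if len(w) > 3}
    hits = [line for line in chunk.splitlines()
            if words & {w.lower().strip(".,?") for w in line.split()}]
    return "\n".join(hits)


def combine(snippets, question):
    """Hypothetical 'reduce' call: merge the non-empty snippets into one answer."""
    relevant = [s for s in snippets if s]
    return f"Q: {question}\n" + "\n".join(relevant)


def map_reduce_qa(chunks, question):
    """Run map over every chunk, then one reduce; count the simulated model calls."""
    calls = 0
    snippets = []
    for chunk in chunks:  # Map: one model call per chunk
        snippets.append(keyword_llm_extract(chunk, question))
        calls += 1
    answer = combine(snippets, question)  # Reduce: one final call
    calls += 1
    return answer, calls


chunks = [
    "498A. Husband or relative of husband of a woman subjecting her to cruelty.",
    "499. Defamation.",
    "509. Word, gesture or act intended to insult the modesty of a woman.",
]
answer, calls = map_reduce_qa(chunks, "Which sections protect a woman from cruelty?")
print(answer)
print("model calls:", calls)
```

With the 191 chunks in this notebook, the same pattern means at least 192 model calls per question, compared with a single LLM call in the retrieval chain.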

For efficient retrieval, use vector embeddings with a retriever, as in the retrieval chain above.