# Indian Legal Text Summarization for Long Document

### Install Required Libraries
##### 1.) OpenAI: Installed to access OpenAI's models for legal document summarization
##### 2.) LangChain: Essential for implementing document mapping, reduction, and combining workflows efficiently.
##### 3.) Tiktoken: Helps manage token counts within text data, ensuring efficient usage of language models and avoiding token limit issues.
##### 4.) pypdf, PyMuPdf: Helps in accessing legal documents and summaries
##### 5.) rouge-score: Helps in analyzing the summary 

### Initializing OpenAI LLM
##### Import the OpenAI module from LangChain and Initialize it with the provided API key to utilize language models for document summarization.

In [1]:
import openai
API_KEY="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"

In [2]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0,
                 max_tokens=1000,
                 openai_api_key=API_KEY,
                 model='ft:gpt-3.5-turbo-0613:personal::8f3ZRSqM' # loading our fine tuned model
                )

  warn_deprecated(


### Splitting text by Character
##### The Text Splitter overcomes the token limit by breaking down the text into smaller chunks that are each within the token limit.

In [3]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=50, separator='\n\n'   #divides the text into chunks of size 1000 and overlaps between chunks to ensure that no info is lost.
)

### Loading legal document


In [4]:
from langchain.document_loaders import PyPDFLoader
def chunks(pdf_file_path):
    loader = PyPDFLoader(pdf_file_path)     # Loading and processing the contents of the PDF
    docs = loader.load()
    return docs

### Map Reduce Prompt Template
##### Importing all the required for the implementation of LangChain MapReduce.

In [5]:
from langchain.chains.mapreduce import MapReduceChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ReduceDocumentsChain, MapReduceDocumentsChain
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain

### Template Definition
##### Two templates map_template and reduce_template are the structured prompts for instructing a language model on how to process and summarise sets of documents.

In [45]:
map_template = """The following is a set of documents
{docs}
Based on this list of docs, summarised into meaningful
Helpful Answer"""

map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)

In [89]:
reduce_template = """The following is set of summaries:
{doc_summaries}
Take these and distil it into final consolidated summary with title(mandatory) in bold with important key points into 1 paragraph.
The length can vary to encapsulate the essence of the provided information.
Helpful Answer:"""

reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

### Map and Reduce LLM Chains

In [90]:
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="doc_summaries"    #  Combines the docs into one string(chain of docs)
)


In [91]:
reduce_documents_chain = ReduceDocumentsChain(              #   performs in-depth reduction and summarization for the chain of docs
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,     
    token_max=3090,
)

In [92]:
map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,      #  Consolidate the summaries into a final comprehensive document summary
    document_variable_name="docs",
    return_intermediate_steps=False,
)

### Summarization Function

In [93]:
def summarize_pdf(file_path):
    split_docs = text_splitter.split_documents(chunks(file_path))
    return map_reduce_chain.run(split_docs)

In [94]:

#file_path="C:/Users/prash/Downloads/Untitled document-3.pdf" 
with open("C:/Users/prash/Downloads/temp_file_path.txt", "r") as temp_file:
    file_path = temp_file.read()


In [95]:
result_summary=summarize_pdf(file_path)

In [96]:
print(result_summary)

No Relinquishment of Tenancy in the Present Case
The appellant, W.H. King, was the tenant of a flat and allegedly demanded a sum of money from the complainant for vacating the flat. The complainant reported the matter to the police, and a trap was laid for the appellant. The police recovered a sum of money and various documents from the appellant and the complainant. The list of documents includes a typed draft of a partnership agreement, an application form for permission to occupy the building as caretaker, a letter handing vacant possession, a receipt for payment for articles of furniture, and a letter to the Bombay Gas Company for transfer of the gas connection. The appellant was charged under section 18(1) of the Bombay Rents, Hotel and Lodging House Rates Control Act, LVII of 1947, for receiving a pugree of Rs. 29,500 and he was further charged under section 19(2) of the said Act for receiving the said sum as a condition for the relinquishment of his tenancy. The defence of the a