## Document Chains Demo

Document chains let you process and analyze large amounts of text efficiently. They give you a structured way to combine retrieved documents before passing them to an LLM.

Each chain type (Stuff, Refine, Map-Reduce, or Map Re-rank) combines the documents differently, so you can pick the one that fits the size of your input and the kind of result you need.

In [None]:
import os
import getpass
import textwrap

from langchain import OpenAI, LLMChain
from langchain.chains.mapreduce import MapReduceChain
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
# we will cover docstores and splitters in more detail when we get to retrieval
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter

from dotenv import load_dotenv

In [14]:
!pip install pypdf
load_dotenv()


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


True

In [8]:
model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.5)

### Stuff Chain
The Stuff chain (LangChain’s `StuffDocumentsChain`) inserts all the relevant documents into a single prompt.
The advantage of this method is that it requires only one call to the LLM, and the model has access to all the information at once. The trade-off is that the combined text must fit within the model’s context window.
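
To make the single-call behavior concrete, here is a minimal sketch of the stuffing idea in plain Python. It is not LangChain's implementation: `stuff_documents`, `fake_llm`, and `stuff_calls` are hypothetical names, and `fake_llm` stands in for a real model so the example runs without an API key.

```python
from typing import Callable, List

def stuff_documents(docs: List[str], template: str,
                    llm: Callable[[str], str], separator: str = "\n\n") -> str:
    """Join every document into one string, fill the template, call the LLM once."""
    combined = separator.join(docs)
    prompt = template.format(text=combined)
    return llm(prompt)

# Hypothetical stand-in for a real model: it records each prompt it receives.
stuff_calls: List[str] = []

def fake_llm(prompt: str) -> str:
    stuff_calls.append(prompt)
    return f"summary of {len(prompt)} characters"

pages = ["Page one of the resume.", "Page two of the resume."]
result = stuff_documents(pages, "Resume:\n{text}\nSummarize the key skills.", fake_llm)
print(len(stuff_calls), result)
```

However many pages are passed in, the model is called exactly once, which is why the approach stops working once the joined text no longer fits in the context window.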

In [10]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("loaders-samples/Software-Engineer-CV.pdf")
docs = loader.load()

In [11]:
# print each page of the PDF with a 1-based document number
for cnt, doc in enumerate(docs, start=1):
    print("---- Document #", cnt)
    print(doc.page_content.strip())

---- Document # 1
Name: Sunil Sharma                             Mobile: +91 9898989898 
 
Designation: Senior Technical Lead                     Mail Id: sunil.sharma@gmail.com 
 
Objective:  
Experienced Senior Software Developer with 12 years of hands-on expertise in 
designing, developing, and delivering high-quality software solutions.  
Proven track record of successfully leading and collaborating with cross-functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects. 
Education: 
Bachelor in Engineering in Electronics and Communication 
K.L.N. College of Information Technology, Madurai - 2007 
Professional Summary: 
• 12 years of experience in Software Development in C on Linux Environment. 
• Over 5 years of programming experience as an Oracle PL/SQL developer in 
Analysis, Design and Implementation of business application using Oracle DBMS. 
• Expertise in all 

In [12]:
prompt_template ="""
You are given a Resume as the below text. 
-----
{text}
-----
Question: Please respond with the Key Skills and Experience summary of the person. 
Key Skills:
Experience Summary: 
"""

In [13]:
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

stuff_chain = load_summarize_chain(model,
                             chain_type="stuff",
                             prompt=prompt)

output_summary = stuff_chain.run(docs)

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

In [8]:
print(output_summary)


Key Skills: 
1. Software Development 
2. Oracle PL/SQL 
3. Linux Environment 
4. Database Management 
5. Programming Languages: C, Pro C, Shell scripting 
6. Version Control: GIT, TFS, CVS 
7. Tools: PL/SQL developer, JIRA, Confluence, Visual studio, GDB, Mercurial, Spirent Test Centre (STC), Wireshark 
8. Leadership and Team Collaboration 

Experience Summary: 
1. 12 years of experience in Software Development 
2. 5 years of experience as an Oracle PL/SQL developer 
3. Expertise in all stages of Software Development Life Cycle 
4. Experience with Table functions, indexes, Table partitioning, Collections, Analytical functions, and materialized views 
5. Proficient in creating tables, views, constraints, and indexes 
6. Strong knowledge of Oracle performance-related features 
7. Experience with Oracle-supplied packages, Dynamic SQL, records, and tables 
8. Familiarity with SQL Loader for loading data into database tables 
9. Experience in leading and collaborating with cross-functional

### Refine Chain
The Refine Documents Chain uses an iterative process to generate a response by analyzing each input document and updating its answer accordingly.

It passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to obtain a new answer for each document.

This chain is ideal for tasks that involve analyzing more documents than can fit in the model’s context, as it only passes a single document to the LLM at a time.
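
The iterative update described above can be sketched in plain Python. This is an illustration of the idea, not LangChain's code: `refine_documents`, `fake_llm`, and `refine_calls` are hypothetical names, and the fake model only numbers its answers so the flow of calls is visible.

```python
from typing import Callable, List

def refine_documents(docs: List[str], initial_template: str,
                     refine_template: str, llm: Callable[[str], str]) -> str:
    """Summarize the first document, then refine that answer once per remaining document."""
    if not docs:
        raise ValueError("need at least one document")
    # The first call sees only the first document.
    answer = llm(initial_template.format(text=docs[0]))
    # Every later call sees one new document plus the latest intermediate answer.
    for doc in docs[1:]:
        answer = llm(refine_template.format(existing_answer=answer, text=doc))
    return answer

# Hypothetical stand-in for a real model: it records prompts and numbers its answers.
refine_calls: List[str] = []

def fake_llm(prompt: str) -> str:
    refine_calls.append(prompt)
    return f"answer-{len(refine_calls)}"

pages = ["page A", "page B", "page C"]
final = refine_documents(
    pages,
    "Summarize:\n{text}",
    "Existing summary: {existing_answer}\nNew context:\n{text}\nRefine the summary.",
    fake_llm,
)
print(final, len(refine_calls))
```

The loop makes one LLM call per document, and each call depends on the previous answer, so the calls cannot run in parallel.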

In [9]:
refine_chain = load_summarize_chain(model, chain_type="refine")
print(refine_chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary.
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary (only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary.
If the context isn't useful, return the original summary.


In [10]:
output_summary = refine_chain.run(docs)
output_summary

"\n\nSunil Sharma is a highly experienced Senior Technical Lead with a Bachelor's degree in Engineering and a proven track record in software development. With 12 years of experience, he has worked in various roles, including Technical Lead at Nokia Networks and Senior Engineer at Plintron Global Technology Solutions Pvt. Ltd. Sunil is skilled in Oracle PL/SQL development and has a strong understanding of programming languages, database management, and operating systems. He is also proficient in various tools and has a knack for leadership and team collaboration. Currently, he is working as a Senior Technical Lead at HCL Technologies, leading the offshore development activities for a healthcare project based in the USA. Sunil has expertise in solving complex SQL problems related to reporting, creating indexes and partitioning tables for SQL tuning, and writing procedures, functions, views, and materialized views. His previous projects at Nokia Networks include enhancing their ONT GPON 

### Map-Reduce Chain
The `MapReduceDocumentsChain` processes large document sets in two steps.

In the Map step, an LLM chain is applied to each document individually, producing a new document for each one. In the Reduce step, all the new documents are passed to a separate combine-documents chain to get a single output.

If the mapped documents are too large to combine in one call, they are compressed first. This compression step is performed recursively until they fit.
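
The map, compress, and reduce steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not LangChain's implementation: all function names are hypothetical, size is measured in characters rather than tokens, and `fake_llm` replaces a real model.

```python
from typing import Callable, List

def batch_by_size(texts: List[str], max_chars: int) -> List[List[str]]:
    """Group texts into consecutive batches whose total length stays within max_chars."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches

def reduce_summaries(texts: List[str], template: str,
                     llm: Callable[[str], str], max_chars: int) -> str:
    """Combine texts in one call if they fit; otherwise compress batches and recurse."""
    batches = batch_by_size(texts, max_chars)
    # Stop when everything fits in one batch, or when batching cannot shrink the list.
    if len(batches) <= 1 or len(batches) == len(texts):
        return llm(template.format(text="\n\n".join(texts)))
    collapsed = [llm(template.format(text="\n\n".join(b))) for b in batches]
    return reduce_summaries(collapsed, template, llm, max_chars)

def map_reduce_documents(docs: List[str], map_template: str, reduce_template: str,
                         llm: Callable[[str], str], max_chars: int = 1000) -> str:
    if not docs:
        raise ValueError("need at least one document")
    mapped = [llm(map_template.format(text=d)) for d in docs]  # Map step
    return reduce_summaries(mapped, reduce_template, llm, max_chars)  # Reduce step

# Hypothetical stand-in for a real model: it records prompts and numbers its answers.
mr_calls: List[str] = []

def fake_llm(prompt: str) -> str:
    mr_calls.append(prompt)
    return f"S{len(mr_calls)}"

summary = map_reduce_documents(
    ["doc 1", "doc 2", "doc 3", "doc 4"],
    "Summarize:\n{text}", "Combine:\n{text}", fake_llm, max_chars=4,
)
# 4 map calls + 2 compression calls + 1 final combine call
print(summary, len(mr_calls))
```

Unlike the refine loop, the map calls are independent of each other, so a real implementation can run them concurrently.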

In [11]:
map_reduce_chain = load_summarize_chain(model,
                                        chain_type="map_reduce",
                                        verbose=True)

In [12]:
print(map_reduce_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [13]:
# only use the first 20 documents so the run doesn't take too long
output_summary = map_reduce_chain.run(docs[:20])

print(output_summary)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"        
                                                 
Name: Sunil Sharma                              Mobile: +91 9898989898  
 
Designation: Senior Technical Lead                      Mail Id: sunil.sharma @gmail.com  
 
Objective:   
Experienced S enior Software Developer with 1 2 years of hands -on expertise in 
designing, developing, and delivering high -quality software solutions.  
Proven track record of successfully leading and collaborating with cross -functional 
teams to deliver projects on time and within budget. Seeking to leverage my technical 
skills and leadership experience to contribute to innovative software projects.  
Education:  
Bachelor in Engineering in Electronics and Communication  
K.L.N.  College of Information Technology, Madurai - 2007  
Professional Summary:  
• 12 year