Text Summarization

This notebook walks through how to use LangChain for summarization over a list of documents. It covers three different chain types: stuff, map_reduce, and refine.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
groq_api_key=os.getenv("GROQ_API_KEY")
from langchain_groq import ChatGroq
model = ChatGroq(model="llama-3.1-8b-instant", api_key=groq_api_key)

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
from langchain_classic.schema import AIMessage, HumanMessage, SystemMessage


In [5]:
speech="""Salesforce CEO Mark Benioff has declared that he will not use ChatGPT, the AI chatbot developed by Microsoft-backed AI company OpenAI. His comments came after he used Google‚Äôs latest AI model, Gemini 3.0, stating that a two-hour experience with Google‚Äôs new model has convinced him to switch permanently. Google CEO Sundar Pichai termed the Gemini 3 as the company‚Äôs ‚Äúmost intelligent model, that combines all of Gemini‚Äôs capabilities together so you can bring any idea to life.

In a post on X, Benioff claimed that Gemini 3.0 has brought a monumental shift in the AI landscape. The CEO specifically praised Gemini 3‚Äôs performance across multiple modalities, noting key improvements in reasoning, speed, image quality, and video processing as the compelling factors behind his decision.

Holy shit. I‚Äôve used ChatGPT every day for 3 years. Just spent 2 hours on Gemini 3. I‚Äôm not going back. The leap is insane ‚Äî reasoning, speed, images, video‚Ä¶ everything is sharper and faster. It feels like the world just changed, again. ‚ù§Ô∏è ü§ñ

Tech leaders acknowledge Gemini 3.0 success
Benioff‚Äôs endorsement highlights the competitive pressure now facing OpenAI as models from rival tech giants like Google make significant advances. Tesla CEO Elon Musk and even OpenAI chief executive Sam Altman praised Pichai following the launch of Gemini 3.
The xAI founder said ‚ÄúCongrats‚Äù on Pichai‚Äôs "Geminiii" announcement on X, and ‚ÄúNice Work‚Äù to Demis Hassabis, the chief of DeepMind. Altman posted Congrats to Google on Gemini 3! Looks like a great model.

Altman even noted potential ‚Äútemporary economic headwinds‚Äù in an internal memo but expressed confidence in OpenAl‚Äôs rapid progress and leadership in the Al race.

We have built enough strength as a company to weather great models shipping elsewhere competition... (so), having most of our research team focused on really getting to superintelligence is critically important, Altman wrote."""

In [6]:
chat_message=[
    SystemMessage(content="you are an expert with expertise in summarizing speeches"),
    HumanMessage(content="Please provide a short and concise summary of following speech:{text}")
]

In [7]:
model.get_num_tokens(speech)

455

In [13]:
model.invoke(speech).content

'It seems that there\'s a lot of competition and advancements in the field of AI, particularly from Google\'s Gemini 3.0 model. Here\'s a summary of the key points:\n\n1. Salesforce CEO Mark Benioff has declared that he will not use ChatGPT, the AI chatbot developed by OpenAI, after using Google\'s Gemini 3.0 model. He praised its performance across multiple modalities, including reasoning, speed, image quality, and video processing.\n\n2. Benioff\'s endorsement highlights the competitive pressure on OpenAI, as rival tech giants like Google are making significant advances in AI. This is evident from the praise received by Google CEO Sundar Pichai from other tech leaders, including Tesla CEO Elon Musk and OpenAI CEO Sam Altman.\n\n3. OpenAI CEO Sam Altman expressed confidence in the company\'s rapid progress and leadership in the AI race. Despite potential "temporary economic headwinds," Altman believes that OpenAI has built enough strength to weather the competition.\n\n4. Altman empha

Prompt Template Text Summarization

In [18]:
from langchain_core.prompts import PromptTemplate

generic_template="""
Write the summary of given context
Speech:{speech}
Aslo translate the summary speech to {language}"""

prompt = PromptTemplate(
    template=generic_template,
    input_variables=['speech', 'language']
)

In [22]:
full_prompt=prompt.format(speech=speech, language='hindi')

In [23]:
model.get_num_tokens(full_prompt)

476

In [29]:
chain = prompt | model

summary = chain.invoke({"speech":speech, "language":"telugu"})
summary


AIMessage(content="**Summary:**\n\n Salesforce CEO Mark Benioff has expressed his preference for Google's latest AI model, Gemini 3.0, over Microsoft-backed OpenAI's ChatGPT. After using Gemini 3.0 for two hours, Benioff claimed that it has brought a significant shift in the AI landscape, with improvements in reasoning, speed, image quality, and video processing. Tech leaders like Elon Musk and Sam Altman have acknowledged the success of Gemini 3.0, but Altman has expressed confidence in OpenAI's progress and leadership in the AI race.\n\n**‡∞∏‡∞Ç‡∞ï‡±ç‡∞∑‡∞ø‡∞™‡±ç‡∞§‡∞Ç:**\n\n‡∞∏‡∞æ‡∞≤‡±ç‡∞∏‡±ç\u200c‡∞´‡±ã‡∞∞‡±ç‡∞∏‡±ç ‡∞∏‡±Ä‡∞à‡∞ì ‡∞Æ‡∞æ‡∞∞‡±ç‡∞ï‡±ç ‡∞¨‡±Ü‡∞®‡∞ø‡∞Ø‡±ã‡∞´‡±ç ‡∞Æ‡±à‡∞ï‡±ç‡∞∞‡±ã‡∞∏‡∞æ‡∞´‡±ç‡∞ü‡±ç ‡∞∏‡∞π‡∞ï‡∞æ‡∞∞‡∞Ç‡∞§‡±ã ‡∞â‡∞®‡±ç‡∞® ‡∞ì‡∞™‡±Ü‡∞®‡±ç\u200c‡∞é‡∞Ø‡∞ø ‡∞§‡∞Ø‡∞æ‡∞∞‡±Å‡∞ö‡±á‡∞∏‡∞ø‡∞® ‡∞ö‡∞æ‡∞ü‡±ç\u200c‡∞ú‡±Ä‡∞™‡±Ä‡∞ü‡∞ø ‡∞®‡±Å‡∞Ç‡∞°‡∞ø ‡∞ó‡±Ç‡∞ó‡±Å‡∞≤‡±ç ‡∞Ø‡±ä‡∞ï‡±ç‡∞ï ‡∞Ö‡∞§‡±ç‡∞Ø‡∞æ‡∞ß‡±Å‡∞®‡∞ø‡∞ï ‡∞é‡∞Ø‡∞ø ‡∞Æ‡∞æ‡∞°‡∞≤‡±ç ‡∞ú‡±Ä‡∞Æ‡∞ø‡∞®‡±Ä 3.0‡∞®‡∞ø ‡∞™‡±

Text Summarization Stuff Chain

In [31]:
from langchain_community.document_loaders import PyPDFLoader
loader= PyPDFLoader('D:\\Langchain\\LangChain_Book.pdf')
pages=loader.load_and_split()
pages

[Document(metadata={'producer': 'Antenna House PDF Output Library 7.1.1639', 'creator': 'AH CSS Formatter V7.1 MR2 for Linux64 : 7.1.3.50324 (2021-04-26T09:47+09)', 'creationdate': '2025-04-18T11:54:13+00:00', 'author': 'Mayo Oshin and Nuno Campos', 'moddate': '2025-05-20T12:58:10-04:00', 'title': 'Learning LangChain', 'source': 'D:\\Langchain\\LangChain_Book.pdf', 'total_pages': 297, 'page': 0, 'page_label': 'Cover'}, page_content='Mayo Oshin &  \nNuno Campos\n Learning \nLangChain\nBuilding AI and LLM Applications  \nwith LangChain and LangGraph'),
 Document(metadata={'producer': 'Antenna House PDF Output Library 7.1.1639', 'creator': 'AH CSS Formatter V7.1 MR2 for Linux64 : 7.1.3.50324 (2021-04-26T09:47+09)', 'creationdate': '2025-04-18T11:54:13+00:00', 'author': 'Mayo Oshin and Nuno Campos', 'moddate': '2025-05-20T12:58:10-04:00', 'title': 'Learning LangChain', 'source': 'D:\\Langchain\\LangChain_Book.pdf', 'total_pages': 297, 'page': 1, 'page_label': 'BackCover'}, page_content='9 

In [32]:
len(pages)

293

In [33]:
template='''
Write a concise and short summary of the following speech:\n
{text}'''

prompt = PromptTemplate(
    template=template,
    input_variables=['text']
)

In [35]:
from langchain_classic.chains.summarize import load_summarize_chain
summarize_chain = load_summarize_chain(
    model, chain_type="stuff", prompt=prompt, verbose=True)


In [36]:
summarize_chain

StuffDocumentsChain(verbose=True, llm_chain=LLMChain(verbose=True, prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='\nWrite a concise and short summary of the following speech:\n\n{text}'), llm=ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000143BC944D60>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000143BC944C70>, model_name='llama-3.1-8b-instant', model_kwargs={}, groq_api_key=SecretStr('**********')), output_parser=StrOutputParser(), llm_kwargs={}), document_prompt=PromptTemplate(input_variables=['page_content'], input_types={}, partial_variables={}, template='{page_content}'), document_variable_name='text')

In [38]:
summarized_output=summarize_chain.run(pages[0:10])
summarized_output



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
Write a concise and short summary of the following speech:

Mayo Oshin &  
Nuno Campos
 Learning 
LangChain
Building AI and LLM Applications  
with LangChain and LangGraph

9 781098 167288
57999
ISBN:   978-1-098-16728-8
US $79.99    CAN $99.99
DATA
If you‚Äôre looking to build production-ready AI applications that can reason and retrieve external data for 
context-awareness, you‚Äôll need to master LangChain‚Äîa popular development framework and platform 
for building, running, and managing agentic applications. LangChain is used by several leading companies, 
including Zapier, Replit, Databricks, and many more. This guide is an indispensable resource for developers 
who understand Python or JavaScript but are beginners eager to harness the power of AI.
Authors Mayo Oshin and Nuno Campos demystify the use of LangChain through practical insights and in

'Here is a concise and short summary of the speech:\n\n"Learning LangChain" is a book by Mayo Oshin and Nuno Campos that provides a comprehensive guide to building production-ready AI applications using LangChain, a popular development framework for building, running, and managing agentic applications. The book covers topics such as:\n\n* Retrieval-Augmented Generation (RAG) to enhance the accuracy of Large Language Models (LLMs)\n* Developing and deploying AI applications that interact intelligently and contextually with users\n* Using the powerful agent architecture with LangGraph\n* Integrate and manage third-party APIs and tools to extend the functionality of AI applications\n* Monitoring, testing, and evaluating AI applications to improve performance\n\nThe book is designed for developers who understand Python or JavaScript but are beginners in AI and LLM development. It provides clear explanations, actionable techniques, and hands-on code to help readers build production-ready AI

Map-Reduce Chain Type

In [41]:
# this chain type is used for large documents
#considering only 10 documents here due to model toekn limit constraint

pages[0:10]

[Document(metadata={'producer': 'Antenna House PDF Output Library 7.1.1639', 'creator': 'AH CSS Formatter V7.1 MR2 for Linux64 : 7.1.3.50324 (2021-04-26T09:47+09)', 'creationdate': '2025-04-18T11:54:13+00:00', 'author': 'Mayo Oshin and Nuno Campos', 'moddate': '2025-05-20T12:58:10-04:00', 'title': 'Learning LangChain', 'source': 'D:\\Langchain\\LangChain_Book.pdf', 'total_pages': 297, 'page': 0, 'page_label': 'Cover'}, page_content='Mayo Oshin &  \nNuno Campos\n Learning \nLangChain\nBuilding AI and LLM Applications  \nwith LangChain and LangGraph'),
 Document(metadata={'producer': 'Antenna House PDF Output Library 7.1.1639', 'creator': 'AH CSS Formatter V7.1 MR2 for Linux64 : 7.1.3.50324 (2021-04-26T09:47+09)', 'creationdate': '2025-04-18T11:54:13+00:00', 'author': 'Mayo Oshin and Nuno Campos', 'moddate': '2025-05-20T12:58:10-04:00', 'title': 'Learning LangChain', 'source': 'D:\\Langchain\\LangChain_Book.pdf', 'total_pages': 297, 'page': 1, 'page_label': 'BackCover'}, page_content='9 

In [45]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter= RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)

chunks = splitter.split_documents(pages[0:10])
len(chunks)

15

In [None]:
chunks_prompt='''
write a concise summary of the given text:\n
{text}'''

map_prompt= PromptTemplate(
    template=chunks_prompt,
    input_variables=['text',]
)

In [52]:
final_prompt='''
provide the final summary of whole text after considering all main points.
include a suitable title for the summary and mention highlights from summary in bullet points.

{text}'''

map_reduce_prompt=PromptTemplate(
    template=final_prompt,
    input_variables=['text',]
)

In [55]:
map_summary_chain=load_summarize_chain(
    llm=model,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=map_reduce_prompt,
    verbose=True
)

In [56]:
map_summary_chain.run(chunks)



[1m> Entering new MapReduceDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
write a concise summary of the given text:

Mayo Oshin &  
Nuno Campos
 Learning 
LangChain
Building AI and LLM Applications  
with LangChain and LangGraph[0m
Prompt after formatting:
[32;1m[1;3m
write a concise summary of the given text:

9 781098 167288
57999
ISBN:   978-1-098-16728-8
US $79.99    CAN $99.99
DATA
If you‚Äôre looking to build production-ready AI applications that can reason and retrieve external data for 
context-awareness, you‚Äôll need to master LangChain‚Äîa popular development framework and platform 
for building, running, and managing agentic applications. LangChain is used by several leading companies, 
including Zapier, Replit, Databricks, and many more. This guide is an indispensable resource for developers 
who understand Python or JavaScript but are beginners eager to harness the power of AI.
Authors Mayo Oshin and Nu

"It seems you are trying to summarize various text snippets, but there is no original text provided. However, I can provide you with a text snippet for you to summarize.\n\nHere is a text snippet about the benefits of using LangChain for building AI applications:\n\nLangChain is an open-source library that simplifies the development of AI applications by providing a set of pre-built components and tools. With LangChain, developers can focus on building the core logic of their applications without worrying about implementing low-level details such as data retrieval and processing. This allows for faster development and deployment of AI applications, making it an attractive choice for companies looking to integrate AI into their products and services.\n\nLangChain's components and tools can be used to build a wide range of AI applications, including chatbots, virtual assistants, and content generation systems. The library provides a flexible and modular architecture, making it easy to in