# Project: Summarization App Using LangChain and OpenAI

This notebook uses **the latest versions** of the OpenAI and LangChain libraries available when it was written. The dependencies are installed from `requirements.txt`.

In [None]:
!pip install -q -r ./requirements.txt

Download [requirements.txt](https://drive.google.com/file/d/1UpURYL9kqjXfe9J8o-_Dq5KJTbQpzMef/view?usp=sharing)

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

### Basic Prompt

In [None]:
from langchain_openai import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)


In [None]:
text= r"""
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability \
of AI hardware and extensibility of AI models.
Mojo is a new programming language that bridges the gap between research and production \
by combining the best of Python syntax with systems programming and metaprogramming.
With Mojo, you can write portable code that’s faster than C and seamlessly inter-op with the Python ecosystem.
When we started Modular, we had no intention of building a new programming language. \
But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, \
we realized that programming across the entire stack was too complicated. Plus, we were writing a \
lot of MLIR by hand and not having a good time.
And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" \
is the host CPU. Nowadays, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration \
units, but they also serve as the “fallback” for operations that specialized accelerators don’t handle, \
such as data loading, pre- and post-processing, and integrations with foreign systems. \
"""

messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')



In [None]:
llm.get_num_tokens(text)

237

In [None]:
summary_output = llm.invoke(messages)

In [None]:
print(summary_output.content)

Mojo is a new programming language that combines Python's usability with C's performance, enhancing AI hardware programmability and model extensibility. It aims to bridge the gap between research and production by offering faster, portable code that seamlessly integrates with the Python ecosystem. The language was developed by Modular to simplify programming across the ML/AI stack and optimize performance on host CPUs, which play a crucial role in AI operations.


### Summarizing Using Prompt Templates

In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

In [None]:
template = '''
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)
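
`PromptTemplate` substitutes each name listed in `input_variables` into the matching `{placeholder}` in the template. As a rough illustration in plain Python (not LangChain's own validation logic), the helper below lists the placeholders a template string contains so they can be compared with `input_variables`. The name `template_variables` is hypothetical.

```python
from string import Formatter
from typing import List


def template_variables(template: str) -> List[str]:
    """Return the sorted placeholder names found in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


template = '''
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
print(template_variables(template))  # ['language', 'text']
```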

In [None]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

258

In [None]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.invoke({'text': text, 'language':'telugu'})


In [None]:
print(summary)

{'text': 'మోజో ఒక కొత్త ప్రోగ్రామింగ్ భాష మరియు పైథాన్ యొక్క ఉపయోగకరమైనతను సి యొక్క ప్రదర్శనతో కలిసి, ఏఆయ్ హార్డ్వేర్ యొక్క ప్రోగ్రమ్మబిలిటీ మరియు ఏఆయ్ మోడల్లను అన్లాక్ చేస్తుంది. మోజో ప్రోగ్రామింగ్ భాష ఒక కొత్త భాష మరియు సిస్టమ్స్ ప్రోగ్రామింగ్ మరియు మెటాప్రోగ్రామింగ్ యొక్క మధ్య లో లేని దూరంను మూలకం ఉంచుకుంది. మోజోతో, మీరు సీ కంటే త్వరగా ఉండే పోర్టేబుల్ కోడ్ రాయవచ్చు మరియు పైథాన్ ఎకోసిస్టమ్తో స్వాయంగా ఇంటర్-ఆప్ చేయవచ్చు. మోడ్యులర్ తో మేము కొత్త ప్రోగ్రామింగ్ భాషను నిర్మించడానికి యొక్క యొక్క ఇచ్ఛ లేదు. కానీ, మేము ప్లాట్ఫారంను ఏఆయ్ ఇంఫ్రాస్ట్రక్చర్ను ఏకీకరించడానికి ఉద్దేశించుటకు మేము అన్ని స్టాక్ వరకు ప్రోగ్రామింగ్ చేయడం చాలా కఠినంగా ఉండింది. మరియు అక్కడ మేము అనేక MLIR హస్తంతా రాయడం మరియు ఒక మంచి సమయం లేకపోవడం కాకుండ. మరియు అక్కడ త్వరితంగా ఉండే "ఎక్సెలరేటర్లు" ముఖ్యమైనవి, కానీ, అతనికి అన్ని టెన్సర్-కోర్ లైక్ ఎక్సెలరేటర్ బ్లాక్లు మరియు ఇతర ఏఆయ్ ఎక్సెలరేషన్ యూనిట్లు ఉన్నాయి, కానీ, విదేశీ సిస్టమ్లతో ఇంటిగ్రేషన్లు, డేటా లోడింగ్, ప్రీ- మరియు పోస్ట్-ప్రాసెసింగ్, మరియు ఇతర విశేషిత ఎక్సెలరేటర్ల

### Summarizing Using StuffDocumentsChain

In [None]:
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document


In [None]:
with open('./sj.txt', encoding='utf-8') as f:
    text = f.read()

docs = [Document(page_content=text)]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)
prompt

PromptTemplate(input_variables=['text'], template='Write a concise and short summary of the following text.\nTEXT: `{text}`\n')

In [None]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.invoke(docs)

In [None]:
# output_summary is a dict with 2 keys: 'input_documents' and 'output_text'
# displaying the summary
print(output_summary['output_text'])

The speaker shares three stories from his life during a commencement speech. The first story is about connecting the dots and dropping out of college, leading to unexpected opportunities. The second story is about love and loss, including being fired from the company he co-founded. The third story is about facing death and the importance of following one's heart. The speaker encourages the audience to stay hungry and stay foolish as they begin anew.
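
Conceptually, the `stuff` chain type places every document into a single prompt and makes one model call, so it only works when the combined text fits in the model's context window. The sketch below illustrates that idea in plain Python. `stuff_summarize` and `fake_llm` are hypothetical stand-ins rather than LangChain APIs, and joining documents with a blank line is an assumption made for the illustration.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], template: str, llm: Callable[[str], str]) -> str:
    """Join all documents and send them to the model in one call."""
    combined = "\n\n".join(docs)  # assumed separator between documents
    prompt = template.format(text=combined)
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: it only reports the prompt size.
    return f"summary of a {len(prompt)}-character prompt"


template = "Write a concise and short summary of the following text.\nTEXT: `{text}`\n"
result = stuff_summarize(["first document", "second document"], template, fake_llm)
print(result)
```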


### Summarizing Large Documents Using map_reduce

In [None]:
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
with open('./sj.txt', encoding='utf-8') as f:
    text = f.read()

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
llm.get_num_tokens(text)

2653

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [None]:
len(chunks)

2
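
Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens, which is consistent with a 2,653-token text producing two chunks at `chunk_size=10000`. To make the overlap idea concrete, here is a minimal fixed-window sketch. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which prefers to break at separators such as paragraph and line boundaries, and the function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```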

In [None]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
output_summary = chain.invoke(chunks)

In [None]:
# output_summary is a dict with 2 keys: 'input_documents' and 'output_text'
# displaying the summary
print(output_summary['output_text'])

The speaker, who did not graduate from college, shares personal stories at a university commencement about dropping out, being fired from his company, and facing a life-threatening illness. He emphasizes the importance of following one's passion, embracing change, and living each day fully. The speaker encourages the audience to trust their intuition, find what they love, and not settle for less. He reflects on death, urging authenticity, intuition, and a hunger for pursuing dreams. The speaker references The Whole Earth Catalog's message of curiosity and open-mindedness.


In [None]:
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [None]:
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
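
The two templates above are the default map prompt and the default combine prompt, which here happen to be identical. The control flow they plug into can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not LangChain's implementation: it omits details such as collapsing intermediate summaries when they grow too long, and `map_reduce_summarize` and `fake_llm` are hypothetical names.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    map_template: str,
    combine_template: str,
    llm: Callable[[str], str],
) -> str:
    """Summarize each chunk on its own, then summarize the joined summaries."""
    # Map step: one independent model call per chunk.
    partial_summaries = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce step: one final call over all partial summaries.
    combined = "\n\n".join(partial_summaries)
    return llm(combine_template.format(text=combined))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: it only reports the prompt length.
    return f"[summary of {len(prompt.split())} words]"


default_template = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'
print(map_reduce_summarize(["chunk one", "chunk two"], default_template, default_template, fake_llm))
```

With `n` chunks this makes `n + 1` model calls, and the map calls do not depend on each other.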

### map_reduce With Custom Prompts

In [None]:
map_prompt = '''
Write a short and concise summary of the following:
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [None]:
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=['text'])

In [None]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.invoke(chunks)

In [None]:
# output is a dict with 2 keys: 'input_documents' and 'output_text'
# displaying the summary
print(output['output_text'])

Title: Embracing Passion and Authenticity in Life

Introduction:
The text discusses the importance of following one's passion, embracing authenticity, and facing the inevitability of death to make meaningful choices in life.

Key Points:
- The speaker shares three stories from his life during a commencement speech
- Story 1: Dropping out of college, finding passion for calligraphy, influencing Macintosh design
- Story 2: Being fired from Apple, finding success with NeXT and Pixar
- Story 3: Facing cancer diagnosis, emphasizing living each day as if it were the last
- Emphasis on following passion, not settling, and embracing death to make meaningful choices

Conclusion:
The text encourages readers to stay hungry, stay foolish, and live authentically by following their intuition and passions.


### Summarizing Using the refine Chain

In [None]:
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain.document_loaders import UnstructuredPDFLoader
from langchain.document_loaders import PyPDFLoader

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [None]:
# pip install -q pdf2image

In [None]:
# pip install -q pdfminer

In [None]:
# loader = UnstructuredPDFLoader('files/attention_is_all_you_need.pdf')
loader = PyPDFLoader('./attention_is_all_you_need.pdf')
data = loader.load()

In [None]:
# print(data[0].page_content)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [None]:
len(chunks)

15

In [None]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [None]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
output_summary = chain.invoke(chunks)

> Entering new RefineDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Exp

In [None]:
print(output_summary)

{'input_documents': [Document(page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallel

### refine With Custom Prompts

In [None]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:"""
initial_prompt = PromptTemplate(template=prompt_template, input_variables=['text'])

refine_template = '''
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    ------------
    {text}
    ------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.

'''
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)


In [None]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False

)
output_summary = chain.invoke(chunks)

In [None]:
# print(output_summary)

In [None]:
# displaying the refined summary
print(output_summary['output_text'])

Introduction:
The Transformer model has revolutionized the field of deep learning, particularly in natural language processing tasks, with its attention mechanisms and sequence-to-sequence modeling capabilities setting a new standard for capturing global dependencies and achieving optimal performance. Recent research has focused on optimizing various aspects of the Transformer architecture to further enhance its performance and scalability. Attention visualizations have provided insights into how the model handles long-distance dependencies, showcasing its adaptability and effectiveness in processing complex language structures.

Key Points:
- Research has shown that variations in the Transformer architecture, such as adjusting attention key size and dropout rates, can significantly impact model quality and overfitting avoidance.
- Larger Transformer models consistently outperform smaller ones, with the use of dropout being crucial for achieving high model performance.
- The Transforme
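
The roles of the two prompts can be sketched in plain Python: the initial prompt summarizes the first chunk, and the refine prompt is then applied once per remaining chunk, carrying the running summary in `{existing_answer}`. This is a simplified illustration, not LangChain's implementation, and `refine_summarize` and `fake_llm` are hypothetical names.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    initial_template: str,
    refine_template: str,
    llm: Callable[[str], str],
) -> str:
    """Summarize the first chunk, then refine that summary with each later chunk."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    summary = llm(initial_template.format(text=chunks[0]))
    for chunk in chunks[1:]:
        # Each call sees the summary so far plus exactly one new chunk.
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
    return summary


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat model: it only reports the prompt length.
    return f"[summary of {len(prompt)} characters]"


initial = "Write a concise summary of the following:\nText: `{text}`\nCONCISE SUMMARY:"
refine = "Existing summary: {existing_answer}\nRefine it with this context:\n{text}\n"
print(refine_summarize(["chunk 1", "chunk 2", "chunk 3"], initial, refine, fake_llm))
```

Unlike `map_reduce`, the calls are strictly sequential, so the 15 chunks above require 15 model calls that cannot run in parallel.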

### Summarizing Using LangChain Agents

In [None]:
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [None]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11680 sha256=0c1b54202a0e172974bc82d70d57f360035ee8de244ffe25aefa7e8672dcb5e8
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a single topic"
    )
]

In [None]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)


In [None]:
output = agent_executor.invoke('Can you please provide a short summary of George Washington?')

> Entering new AgentExecutor chain...
I should look up George Washington on Wikipedia to get a brief summary of his life and accomplishments.
Action: Wikipedia
Action Input: George Washington
Observation: Page: George Washington
Summary: George Washington (February 22, 1732 – December 14, 1799) was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the U.S. federal government. Washington has thus become commonly known as the "Father of his Country".
Washington's first public office, from 1749 to 1750, was as surveyor of Culpeper County in the Colony o

In [None]:
print(output['input'])

Can you please provide a short summary of George Washington?


In [None]:
print(output['output'])

George Washington was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797. George Washington Carver was an American agricultural scientist and inventor who promoted alternative crops to cotton and methods to prevent soil depletion.
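
The verbose log above shows the text format the agent relies on: the model writes a thought followed by `Action:` and `Action Input:` lines, and the executor runs the named tool with that input. As a simplified illustration of that parsing step (not LangChain's actual output parser), the hypothetical helper `parse_action` below extracts the tool name and input with a regular expression.

```python
import re
from typing import Optional, Tuple

# Matches an "Action:" line immediately followed by an "Action Input:" line.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input), or None when no action is requested."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


step = (
    "I should look up George Washington on Wikipedia.\n"
    "Action: Wikipedia\n"
    "Action Input: George Washington"
)
print(parse_action(step))  # ('Wikipedia', 'George Washington')
```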
