# Project: Summarization App Using LangChain and OpenAI


In [2]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

### A) Basic Prompt

In [3]:
from langchain_openai import ChatOpenAI
from langchain.schema import(
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [4]:
text= r"""
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability \
of AI hardware and extensibility of AI models.
Mojo is a new programming language that bridges the gap between research and production \ 
by combining the best of Python syntax with systems programming and metaprogramming.
With Mojo, you can write portable code that’s faster than C and seamlessly inter-op with the Python ecosystem.
When we started Modular, we had no intention of building a new programming language. \
But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, \
we realized that programming across the entire stack was too complicated. Plus, we were writing a \
lot of MLIR by hand and not having a good time.
And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" \
is the host CPU. Nowadays, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration \
units, but they also serve as the “fallback” for operations that specialized accelerators don’t handle, \
such as data loading, pre- and post-processing, and integrations with foreign systems. \
"""

messages = [
    SystemMessage(content='You are an expert copywriter with expertize in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [5]:
llm.get_num_tokens(text)

238

In [6]:
summary_output = llm.invoke(messages)

In [7]:
print(summary_output.content)

Mojo is a new programming language that combines Python's usability with C's performance, enabling enhanced programmability of AI hardware and extensibility of AI models. It bridges the gap between research and production by offering faster and portable code that seamlessly integrates with the Python ecosystem. Developed by Modular to simplify ML/AI infrastructure, Mojo addresses the complexity of programming across the stack and leverages the host CPU as a crucial accelerator for various AI operations.


### Summarizing Using Prompt Templates

In [8]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [9]:
template = '''
Write a concise summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [10]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

257

In [13]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.invoke({'text': text, 'language':'yoruba'})

In [14]:
print(summary)

{'text': 'Mojo ni aamiiran ti o ni owo ni Python pẹlu iṣẹgun ti C, ti o fi iranlọwọ lati ṣe iṣẹgun AI hardware ati iṣẹgun AI models. Mojo ni aamiiran titẹlẹ ti o ṣe iṣẹgun ti Python syntax pẹlu iṣẹgun iṣẹgun iṣẹgun ati iṣẹgun iṣẹgun. Pẹlu Mojo, o le ṣe iṣẹgun ti o ni iranlọwọ ti o dara ju C ati o le ṣe iṣẹgun pẹlu iṣẹgun Python ecosystem.', 'language': 'yoruba'}


### Summarizing using SuffDocumentChain

In [27]:
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [33]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

docs = [Document(page_content=text)]
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [34]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [35]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.run(docs)

  warn_deprecated(


In [36]:
print(output_summary)

The speaker shares three stories from his life during a commencement speech. The first story is about connecting the dots and dropping out of college, leading to unexpected opportunities. The second story is about love and loss, including being fired from the company he co-founded. The third story is about facing death and the importance of following one's heart. The speaker encourages the audience to stay hungry and stay foolish as they begin anew.


### Summarizing Large Documents Using map_reduce

In [37]:
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [38]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [39]:
llm.get_num_tokens(text)

2653

In [40]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [41]:
len(chunks)

2

In [44]:
chain = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)
output_summary = chain.run(chunks)

In [45]:
print(output_summary)

Steve Jobs shares three stories from his life in his commencement speech, highlighting the importance of following your passion, not settling, and living each day as if it were your last. He encourages the audience to stay hungry and foolish in pursuing their dreams, reflect on the inevitability of death, and stay curious and open-minded.


### map_reduce wich Custom Prompts

In [46]:
map_prompt = '''
Write a concise summary of the following:
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [47]:
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=['text'])

In [51]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.invoke(chunks)

In [52]:
print(output['output_text'])

Title: Embracing Life's Challenges and Following Your Passion

Introduction:
The text discusses the importance of following one's passion, facing challenges, and embracing the inevitability of death through three stories shared by a speaker during a commencement speech.

Key Points:
- Importance of connecting the dots in life and finding love and purpose in work
- Emphasis on living each day as if it were the last
- Impact of battling cancer on the speaker's perspective on life
- Encouragement to stay hungry and foolish as one embarks on new beginnings

Conclusion:
The text highlights the significance of living authentically, following intuition, and embracing life's challenges with passion and purpose.


### Summarizing Using the refine Chain

In [54]:
from langchain_openai import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

In [58]:
loader = PyPDFLoader('files/attention_is_all_you_need.pdf')
data = loader.load()

In [59]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [60]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [61]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)
output_summary = chain.invoke(chunks)



[1m> Entering new RefineDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Exp

In [63]:
print(output_summary['output_text'])

The existing summary of the paper "Attention Is All You Need" provides a comprehensive overview of the Transformer model, its architecture, advantages over existing models, training regime, performance in translation tasks, and generalization to other tasks like English constituency parsing. The paper introduces key concepts such as Scaled Dot-Product Attention, Multi-Head Attention, and Position-wise Feed-Forward Networks to enhance the efficiency and performance of the attention mechanism in the Transformer model. It also discusses the computational complexity, parallelizability, and path lengths of self-attention layers compared to recurrent and convolutional layers, highlighting the advantages of self-attention in learning long-range dependencies in sequence transduction tasks. The Transformer model achieves state-of-the-art results in translation tasks, outperforming previous models with faster training and better BLEU scores. The paper also explores model variations, the impact o

### refine With Custom Prompts

In [64]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:"""
initial_prompt = PromptTemplate(template=prompt_template, input_variables=['text'])

refine_template = '''
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    ------------
    {text}
    ------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
    
'''
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)


In [65]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
    
)
output_summary = chain.invoke(chunks)

In [66]:
print(output_summary['output_text'])

Introduction:
The Transformer model has revolutionized sequence modeling and transduction tasks, showcasing exceptional performance in capturing complex relationships within sequences. With its attention mechanism allowing for the capture of long-distance dependencies, the Transformer has proven its potential in natural language processing and machine translation tasks.

Key Points:
- The Transformer model, particularly in a 4-layer configuration with dmodel = 1024, has shown remarkable results in training on the Wall Street Journal portion of the Penn Treebank and in semi-supervised settings with larger corpora.
- Its success in translation tasks surpasses previous models, outperforming traditional RNN sequence-to-sequence models and the Berkeley Parser with minimal task-specific tuning.
- State-of-the-art results have been achieved by the Transformer in English-to-German and English-to-French translation tasks, highlighting its efficiency and effectiveness in various applications.
- 

### Summarizing Using LangChain Agents

In [67]:
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [68]:
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
wikipedia = WikipediaAPIWrapper()

In [69]:
tools = [
    Tool(
        name="Wikipedia", 
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a single topic"
    )
]

In [70]:
agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

  warn_deprecated(


In [71]:
output = agent_executor.invoke('Can you please provide a short summary of George Washington?')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI should look up George Washington on Wikipedia to get a brief summary of his life and accomplishments.
Action: Wikipedia
Action Input: George Washington[0m
Observation: [36;1m[1;3mPage: George Washington
Summary: George Washington (February 22, 1732 – December 14, 1799) was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the U.S. federal government. Washington has thus become commonly known as the "Father of his Country".
Washington's first public office, from 1749 to 1750, was as surveyor of Culpeper County in the Colony o

In [72]:
print(output)

{'input': 'Can you please provide a short summary of George Washington?', 'output': 'George Washington was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797.'}
