# Project: Summarization

In [1]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

## Basic Prompt

In [2]:
from langchain_openai import ChatOpenAI
from langchain.schema import(
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [3]:
text= r"""
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability \
of AI hardware and extensibility of AI models.
Mojo is a new programming language that bridges the gap between research and production \ 
by combining the best of Python syntax with systems programming and metaprogramming.
With Mojo, you can write portable code that’s faster than C and seamlessly inter-op with the Python ecosystem.
When we started Modular, we had no intention of building a new programming language. \
But as we were building our platform with the intent to unify the world’s ML/AI infrastructure, \
we realized that programming across the entire stack was too complicated. Plus, we were writing a \
lot of MLIR by hand and not having a good time.
And although accelerators are important, one of the most prevalent and sometimes overlooked "accelerators" \
is the host CPU. Nowadays, CPUs have lots of tensor-core-like accelerator blocks and other AI acceleration \
units, but they also serve as the “fallback” for operations that specialized accelerators don’t handle, \
such as data loading, pre- and post-processing, and integrations with foreign systems. \
"""

messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text} ')
]

# temperature=0 makes the output as deterministic as possible, which helps keep the summary close to the source text
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [4]:
llm.get_num_tokens(text=text)

238

In [7]:
summary_output = llm.invoke(messages)

In [9]:
print(summary_output.content)

Mojo is a new programming language that combines Python's usability with C's performance, enhancing AI hardware programmability and model extensibility. It bridges the gap between research and production by offering fast, portable code that integrates seamlessly with the Python ecosystem. Developed by Modular to simplify ML/AI infrastructure, Mojo recognizes the importance of host CPUs in AI acceleration and aims to streamline programming across the entire stack.


## Summarizing Using Prompt Templates

In [10]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

In [12]:
template = '''
Write a concise and short summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.
'''
prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [13]:
llm.get_num_tokens(prompt.format(text=text, language='English'))

259

In [14]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.invoke({'text': text, 'language': 'spanish'})

In [16]:
print(summary)

{'text': 'Mojo es un nuevo lenguaje de programación que combina la usabilidad de Python con el rendimiento de C, desbloqueando una programabilidad sin igual de hardware de IA y la extensibilidad de modelos de IA. Con Mojo, se puede escribir código portátil más rápido que C y interoperar sin problemas con el ecosistema de Python. Este lenguaje surge de la necesidad de simplificar la programación en el campo de la inteligencia artificial y acelerar el desarrollo de infraestructuras de IA.', 'language': 'spanish'}


Basic prompts and prompt templates work well when the text to be summarized is short. The prompt (instructions plus text) and the generated summary together must not exceed the token limit of the model.
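
The token-limit constraint can be checked before a request is sent. The following is a minimal sketch, not part of the original notebook: `fits_in_context` is a hypothetical helper, and both the 4,096-token window and the 256-token reserve for the reply are assumed values that should be replaced with the limits of the model actually in use.

```python
def fits_in_context(prompt_tokens: int,
                    context_window: int = 4096,      # assumed limit, check your model
                    reserved_for_output: int = 256   # assumed room for the summary
                    ) -> bool:
    """Return True if the prompt still leaves room for the reply."""
    return prompt_tokens + reserved_for_output <= context_window


# 259 is the token count reported for the formatted prompt above.
print(fits_in_context(259))   # True
print(fits_in_context(5000))  # False
```

In the notebook, the `prompt_tokens` argument would come from `llm.get_num_tokens(...)`.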

## Summarizing Using StuffDocumentsChain
**Stuff all the text to be summarized into the prompt as context. This only works with small documents: for larger inputs, the prompt would exceed the model's context length.**
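
Conceptually, "stuffing" amounts to concatenating every document and formatting one prompt. The sketch below illustrates that idea in plain Python; it is not LangChain's implementation, and `stuff_prompt`, the blank-line separator, and the two sample documents are assumptions made for illustration.

```python
def stuff_prompt(template: str, documents: list[str], separator: str = "\n\n") -> str:
    """Join all documents and place the result into a single prompt."""
    combined = separator.join(documents)
    return template.format(text=combined)


demo_template = "Write a concise and short summary of the following text.\nTEXT: `{text}`\n"
demo_prompt = stuff_prompt(demo_template, ["First document.", "Second document."])
print(demo_prompt)
```

Because everything ends up in one prompt, the prompt grows with the total size of the input, which is why this approach stops working once the documents no longer fit in the context window.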

In [11]:
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [19]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()
    
docs = [Document(page_content=text)]
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [20]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`
'''
prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [22]:
chain = load_summarize_chain(
    llm=llm, 
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)
output_summary = chain.invoke(docs)

In [23]:
print(output_summary)

{'input_documents': [Document(page_content='I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college. Truth be told, this is the closest I’ve ever gotten to a college graduation. Today I want to tell you three stories from my life. That’s it. No big deal. Just three stories.\n\nThe first story is about connecting the dots.\n\nI dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?\n\nIt started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting lis

## Summarizing Large Document Using map_reduce
This method splits the document into chunks, summarizes each chunk, and then merges the chunk summaries into a final summary.
MapReduce uses two prompts: the first summarizes each chunk of data, and the second combines the chunk summaries into the final summary.
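
The two-prompt flow described above can be sketched without any LLM. This is an illustration under stated assumptions, not the chain's actual code: `map_reduce_summarize` is a hypothetical function, and the two stand-in callables (`first_sentence`, `collapse_whitespace`) merely play the roles that the map prompt and the combine prompt play in a real run.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str],
                         map_fn: Callable[[str], str],
                         combine_fn: Callable[[str], str]) -> str:
    # Map step: summarize each chunk independently of the others.
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    # Reduce step: merge the partial summaries and summarize them once more.
    return combine_fn("\n\n".join(partial_summaries))


# Stand-ins for LLM calls (assumptions for illustration only).
def first_sentence(text: str) -> str:
    return text.split(".")[0].strip() + "."


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


demo_result = map_reduce_summarize(
    ["Alpha is first. More detail.", "Beta is second. Extra."],
    first_sentence,
    collapse_whitespace,
)
print(demo_result)  # Alpha is first. Beta is second.
```

Since the map step treats chunks independently, those calls can run in parallel; the trade-off is that each chunk is summarized without knowledge of the others.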

In [24]:
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [25]:
with open('files/sj.txt', encoding='utf-8') as f:
    text = f.read()
    
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [27]:
# Let's see how many tokens are in the text
llm.get_num_tokens(text)

2653

In [30]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])

In [31]:
len(chunks)

2
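
Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens, which is consistent with a 2,653-token text producing only two chunks at `chunk_size=10000`. The sketch below shows the basic idea of fixed-size chunks that share an overlap. It is a simplification: `split_with_overlap` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it cuts at fixed offsets instead of preferring separators such as paragraph breaks.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into chunks of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last chunk is not just a copy of the overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```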

In [32]:
chain = load_summarize_chain(
    llm=llm, 
    chain_type='map_reduce',
    verbose=False
)
output_summary = chain.invoke(chunks)

In [33]:
print(output_summary)

{'input_documents': [Document(page_content='I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college. Truth be told, this is the closest I’ve ever gotten to a college graduation. Today I want to tell you three stories from my life. That’s it. No big deal. Just three stories.\n\nThe first story is about connecting the dots.\n\nI dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?\n\nIt started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minute that they really wanted a girl. So my parents, who were on a waiting lis

In [34]:
# Let's see the first prompt used for summarizing each chunk of data
chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

In [35]:
# ...and the prompt for combining the chunk summaries
chain.combine_document_chain.llm_chain.prompt.template

'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

## MapReduce with custom prompts

In [36]:
# The first prompt, which summarizes each chunk of data, is called the map prompt
map_prompt = '''
Write a concise and short summary of the following text.
Text: `{text}`
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [37]:
# Let's define the second prompt, which combines the summaries of each chunk of data
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
Text: `{text}`
'''
combine_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=combine_prompt
)

In [39]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)
output = summary_chain.invoke(chunks)

In [41]:
print(output['output_text'])

Title: Steve Jobs' Commencement Speech: Following Your Passion and Trusting Your Intuition

Introduction:
Steve Jobs delivered a commencement speech where he shared three impactful stories from his life, emphasizing the importance of following one's passion and trusting in one's intuition.

Key Points:
- Story of dropping out of college leading to taking a calligraphy class that influenced the design of the Macintosh computer
- Story of being fired from Apple, starting over with NeXT and Pixar, and finding success
- Story of facing death after cancer diagnosis and the importance of living each day to the fullest
- Emphasis on following passion, not settling, and trusting intuition
- Discussion on the inevitability of death and living authentically
- Reference to The Whole Earth Catalog and the encouragement to "Stay Hungry. Stay Foolish."

Conclusion:
Steve Jobs' speech highlights the significance of living authentically, following one's passion, and trusting in one's intuition, urging

## Summarizing Using the `refine` Chain
Step 1: summarize(chunk #1) => summary #1
Step 2: summarize(summary #1 + chunk #2) => summary #2
...
Step n: summarize(summary #n-1 + chunk #n) => final summary
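
The steps above can be sketched as a simple loop. This is a minimal illustration, not the chain's actual implementation: `refine_summarize` is a hypothetical function, and the two lambdas stand in for the LLM calls made with the initial prompt and the refine prompt.

```python
from typing import Callable


def refine_summarize(chunks: list[str],
                     initial_fn: Callable[[str], str],
                     refine_fn: Callable[[str, str], str]) -> str:
    if not chunks:
        raise ValueError("at least one chunk is required")
    # Step 1: summarize the first chunk on its own.
    summary = initial_fn(chunks[0])
    # Steps 2..n: fold each further chunk into the running summary.
    for chunk in chunks[1:]:
        summary = refine_fn(summary, chunk)
    return summary


# Stand-ins for LLM calls (assumptions for illustration only).
demo_summary = refine_summarize(
    ["a", "b", "c"],
    initial_fn=lambda chunk: f"[{chunk}]",
    refine_fn=lambda existing, chunk: f"{existing}+[{chunk}]",
)
print(demo_summary)  # [a]+[b]+[c]
```

Each step depends on the previous summary, so the calls are inherently sequential, in contrast to the independent per-chunk calls of `map_reduce`.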

In [4]:
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredPDFLoader

In [5]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [None]:
# ! pip install unstructured -q
# ! pip uninstall pdf2image -q
# ! pip install pdfminer -q

In [10]:
loader = UnstructuredPDFLoader('files/attention_is_all_you_need.pdf')
data = loader.load()

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/bertan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [11]:
print(data[0].page_content)

7 1 0 2 c e D 6

] L C . s c [

5 v 2 6 7 3 0 . 6 0 7 1 : v i X r a

Attention Is All You Need

Ashish Vaswani∗ Google Brain avaswani@google.com

Noam Shazeer∗ Google Brain noam@google.com

Niki Parmar∗ Google Research nikip@google.com

Jakob Uszkoreit∗ Google Research usz@google.com

Llion Jones∗ Google Research llion@google.com

Aidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu

Łukasz Kaiser∗ Google Brain lukaszkaiser@google.com

Illia Polosukhin∗ ‡ illia.polosukhin@gmail.com

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while bei

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [13]:
len(chunks)

4

In [14]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

In [23]:
def calculate_embedding_cost(texts):
    # Despite the name, this counts tokens with the gpt-3.5-turbo encoding and
    # prices them at the rate hard-coded below (USD 0.002 per 1,000 tokens).
    import tiktoken
    enc = tiktoken.encoding_for_model('gpt-3.5-turbo')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    return total_tokens, total_tokens / 1000 * 0.002

tokens, embedding_cost = calculate_embedding_cost(chunks)
print(f'Total tokens: {tokens}')
print(f'Embedding cost is in USD: {embedding_cost:.6f}')

Total tokens: 9650
Embedding cost is in USD: 0.019300
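
The reported figure follows directly from the token count and the per-1,000-token rate used in the function. As a worked check, assuming only those two numbers (`estimate_cost_usd` is a hypothetical helper, and the rate is the one hard-coded in the notebook, not a current price):

```python
def estimate_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost = (tokens / 1000) * price per 1,000 tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


demo_cost = estimate_cost_usd(9650, 0.002)
print(f"{demo_cost:.6f}")  # 0.019300
```

This covers input tokens for a single pass only; chains such as `refine` send each chunk together with the running summary, so the actual usage is higher.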


In [24]:
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    verbose=False
)
output_summary = chain.invoke(chunks)

In [25]:
print(output_summary)

{'input_documents': [Document(page_content='7 1 0 2 c e D 6\n\n] L C . s c [\n\n5 v 2 6 7 3 0 . 6 0 7 1 : v i X r a\n\nAttention Is All You Need\n\nAshish Vaswani∗ Google Brain avaswani@google.com\n\nNoam Shazeer∗ Google Brain noam@google.com\n\nNiki Parmar∗ Google Research nikip@google.com\n\nJakob Uszkoreit∗ Google Research usz@google.com\n\nLlion Jones∗ Google Research llion@google.com\n\nAidan N. Gomez∗ † University of Toronto aidan@cs.toronto.edu\n\nŁukasz Kaiser∗ Google Brain lukaszkaiser@google.com\n\nIllia Polosukhin∗ ‡ illia.polosukhin@gmail.com\n\nAbstract\n\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine t

## `refine` With Custom Prompts

In [27]:
prompt_template = """Write a concise summary of the following extracting the key information:
Text: `{text}`
CONCISE SUMMARY:"""
initial_prompt = PromptTemplate(
    template=prompt_template,
    input_variables=['text']
)

refine_template = '''
    Your job is to produce a final summary.
    I have provided an existing summary up to a certain point: {existing_answer}.
    Please refine the existing summary with some more context below.
    ------------
    {text}
    ------------
    Start the final summary with an INTRODUCTION PARAGRAPH that gives an overview of the topic FOLLOWED
    by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
'''
refine_prompt = PromptTemplate(
    template=refine_template,
    input_variables=['existing_answer', 'text']
)

In [28]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=False
)
output_summary = chain.invoke(chunks)

In [30]:
print(output_summary['output_text'])

INTRODUCTION:
The Transformer model architecture has revolutionized machine translation by utilizing attention mechanisms instead of recurrent or convolutional neural networks. This innovative approach has led to superior performance, increased efficiency, and new state-of-the-art scores in translation tasks. The Transformer model can be trained significantly faster than traditional architectures, achieving new state-of-the-art results in translation tasks like English-to-German and English-to-French.

KEY POINTS:
- The Transformer model is based on stacked self-attention and fully connected layers for both the encoder and decoder.
- Multi-Head Attention allows the model to jointly attend to information from different representation subspaces at different positions.
- Position-wise Feed-Forward Networks are applied in each layer of the encoder and decoder.
- The model uses embeddings, Softmax functions, and Positional Encoding for input and output token conversion.
- The training regim

## Summarizing Using LangChain Agents

In [1]:
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [2]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [3]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
wikipedia = WikipediaAPIWrapper()

In [6]:
# Declaration of tools to be used by the agent.
tools = [
    Tool(
        name='Wikipedia',
        func=wikipedia.run,
        description='Useful for looking up information about a topic on Wikipedia.'
    )
]

In [9]:
agent_executor = initialize_agent(
    tools, 
    llm,
    agent='zero-shot-react-description',
    verbose=True
)

In [11]:
output = agent_executor.run('Can you please provide a short summary of George Washington?')



> Entering new AgentExecutor chain...
I should use Wikipedia to find a short summary of George Washington.
Action: Wikipedia
Action Input: George Washington
Observation: Page: George Washington
Summary: George Washington (February 22, 1732 – December 14, 1799) was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the U.S. federal government. Washington has thus been known as the "Father of his Country".
Washington's first public office, from 1749 to 1750, was as surveyor of Culpeper County in the Colony of Virginia. He subsequently received military 

In [12]:
print(output)

George Washington was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797.
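
The verbose log above shows the pattern the agent follows: the model writes an `Action:` line naming a tool and an `Action Input:` line with its argument, and the tool's result is fed back as an `Observation:`. The sketch below imitates one such step in plain Python. It is not LangChain's parser: `parse_action`, the regular expression, and the stub tool are assumptions made for illustration.

```python
import re


def parse_action(llm_output: str):
    """Extract (tool name, tool input) from an 'Action / Action Input' block."""
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


# Stub tool standing in for wikipedia.run (assumption for illustration only).
demo_tools = {"Wikipedia": lambda query: f"(stub) looked up {query}"}

demo_step = (
    "I should use Wikipedia to find a short summary of George Washington.\n"
    "Action: Wikipedia\n"
    "Action Input: George Washington"
)
tool_name, tool_input = parse_action(demo_step)
observation = demo_tools[tool_name](tool_input)
print(observation)  # (stub) looked up George Washington
```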
