#### Summarizing with Basic Prompt  
This is only used for small text as it's not scalable to used it for documents with large token

In [36]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)

True

In [37]:
from langchain_openai import ChatOpenAI
from langchain.schema import(
    HumanMessage,
    SystemMessage
)

In [38]:
text = '''
Oversharing can be obnoxious—but worse than that, oversharing with data is expensive. It takes time to respond to a request, and even more time to filter data down to just the pieces the request asked for.

Unintuitive or not, this is how a lot of data exchange over the internet works. Information about a thing is stored in a location that we can access using a specific link. We can send it a request asking for a single detail, but often it responds with way more information than we want.

GraphQL lets us approach requests for data in a more natural way. When we have an understanding of what kinds of data are available to us, we can build a document called a schema.

The schema acts as a kind of blueprint for our requests. If you imagine a whole buffet table of data, the schema is like the menu distilling down exactly what each item is, and what you can expect when you ask for it. We consult the schema when building requests because it gives us all the pieces we need to build reusable, recombinable, and (most importantly!) precise requests. And the types and fields we put into our schema make it all possibl
'''

messages = [
    SystemMessage(content='You are an expert copy writer with expertize in summarizing document'),
    HumanMessage(content=f'Please provide a short ans concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)




In [39]:
llm.get_num_tokens(text)

242

In [40]:
summary_ouput = llm(messages)

In [41]:
print(summary_ouput.content)

Summary:
Oversharing data can be costly and time-consuming. Traditional data exchange methods often provide more information than needed. GraphQL offers a more efficient approach by using schemas to specify exactly what data is requested, leading to precise and reusable requests.


#### Summarizing Using Prompt Templates   

This is to be used in a scenario where the text and it summary total length is lower than the model's maximum allowed tokens

In [42]:
from langchain import PromptTemplate
from langchain.chains  import LLMChain

In [43]:
template = """
Write a concise and short summary of the following text:
TEXT: \n {text}
Translate the summary to {language}
"""

prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [44]:
llm.get_num_tokens(prompt.format(text=text, language='Yoruba'))

264

In [45]:
chain = LLMChain(
    llm=llm, prompt=prompt
)

summary = chain.invoke({'text':text, 'language': 'English'})

In [46]:
print(summary)

{'text': 'Oversharing data can be costly and time-consuming. Traditional data exchange methods often provide more information than necessary. GraphQL allows for more efficient and precise data requests by using a schema as a blueprint. This schema helps in creating reusable and accurate data requests.', 'language': 'English'}


#### Summarizing using StuffDocumentChain  
- Pros
  - Make a single call to the llm
  - When generating text the llm has access to all of the data at once
- Cons
  -  LLMs has threshold for content length and for large document it will result in prompt that is larger than the content length, therefore down side it only works for smaller document

In [47]:
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI
from  langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [48]:
with open('../files/sj.txt') as f:
    text = f.read()
    
# text

In [49]:
docs = [Document(page_content=text)]
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [50]:
template = """
Write a concise and short summary of the following text:
TEXT: \n {text}
"""

prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [51]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)

summary = chain.invoke(docs)

KeyboardInterrupt: 

In [None]:
print(summary['output_text'])

The speaker, who never graduated from college, shares three stories from his life at a university commencement. He talks about dropping out of college, being fired from the company he co-founded, and facing a life-threatening illness. Through these experiences, he emphasizes the importance of following one's passion, trusting in one's intuition, and living each day as if it were the last. He encourages the graduates to stay hungry and stay foolish as they embark on their own journeys.


#### Summarizing Large Documents Using map_reduce
The MapReduce method implements a multi-stage summarization.   It is a technique for summarizing large pieces of text by first summarizing smaller chunks of text and then combining those summaries into a single summary.   

In LangChain, you can use MapReduceDocumentsChain as part of the    load_summarize_chain method. What you need to do is setting map_reduce as chain_type of your chain.

https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/language/use-cases/document-summarization/summarization_large_documents_langchain.ipynb#scrollTo=RM3V1JARZ9-k

In [None]:
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
with open('../files/sj.txt') as f:
    text = f.read()
    
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
llm.get_num_tokens(text)

2653

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=20)
chunks = text_splitter.create_documents([text])
len(chunks)

2

In [None]:
chain  = load_summarize_chain(
    llm,
    chain_type='map_reduce',
    verbose=False
)

summary = chain.invoke(chunks)


Steve Jobs delivers a commencement speech where he shares three stories from his life: dropping out of college and the influence it had on the design of the Macintosh computer, being fired from Apple and finding success with NeXT and Pixar, and facing death after being diagnosed with cancer. He encourages the audience to live authentically, follow their intuition, and stay hungry and foolish in pursuing their dreams, referencing The Whole Earth Catalog's message to stay curious and open-minded.


In [None]:
print(summary['output_text'])

#### Map_reduce with Custom Prompts

In [None]:
map_prompt = '''
Write a short and concise summary of the following:
Text: '{text}'
CONCISE SUMMARY:
'''
map_prompt_template = PromptTemplate(
    input_variables=['text'],
    template=map_prompt
)

In [None]:
combine_prompt = '''
Write a shot and consice summary of the following text that covers the key points.
Add a title to the summary.
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the 
topic FOLLOWED by BULLET POINTS if possible AND end the summary with a CONCLUSION PHASE.
TExt: '{text}'
'''

combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=['text']
)

In [None]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt= map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)

output_summary = summary_chain.invoke(chunks)


In [None]:
print(output_summary['output_text'])

Title: Embracing Change and Following Your Passion

Introduction:
The text discusses the importance of following one's passion and embracing change, as shared by a speaker at a university commencement. The speaker, who never graduated from college, shares three impactful stories from his life that highlight the significance of living authentically and trusting one's instincts.

Key Points:
- Speaker shares personal stories of dropping out of college, finding love and success after being fired from Apple, and facing a cancer diagnosis
- Emphasizes the importance of following one's passion, embracing change, and living each day to the fullest
- Encourages the audience to trust their instincts, pursue what they love, and not settle for anything less
- References The Whole Earth Catalog and encourages readers to "Stay Hungry. Stay Foolish."

Conclusion:
The text serves as a reminder to embrace change, follow one's passion, and live authentically, as the speaker's stories highlight the tran

#### Summarization using Refine Chain   

- Pros
    - Uses more relevant context(better summarization)
    - less lossy than map_reduce
-  Cons
    - It requires many calls to the LLM
    - The calls are not independent and can not be parallelized


In [66]:
from langchain import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader, UnstructuredPDFLoader

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

True

In [56]:
pip install unstructured==0.5.6 -q

Note: you may need to restart the kernel to use updated packages.


In [65]:
pip install 'unstructured[local-inference]'

Defaulting to user installation because normal site-packages is not writeable
Collecting unstructured-inference~=0.2.4 (from unstructured[local-inference])
  Using cached unstructured_inference-0.2.11.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25ldone
Collecting layoutparser[layoutmodels,tesseract] (from unstructured-inference~=0.2.4->unstructured[local-inference])
  Using cached layoutparser-0.3.4-py3-none-any.whl.metadata (7.7 kB)
Collecting python-multipart (from unstructured-inference~=0.2.4->unstructured[local-inference])
  Using cached python_multipart-0.0.9-py3-none-any.whl.metadata (2.5 kB)
Collecting opencv-python==4.6.0.66 (from unstructured-inference~=0.2.4->unstructured[local-inference])
  Using cached opencv_python-4.6.0.66-cp37-abi3-macosx_11_0_arm64.whl.metadata (18 kB)
Collecting transformers (from unstructured-inference~=0.2.4->unstructured[local-inference])
  Using cached transformers-4.39.0-py3-none-any.whl.metadata (134 kB)
Collecting scipy (from layoutpa

In [69]:
pip install detectron2  -q

[31mERROR: Could not find a version that satisfies the requirement detectron2 (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for detectron2[0m[31m
[0mNote: you may need to restart the kernel to use updated packages.


In [73]:
loader = UnstructuredPDFLoader('../files/attention_is_all_you_need.pdf')
data = loader.load()

detectron2 is not installed. Cannot use the hi_res partitioning strategy. Falling back to partitioning with the fast strategy.


In [71]:
print(data[0].page_content)

7

1

0

2

c

e

D

6

]

L

C

.

s

c

[

5

v

2

6

7

3

0

.

6

0

7

1

:

v

i

X

r

a

Attention Is All You Need

Ashish Vaswani∗

Google Brain

avaswani@google.com

Noam Shazeer∗

Google Brain

noam@google.com

Niki Parmar∗

Google Research

nikip@google.com

Jakob Uszkoreit∗

Google Research

usz@google.com

Llion Jones∗

Google Research

llion@google.com

Aidan N. Gomez∗ †

University of Toronto

aidan@cs.toronto.edu

Łukasz Kaiser∗

Google Brain

lukaszkaiser@google.com

Illia Polosukhin∗ ‡

illia.polosukhin@gmail.com

Abstract

The dominant sequence transduction models are based on complex recurrent or

convolutional neural networks that include an encoder and a decoder. The best

performing models also connect the encoder and decoder through an attention

mechanism. We propose a new simple network architecture, the Transformer,

based solely on attention mechanisms, dispensing with recurrence and convolutions

entirely. Experiments on two machine translation tasks sho

In [74]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)

In [76]:
# len(chunks)
llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')

In [77]:
def print_embedding_cost(texts):
    import tiktoken
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum([len(enc.encode(page.page_content)) for page in texts])
    print(f'Total Tokens: {total_tokens}')
    print(f'Embedding Cost in USD: {total_tokens / 1000 * 0.0004:.6f}')

In [78]:
print_embedding_cost(chunks)

Total Tokens: 10122
Embedding Cost in USD: 0.004049


In [79]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    verbose=False
)

output_summary = chain.invoke(chunks)

In [80]:
print(output_summary['output_text'])

The paper "Attention Is All You Need" introduces the Transformer model, which relies solely on attention mechanisms for sequence transduction tasks, achieving superior performance in machine translation tasks. The model uses Scaled Dot-Product Attention and Multi-Head Attention for input and output sequences, outperforming existing models in terms of parallelizability and training time. The study discusses the effectiveness of attention functions, multi-head attention, and self-attention layers in the Transformer model, showcasing its success in various tasks. The researchers plan to extend the model to handle different input and output modalities and provide code for training and evaluation.


#### refine with Custome Prompts

In [81]:
prompt_template = '''
Write a short and concise summary of the following:
Text: '{text}'
CONCISE SUMMARY:
'''
initial_prompt = PromptTemplate(
    input_variables=['text'],
    template=prompt_template
)

refine_template = '''
Your job is to produce a final summary.
I have provided an existing summary up to a certain point: {existing_answer}.
Please refine the existing summary with some more context below.
----------------------
{text}
-----------------------
Start your summary with an INTRODUCTION PARAGRAPH that gives an overview of the 
topic FOLLOWED by BULLET POINTS if possible AND end the summary with a CONCLUSION PHRASE.
'''

refine_prompt = PromptTemplate(template=refine_template, input_variables=['existing_answer', 'text'])


In [83]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True
)

output_summary = chain.invoke(chunks)


In [84]:
print(output_summary['output_text'])

The Transformer model architecture has revolutionized sequence transduction tasks by utilizing attention mechanisms to achieve superior results in machine translation and other applications. By eliminating the need for recurrent or convolutional neural networks, the Transformer offers advantages in computational complexity, parallelization, and long-range dependency learning. Key components such as multi-head attention, position-wise feed-forward networks, embeddings, and softmax functions contribute to its exceptional performance in translation quality and generalization to tasks like English constituency parsing.

- Additive and dot-product attention functions are commonly used, with dot-product attention being faster and more space-efficient.
- Multi-Head Attention enables joint attention to information from different representation subspaces, enhancing overall performance.
- Position-wise Feed-Forward Networks are applied in each layer for comprehensive information processing.
- Th

#### Summarization with Langchain Agents

In [91]:
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [92]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)

True

In [97]:
from langchain.tools import WikipediaQueryRun

In [98]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
wikipedia = WikipediaAPIWrapper()
api_wrapper = WikipediaAPIWrapper()
wikipedia = WikipediaQueryRun(api_wrapper=api_wrapper)
wikipedia_tool = Tool(
    name='Wikipedia',
    func=wikipedia.run,
    description='Useful for when you need to look up a topic, country, or prson on Wikipedia.'
)


In [94]:
tools = [wikipedia_tool]

In [102]:
agent_executor = initialize_agent(tools, llm, agent='chat-zero-shot-react-description', verbose=True)

In [103]:
output = agent_executor.run('Can you please provide a short of Geoarge Washington ')



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I should use Wikipedia to find a short summary of George Washington.

Action:
```
{
  "action": "wikipedia",
  "action_input": "George Washington"
}
```[0m
Observation: [36;1m[1;3mPage: George Washington
Summary: George Washington (February 22, 1732 – December 14, 1799) was an American Founding Father, military officer, and politician who served as the first president of the United States from 1789 to 1797. Appointed by the Second Continental Congress as commander of the Continental Army in 1775, Washington led Patriot forces to victory in the American Revolutionary War and then served as president of the Constitutional Convention in 1787, which drafted and ratified the Constitution of the United States and established the U.S. federal government. Washington has thus become commonly known as the "Father of his Country".
Washington's first public office, from 1749 to 1750, was as surveyor of Culpeper County in the 