### This code is based on an online course by Andrei Dumitrescu (Crystal Mind Academy)

#### This code summarizes a given text using LangChain and OpenAI

In [1]:
import os
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv(), override=True)

True

#### 1. Basic Prompt

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [3]:
text = '''Qiskit is open-source software for working with quantum computers at the level of circuits, \
pulses, and algorithms.
The central goal of Qiskit is to build a software stack that makes it easy for anyone to use quantum \
computers, regardless of their skill level or area of interest; Qiskit allows one to easily design \
experiments and applications and run them on real quantum computers or classical simulators. \
Qiskit is already in use around the world by beginners, hobbyists, educators, researchers, and commercial companies.'''
messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and crisp summary of the following text: {text}')
]

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [4]:
llm.get_num_tokens(text)

104

In [5]:
summary_output = llm(messages)

print(summary_output.content)

Qiskit is an open-source software that enables users to work with quantum computers at various levels. Its main objective is to create a user-friendly software stack that allows anyone, regardless of their expertise or field, to utilize quantum computers. Qiskit facilitates the design and execution of experiments and applications on real quantum computers or classical simulators. It is widely used by beginners, hobbyists, educators, researchers, and commercial companies globally.


#### 2. Summarizing using Prompt Templates

In [6]:
from langchain import PromptTemplate
from langchain.chains import LLMChain

In [7]:
template = ''' 
Write a brief summary of the following text:
TEXT: `{text}`
Translate the summary to {language}.'''

prompt = PromptTemplate(
    input_variables=['text','language'],
    template=template
)

In [8]:
llm.get_num_tokens(prompt.format(text=text, language="English"))

124

In [9]:
chain = LLMChain(llm=llm, prompt=prompt)
summary = chain.run({'text':text, 'language':'dutch'})

print(summary)

Qiskit is open-source software waarmee je kunt werken met quantumcomputers op het niveau van circuits, pulsen en algoritmes. Het belangrijkste doel van Qiskit is het bouwen van een softwarestack die het voor iedereen gemakkelijk maakt om quantumcomputers te gebruiken, ongeacht hun vaardigheidsniveau of interessegebied. Met Qiskit kun je eenvoudig experimenten en toepassingen ontwerpen en ze uitvoeren op echte quantumcomputers of klassieke simulators. Qiskit wordt wereldwijd al gebruikt door beginners, hobbyisten, docenten, onderzoekers en commerciële bedrijven.


#### 3. Summarizing using StuffDocumentsChain

In [11]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

In [12]:
with open('gc.txt') as f:
    text = f.read()

docs = [Document(page_content=text)]
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [14]:
template = '''Write a concise and short summary of the following text.
TEXT: `{text}`'''

prompt = PromptTemplate(input_variables=['text'], template=template)

In [15]:
chain = load_summarize_chain(llm, chain_type='stuff', prompt=prompt, verbose=False)

output_summary = chain.run(docs)

In [16]:
print(output_summary)

Ice cores drilled from a glacier in the Himalayan Mountains have provided a detailed record of the Earth's climate over the past 1,000 years. The analysis of the ice shows that the last decade and the last 50 years were the warmest in 1,000 years. The ice cores also reveal at least eight major droughts caused by a failure of the South Asian Monsoon, including a catastrophic seven-year-long dry spell that resulted in the deaths of over 600,000 people. The research suggests that human activity is at least partially responsible for the warming and climate changes observed.
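The `stuff` chain type is the simplest strategy: every document is concatenated and placed into a single prompt, so it only works when the combined text fits in the model's context window. The sketch below illustrates that idea in plain Python. `stuff_prompt` is a hypothetical helper written for this note, not a LangChain function, and the blank-line separator is an assumption.

```python
from typing import List

TEMPLATE = "Write a concise and short summary of the following text.\nTEXT: `{text}`"


def stuff_prompt(page_contents: List[str], template: str = TEMPLATE,
                 separator: str = "\n\n") -> str:
    """Join every document into one string and fill a single prompt with it."""
    combined = separator.join(page_contents)
    return template.format(text=combined)


# Two toy documents end up in one prompt, which would be sent in one LLM call.
prompt_text = stuff_prompt(["First document.", "Second document."])
print(prompt_text)
```

Because everything goes into one request, this approach costs a single call but fails once the documents exceed the context limit, which is what the chunk-based strategies in the later sections address.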


#### 4. Summarizing Large Documents using map_reduce

In [17]:
from langchain import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [18]:
with open('gc.txt') as f:
    text = f.read()

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [19]:
llm.get_num_tokens(text)

1400

In [20]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])
len(chunks)

2
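Note that `chunk_size` is measured in characters by default, not tokens, which is why a text of about 1400 tokens is split into two chunks at `chunk_size=5000`. The following is a simplified sketch of splitting with overlap using fixed-size character windows. It is not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and `split_with_overlap` is a hypothetical name.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(pieces)
```

Each piece repeats the last two characters of the previous one, so content that straddles a boundary still appears intact in at least one chunk.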

In [21]:
chain = load_summarize_chain(llm, chain_type='map_reduce', verbose=False)
output_summary = chain.run(chunks)
print(output_summary)

Ice cores from a glacier in the Himalayan Mountains show that the last decade and the last 50 years were the warmest in the past 1,000 years. The cores also reveal evidence of major droughts caused by a failure of the South Asian Monsoon, including a seven-year dry spell that resulted in over 600,000 deaths. Human activity is believed to have contributed to the warming, and the research highlights the potential for future disruptions due to climate change.
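Conceptually, `map_reduce` summarizes each chunk independently (the map step) and then summarizes the collected partial summaries (the reduce step). The sketch below shows that control flow with a stand-in summarizer instead of an LLM. `map_reduce_summarize` and `first_sentence` are hypothetical names, and joining partial summaries with a space is an assumption.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk on its own (map), then summarize the joined results (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    combined = " ".join(partial_summaries)
    return summarize(combined)                                  # reduce step


def first_sentence(text: str) -> str:
    """Stand-in for an LLM call: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


demo_chunks = [
    "Ice cores record climate. They span 1,000 years.",
    "Droughts were found. Monsoons failed.",
]
result = map_reduce_summarize(demo_chunks, first_sentence)
print(result)
```

With N chunks this pattern makes N + 1 summarizer calls, so two chunks already mean three requests.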


#### 5. Map_reduce with Custom Prompts

In [22]:
map_prompt = '''
Write a brief summary of the following:
TEXT: `{text}`
CONCISE SUMMARY:'''

map_prompt_template = PromptTemplate(input_variables=['text'], template=map_prompt)

In [23]:
combine_prompt = '''
Write a concise summary of the following text that covers the key points.
Add a title to the summary. Start your summary with an INTRODUCTION that gives an overview of the topic followed
by BULLET POINTS and end the summary with a CONCLUSION remark.
TEXT: `{text}`'''

combine_prompt_template = PromptTemplate(input_variables=['text'], template=combine_prompt)

In [24]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False
)

output_summary = summary_chain.run(chunks)

print(output_summary)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-5ZMsMU0OykrDIa8DDEpvJrpf on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..

Title: Ice cores reveal warming trends and droughts in the Himalayas

Introduction:
Ice cores drilled from a glacier in the Himalayan Mountains provide a detailed climate record of the past 1,000 years. The analysis of these ice cores reveals warming trends and major droughts, with evidence suggesting human activity's contribution to recent warming.

Key Points:
- The last decade and the last 50 years were the warmest in the past 1,000 years, according to the analysis of the ice cores.
- At least eight major droughts caused by a failure of the South Asian Monsoon were identified in the ice cores, including a catastrophic seven-year-long dry spell resulting in over 600,000 deaths.
- The research, supported by the National Science Foundation, suggests that human activity has contributed to the observed warming in the late 20th century.
- The ice core record shows instances of monsoon failures and droughts, with the most devastating one occurring in 1790. The cause of these failures remai
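The `Retrying ...` message in the output above shows the client waiting and retrying after a `RateLimitError` (the account was limited to 3 requests per minute). A minimal sketch of retrying with exponential backoff follows. `RateLimitError` here is a hypothetical stand-in class, `call_with_backoff` is not a LangChain or OpenAI function, and the 4-second starting delay is only borrowed from the log line.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the provider's rate-limit exception."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 4.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, doubling the wait after each RateLimitError, up to max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay *= 2
    raise RuntimeError("unreachable")


# Demo: a call that fails twice, then succeeds. Waits are recorded, not slept.
attempts = {"count": 0}
waits: List[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("Limit: 3 / min")
    return "summary text"


outcome = call_with_backoff(flaky_call, sleep=waits.append)
print(outcome, waits)
```

Passing `sleep` as a parameter keeps the demo instant. In real use the default `time.sleep` would apply.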

#### 6. Summarizing using Refine Chain

In [39]:
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter


In [46]:
with open('Relevance.txt', encoding='utf-8') as f:
    text = f.read()

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [47]:
llm.get_num_tokens(text)

7232

In [58]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=15000, chunk_overlap=50)
chunks = text_splitter.create_documents([text])
len(chunks)

3

In [59]:
chain = load_summarize_chain(llm=llm, chain_type='refine', verbose=False)

output_summary = chain.run(chunks)

In [60]:
print(output_summary)

This paper proposes a new performance metric called Relevance Score (RS) for evaluating supervised learning algorithms in machine learning. The authors argue that the commonly used Classification Accuracy (CA) metric is not suitable for certain applications where there is randomness in the output and multiple acceptable outcomes. The Relevance Score is based on probabilistic distances between predicted and actual outcomes and aims to measure how relevant the predicted outcome is for a given observed context. The authors evaluate the proposed metric on a dataset from an intelligent lighting pilot installation and compare it to the CA metric, finding that the Relevance Score is more appropriate for this type of application. The authors also investigate the influence of different parameters on the Relevance Score and test it on a dataset with randomly generated output. Overall, the results suggest that the Relevance Score is a better metric for evaluating machine learning algorithms in ap
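The `refine` chain type works sequentially: it summarizes the first chunk, then passes the running summary together with each following chunk back to the model to update it. The sketch below shows that loop with stub functions in place of LLM calls. `refine_summarize`, `stub_initial` and `stub_refine` are hypothetical names used only for illustration.

```python
from typing import Callable, List


def refine_summarize(chunks: List[str],
                     initial: Callable[[str], str],
                     refine: Callable[[str, str], str]) -> str:
    """Summarize the first chunk, then update that summary once per remaining chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = initial(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)
    return summary


def stub_initial(text: str) -> str:
    """Stand-in for the first LLM call."""
    return text.upper()


def stub_refine(existing: str, text: str) -> str:
    """Stand-in for the LLM call that merges new context into the summary."""
    return f"{existing} + {text.upper()}"


refined = refine_summarize(["intro", "method", "results"], stub_initial, stub_refine)
print(refined)
```

Unlike map_reduce, each step depends on the previous one, so the calls cannot run in parallel, but later chunks are summarized with earlier context available.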

#### 7. Refine with Custom Prompts

In [66]:
template =  ''' Write a concise summary of the following extracting the important information:
TEXT: `{text}`
CONCISE SUMMARY:'''
initial_prompt = PromptTemplate(input_variables=['text'], template=template)

refine_template = ''' Your job is to produce a final summary. I have provided an existing summary up to a certain point: {existing_answer}.
Please refine the existing summary with more context available below.
----------------------------
{text}
----------------------------
Start the final summary with an INTRODUCTION, which gives an overview of the topic followed by BULLET points and end the summary with CONCLUSION.'''

refine_prompt = PromptTemplate(input_variables=['existing_answer','text'], template=refine_template)


In [67]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    verbose=False
)

output_summary = chain.run(chunks)

In [68]:
print(output_summary)

Introduction:
The paper discusses the importance of choosing the right performance metric for evaluating machine learning algorithms in different application domains. The authors propose a new metric called Relevance Score (RS) that is based on probabilistic distances among different outcomes. They evaluate the proposed metric on a dataset from an intelligent lighting pilot installation and compare it to the commonly used Classification Accuracy metric. The results show that the Relevance Score is more appropriate for certain applications.

Bullet points:
- The authors introduce the concept of Relevance Score (RS) as a new performance metric for evaluating machine learning algorithms.
- RS is based on probabilistic distances among different outcomes and takes into account the relevance of the predicted outcomes.
- The authors evaluate the RS metric on a dataset from an intelligent lighting pilot installation.
- They compare the performance of RS with the commonly used Classification Ac

#### 8. Summarizing with LangChain Agents

In [69]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

In [70]:
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)
wiki = WikipediaAPIWrapper()

In [73]:
tools = [
    Tool(
        name="Wikipedia",
        func=wiki.run,
        description="Useful when you need to get information from Wikipedia about a topic"
    )
]

agent_executor = initialize_agent(tools=tools, llm=llm, agent='zero-shot-react-description', verbose=True)


In [74]:
output_summary = agent_executor.run("Please provide a short summary of Global Warming")



> Entering new AgentExecutor chain...
I should look up the definition and causes of global warming
Action: Wikipedia
Action Input: Global warming
Observation: Page: Climate change
Summary: In common usage, climate change describes global warming—the ongoing increase in global average temperature—and its effects on Earth's climate system. Climate change in a broader sense also includes previous long-term changes to Earth's climate. The current rise in global average temperature is more rapid than previous changes, and is primarily caused by humans burning fossil fuels. Fossil fuel use, deforestation, and some agricultural and industrial practices increase greenhouse gases, notably carbon dioxide and methane. Greenhouse gases absorb some of the heat that the Earth radiates after it warms from sunlight. Larger amounts of these gases trap more heat in Earth's lower atmosphere, causing global warming.
Climate change is causing a range of increasing im

In [75]:
print(output_summary)

Global warming refers to the ongoing increase in global average temperature caused primarily by humans burning fossil fuels. It is leading to climate change, which includes impacts such as expanding deserts, heat waves, wildfires, melting permafrost, glacial retreat, sea ice loss, intense storms, droughts, and other weather extremes. Climate change threatens people with increased flooding, extreme heat, increased food and water scarcity, more disease, and economic loss. It is considered the greatest threat to global health in the 21st century. Efforts to reduce emissions and limit warming are necessary to mitigate these impacts.
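The verbose trace of the agent run shows the pattern behind `zero-shot-react-description`: the model writes a thought, names a tool under `Action:`, gives its argument under `Action Input:`, and receives the tool result as an `Observation:`. The sketch below is a simplified illustration of one such step. It is not LangChain's actual output parser; `parse_action` and `run_step` are hypothetical names, and the Wikipedia tool is replaced by a stub.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Matches "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_action(llm_text: str) -> Optional[Tuple[str, str]]:
    """Extract (tool name, tool input) from the model's text, or None if absent."""
    match = ACTION_PATTERN.search(llm_text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def run_step(llm_text: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the requested tool and format its result as an observation."""
    parsed = parse_action(llm_text)
    if parsed is None:
        return "No action found"
    name, tool_input = parsed
    if name not in tools:
        return f"Unknown tool: {name}"
    return f"Observation: {tools[name](tool_input)}"


llm_text = (
    "I should look up the definition and causes of global warming\n"
    "Action: Wikipedia\n"
    "Action Input: Global warming"
)
stub_tools = {"Wikipedia": lambda query: f"(stub page for {query})"}
observation = run_step(llm_text, stub_tools)
print(observation)
```

In a full agent loop the observation would be appended to the prompt and the model called again until it produces a final answer.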
