#### Summarizing with a Basic Prompt
This approach suits only small texts; it does not scale to documents with a large number of tokens.

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)

True

In [2]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    HumanMessage,
    SystemMessage
)



In [3]:
text = '''
Oversharing can be obnoxious—but worse than that, oversharing with data is expensive. It takes time to respond to a request, and even more time to filter data down to just the pieces the request asked for.

Unintuitive or not, this is how a lot of data exchange over the internet works. Information about a thing is stored in a location that we can access using a specific link. We can send it a request asking for a single detail, but often it responds with way more information than we want.

GraphQL lets us approach requests for data in a more natural way. When we have an understanding of what kinds of data are available to us, we can build a document called a schema.

The schema acts as a kind of blueprint for our requests. If you imagine a whole buffet table of data, the schema is like the menu distilling down exactly what each item is, and what you can expect when you ask for it. We consult the schema when building requests because it gives us all the pieces we need to build reusable, recombinable, and (most importantly!) precise requests. And the types and fields we put into our schema make it all possibl
'''

messages = [
    SystemMessage(content='You are an expert copywriter with expertise in summarizing documents'),
    HumanMessage(content=f'Please provide a short and concise summary of the following text:\n TEXT: {text}')
]

llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)




In [4]:
llm.get_num_tokens(text)

242

In [5]:
# Calling the model object directly is deprecated; use invoke() instead
summary_output = llm.invoke(messages)

In [6]:
print(summary_output.content)

Summary:
Oversharing data can be costly and time-consuming. Traditional data exchange methods often provide more information than needed. GraphQL offers a more efficient approach by allowing users to create a schema that acts as a blueprint for precise data requests, leading to reusable and accurate queries.


#### Summarizing Using Prompt Templates   

Use this approach when the combined length of the text and its summary is below the model's maximum allowed tokens.
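
One way to check this condition before calling the model is to compare the prompt's token count, plus some room reserved for the summary, against the context window. The sketch below is illustrative only: `fits_in_context`, the 4096-token window, and the 256-token summary allowance are assumptions rather than values taken from LangChain or this notebook, so substitute the limit documented for your model.

```python
def fits_in_context(prompt_tokens: int, summary_budget: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved summary tokens fit in the window."""
    return prompt_tokens + summary_budget <= context_window


# Assumed values for illustration; check your model's documented limit.
CONTEXT_WINDOW = 4096   # hypothetical context window, in tokens
SUMMARY_BUDGET = 256    # hypothetical room reserved for the generated summary

# In the notebook, prompt_tokens would come from calling
# llm.get_num_tokens(...) on the fully formatted prompt.
print(fits_in_context(264, SUMMARY_BUDGET, CONTEXT_WINDOW))    # short prompt
print(fits_in_context(5000, SUMMARY_BUDGET, CONTEXT_WINDOW))   # oversized prompt
```

If the check fails, the text has to be shortened or split before it is sent in a single prompt.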

In [7]:
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

In [8]:
template = """
Write a concise and short summary of the following text:
TEXT: \n {text}
Translate the summary to {language}
"""

prompt = PromptTemplate(
    input_variables=['text', 'language'],
    template=template
)

In [10]:
llm.get_num_tokens(prompt.format(text=text, language='Yoruba'))

264

In [13]:
chain = LLMChain(
    llm=llm, prompt=prompt
)

summary = chain.invoke({'text':text, 'language': 'English'})

In [14]:
print(summary)

{'text': 'The text discusses how oversharing data can be costly and time-consuming, and introduces GraphQL as a more efficient way to request and retrieve specific data over the internet. By creating a schema that acts as a blueprint for data requests, users can make precise and reusable requests, avoiding unnecessary information overload.', 'language': 'English'}


#### Summarizing Using StuffDocumentsChain
- Pros
  - Makes a single call to the LLM
  - When generating text, the LLM has access to all of the data at once
- Cons
  - LLMs have a limited context length; for a large document the prompt exceeds that limit, so this approach only works for smaller documents
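
To make this trade-off concrete, here is a plain-Python sketch of what "stuffing" amounts to: every document is joined into one string and placed in a single prompt. The helper `stuff_documents` and the `"\n\n"` separator are illustrative assumptions, not LangChain's actual implementation.

```python
def stuff_documents(pages: list[str], template: str, separator: str = "\n\n") -> str:
    """Join all page contents and insert them into one prompt (a single LLM call)."""
    combined = separator.join(pages)
    return template.format(text=combined)


template = "Write a concise and short summary of the following text:\nTEXT:\n{text}"
pages = ["First chunk of the document.", "Second chunk of the document."]

prompt_text = stuff_documents(pages, template)
print(prompt_text)
# The prompt grows with the total input size, which is why this approach
# only suits documents that fit in the model's context window.
```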

In [3]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain_core.documents import Document

In [4]:
# Read the whole file; an explicit encoding avoids platform-dependent defaults
with open('../files/sj.txt', encoding='utf-8') as f:
    text = f.read()

# text

In [5]:
docs = [Document(page_content=text)]
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

In [6]:
template = """
Write a concise and short summary of the following text:
TEXT: \n {text}
"""

prompt = PromptTemplate(
    input_variables=['text'],
    template=template
)

In [15]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='stuff',
    prompt=prompt,
    verbose=False
)

summary = chain.invoke(docs)

In [16]:
print(summary['output_text'])

The speaker, who never graduated from college, shares three stories from his life at a university commencement. He talks about dropping out of college, being fired from the company he co-founded, and facing a life-threatening illness. Through these experiences, he emphasizes the importance of following one's passion, trusting in one's intuition, and living each day as if it were the last. He encourages the graduates to stay hungry and stay foolish as they embark on their own journeys.
