In [2]:
import os
from dotenv import load_dotenv
# Message classes live in langchain_core; the langchain.schema path is a legacy alias.
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage



In [22]:
speech = """
Deep Neural Networks (DNNs) are powerful models that have achieved excel- lent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT’14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the pas- sive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
1 Introduction
Deep Neural Networks (DNNs) are extremely powerful machine learning models that achieve ex- cellent performance on difficult problems such as speech recognition [13, 7] and visual object recog- nition [19, 6, 21, 20]. DNNs are powerful because they can perform arbitrary parallel computation for a modest number of steps. A surprising example of the power of DNNs is their ability to sort N N -bit numbers using only 2 hidden layers of quadratic size [27]. So, while neural networks are related to conventional statistical models, they learn an intricate computation. Furthermore, large DNNs can be trained with supervised backpropagation whenever the labeled training set has enough information to specify the network’s parameters. Thus, if there exists a parameter setting of a large DNN that achieves good results (for example, because humans can solve the task very rapidly), supervised backpropagation will find these parameters and solve the problem.
Despite their flexibility and power, DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. It is a significant limitation, since many important problems are best expressed with sequences whose lengths are not known a-priori. For example, speech recognition and machine translation are sequential problems. Likewise, ques- tion answering can also be seen as mapping a sequence of words representing the question to a
"""

chat_message = [
    SystemMessage(
        content="You are a helpful assistant that is expert in text summarization."
        ),
    HumanMessage(
        content=f"Please provide the summary of the following text: {speech}"
    )
]

In [23]:
from langchain_groq import ChatGroq

In [24]:
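load_dotenv()  # read GROQ_API_KEY from a local .env file into the environment
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    # Fail early with a clear message instead of a TypeError on os.environ assignment.
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
llm = ChatGroq(
    groq_api_key=groq_api_key,
    model="Llama3-8b-8192",
    streaming=True,
)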

In [25]:
llm.get_num_tokens(speech)

653
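
The token count matters because the prompt and the reply have to share the model's context window. A minimal sketch of that budget check follows. It assumes the `8192` suffix in the model name denotes an 8192-token window; `fits_in_context` and the 1024-token reply reserve are hypothetical choices for illustration, not part of LangChain or Groq.

```python
def fits_in_context(prompt_tokens: int,
                    context_window: int = 8192,   # assumed from the model name
                    reply_reserve: int = 1024) -> bool:  # arbitrary headroom
    """Return True if the prompt leaves at least `reply_reserve` tokens for the reply."""
    return prompt_tokens + reply_reserve <= context_window


print(fits_in_context(653))    # the count reported above
print(fits_in_context(7500))   # a prompt that would crowd out the reply
```

When the check fails, the text has to be split and summarized in pieces rather than sent in one call.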

In [26]:
llm.invoke(chat_message)


AIMessage(content="Here is a summary of the text:\n\nThe paper presents a new approach to sequence learning using deep neural networks (DNNs) that can map sequences to sequences. The method uses two layers of Long Short-Term Memory (LSTM) networks to transform the input sequence into a fixed-dimensional vector and then decode the target sequence from the vector. The approach is tested on an English-to-French translation task and achieves a BLEU score of 34.8 on the entire test set. This is comparable to a phrase-based statistical machine translation (SMT) system, and even better when the LSTM is used to rerank the hypotheses produced by the SMT system. The LSTM also learns sensible phrase and sentence representations that are sensitive to word order and relatively invariant to the active and passive voice. The authors found that reversing the order of the words in the source sentences improved the LSTM's performance, suggesting that the model benefits from short-term dependencies betwe

In [27]:
## Prompt template

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate 

In [28]:
generic_prompt_template = """
Write a summary of the following text:
Text: {text},
Translate the summary to {language}:
"""

In [29]:
prompt = PromptTemplate(
    input_variables=["text", "language"],
    template=generic_prompt_template
)

In [30]:
prompt

PromptTemplate(input_variables=['language', 'text'], input_types={}, partial_variables={}, template='\nWrite a summary of the following text:\nText: {text},\nTranslate the summary to {language}:\n')

In [34]:
complete_prompt = prompt.format(
    text = speech,
    language="hindi",
)

In [35]:
llm.get_num_tokens(complete_prompt)

678
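
For a simple template like this one, `PromptTemplate.format` behaves much like Python's built-in `str.format`: each `{name}` placeholder is replaced by the matching keyword argument. A plain-Python sketch of the same substitution, using a shortened stand-in text (`demo_template` and `sample_text` are illustrative names, not from the notebook):

```python
demo_template = (
    "Write a summary of the following text:\n"
    "Text: {text},\n"
    "Translate the summary to {language}:\n"
)

# Short stand-in for the much longer `speech` string.
sample_text = "DNNs work well on fixed-size inputs but struggle with sequences."

filled = demo_template.format(text=sample_text, language="hindi")
print(filled)
```

The filled string is what actually reaches the model, which is why its token count is slightly higher than that of the raw text.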

In [36]:
llm_chain = LLMChain(llm=llm, prompt=prompt)
summary = llm_chain.run(
    {
        "text":speech,
        "language":"hindi"
    }
)


In [37]:
summary

'Here is the summary of the text in Hindi:\n\nदीप न्यूरल नेटवर्क (DNNs)強력 मॉडल हैं जिन्होंने कठिन सीखने के कार्यों पर शानदार प्रदर्शन किया है। लेकिन DNNs को सीक्वेंस को सीक्वेंस में मैपिंग करने के लिए नहीं किया जा सकता है, जब तक बड़े लेबल्ड ट्रेनिंग सेट उपलब्ध नहीं हैं। इस पेपर में हम एक सामान्य अंत-एंड-एंड सीक्वेंस लर्निंग कि दिशा प्रस्तुत करते हैं, जो सीक्वेंस संरचना के बारे में कुछ भी नहीं मानता है। हमारा मETHOD एक मल्टीलेयर्ड लॉन्ग शार्ट-टर्म मेमोरी (LSTM) है जो प्रवेश सीक्वेंस को एक निर्धारित आयाम के वेक्टर में मैप करता है, और फिर दूसरा गहरा LSTM है जो वेक्टर से लक्ष्य सीक्वेंस को डीकोड करता है। हमारा मुख्य नतजा है कि एक इंग्लिश से फ्रेंच अनुवाद कार्य में LSTM की नतजा एक BLEU स्कोर 34.8 है, जहां LSTM का BLEU स्कोर शब्दावली के बाहर के शब्दों पर पेनल्टी किया गया था। इसके अलावा, LSTM ने लंबे वाक्यों पर कठिनाई नहीं की। इसके लिए एक फेज-आधारित SMT सिस्टम की नतजा एक BLEU स्कोर 33.3 है, जो समान डेटासेट पर है। जब हमने LSTM को aforementioned SMT सिस्टम के 1000 हाइपोटहीज को पुनर्रैंक किया, त

In [40]:
## Stuff Document Chain Text Summarization

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/Users/sarthakagarwal/Dropbox/study material/python/genAI/Udemy GenAI/basic_ann_to_gen_ai/langchain-projects/temp.pdf")
pages = loader.load_and_split()

invalid pdf header: b'Conte'
incorrect startxref pointer(1)
parsing for Object Streams


In [41]:
prompt_template = """
Write a concise and short summary with in 200 words of the following text:
Text: {text},
"""

In [42]:
template = PromptTemplate(
    template=prompt_template,
    input_variables=["text"]
)

In [43]:
from langchain.chains.summarize import load_summarize_chain



In [46]:
chain = load_summarize_chain(
    llm=llm,
    chain_type="stuff",
    verbose=True,
    prompt=template
)
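
The "stuff" strategy is the simplest of the three used in this notebook: all documents are concatenated into a single prompt and the model is called once. A rough sketch of the idea in plain Python, not the library's implementation; `stuff_summarize` and `fake_llm` are hypothetical names, and the blank-line separator is an assumption.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str],
                    llm_call: Callable[[str], str],
                    prompt: str = "Summarize:\n{text}") -> str:
    """Join every document into one prompt and make a single model call."""
    combined = "\n\n".join(docs)
    return llm_call(prompt.format(text=combined))


# Stand-in for a real model: reports how much text it received.
def fake_llm(prompt: str) -> str:
    return f"summary of {len(prompt)} chars"


print(stuff_summarize(["page one", "page two"], fake_llm))
```

This only works while the combined text fits in the context window, which is the limitation the other two chain types address.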

In [48]:
summary = chain.run(
    pages
)



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise and short summary with in 200 words of the following text:
Text: Star Health And Allied Insurance Company Limited
To, IMPORTANT
ADHAR GUPTA    ,
S/O Devendra kumar agarwal Hno.9 Achar jaan
Bijnor
Bijnor Tehsil,Uttar Pradesh-246701
Mobile : 9719213675
Date : 05-May-2024
Dear Customer,
Re:  Health Insurance Policy - 11240279223503
We are extremely thankful to you for your renewal instructions and payment of premium. We enclose the
renewed policy based on our records. We would request you to kindly study the renewed policy carefully and
revert to us if there is any discrepancy to enable us to attend to the same.
Kindly note that the above request is very important and if we do not hear anything from you within
15 days, we would presume that the policy issued by us is in order and the contract is concluded.
We would like to mention that we 

In [50]:
print(summary)

Here is a concise and short summary of the text within 200 words:

Star Health and Allied Insurance Company Limited has issued a renewal policy to Adhar Gupta, with policy number 11240279223503. The policy is a Star Super Surplus (Floater) Insurance Policy with a sum insured of Rs. 25,00,000 and a premium of Rs. 11,894. The policy covers Adhar Gupta, his spouse Shalini Gupta, and their two sons, Divik Gupta and Madhav Gupta. The policy period is from May 8, 2024, to May 7, 2025. The policy includes a hospitalization benefit and covers pre-existing diseases. The policyholder is required to inform the insurance company immediately in case of hospitalization and to provide a copy of the hospital bill. The policy also includes a tax invoice with details of the premium paid. The insurance company has requested the policyholder to review the policy carefully and revert with any discrepancies.


In [52]:
## Map Reduce Summarization Chain

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
).split_documents(pages)
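
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is only a sketch: the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and spaces, so its chunks are usually shorter than `chunk_size` and its boundaries differ. `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str,
                       chunk_size: int = 1000,
                       chunk_overlap: int = 200) -> List[str]:
    """Cut `text` into windows of up to `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbours share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of some repeated tokens.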



In [53]:
text_splitter

[Document(metadata={'producer': 'iText 2.1.6 by 1T3XT; modified using iText® 5.3.2 ©2000-2012 1T3XT BVBA (AGPL-version)', 'creator': 'JasperReports Library version 5.6.0', 'creationdate': '2024-05-05T12:26:01+05:30', 'moddate': '2024-05-05T12:25:58+05:30', 'source': '/Users/sarthakagarwal/Dropbox/study material/python/genAI/Udemy GenAI/basic_ann_to_gen_ai/langchain-projects/temp.pdf', 'total_pages': 6, 'page': 0, 'page_label': '1'}, page_content='Star Health And Allied Insurance Company Limited\nTo, IMPORTANT\nADHAR GUPTA    ,\nS/O Devendra kumar agarwal Hno.9 Achar jaan\nBijnor\nBijnor Tehsil,Uttar Pradesh-246701\nMobile : 9719213675\nDate : 05-May-2024\nDear Customer,\nRe:  Health Insurance Policy - 11240279223503\nWe are extremely thankful to you for your renewal instructions and payment of premium. We enclose the\nrenewed policy based on our records. We would request you to kindly study the renewed policy carefully and\nrevert to us if there is any discrepancy to enable us to atten

In [56]:
prompt_template = """
Write a concise and short summary with in 200 words of the following text:
Text: {text},
"""
template = PromptTemplate(
    template=prompt_template,
    input_variables=["text"]
)
final_prompt = """
Provide the final summary for the text provided in the following points below.
Add a title to the summary.
Points: {text}
"""
final_template = PromptTemplate(
    template=final_prompt,
    input_variables=["text"]
)
chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    verbose=True,
    map_prompt=template,
    combine_prompt=final_template,
)
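
The `map_prompt` and `combine_prompt` arguments correspond to the two phases of this chain: each chunk is summarized independently, then the partial summaries are merged in one final call. A minimal plain-Python sketch of that flow; `map_reduce_summarize` and `fake_llm` are hypothetical, and the sketch skips any extra handling the library may apply when the partial summaries are themselves too long.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    llm_call: Callable[[str], str],
    map_prompt: str = "Summarize:\n{text}",
    combine_prompt: str = "Combine these points into one titled summary:\n{text}",
) -> str:
    """Summarize each chunk on its own (map), then merge the results (reduce)."""
    partials = [llm_call(map_prompt.format(text=chunk)) for chunk in chunks]
    return llm_call(combine_prompt.format(text="\n\n".join(partials)))


# Stand-in for a real model: logs each prompt and returns a numbered tag.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"<summary {len(calls)}>"


result = map_reduce_summarize(["chunk a", "chunk b", "chunk c"], fake_llm)
print(len(calls), result)
```

With N chunks this makes N + 1 model calls. The map calls do not depend on each other, so they could run in parallel.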

In [57]:
summary = chain.run(text_splitter)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise and short summary with in 200 words of the following text:
Text: Star Health And Allied Insurance Company Limited
To, IMPORTANT
ADHAR GUPTA    ,
S/O Devendra kumar agarwal Hno.9 Achar jaan
Bijnor
Bijnor Tehsil,Uttar Pradesh-246701
Mobile : 9719213675
Date : 05-May-2024
Dear Customer,
Re:  Health Insurance Policy - 11240279223503
We are extremely thankful to you for your renewal instructions and payment of premium. We enclose the
renewed policy based on our records. We would request you to kindly study the renewed policy carefully and
revert to us if there is any discrepancy to enable us to attend to the same.
Kindly note that the above request is very important and if we do not hear anything from you within
15 days, we would presume that the policy issued by us is in order and the contract is concluded.
We would like to mention that

Token indices sequence length is longer than the specified maximum sequence length for this model (2712 > 1024). Running this sequence through the model will result in indexing errors



> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Provide the final summary for the text provided in the following points below.
Add a title to the summary.
Points: Here is a concise and short summary of the text within 200 words:

Star Health and Allied Insurance Company Limited has sent a renewal notice to Adhar Gupta for his health insurance policy (11240279223503). The policy has been renewed based on the company's records, and a copy is enclosed. The company requests Adhar to carefully review the renewed policy and report any discrepancies to enable them to address the issues. The company emphasizes the importance of this request and advises that if they do not receive a response within 15 days, they will presume that the policy is in order and the contract is concluded. Additionally, the company has incorporated the name of the intermediary as indicated by Adhar. The company wishes Adhar good health and looks forward to s

In [59]:
print(summary)

**Summary: Star Health and Allied Insurance Company Policy Renewal and Updates**

Star Health and Allied Insurance Company Limited has sent a renewal notice to Adhar Gupta for his health insurance policy (11240279223503). The company requests Adhar to review the renewed policy and report any discrepancies. The company has incorporated the name of the intermediary as indicated by Adhar and wishes him good health. The policy's sum insured is meant to be used until its expiration date, and the policyholder is free to choose their hospital, room rent, and treatment charges. The company has provided various contact details, including phone numbers, email addresses, and a website, for any assistance or queries.


In [61]:
## Refine Chain Summarization

chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    verbose=True
)

summary = chain.run(text_splitter)



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Star Health And Allied Insurance Company Limited
To, IMPORTANT
ADHAR GUPTA    ,
S/O Devendra kumar agarwal Hno.9 Achar jaan
Bijnor
Bijnor Tehsil,Uttar Pradesh-246701
Mobile : 9719213675
Date : 05-May-2024
Dear Customer,
Re:  Health Insurance Policy - 11240279223503
We are extremely thankful to you for your renewal instructions and payment of premium. We enclose the
renewed policy based on our records. We would request you to kindly study the renewed policy carefully and
revert to us if there is any discrepancy to enable us to attend to the same.
Kindly note that the above request is very important and if we do not hear anything from you within
15 days, we would presume that the policy issued by us is in order and the contract is concluded.
We would like to mention that we have incorporated the name of the in

In [62]:
print(summary)

Here is the refined summary:

Star Health Insurance Company has renewed the Star Super Surplus (Floater) Insurance Policy (Unique Identification No. SHAHLIP22034V062122) for Adhar Gupta, effective [date of renewal]. The policy was renewed for a further period of 1 year upon payment of a renewal premium of Rs. 11,894/- for policy number 11240279223502.

The policy details remain unchanged from the original policy, including:

* Policy number: 11240279223502
* Policy term: 1 year
* Sum Insured: Rs. 25,00,000
* Defined Limit: Rs. 5,00,000
* Period of insurance: From 08-May-2024 to 07-May-2025
* Premium payment frequency: Annual
* Total premium: Rs. 11,894/-

The policy covers Adhar Gupta and his family members, including his spouse Shalini Gupta (44 years old) and children Divik Gupta and Madhav Gupta. The policy also includes nominee details, with Shalini Gupta as the appointee and 100% of the claim.

The policy terms and conditions, as well as the scope and extent of coverage, remain un
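
The refine chain used in the last cells works sequentially: it summarizes the first chunk, then passes the running summary together with each subsequent chunk back to the model and asks for an updated summary. A minimal plain-Python sketch of that loop; `refine_summarize`, `fake_llm`, and the prompt wording are hypothetical and only approximate the library's defaults.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    llm_call: Callable[[str], str],
    initial_prompt: str = "Write a concise summary of the following:\n{text}",
    refine_prompt: str = (
        "Existing summary:\n{existing}\n"
        "Refine it using this additional context:\n{text}"
    ),
) -> str:
    """Build a summary incrementally, one chunk per model call."""
    if not chunks:
        return ""
    summary = llm_call(initial_prompt.format(text=chunks[0]))
    for chunk in chunks[1:]:
        # Each call sees the previous summary plus exactly one new chunk.
        summary = llm_call(refine_prompt.format(existing=summary, text=chunk))
    return summary


# Stand-in for a real model: logs each prompt and returns a numbered draft.
prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return f"<draft {len(prompts_seen)}>"


final = refine_summarize(["chunk a", "chunk b", "chunk c"], fake_llm)
print(final)
```

The calls cannot be parallelized because each depends on the previous result, but later chunks can correct or extend what earlier ones produced.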