<a href="https://colab.research.google.com/github/devsjee/Text-Summarization/blob/main/text_summarization_using_langchain_and_gemini_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Hi there!

Text summarization is a beautiful cognitive task in which humans read a piece of text and somehow magically come up with the gist of it, as in the example below:

Text: Psychologist Dr. Jessica Zucker, author of "I Had a Miscarriage: A Memoir, A Movement," tells TODAY.com that people generally refer to a baby born after a pregnancy loss, infant death, stillbirth or miscarriage as a rainbow baby.

My Summary: A rainbow baby is a baby born after some kind of loss such as a miscarriage.

It is evident that different people will come up with different summaries according to their own perspectives. I have always wondered how to construct a summary of a given text automatically. There have been some very good attempts at extractive and abstractive summarization using frequency counts, LexRank, TextRank, LSA, ESA, lexical chains, etc. However, in the current LLM era, deep learning models are achieving remarkably good results on the text summarization task!
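To give a feel for the older frequency-count approach mentioned above, here is a minimal sketch of an extractive summarizer. It is only an illustration under simple assumptions: the stop-word list, the naive sentence splitter, and the function names are all made up for this example, and real systems such as LexRank or TextRank are considerably more careful.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is", "are", "and", "or",
              "as", "that", "it", "for", "on", "after", "by", "with", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, num_sentences=1):
    """Pick the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


sample = (
    "Rainbow babies bring hope. "
    "A rainbow baby is a baby born after a loss. "
    "The weather was nice."
)
summary = frequency_summary(sample, num_sentences=1)
print(summary)
```

Note that an extractive method like this can only copy sentences that already exist in the input; it cannot rephrase or add anything, which is exactly where abstractive models differ.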

Why not create a text summarization module using an LLM and experience its summarization skills? I am going to closely follow the tutorial below to build my summarization model.


Reference: https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_Summarization_WebLoad.ipynb

In [1]:
!pip install --quiet langchain

In [2]:
!pip install --quiet langchain-google-genai

In [3]:
import os
import getpass

os.environ['GOOGLE_API_KEY'] = getpass.getpass("Enter the GOOGLE API KEY: ")

Enter the GOOGLE API KEY: ··········


In [4]:
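from langchain_core.prompts import PromptTemplate
# WebBaseLoader lives in langchain_community (pip install langchain-community if it is missing).
from langchain_community.document_loaders import WebBaseLoader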


In [5]:
#loader = WebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/#sundar-note")
#docs = loader.load()
#print(docs)

In [16]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.2, top_p=0.1)

In [7]:
doc_prompt = PromptTemplate.from_template("{page_content}")
llm_prompt = PromptTemplate.from_template("Write a concise summary of the following : {text} \n\n Summary:")

print(doc_prompt)
print(llm_prompt)

input_variables=['page_content'] template='{page_content}'
input_variables=['text'] template='Write a concise summary of the following : {text} \n\n Summary:'
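The `input_variables` shown in the output above are simply the placeholder names found in the template string. As a rough sketch of that idea using only the standard library (the helper `template_variables` is a hypothetical name, and LangChain's own parsing may differ in its details):

```python
from string import Formatter


def template_variables(template):
    """Return placeholder names of a str.format-style template, in order of first use."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for trailing literal text that has no placeholder.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


llm_template = "Write a concise summary of the following : {text} \n\n Summary:"
print(template_variables(llm_template))

# Filling the placeholder gives the exact string a model would receive.
rendered = llm_template.format(text="A short passage.")
print(rendered)
```

Printing the rendered prompt like this is a handy way to check what the model is actually asked before spending an API call.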


In [11]:
from langchain_core.prompts import format_document
from langchain_core.output_parsers import StrOutputParser

# stuff_chain = (
#     {"text": lambda docs: "\n\n".join(format_document(doc, doc_prompt) for doc in docs)}
#     | llm_prompt | llm | StrOutputParser()
# )

In [9]:
#stuff_chain.invoke(docs)

Now that the pieces are in place, let me check what output the model gives for the example text I shared earlier.

Text: Psychologist Dr. Jessica Zucker, author of "I Had a Miscarriage: A Memoir, A Movement," tells TODAY.com that people generally refer to a baby born after a pregnancy loss, infant death, stillbirth or miscarriage as a rainbow baby.

My Summary: A rainbow baby is a baby born after some kind of loss such as a miscarriage.

Gemini's Summary: ?

In [17]:
from langchain_core.documents import Document

# The example text from earlier, wrapped in a single Document.
docs = [Document(page_content=(
    "Psychologist Dr. Jessica Zucker, author of 'I Had a Miscarriage: "
    "A Memoir, A Movement,' tells TODAY.com that people generally refer to "
    "a baby born after a pregnancy loss, infant death, stillbirth or "
    "miscarriage as a rainbow baby."
))]

stuff_chain = (
        {"text": lambda docs: "\n\n".join(format_document(doc, doc_prompt) for doc in docs)}
        | llm_prompt | llm | StrOutputParser()
        )

stuff_chain.invoke(docs)

'A rainbow baby is a term used to describe a baby born after a pregnancy loss, infant death, stillbirth, or miscarriage. The term is meant to symbolize hope and joy after a difficult experience.'
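For anyone curious what the chain above does step by step, here is a plain-Python sketch of the same "stuff" idea: join all documents into one text, fill the prompt, call the model, and return its string reply. Everything here is a stand-in for illustration: `SimpleDoc` is a hypothetical substitute for a LangChain `Document`, and `fake_llm` is a placeholder that just echoes the first sentence, not a real model.

```python
from dataclasses import dataclass


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def stuff_documents(example_docs):
    """'Stuff' step: concatenate every document into one block of text."""
    return "\n\n".join(doc.page_content for doc in example_docs)


def build_prompt(text):
    """Same wording as llm_prompt in the notebook."""
    return f"Write a concise summary of the following : {text} \n\n Summary:"


def fake_llm(prompt):
    """Placeholder model (an assumption): returns the first sentence it was given."""
    body = prompt.split(" : ", 1)[1].split(" \n\n Summary:")[0]
    return body.split(". ")[0].rstrip(".") + "."


def summarize(example_docs, llm=fake_llm):
    """Stuff the documents, build the prompt, and pass it to the model."""
    return llm(build_prompt(stuff_documents(example_docs)))


example_docs = [SimpleDoc("First point. More detail."), SimpleDoc("Second point.")]
print(summarize(example_docs))
```

Because everything is stuffed into a single prompt, this approach only works while the combined text fits into the model's context window.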

Text: Psychologist Dr. Jessica Zucker, author of "I Had a Miscarriage: A Memoir, A Movement," tells TODAY.com that people generally refer to a baby born after a pregnancy loss, infant death, stillbirth or miscarriage as a rainbow baby.

My Summary: A rainbow baby is a baby born after some kind of loss such as a miscarriage.

Gemini's Summary: A rainbow baby is a term used to describe a baby born after a pregnancy loss, infant death, stillbirth, or miscarriage. The term is meant to symbolize hope and joy after a difficult experience.


---

This is interesting, isn't it? Gemini has not only selected the key sentence from the given input but also added a sentence from its general knowledge that brings out the broader meaning of the term 'rainbow baby'. Gemini has also nicely ignored the details about the psychologist, just as I would :)

I also tried running the model on the same input multiple times and changing the model parameters (temperature and top_p), but neither had any effect on the summary generated for the above example.
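One possible explanation (an assumption on my part, not something I verified against Gemini itself) is that for such a short, clear input the model is very sure about each next token, so the sampling knobs have little left to choose from. The toy sketch below uses made-up scores to show how temperature and top_p behave in that situation; it is a generic illustration of these two parameters, not Gemini's actual sampling code.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (lower = sharper distribution)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


confident_logits = [5.0, 1.0, 0.5]  # made-up scores: one token clearly dominates
for temperature in (0.2, 1.0):
    probs = apply_temperature(confident_logits, temperature)
    for top_p in (0.1, 0.9):
        # Every combination keeps only token 0, so sampling always picks it.
        print(temperature, top_p, nucleus(probs, top_p))

uncertain_logits = [1.0, 0.9, 0.8]  # made-up scores: no clear winner
print(nucleus(apply_temperature(uncertain_logits, 1.0), 0.9))
```

With the confident scores, every setting leaves a single candidate token, so repeated runs give the same text. Only when the scores are close, as in the last line, does a higher top_p let other tokens in.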