# Summarization example with open source LLMs and OpenAI function calling

In this notebook we will cover these aspects:  

Summarization task in NLP

Installation of all needed packages

Usage of open source LLMs for summarization tasks

Usage of Langchain

Evaluation of different models and metrics for summarization

OpenAI function calling example

# Summarization task in NLP

Text Summarization is a natural language processing (NLP) task that involves condensing a lengthy text document into a shorter, more compact version while still retaining the most important information and meaning. The goal is to produce a summary that accurately represents the content of the original text in a concise form.

Please find more information here: https://paperswithcode.com/task/text-summarization

# Installation of all needed dependencies

In [None]:
# getpass is part of the Python standard library and does not need to be installed
! pip install pandas
! pip install torch
! pip install transformers
! pip install datasets
! pip install evaluate
! pip install absl-py
! pip install rouge_score
! pip install nltk
! pip install langchain
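# This notebook uses the legacy openai.ChatCompletion interface, which is not available in openai>=1.0
! pip install "openai<1.0"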

## All needed imports

In [1]:
import re
import json
import pandas as pd
import numpy as np
from tqdm import tqdm
from getpass import getpass
import torch
from transformers import pipeline
from datasets import load_dataset
from evaluate import load
from langchain import HuggingFacePipeline
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
import openai

## Input your private API key for OpenAI

In [2]:
openai.api_key = getpass('<API key for OpenAI>')

# We use the dataset for Text summarization task 
https://huggingface.co/datasets/pszemraj/scientific_lay_summarisation-plos-norm

You can find alternative datasets for summarization tasks here:

https://huggingface.co/datasets?task_categories=task_categories:summarization&sort=trending

Let's look at the dataset format:

In [3]:
test_set = load_dataset('pszemraj/scientific_lay_summarisation-plos-norm')["test"].to_pandas()
test_set.head(2)


Unnamed: 0,article,summary,section_headings,keywords,year,title,article_length,summary_length
0,Seasonal epidemics of influenza virus result i...,Influenza virus continues to pose a significan...,Abstract\nIntroduction\nResults\nDiscussion\nM...,medicine\ninfectious diseases\nimmunology\nbio...,2013,"Cooperativity Between CD8+ T Cells, Non-Neutra...",13428,165
1,Leprosy remains a public health problem in Bra...,"In Brazil, leprosy remains a significant publi...",Abstract\nIntroduction\nMethods\nResults\nDisc...,social epidemiology\nmedicine\ninfectious dise...,2013,Patterns of Migration and Risks Associated wit...,4301,269
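
The article_length and summary_length columns give a quick feel for how strongly each reference summary compresses its article. A small sketch using only the two rows displayed above; the helper name compression_ratio is hypothetical, and the lengths are used as given without assuming their unit:

```python
# Lengths copied from the two rows displayed above.
rows = [
    {"article_length": 13428, "summary_length": 165},
    {"article_length": 4301, "summary_length": 269},
]


def compression_ratio(article_length: int, summary_length: int) -> float:
    """How many times shorter the summary is than the article."""
    return article_length / summary_length


ratios = [compression_ratio(r["article_length"], r["summary_length"]) for r in rows]
for ratio in ratios:
    print(f"summary is about {ratio:.1f}x shorter than the article")
```

The two rows differ a lot, so the amount of compression a model has to achieve varies from article to article.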


# Function for model initialization

First we need a function that takes the name of a HuggingFace LLM and returns a pipeline. Pipelines are a great and easy way to use models for inference: they are objects that abstract most of the complex code from the library and offer a simple API. Please find more details here: https://huggingface.co/docs/transformers/main/main_classes/pipelines

Function parameters:

model_name - the name of an LLM from the HuggingFace leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard). Compare candidates by quality metrics and by parameter count. The parameter count is a very important criterion for choosing the model. If you want to use a CPU, you have to pick a model with few parameters, trading model quality against the inference speed you need. If you want to use a GPU, you need to check that a model with the chosen parameter count fits into the memory of your GPU.

float_16 - flag for loading the model weights in a 16-bit floating point dtype when a GPU is available (the code below uses torch.bfloat16). It does not work for every model, but it can significantly reduce GPU memory usage and makes it possible to use heavier models.

model_kwargs - parameters for HuggingFacePipeline class: https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_pipeline.py
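
To make the link between parameter count, dtype and memory concrete, here is a rough back-of-the-envelope sketch. It counts the weights only (activations and generation buffers need extra memory), the helper name estimate_weight_memory_gb is hypothetical, and the parameter counts are the nominal sizes implied by the model names, which is an assumption:

```python
# Bytes needed to store one parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def estimate_weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Rough memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes implied by the model names (assumption, not measured).
for name, n_params in [("facebook/opt-1.3b", 1.3e9), ("databricks/dolly-v2-3b", 3e9)]:
    fp32 = estimate_weight_memory_gb(n_params, "float32")
    bf16 = estimate_weight_memory_gb(n_params, "bfloat16")
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{bf16:.1f} GB in bfloat16")
```

Halving the bytes per parameter halves the weight memory, which is why the float_16 flag can make a heavier model fit on the same GPU.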

In [13]:
def init_model(model_name, float_16=False, model_kwargs: dict = None):
    has_gpu = torch.cuda.is_available()
    if not model_kwargs:
        model_kwargs = dict(no_repeat_ngram_size=3, early_stopping=True)

    # Arguments shared by every configuration
    pipeline_args = dict(
        model=model_name,
        trust_remote_code=True,
        return_full_text=True,
    )
    if has_gpu:
        # Place the model on the available GPU(s) automatically
        pipeline_args["device_map"] = "auto"
        if float_16:
            # Load the weights in 16-bit precision to save GPU memory
            pipeline_args["torch_dtype"] = torch.bfloat16

    generate_text = pipeline(**pipeline_args)
    hf_pipeline = HuggingFacePipeline(pipeline=generate_text, model_kwargs=model_kwargs)
    return hf_pipeline

# Functions for using Langchain

We will use LangChain as a wrapper on top of the model.

LangChain is a framework for developing applications powered by language models. It enables applications that are:

Data-aware: connect a language model to other sources of data

Agentic: allow a language model to interact with its environment

The main value props of LangChain are:

Components: abstractions for working with language models, along with a collection of implementations for each abstraction. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not

Off-the-shelf chains: a structured assembly of components for accomplishing specific higher-level tasks

Off-the-shelf chains make it easy to get started. For more complex applications and nuanced use-cases, components make it easy to customize existing chains or build new ones.

https://python.langchain.com/docs/get_started/introduction.html - LangChain getting started guide

https://python.langchain.com/docs/modules/chains/popular/summarize - LangChain docs for the summarization task

First we create the function create_summarization_chain, which prepares the prompts and the chain.

Language models take text as input - that text is commonly referred to as a prompt. Typically this is not simply a hardcoded string but rather a combination of a template, some examples, and user input. LangChain provides several classes and functions to make constructing and working with prompts easy.

A prompt template refers to a reproducible way to generate a prompt. It contains a text string ("the template"), that can take in a set of parameters from the end user and generates a prompt.

Please find details about LangChain prompt templates here: https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/
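
The idea of a prompt template can be illustrated without LangChain at all. The sketch below uses plain str.format as a stand-in for PromptTemplate (it is not LangChain's implementation); the template text mirrors the summarization prompt used in create_summarization_chain, and render_prompt is a hypothetical helper:

```python
# Template text mirroring the summarization prompt used in this notebook.
TEMPLATE = "Write a concise summary of the text:\n{text}\nSummary:"


def render_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders of a template with concrete values."""
    return template.format(**variables)


prompt = render_prompt(TEMPLATE, text="Test text for summarization")
print(prompt)
```

The same template can be reused for every chunk of every article; only the value of the text variable changes.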

In [6]:
def create_summarization_chain(llm, chain_type="refine"):
    refine_template = (
        "Produce a summary.\n"
        "Existing summary: {existing_answer}\n"
        "Refine the existing answer using only context below:\n"
        "{text}\n"
        "If the context isn't useful, return the original summary."
    )
    refine_prompt_summary = PromptTemplate(
        input_variables=["existing_answer", "text"],
        template=refine_template,
    )
    prompt_template = """Write a concise summary of the text:
    {text}
    Summary:"""
    question_prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
    return load_summarize_chain(
        llm=llm,
        refine_prompt=refine_prompt_summary,
        question_prompt=question_prompt,
        chain_type=chain_type, 
        verbose=False,
    )
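
Conceptually, the "refine" strategy summarizes the first chunk with the question prompt and then repeatedly asks the model to improve that summary using each following chunk. The sketch below shows this control flow only; it is a simplification, not LangChain's code. refine_summarize and fake_llm are hypothetical names, and fake_llm stands in for a real model:

```python
from typing import Callable, List

QUESTION_TEMPLATE = "Write a concise summary of the text:\n{text}\nSummary:"
REFINE_TEMPLATE = (
    "Produce a summary.\n"
    "Existing summary: {existing_answer}\n"
    "Refine the existing answer using only context below:\n"
    "{text}\n"
    "If the context isn't useful, return the original summary."
)


def refine_summarize(llm: Callable[[str], str], chunks: List[str]) -> str:
    """Summarize the first chunk, then refine that summary chunk by chunk."""
    if not chunks:
        return ""
    summary = llm(QUESTION_TEMPLATE.format(text=chunks[0]))
    for chunk in chunks[1:]:
        summary = llm(REFINE_TEMPLATE.format(existing_answer=summary, text=chunk))
    return summary


# Hypothetical stand-in for a real model: records each prompt, returns a marker.
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary-{len(calls)}"


result = refine_summarize(fake_llm, ["chunk A", "chunk B", "chunk C"])
print(result, "after", len(calls), "model calls")
```

Note that the number of model calls grows linearly with the number of chunks, so long articles with small chunks are slow to summarize this way.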

After this we prepare the helper function _format_result, which can be used to post-process the result if needed.

Here we provide some examples of formatting:

In [5]:
def _format_result(summary: str) -> str:
    res = re.sub(r"\[\d+\]", "", summary)
    res = re.sub(r" +", " ", res)
    res = res.replace("Refine the existing answer using only context below:", "")
    return res.strip()

And finally we prepare the function generate_summary. It computes the summary of a given text with the prepared chain, post-processes it and returns the result.

In [7]:
def generate_summary(chain, text_splitter, context: str) -> str:
    docs = [Document(page_content=text) for text in text_splitter.split_text(context)]
    summary = chain.run(docs)
    return _format_result(summary)
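
generate_summary relies on a text splitter to break a long article into overlapping chunks that fit the model. The sketch below is a deliberately simplified fixed-window splitter, not the algorithm of RecursiveCharacterTextSplitter (which also tries to cut at separators such as paragraph breaks); split_with_overlap is a hypothetical helper:

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

The overlap repeats the end of one chunk at the start of the next, so a sentence cut at a chunk boundary is still seen with some context.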

# Now we can choose the open source LLM from the Huggingface leaderboard
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

# We use the ROUGE metric for model evaluation.
ROUGE explanation: https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499

ROUGE wiki: https://en.wikipedia.org/wiki/ROUGE_(metric)
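
To show what a ROUGE score measures, here is a minimal sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens. It is a simplification under stated assumptions: whitespace tokenization, lowercasing, no stemming. It is not the implementation behind the rouge metric of the evaluate package, and it does not reproduce the summary-level rougeLsum variant used later in this notebook. The function names are hypothetical:

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """F1 of LCS-based precision and recall over whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat is on the mat", "the cat sat on the mat")
print(f"ROUGE-L F1: {score:.3f}")
```

In this example five of the six tokens appear in the same order in both sentences, so precision and recall are both 5/6.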

# Let's calculate the metric for summaries on the test set

The models facebook/opt-1.3b and databricks/dolly-v2-3b were chosen so that this example can be reproduced on a CPU with a relatively short computation time for one example from the dataset. Feel free to test heavier models that can achieve higher quality.

In [18]:
models = ["facebook/opt-1.3b", "databricks/dolly-v2-3b"]
metrics = {}
# Load the metric and build the splitter once: they do not depend on the model
rouge = load('rouge')
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=125,
    separators=["\n\n", "\n", ".", " ", ""],
)
generation_kwargs = dict(
    max_new_tokens=4000,
    min_length=80,
    num_beams=5,
    no_repeat_ngram_size=3,
    early_stopping=True
)
for model_name in models:
    model = init_model(model_name, float_16=True, model_kwargs=generation_kwargs)
    chain = create_summarization_chain(model)
    rouge_res_total = []
    # Only the first test example is scored to keep the run short
    for i, row in tqdm(test_set.head(1).iterrows()):
        input_str = row["article"]
        reference = row["summary"]
        summary = generate_summary(chain=chain, text_splitter=text_splitter, context=input_str)
        rouge_res = rouge.compute(predictions=[summary], references=[reference])
        rouge_res_total.append(rouge_res["rougeLsum"])
    # Average ROUGE-Lsum over the scored examples
    metrics[model_name] = np.mean(rouge_res_total)

1it [00:00,  1.00it/s]
1it [00:01,  1.21s/it]


Let's compare the metrics. On this single example Dolly scores a bit higher, but both ROUGE-Lsum values are still low, so you may want to experiment with models that have more parameters.

In [19]:
print(metrics)

{'facebook/opt-1.3b': 0.08603634193332954, 'databricks/dolly-v2-3b': 0.13718789205438342}
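
With more than two models it is easier to read the results as a ranking. A small sketch using the two scores printed above (copied by hand so the snippet runs on its own):

```python
# Scores copied from the output printed above.
metrics = {
    "facebook/opt-1.3b": 0.08603634193332954,
    "databricks/dolly-v2-3b": 0.13718789205438342,
}

# Sort models from the highest to the lowest ROUGE-Lsum score.
ranking = sorted(metrics.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranking[0]
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```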


# We can try OpenAI models as an alternative

Here is a basic example of using OpenAI models for summarization. They are not free, but they do not require significant computational resources on your side: you just call the API.

In [None]:
input_str = "Test text for summarization"
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": f"Summarize this: {input_str}"},
    ],
)
page_summary = response["choices"][0]["message"]["content"]

# Function calling example

When using OpenAI models, you can describe functions to gpt-4-0613 and gpt-3.5-turbo-0613 and have the model intelligently choose to output a JSON object containing arguments to call those functions.

This is a new way to more reliably connect GPT's capabilities with external tools and APIs.

These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. 

Function calling allows you to more reliably get structured data back from the model.

For more information, see the links below:

https://openai.com/blog/function-calling-and-other-api-updates

https://platform.openai.com/docs/guides/gpt/function-calling

In the example below we ask ChatGPT a question about the weather in a specific location:

1. send the conversation and available functions to GPT

2. check if GPT wanted to call a function

3. call the function

4. send the info on the function call and function response to GPT

In [None]:
# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)


def run_conversation():
    # Step 1: send the conversation and available functions to GPT
    messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
    functions = [
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto",  # auto is default, but we'll be explicit
    )
    response_message = response["choices"][0]["message"]

    # Step 2: check if GPT wanted to call a function
    if response_message.get("function_call"):
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        function_name = response_message["function_call"]["name"]
        function_to_call = available_functions[function_name]
        function_args = json.loads(response_message["function_call"]["arguments"])
        function_response = function_to_call(
            location=function_args.get("location"),
            # Fall back to the function's default when the model omits the unit
            unit=function_args.get("unit", "fahrenheit"),
        )

        # Step 4: send the info on the function call and function response to GPT
        messages.append(response_message)  # extend conversation with assistant's reply
        messages.append(
            {
                "role": "function",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
        second_response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
        )  # get a new response from GPT where it can see the function response
        return second_response

    # No function call was requested: return the model's direct answer
    return response

In [None]:
print(run_conversation())
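
The note in step 3 warns that the JSON produced by the model may not always be valid. One possible way to guard the function call is sketched below. It runs without the OpenAI API by simulating the function_call dictionaries by hand; safe_dispatch is a hypothetical helper, and the weather function is a trimmed copy of the dummy function above:

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Trimmed copy of the dummy function above: always returns the same weather."""
    return json.dumps({"location": location, "temperature": "72", "unit": unit})


AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def safe_dispatch(function_call: dict) -> str:
    """Run the requested function, returning a JSON error message on bad input."""
    name = function_call.get("name")
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(function_call.get("arguments") or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON arguments: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        return func(**args)
    except TypeError as exc:
        # Raised when required arguments are missing or unexpected ones are given
        return json.dumps({"error": str(exc)})


# Hand-written stand-ins for the "function_call" field of a model response.
ok = safe_dispatch({"name": "get_current_weather", "arguments": '{"location": "Boston, MA"}'})
bad = safe_dispatch({"name": "get_current_weather", "arguments": '{"location": '})
print(ok)
print(bad)
```

Returning the error as JSON instead of raising makes it possible to send the message back to the model as the function response, so it can retry or explain the problem.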

This functionality can be useful if you want to work on summarization or another NLP task but also need structured responses.

For example, if you want to summarize some texts about the weather and add temperature facts for specific locations, you can use an OpenAI LLM for summarization and change the prompt to additionally ask the LLM about the temperature in those locations.
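
As a sketch of that idea, the function description below asks for a summary together with a list of temperature facts. Everything here is illustrative: the function name record_weather_summary, its fields and the helper parse_structured_summary are hypothetical, and the arguments string is written by hand rather than returned by the API:

```python
import json

# Hypothetical function schema: the name and fields are illustrative only.
summary_function = {
    "name": "record_weather_summary",
    "description": "Store a summary of a weather text with temperature facts",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "Concise summary of the text"},
            "temperatures": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "temperature": {"type": "number"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location", "temperature"],
                },
            },
        },
        "required": ["summary", "temperatures"],
    },
}


def parse_structured_summary(arguments_json: str) -> dict:
    """Parse the arguments string of a function call and check the required keys."""
    data = json.loads(arguments_json)
    missing = [k for k in summary_function["parameters"]["required"] if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Simulated model output, written by hand for illustration (not a real API response).
example_arguments = json.dumps({
    "summary": "Sunny and windy in Boston.",
    "temperatures": [{"location": "Boston", "temperature": 72, "unit": "fahrenheit"}],
})
parsed = parse_structured_summary(example_arguments)
print(parsed["summary"], parsed["temperatures"])
```

A schema like this would be passed in the functions list of the chat completion call, in the same way as get_current_weather in the example above.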

# Summary

You are now familiar with:

The summarization task in NLP

Datasets and metrics for this task

Examples of how to use open source LLMs to solve the summarization task with HuggingFace and LangChain

How to use OpenAI models as an alternative

How to ask OpenAI LLMs to call functions with structured input/output