## Open Source Modelling with BBC News Dataset

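This notebook compares summaries produced by open-source models across the different topics (aspects) of the BBC News dataset.

Comparing summaries across topics helps to understand LLM strengths and weaknesses: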

- **Identify domain-specific strengths**: Different LLMs are trained on different datasets and may excel in different domains. Comparing summaries across topics can help you identify which LLM performs best in a specific area relevant to your needs.

- **Uncover biases and limitations**: LLMs can inherit biases from their training data. Comparing summaries can help you identify potential biases and limitations in different models, allowing you to choose the one with the least bias for your task.

- **Evaluate factual accuracy**: Some LLMs favor fluent output at the expense of factual accuracy, while others stay closer to the source text. Comparing summaries can help you assess the factual accuracy of each LLM and choose the one that best suits your need for reliable information.

Steps:

1. **Install & Import Necessary Libraries**

2. **Helper Function**

3. **Load Dataset (BBC News)**: Pick 5 rows of data per topic

4. **Top 5 Open Source Model Summarizer**

#### Install & Import Necessary Libraries

In [1]:
import time
import sys
sys.path.append('../')
from helper.SummarizationMetrics import SummarizationMetrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import ast
import seaborn as sns


import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from langchain import LLMChain, HuggingFacePipeline, PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from sentence_transformers import SentenceTransformer, util
from scipy.signal import argrelextrema
from sklearn.cluster import KMeans


import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Error loading punkt: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
[nltk_data] Error loading stopwords: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>


False

#### Helper Function

In [2]:
def summ_pipeline(model, tokenizer, chain_type, max_length, prompt=False):
  """Build a LangChain summarization chain on top of a Hugging Face pipeline."""
  # Summarization pipeline with top-k sampling (do_sample=True, top_k=10).
  pipeline = transformers.pipeline(
      "summarization",
      model=model,
      tokenizer=tokenizer,
      torch_dtype=torch.bfloat16,
      trust_remote_code=True,
      device_map="auto",
      max_length=max_length,
      do_sample=True,
      top_k=10,
      num_return_sequences=1,
      eos_token_id=tokenizer.eos_token_id,
  )
  llm = HuggingFacePipeline(pipeline=pipeline)

  if chain_type == "map_reduce":
    if prompt:
      # Use a custom prompt instead of LangChain's default summarization prompt.
      prompt_template = """Summarize this: ```{text}```"""
      prompt_message = PromptTemplate(template=prompt_template, input_variables=["text"])
      # The map_reduce loader expects map_prompt and combine_prompt
      # (prompt= is the keyword used by the "stuff" chain).
      summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type, token_max=max_length,
                                           map_prompt=prompt_message, combine_prompt=prompt_message)
    else:
      summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type, token_max=max_length)
  else:
    # token_max could not be set for the "refine" and "stuff" chain types
    # (the library appears to have changed without documentation),
    # so these chains fall back to their defaults.
    summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type)
  return summary_chain
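
To make the `map_reduce` chain type more concrete, here is a minimal, library-free sketch of the idea: split a long text into overlapping chunks, summarize each chunk (map), then summarize the joined partial summaries (reduce). The names `split_into_chunks`, `first_sentence`, and `map_reduce_summarize` are hypothetical, chunks are measured in whitespace-separated words rather than model tokens, and the stand-in summarizer simply keeps the first sentence. It is not how LangChain or the models in this notebook work internally.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Slide a window of `chunk_size` words, stepping by `chunk_size - overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def first_sentence(text: str) -> str:
    """Stand-in summarizer (assumption): keep only the first sentence."""
    head = text.split(". ")[0].strip()
    return head if head.endswith(".") else head + "."


def map_reduce_summarize(text: str, summarize: Callable[[str], str],
                         chunk_size: int = 50, overlap: int = 10) -> str:
    words = text.split()
    if not words:
        return ""
    chunks = split_into_chunks(words, chunk_size, overlap)
    # Map step: summarize every chunk independently.
    partial = [summarize(" ".join(chunk)) for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    # Reduce step: summarize the concatenated partial summaries.
    return summarize(" ".join(partial))


demo_text = "Alpha beta gamma. Delta epsilon. Zeta eta theta. Iota kappa."
print(map_reduce_summarize(demo_text, first_sentence, chunk_size=4, overlap=1))
```

Swapping `first_sentence` for a call into a real model would keep the same control flow; only the `summarize` callable changes.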

### BBC DATASET
#### Load Dataset

In [3]:
bbc_train_df = pd.read_excel("../Data/newsbbc_train.xlsx")

bbc_train_df.head()

Unnamed: 0,File_path,Articles,Summaries,transcript,summary
0,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...
1,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...
2,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...
3,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...
4,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ..."


Take 5 records for each topic.

In [4]:
# Take the first 5 records of each topic (head(5) follows file order; it is not a random sample).
bbc_train_df = bbc_train_df.groupby('File_path').head(5)
bbc_train_df['File_path'].value_counts()

File_path
business         5
politics         5
entertainment    5
sport            5
tech             5
Name: count, dtype: int64
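
`groupby('File_path').head(5)` keeps the first five rows of each topic in file order. If a random draw per topic is wanted instead, one possible sketch uses `DataFrameGroupBy.sample` (available in pandas 1.1 and later) with a fixed seed. The toy DataFrame and the `random_state` value below are illustrative assumptions, not the BBC data.

```python
import pandas as pd

# Toy stand-in for bbc_train_df (assumption): 3 topics, 8 rows each.
toy_df = pd.DataFrame({
    "File_path": ["business"] * 8 + ["sport"] * 8 + ["tech"] * 8,
    "transcript": [f"article {i}" for i in range(24)],
})

# Draw 5 rows per topic at random; the fixed seed keeps the draw reproducible.
sampled_df = toy_df.groupby("File_path").sample(n=5, random_state=42)

print(sampled_df["File_path"].value_counts())
```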

In [5]:
bbc_train_df

Unnamed: 0,File_path,Articles,Summaries,transcript,summary
0,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...
1,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...
2,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...
3,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...
4,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ..."
5,politics,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...
6,sport,Chelsea denied by James heroics..A brave defen...,Chelsea were now looking more like Premiership...,Chelsea denied by James heroics..A brave defen...,Chelsea were now looking more like Premiership...
7,politics,Guantanamo man 'suing government'..A British t...,He said he was sent there after being interrog...,Guantanamo man 'suing government'..A British t...,He said he was sent there after being interrog...
13,business,Could Yukos be a blessing in disguise?..Other ...,But it argues that more rigorous tax policing ...,Could Yukos be a blessing in disguise?..Other ...,But it argues that more rigorous tax policing ...
14,business,Asian quake hits European shares..Shares in Eu...,The unfolding scale of the disaster in south A...,Asian quake hits European shares..Shares in Eu...,The unfolding scale of the disaster in south A...


Running too many models at once can take a very long time and may crash the kernel.

Recommendation: run each model separately.
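
One way to follow this recommendation is to process one model per run and write its results to disk as soon as it finishes, so that a kernel restart loses at most one model's work. The sketch below is only an assumption about how that could look: `run_one_model` is a hypothetical placeholder for the summarization loop, the model names are made up, and results go to a CSV file in a temporary directory rather than the Excel file used in this notebook.

```python
import os
import tempfile

import pandas as pd


def run_one_model(model_name):
    """Hypothetical placeholder for summarizing all articles with one model."""
    return pd.DataFrame([{"model": model_name, "summary": "placeholder"}])


def append_checkpoint(new_rows, path):
    """Append rows to the CSV at `path`, writing the header only once."""
    write_header = not os.path.exists(path)
    new_rows.to_csv(path, mode="a", header=write_header, index=False)


def finished_models(path):
    """Return the set of model names already stored in the checkpoint."""
    if not os.path.exists(path):
        return set()
    return set(pd.read_csv(path)["model"])


checkpoint_path = os.path.join(tempfile.mkdtemp(), "scores_checkpoint.csv")
for name in ["model-a", "model-b", "model-a"]:  # illustrative names only
    if name in finished_models(checkpoint_path):
        continue  # already processed earlier; skip it
    append_checkpoint(run_one_model(name), checkpoint_path)

print(pd.read_csv(checkpoint_path))
```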

In [6]:
# df_scores = pd.read_excel("../Process/result/open_source_model_topics_comparison.xlsx")

# Define the models to evaluate; comment entries in or out to run them separately.
data = {
    'Models': [
        # 'pszemraj/led-base-book-summary',
        # 'pszemraj/led-large-book-summary',
        # 'HHousen/distil-led-large-cnn-16384',
        'philschmid/bart-large-cnn-samsum',
        'pszemraj/long-t5-tglobal-base-16384-book-summary'
    ]
}

top_5_models = pd.DataFrame(data)
top_5_models

Unnamed: 0,Models
0,philschmid/bart-large-cnn-samsum
1,pszemraj/long-t5-tglobal-base-16384-book-summary


In [7]:
# Create an empty DataFrame to store each generated summary and its evaluation metrics.
df_scores = pd.DataFrame(columns=['model', 'method', 'max_tokens', 'topic', 'transcript', 'original summary', 'summary', 'rouge', 'bert_score', 'bleu', 'time_taken', 'grammar', 'readability'])
df_scores

Unnamed: 0,model,method,max_tokens,topic,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability


In [8]:
# Load previously saved results in order to keep appending new summaries to them.
df_scores = pd.read_excel("../Process/result/open_source_model_topics_comparison.xlsx")
print(df_scores.shape)
df_scores.head()

(75, 13)


Unnamed: 0,model,method,max_tokens,topic,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability
0,pszemraj/led-base-book-summary,MapReduce,16384,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,The following is a summary of the announcement...,"[{'rouge-1': {'r': 0.38095238095238093, 'p': 0...","(tensor([0.8857]), tensor([0.8723]), tensor([0...",0.218163,19.899436,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 13.012869198312234, grade_level: '13'"
1,pszemraj/led-base-book-summary,MapReduce,16384,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,"In this brief summary, we summarize the speech...","[{'rouge-1': {'r': 0.2874251497005988, 'p': 0....","(tensor([0.8664]), tensor([0.8317]), tensor([0...",0.095036,21.163034,"[Match({'ruleId': 'EN_COMPOUNDS', 'message': '...","score: 9.665454545454548, grade_level: '10'"
2,pszemraj/led-base-book-summary,MapReduce,16384,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,The narrator of this piece originally appeared...,"[{'rouge-1': {'r': 0.4148148148148148, 'p': 0....","(tensor([0.8524]), tensor([0.8609]), tensor([0...",0.17599,28.52308,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 11.396671501087742, grade_level: '11'"
3,pszemraj/led-base-book-summary,MapReduce,16384,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,"The Home Secretary, Charles Clarke, has outlin...","[{'rouge-1': {'r': 0.2710843373493976, 'p': 0....","(tensor([0.8730]), tensor([0.8468]), tensor([0...",0.056949,24.95648,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 11.617224157955867, grade_level: '12'"
4,pszemraj/led-base-book-summary,MapReduce,16384,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",The English Parliament passes the Prevention o...,"[{'rouge-1': {'r': 0.3959731543624161, 'p': 0....","(tensor([0.8890]), tensor([0.8720]), tensor([0...",0.167719,21.425882,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 9.484285714285715, grade_level: '9'"
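
Because the `rouge` column comes back from the Excel file as text, it has to be parsed before scores can be compared across topics. The sketch below shows one possible way with `ast.literal_eval` and a pivot table. The rows are made-up illustrative values, not results from this notebook, and the presence of an `'f'` key next to `'r'` and `'p'` is an assumption, since the output above is truncated. `rouge1_f` is a hypothetical helper name.

```python
import ast

import pandas as pd

# Illustrative rows only (made-up numbers, not results from this notebook).
toy_scores = pd.DataFrame({
    "model": ["model-a", "model-a", "model-b", "model-b"],
    "topic": ["business", "sport", "business", "sport"],
    "rouge": [
        "[{'rouge-1': {'r': 0.40, 'p': 0.20, 'f': 0.30}}]",
        "[{'rouge-1': {'r': 0.60, 'p': 0.40, 'f': 0.50}}]",
        "[{'rouge-1': {'r': 0.30, 'p': 0.10, 'f': 0.20}}]",
        "[{'rouge-1': {'r': 0.50, 'p': 0.30, 'f': 0.40}}]",
    ],
})


def rouge1_f(cell):
    """Parse the stringified list of dicts and return the ROUGE-1 F-score."""
    parsed = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return parsed[0]["rouge-1"]["f"]


toy_scores["rouge1_f"] = toy_scores["rouge"].apply(rouge1_f)

# Mean ROUGE-1 F-score per model (rows) and topic (columns).
by_model_topic = toy_scores.pivot_table(
    index="model", columns="topic", values="rouge1_f", aggfunc="mean"
)
print(by_model_topic)
```

The same approach would not work directly for the `bert_score` column, whose cells contain `tensor(...)` text that `ast.literal_eval` cannot parse.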


#### Top 5 Open Source Model Summarizer

In [9]:
for model_name in top_5_models['Models']:
    print(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # The tokenizer's maximum input length is used as the token budget.
    max_tokens = tokenizer.model_max_length
    print(max_tokens)
    method = "MapReduce"

    for index, row in bbc_train_df.iterrows():
        # Time the whole summarization step, including building the chain.
        start_time = time.time()

        summary_chain = summ_pipeline(model, tokenizer, "map_reduce", max_tokens)

        # Split the article into overlapping chunks that stay below the token budget.
        # Chunk lengths are measured with tiktoken, which only approximates
        # the model's own tokenizer.
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        summary = summary_chain.run(docs)

        elapsed_time = time.time() - start_time

        # Score the generated summary against the reference summary.
        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'topic': row['File_path'],
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index()
        }

        # Append the result as a new row of df_scores.
        new_row = pd.DataFrame([new_result])
        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
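
Because `max_length` is fixed at the tokenizer's maximum, the pipeline warns in this cell's output whenever a chunk is shorter than that limit, and it suggests roughly half the input length instead. A minimal sketch of that heuristic follows. `suggested_max_length` and its `floor` parameter are hypothetical names, and wiring the value into the pipeline call is left out because it depends on the library version.

```python
def suggested_max_length(input_length: int, model_limit: int, floor: int = 8) -> int:
    """Cap the summary length at about half the input length.

    `floor` (an assumed default) avoids degenerate caps for very short
    inputs; `model_limit` is the model's own maximum length.
    """
    half = input_length // 2
    return min(model_limit, max(floor, half))


# Input lengths taken from the warnings printed by the loop.
for n in (785, 110, 175):
    print(n, suggested_max_length(n, model_limit=1024))
```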


philschmid/bart-large-cnn-samsum
1024


Your max_length is set to 1024, but your input_length is only 785. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=392)
Your max_length is set to 1024, but your input_length is only 110. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=55)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.06s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 62.51it/s]


done in 2.08 seconds, 0.48 sentences/sec


Your max_length is set to 1024, but your input_length is only 716. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=358)
Your max_length is set to 1024, but your input_length is only 82. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=41)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.08s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 63.98it/s]


done in 2.10 seconds, 0.48 sentences/sec


Your max_length is set to 1024, but your input_length is only 722. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=361)
Your max_length is set to 1024, but your input_length is only 94. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=47)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.64s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 71.33it/s]


done in 1.66 seconds, 0.60 sentences/sec


Your max_length is set to 1024, but your input_length is only 688. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=344)
Your max_length is set to 1024, but your input_length is only 82. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=41)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.33s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 83.17it/s]


done in 2.35 seconds, 0.43 sentences/sec


Your max_length is set to 1024, but your input_length is only 665. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=332)
Your max_length is set to 1024, but your input_length is only 93. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=46)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.05s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.70it/s]


done in 2.05 seconds, 0.49 sentences/sec


Your max_length is set to 1024, but your input_length is only 657. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=328)
Your max_length is set to 1024, but your input_length is only 88. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=44)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.81s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.15it/s]


done in 1.82 seconds, 0.55 sentences/sec


Your max_length is set to 1024, but your input_length is only 737. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=368)
Your max_length is set to 1024, but your input_length is only 102. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=51)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.74s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 34.48it/s]


done in 3.77 seconds, 0.26 sentences/sec


Your max_length is set to 1024, but your input_length is only 757. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=378)
Your max_length is set to 1024, but your input_length is only 90. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=45)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.07s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 58.82it/s]


done in 3.10 seconds, 0.32 sentences/sec


Your max_length is set to 1024, but your input_length is only 885. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=442)
Your max_length is set to 1024, but your input_length is only 94. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=47)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.83s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 199.80it/s]


done in 2.85 seconds, 0.35 sentences/sec


Your max_length is set to 1024, but your input_length is only 763. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=381)
Your max_length is set to 1024, but your input_length is only 115. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=57)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.26s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.04it/s]


done in 3.27 seconds, 0.31 sentences/sec


Your max_length is set to 1024, but your input_length is only 725. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=362)
Your max_length is set to 1024, but your input_length is only 94. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=47)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.72s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.86it/s]


done in 2.73 seconds, 0.37 sentences/sec


Your max_length is set to 1024, but your input_length is only 847. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=423)
Your max_length is set to 1024, but your input_length is only 110. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=55)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.50s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 500.75it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.51 seconds, 0.40 sentences/sec


Your max_length is set to 1024, but your input_length is only 866. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=433)
Your max_length is set to 1024, but your input_length is only 101. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.38s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 21.74it/s]


done in 3.44 seconds, 0.29 sentences/sec


Your max_length is set to 1024, but your input_length is only 668. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=334)
Your max_length is set to 1024, but your input_length is only 93. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=46)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.59s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 63.14it/s]


done in 2.62 seconds, 0.38 sentences/sec


Your max_length is set to 1024, but your input_length is only 812. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=406)
Your max_length is set to 1024, but your input_length is only 96. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=48)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.39s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 495.72it/s]


done in 2.40 seconds, 0.42 sentences/sec


Your max_length is set to 1024, but your input_length is only 820. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=410)
Your max_length is set to 1024, but your input_length is only 81. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=40)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.77s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.65it/s]


done in 2.78 seconds, 0.36 sentences/sec


Your max_length is set to 1024, but your input_length is only 859. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=429)
Your max_length is set to 1024, but your input_length is only 175. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=87)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.48s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 249.90it/s]


done in 3.49 seconds, 0.29 sentences/sec


Your max_length is set to 1024, but your input_length is only 713. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=356)
Your max_length is set to 1024, but your input_length is only 88. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=44)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.35s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.09it/s]


done in 2.35 seconds, 0.42 sentences/sec


Your max_length is set to 1024, but your input_length is only 731. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=365)
Your max_length is set to 1024, but your input_length is only 78. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=39)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.01s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 166.37it/s]


done in 3.04 seconds, 0.33 sentences/sec


Your max_length is set to 1024, but your input_length is only 803. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=401)
Your max_length is set to 1024, but your input_length is only 78. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=39)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.75s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 249.96it/s]


done in 2.76 seconds, 0.36 sentences/sec


Your max_length is set to 1024, but your input_length is only 828. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=414)
Your max_length is set to 1024, but your input_length is only 90. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=45)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.54s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 403.92it/s]


done in 2.54 seconds, 0.39 sentences/sec


Your max_length is set to 1024, but your input_length is only 741. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=370)
Your max_length is set to 1024, but your input_length is only 103. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=51)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.46s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 500.27it/s]


done in 3.46 seconds, 0.29 sentences/sec


Your max_length is set to 1024, but your input_length is only 661. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=330)
Your max_length is set to 1024, but your input_length is only 79. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=39)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.41s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 99.94it/s]


done in 2.43 seconds, 0.41 sentences/sec


Your max_length is set to 1024, but your input_length is only 710. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=355)
Your max_length is set to 1024, but your input_length is only 129. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=64)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.76s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 100.00it/s]


done in 2.79 seconds, 0.36 sentences/sec


Your max_length is set to 1024, but your input_length is only 612. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=306)
Your max_length is set to 1024, but your input_length is only 91. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=45)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.36s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 495.02it/s]


done in 2.37 seconds, 0.42 sentences/sec
pszemraj/long-t5-tglobal-base-16384-book-summary


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 872. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=436)


1000000000000000019884624838656


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 183. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=91)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.38s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 500.22it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.39 seconds, 0.42 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 793. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=396)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 143. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=71)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.23s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 200.00it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.24 seconds, 0.45 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 828. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=414)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 107. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=53)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.65s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.04it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.66 seconds, 0.60 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 802. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=401)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 160. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=80)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.17s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.84it/s]


done in 3.19 seconds, 0.31 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 760. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=380)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 122. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=61)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.59s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 102.75it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.62 seconds, 0.38 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 724. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=362)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 209. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=104)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.19s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 224.16it/s]
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.20 seconds, 0.46 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 895. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=447)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 258. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=129)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.69s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 308.63it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.70 seconds, 0.37 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 899. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=449)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 132. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=66)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.68s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 498.37it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.69 seconds, 0.37 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 981. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=490)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 145. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=72)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.93s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 165.92it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.95 seconds, 0.34 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 824. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=412)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 103. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=51)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.74s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 249.59it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.76 seconds, 0.36 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 792. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=396)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 139. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=69)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.09s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 314.75it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.10 seconds, 0.32 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 903. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=451)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 68. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=34)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.62s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 500.04it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.63 seconds, 0.38 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 955. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=477)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 121. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=60)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.86s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.12it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.87 seconds, 0.35 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 747. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=373)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 123. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=61)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.61s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 249.93it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.62 seconds, 0.38 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 860. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=430)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 138. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=69)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.71s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 287.38it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.72 seconds, 0.27 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 891. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=445)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 123. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=61)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:04<00:00,  4.32s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.87it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 4.34 seconds, 0.23 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 930. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=465)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 64. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=32)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:04<00:00,  4.50s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 250.00it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 4.51 seconds, 0.22 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 770. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=385)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 121. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=60)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.35s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 98.43it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.37 seconds, 0.30 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 880. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=440)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 185. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=92)


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.27s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 83.34it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.30 seconds, 0.30 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 907. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=453)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 75. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.34s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.28it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.35 seconds, 0.30 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 910. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=455)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 92. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=46)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.29s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.20it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.30 seconds, 0.43 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 837. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=418)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 164. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=82)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.90s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 249.84it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.91 seconds, 0.26 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 739. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=369)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 107. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=53)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.76s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.90it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.78 seconds, 0.36 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 805. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=402)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 163. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=81)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.48s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.54it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.49 seconds, 0.29 sentences/sec


Your max_length is set to 1000000000000000019884624838656, but your input_length is only 679. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=339)
Your max_length is set to 1000000000000000019884624838656, but your input_length is only 73. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=36)
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.59s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 166.78it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.61 seconds, 0.38 sentences/sec


In [10]:
df_scores

Unnamed: 0,model,method,max_tokens,topic,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability
0,pszemraj/led-base-book-summary,MapReduce,16384,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,The following is a summary of the announcement...,"[{'rouge-1': {'r': 0.38095238095238093, 'p': 0...","(tensor([0.8857]), tensor([0.8723]), tensor([0...",2.181628e-01,19.899436,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 13.012869198312234, grade_level: '13'"
1,pszemraj/led-base-book-summary,MapReduce,16384,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,"In this brief summary, we summarize the speech...","[{'rouge-1': {'r': 0.2874251497005988, 'p': 0....","(tensor([0.8664]), tensor([0.8317]), tensor([0...",9.503589e-02,21.163034,"[Match({'ruleId': 'EN_COMPOUNDS', 'message': '...","score: 9.665454545454548, grade_level: '10'"
2,pszemraj/led-base-book-summary,MapReduce,16384,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,The narrator of this piece originally appeared...,"[{'rouge-1': {'r': 0.4148148148148148, 'p': 0....","(tensor([0.8524]), tensor([0.8609]), tensor([0...",1.759903e-01,28.523080,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 11.396671501087742, grade_level: '11'"
3,pszemraj/led-base-book-summary,MapReduce,16384,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,"The Home Secretary, Charles Clarke, has outlin...","[{'rouge-1': {'r': 0.2710843373493976, 'p': 0....","(tensor([0.8730]), tensor([0.8468]), tensor([0...",5.694877e-02,24.956480,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 11.617224157955867, grade_level: '12'"
4,pszemraj/led-base-book-summary,MapReduce,16384,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",The English Parliament passes the Prevention o...,"[{'rouge-1': {'r': 0.3959731543624161, 'p': 0....","(tensor([0.8890]), tensor([0.8720]), tensor([0...",1.677187e-01,21.425882,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...","score: 9.484285714285715, grade_level: '9'"
...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,pszemraj/long-t5-tglobal-base-16384-book-summary,MapReduce,1000000000000000019884624838656,sport,Paris promise raises Welsh hopes..Has there be...,But since they threw off the shackles against ...,This brief paper gives an update on the progre...,"[{'rouge-1': {'r': 0.08695652173913043, 'p': 0...","([tensor(0.8620)], [tensor(0.8025)], [tensor(0...",4.201394e-157,9.210987,[],100 words required.
121,pszemraj/long-t5-tglobal-base-16384-book-summary,MapReduce,1000000000000000019884624838656,entertainment,Rapper Kanye West's shrewd soul..US hip-hop st...,Leaving his Chicago art school after only one ...,The Narrator informs the audience that Kany We...,"[{'rouge-1': {'r': 0.09392265193370165, 'p': 0...","([tensor(0.8522)], [tensor(0.7918)], [tensor(0...",1.260210e-156,15.023796,"[Offset 39, length 4, Rule ID: MORFOLOGIK_RULE...",100 words required.
122,pszemraj/long-t5-tglobal-base-16384-book-summary,MapReduce,1000000000000000019884624838656,entertainment,Redford's vision of Sundance..Despite sporting...,Redford wanted Sundance to be a platform for i...,Robert Redford is the founder and president of...,"[{'rouge-1': {'r': 0.07096774193548387, 'p': 0...","([tensor(0.8724)], [tensor(0.8100)], [tensor(0...",8.507861e-159,8.595882,"[Offset 51, length 8, Rule ID: MORFOLOGIK_RULE...",100 words required.
123,pszemraj/long-t5-tglobal-base-16384-book-summary,MapReduce,1000000000000000019884624838656,business,Water firm Suez in Argentina row..A conflict b...,The government has rejected the 60% rise and w...,This paper describes a dispute between the Fre...,"[{'rouge-1': {'r': 0.07643312101910828, 'p': 0...","([tensor(0.8490)], [tensor(0.8013)], [tensor(0...",2.727474e-81,16.457350,"[Offset 106, length 9, Rule ID: MORFOLOGIK_RUL...",100 words required.
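
The `max_tokens` value `1000000000000000019884624838656` in the last rows is `int(1e30)`, the placeholder that Hugging Face tokenizers report as `model_max_length` when a checkpoint declares no limit. It appears to have been passed on as `max_length`, which would explain the repeated "Your max_length is set to ..." warnings in the log above. A minimal sketch of one way to guard against this, assuming a hypothetical helper `safe_max_length` and a fallback limit of 16384 (taken from the model name, not from the notebook's code):

```python
# Placeholder value reported when no real limit is declared
# (assumption: this is the value seen in the log and in max_tokens).
UNSET_MAX_LENGTH = int(1e30)


def safe_max_length(input_length, model_max_length,
                    fallback_limit=16384, ratio=0.5, floor=16):
    """Pick a generation max_length proportional to the input length.

    `fallback_limit`, `ratio` and `floor` are illustrative defaults,
    not values used by the original notebook.
    """
    # Replace the placeholder with a realistic upper bound.
    if model_max_length >= UNSET_MAX_LENGTH:
        limit = fallback_limit
    else:
        limit = model_max_length
    # Aim for roughly half the input, as the warning itself suggests.
    target = max(floor, int(input_length * ratio))
    return min(target, limit)


print(safe_max_length(770, UNSET_MAX_LENGTH))  # 385, matching the hint in the log
print(safe_max_length(121, UNSET_MAX_LENGTH))  # 60
```

The result could then be passed as `max_length` to the summarization call; whether half the input length is the right ratio depends on the model and the chunking strategy.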


In [11]:
# df_scores.to_excel("../Process/result/open_source_model_topics_comparison.xlsx", index=False)

In [13]:
df_scores.shape

(125, 13)
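
The BLEU warnings in the log ("The hypothesis contains 0 counts of 4-gram overlaps") and the near-zero `bleu` values in the table (for example `4.201394e-157`) are consistent with BLEU's geometric mean collapsing whenever one n-gram order has no matches. The sketch below is a simplified, self-contained illustration of that effect and of add-one smoothing on the higher-order precisions. It is not the notebook's `SummarizationMetrics` code and not NLTK's implementation; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams) for order n."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    matches = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matches, sum(hyp_counts.values())


def sentence_bleu_sketch(reference, hypothesis, max_n=4, smooth=False):
    """Single-reference BLEU with uniform weights (illustrative only)."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matches, total = modified_precision(reference, hypothesis, n)
        if smooth and n > 1:
            # Add-one smoothing on orders above unigrams.
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            # One empty order zeroes the whole geometric mean.
            return 0.0
        log_sum += math.log(matches / total) / max_n
    ref_len, hyp_len = len(reference), len(hypothesis)
    # Brevity penalty for hypotheses shorter than the reference.
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(log_sum)


reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()

print(sentence_bleu_sketch(reference, hypothesis))               # 0.0 (no 4-gram match)
print(sentence_bleu_sketch(reference, hypothesis, smooth=True))  # small positive score
```

With smoothing, summaries that share no 4-grams with the reference still receive a non-zero score, which makes the `bleu` column easier to compare across models. Smoothed and unsmoothed scores should not be mixed in the same comparison.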