## Closed-Source Modelling with the BBC News Dataset

This notebook compares summaries generated by closed-source LLMs across different news topics.

Understand LLM Strengths and Weaknesses:

- **Identify domain-specific strengths**: Different LLMs are trained on different datasets and may excel in different domains. Comparing summaries across topics can help you identify which LLM performs best in a specific area relevant to your needs.

- **Uncover biases and limitations**: LLMs can inherit biases from their training data. Comparing summaries can help you identify potential biases and limitations in different models, allowing you to choose the one with the least bias for your task.

- **Evaluate factual accuracy**: Some LLMs prioritize fluency over factual accuracy, while others excel at fact-checking. Comparing summaries can help you assess the factual accuracy of each LLM and choose the one that best suits your need for reliable information.
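As a rough illustration of the comparison described above, the sketch below averages a numeric score per model and topic and picks the best model for each topic. The `model` and `topic` column names match the results table used later in this notebook; the `rouge1_f` column, the model names, and all values are made-up placeholders, not measured results.

```python
import pandas as pd

# Hypothetical scores: the values are placeholders, not measured results.
toy_scores = pd.DataFrame(
    {
        "model": ["model-a", "model-a", "model-b", "model-b"],
        "topic": ["sport", "tech", "sport", "tech"],
        "rouge1_f": [0.30, 0.20, 0.25, 0.35],
    }
)

# One row per topic, one column per model, mean score in each cell.
score_by_topic = toy_scores.pivot_table(
    index="topic", columns="model", values="rouge1_f", aggfunc="mean"
)

# For each topic, the model with the highest mean score.
best_model = score_by_topic.idxmax(axis=1)
print(score_by_topic)
print(best_model)
```

A single metric rarely settles the question, so the same pivot can be repeated for each metric of interest.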

Steps:

1. **Install & Import Necessary Libraries**
2. **Load Dataset (BBC News)**: Pick 5 rows of data per topic
3. **OpenAI Topic Summary Generation**
4. **Google Topic Summary Generation**

#### Install & Import Necessary Libraries

In [2]:
import os
import platform
import subprocess
import time
import sys

import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download('punkt')
nltk.download('stopwords')

sys.path.append('../')
from helper.SummarizationMetrics import SummarizationMetrics
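from helper.chatgpt_automation import ChatGPTAutomation
# split_text_into_chunks is imported once: importing it from both helper modules
# would let the second import silently shadow the first.
from helper.bard_automation import BardAutomation, split_text_into_chunks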


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from langchain import LLMChain, HuggingFacePipeline

from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.chat_models import ChatVertexAI
from langchain.llms import VertexAI
import google.generativeai as genai

from sentence_transformers import SentenceTransformer, util
from scipy.signal import argrelextrema
from sklearn.cluster import KMeans

from os import environ

from dotenv import load_dotenv

[nltk_data] Downloading package punkt to C:\Users\Zhang
[nltk_data]     Xiang\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Zhang
[nltk_data]     Xiang\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


#### Load Dataset (BBC)

In [2]:
bbc_train_df = pd.read_excel("../Data/newsbbc_train.xlsx")

bbc_train_df.head()

Unnamed: 0,File_path,Articles,Summaries,transcript,summary
0,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...
1,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...
2,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...
3,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...
4,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ..."


In [3]:
bbc_train_df = bbc_train_df.groupby('File_path').head(5)
bbc_train_df['File_path'].value_counts()

File_path
business         5
politics         5
entertainment    5
sport            5
tech             5
Name: count, dtype: int64
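`groupby('File_path').head(5)` keeps the first five rows of each topic in their original order, so the selection depends on how the file happens to be sorted. The sketch below shows that behaviour on a toy frame and, as an alternative that is not used in this notebook, a seeded random draw per topic. The toy data and the value of `rows_per_topic` are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "File_path": ["sport", "tech", "sport", "tech", "sport", "tech"],
        "Articles": ["s1", "t1", "s2", "t2", "s3", "t3"],
    }
)
rows_per_topic = 2

# First `rows_per_topic` rows of each topic, original row order preserved.
first_rows = toy.groupby("File_path").head(rows_per_topic)

# Alternative: a reproducible random draw of `rows_per_topic` rows per topic.
random_rows = toy.groupby("File_path").sample(n=rows_per_topic, random_state=0)

print(first_rows["Articles"].tolist())
print(random_rows["File_path"].value_counts().to_dict())
```

Either way, `value_counts()` on the topic column is a quick check that every topic contributes the same number of rows.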

In [4]:
bbc_train_df

Unnamed: 0,File_path,Articles,Summaries,transcript,summary
0,business,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...
1,politics,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...
2,entertainment,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...
3,politics,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...
4,politics,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ..."
5,politics,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...
6,sport,Chelsea denied by James heroics..A brave defen...,Chelsea were now looking more like Premiership...,Chelsea denied by James heroics..A brave defen...,Chelsea were now looking more like Premiership...
7,politics,Guantanamo man 'suing government'..A British t...,He said he was sent there after being interrog...,Guantanamo man 'suing government'..A British t...,He said he was sent there after being interrog...
13,business,Could Yukos be a blessing in disguise?..Other ...,But it argues that more rigorous tax policing ...,Could Yukos be a blessing in disguise?..Other ...,But it argues that more rigorous tax policing ...
14,business,Asian quake hits European shares..Shares in Eu...,The unfolding scale of the disaster in south A...,Asian quake hits European shares..Shares in Eu...,The unfolding scale of the disaster in south A...


In [5]:
# Load the .env file that contains the API keys
load_dotenv(".env", override=True)

True

In [6]:
# Retrieve API Key
openai_api_key = environ.get("OPENAI_API_KEY")
gcp_api_key = environ.get("GCP_API_KEY")
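`environ.get` returns `None` when a variable is missing, so a typo in `.env` would only surface later as an authentication error. A minimal sketch of a stricter lookup follows; `require_env` is a hypothetical helper and `DEMO_API_KEY` is a throwaway variable used only so the example runs without real credentials.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Throwaway variable so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
print(require_env("DEMO_API_KEY"))
```

In this notebook the same helper could be called with `"OPENAI_API_KEY"` and `"GCP_API_KEY"` right after `load_dotenv`.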


In [7]:
genai.configure(api_key=gcp_api_key)

In [8]:
# List the models available through the google.generativeai client.
for model in genai.list_models():
    print(model)

Model(name='models/chat-bison-001',
      base_model_id='',
      version='001',
      display_name='PaLM 2 Chat (Legacy)',
      description='A legacy text-only model optimized for chat conversations',
      input_token_limit=4096,
      output_token_limit=1024,
      supported_generation_methods=['generateMessage', 'countMessageTokens'],
      temperature=0.25,
      top_p=0.95,
      top_k=40)
Model(name='models/text-bison-001',
      base_model_id='',
      version='001',
      display_name='PaLM 2 (Legacy)',
      description='A legacy model that understands text and generates text as an output',
      input_token_limit=8196,
      output_token_limit=1024,
      supported_generation_methods=['generateText', 'countTextTokens', 'createTunedTextModel'],
      temperature=0.7,
      top_p=0.95,
      top_k=40)
Model(name='models/embedding-gecko-001',
      base_model_id='',
      version='001',
      display_name='Embedding Gecko',
      description='Obtain a distributed representatio

### OpenAI Topic Summary Generation

In [14]:
# Initialize Models
# Test 1-2 models at a time to avoid Kernel stopping.
openai_models = {
    "models": [
        # "gpt-3.5-turbo-1106",
        # "gpt-3.5-turbo-0125",
        # "gpt-4",
        # "gpt-4-1106-preview",
        "gpt-4-0125-preview",

    ],
    "max_tokens": [
        # 16385,
        # 16385,
        # 8192,
        # 128000,
        128000,
    ]
}

df_openai_models = pd.DataFrame(openai_models)
df_openai_models

Unnamed: 0,models,max_tokens
0,gpt-4-0125-preview,128000
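The two lists above are matched by position, so commenting out a model without commenting out the limit at the same index would pair the remaining models with the wrong `max_tokens`. One way to avoid that, sketched below, is to keep each model and its limit in a single tuple and derive the two lists from it. `MODEL_LIMITS` and `to_columns` are hypothetical names; the model names and limits are copied from the cell above.

```python
# Each entry keeps a model name next to its context limit,
# so disabling a model means commenting out a single line.
MODEL_LIMITS = [
    # ("gpt-3.5-turbo-0125", 16385),
    # ("gpt-4", 8192),
    ("gpt-4-0125-preview", 128000),
]


def to_columns(pairs):
    """Convert (model, max_tokens) pairs into the two-list layout used above."""
    return {
        "models": [name for name, _ in pairs],
        "max_tokens": [limit for _, limit in pairs],
    }


openai_models = to_columns(MODEL_LIMITS)
print(openai_models)
```

The resulting dictionary can be passed to `pd.DataFrame` exactly as before.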


In [5]:
# First run only: start from an empty results table.
df_scores = pd.DataFrame(columns=['model', 'method', 'max_tokens', 'topic' ,'num_tokens' ,'transcript', 'original summary' ,'summary', 'grammar', 'readability', 'rouge', 'bert_score', 'time_taken', 'prompt', 'temperature'])
df_scores

# Later runs: load the saved results so that new rows are appended to them.
# This replaces the empty frame created above.
df_scores = pd.read_excel("./result/closed_source_model_topics_comparison.xlsx")
df_scores.head(0)

Unnamed: 0,model,method,max_tokens,topic,num_tokens,transcript,original summary,summary,grammar,readability,rouge,bert_score,time_taken,prompt,temperature,bleu
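The cell above builds an empty frame and then immediately replaces it with the saved results, which fails if the Excel file does not exist yet. A minimal sketch of a load-or-create helper follows; `load_or_create_scores` is a hypothetical name, the file path in the example is deliberately one that should not exist, and the column list is shortened for illustration.

```python
import os

import pandas as pd


def load_or_create_scores(path, columns):
    """Load saved results if the file exists, otherwise start an empty frame."""
    if os.path.exists(path):
        return pd.read_excel(path)
    return pd.DataFrame(columns=columns)


# Shortened column list for illustration; the notebook uses more columns.
demo_columns = ["model", "method", "topic", "summary", "rouge", "bleu"]
demo_scores = load_or_create_scores("./no_such_results_file.xlsx", demo_columns)
print(demo_scores.shape)
```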


In [16]:
for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0

    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    prompt_template = """Write a concise summary of the following: {text}"""
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

    for index, row in bbc_train_df.iterrows():
        method = "MapReduce"

        # get the summary
        start_time = time.time()
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)

        max_tokens = model_row["max_tokens"]

        # Chunk size is set 100 tokens below the model's limit; consecutive chunks overlap by 100 tokens.
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))

        # Summarize each chunk (map), then combine the partial summaries (reduce).
        summary_chain = load_summarize_chain(llm=llm, chain_type='map_reduce', token_max=max_tokens, map_prompt=PROMPT)
        summary = summary_chain.run(docs)

        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'topic': row["File_path"],
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': prompt_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
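The splitter settings in the loop determine how many documents the map step receives. The sketch below is a simplified stand-in for token-based splitting (it is not LangChain's implementation, and it works on a plain token list rather than tiktoken output) that shows why an article of a few hundred tokens yields a single chunk when `chunk_size` is `max_tokens - 100`. `chunk_tokens` is a hypothetical helper, and the 300-token window is chosen only to make the splitting visible.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


article = ["tok"] * 762  # stand-in for an article of 762 tokens

# Large context window: the whole article fits in one chunk.
one_chunk = chunk_tokens(article, chunk_size=128000 - 100, chunk_overlap=100)

# Small window, chosen only to show splitting with overlap.
several_chunks = chunk_tokens(article, chunk_size=300, chunk_overlap=100)

print(len(one_chunk), len(several_chunks))
```

With one chunk per article, the map-reduce chain has only a single partial summary to combine.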

gpt-4-0125-preview
Number of tokens: 762
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.20s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 52.79it/s]


done in 2.24 seconds, 0.45 sentences/sec
Number of tokens: 702
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:05<00:00,  5.52s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 16.19it/s]


done in 5.62 seconds, 0.18 sentences/sec
Number of tokens: 718
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.73s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 18.69it/s]


done in 3.80 seconds, 0.26 sentences/sec
Number of tokens: 671
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:04<00:00,  4.66s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 52.58it/s]


done in 4.71 seconds, 0.21 sentences/sec
Number of tokens: 642
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:04<00:00,  4.04s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 71.36it/s]


done in 4.07 seconds, 0.25 sentences/sec
Number of tokens: 634
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.85s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 500.10it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.86 seconds, 0.54 sentences/sec
Number of tokens: 726
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.06s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 52.47it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 3.10 seconds, 0.32 sentences/sec
Number of tokens: 753
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:05<00:00,  5.11s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 217.07it/s]


done in 5.12 seconds, 0.20 sentences/sec
Number of tokens: 875
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.40s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.25it/s]


done in 2.41 seconds, 0.41 sentences/sec
Number of tokens: 756
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.52s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 237.31it/s]


done in 2.52 seconds, 0.40 sentences/sec
Number of tokens: 707
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.20s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.30it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.21 seconds, 0.45 sentences/sec
Number of tokens: 842
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.34s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.33it/s]


done in 2.35 seconds, 0.43 sentences/sec
Number of tokens: 829
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.57s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 466.81it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.58 seconds, 0.39 sentences/sec
Number of tokens: 639
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.96s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.23it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.97 seconds, 0.51 sentences/sec
Number of tokens: 787
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.98s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.26it/s]


done in 1.99 seconds, 0.50 sentences/sec
Number of tokens: 800
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.08s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 111.13it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.10 seconds, 0.48 sentences/sec
Number of tokens: 838
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.65s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 72.83it/s]


done in 2.70 seconds, 0.37 sentences/sec
Number of tokens: 694
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:03<00:00,  3.61s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 17.76it/s]


done in 3.69 seconds, 0.27 sentences/sec
Number of tokens: 712
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.15s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.12it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.16 seconds, 0.46 sentences/sec
Number of tokens: 774
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.46s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 29.20it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.51 seconds, 0.40 sentences/sec
Number of tokens: 805
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.78s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 227.72it/s]


done in 1.79 seconds, 0.56 sentences/sec
Number of tokens: 707
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.09s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 332.80it/s]


done in 2.10 seconds, 0.48 sentences/sec
Number of tokens: 636
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.60s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 213.80it/s]


done in 1.61 seconds, 0.62 sentences/sec
Number of tokens: 697
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.48s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 68.51it/s]


done in 2.50 seconds, 0.40 sentences/sec
Number of tokens: 594
Number of chunks: 1




calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.66s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 398.70it/s]


done in 1.68 seconds, 0.60 sentences/sec
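Several runs in the log above print NLTK's warning that the BLEU score evaluates to 0 because no 3-gram or 4-gram overlaps were found, and the warning itself suggests `SmoothingFunction()`. The sketch below shows that option on a toy sentence pair. How `SummarizationMetrics.bleu_score()` computes BLEU is not shown in this notebook, so this illustrates the NLTK call rather than the helper's actual implementation; the sentences are invented.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()  # shares no 4-gram with the reference

# Default settings: the zero 4-gram count drives the score to (effectively) 0
# and triggers the warning seen in the log.
unsmoothed = sentence_bleu([reference], hypothesis)

# method1 replaces zero n-gram match counts with a small epsilon before averaging.
smoothed = sentence_bleu(
    [reference], hypothesis, smoothing_function=SmoothingFunction().method1
)

print(unsmoothed, smoothed)
```

Smoothed and unsmoothed scores are not directly comparable, so whichever setting is chosen should be applied to every model in the comparison.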


In [17]:
df_scores

Unnamed: 0,model,method,max_tokens,topic,num_tokens,transcript,original summary,summary,grammar,readability,rouge,bert_score,time_taken,prompt,temperature,bleu
0,gpt-3.5-turbo-0125,MapReduce,16385,business,762,Cuba winds back economic clock..Fidel Castro's...,Fidel Castro's decision to ban all cash transa...,Cuba has imposed a 10% tax on conversions betw...,[],100 words required.,"[{'rouge-1': {'r': 0.17006802721088435, 'p': 0...","(tensor([0.9130]), tensor([0.8402]), tensor([0...",3.803998,Write a concise summary of the following: {text},0,0.016126
1,gpt-3.5-turbo-0125,MapReduce,16385,politics,702,Blair looks to election campaign..Tony Blair's...,There was little in terms of concrete proposal...,Tony Blair's recent speech marked the start of...,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,"[{'rouge-1': {'r': 0.17365269461077845, 'p': 0...","(tensor([0.8809]), tensor([0.8259]), tensor([0...",4.186849,Write a concise summary of the following: {text},0,0.016755
2,gpt-3.5-turbo-0125,MapReduce,16385,entertainment,718,New York rockers top talent poll..New York ele...,New York electro-rock group The Bravery have c...,"The Bravery, a New York electro-rock group, to...","[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,"[{'rouge-1': {'r': 0.26666666666666666, 'p': 0...","(tensor([0.9043]), tensor([0.8542]), tensor([0...",2.662916,Write a concise summary of the following: {text},0,0.057747
3,gpt-3.5-turbo-0125,MapReduce,16385,politics,671,Terror suspects face house arrest..UK citizens...,British citizens are being included in the cha...,The UK government is considering new measures ...,[],100 words required.,"[{'rouge-1': {'r': 0.16265060240963855, 'p': 0...","(tensor([0.8905]), tensor([0.8307]), tensor([0...",3.348620,Write a concise summary of the following: {text},0,0.005401
4,gpt-3.5-turbo-0125,MapReduce,16385,politics,642,'No more concessions' on terror..Charles Clark...,"On Monday, MPs voted 272-219 in favour of the ...",Charles Clarke is standing firm on his anti-te...,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,"[{'rouge-1': {'r': 0.2751677852348993, 'p': 0....","(tensor([0.8944]), tensor([0.8478]), tensor([0...",2.810315,Write a concise summary of the following: {text},0,0.061109
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70,gpt-4-0125-preview,MapReduce,128000,sport,805,Paris promise raises Welsh hopes..Has there be...,But since they threw off the shackles against ...,The article highlights the rising expectations...,[],"score: 14.651290322580646, grade_level: '15'","[{'rouge-1': {'r': 0.17391304347826086, 'p': 0...","([tensor(0.8637)], [tensor(0.8236)], [tensor(0...",33.090430,Write a concise summary of the following: {text},0,0.035691
71,gpt-4-0125-preview,MapReduce,128000,entertainment,707,Rapper Kanye West's shrewd soul..US hip-hop st...,Leaving his Chicago art school after only one ...,"Kanye West, a key figure in hip-hop, leads the...","[Offset 557, length 4, Rule ID: MORFOLOGIK_RUL...","score: 13.59773553719008, grade_level: '14'","[{'rouge-1': {'r': 0.2430939226519337, 'p': 0....","([tensor(0.8815)], [tensor(0.8397)], [tensor(0...",22.018759,Write a concise summary of the following: {text},0,0.034853
72,gpt-4-0125-preview,MapReduce,128000,entertainment,636,Redford's vision of Sundance..Despite sporting...,Redford wanted Sundance to be a platform for i...,Robert Redford founded the Sundance Film Festi...,"[Offset 27, length 8, Rule ID: MORFOLOGIK_RULE...","score: 15.02, grade_level: '15'","[{'rouge-1': {'r': 0.16129032258064516, 'p': 0...","([tensor(0.8860)], [tensor(0.8411)], [tensor(0...",25.070732,Write a concise summary of the following: {text},0,0.025239
73,gpt-4-0125-preview,MapReduce,128000,business,697,Water firm Suez in Argentina row..A conflict b...,The government has rejected the 60% rise and w...,The Argentine government is in a dispute with ...,"[Offset 46, length 5, Rule ID: MORFOLOGIK_RULE...","score: 17.724833333333333, grade_level: '18'","[{'rouge-1': {'r': 0.2929936305732484, 'p': 0....","([tensor(0.8820)], [tensor(0.8585)], [tensor(0...",43.071345,Write a concise summary of the following: {text},0,0.064537
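The `rouge` column holds a list of nested dictionaries, and after a round trip through Excel it comes back as a string. The sketch below pulls one scalar out of such a cell so it can be averaged or plotted. `rouge_value` is a hypothetical helper; the `'f'` key is assumed from the usual `r`/`p`/`f` layout (the table above truncates each cell before that key), and the sample numbers are invented.

```python
import ast


def rouge_value(cell, metric="rouge-1", field="f"):
    """Extract one number from a ROUGE cell stored as a list of dicts or as its string form."""
    scores = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return float(scores[0][metric][field])


# Invented values in the same shape as the `rouge` column.
sample_cell = (
    "[{'rouge-1': {'r': 0.2, 'p': 0.5, 'f': 0.2857}, "
    "'rouge-2': {'r': 0.1, 'p': 0.3, 'f': 0.15}}]"
)

print(rouge_value(sample_cell))
print(rouge_value(sample_cell, metric="rouge-2", field="r"))
```

Applied with `df_scores['rouge'].apply(rouge_value)`, this would give a numeric column that can be grouped by `model` and `topic`.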


In [18]:
# df_scores.to_excel("./result/closed_source_model_topics_comparison.xlsx", index=False)

### Google Topic Summary Generation

In [11]:
df_scores = pd.read_excel("./result/closed_source_model_topics_comparison.xlsx")
print(df_scores.shape)
df_scores.head(0)

(150, 16)


Unnamed: 0,model,method,max_tokens,topic,num_tokens,transcript,original summary,summary,grammar,readability,rouge,bert_score,time_taken,prompt,temperature,bleu


In [21]:
# Initialize Models
gcp_models = {

    # Models selected
    "models": [
        # "gemini-pro",
        # "chat-bison-001",
        "text-bison-001"
    ],

    # Input Token
    "max_tokens": [
        # 30720,
        # 4096,
        8196,

    ]
}


df_gcp_models = pd.DataFrame(gcp_models)
df_gcp_models

Unnamed: 0,models,max_tokens
0,text-bison-001,8196


In [22]:
for _, model_row in df_gcp_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0

    # Chat models use the chat wrapper; all other models use the text wrapper
    if "chat" in model_name.lower():
        llm = ChatVertexAI(temperature=temperature, model=model_name, verbose=True)
    else:
        llm = VertexAI(temperature=temperature, model=model_name, verbose=True)

    # Same prompt as in the OpenAI section, so results stay comparable
    prompt_template = """Write a concise summary of the following: {text}"""
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

    for index, row in bbc_train_df.iterrows():
        method = "MapReduce"

        # Time the whole pipeline: token counting, splitting and summarization
        start_time = time.time()
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)

        max_tokens = model_row["max_tokens"]
        print(max_tokens)

        # Chunk sizes are measured with tiktoken, so they only approximate
        # the token counts reported by the Google model itself
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            chunk_size=max_tokens - 100, chunk_overlap=100
        )
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))

        # Summarize each chunk (map), then combine the partial summaries (reduce)
        summary_chain = load_summarize_chain(
            llm=llm, chain_type='map_reduce', token_max=max_tokens, map_prompt=PROMPT
        )
        summary = summary_chain.run(docs)

        end_time = time.time()
        elapsed_time = end_time - start_time

        # Score the generated summary against the reference summary
        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'topic': row["File_path"],
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': prompt_template,
            'temperature': temperature
        }

        # Append the new row to the running results table
        new_row = pd.DataFrame([new_result])
        df_scores = pd.concat([df_scores, new_row], ignore_index=True)


text-bison-001
Number of tokens: 778
8196
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.04s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 95.23it/s]


done in 2.08 seconds, 0.48 sentences/sec
Number of tokens: 701
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.43s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 90.89it/s]


done in 2.45 seconds, 0.41 sentences/sec
Number of tokens: 728
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.59s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 200.00it/s]


done in 1.60 seconds, 0.62 sentences/sec
Number of tokens: 672
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.97s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 181.34it/s]


done in 1.99 seconds, 0.50 sentences/sec
Number of tokens: 654
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.12s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 125.02it/s]


done in 2.15 seconds, 0.47 sentences/sec
Number of tokens: 629
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 166.67it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.52 seconds, 0.66 sentences/sec
Number of tokens: 704
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.06s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.83it/s]


done in 2.08 seconds, 0.48 sentences/sec
Number of tokens: 767
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.10s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.92it/s]


done in 2.11 seconds, 0.47 sentences/sec
Number of tokens: 879
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.28s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 122.94it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.30 seconds, 0.43 sentences/sec
Number of tokens: 773
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.27s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 995.56it/s]


done in 2.28 seconds, 0.44 sentences/sec
Number of tokens: 699
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.04s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 111.00it/s]


done in 2.06 seconds, 0.49 sentences/sec
Number of tokens: 859
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.27s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 121.60it/s]


done in 2.29 seconds, 0.44 sentences/sec
Number of tokens: 828
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.48s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 153.46it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.50 seconds, 0.40 sentences/sec
Number of tokens: 634
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.69s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 124.99it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.71 seconds, 0.58 sentences/sec
Number of tokens: 801
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.39s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.38it/s]


done in 2.40 seconds, 0.42 sentences/sec
Number of tokens: 809
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.08s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 124.95it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.09 seconds, 0.48 sentences/sec
Number of tokens: 854
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.54s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.92it/s]


done in 2.56 seconds, 0.39 sentences/sec
Number of tokens: 696
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.08s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 100.00it/s]


done in 2.11 seconds, 0.47 sentences/sec
Number of tokens: 701
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.08s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 83.34it/s]


done in 2.10 seconds, 0.48 sentences/sec
Number of tokens: 751
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.24s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 117.55it/s]


done in 2.26 seconds, 0.44 sentences/sec
Number of tokens: 815
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.03s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 32.11it/s]


done in 2.08 seconds, 0.48 sentences/sec
Number of tokens: 744
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.27s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 79.98it/s]


done in 2.29 seconds, 0.44 sentences/sec
Number of tokens: 639
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.98s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 111.09it/s]


done in 2.01 seconds, 0.50 sentences/sec
Number of tokens: 690
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  2.00s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 95.21it/s]


done in 2.02 seconds, 0.50 sentences/sec
Number of tokens: 600
8196
Number of chunks: 1


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.09s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 97.50it/s]


done in 2.12 seconds, 0.47 sentences/sec
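
Several rows in the log above report "The hypothesis contains 0 counts of 4-gram overlaps", which is NLTK's warning that an unsmoothed BLEU score collapses towards zero when no 4-gram matches the reference. The sketch below shows the remedy the warning itself suggests, `SmoothingFunction`, on two made-up sentences. How `SummarizationMetrics.bleu_score()` computes BLEU internally is not shown in this notebook, so this is an illustration under assumptions rather than a patch for that helper.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Made-up sentences for illustration; they share unigrams and one bigram,
# but no 3-gram or 4-gram
reference = "the government rejected the proposed rise in water prices".split()
candidate = "the government turned down the water price increase".split()

# method1 adds a small epsilon to n-gram orders with zero matches,
# so the geometric mean of the precisions no longer collapses
smoothed_bleu = sentence_bleu(
    [reference],
    candidate,
    smoothing_function=SmoothingFunction().method1,
)
print(f"Smoothed BLEU: {smoothed_bleu:.4f}")
```

Smoothed and unsmoothed scores are not directly comparable, so any change of this kind would have to be applied to every model in the comparison, including the OpenAI rows already stored in the results file.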


In [14]:
df_scores.to_excel("./result/closed_source_model_topics_comparison.xlsx", index=False)
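
With all models written to the same file, a per-topic comparison can be read off by averaging the numeric columns for each model and topic. The sketch below assumes the `model`, `topic`, `bleu` and `time_taken` columns shown earlier. The function name `summarize_by_topic` and the toy values are hypothetical and are not results from this notebook.

```python
import pandas as pd


def summarize_by_topic(df):
    """Average BLEU and runtime for every (model, topic) pair."""
    return (
        df.groupby(["model", "topic"], as_index=False)[["bleu", "time_taken"]]
        .mean()
    )


# Toy data for illustration only
toy_scores = pd.DataFrame(
    {
        "model": ["model-a", "model-a", "model-b", "model-b"],
        "topic": ["business", "business", "business", "sport"],
        "bleu": [0.25, 0.75, 0.5, 0.125],
        "time_taken": [2.0, 4.0, 1.0, 3.0],
    }
)

topic_summary = summarize_by_topic(toy_scores)
print(topic_summary)
```

On the real data the call would be `summarize_by_topic(df_scores)`. Text-valued columns such as `rouge` and `bert_score` would need to be parsed into numbers before they can be averaged in the same way.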