# Closed-Source Prompt Engineering

1. **Install & Import Necessary Libraries**

2. **Helper Function**

3. **Multi-Speaker Conversation Summarizer Prompt Engineering (OpenAI)**

    3.1. Prompt Template Testing

    3.2 Few-Shot Prompting (Did 5 Shots)

    3.3. Map & Combine load_summarize_chain(): Best Method over testing

### Install & Import Necessary Libraries

In [1]:
import os
import platform
import subprocess
import time
import sys
import re

import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download('punkt')
nltk.download('stopwords')

sys.path.append('../')
from helper.SummarizationMetrics import SummarizationMetrics
from helper.chatgpt_automation import ChatGPTAutomation, split_text_into_chunks
from helper.bard_automation import BardAutomation, split_text_into_chunks


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import seaborn as sns

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from langchain import LLMChain, HuggingFacePipeline

from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from sentence_transformers import SentenceTransformer, util
from scipy.signal import argrelextrema
from sklearn.cluster import KMeans


[nltk_data] Downloading package punkt to C:\Users\Zhang
[nltk_data]     Xiang\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Zhang
[nltk_data]     Xiang\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
  from .autonotebook import tqdm as notebook_tqdm


#### Helper Function

In [2]:
def summ_pipeline(model, tokenizer, chain_type, max_length, prompt=False):
  pipeline = transformers.pipeline(
      "summarization",
      model=model,
      tokenizer=tokenizer,
      torch_dtype=torch.bfloat16,
      trust_remote_code=True,
      device_map="auto",
      max_length=max_length,
      do_sample=True,
      top_k=10,
      num_return_sequences=1,
      eos_token_id=tokenizer.eos_token_id,
  )
  
  llm = HuggingFacePipeline(pipeline = pipeline)

  if chain_type == "map_reduce":
    if prompt:
      prompt_template = """Summarize this: ```{text}```"""
      prompt_message = PromptTemplate(template=prompt_template, input_variables=["text"])
      
      summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type, token_max=max_length, prompt=prompt_message)
    else:
      summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type, token_max=max_length)
  else:
    # can't get it to work with refine and stuff, think they updated the library but no documentation
    # on how to set token_max
    summary_chain = load_summarize_chain(llm=llm, chain_type=chain_type)
  return summary_chain

In [3]:
from os import environ
from langchain.chat_models import ChatOpenAI

from dotenv import load_dotenv

import pandas as pd
import sys
sys.path.append('../')
from helper.SummarizationMetrics import SummarizationMetrics

In [4]:
# df = pd.read_excel("../Data/tib_test.xlsx")

# df_test = df.head(5)

# df_test

Unnamed: 0,summary,transcript
0,A firsthand look at efforts to improve diversi...,All right. So our next talk is called Hacking...
1,It is certainly a time of discovery- though th...,"Welcome, DEF CON 28, the Do No Harm panel. Th..."
2,Roman Architecture (HSAR 252) Professor Kleine...,Good morning. As you can see from the title o...
3,Stochastic rewriting systems evolving over gra...,"Thank you very much, first important question..."
4,"In typical military operations, the advantage ...",I was great to be with all of you today. I sa...


In [5]:
load_dotenv(".env", override=True)

True

In [6]:
# retrieve api key
openai_api_key = environ.get("OPENAI_API_KEY")

In [7]:
# Initialize OpenAI Model
openai_models = {
    "models": [
        "gpt-3.5-turbo-1106",
        "gpt-4",
        "gpt-4-1106-preview"
    ],
    "max_tokens": [
        16385,
        8192,
        128000,

    ]
}

df_openai_models = pd.DataFrame(openai_models)
df_openai_models

Unnamed: 0,models,max_tokens
0,gpt-3.5-turbo-1106,16385
1,gpt-4,8192
2,gpt-4-1106-preview,128000


In [13]:
# Execute when want to continue
df_scores = pd.read_excel("./result/closed_source_model_openai_api.xlsx")
df_scores.head(0)

Unnamed: 0,model,method,max_tokens,num_tokens,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability,prompt,temperature


In [15]:
for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0
    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    prompt_template = """Write a concise yet comprehensive summary
        that highlights key topics and discussions from the webinar transcripts. Purpose of the summary is for users seek overviews before committing to the full video, and the summary should capture
          essential threads, providing a gist of the discussion. The intended purpose is to assist users in quickly grasping main points 
          and reinforcing learning post-viewing. 
          
          Keep the generated summary around 200 words for following context: 
          {text}"""
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

    for index, row in df_test.iterrows():
        method = "MapReduce"

        # get the summary
        start_time = time.time()
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)

        max_tokens = model_row["max_tokens"]

        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))

        # break
        summary_chain = load_summarize_chain(llm=llm, chain_type='map_reduce', token_max=max_tokens , map_prompt=PROMPT)
        summary = summary_chain.run(docs)

        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': prompt_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
        time.sleep(2)


gpt-3.5-turbo-1106
Number of tokens: 6827
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.17s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 52.63it/s]


done in 2.25 seconds, 0.44 sentences/sec


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


Number of tokens: 6969
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.05s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 73.16it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.08 seconds, 0.48 sentences/sec
Number of tokens: 7628
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.12s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.88it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.14 seconds, 0.47 sentences/sec
Number of tokens: 6855
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.42s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 125.00it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.45 seconds, 0.41 sentences/sec
Number of tokens: 7058
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.41s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.26it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.41 seconds, 0.71 sentences/sec
gpt-4
Number of tokens: 6827
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.49s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 25.97it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.55 seconds, 0.39 sentences/sec
Number of tokens: 6969
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.31s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 664.60it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.31 seconds, 0.76 sentences/sec
Number of tokens: 7628
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.24s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 26.67it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.30 seconds, 0.44 sentences/sec
Number of tokens: 6855
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.30s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 499.44it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.31 seconds, 0.43 sentences/sec
Number of tokens: 7058
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 29.96it/s]


done in 1.55 seconds, 0.65 sentences/sec
gpt-4-1106-preview
Number of tokens: 6827
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.31s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 83.91it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.34 seconds, 0.43 sentences/sec
Number of tokens: 6969
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.96s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 71.43it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.99 seconds, 0.50 sentences/sec
Number of tokens: 7628
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.59s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 26.66it/s]


done in 2.65 seconds, 0.38 sentences/sec
Number of tokens: 6855
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.24s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 29.41it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.29 seconds, 0.44 sentences/sec
Number of tokens: 7058
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:02<00:00,  2.21s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.30it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 2.22 seconds, 0.45 sentences/sec


In [16]:
df_scores.shape

Unnamed: 0,model,method,max_tokens,num_tokens,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability,prompt,temperature
0,gpt-3.5-turbo-1106,MapReduce,16385,6827,All right. So our next talk is called Hacking...,A firsthand look at efforts to improve diversi...,Professor Christina Tamba-Hester's talk on Hac...,"[{'rouge-1': {'r': 0.13690476190476192, 'p': 0...","(tensor([0.8656]), tensor([0.8255]), tensor([0...",8.954086e-156,21.383893,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
1,gpt-3.5-turbo-1106,MapReduce,16385,7628,Good morning. As you can see from the title o...,Roman Architecture (HSAR 252) Professor Kleine...,The speaker discusses important developments i...,"[{'rouge-1': {'r': 0.11428571428571428, 'p': 0...","(tensor([0.8856]), tensor([0.8039]), tensor([0...",6.397942e-157,11.230562,[],100 words required.,Write a concise summary of the following: {text},0
2,gpt-3.5-turbo-1106,MapReduce,16385,6855,"Thank you very much, first important question...",Stochastic rewriting systems evolving over gra...,The main goal of the research is to analyze th...,"[{'rouge-1': {'r': 0.08888888888888889, 'p': 0...","(tensor([0.8576]), tensor([0.7937]), tensor([0...",1.7494850000000002e-156,115.299044,[],100 words required.,Write a concise summary of the following: {text},0
3,gpt-3.5-turbo-1106,MapReduce,16385,7058,I was great to be with all of you today. I sa...,"In typical military operations, the advantage ...",The presentation challenged traditional approa...,"[{'rouge-1': {'r': 0.1037037037037037, 'p': 0....","(tensor([0.8622]), tensor([0.8058]), tensor([0...",1.5875009999999998e-79,14.139802,[],100 words required.,Write a concise summary of the following: {text},0
4,gpt-3.5-turbo-1106,MapReduce,16385,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,The speaker emphasizes the need to take respon...,"[{'rouge-1': {'r': 0.06299212598425197, 'p': 0...","(tensor([0.8376]), tensor([0.7977]), tensor([0...",1.4651720000000001e-156,93.767647,[],100 words required.,Write a concise summary of the following: {text},0
5,gpt-4,MapReduce,8192,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,The DEF CON 28 Do No Harm panel discussed the ...,"[{'rouge-1': {'r': 0.14173228346456693, 'p': 0...","(tensor([0.8701]), tensor([0.8204]), tensor([0...",2.587246e-79,11.267398,[],100 words required.,Write a concise summary of the following: {text},0
6,gpt-4-1106-preview,MapReduce,128000,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,"At DEF CON 28, the ""Do No Harm"" panel brought ...","[{'rouge-1': {'r': 0.1968503937007874, 'p': 0....","(tensor([0.8469]), tensor([0.8214]), tensor([0...",5.467525e-79,16.473476,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
7,gpt-4,MapReduce,8192,6827,All right. So our next talk is called Hacking...,A firsthand look at efforts to improve diversi...,"In her talk ""Hacking Diversity"", Professor Chr...","[{'rouge-1': {'r': 0.11904761904761904, 'p': 0...","(tensor([0.8734]), tensor([0.8299]), tensor([0...",2.314046e-79,13.086216,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
8,gpt-4,MapReduce,8192,7628,Good morning. As you can see from the title o...,Roman Architecture (HSAR 252) Professor Kleine...,The lecture explores the architecture of Hercu...,"[{'rouge-1': {'r': 0.15, 'p': 0.34426229508196...","(tensor([0.8820]), tensor([0.8144]), tensor([0...",0.02455021,14.684876,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
9,gpt-4,MapReduce,8192,6855,"Thank you very much, first important question...",Stochastic rewriting systems evolving over gra...,The speaker explores the application of combin...,"[{'rouge-1': {'r': 0.09444444444444444, 'p': 0...","(tensor([0.8588]), tensor([0.7954]), tensor([0...",5.637758e-80,8.984848,[],100 words required.,Write a concise summary of the following: {text},0


#### Multi-Speaker Conversation Summarizer Prompt Engineering (OpenAI)

In [8]:
df_qmsum_test = pd.read_excel("../Data/qmsum_test.xlsx")
df_qmsum_test = df_qmsum_test.iloc[[4]]
# df_qmsum_test = df_qmsum_test.head(5)

df_qmsum_test

Unnamed: 0,transcript,summary
4,"Professor C : Uh , is it the twenty - fourth ?...",The meeting covered the issues with different ...


In [9]:
# Initialize Models
openai_models = {
    "models": [
        "gpt-3.5-turbo-1106",
        # "gpt-4",
        # "gpt-4-1106-preview"
    ],
    "max_tokens": [
        16385,
        # 8192,
        # 128000,

    ]
}

df_openai_models = pd.DataFrame(openai_models)
df_openai_models

Unnamed: 0,models,max_tokens
0,gpt-3.5-turbo-1106,16385


In [10]:
# Read and continue update dataframe
# df_scores = pd.read_excel("./result/closed_source_model_openai_api.xlsx")
# df_scores.head(0)

Unnamed: 0,model,method,max_tokens,num_tokens,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability,prompt,temperature


#### Prompt Template Testing

In [2]:
for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0
    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    prompt_template = """
                    Transcript:
                    {text}

                    Given a transcripts between multiple speakers above, provide a concise summary within 200 words for each speaker highlighting their key points or contributions with the given format below:

                    Speaker 1 Name: (Summary for speaker 1)
                    

                    Speaker 2 Name: (Summary for speaker 2)
                    
                    """
    

    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

    for index, row in df_qmsum_test.iterrows():
        method = "MapReduce"

        # get the summary
        start_time = time.time()
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)

        max_tokens = model_row["max_tokens"]

        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))

        # break
        summary_chain = load_summarize_chain(llm=llm, chain_type='map_reduce', token_max=max_tokens , map_prompt=PROMPT)
        summary = summary_chain.run(docs)

        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': prompt_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
        time.sleep(2)


NameError: name 'df_openai_models' is not defined

In [55]:
for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0
    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    prompt_template = """Compose concise yet comprehensive individual summaries for each speaker based on the provided transcript:


                    Follow the format below in "markdown format", ensuring each speaker summary is approximately 100 to 200 words. Include details about their perspectives, key points, and any noteworthy insights.

                    Speaker 1 Name:
                    - Provide a brief introduction to Speaker 1.
                    - Summarize their key contributions to the discussion.
                    - Highlight notable opinions or stances expressed by Speaker 1.

                    Speaker 2 Name:
                    - Introduce Speaker 2 briefly.
                    - Outline the main points and ideas put forth by Speaker 2.
                    - Emphasize any distinct perspectives or viewpoints presented by Speaker 2.

                    Transcript:
                    {text}
                    
                    """
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

    for index, row in df_qmsum_test.iterrows():
        method = "MapReduce"

        # get the summary
        start_time = time.time()
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)

        max_tokens = model_row["max_tokens"]

        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))

        # break
        summary_chain = load_summarize_chain(llm=llm, chain_type='map_reduce', token_max=max_tokens , map_prompt=PROMPT)
        summary = summary_chain.run(docs)

        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': prompt_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
        time.sleep(2)


gpt-3.5-turbo-1106
Number of tokens: 8983
Number of chunks: 1


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:00<00:00,  1.28it/s]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 177.74it/s]
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 0.79 seconds, 1.26 sentences/sec


In [61]:
df_scores['max_tokens'] = df_scores['max_tokens'].astype(int)

# df_scores.to_excel("./result/closed_source_model_openai_api.xlsx", index=False)

In [62]:
df_scores.eahd(1)

Unnamed: 0,model,method,max_tokens,num_tokens,transcript,original summary,summary,rouge,bert_score,bleu,time_taken,grammar,readability,prompt,temperature
0,gpt-3.5-turbo-1106,MapReduce,16385,6827,All right. So our next talk is called Hacking...,A firsthand look at efforts to improve diversi...,Professor Christina Tamba-Hester's talk on Hac...,"[{'rouge-1': {'r': 0.13690476190476192, 'p': 0...","(tensor([0.8656]), tensor([0.8255]), tensor([0...",8.954086e-156,21.383893,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
1,gpt-3.5-turbo-1106,MapReduce,16385,7628,Good morning. As you can see from the title o...,Roman Architecture (HSAR 252) Professor Kleine...,The speaker discusses important developments i...,"[{'rouge-1': {'r': 0.11428571428571428, 'p': 0...","(tensor([0.8856]), tensor([0.8039]), tensor([0...",6.397942e-157,11.230562,[],100 words required.,Write a concise summary of the following: {text},0
2,gpt-3.5-turbo-1106,MapReduce,16385,6855,"Thank you very much, first important question...",Stochastic rewriting systems evolving over gra...,The main goal of the research is to analyze th...,"[{'rouge-1': {'r': 0.08888888888888889, 'p': 0...","(tensor([0.8576]), tensor([0.7937]), tensor([0...",1.7494850000000002e-156,115.299044,[],100 words required.,Write a concise summary of the following: {text},0
3,gpt-3.5-turbo-1106,MapReduce,16385,7058,I was great to be with all of you today. I sa...,"In typical military operations, the advantage ...",The presentation challenged traditional approa...,"[{'rouge-1': {'r': 0.1037037037037037, 'p': 0....","(tensor([0.8622]), tensor([0.8058]), tensor([0...",1.5875009999999998e-79,14.139802,[],100 words required.,Write a concise summary of the following: {text},0
4,gpt-3.5-turbo-1106,MapReduce,16385,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,The speaker emphasizes the need to take respon...,"[{'rouge-1': {'r': 0.06299212598425197, 'p': 0...","(tensor([0.8376]), tensor([0.7977]), tensor([0...",1.4651720000000001e-156,93.767647,[],100 words required.,Write a concise summary of the following: {text},0
5,gpt-4,MapReduce,8192,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,The DEF CON 28 Do No Harm panel discussed the ...,"[{'rouge-1': {'r': 0.14173228346456693, 'p': 0...","(tensor([0.8701]), tensor([0.8204]), tensor([0...",2.587246e-79,11.267398,[],100 words required.,Write a concise summary of the following: {text},0
6,gpt-4-1106-preview,MapReduce,128000,6969,"Welcome, DEF CON 28, the Do No Harm panel. Th...",It is certainly a time of discovery- though th...,"At DEF CON 28, the ""Do No Harm"" panel brought ...","[{'rouge-1': {'r': 0.1968503937007874, 'p': 0....","(tensor([0.8469]), tensor([0.8214]), tensor([0...",5.467525e-79,16.473476,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
7,gpt-4,MapReduce,8192,6827,All right. So our next talk is called Hacking...,A firsthand look at efforts to improve diversi...,"In her talk ""Hacking Diversity"", Professor Chr...","[{'rouge-1': {'r': 0.11904761904761904, 'p': 0...","(tensor([0.8734]), tensor([0.8299]), tensor([0...",2.314046e-79,13.086216,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
8,gpt-4,MapReduce,8192,7628,Good morning. As you can see from the title o...,Roman Architecture (HSAR 252) Professor Kleine...,The lecture explores the architecture of Hercu...,"[{'rouge-1': {'r': 0.15, 'p': 0.34426229508196...","(tensor([0.8820]), tensor([0.8144]), tensor([0...",0.02455021,14.684876,"[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'me...",100 words required.,Write a concise summary of the following: {text},0
9,gpt-4,MapReduce,8192,6855,"Thank you very much, first important question...",Stochastic rewriting systems evolving over gra...,The speaker explores the application of combin...,"[{'rouge-1': {'r': 0.09444444444444444, 'p': 0...","(tensor([0.8588]), tensor([0.7954]), tensor([0...",5.637758e-80,8.984848,[],100 words required.,Write a concise summary of the following: {text},0


#### Few-Shot Prompt Engineering

In [58]:
examples = [
    {
        "question": """
Transcript:

Grad A : Ah , so comfortable . Grad F : Smooth . Grad A : Mm - hmm . Good . I know that he 's going to like , Taiwan and other places to eat . So . Grad D : On ? Am I on ? Grad A : Yep . Yep . Grad D : I think I 'm on ? Grad B : Yeah . Grad D : Good . Good . Grad A : Bye . Grad B : Actually {disfmarker} Grad F : I just had one of the most frustrating meetings of my career . Grad A : It 's definitely not the most frustrating meeting I 've ever had . Grad D : You a You 're {disfmarker} you remember you 're being recorded at this point . Grad A : Oh , yeah , so , w we didn't yet specify with whom . Professor E : Yeah . Grad F : Yeah . Professor E : Right . Grad A : But um . Professor E : Uh , right . Grad A : So that 's why Keith and I are going to be a little dazed for the first half m the meeting . Professor E : Uh . Grad F : Huh . Yeah , I 'm just gonna sit here and Professor E : Right . Yeah , I {disfmarker} I {disfmarker} I avoided that as long as I could for you guys , Grad F : growl . Professor E : but , uh {disfmarker} Grad F : Yeah . Grad A : Mm - hmm . Grad F : For which we thank you , by the way . Grad A : Are very appreciative , yeah . Professor E : Right . Grad F : I know you were {disfmarker} you were doing that , but , anyway . Grad D : Oh yeah , how di how d exactly did , uh , that paper lead to anti - lock brakes ? Grad F : Oh , I could tell you had a rough day , man ! Grad D : Nah . Grad A : What ? Grad D : I love that story . Grad F : Yeah , it 's a great story . Grad C : OK . Grad F : Oh my goodness . Grad C : Oh yeah , um , Liz suggested we could start off by uh , doing the digits all at the same time . Grad A : What ? Grad D : All at the same time . I don't know if {disfmarker} I would get distracted and confused , probably . Professor E : e Grad A : Really ? Do we have to like , synchronize ? Professor E : Well , I think you 're supposed to {disfmarker} OK . We can do this . Grad F : Are you being silly ? Grad D : Oh wait do we have t Professor E : Everybody 's got different digits , Grad C : Yep . Professor E : right ? Grad D : Yeah , do we have to time them at the same time or just overlapping {disfmarker} Grad F : Uh . Grad A : You 're kidding . Grad C : No , no , just {disfmarker} just start whenever you want . Professor E : No . Grad A : And any rate ? Professor E : e yeah , the Grad F : Alright . Professor E : Well , they {disfmarker} they have s they have the close talking microphones for each of us , Grad A : Yeah , that 's true . Professor E : so {disfmarker} Grad C : Yeah . Professor E : yeah , there 's separate channels . Grad F : Alright . Grad A : OK . Grad C : Yeah . Professor E : So when I say Grad F : Just plug one ear . Grad A : You lose . Professor E : OK . Grad F : OK , bye ! That was a great meeting ! Professor E : Right . Grad D : Alright . Grad F : So - {vocalsound} Now , uh , why ? Grad C : Just to save time . Grad F : OK . Grad C : Does matter for them . Grad A : Are we gonna start all our meetings out that way from now on ? Professor E : No . Grad A : Oh . Too bad . I kinda like it . Grad F : Well , could we ? Grad D : It 's strangely satisfying . Grad A : Yeah . It 's a ritual . Grad D : Are we to r Just to make sure I know what 's going on , we 're talking about Robert 's thesis proposal today ? Is that Grad C : We could . 

Given a transcripts between multiple speakers above, provide a concise summary within 200 words for each speaker highlighting their key learning points or contributions with the given format below:

Also add a "Overall Summary" section to summarise the entire discussion.

""",
        
        "answer":"""

Grad A: Grad A expresses comfort and satisfaction, mentioning someone's upcoming travels to Taiwan for food. They confirm Grad D is on the call and later discuss being recorded. Despite initial confusion, they thank Professor E for delaying a topic. Grad A seems interested in starting meetings with a synchronized ritual.

Grad D: Grad D questions if they are on the call and later asks about the connection between a paper and anti-lock brakes. Despite being told the story before, Grad D expresses enthusiasm. They also inquire about the meeting's agenda, focusing on Robert's thesis proposal.

Grad F: Grad F shares frustration over a recent career meeting but acknowledges it's not the worst. They commend Professor E for delaying a topic to spare them from confusion. Grad F shows interest in a story about the paper and anti-lock brakes and proposes starting meetings with a synchronized ritual.

Professor E: Professor E explains the delayed topic, addressing the confusion caused by recording. They discuss the possibility of starting meetings with a synchronized ritual and clarify the meeting agenda, focusing on Robert's thesis proposal.

Grad C: Grad C suggests starting the meeting with everyone saying their digits simultaneously. They clarify the lack of synchronization requirement and propose it to save time. Grad C also mentions the discussion of Robert's thesis proposal.

Overall Summary: The group discussed the first version of the Bayes-net used to work out a user's intentions when asking for directions from a navigation device. Three intentions were identified: Vista (to view), Enter (to visit) and Tango (to approach). The structure of the belief-net comprises, firstly, a feature layer, which includes linguistic, discourse and world knowledge information that can be gleaned from the data. It is possible for these variables to form thematic clusters( eg "entrance", "type of object", "verb"), each one with a separate middle layer.  At this stage, all the actual probabilities are ad-hoc and hand-coded. However, there has been progress in the design and organisation of experiments, that will eventually provide data more useful and appropriate for this task.

"""
    },

    {
        "question": """
Transcript:

Project Manager : {vocalsound} I just forgot their name , so uh you're i sorry , I just forgot them all . So {disfmarker} {vocalsound} I have to write it down . Marketing : {vocalsound} Okay . Project Manager : So {disfmarker} Marketing : Fine . Project Manager : Do you know them or {disfmarker} Marketing : The names ? Project Manager : Yeah . Marketing : For for for my sur um Project Manager : Yeah . Marketing : Jens . Project Manager : Yeah , no , but your b your surname . Marketing : Uh Damman . D_ A_ W_ . Project Manager : W_O_ da . Okay . Marketing : Uh uh M_ M_ . I mean M_ . Double M_ . Project Manager : Okay . And what's your name ? User Interface : Paul Wiezer . Paul Wiezer . Project Manager : W_I_E_S_ z Z_ or S_ ? User Interface : A_ E_ Z_ zee zee Project Manager : Uh uh zee {gap} . Okay . User Interface : E_ R_ . Project Manager : What's your name ? Industrial Designer : Uh Martijn . Project Manager : Yeah , but your surname . {vocalsound} Your surname . Industrial Designer : What ? Uh Abbing . A_ B_ B_ I_ N_ G_ . Project Manager : Okay , thanks . User Interface : Uh . Industrial Designer : I was a little short on time , Project Manager : Yeah , me too , so that's not {disfmarker} Industrial Designer : but {disfmarker} User Interface : Yeah , same here . Project Manager : No no no , I just fi first my {disfmarker} {vocalsound} Marketing : Oh . {vocalsound} Sorry . User Interface : Uh let's see . Which one was mine ? Project Manager : So let's have a look , we have forty minutes , so it's it's more than enough . {gap} Okay , perfect . So we have {disfmarker} Oh no , what's that ? So so we have uh forty minutes for this uh for this second meeting , and we have to make uh sure that we going t that we are sure , that we are , User Interface : Good . Project Manager : that we know what we're going to make uh th what the product is going to like {disfmarker} look like . Uh first I have the notes of the last meeting , so I showed uh show them to you . Oh , sorry about that , I just escape this one . How do I escape this ? How do I I escape this s uh presentation ? Industrial Designer : What ? User Interface : Uh left . Industrial Designer : Uh {disfmarker} Project Manager : Ah okay . User Interface : So {disfmarker} Industrial Designer : Just {disfmarker} Yeah . Project Manager : And show , sorry . Okay , so let's have a look s at this one . Okay , so the f the f the points we had last meeting was the um {disfmarker} Should be a univ uh universal remote control {disfmarker} No , that's {disfmarker} I uh s I just got a email from the from the personal coach and it should be a T_V_ remote control only . So have you changed that part ? Marketing : {vocalsound} User Interface : Okay . Marketing : Okay . {vocalsound} Project Manager : Um so {disfmarker} yeah , it still has to be uh f a r a remote control for kids and elderly . It's it's still the same . Um {disfmarker} All these points uh we have to look at . You all know them . But uh there's another point . The um uh the main uh people of interest of this company are forty plus people . So they're old and not younger people . So we have to look at that as well . 'Specially old people , maybe bi bigger buttons or something , I dunno . User Interface : Yeah , okay . Project Manager : Uh so {disfmarker} So {disfmarker} yeah , that's it , so just you can do your presentation for uh {disfmarker} User Interface : Which one first ? Marketing : Okay . Project Manager : Oh it doesn't matter , just start with the {disfmarker} User Interface : Okay . Marketing : Mm . Uh {disfmarker} User Interface : Functional requirements , yeah Marketing : Okay . Well my name is Jens Damman , but we're in a group , and I I will start it . Wait . Um I've used a marketing report on uh the site . Uh I think you've uh read it too . Uh and uh f and furthermore I uh surfed the o the other site . Project Manager : I I didn't read i read it , so it's not for me , Marketing : You didn't read it ? Project Manager : I didn't get it uh anyway . User Interface : No , I didn don't thing we got it . Project Manager : It's only for you . Marketing : Oh okay , I I was the only one who get it . Project Manager : Yeah . User Interface : Yes . Marketing : Okay it was uh uh uh um um {vocalsound} a report about uh an experiment with uh a lot of users . And uh they had a lot of findings in their report uh with statistical uh uh uh thing uh with statistical uh proof . So I um I had three pages with findings and sev a lot of uh a lot of findings . So we can use this uh to uh create our own remote control . Uh seventy five percent of the users find uh most remote controls ugly . Yeah , I think uh uh that's a lot , so we have to make a beautiful remote control . Uh eighty percent of users would spend when uh a remote control will l uh look fancy . I think this fits uh at the {vocalsound} uh what what uh Michael said about uh older people . Older people will uh spend more money uh for uh something uh uh what's good . Because younger people are more critical uh about uh uh where they spend their money money at . Uh seventy five percent uh seventy five percent of the users say they zap a lot . Well okay , that's uh normal . I think uh we we have to make uh good zap buttons . But that's one of our requirements . Project Manager : The last point is quite an interesting {disfmarker} {vocalsound} Marketing : Yes , fifty percent of users say they only use ten percent of the buttons . Project Manager : So Marketing : Um Martijn alr already said it . Project Manager : if we {disfmarker} Yeah . Marketing : And uh maybe our uh fold open system is is a good one , but {disfmarker} I don't think it's uh Project Manager : Yeah , we should have the ten percent on the on the top , Marketing : reachable . Project Manager : then you're you're {disfmarker} Marketing : Yeah , the ten percent on the top , yeah . Industrial Designer : Yeah . Marketing : That that's a good one . Um uh page two . Remote controls are often lost somewhere in the room . That's exactly what we said about um maybe a home station for uh for it uh to uh recharge the batteries or something . Uh I thought mo maybe we could make a clap system , so when you {vocalsound} clap your hands it will beep or something . Uh you must find it uh quickly . User Interface : Uh . Maybe just a button on the home station . So remote control beeps when you click that button on the home station . Marketing : Okay , yeah . Yeah , we can uh combine that . Uh it takes too much time to learn how to use a r new remote control . Uh I think we must t uh take a look at this . It's only uh th thirty four of the {vocalsound} thirty four percent . But it's uh a tough one . Because if we make a ha whole new product , our own style , we we c uh this is so difficult , uh a difficulty I think . Uh next , remote controls are bad for R_S_I_ . Yeah , but only if they zap a lot , and they watch over five hours T_V_ or something . I don't {disfmarker} We we haven't {disfmarker} Uh we mustn't look too much at uh the last point . Okay , last page .


Given a transcripts between multiple speakers above, provide a concise summary within 200 words for each speaker highlighting their key learning points or contributions with the given format below:

Also add a "Overall Summary" section to summarise the entire discussion.

""",
        
        "answer":"""

Project Manager: The Project Manager starts the meeting with some difficulty recalling names and apologizes. They quickly gather the names and surnames of team members, emphasizing the need for efficient time management. The Project Manager discusses the focus on a TV remote control for kids and elderly users, reiterating key points from the last meeting and highlighting the target audience of forty-plus individuals. The manager directs the team to present their progress.

User Interface: The User Interface team member, Paul Wiezer, provides insights into functional requirements for the remote control. The team discusses findings from a marketing report, emphasizing the importance of aesthetics and functionality. They propose a fold-open system and a home station with a clap system or a button for finding lost remote controls quickly. The team addresses the challenge of user learning curves and potential risks of repetitive strain injuries (RSI).

Marketing: Jens Damman from the Marketing team presents the marketing report findings, emphasizing user preferences for aesthetically pleasing remote controls. They propose incorporating these findings into the design, such as making buttons for the top ten percent of frequently used functions. Marketing suggests a home station with a button to locate a lost remote control and discusses challenges related to product adoption and the impact on RSI for heavy users.

Industrial Designer: Martijn Abbing, the Industrial Designer, briefly mentions time constraints but aligns with the proposed solutions, including the top-ten buttons on the remote and the home station. The team collectively discusses potential design considerations, such as larger buttons for older users.

Overall Summary: This meeting was about the functional design of the remote control. Firstly, Marketing gave a presentation on functional requirements. Group decided to focus on the fancy and fashionable look, usability, and different colors. Next, User Interface gave a presentation on the technical function design. Also, the group discussed this topic, and they decided to design the menu buttons of the remote similar to the mobile phone. Then, Industrial Designer gave a presentation on the working design. Group mates discussed deciding on the use of LED light on the buttons to indicate the transmitting of the Morse code when pressing the button. They also decided to use a more intelligent chip than the standard one when the circuit was closed, it would produce the pattern. For the age group, they would target the age group below forty since it was a young market.

"""
    },

    {
        "question": """
Transcript: 

The Chair (Hon. Anthony Rota (NipissingTimiskaming, Lib.)) : I call this meeting to order. Welcome to the sixth meeting of the House of Commons Special Committee on the COVID-19 pandemic. Today's meeting is taking place by videoconference. Before speaking, please wait until I recognize you by name. When you are ready to speak, please activate your mic. When you are not speaking, leave your mic on mute. Of course, change the language when you change the language on the screen.  I would remind hon. members that if you want to speak English, you should be on the English channel; if you want to speak French, you should be on the French channel; and should you wish to alternate between the two languages, as I just did, you should change the channel to the language that you are speaking, each time you switch languages. In addition, please direct your remarks through the chair and speak slowly and clearly at all times to help our interpreters. Finally, for members who will be speaking, we strongly recommend that you use a headset. I recommend the headset for your fellow members, but also for the interpreters as it gets loud, up and down, and it squeaks. It really does make it difficult for them if you do not have the prescribed headsets. We'll go on to ministerial announcements. I understand that there are no ministerial announcements today, so we will proceed to presenting petitions, for a period not exceeding 15 minutes. I would like to remind members that any petition presented during a meeting of the special committee must have already been certified by the clerk of petitions. We will now proceed to presenting petitions. Ms. Heather McPherson (Edmonton Strathcona, NDP) : Thank you, Mr. Chair. World Maternal Mental Health Day took place last week, and today I'd like to take a moment to present a very important petition on behalf of the Canadian Perinatal Mental Health Collaborative. Whereas perinatal mood and anxiety disorders are the most common obstetrical complication, whereas in Canada and worldwide 20% of women and 10% of men suffer from a perinatal mental illness, resulting in an annual economic cost to Canada of approximately $11 billion, and whereas the U.K., Australia and parts of the U.S. have perinatal mental health strategies and screening guidelines in place and Canada does not, the Canadian Perinatal Mental Health Collaborative is calling upon the House of Commons in Parliament assembled to create a national perinatal mental health strategy that will provide direction, policy and funding to develop specialized, comprehensive perinatal mental health care services, which include universal screening and timely access to treatment for all women and men during pregnancy and the postpartum period. Mr. Scott Reid (LanarkFrontenacKingston, CPC) : Thank you, Mr. Chair. My petition relates to cystic fibrosis. If we were in the House now, as May is Cystic Fibrosis Awareness Month, one of the days this month we would all be wearing yellow roses in sympathy and solidarity with those who suffer from what is the number one disease killer in Canada of young people. The petitioners have asked us to look at the situation with the Patented Medicine Prices Review Board, which is scheduled to go through some important and potentially detrimental regulatory changes very soon. They ask that the amendments to the Patented Medicine Prices Review Board be rescinded, as these will restrict Canadians from receiving life-saving medications for cystic fibrosis and other illnesses, but in particular, a medicine called Trikafta, which can have the effect of treating cystic fibrosis in the case of 90% of cystic fibrosis sufferers. They ask the government to work with the provinces to find a strategy to jointly allow for the delivery of this life-saving medicine to Canadians across the country and to take a leadership role in negotiating a price for gene modulators throughout all the provinces of Canada. Ms. Elizabeth May (SaanichGulf Islands, GP) : Thank you, Mr. Chair. It's an honour to take the mike today, with all colleagues here. It's good to see you all virtually and safe. Petitioners in my community point out in this petition, which, of course, predates the pandemic, that the family doctor shortage is severe in this country. Nearly five million Canadians lack a regular family doctor. This problem is particularly profound in more rural areas, including, as the petitioners reference, the community in which I live, Sydney, British Columbia. We have a very significant crisis and a lack of family doctors. The petitioners call on the government to work with provinces and territories to find a collaborative, holistic solution so that every Canadian has a family doctor and we address the family doctor shortage. Mr. Brad Vis (MissionMatsquiFraser Canyon, CPC) : Good morning, Mr. Chair. I'm presenting a timely petition today that emphasizes the concerns constituents in my riding of MissionMatsquiFraser Canyon have with the Liberal government's inherently flawed and undemocratic approach to firearms legislation and regulation. The petitioners call upon the Government of Canada to stop targeting law-abiding firearms owners; to cancel all plans to confiscate firearms legally owned by federally licensed RCMP-vetted Canadians; to focus taxpayer dollars where they will actually increase public safety, which is on keeping at-risk youth from being involved in gangs and on anti-gang enforcement; and to provide our men and women in uniform at the Canada Border Services Agency with the resources they need to stop the flow of illegal guns into this country. Through this petition, my constituents take issue with how the Liberal government continues to target law-abiding firearms owners instead of the gangs, drug traffickers and illegal weapons smugglers responsible for the violence in our communities. They note that the use of the phrase military-style assault rifle is purely political posturing, as the term is undefined in Canadian law. They also draw attention to the numerous inaccuracies about current firearms legislation and regulation The Chair : I'd like to remind the honourable members that this is a concise prcis of what a petition says, not a speech. I'll let Mr. Vis continue. I'm sure he'll be very brief in wrapping up. Mr. Brad Vis : Yes. Thank you, Mr. Chair. That's sufficient. The Chair : Okay. Now we'll go to Mr. Johns. Mr. Gord Johns (CourtenayAlberni, NDP) : Thank you, Mr. Chair. It's a huge honour to table e-petition 2512, which was signed by 1,198 petitioners, primarily from the province of Nova Scotia. The Province of Nova Scotia invited multinational companies to scope out and develop expansive open-net salmon farming operations. The petitioners cite that the expansion would increase environmental degradation, as seen in similar aquaculture operations in British Columbia, Newfoundland, Norway, Vietnam and elsewhere in the world. It also, they cite, would pose risks to native fish stocks, pollute coastal ecosystems, impair at-risk wild Atlantic salmon, and threaten established fisheries and tourism operations. They also raise concerns that open-net fish farming would not create significant employment and would undermine existing lobster and other fisheries. They are calling on the government to uphold Bill C-68 and species-at-risk legislation, protect our oceans, ban expansion of open-net finfish aquaculture in our oceans, work to phase out any existing open-net fish farming operations currently in place and, lastly, invest in land-based, closed-containment finfish aquaculture. I want to thank these petitioners for fighting for clean oceans, for their local economy and for the well-being of Nova Scotia. Mr. Paul Manly (NanaimoLadysmith, GP) : Thank you, Mr. Chair. This petition was signed and sent in by constituents of my riding of NanaimoLadysmith. It calls upon the House of Commons to commit to upholding the UN Declaration on the Rights of Indigenous Peoples and the calls to action from the Truth and Reconciliation Commission of Canada by immediately halting all existing and planned construction of the Coastal GasLink project on Wet'suwet'en territory, ordering the RCMP to dismantle its exclusion zone and stand down, scheduling nation-to-nation talks between the Wet'suwet'en nation and the federal and provincial governmentssomething that has already happened, thankfullyand prioritizing the real implementation of the UN Declaration on the Rights of Indigenous Peoples. Ms. Yasmin Ratansi (Don Valley East, Lib.) : Thank you, Mr. Chair. I have the pleasure of presenting a petition on behalf of my constituents of Don Valley East. The petitioners are asking that the Government of Canada not provide any financial assistance to Canadian airlines until they promptly provide full refunds for flights that were cancelled due to COVID-19. They are asking the same for any foreign airlines that fly to, within or from Canada. The petitioners feel that these Canadians are facing economic hardship and need a refund. The Chair : We'll now proceed to questioning ministers. The first question will go to Mr. Albas. Mr. Dan Albas (Central OkanaganSimilkameenNicola, CPC) : Thank you, Mr. Chair. Today we've learned that federal workers have been told to ignore obvious signs of fraud when it comes to applying for government benefits. Can the Prime Minister confirm that 200,000 applications have been flagged as potentially fraudulent? Right Hon. Justin Trudeau (Prime Minister) : Thank you, Mr. Chair. Our priority from the beginning has been to make sure that Canadians get the support they need. We moved very quickly to get the Canada emergency response benefit out, to get the wage subsidy out and to help Canadians in this unprecedented situation. We recognize there will be challenges, and we are going to work through those challenges. Our priority every step of the way was to make sure we helped as many Canadians as possible. Mr. Dan Albas : Mr. Chair, can the Prime Minister confirm that the instruction has been given to federal employees to ignore these 200,000 applications being flagged as potentially fraudulent? This is important. Right Hon. Justin Trudeau : Our focus has been on helping as many people as we possible can. Our decision from the very beginning was to get the help out to people and figure out, with retroactive action if necessary, where and when there may have been fraudulent use. Our priority was getting that help out. Mr. Dan Albas : Mr. Chair, this came from a memo issued by a deputy minister. Did the minister's office or the Prime Minister sign off on this memo? Right Hon. Justin Trudeau : Again, in this unprecedented situation, our focus has been on helping as many people as possible, as quickly as possible. Other parties might have made a different choice had they been in government, but our focus was getting help to people when they needed it as quickly as possible and cleaning it up afterwards. Mr. Dan Albas : Mr. Chair, I asked a very simple question. Did the Prime Minister or his minister sign off on this memo that was issued by the deputy minister, yes or no? Right Hon. Justin Trudeau : Mr. Chair, we have been focused entirely on getting help to Canadians when they need it, and that has meant that yes, there will be things we will need to clean up after the fact and work to fix, but getting that help into Canadians' pockets during this pandemic was our priority. Mr. Dan Albas : I'm asking the Prime Minister to show some accountability. Did he or his office sign off on this memo? Right Hon. Justin Trudeau : Mr. Chair, my office and I have been absolutely focused on getting the necessary help to Canadians. Perhaps, as Mr. Albas points out, other parties would have been slower to get the money out. We were flowing money to people who needed it.

Given a transcripts between multiple speakers above, provide a concise summary within 200 words for each speaker highlighting their key learning points or contributions with the given format below:

Also add a "Overall Summary" section to summarise the entire discussion.

""",
        
        "answer":"""

The Chair (Hon. Anthony Rota): The Chair establishes meeting rules and emphasizes language channels, mic usage, and headset recommendations. They move on to petitions, covering topics like perinatal mental health, changes to the Patented Medicine Prices Review Board, the shortage of family doctors, firearms legislation concerns, open-net salmon farming, the Coastal GasLink project, and airline refunds.

Ms. Heather McPherson (Edmonton Strathcona, NDP): Presents a petition on World Maternal Mental Health Day, urging the creation of a national perinatal mental health strategy.

Mr. Scott Reid (LanarkFrontenacKingston, CPC): Discusses a petition related to cystic fibrosis, calling for rescinding amendments to the Patented Medicine Prices Review Board.

Ms. Elizabeth May (SaanichGulf Islands, GP): Highlights a petition addressing the severe family doctor shortage in Canada and urges collaboration for a solution.

Mr. Brad Vis (MissionMatsquiFraser Canyon, CPC): Presents a petition on firearms legislation, emphasizing the flawed approach and its impact on law-abiding owners.

Mr. Gord Johns (CourtenayAlberni, NDP): Presents a petition against expanding open-net salmon farming, citing environmental and economic concerns.

Mr. Paul Manly (NanaimoLadysmith, GP): Presents a petition calling for upholding the UN Declaration on the Rights of Indigenous Peoples regarding the Coastal GasLink project.

Ms. Yasmin Ratansi (Don Valley East, Lib.): Presents a petition urging the government not to provide financial assistance to airlines without refunding canceled flights.

Mr. Dan Albas (Central OkanaganSimilkameenNicola, CPC): Questions the Prime Minister about a memo instructing federal workers to ignore potential fraud in benefit applications.

Right Hon. Justin Trudeau (Prime Minister): Defends the government's focus on quick assistance during the pandemic, acknowledging challenges and the need for retrospective action. Deflects accountability on signing off the memo, emphasizing the priority of rapid aid over bureaucratic details.

Overall Summary: The whole meeting was a special Committee on the COVID-19 Pandemic. After some regulations proposed by The Chair, the members presented many petitions on behalf of different areas. Then the meeting proceeded to questioning ministers, the attendees asked for the reasons that the government put easy policy for fraudulence and tax evasion of businessmen. Moreover, the participants required government support under the COVID-19 Pandemic, not only for the elderly and vulnerable people, but also for energy resources and tourism sectors. At the same time, the exact funding from the government should be given to green economies including agriculture and forestry. In addition, the meeting also discussed the current situations of different sectors such as employment, fishing and tourism, oil and gas and business affected by the Covid-19 and called for government support for these sectors. Last but not least, the attendees required strict implementations of the laws and appealed for process following. They wanted a transparency and open environment for voting and debating under the precondition of community safety. Also, they wanted affordable medication including vaccines as a part of a sound health care system for their people.

"""
    },
]

In [59]:
for i in range(len(examples)):
    cleaned_text = re.sub(r'\s*{\s*([^}]+)\s*}\s*', ' ', examples[i]['question'])
    examples[i]['question'] = cleaned_text


In [60]:
examples

[{'question': '\nTranscript:\n\nGrad A : Ah , so comfortable . Grad F : Smooth . Grad A : Mm - hmm . Good . I know that he \'s going to like , Taiwan and other places to eat . So . Grad D : On ? Am I on ? Grad A : Yep . Yep . Grad D : I think I \'m on ? Grad B : Yeah . Grad D : Good . Good . Grad A : Bye . Grad B : Actually Grad F : I just had one of the most frustrating meetings of my career . Grad A : It \'s definitely not the most frustrating meeting I \'ve ever had . Grad D : You a You \'re you remember you \'re being recorded at this point . Grad A : Oh , yeah , so , w we didn\'t yet specify with whom . Professor E : Yeah . Grad F : Yeah . Professor E : Right . Grad A : But um . Professor E : Uh , right . Grad A : So that \'s why Keith and I are going to be a little dazed for the first half m the meeting . Professor E : Uh . Grad F : Huh . Yeah , I \'m just gonna sit here and Professor E : Right . Yeah , I I I avoided that as long as I could for you guys , Grad F : growl . Profess

In [124]:
print(df_scores["summary"].iloc[-1])

Overall Summary:
The discussion involved multiple speakers, including Professor E, Grad C, Postdoc F, PhD A, PhD B, and Grad D. The conversation covered various topics related to the recording and transcription of meetings, including the use of microphones, audio monitoring, transcription of speech, and the handling of technical jargon and acronyms. The speakers also discussed the use of different features for speech recognition and the process of bleep editing to exclude certain sections of the meeting. There was also a focus on the privacy issue and the use of passwords to restrict access to meeting transcripts. The conversation was detailed and technical, with a focus on improving the transcription and recording process for future meetings.

Each Speaker's Contribution:
- Professor E: Led the discussion and raised questions about the use of different features for speech recognition and the process of bleep editing.
- Grad C: Discussed the use of microphones, audio monitoring, and th

#### Map & Combine load_summarize_chain()

Best methods found in formatting summary in ideal format with prompt engineering.

In [75]:
from langchain import FewShotPromptTemplate
from langchain.chains import (
    LLMChain,
    StuffDocumentsChain,
    ReduceDocumentsChain,
    MapReduceDocumentsChain,
)

for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0
    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    map_template = """
The following is a set of documents
{text}
Based on this list of docs, generate summary for each speaker.
Helpful Answer:
    """

    reduce_template = """
Transcript:

{text}

Given a transcripts between multiple speakers above, provide a concise summary within 200 words for each speaker highlighting their key learning points or contributions with the given format below:

Also add a "Overall Summary" section to summarise the entire discussion.
    """



    # document_prompt = PromptTemplate(template="{content}", input_variables=["content"])
    # document_variable_name = "context"

    # prompt = PromptTemplate.from_template(
    #     "Summarize this content: {context}"
    # )
    
    map_prompt = PromptTemplate(template=map_template, input_variables=["text"])
    
    reduce_prompt = PromptTemplate(template=reduce_template, input_variables=["text"])


    # few_shot_prompt = FewShotPromptTemplate(
    #     examples=examples,
    #     example_prompt=reduce_prompt,
    #     suffix="{transcript}",
    #     input_variables = ['transcript'])

    for index, row in df_qmsum_test.iterrows():
        method = "MapReduce"

        start_time = time.time()

        # return token of full text
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)


        max_tokens = model_row["max_tokens"]

        # CHUNKING "Split documents to shorter length."
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))


        # # Take a single piece of text, feed it into the template and Summarize it.
        # map_chain = LLMChain(llm=llm, prompt=map_prompt)

        # # Have a whole bunch of summaries each of a part of the text, 
        # # Have to combine all these summaries, reducing them to a single summary.        
        # reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

        # # It will take smaller documents and combine them back into one bigger document
        # combine_documents_chain = StuffDocumentsChain(
        #                             llm_chain=reduce_chain, 
        #                             document_variable_name="transcript"
        #                             )
        # # If all summaries add up and become too large, and exceed the model limit.
        # # Recursively calling  provided chain until all the summaries and summaries of summaries are combined into a single summary.
        # reduce_documents_chain = ReduceDocumentsChain(
        #                             combine_documents_chain=combine_documents_chain, 
        #                             collapse_documents_chain=combine_documents_chain, 
        #                             token_max=max_tokens
        #                             )

        # # Ask for a summary of a single chunk of the whole text, 
        # # and a chain that will feed groups of summaries and reduce them until there is only a single summary left.
        # map_reduce_chain = MapReduceDocumentsChain(
        #                             llm_chain=map_chain, 
        #                             reduce_documents_chain=reduce_documents_chain, 
        #                             document_variable_name="docs", 
        #                             return_intermediate_steps=False
        #                             )

        # summary_chain = map_reduce_chain


        summary_chain = load_summarize_chain (
            llm=llm,
            chain_type='map_reduce',
            map_prompt = map_prompt,
            combine_prompt=reduce_prompt,
            verbose=False
        )


                
        summary = summary_chain.run(docs)
        print(summary)
        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': reduce_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
        time.sleep(2)


gpt-3.5-turbo-1106
Number of tokens: 8437
Number of chunks: 1
PhD F: Provided detailed information about the use of wireless headsets, the process of running jobs using P-make and Customs, and the benefits of using "run command" for job execution. Also discussed the attributes associated with machines and the need to limit the number of jobs running simultaneously.

PhD A: Shared experiences with speech enhancement algorithms, including the use of LDA filters and on-line normalization. Also discussed the addition of endpoint information to the baseline and the impact on different test sets.

PhD D: Discussed the use of spectral subtraction from France Telecom and the need to retune time constants for on-line normalization. Also mentioned the impact of adding endpoint information on different test sets and the potential changes in qualification criteria.

Overall Summary: The discussion covered various aspects of recording and computing tasks, including the use of lapel microphones, wir

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:01<00:00,  1.24s/it]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 142.75it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 1.26 seconds, 0.79 sentences/sec


In [63]:
print(df_scores['transcript'].iloc[-1])

print(df_scores['summary'].iloc[-1])
# df_scores

Professor C : Uh , is it the twenty - fourth ? PhD F : now we 're on . Professor C : Yeah . PhD A : Uh Chuck , is the mike type wireless {disfmarker} PhD F : Yes . PhD A : wireless headset ? OK . PhD F : Yes . Professor C : Yeah . PhD F : For you it is . Professor C : Yeah . We uh {disfmarker} we abandoned the lapel because they sort of were not too {disfmarker} not too hot , not too cold , they were {disfmarker} you know , they were {vocalsound} uh , far enough away that you got more background noise , uh , and uh {disfmarker} and so forth PhD A : Uh - huh . Professor C : but they weren't so close that they got quite the {disfmarker} you know , the really good {disfmarker} No , th PhD A : OK . Professor C : they {disfmarker} I mean they didn't {disfmarker} Wait a minute . I 'm saying that wrong . They were not so far away that they were really good representative distant mikes , PhD A : Uh - huh . Professor C : but on the other hand they were not so close that they got rid of all the 

In [67]:
from langchain import FewShotPromptTemplate
from langchain.chains import (
    LLMChain,
    StuffDocumentsChain,
    ReduceDocumentsChain,
    MapReduceDocumentsChain,
)

for model_index, model_row in df_openai_models.iterrows():
    model_name = model_row["models"]
    print(model_name)

    temperature = 0
    llm = ChatOpenAI(temperature=temperature, model_name=model_name, openai_api_key=openai_api_key)
    
    map_template = """
Summarize the following text in a clear and concise way:
TEXT:`{text}`
Brief Summary:
"""

    combine_template = """
Generate a summary of the following text that includes the following elements:

* A title that accurately reflects the content of the text.
* An introduction paragraph that provides an overview of the topic.
* Bullet points that list the key points of the text.
* A conclusion paragraph that summarizes the main points of the text.

Text:`{text}`
    """
    
    map_prompt = PromptTemplate(template=map_template, input_variables=["text"])
    
    combine_prompt = PromptTemplate(template=combine_template, input_variables=["text"])


    # few_shot_prompt = FewShotPromptTemplate(
    #     examples=examples,
    #     example_prompt=reduce_prompt,
    #     suffix="{transcript}",
    #     input_variables = ['transcript'])

    for index, row in df_qmsum_test.iterrows():
        method = "MapReduce"

        start_time = time.time()

        # return token of full text
        num_tokens = llm.get_num_tokens(row['transcript'])
        print("Number of tokens:", num_tokens)


        max_tokens = model_row["max_tokens"]

        # CHUNKING "Split documents to shorter length."
        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=max_tokens-100, chunk_overlap=100)
        docs = text_splitter.create_documents([row['transcript']])
        print("Number of chunks:", len(docs))


        summary_chain = load_summarize_chain (
            llm=llm,
            chain_type='map_reduce',
            map_prompt = map_prompt,
            combine_prompt=combine_prompt,
            verbose=False
        )

                
        summary = summary_chain.run(docs)
        print(summary)

        end_time = time.time()
        elapsed_time = end_time - start_time

        metrics = SummarizationMetrics(row['summary'], summary)

        new_result = {
            'model': model_name,
            'method': method,
            'max_tokens': max_tokens,
            'transcript': row['transcript'],
            'original summary': row['summary'],
            'summary': summary,
            'rouge': metrics.rouge_scores(),
            'bert_score': metrics.bert_score(),
            'bleu': metrics.bleu_score(),
            'time_taken': elapsed_time,
            'grammar': metrics.grammar_check(),
            'readability': metrics.readability_index(),
            'num_tokens': num_tokens,
            'prompt': combine_template,
            'temperature': temperature
        }


        new_row = pd.DataFrame([new_result])

        df_scores = pd.concat([df_scores, new_row], ignore_index=True)
        time.sleep(2)


gpt-3.5-turbo-1106
Number of tokens: 8437
Number of chunks: 1
Title: Discussion on Speech Enhancement Algorithms

Introduction:
In this conversation, PhD students and a professor discuss various speech enhancement algorithms and their results. They explore the use of spectral subtraction and Wiener filtering, as well as the addition of endpoint information to improve the baseline system. The impact of different test conditions on the results is also considered.

Key Points:
- Discussion of spectral subtraction and Wiener filtering
- Addition of endpoint information to improve baseline system
- Impact of different test conditions on results
- Need for further analysis and comparison of different approaches

Conclusion:
The conversation between the PhD students and the professor sheds light on the various speech enhancement algorithms and their results. It emphasizes the importance of further analysis and comparison of the different approaches to improve speech enhancement technology.


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


calculating scores...
computing bert embedding.


100%|██████████| 1/1 [00:00<00:00,  1.21it/s]


computing greedy matching.


100%|██████████| 1/1 [00:00<00:00, 333.57it/s]
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


done in 0.84 seconds, 1.19 sentences/sec


In [76]:
# df_scores.to_excel("./result/closed_source_model_openai_api.xlsx", index=False)