In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv('data/processed/all_reviews_2017.csv')

In [3]:
df[['text', 'gold']].head()

Unnamed: 0,text,gold
0,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...
1,Results on the VQA task are good for this simp...,The program committee appreciates the authors'...
2,This work proposes to approximate the bilinear...,The program committee appreciates the authors'...
3,Summary:--------This paper proposes to use sur...,"Based on the feedback, I'm going to be rejecti..."
4,This paper proposes to use previous error sign...,"Based on the feedback, I'm going to be rejecti..."


The dataset for each year consists of the columns `['id','text','gold']`:
- Text: the source text, i.e. a single review of the paper.
- Gold: the area chair's motivation for their decision, which we assume provides a reasonable reference summary for comparison.

*Note*: Three reviews are extracted for each paper, so the `gold` value is the same for all reviews of a given paper.
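
The note above can be checked with a `groupby`. This is a minimal sketch on a toy frame: the rows are invented for illustration, and only the column names `id`, `text` and `gold` come from the dataset.

```python
import pandas as pd

# Toy stand-in for the reviews file; the rows are invented for illustration.
toy = pd.DataFrame({
    "id": ["paper_a", "paper_a", "paper_a", "paper_b", "paper_b", "paper_b"],
    "text": ["review 1", "review 2", "review 3", "review 4", "review 5", "review 6"],
    "gold": ["accept note"] * 3 + ["reject note"] * 3,
})

# Number of reviews per paper and number of distinct gold texts per paper.
reviews_per_paper = toy.groupby("id")["text"].count()
gold_per_paper = toy.groupby("id")["gold"].nunique()

# Every paper should carry exactly one distinct gold text.
gold_is_constant = bool((gold_per_paper == 1).all())
print(reviews_per_paper.to_dict(), gold_is_constant)
```

Running the same two `groupby` calls on the real `df` would show whether every paper really has three reviews and a single gold text.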

______

We go through the directories in order and run some of their functions.

We start with the directory `glimpse/baselines`, which produces the baseline results used for comparison.

1. `generate_llm_summaries.py`

In [4]:
import pandas as pd
from pathlib import Path

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import re
import argparse
from tqdm import tqdm

from glimpse.baselines import generate_llm_summaries



In [5]:
# Replaces 'togethercomputer/Llama-2-7B-32K-Instruct'
import os

model_name = "meta-llama/Llama-3.2-1B-Instruct"
# Read the Hugging Face access token from the environment instead of hard-coding it
hf_token = os.environ.get("HF_TOKEN")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16, token=hf_token)

In [6]:
df = generate_llm_summaries.prepare_dataset('reviews_2017', dataset_path='data/processed/')

In [7]:
df = generate_llm_summaries.group_text_by_id(df)

# The reviews are now grouped by paper id and their text is concatenated
df.head(3)

# After grouping by id, `text` is the concatenation of all reviews and `gold` is unchanged.

id,text,gold
https://openreview.net/forum?id=B1-Hhnslg,The paper is an extension of the matching netw...,The program committee appreciates the authors'...
https://openreview.net/forum?id=B1-q5Pqxl,The paper looks at the problem of locating the...,This paper provides two approaches to question...
https://openreview.net/forum?id=B16Jem9xe,I just noticed I submitted my review as a pre-...,"Hello Authors, Congratulations on the accepta..."


In [8]:
# Keep the first 10 samples for testing (copy so that new columns can be added safely)
df = df.head(10).copy()
len(df)

10

In [9]:
# Add pad token
tokenizer.pad_token = tokenizer.eos_token

df = generate_llm_summaries.generate_summaries(model, tokenizer, df, batch_size=2, device='cuda')

  0%|          | 0/10 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


['[INST]\nThe paper is an extension of the matching networks by Vinyals et al. in NIPS2016. Instead of using all the examples in the support set during test, the method represents each class by the mean of its learned embeddings. The training procedure and experimental setting are very similar to the original matching networks. I am not completely sure about its advantages over the original matching networks. It seems to me when dealing with 1-shot case, these two methods are identical since there is only one example seen in this class, so the mean of the embedding is the embedding itself. When dealing with 5-shot case, original matching networks compute the weighted average of all examples, but it is at most 5x cost. The experimental results reported for prototypical nets are only slightly better than matching networks. I  think it is a simple, straightforward,  novel extension, but I am not fully convinced its advantages.  This paper proposes an improved version of matching networks,

  0%|          | 0/10 [00:01<?, ?it/s]


KeyboardInterrupt: 

In [10]:
import textwrap
textwrap.wrap(df['summary'].iloc[0], width=100)

KeyError: 'summary'

In [11]:
df.columns

Index(['text', 'gold', 'instruction'], dtype='object')

Conclusion:
- Each paper (document) has 3 reviews, which are concatenated into a single text.
- The concatenated reviews are fed to a model, which produces one summary for them.
- This is done by the `generate_summaries` function, which first adds an `instruction` column to `df` that wraps `text` in the instruction prompt.
- The model then generates a summary for each instruction and the updated `df` is returned.
- In the run above, generation was interrupted, so `df` has the `instruction` column but no `summary` column yet.
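
The grouping and prompt-building steps above can be sketched on toy data. This is an illustration under assumptions, not the code of `generate_llm_summaries`: the helper names `group_reviews` and `add_instruction` are hypothetical, and only the `[INST]\n` prefix is taken from the output shown earlier; the rest of the prompt wording is invented.

```python
import pandas as pd

# Toy reviews; the rows are invented for illustration.
toy = pd.DataFrame({
    "id": ["p1", "p1", "p2"],
    "text": ["Good idea.", "Weak experiments.", "Clear writing."],
    "gold": ["Reject.", "Reject.", "Accept."],
})

def group_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Concatenate all reviews of a paper and keep the shared gold text."""
    return df.groupby("id").agg({"text": " ".join, "gold": "first"})

def add_instruction(df: pd.DataFrame) -> pd.DataFrame:
    """Wrap each concatenated text in a hypothetical instruction prompt."""
    out = df.copy()
    # Only the "[INST]\n" prefix is observed in the notebook; the suffix is made up.
    out["instruction"] = "[INST]\n" + out["text"] + "\nSummarize the reviews above.[/INST]"
    return out

grouped = add_instruction(group_reviews(toy))
print(grouped.loc["p1", "text"])
```

The resulting frame has the same three columns as the `df.columns` output above: `text`, `gold` and `instruction`.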

2. `sumy_baselines.py`

In [12]:
from glimpse.baselines import sumy_baselines

In [16]:
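dataset_name = "reviews_2017"  # assumption: the dataset name used in In [6]
method = "LSA"

for N in [1]:
    # Summarize each concatenated review text with N sentences
    summaries = []
    for text in df.text:
        summary = sumy_baselines.summarize(method, "english", N, "text", text)
        summaries.append(summary)

    df["summary"] = summaries
    df["metadata/method"] = method
    df["metadata/sentence_count"] = N

    # Output file name for this method and sentence count
    name = f"{dataset_name}-_-{method}-_-sumy_{N}.csv"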

In [17]:
df

id,text,gold,instruction,summary,metadata/method,metadata/sentence_count
https://openreview.net/forum?id=B1-Hhnslg,The paper is an extension of the matching netw...,The program committee appreciates the authors'...,[INST]\nThe paper is an extension of the match...,"On Cub 200, I thought that the state-of-the-ar...",LSA,1
https://openreview.net/forum?id=B1-q5Pqxl,The paper looks at the problem of locating the...,This paper provides two approaches to question...,[INST]\nThe paper looks at the problem of loca...,The authors might want to consider pointing to...,LSA,1
https://openreview.net/forum?id=B16Jem9xe,I just noticed I submitted my review as a pre-...,"Hello Authors, Congratulations on the accepta...",[INST]\nI just noticed I submitted my review a...,"It is well written and a good read, and one I ...",LSA,1
https://openreview.net/forum?id=B16dGcqlx,This paper proposed a novel adversarial framew...,pros: - new problem - huge number of experim...,[INST]\nThis paper proposed a novel adversaria...,I will list these concerns in the following (i...,LSA,1
https://openreview.net/forum?id=B184E5qee,The authors present a simple method to affix a...,Reviewers agree that this paper is based on a ...,[INST]\nThe authors present a simple method to...,They demonstrate good improvements on language...,LSA,1
https://openreview.net/forum?id=B186cP9gx,This paper investigates the hessian of small d...,This is quite an important topic to understand...,[INST]\nThis paper investigates the hessian of...,"Overall, the results feel preliminary but like...",LSA,1
https://openreview.net/forum?id=B1E7Pwqgl,The authors proposes an interesting idea of co...,While the paper may have an interesting theore...,[INST]\nThe authors proposes an interesting id...,"On the third in-painting tasks, baselines are ...",LSA,1
https://openreview.net/forum?id=B1ElR4cgg,This is a parallel work with BiGAN. The idea ...,The reviewers were positive about this paper a...,[INST]\nThis is a parallel work with BiGAN. T...,ALI's objective is to match the joint distribu...,LSA,1
https://openreview.net/forum?id=B1G9tvcgx,This paper proposes a multimodal neural machin...,The area chair agrees with the reviewers that ...,[INST]\nThis paper proposes a multimodal neura...,This paper proposes a multimodal neural machin...,LSA,1
https://openreview.net/forum?id=B1GOWV5eg,This paper shows that extending deep RL algori...,The basic idea of this paper is simple: run RL...,[INST]\nThis paper shows that extending deep R...,This has become quite standard and would make ...,LSA,1


With this script, we can produce summaries with any of the following methods: 'LSA', 'Text Rank', 'LexRank', 'Edmundson', 'Luhn', 'KL-Sum', 'Random'.
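
To give an intuition for what such extractive baselines do, here is a simplified frequency-based sentence selector. It is only in the spirit of methods like Luhn and is not sumy's implementation; the function name `frequency_summary` and the regex-based sentence splitting are assumptions made for this sketch.

```python
import re
from collections import Counter

def frequency_summary(text: str, sentence_count: int = 1) -> str:
    """Pick the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top sentences, restored to their original order.
    chosen = sorted(ranked[:sentence_count])
    return " ".join(sentences[i] for i in chosen)

review = "The model is simple. The model beats the baseline model. Plots are missing."
print(frequency_summary(review, sentence_count=1))
```

Like the LSA output above, the summary is a verbatim sentence from the input; the methods differ only in how sentences are scored.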

_____

We now move to the second directory, `glimpse/data_loading`, which contains 3 scripts (one of which can be skipped).

1. `generate_abstractive_candidates.py`

In [None]:
GENERATION_CONFIGS = {
    "top_p_sampling": {
        "max_new_tokens": 200,
        "do_sample": True,
        "top_p": 0.95,
        "temperature": 1.0,
        "num_return_sequences": 8,
        "num_beams" : 1,

        #"num_beam_groups" : 4,
    },

    **{
        f"sampling_topp_{str(topp).replace('.', '')}": {
            "max_new_tokens": 200,
            "do_sample": True,
            "num_return_sequences": 8,
            # NOTE: constant for every key; `topp` is only used in the config name
            "top_p": 0.95,
        }
        for topp in [0.5, 0.8, 0.95, 0.99]
    },
}

for key, value in GENERATION_CONFIGS.items():
    GENERATION_CONFIGS[key] = {
        # "max_length": 2048,
        "min_length": 0,
        "early_stopping": True,
        **value,
    }

In [23]:
GENERATION_CONFIGS['sampling_topp_05']

{'min_length': 0,
 'early_stopping': True,
 'max_new_tokens': 200,
 'do_sample': True,
 'num_return_sequences': 8,
 'top_p': 0.95}
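
In the output above, `top_p` is 0.95 even for the `sampling_topp_05` entry, because the comprehension uses `topp` only to build the key. If the intent is one configuration per nucleus threshold (an assumption; the notebook does not say), a sketch of the comprehension with the loop variable wired in could look like this. The names `DEFAULTS` and `configs` are introduced here for illustration.

```python
# Defaults shared by every configuration (same keys as in the cell above).
DEFAULTS = {"min_length": 0, "early_stopping": True}

# Assumption: each configuration should sample with its own top_p value.
configs = {
    f"sampling_topp_{str(topp).replace('.', '')}": {
        **DEFAULTS,
        "max_new_tokens": 200,
        "do_sample": True,
        "num_return_sequences": 8,
        "top_p": topp,  # use the loop variable instead of a constant
    }
    for topp in [0.5, 0.8, 0.95, 0.99]
}

print({name: cfg["top_p"] for name, cfg in configs.items()})
```

Placing `**DEFAULTS` first keeps the same precedence as the notebook's merge loop: any key set in a specific configuration overrides the shared default.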

In [24]:
from glimpse.data_loading import generate_abstractive_candidates

In [33]:
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.unk_token_id
dataset = generate_abstractive_candidates.prepare_dataset('data/processed/all_reviews_2017.csv')

In [35]:
dataset = dataset.select(range(10))

In [39]:
tokenizer.pad_token = tokenizer.eos_token
dataset = generate_abstractive_candidates.evaluate_summarizer(
    model,
    tokenizer,
    dataset,
    GENERATION_CONFIGS['sampling_topp_05'],
    2,
    'cuda',
    True,
)


Generating summaries...


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 20%|██        | 1/5 [02:27<09:51, 147.86s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 40%|████      | 2/5 [04:54<07:21, 147.07s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 60%|██████    | 3/5 [07:21<04:54, 147.27s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 80%|████████  | 4/5 [18:13<05:46, 346.56s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
100%|██████████| 5/5 [23:49<00:00, 285.93s/it]
Map: 100%|██████████| 10/10 [00:00<00:00, 37.49 examples/s]


In [40]:
dataset

Dataset({
    features: ['id', 'text', 'gold', 'summary'],
    num_rows: 10
})

In [43]:
dataset[0]

{'id': 'https://openreview.net/forum?id=r1rhWnZkg',
 'text': 'Summary: The paper presents low-rank bilinear pooling that uses Hadamard product (commonly known as element-wise multiplication). The paper implements low-rank bilinear pooling on an existing model (Kim et al., 2016b) and builds a model for Visual Question Answering (VQA) that outperforms the current state-of-art by 0.42%. The paper presents various ablation studies of the new VQA model they built.----------------Strengths:----------------1. The paper presents new insights into element-wise multiplication operation which has been previously used in VQA literature (such as Antol et al., ICCV 2015) without insights on why it should work. ----------------2. The paper presents a new model for the task of VQA that beats the current state-of-art by 0.42%. However, I have concerns about the statistical significance of the performance (see weaknesses below).----------------3. The various design choices made in model development have

In [44]:
df_dataset = dataset.to_pandas()
df_dataset = df_dataset.explode('summary')
df_dataset = df_dataset.reset_index()

In [46]:
df_dataset['id_candidate'] = df_dataset.groupby(['index']).cumcount()

In [48]:
df_dataset.head()

Unnamed: 0,index,id,text,gold,summary,id_candidate
0,0,https://openreview.net/forum?id=r1rhWnZkg,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...,Summary: The paper presents low-rank bilinear ...,0
1,0,https://openreview.net/forum?id=r1rhWnZkg,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...,Summary: The paper presents low-rank bilinear ...,1
2,0,https://openreview.net/forum?id=r1rhWnZkg,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...,Summary: The paper presents low-rank bilinear ...,2
3,0,https://openreview.net/forum?id=r1rhWnZkg,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...,Summary: The paper presents low-rank bilinear ...,3
4,0,https://openreview.net/forum?id=r1rhWnZkg,Summary: The paper presents low-rank bilinear ...,The program committee appreciates the authors'...,Summary: The paper presents low-rank bilinear ...,4
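
The `explode` / `reset_index` / `cumcount` steps above can be followed on a small example. The frame below is invented for illustration; only the column names and the three pandas calls mirror the cells above.

```python
import pandas as pd

# Toy stand-in: each row holds a list of candidate summaries (invented values).
toy = pd.DataFrame({
    "id": ["p1", "p2"],
    "summary": [["s1a", "s1b", "s1c"], ["s2a", "s2b"]],
})

# One row per candidate; reset_index keeps the original row number in `index`.
flat = toy.explode("summary").reset_index()

# Number the candidates 0, 1, 2, ... within each original row.
flat["id_candidate"] = flat.groupby("index").cumcount()
print(flat)
```

After this step, the pair (`index`, `id_candidate`) identifies one candidate summary, which is why the table above repeats the same `id`, `text` and `gold` for each candidate of a document.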


2. `generate_extractive_candidates.py`

This works like `generate_abstractive_candidates.py`, except that the candidate summaries are simply the sentences of the source text.
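
A minimal sketch of the idea, assuming a naive regex sentence splitter (the actual script may use a different tokenizer). The function name `sentence_candidates` is hypothetical; the handling of dash runs reflects the `----------------` separators visible in the review text above.

```python
import re

def sentence_candidates(text: str) -> list[str]:
    """Split a review into sentences; each sentence is one candidate summary."""
    # Reviews in this dataset use runs of dashes as paragraph separators.
    text = re.sub(r"-{4,}", " ", text)
    # Naive split after sentence-ending punctuation; abbreviations are not handled.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

review = "Summary: good paper.----------------Strengths: clear. Weak baselines!"
for i, candidate in enumerate(sentence_candidates(review)):
    print(i, candidate)
```

Each candidate is copied verbatim from the source, so no generation model is needed for this step.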
___

The next step is to explore `glimpse/evaluate`, which provides a set of evaluators. There is nothing special here: the dataframe is filtered down to the gold texts and the summaries, and the chosen evaluator is then applied to them.

1. `evaluate_bartbert_metrics.py`: computes BERTScore
2. `evaluate_common_metrics_samples.py`: computes ROUGE (ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-Lsum)
3. ``
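
To make the "gold vs. summary" comparison concrete, here is a simplified unigram-overlap score in the style of ROUGE-1 F1. It is a sketch under assumptions: whitespace tokenization, no stemming, and a hypothetical function name `rouge1_f1`. It is not the implementation used by the evaluation scripts.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """F1 over unigram overlap between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy pair of (summary, gold) texts, invented for illustration.
score = rouge1_f1("the paper is good", "the paper is weak")
print(round(score, 2))
```

In the evaluation scripts, a score of this kind would be computed for every (`summary`, `gold`) pair in the filtered dataframe.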