
To develop and evaluate our parameter-efficient fine-tuning approach for automated peer review, we first examined the structure of the dataset derived from the *TP 2017 Conference* peer review records. Each row of the spreadsheet pairs a paper (title and abstract) with one human-written review, so a paper with several reviews spans several rows. Each review contains qualitative feedback along with metadata: the numeric rating, the reviewer's confidence, and the accept/reject decision. These basic characteristics inform tokenization choices and maximum sequence lengths. Ultimately, they determine whether large language models (LLMs) can feasibly be fine-tuned with Low-Rank Adaptation (LoRA) under limited compute.
We therefore compute descriptive statistics of input and output lengths. These statistics show the trade-off between preserving full context and keeping computation tractable.

```python
import pandas as pd
# Load the dataset
df = pd.read_excel("../data/tp_2017conference.xlsx")

# Preview
df.head()
```

| Unnamed: 0 | title | Unnamed: 1 | abstract | keywords | E | F | G | decision | I | J | K | confidence | rate | N | O | P | Q | R | review | T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Making Neural Programming Architectures Genera... | 06 Nov 2016 (modified: 11 Mar 2017) | Abstract:###Empirically, neural networks that ... | Keywords:###Deep learning | Conflicts:###berkeley.edu | 77 | 02 May 2017 | Decision:###Accept (Oral) | Comment:###The reviewers were very favourable,... | 129 | 17 Dec 2016 23 Dec 2016 | Confidence:###4: The reviewer is confident but... | Rating:###8: Top 50% of accepted papers, clear... | | | | | | This paper argues that being able to handle re... | 1665 |
| 1 | Making Neural Programming Architectures Genera... | 06 Nov 2016 (modified: 11 Mar 2017) | Abstract:###Empirically, neural networks that ... | Keywords:###Deep learning | Conflicts:###berkeley.edu | 77 | 02 May 2017 | Decision:###Accept (Oral) | Comment:###The reviewers were very favourable,... | 129 | 17 Dec 2016 23 Dec 2016 | Confidence:###3: The reviewer is fairly confid... | Rating:###8: Top 50% of accepted papers, clear... | | | | | | This is a very interesting and fairly easy to ... | 1024 |
| 2 | Making Neural Programming Architectures Genera... | 06 Nov 2016 (modified: 11 Mar 2017) | Abstract:###Empirically, neural networks that ... | Keywords:###Deep learning | Conflicts:###berkeley.edu | 77 | 02 May 2017 | Decision:###Accept (Oral) | Comment:###The reviewers were very favourable,... | 129 | 16 Dec 2016 16 Dec 2016 23 Dec 2016 (modified:... | Confidence:###5: The reviewer is absolutely ce... | Rating:###9: Top 15% of accepted papers, stron... | | | | | | This paper improves significantly upon the ori... | 566 |
| 3 | End-to-end Optimized Image Compression \| OpenR... | 06 Nov 2016 (modified: 03 Mar 2017) | Abstract:###We describe an image compression m... | Keywords:###Deep learning | Conflicts:###nyu.edu, rwth-aachen.de, uv.es | 51 | 20 Feb 2017 (modified: 20 Feb 2017) | Decision:###Accept (Oral) | Comment:###This is one of the two top papers i... | 169 | 20 Feb 2017 (modified: 20 Feb 2017) | Confidence:###4: The reviewer is confident but... | Rating:###7: Good paper, accept | | | | | | Two things I\*d like to see. 1) Specifics about... | 698 |
| 4 | End-to-end Optimized Image Compression \| OpenR... | 06 Nov 2016 (modified: 03 Mar 2017) | Abstract:###We describe an image compression m... | Keywords:###Deep learning | Conflicts:###nyu.edu, rwth-aachen.de, uv.es | 51 | 20 Feb 2017 (modified: 20 Feb 2017) | Decision:###Accept (Oral) | Comment:###This is one of the two top papers i... | 169 | 03 Jan 2017 (modified: 03 Jan 2017) | Confidence:###4: The reviewer is confident but... | Rating:###9: Top 15% of accepted papers, stron... | | | | | | This is the most convincing paper on image com... | 761 |
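The preview shows that most metadata cells carry a `Field:###` label, for example `Decision:###Accept (Oral)` and `Rating:###7: Good paper, accept`. The numeric rating and confidence appear at the start of the text that follows the label. To use these fields as numbers, the labels must be stripped first. Below is a minimal sketch. It assumes every such cell follows the `Field:###` pattern seen in these five rows. The helper names `strip_prefix` and `parse_score` are hypothetical.

```python
import re
from typing import Optional

# ASSUMPTION: metadata cells look like "Field:###value", as in the preview.
_PREFIX = re.compile(r"^\s*[A-Za-z]+:###")
# Leading integer score such as the "7" in "7: Good paper, accept".
_SCORE = re.compile(r"^\s*(\d+)\s*:")


def strip_prefix(cell: object) -> Optional[str]:
    """Remove a leading 'Field:###' label; return None for non-string cells (e.g. NaN)."""
    if not isinstance(cell, str):
        return None
    return _PREFIX.sub("", cell, count=1).strip()


def parse_score(cell: object) -> Optional[int]:
    """Extract the integer score from cells like 'Rating:###7: ...' or 'Confidence:###4: ...'."""
    text = strip_prefix(cell)
    if text is None:
        return None
    match = _SCORE.match(text)
    return int(match.group(1)) if match else None


print(strip_prefix("Decision:###Accept (Oral)"))                      # Accept (Oral)
print(parse_score("Rating:###7: Good paper, accept"))                 # 7
print(parse_score("Confidence:###4: The reviewer is confident but...")) # 4
```

On the loaded frame, this could be applied as `df["rate"].map(parse_score)`. Cells that do not match the pattern map to `None`. They can then be counted and inspected instead of being silently miscoded.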


```python
# Compute statistics.
# "Tokens" here are whitespace-delimited words (str.split()), not subword
# tokens from an LLM tokenizer.
num_papers = df["title"].nunique()
num_reviews = int(df["review"].count())

# Average tokens per paper (based on title + abstract). The frame has one row
# per review, so this mean weights each paper by its number of reviews.
df["paper_text"] = df["title"].astype(str) + " " + df["abstract"].astype(str)
paper_lengths = df["paper_text"].str.split().str.len()
avg_tokens_per_paper = paper_lengths.mean()
max_tokens_per_paper = paper_lengths.max()

# Average tokens per review (rows without a review are skipped)
review_lengths = df["review"].dropna().astype(str).str.split().str.len()
avg_tokens_per_review = review_lengths.mean()
max_tokens_per_review = review_lengths.max()

# Show computed statistics
stats = {
    "Number of papers": num_papers,
    "Number of reviews": num_reviews,
    "Avg. tokens per paper": round(avg_tokens_per_paper),
    "Avg. tokens per review": round(avg_tokens_per_review),
    "Max tokens per paper": max_tokens_per_paper,
    "Max tokens per review": max_tokens_per_review,
}
stats
```

```
{'Number of papers': 489,
 'Number of reviews': 1495,
 'Avg. tokens per paper': 163,
 'Avg. tokens per review': 296,
 'Max tokens per paper': 358,
 'Max tokens per review': 1323}
```
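The frame has one row per review, so the paper averages above count each paper once for every review it received. To weight every paper equally, the data can first be deduplicated by title. The stdlib sketch below shows this idea. It uses toy data in place of the real spreadsheet, and the helper names `word_count` and `per_paper_lengths` are hypothetical.

```python
from statistics import mean


def word_count(text: str) -> int:
    """Whitespace-delimited word count, matching the str.split() used above."""
    return len(text.split())


def per_paper_lengths(rows):
    """Word counts of 'title abstract', counted once per unique title.

    `rows` holds (title, abstract) pairs, one per review row, so the same
    paper may appear several times; only its first occurrence is counted.
    """
    seen = {}
    for title, abstract in rows:
        if title not in seen:
            seen[title] = word_count(f"{title} {abstract}")
    return list(seen.values())


# Toy data: "Paper A" has two review rows, "Paper B" has one.
rows = [
    ("Paper A", "one two three"),
    ("Paper A", "one two three"),
    ("Paper B", "alpha beta"),
]
lengths = per_paper_lengths(rows)
print(lengths, mean(lengths), max(lengths))  # [5, 4] 4.5 5
```

In pandas, the same deduplication could be written as `df.drop_duplicates("title")["paper_text"].str.split().str.len()`. Deduplication leaves the maximum unchanged and can only shift the mean.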

# Results

The dataset contains 489 unique papers and 1,495 reviews in total, an average of roughly three reviews per paper (1,495 / 489 ≈ 3.06). Papers therefore typically receive several independent evaluations.
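An average of about three reviews per paper does not by itself show how the reviews are distributed across papers. The minimal sketch below tabulates that distribution. It uses toy titles instead of the real data, and the helper names are hypothetical.

```python
from collections import Counter


def review_count_histogram(titles):
    """Map number_of_reviews -> number_of_papers, given one title per review row."""
    reviews_per_paper = Counter(titles)          # title -> review count
    return Counter(reviews_per_paper.values())   # review count -> paper count


def share_with_multiple_reviews(titles):
    """Fraction of distinct papers that received at least two reviews."""
    reviews_per_paper = Counter(titles)
    multi = sum(1 for n in reviews_per_paper.values() if n >= 2)
    return multi / len(reviews_per_paper)


# Toy example: paper A has 3 reviews, B has 2, C has 1.
titles = ["A", "A", "A", "B", "B", "C"]
print(sorted(review_count_histogram(titles).items()))  # [(1, 1), (2, 1), (3, 1)]
print(share_with_multiple_reviews(titles))              # 0.6666666666666666
```

On the real frame, a pandas equivalent would be `df.groupby("title")["review"].count().value_counts().sort_index()`.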

- On average, a paper (title + abstract) contains about 163 tokens, and the longest reaches 358. These "tokens" are whitespace-delimited words, so counts from a subword tokenizer will typically be higher. Even so, paper inputs are short and easy to encode.
- Reviews are substantially longer, averaging 296 tokens, and the longest reaches 1,323. Review generation must therefore handle long outputs, which makes efficient decoding important.
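The counts above are whitespace words, but model context limits are measured in tokenizer tokens. This sketch assumes a decoder-only LLM fine-tuned on the paper text and the review concatenated into one sequence; that setup is not specified here. Under that assumption, the context window has to hold both parts. A rough budget can be estimated by scaling word counts with an assumed tokens-per-word ratio. The 4/3 default is a common rule of thumb for English BPE tokenizers, not a value measured on this dataset, and should be replaced by counts from the actual tokenizer. The `template_tokens` allowance is hypothetical.

```python
import math
from fractions import Fraction

# ASSUMPTION: ~4/3 subword tokens per whitespace word (rule of thumb for
# English BPE tokenizers); measure with the real tokenizer before relying on it.
DEFAULT_TOKENS_PER_WORD = Fraction(4, 3)


def estimate_tokens(words: int, tokens_per_word: Fraction = DEFAULT_TOKENS_PER_WORD) -> int:
    """Round a word count up to an estimated subword-token count."""
    return math.ceil(words * tokens_per_word)


def sequence_budget(paper_words: int, review_words: int,
                    template_tokens: int = 0,
                    tokens_per_word: Fraction = DEFAULT_TOKENS_PER_WORD) -> int:
    """Estimated tokens for one concatenated paper + review training sequence.

    `template_tokens` is a hypothetical allowance for instruction text or
    special tokens wrapped around the paper and the review.
    """
    return estimate_tokens(paper_words + review_words, tokens_per_word) + template_tokens


# Word counts reported above: averages 163 / 296, maxima 358 / 1,323.
print(sequence_budget(163, 296))    # 612
print(sequence_budget(358, 1323))   # 2242
```

The worst-case figure is an upper bound, since the longest paper and the longest review need not belong to the same row.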

These results have two implications for the modeling process:
1. Input length: paper texts are short and fairly uniform (the longest is about 2.2× the mean). This reduces the risk of truncation and allows efficient batching.
2. Output length: review lengths vary much more (the longest is about 4.5× the mean). This could make generation harder, particularly in keeping long outputs coherent.
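Because review lengths are so spread out, a generation or truncation limit set to the single longest review would reserve memory for a length that few examples need. One common alternative is to choose a high percentile as the limit and check how many examples fit within it. The minimal stdlib sketch below uses toy lengths rather than the real review lengths, and its helper names are hypothetical.

```python
import math


def nearest_rank_percentile(lengths, q):
    """Nearest-rank percentile: the smallest value with at least q% of the data at or below it."""
    if not lengths or not 0 < q <= 100:
        raise ValueError("need non-empty lengths and 0 < q <= 100")
    ordered = sorted(lengths)
    rank = math.ceil(q * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def coverage(lengths, limit):
    """Fraction of examples whose length fits within `limit`."""
    return sum(1 for n in lengths if n <= limit) / len(lengths)


# Toy, right-skewed review lengths in words; not taken from the dataset.
toy_lengths = [120, 150, 180, 200, 220, 250, 260, 300, 350, 1300]
limit = nearest_rank_percentile(toy_lengths, 90)
print(limit, coverage(toy_lengths, limit))   # 350 0.9
```

Examples longer than the chosen limit would then be truncated or filtered. The percentile sets the trade-off between how many reviews are kept intact and how much memory each batch needs.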

Overall, the statistics indicate that the dataset is large and varied enough to support fine-tuning experiments. They also motivate parameter-efficient methods such as LoRA for long-form review generation under resource constraints.
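A short calculation makes LoRA's parameter savings concrete. LoRA freezes a pretrained weight matrix W of shape d × k and learns a low-rank update ΔW = BA, where B has shape d × r and A has shape r × k. Each adapted matrix therefore adds r(d + k) trainable parameters instead of d·k. The sketch below evaluates this for a hypothetical 4096 × 4096 projection with rank r = 8. These dimensions are illustrative only and are not taken from our model configuration.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update B @ A for a d x k weight.

    B has shape (d, r) and A has shape (r, k), so the update adds r * (d + k)
    parameters while the original d x k weight stays frozen.
    """
    return r * (d + k)


def full_finetune_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k weight is fine-tuned."""
    return d * k


# Hypothetical 4096 x 4096 projection adapted with rank 8.
d, k, r = 4096, 4096, 8
lora = lora_trainable_params(d, k, r)
full = full_finetune_params(d, k)
print(lora, full, f"{lora / full:.2%}")   # 65536 16777216 0.39%
```

The trainable fraction is r(d + k) / (dk). For a fixed rank, the relative savings therefore grow as the adapted matrices get larger.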
