# Multi-Agent Review Generation for Scientific Papers

### Introduction

Peer review of scientific papers is a central part of the decision-making process in academic journals. A research paper submitted to a journal is typically assigned to 2-3 reviewers with expertise in the corresponding field, who are asked to provide detailed feedback on the originality and impact of the study, its potential flaws, and areas for improvement. Based on the results of peer review, the editor decides whether to publish the paper, reject it, or ask the authors to do additional work and revise the paper.

The results of peer review can be highly subjective and susceptible to bias. To reduce bias, some journals have introduced a double-blind mode, in which author identity is not revealed to reviewers. Studies have shown that under double-blind peer review, the acceptance rate increases for female authors and decreases for famous authors and authors from high-prestige institutions [1-8]. However, double-blind peer review cannot completely eliminate the human factor or guarantee author anonymity, e.g. in small research areas. Furthermore, an increasing number of research works are now made openly available as preprints before publication, which can make their authors easy to identify.

In addition, the number of submissions to academic journals has been continuously growing, making it harder for journals to recruit enough professional reviewers to handle the volume [9].

Automation of the peer review process is a potential remedy for these problems. Although academic publishers already use machine learning and natural language processing methods for automatic pre-screening of submissions [10], the area is still in its infancy and most of the review process remains manual. Recently, large language models have revolutionized the field of AI; neural networks based on the transformer architecture provide a much more nuanced understanding of human language and are able to memorize a large amount of knowledge during training.

### Approach & related work

The ability of large language models such as ChatGPT to generate reviews for academic papers has been explored in a number of studies [11-16]. One recent development is MARG, a system for multi-agent review generation [16]. The rationale for this system was a limitation of the early version of GPT-4: its context window was limited to 8192 tokens, which was not enough to process a whole research paper. MARG uses multiple AI agents to process different chunks of the paper; the reviews generated by individual agents are then summarized by expert and main agents. The system was shown to increase the amount of usable, actionable feedback, as opposed to general comments, in reviews produced with GPT-4.

Since the MARG system was introduced, large language models have become more advanced; the latest version of GPT now supports up to 128K tokens in the context window, eliminating the problem that the system originally solved [17]. Recently, Google introduced the Titans architecture for neural networks, which enables an even larger context size of up to 2M tokens [18]. Still, multi-agent review generation can have advantages, as it lets different AI agents focus on different sections and aspects of the paper. Furthermore, MARG uses the traditional chat completion API, while OpenAI has recently introduced the Assistants API, which makes development of multi-agent systems easier [19].

In this work, I develop a multi-agent review generation system for scientific papers. It is based on the approach introduced by MARG, but uses the new API and implements a number of improvements intended to make multi-agent systems more accessible for research and experimentation.

### Libraries & dependencies

This notebook uses some non-standard libraries that must be installed on the system:

In [None]:
! pip install names                # name generation for agents
! pip install OpenAI tiktoken      # interaction with GPT-4o model
! pip install markdown ipywidgets  # formatting outputs
                                   # loading and processing PDF files
! pip install ipyfilechooser grobid-client-python grobid2json lxml beautifulsoup4

In [2]:
import os
import json
import time
import shutil
import base64
import random
import pandas
import tiktoken
import ipywidgets

from glob import glob
from openai import OpenAI
from bs4 import BeautifulSoup
from markdown import markdown
from datetime import datetime
from tqdm.notebook import tqdm
from names import get_full_name
from ipyfilechooser import FileChooser
from grobid2json import convert_xml_to_json
from grobid_client.grobid_client import GrobidClient
from IPython.display import clear_output, display, HTML
from ipywidgets import HBox, Image, IntProgress, Layout

tiktoken.encoding_for_model("gpt-4o") ; # pre-load model data

### Configuration

To run the code in this notebook smoothly, a **Tier 2** OpenAI API key is recommended. The address of a running [GROBID instance](https://grobid.readthedocs.io/en/latest/Run-Grobid/) is required to extract text from PDF files (the service can be installed locally with Docker).

In [9]:
OPENAI_API_KEY = ""
GROBID_SERVER  = "http://127.0.0.1:8070"
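
Keeping the key in the notebook itself makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming both settings are exported as environment variables; the helper `load_config` and the environment variable name `GROBID_SERVER` are illustrative and not part of the original notebook:

```python
import os

def load_config(env=os.environ):
    # hypothetical helper: read settings from the environment and
    # fall back to the defaults used in the configuration cell above
    return {
        "OPENAI_API_KEY": env.get("OPENAI_API_KEY", ""),
        "GROBID_SERVER": env.get("GROBID_SERVER", "http://127.0.0.1:8070"),
    }

config = load_config()
```

The values in `config` could then be assigned to `OPENAI_API_KEY` and `GROBID_SERVER` instead of typing them into the cell.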

### PDF preprocessing

A research paper submitted for review will be split into sections, and every agent will be assigned one or more sections of the paper to analyze (short sections will be concatenated together). In addition, each agent will be given the title and abstract of the paper.

To do this, I will use GROBID, an advanced service for processing PDF research articles that can extract the individual sections of a paper. The following class implements the pre-processing functionality:

In [11]:
grobid = GrobidClient(grobid_server = GROBID_SERVER) # connect to GROBID service

class Paper:
    
    def __init__(self):
        self.client = grobid
        self.folder = 'data/papers'  # cache for processed PDF
        self.tokens = None           # estimate n of tokens per chunk
        
        # properties to keep structured text from processed PDF
        self.title    = ''
        self.abstract = ''
        self.sections = {}
        self.chunks   = []

        self.filename = ''

    # take path to PDF file and convert to structured JSON with GROBID
    def json(self, pdf):
        data = self.folder
        pdfi = os.path.splitext(os.path.basename(pdf))[0]  # file name without path and extension
        xml = f'{data}/{pdfi}.grobid.tei.xml'

        # convert paper to XML form using GROBID service
        # / take XML file from cache if available
        if not os.path.exists(xml):
            print('processing PDF file...', end = ' ')
            try:
                shutil.copyfile(pdf, f'{data}/{pdfi}.pdf')
            except shutil.SameFileError:
                pass
            self.client.process('processFulltextDocument', data, output = data)
            print('done.')

        # convert XML to JSON format for easier processing
        with open(xml, 'rb') as f:
            xml = BeautifulSoup(f.read(), 'xml')
        jres = convert_xml_to_json(xml, pdfi, "").as_json()

        self.filename = pdfi
        return jres

    
    # parse PDF file and extract title, abstract and sections
    def parse(self, pdf):
        paper = self.json(pdf) # parse PDF to JSON
        self.sections = {}     # extract sections
        for chunk in paper['body_text']:
            sect, text = chunk['section'], chunk['text']
            if sect not in self.sections:
                self.sections[sect] = ''
            self.sections[sect] += "\n" + text
            
        self.title    = paper['metadata']['title']
        self.abstract = paper['abstract'][0]['text']

    # split PDF article to chunks of text of max token size
    # cut off all text after stop section (conclusion by default)
    def process(self, pdf, stop = 'conclusion', maxt = 1024, model = 'gpt-4o'):
        self.parse(pdf) # extract article text split by sections
        if self.tokens is None: # get tokenizer for model
            # to estimate number of tokens in each chunk
            self.tokens = tiktoken.encoding_for_model(model)

        chunks, i = [], 0
        for sect, text in self.sections.items():
            if stop is not None:
                if stop in sect.lower(): break

            # append section to current chunk
            if len(chunks) == i: chunks.append('')
            chunks[i] += f"\n\n[{sect}]\n\n{text.strip()}"

            # append new chunk if current is large enough
            tokens = self.tokens.encode(chunks[i])
            if len(tokens) > maxt: i += 1

        self.chunks = chunks
        return chunks

GROBID server is up and running
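
The chunking rule used in `process` is easier to see in isolation. Below is a minimal sketch of the same greedy idea: sections are appended to the current chunk until it exceeds the token budget, and everything from the stop section onwards is dropped. To keep it self-contained, a whitespace word count stands in for the `tiktoken` tokenizer; this stand-in, the function name `chunk_sections` and the toy section texts are assumptions made for illustration.

```python
def chunk_sections(sections, count_tokens, maxt=1024, stop="conclusion"):
    """Greedy chunking: add sections to the current chunk until it exceeds
    maxt tokens, then start a new chunk with the next section."""
    chunks = []
    current = ""
    for name, text in sections.items():
        # cut off everything from the stop section onwards
        if stop is not None and stop in name.lower():
            break
        current += f"\n\n[{name}]\n\n{text.strip()}"
        if count_tokens(current) > maxt:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks

# stand-in tokenizer (assumption): one token per whitespace-separated word
def count_words(text):
    return len(text.split())

sections = {
    "Introduction": "word " * 6,
    "Methods": "word " * 6,
    "Results": "word " * 3,
    "Conclusion": "word " * 5,
    "Appendix": "word " * 5,
}
chunks = chunk_sections(sections, count_words, maxt=10)
```

Note that `maxt` is a soft limit: a chunk is closed only after it has already grown past the budget, so a single long section always ends up in one chunk regardless of its size.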


I will choose a research paper and process it using this class:

In [None]:
pdf = FileChooser()
pdf.title = 'Select PDF article for review: '
pdf.filter_pattern = '*.pdf'
display(pdf)

Process the article PDF (if a remote GROBID server is used, text extraction can take a minute or two):

In [13]:
paper  = Paper()
chunks = paper.process(pdf.selected)

# preview the processed PDF file
print(f'\n{paper.title}\n\n{paper.abstract}\n')

for i, section in enumerate(paper.sections.keys()):
    print(section)
    if 'conclusion' in section.lower() and i + 1 < len(paper.sections):
        print('-----------------------------------------')


Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 Englishto-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literatur

In [14]:
print(f'Number of chunks: {len(chunks)}')

Number of chunks: 5


To review this paper, 7 agents will be run: five reviewers (one per chunk), one expert, and one editor who coordinates the whole process.

### Multi-agent framework

Let's start by initializing a client for the OpenAI API and defining a generic class for an OpenAI-based agent. The class will automatically generate a random name for the AI agent if none is provided, and assign a profile picture:

In [15]:
client = OpenAI(api_key = OPENAI_API_KEY)

class Agent:
    
    def __init__(self, name = '',
                     gender = '',
                     avatar = '',
                       role = '',
                      model = 'gpt-4o',
                    context = None # messages to initialize chat
                ):

        self.gender = gender if gender else random.choice(['male', 'female'])
        self.name = name if name else get_full_name(gender = self.gender)
        self.avatar = avatar if avatar else photosource.take(self.gender)
        self.role = role

        # initialize OpenAI assistant, communication thread and context
        self.agent = client.beta.assistants.create(
            name = self.name, instructions = '', model = model)
        self.thread = client.beta.threads.create()
        basic = f"Your name is {self.name}."
        if self.role: basic += f' You are {self.role.lower()}'
        if context is None: context = []
        context.insert(0, basic)
        self.context(context)
        
        self.message  = None  # most recent prompt
        self.response = None  # most recent response
        self.running  = None  # for asynchronous queries

    # push messages that will be used as context for answer
    def context(self, messages, role = "user"):
        for message in messages:
            client.beta.threads.messages.create(
                    thread_id = self.thread.id, role = role, content = message)
    
    # retrieve most recent message in chat (e.g. answer from the model)
    def answer(self):
        messages = client.beta.threads.messages.list( thread_id = self.thread.id )
        if messages.data:
            self.response = messages.data[0].content[0].text.value.strip()
        return self.response

    # send message to the model and get response
    def prompt(self, message):
        self.message = message
        client.beta.threads.runs.create_and_poll(
            thread_id    = self.thread.id, 
            assistant_id = self.agent.id,
            instructions = self.message)
        return self.answer()

    # for asynchronous answering, check for response
    def complete(self):
        if not self.running: return None
        # check status of background query
        self.running = client.beta.threads.runs.retrieve(
                           thread_id = self.thread.id,
                           run_id    = self.running.id)
        # repeat the query in case of error
        if self.running.status == "failed":
            time.sleep(1)
            self.queue()
        if self.running.status == "completed":
            self.running = None
            return True
        return False

    # query the model in background
    def queue(self, message = None):
        if message: self.message = message
        self.running = client.beta.threads.runs.create(
                           thread_id    = self.thread.id, 
                           assistant_id = self.agent.id,
                           instructions = self.message)

    # display information about the agent
    def _ipython_display_(self):
        display(HTML(f'''<div style = "text-align:center;padding:18px;display:flex;align-items:center">
                    <img src = "{self.avatar}"
                       align = "middle"
                       style = "width: 96px; border-radius: 12px;margin-right: 18px">
                    <div><b style = "font-size: 18px">
                        {self.name}</b><br>
                        {self.role}</div></div>'''))


# auxiliary class for issuing random profile pictures
class PhotoSource:

    def __init__(self): self.reset()
    def    reset(self): self.avatars = {gender: glob(f'data/avatars/{gender}/*') for gender in ['male', 'female']}
    
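    def take(self, gender):
        # no pictures available for this gender: return empty avatar
        # (random.choice would raise IndexError on an empty list)
        if not self.avatars[gender]: return ''
        picture = random.choice(self.avatars[gender])
        self.avatars[gender].remove(picture)
        # refill the pool once every picture has been issued
        if len(self.avatars[gender]) == 0:
            self.reset()
        return picture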

photosource = PhotoSource()
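
The `queue` / `complete` pair is what later lets several agents work at the same time: all runs are started first, and only then polled until each one finishes. A minimal sketch of that polling pattern is shown below. `FakeRun` is a stand-in that does not call the OpenAI API, and `wait_all` is a hypothetical helper introduced only for this illustration.

```python
import time

class FakeRun:
    """Stand-in for a background run (assumption: not the OpenAI API).
    Reports 'in_progress' for a fixed number of polls, then 'completed'."""
    def __init__(self, polls_needed):
        self.polls_left = polls_needed

    def status(self):
        if self.polls_left > 0:
            self.polls_left -= 1
            return "in_progress"
        return "completed"

def wait_all(runs, delay=0.0):
    """Poll all pending runs until every one is completed.
    Returns the number of polling rounds that were needed."""
    pending = dict(runs)
    rounds = 0
    while pending:
        rounds += 1
        for name in list(pending):
            if pending[name].status() == "completed":
                del pending[name]
        if pending:
            time.sleep(delay)  # avoid hammering the service between rounds
    return rounds

runs = {"reviewer-1": FakeRun(2), "reviewer-2": FakeRun(0), "expert": FakeRun(1)}
rounds = wait_all(runs)
```

The total waiting time is determined by the slowest run rather than by the sum of all runs, which is the point of queueing the requests before collecting the answers.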

Test the class by creating a simple agent and asking a question:

In [16]:
student = Agent(role = "Phd student")
student

In [17]:
print(student.prompt("What does your typical day look like?"))

As Donna Gaydosh, a PhD student, my typical day likely revolves around a combination of research, coursework, and academic activities. Here's what my day might generally look like:

### Morning
- **6:30 AM**: Wake up and have breakfast. This is a good time to catch up on news or read articles related to my field of study.
- **7:30 AM**: Head to the university, either by walking, biking, or public transit, depending on the distance.
- **8:00 AM**: Start the day with lab work or begin my study session in a quiet spot at the library. This block of time is dedicated to focused work without interruptions.

### Midday
- **11:30 AM**: Attend a lecture or a seminar related to my coursework or research interests.
- **1:00 PM**: Lunch break. Sometimes, I meet with peers or colleagues to discuss ongoing projects or just to socialize.
- **2:00 PM**: Dedicate time to writing and reviewing my thesis or any papers I am working on. This includes data analysis and hypothesis testing.

### Afternoon
- *

The answer took quite a while to generate. To avoid long waits, let's extend the class with support for streaming, and also apply some formatting to the output so it is easier to read. I will define a separate class *View* that handles the display of output:

In [18]:
class View:

    def __init__(self):
        self.box = None
        self.author = None
        self.answer = None
    
    # this method will be invoked by agent on receiving
    # new tokens and once more when full answer is generated
    def update(self, author, text, complete = None):

        # some additional preprocessing depending on type of agent
        text = author.prepare(text)
        # answers from LLMs are formatted in markdown
        text = markdown(text).replace('<h3>', '<h3 style = "font-size: 14px">')

        # format and display the answer
        if not self.answer: self.answer = ipywidgets.HTML()
        self.answer.value = f"""<div style = "border: solid 2px #ccc;
                                     border-radius: 18px;
                                     font-size: 12px;
                                     line-height: 16px;
                                     font-family: Courier;
                                     max-width: 640px;
                                     padding: 12px">
                                     {text}
                                </div>"""

        # display full reply box with author name and picture
        if not self.box:
            self.author = f"""<div style = "padding: 0 12px 0 12px; 
                                         text-align: center">
                        <img src = "{author.avatar}"
                           style = "border-radius : 12px;
                                            width : 96px">
                        <p style = "text-align    : center">
                        <b style = "font-size     : 14px;
                                    font-weight   : bold;">
                                    {author.name}</b><br>
                                    {author.role}</p>
                        </div>"""
            
            self.author = widget(HTML(self.author))

            columns = [self.author, self.answer]
            if author.position == 'right':
                columns.reverse()
            self.box = HBox(columns)
            
            display(self.box)


class NiceAgent(Agent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.position = 'left'
    
    # a stub method to apply additional processing
    # before returning answer to the user
    def prepare(self, response):
        return response
    
    def stream(self, message, view = None):
        if not view: view = View()

        self.message = message
        response = ''

        # invoke API method for token streaming
        with client.beta.threads.runs.stream(
            thread_id    = self.thread.id,
            assistant_id = self.agent.id,
            instructions = self.message) as stream:
                for text in stream.text_deltas:
                    # append new tokens and
                    # display updated answer
                    response += text
                    view.update(self, response)
        
        # finalize the answer when it's complete
        view.update(self, response, complete = True)
        self.response = response
        return response

# auxiliary function for creating
# re-usable element representations
def widget(element):
    result = ipywidgets.Output()
    with result: display(element)
    return result
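
The contract between `stream` and `View.update` can be sketched without any API calls: text pieces are accumulated as they arrive, the view receives the full text so far after every piece, and one final call marks the answer as complete. In the sketch below, `fake_deltas` is a stand-in for the token stream and `stream_to` is a hypothetical helper; neither is part of the notebook.

```python
def fake_deltas(text, size=4):
    # stand-in for a token stream (assumption): yields the text in small pieces
    for i in range(0, len(text), size):
        yield text[i:i + size]

def stream_to(deltas, on_update):
    """Accumulate deltas and pass the full text so far to the callback."""
    response = ""
    for piece in deltas:
        response += piece
        on_update(response, False)
    on_update(response, True)  # final call marks the answer as complete
    return response

snapshots = []
answer = stream_to(fake_deltas("To be, or not to be"),
                   lambda text, done: snapshots.append((text, done)))
```

Passing the whole accumulated text on every update, rather than only the new piece, keeps the view stateless with respect to the text: it can re-render markdown from scratch each time.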

Let's check how the answering has improved:

In [19]:
poet = NiceAgent(name = "William Shakespeare", role = "Poet", gender = "male")
poet.stream("To be or not to be? Answer in a poem.") ;


### Multi-agent review system

Now we can proceed to building a multi-agent system for peer review and define a generic class that will include everyone involved in the process - editor, expert and reviewers:

In [20]:
class ReviewAgent(NiceAgent):
    
    def __init__(self, role):
        # system prompt will be loaded at startup from .txt file
        self.system = open(f'data/prompts/{role.lower()}.txt').read()
        super().__init__(role = role)

    
    def invite(self, editor):
        # when agent joins a group, it will get an initial prompt
        # with the name of the group leader (editor)
        self.context([self.system.replace('{editor}', editor.name)])

    
    # the group will use a specific protocol for communication
    # so responses must be cleared from protocol terms before viewing
    
    def prepare(self, response):
        prefix, s = 'SEND MESSAGE', ' *:'
        messages = response.split(prefix)
        messages = [f.strip(s).strip() for f in messages if f.strip(s) != '']
        response = '\n\n'.join(messages)
        return response
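
To see what `prepare` does to a raw response, the same logic can be run as a standalone function. The sample message below is made up for illustration; the actual wording produced by the agents depends on the prompt files, which are not shown in this notebook.

```python
def strip_protocol(response, prefix="SEND MESSAGE", junk=" *:"):
    # split on the protocol keyword, trim leftover markup around each part,
    # and drop parts that contain nothing but markup
    parts = response.split(prefix)
    parts = [p.strip(junk).strip() for p in parts if p.strip(junk) != ""]
    return "\n\n".join(parts)

# hypothetical raw response containing two protocol messages
raw = "**SEND MESSAGE**: Hello team.\n\nSEND MESSAGE: Second note."
clean = strip_protocol(raw)
print(clean)
```

Because `*` is among the stripped characters, markdown emphasis at the very start or end of a message is removed together with the protocol markup.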


Now we can define classes for each type of agent. The Editor class will be the most complex of all, as it handles hiring the other members of the team:

In [21]:
class Editor(ReviewAgent):
    
    def __init__(self):
        super().__init__(role = 'Editor')
        self.context([self.system]) # apply initial prompt
        self.team = []              # list of all agents (reviewers) in the group
        self.discussion = []        # list of responses from all reviewers
        self.verbose = True         # if True live discussion will be displayed

    # save new message that was received from other agent, or Editor's own
    def save(self, member, message):
        self.discussion.append((datetime.now(),
                                member.name,
                                member.role,
                                member.avatar,
                                message))

    # add reviewer / expert to the group
    def invite(self, member):
        
        # send the name of the Editor to new group member
        member.invite(self)
        
        self.team.append(member)
        if member.role == 'Expert':
            self.expert = member
        return member

    # send Editor's message to all agents in the group
    def broadcast(self, message):
        # display progress bar if not in silent mode
        # (sending through API will introduce small delays)
        if self.verbose:
            sending = IntProgress(value = 0, min = 0, max = len(self.team),
                            description = 'Broadcasting message:',
                                  style = {'bar_color': 'darkcyan',
                                           'description_width': 'initial'},
                                 layout = {'margin': '4px 0 12px 188px'})
            display(sending)

        # send message to every member in random order
        random.shuffle(self.team)
        for member in self.team:
            member.context([f"From: {self.role} {self.name}\n\n{message}"])
            member.queue(f"You received new message (above) from {self.role} {self.name}. {self.role} is waiting for your reply.")
            if self.verbose: sending.value += 1

    
    # collect answers from all members
    def collect(self):
        answers = []
        for member in self.team:

            # skip the member if Editor did not ask anything
            if member.running:
                while not member.complete():
                    time.sleep(1)
                response = member.answer()
                if 'will stand by for further instructions' not in response:
                    answers.append(f"From: {member.role} {member.name}\n\n{response}")
                    self.save(member, response)
                    if self.verbose:
                        View().update(member, response, True)
        return answers


    def submit(self, paper, limit = 128):

        if self.verbose:
            print(f"Thank you for submitting your paper!\n\033[1m{paper.title}\033[0m")
        
        self.invite(Expert())
        
        if self.verbose:
            print("\nYour paper will be evaluated by an expert:")
            display(self.expert)
        
            print('Recruiting reviewers...\n')
            tab = panel()
            
        for chunk in paper.chunks:
            
            reviewer = self.invite(Reviewer())
            reviewer.assign(paper, chunk)
            
            if self.verbose: tab.put(reviewer)
        
        task = open(f'data/prompts/editor.task.txt').read()
        task = task.replace("{expert}", self.expert.name)

        prompt = task

        prompt += f'\n\nThe title of the research paper is: {paper.title}\n\n'
        prompt += f'Abstract: {paper.abstract}'
        
        self.discussion = []
        askme = self.stream if self.verbose else self.prompt
        while prompt and not ('READY' in askme(prompt) and self.discussion):
            prompt = ''

            self.broadcast(self.response)
            self.save(self, self.response)
            
            answers = self.collect()
            if answers:
                answers.insert(0, "You have received following responses from the group members:")
                self.context(answers)
                prompt  = "Please carefully read the responses. If needed, ask for additional clarifications, and provide information requested by other agents. Then proceed with the task according to the plan."

            if limit is not None and len(self.discussion) > limit: break

        self.save(self, self.response)


# auxiliary class to display teams
class panel():
    
    def __init__(self):
        self.b = HBox([], layout = Layout(flex_flow = 'row wrap'))
        display(self.b)
        
    def put(self, o):
        self.b.children = tuple(list(self.b.children) + [widget(o)])
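
The control flow of `submit` boils down to a loop in which the editor speaks, the team replies, and the replies are fed back to the editor until it announces `READY`. The sketch below reproduces that loop with scripted stand-ins instead of OpenAI agents. The function `run_discussion`, the scripted messages and the team members are all assumptions made for illustration, and the stopping rules are a simplification of the original method.

```python
def run_discussion(editor_turn, team, limit=128):
    """Alternate between the editor and the team until the editor says READY.

    editor_turn: callable taking the latest team replies, returning editor text
    team: dict of name -> callable taking editor text, returning a reply
          (or None if the member has nothing to say)
    """
    discussion, replies = [], []
    while True:
        message = editor_turn(replies)
        discussion.append(("Editor", message))
        # READY only counts once at least one earlier message exists
        if "READY" in message and len(discussion) > 1:
            break
        replies = []
        for name, member in team.items():
            reply = member(message)
            if reply is not None:
                replies.append(f"From: {name}\n\n{reply}")
                discussion.append((name, reply))
        # stop if nobody answered or the discussion grew too long
        if not replies or len(discussion) > limit:
            break
    return discussion

# scripted stand-ins (assumption): a two-turn editor and two team members
script = iter(["Please review your chunks.", "Thanks. READY: final review follows."])
team = {
    "Reviewer A": lambda msg: "The method section lacks ablations.",
    "Expert": lambda msg: None,  # stays silent in this round
}
log = run_discussion(lambda replies: next(script), team)
```

The `limit` argument plays the same role as in `submit`: it is a safety net against discussions that never converge.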

Initialize editor agent:

In [22]:
editor = Editor()
editor

Reviewer and Expert classes will require less code:

In [23]:

class Expert(ReviewAgent):
    def __init__(self):
        super().__init__(role = 'Expert')

class Reviewer(ReviewAgent):
    def __init__(self):
        super().__init__(role = 'Reviewer')
        self.position = 'right'

    def assign(self, paper, chunk):
        self.context([f"""
        Title: {paper.title}
        Abstract: {paper.abstract}
        
        Your paper chunk is shown below:
            --- START PAPER CHUNK ---
            { chunk }
            --- END PAPER CHUNK ---"""])

    def _ipython_display_(self):
        display(HTML(f'''<div style = "text-align: center; width: 64px">
                         <img style = "width: 100%; border-radius: 4px" src = "{self.avatar}">
                           <p style = "text-align: center">
                           <b style = "font-size: 14px;
                                    font-weight: bold;">
                                    {self.name}</b></p></div>'''))
        

We can now submit our paper for automated review:

In [24]:
editor.submit(paper)

Thank you for submitting your paper!
Attention Is All You Need

Your paper will be evaluated by an expert:


Recruiting reviewers...




### Reporting

The following class saves the output to an HTML or JSON file for later analysis:

In [25]:
class Report:

    def __init__(self, editor, paper):
        self.editor = editor
        self.paper  = paper

    def json(self):

        def date2json(dt): 
            return dt.isoformat() if isinstance(dt, datetime) else None
        
        report = {'title'      : self.paper.title,
                  'abstract'   : self.paper.abstract,
                  'sections'   : self.paper.sections,
                  'filename'   : self.paper.filename,
                  
                  'discussion' : self.editor.discussion}
        
        # use the paper attached to this report (not the global variable)
        jo = f'data/results/{self.paper.filename}.json'
        with open(jo, 'w') as f:
            json.dump(report, f, default = date2json)
    
    def html(self):
        
        htmlres = ''
        for record in self.editor.discussion:
            tstamp, name, role, avatar, comment = record
            comment = self.editor.prepare(comment)
            comment = markdown(comment).replace('<h3>', '<h3 style = "font-size: 14px">')
            comment = f'<div class = "comment">{comment}</div>'
            image64 = base64.b64encode(open(avatar, 'rb').read()).decode()
            block   = f"""<div class = "author">
                                <img src = "data:image/jpeg;base64,{image64}">
                                <div class = "name">{name}</div>
                                <div class = "role">{role}</div>
                                <div class = "time">{tstamp}</div>
                          </div>"""
            # Editor and expert comments are placed before the author block, all others after it
            if role in ['Editor', 'Expert']:
                block = comment + block
            else:
                block += comment
            htmlres += f'<div class = "reply">{block}</div>'
        
        html = f"""
        <html>
            <head>
                <title>Review for: {self.paper.title}</title>
            </head>
            <body>
                <style type = "text/css">
                    body {{font-family: Courier; font-size: 12px;width: 88%; margin: 6%}}
                    .wrapper {{width:fit-content;margin:9px auto; max-width: 900px}}
                    .title {{font-size: 32px}}
                    .abstract {{text-align:justify}}
                    .discussion {{margin-top: 32px}}
                    .reply {{display: flex; margin: 32px 0}}
                    .author {{text-align: center;padding: 0 32px}}
                    .author img {{width: 96px; border-radius: 8px}}
                    .name {{font-weight:bold}}
                    .role {{}}
                    .time {{font-size: 10px}}
                    .comment {{border:solid 2px #ddd; padding: 12px; border-radius: 12px; height: fit-content}}
                    .review {{font-weight:bold}}
                </style>
                <div class = "wrapper">
                    <div class = "review">automatically generated review for research paper</div>
                    <div class = "title">{self.paper.title}</div>
                    <div class = "abstract">{self.paper.abstract}</div>
                    <div class = "discussion">{htmlres}</div>
                </div>
            </body>
        </html>
        """
        
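        # Use the paper stored on the instance rather than the global `paper`
        report = f'data/results/{self.paper.filename}.html'
        with open(report, 'w', encoding = 'utf-8') as file:
            file.write(html)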
        
        return report
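
The serialization step in `json()` relies on the `default` hook of `json.dumps`, which is called only for objects the encoder cannot handle natively. The sketch below illustrates that behaviour on a single hypothetical discussion record; the names and values are invented for illustration, and it assumes, as `date2json` suggests, that the first field of each record is a `datetime`. Unlike the helper above, this variant raises `TypeError` for unknown types instead of silently writing `null`.

```python
import json
from datetime import datetime

def date2json(value):
    # json.dumps calls this only for objects it cannot serialize natively
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f'Cannot serialize {type(value).__name__}')

# Hypothetical record in the (timestamp, name, role, avatar, comment) layout
record = (datetime(2025, 1, 22, 10, 30), 'Alice', 'Reviewer',
          'avatars/alice.jpg', 'The methods section needs more detail.')

encoded = json.dumps({'discussion': [record]}, default=date2json)
decoded = json.loads(encoded)

# After a round trip the tuple is a list and the timestamp is an ISO 8601 string
print(decoded['discussion'][0])
```

Two consequences follow for later analysis: records come back as lists rather than tuples, and timestamps have to be parsed again with `datetime.fromisoformat`.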

In [26]:
report = Report(editor, paper)
report.json()

display(HTML(f'<p style = "margin-top:18px"><a href = "{report.html()}">Review for {paper.title}</a></p>'))
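
For later analysis, a saved JSON report can be read back and summarized. The following is a minimal sketch under the assumption that each discussion record keeps the five-field layout unpacked in `Report.html()` and that timestamps were written as ISO 8601 strings. The helper `load_discussion` and the sample data are hypothetical and are not part of the notebook; a temporary file stands in for a real file under `data/results/`.

```python
import json
import tempfile
from collections import Counter
from datetime import datetime
from pathlib import Path

def load_discussion(path):
    """Read a report JSON file and return the discussion as a list of dictionaries."""
    report = json.loads(Path(path).read_text(encoding='utf-8'))
    records = []
    for tstamp, name, role, avatar, comment in report['discussion']:
        records.append({
            # Timestamps are stored as ISO 8601 strings, or null if missing
            'time': datetime.fromisoformat(tstamp) if tstamp else None,
            'name': name,
            'role': role,
            'comment': comment,
        })
    return records

# Hypothetical sample standing in for a file written by Report.json()
sample = {
    'title': 'Example paper',
    'discussion': [
        ['2025-01-22T10:30:00', 'Editor', 'Editor', 'editor.jpg', 'Please review.'],
        ['2025-01-22T10:31:00', 'Alice', 'Reviewer', 'alice.jpg', 'Methods are unclear.'],
        ['2025-01-22T10:32:00', 'Bob', 'Reviewer', 'bob.jpg', 'Results look sound.'],
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'example.json'
    path.write_text(json.dumps(sample), encoding='utf-8')
    records = load_discussion(path)

# Count how many comments each role contributed to the discussion
comments_per_role = Counter(record['role'] for record in records)
print(comments_per_role)
```

The same list of dictionaries can be passed to other tools, for example to compare how many comments each agent contributes across several papers.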

### References

    [1] R. M. Blank, “The Effects of Double-Blind versus Single-Blind Reviewing: Experimental Evidence from The American Economic Review,” The American Economic Review, vol. 81, no. 5, pp. 1041–1067, 1991.
    
    [2] A. E. Budden, T. Tregenza, L. W. Aarssen, J. Koricheva, R. Leimu, and C. J. Lortie, “Double-blind review favours increased representation of female authors,” Trends in Ecology & Evolution, vol. 23, no. 1, pp. 4–6, Jan. 2008, doi: 10.1016/j.tree.2007.07.008.

    [3] T. J. Webb, B. O’Hara, and R. P. Freckleton, “Does double-blind review benefit female authors?,” Trends in Ecology & Evolution, vol. 23, no. 7, pp. 351–353, Jul. 2008, doi: 10.1016/j.tree.2008.03.003.
    
    [4] E. S. Darling, “Use of double-blind peer review to increase author diversity,” Conservation Biology, vol. 29, no. 1, pp. 297–299, 2015.
    
    [5] K. Okike, K. T. Hug, M. S. Kocher, and S. S. Leopold, “Single-blind vs Double-blind Peer Review in the Setting of Author Prestige,” JAMA, vol. 316, no. 12, pp. 1315–1316, Sep. 2016, doi: 10.1001/jama.2016.11014.
    
    [6] A. Tomkins, M. Zhang, and W. D. Heavlin, “Reviewer bias in single- versus double-blind peer review,” Proceedings of the National Academy of Sciences, vol. 114, no. 48, pp. 12708–12713, Nov. 2017, doi: 10.1073/pnas.1707323114.
    
    [7] A. R. Kern-Goldberger, R. James, V. Berghella, and E. S. Miller, “The impact of double-blind peer review on gender bias in scientific publishing: a systematic review,” American Journal of Obstetrics and Gynecology, vol. 227, no. 1, pp. 43-50.e4, Jul. 2022, doi: 10.1016/j.ajog.2022.01.030.
    
    [8] M. A. Ucci, F. D’Antonio, and V. Berghella, “Double- vs single-blind peer review effect on acceptance rates: a systematic review and meta-analysis of randomized trials,” American Journal of Obstetrics & Gynecology MFM, vol. 4, no. 4, p. 100645, Jul. 2022, doi: 10.1016/j.ajogmf.2022.100645.
    
    [9] W. Yuan, P. Liu, and G. Neubig, “Can We Automate Scientific Reviewing?,” Journal of Artificial Intelligence Research, vol. 75, pp. 171–212, Sep. 2022, doi: 10.1613/jair.1.12862.

    [10] A. Checco, L. Bracciale, P. Loreti, S. Pinfield, and G. Bianchi, “AI-assisted peer review,” Humanities and Social Sciences Communications, vol. 8, no. 1, Art. no. 1, Jan. 2021, doi: 10.1057/s41599-020-00703-8.
    
    [11] A. Saad et al., “Exploring the potential of ChatGPT in the peer review process: An observational study,” Diabetes & Metabolic Syndrome: Clinical Research & Reviews, vol. 18, no. 2, p. 102946, Feb. 2024, doi: 10.1016/j.dsx.2024.102946.
    
    [12] Y. Jin et al., “AgentReview: Exploring Peer Review Dynamics with LLM Agents,” Oct. 13, 2024, arXiv:2406.12708. doi: 10.48550/arXiv.2406.12708.

    [13] M. Thelwall and A. Yaghi, “Evaluating the Predictive Capacity of ChatGPT for Academic Peer Review Outcomes Across Multiple Platforms,” Nov. 14, 2024, arXiv:2411.09763. doi: 10.48550/arXiv.2411.09763.

    [14] K. Cheng, Z. Sun, X. Liu, H. Wu, and C. Li, “Generative artificial intelligence is infiltrating peer review process,” Critical Care, vol. 28, no. 1, p. 149, May 2024, doi: 10.1186/s13054-024-04933-z.

    [15] M. Hosseini and S. P. J. M. Horbach, “Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review,” Research Integrity and Peer Review, vol. 8, no. 1, p. 4, May 2023, doi: 10.1186/s41073-023-00133-5.

    [16] M. D’Arcy, T. Hope, L. Birnbaum, and D. Downey, “MARG: Multi-Agent Review Generation for Scientific Papers,” Jan. 08, 2024, arXiv:2401.04259. doi: 10.48550/arXiv.2401.04259.

    [17] A. F. Rasheed, M. Zarkoosh, S. F. Abbas, and S. S. Al-Azzawi, “TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks,” Sep. 30, 2024, arXiv:2409.20189. doi: 10.48550/arXiv.2409.20189.

    [18] A. Behrouz, P. Zhong, and V. Mirrokni, “Titans: Learning to Memorize at Test Time,” Dec. 31, 2024, arXiv:2501.00663. doi: 10.48550/arXiv.2501.00663.

    [19] J. Cao, “A Study on Deploying Large Language Models as Agents,” Thesis, Massachusetts Institute of Technology, 2024. Accessed: Jan. 22, 2025. [Online]. Available: https://dspace.mit.edu/handle/1721.1/157177