In [25]:
%reload_ext autoreload
%autoreload 2

import asyncio
import json
import nest_asyncio
import os
import sys
from dotenv import load_dotenv
import pandas as pd

sys.path.append('../')
from lattereview.providers.openai_provider import OpenAIProvider
from lattereview.agents.scoring_reviewer import ScoringReviewer
from lattereview.review_workflow import ReviewWorkflow

## Setting up the notebook

Loading environment variables:

In [None]:
# Load environment variables from .env file
load_dotenv('../.env')
print(os.getenv('OPENAI_API_KEY'))

# Enable asyncio in Jupyter
nest_asyncio.apply()

Loading a dummy dataset:

In [4]:
data = pd.read_excel('data.xlsx')
data.head()

Unnamed: 0,ID,Title,1st author,repo,year,abstract
0,1,Segmentized quarantine policy for managing a t...,"Kim, J.",arXiv,2024,"By the end of 2021, COVID-19 had spread to ove..."
1,2,AutoProteinEngine: A Large Language Model Driv...,"Liu, Y.",arXiv,2024,Protein engineering is important for biomedica...
2,3,Integration of Large Vision Language Models fo...,"Chen, Z.",arXiv,2024,Traditional natural disaster response involves...
3,4,Choice between Partial Trajectories,"Marklund, H.",arXiv,2024,As AI agents generate increasingly sophisticat...
4,5,Building Altruistic and Moral AI Agent with Br...,"Zhao, F.",arXiv,2024,"As AI closely interacts with human society, it..."


## Testing the base functionalities

Testing the OpenAI provider:

In [26]:
openanai_provider = OpenAIProvider(model="gpt-4o-mini")
question = "What is the capital of France?"
asyncio.run(openanai_provider.get_response(question, temperature=0.9))

('The capital of France is Paris.',
 {'input_cost': 1.05e-06, 'output_cost': 4.2e-06, 'total_cost': 5.25e-06})

Testing the ScoringReviewer agent:

In [6]:
agent = ScoringReviewer(
    provider=OpenAIProvider(model="gpt-4o-mini"),
    name="Pouria",
    backstory="an expert reviewer and researcher!",
    input_description = "article title",
    temperature=0.1,
    reasoning = "brief",
    max_tokens=100,
    review_criteria="Look for articles that certainly do not employ any AI or machine learning agents",
    score_set=[1, 2],
    scoring_rules='Your scores should be either 1 or 2. 1 means that the paper does not meet the criteria, and 2 means that the paper meets the criteria.',
)


# Dummy input
text_list = data.Title.str.lower().tolist()
print("Inputs:\n\n", '\n'.join(text_list[:3]))

# Dummy review
results, total_cost = asyncio.run(agent.review_items(text_list[:3]))
print("\nOutputs:\n\n", '\n'.join(results))

# Dummy costs
print("\nCosts:\n")
for item in agent.memory:
    print(item['cost'])

print("\nTotal cost:\n")
print(total_cost)

Inputs:

 segmentized quarantine policy for managing a tradeoff between containment of infectious disease and social cost of quarantine
autoproteinengine: a large language model driven agent framework for multimodal automl in protein engineering
integration of large vision language models for efficient post-disaster damage assessment and reporting

Outputs:

 {"score":2,"reasoning":"The segmentized quarantine policy effectively addresses the balance between controlling infectious disease spread and minimizing social costs, demonstrating a thoughtful approach to public health management."}
{"score":2,"reasoning":"The article presents a comprehensive framework for integrating large language models into protein engineering, which aligns well with current trends in multimodal automl, thus meeting the criteria effectively."}
{"score":2,"reasoning":"The integration of large vision language models for post-disaster damage assessment is a relevant and innovative approach that meets the criteri

## Testing the main Functionalities

#### A multiagent review workflow for doing title/abstract analysis

Setting up the agents:

In [36]:
pouria = ScoringReviewer(
    provider=OpenAIProvider(model="gpt-4o-mini"),
    name="Pouria",
    backstory="You are a junior radiologist with many years of background in statistcis and data science, who are famous among your colleagues for your systematic thinking, organizaton of thoughts, and being conservative",
    input_description = "tilte and abstract of scientific articles",
    temperature=0.1,
    reasoning = "brief",
    max_tokens=100,
    scoring_task="Look for articles that disucss large languange models-based AI agents applied to medical imaging data",
    score_set=[1, 2],
    scoring_rules='Your scores should be either 1 or 2. 1 means that the paper does not meet the criteria, and 2 means that the paper meets the criteria.',
)

bardia = ScoringReviewer(
    provider=OpenAIProvider(model="gpt-4o-mini"),
    name="Bardia",
    backstory="You are an expert in data science with a background in developing ML models for healthcare, who are famous among your colleagues for your creativity and out of the box thinking",
    input_description = "tilte and abstract of scientific articles",
    temperature=0.7,
    reasoning = "brief",
    max_tokens=100,
    scoring_task="Look for articles that disucss large languange models-based AI agents applied to medical imaging data",
    score_set=[1, 2],
    scoring_rules='Your scores should be either 1 or 2. 1 means that the paper does not meet the criteria, and 2 means that the paper meets the criteria.',
)

brad = ScoringReviewer(
    provider=OpenAIProvider(model="gpt-4o"),
    name="Brad",
    backstory="You are a senior radiologist with a PhD in computer science and years of experience as the director of a DL lab focused on developing ML models for radiology and healthcare",
    input_description = "tilte and abstract of scientific articles",
    temperature=0.4,
    reasoning = "cot",
    max_tokens=100,
    scoring_task="""Pouria and Bardia have Looked for articles that disucss large languange models-based AI agents applied to medical imaging data. 
                       They scored an article 1 if they thought it does not meet this criteria, 2 if they thought it meets the criteria, 0 if they were uncertain of scoring.
                       You will receive an article they have had different opinions about, as well as each of their scores and their reasoning for that score. Read their reviews and determine who you agree with. 
                    """,
    score_set=[1, 2],
    scoring_rules="""Your scores should be either 1 or 2. 
                     1 means that you agree with Pouria, 2 means that you agree with Bardia, 
                  """,
)


Setting up the review workflow:

In [37]:
title_abs_review = ReviewWorkflow(
    workflow_schema=[
        {
            "round": 'A',
            "reviewers": [pouria, bardia],
            "inputs": ["Title", "abstract"]
        },
        {
            "round": 'B',
            "reviewers": [brad],
            "inputs": ["Title", "abstract", "round-A_Pouria_output", "round-A_Bardia_output"],
            "filter": lambda row: row["round-A_Pouria_output"]["score"] != row["round-A_Bardia_output"]["score"]
        }
    ]
)

Applying the review workflow to a number of sample articles:

In [38]:
# Reload the data if needed.
sample_data = pd.read_excel('data.xlsx').sample(10).reset_index(drop=True)
updated_data = asyncio.run(title_abs_review(sample_data))

print("Total cost: ")
print(title_abs_review.get_total_cost())

print("\nDetailed cost:")
print(title_abs_review.reviewer_costs)

updated_data


Starting review round A (1/2)...
Reviewers: ['Pouria', 'Bardia']
Input data: ['Title', 'abstract']


                                                       

Number of eligible rows for review: 10


                                                        


Starting review round B (2/2)...
Reviewers: ['Brad']
Input data: ['Title', 'abstract', 'round-A_Pouria_output', 'round-A_Bardia_output']


                                                       

Number of eligible rows for review: 1


                                                        

Total cost: 
0.0029861

Detailed cost:
{('A', 'Pouria'): 9.495e-05, ('A', 'Bardia'): 9.615e-05, ('B', 'Brad'): 0.002795}




Unnamed: 0,ID,Title,1st author,repo,year,abstract,round-A_Pouria_output,round-A_Bardia_output,round-B_Brad_output
0,85,An Autonomous GIS Agent Framework for Geospati...,"Ning, H.",arXiv,2024,Powered by the emerging large language models ...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
1,281,Probabilistic Phase Labeling and Lattice Refin...,"Chang, M.-C.",arXiv,2023,X-ray diffraction (XRD) is an essential techni...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
2,504,Deep Multiagent Reinforcement Learning: Challe...,"Wong, A.",arXiv,2021,This paper surveys the field of deep multiagen...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
3,622,Universal masking is urgent in the COVID-19 pa...,"Kai, D.",arXiv,2020,We present two models for the COVID-19 pandemi...,"{'score': 1, 'reasoning': 'The article focuses...","{'score': 0, 'reasoning': 'The article focuses...","{'score': 1, 'reasoning': 'The article provide..."
4,721,Hypoxia increases the tempo of evolution in gl...,"Grimes, D.R.",bioRxiv,2018,Background: Low oxygen in tumours have long be...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
5,489,Agent-based investigation of the impact of low...,"Krauland, M.G.",medRxiv,2021,Introduction Interventions to curb the spread ...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
6,419,A GENERAL FRAMEWORK FOR OPTIMISING COST-EFFECT...,"Nguyen, Q.D.",arXiv,2022,The COVID-19 pandemic created enormous public ...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article focuses...",
7,562,Open problems in cooperative AI,"Dafoe, A.",arXiv,2020,Problems of cooperation—in which agents seek w...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
8,253,Self-Supervised Neuron Segmentation with Multi...,"Chen, Y.",arXiv,2023,The performance of existing supervised neuron ...,"{'score': 1, 'reasoning': 'The article discuss...","{'score': 1, 'reasoning': 'The article discuss...",
9,578,Who are the 'silent spreaders'?: Contact traci...,"Hu, Y.",arXiv,2020,The COVID-19 epidemic has swept the world for ...,"{'score': 1, 'reasoning': 'The article focuses...","{'score': 1, 'reasoning': 'The article discuss...",


In [39]:
for i, row in updated_data.iterrows():
    print(
        f"""
        Title: {row.Title}
        Abstract: {row.abstract}
        Pouria's review: {row["round-A_Pouria_output"]}
        Bardia's review: {row["round-A_Bardia_output"]}
        Brad's review: {row["round-B_Brad_output"]}
        """
)


        Title: An Autonomous GIS Agent Framework for Geospatial Data Retrieval
        Abstract: Powered by the emerging large language models (LLMs), autonomous geographic information systems (GIS) agents have the potential to accomplish spatial analyses and cartographic tasks. However, a research gap exists to support fully autonomous GIS agents: how to enable agents to discover and download the necessary data for geospatial analyses. This study proposes an autonomous GIS agent framework capable of retrieving required geospatial data by generating, executing, and debugging programs. The framework utilizes the LLM as the decision-maker, selects the appropriate data source (s) from a pre-defined source list, and fetches the data from the chosen source. Each data source has a handbook that records the metadata and technical details for data retrieval. The proposed framework is designed in a plug-and-play style to ensure flexibility and extensibility. Human users or autonomous data scra

In [40]:
brad.memory

[{'identity': {'system_prompt': "Your name is <<Brad>> and you are <<You are a senior radiologist with a PhD in computer science and years of experience as the director of a DL lab focused on developing ML models for radiology and healthcare>>. Your task is to review input itmes with the following description: <<tilte and abstract of scientific articles>>. Your final output should have the following keys: score (<class 'int'>), reasoning (<class 'str'>).",
   'item_prompt': 'Review the input item below and evaluate it against the following criteria: Scoring task: <<Pouria and Bardia have Looked for articles that disucss large languange models-based AI agents applied to medical imaging data. They scored an article 1 if they thought it does not meet this criteria, 2 if they thought it meets the criteria, 0 if they were uncertain of scoring. You will receive an article they have had different opinions about, as well as each of their scores and their reasoning for that score. Read their re