### Research Publication Analysis

This sample code demonstrates the use of an agentic architecture to analyse a research publication. The pattern uses agents that each specialise in interpreting one area: the fields of research, affiliation to universities, and information on research grants and funding. A fourth agent summarises their findings in a report.

This notebook uses the multi-agent example published by Microsoft as a reference: https://github.com/Azure-Samples/azureai-samples/blob/main/scenarios/Assistants/multi-agent/multi-agent.ipynb
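
The flow described above can be sketched without any API calls. This is a minimal illustration, not the notebook's implementation: the `run_pipeline` function and the toy agents are hypothetical stand-ins, and the shared thread is modelled as a plain list of strings.

```python
from typing import Callable, List

# Hypothetical stand-in for an assistant: it reads the shared thread
# (a list of messages) and returns one new message.
Agent = Callable[[List[str]], str]


def run_pipeline(thread: List[str], specialists: List[Agent], summariser: Agent) -> str:
    """Run each specialist in turn on the shared thread, then the summariser."""
    for agent in specialists:
        thread.append(agent(thread))
    report = summariser(thread)
    thread.append(report)
    return report


# Toy agents for illustration only; real agents would call a model.
def field_agent(thread: List[str]) -> str:
    return "fields: 4602"


def affiliation_agent(thread: List[str]) -> str:
    return "affiliation: none found"


def funding_agent(thread: List[str]) -> str:
    return "funding: none found"


def summariser(thread: List[str]) -> str:
    # Summarise everything after the initial user message.
    return " | ".join(thread[1:])


thread = ["Analyse."]
report = run_pipeline(thread, [field_agent, affiliation_agent, funding_agent], summariser)
```

Because every agent writes to the same thread, the summariser sees all three specialist findings without any explicit hand-off between agents.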


In [None]:
# Uncomment lines below to install dependencies; re-comment after installation.
# %pip install -r requirements.txt
# %pip install --upgrade openai

In [None]:
# Set-up
import os
import time
from dotenv import load_dotenv
from openai import AzureOpenAI
from openai.types.beta import Thread
from openai.types.beta import Assistant

load_dotenv()

assistant_client = AzureOpenAI(
    api_key=os.getenv("GPT4_AZURE_OPENAI_KEY"),
    api_version=os.getenv("GPT4_AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv(
        "GPT4_AZURE_OPENAI_ENDPOINT"
    ),
)

assistant_deployment_name = os.getenv(
    "GPT4_DEPLOYMENT_NAME"
)
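
Since `os.getenv` returns `None` for a variable that is not set, a missing entry in the `.env` file may only surface later as an authentication or deployment error. A small check such as the sketch below could catch that earlier. The helper name `missing_settings` is an assumption; the variable names are the ones read in the set-up cell.

```python
import os

# The variable names the set-up cell reads from the environment.
REQUIRED_VARS = [
    "GPT4_AZURE_OPENAI_KEY",
    "GPT4_AZURE_OPENAI_API_VERSION",
    "GPT4_AZURE_OPENAI_ENDPOINT",
    "GPT4_DEPLOYMENT_NAME",
]


def missing_settings(env=os.environ):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_settings()` after `load_dotenv()` and stopping if the list is non-empty is one way to use it.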

In [None]:
# Prompt - FOR Code Detection Assistant
name_forcode_detector = "forcode_detection_assistant"
instructions_forcode_detector = """You are an assistant that reviews research papers and determines their field of research. 

Your job is to determine the fields of research covered in the provided research paper. Review the entire research paper and pick the most appropriate research code based on your analysis. If the paper covers cross-disciplinary fields of research, pick at most one additional code. Start with the primary industry being researched. If an additional code is required, pick it from the same category as the primary industry. Do not pick codes that fall in different categories. The research paper is the PDF document available in your vector store.

The reference for the listing of codes along with their categories and descriptions is available below in CSV format. Ensure you use this list for your analysis.

Print the research codes and the reason you picked them.

Code,Category,Description
3001,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Agricultural biotechnology
3002,AGRICULTURAL VETERINARY AND FOOD SCIENCES,"Agriculture, land and farm management"
3003,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Animal production
3004,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Crop and pasture production
3005,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Fisheries sciences
3006,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Food sciences
3007,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Forestry sciences
3008,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Horticultural production
3009,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Veterinary sciences
3099,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Other agricultural veterinary and food sciences
3101,BIOLOGICAL SCIENCES,Biochemistry and cell biology
3102,BIOLOGICAL SCIENCES,Bioinformatics and computational biology
3103,BIOLOGICAL SCIENCES,Ecology
3104,BIOLOGICAL SCIENCES,Evolutionary biology
3105,BIOLOGICAL SCIENCES,Genetics
3106,BIOLOGICAL SCIENCES,Industrial biotechnology
3107,BIOLOGICAL SCIENCES,Microbiology
3108,BIOLOGICAL SCIENCES,Plant biology
3109,BIOLOGICAL SCIENCES,Zoology
3199,BIOLOGICAL SCIENCES,Other biological sciences
3201,BIOMEDICAL AND CLINICAL SCIENCES,Cardiovascular medicine and haematology
3202,BIOMEDICAL AND CLINICAL SCIENCES,Clinical sciences
3203,BIOMEDICAL AND CLINICAL SCIENCES,Dentistry
3204,BIOMEDICAL AND CLINICAL SCIENCES,Immunology
3205,BIOMEDICAL AND CLINICAL SCIENCES,Medical biochemistry and metabolomics
3206,BIOMEDICAL AND CLINICAL SCIENCES,Medical biotechnology
3207,BIOMEDICAL AND CLINICAL SCIENCES,Medical microbiology
3208,BIOMEDICAL AND CLINICAL SCIENCES,Medical physiology
3209,BIOMEDICAL AND CLINICAL SCIENCES,Neurosciences
3210,BIOMEDICAL AND CLINICAL SCIENCES,Nutrition and dietetics
3211,BIOMEDICAL AND CLINICAL SCIENCES,Oncology and carcinogenesis
3212,BIOMEDICAL AND CLINICAL SCIENCES,Ophthalmology and optometry
3213,BIOMEDICAL AND CLINICAL SCIENCES,Paediatrics
3214,BIOMEDICAL AND CLINICAL SCIENCES,Pharmacology and pharmaceutical sciences
3215,BIOMEDICAL AND CLINICAL SCIENCES,Reproductive medicine
3299,BIOMEDICAL AND CLINICAL SCIENCES,Other biomedical and clinical sciences
3301,BUILT ENVIRONMENT AND DESIGN,Architecture
3302,BUILT ENVIRONMENT AND DESIGN,Building
3303,BUILT ENVIRONMENT AND DESIGN,Design
3304,BUILT ENVIRONMENT AND DESIGN,Urban and regional planning
3399,BUILT ENVIRONMENT AND DESIGN,Other built environment and design
3401,CHEMICAL SCIENCES,Analytical chemistry
3402,CHEMICAL SCIENCES,Inorganic chemistry
3403,CHEMICAL SCIENCES,Macromolecular and materials chemistry
3404,CHEMICAL SCIENCES,Medicinal and biomolecular chemistry
3405,CHEMICAL SCIENCES,Organic chemistry
3406,CHEMICAL SCIENCES,Physical chemistry
3407,CHEMICAL SCIENCES,Theoretical and computational chemistry
3499,CHEMICAL SCIENCES,Other chemical sciences
3501,COMMERCE MANAGEMENT TOURISM AND SERVICES,"Accounting, auditing and accountability"
3502,COMMERCE MANAGEMENT TOURISM AND SERVICES,"Banking, finance and investment"
3503,COMMERCE MANAGEMENT TOURISM AND SERVICES,Business systems in context
3504,COMMERCE MANAGEMENT TOURISM AND SERVICES,Commercial services
3505,COMMERCE MANAGEMENT TOURISM AND SERVICES,Human resources and industrial relations
3506,COMMERCE MANAGEMENT TOURISM AND SERVICES,Marketing
3507,COMMERCE MANAGEMENT TOURISM AND SERVICES,"Strategy, management and organisational behaviour"
3508,COMMERCE MANAGEMENT TOURISM AND SERVICES,Tourism
3509,COMMERCE MANAGEMENT TOURISM AND SERVICES,"Transportation, logistics and supply chains"
3599,COMMERCE MANAGEMENT TOURISM AND SERVICES,"Other commerce, management, tourism and services"
3601,CREATIVE ARTS AND WRITING,"Art history, theory and criticism"
3602,CREATIVE ARTS AND WRITING,Creative and professional writing
3603,CREATIVE ARTS AND WRITING,Music
3604,CREATIVE ARTS AND WRITING,Performing arts
3605,CREATIVE ARTS AND WRITING,Screen and digital media
3606,CREATIVE ARTS AND WRITING,Visual arts
3699,CREATIVE ARTS AND WRITING,Other creative arts and writing
3701,EARTH SCIENCES,Atmospheric sciences
3702,EARTH SCIENCES,Climate change science
3703,EARTH SCIENCES,Geochemistry
3704,EARTH SCIENCES,Geoinformatics
3705,EARTH SCIENCES,Geology
3706,EARTH SCIENCES,Geophysics
3707,EARTH SCIENCES,Hydrology
3708,EARTH SCIENCES,Oceanography
3709,EARTH SCIENCES,Physical geography and environmental geoscience
3799,EARTH SCIENCES,Other earth sciences
3801,ECONOMICS,Applied economics
3802,ECONOMICS,Econometrics
3803,ECONOMICS,Economic theory
3899,ECONOMICS,Other economics
3901,EDUCATION,Curriculum and pedagogy
3902,EDUCATION,"Education policy, sociology and philosophy"
3903,EDUCATION,Education systems
3904,EDUCATION,Specialist studies in education
3999,EDUCATION,Other education
4001,ENGINEERING,Aerospace engineering
4002,ENGINEERING,Automotive engineering
4003,ENGINEERING,Biomedical engineering
4004,ENGINEERING,Chemical engineering
4005,ENGINEERING,Civil engineering
4006,ENGINEERING,Communications engineering
4007,ENGINEERING,"Control engineering, mechatronics and robotics"
4008,ENGINEERING,Electrical engineering
4009,ENGINEERING,"Electronics, sensors and digital hardware"
4010,ENGINEERING,Engineering practice and education
4011,ENGINEERING,Environmental engineering
4012,ENGINEERING,Fluid mechanics and thermal engineering
4013,ENGINEERING,Geomatic engineering
4014,ENGINEERING,Manufacturing engineering
4015,ENGINEERING,Maritime engineering
4016,ENGINEERING,Materials engineering
4017,ENGINEERING,Mechanical engineering
4018,ENGINEERING,Nanotechnology
4019,ENGINEERING,Resources engineering and extractive metallurgy
4099,ENGINEERING,Other engineering
4101,ENVIRONMENTAL SCIENCES,Climate change impacts and adaptation
4102,ENVIRONMENTAL SCIENCES,Ecological applications
4103,ENVIRONMENTAL SCIENCES,Environmental biotechnology
4104,ENVIRONMENTAL SCIENCES,Environmental management
4105,ENVIRONMENTAL SCIENCES,Pollution and contamination
4106,ENVIRONMENTAL SCIENCES,Soil sciences
4199,ENVIRONMENTAL SCIENCES,Other environmental sciences
4201,HEALTH SCIENCES,Allied health and rehabilitation science
4202,HEALTH SCIENCES,Epidemiology
4203,HEALTH SCIENCES,Health services and systems
4204,HEALTH SCIENCES,Midwifery
4205,HEALTH SCIENCES,Nursing
4206,HEALTH SCIENCES,Public health
4207,HEALTH SCIENCES,Sports science and exercise
4208,HEALTH SCIENCES,"Traditional, complementary and integrative medicine"
4299,HEALTH SCIENCES,Other health sciences
4301,HISTORY HERITAGE AND ARCHAEOLOGY,Archaeology
4302,HISTORY HERITAGE AND ARCHAEOLOGY,"Heritage, archive and museum studies"
4303,HISTORY HERITAGE AND ARCHAEOLOGY,Historical studies
4399,HISTORY HERITAGE AND ARCHAEOLOGY,"Other history, heritage and archaeology"
4401,HUMAN SOCIETY,Anthropology
4402,HUMAN SOCIETY,Criminology
4403,HUMAN SOCIETY,Demography
4404,HUMAN SOCIETY,Development studies
4405,HUMAN SOCIETY,Gender studies
4406,HUMAN SOCIETY,Human geography
4407,HUMAN SOCIETY,Policy and administration
4408,HUMAN SOCIETY,Political science
4409,HUMAN SOCIETY,Social work
4410,HUMAN SOCIETY,Sociology
4499,HUMAN SOCIETY,Other human society
4501,INDIGENOUS STUDIES,"Aboriginal and Torres Strait Islander culture, language and history"
4502,INDIGENOUS STUDIES,Aboriginal and Torres Strait Islander education
4503,INDIGENOUS STUDIES,Aboriginal and Torres Strait Islander environmental knowledges and management
4504,INDIGENOUS STUDIES,Aboriginal and Torres Strait Islander health and wellbeing
4505,INDIGENOUS STUDIES,"Aboriginal and Torres Strait Islander peoples, society and community"
4506,INDIGENOUS STUDIES,Aboriginal and Torres Strait Islander sciences
4507,INDIGENOUS STUDIES,"Te ahurea, reo me te hītori o te Māori (Māori culture, language and history)"
4508,INDIGENOUS STUDIES,Mātauranga Māori (Māori education)
4509,INDIGENOUS STUDIES,Ngā mātauranga taiao o te Māori (Māori environmental knowledges)
4510,INDIGENOUS STUDIES,Te hauora me te oranga o te Māori (Māori health and wellbeing)
4511,INDIGENOUS STUDIES,"Ngā tāngata, te porihanga me ngā hapori o te Māori (Māori peoples, society and community)"
4512,INDIGENOUS STUDIES,Ngā pūtaiao Māori (Māori sciences)
4513,INDIGENOUS STUDIES,"Pacific Peoples culture, language and history"
4514,INDIGENOUS STUDIES,Pacific Peoples education
4515,INDIGENOUS STUDIES,Pacific Peoples environmental knowledges
4516,INDIGENOUS STUDIES,Pacific Peoples health and wellbeing
4517,INDIGENOUS STUDIES,Pacific Peoples sciences
4518,INDIGENOUS STUDIES,Pacific Peoples society and community
4519,INDIGENOUS STUDIES,"Other Indigenous data, methodologies and global Indigenous studies"
4599,INDIGENOUS STUDIES,Other Indigenous studies
4601,INFORMATION AND COMPUTING SCIENCES,Applied computing
4602,INFORMATION AND COMPUTING SCIENCES,Artificial intelligence
4603,INFORMATION AND COMPUTING SCIENCES,Computer vision and multimedia computation
4604,INFORMATION AND COMPUTING SCIENCES,Cybersecurity and privacy
4605,INFORMATION AND COMPUTING SCIENCES,Data management and data science
4606,INFORMATION AND COMPUTING SCIENCES,Distributed computing and systems software
4607,INFORMATION AND COMPUTING SCIENCES,"Graphics, augmented reality and games"
4608,INFORMATION AND COMPUTING SCIENCES,Human-centred computing
4609,INFORMATION AND COMPUTING SCIENCES,Information systems
4610,INFORMATION AND COMPUTING SCIENCES,Library and information studies 
4611,INFORMATION AND COMPUTING SCIENCES,Machine learning
4612,INFORMATION AND COMPUTING SCIENCES,Software engineering
4613,INFORMATION AND COMPUTING SCIENCES,Theory of computation
4699,INFORMATION AND COMPUTING SCIENCES,Other information and computing sciences
4701,LANGUAGE COMMUNICATION AND CULTURE,Communication and media studies
4702,LANGUAGE COMMUNICATION AND CULTURE,Cultural studies
4703,LANGUAGE COMMUNICATION AND CULTURE,Language studies
4704,LANGUAGE COMMUNICATION AND CULTURE,Linguistics
4705,LANGUAGE COMMUNICATION AND CULTURE,Literary studies
4799,LANGUAGE COMMUNICATION AND CULTURE,"Other language, communication and culture"
4801,LAW AND LEGAL STUDIES,Commercial law
4802,LAW AND LEGAL STUDIES,Environmental and resources law
4803,LAW AND LEGAL STUDIES,International and comparative law
4804,LAW AND LEGAL STUDIES,Law in context
4805,LAW AND LEGAL STUDIES,Legal systems
4806,LAW AND LEGAL STUDIES,Private law and civil obligations
4807,LAW AND LEGAL STUDIES,Public law
4899,LAW AND LEGAL STUDIES,Other law and legal studies
4901,MATHEMATICAL SCIENCES,Applied mathematics
4902,MATHEMATICAL SCIENCES,Mathematical physics
4903,MATHEMATICAL SCIENCES,Numerical and computational mathematics
4904,MATHEMATICAL SCIENCES,Pure mathematics
4905,MATHEMATICAL SCIENCES,Statistics
4999,MATHEMATICAL SCIENCES,Other mathematical sciences
5001,PHILOSOPHY AND RELIGIOUS STUDIES,Applied ethics
5002,PHILOSOPHY AND RELIGIOUS STUDIES,History and philosophy of specific fields
5003,PHILOSOPHY AND RELIGIOUS STUDIES,Philosophy
5004,PHILOSOPHY AND RELIGIOUS STUDIES,Religious studies
5005,PHILOSOPHY AND RELIGIOUS STUDIES,Theology
5099,PHILOSOPHY AND RELIGIOUS STUDIES,Other philosophy and religious studies
5101,PHYSICAL SCIENCES,Astronomical sciences
5102,PHYSICAL SCIENCES,"Atomic, molecular and optical physics"
5103,PHYSICAL SCIENCES,Classical physics
5104,PHYSICAL SCIENCES,Condensed matter physics
5105,PHYSICAL SCIENCES,Medical and biological physics
5106,PHYSICAL SCIENCES,Nuclear and plasma physics
5107,PHYSICAL SCIENCES,Particle and high energy physics
5108,PHYSICAL SCIENCES,Quantum physics
5109,PHYSICAL SCIENCES,Space sciences
5110,PHYSICAL SCIENCES,Synchrotrons and accelerators
5199,PHYSICAL SCIENCES,Other physical sciences
5201,PSYCHOLOGY,Applied and developmental psychology
5202,PSYCHOLOGY,Biological psychology 
5203,PSYCHOLOGY,Clinical and health psychology
5204,PSYCHOLOGY,Cognitive and computational psychology
5205,PSYCHOLOGY,Social and personality psychology
5299,PSYCHOLOGY,Other psychology 
""" # Update prompt for FOR Code detection here if needed

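The prompt asks the assistant to pick codes from the reference table and to keep them within one category. One way to sanity-check a reply against those rules is sketched below, using a short excerpt of the table. The helper names are hypothetical. The parser rejoins any extra columns so that descriptions containing commas survive whether or not they are quoted. The four-digit pattern would also match years such as 2023, so treat the "unknown" list as a hint rather than a verdict.

```python
import csv
import re

# A short excerpt of the reference table, in the same three-column layout.
SAMPLE_TABLE = """Code,Category,Description
3002,AGRICULTURAL VETERINARY AND FOOD SCIENCES,Agriculture, land and farm management
4602,INFORMATION AND COMPUTING SCIENCES,Artificial intelligence
4611,INFORMATION AND COMPUTING SCIENCES,Machine learning
"""


def parse_code_table(text):
    """Map each four-digit code to a (category, description) pair."""
    table = {}
    for row in csv.reader(text.splitlines()):
        # Skip the header and any line that does not start with a code.
        if len(row) >= 3 and re.fullmatch(r"\d{4}", row[0]):
            # Rejoin extra columns produced by commas inside the description.
            table[row[0]] = (row[1], ",".join(row[2:]).strip())
    return table


def check_codes(reply, table):
    """Return (known, unknown) four-digit numbers mentioned in a reply."""
    found = re.findall(r"\b\d{4}\b", reply)
    known = [code for code in found if code in table]
    unknown = [code for code in found if code not in table]
    return known, unknown


def same_category(codes, table):
    """True if all the given codes fall in a single category."""
    return len({table[code][0] for code in codes}) <= 1


table = parse_code_table(SAMPLE_TABLE)
known, unknown = check_codes("Primary code 4611; secondary code 4602. Not 9999.", table)
```

In the notebook, the full table could be parsed from `instructions_forcode_detector` in the same way, since lines that do not start with a four-digit code are ignored.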
In [None]:
# Prompt - Affiliation Detection Assistant
name_affiliation_detector = "affiliation_detection_assistant"
instructions_affiliation_detector = """You are an assistant that reviews research papers and detects affiliations to #university# University.

Review the provided research paper in your vector store and detect any mention of #university# University (including variations). Do not include watermark or copyright information. Watermarks are rare. For each author, extract their full name and organisation regardless of their affiliation to #university# University. Note that #university# may be listed in acknowledgments rather than author affiliations so ensure you detect that as well. There are many variations in how #university# is written (e.g., #misspeltuninames#, etc.) so ensure all variations are included. Note that the affiliation may appear in different parts of the paper. Do not include any other information in your findings such as acknowledgements and funding information.

Provide a summary of your findings.""" # Update prompt for affiliation detection here if needed
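
The prompt above appears to use `#university#` and `#misspeltuninames#` as placeholders to be filled in before the assistant is created. Assuming that is the intent, a substitution step could look like the sketch below. The `fill_placeholders` helper and the example values are hypothetical and are not part of the original notebook.

```python
import re


def fill_placeholders(template, values):
    """Replace each #name# token in template with values[name]."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than send a prompt with an unfilled token.
            raise KeyError(f"no value supplied for placeholder #{name}#")
        return values[name]

    return re.sub(r"#(\w+)#", substitute, template)


# Made-up values for illustration only.
template = "Detect any mention of #university# University (e.g., #misspeltuninames#, etc.)."
prompt = fill_placeholders(
    template,
    {
        "university": "Example",
        "misspeltuninames": "Exampel University, Univ. of Example",
    },
)
```

Applied to the notebook, the call would wrap `instructions_affiliation_detector` before it is passed to `assistants.create`.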

In [None]:
# Prompt - Funding Detection Assistant
name_funding_detector = "funding_detection_assistant"
instructions_funding_detector = """You are an assistant that reviews research papers and detects information on funding.

Review the provided research paper in your vector store and detect any mention of funding-related information. Identify and extract funding information, specifically funding body names and grant numbers. Note that this information is often located in different sections of papers (acknowledgments, funding sections, footnotes, etc.). Do not note researcher names alongside the funding information. Be very specific about ensuring your findings relate to funding for the research.

Provide a summary of your findings.""" # Update prompt for funding detection here if needed

In [None]:
# Prompt - Summariser Assistant
name_pubanalysis_summariser = "pubanalysis_summariser_assistant"
instructions_pubanalysis_summariser_assistant = """You are an assistant that is an expert at analysing research papers. 

Use the information available in this thread from the forcode_detection_assistant, affiliation_detection_assistant and funding_detection_assistant and write a summary report.

Ensure the summary only captures the findings from the analysis conducted by the three assistants.""" # Update prompt for generating the report here if needed

In [None]:
# Create Assistants
forcode_detection_assistant = assistant_client.beta.assistants.create(
    name=name_forcode_detector, instructions=instructions_forcode_detector, model=assistant_deployment_name, tools=[{"type":"file_search"}], temperature=0.01
)

affiliation_detection_assistant = assistant_client.beta.assistants.create(
    name=name_affiliation_detector, instructions=instructions_affiliation_detector, model=assistant_deployment_name, tools=[{"type":"file_search"}], temperature=0.01
)

funding_detection_assistant = assistant_client.beta.assistants.create(
    name=name_funding_detector, instructions=instructions_funding_detector, model=assistant_deployment_name, tools=[{"type":"file_search"}], temperature=0.01
)

pubanalysis_summariser_assistant = assistant_client.beta.assistants.create(
    name=name_pubanalysis_summariser, instructions=instructions_pubanalysis_summariser_assistant, model=assistant_deployment_name, temperature=0.01
)

In [None]:
# Create Vector Store
vector_store = assistant_client.beta.vector_stores.create(name="Research Publication")

file_paths = [""] # Add file path for research publication here
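file_streams = [open(path, "rb") for path in file_paths]

try:
    file_batch = assistant_client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=file_streams
    )
finally:
    # Close the file handles once the upload has finished or failed.
    for stream in file_streams:
        stream.close()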
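
`upload_and_poll` returns once the batch reaches a final state, but that state is not necessarily a success. The sketch below shows one way to check the result before running the assistants. It assumes the returned batch object exposes `status` and `file_counts` (with `completed`, `failed` and `total`) as in the OpenAI Python SDK's vector store file batch type. The helper name `batch_succeeded` is hypothetical.

```python
def batch_succeeded(batch):
    """True when the batch finished and every file was processed (hypothetical helper)."""
    counts = batch.file_counts
    return (
        batch.status == "completed"
        and counts.failed == 0
        and counts.completed == counts.total
    )
```

In the notebook this could be used as `if not batch_succeeded(file_batch): print(file_batch.status, file_batch.file_counts)`.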

In [None]:
# Attach Vector Store to Assistants
forcode_detection_assistant = assistant_client.beta.assistants.update(
    assistant_id=forcode_detection_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
      )

affiliation_detection_assistant = assistant_client.beta.assistants.update(
    assistant_id=affiliation_detection_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
      )

funding_detection_assistant = assistant_client.beta.assistants.update(
    assistant_id=funding_detection_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
      )

In [None]:
# Create Thread
thread = assistant_client.beta.threads.create()

message = assistant_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyse."
)

In [None]:
# Function for Running Assistants
def run_assistant(assistant):
  # Start a run of the given assistant on the shared thread.
  run = assistant_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
  )

  # Poll until the run leaves its transient states.
  while run.status in ['queued', 'in_progress', 'cancelling']:
    time.sleep(1)
    run = assistant_client.beta.threads.runs.retrieve(
      thread_id=thread.id,
      run_id=run.id
    )

  if run.status != 'completed':
    # Tool calls are not handled here, so 'requires_action' also ends up in this branch.
    print(run.status)
    return []

  # Return only the messages produced by this run.
  messages = assistant_client.beta.threads.messages.list(
    thread_id=thread.id,
    run_id=run.id
  )
  return messages.data
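
The polling loop above waits indefinitely if a run never leaves a pending state. A bounded variant is sketched below as a generic helper. The name `poll_until_done` and its parameters are assumptions, and `get_status` stands for any callable that returns the current run status.

```python
import time

# The transient states the notebook's loop waits on.
PENDING = ("queued", "in_progress", "cancelling")


def poll_until_done(get_status, interval=1.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it leaves PENDING or the timeout elapses."""
    deadline = clock() + timeout
    status = get_status()
    while status in PENDING:
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} s")
        sleep(interval)
        status = get_status()
    return status
```

With the notebook's client, `get_status` could be `lambda: assistant_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status`. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.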

In [None]:
# Function for Printing Analysis
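def print_analysis(analysis):
    # Messages are listed newest first, so reverse them for chronological order.
    for msg in reversed(analysis):
        for content_item in msg.content:
            # Skip non-text content such as image files.
            if content_item.type == "text":
                print(content_item.text.value)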

In [None]:
# Run FOR Code Detection Assistant
analysis = run_assistant(forcode_detection_assistant)
print_analysis(analysis)

In [None]:
# Run Affiliation Detection Assistant
analysis = run_assistant(affiliation_detection_assistant)
print_analysis(analysis)

In [None]:
# Run Funding Detection Assistant
analysis = run_assistant(funding_detection_assistant)
print_analysis(analysis)

In [None]:
# Run Summariser Assistant
analysis = run_assistant(pubanalysis_summariser_assistant)
print_analysis(analysis)

In [None]:
# Clean-up
response = assistant_client.beta.assistants.delete(forcode_detection_assistant.id)
response = assistant_client.beta.assistants.delete(affiliation_detection_assistant.id)
response = assistant_client.beta.assistants.delete(funding_detection_assistant.id)
response = assistant_client.beta.assistants.delete(pubanalysis_summariser_assistant.id)
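
The clean-up cell removes the four assistants but leaves the vector store and the thread in place. If those should be removed as well, a helper along the following lines might be used. It is a sketch that assumes an SDK version exposing `beta.vector_stores.delete` and `beta.threads.delete`, matching the `beta.vector_stores` namespace the notebook already uses. The function name `clean_up` is hypothetical.

```python
def clean_up(client, assistant_ids, vector_store_id=None, thread_id=None):
    """Delete the assistants, then the vector store and thread if IDs are given."""
    for assistant_id in assistant_ids:
        client.beta.assistants.delete(assistant_id)
    if vector_store_id is not None:
        client.beta.vector_stores.delete(vector_store_id)
    if thread_id is not None:
        client.beta.threads.delete(thread_id)
```

In this notebook the call would take `assistant_client`, the four assistant IDs, `vector_store.id` and `thread.id`.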