# **Annotate a Materials and Methods section about molecular dynamics 📝**

### 🎯 Objectives:
- Annotate the Materials and Methods section of a molecular dynamics study.
- Identify key entities (experimental parameters and computational tools) and label them as MOLECULE (`MOL`), FORCEFIELD (`FFM`), SIMULATION_TIME (`STIME`), TEMPERATURE (`TEMP`), SOFTWARE NAME (`SOFTNAME`), or SOFTWARE VERSION (`SOFTVERS`).
- Generate a structured JSON output suitable for downstream analysis.
- Visualize the annotation results for clarity and verification.
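
To make the label set concrete, here is a minimal sketch of how the six labels and a single annotated entity could be represented. The `EntityLabel` and `Entity` names are hypothetical and use only the standard library; the notebook's real models live in `src.pydantic_output_models` and may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum


class EntityLabel(str, Enum):
    """The six labels listed in the objectives (class name is illustrative)."""

    MOL = "MOL"            # molecule
    FFM = "FFM"            # force field
    STIME = "STIME"        # simulation time
    TEMP = "TEMP"          # temperature
    SOFTNAME = "SOFTNAME"  # software name
    SOFTVERS = "SOFTVERS"  # software version


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for one annotation: a label plus the matched text."""

    label: EntityLabel
    text: str

    def __post_init__(self) -> None:
        # Coerce plain strings such as "TEMP"; unknown labels raise ValueError.
        object.__setattr__(self, "label", EntityLabel(self.label))


example = Entity(label="TEMP", text="310 K")
print(example.label.value, example.text)
```

Restricting labels to an enumeration means a misspelled or invented label fails at construction time instead of silently reaching downstream analysis.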


## Load libraries

In [1]:
import sys
from pathlib import Path

sys.path.append("..")
from src.pydantic_output_models import ListOfEntities, ListOfEntitiesPositions
from src.utils import annotate, assign_all_instructor_clients, vizualize_llm_annotation



In [2]:
%load_ext watermark
%watermark

Last updated: 2025-12-04T13:24:28.296947+01:00

Python implementation: CPython
Python version       : 3.13.7
IPython version      : 8.13.2

Compiler    : GCC 14.3.0
OS          : Linux
Release     : 6.14.0-35-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 32
Architecture: 64bit



## Step 1: Define the text to annotate


In [3]:
TEXT_TO_ANNOTATE = """Molecular Dynamics Simulations

MD simulations were performed using GROMACS 4.0.7 software [17] with the OPLS-AA
force-field [18].L33 and P33 forms of β3 were soaked in a rhombic dodecahedral
simulation box with 60,622 TIP3P water molecules and 28 Na+ ions.
The distance between any atom of the protein and the box edges was set to at least 10 Å.
The total energy of the system was minimized twice (before and after the addition of the
ions) with a steepest descent algorithm. MD simulations were run under the NPT
thermodynamic ensemble and periodic boundary conditions were applied in all directions.
We used the weak coupling algorithm of Berendsen [19] to maintain the system at a
constant physiological temperature of 310 K using a coupling constant of 0.1 ps (protein
and water ions separately). Pressure was held constant using the Berendsen algorithm
[19] at 1 atm with a coupling constant of 1 ps. Water molecules were kept rigid using
the SETTLE algorithm [20]. All other bond lengths were constrained with the LINCS
algorithm [21], allowing a 2 fs time step. We used a short-range coulombic and van der
Waals cut-off of 10 Å and calculated the long-range electrostatic interactions using
the smooth particle mesh Ewald (PME) algorithm [22], [23] with a grid spacing of 1.2 Å
and an interpolation order of 4. The neighbor list was updated every 10 steps. After a
1 ns equilibration (with position restraints on the protein), each system was simulated
for 50 ns. For both systems, five 50 ns simulations were performed (using different
initial velocities) and one was extended until 100 ns for a total simulation time of
300 ns. Molecular conformations were saved every 100 ps for further analysis.
"""

## Step 2: Select the model

In [4]:
# Choices:
MODELS_OPENROUTER = [
    "openai/gpt-5",
    "openai/gpt-4o",
    "openai/gpt-oss-120b",
    "meta-llama/llama-4-maverick",
    "moonshotai/kimi-k2-thinking",
    "google/gemini-3-pro-preview",
    "qwen/qwen-2.5-72b-instruct",
    "deepseek/deepseek-chat-v3-0324",
    "allenai/olmo-3-32b-think"
]

# We will use "openai/gpt-5" for annotation
SELECTED_MODEL = "openai/gpt-5"
instructor_clients = assign_all_instructor_clients(MODELS_OPENROUTER)
llm_client = instructor_clients[SELECTED_MODEL]
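
The lookup `instructor_clients[SELECTED_MODEL]` suggests that `assign_all_instructor_clients` returns a mapping keyed by model name, although its source is not shown here. Under that assumption, a mistyped model name would only surface as a bare `KeyError`. The sketch below shows one way to fail earlier with a clearer message; `select_model` is a hypothetical helper, not part of `src.utils`.

```python
def select_model(selected: str, available: list[str]) -> str:
    """Return `selected` if it is one of `available`, else raise a clear error."""
    if selected not in available:
        options = ", ".join(available)
        raise ValueError(f"Unknown model {selected!r}; choose one of: {options}")
    return selected


# Shortened copy of MODELS_OPENROUTER, for illustration only.
models = ["openai/gpt-5", "openai/gpt-4o"]
print(select_model("openai/gpt-5", models))
```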

## Step 3: Annotate the text using JSON output for structured annotation

In [5]:
response = annotate(
    TEXT_TO_ANNOTATE,
    SELECTED_MODEL,
    llm_client,
    "json",
    prompt_json_path=Path("../prompts/json_few_shot.txt"),
    prompt_positions_path=Path("../prompts/json_with_positions_few_shot.txt")
)

response

ListOfEntities(entities=[Molecule(label='SOFTNAME', text='GROMACS'), Molecule(label='SOFTVERS', text='4.0.7'), Molecule(label='FFM', text='OPLS-AA'), Molecule(label='MOL', text='L33'), Molecule(label='MOL', text='P33'), Molecule(label='MOL', text='β3'), Molecule(label='FFM', text='TIP3P'), Molecule(label='MOL', text='water'), Molecule(label='MOL', text='Na+'), Molecule(label='TEMP', text='310 K'), Molecule(label='SOFTNAME', text='SETTLE'), Molecule(label='SOFTNAME', text='LINCS'), Molecule(label='STIME', text='1 ns'), Molecule(label='STIME', text='50 ns'), Molecule(label='STIME', text='50 ns'), Molecule(label='STIME', text='100 ns'), Molecule(label='STIME', text='300 ns')])
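
For downstream analysis, the flat entity list is often easier to use once grouped by label. The sketch below copies a subset of the (label, text) pairs from the output above into plain tuples, removes repeats such as the duplicated `50 ns`, and serializes the result with the standard `json` module. `group_by_label` is a hypothetical helper. If `ListOfEntities` is a pydantic v2 model, calling `model_dump_json()` on `response` would likely be the more direct route, but that is an assumption about code not shown here.

```python
import json
from collections import defaultdict

# Subset of the (label, text) pairs from the output above, as plain tuples.
entities = [
    ("SOFTNAME", "GROMACS"),
    ("SOFTVERS", "4.0.7"),
    ("FFM", "OPLS-AA"),
    ("MOL", "L33"),
    ("MOL", "P33"),
    ("TEMP", "310 K"),
    ("STIME", "50 ns"),
    ("STIME", "50 ns"),
]


def group_by_label(pairs):
    """Map each label to its unique texts, keeping first-seen order."""
    grouped = defaultdict(list)
    for label, text in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(entities)
# ensure_ascii=False keeps characters such as "β" or "Å" readable in the JSON.
print(json.dumps(grouped, indent=2, ensure_ascii=False))
```

Whether to deduplicate depends on the analysis: the two `50 ns` mentions refer to different statements in the text, so a position-aware output would keep both.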

## Step 4: Visualize the annotation results

In [6]:
vizualize_llm_annotation(response, TEXT_TO_ANNOTATE)

🧐 VISUALIZATION OF ENTITIES
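
Highlighting entities in the source text requires the character offsets of each entity string. How `vizualize_llm_annotation` does this is not shown here; the sketch below is one simple approach based on exact string matching, with `find_spans` as a hypothetical helper.

```python
import re


def find_spans(text: str, entity_text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every exact occurrence of entity_text."""
    # re.escape treats characters such as "+" in "Na+" literally.
    pattern = re.escape(entity_text)
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


sample = "each system was simulated for 50 ns. For both systems, five 50 ns simulations"
for start, end in find_spans(sample, "50 ns"):
    print(start, end, repr(sample[start:end]))
```

Exact matching has a known limitation: searching for `50 ns` would also match inside a longer token such as `150 ns`, so a stricter version would need to check the characters on either side of each match.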



