From a49969591407c21714c8a6f15e18bc4b697c316d Mon Sep 17 00:00:00 2001 From: Amruthesh Thirumalaiswamy Date: Fri, 10 Oct 2025 01:36:53 -0700 Subject: [PATCH 1/4] Add: IMD-documentation and tutorial-example --- doc/source/examples/other/README.rst | 1 + doc/source/examples/other/streaming_imd.ipynb | 325 +++++++++++++++++ doc/source/formats/reference/imd.rst | 330 ++++++++++++++++++ 3 files changed, 656 insertions(+) create mode 100644 doc/source/examples/other/streaming_imd.ipynb create mode 100644 doc/source/formats/reference/imd.rst diff --git a/doc/source/examples/other/README.rst b/doc/source/examples/other/README.rst index 84ab1fb0b..05edc9af1 100644 --- a/doc/source/examples/other/README.rst +++ b/doc/source/examples/other/README.rst @@ -9,3 +9,4 @@ Other :maxdepth: 1 parmed_sim + streaming_imd diff --git a/doc/source/examples/other/streaming_imd.ipynb b/doc/source/examples/other/streaming_imd.ipynb new file mode 100644 index 000000000..fd574e06f --- /dev/null +++ b/doc/source/examples/other/streaming_imd.ipynb @@ -0,0 +1,325 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d090b35a", + "metadata": {}, + "source": [ + "# Real-time Streaming Analysis with IMDv3\n", + "\n", + "This tutorial demonstrates how to use MDAnalysis for real-time streaming analysis of molecular dynamics simulations using the Interactive Molecular Dynamics (IMD) v3 protocol. You'll learn how to connect to running simulations and perform live analysis as the simulation progresses.\n", + "\n", + "**Streaming** involves processing data in real-time as it is generated, rather than storing it for later analysis. 
In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk.\n", + "\n", + "This is achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol.\n", + "\n", + "## What it covers\n", + "\n", + "- How to set up streaming connections to MD engines\n", + "- Real-time monitoring\n", + "- Live analysis workflows\n", + "\n", + "## Prerequisites\n", + "\n", + "Before starting, you'll need:\n", + "- MDAnalysis with IMD support\n", + "- The `imdclient` package (≥ 0.2.2)\n", + "- A running MD simulation with IMD enabled (the examples are largely engine-agnostic)" + ] + }, + { + "cell_type": "markdown", + "id": "e2168297", + "metadata": {}, + "source": [ + "## Installation and Setup\n", + "\n", + "The IMDReader requires the `imdclient` package. Let's check if everything is properly installed:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6024bb10", + "metadata": {}, + "outputs": [], + "source": [ + "# Install required packages (uncomment if needed; the quotes keep the shell from treating '>' as a redirect)\n", + "# !pip install 'imdclient>=0.2.2'\n", + "\n", + "import warnings\n", + "warnings.filterwarnings('ignore')\n", + "\n", + "import MDAnalysis as mda\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from datetime import datetime\n", + "import time\n", + "\n", + "# Check if IMD support is available\n", + "try:\n", + " from MDAnalysis.coordinates.IMD import IMDReader, HAS_IMDCLIENT\n", + " print(f\"IMD support available: {HAS_IMDCLIENT}\")\n", + " if HAS_IMDCLIENT:\n", + " import imdclient\n", + " print(f\"imdclient version: {imdclient.__version__}\")\n", + " print(\"✅ Ready for streaming analysis!\")\n", + " else:\n", + " print(\"❌ IMD support not available\")\n", + "except ImportError as e:\n", + " print(f\"❌ IMD support not available: 
{e}\")\n", + " print(\"Please install imdclient: pip install imdclient>=0.2.2\")" + ] + }, + { + "cell_type": "markdown", + "id": "e698ca2d", + "metadata": {}, + "source": [ + "## Setting Up a Simulation with IMD\n", + "\n", + "Before we can demonstrate streaming analysis, we need a simulation running with IMD enabled. Here are configuration examples for different MD engines:\n", + "\n", + "### GROMACS Setup\n", + "\n", + "Add these comprehensive IMD settings to your `.mdp` file:\n", + "```code\n", + "; IMD settings for v3 protocol\n", + "IMD-group = System ; Group to stream (typically System)\n", + "IMD-version = 3 ; Use IMDv3 protocol (required for MDAnalysis)\n", + "IMD-nst = 1 ; Frequency of data transmission (every step)\n", + "IMD-time = No ; Send time information\n", + "IMD-coords = Yes ; Send atomic coordinates (essential)\n", + "IMD-vels = No ; Send velocities (optional)\n", + "IMD-forces = No ; Send forces (optional)\n", + "IMD-box = No ; Send box dimensions (optional)\n", + "IMD-unwrap = No ; Unwrap coordinates across PBC\n", + "IMD-energies = No ; Send energy information (optional)\n", + "```\n", + "\n", + "Run the simulation:\n", + "```bash\n", + "gmx mdrun -v -nt 4 -imdwait -imdport 8889\n", + "```\n", + "\n", + "### LAMMPS Setup\n", + "\n", + "Use the comprehensive IMD fix in your input script:\n", + "```code\n", + "# IMD setup - full parameter specification\n", + "fix ID group-ID imd trate version 3 unwrap time box coordinates velocities forces \n", + "\n", + "# Example with specific values:\n", + "fix imd all imd 8889 trate 1 version 3 unwrap on time on box on coordinates on velocities on forces on\n", + "```\n", + "\n", + "Run your LAMMPS simulation as usual.\n", + "\n", + "### NAMD Setup\n", + "\n", + "Add comprehensive IMD configuration to your NAMD configuration file:\n", + "```code\n", + "# IMD Settings\n", + "IMDon yes\n", + "IMDport 8889 # Must match client port\n", + "IMDwait yes # Wait for client connection\n", + "IMDfreq 1 # Frequency of 
sending data\n", + "\n", + "# Data transmission settings\n", + "IMDsendPositions yes # Send atomic coordinates\n", + "IMDsendEnergies yes # Send energy information\n", + "IMDsendTime yes # Send timing data\n", + "IMDsendBoxDimensions yes # Send simulation box info\n", + "IMDsendVelocities yes # Send atomic velocities\n", + "IMDsendForces yes # Send atomic forces\n", + "IMDwrapPositions no # Don't wrap coordinates\n", + "```\n", + "\n", + "Run your NAMD simulation as usual." + ] + }, + { + "cell_type": "markdown", + "id": "47c27d17", + "metadata": {}, + "source": [ + "## Example 1: Simple IMD Analysis\n", + "\n", + "This example shows the basics of connecting to a live simulation and performing simple analysis." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d9181c5c", + "metadata": {}, + "outputs": [], + "source": [ + "# Simple Example: Basic IMD Connection and Analysis\n", + "\n", + "import MDAnalysis as mda\n", + "import numpy as np\n", + "\n", + "def simple_imd_analysis():\n", + " \"\"\"\n", + " Simple example of connecting to a live simulation and analyzing it.\n", + " \n", + " This assumes you have GROMACS running with IMD enabled on localhost:8889\n", + " \"\"\"\n", + " try:\n", + " # Connect to the live simulation\n", + " u = mda.Universe(\"topol.tpr\", \"imd://localhost:8889\")\n", + " \n", + " print(f\"Connected to simulation: {u.atoms.n_atoms} atoms\")\n", + " \n", + " # Basic analysis on streaming frames\n", + " protein = u.select_atoms(\"protein\")\n", + " \n", + " frame_count = 0\n", + " for ts in u.trajectory:\n", + " # Calculate radius of gyration\n", + " rg = protein.radius_of_gyration()\n", + " \n", + " print(f\"Frame {frame_count}: Time = {ts.time:.1f} ps, Rg = {rg:.2f} Å\")\n", + " \n", + " frame_count += 1\n", + " \n", + " # Stop after 10 frames for this simple example\n", + " if frame_count >= 10:\n", + " break\n", + " \n", + " print(\"Simple analysis completed!\")\n", + " \n", + " except Exception as e:\n", + " 
print(f\"Connection failed: {e}\")\n", + " print(\"Make sure your simulation is running with IMD enabled\")\n", + " print(\"Example: gmx mdrun -s topol.tpr -imdport 8889 -imdwait\")\n", + "\n", + "# To run this example, uncomment the line below:\n", + "# simple_imd_analysis()" + ] + }, + { + "cell_type": "markdown", + "id": "43179ebe", + "metadata": {}, + "source": [ + "## Example 2: Advanced Live Streaming with Real-time Plotting\n", + "\n", + "This example demonstrates continuous monitoring with live plots that update as the simulation runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "30bafb63", + "metadata": {}, + "outputs": [], + "source": [ + "# Advanced Example: Live Streaming Analysis with Real-time Plotting\n", + "\n", + "import MDAnalysis as mda\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from collections import deque\n", + "import time\n", + "\n", + "def advanced_live_streaming():\n", + " \"\"\"\n", + " Advanced example with live plotting and continuous monitoring.\n", + " \n", + " This connects to your simulation and creates real-time plots.\n", + " \"\"\"\n", + " try:\n", + " # Connect to the simulation\n", + " u = mda.Universe(\"system.tpr\", \"imd://localhost:8889\")\n", + " \n", + " print(f\"Starting live analysis of {u.atoms.n_atoms} atoms\")\n", + " \n", + " # Setup for real-time plotting\n", + " plt.ion() # Interactive mode\n", + " fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))\n", + " \n", + " # Data storage for plotting\n", + " times = deque(maxlen=100) # Keep last 100 points\n", + " rg_values = deque(maxlen=100)\n", + " energies = deque(maxlen=100)\n", + " \n", + " protein = u.select_atoms(\"protein\")\n", + " \n", + " frame_count = 0\n", + " for ts in u.trajectory:\n", + " current_time = ts.time\n", + " \n", + " # Calculate properties\n", + " rg = protein.radius_of_gyration()\n", + " \n", + " # Simulate potential energy (real simulations would have this)\n", + " # In real IMD, you 
might get this from the simulation engine\n", + " potential_energy = -1000 + np.random.normal(0, 50)\n", + " \n", + " # Store data\n", + " times.append(current_time)\n", + " rg_values.append(rg)\n", + " energies.append(potential_energy)\n", + " \n", + " # Update plots every 5 frames\n", + " if frame_count % 5 == 0:\n", + " # Clear and replot\n", + " ax1.clear()\n", + " ax2.clear()\n", + " \n", + " # Plot radius of gyration\n", + " ax1.plot(list(times), list(rg_values), 'b-', linewidth=2)\n", + " ax1.set_xlabel('Time (ps)')\n", + " ax1.set_ylabel('Radius of Gyration (Å)')\n", + " ax1.set_title('Real-time Rg Evolution')\n", + " ax1.grid(True)\n", + " \n", + " # Plot energy\n", + " ax2.plot(list(times), list(energies), 'r-', linewidth=2)\n", + " ax2.set_xlabel('Time (ps)')\n", + " ax2.set_ylabel('Potential Energy (kJ/mol)')\n", + " ax2.set_title('Real-time Energy Evolution')\n", + " ax2.grid(True)\n", + " \n", + " plt.tight_layout()\n", + " plt.draw()\n", + " plt.pause(0.01) # Small pause to update display\n", + " \n", + " # Print status\n", + " print(f\"Frame {frame_count}: Time = {current_time:.1f} ps, \"\n", + " f\"Rg = {rg:.2f} Å, Energy = {potential_energy:.1f} kJ/mol\")\n", + " \n", + " frame_count += 1\n", + " \n", + " # Run for 50 frames in this example\n", + " if frame_count >= 50:\n", + " break\n", + " \n", + " print(\"Live streaming analysis completed!\")\n", + " plt.ioff() # Turn off interactive mode\n", + " plt.show()\n", + " \n", + " except KeyboardInterrupt:\n", + " print(\"\\nAnalysis stopped by user\")\n", + " plt.ioff()\n", + " \n", + " except Exception as e:\n", + " print(f\"Error during live analysis: {e}\")\n", + " print(\"Make sure your simulation is running with IMD enabled\")\n", + " plt.ioff()\n", + "\n", + "# To run this advanced example, uncomment the line below:\n", + "# advanced_live_streaming()" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/doc/source/formats/reference/imd.rst b/doc/source/formats/reference/imd.rst new file mode 100644 index 000000000..0e4f2f94d --- /dev/null +++ b/doc/source/formats/reference/imd.rst @@ -0,0 +1,330 @@ +.. -*- coding: utf-8 -*- +.. _IMD-format: + +================================================================== +IMD (Data streamed via Interactive Molecular Dynamics protocol v3) +================================================================== + +.. include:: classes/IMD.txt + +Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data. + +.. note:: + MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream. + +What is Streaming? +================== + +Streaming involves processing data in real-time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk. + +This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol. + +MDAnalysis's :class:`~MDAnalysis.coordinates.IMD.IMDReader` uses the `imdclient `_ package and provides a familiar interface for reading streaming data, similar to other trajectory readers in MDAnalysis. + +When to Use Streaming? 
+====================== + +Streaming analysis is particularly valuable for: + +**Long-running simulations** + Early detection of problems (crashes, artifacts, equilibration issues) can save computational resources. + +**Adaptive sampling workflows** + Real-time analysis can guide simulation parameters or trigger enhanced sampling methods. + +**Interactive research** + Immediate feedback allows researchers to make informed decisions about continuing, modifying, or terminating simulations. + +**Storage-constrained environments** + Analyze data as it's generated without storing large trajectory files. + +Installation and Setup +====================== + +Required Dependencies +--------------------- + +The IMDReader requires the ``imdclient`` package: + +.. code-block:: bash + + pip install imdclient + +.. note:: + MDAnalysis requires ``imdclient >= 0.2.2`` for its current implementation. + +MD Engine Configuration +----------------------- + +We provide below example configurations for enabling IMDv3 streaming in popular MD engines. + +**GROMACS** + +Add IMD settings to your ``.mdp`` file: + +.. code-block:: text + + ; IMD settings + IMD-group = System + IMD-version = 3 + IMD-nst = 1 + IMD-time = No + IMD-coords = Yes + IMD-vels = No + IMD-forces = No + IMD-box = No + IMD-unwrap = No + IMD-energies = No + + + +Run with IMD enabled: + +.. code-block:: bash + + gmx mdrun -v -nt 4 -imdwait -imdport 8889 + +**LAMMPS** + +Use the IMD fix in your input script: + +.. code-block:: text + + # IMD setup + fix ID group-ID imd trate version 3 unwrap time box coordinates velocities forces + +Run your LAMMPS simulation as usual. + +**NAMD** + +Add IMD configuration to your NAMD configuration file: + +.. code-block:: text + + # IMD Settings + IMDon yes + IMDport + IMDwait + IMDfreq + + IMDsendPositions + IMDsendEnergies + IMDsendTime + IMDsendBoxDimensions + IMDsendVelocities + IMDsendForces + IMDwrapPositions + +Run your NAMD simulation as usual. + +.. 
seealso:: + For detailed engine-specific setup instructions, see the `imdclient simulation engine documentation `_. + +Basic Usage +=========== + +Connecting to a Running Simulation +----------------------------------- + +Once your simulation is running with IMD enabled: + +.. code-block:: python + + import MDAnalysis as mda + + # Connect to the simulation + u = mda.Universe("topol.tpr", "imd://localhost:8889", buffer_size=10*1024*1024) + + # Streaming analysis loop + for ts in u.trajectory: + print(f"Time: {ts.time:.2f} ps, Step: {ts.data.get('step', 'N/A')}") + + # Your analysis code here + selected_atoms = u.select_atoms("protein and name CA") + center_of_mass = selected_atoms.center_of_mass() + print(f"Protein COM: {center_of_mass}") + + # Optional: break on some condition + if ts.time > 1000: # Stop after 1000 ps + break + +Real-time Quality Control +------------------------- + +Monitor simulation health in real-time: + +.. code-block:: python + + import MDAnalysis as mda + import numpy as np + + u = mda.Universe("system.tpr", "imd://localhost:8889") + + previous_positions = None + + for ts in u.trajectory: + current_positions = u.atoms.positions.copy() + + # Check for simulation artifacts + if previous_positions is not None: + displacement = np.linalg.norm(current_positions - previous_positions, axis=1) + max_displacement = np.max(displacement) + + if max_displacement > 10.0: # Atoms moved > 10 Å in one step + print(f"WARNING: Large displacement detected at {ts.time} ps: {max_displacement:.2f} Å") + + # Monitor energies if available + if 'potential' in ts.data: + print(f"Potential energy: {ts.data['potential']:.2f}") + + previous_positions = current_positions + +Advanced Features +================= + +Buffer Management +----------------- + +For compute-intensive analysis, increase the buffer size to reduce communication overhead: + +.. 
code-block:: python + + # Larger buffer for better performance + u = mda.Universe("topol.tpr", "imd://localhost:8889", buffer_size=50*1024*1024) + +Connection Management +--------------------- + +Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.: + +.. code-block:: python + + import MDAnalysis as mda + + u = None + try: + u = mda.Universe("topol.tpr", "imd://localhost:8889") + for ts in u.trajectory: + # Your analysis here + pass + except Exception as e: + print(f"Error during streaming: {e}") + finally: + # Close the connection, but only if it was actually opened + if u is not None: + u.trajectory.close() + +Available Data +-------------- + +The IMDReader provides access to additional simulation data through ``ts.data``: + +* ``dt``: Time step size in picoseconds +* ``step``: Current simulation step number +* Energy terms: ``potential``, ``total``, etc. (engine-dependent) + +.. code-block:: python + + for ts in u.trajectory: + print(f"Step {ts.data.get('step')}: dt={ts.data.get('dt')} ps") + + # Available energy terms vary by MD engine + for key, value in ts.data.items(): + if key not in ['dt', 'step']: + print(f" {key}: {value}") + +Multiple Client Connections +=========================== + +The ability to connect multiple clients to the same IMD port depends on the MD engine implementation: + +* **GROMACS**: Typically supports single client connections +* **LAMMPS**: May support multiple clients (version-dependent) +* **NAMD**: Supports multiple clients + +.. important:: + Even when multiple connections are supported, each receives an independent data stream. Different clients may receive different data depending on the engine configuration. + +Integration with MDAnalysis Tools +================================= + +Most MDAnalysis analysis classes work with streaming data, but some limitations apply: + +**Compatible Analysis** + +.. 
code-block:: python + + from MDAnalysis.analysis import distances, contacts + + u = mda.Universe("system.tpr", "imd://localhost:8889") + + for ts in u.trajectory: + # Distance calculations work normally + protein = u.select_atoms("protein") + rg = protein.radius_of_gyration() + + # Contact analysis + selection1 = u.select_atoms("resid 1-10") + selection2 = u.select_atoms("resid 50-60") + dist_array = distances.distance_array(selection1.positions, selection2.positions) + +**Limitations with Streaming** + +Some analysis methods require the complete trajectory and won't work with streaming: + +.. code-block:: python + + # These will NOT work with streaming: + # - trajectory.timeseries() + # - Most analysis classes that need multiple passes + # - Random frame access (trajectory[10]) + # - Backward iteration + +Important Limitations +===================== + +Streaming analysis has fundamental constraints due to its real-time nature: + +**Data Access Limitations** + +* **No random access**: Cannot jump to arbitrary frames or seek backwards +* **Forward-only**: Can only iterate through frames as they arrive +* **Single-use**: Cannot restart iteration once the stream is consumed +* **No trajectory length**: Total frame count unknown until simulation ends +* **No independent copies**: Cannot create multiple reader instances for the same stream + +**Analysis Constraints** + +* **No timeseries methods**: Cannot use ``trajectory.timeseries()`` +* **No bulk operations**: Cannot extract all data at once +* **Limited multiprocessing**: Cannot split across processes +* **Single client**: Only one reader per IMD stream (engine-dependent) + +**Practical Considerations** + +.. 
code-block:: python + + # This WILL work - forward iteration + for ts in u.trajectory: + analysis_data.append(calculate_something(ts)) + + # This will NOT work - random access + ts = u.trajectory[10] # TypeError + + # This will NOT work - backwards iteration + for ts in u.trajectory[::-1]: # ValueError + pass + + # This will NOT work - restarting iteration + for ts in u.trajectory: + break + for ts in u.trajectory: # Won't start from beginning + pass + +See Also +======== + +* :class:`~MDAnalysis.coordinates.IMD.IMDReader` - Technical API documentation +* :class:`~MDAnalysis.coordinates.base.StreamReaderBase` - Base class for streaming readers +* `imdclient documentation `_ - Complete imdclient package documentation +* `IMDv3 protocol specification `_ - Technical protocol details \ No newline at end of file From 4829f19301d400dbd1b1693e426f1237ad054c71 Mon Sep 17 00:00:00 2001 From: Oliver Beckstein Date: Wed, 15 Oct 2025 17:23:27 -0700 Subject: [PATCH 2/4] Apply suggestions from code review Co-authored-by: ljwoods2 <145226270+ljwoods2@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- doc/source/formats/reference/imd.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/source/formats/reference/imd.rst b/doc/source/formats/reference/imd.rst index 0e4f2f94d..f39d8445b 100644 --- a/doc/source/formats/reference/imd.rst +++ b/doc/source/formats/reference/imd.rst @@ -7,17 +7,17 @@ IMD (Data streamed via Interactive Molecular Dynamics protocol v3) .. include:: classes/IMD.txt -Real-time streaming of simulation data between molecular dynamics engines and receiving clients can be achieved using IMD protocols like IMDv2 and IMDv3. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol, enabling live streaming of ongoing simulation data. +IMDv2 and IMDv3 enable real-time streaming of simulation data between molecular dynamics engines and receiving clients. 
The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol. .. note:: - MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and allows gaps in the data stream. + MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and doesn't enforce a consistent number of integration steps between transmitted frames What is Streaming? ================== Streaming involves processing data in real-time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk. -This can be achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information using the IMDv3 protocol. +In IMDv3, this is achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information. MDAnalysis's :class:`~MDAnalysis.coordinates.IMD.IMDReader` uses the `imdclient `_ package and provides a familiar interface for reading streaming data, similar to other trajectory readers in MDAnalysis. @@ -56,7 +56,7 @@ The IMDReader requires the ``imdclient`` package: MD Engine Configuration ----------------------- -We provide below example configurations for enabling IMDv3 streaming in popular MD engines. +We provide example configurations below for enabling IMDv3 streaming in popular MD engines. 
**GROMACS** @@ -195,7 +195,7 @@ For compute-intensive analysis, increase the buffer size to reduce communication Connection Management --------------------- -Always ensure proper cleanup, especially in interactive environments like Jupyter notebooks et al.: +Always ensure proper cleanup, especially in interactive environments such as Jupyter notebooks: .. code-block:: python From 5ca8a6d46294ac18826e3c43ece7508419cfbe07 Mon Sep 17 00:00:00 2001 From: Amruthesh Thirumalaiswamy Date: Fri, 17 Oct 2025 00:11:06 -0700 Subject: [PATCH 3/4] DocUpdate: `imd.rst` -Changes to the docs based on comments --- doc/source/formats/reference/imd.rst | 301 ++++++++++++++++++--------- 1 file changed, 202 insertions(+), 99 deletions(-) diff --git a/doc/source/formats/reference/imd.rst b/doc/source/formats/reference/imd.rst index f39d8445b..9e1af9fa2 100644 --- a/doc/source/formats/reference/imd.rst +++ b/doc/source/formats/reference/imd.rst @@ -7,19 +7,34 @@ IMD (Data streamed via Interactive Molecular Dynamics protocol v3) .. include:: classes/IMD.txt -IMDv2 and IMDv3 enable real-time streaming of simulation data between molecular dynamics engines and receiving clients. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` implements the IMDv3 protocol. +**Interactive Molecular Dynamics (IMD)** is a data streaming utility that transfers live simulation data (such as coordinates, energies, and forces) from molecular dynamics engines to receiving clients while the simulation is running. The :class:`~MDAnalysis.coordinates.IMD.IMDReader` is a reader class that accesses IMD live-streamed data over a TCP/IP connection. + +IMD is particularly useful for long-running simulations, adaptive sampling workflows, real-time quality control, and storage-constrained environments where immediate analysis is needed without writing large trajectory files. More details are provided below. .. 
note:: - MDAnalysis supports **IMDv3 only**, which provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and doesn't enforce a consistent number of integration steps between transmitted frames + MDAnalysis supports **IMDv3 only**. IMDv3 provides continuous, gap-free streaming and is implemented in modern versions of GROMACS, LAMMPS, and NAMD. IMDv2, while widely available, was designed primarily for visualization and doesn't enforce a consistent number of integration steps between transmitted frames. What is Streaming? ================== -Streaming involves processing data in real-time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means sending simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk. +Streaming involves sending and processing data in real time as it is generated, rather than storing it for later analysis. In molecular dynamics, this means transmitting simulation data to a client on-the-fly while the simulation is running, without writing large trajectory files to disk. The simulation engine acts as the data producer, while the analysis client acts as the consumer that receives this data in real time. + +Interactive Molecular Dynamics (IMD) +------------------------------------ + +Interactive Molecular Dynamics (IMD) is a specific implementation of streaming for molecular dynamics simulations [#imd2001]_. IMD establishes a TCP/IP socket connection between the simulation engine and receiving client, enabling real-time transmission of simulation data while the simulation is running. It uses a custom protocol that governs the format and type of the simulation data exchanged. 
+ +The previous implementation, IMDv2, was primarily designed for visualization purposes and was built to connect to the `VMD `_ molecular visualization program [#imdv2]_. It enabled the transmission of coordinates and energies to VMD for real-time visualization of running simulations. The IMDv3 implementation, however, is designed for more general streaming applications beyond visualization, enabling the transfer of positions, velocities, forces, energies, and timing information [#imdv3]_. + +IMDv3 in MD Engines +^^^^^^^^^^^^^^^^^^^^ -In IMDv3, this is achieved through a TCP/IP socket connection between the simulation engine and receiving client, transmitting coordinates, velocities, forces, energies, and timing information. +The IMDv3 protocol is currently implemented in popular MD engines including GROMACS, LAMMPS, and NAMD. Each engine provides configuration options to enable IMDv3 streaming and specify which data types to transmit during simulation execution. -MDAnalysis's :class:`~MDAnalysis.coordinates.IMD.IMDReader` uses the `imdclient `_ package and provides a familiar interface for reading streaming data, similar to other trajectory readers in MDAnalysis. +IMDReader in MDAnalysis +^^^^^^^^^^^^^^^^^^^^^^^^ + +MDAnalysis's :class:`~MDAnalysis.coordinates.IMD.IMDReader` supports the IMDv3 protocol and provides a convenient interface for reading streaming data. It uses the `imdclient `_ package which acts as a client/receiver, gathering data from TCP/IP sockets and making it available through MDAnalysis's familiar Universe interface. When to Use Streaming? ====================== @@ -41,14 +56,7 @@ Streaming analysis is particularly valuable for: Installation and Setup ====================== -Required Dependencies ---------------------- - -The IMDReader requires the ``imdclient`` package: - -.. 
code-block:: bash - - pip install imdclient +The IMDReader requires the ``imdclient`` package, which is an optional dependency that must be installed separately from MDAnalysis using either ``pip`` or ``mamba/conda``. For detailed installation instructions, see the `imdclient installation guide `_. .. note:: MDAnalysis requires ``imdclient >= 0.2.2`` for its current implementation. @@ -56,27 +64,37 @@ The IMDReader requires the ``imdclient`` package: MD Engine Configuration ----------------------- -We provide example configurations below for enabling IMDv3 streaming in popular MD engines. +To enable IMDv3 streaming, you need to configure your simulation engine with the appropriate IMD settings. The key parameters are: -**GROMACS** +* **IMD version**: Must be set to 3 for compatibility with MDAnalysis +* **Port**: Network port for the IMD connection (typically 8889) +* **Data transfer rate**: How frequently data is sent (e.g., every 100 steps) +* **Data types**: Which simulation data to stream (coordinates, energies, velocities, etc.) -Add IMD settings to your ``.mdp`` file: +Below are minimal example configurations for each supported engine. For comprehensive setup instructions and advanced configuration options, see the `imdclient simulation engine documentation `_. -.. code-block:: text +GROMACS +^^^^^^^ - ; IMD settings - IMD-group = System - IMD-version = 3 - IMD-nst = 1 - IMD-time = No - IMD-coords = Yes - IMD-vels = No - IMD-forces = No - IMD-box = No - IMD-unwrap = No - IMD-energies = No +Add these IMD settings to your ``.mdp`` file: +.. 
code-block:: text + ; Required IMD settings + IMD-version = 3 ; Use IMDv3 protocol (2/3) + IMD-group = System ; Atom group to stream (System = all atoms) + IMD-nst = 100 ; Send data every 100 steps + + ; Data streams - specify what to send + IMD-time = Yes ; Stream timing information (Yes/No) + IMD-energies = Yes ; Stream energy data (Yes/No) + IMD-box = Yes ; Stream box dimensions (for PBC analysis) (Yes/No) + IMD-coords = Yes ; Stream coordinates (Yes/No) + IMD-vels = Yes ; Stream velocities (Yes/No) + IMD-forces = Yes ; Stream forces (Yes/No) + + ; Coordinate processing + IMD-unwrap = Yes ; Unwrap coordinates across PBC (Yes/No) Run with IMD enabled: @@ -84,42 +102,57 @@ Run with IMD enabled: gmx mdrun -v -nt 4 -imdwait -imdport 8889 -**LAMMPS** +LAMMPS +^^^^^^ Use the IMD fix in your input script: .. code-block:: text - # IMD setup - fix ID group-ID imd trate version 3 unwrap time box coordinates velocities forces + # Complete IMD configuration + fix imd all imd 8889 trate 100 version 3 unwrap yes time yes box yes coordinates yes velocities yes forces yes + + # Parameters explained: + # 8889: Network port for connection + # trate 100: Send data every 100 timesteps (transmission rate) + # version 3: Use IMDv3 protocol (2/3) + # unwrap yes: Unwrap coordinates across PBC (yes/no) + # time yes: Send timing information (yes/no) + # box yes: Send box dimensions (for PBC analysis) (yes/no) + # coordinates yes: Send atomic coordinates (yes/no) + # velocities yes: Send velocity data (yes/no) + # forces yes: Send force data (yes/no) + # Note: Energy streaming not supported in LAMMPS IMD fix Run your LAMMPS simulation as usual. -**NAMD** +NAMD +^^^^ -Add IMD configuration to your NAMD configuration file: +Add these IMD settings to your configuration file: .. 
code-block:: text - # IMD Settings - IMDon yes - IMDport - IMDwait - IMDfreq - - IMDsendPositions - IMDsendEnergies - IMDsendTime - IMDsendBoxDimensions - IMDsendVelocities - IMDsendForces - IMDwrapPositions + # Required IMD settings + IMDon yes ;# Enable IMD functionality + IMDversion 3 ;# Use IMDv3 protocol (2/3) + IMDport 8889 ;# Network port for connection + IMDwait on ;# Wait for client connection before starting (on/off) + IMDfreq 100 ;# Send data every 100 steps (transmission rate) + + # Data streams - specify what to send + IMDsendTime yes ;# Send timing information (yes/no) + IMDsendEnergies yes ;# Send energy information (yes/no) + IMDsendBoxDimensions yes ;# Send simulation box data (for PBC analysis) (yes/no) + IMDsendPositions yes ;# Send coordinates (yes/no) + IMDsendVelocities yes ;# Send velocity data (yes/no) + IMDsendForces yes ;# Send force data (yes/no) + + # Coordinate processing + IMDwrapPositions no ;# Don't wrap (i.e., unwrap) coordinates into simulation box (yes/no) Run your NAMD simulation as usual. -.. seealso:: - For detailed engine-specific setup instructions, see the `imdclient simulation engine documentation `_.
- Basic Usage =========== @@ -134,13 +167,15 @@ Once your simulation is running with IMD enabled: # Connect to the simulation u = mda.Universe("topol.tpr", "imd://localhost:8889", buffer_size=10*1024*1024) + + # Select atoms for analysis + selected_atoms = u.select_atoms("protein and name CA") # Streaming analysis loop for ts in u.trajectory: print(f"Time: {ts.time:.2f} ps, Step: {ts.data.get('step', 'N/A')}") # Your analysis code here - selected_atoms = u.select_atoms("protein and name CA") center_of_mass = selected_atoms.center_of_mass() print(f"Protein COM: {center_of_mass}") @@ -148,6 +183,8 @@ Once your simulation is running with IMD enabled: if ts.time > 1000: # Stop after 1000 ps break +The ``buffer_size`` parameter (10 MB = 10*1024*1024 bytes in this example) controls the buffer used by `imdclient `_ to temporarily store data received from the socket. This buffer accounts for speed differences between the producer (simulation engine) and receiver (analysis code), preventing data loss when analysis is slower than data transmission. A larger buffer is more suitable for systems with many atoms or high transmission frequencies. For more details on buffer management and optimization, see the `imdclient documentation `_. + Real-time Quality Control ------------------------- @@ -156,8 +193,10 @@ Monitor simulation health in real-time: .. 
code-block:: python import MDAnalysis as mda + from MDAnalysis.lib.distances import calc_bonds import numpy as np + # Connect to simulation streaming positions, box dimensions, and energies u = mda.Universe("system.tpr", "imd://localhost:8889") previous_positions = None @@ -165,17 +204,21 @@ Monitor simulation health in real-time: for ts in u.trajectory: current_positions = u.atoms.positions.copy() - # Check for simulation artifacts + # Check for simulation artifacts using PBC-aware distance calculation if previous_positions is not None: - displacement = np.linalg.norm(current_positions - previous_positions, axis=1) - max_displacement = np.max(displacement) + # calc_bonds pairs row i of the first array with row i of the second, + # so no index array is needed + + # Use PBC-aware distance calculation + displacements = calc_bonds(previous_positions, current_positions, + box=ts.dimensions) + max_displacement = np.max(displacements) if max_displacement > 10.0: # Atoms moved > 10 Å in one step print(f"WARNING: Large displacement detected at {ts.time} ps: {max_displacement:.2f} Å") - # Monitor energies if available - if 'potential' in ts.data: - print(f"Potential energy: {ts.data['potential']:.2f}") + # Monitor energies (requires energy streaming to be enabled in the engine) + print(f"Potential energy: {ts.data['potential_energy']:.2f}") previous_positions = current_positions @@ -185,13 +228,15 @@ Advanced Features Buffer Management ----------------- -For compute-intensive analysis, increase the buffer size to reduce communication overhead: +The ``buffer_size`` parameter (specified in bytes) controls how much data imdclient can temporarily store while managing speed differences between simulation and analysis. For compute-intensive analysis, increase the buffer size to reduce communication overhead: ..
code-block:: python - # Larger buffer for better performance + # Larger buffer for demanding scenarios (50 MB = 50*1024*1024 bytes) u = mda.Universe("topol.tpr", "imd://localhost:8889", buffer_size=50*1024*1024) +For detailed information about buffer behavior and usage, see the `imdclient buffer management documentation `_. + Connection Management --------------------- @@ -201,6 +246,9 @@ Always ensure proper cleanup, especially in interactive environments like Jupyte import MDAnalysis as mda + u = None + error = None + try: u = mda.Universe("topol.tpr", "imd://localhost:8889") @@ -209,10 +257,18 @@ Always ensure proper cleanup, especially in interactive environments like Jupyte pass except Exception as e: + # Log error but don't re-raise yet + error = e print(f"Error during streaming: {e}") + finally: - # Always close the connection - u.trajectory.close() + # Always close the connection first + if u is not None: + u.trajectory.close() + + # Re-raise after cleanup is done + if error: + raise error Available Data -------------- @@ -220,37 +276,56 @@ Available Data The IMDReader provides access to additional simulation data through ``ts.data``: * ``dt``: Time step size in picoseconds -* ``step``: Current simulation step number -* Energy terms: ``potential``, ``total``, etc. (engine-dependent) +* ``step``: Current simulation step number +* Energy terms: ``potential_energy``, ``total_energy``, etc. (IMD-streamed in NAMD and GROMACS only) .. 
code-block:: python for ts in u.trajectory: print(f"Step {ts.data.get('step')}: dt={ts.data.get('dt')} ps") - # Available energy terms vary by MD engine + # Energy terms are available only when streaming from NAMD or GROMACS for key, value in ts.data.items(): if key not in ['dt', 'step']: print(f" {key}: {value}") -Multiple Client Connections -=========================== +Integration with MDAnalysis Tools +================================= -The ability to connect multiple clients to the same IMD port depends on the MD engine implementation: +Most MDAnalysis analysis classes work with streaming data, but some limitations apply: -* **GROMACS**: Typically supports single client connections -* **LAMMPS**: May support multiple clients (version-dependent) -* **NAMD**: Supports multiple clients +Compatible Analysis +^^^^^^^^^^^^^^^^^^^ -.. important:: - Even when multiple connections are supported, each receives an independent data stream. Different clients may receive different data depending on the engine configuration.
+**What works with streaming:** -Integration with MDAnalysis Tools -================================= +* **Single-frame calculations**: Analyses that work on individual timesteps, for example: -Most MDAnalysis analysis classes work with streaming data, but some limitations apply: + - Within-frame: :meth:`~MDAnalysis.core.groups.AtomGroup.center_of_mass`, :meth:`~MDAnalysis.core.groups.AtomGroup.radius_of_gyration` + - Within-frame: :func:`~MDAnalysis.analysis.distances.distance_array` for pairwise distances between atom groups + - Between-frames: :func:`~MDAnalysis.lib.distances.calc_bonds` for displacement calculations comparing consecutive frames + - Between-frames: Frame-to-frame RMSD calculations using :func:`~MDAnalysis.analysis.rms.rmsd` with stored reference coordinates + - Within-frame: Real-time monitoring using :meth:`~MDAnalysis.core.groups.AtomGroup.center_of_mass` for quality control checks + +* **Accumulative analyses**: Building results incrementally across frames using :class:`~MDAnalysis.analysis.base.AnalysisBase` patterns, for example: + + - :class:`~MDAnalysis.analysis.rdf.InterRDF` - Frame-by-frame radial distribution function calculations + - :class:`~MDAnalysis.analysis.dihedrals.Dihedral` - Dihedral angle accumulation for conformational analysis + - :class:`~MDAnalysis.analysis.lineardensity.LinearDensity` - Density profile building over streaming frames + +**What doesn't work:** + +* **Multi-pass analyses**: Methods requiring multiple trajectory passes, for example: + + - :class:`~MDAnalysis.analysis.rms.RMSD` - Needs reference structure alignment across all frames / entire trajectory + - :class:`~MDAnalysis.analysis.pca.PCA` - Principal component analysis requires full trajectory + +* **Global trajectory methods**: Analyses needing simultaneous access to all frames, for example: -**Compatible Analysis** + - :meth:`~MDAnalysis.coordinates.base.ProtoReader.timeseries` - Bulk coordinate extraction + - 
:class:`~MDAnalysis.analysis.encore.encore` - Ensemble similarity calculations + +**Example streaming-compatible analyses:** .. code-block:: python @@ -258,34 +333,25 @@ Most MDAnalysis analysis classes work with streaming data, but some limitations u = mda.Universe("system.tpr", "imd://localhost:8889") + # Select atoms once outside the loop (best practice for performance) + protein = u.select_atoms("protein") + selection1 = u.select_atoms("resid 1-10") + selection2 = u.select_atoms("resid 50-60") + for ts in u.trajectory: # Distance calculations work normally - protein = u.select_atoms("protein") rg = protein.radius_of_gyration() # Contact analysis - selection1 = u.select_atoms("resid 1-10") - selection2 = u.select_atoms("resid 50-60") dist_array = distances.distance_array(selection1.positions, selection2.positions) -**Limitations with Streaming** - -Some analysis methods require the complete trajectory and won't work with streaming: - -.. code-block:: python - - # These will NOT work with streaming: - # - trajectory.timeseries() - # - Most analysis classes that need multiple passes - # - Random frame access (trajectory[10]) - # - Backward iteration - Important Limitations ===================== Streaming analysis has fundamental constraints due to its real-time nature: -**Data Access Limitations** +Data Access Limitations +^^^^^^^^^^^^^^^^^^^^^^^^ * **No random access**: Cannot jump to arbitrary frames or seek backwards * **Forward-only**: Can only iterate through frames as they arrive @@ -293,32 +359,60 @@ Streaming analysis has fundamental constraints due to its real-time nature: * **No trajectory length**: Total frame count unknown until simulation ends * **No independent copies**: Cannot create multiple reader instances for the same stream -**Analysis Constraints** +.. note:: + Multiple client connections to the same IMD port may be possible with some MD engines. For details on engine-specific behavior, see the `imdclient documentation `_. 
+ +Analysis Constraints +^^^^^^^^^^^^^^^^^^^^ * **No timeseries methods**: Cannot use ``trajectory.timeseries()`` * **No bulk operations**: Cannot extract all data at once * **Limited multiprocessing**: Cannot split across processes * **Single client**: Only one reader per IMD stream (engine-dependent) -**Practical Considerations** +Practical Considerations +^^^^^^^^^^^^^^^^^^^^^^^^^ + +**Forward iteration works correctly:** .. code-block:: python # This WILL work - forward iteration for ts in u.trajectory: analysis_data.append(calculate_something(ts)) - + +**Random frame access will fail:** + +.. code-block:: python + # This will NOT work - random access - ts = u.trajectory[10] # TypeError - + ts = u.trajectory[10] # ValueError + +**Backward iteration will fail:** + +.. code-block:: python + # This will NOT work - backwards iteration for ts in u.trajectory[::-1]: # ValueError pass - + +**Setting end-frames is not supported:** + +.. code-block:: python + + # This will NOT work - cannot set stop index + for ts in u.trajectory[:10]: # ValueError + pass + +**Restarting iteration will not work as expected:** + +.. code-block:: python + # This will NOT work - restarting iteration for ts in u.trajectory: break - for ts in u.trajectory: # Won't start from beginning + for ts in u.trajectory: + # Won't start from beginning but rather continue from where it left off pass See Also @@ -327,4 +421,13 @@ See Also * :class:`~MDAnalysis.coordinates.IMD.IMDReader` - Technical API documentation * :class:`~MDAnalysis.coordinates.base.StreamReaderBase` - Base class for streaming readers * `imdclient documentation `_ - Complete imdclient package documentation -* `IMDv3 protocol specification `_ - Technical protocol details \ No newline at end of file +* `IMDv3 protocol specification `_ - Technical protocol details + +References +========== + +.. [#imd2001] John E. Stone, Justin Gullingsrud, and Klaus Schulten. A system for interactive molecular dynamics simulation. 
In Proceedings of the 2001 Symposium on Interactive 3D Graphics, I3D '01, 191–194. New York, NY, USA, 2001. Association for Computing Machinery. ``_ + +.. [#imdv2] IMDv2 protocol implementation. `VMD Interactive Molecular Dynamics `_. Accessed: 2024. + +.. [#imdv3] IMDv3 protocol specification. `imdclient documentation `_. Accessed: 2024. \ No newline at end of file From a644d7d81328da18b7fd491e537a89fbb874cfef Mon Sep 17 00:00:00 2001 From: Oliver Beckstein Date: Thu, 30 Oct 2025 16:21:24 -0700 Subject: [PATCH 4/4] Apply suggestions from code review Co-authored-by: ljwoods2 <145226270+ljwoods2@users.noreply.github.com> --- doc/source/formats/reference/imd.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/source/formats/reference/imd.rst b/doc/source/formats/reference/imd.rst index 9e1af9fa2..a53707027 100644 --- a/doc/source/formats/reference/imd.rst +++ b/doc/source/formats/reference/imd.rst @@ -277,7 +277,7 @@ The IMDReader provides access to additional simulation data through ``ts.data``: * ``dt``: Time step size in picoseconds * ``step``: Current simulation step number -* Energy terms: ``potential_energy``, ``total_energy``, etc. (IMD-streamed in NAMD and GROMACS only) +* Energy terms: ``potential_energy``, ``total_energy``, etc. (available as an option in NAMD and GROMACS only) .. code-block:: python @@ -312,7 +312,8 @@ Compatible Analysis - :class:`~MDAnalysis.analysis.rdf.InterRDF` - Frame-by-frame radial distribution function calculations - :class:`~MDAnalysis.analysis.dihedrals.Dihedral` - Dihedral angle accumulation for conformational analysis - :class:`~MDAnalysis.analysis.lineardensity.LinearDensity` - Density profile building over streaming frames - +.. note:: + Passing any backend other than ``backend="serial"`` to these analysis classes will cause them to fail since streams are single-pass and forward-only. **What doesn't work:** * **Multi-pass analyses**: Methods requiring multiple trajectory passes, for example: