# Streaming Theory, Implementation, and Technical Details

## Lawson Woods


# Streaming theory

### - Simulations with femtosecond-scale output can generate petabytes of trajectory data

### - Syncing to disk is orders of magnitude slower than writing to RAM

<!-- ![image](./streaming-theory/gromacs_performance_2.png) -->

<center><img src="./streaming-theory/gromacs_performance_2.png" height="600px" width="600px"></center>


# Streaming theory

### - By default, a Linux TCP receive buffer only grows to about 6 MB

### - TCP prevents packet loss, but it throttles the sender when the receiving application doesn't empty the socket buffer quickly

### - Application-level congestion control is the answer

<!-- ![image](./streaming-theory/tcp_window.png) -->

<center><img src="./streaming-theory/tcp_window.png" height="600px" width="600px"></center>

Fig: "Layer 4 Optimizer (L4O) for Enhancing Battery Life in Smart Devices"
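
The receive-buffer limit in the first bullet can be checked on a given machine. The sketch below is illustrative only and assumes a Linux host that exposes `/proc/sys/net/ipv4/tcp_rmem`; the helper names `tcp_rmem_limits` and `current_rcvbuf` are made up for this example and are not part of imdclient.

```python
import socket
from pathlib import Path


def tcp_rmem_limits(path="/proc/sys/net/ipv4/tcp_rmem"):
    """Return (min, default, max) TCP receive-buffer sizes in bytes.

    Returns None when the file is missing or unreadable (for example on
    a non-Linux system), since this path is a Linux-specific assumption.
    """
    try:
        return tuple(int(value) for value in Path(path).read_text().split())
    except (OSError, ValueError):
        return None


def current_rcvbuf():
    """Return the receive-buffer size of a fresh, unconnected TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Calling `tcp_rmem_limits()` on a Linux machine shows the ceiling that buffer auto-tuning can reach, while `current_rcvbuf()` reports the size a new socket starts with.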


# IMDClient package architecture

### 1. Client is "batteries included": no buffer management needed by the user

### 2. imdclient can be used outside of MDAnalysis

### 3. Strive for as much MDAnalysis compatibility as possible, documenting limitations

![image](./streaming-theory/imd-mda-1.png)


# IMDClient package

![img4](./streaming-theory/imd-mda-3.png)


# IMDClient API Methods

### 1. `get_imdsessioninfo()`

### 2. `get_imdframe()`

### 3. `stop()` _(handled by context manager)_


```python
import imdclient

with imdclient.IMDClient("localhost", 8889, n_atoms=50786) as client:
    info = client.get_imdsessioninfo()
    print(info)

    frame = client.get_imdframe()
    print(f"Simulation integration step: {frame.step}")
    print(f"Simulation time (fs): {frame.time}")
    print(f"First atom's position (angstroms): {frame.positions[0]}")
```

```
IMDSessionInfo(version=3, endianness='<', wrapped_coords=True, energies=False, time=True, box=True, positions=True, velocities=True, forces=True)
Simulation integration step: 0
Simulation time (fs): 0.0
First atom's position (angstroms): [44.       29.400002 31.029999]
```


# API interactions

![img4](./streaming-theory/imd-mda-4.png)


# Options

### Configurable timeout for high latency

### Configurable buffer size for fast simulations


```python
import imdclient

client = imdclient.IMDClient(
    "localhost",
    8889,
    n_atoms=50786,
    # Wait up to 10 seconds for a simulation frame
    timeout=10,
    # 1 MB
    buffer_size=1024**2,
)
```
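
To choose a buffer size, it helps to estimate how large one frame is. The sketch below is back-of-the-envelope arithmetic, not imdclient's internal accounting: it assumes each per-atom array stores three 32-bit floats per atom and ignores packet headers, box, time, and energy data. The function names are hypothetical.

```python
def frame_payload_bytes(n_atoms, n_arrays=1, bytes_per_value=4):
    """Estimate the per-atom payload of one frame in bytes.

    n_arrays counts the per-atom arrays sent each frame, e.g. 3 for
    positions, velocities, and forces. Assumes 3 values per atom.
    """
    return n_atoms * 3 * bytes_per_value * n_arrays


def frames_that_fit(buffer_size, n_atoms, n_arrays=1):
    """Estimate how many whole frames fit in buffer_size bytes."""
    return buffer_size // frame_payload_bytes(n_atoms, n_arrays)
```

Under these assumptions, a 50786-atom system needs about 0.6 MB per frame for positions alone and about 1.8 MB when velocities and forces are streamed as well, so fast simulations that stream all three arrays may want a buffer of several megabytes.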

# Automatic pausing and resuming


```python
import imdclient

# 2MB Buffer
with imdclient.IMDClient(
    "localhost", 8889, n_atoms=50786, buffer_size=2 * 1024**2
) as client:
    while True:
        try:
            frame = client.get_imdframe()
        except EOFError:
            # No more frames: the simulation has ended
            break
```
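
One common way to implement pausing and resuming is with high and low watermarks on a frame buffer. The sketch below illustrates that general idea only; it is not imdclient's implementation. The class `WatermarkBuffer`, its thresholds, and the `events` log are hypothetical, and a real client would send pause and resume commands to the simulation engine instead of recording strings.

```python
from collections import deque


class WatermarkBuffer:
    """Toy frame buffer that requests pause/resume at fill thresholds."""

    def __init__(self, high_mark, low_mark):
        self.high_mark = high_mark  # pause once this many frames are queued
        self.low_mark = low_mark    # resume once the queue drains to this
        self.frames = deque()
        self.paused = False
        self.events = []            # stand-in for commands sent to the engine

    def put(self, frame):
        """Producer side: store a frame received from the simulation."""
        self.frames.append(frame)
        if not self.paused and len(self.frames) >= self.high_mark:
            self.paused = True
            self.events.append("pause")

    def get(self):
        """Consumer side: hand the oldest frame to the analysis code."""
        frame = self.frames.popleft()
        if self.paused and len(self.frames) <= self.low_mark:
            self.paused = False
            self.events.append("resume")
        return frame
```

Keeping a gap between the two thresholds avoids sending a pause and a resume for every single frame when the consumer is only slightly slower than the simulation.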

# Relationship with MDA

![img2](./streaming-theory/imd-mda-2.png)


# Reader wraps client, handles its limitations

### Compatible ✅

```python
for ts in u.trajectory:
    pass

for ts in u.trajectory[:]:
    pass

for ts in u.trajectory[::10]:
    pass
```

### Incompatible (raises `RuntimeError`) ❌

```python
for ts in u.trajectory[:10]:
    pass

for ts in u.trajectory[10:]:
    pass

for ts in u.trajectory[::-1]:
    pass

len(u.trajectory)

u.trajectory.n_frames
```
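
The pattern in the two lists above follows from the stream being readable only once, front to back. The sketch below mimics that behaviour with a hypothetical `ForwardOnlyStream` class; it is an illustration of the constraint, not the MDAnalysis reader's code.

```python
from itertools import islice


class ForwardOnlyStream:
    """Hypothetical stand-in for a reader over a live stream."""

    def __init__(self, frames):
        # A plain iterator: frames can be consumed once, in order
        self._frames = iter(frames)

    def __iter__(self):
        return self._frames

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise RuntimeError("random access is not supported on a stream")
        if key.start is not None or key.stop is not None:
            raise RuntimeError("start and stop are not supported on a stream")
        step = 1 if key.step is None else key.step
        if step < 1:
            raise RuntimeError("a stream cannot be read backwards")
        # Forward striding only needs to skip frames as they arrive
        return islice(self._frames, 0, None, step)

    def __len__(self):
        raise RuntimeError("a stream has no known length")
```

Here `stream[:]` and `stream[::10]` succeed because they never look backwards or need the total frame count, while every other access raises `RuntimeError`.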


# The client works out of the box with some MDAnalysis analysis classes

- The analysis class must be able to handle a trajectory without a known length
- One forward pass only!


```python
import MDAnalysis as mda

u = mda.Universe("streaming-theory/gmx/imdgroup.gro", "imd://localhost:8889")
protein = u.select_atoms("protein")

for ts in u.trajectory:
    print(protein.center_of_mass())
```
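
An analysis that works on a stream has to update its result frame by frame, because the number of frames is not known in advance and earlier frames cannot be revisited. A minimal sketch of such a single-pass accumulator is shown below. The `RunningMean` class is hypothetical and written in plain Python for illustration; it is not an MDAnalysis analysis class.

```python
class RunningMean:
    """Incrementally average fixed-length vectors in a single pass."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = None  # current mean, one float per component

    def update(self, values):
        """Fold one new sample into the mean and return the updated mean."""
        self.n += 1
        if self.mean is None:
            self.mean = [float(v) for v in values]
        else:
            # Incremental update: mean += (sample - mean) / n
            self.mean = [
                m + (float(v) - m) / self.n for m, v in zip(self.mean, values)
            ]
        return self.mean
```

Inside the trajectory loop, a call such as `avg.update(protein.center_of_mass())` would keep a running average of the center of mass without storing any frames.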

# Questions?
