# Dask and kugupu

Generating the coupling matrix is the most time-consuming calculation in kugupu.
Luckily, each frame can be calculated independently of the others, so the problem is embarrassingly parallel.
Kugupu can use the `dask.distributed` package to calculate many frames in parallel,
as shown in this notebook.
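
As a rough illustration of why independent frames parallelise so easily, the sketch below processes a list of frames either serially or by submitting one task per frame. It is not kugupu's implementation: `process_frame` and `process_all` are hypothetical names, and a standard-library `ThreadPoolExecutor` stands in for a `distributed.Client`, which likewise offers `submit` and futures with a `result` method.

```python
from concurrent.futures import ThreadPoolExecutor


def process_frame(frame_index):
    """Hypothetical stand-in for an expensive per-frame calculation."""
    return frame_index ** 2


def process_all(frame_indices, client=None):
    """Process frames serially, or one task per frame if a client is given."""
    if client is None:
        return [process_frame(i) for i in frame_indices]
    # One task per frame; no task depends on the result of another.
    futures = [client.submit(process_frame, i) for i in frame_indices]
    # Collect in submission order so the output matches the serial loop.
    return [f.result() for f in futures]


frames = range(5)
serial = process_all(frames)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = process_all(frames, client=pool)
```

Because the results are gathered in submission order, `parallel` and `serial` hold the same values in the same order.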

In [1]:
import kugupu as kgp
import MDAnalysis as mda
from dask import distributed

First we load our simulation data as normal:

In [2]:
u = mda.Universe('./datafiles/C6.data', './datafiles/C6.dcd')

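# Add a 'names' attribute to the topology so atom names can be assigned
u.add_TopologyAttr('names')
# Map atomic masses to element symbols
namedict = {
    1.008: 'H',
    12.011: 'C',
    14.007: 'N',
    15.999: 'O',
    32.06: 'S',
}
# Name every atom whose mass matches an entry in the table
for m, n in namedict.items():
    u.atoms[u.atoms.masses == m].names = n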



Now we create a `distributed.Client` to assign the work to.
Here we create a Client running on our local machine; however, it is also possible to use a Client connected to a much more powerful cluster.

In [3]:
c = distributed.Client()
c

Perhaps you already have a cluster running?
Hosting the HTTP server on port 56424 instead


Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:56424/status

Dashboard: http://127.0.0.1:56424/status
Workers: 4
Total threads: 8
Total memory: 8.00 GiB
Status: running
Using processes: True

Comm: tcp://127.0.0.1:56425
Workers: 4
Dashboard: http://127.0.0.1:56424/status
Total threads: 8
Started: Just now
Total memory: 8.00 GiB

Comm: tcp://127.0.0.1:56436
Total threads: 2
Dashboard: http://127.0.0.1:56437/status
Memory: 2.00 GiB
Nanny: tcp://127.0.0.1:56428
Local directory: /var/folders/8_/xls29m695yl7qglq81r94h7w0000gr/T/dask-scratch-space/worker-clh32pav

Comm: tcp://127.0.0.1:56442
Total threads: 2
Dashboard: http://127.0.0.1:56443/status
Memory: 2.00 GiB
Nanny: tcp://127.0.0.1:56430
Local directory: /var/folders/8_/xls29m695yl7qglq81r94h7w0000gr/T/dask-scratch-space/worker-4qu0yw7p

Comm: tcp://127.0.0.1:56445
Total threads: 2
Dashboard: http://127.0.0.1:56446/status
Memory: 2.00 GiB
Nanny: tcp://127.0.0.1:56432
Local directory: /var/folders/8_/xls29m695yl7qglq81r94h7w0000gr/T/dask-scratch-space/worker-mlm2j2xo

Comm: tcp://127.0.0.1:56439
Total threads: 2
Dashboard: http://127.0.0.1:56440/status
Memory: 2.00 GiB
Nanny: tcp://127.0.0.1:56434
Local directory: /var/folders/8_/xls29m695yl7qglq81r94h7w0000gr/T/dask-scratch-space/worker-fzw5hl4l


2025-04-11 12:20:25,115 - tornado.application - ERROR - Exception in callback <bound method SystemMonitor.update of <SystemMonitor: cpu: 6 memory: 31 MB fds: 171>>
Traceback (most recent call last):
  File "/Users/k2584788/.local/share/mamba/envs/forked_kugupu/lib/python3.13/site-packages/tornado/ioloop.py", line 937, in _run
    val = self.callback()
  File "/Users/k2584788/.local/share/mamba/envs/forked_kugupu/lib/python3.13/site-packages/distributed/system_monitor.py", line 168, in update
    net_ioc = psutil.net_io_counters()
  File "/Users/k2584788/.local/share/mamba/envs/forked_kugupu/lib/python3.13/site-packages/psutil/__init__.py", line 2148, in net_io_counters
    rawdict = _psplatform.net_io_counters()
OSError: [Errno 12] Cannot allocate memory
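
The `distributed.Client()` call above relies on default settings. As a hedged sketch of an alternative, a Client can also be built from an explicitly configured cluster. The worker counts and `processes=False` below are illustrative assumptions rather than kugupu recommendations, and the scheduler address in the final comment is hypothetical.

```python
from dask import distributed

# Explicitly sized local cluster (all values are illustrative assumptions).
# dashboard_address=":0" asks for any free port, which should avoid the
# "Hosting the HTTP server on port ... instead" warning shown above.
cluster = distributed.LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,  # in-process workers keep this sketch lightweight
    dashboard_address=":0",
)
client = distributed.Client(cluster)
client.wait_for_workers(2)

# Check that the cluster is usable by counting workers and running a task.
n_workers = len(client.scheduler_info()["workers"])
total = client.submit(sum, [1, 2, 3]).result()

client.close()
cluster.close()

# To use an already running scheduler instead (hypothetical address):
# client = distributed.Client("tcp://scheduler.example.org:8786")
```

Whichever way the Client is created, it is passed on in the same way as `c` is in the following cell.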


The generation of results uses the same function,
but we include the `client=` keyword to make the calculation happen in parallel.

In [4]:
res = kgp.coupling_matrix(u, 5.0, 'lumo', degeneracy=1, client=c)

2025-04-11T09:08:10.229751+0100 INFO Processing 5 frames
2025-04-11T09:08:14.033087+0100 INFO Finding dimers within 5.0, passed 250 fragments
2025-04-11T09:08:14.058144+0100 INFO Finding dimers within 5.0, passed 250 fragments
2025-04-11T09:08:14.069958+0100 INFO Finding dimers within 5.0, passed 250 fragments
2025-04-11T09:08:14.549767+0100 INFO Finding dimers within 5.0, passed 250 fragments
2025-04-11T09:08:14.583546+0100 INFO Found 3254 dimers
  0%|                                                 | 0/3254 [00:00<?, ?it/s]ERROR: Can't open parameter file: /Users/runner/miniforge3/conda-bld/yaehmop_1658385592334/work/tightbind/eht_parms.dat using default data in eht_parms.h....
2025-04-11T09:08:14.591314+0100 INFO Finding dimers within 5.0, passed 250 fragments
2025-04-11T09:08:14.592504+0100 INFO Found 3286 dimers
  0%|                                                 | 0/3286 [00:00<?, ?it/s]ERROR: Can't open parameter file: /Users/runner/miniforge3/conda-bld/yaehmop_1658385592334/w

The results generated this way are identical to those from the serial calculation, and can be saved and used in subsequent analysis as normal.