## High-performance fitting with Dask

Although fitting a single medium-sized protein takes only minutes on a typical computer, processing many datasets 
that cover many peptides can take a long time. pyHDX therefore supports parallel fitting through the parallel computing 
library ``Dask``. This allows fitting to run in parallel locally or on a remote high-performance machine, which can 
significantly speed up the fitting process. The prerequisite for parallel fitting is that the fitting problem can be split 
into independent tasks. Weighted averaging fitting can be split into many subtasks by default, as each block of residues is
fitted separately. Parallelization of global fitting depends on the presence of regions of no coverage, which split the
fitting problem into subproblems.
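
To illustrate how regions of no coverage divide a fitting problem, here is a minimal sketch that groups peptides into independent blocks. It is not pyHDX's implementation: the function name ``split_by_coverage`` and the representation of peptides as inclusive ``(start, end)`` residue ranges are assumptions made for this example only.

```python
from typing import List, Tuple

Peptide = Tuple[int, int]  # hypothetical representation: inclusive (start, end) residue numbers


def split_by_coverage(peptides: List[Peptide]) -> List[List[Peptide]]:
    """Group peptides into blocks that share no residues with each other.

    Peptides whose residue ranges overlap land in the same block;
    a gap in coverage starts a new, independent block.
    """
    blocks: List[List[Peptide]] = []
    current_end = None  # last residue covered by the current block
    for start, end in sorted(peptides):
        if current_end is None or start > current_end:
            # No residue in common with the current block: start a new one
            blocks.append([(start, end)])
            current_end = end
        else:
            # Overlaps the current block: extend it
            blocks[-1].append((start, end))
            current_end = max(current_end, end)
    return blocks


peptides = [(1, 10), (5, 15), (20, 30), (28, 35), (40, 45)]
blocks = split_by_coverage(peptides)
for block in blocks:
    print(block)
```

Each resulting block could in principle be fitted as a separate task. A dataset with continuous coverage yields a single block and therefore offers no parallelism of this kind.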

Parallel fitting in pyHDX uses the Python 3.6+ ``async``/``await`` syntax together with an 
[asynchronous](https://distributed.dask.org/en/latest/asynchronous.html) ``dask`` client.

In [None]:
from dask.distributed import LocalCluster, Client
import asyncio

First we need a ``dask`` cluster: we can either connect to an existing cluster or create a local one.

In [None]:
cluster = LocalCluster(n_workers=8)
cluster.scheduler_address


By passing the cluster's scheduler address to a ``KineticsFitting`` object, we can use its ``async`` methods to do
asynchronous fitting.

In [None]:
kf_async = KineticsFitting(series, cluster=cluster.scheduler_address, bounds=(1e-2, 300))

In Jupyter notebooks, the ``async`` fitting methods can be ``await``ed directly.

In [None]:
async_result_wt_avg = await kf_async.weighted_avg_fit_async()
async_output_wt_avg = async_result_wt_avg.get_output(['rate', 'tau', 'tau1', 'tau2', 'r'])
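
Outside a notebook there is no running event loop, so a top-level ``await`` is not available and the coroutine has to be driven explicitly, for example with ``asyncio.run`` (Python 3.7+). The sketch below shows the pattern with a hypothetical stand-in coroutine, ``fit_block``, in place of the pyHDX fitting methods. It does not use ``dask`` or pyHDX and only illustrates how independent awaitable tasks can be run together.

```python
import asyncio


async def fit_block(block_id: int) -> dict:
    """Hypothetical stand-in for an awaitable fit of one independent block."""
    await asyncio.sleep(0.01)  # placeholder for work that would run on the cluster
    return {"block": block_id, "rate": 0.1 * (block_id + 1)}


async def main() -> list:
    # Await all independent blocks together; gather returns results in input order
    return await asyncio.gather(*(fit_block(i) for i in range(3)))


# In a script, asyncio.run creates the event loop that Jupyter provides implicitly
results = asyncio.run(main())
print([r["block"] for r in results])
```

In a script, a call such as ``await kf_async.weighted_avg_fit_async()`` would sit inside an ``async def`` function like ``main`` above.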

In [None]:
async_result_global = await kf_async.global_fit_async(async_output_wt_avg)


In [None]:
async_output_global = async_result_global.get_output(['rate', 'tau', 'tau1', 'tau2', 'r'])

We can verify that this produces the same result as the serial (non-parallel) procedure.

In [None]:
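fig, ax = plt.subplots()
ax.set_yscale('log')
# Overlay the per-residue rates from the weighted average and global fits
ax.scatter(async_output_wt_avg['r_number'], async_output_wt_avg['rate'], label='Weighted average')
ax.scatter(async_output_global['r_number'], async_output_global['rate'], label='Global')
ax.set_xlabel('Residue number')
ax.set_ylabel('Rate (min⁻¹)')
ax.legend()
# Shut down the local cluster now that fitting is done
cluster.close()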

