# 🚧 Under construction 🚧

# Abstract

# Introduction

# Methods

# Results

Parallelization with `ray` provided considerable speedups over serial execution for both the constrained spherical deconvolution (CSD) model and the free water tensor model. The speedup was much greater for the free water model. This is possibly because that model is much more computationally expensive per voxel, so the overhead of parallelization accounts for a smaller share of the total runtime. Interestingly, the 48- and 72-core instances performed slightly worse than the 32-core instance on the CSD model. This may indicate an overhead that grows with each additional core, separate from the overhead for each task sent to ray.

![](figures/csdm_speedup.png){width=50% height=50%}
![](figures/fwdtim_speedup.png){width=50% height=50%}
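
The overhead argument above can be made concrete with a toy cost model. The sketch below rests on illustrative assumptions, not on measurements from this study: per-voxel work divides evenly across cores, every task sent to ray adds a fixed overhead that is paid serially, and every core adds a fixed overhead of its own (for example from worker start-up). The function and parameter names are hypothetical.

```python
def modeled_speedup(n_voxels, cost_per_voxel, n_cores, n_tasks,
                    overhead_per_task, overhead_per_core=0.0):
    """Speedup predicted by a toy cost model; all times are in seconds.

    Illustrative assumptions, not fitted to any measurements:
      * voxel work (n_voxels * cost_per_voxel) splits evenly over n_cores;
      * every task sent to ray adds overhead_per_task, paid serially;
      * every core adds overhead_per_core (e.g. worker start-up).
    """
    serial_time = n_voxels * cost_per_voxel
    parallel_time = (serial_time / n_cores
                     + n_tasks * overhead_per_task
                     + n_cores * overhead_per_core)
    return serial_time / parallel_time
```

Under this model the speedup approaches `n_cores` as `cost_per_voxel` grows, in line with the larger speedup seen for the more expensive free water model. A nonzero `overhead_per_core` makes the modeled speedup peak and then fall as cores are added, which is one possible explanation for the 48- and 72-core CSD results.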

Efficiency decreases as the number of CPUs increases, but remains fairly high in many configurations. Efficiency is also considerably higher for the free water tensor model, which is consistent with our expectations: that model is more computationally expensive per voxel, so ray's overhead has less effect. The high efficiency of the 8-core machines suggests that the most cost-effective configuration for processing may be relatively cheap machines with few cores.

![](figures/csdm_efficency.png){width=50% height=50%}
![](figures/fwdtim_efficency.png){width=50% height=50%}
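
Assuming the efficiency shown here is the usual parallel efficiency, i.e. speedup divided by the number of cores, both quantities can be computed from wall-clock timings. The sketch below is illustrative; the function name and the dictionary layout of its input are hypothetical.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds):
    """Return {n_cores: (speedup, efficiency)} from wall-clock timings.

    parallel_seconds maps a core count to the runtime measured on that
    many cores. Speedup S(n) = T_serial / T(n); efficiency E(n) = S(n) / n.
    """
    results = {}
    for n_cores, seconds in sorted(parallel_seconds.items()):
        speedup = serial_seconds / seconds
        results[n_cores] = (speedup, speedup / n_cores)
    return results
```

If an instance's hourly price grows roughly in proportion to its core count (an assumption; cloud pricing varies), the cost of one run is `price_per_core_hour * n * T(n) = price_per_core_hour * T_serial / E(n)`. Cost per run is then inversely proportional to efficiency, which is why high efficiency on small machines points toward cheaper configurations.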

Ray tends to spill a large amount of data to disk and does not clean it up afterwards. This quickly becomes problematic when running multiple models consecutively: within just an hour or two, ray could easily spill over 500 GB to disk. We have implemented a fix for this within our model as follows:


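```python
import json
import tempfile

try:
    import ray
    has_ray = True
except ImportError:
    has_ray = False


def ray_map(func, in_list, func_args=(), func_kwargs=None, clean_spill=True):
    # Sketch: in our code this logic sits in the ``engine == "ray"`` branch
    # of a larger parallel-map helper; the wrapper name and signature here
    # are illustrative.
    if not has_ray:
        raise ImportError("engine='ray' requires the ray package")
    func_kwargs = func_kwargs or {}

    tmp_dir = None
    if clean_spill and not ray.is_initialized():
        # Send spilled objects to a private temporary directory instead of
        # ray's default location, so they can be deleted afterwards.
        tmp_dir = tempfile.TemporaryDirectory()
        ray.init(_system_config={
            "object_spilling_config": json.dumps(
                {"type": "filesystem",
                 "params": {"directory_path": tmp_dir.name}},
            )
        })

    # Run func(ii, *func_args, **func_kwargs) for every item as a ray task.
    remote_func = ray.remote(func)
    results = ray.get([remote_func.remote(ii, *func_args, **func_kwargs)
                       for ii in in_list])

    if tmp_dir is not None:
        # Remove everything ray spilled to disk during this call.
        tmp_dir.cleanup()
    return results
```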
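
To check whether spilled objects are piling up between runs, one can periodically measure the size of the spill directory, i.e. whatever `directory_path` ray was configured to spill to. The following is a minimal standard-library sketch; the helper name is hypothetical and it is not part of ray.

```python
import os


def directory_size_bytes(path):
    """Total size in bytes of all regular files under ``path``."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(file_path)
            except FileNotFoundError:
                pass  # file was removed (e.g. restored by ray) mid-walk
    return total
```

For example, `print(f"{directory_size_bytes(spill_dir) / 1e9:.1f} GB spilled")`, where `spill_dir` is a hypothetical variable holding the configured spill path.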

# Discussion

## Acknowledgments

This work was funded through NIH grant EB027585 (PI: Eleftherios Garyfallidis) and
a grant from the Chan Zuckerberg Initiative Essential Open Source Software program (PI: Serge Koudoro).