# Pre-calculating voxel spacing

The [Med3D](https://arxiv.org/pdf/1904.00625) study reports that medical volumes often exhibit heterogeneous voxel spacing due to differences in scanners and acquisition protocols. To mitigate this variability, the median spacing for each axis (x, y, z) is calculated within each domain, where a domain is defined by specific equipment and acquisition protocols. This approach ensures that the estimated spacing remains robust against outliers and minor data errors.
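
To make the robustness claim concrete, here is a minimal sketch that compares the per-axis median with the per-axis mean. The spacing values are invented for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical per-series spacings (x, y, z) within one domain.
# The last row plays the role of an outlier / data error.
spacings = np.array([
    [0.5, 0.5, 1.0],
    [0.5, 0.5, 1.0],
    [0.4, 0.4, 0.8],
    [0.6, 0.6, 1.2],
    [5.0, 5.0, 10.0],
])

median_spacing = np.median(spacings, axis=0)  # per-axis median
mean_spacing = np.mean(spacings, axis=0)      # per-axis mean, for comparison

print("median:", median_spacing)
print("mean:  ", mean_spacing)
```

The single bad row pulls the mean well away from the typical values, while the median stays at the spacing shared by most series.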

In this notebook, a domain is defined by imaging modality ("CTA", "MRA", "MRI T1post" or "MRI T2"), resulting in four distinct domains. For each domain, the median is taken over the spacings of all its series. How the spacing of a single series is determined depends on the type of DICOM file present:
- For series containing a single DICOM file representing the entire volume (3D), spacing is obtained from the attribute specifying the spacing for the whole volume.
- For series containing a collection of DICOM files, one per slice (2D), spacing is determined as the mode of the per-slice spacings. It is assumed that most spacings within each series are consistent. Slices with a differing spacing are considered data errors, and taking the mode keeps them out of the final median computation for each domain.
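
As an illustration of the second case, the mode of the per-slice spacings could be computed roughly as follows. This is only a sketch: `spacing_mode` is a hypothetical helper, the rounding step is an assumption added to merge floating-point noise, and the actual `dicom_serie_get_spacing` in `dicom_to_nifti` may work differently.

```python
import numpy as np

def spacing_mode(slice_spacings):
    """Return the most frequent (x, y, z) spacing among per-slice spacings."""
    # Rounding (an assumption) so that values differing only by float noise count as equal
    arr = np.round(np.asarray(slice_spacings, dtype=float), 6)
    # Unique rows and how often each one occurs
    values, counts = np.unique(arr, axis=0, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical per-slice spacings: the last slice disagrees with the rest
slice_spacings = [
    (0.5, 0.5, 1.0),
    (0.5, 0.5, 1.0),
    (0.5, 0.5, 1.0),
    (0.5, 0.5, 3.0),
]
series_spacing = spacing_mode(slice_spacings)
print(series_spacing)
```

The inconsistent slice is simply outvoted, so it never reaches the per-domain median.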

---
**Results (x, y, z):**
- CTA: (0.46875, 0.46875, 0.8)
- MRA: (0.410156, 0.410156, 0.6)
- MRI T1post: (0.5, 0.5, 1.2)
- MRI T2: (0.5, 0.5, 5.0)

In [None]:
#|default_exp precalculating_voxel_spacing

In [None]:
import sys
sys.path.append("../lib")

>**Note**: Since I implemented this before the DICOM-to-NIfTI conversion, this code loads the DICOM files directly instead of the NIfTI files, which already store the computed mode of the spacing for each volume.

In [None]:
#|export
from dicom_to_nifti import *
import numpy as np

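def dicom_series_get_spacing(base_series_path, series_uid):
    """Return an array of shape (len(series_uid), 3) with one spacing row per series."""
    spacings = np.zeros((len(series_uid), 3))
    for i, serie_uid in enumerate(series_uid):
        # Load the DICOM file(s) of this series, then extract its spacing
        ds_l = dicom_serie_load(base_series_path, serie_uid)
        spacings[i] = dicom_serie_get_spacing(ds_l)

    return spacings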

In [None]:
import os

base_path_dicom = os.environ["RSNA_IAD_DATA_DIR"]
series_path_dicom = f"{base_path_dicom}/series"

In [None]:
import random

series_uid_l = os.listdir(series_path_dicom)
random.seed(0)
sample_series_uid_l = random.sample(series_uid_l, 5)

spacings = dicom_series_get_spacing(series_path_dicom, sample_series_uid_l)
print(spacings)

Here I have the option to load series in batches, one batch per job. Because the jobs' completion order is not predetermined, I keep a dictionary that maps each job object to the indexes of its series. When a job finishes, I use those indexes to retrieve the modalities (domains) of its series and update the list of spacings for each domain:

In [None]:
from concurrent.futures import ProcessPoolExecutor, as_completed
import multiprocessing
from tqdm import tqdm
import numpy as np
import polars as pl

train_df = pl.read_csv(f"{base_path_dicom}/train.csv")
series_uid = train_df["SeriesInstanceUID"].to_numpy()
modalities = train_df["Modality"].to_numpy()

domain_spacing_dict = { modality: [] for modality in np.unique(modalities) }
n_series = series_uid.shape[0]
batch_size = 1
with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    job_to_idx_dict = {}
    for i in tqdm(range(0, n_series, batch_size)):
        ixs = slice(i, min(n_series, i+batch_size))
        series_uid_i = series_uid[ixs]
        job = executor.submit(dicom_series_get_spacing, series_path_dicom, series_uid_i)
        job_to_idx_dict[job] = ixs

    n_jobs = len(job_to_idx_dict)
    for job in tqdm(as_completed(job_to_idx_dict), total=n_jobs):
        try:
            spacings = job.result()
            ixs = job_to_idx_dict[job]
            for modality, spacing in zip(modalities[ixs], spacings):
                domain_spacing_dict[modality].append(spacing)
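        except Exception as e:
            # Identify the failing batch; `from e` keeps the original traceback
            failed_uids = series_uid[job_to_idx_dict[job]]
            raise RuntimeError(f"Failed to compute spacing for series {failed_uids}") from e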

    # Stack each domain's list of spacing rows into a single (n, 3) array
    for domain, spacings in domain_spacing_dict.items():
        if len(spacings) > 0:
            domain_spacing_dict[domain] = np.vstack(spacings)
        else:
            domain_spacing_dict[domain] = None
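
The job-to-index bookkeeping used in the cell above can be seen in isolation in the following sketch. It swaps in `ThreadPoolExecutor` for `ProcessPoolExecutor` so that it runs anywhere without pickling concerns, and `fake_spacing`, `uids` and `labels` are hypothetical stand-ins for the real spacing function, series UIDs and modalities.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_spacing(uid):
    # Hypothetical stand-in for the per-series spacing computation
    return (0.5, 0.5, float(len(uid)))

uids = ["a", "bb", "ccc", "dddd"]      # hypothetical series identifiers
labels = ["CTA", "MRA", "CTA", "MRA"]  # hypothetical domain of each series

results = {label: [] for label in set(labels)}
with ThreadPoolExecutor(max_workers=2) as executor:
    # Remember which input index each job belongs to
    job_to_idx = {executor.submit(fake_spacing, uid): i for i, uid in enumerate(uids)}
    # Jobs are yielded as they finish, in no particular order
    for job in as_completed(job_to_idx):
        i = job_to_idx[job]
        results[labels[i]].append(job.result())

# Completion order is arbitrary, so sort before displaying
print({label: sorted(values) for label, values in results.items()})
```

Keying the dictionary by the job object works because futures are hashable, which is what allows the result to be matched back to its inputs regardless of when the job completes.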

In [None]:
domain_spacing_dict

In [None]:
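for domain, spacings in domain_spacing_dict.items():
    if spacings is None:
        # No spacings were collected for this domain
        print(f"{domain}: no spacings available")
        continue
    print(f"{domain}: {np.median(spacings, axis=0)}")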