# Generating Data Objects

While PyDicer primarily deals with converting data objects from DICOM, there are instances where
you may want to generate a new data object and have it integrated into your PyDicer dataset.

Some examples of when you may want to do this are:
- Generate a new dose grid with EQD2 correction applied.
- Generate a structure set of auto-segmented structures.
- Generate a Pseudo-CT image from an MRI.

In this guide we show you how to generate new data objects for tasks such as those listed above.

In [None]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path
import SimpleITK as sitk

from pydicer.utils import fetch_converted_test_data, load_object_metadata, read_simple_itk_image

working_directory = fetch_converted_test_data("./testdata_hnscc", dataset="HNSCC")

pydicer = PyDicer(working_directory)

## Generate Dose Objects

In the following cell, we:
- Iterate over each dose grid in our dataset
- Load the dose grid using SimpleITK
- Apply EQD2 dose correction (hard coded for demonstration purposes)
- Save the corrected dose as a new object in our dataset

Once the dose object is saved, it will be included whenever you compute DVHs and dose metrics for
the dataset.
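
To make the correction concrete, here is a minimal sketch of the same EQD2 formula applied to plain numbers rather than a dose grid. The function name `eqd2` and the example prescriptions are illustrative assumptions and are not part of PyDicer.

```python
def eqd2(total_dose_gy, fractions, alpha_beta):
    """Convert a total physical dose to its equivalent dose in 2 Gy fractions.

    Hypothetical helper for illustration only; it mirrors the voxel-wise
    expression used on the dose grid in this guide.
    """
    dose_per_fraction = total_dose_gy / fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta) / (2 + alpha_beta)


# 70 Gy in 35 fractions is already delivered at 2 Gy per fraction, so it is unchanged
print(eqd2(70, 35, alpha_beta=2))  # 70.0

# 60 Gy in 20 fractions (3 Gy per fraction) corresponds to a higher EQD2
print(eqd2(60, 20, alpha_beta=2))  # 75.0
```

On a dose grid, the same expression is evaluated independently at every voxel, so regions receiving more than 2 Gy per fraction are scaled up and regions receiving less are scaled down.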

In [None]:
alpha_beta = 2

df = pydicer.read_converted_data()
df_doses = df[df["modality"] == "RTDOSE"]

for _, dose_row in df_doses.iterrows():
    if "EQD2_ab" in dose_row.hashed_uid:
        # This is an already scaled dose
        continue

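    # Find the plan this dose references; skip doses whose plan is not in the dataset
    df_linked_plan = df[df["sop_instance_uid"] == dose_row.referenced_sop_instance_uid]
    if df_linked_plan.empty:
        print(f"No linked plan found for dose {dose_row.hashed_uid}, skipping")
        continue

    linked_plan = df_linked_plan.iloc[0]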
    ds_plan = load_object_metadata(linked_plan)

    # Read the planned fractions from the plan object
    fractions = int(ds_plan.FractionGroupSequence[0].NumberOfFractionsPlanned)

    print(f"{dose_row.patient_id} has {fractions} fractions")

    # Load the dose grid
    dose_path = Path(dose_row.path).joinpath("RTDOSE.nii.gz")
    dose = sitk.ReadImage(str(dose_path))
    dose = sitk.Cast(dose, sitk.sitkFloat64)

    dose_id = f"{dose_row.hashed_uid}_EQD2_ab{alpha_beta}"

    if len(df_doses[df_doses.hashed_uid == dose_id]) > 0:
        print(f"Already converted dose for {dose_id}")
        continue

    # Apply the EQD2 correction
    eqd2_dose = dose * (((dose / fractions) + alpha_beta) / (2 + alpha_beta))

    # Save off the new dose grid
    try:
        print(f"Saving dose grid with ID: {dose_id}")
        pydicer.add_dose_object(
            eqd2_dose, dose_id, dose_row.patient_id, linked_plan, dose_row.for_uid
        )
    except SystemError:
        print(f"Dose object {dose_id} already exists!")

Now we can load our data objects, and check that our new dose grids are stored alongside our
converted data.

In [None]:
df = pydicer.read_converted_data()
df[df.modality=="RTDOSE"]

## Generate Structure Set Objects

In this example, we:
- Iterate over each CT image in our dataset
- Load the CT image using SimpleITK, and apply a threshold to segment bones
- Save the segmented bones as a new structure set object

> Note: The auto-segmentation inference module already supports this functionality. If you are
> generating auto-segmentations, we recommend using that module instead.
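
As a rough illustration of the thresholding step, the sketch below applies the same comparison to a small NumPy array standing in for a CT slice. The array values are made up for demonstration. SimpleITK images support the same `>` comparison voxel-wise, which is what the cell below relies on.

```python
import numpy as np

# Hypothetical 2x3 "CT slice" in Hounsfield units (values chosen for illustration)
ct_slice = np.array([[-1000, 40, 350], [1200, 299, 300]])
bone_threshold = 300

# Voxel-wise comparison: only values strictly greater than the threshold count as bone
bone_mask = (ct_slice > bone_threshold).astype(np.uint8)
print(bone_mask.tolist())  # [[0, 0, 1], [1, 0, 0]]
```

Note that a voxel at exactly 300 HU is excluded because the comparison is strict.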

In [None]:
bone_threshold = 300  # Set threshold at 300 HU

df = pydicer.read_converted_data()
df_cts = df[df["modality"] == "CT"]

for idx, ct_row in df_cts.iterrows():

    # Load the image
    img = read_simple_itk_image(ct_row)

    # Apply the threshold
    bone_mask = img > bone_threshold

    # Save the mask in a new structure set
    structure_set_id = f"bones_{ct_row.hashed_uid}"
    new_structure_set = {"bones": bone_mask}

    try:
        print(f"Saving structure set with ID: {structure_set_id}")
        pydicer.add_structure_object(
            new_structure_set,
            structure_set_id,
            ct_row.patient_id,
            ct_row,
        )
    except SystemError:
        print(f"Structure Set {structure_set_id} already exists!")


Let's load our data objects to see if we have our new structure sets.

In [None]:
df = pydicer.read_converted_data()
df[df.modality=="RTSTRUCT"]

We can also run the visualise module. Use the `force=False` flag to ensure that only the newly
generated objects are visualised.

In [None]:
pydicer.visualise.visualise(force=False)

Take a look inside the `testdata_hnscc/data` directory for the new structure set folders. See the
visualised snapshot to check that our bone segmentation worked!

## Generate Image Objects

In this example, we:
- Iterate over each CT image in our dataset
- Load the CT image using SimpleITK, and apply a Laplacian Sharpening image filter to it
- Save the image as a new image object
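
To give some intuition for what Laplacian sharpening does, here is a simplified one-dimensional sketch in plain Python. It subtracts a discrete Laplacian from each interior sample, which exaggerates intensity changes at edges. This is an illustrative assumption about the general technique and not SimpleITK's implementation: the SimpleITK filter operates on the full image and, as far as we understand, also rescales the result back to the input intensity range. The helper name `sharpen_1d` is hypothetical.

```python
def sharpen_1d(signal):
    """Subtract a discrete Laplacian from each interior sample (end samples unchanged)."""
    sharpened = list(signal)
    for i in range(1, len(signal) - 1):
        # Second difference: positive just before a rising edge, negative just after it
        laplacian = signal[i - 1] - 2 * signal[i] + signal[i + 1]
        sharpened[i] = signal[i] - laplacian
    return sharpened


# A simple step edge from 0 to 10
edge = [0, 0, 0, 10, 10, 10]
print(sharpen_1d(edge))  # [0, 0, -10, 20, 10, 10]
```

The undershoot and overshoot on either side of the step are what make edges appear crisper in the sharpened image.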

In [None]:
df = pydicer.read_converted_data()
df_cts = df[df["modality"] == "CT"]

for idx, ct_row in df_cts.iterrows():

    # Load the image
    img = read_simple_itk_image(ct_row)

    # Sharpen the image
    img_sharp = sitk.LaplacianSharpening(img)

    # Save the sharpened image
    img_id = f"sharp_{ct_row.hashed_uid}"
    
    try:
        print(f"Saving image with ID: {img_id}")
        pydicer.add_image_object(
            img_sharp,
            img_id,
            ct_row.modality,
            ct_row.patient_id,
            for_uid=ct_row.for_uid
        )
    except SystemError:
        print(f"Image {img_id} already exists!")

Now we can visualise the images and produce snapshots once more. Find the sharpened images in the
working directory. Can you see the difference between the sharpened CT and the original?

In [None]:
pydicer.visualise.visualise(force=False)