# Working with Data

Here we present some useful tips & tricks to help you work with data which has been converted
using PyDicer. As you will see, working with data in PyDicer is heavily oriented around DataFrames
provided by the Pandas library. If you aren't familiar with Pandas, we recommend working through 
the [Pandas Getting Started Tutorials](https://pandas.pydata.org/docs/getting_started/index.html).

In [None]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path

from pydicer.utils import (
    fetch_converted_test_data,
    load_object_metadata,
    determine_dcm_datetime,
    read_simple_itk_image
)

## Setup PyDicer

Here we load the LCTSC data which has already been converted. This is downloaded into the
`testdata_lctsc` directory. We also initialise a `PyDicer` object.

In [None]:
working_directory = fetch_converted_test_data("./testdata_lctsc", dataset="LCTSC")

pydicer = PyDicer(working_directory)

## Read Converted Data

To obtain a DataFrame of the converted data, call the `read_converted_data` method of the
`PyDicer` object.

In [None]:
df = pydicer.read_converted_data()
df
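
Because the result is an ordinary Pandas DataFrame, the usual filtering and grouping operations
apply. The sketch below uses a small hand-built stand-in DataFrame so that it runs without any
converted data. The values are made up, and the `patient_id`, `modality` and `hashed_uid` column
names are assumed to match those returned by `read_converted_data`.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pydicer.read_converted_data().
# All values are invented for illustration only.
df_example = pd.DataFrame(
    {
        "patient_id": ["PAT_A", "PAT_A", "PAT_B", "PAT_B"],
        "modality": ["CT", "RTSTRUCT", "CT", "RTSTRUCT"],
        "hashed_uid": ["a1b2c3", "d4e5f6", "0a1b2c", "3d4e5f"],
    }
)

# Select only the rows describing CT images
df_ct = df_example[df_example.modality == "CT"]

# Count the number of objects of each modality per patient
counts = df_example.groupby(["patient_id", "modality"]).size()

print(df_ct)
print(counts)
```

The same expressions should work on the real `df`, provided the column names match.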

## Iterating Over Objects

If you want to perform some operation on (for example) all images in your dataset, you can iterate
over each image row like this. Within each iteration we load the image as a `SimpleITK` image
(just for demonstration purposes).

In [None]:
for idx, ct_row in df[df.modality=="CT"].iterrows():

    print(f"Loading image with hashed UID: {ct_row.hashed_uid}...", end="")

    img = read_simple_itk_image(ct_row)

    print(" Complete")

## Loading Object Metadata

The metadata from the DICOM headers is stored by PyDicer and can be easily loaded using the
`load_object_metadata` function. Simply pass a row from the converted DataFrame into this function
to load the metadata for that object.

In [None]:
first_row = df.iloc[0]
ds = load_object_metadata(first_row)
ds

### Keep only specific header tags

Loading object metadata can be slow, especially when doing this for many objects at once. If you
know which header attributes you need, specify them with the `keep_tags` argument. This speeds up
loading metadata significantly.

Here we load only the `StudyDate`, `PatientSex` and `Manufacturer`.

> Tip: These tags are defined by the DICOM standard, and we use `pydicom` to load this metadata. In
> fact, the metadata returned is a `pydicom` Dataset. Check out the [`pydicom` documentation](https://pydicom.github.io/pydicom/dev/old/pydicom_user_guide.html) for more information.

In [None]:
ds = load_object_metadata(first_row, keep_tags=["StudyDate", "PatientSex", "Manufacturer"])
ds

### Loading metadata for all data objects

You can use the Pandas `apply` function to load metadata for all rows and add it as a column to the
converted DataFrame.

In [None]:
df["StudyDescription"] = df.apply(
    lambda row: load_object_metadata(row, keep_tags=["StudyDescription"]).StudyDescription,
    axis=1,
)
df

### Determine Date of Object

There are several DICOM header tags which could define the date of an object. The DICOM standard
doesn't require all of these to be set within the metadata. PyDicer provides the 
`determine_dcm_datetime` function to extract the date from the DICOM header.

In [None]:
ds = load_object_metadata(first_row)
obj_datetime = determine_dcm_datetime(ds)
print(obj_datetime)
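
To illustrate why a helper like this is useful, here is a minimal sketch of the general idea:
try several date attributes in turn and use the first one that is populated. This is not PyDicer's
implementation. The function name `first_available_datetime`, the order of the candidate tags and
the use of a plain `dict` in place of a `pydicom` Dataset are all assumptions made for
illustration.

```python
from datetime import datetime

# Date/time attribute pairs to try, in order of preference. This ordering is an
# assumption made for illustration, not necessarily the one PyDicer uses.
CANDIDATE_TAGS = [
    ("SeriesDate", "SeriesTime"),
    ("StudyDate", "StudyTime"),
    ("InstanceCreationDate", "InstanceCreationTime"),
]


def first_available_datetime(header):
    """Return a datetime from the first populated date attribute, or None.

    `header` is a plain dict standing in for the DICOM metadata (hypothetical).
    """
    for date_tag, time_tag in CANDIDATE_TAGS:
        date_str = header.get(date_tag)
        if not date_str:
            continue  # attribute missing or empty, so try the next candidate

        # DICOM dates are YYYYMMDD; times are HHMMSS with optional fractional
        # seconds. Drop the fraction and pad short times (e.g. "HH") with zeros.
        time_str = (header.get(time_tag) or "000000").split(".")[0]
        time_str = time_str.ljust(6, "0")[:6]
        return datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")

    return None


# SeriesDate is empty here, so the sketch falls back to StudyDate/StudyTime
example_header = {"SeriesDate": "", "StudyDate": "20040105", "StudyTime": "101530.25"}
result = first_available_datetime(example_header)
print(result)
```

In practice, prefer `determine_dcm_datetime`, which operates directly on the metadata loaded by
`load_object_metadata`.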