# AutoPET Dataset

To download the dataset, please visit the following link: [AutoPET Dataset](https://it-portal.med.uni-muenchen.de/autopet/Autopet_v1.1.tgz)

The dataset is provided in the nnUNet format and contains resampled FDG and PSMA images as NIfTI files. It also includes the output of the nnUNet fingerprint extractor and a splits file that can be used to train nnUNet models. The training set comprises 1,014 FDG studies (900 patients) and 597 PSMA studies (378 patients).

```text
|--- imagesTr
     |--- tracer_patient1_study1_0000.nii.gz  (CT image resampled to PET)
     |--- tracer_patient1_study1_0001.nii.gz  (PET image in SUV)
     |--- ...
|--- labelsTr
     |--- tracer_patient1_study1.nii.gz       (manual annotations of tumor lesions)
|--- dataset.json                             (nnUNet dataset description)
|--- dataset_fingerprint.json                 (nnUNet dataset fingerprint)
|--- splits_final.json                        (reference 5-fold split)
|--- psma_metadata.csv                        (metadata CSV for PSMA)
|--- fdg_metadata.csv                         (original metadata CSV for FDG)
```
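
The naming convention in the tree above can be checked programmatically. The following is a minimal sketch, not part of the original notebook: the helper `parse_image_name` and the regular expression are hypothetical, and they assume that neither the tracer nor the patient identifier contains an underscore.

```python
import re
from typing import NamedTuple, Optional

# Image files follow <tracer>_<patient>_<study>_<channel>.nii.gz, where the
# 4-digit channel suffix is 0000 for CT and 0001 for PET (see the tree above).
# Assumption: tracer and patient identifiers contain no underscores.
IMAGE_NAME = re.compile(
    r"^(?P<tracer>[^_]+)_(?P<patient>[^_]+)_(?P<study>.+)_(?P<channel>\d{4})\.nii\.gz$"
)


class ImageName(NamedTuple):
    tracer: str
    patient: str
    study: str
    modality: str


def parse_image_name(filename: str) -> Optional[ImageName]:
    """Split an imagesTr file name into its parts; return None if it does not match."""
    match = IMAGE_NAME.match(filename)
    if match is None:
        return None
    modality = {"0000": "CT", "0001": "PET"}.get(match["channel"], "unknown")
    return ImageName(match["tracer"], match["patient"], match["study"], modality)


print(parse_image_name("fdg_patient1_study1_0001.nii.gz"))
```

Label files carry no channel suffix, so the helper returns `None` for them, which makes it usable as a quick filter when listing a directory.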

## Filter Lymphoma Cases

In [None]:
import pandas as pd
import os
from monai.transforms import LoadImaged, ConcatItemsd, EnsureChannelFirstD, Compose, SaveImageD
from collections import defaultdict
from pathlib import Path
import numpy as np
import shutil

In [None]:
dataset_dir = "2024-05-10_Autopet_v1.1"

In [3]:
df = pd.read_csv(os.path.join(dataset_dir,"fdg_metadata.csv"))

In [4]:
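# Map each subject key (e.g. "fdg_<id>") to its diagnosis.
patient_classes = {}

for _, row in df.iterrows():
    # "Subject ID" has the form "<prefix>_<id>"; keep only the <id> part.
    patient_classes['fdg_' + row['Subject ID'].split("_")[1]] = row['diagnosis']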

In [None]:
# Invert the mapping: diagnosis -> list of subject keys.
grouped = defaultdict(list)

for key, cls in patient_classes.items():
    grouped[cls].append(key)

grouped_patient_classes = dict(grouped)

In [6]:
lymphoma_subjects = grouped_patient_classes['LYMPHOMA']
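
The filtering steps above can be condensed into one function. This is a sketch using only the standard library; the function name `group_subjects_by_diagnosis` and the sample rows are made up for illustration and only mimic the two columns the notebook reads (`Subject ID` and `diagnosis`).

```python
import csv
import io
from collections import defaultdict

# Made-up rows; real values come from fdg_metadata.csv.
SAMPLE_CSV = """Subject ID,diagnosis
PETCT_aaa111,LYMPHOMA
PETCT_bbb222,MELANOMA
PETCT_aaa111,LYMPHOMA
PETCT_ccc333,LYMPHOMA
"""


def group_subjects_by_diagnosis(csv_text, tracer="fdg"):
    """Return {diagnosis: sorted list of '<tracer>_<id>' subject keys}."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "Subject ID" has the form "<prefix>_<id>"; keep only the <id> part.
        subject = f"{tracer}_{row['Subject ID'].split('_')[1]}"
        groups[row["diagnosis"]].add(subject)
    return {diagnosis: sorted(subjects) for diagnosis, subjects in groups.items()}


groups = group_subjects_by_diagnosis(SAMPLE_CSV)
print(groups["LYMPHOMA"])
```

Collecting subjects in a set first means a patient with several studies appears only once per diagnosis.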

## Create Decathlon Dataset with Lymphoma Cases

The CT and PET images of each study are also combined into a single 4-D volume.

In [7]:
def subfiles(folder, prefix=None, suffix=None, join=True, sort=True):
    files = [f for f in os.listdir(folder) if os.path.isfile(os.path.join(folder, f))]
    if prefix is not None:
        files = [f for f in files if f.startswith(prefix)]
    if suffix is not None:
        files = [f for f in files if f.endswith(suffix)]
    if sort:
        files.sort()
    if join:
        files = [os.path.join(folder, f) for f in files]
    return files

In [10]:
image_files = subfiles(os.path.join(dataset_dir,"imagesTr"), join=False)

In [11]:
patient_ids = [image_file[:-len("_0000.nii.gz")] for image_file in image_files]

In [None]:
patient_ids = np.unique(patient_ids)
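
As a small illustration of the two steps above, the sketch below derives unique case identifiers from file names and maps a case back to its subject key. The helper names and the file names are hypothetical; the only assumption taken from the dataset layout is that every image name ends in a suffix of the same length as `_0000.nii.gz`.

```python
SUFFIX_LEN = len("_0000.nii.gz")  # same length for the CT (_0000) and PET (_0001) suffix


def case_ids_from_filenames(filenames):
    """Strip the channel suffix and return the sorted unique case identifiers."""
    return sorted({name[:-SUFFIX_LEN] for name in filenames})


def subject_of(case_id):
    """Return '<tracer>_<patient>' for a '<tracer>_<patient>_<study>' case identifier."""
    return "_".join(case_id.split("_")[:2])


# Made-up file names following the imagesTr convention.
names = [
    "fdg_p1_s1_0000.nii.gz",
    "fdg_p1_s1_0001.nii.gz",
    "fdg_p1_s2_0000.nii.gz",
    "fdg_p1_s2_0001.nii.gz",
]
cases = case_ids_from_filenames(names)
print(cases, [subject_of(c) for c in cases])
```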

In [None]:
output_dir = Path("Data/Task100_AutoPET_Lymphoma")

imgs_output_dir = output_dir / "imagesTr"
labels_output_dir = output_dir / "labelsTr"
test_imgs_output_dir = output_dir / "imagesTs"

In [14]:
imgs_output_dir.mkdir(parents=True, exist_ok=True)
labels_output_dir.mkdir(parents=True, exist_ok=True)
test_imgs_output_dir.mkdir(parents=True, exist_ok=True)

Create the transform that concatenates the PET and CT images and saves the 4-D volume.

In [17]:
transform = Compose([
    LoadImaged(keys=["CT", "PET"]),
    EnsureChannelFirstD(keys=["CT", "PET"]),
    ConcatItemsd(keys=["CT", "PET"], name="image"),
    SaveImageD(keys=["image"], output_dir=imgs_output_dir, output_postfix="", separate_folder=False),
])
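
To make the shape of the combined volume concrete, here is a NumPy-only sketch of the channel concatenation. It does not call MONAI; it assumes a channel-first layout with concatenation along the first axis, which is what `EnsureChannelFirstD` followed by `ConcatItemsd` is expected to produce with default settings. The array sizes are arbitrary placeholders.

```python
import numpy as np

# Placeholder 3-D volumes standing in for a CT and a PET image on the same grid.
ct = np.zeros((4, 4, 3), dtype=np.float32)
pet = np.ones((4, 4, 3), dtype=np.float32)


def to_channel_first(volume):
    """Add a leading channel axis: (H, W, D) -> (1, H, W, D)."""
    return volume[np.newaxis, ...]


# Concatenate along the channel axis: channel 0 is CT, channel 1 is PET.
image = np.concatenate([to_channel_first(ct), to_channel_first(pet)], axis=0)
print(image.shape)  # (2, 4, 4, 3)
```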

In [None]:

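for patient_id in patient_ids:
    # Case names look like <tracer>_<patient>_<study>; the first two parts identify the subject.
    if "_".join(patient_id.split("_")[:2]) in lymphoma_subjects:
        # Resolve input paths against dataset_dir so the cell does not depend on the working directory.
        transform({
            "CT": os.path.join(dataset_dir, "imagesTr", f"{patient_id}_0000.nii.gz"),
            "PET": os.path.join(dataset_dir, "imagesTr", f"{patient_id}_0001.nii.gz"),
        })
        shutil.copy(os.path.join(dataset_dir, "labelsTr", f"{patient_id}.nii.gz"), Path(labels_output_dir).joinpath(f"{patient_id}.nii.gz"))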