# Evaluation

This notebook guides you through generating inpainting results that can be uploaded to Synapse. It complements the [Synapse wiki page on submissions](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349) for our inpainting challenge; make sure to read that page as well.

We also show how the evaluation script that runs on the Synapse servers works. All examples use our baseline model. To have t1n images to evaluate against in the chapter ```Server-Side Evaluation Script```, we use the **training dataset** instead of the **validation dataset**. For your submission during the validation phase, you need to use the actual validation set: just uncomment the respective line below (marked with a ```TODO```).

*Notebook runtime:* Roughly 20 minutes (10 for generation, 10 for evaluation) using 16 cores and 1 GPU.


## Section Overview
    
- **Inpainting Submissions - Example**:  
    Create inpainting results that can be uploaded to the Synapse server during the validation phase. You will probably want to **modify this script for your specific approach/model**.
    
- **(Optional) Server-Side Evaluation Script**:  
    Demonstration of the evaluation process on the Synapse server.


*Note:* For more details/context on the evaluation metrics, see our website\* or [section 2.5 "Performance Evaluation"](https://arxiv.org/abs/2305.08992) of our challenge manuscript. 


## Inpainting Submissions - Example

We use our baseline model to generate inference results and save them in a properly formatted output folder.

**Note:** You might want to use the [actual challenge validation set](https://www.synapse.org/#!Synapse:syn51684975).


In [2]:
# Make other sub-repositories available
import sys
from pathlib import Path

repoRoot = Path(".").absolute().parent
sys.path.append(str(repoRoot))
baselineRoot = Path(".").absolute().parent.joinpath("baseline")
sys.path.append(str(baselineRoot))

# Imports
import torch
import numpy as np
import nibabel as nib
from tqdm import tqdm
from baseline.baseline_utils import get_latest_Checkpoint  # to load last checkpoint
from baseline.dataset3D import Dataset_Inference
from baseline.train_Pix2Pix3D import Pix2Pix3D

# Define output folder
resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results")
resultsFolder.mkdir(exist_ok=True)

# Get validation dataset 
dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training")
#dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation") #TODO: uncomment for actual validation phase submission!
crop_shape = (208, 208, 144)  # the cuboid our model works on.
seed = 2023
train_p, val_p = 0.80, 0.2  # original 0.8, 0.2; if you use the actual validation set, no split is required!
torch.manual_seed(seed)
dataset = Dataset_Inference(dataset_path, crop_shape=crop_shape, center_on_mask=True)
train_set, validation_set = torch.utils.data.random_split(dataset, [train_p, val_p])
dataset = validation_set  # run inference only on the held-out 20% split

# Get latest model checkpoint
modelName = "Pix2Pix3D"
latest_checkpoint = get_latest_Checkpoint(modelName, version="*", log_dir_name=repoRoot.joinpath("baseline").joinpath("lightning_logs"))

if latest_checkpoint is None:
    raise UserWarning("No latest model found!")

# Load your model
gpus = [0]
model = Pix2Pix3D.load_from_checkpoint(latest_checkpoint, map_location=torch.device("cuda"))
model.eval()  # Evaluation mode: disables dropout and makes normalization layers deterministic
model.cuda()  # Move to GPU

# Do inference for all samples in the dataset
for sample in tqdm(dataset):
    voided_image = sample["voided_image"].unsqueeze(0)  # add batch dimension (1, X, Y, Z) -> (1, 1, X, Y, Z)
    mask = sample["mask"].unsqueeze(0)

    with torch.no_grad():
        prediction = model.forward(voided_image.cuda(), mask.cuda())
        prediction = prediction.cpu().numpy()[0]  # remove batch
        result, img = Dataset_Inference.get_result_image(prediction, sample)

    # Save to output folder
    sampleFolderName = Path(sample["t1n_voided_path"]).parent.name  # e.g. "BraTS-GLI-00006-000"
    nib.save(img, resultsFolder.joinpath(f"{sampleFolderName}-t1n-inference.nii.gz"))

100%|██████████| 250/250 [04:24<00:00,  1.06s/it]


In [None]:
# List output files
print(resultsFolder)
for fileName in list(resultsFolder.glob("*")):
    print(f"- {fileName.name}")

../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results
- BraTS-GLI-01095-000-t1n-inference.nii.gz
- BraTS-GLI-00025-000-t1n-inference.nii.gz
- BraTS-GLI-00583-000-t1n-inference.nii.gz
- BraTS-GLI-01335-000-t1n-inference.nii.gz
- BraTS-GLI-01039-000-t1n-inference.nii.gz
- BraTS-GLI-00525-000-t1n-inference.nii.gz
- BraTS-GLI-00311-000-t1n-inference.nii.gz
- BraTS-GLI-00074-000-t1n-inference.nii.gz
- BraTS-GLI-00807-000-t1n-inference.nii.gz
- BraTS-GLI-00139-000-t1n-inference.nii.gz
- BraTS-GLI-00045-001-t1n-inference.nii.gz
- BraTS-GLI-00642-000-t1n-inference.nii.gz
- BraTS-GLI-01145-000-t1n-inference.nii.gz
- BraTS-GLI-00122-000-t1n-inference.nii.gz
- BraTS-GLI-00625-000-t1n-inference.nii.gz
- BraTS-GLI-00736-000-t1n-inference.nii.gz
- BraTS-GLI-01266-000-t1n-inference.nii.gz
- BraTS-GLI-00282-000-t1n-inference.nii.gz
- BraTS-GLI-01459-000-t1n-inference.nii.gz
- BraTS-GLI-01526-000-t1n-inference.nii.gz
- BraTS-GLI-01406-000-t1n-inference.nii.gz
- BraTS-GLI-00608-001-t1n-i

You might want to check whether the output file names reported above match the naming convention from Synapse (see Section [Create your Segmentation Files](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349)).

Also note that your output files have to be **full-size images** with dimensions 240x240x155 (the BraTS image size), not the cropped cuboid the model works on.
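
The file name check suggested above can be sketched with a regular expression. The pattern below is an assumption inferred from the file listing in this notebook; the Synapse wiki remains the authoritative source for the naming convention. The helper names `case_id` and `find_misnamed` are hypothetical and not part of the challenge code.

```python
import re

# Assumed pattern, inferred from the listing above (e.g. "BraTS-GLI-01095-000-t1n-inference.nii.gz").
RESULT_NAME = re.compile(r"^(BraTS-GLI-\d{5}-\d{3})-t1n-inference\.nii\.gz$")


def case_id(file_name):
    """Return the case ID (e.g. "BraTS-GLI-01095-000"), or None if the name does not match."""
    match = RESULT_NAME.match(file_name)
    return match.group(1) if match else None


def find_misnamed(file_names):
    """Return all file names that do not follow the assumed pattern."""
    return [name for name in file_names if case_id(name) is None]
```

With the variables from the cell above, `find_misnamed(p.name for p in resultsFolder.glob("*"))` should return an empty list. The image size can be checked in a similar spirit by comparing `nib.load(path).shape` against the required full image size.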

## (Optional) Server-Side Evaluation Script

After you have uploaded your full-size inpainted images, our evaluation script will compare your results against the ground truth.
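
To give an intuition for what such a comparison involves, here is a minimal sketch of a mean squared error and PSNR restricted to the inpainted region. It is not the server implementation (that lives in `compute_metrics` from `evaluation_utils`); the function name `masked_mse_psnr`, the `data_range` parameter, and the assumption that intensities are scaled to at most `data_range` are illustrative choices. SSIM is omitted for brevity.

```python
import numpy as np


def masked_mse_psnr(prediction, target, mask, data_range=1.0):
    """Return (MSE, PSNR in dB) computed only over voxels where mask is True.

    Assumes `data_range` is the maximum possible intensity; this is an
    illustrative assumption, not necessarily the setting used on the server.
    """
    diff = prediction[mask] - target[mask]
    mse = float(np.mean(diff ** 2))
    psnr = float("inf") if mse == 0 else float(10 * np.log10(data_range ** 2 / mse))
    return mse, psnr


# Toy example: a 4x4x4 volume with a 2x2x2 masked cube
target = np.zeros((4, 4, 4))
prediction = target.copy()
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True
prediction[mask] = 0.1  # constant error of 0.1 inside the mask

mse, psnr = masked_mse_psnr(prediction, target, mask)  # mse is about 0.01, psnr about 20 dB
```

Restricting the computation to the mask means that voxels outside the inpainted region do not influence the score in this sketch.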

The following code is used:
    

In [None]:
from tqdm import tqdm
from pathlib import Path
import nibabel as nib
import torch
import numpy as np
from evaluation_utils import compute_metrics  # evaluation metrics

In [None]:
# Get submission metadata
teamName = "YourTeamName"

# Task dataset (on synapse server)
dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training")
solutionFilePaths = list(dataset_path.rglob("**/BraTS-GLI-*-*-t1n.nii.gz"))
print(f"Task: {dataset_path}")
print(f"\tlen: {len(solutionFilePaths)}")

# Solution dataset (participant upload)
resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results")
resultFilePaths = list(resultsFolder.rglob("**/BraTS-GLI-*-*-t1n-inference.nii.gz"))
print(f"Solution: {resultsFolder}")
print(f"\tlen: {len(resultFilePaths)}")


# Evaluation
performance = {"folderName": [], "SSIM": [], "PSNR": [], "MSE": []}

print(f"FolderName\t\tSSIM\t\tPSNR\t\tMSE")
for resultFilePath in resultsFolder.rglob("**/BraTS-GLI-*-*-t1n-inference.nii.gz"):
    folderName = resultFilePath.name[:19]  # first 19 characters are the case ID, e.g. "BraTS-GLI-01095-000"
    folderPath = dataset_path.joinpath(folderName)
    if not folderPath.exists():
        print(f'Result with ID "{folderName}" has no corresponding solution folder {folderPath}')
    else:
        performance["folderName"].append(folderName)
        # Read result
        result_img = nib.load(resultFilePath)
        result = torch.Tensor(result_img.get_fdata()).unsqueeze(0).unsqueeze(0)
        # Inference mask
        mask_path = folderPath.joinpath(f"{folderName}-mask-healthy.nii.gz")
        mask_img = nib.load(mask_path)
        mask = torch.Tensor(mask_img.get_fdata()).bool().unsqueeze(0).unsqueeze(0)
        # Ground truth
        t1n_path = dataset_path.joinpath(folderName).joinpath(f"{folderName}-t1n.nii.gz")
        t1n_img = nib.load(t1n_path)
        t1n = torch.Tensor(t1n_img.get_fdata()).unsqueeze(0).unsqueeze(0)
        # Compute metrics
        MSE, PSNR, SSIM = compute_metrics(result, t1n, mask)
        # Scores
        performance["SSIM"].append(SSIM)
        performance["PSNR"].append(PSNR)
        performance["MSE"].append(MSE)

        print(f'{folderName}\t{performance["SSIM"][-1]:.8f}\t{performance["PSNR"][-1]:.8f}\t{performance["MSE"][-1]:.8f}')

Task: ../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training
	len: 1251
Solution: ../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results
	len: 250
FolderName		SSIM		PSNR		MSE
BraTS-GLI-01095-000	0.73533165	18.98053741	0.01264312
BraTS-GLI-00025-000	0.85468560	23.56568146	0.00439979
BraTS-GLI-00583-000	0.90652430	23.86465454	0.00410709
BraTS-GLI-01335-000	0.58858001	18.82366943	0.01311092
BraTS-GLI-01039-000	0.74511403	19.31783676	0.01170082
BraTS-GLI-00525-000	0.86806691	16.18917274	0.00625038
BraTS-GLI-00311-000	0.71633214	19.23754311	0.01191914
BraTS-GLI-00074-000	0.87123865	22.82328796	0.00343628
BraTS-GLI-00807-000	0.69997513	17.71653175	0.01691792
BraTS-GLI-00139-000	0.86283368	21.34919739	0.00490174
BraTS-GLI-00045-001	0.81980222	23.09348106	0.00490515
BraTS-GLI-00642-000	0.93146986	18.16591263	0.00619885
BraTS-GLI-01145-000	0.99699104	21.38383865	0.00131520
BraTS-GLI-00122-000	0.83625728	20.57025719	0.00876949
BraTS-GLI-00625-000	0.99056989	17.82758522	0.0050

BraTS-GLI-00103-000	0.83002311	21.87991905	0.00648647
BraTS-GLI-01083-000	0.93768609	22.70738411	0.00536120
BraTS-GLI-00147-000	0.92037821	16.27783966	0.02336437
BraTS-GLI-01395-000	0.82200992	15.90441608	0.01304252
BraTS-GLI-00511-000	0.90050954	19.23569870	0.01189767
BraTS-GLI-01153-000	0.75760996	16.96236038	0.02012630
BraTS-GLI-00329-000	0.97851264	20.35997391	0.00701343
BraTS-GLI-00134-000	0.66781825	14.05030727	0.03935222
BraTS-GLI-00127-000	0.79075748	18.15543938	0.00858160
BraTS-GLI-01158-000	0.69315177	15.21132088	0.03012089
BraTS-GLI-00310-000	0.66220158	18.40862083	0.01442573
BraTS-GLI-00788-000	0.76342779	21.54329681	0.00700923
BraTS-GLI-01454-000	0.95857537	19.75567627	0.00189115
BraTS-GLI-00201-000	0.79629630	20.28840828	0.00935749
BraTS-GLI-00526-000	0.89388758	18.82326126	0.01311215
BraTS-GLI-01096-000	0.79441059	22.87995148	0.00508816
BraTS-GLI-01496-000	0.70355493	18.76943398	0.01327568
BraTS-GLI-01517-000	0.98185021	15.70634747	0.00438873
BraTS-GLI-00791-000	0.960601

In [None]:
# Overall statistics:
samples = len(performance["folderName"])
SSIM_stat = (np.mean(performance["SSIM"]), np.median(performance["SSIM"]), np.std(performance["SSIM"]))
PSNR_stat = (np.mean(performance["PSNR"]), np.median(performance["PSNR"]), np.std(performance["PSNR"]))
MSE_stat = (np.mean(performance["MSE"]), np.median(performance["MSE"]), np.std(performance["MSE"]))

print(f"Team statistics\tSamples\tSSIM\t\t\tPSNR\t\tMSE")
print(
    f"{teamName}\t{samples}\t{SSIM_stat[0]:.5f} ±{SSIM_stat[2]:.5f}\t{PSNR_stat[0]:.5f} ±{PSNR_stat[2]:.5f}\t{MSE_stat[0]:.5f} ±{MSE_stat[2]:.5f}"
)

Team statistics	Samples	SSIM			PSNR		MSE
YourTeamName		250	0.79695 ±0.12912	18.68752 ±2.64184	0.01119 ±0.00792


### Creating output files for the web frontend

To nicely display these results, the web frontend expects them either in JSON or YAML format.
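
One caveat when serializing: summary statistics computed with numpy are numpy scalars rather than plain Python floats, and the YAML output further down in this notebook shows them being dumped as opaque `!!python/object/apply:numpy...` entries. A possible remedy, sketched here under the assumption that every non-container value is either a plain Python value or exposes an `.item()` method (as numpy scalars and single-element torch tensors do), is to convert the dictionary to built-in types first. The helper name `to_builtin` is hypothetical.

```python
import json


def to_builtin(obj):
    """Recursively convert containers and scalar wrappers to plain Python types."""
    if isinstance(obj, dict):
        return {str(key): to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(value) for value in obj]
    if hasattr(obj, "item"):  # e.g. numpy scalars, single-element torch tensors
        return obj.item()
    return obj


# Plain-Python example; tuples become lists, other values pass through unchanged
example = {"name": "YourTeamName", "sample_details": [("BraTS-GLI-01095-000", 18.98)]}
print(json.dumps(to_builtin(example), indent=4))
```

Passing `to_builtin(teamDict)` instead of `teamDict` to `json.dumps` or `yaml.safe_dump` should then yield plain numbers and lists in both formats.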

In [None]:
## Convert to json
import json

# Team details
detail_rows = []
for i in range(len(performance["folderName"])):
    row = (performance["folderName"][i], performance["PSNR"][i], performance["MSE"][i], performance["SSIM"][i])
    detail_rows.append(row)

# Team dictionary
teamDict = {
    "name": teamName,
    "samples": samples,
    "metrics": {
        "SSIM": {"mean": SSIM_stat[0], "median": SSIM_stat[1], "std": SSIM_stat[2]},
        "PSNR": {"mean": PSNR_stat[0], "median": PSNR_stat[1], "std": PSNR_stat[2]},
        "MSE": {"mean": MSE_stat[0], "median": MSE_stat[1], "std": MSE_stat[2]},
    },
    "sample_details": detail_rows,
}

json_str = json.dumps(teamDict, indent=4)
print(json_str)

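# Open with "w" so that re-running the cell overwrites the file instead of appending a second JSON document
with open(f"evaluation_{teamName}.json", "w") as f: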
    f.write(json_str)

{
    "name": "YourTeamName",
    "samples": 250,
    "metrics": {
        "PSNR": {
            "median": 18.687524795532227,
            "std": 2.6418407271642326
        },
        "MSE": {
            "median": 0.011189654003828764,
            "std": 0.007922094256614243
        },
        "SSIM": {
            "median": 0.7969546318054199,
            "std": 0.12912171779192505
        }
    },
    "sample_details": [
        [
            "BraTS-GLI-01095-000",
            18.98053741455078,
            0.012643116526305676,
            0.735331654548645
        ],
        [
            "BraTS-GLI-00025-000",
            23.56568145751953,
            0.004399785306304693,
            0.8546856045722961
        ],
        [
            "BraTS-GLI-00583-000",
            23.864654541015625,
            0.00410709111019969,
            0.9065243005752563
        ],
        [
            "BraTS-GLI-01335-000",
            18.82366943359375,
            0.01311091985553503,
        

In [None]:
import yaml

yaml_str = yaml.dump(teamDict)
print(yaml_str)
# Open with "w" so that re-running the cell overwrites the file instead of appending to it
with open(f"evaluation_{teamName}.yaml", "w") as f:
    f.write(yaml_str)

metrics:
  MSE:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - &id001 !!python/object/apply:numpy.dtype
      args:
      - f8
      - false
      - true
      state: !!python/tuple
      - 3
      - <
      - null
      - null
      - null
      - -1
      - -1
      - 0
    - !!binary |
      AAAA8Jnqhj8=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      7eD7fXU5gD8=
  PSNR:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAoAGwMkA=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      N0ojZH0iBUA=
  SSIM:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAAKeA6T8=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AFGPeQ+HwD8=
name: YourTeamName
sample_details:
- !!python/tuple
  - BraTS-GLI-01095-000
  - 18.98053741455078
  - 