# Evaluation

This notebook guides you through generating inpainting results that can be uploaded to Synapse. It complements the [Synapse wiki page on submissions](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349) for our inpainting challenge, so make sure to read that Synapse page as well.

We also show how the evaluation script that runs on the Synapse servers works. All examples use our baseline model. To have t1n images to evaluate against in the chapter ```Server-Side Evaluation Script```, we use the **training dataset** instead of the **validation dataset**. For your submission during the validation phase, you need to use the actual validation set: uncomment the respective lines below (marked with ```TODO```) and comment out the training paths.

*Notebook runtime:* Roughly 20 minutes (10 generation, 10 evaluation) using 16 cores and 1 GPU.


## Section Overview
    
- **Inpainting Submissions - Example**:  
    Create inpainting results that can be uploaded to the Synapse server during the validation phase. You will probably want to **modify this script for your specific approach/model**.
    
- **(Optional) Server-Side Evaluation Script**:  
    Demonstration of the evaluation process on the Synapse server.


*Note:* For more details/context on the evaluation metrics, see our website\* or [section 2.5 "Performance Evaluation"](https://arxiv.org/abs/2305.08992) of our challenge manuscript. 
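
As a rough illustration of how two of the reported metrics relate: for intensities scaled to a range of 1, the textbook definition PSNR = 10·log10(1/MSE) reproduces the PSNR/MSE pairs shown in the example output further down in this notebook. The helper name `psnr_from_mse` and its `data_range` parameter are hypothetical; this is a minimal sketch of the standard formula, not the challenge's metric implementation.

```python
import math


def psnr_from_mse(mse: float, data_range: float = 1.0) -> float:
    """Textbook PSNR in dB for a given mean squared error.

    Assumption: intensities were scaled to [0, data_range] before the MSE
    was computed.
    """
    if mse < 0:
        raise ValueError("MSE cannot be negative")
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(data_range ** 2 / mse)


# MSE taken from the example output of this notebook (BraTS-GLI-01095-000)
print(f"{psnr_from_mse(0.01687296):.2f} dB")
```

The printed value agrees with the PSNR of roughly 17.73 reported for that case, which suggests (but does not prove) that the server-side metrics follow this relation.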


## Inpainting Submissions - Example

We use our baseline model to generate inference results and save them in a properly formatted output folder.

**Note:** You might want to use the [actual challenge validation set](https://www.synapse.org/#!Synapse:syn51684975).
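
Instead of commenting lines in and out by hand, the training/validation switch could also be expressed as a single variable. The sketch below assumes the folder names used in this notebook; `challenge_paths` and `PREFIX` are hypothetical names, not part of the baseline repository.

```python
from pathlib import Path

# Folder-name prefix used throughout this notebook
PREFIX = "ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge"


def challenge_paths(phase: str, root: Path = Path("..")):
    """Return (dataset_path, results_folder) for "Training" or "Validation"."""
    if phase not in ("Training", "Validation"):
        raise ValueError(f"Unknown phase: {phase!r}")
    dataset_path = root / f"{PREFIX}-{phase}"
    results_folder = root / f"{PREFIX}-{phase}-Results"
    return dataset_path, results_folder


# Switch to "Validation" for a validation phase submission
dataset_path, resultsFolder = challenge_paths("Training")
print(dataset_path)
print(resultsFolder)
```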


In [2]:
# Make other sub-repositories available
import sys
from pathlib import Path

repoRoot = Path(".").absolute().parent
sys.path.append(str(repoRoot))
baselineRoot = Path(".").absolute().parent.joinpath("baseline")
sys.path.append(str(baselineRoot))

# Imports
import torch
import numpy as np
import nibabel as nib
from tqdm import tqdm
from baseline.baseline_utils import get_latest_Checkpoint  # to load last checkpoint
from baseline.dataset3D import Dataset_Inference
from baseline.train_Pix2Pix3D import Pix2Pix3D

# Define output folder
resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training-Results")
# resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results") #TODO: comment in for validation phase submission!
resultsFolder.mkdir(exist_ok=True)

# Get validation dataset 
dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training")
#dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation") #TODO: comment in for validation phase submission!
crop_shape = (208, 208, 144)  # the cuboid our model works on.
seed = 2023
train_p, val_p = 0.8, 0.2  # original split: 0.8 / 0.2; if you use the actual validation set, no split is required!
torch.manual_seed(seed)
dataset = Dataset_Inference(dataset_path, crop_shape=crop_shape, center_on_mask=True)
train_set, validation_set = torch.utils.data.random_split(dataset, [train_p, val_p])
dataset = validation_set  # run inference only on the held-out 20% split

# Get latest model checkpoint
modelName = "Pix2Pix3D"
latest_checkpoint = get_latest_Checkpoint(modelName, version="*", log_dir_name=repoRoot.joinpath("baseline").joinpath("lightning_logs"))

if latest_checkpoint is None:
    raise UserWarning("No latest model found!")

# Load your model
gpus = [0]
model = Pix2Pix3D.load_from_checkpoint(latest_checkpoint, map_location=torch.device("cuda"))
model.eval()  # Evaluation mode: makes dropout and normalization layers deterministic
model.cuda()  # Move to GPU

# Do inference for all samples in the dataset
for sample in tqdm(dataset):
    voided_image = sample["voided_image"].unsqueeze(0)  # add batch dimension (1, X, Y, Z) -> (1, 1, X, Y, Z)
    mask = sample["mask"].unsqueeze(0)

    with torch.no_grad():
        prediction = model.forward(voided_image.cuda(), mask.cuda())
        prediction = prediction.cpu().numpy()[0]  # remove batch
        result, img = Dataset_Inference.get_result_image(prediction, sample)

    # Save to output folder
    sampleFolderName = Path(sample["t1n_voided_path"]).parent.name  # e.g. "BraTS-GLI-00006-000"
    nib.save(img, resultsFolder.joinpath(f"{sampleFolderName}-t1n-inference.nii.gz"))

100%|██████████| 250/250 [04:24<00:00,  1.06s/it]


In [3]:
# List output files
print(resultsFolder)
for fileName in list(resultsFolder.glob("*")):
    print(f"- {fileName.name}")

../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training-Results
- BraTS-GLI-01095-000-t1n-inference.nii.gz
- BraTS-GLI-00025-000-t1n-inference.nii.gz
- BraTS-GLI-00583-000-t1n-inference.nii.gz
- BraTS-GLI-01335-000-t1n-inference.nii.gz
- BraTS-GLI-01039-000-t1n-inference.nii.gz
- BraTS-GLI-00525-000-t1n-inference.nii.gz
- BraTS-GLI-00311-000-t1n-inference.nii.gz
- BraTS-GLI-00074-000-t1n-inference.nii.gz
- BraTS-GLI-00807-000-t1n-inference.nii.gz
- BraTS-GLI-00139-000-t1n-inference.nii.gz
- BraTS-GLI-00045-001-t1n-inference.nii.gz
- BraTS-GLI-00642-000-t1n-inference.nii.gz
- BraTS-GLI-01145-000-t1n-inference.nii.gz
- BraTS-GLI-00122-000-t1n-inference.nii.gz
- BraTS-GLI-00625-000-t1n-inference.nii.gz
- BraTS-GLI-00736-000-t1n-inference.nii.gz
- BraTS-GLI-01266-000-t1n-inference.nii.gz
- BraTS-GLI-00282-000-t1n-inference.nii.gz
- BraTS-GLI-01459-000-t1n-inference.nii.gz
- BraTS-GLI-01526-000-t1n-inference.nii.gz
- BraTS-GLI-01406-000-t1n-inference.nii.gz
- BraTS-GLI-00608-001-t1n-inf

Check that the output file names listed above match the naming convention from Synapse (see the section [Create your Segmentation Files](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349)).

Also note that your output files have to be **full-size images** with dimensions 240x240x155 (the native BraTS volume size), not the cropped cuboid the model works on.
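
The file-name check suggested above can be sketched with a regular expression. The pattern below is an assumption inferred from the file names listed in the output above (five-digit case ID, three-digit time point); the Synapse wiki remains the authoritative source for the naming convention, and `find_misnamed` and `check_results_folder` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the listed output files:
# "BraTS-GLI-" + 5-digit case ID + "-" + 3-digit time point + "-t1n-inference.nii.gz"
RESULT_NAME = re.compile(r"BraTS-GLI-\d{5}-\d{3}-t1n-inference\.nii\.gz")


def find_misnamed(file_names):
    """Return the names that do not fully match the assumed pattern."""
    return [name for name in file_names if not RESULT_NAME.fullmatch(name)]


def check_results_folder(folder):
    """Apply the check to every entry of a results folder."""
    return find_misnamed(path.name for path in Path(folder).glob("*"))


names = [
    "BraTS-GLI-01095-000-t1n-inference.nii.gz",
    "BraTS-GLI-00045-001-t1n-inference.nii.gz",
    "BraTS-GLI-00025-000-t1n.nii.gz",  # missing the "-inference" suffix
]
print(find_misnamed(names))
```

In the notebook, `check_results_folder(resultsFolder)` would return an empty list if every file follows the assumed pattern.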

## (Optional) Server-Side Evaluation Script

After you have uploaded your full-size inpainted images, our evaluation script compares your results against the ground truth.
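
Conceptually, the comparison is restricted to an evaluation volume: the evaluation cell in this chapter passes the healthy mask to the metric function for that purpose. The sketch below illustrates the idea of a masked error on flat lists in plain Python. It is an assumption about the general principle, not the implementation of `generate_metrics`, and it leaves out the intensity normalization; `masked_mse` is a hypothetical name.

```python
def masked_mse(prediction, target, mask):
    """Mean squared error over the positions where mask is True.

    All three arguments are flat sequences of equal length.
    """
    if not (len(prediction) == len(target) == len(mask)):
        raise ValueError("prediction, target and mask must have equal length")
    squared_errors = [
        (p - t) ** 2 for p, t, m in zip(prediction, target, mask) if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no voxels")
    return sum(squared_errors) / len(squared_errors)


prediction = [0.2, 0.5, 0.9, 0.4]
target = [0.2, 0.7, 0.5, 0.4]
mask = [False, True, True, False]  # only the two middle voxels are evaluated
print(masked_mse(prediction, target, mask))
```

Voxels outside the mask do not influence the result, so errors in the untouched parts of the image would not change the score in this simplified picture.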

The following code is used:
    

In [4]:
from tqdm import tqdm
from pathlib import Path
import nibabel as nib
import torch
import numpy as np
from inpainting.challenge_metrics_2023 import generate_metrics

In [5]:
# Get submission metadata
teamName = "YourTeamName"

# Task dataset (on synapse server)
dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training")
#dataset_path = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation") #TODO: comment in for validation phase submission!
solutionFilePaths = list(dataset_path.rglob("**/BraTS-GLI-*-*-t1n.nii.gz"))
print(f"Task: {dataset_path}")
print(f"\tlen: {len(solutionFilePaths)}")

# Solution dataset (participant upload); must be the folder written by the inference cell above
resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training-Results")
# resultsFolder = Path("../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Validation-Results") #TODO: comment in for validation phase submission!
resultFilePaths = list(resultsFolder.rglob("**/BraTS-GLI-*-*-t1n-inference.nii.gz"))
print(f"Solution: {resultsFolder}")
print(f"\tlen: {len(resultFilePaths)}")


# Evaluation
performance = {}
initialized = False
for resultFilePath in resultFilePaths:
    folderName = resultFilePath.name[:19]  # first 19 characters = case ID, e.g. "BraTS-GLI-00006-000"
    folderPath = dataset_path.joinpath(folderName)
    if not folderPath.exists():
        print(f'Result with ID "{folderName}" has no corresponding solution folder {folderPath}')
    else:
        # Read result (prediction)
        result_img = nib.load(resultFilePath)
        result = torch.Tensor(result_img.get_fdata()).unsqueeze(0)

        # Healthy mask (evaluation volume)
        mask_path = folderPath.joinpath(f"{folderName}-mask-healthy.nii.gz")
        mask_img = nib.load(mask_path)
        mask_healthy = torch.Tensor(mask_img.get_fdata()).bool().unsqueeze(0)

        # Reference (ground truth)
        t1n_path = dataset_path.joinpath(folderName).joinpath(f"{folderName}-t1n.nii.gz")
        t1n_img = nib.load(t1n_path)
        t1n = torch.Tensor(t1n_img.get_fdata()).unsqueeze(0)

        # Normalization Tensor (on what basis shall be normalized? On the model input!)
        t1n_voided_path = dataset_path.joinpath(folderName).joinpath(f"{folderName}-t1n-voided.nii.gz")
        t1n_voided_img = nib.load(t1n_voided_path)
        t1n_voided = torch.Tensor(t1n_voided_img.get_fdata()).unsqueeze(0)

        # Compute metrics
        metrics_dict = generate_metrics( #expected Tensor dimension: 1 x 255 x 255 x 
            prediction=result,
            target=t1n,
            mask=mask_healthy,
            normalization_tensor= t1n_voided #former: t1n * ~mask_healthy
            )
            
        # Initialize if necessary
        if not initialized:
            print("FolderName\t", end="")
            performance["folderName"] = []
            for metric_name in metrics_dict.keys():
                print(f"\t\t{metric_name}",end="")
                performance[metric_name] = []
            print()
            initialized = True

        #Add data to performance dict and print it
        performance["folderName"].append(folderName)
        print(f'{folderName}\t',end="")
        for metric_name in metrics_dict.keys():
            performance[metric_name].append(metrics_dict[metric_name])
            print(f"\t{metrics_dict[metric_name]:.8f}",end="")
        print("\n")



Task: ../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training
	len: 1251
Solution: ../ASNR-MICCAI-BraTS2023-Local-Synthesis-Challenge-Training-Results
	len: 250
FolderName		SSIM		PSNR		MSE
BraTS-GLI-01095-000	0.72735530	17.72808456	0.01687296
BraTS-GLI-00025-000	0.85241127	23.09199715	0.00490682
BraTS-GLI-00583-000	0.90313363	23.52370453	0.00444252
BraTS-GLI-01335-000	0.57686007	17.47027206	0.01790493
BraTS-GLI-01039-000	0.74069595	18.89037704	0.01291107
BraTS-GLI-00525-000	0.86155689	20.97226524	0.00799417
BraTS-GLI-00311-000	0.70442790	18.87165070	0.01296686
BraTS-GLI-00074-000	0.84550190	21.70802879	0.00674834
BraTS-GLI-00807-000	0.66356975	16.82251740	0.02078491
BraTS-GLI-00139-000	0.85885954	23.10871506	0.00488797
BraTS-GLI-00045-001	0.82188219	23.12951088	0.00486462
BraTS-GLI-00642-000	0.93095821	21.14937782	0.00767471
BraTS-GLI-01145-000	0.99700242	28.39275169	0.00144785
BraTS-GLI-00122-000	0.82608730	19.31245804	0.01171532
BraTS-GLI-00625-000	0.99057257	22.88106346	0.005151

BraTS-GLI-00185-000	0.51425904	14.51819038	0.03533303
BraTS-GLI-00103-000	0.81746924	20.06198502	0.00985829
BraTS-GLI-01083-000	0.93756139	22.21614456	0.00600324
BraTS-GLI-00147-000	0.92037809	17.16148949	0.01922432
BraTS-GLI-01395-000	0.82098758	18.30490494	0.01477438
BraTS-GLI-00511-000	0.90046769	19.11684227	0.01225507
BraTS-GLI-01153-000	0.75516993	16.25734520	0.02367366
BraTS-GLI-00329-000	0.97852129	21.38629150	0.00726726
BraTS-GLI-00134-000	0.66261166	13.31556129	0.04660621
BraTS-GLI-00127-000	0.78547186	20.12664032	0.00971261
BraTS-GLI-01158-000	0.68430358	12.88384819	0.05147722
BraTS-GLI-00310-000	0.64554858	16.88127708	0.02050558
BraTS-GLI-00788-000	0.75300717	20.75441360	0.00840540
BraTS-GLI-01454-000	0.95810485	26.98786354	0.00200085
BraTS-GLI-00201-000	0.79301792	20.29152107	0.00935078
BraTS-GLI-00526-000	0.89324188	16.51476288	0.02231123
BraTS-GLI-01096-000	0.76074880	19.44641495	0.01135948
BraTS-GLI-01496-000	0.68914717	18.17435265	0.01522525
BraTS-GLI-01517-000	0.981850

In [6]:
# Overall statistics:

#Header
print(f"Team name\t\tsamples\t\t",end="")
for metric_name in performance.keys():
    if(metric_name == "folderName"):
        continue
    print(f"{metric_name}\t\t\t",end="")
print()

#Team statistics
print(f"{teamName}\t{len(performance['folderName'])}\t",end="")  # team name, not the first sample ID
teamStats = {}
for metric_name in performance.keys():
    if(metric_name == "folderName"):
        continue
    stat = (np.mean(performance[metric_name]), np.median(performance[metric_name]), np.std(performance[metric_name]))
    print(f"\t{stat[0]:.5f} ±{stat[2]:.5f}",end="")
    teamStats[metric_name] = {"mean": stat[0], "median": stat[1], "std":stat[2]}

print()


Team statistics	Samples	SSIM			PSNR			MSE
YourTeamName	250	0.78754 ±0.13514	19.25725 ±3.23972	0.01513 ±0.01047


### Creating output files for the web frontend

To display these results nicely, the web frontend expects them in either JSON or YAML format.
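
One caveat when serializing: the statistics computed above are NumPy scalars, and `yaml.dump` writes such objects with `!!python/object/apply:numpy...` tags, as visible in the YAML output at the end of this notebook. A possible workaround is to convert them to built-in Python types first. The sketch below relies on the `.item()` method that NumPy scalars provide; `to_builtin` is a hypothetical helper, and `FakeScalar` is only a stand-in that keeps the example free of dependencies.

```python
import json


def to_builtin(value):
    """Recursively convert NumPy-style scalars to built-in Python types."""
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    if hasattr(value, "item") and callable(value.item):
        return value.item()  # e.g. np.float64 -> float
    return value


class FakeScalar:
    """Stand-in for a NumPy scalar such as np.float64 (illustration only)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


stats = {"SSIM": {"mean": FakeScalar(0.78754), "std": FakeScalar(0.13514)}}
print(json.dumps(to_builtin(stats), indent=4))
```

Applied to the notebook, passing `to_builtin(teamDict)` to `yaml.dump` should yield plain numbers instead of the tagged NumPy objects. Note that multi-element arrays would need `.tolist()` instead of `.item()`.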

In [7]:
## Convert to json
import json


# Team dictionary
teamDict = {
    "name": teamName,
    "samples": len(performance['folderName']),
    "metrics": teamStats
}

json_str = json.dumps(teamDict, indent=4)
print(json_str)

with open(f"evaluation_{teamName}.json", "w") as f:  # "w" overwrites; appending would produce invalid JSON on re-runs
    f.write(json_str)

{
    "name": "YourTeamName",
    "samples": 250,
    "metrics": {
        "SSIM": {
            "mean": 0.7875433679819107,
            "median": 0.7927334308624268,
            "std": 0.13513599872791573
        },
        "PSNR": {
            "mean": 19.257250980377197,
            "median": 18.806931495666504,
            "std": 3.2397190395537123
        },
        "MSE": {
            "mean": 0.015125259269494564,
            "median": 0.013161749113351107,
            "std": 0.010465739155855485
        }
    },
    "sample_details": [
        [
            "BraTS-GLI-01095-000",
            17.728084564208984,
            0.016872961074113846,
            0.7273553013801575
        ],
        [
            "BraTS-GLI-00025-000",
            23.091997146606445,
            0.0049068215303123,
            0.8524112701416016
        ],
        [
            "BraTS-GLI-00583-000",
            23.523704528808594,
            0.004442520439624786,
            0.9031336307525635
    

In [8]:
import yaml

yaml_str = yaml.dump(teamDict)
print(yaml_str)
with open(f"evaluation_{teamName}.yaml", "w") as f:  # "w" overwrites instead of appending duplicate entries on re-runs
    f.write(yaml_str)

metrics:
  MSE:
    mean: !!python/object/apply:numpy.core.multiarray.scalar
    - &id001 !!python/object/apply:numpy.dtype
      args:
      - f8
      - false
      - true
      state: !!python/tuple
      - 3
      - <
      - null
      - null
      - null
      - -1
      - -1
      - 0
    - !!binary |
      7nw/7/35jj8=
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAEIz0ij8=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      yRE5uw9vhT8=
  PSNR:
    mean: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      gZVDM9tBM0A=
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAEJPOMkA=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      0OTY0PHqCUA=
  SSIM:
    mean: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      fT81Jo