# Evaluation

This notebook walks you through generating inpainting results that can be uploaded to Synapse. It complements the [Synapse wiki page on submissions](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349) for our inpainting challenge, so make sure to read that page as well.

Additionally, we show how the evaluation script that runs on the Synapse servers works. All examples use our baseline model.

*Notebook runtime:* Roughly 5 minutes using 16 cores and 1 GPU.


## Section Overview
    
- **Inpainting Submissions - Example**:  
    Create inpainting results that can be uploaded to the Synapse server during the validation phase. You will probably want to **modify this script for your specific approach/model**.
    
- **(Optional) Server-Side Evaluation Script**:  
    Demonstration of the evaluation process on the Synapse server.


*Note:* For more details and context on the evaluation metrics, see Section 2.5 "Performance Evaluation" of our challenge [manuscript](https://arxiv.org/abs/2305.08992).

## Inpainting Submissions - Example

We use our baseline model to generate inference results and save them in a properly formatted output folder.

**Note:** You might want to use the actual challenge validation set.\*

\* *todo, coming soon*


In [1]:
#Make other sub-repositories available
import sys
from pathlib import Path
repoRoot = Path(".").absolute().parent
sys.path.append(str(repoRoot))
baselineRoot = Path(".").absolute().parent.joinpath("baseline")
sys.path.append(str(baselineRoot))

#Imports
import torch
import numpy as np
import nibabel as nib
from tqdm import tqdm
from baseline.baseline_utils import get_latest_Checkpoint # to load last checkpoint
from baseline.dataset3D import Dataset_Inference
from baseline.train_Pix2Pix3D import Pix2Pix3D

#Define output folder
resultsFolder = Path("../BraTS2023_Inpainting_Results")
resultsFolder.mkdir(exist_ok=True)

#Get validation dataset (TODO: use actual validation dataset!)
dataset_path = Path("../BraTS2023_Dataset_Local_Synthesis")
crop_shape=(160, 160, 128) # the maximal cuboid your model can work on. Should be at least 208, 208, 160 !
#crop_shape=(208, 208, 160)
seed = 2023
train_p, val_p = 0.95, 0.05 # original 0.8, 0.2; if you use the actual validation set, no split is required!
torch.manual_seed(seed)
dataset = Dataset_Inference(dataset_path, crop_shape=crop_shape, center_on_mask=True)
train_set, validation_set = torch.utils.data.random_split(dataset, [train_p, val_p])
dataset = validation_set # use a very small subset of the validation set

#Get latest model checkpoint
modelName = "Pix2Pix3D"
latest_checkpoint = get_latest_Checkpoint(
    modelName,
    version="*",
    log_dir_name=repoRoot.joinpath("baseline").joinpath("lightning_logs"))
    
if latest_checkpoint is None:
    raise UserWarning("No latest model found!")

#Load your model
gpus = [0]
model = Pix2Pix3D.load_from_checkpoint(latest_checkpoint, map_location=torch.device('cuda')) 
model.eval()  # Put dropout and normalization layers into deterministic inference mode
model.cuda()  # Move to GPU

#Do inference for all samples in the dataset
for sample in tqdm(dataset):
    voided_image = sample["voided_image"].unsqueeze(0) # add batch dimension, e.g. (1, 160, 160, 128) -> (1, 1, 160, 160, 128)
    mask = sample["mask"].unsqueeze(0)
    
    with torch.no_grad():
        prediction = model.forward(voided_image.cuda(), mask.cuda())
        prediction = prediction.cpu().numpy()[0] #remove batch
        result, img = Dataset_Inference.get_result_image(prediction, sample)
        
    #Save to output folder
    sampleFolderName = Path(sample["t1n_voided_path"]).parent.name #e.g. "BraTS-GLI-00006-000"
    nib.save(img, resultsFolder.joinpath(f"{sampleFolderName}-t1n-inference.nii.gz"))



100%|██████████| 62/62 [01:08<00:00,  1.10s/it]


In [2]:
#List output files
print(resultsFolder)
for fileName in list(resultsFolder.glob("*")):
    print(f"- {fileName.name}")


../BraTS2023_Inpainting_Results
- BraTS-GLI-00495-000-t1n-inference.nii.gz
- BraTS-GLI-01034-000-t1n-inference.nii.gz
- BraTS-GLI-01259-000-t1n-inference.nii.gz
- BraTS-GLI-01610-000-t1n-inference.nii.gz
- BraTS-GLI-01128-000-t1n-inference.nii.gz
- BraTS-GLI-00456-000-t1n-inference.nii.gz
- BraTS-GLI-00193-000-t1n-inference.nii.gz
- BraTS-GLI-01141-000-t1n-inference.nii.gz
- BraTS-GLI-00266-000-t1n-inference.nii.gz
- BraTS-GLI-01133-000-t1n-inference.nii.gz
- BraTS-GLI-01138-000-t1n-inference.nii.gz
- BraTS-GLI-00046-000-t1n-inference.nii.gz
- BraTS-GLI-01187-000-t1n-inference.nii.gz
- BraTS-GLI-01205-000-t1n-inference.nii.gz
- BraTS-GLI-01520-000-t1n-inference.nii.gz
- BraTS-GLI-00020-001-t1n-inference.nii.gz
- BraTS-GLI-01311-000-t1n-inference.nii.gz
- BraTS-GLI-01333-000-t1n-inference.nii.gz
- BraTS-GLI-00547-001-t1n-inference.nii.gz
- BraTS-GLI-00502-001-t1n-inference.nii.gz
- BraTS-GLI-01226-000-t1n-inference.nii.gz
- BraTS-GLI-00332-000-t1n-inference.nii.gz
- BraTS-GLI-00658-000-

You might want to check whether the output file names reported above match the naming convention from Synapse (see the section [Create your Segmentation Files](https://www.synapse.org/#!Synapse:syn51156910/wiki/622349)).
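
Such a check can be sketched with a regular expression. The pattern below is an assumption inferred from the file names listed above (`BraTS-GLI-<5 digits>-<3 digits>-t1n-inference.nii.gz`), and `parse_case_id` is a hypothetical helper that is not part of the challenge code. The Synapse wiki remains the authoritative reference for the naming convention.

```python
import re

# Assumed pattern, inferred from the file names printed above.
RESULT_NAME = re.compile(r"^(BraTS-GLI-\d{5}-\d{3})-t1n-inference\.nii\.gz$")

def parse_case_id(file_name):
    """Return the case ID (e.g. "BraTS-GLI-00495-000"), or None if the name does not match."""
    match = RESULT_NAME.match(file_name)
    return match.group(1) if match else None

names = [
    "BraTS-GLI-00495-000-t1n-inference.nii.gz",  # taken from the listing above
    "BraTS-GLI-495-000-t1n.nii.gz",              # deliberately malformed example
]
for name in names:
    case_id = parse_case_id(name)
    print(f"{name}: {'ok, case ' + case_id if case_id else 'does NOT match'}")
```

Compared with slicing the first 19 characters of the file name, as the evaluation cell further below does, a pattern match also rejects malformed names instead of silently producing a wrong ID.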

## (Optional) Server-Side Evaluation Script

After your inference script has run on the server, our evaluation script will run.
    
We will use the following metrics to quantify how realistic the synthesized image regions are compared to real ones: 
- structural similarity index measure (SSIM),
- peak signal-to-noise ratio (PSNR),
- mean squared error (MSE).
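
To make the two pixel-wise metrics concrete, here is a minimal pure-Python sketch of MSE and PSNR restricted to a masked region. It is not the implementation behind `compute_metrics`: the function names, the flat-list inputs, the masking, and the default `data_range` of 1.0 are assumptions for illustration. SSIM is omitted because it relies on local window statistics.

```python
import math

def masked_mse(prediction, target, mask):
    """Mean squared error over the voxels where mask is truthy."""
    squared_errors = [(p - t) ** 2 for p, t, m in zip(prediction, target, mask) if m]
    if not squared_errors:
        raise ValueError("mask selects no voxels")
    return sum(squared_errors) / len(squared_errors)

def psnr(mse, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed maximum intensity."""
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)

# Toy data: three voxels, of which only the first two are inside the mask.
prediction = [0.5, 0.2, 0.9]
target = [0.4, 0.2, 0.1]
mask = [True, True, False]

mse = masked_mse(prediction, target, mask)
print(f"MSE:  {mse:.5f}")
print(f"PSNR: {psnr(mse):.2f} dB")
```

A lower MSE and a higher PSNR both indicate that the synthesized region is closer to the ground truth.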

The following code is used:
    

In [3]:
from tqdm import tqdm
from pathlib import Path
import nibabel as nib
import torch
import numpy as np
from evaluation_utils import compute_metrics #evaluation metrics


In [4]:
#Get submission metadata
teamName = "YourTeamName"

#Task dataset (on synapse server)
dataset_path = Path("../BraTS2023_Dataset_Local_Synthesis")
solutionFilePaths = list(dataset_path.rglob("**/BraTS-GLI-*-*-t1n.nii.gz"))
print(f"Task: {dataset_path}")
print(f"\tlen: {len(solutionFilePaths)}")

#Solution dataset (participant upload)
resultsFolder = Path("../BraTS2023_Inpainting_Results")
resultFilePaths = list( resultsFolder.rglob("**/BraTS-GLI-*-*-t1n-inference.nii.gz") )
print(f"Solution: {resultsFolder}")
print(f"\tlen: {len(resultFilePaths)}")


#Evaluation
performance = {
    "folderName":[],
    "SSIM":[],
    "PSNR":[],
    "MSE":[]
}

print(f'FolderName\t\tSSIM\t\tPSNR\t\tMSE')
for resultFilePath in resultsFolder.rglob("**/BraTS-GLI-*-*-t1n-inference.nii.gz"):
    folderName = resultFilePath.name[:19]
    
    folderPath = dataset_path.joinpath(folderName)
    
    if not folderPath.exists():
        print(f"Result with ID \"{folderName}\" has no corresponding solution folder {folderPath}")
    else:
        performance["folderName"].append(folderName)
        
        #Read result
        result_img = nib.load(resultFilePath)
        result = torch.Tensor(result_img.get_fdata()).unsqueeze(0).unsqueeze(0)
        
        #Inference mask
        mask_path = folderPath.joinpath(f"{folderName}-mask-healthy.nii.gz")
        mask_img = nib.load(mask_path)
        mask = torch.Tensor(mask_img.get_fdata()).bool().unsqueeze(0).unsqueeze(0)
        
        #Ground truth
        t1n_path = dataset_path.joinpath(folderName).joinpath(f"{folderName}-t1n.nii.gz")
        t1n_img = nib.load(t1n_path)
        t1n = torch.Tensor(t1n_img.get_fdata()).unsqueeze(0).unsqueeze(0)
        
        #Compute metrics
        MSE, PSNR, SSIM = compute_metrics(result,t1n,mask)

        #Scores
        performance["SSIM"].append(SSIM)
        performance["PSNR"].append(PSNR)
        performance["MSE"].append(MSE)
        
        #TODO: what about precisions?
        print(f'{folderName}\t{performance["SSIM"][-1]:.8f}\t{performance["PSNR"][-1]:.8f}\t{performance["MSE"][-1]:.8f}')



Task: ../BraTS2023_Dataset_Local_Synthesis
	len: 1251
Solution: ../BraTS2023_Inpainting_Results
	len: 62
FolderName		SSIM		PSNR		MSE
BraTS-GLI-00495-000	0.73359513	19.52866745	0.01114637
BraTS-GLI-01034-000	0.58504981	18.62007141	0.01374019
BraTS-GLI-01259-000	0.73262960	17.78317451	0.01666028
BraTS-GLI-01610-000	0.92542565	21.79668236	0.00217543
BraTS-GLI-01128-000	0.94124007	17.55110168	0.00870547
BraTS-GLI-00456-000	0.71897346	20.42724609	0.00906307
BraTS-GLI-00193-000	0.97608650	25.86091042	0.00259360
BraTS-GLI-01141-000	0.75952679	18.22871017	0.01503588
BraTS-GLI-00266-000	0.72738451	17.45674515	0.01579036
BraTS-GLI-01133-000	0.73548496	20.13336945	0.00968854
BraTS-GLI-01138-000	0.91767311	17.40706253	0.01815529
BraTS-GLI-00046-000	0.84327847	20.26145935	0.00941573
BraTS-GLI-01187-000	0.99789476	23.45488739	0.00108046
BraTS-GLI-01205-000	0.89534050	16.13674545	0.01278230
BraTS-GLI-01520-000	0.69365406	20.34408760	0.00923828
BraTS-GLI-00020-001	0.98565340	23.63872910	0.00237683
Bra

In [8]:
#Overall statistics:
samples = len(performance["folderName"])
SSIM_stat = (np.median(performance["SSIM"]), np.std(performance["SSIM"]))
PSNR_stat = (np.median(performance["PSNR"]), np.std(performance["PSNR"]))
MSE_stat = (np.median(performance["MSE"]), np.std(performance["MSE"]))

print(f"Team statistics\tSamples\tSSIM\t\t\tPSNR\t\tMSE")
print( f"{teamName}\t\t{samples}\t{SSIM_stat[0]:.5f} ±{SSIM_stat[1]:.5f}\t{PSNR_stat[0]:.5f} ±{PSNR_stat[1]:.5f}\t{MSE_stat[0]:.5f} ±{MSE_stat[1]:.5f}" )
#TODO: what about precisions?

Team statistics	Samples	SSIM			PSNR		MSE
YourTeamName		62	0.84312 ±0.11981	19.94672 ±2.88756	0.00896 ±0.00659
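
For reference, the same aggregation can be reproduced without NumPy. The sketch below uses only the standard library, and `summarize` is a hypothetical helper. Note that `np.std` defaults to the population standard deviation (`ddof=0`), which corresponds to `statistics.pstdev` rather than `statistics.stdev`.

```python
import statistics

def summarize(values):
    """Return (median, population standard deviation) of a list of scores."""
    return statistics.median(values), statistics.pstdev(values)

# First four SSIM scores from the per-sample table above, used only as sample input.
ssim_scores = [0.73359513, 0.58504981, 0.73262960, 0.92542565]
median, std = summarize(ssim_scores)
print(f"SSIM: {median:.5f} ±{std:.5f}")
```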


**Note:** On the submission server, the complete image will be used for evaluation (crop_shape = 240x240x155). Note how a bigger evaluation region makes the performance evaluation more reliable (the standard deviations shrink):

| **crop_shape** | **SSIM** | **PSNR** | **MSE** |
|---|---|---|---|
| (128, 128, 96)   | 0.77738 **±0.20456**  | 17.48961 **±4.55310** | 0.01658 **±0.09909**  |  
| (160,160,128)  |  0.84312 **±0.11981** | 19.94672 **±2.88756**  | 0.00896 **±0.00659**  |  
| (240,240,155)  | \* | \* | \* |  

### Creating output files for the web frontend

In [9]:
## Convert to json
import json

# Team details
detail_rows = []
for i in range(len(performance["folderName"])):
    row = ( performance["folderName"][i], performance["PSNR"][i], performance["MSE"][i], performance["SSIM"][i])
    detail_rows.append(row)
    
# Team dictionary
teamDict = {
    "name": teamName,
    "samples": samples,
    "metrics": {
        "PSNR": {
            "median": PSNR_stat[0],
            "std": PSNR_stat[1]
        },
        "MSE": {
            "median": MSE_stat[0],
            "std": MSE_stat[1]
        },
        "SSIM": {
            "median": SSIM_stat[0],
            "std": SSIM_stat[1]
        }
    },  
    "sample_details": detail_rows
}

json_str = json.dumps(teamDict, indent=4)
print(json_str)


{
    "name": "YourTeamName",
    "samples": 62,
    "metrics": {
        "PSNR": {
            "median": 19.9467191696167,
            "std": 2.8875642391835528
        },
        "MSE": {
            "median": 0.008959121070802212,
            "std": 0.006589996654470476
        },
        "SSIM": {
            "median": 0.8431159555912018,
            "std": 0.11981018929701116
        }
    },
    "sample_details": [
        [
            "BraTS-GLI-00495-000",
            19.528667449951172,
            0.01114636566489935,
            0.7335951328277588
        ],
        [
            "BraTS-GLI-01034-000",
            18.620071411132812,
            0.013740185648202896,
            0.5850498080253601
        ],
        [
            "BraTS-GLI-01259-000",
            17.783174514770508,
            0.016660284250974655,
            0.7326295971870422
        ],
        [
            "BraTS-GLI-01610-000",
            21.796682357788086,
            0.0021754333283752203,
     

In [10]:
import yaml

yaml_str = yaml.dump(teamDict)
print(yaml_str) #TODO: is that how a yaml has to look like??!

metrics:
  MSE:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - &id001 !!python/object/apply:numpy.dtype
      args:
      - f8
      - false
      - true
      state: !!python/tuple
      - 3
      - <
      - null
      - null
      - null
      - -1
      - -1
      - 0
    - !!binary |
      AAAA4ChZgj8=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      DYbJwRz+ej8=
  PSNR:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAMFzyM0A=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      zSOjR7sZB0A=
  SSIM:
    median: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      AAAAUM766j8=
    std: !!python/object/apply:numpy.core.multiarray.scalar
    - *id001
    - !!binary |
      gRvCbOGrvj8=
name: YourTeamName
sample_details:
- !!python/tuple
  - BraTS-GLI-00495-000
  - 19.528667449951172
  -
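
The `!!python/object/apply:numpy...` tags in the output above appear because `np.median` and `np.std` return NumPy scalars, which PyYAML's default dumper serializes as generic Python objects. The `!!python/tuple` tags come from the tuples stored in `sample_details`. One possible fix, sketched below under the assumption that the web frontend expects plain YAML, is to convert everything to built-in types before dumping. `to_builtin` is a hypothetical helper that relies on the `.item()` method NumPy scalars provide; the example prints JSON only so that it runs with the standard library alone.

```python
import json

def to_builtin(obj):
    """Recursively convert NumPy-style scalars and tuples to plain Python types."""
    if isinstance(obj, dict):
        return {key: to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(value) for value in obj]
    item = getattr(obj, "item", None)
    if callable(item):
        return item()  # e.g. numpy.float64 -> float
    return obj

# Small stand-in for teamDict; the tuple mimics one row of sample_details.
example = {
    "name": "YourTeamName",
    "sample_details": [("BraTS-GLI-00495-000", 19.528667449951172)],
}
print(json.dumps(to_builtin(example), indent=4))
```

With the notebook's variables, `yaml.safe_dump(to_builtin(teamDict))` should then emit plain scalars and lists.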