# FAIRmodels.org validation notebook

This notebook performs an initial validation of a given AI model. The AI model is described and packaged using the repository at [https://fairmodels.org](https://fairmodels.org).

It assumes that the model metadata follows the format used by [https://fairmodels.org](https://fairmodels.org), and that the model has been packaged into an image using [https://github.com/MaastrichtU-BISS/FAIRmodels-model-package](https://github.com/MaastrichtU-BISS/FAIRmodels-model-package).

The first step below is to identify the:

- URI (URL) of the model metadata
- Dataset for validation (currently an Excel sheet)
- Outcome parameter/column in the dataset

The current notebook assumes a classification problem; however, the last cell in the notebook can be adapted to accommodate other types of prediction.

In [1]:
url = "https://v2.fairmodels.org/instance/3f400afb-df5e-4798-ad50-0687dd439d9b"
validation_filename = "thunder_reduced.xlsx"
outcome_parameter = 'pCR'

## Install needed dependencies

In [2]:
! pip install docker






## Fetch information about model
Fetch the model metadata from its URI, pull the Docker image from the repository, and run the model locally on a random port (bound to localhost only).

In [3]:
# fetch the model metadata from the URL, explicitly requesting JSON-LD via the Accept header
import requests
import json

response = requests.get(url, headers={'Accept': 'application/ld+json'})
response.raise_for_status()  # stop early if the metadata could not be retrieved
model_metadata = response.json()
docker_image_name = model_metadata['FAIRmodels image name']['@value']
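
The lookup above fails with a bare `KeyError` if the metadata does not contain the expected entry. A minimal sketch of a more defensive variant is shown below. The helper name `get_image_name` and the sample dictionary are illustrative assumptions, not part of the FAIRmodels tooling; only the key `FAIRmodels image name` and its `@value` field are taken from the cell above.

```python
def get_image_name(metadata, key="FAIRmodels image name"):
    """Return the '@value' stored under `key`, or raise a descriptive error."""
    entry = metadata.get(key)
    if not isinstance(entry, dict) or "@value" not in entry:
        raise KeyError(f"Metadata has no usable '{key}' entry")
    return entry["@value"]


# illustrative metadata only; the image name below is made up
sample_metadata = {"FAIRmodels image name": {"@value": "example/model:latest"}}
image_name = get_image_name(sample_metadata)
```

In the notebook, `get_image_name(model_metadata)` could stand in for the direct dictionary access.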

In [4]:
# pull docker image
import docker
client = docker.from_env()
try:
    client.images.pull(docker_image_name)
except docker.errors.APIError as e:
    print("could not pull image: ", e)

# run the docker image and map container port 8000 to a randomly chosen free host port
import socket
import random

def port_in_use(port):
    # a return value of 0 means the connection succeeded, so something is already listening
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex(('localhost', port)) == 0

port = random.randint(49152, 65535)
while port_in_use(port):
    port = random.randint(49152, 65535)

# publish the port on 127.0.0.1 so the model is reachable from localhost only
container = client.containers.run(docker_image_name, detach=True, ports={8000: ('127.0.0.1', port)}, remove=True)

# wait for the server to start
import time
time.sleep(5)
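
A fixed five-second sleep can be too short on a slow machine and needlessly long on a fast one. The sketch below shows one possible alternative using only the standard library: ask the operating system for a free port, then poll until a TCP connection succeeds. The helper names `find_free_port` and `wait_for_port` are hypothetical. Note also that a successful TCP connection does not guarantee that the web server inside the container is ready to answer requests.

```python
import socket
import time


def find_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(interval)
            if s.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False
```

With these helpers, `wait_for_port(port)` could replace the `time.sleep(5)` call, raising an error if it returns `False`.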

## Assess model input parameters

Query the running container for the names of the model's input parameters, then check whether the Excel sheet contains matching columns.

In [5]:
# get the JSON from the root webpage in the container
image_root_url = f"http://localhost:{port}"
response = requests.get(image_root_url)
data = response.json()

# get input/output parameters of the first model
model_parameters = data["path_parameters"]
print(f"Model parameters: {model_parameters}")

columns = model_parameters + [outcome_parameter]

Model parameters: ['cT', 'cN', 'tLength']


In [6]:
# read excel sheet as input data using pandas dataframe
import pandas as pd

input_data = pd.read_excel(validation_filename)

def check_columns_exist(input_data, columns, raise_exception=False):
    """
    Check if all columns exist in the input data

    :param input_data: pandas dataframe
    :param columns: list of columns
    :param raise_exception: boolean, if True, raise exception if columns are missing, otherwise print missing columns

    :return: None
    """
    missing_columns = [column for column in columns if column not in input_data.columns]

    # report missing columns, either by raising an exception or by printing them
    if missing_columns:
        if raise_exception:
            raise ValueError(f"Missing columns: {missing_columns}")
        else:
            print(f"Missing columns: {missing_columns}")
    else:
        print("All columns exist")

check_columns_exist(input_data, columns)

Missing columns: ['tLength']


In [7]:
# rename column "SizeZ" to "tLength"
input_data = input_data.rename(columns={"SizeZ": "tLength"})
check_columns_exist(input_data, columns)

All columns exist


## Execution of model (inferencing)

In the following section, the input data is sent to the model for inferencing, and results are retrieved.

When execution is done, the model container is stopped and removed.

In [8]:
# replace all cells containing "x" with NA
input_data = input_data.replace("x", pd.NA)

# keep only complete cases for the model inputs and the outcome
input_data = input_data.dropna(subset=columns)

# convert the dataframe to a list of JSON records, restricted to the model inputs and the outcome column
input_data_json = json.loads(input_data[columns].to_json(orient='records'))

# send the input data to the model
response = requests.post(f"http://localhost:{port}/predict", json=input_data_json)
response.raise_for_status()

# check the status of the model execution
response = requests.get(f"http://localhost:{port}/status")
print(f"Model triggered with input data, {response.json()['message']}")

# any status other than 3 is treated as a failed run
if response.json()["status"] != 3:
    raise RuntimeError("Prediction model execution exited with an error message")

# fetch model results
response = requests.get(f"http://localhost:{port}/result")
input_data['predictions'] = response.json()

Model triggered with input data, Prediction completed
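
The cell above checks the status exactly once. If a model takes longer to finish, the status may need to be polled until it reports completion. The sketch below assumes, based only on the check above, that status `3` means the prediction is complete; the helper `wait_for_status` and the intermediate status values `1` and `2` are hypothetical and serve only to make the example runnable without a container.

```python
import time


def wait_for_status(get_status, done_status=3, timeout=60.0, interval=1.0):
    """Call `get_status` until it returns `done_status` or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == done_status:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Model did not reach status {done_status}; last status was {status}"
            )
        time.sleep(interval)


# stand-in for requests.get(f"http://localhost:{port}/status").json()["status"]
fake_statuses = iter([1, 2, 3])
final_status = wait_for_status(lambda: next(fake_statuses), interval=0.01)
```

Passing the status lookup as a callable keeps the polling logic separate from the HTTP call, so it can be tried out without a running model.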


In [9]:
# stop the running container
container.stop()
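
If an earlier cell raises an error, this cell is never reached and the container keeps running. When the steps are run together in a single cell or script, a context manager is one way to guarantee cleanup. The sketch below uses a hypothetical `stopping` helper and a `FakeContainer` stand-in, so that it runs without Docker; the only assumption about the real container object is that it has a `stop()` method, as used above.

```python
from contextlib import contextmanager


@contextmanager
def stopping(container):
    """Yield `container` and always call its stop() method afterwards."""
    try:
        yield container
    finally:
        container.stop()


class FakeContainer:
    """Stand-in for the object returned by client.containers.run(...)."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


fake = FakeContainer()
try:
    with stopping(fake):
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass  # the container has been stopped despite the error
```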

## Calculate performance metrics

Now we can calculate the performance metrics. For demonstration purposes, only AUC and Brier score are included.

In [10]:
# convert outcome parameter to boolean
input_data[outcome_parameter] = input_data[outcome_parameter] == 1

# calculate AUC
from sklearn.metrics import roc_auc_score, brier_score_loss

auc = roc_auc_score(input_data[outcome_parameter], input_data['predictions'])

# calculate Brier score
brier = brier_score_loss(input_data[outcome_parameter], input_data['predictions'])

scores = {
    "AUC": auc,
    "Brier": brier
}

print(f"Scores: {scores}")

Scores: {'AUC': 0.6988795518207283, 'Brier': 0.18465341199853313}
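
To make the two metrics concrete, the sketch below computes them by hand in plain Python on four made-up cases. It is an illustration of the definitions, not a replacement for the scikit-learn functions used above: the Brier score is the mean squared difference between predicted probability and observed outcome, and the AUC is the fraction of positive/negative pairs in which the positive case receives the higher prediction (ties count as half). The function names and data are illustrative assumptions.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Fraction of positive/negative pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y]
    neg = [p for y, p in zip(y_true, y_prob) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# made-up outcomes and predicted probabilities, for illustration only
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

example_auc = auc(y_true, y_prob)
example_brier = brier_score(y_true, y_prob)
```

For these four cases, three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75, and the Brier score is approximately 0.1925. An AUC of 0.5 corresponds to random ranking, and lower Brier scores indicate better predictions.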
