Autoencoder-driven evaluation of machine learning prediction Reliability

bmi-labmedinfo/relAI

Welcome to RelAI!

RelAI is a Python library designed to compute the pointwise reliability of Machine Learning predictions, as implemented by Peracchio et al. in [1].

AIM

Provide a method for the assessment of the reliability of machine learning predictions on new unseen samples.

Background

Reliability is one of the key requirements for trustworthy AI systems [2]. Following Saria and Subbaswamy [3], we use the term reliability to indicate the degree of trust in the prediction of an ML model on a single instance, and we build a method that relies on two fundamental principles: the density principle and the local fit principle. The density principle checks whether the new test case is close to the training distribution, while the local fit principle checks whether the model was accurate on the training samples closest to the new test case.

Methods

The method implemented in this package computes the reliability of a new instance by evaluating the density and the local fit principles. A new instance is considered reliable only if it is reliable according to both principles; the two principles can also be applied separately.
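The combination rule can be sketched in a few lines. This is a minimal illustration and not the package's code: the function name `combine_reliability` is hypothetical, and it assumes each principle yields a 0/1 flag per instance.

```python
def combine_reliability(density_flags, local_fit_flags):
    """Return 1 where both principles mark the instance as reliable, else 0.

    Hypothetical helper for illustration; both inputs are sequences of 0/1.
    """
    if len(density_flags) != len(local_fit_flags):
        raise ValueError("both flag sequences must have the same length")
    return [int(d == 1 and f == 1) for d, f in zip(density_flags, local_fit_flags)]


density = [1, 1, 0, 0]    # density principle, one flag per instance
local_fit = [1, 0, 1, 0]  # local fit principle, one flag per instance
total = combine_reliability(density, local_fit)
print(total)
```

Only the first instance passes both checks, so it is the only one flagged as reliable overall.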

Density Principle

The density principle is implemented with an autoencoder trained to reproduce the training samples. Samples drawn from the same distribution as the training set are reconstructed with a low error, while samples far from the training distribution (out-of-distribution samples) are reconstructed with a high error. To assess the "density reliability" of a new unseen instance, the mean squared error (MSE) between the instance and its reconstruction by the autoencoder is compared with a threshold: if MSE <= threshold, the prediction on the instance is considered "density reliable"; if MSE > threshold, it is considered "density unreliable".
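The thresholding rule can be illustrated without a neural network. In the sketch below, `toy_reconstruct` is a hypothetical stand-in for a trained autoencoder (it simply projects a point onto the diagonal where all features are equal), and the threshold value is chosen arbitrarily; only the MSE comparison mirrors the description above.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def reconstruction_mse(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an instance and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def density_reliable(x: Vector,
                     reconstruct: Callable[[Vector], Vector],
                     mse_threshold: float) -> bool:
    """True if MSE <= threshold ("density reliable"), False otherwise."""
    return reconstruction_mse(x, reconstruct(x)) <= mse_threshold


def toy_reconstruct(x: Vector) -> list:
    """Hypothetical stand-in for an autoencoder: replaces every feature
    with the feature mean, i.e. projects x onto the diagonal."""
    mean = sum(x) / len(x)
    return [mean] * len(x)


threshold = 0.05          # arbitrary value, for illustration only
in_dist = [1.0, 1.1]      # close to the diagonal the toy model has "learned"
out_dist = [3.0, -3.0]    # far from it

print(density_reliable(in_dist, toy_reconstruct, threshold))
print(density_reliable(out_dist, toy_reconstruct, threshold))
```

The first point is reconstructed almost exactly and passes the check; the second is reconstructed poorly and is flagged as density unreliable.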

Local Fit Principle

The local fit principle is implemented by training a proxy classifier (e.g. an MLP) on a dataset of synthetic points generated ad hoc to characterize the local performance of the original model in the feature space. Each synthetic point is associated with the performance value (accuracy for classification problems, mean squared error for regression problems) computed on its k closest training samples, and is then labelled with respect to a performance threshold. For a classification problem, a synthetic point is labelled "local fit reliable" if the accuracy on its k nearest training samples is greater than or equal to a chosen accuracy threshold, and "local fit unreliable" otherwise. For a regression problem, a synthetic point is labelled "local fit reliable" if the mean squared error on its k nearest training samples is less than or equal to a chosen MSE threshold, and "local fit unreliable" otherwise. Finally, the proxy classifier is trained on these labelled synthetic points, so that it learns to classify new samples in terms of local fit reliability.
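The labelling step for a classification problem can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, Euclidean distance is assumed for the nearest-neighbour search, and the synthetic points are hand-picked rather than generated. Training the proxy classifier on the resulting labels is not shown.

```python
import math


def k_nearest_indices(point, X_train, k):
    """Indices of the k training samples closest to `point` (Euclidean)."""
    dists = [math.dist(point, x) for x in X_train]
    return sorted(range(len(X_train)), key=dists.__getitem__)[:k]


def local_accuracy(point, X_train, y_train, y_pred, k):
    """Accuracy of the model on the k training samples nearest to `point`."""
    idx = k_nearest_indices(point, X_train, k)
    return sum(y_train[i] == y_pred[i] for i in idx) / k


def label_local_fit(acc, acc_threshold):
    """1 = "local fit reliable" if acc >= threshold, 0 otherwise."""
    return 1 if acc >= acc_threshold else 0


# Toy training set with two clusters; y_pred holds the model's predictions
# on the training samples (it is wrong on two samples of the second cluster).
X_train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
y_train = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0]

synthetic = [[0.05, 0.05], [5.05, 5.05]]  # hand-picked synthetic points
accs = [local_accuracy(p, X_train, y_train, y_pred, k=3) for p in synthetic]
labels = [label_local_fit(a, acc_threshold=0.9) for a in accs]
print(accs, labels)
```

The synthetic point near the first cluster inherits a local accuracy of 1.0 and is labelled reliable; the one near the second cluster inherits 1/3 and is labelled unreliable.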

Documentation

Please find the Documentation of the library at https://rel-doc.readthedocs.io/en/latest/index.html

Installation

  1. Make sure you have the latest version of pip installed
pip install --upgrade pip
  2. Install the ReliabilityPackage through pip
python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple ReliabilityPackage

Usage

Classification Problem

Here's a simple example of usage of the ReliabilityPackage for a typical classification problem, using the breast_cancer dataset of sklearn.

  1. Import the needed functions from the package
from ReliabilityPackage.ReliabilityFunctions import *
  2. Import all the other necessary packages and functions
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import plotly.offline as pyo
  3. Load the breast cancer dataset and split it into a training, a validation, and a test set
X, y = datasets.load_breast_cancer(return_X_y=True)

X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)
  4. Train a classifier on the training set
clf = RandomForestClassifier(random_state=42, min_samples_leaf=10, n_estimators=100)
clf.fit(X_train, y_train)
  5. Create and train an autoencoder for the implementation of the Density Principle
    (Please note that if the layer_sizes are not specified, the default autoencoder is built as follows: [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; if needed, specify a more suitable architecture)
ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)
  6. Generate the dataset of the synthetic points and their associated values of accuracy
syn_pts, acc_syn_pts = generate_synthetic_points(problem_type='classification', predict_func=clf.predict, X_train=X_train, y_train=y_train, method='GN', k=5)
  7. Define a Mean Squared Error threshold and an Accuracy threshold
    (the mse_threshold_plot can be generated to see how the performance changes based on percentiles of the MSE of the validation set)
fig_mse_thresh = mse_threshold_plot(ae, X_val, y_val, clf.predict, metric='balanced_accuracy')
fig_mse_thresh.show()

mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
acc_thresh = 0.90
  8. Generate an instance of the ReliabilityDetector class for classification problems
RD = create_reliability_detector('classification', ae, syn_pts, acc_syn_pts, mse_thresh=mse_thresh, perf_thresh=acc_thresh, proxy_model="MLP")
  9. It is now possible to compute the Reliability of the test set
reliability_test = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(reliability_test == 1)]
unreliable_test = X_test[np.where(reliability_test == 0)]
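A natural follow-up to the steps above is to compare the classifier's accuracy on the reliable and unreliable subsets. The helper below is a hypothetical sketch that is not part of the package; it assumes the reliability output is a sequence of 0/1 flags, as the `== 1` and `== 0` filters above suggest, and uses toy values instead of the real test set.

```python
def accuracy_by_reliability(y_true, y_pred, reliability):
    """Accuracy computed separately for reliable (1) and unreliable (0)
    instances. Returns None for a group that has no instances."""
    groups = {1: [], 0: []}
    for truth, pred, flag in zip(y_true, y_pred, reliability):
        groups[int(flag)].append(truth == pred)
    return {flag: (sum(hits) / len(hits) if hits else None)
            for flag, hits in groups.items()}


# Toy values for illustration only
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
reliability = [1, 1, 1, 0, 0, 1]

summary = accuracy_by_reliability(y_true, y_pred, reliability)
print(summary)
```

With the variables from the example, the call would be `accuracy_by_reliability(y_test, clf.predict(X_test), reliability_test)`. A clearly higher accuracy on the reliable subset would suggest that the chosen thresholds separate the two groups in a useful way.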

Regression Problem

Here's a simple example of usage of the ReliabilityPackage for a typical regression problem generated through the make_regression function of sklearn.

  1. Import the needed functions from the package
from ReliabilityPackage.ReliabilityFunctions import *
  2. Import all the other necessary packages and functions
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
  3. Generate a random regression dataset and split it into a training, a validation, and a test set
X, y = make_regression(n_samples=1000, n_features=20, noise=1, random_state=42)

X_seventy, X_test, y_seventy, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_seventy, y_seventy, test_size=0.30, random_state=42)
  4. Train a linear regressor on the training set
reg = LinearRegression().fit(X_train, y_train)
  5. Create and train an autoencoder for the implementation of the Density Principle
    (Please note that if the layer_sizes are not specified, the default autoencoder is built as follows: [dim_input, dim_input + 4, dim_input + 8, dim_input + 16, dim_input + 32]; if needed, specify a more suitable architecture)
ae = create_and_train_autoencoder(X_train, X_val, batchsize=80, epochs=1000)
  6. Generate the dataset of the synthetic points and their associated values of Mean Squared Error
syn_pts, mse_syn_pts = generate_synthetic_points(problem_type='regression', predict_func=reg.predict, X_train=X_train, y_train=y_train, method='GN', k=5)
  7. Define a Mean Squared Error threshold for the Density Principle and a performance threshold for the Local Fit Principle (MSE as the performance metric for the Local Fit Principle)
mse_thresh = perc_mse_threshold(ae, X_val, perc=95)
performance_thresh = 0.8
  8. Generate an instance of the ReliabilityDetector class for regression problems
RD = create_reliability_detector('regression', ae, syn_pts, mse_syn_pts, mse_thresh=mse_thresh, perf_thresh=performance_thresh, proxy_model="MLP")
  9. It is now possible to compute the Reliability of the test set
reliability_test = compute_dataset_reliability(RD, X_test, mode='total')
reliable_test = X_test[np.where(reliability_test == 1)]
unreliable_test = X_test[np.where(reliability_test == 0)]
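Both examples derive the density threshold from a percentile of the reconstruction errors on the validation set (`perc=95`). The sketch below shows what such a percentile-based threshold amounts to, using only the standard library. It is an assumption about the general idea and not the implementation of `perc_mse_threshold`; in particular, the interpolation rule used by the package may differ, and `percentile_threshold` is a hypothetical name.

```python
import statistics


def percentile_threshold(val_mse, perc):
    """Return the `perc`-th percentile (integer in 1..99) of validation MSEs.

    statistics.quantiles with n=100 returns 99 cut points; the cut point at
    index perc - 1 is the perc-th percentile.
    """
    if not 1 <= perc <= 99:
        raise ValueError("perc must be an integer between 1 and 99")
    cuts = statistics.quantiles(val_mse, n=100, method="inclusive")
    return cuts[perc - 1]


# Toy reconstruction errors: 0.01, 0.02, ..., 1.00 (illustration only)
val_mse = [0.01 * i for i in range(1, 101)]
thr = percentile_threshold(val_mse, 95)

# Validation samples whose error exceeds the threshold
flagged = sum(m > thr for m in val_mse)
print(thr, flagged)
```

With a 95th-percentile threshold, roughly 5% of the validation samples fall above it by construction, so a lower percentile makes the density check stricter and a higher one makes it more permissive.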

License

Distributed under the Creative Commons Attribution-NonCommercial 4.0 International License

Contacts

For any question or information, please contact us at lorenzo.peracchio01@universitadipavia.it

References

[1] Peracchio L, Nicora G, Parimbelli E, Buonocore TM, Bergamaschi R, Tavazzi E, et al. Evaluation of Predictive Reliability to Foster Trust in Artificial Intelligence. A case study in Multiple Sclerosis 2024. http://arxiv.org/abs/2402.17554
[2] Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self-assessment | Shaping Europe’s digital future 2020. https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment.
[3] Saria S, Subbaswamy A. Tutorial: Safe and Reliable Machine Learning. ArXiv 2019; abs/1904.07204. https://doi.org/10.48550/arXiv.1904.07204.
