# Evaluation and calibration with uncertain ground truth

This tutorial goes through Google DeepMind's [Uncertain Ground Truth (UGT)](https://github.com/google-deepmind/uncertain_ground_truth) framework for evaluating and calibrating machine learning models under uncertain ground truth. The tutorial is based on the paper [Conformal prediction under ambiguous ground truth](https://openreview.net/forum?id=CAd6V2qXxc) [1] and the case study [Evaluating AI systems under uncertain ground truth: a case study in dermatology](https://arxiv.org/abs/2307.02191) [2].

The tutorial explains how the approach can be applied to other experimental settings and how it can be used to evaluate and calibrate machine learning models. For convenience, you can run it in Colab (click the badge below), or you can run the setup commands in a terminal inside a virtual environment.

<a target="_blank" href="https://colab.research.google.com/github/SamPIngram/uncertain_ground_truth_experiments/blob/main/tutorial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# 1. Setup

In [None]:
!git clone https://github.com/google-deepmind/uncertain_ground_truth.git
!pip install tensorflow tensorflow-datasets absl-py scikit-learn jax jupyter matplotlib
!cd uncertain_ground_truth && python -m unittest discover -s . -p '*_test.py'
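
When running locally rather than in Colab, it can help to confirm that the packages from the install command are importable before continuing. The following is a minimal sketch, not part of the UGT repository: the helper name `missing_packages` and the `REQUIRED_MODULES` mapping are hypothetical, and the mapping only records the usual import name for each pip package listed above (`jupyter` is left out because it is a tooling metapackage rather than a library the tutorial imports).

```python
import importlib.util

# Hypothetical mapping from the pip names used in the install command
# to the module names they are imported under.
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "tensorflow-datasets": "tensorflow_datasets",
    "absl-py": "absl",
    "scikit-learn": "sklearn",
    "jax": "jax",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as TensorFlow; it does not verify versions or that the import itself succeeds.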

Cloning into 'uncertain_ground_truth'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 47 (delta 6), reused 47 (delta 6), pack-reused 0
Receiving objects: 100% (47/47), 90.92 KiB | 4.13 MiB/s, done.
Resolving deltas: 100% (6/6), done.
Collecting jupyter
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting qtconsole (from jupyter)
  Downloading qtconsole-5.5.1-py3-none-any.whl (123 kB)
Collecting qtpy>=2.4.0 (from qtconsole->jupyter)
  Downloading QtPy-2.4.1-py3-none-any.whl (93 kB)
Collecting jedi>=0.16 (from ipython>=5.0.0->ipykernel->jupyter)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)