# Test your AI component compatibility

This notebook shows how to build your AI component and test whether it is compatible with the evaluation pipeline.

It also gives an overview of how a candidate AI component will be evaluated. Although it does not compute metrics, it shows precisely how to build a compatible AI component and how that component is used during inference to generate scores.

## Objective

The task is to build an AI component whose main objective is to predict the welding state from a given image.

### Inputs to the AI component
The AI component shall take the following inputs: 
- A list of numpy arrays representing the input images to process.
- A list of dictionaries containing a meta-description of each image.

### Outputs of the AI component  
It shall return a dictionary with four keys: {predictions, probabilities, OOD_scores, explainabilities}.
    
The first key is required:   
- *predictions*: The list of predicted welding states. The welding state can take three possible values: [OK, KO, UNKNOWN]
    
The following keys are not mandatory, but including them will significantly improve the quality score of the developed AI component:

- *probabilities*: The list of associated probabilities for each image. Each element should have the format [$P_{KO}$, $P_{OK}$, $P_{UNKNOWN}$], where $\sum_{i \in \{\text{OK, KO, UNKNOWN}\}} P_i = 1$.

- *OOD_scores*: The list of OOD scores predicted by the AI component for each image. This score (X) is a non-negative real number. If $0\leq X < 1$, the image is considered *In-Domain*; if $X > 1$, the image is considered *Out-of-Domain (OoD)*.

- *explainabilities*: The list of explainabilities for each input image. An explainability is an intensity matrix (a matrix with values between 0 and 1) of the same size as the image tensor, representing the importance of each pixel in the model's prediction.
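The output contract above can be checked locally before submitting. The sketch below is a hypothetical helper (it is not part of the `challenge_welding` package); it assumes the key names from the list above, compares only the height and width of each explainability map with its image, and treats any OOD score of 1 or more as Out-of-Domain, since the text leaves $X = 1$ unspecified.

```python
import math

import numpy as np

VALID_STATES = {"OK", "KO", "UNKNOWN"}


def validate_output(output, images):
    """Hypothetical checker for the dictionary returned by predict().

    Assumes the keys "predictions" (mandatory), "probabilities",
    "OOD_scores" and "explainabilities" (optional).
    """
    n = len(images)
    preds = output["predictions"]
    assert len(preds) == n, "one prediction per image"
    assert all(p in VALID_STATES for p in preds), "states must be OK, KO or UNKNOWN"

    if "probabilities" in output:
        assert len(output["probabilities"]) == n
        for probs in output["probabilities"]:
            # One triplet per image, ordered [P_KO, P_OK, P_UNKNOWN] and summing to 1
            assert len(probs) == 3
            assert all(0.0 <= p <= 1.0 for p in probs)
            assert math.isclose(sum(probs), 1.0, abs_tol=1e-6)

    if "OOD_scores" in output:
        assert len(output["OOD_scores"]) == n
        assert all(x >= 0.0 for x in output["OOD_scores"]), "OOD scores are non-negative"

    if "explainabilities" in output:
        assert len(output["explainabilities"]) == n
        for img, expl in zip(images, output["explainabilities"]):
            expl = np.asarray(expl)
            # Assumption: only height and width are compared with the image
            assert expl.shape[:2] == np.asarray(img).shape[:2]
            assert expl.min() >= 0.0 and expl.max() <= 1.0
    return True


def is_in_domain(ood_score):
    # 0 <= X < 1 means In-Domain; here any X >= 1 is treated as Out-of-Domain
    return 0.0 <= ood_score < 1.0


images = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(2)]
output = {
    "predictions": ["OK", "UNKNOWN"],
    "probabilities": [[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    "OOD_scores": [0.3, 1.7],
    "explainabilities": [np.full((4, 4), 0.5), np.zeros((4, 4))],
}
print(validate_output(output, images))  # True
print([is_in_domain(x) for x in output["OOD_scores"]])  # [True, False]
```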

### Prerequisites
Install the dependencies if you have not already done so. For more information, see the [readme](../README.md) file.

##### For development on Local Machine

In [1]:
### Install a virtual environment
# Option 1:  using conda (recommended)
# !conda create -n venv python=3.12
# !conda activate venv
# !pip install torch==2.6.0

# Option 2: using virtualenv
# !pip install virtualenv
# !virtualenv -p /usr/bin/python3.12 venv
# !source venv/bin/activate

### Install the welding challenge package
# Option 1: Get the latest version from PyPI
# !pip install 'challenge_welding'

# Option 2: Get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .

##### For Google Colab Users
You can also use a GPU: go to Runtime > Change runtime type and select T4 GPU.

In [2]:
### Install the welding challenge package
# Option 1: Get the latest version of the welding challenge package from PyPI (recommended)
# !pip install 'challenge_welding'
# !pip install torch==2.6.0

In [3]:
# Option 2: Get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .
# !pip install torch==2.6.0

Note: you may need to restart the session after this installation for the changes to take effect.

In [4]:
#import subprocess
#import sys
#repo_url = "git+https://github.com/confianceai/Challenge-Welding-Starter-Kit.git"
#requirements_url = "https://raw.githubusercontent.com/confianceai/Challenge-Welding-Starter-Kit/refs/heads/main/requirements.txt"
#subprocess.run([sys.executable, "-m", "pip", "install", repo_url])
#subprocess.run([sys.executable, "-m", "pip", "install", "-r", requirements_url])

## Build your AI component

An AI component must be a buildable Python package. It should be a folder containing at least the following files and folders:
```
/
MANIFEST.in
setup.py
requirements.txt
challenge_solution/
    AIcomponent.py
    __init__.py
```

You can refer to the [Requirements and evaluation process](../docs/Requirements_and_Evaluation_process.md) sections for more details.

Only these files will be used by the evaluation pipeline to test your AI component. The names of these files and folders must not be changed.
The most important file is AIcomponent.py, which serves as the interface of your AI component. This interface is the only part the evaluation pipeline uses to interact with your component. For this reason, this file must include the specific methods and classes required by the abstract [AIComponent interface](../challenge_welding/AIComponent_interface.py) class.

You are free to add other files as needed to make your component work.

The easiest way to build your AI component for the challenge is to start from the AI component template provided in the ```AIcomponent_template``` folder of this repository and complete it following the process described in its ```readme.md```.
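To make the expected shape of AIcomponent.py more concrete, here is a hypothetical, self-contained stand-in. It does not subclass the real abstract class from AIComponent_interface.py, and the argument names of `load_model()` and `predict()` are assumptions; check the interface file for the exact signatures. Its placeholder model always answers UNKNOWN.

```python
import numpy as np

CLASSES = ["KO", "OK", "UNKNOWN"]  # same order as the probability triplets


class MyAIComponent:
    """Hypothetical stand-in for the class defined in challenge_solution/AIcomponent.py."""

    def __init__(self):
        self.model = None

    def load_model(self):
        # A real component would load its weights here, e.g. from a file
        # shipped inside challenge_solution/. This placeholder always
        # returns [P_KO, P_OK, P_UNKNOWN] = [0, 0, 1].
        self.model = lambda image: [0.0, 0.0, 1.0]

    def predict(self, input_images, images_meta_informations, device="cuda"):
        # device is accepted because the pipeline passes it; the placeholder ignores it
        if self.model is None:
            raise RuntimeError("call load_model() before predict()")
        probabilities = [self.model(img) for img in input_images]
        predictions = [CLASSES[int(np.argmax(p))] for p in probabilities]
        return {"predictions": predictions, "probabilities": probabilities}


component = MyAIComponent()
component.load_model()
out = component.predict([np.zeros((8, 8, 3), dtype=np.uint8)], [{"sample_id": "dummy"}], device="cpu")
print(out["predictions"])  # ['UNKNOWN']
```

Only "predictions" is mandatory; the other keys can be added to the returned dictionary as the component grows.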

## Test an AI component compatibility

In this section, we test the AI component's compatibility with the evaluation pipeline of this challenge. The following cells check whether the proposed AI component can be loaded properly into the evaluation pipeline and used to compute inference on a given dataset. The metric computation functions are not provided here, but all score metrics computed by the evaluation pipeline are based solely on the inference results of the AI component across multiple evaluation datasets. Therefore, if inference works with your AI component, the score computation and the full evaluation process should work as well.

In this example, we will use the reference solution provided within this challenge, which is accessible here:

In [5]:
# Set the path of the AI component to test here. It can be a local filesystem path or the URL of a public git repository.
# Replace it with the path to your own component to test it.

AI_component_path= "../Challenge-Welding-Reference-Solution-1"  
#AI_component_path="https://github.com/confianceai/Challenge-Welding-Reference-Solution-1"

In [6]:
import sys
# sys.path.insert(0, "..") # Uncomment this line for local tests without package installation, to make the challenge_welding module visible
from challenge_welding.user_interface import ChallengeUI
from challenge_welding.inference_tools import TestAIComponent

## Launch the test pipeline

The test pipeline takes an AI component (the solution to test) and performs the following tasks:

- Install your AI component as a Python package
- Load the AI component of the solution you want to test
- Run inference with this AI component on one or more evaluation datasets. Each inference run on a dataset generates a dataframe (stored as a parquet file) containing the evaluation dataset metadata extended with the prediction results.


## Init the pipeline

In [7]:
# Initialize test pipeline
myPipeline=TestAIComponent(proposed_solution_path=AI_component_path, # Set the path of the AI component you want to evaluate
                           meta_root_path="starter_kit_test_AI_comp_results", # Directory where pipeline results will be stored (inference results and computed metrics)
                           cache_strategy="local", # "local" or "remote". If "local", all images used for evaluation are stored in a local cache directory; otherwise, they are loaded directly by downloading them
                           cache_dir="test_cache") # Directory chosen for the cache

## Load your AI component into the evaluation environment

The load_proposed_solution() method below performs two main tasks:
- Install the Python package of your AI component (runs the command pip install AI_comp_path)
- Call the load_model() method of your AIcomponent interface


In [8]:
myPipeline.load_proposed_solution()

AI component loaded


## Load an evaluation dataset metadescription

In the next cell, we load the metadata of the evaluation dataset used to evaluate our AI component.

In [9]:
# In this example we will choose a small dataset

ds_name="example_mini_dataset"

# Load all metadata of your dataset as a pandas dataframe (you can point to a local cached metadata file instead of the original one on the remote repository)

my_challenge_UI=ChallengeUI()
evaluation_ds_meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)

display(evaluation_ds_meta_df.head(5))

https://minio-storage.apps.confianceai-public.irtsysx.fr/challenge-welding/datasets/example_mini_dataset/metadata/ds_meta.parquet


| Unnamed: 0 | sample_id | class | timestamp | welding-seams | labelling_type | resolution | path | sha256 | storage_type | data_origin | blur_level | blur_class | luminosity_level | external_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | data_92409 | OK | 22/01/20 12:49 | c33 | expert | [1920, 1080] | challenge-welding/datasets/example_mini_datase... | b'GN\xd7\xa7B\x98\xb0r\xa4\xdfn\x8cT\x8e:\xc07... | s3 | real | 701.938341 | blur | 50.533365 | http://minio-storage.apps.confianceai-public.i... |
| 1 | data_67943 | OK | 20/02/20 23:53 | c102 | expert | [1920, 1080] | challenge-welding/datasets/example_mini_datase... | b's\xf6;3i-\x10\xfd8y\xf2\xe1\xa6JQ\x84`\xc6\x... | s3 | real | 715.670702 | blur | 47.050604 | http://minio-storage.apps.confianceai-public.i... |
| 2 | data_4843 | OK | 20/01/20 20:34 | c20 | expert | [1920, 1080] | challenge-welding/datasets/example_mini_datase... | b'\xdbZ\xb3\x12e&\xd5\x83\x13*\x87S\xe1\x19\xc... | s3 | real | 715.85738 | blur | 46.204245 | http://minio-storage.apps.confianceai-public.i... |
| 3 | data_25309 | OK | 18/07/2022 20:18 | c102 | operator | [960, 540] | challenge-welding/datasets/example_mini_datase... | b'/c\xe3\xd9\xc8\|&\xaf\xb1}\xf6\xe3s\xae\xea\x... | s3 | real | 869.513006 | blur | 34.35928 | http://minio-storage.apps.confianceai-public.i... |
| 4 | data_76144 | OK | 03/10/19 21:14 | c20 | expert | [1920, 1080] | challenge-welding/datasets/example_mini_datase... | b'\xca%\x0c\x92\x1f\x0c\x00\xcc\x02\r\xb8\xf1\... | s3 | real | 2676.246904 | clean | 46.256244 | http://minio-storage.apps.confianceai-public.i... |


## Perform inference on an evaluation dataset

We pass the evaluation dataframe to the method below. It uses the loaded AI component to perform inference on each sample referenced in the evaluation dataframe and adds the inference results as new columns.

The predict method of your AI component will be called with the **device** parameter set to "cuda".
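Conceptually, grouped inference splits the metadata rows into batches, calls predict() once per batch and attaches the results to each row. The sketch below illustrates this idea with plain Python lists; it is not the code of perform_grouped_inference, and the row layout (an "image" field next to the metadata) and the helper names are hypothetical simplifications. As a quick check on the batch count, with batch_size=150 the 20 batches reported in the output of the next cell correspond to a dataset of 2,851 to 3,000 rows.

```python
import math


def grouped_inference(rows, predict, batch_size):
    """Hypothetical sketch: run predict() batch by batch and append its outputs to each row."""
    n_batches = math.ceil(len(rows) / batch_size)
    results = []
    for b in range(n_batches):
        batch = rows[b * batch_size:(b + 1) * batch_size]
        images = [row["image"] for row in batch]
        metas = [{k: v for k, v in row.items() if k != "image"} for row in batch]
        output = predict(images, metas, device="cuda")
        for meta, state in zip(metas, output["predictions"]):
            # New column filled from the "predictions" key of the output dictionary
            results.append({**meta, "predicted_state": state})
    return n_batches, results


def dummy_predict(images, metas, device="cuda"):
    # Placeholder model: every weld is predicted OK
    return {"predictions": ["OK"] * len(images)}


rows = [{"sample_id": f"data_{i}", "image": None} for i in range(5)]
n_batches, results = grouped_inference(rows, dummy_predict, batch_size=2)
print(n_batches, len(results))  # 3 5
```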


In [None]:
result_df=myPipeline.perform_grouped_inference(evaluation_dataset=evaluation_ds_meta_df, # dataframe containing the meta-description of your evaluation dataset
                                               results_inference_path=myPipeline.meta_root_path+"/res_inference.parquet", # path to the file that will contain the inference results
                                               batch_size=150 # You can group inference into batches if you want
                                              )

Number of  batch to process for inference :  20  , start processing..



You can see the inference results below. Note that new columns have been added:
- "predicted_state", filled from the "predictions" key of the dictionary returned by your AI component's predict method
- "scores KO", filled from the "probabilities" key of that dictionary
- "scores OK", filled from the "probabilities" key of that dictionary
- "score OOD", filled from the "OOD_scores" key of that dictionary


In [None]:
display(result_df)

Check that the output dataframe contains columns corresponding to the output fields, filled with correct values.
Here, the tested reference component should create the following columns:
- predicted_states
- scores_KO
- scores OK
- OOD_scores

If your result dataframe is correct and the parquet file has been created, your AI component is compatible with the evaluation pipeline.


# Evaluation process

The Trustworthy AI Challenge aims to build a trustworthy AI component that assists in weld seam conformity qualification. The AI component will be evaluated across several dimensions of trustworthiness to ensure its reliability, robustness, and efficiency when facing real observations that may be affected by hazards. The evaluation framework consists of six trust attributes: **performance, uncertainty, robustness, OOD monitoring, generalization, and drift**. These attributes help determine the AI system's ability to operate effectively in real-world scenarios, for example by being robust to small environmental hazards, generalizing across datasets, expressing confidence, or coping with anomalies.

From the predictions made by the developed AI component on many evaluation datasets, we will compute a set of different evaluation criteria as discussed below:

- **Performance metrics**: Measure the gain brought by the AI component compared to a human-only qualification process. These metrics are based on the confusion matrix and strongly penalize false negative predictions.

- **Uncertainty metrics**: Measure the ability of the AI component to produce a calibrated prediction confidence indicator expressing risk of model errors.

- **Robustness metrics**: Measure the ability of the AI component to keep its output invariant when images undergo slight perturbations (blur, luminosity, rotation, translation).

- **OOD-Monitoring metrics**: Measure the ability of the AI component to detect whether an input image is OOD and to give the appropriate output (UNKNOWN).

- **Generalisation metrics**: Measure the ability of the AI component to generalize to an unseen context.

- **Drift metrics**: Measure the ability of the AI component to maintain its performance and detect OOD inputs under gradual data drift.



# Scoring

# Trustworthy ML Challenge Evaluation Protocol

## Multi-Criteria Aggregation Methodology

The ML-Trustworthy evaluation of the submitted AI component follows a **multi-criteria aggregation methodology** designed to ensure a fair and reliable assessment of various **trust attributes**.  
The table below illustrates the principle of metrics aggregation:

![image](./docs/assets/Metric_tabular.png)

## Example of a comparative results table for four fictional submissions.

The following table gives a performance overview of four different virtual solutions that were actually evaluated with the trustworthy AI pipeline (using manually constructed inference files). The indicator color codes are for illustrative purposes only.

Among these four submissions:
- **Solu-Perfect**: The ideal solution, achieving perfect scores in both performance and all trust-related attributes.
- **Solu-No-Trust**: A realistic solution without any dedicated mechanisms to address trustworthy AI concerns.
- **Solu-With-Trust**: The same base solution as Solu-No-Trust, but enhanced with mechanisms for handling uncertainty, robustness, OOD monitoring, and drift management.
- **Solu-Random**: A baseline solution that returns random predictions.

We observe that the **Solu-Perfect** solution achieves a perfect score across all metrics. Both **Solu-No-Trust** and **Solu-With-Trust** show identical scores in terms of Performance and Generalization. However, Solu-With-Trust significantly improves its trustworthiness scores across other attributes such as Uncertainty, Robustness, Monitoring, and Drift Management.

![image](./docs/assets/Tabular_metrique.png)

## ML-Trustworthy Evaluation design

The evaluation protocol was designed to assess both **performance** and **trustworthiness requirements**, based on the **Operational Design Domain (ODD)** derived from operational needs linked to the AI component's automated function (i.e., assistance in weld validation).

After identifying the relevant **trust attributes** (e.g., *robustness*) associated with specific **trust properties** (e.g., *output invariance under blur perturbation*), the evaluation methodology was structured into the following stages:

- **Evaluation Specification**  
  What specific model behaviors do we want to assess and validate?

- **Evaluation Set Specification**  
  What kind of data must be used or constructed to test whether the model exhibits the expected behavior under specific conditions?

- **Evaluation Set Design**  
  What data should be selected or generated to build these evaluation sets?

- **Evaluation Set Validation**  
  How can we ensure that the evaluation datasets are reliable and representative of the scenarios being analyzed?

- **Criteria Specification**  
  What criteria should be defined to measure the presence or absence of the expected behavior?

- **Metrics Design**  
  What metrics can be used to quantify these criteria?

- **Trust-KPI Design**  
  How can these criteria be aggregated into a **Trust-KPI** for each trust attribute?

## Steps of the Metrics and Trust-KPI Computation

The aggregation process consists of several key steps.

In this section, $\alpha_{i}$, $\beta_{i}$ and $k_{i}$ are weighting or scaling coefficients used for the multi-criteria aggregation.


### 1. Computation of Metrics Related to Trust Attributes

- Several metrics are computed for each attribute using specific evaluation datasets, in order to capture different aspects of the attribute’s performance.
- These evaluation datasets are either selected or synthetically generated to test distinct behavioral criteria.

### 2. Normalization of Attribute Metrics

- All attribute-specific metrics are normalized to a score within the range \[0, 1\], where **1** represents the best possible performance.
- Normalization is performed using appropriate transformations (e.g., sigmoid functions, exponential decay), depending on the nature of each metric.
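As an illustration of these two families of transformations, the hypothetical helpers below map a "lower is better" metric (such as an operational cost) to a score with exponential decay, and an unbounded "higher is better" metric with a sigmoid. The parameters `k`, `center` and `slope` are placeholders, not the challenge's actual values.

```python
import math


def exp_decay_score(cost, k=1.0):
    """Map a non-negative "lower is better" metric to (0, 1]: 1 at cost 0, tending to 0."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    return math.exp(-k * cost)


def sigmoid_score(x, center=0.0, slope=1.0):
    """Map an unbounded "higher is better" metric to (0, 1), with 0.5 at x == center."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


print(exp_decay_score(0.0), sigmoid_score(0.0))  # 1.0 0.5
```

The exponential form matches the $e^{-k\,OP}$ terms that appear in the performance, generalization and drift KPIs.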

### 3. Trust-KPI Aggregation

- For each attribute X, a specific aggregation function combines the k normalized metrics of X into a single **trust-KPI**, denoted $I_X$.
- This allows for a comprehensive representation of the model’s performance with respect to each trust attribute.

$$ I_X = \mathrm{agg}(X_{metric_1}, \ldots, X_{metric_k})$$

For example, if X is the attribute "performance":  $X_{metric_1}=OP$, $X_{metric_2}=ML$, and $X_{metric_3}=Time$

### 4. Piecewise Linear Rescaling of Trust-KPIs

- To ensure consistency and comparability across attributes, each KPI undergoes a **piecewise linear rescaling**.
- This rescaling accounts for predefined performance and confidence thresholds, aligning the raw scores with evaluation constraints.

$$f'(x) =
\begin{cases}
\frac{\beta_1}{\alpha_1} f(x), & 0 \leq f(x) < \alpha_1 \\
\frac{\beta_2 - \beta_1}{\alpha_2 - \alpha_1} (f(x) - \alpha_1) + \beta_1, & \alpha_1 \leq f(x) \leq \alpha_2 \\[8pt]
\frac{1 - \beta_2}{1 - \alpha_2} (f(x) - \alpha_2) + \beta_2, & \alpha_2 < f(x) \leq 1
\end{cases}$$
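The rescaling $f'$ can be written directly as a function. The thresholds $\alpha_1 < \alpha_2$ and target values $\beta_1, \beta_2$ in the usage lines are made-up numbers for illustration. The three branches meet at $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$, so the mapping is continuous and sends 1 to 1.

```python
def piecewise_rescale(f, alpha1, alpha2, beta1, beta2):
    """Piecewise linear rescaling f -> f' of a KPI value f in [0, 1].

    Requires 0 < alpha1 < alpha2 < 1 so that no segment has zero width.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if not 0.0 < alpha1 < alpha2 < 1.0:
        raise ValueError("need 0 < alpha1 < alpha2 < 1")
    if f < alpha1:
        return beta1 / alpha1 * f
    if f <= alpha2:
        return (beta2 - beta1) / (alpha2 - alpha1) * (f - alpha1) + beta1
    return (1.0 - beta2) / (1.0 - alpha2) * (f - alpha2) + beta2


# Illustrative thresholds: raw KPIs below 0.5 are compressed into [0, 0.2)
params = dict(alpha1=0.5, alpha2=0.8, beta1=0.2, beta2=0.7)
for f in (0.25, 0.5, 0.8, 1.0):
    print(f, round(piecewise_rescale(f, **params), 3))
```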

### 5. Weighted Aggregation of Trust-KPIs
- The rescaled attribute KPIs are then aggregated into a **final evaluation score** using a **weighted mean**.
- Each weight reflects the relative importance of its corresponding attribute within the overall trustworthy AI assessment.

$$ score= \alpha_1*I_{perf} + \alpha_2*I_{U} + \alpha_3*I_{rob} + \alpha_4*I_{ood} + \alpha_5*I_{gen}+\alpha_6*I_{drift}$$
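A minimal sketch of the final aggregation, assuming the six rescaled KPIs are stored in a dictionary. Dividing by the sum of the weights turns the expression into a weighted mean even if the weights do not sum to 1; the KPI values and equal weights in the usage lines are invented.

```python
def final_score(kpis, weights):
    """Weighted mean of the rescaled trust-KPIs (one weight per attribute)."""
    if set(kpis) != set(weights):
        raise ValueError("need exactly one weight per attribute")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(weights[name] * kpis[name] for name in kpis) / total


kpis = {"perf": 0.9, "U": 0.6, "rob": 0.7, "ood": 0.5, "gen": 0.8, "drift": 0.4}
weights = {name: 1.0 for name in kpis}  # illustrative: all attributes weighted equally
print(round(final_score(kpis, weights), 3))  # 0.65
```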

### 6. Purpose of the Aggregation Protocol

The goal of this aggregation process is to produce a single, comprehensive trust score that captures the system’s performance across six key trust attributes. Each of these attributes is assessed through multiple criteria, measured with relevant metrics and normalized to reflect their practical impact.

## Trust-KPI and metrics by attribute.

### Performance attribute

**Purpose**: Measures the model's predictive accuracy and efficiency, ensuring it meets baseline expectations in a controlled environment.

**Evaluation sets**: Standard ML evaluation set based on a representative 20% split of the dataset.

**Metrics**:
  - **OP-Perf** (Operational Performance): Evaluates model performance through an operational view using confusion-matrix-based metrics that account for the cost of different error types and weld criticality.

       $$OP = \sum_{k}^{|N|} \sum_{i}^{true_{class}} \sum_{j}^{pred_{class}} \mathbb{1}_{Top_{class}(\hat{y}_k)=j} * cost(i,j,k,k_{seam}) $$

  where N is the number of samples in the evaluation dataset and $k_{seam}$ is the name of the welding seam.


  - **ML-Perf** (Machine Learning Performance): Assesses performance using standard ML metrics such as precision.
    $$ ML = \frac{\sum_{i=1}^{N} \mathbb{1} (y_i = 1 \land \hat{y}_i = 1)}{\sum_{i=1}^{N} \mathbb{1} (\hat{y}_i = 1)}$$

  where $y_i$ is the ground truth and $\hat{y}_i$ is the AI component prediction.


  - **Inference Time (Times)**: Measures computational efficiency and runtime.

**Performance-KPI**: Combines OP-Perf and ML-Perf using a weighted average, penalized by inference time to reflect operational constraints.
$$ I_{perf}=\frac{\alpha_{op} e^{-k_c OP} + \alpha_{ml} ML}{1 + k_t \ln(1+t)} $$

where $t$ is the inference time
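The three performance metrics and the KPI can be sketched as below. Several points are assumptions, since they are not specified here: the cost function is a toy example (a missed KO weld costs the most, an UNKNOWN answer costs a manual check), only the true class of each sample contributes to OP, KO is treated as the positive class in the precision, and $\alpha_{op}$, $\alpha_{ml}$, $k_c$, $k_t$ take placeholder values.

```python
import math


def toy_cost(true, pred, seam):
    # Hypothetical costs; a real cost table may also depend on the seam's criticality
    if pred == "UNKNOWN":
        return 0.1
    if true == "KO" and pred == "OK":
        return 1.0
    if true == "OK" and pred == "KO":
        return 0.2
    return 0.0


def op_cost(y_true, y_pred, seams, cost=toy_cost):
    """OP: sum of the costs of the top-class predictions over the evaluation set."""
    return sum(cost(t, p, s) for t, p, s in zip(y_true, y_pred, seams))


def ml_precision(y_true, y_pred, positive="KO"):
    """ML: precision of the positive class (0 if it is never predicted)."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


def perf_kpi(op, ml, t, alpha_op=0.5, alpha_ml=0.5, k_c=1.0, k_t=0.1):
    """I_perf: weighted OP/ML combination, penalized by the inference time t."""
    return (alpha_op * math.exp(-k_c * op) + alpha_ml * ml) / (1.0 + k_t * math.log(1.0 + t))


y_true = ["OK", "KO", "OK", "KO"]
y_pred = ["OK", "KO", "KO", "UNKNOWN"]
seams = ["c33", "c102", "c20", "c33"]
op = op_cost(y_true, y_pred, seams)  # 0.2 (false KO) + 0.1 (UNKNOWN)
ml = ml_precision(y_true, y_pred)  # 1 correct KO out of 2 predicted KO -> 0.5
print(round(op, 3), ml, round(perf_kpi(op, ml, t=0.0), 3))
```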

### Uncertainty assessment
**Purpose**: Evaluates the AI component’s ability to express meaningful and calibrated uncertainty, helping assess the risk of decision errors.

**Evaluation sets**: Standard ML evaluation set based on a representative 20% split of the dataset.

**Metrics**:
  - **U-OP** (Uncertainty Operational Gain): Measures the virtual operational gain obtained by using probabilistic outputs instead of hard predictions, normalized by the gap between the current hard predictions and the perfect solution.
  $$c^{U} = \sum_{k}^{|N|} \sum_{i}^{true_{class}} \sum_{j}^{pred_{class}} \hat{y}_k(j) * cost(i,j,k,k_{seam}) $$

  $$ UOP = \frac{(c^{U} - c^{op})}{(c^{op} - c^{op}_{perfect})}$$

  - **U-Calib** (Calibration Quality): Evaluates how well predicted probabilities align with actual error rates (e.g., Expected Calibration Error).
    $$ UCalib = \sum_{m=1}^{M} \frac{|B_m|}{N} \left| acc(B_m) - conf(B_m) \right|$$

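where $B_m$ is the set of samples whose confidence falls in the $m$-th of $M$ bins, and $acc(B_m)$ and $conf(B_m)$ are the accuracy and the mean confidence in that bin. For more information, see the Expected Calibration Error definition (wiki link).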
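The sketch below computes U-OP and U-Calib under the same kind of assumptions as a toy performance example would use: a hypothetical cost table in which a correct answer costs 0 (so $c^{op}_{perfect} = 0$), only the true class of each sample contributes to the costs, hard predictions are the argmax of the probabilities, and the ECE uses 10 equal-width confidence bins. With costs, a negative U-OP means that the probabilistic outputs lower the cost compared with the hard predictions.

```python
import numpy as np

CLASSES = ["KO", "OK", "UNKNOWN"]  # order of the probability triplets


def toy_cost(true, pred):
    # Hypothetical cost table (seam criticality ignored); correct answers cost 0
    table = {("KO", "OK"): 1.0, ("OK", "KO"): 0.2}
    return 0.1 if pred == "UNKNOWN" else table.get((true, pred), 0.0)


def uop(y_true, probs, cost=toy_cost):
    """U-OP = (c^U - c^op) / (c^op - c^op_perfect)."""
    c_u = sum(p * cost(t, c) for t, row in zip(y_true, probs) for p, c in zip(row, CLASSES))
    c_op = sum(cost(t, CLASSES[int(np.argmax(row))]) for t, row in zip(y_true, probs))
    c_perfect = sum(cost(t, t) for t in y_true)
    if c_op == c_perfect:
        raise ValueError("hard predictions are already perfect")
    return (c_u - c_op) / (c_op - c_perfect)


def expected_calibration_error(probs, y_true, n_bins=10):
    """U-Calib: sum over bins of |B_m|/N * |acc(B_m) - conf(B_m)|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.array([CLASSES.index(t) for t in y_true])
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # in_bin.mean() is |B_m| / N
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece


y_true = ["OK", "KO"]
probs = [[0.1, 0.9, 0.0], [0.4, 0.6, 0.0]]
print(round(uop(y_true, probs), 3))  # -0.38

calib_probs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.3, 0.7, 0.0], [0.6, 0.4, 0.0]]
calib_true = ["KO", "KO", "OK", "OK"]
print(round(expected_calibration_error(calib_probs, calib_true), 3))  # 0.3
```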

**Uncertainty-KPI** : Combines Uncertainty Operational Gain with calibration error.
$$I_{U} = e^{k_{UOP}} * (1 - UCalib)^{k_{UCalib}} $$

### Robustness
**Purpose**: Assesses model stability under perturbations such as blur, lighting variation, rotation, and translation.

**Evaluation sets**: Generated by applying synthetic perturbations to a weld-balanced subset of the standard evaluation set.

![image](./images/Blur_illu.png)

**Metrics**:
- **Blur Robustness**: Aggregation (AUC) of the ML-performance (Precision score) across increasing perturbation levels.
- **Luminance Robustness**: Aggregation (AUC) of the ML-performance (Precision score) across increasing perturbation levels.
- **Rotation Robustness**: Aggregation (AUC) of the ML-performance (Precision score) across increasing perturbation levels.
- **Translation Robustness**: Aggregation (AUC) of the ML-performance (Precision score) across increasing perturbation levels.

$$ r^x = \mathrm{AUC}(ML_{\delta_1}, \ldots, ML_{\delta_k}) $$

where $x\in \{blur,lum,rot,trans\}$ and $\delta_k$ are the different perturbation levels

**Robustness-KPI** : Weighted aggregation of robustness scores across all perturbation types.

$$ I_{rob} = \sum_{i \in \{blur,lum,rot,trans\}} \alpha_{r_i} * r^i $$
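One possible reading of $r^x$ and $I_{rob}$ in code: the area under the precision curve is computed with the trapezoidal rule and divided by the range of perturbation levels so that $r^x$ stays in [0, 1]. That normalization, the perturbation levels, the precision values and the equal weights $\alpha_{r_i}$ are all assumptions for illustration.

```python
def robustness_auc(levels, ml_scores):
    """r^x: trapezoidal area under ML-performance vs. perturbation level, divided by the level range."""
    if len(levels) != len(ml_scores) or len(levels) < 2:
        raise ValueError("need at least two (level, score) pairs")
    if levels[-1] <= levels[0]:
        raise ValueError("levels must be increasing")
    area = sum(
        (levels[i + 1] - levels[i]) * (ml_scores[i] + ml_scores[i + 1]) / 2
        for i in range(len(levels) - 1)
    )
    return area / (levels[-1] - levels[0])


def robustness_kpi(r, weights):
    """I_rob: weighted sum of the per-perturbation robustness scores."""
    return sum(weights[x] * r[x] for x in r)


levels = [0, 1, 2, 3]  # hypothetical perturbation intensities delta_1..delta_4
r = {
    "blur": robustness_auc(levels, [0.9, 0.8, 0.6, 0.4]),
    "lum": robustness_auc(levels, [0.9, 0.9, 0.8, 0.7]),
    "rot": robustness_auc(levels, [0.9, 0.7, 0.5, 0.3]),
    "trans": robustness_auc(levels, [0.9, 0.85, 0.8, 0.75]),
}
weights = {x: 0.25 for x in r}  # illustrative equal weights summing to 1
print({x: round(v, 3) for x, v in r.items()}, round(robustness_kpi(r, weights), 3))
```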

### OOD-Monitoring 

**Purpose**: Evaluates the model's ability to detect and handle out-of-distribution (OOD) inputs.

**Evaluation sets**: Includes both synthetic and real OOD datasets with a balanced mix of normal and OOD samples. Real OOD samples are manually selected, and synthetic OOD samples are generated through transformations.

![image](./docs/assets/Ood_illu.png)

**Metrics**
  - **Real-OOD score**: AUROC on the real OOD evaluation set, denoted $OOD_{real}$.
  - **Syn-OOD score**: AUROC on the synthetic OOD evaluation set, denoted $OOD_{syn}$.

**OOD-Monitoring KPI**: Weighted average of real and synthetic OOD detection performance.
$$I_{ood} = \alpha_{syn}*OOD_{syn} + \alpha_{real}*OOD_{real}$$
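AUROC can be read as the probability that a randomly chosen OOD image receives a higher OOD score than a randomly chosen in-domain image. The pairwise sketch below computes it that way (ties count one half) and then forms $I_{ood}$; it assumes the component's OOD_scores are used as the ranking score, and the scores and weights are invented. For large sets, `sklearn.metrics.roc_auc_score` gives the same value more efficiently.

```python
def auroc(scores, is_ood):
    """AUROC via pairwise comparisons: P(score_ood > score_in), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, is_ood) if y]
    neg = [s for s, y in zip(scores, is_ood) if not y]
    if not pos or not neg:
        raise ValueError("need both OOD and in-domain samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ood_kpi(ood_syn, ood_real, alpha_syn=0.5, alpha_real=0.5):
    """I_ood: weighted average of synthetic and real OOD detection AUROC."""
    return alpha_syn * ood_syn + alpha_real * ood_real


ood_real = auroc([0.2, 1.2, 1.5, 0.9], [False, False, True, True])  # 3 of 4 pairs ranked correctly
ood_syn = auroc([0.1, 0.4, 1.8, 2.5], [False, False, True, True])  # perfect separation
print(ood_real, ood_syn, ood_kpi(ood_syn, ood_real))  # 0.75 1.0 0.875
```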

### Generalization 
**Purpose**: Measures the model’s ability to generalize to unseen weld types that share characteristics with the training set.

**Evaluation sets**: Built using data from weld types excluded during training but with similar visual/structural traits.

![image](./docs/assets/Gen_illu.png)

**Metrics**:
  - **OP-Perf-g**: Operational performance on the generalization set.
  - **ML-Perf-g**: ML performance (e.g., precision) on the generalization set.

**Generalization-KPI**: Aggregated from OP-Perf-g and ML-Perf-g.
$$I_{gen} = \alpha_{op}*e^{-k_{op}*OP_g} + \alpha_{ml}*ML_{g}$$

The subscript $g$ in $ML_g$ or $OP_g$ means that the metrics are computed on the generalization dataset.

### Data-Drift handling
**Purpose**: Evaluates both the robustness and OOD detection of the model in response to gradual data drift.

**Evaluation sets**: Constructed by applying increasing levels of synthetic perturbations to a normal data sequence, simulating drift. Final segments are manually labeled as OOD.

![image](./docs/assets/Drift_illu.png)

**Metrics**:
  - **Perf-OP-d**: Operational performance under drift.
  - **OOD-d** (OOD detection score): AUROC on the drift-induced OOD subset.

**Data-Drift-KPI**: Combines performance and detection ability during simulated drift.
$$I_{drift} = \alpha_{OP_{d}} * e^{-k_{op} * OP_{d}} + \alpha_{OOD_{d}}*OOD_{d}$$


where the subscript $d$ means that the metrics are computed only on the drifted dataset.
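$I_{gen}$ and $I_{drift}$ have the same form as the numerator of $I_{perf}$: an exponentially decayed operational cost plus a weighted score in [0, 1] (ML precision for generalization, OOD AUROC for drift). A single hypothetical helper covers both; the coefficient values and metric values below are placeholders.

```python
import math


def op_plus_score_kpi(op, score, alpha_op=0.5, alpha_score=0.5, k_op=1.0):
    """alpha_op * exp(-k_op * OP) + alpha_score * score, as in I_gen and I_drift."""
    return alpha_op * math.exp(-k_op * op) + alpha_score * score


i_gen = op_plus_score_kpi(op=0.4, score=0.8)  # OP_g and ML_g on the generalization set
i_drift = op_plus_score_kpi(op=1.2, score=0.7)  # OP_d and OOD_d on the drifted set
print(round(i_gen, 3), round(i_drift, 3))
```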