# Solution Evaluation

This notebook gives an overview of how a candidate AI component will be evaluated. Even though it does not compute all metrics, it gives a precise idea of how to build a compatible AI component and how that component is used for score generation.

## Objective

The task is to build an AI component whose main objective is to predict the welding state from a given image.

### Inputs to the AI component
The AI component shall take as input:
- A list of numpy arrays representing the input images to process.
- A list of dictionaries, each containing a meta-description of the corresponding image.

### Outputs of the AI component
It shall return a dictionary with four keys: {predictions, probabilities, OOD_scores, explainabilities}

The first key is required:
- *predictions*: The list of predicted welding states. The welding state can take three possible values: [OK, KO, UNKNOWN]

The following keys are not mandatory, but if present they will greatly improve the quality score of the developed AI component:

- *probabilities*: The list of probabilities associated with each image, [$P_{KO}$, $P_{OK}$, $P_{UNKNOWN}$], where $\sum_{i \in \{\text{OK, KO, UNKNOWN}\}} P_i = 1$.

- *OOD_scores*: The list of OOD scores predicted by the AI component for each image. This score $X$ is a non-negative real number. If $0\leq X < 1$, the image is considered *In-Domain*; if $X > 1$, the image is considered *Out-of-Domain (OoD)*.

- *explainabilities*: The list of explainabilities for each input image. An explainability is an intensity matrix (a matrix with values between 0 and 1) of the same size as the image tensor, representing the importance of each pixel in the model prediction.
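The constraints listed above can be checked before submitting a component. The following is a minimal sketch, not part of the challenge package: the helper name `check_output` is hypothetical, and the key spelling `OOD_scores` is assumed from the bullet above (check the abstract interface for the exact spelling).

```python
import math

ALLOWED_STATES = {"OK", "KO", "UNKNOWN"}


def check_output(output, n_images):
    """Return a list of problems found in a prediction dictionary (empty if none)."""
    predictions = output.get("predictions")
    if predictions is None:
        return ["missing required key 'predictions'"]

    problems = []
    if len(predictions) != n_images:
        problems.append("expected exactly one prediction per image")
    for i, state in enumerate(predictions):
        if state not in ALLOWED_STATES:
            problems.append(f"prediction {i}: unexpected state {state!r}")

    # Optional key: three probabilities per image that sum to 1.
    for i, probs in enumerate(output.get("probabilities", [])):
        if len(probs) != 3 or not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
            problems.append(f"probabilities {i}: need three values summing to 1")

    # Optional key (spelling assumed): scores must be non-negative.
    for i, score in enumerate(output.get("OOD_scores", [])):
        if score < 0:
            problems.append(f"OOD score {i}: must be non-negative")

    return problems
```

An empty list means the dictionary satisfies the constraints stated in this section; it says nothing about prediction quality.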

### Evaluation criteria
From the predictions made by the developed AI component, we will compute a set of different evaluation criteria as discussed below:

- **Operational metrics**: Measure the gain brought by the AI component compared to a human-only qualification process. This metric is based on the confusion matrix and strongly penalizes false negative predictions.

- **Uncertainty metrics**: Measure the ability of the AI component to produce a calibrated prediction confidence indicator expressing the risk of model errors.

- **Robustness metrics**: Measure the ability of the AI component to keep its output invariant when images have slight perturbations (blur, luminosity, rotation, translation).

- **Monitoring metrics**: Measure the ability of the AI component to detect that an input image is out-of-domain (OoD) and to give the appropriate output (UNKNOWN).

- **Explainability metrics**: Measure the ability of the AI component to give appropriate explanations.
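To make the idea of output invariance concrete, here is a minimal sketch of one way robustness could be quantified: the fraction of images whose predicted state is unchanged after a perturbation. This is an illustrative assumption only; the function name `invariance_rate` is hypothetical and the challenge may compute robustness differently.

```python
def invariance_rate(clean_predictions, perturbed_predictions):
    """Fraction of images whose prediction is identical with and without perturbation."""
    if len(clean_predictions) != len(perturbed_predictions):
        raise ValueError("both prediction lists must have the same length")
    if not clean_predictions:
        raise ValueError("at least one prediction is required")
    unchanged = sum(
        1 for clean, perturbed in zip(clean_predictions, perturbed_predictions)
        if clean == perturbed
    )
    return unchanged / len(clean_predictions)
```

For example, if three predictions out of four stay the same after blurring the images, the rate is 0.75.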

### Prerequisites
Install the dependencies if you have not already done so. For more information, see the [readme](../README.md) file.

##### For development on Local Machine

In [None]:
### Install a virtual environment
# Option 1:  using conda (recommended)
!conda create -n venv python=3.12
!conda activate venv
!pip install torch==2.6.0

# Option 2: using virtualenv
# !pip install virtualenv
# !virtualenv -p /usr/bin/python3.12 venv
# !source venv/bin/activate

### Install the welding challenge package
# Option 1: Get the latest version from PyPI
# !pip install 'challenge_welding'

# Option 2: Get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .

##### For Google Colab Users
You can also use a GPU device: go to Runtime > Change runtime type and select T4 GPU.

In [None]:
### Install the welding challenge package
# Option 1: Get the latest version from PyPI (recommended)
!pip install 'XX'
!pip install torch==2.6.0

In [None]:
# Option 2: Get the latest version from the GitHub repository
!git clone https://github.com/XX
!pip install -U .
!pip install torch==2.6.0

Note: You may need to restart the session after this installation for the changes to take effect.

In [None]:
# Clone the starting kit
!git clone https://github.com/confianceai/Challenge-Welding-Starter-Kit.git
# and change to the starting kit directory so that this notebook runs correctly
import os
os.chdir("Challenge-Welding-Starter-Kit")

## Import the required libraries

In [1]:
import os
import sys
# sys.path.insert(0, "..")  # Uncomment this line for local tests without package installation, to make the challenge_welding module visible
from challenge_welding.user_interface import ChallengeUI
from challenge_welding.Evaluation_tools import EvaluationPipeline

# Build your AI component
An AI component shall be a buildable Python package. It is therefore a folder that shall contain at least the following files and folders:
   ```
    /
    setup.py
    requirements.txt
    challenge_solution/
        AIcomponent.py
        __init__.py
 ```

Only those files will be used by the evaluation pipeline to test your AI component. The names of the files and folders shall not be changed.
The most important file is AIcomponent.py, which is the interface of your AI component. Only this interface will be used by the evaluation pipeline to interact with your component. That is why this file requires specific, strictly named methods and classes to be present. It shall follow this abstract class: [Aicomponent interface](absAIcomponent.py)

In this starter kit, we provide an example of such an AI component, which is evaluated here. This example has been built only to show what a correct AI component architecture looks like. It does not perform well in terms of prediction quality.
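As a rough illustration of the shape such an interface might take, here is a minimal sketch of an `AIcomponent.py`. Only the method names `load_model` and `predict` come from this notebook; the class name `MyAIComponent`, the parameter names, and the key spelling `OOD_scores` are assumptions, and the abstract class in `absAIcomponent.py` remains the authoritative definition.

```python
class MyAIComponent:
    """Hypothetical component that always defers to a human operator."""

    def __init__(self):
        self.model = None

    def load_model(self):
        # A real component would load trained weights here.
        # This placeholder only marks the component as ready.
        self.model = "constant-unknown"

    def predict(self, input_images, images_meta_informations):
        # input_images: list of numpy arrays (any sequence works in this sketch)
        # images_meta_informations: list of dictionaries, one per image
        if self.model is None:
            raise RuntimeError("call load_model() before predict()")
        n_images = len(input_images)
        return {
            # Required key: one state per image.
            "predictions": ["UNKNOWN"] * n_images,
            # Optional key: [P_KO, P_OK, P_UNKNOWN] for each image, summing to 1.
            "probabilities": [[0.0, 0.0, 1.0] for _ in range(n_images)],
            # Optional key (spelling assumed): a score above 1 means out-of-domain.
            "OOD_scores": [2.0] * n_images,
        }
```

Returning UNKNOWN for every image is deliberately useless as a model; the point is only the structure of the returned dictionary.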

# Create an evaluation pipeline

An evaluation pipeline takes an AI component (the solution to test) and evaluates it by generating different metrics and scores. In this notebook we generate only operational metrics and uncertainty scores.

An evaluation pipeline:
- Installs your AI component as a Python package
- Loads the AI component of the solution you want to test
- Runs inference with this AI component on one or more evaluation datasets. Each inference run on a dataset produces a dataframe (stored as a parquet file) containing the evaluation dataset metadata extended with the prediction results.
- Applies metric computation functions that take only the inference results (as parquet files) as input to generate the output metrics

## Init the pipeline
To set up the evaluation pipeline, instantiate the `EvaluationPipeline` class with the following parameters:
- `proposed_solution_path`: Path to the folder containing the AI component
- `meta_root_path`: Path where pipeline results will be stored (inference results and computed metrics)
- `cache_strategy`: Either `local` or `remote`. If set to `local`, all images used for evaluation are stored locally in a cache directory. Otherwise, images are loaded directly by downloading them.
- `cache_dir`: The directory where the cache should be stored

In [2]:
# Define path of AI component to test
AI_comp_path = os.path.join("..", "reference-solutions", "Solution-1")

# Initialize test pipeline
myPipeline=EvaluationPipeline(proposed_solution_path=AI_comp_path,
                              meta_root_path="starter_kit_pipeline_results",
                              cache_strategy="local",
                              cache_dir="evaluation_cache")

## Load your AI component into the evaluation environment

The `load_proposed_solution()` method below performs two main tasks:
- Install the Python package of your AI component (it executes the command pip install AI_comp_path)
- Call the load_model() method of your AIcomponent interface


In [None]:
myPipeline.load_proposed_solution()

## Load an evaluation dataset metadescription

In the next cell, we load the metadata of the evaluation dataset we want to use to evaluate our AI component.

In [None]:
# In this example we will choose a small dataset
ds_name="example_mini_dataset"

# Load all metadata of your dataset as a pandas dataframe (you can point to a local cached metadata file instead of the original one in the remote repository)
my_challenge_UI=ChallengeUI()
evaluation_ds_meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)

display(evaluation_ds_meta_df.head(5))

##  Perform inference on an evaluation dataset
Once the AI component is loaded into the pipeline, we use the `perform_grouped_inference` function to run inference on a dataset. For demonstration, we use `example_mini_dataset` as the evaluation dataset. This function takes the following arguments:
- `evaluation_dataset`: dataframe containing the meta-description of your evaluation dataset
- `results_inference_path`: path to the file that will contain the inference results
- `batch_size`: number of images per batch, if you need inference to be grouped into batches
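To illustrate what grouping by `batch_size` means, here is a minimal sketch of splitting a dataset into consecutive batches. The helper `iter_batches` is hypothetical and is not the pipeline's actual implementation.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` containing at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Usage: 400 images with batch_size=150 give three batches, the last one smaller.
batch_lengths = [len(batch) for batch in iter_batches(list(range(400)), 150)]
```

A larger batch size generally means fewer calls to the component's predict function at the price of higher memory use per call.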

In [None]:
result_df=myPipeline.perform_grouped_inference(evaluation_dataset=evaluation_ds_meta_df,
                                               results_inference_path=myPipeline.meta_root_path+"/res_inference.parquet",
                                               batch_size=150
                                              ) 

The inference results are added to the dataframe as the following additional columns:
- `predicted_state`, imported from the `predictions` key of the dictionary returned by the predict function of your AI component
- `scores KO`, imported from the `probabilities` key of the dictionary returned by the predict function of your AI component
- `scores OK`, imported from the `probabilities` key of the dictionary returned by the predict function of your AI component
- `score OOD`, imported from the `OOD scores` key of the dictionary returned by the predict function of your AI component
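The mapping from the prediction dictionary to these columns can be sketched as follows. This is an assumption-laden illustration using plain Python rows instead of a dataframe: the helper `prediction_columns` is hypothetical, the probability order [P_KO, P_OK, P_UNKNOWN] is taken from the output specification earlier in this notebook, and the key spelling `OOD_scores` is assumed.

```python
def prediction_columns(output):
    """Turn a prediction dictionary into one row (dict of column values) per image."""
    rows = []
    for i, state in enumerate(output["predictions"]):
        row = {"predicted_state": state}
        if "probabilities" in output:
            # Assumed order: [P_KO, P_OK, P_UNKNOWN]
            p_ko, p_ok, _p_unknown = output["probabilities"][i]
            row["scores KO"] = p_ko
            row["scores OK"] = p_ok
        if "OOD_scores" in output:  # key spelling is an assumption
            row["score OOD"] = output["OOD_scores"][i]
        rows.append(row)
    return rows
```

Each returned row could then be joined with the metadata of the corresponding image, which is what the extended dataframe represents.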


In [None]:
display(result_df)

## Compute operational metrics

The first criterion used to evaluate the quality of your component is the operational cost. The objective is to compare the cost of using your AI component with the cost of using only humans to process the images of the evaluation dataset. The important scores are the "gain in euros", which should be maximized, and inference_time.

As the example AI component provided in this starter kit has not been designed to perform well, the gain score obtained here is very bad.
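To give an intuition of how a gain can be derived from a confusion matrix while strongly penalizing false negatives, here is a minimal sketch. Everything in it is an assumption made for illustration: the cost values, the treatment of UNKNOWN and KO predictions as a human re-check, the reading of a false negative as a KO weld predicted OK, and the function name `gain_in_euros`. It is not the cost model used by the challenge.

```python
from collections import Counter

HUMAN_CHECK_COST = 1.0      # hypothetical cost of one human inspection, in euros
MISSED_DEFECT_COST = 100.0  # hypothetical cost of a KO weld accepted as OK


def gain_in_euros(true_states, predicted_states):
    """Cost of a human-only process minus the cost of an AI-assisted process."""
    # Confusion matrix stored as counts of (true state, predicted state) pairs.
    confusion = Counter(zip(true_states, predicted_states))
    human_only_cost = HUMAN_CHECK_COST * len(true_states)

    ai_cost = 0.0
    for (truth, predicted), count in confusion.items():
        if predicted == "OK" and truth == "KO":
            # False negative: a defect goes undetected, heavily penalized.
            ai_cost += count * MISSED_DEFECT_COST
        elif predicted in ("KO", "UNKNOWN") and truth != "KO":
            # False alarm or deferral: a human has to inspect the weld.
            ai_cost += count * HUMAN_CHECK_COST
        # Correct OK and correct KO predictions are assumed to cost nothing.
    return human_only_cost - ai_cost
```

With these made-up costs, a single missed defect outweighs the savings from many correct predictions, which is the behaviour described for the operational metric.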

The method below computes these two metrics. The results are displayed as output and are also saved in the results folder defined when the pipeline was initialized.

In [None]:
# Compute operational metrics
myPipeline.compute_operationnal_metrics(AIcomp_name="sol_0", # This name is only used for the name of result files
                                        res_inference_path=myPipeline.meta_root_path+"/res_inference.parquet" # inference_results file
                                        )

## Compute uncertainty metrics

Uncertainty metrics are computed in the same way, from the same inference results file.
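As background on what a calibrated confidence means, here is a minimal sketch of the expected calibration error (ECE), a common calibration measure. It is named here as a general technique only; this notebook does not state which uncertainty metrics the pipeline computes, and the function below is not taken from the challenge package.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy over confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty lists of equal length")
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_confidence - accuracy)
    return ece
```

A value of 0 means that, within each bin, the stated confidence matches the observed accuracy; larger values indicate over- or under-confidence.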



In [None]:
res_df,final_results=myPipeline.compute_uncertainty_metrics(res_inference_path=myPipeline.meta_root_path+"/res_inference.parquet",
                                                            AIcomp_name="sol_0"
                                                            )      