# Solution Evaluation

This notebook gives an overview of how a candidate AI component will be evaluated. Although it does not compute all metrics, it gives a precise idea of how to build a compatible AI component and how that component is used to generate scores.

# Objective

The task is to build an AI component whose main objective is to predict the welding state from a given image, provided the image is in the operational domain.

The AI component shall take as input:
- A list of numpy arrays representing the input images to process.
- A list of dictionaries, each containing a metadescription of an image.
    
It shall return a dictionary with four keys {predictions, probabilities, OOD_score, explainabilities}.

The first key is required:
- predictions: The list of predicted welding states. A welding state can take three possible values: [OK, KO, UNKNOWN]

The other keys are not required, but will greatly improve the quality score of the component if present:

- probabilities: The list of probabilities associated with each image, [proba KO, proba OK, proba UNKNOWN] (the probabilities sum to 1)
- OOD_scores: The list of OOD scores predicted by the AI component for each image. This score X is a non-negative real number. If 0 <= X < 1 the image is considered in-domain; if X > 1 the image is considered OOD.
- explainabilities: The list of explainabilities for each input image. An explainability is an intensity matrix (a matrix with values between 0 and 1) of the same size as the image tensor, representing the importance of each pixel in the model prediction.
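
The rules above can be turned into a small sanity check. The following is only a sketch: the function name `check_component_output` is hypothetical, and the key spelling `OOD_scores` is taken from the list above and may differ in the actual interface.

```python
import numpy as np

VALID_STATES = {"OK", "KO", "UNKNOWN"}


def check_component_output(output, images):
    """Check a predict() output dictionary against the rules listed above."""
    n = len(images)
    # "predictions" is the only required key.
    preds = output["predictions"]
    assert len(preds) == n and set(preds) <= VALID_STATES

    if "probabilities" in output:
        probas = np.asarray(output["probabilities"], dtype=float)
        # One row [proba KO, proba OK, proba UNKNOWN] per image, summing to 1.
        assert probas.shape == (n, 3)
        assert np.allclose(probas.sum(axis=1), 1.0)

    if "OOD_scores" in output:  # key spelling assumed from the list above
        scores = np.asarray(output["OOD_scores"], dtype=float)
        assert scores.shape == (n,) and np.all(scores >= 0)

    if "explainabilities" in output:
        for image, heatmap in zip(images, output["explainabilities"]):
            heatmap = np.asarray(heatmap)
            # Same size as the image tensor, values between 0 and 1.
            assert heatmap.shape == image.shape
            assert heatmap.min() >= 0 and heatmap.max() <= 1
    return True


# Two dummy images of different sizes and a hand-written output dictionary.
images = [np.zeros((4, 6, 3), dtype=np.uint8), np.zeros((2, 2, 3), dtype=np.uint8)]
example_output = {
    "predictions": ["OK", "UNKNOWN"],
    "probabilities": [[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    "OOD_scores": [0.3, 1.7],
    "explainabilities": [np.full(img.shape, 0.5) for img in images],
}
check_component_output(example_output, images)
```

Running such a check on your own component before submission may help catch shape or normalization mistakes early.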

## Evaluation criteria

Operational metrics: Measure the gain brought by the AI component compared to a human-only qualification process. These metrics are based on the confusion matrix and strongly penalize false negative predictions.

Uncertainty metrics: Measure the ability of the AI component to produce a calibrated prediction confidence indicator expressing the risk of model errors.

Robustness metrics: Measure the ability of the AI component to keep its output invariant when images undergo slight perturbations (blur, luminosity, rotation, translation).

Monitoring metrics: Measure the ability of the AI component to detect that an input image is OOD and to give the appropriate output -> UNKNOWN.

Explainability metrics: Measure the ability of the AI component to give appropriate explanations.
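
To make the robustness criterion more concrete, here is a minimal sketch of one possible check: it slightly changes the luminosity of each image and measures how often the prediction stays the same. The helper names and the `toy_predict` function are hypothetical stand-ins, and the actual robustness metric used for scoring may be computed differently.

```python
import numpy as np


def change_luminosity(image, factor):
    """Scale pixel intensities by `factor`, keeping the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def prediction_agreement(predict_fn, images, factor=1.1):
    """Fraction of images whose prediction is unchanged after perturbation."""
    original = predict_fn(images)
    perturbed = predict_fn([change_luminosity(img, factor) for img in images])
    same = [a == b for a, b in zip(original, perturbed)]
    return sum(same) / len(same)


def toy_predict(images):
    # Hypothetical stand-in for a real component: "OK" if the image is bright.
    return ["OK" if img.mean() > 100 else "KO" for img in images]


# Three uniform images; the middle one sits close to the toy decision boundary.
images = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (50, 95, 200)]
agreement = prediction_agreement(toy_predict, images, factor=1.1)
print(f"Agreement under luminosity change: {agreement:.2f}")
```

An agreement close to 1 suggests the predictions are stable under this perturbation; the same pattern can be reused with blur, rotation, or translation.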



In [9]:
import sys
# sys.path.insert(0, "..") # Uncomment this line for local tests without package installation, to make the challenge_welding module visible
from challenge_welding.user_interface import ChallengeUI
from challenge_welding.Evaluation_tools import EvaluationPipeline

# Build your AI component
An AI component shall be a buildable Python package. It is therefore a folder that shall contain at least the following files and folders:
```
/
setup.py
requirements.txt
challenge_solution/
    AIcomponent.py
    __init__.py
```

Only these files will be used by the evaluation pipeline to test your AI component. The names of the files and folders shall not be changed.
The most important file is AIcomponent.py, which is the interface of your AI component. The evaluation pipeline interacts with your component only through this interface. That is why this file requires a class and methods with strictly defined names. It shall follow this abstract class: [AIComponent interface](../absAIComponent.py)

In this starter-kit, we provide an example of such an AI component, which is evaluated here. This example was built only to show what a correct AI component architecture looks like. Its prediction quality is poor.
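
As an illustration only, an interface file could look roughly like the sketch below. The class name `MyAIComponent`, the argument names, and the toy brightness rule are assumptions made for this example; the required names and signatures are the ones defined in absAIComponent.py. Only the method names `load_model` and `predict` are taken from this notebook.

```python
import numpy as np


class MyAIComponent:
    """Hypothetical skeleton; the real class must follow absAIComponent.py."""

    def load_model(self):
        # A real component would load trained weights here.
        self.threshold = 0.5

    def predict(self, images, images_meta_informations):
        # Toy rule, not a trained model: normalized mean brightness as "proba OK".
        predictions, probabilities = [], []
        for image in images:
            p_ok = float(np.mean(image)) / 255.0
            predictions.append("OK" if p_ok >= self.threshold else "KO")
            # Order follows the notebook: [proba KO, proba OK, proba UNKNOWN].
            probabilities.append([1.0 - p_ok, p_ok, 0.0])
        return {"predictions": predictions, "probabilities": probabilities}


component = MyAIComponent()
component.load_model()
batch = [np.full((4, 4, 3), 255, dtype=np.uint8), np.zeros((4, 4, 3), dtype=np.uint8)]
result = component.predict(batch, [{"sample_id": "a"}, {"sample_id": "b"}])
print(result["predictions"])
```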

# Create an evaluation pipeline

An evaluation pipeline takes an AI component (the solution to test) and evaluates it by generating different metrics and scores. In this notebook we generate only operational metrics and uncertainty scores.

An evaluation pipeline:
- Installs your AI component as a Python package
- Loads the AI component of the solution you want to test
- Runs inference with this AI component on one or more evaluation datasets. Each inference run on a dataset outputs a dataframe (stored as a parquet file) containing the evaluation dataset metadata extended with the prediction results.
- Applies metric computation functions that take only the inference results (as parquet files) as input and generate the output metrics

## Init the pipeline

In [1]:

# Define the path of the AI component to test
AI_comp_path= "..\\reference-solutions\\Solution-1"

# Initialize the test pipeline
myPipeline=EvaluationPipeline(proposed_solution_path=AI_comp_path, # Path of the AI component you want to evaluate
                              meta_root_path="starter_kit_pipeline_results", # Directory where pipeline results will be stored (inference results and computed metrics)
                              cache_strategy="local", # "local" or "remote". If set to "local", all images used for evaluation are stored locally in a cache directory. Otherwise, images are loaded directly by downloading them
                              cache_dir="evaluation_cache") # Directory chosen for the cache


## Load your AI component into the evaluation environment

The load_proposed_solution() method below performs two main tasks:
- Install the Python package of your AI component (it executes the command pip install AI_comp_path)
- Call the load_model() method of your AIcomponent interface
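
The two tasks above can be pictured with the sketch below. It is not the pipeline's actual implementation: the function names, the default module path, and the class name `MyAIComponent` are hypothetical, and only the use of pip and of `load_model()` comes from this notebook.

```python
import importlib
import subprocess
import sys


def build_install_command(component_path):
    """Return the pip command that would install the component package."""
    return [sys.executable, "-m", "pip", "install", component_path]


def install_and_load(component_path, module_name="challenge_solution.AIcomponent"):
    # Task 1: install the package with pip, using the current interpreter.
    subprocess.run(build_install_command(component_path), check=True)
    # Task 2: import the interface module, instantiate it, and load the model.
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    component = module.MyAIComponent()  # hypothetical class name
    component.load_model()
    return component


# Only the command is built here; nothing is installed by this example.
print(build_install_command("path/to/your/component"))
```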


In [11]:
myPipeline.load_proposed_solution()

AI component loaded




## Load an evaluation dataset metadescription

In the next cell, we load the metadata of the evaluation dataset we want to use to evaluate our AI component

In [12]:
# In this example we will choose a small dataset

ds_name="example_mini_dataset"

# Load all metadata of your dataset as a pandas dataframe (you can point to a local cached metadata file instead of the original one stored in the remote repository)

my_challenge_UI=ChallengeUI()
evaluation_ds_meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)

display(evaluation_ds_meta_df.head(5))

https://minio-storage.apps.confianceai-public.irtsysx.fr/challenge-welding/datasets/example_mini_dataset/metadata/ds_meta.parquet


Unnamed: 0,sample_id,class,timestamp,welding-seams,labelling_type,resolution,path,sha256,storage_type,data_origin,blur_level,blur_class,luminosity_level,external_path
0,data_92409,OK,22/01/20 12:49,c33,expert,"[1920, 1080]",challenge-welding/datasets/example_mini_datase...,b'GN\xd7\xa7B\x98\xb0r\xa4\xdfn\x8cT\x8e:\xc07...,s3,real,701.938341,blur,50.533365,http://minio-storage.apps.confianceai-public.i...
1,data_67943,OK,20/02/20 23:53,c102,expert,"[1920, 1080]",challenge-welding/datasets/example_mini_datase...,b's\xf6;3i-\x10\xfd8y\xf2\xe1\xa6JQ\x84`\xc6\x...,s3,real,715.670702,blur,47.050604,http://minio-storage.apps.confianceai-public.i...
2,data_4843,OK,20/01/20 20:34,c20,expert,"[1920, 1080]",challenge-welding/datasets/example_mini_datase...,b'\xdbZ\xb3\x12e&\xd5\x83\x13*\x87S\xe1\x19\xc...,s3,real,715.85738,blur,46.204245,http://minio-storage.apps.confianceai-public.i...
3,data_25309,OK,18/07/2022 20:18,c102,operator,"[960, 540]",challenge-welding/datasets/example_mini_datase...,b'/c\xe3\xd9\xc8|&\xaf\xb1}\xf6\xe3s\xae\xea\x...,s3,real,869.513006,blur,34.35928,http://minio-storage.apps.confianceai-public.i...
4,data_76144,OK,03/10/19 21:14,c20,expert,"[1920, 1080]",challenge-welding/datasets/example_mini_datase...,b'\xca%\x0c\x92\x1f\x0c\x00\xcc\x02\r\xb8\xf1\...,s3,real,2676.246904,clean,46.256244,http://minio-storage.apps.confianceai-public.i...


## Perform inference on an evaluation dataset

We pass the evaluation dataframe to the method below. It uses the loaded AI component to perform inference on each sample referenced in the evaluation dataframe and adds the inference results as new columns.

In [None]:
result_df=myPipeline.perform_grouped_inference(evaluation_dataset=evaluation_ds_meta_df, # Dataframe containing the metadescription of your evaluation dataset
                                               results_inference_path=myPipeline.meta_root_path+"/res_inference.parquet", # Path of the file that will contain the inference results
                                               batch_size=150 # Inference can be grouped into batches if you want
                                              )

Number of  batch to process for inference :  20  , start processing..



You can see the inference results below. Note that new columns have been added:
- "predicted_state", imported from the "predictions" key of the output dictionary of your AI component's predict method
- "scores KO", imported from the "probabilities" key of the output dictionary of your AI component's predict method
- "scores OK", imported from the "probabilities" key of the output dictionary of your AI component's predict method
- "score OOD", imported from the "OOD scores" key of the output dictionary of your AI component's predict method


In [None]:
display(result_df)
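
Once the results are in a dataframe, simple checks can be run directly with pandas. The sketch below uses a few made-up rows rather than real results; it only assumes the "class" column from the dataset metadata (the ground truth) and the "predicted_state" column described above.

```python
import pandas as pd

# Hypothetical rows mimicking the inference results ("class" is the ground truth).
results = pd.DataFrame(
    {
        "class": ["OK", "OK", "KO", "KO", "OK"],
        "predicted_state": ["OK", "KO", "KO", "OK", "UNKNOWN"],
    }
)

# Rows are true states, columns are predicted states.
confusion = pd.crosstab(results["class"], results["predicted_state"])
# Share of samples whose predicted state matches the ground truth.
accuracy = (results["class"] == results["predicted_state"]).mean()

print(confusion)
print(f"accuracy = {accuracy:.2f}")
```

The same two lines can be applied to `result_df` to get a quick look at your component's behaviour before computing the official metrics.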

## Compute operational metrics

The first criterion used to evaluate the quality of your component is the operational cost. The objective is to compare the cost of using your AI component with
the cost of relying only on humans to process the images of the evaluation dataset. The important scores are "gain in euros", which should be maximized, and inference_time.

Because the example AI component provided in this starter-kit was not designed to perform well, its gain_score here is very poor.

The method below computes these two metrics. The results are displayed as output and also written to the results folder defined when the pipeline was initialized.

In [None]:
# Compute operational metrics
myPipeline.compute_operationnal_metrics(AIcomp_name="sol_0", # This name is only used for the name of result files
                                        res_inference_path=myPipeline.meta_root_path+"/res_inference.parquet" # inference_results file
                                        )
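
To give an intuition of a cost-based score, here is a minimal sketch of a "gain in euros" computed from true and predicted states. All unit costs are hypothetical, as is the assumption that an UNKNOWN prediction is sent back to a human; the cost table actually used by the pipeline is not shown in this notebook.

```python
# Hypothetical unit costs in euros, chosen only for illustration.
HUMAN_COST_PER_IMAGE = 1.0
COSTS = {
    ("OK", "OK"): 0.0,
    ("KO", "KO"): 0.0,
    ("OK", "KO"): 2.0,   # false alarm: a good weld flagged as defective
    ("KO", "OK"): 50.0,  # missed defect: penalized strongly
}


def gain_in_euros(true_states, predicted_states):
    """Cost of a human-only process minus the cost of using the AI component."""
    human_only = HUMAN_COST_PER_IMAGE * len(true_states)
    ai_cost = 0.0
    for truth, pred in zip(true_states, predicted_states):
        if pred == "UNKNOWN":
            # Assumption: the image is handed over to a human operator.
            ai_cost += HUMAN_COST_PER_IMAGE
        else:
            ai_cost += COSTS[(truth, pred)]
    return human_only - ai_cost


gain = gain_in_euros(["OK", "OK", "KO", "KO"], ["OK", "KO", "KO", "UNKNOWN"])
print(f"gain = {gain} euros")
```

With such an asymmetric cost table, a single missed defect can cancel the gain of many correct predictions, which is why false negatives matter so much for this criterion.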

## Compute uncertainty metrics

Uncertainty metrics are computed in the same way, from the inference results file.

To detail



In [None]:
res_df,final_results=myPipeline.compute_uncertainty_metrics(res_inference_path=myPipeline.meta_root_path+"/res_inference.parquet",
                                                            AIcomp_name="sol_0"
                                                            )            
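
For readers unfamiliar with calibration, the sketch below computes the expected calibration error (ECE), one common way to measure how well prediction confidences match observed accuracy. It is given as an illustration only; the uncertainty metrics actually computed by the pipeline may differ, and the function name and binning choice are assumptions.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin indices in 0..n_bins-1.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            # Weight each bin by the fraction of samples it contains.
            ece += mask.mean() * gap
    return ece


# Overconfident toy example: full confidence, but only half of the predictions are right.
ece = expected_calibration_error([1.0, 1.0], [1, 0])
print(f"ECE = {ece:.2f}")
```

Here the confidence would typically be the probability of the predicted class, and `correct` indicates whether that prediction matched the ground truth.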