# Tutorial notebook

This notebook:
- shows how to use the main user functions to manipulate the datasets
- familiarizes participants with the use case
- provides some tools to visualize the use case's data

### Prerequisites
Install the dependencies if not already done. For more information look at the [readme](../README.md) file.

##### For development on Local Machine

In [None]:
### Install a virtual environment
# Option 1:  using conda (recommended)
# !conda create -n venv python=3.12
# !conda activate venv

# Option 2: using virtualenv
# !pip install virtualenv
# !virtualenv -p /usr/bin/python3.12 venv
# !source venv/bin/activate

##### For Google Colab Users
You can also use a GPU device: go to Runtime > Change runtime type and select T4 GPU.

In [None]:
### Install the welding challenge package
# Option 1: Get the latest version of the challenge-welding package from PyPI (recommended)
# !pip install 'challenge-welding'

In [None]:
# Option 2: Get the latest version from the GitHub repository
# !git clone https://github.com/confianceai/Challenge-Welding-Starter-Kit
# %cd Challenge-Welding-Starter-Kit
# !pip install -U .

Note: You may need to restart the session after this installation for the changes to take effect.

In [None]:
import sys
import subprocess

In [None]:
#Install the challenge_solution from git repository

In [None]:
repo_url = "git+https://github.com/confianceai/Challenge-Welding-Starter-Kit.git"
requirements_url = "https://raw.githubusercontent.com/confianceai/Challenge-Welding-Starter-Kit/refs/heads/main/requirements.txt"

In [None]:
subprocess.run([sys.executable, "-m", "pip", "install", repo_url])

In [None]:
subprocess.run([sys.executable, "-m", "pip", "install", "-r", requirements_url])
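
Note that `subprocess.run` does not raise an error when `pip` fails unless `check=True` is passed, so a failed installation can go unnoticed. The sketch below is one possible way to confirm that a module can be found by the current interpreter. The helper name `is_installed` is hypothetical and not part of the challenge package.

```python
import importlib
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by the current interpreter."""
    # Refresh the import system's caches so freshly installed packages are seen.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# "challenge_welding" is the module imported later in this notebook.
print("challenge_welding installed:", is_installed("challenge_welding"))
```

If this prints `False` right after a successful installation, restarting the session is usually the first thing to try.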

## Introduction: What is Welding Quality Detection?

In the highly competitive automotive industry, quality control is essential to ensuring vehicle reliability and user safety. A failure in quality control can severely compromise safety, lead to significant financial losses, and cause substantial reputational damage to the company involved.

One of the key challenges is improving the reliability of quality control for welding seams in automotive body manufacturing. Currently, this inspection is consistently performed by a human operator due to legal requirements related to user safety. However, during the industrial process, this task is resource-intensive. The main challenge is to develop an AI-based solution that reduces the number of inspections required by the operator through automated pre-validation.

See an example of welding below:

<div>
<img src="docs/imgs/hero_image_3D.png" width="500"/>
</div>


Within the [Confiance.ai](https://www.confiance.ai/) Research Program, Renault Group and SystemX have collaborated on developing trustworthy AI components to address this challenge. Now part of the [European Trustworthy Foundation (ETF)](https://www.confiance.ai/foundation/), our goal is to ensure that these tools effectively validate proposed AI models according to the trustworthy criteria defined by the industry (Intended Purpose).

This industrial use case, provided by Renault Group, focuses on the “Visual Inspection” theme through a classification problem.

The objective is to assess weld quality based on photos taken by cameras on vehicle production lines.

A weld can have two distinct states:
- OK: The welding is normal.
- KO: The welding has defects.

Below are some examples of `OK` and `KO` welds on two different seams `c10` and `c19`.

<div>
<img src="docs/imgs/welding_examples.png" width="500"/>
</div>

The main goal of the challenge is to **develop an AI component (see [Notebook 3](03-Evaluate_solution.ipynb)) that assists operators in performing weld classification while minimizing the need for manual image inspection and double-checking of classifications**.

For defect identification (KO), the system should provide operators with relevant information on the location of the detected defect in the image, thereby reducing the time spent on the control task.

## Load and manipulate the data

In [None]:
"""
This script is a tutorial example of how to use the ChallengeWelding-UI functions
"""
# sys.path.insert(0, "..")  # Uncomment this line for local tests without package installation, to make the challenge_welding module visible
from challenge_welding.user_interface import ChallengeUI
from matplotlib import pyplot as plt

### Init the user interface and list available datasets
The dataset contains 22,851 images split across three different welding seams. An important feature of this dataset is its strong class imbalance: there are only 500 KO images in the entire dataset. *A dataset is considered a list of samples. In this challenge, a sample is a single image.*

We begin by listing the available datasets:
- `example_mini_dataset`: A demo version of the complete dataset used for demonstration, containing 2,857 images.
- `welding-detection-challenge-dataset`: The complete dataset to be used by the participants, containing 22,851 images. 

See [this documentation](../docs/Dataset_description.md) for more information concerning the datasets and their properties.
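
To get a feel for the imbalance, the figures quoted above (22,851 images, 500 of them KO) can be turned into a KO share and into inverse-frequency class weights. This is only an illustrative sketch: the weighting formula `n_total / (n_classes * n_class)` is a common heuristic, not something prescribed by the challenge.

```python
n_total = 22_851  # images in the complete dataset (figure quoted above)
n_ko = 500        # KO images in the complete dataset (figure quoted above)
n_ok = n_total - n_ko

# Fraction of the dataset that is labelled KO.
ko_share = n_ko / n_total

# Inverse-frequency weights: n_total / (n_classes * n_class), with 2 classes.
class_weights = {
    "OK": n_total / (2 * n_ok),
    "KO": n_total / (2 * n_ko),
}

print(f"KO share: {ko_share:.2%}")
print(class_weights)
```

With so few KO samples, plain accuracy is a poor indicator: a model that always predicts OK would already be right on the vast majority of images.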

In [None]:
# Initiate the user interface
my_challenge_UI=ChallengeUI(cache_strategy="local",cache_dir="notebooks_cache")

# Get list of available datasets
ds_list=my_challenge_UI.list_datasets()
print(ds_list)

### Display the dataset metadata
Create a pandas dataframe containing metadata of all samples. 

In [None]:
# We choose here the dataset named "example_mini_dataset"
ds_name="example_mini_dataset"

# the complete dataset
# ds_name="welding-detection-challenge-dataset"

# Load all metadata of your dataset as a pandas dataframe
meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)

display(meta_df)

### Exploration of dataset properties

You may want to see the different image resolutions present in the dataset.

In [None]:
meta_df["resolution"]=meta_df["resolution"].astype(str)
meta_df["resolution"].value_counts()

With this dataframe you can explore the data and compute statistics. For example, you can compute the distribution of weld classes.

In [None]:
meta_df["class"].value_counts()

You may also want to see the class distribution for each welding seam, or the blur distribution.

In [None]:
meta_df.groupby(["welding-seams","class"]).count()["sample_id"]

In [None]:
meta_df.groupby(["welding-seams","blur_class"]).count()["sample_id"]

Or you may want to see the distribution of blur level and luminosity over each welding seam.

In [None]:
meta_df.groupby(["welding-seams"])[["blur_level","luminosity_level"]].describe()

## Display an image
### Open an image
In this section we open a specific sample from the dataset, and display it

In [None]:
sample_idx=56 # Index of the tested image in the dataset
sample_meta=meta_df.iloc[sample_idx] # Get metadata of the image at index sample_idx

print("opening image metadata with idx ..", sample_idx)
print(sample_meta.to_dict())

img=my_challenge_UI.open_image(sample_meta["path"]) # Always Use external_path of sample to open the image

print("size of the opened image", img.shape)

### Display the image
We can simply visualize the opened image using the `matplotlib` library.

In [None]:
plt.imshow(img, interpolation='nearest')
plt.show()

You can also use the provided `display_image` function to directly display the required sample.

In [None]:
img = my_challenge_UI.display_image(meta_df, index=129, show_info=True)

In [None]:
img = my_challenge_UI.display_image(meta_df, index=134, show_info=False)

## Check dataset integrity

Compute the SHA-256 hash of each image file and compare it to the one stored in its metadata. All anomalies are stored in a YAML file named `anomalous_samples_list.yml` and returned as the output of the `check_integrity()` method.

In [None]:
# Check integrity of all files in your dataset (this may take a while...)

# anomalie_list=my_challenge_UI.check_integrity(ds_name)
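
For readers curious about what such a check involves, here is a minimal standard-library sketch of comparing a file's SHA-256 digest with an expected value. It is not the implementation of `check_integrity()`; the helper names `sha256_of_file` and `matches_expected` are hypothetical and introduced only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images never need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256):
    """Return True if the file's digest equals the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.lower()


# Small self-contained demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.bin"
    demo_file.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    print(matches_expected(demo_file, expected))     # same content
    print(matches_expected(demo_file, "0" * 64))     # deliberately wrong digest
```

Reading in chunks keeps memory use constant regardless of file size, which matters when hashing thousands of images.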