# Create a PyTorch DataLoader

This notebook shows how to create a PyTorch DataLoader from the challenge dataset. The DataLoader provides the batches that participants need to train their AI component.

### Prerequisites
Install the dependencies if you have not already done so. For more information, see the [README](../README.md) file.

##### For development on a local machine

In [1]:
### Create a virtual environment
# Note: "!" commands run in a subshell, so environment activation does not
# persist in the notebook; run the activation steps in a terminal instead.
# Option 1: using conda (recommended)
# !conda create -n venv python=3.12
# !conda activate venv
# !pip install torch==2.6.0

# Option 2: using virtualenv
# !pip install virtualenv
# !virtualenv -p /usr/bin/python3.12 venv
# !source venv/bin/activate

### Install the welding challenge package
# Option 1: get the latest version from PyPI
# !pip install 'challenge_welding'

# Option 2: get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .  # run from the root of the cloned repository

##### For Google Colab users
You can also use a GPU by selecting Runtime > Change runtime type and choosing T4 GPU.
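Once PyTorch is installed, a common pattern is to pick the device at runtime so that the same notebook runs with or without a GPU. This is a minimal sketch that relies only on `torch.cuda.is_available()`; `pick_device` is a hypothetical helper, not part of the challenge package.

```python
def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch  # torch==2.6.0 is pinned in the installation cells of this notebook
    except ImportError:
        return "cpu"  # torch not installed yet: fall back to the CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)
```

A batch of images can later be moved to that device with `sample_batched['image'].to(device)`.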

In [2]:
### Install the welding challenge package
# Option 1: get the latest version of the welding challenge package from PyPI (recommended)
# !pip install 'XX'
# !pip install torch==2.6.0

In [3]:
# Option 2: get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .  # run from the root of the cloned repository
# !pip install torch==2.6.0

Note: you may need to restart the session after this installation for the changes to take effect.
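After restarting, a quick sanity check along these lines can confirm that the environment matches the versions pinned in the installation cells. This sketch uses only the standard library; `check_environment` and `installed_version` are hypothetical helpers, and treating Python 3.12 as a minimum (rather than an exact requirement) is an assumption.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Versions pinned in the installation cells of this notebook
EXPECTED_PYTHON = (3, 12)
EXPECTED_TORCH = "2.6.0"


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment():
    """Summarize whether the interpreter and packages match the notebook's pins."""
    torch_version = installed_version("torch")
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= EXPECTED_PYTHON,
        "torch": torch_version,
        # CUDA builds may carry a local suffix such as "+cu124", so compare the base version only
        "torch_ok": torch_version is not None and torch_version.split("+")[0] == EXPECTED_TORCH,
        # find_spec returns None when the top-level module cannot be found
        "challenge_welding_importable": find_spec("challenge_welding") is not None,
    }


for key, value in check_environment().items():
    print(f"{key}: {value}")
```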

In [4]:
# Clone the starting kit
# !git clone https://github.com/confianceai/Challenge-Welding-Starter-Kit.git
# then change into the starting kit directory so that this notebook runs correctly
# import os
# os.chdir("Challenge-Welding-Starter-Kit")

## Import the required libraries 

In [5]:
import sys
# sys.path.insert(0, "..") # For local tests without pkg installation, to make challenge_welding module visible 
import challenge_welding.dataloaders
from challenge_welding.user_interface import ChallengeUI

## Load the required dataset 

### Get dataset list

In [6]:
# Initiate the user interface

my_challenge_UI=ChallengeUI(cache_strategy="local",cache_dir="notebooks_cache")

# Get list of available datasets

ds_list=my_challenge_UI.list_datasets()
print(ds_list)

# In this example we will choose a small dataset

ds_name="example_mini_dataset"

['example_mini_dataset', 'welding-detection-challenge-dataset']


### Get your dataset metadata
For demonstration purposes we use the `example_mini_dataset` dataset; however, **participants should use the complete `welding-detection-challenge-dataset` dataset for the challenge.**
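To avoid accidentally training on the demo subset, the dataset choice can be made explicit and checked against the list returned by `list_datasets()`. This is a sketch; `pick_dataset` and the `demo` flag are hypothetical, and the dataset names are copied from the output of the previous cell.

```python
# Dataset names as printed by my_challenge_UI.list_datasets() above
AVAILABLE = ['example_mini_dataset', 'welding-detection-challenge-dataset']

DEMO_DATASET = "example_mini_dataset"
CHALLENGE_DATASET = "welding-detection-challenge-dataset"


def pick_dataset(available, demo=True):
    """Return the demo or the full challenge dataset name, checking that it is listed."""
    name = DEMO_DATASET if demo else CHALLENGE_DATASET
    if name not in available:
        raise ValueError(f"{name!r} not in available datasets: {available}")
    return name


print(pick_dataset(AVAILABLE))              # example_mini_dataset (demo run)
print(pick_dataset(AVAILABLE, demo=False))  # welding-detection-challenge-dataset
```

In the notebook, you would pass `ds_list` instead of `AVAILABLE` and assign the result to `ds_name`.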

In [7]:
# Load all metadata of your dataset
ds_name="example_mini_dataset"
# ds_name="welding-detection-challenge-dataset"
meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)

https://minio-storage.apps.confianceai-public.irtsysx.fr/challenge-welding/datasets/example_mini_dataset/metadata/ds_meta.parquet


## Create a Pytorch DataLoader on the imported dataset

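The `create_pytorch_dataloader` function of the `challenge_welding.dataloaders` module creates a PyTorch `DataLoader` over your dataset. Here it takes the first 20 rows of the metadata and reuses the cache settings of `my_challenge_UI`.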
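A `DataLoader` groups samples into batches of `batch_size`. Since `input_df=meta_df[0:20]` contains only 20 rows and `batch_size=100`, the loader below yields a single batch of 20 samples, which matches the `torch.Size([20, 540, 540, 3])` printed further down. The dependency-free sketch below mimics PyTorch's default sequential batching (`drop_last=False`) over a map-style dataset, i.e. one implementing `__len__` and `__getitem__`. `MetadataDataset`, `iterate_batches` and `batch_sizes` are hypothetical names; this is not how `create_pytorch_dataloader` is implemented internally.

```python
import math
import random


class MetadataDataset:
    """Hypothetical stand-in for a map-style dataset: one item per metadata row."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        return self.rows[index]


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of items, keeping the smaller final batch (like drop_last=False)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


def batch_sizes(n_samples, batch_size):
    """Sizes of successive batches: full batches, then a smaller final one if needed."""
    n_batches = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - b * batch_size) for b in range(n_batches)]


# 20 metadata rows, as in meta_df[0:20]
dataset = MetadataDataset(f"row_{i}" for i in range(20))
print(batch_sizes(len(dataset), 100))                 # [20] -> a single batch
print(batch_sizes(len(dataset), 8))                   # [8, 8, 4]
print([len(b) for b in iterate_batches(dataset, 8)])  # [8, 8, 4]
```

With `shuffle=True`, the same samples are visited in a different order each epoch, but the batch sizes stay the same.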

In [8]:
# Create your dataloader
dataloader=challenge_welding.dataloaders.create_pytorch_dataloader(input_df=meta_df[0:20],
                                                     cache_strategy=my_challenge_UI.cache_strategy,
                                                     cache_dir=my_challenge_UI.cache_dir,
                                                     batch_size=100,
                                                     shuffle=False)

Cache storage has been activated in  notebooks_cache
cache_metadata_unique_id 132503
Cache directory has already been built, loading local metadata..
local metadata loaded !
0     challenge-welding/datasets/example_mini_datase...
1     challenge-welding/datasets/example_mini_datase...
2     challenge-welding/datasets/example_mini_datase...
3     challenge-welding/datasets/example_mini_datase...
4     challenge-welding/datasets/example_mini_datase...
5     challenge-welding/datasets/example_mini_datase...
6     challenge-welding/datasets/example_mini_datase...
7     challenge-welding/datasets/example_mini_datase...
8     challenge-welding/datasets/example_mini_datase...
9     challenge-welding/datasets/example_mini_datase...
10    challenge-welding/datasets/example_mini_datase...
11    challenge-welding/datasets/example_mini_datase...
12    challenge-welding/datasets/example_mini_datase...
13    challenge-welding/datasets/example_mini_datase...
14    challenge-welding/datasets/example_m

## Visualize some batches of the created DataLoader

In [9]:
# Test your dataloader
for i_batch, sample_batched in enumerate(dataloader):
    print("batch number", i_batch)
    print("batch content image", sample_batched['image'].shape)
    print("batch content meta", sample_batched['meta'])

    # Stop after the 4th batch (with 20 samples and batch_size=100, there is only one batch here)
    if i_batch == 3:
        break

batch number 0
batch content image torch.Size([20, 540, 540, 3])
batch content meta {'sample_id': ['data_92409', 'data_67943', 'data_4843', 'data_25309', 'data_76144', 'data_40839', 'data_79549', 'data_80892', 'data_68392', 'data_70776', 'data_2681', 'data_92491', 'data_80084', 'data_39992', 'data_79686', 'data_40851', 'data_70665', 'data_26756', 'data_69068', 'data_40094'], 'class': ['OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK', 'OK'], 'timestamp': ['22/01/20 12:49', '20/02/20 23:53', '20/01/20 20:34', '18/07/2022 20:18', '03/10/19 21:14', '21/07/2022 22:44', '11/07/20 19:08', '04/11/2020 20:09', '11/03/20 17:59', '28/10/2020 18:47', '20/07/20 15:14', '25/01/20 00:24', '08/09/20 17:47', '18/07/2022 23:24', '18/07/20 07:34', '21/07/2022 23:04', '22/10/2020 15:28', '28/07/2022 01:21', '18/06/20 06:14', '19/07/2022 04:56'], 'welding-seams': ['c33', 'c102', 'c20', 'c102', 'c20', 'c33', 'c20', 'c20', 'c102', 'c102', 'c102'
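Each batch is a dictionary: `image` holds the stacked images and `meta` holds the metadata, with each field collated into a list across the batch. The sketch below, using a few values copied from the output above, reproduces that dict-of-lists collation, builds an integer label mapping from the `class` names, and reorders the image shape to channels-first. `collate_meta`, `build_label_map` and `to_channels_first` are hypothetical helpers, and reading `(20, 540, 540, 3)` as (batch, height, width, channels) is an assumption based on the trailing dimension of 3.

```python
# A few metadata items as a per-sample dataset might return them (values from the output above)
samples = [
    {"sample_id": "data_92409", "class": "OK", "welding-seams": "c33"},
    {"sample_id": "data_67943", "class": "OK", "welding-seams": "c102"},
    {"sample_id": "data_4843", "class": "OK", "welding-seams": "c20"},
]


def collate_meta(items):
    """Turn a list of per-sample dicts into a dict of lists, one list per metadata field."""
    return {key: [item[key] for item in items] for key in items[0]}


def build_label_map(class_names):
    """Map each distinct class name to an integer index, in sorted order for reproducibility."""
    return {name: idx for idx, name in enumerate(sorted(set(class_names)))}


def to_channels_first(shape):
    """Reorder an (N, H, W, C) shape to the (N, C, H, W) layout expected by torch.nn.Conv2d."""
    n, h, w, c = shape
    return (n, c, h, w)


meta = collate_meta(samples)
label_map = build_label_map(meta["class"])
labels = [label_map[name] for name in meta["class"]]

print(meta["sample_id"])                     # ['data_92409', 'data_67943', 'data_4843']
print(label_map, labels)                     # {'OK': 0} [0, 0, 0]
print(to_channels_first((20, 540, 540, 3)))  # (20, 3, 540, 540)
```

With real batches, the equivalent layout change would be `sample_batched['image'].permute(0, 3, 1, 2)`. Build the label mapping from the full metadata (for example `meta_df["class"]`, assuming that column exists) rather than from a single batch, since this batch contains only `OK` samples.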