# Create a PyTorch DataLoader

This notebook shows how to create a PyTorch DataLoader from the challenge dataset. The DataLoader supplies the batches that participants need to train their AI component.

### Prerequisites
Install the dependencies if you have not already done so. For more information, see the [readme](../README.md) file.

##### For development on a local machine

In [None]:
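### Create a virtual environment
# Option 1: using conda (recommended)
# Note: each `!` command runs in its own subshell, so the activation below does not
# persist across lines; run these commands in a terminal if you need the environment active.
!conda create -n venv python=3.12
!conda activate venv
!pip install torch==2.6.0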

# Option 2: using virtualenv
# !pip install virtualenv
# !virtualenv -p /usr/bin/python3.12 venv
# !source venv/bin/activate

### Install the welding challenge package
# Option 1: Get the latest version from PyPI
# !pip install 'challenge_welding'

# Option 2: Get the latest version from the GitHub repository
# !git clone https://github.com/XX
# !pip install -U .

##### For Google Colab users
You can also use a GPU by opening Runtime > Change runtime type and selecting T4 GPU.

In [None]:
### Install the welding challenge package
# Option 1: Get the latest version of the package from PyPI (recommended)
!pip install 'XX'
!pip install torch==2.6.0

In [None]:
# Option 2: Get the latest version from the GitHub repository
!git clone https://github.com/XX
!pip install -U .
!pip install torch==2.6.0

Attention: You may need to restart the session after this installation for the changes to take effect.
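
After restarting, a quick check like the one below can confirm that the installation is visible to the kernel. This is only a sketch: the helper `is_installed` is a name introduced here for illustration, and the module names are the ones used elsewhere in this notebook (`torch` and `challenge_welding`).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the current kernel."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the notebook's dependencies are importable.
for name in ("torch", "challenge_welding"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")

# If PyTorch is present, also report whether a GPU is usable.
if is_installed("torch"):
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print("torch", torch.__version__, "will use device:", device)
```

If a package is reported as missing, rerun the installation cell and restart the session again.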

In [None]:
# Clone the starter kit
!git clone https://github.com/confianceai/Challenge-Welding-Starter-Kit.git
# and change into the starter kit directory so that this notebook runs correctly
import os
os.chdir("Challenge-Welding-Starter-Kit")

## Import the required libraries 

In [None]:
import sys
# sys.path.insert(0, "..") # For local tests without pkg installation, to make challenge_welding module visible 
from challenge_welding.user_interface import ChallengeUI

## Load the required dataset 

### Get dataset list

In [None]:
# Initiate the user interface

my_challenge_UI=ChallengeUI(cache_strategy="local",cache_dir="notebooks_cache")

# Get list of available datasets

ds_list=my_challenge_UI.list_datasets()
print(ds_list)

# In this example we will choose a small dataset

ds_name="example_mini_dataset"

### Get your dataset metadata
For demonstration, we use the `example_mini_dataset` dataset. However, **participants should use the complete dataset `welding-detection-challenge-dataset` for the challenge itself.**

In [3]:
# Load all metadata of your dataset
ds_name="example_mini_dataset"
# ds_name="welding-detection-challenge-dataset"
meta_df=my_challenge_UI.get_ds_metadata_dataframe(ds_name)
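
Before building a loader, it can help to look at the metadata table and at the slice you plan to feed in. The sketch below assumes the metadata is a pandas DataFrame (the notebook later calls `.iloc` on it) and uses a hypothetical stand-in, `toy_meta_df`, with made-up columns `sample_id` and `class`; the real column names may differ.

```python
import pandas as pd

# Hypothetical stand-in for meta_df; the real column names may differ.
toy_meta_df = pd.DataFrame(
    {
        "sample_id": [f"sample_{i}" for i in range(120)],
        "class": ["OK" if i % 3 else "KO" for i in range(120)],
    }
)

# Overall size, first rows, and label balance of the table.
print(toy_meta_df.shape)
print(toy_meta_df.head())
print(toy_meta_df["class"].value_counts())

# Positional slice: the first 50 rows, whatever the index labels are.
subset = toy_meta_df.iloc[0:50]
print(len(subset))
```

The same calls (`shape`, `head()`, `iloc`) can be applied to `meta_df` itself once it is loaded.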

## Create a PyTorch DataLoader from the imported dataset

The `create_pytorch_dataloader` method of the `ChallengeUI` class, called here on the `my_challenge_UI` instance, makes it easy to create a PyTorch DataLoader for your dataset.
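
To make the structure of the batches concrete, the sketch below builds a toy loader with plain PyTorch. `ToyWeldingDataset`, its random images, and the `sample_id` and `class` metadata keys are hypothetical stand-ins, not the package's implementation; the only thing taken from this notebook is that each batch exposes an `image` and a `meta` entry.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyWeldingDataset(Dataset):
    """Hypothetical stand-in: random images plus a small metadata dict."""

    def __init__(self, n_samples=50, image_shape=(3, 32, 32)):
        self.n_samples = n_samples
        self.image_shape = image_shape

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Each sample is a dict, so the default collate function batches it key by key.
        return {
            "image": torch.rand(self.image_shape),
            "meta": {"sample_id": f"sample_{idx}", "class": "OK"},
        }


toy_loader = DataLoader(ToyWeldingDataset(), batch_size=10, shuffle=False)
first_batch = next(iter(toy_loader))

# Image tensors are stacked along a new leading batch dimension.
print(first_batch["image"].shape)
# String metadata is gathered into one list per key.
print(first_batch["meta"]["sample_id"][:3])
```

With dict samples, PyTorch's default collation stacks tensors and collects strings into lists, which is why a batch can be indexed by key as in the cells that follow.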

In [None]:
# Create your dataloader from the first 50 metadata rows, in batches of 10, without shuffling
dataloader=my_challenge_UI.create_pytorch_dataloader(input_df=meta_df.iloc[0:50],
                                                     batch_size=10,
                                                     shuffle=False,
                                                     )

## Visualize some batches of the created DataLoader

In [None]:
# Test your dataloader by iterating over its batches
for i_batch, sample_batched in enumerate(dataloader):
    print("batch number", i_batch)
    print("batch content image", sample_batched['image'].shape)
    print("batch content meta", sample_batched['meta'])

    # Inspect the first four batches (indices 0 to 3), then stop.
    if i_batch == 3:
        break
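
As a sanity check on what the loop above prints, the arithmetic can be written out. This sketch assumes the loader keeps a final partial batch, which is the default for a PyTorch `DataLoader` (`drop_last=False`); whether `create_pytorch_dataloader` changes that default is not stated here.

```python
import math

n_samples = 50   # rows passed in via meta_df.iloc[0:50]
batch_size = 10  # batch_size used when creating the dataloader

# Number of batches in one full pass over the data.
n_batches = math.ceil(n_samples / batch_size)
print("batches per pass:", n_batches)

# The loop breaks once i_batch reaches 3, so only these batch indices are shown.
stop_at = 3
shown = [i for i in range(n_batches) if i <= stop_at]
print("batches displayed:", shown)
```

Under these assumptions, one batch is left unvisited by the loop; remove the `break` to iterate over the whole slice.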