# Data Preparation for YOLO Training

This notebook demonstrates how to use the `export_yolo_training_data.py` script to prepare image and label data from HDF5 files for YOLO model training.

The script is located at `src/utils/export_yolo_training_data.py`.

## List Available Dataset Builds

First, let's see which dataset builds are configured and available. The script uses build keys defined in `src/config.py`.

In [1]:
!python ../src/utils/export_yolo_training_data.py --list_builds

Available dataset build keys:
  - tcr_phase1_build1
  - tcr_phase1_build2
  - tcr_phase1_build3
  - tcr_phase1_build4
  - tcr_phase1_build5
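
If you want the keys programmatically, for example to loop over several builds, the listing above can be parsed. The following is a minimal sketch that assumes the output format stays exactly as shown (a header line followed by `  - <key>` lines). `parse_build_keys` is a hypothetical helper and not part of `export_yolo_training_data.py`.

```python
def parse_build_keys(listing: str) -> list[str]:
    """Extract build keys from the text printed by --list_builds.

    Assumption: each key appears on its own line as "  - <key>".
    """
    keys = []
    for line in listing.splitlines():
        stripped = line.strip()
        # Key lines start with "- "; the header and blank lines are skipped.
        if stripped.startswith("- "):
            keys.append(stripped[2:].strip())
    return keys


# Sample text copied from the cell output above.
sample = """Available dataset build keys:
  - tcr_phase1_build1
  - tcr_phase1_build2
"""
print(parse_build_keys(sample))
```

In a notebook, the listing text could be captured with `subprocess.run([...], capture_output=True, text=True).stdout` and passed to this helper.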


## Export Data for a Specific Build

Next, export the data for one build. The command below uses `tcr_phase1_build1`; substitute any key from the list above.

The script will create `images/train` and `labels/train` directories under `data/<build_key>/`.

In [2]:
!python ../src/utils/export_yolo_training_data.py --build_key tcr_phase1_build1

2025-05-09 11:53:09,283 - INFO - Starting YOLO data export process for build: tcr_phase1_build1
2025-05-09 11:53:09,283 - INFO - Target training image directory: /piml-in-metal-am/data/tcr_phase1_build1/images/train
2025-05-09 11:53:09,283 - INFO - Target training label directory: /piml-in-metal-am/data/tcr_phase1_build1/labels/train
2025-05-09 11:53:09,283 - INFO - Reading HDF5 data from: /mnt/ssd/l-pbf-dataset/2021-07-13 TCR Phase 1 Build 1.hdf5
2025-05-09 11:53:09,284 - INFO - Exporting 3575 layers as images and masks...
Processing layers:   2%|▎                   | 57/3575 [03:47<4:05:51,  4.19s/it]^C
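
The run above was interrupted (`^C`) after 57 of 3575 layers, with the progress bar reporting about 4.19 s per layer. A rough estimate of the full export time follows from those two numbers. This is a back-of-the-envelope sketch that assumes the per-layer rate stays constant; `estimate_export_hours` is a hypothetical helper.

```python
def estimate_export_hours(n_layers: int, seconds_per_layer: float) -> float:
    """Estimate total export time in hours, assuming a constant per-layer rate."""
    return n_layers * seconds_per_layer / 3600.0


# Values taken from the log above: 3575 layers at roughly 4.19 s per layer.
hours = estimate_export_hours(3575, 4.19)
print(f"Estimated full export time: {hours:.1f} h")
```

At this rate a full export of one build takes roughly four hours, so it may be worth running the script outside the notebook for complete builds.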


## Verify Exported Data (Optional)

You can verify the exported data by checking the output directories. For example, to list the contents of the output directories for `tcr_phase1_build1`:

In [ ]:
import os

# Change build_key if you exported a different build.
build_key = 'tcr_phase1_build1'
project_root = '..'  # Same relative root as the script calls above
img_dir = os.path.join(project_root, 'data', build_key, 'images', 'train')
lbl_dir = os.path.join(project_root, 'data', build_key, 'labels', 'train')

# os.listdir returns entries in arbitrary order, so sort before sampling.
for name, directory in (('images', img_dir), ('labels', lbl_dir)):
    print(f"Checking {name} in: {directory}")
    if os.path.isdir(directory):
        files = sorted(os.listdir(directory))
        print(f"{len(files)} files; first 5: {files[:5]}")
    else:
        print(f"Directory not found: {directory}")
    print()
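
Listing a few files does not show whether images and labels line up. The sketch below compares file stems between the two directories. It assumes each image has a label file with the same base name (for example `layer_0001.png` and `layer_0001.txt`); the notebook does not confirm the script's naming scheme, and both the file names in the demo and the `find_unpaired` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def find_unpaired(img_dir, lbl_dir):
    """Return (images without labels, labels without images) as sorted stems.

    Assumption: an image and its label share the same file stem.
    """
    img_stems = {p.stem for p in Path(img_dir).iterdir() if p.is_file()}
    lbl_stems = {p.stem for p in Path(lbl_dir).iterdir() if p.is_file()}
    return sorted(img_stems - lbl_stems), sorted(lbl_stems - img_stems)


# Demo on a throwaway directory tree that mimics images/train and labels/train.
with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp, "images", "train")
    lbl = Path(tmp, "labels", "train")
    img.mkdir(parents=True)
    lbl.mkdir(parents=True)
    (img / "layer_0001.png").touch()
    (img / "layer_0002.png").touch()
    (lbl / "layer_0001.txt").touch()
    missing_labels, missing_images = find_unpaired(img, lbl)
    print("Images without labels:", missing_labels)
    print("Labels without images:", missing_images)
```

To check a real export, call `find_unpaired` with the `images/train` and `labels/train` paths for your build. After an interrupted export, unpaired entries may indicate a layer that was only partially written.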