<a href="https://colab.research.google.com/github/arifes123/bubble_cls/blob/main/bubble-detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!wget <your_dataset_url> -O /content/bubble-detection0.zip

In [None]:
!unzip /content/bubble-detection0.zip

In [None]:
!pip install ultralytics --quiet

The label files are not organized into `train`, `val`, and `test` subdirectories within the `labels` folder. YOLO expects the `labels` folder to mirror the `images` folder, so the label files need to be moved into the matching split subdirectories.
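For reference, Ultralytics locates the label for an image by replacing the last `/images/` component of the image path with `/labels/` and changing the extension to `.txt`. The helper below is a minimal sketch of that convention; `image_to_label_path` is a hypothetical name of ours, not an Ultralytics function, and it assumes POSIX paths as on Colab.

```python
import os


def image_to_label_path(image_path: str) -> str:
    """Map .../images/<split>/<name>.<ext> to .../labels/<split>/<name>.txt.

    Hypothetical helper sketching the images/ -> labels/ mirroring convention.
    """
    images_dir = f"{os.sep}images{os.sep}"
    labels_dir = f"{os.sep}labels{os.sep}"
    # Split on the LAST '/images/' so parent folders that happen to be
    # named 'images' are left untouched
    head, sep, tail = image_path.rpartition(images_dir)
    if not sep:
        raise ValueError(f"No '{images_dir}' component in {image_path!r}")
    swapped = head + labels_dir + tail
    # Replace the image extension with .txt
    return os.path.splitext(swapped)[0] + ".txt"


print(image_to_label_path("/content/bubble-detection0/images/train/img_001.jpg"))
# prints /content/bubble-detection0/labels/train/img_001.txt
```

Because only the path is rewritten, an image in `images/train` is only matched with a label in `labels/train`; a label sitting in the wrong split folder is effectively invisible to training.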

Once the images and labels are organized, run the YOLO training cell (the cell with `!yolo train ...`). Use `device=cpu` if no CUDA device is available (the training cell uses `device=0`, the first GPU), and `data=/content/bubble-detection0/bubble-detection.yaml` for the data configuration.

# Task
Randomly split the image files located in `/content/bubble-detection0/images` into training (70%), validation (20%), and testing (10%) sets, and move them into corresponding `train`, `val`, and `test` subdirectories within `/content/bubble-detection0/images`.

## Identify Image Files

### Subtask:
List all image files in the source directory `/content/bubble-detection0/images`.


**Reasoning**:
The subtask is to identify and list all image files in the specified directory. This code block will perform that action by importing the `os` module, defining the image directory path, listing all files, filtering them by common image extensions, and storing the result in `all_image_files`.
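A slightly more defensive variant of this listing is sketched below. It skips subdirectories (such as `train/` created later) and returns the names in sorted order, because `os.listdir` returns entries in arbitrary order and a fixed starting order is what makes a seeded shuffle reproducible. The function name `list_image_files` is our own, for illustration.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def list_image_files(directory: str) -> list:
    """Return the sorted names of regular image files directly inside `directory`."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip directories even if their names end in an image extension
        if os.path.isfile(full_path) and name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    # os.listdir order is arbitrary; sorting gives a deterministic order
    return sorted(names)


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.JPG", "a.png", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    os.makedirs(os.path.join(tmp, "train"))
    print(list_image_files(tmp))  # ['a.png', 'b.JPG']
```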



In [None]:
import os

images_src_path = '/content/bubble-detection0/images'

all_image_files = []
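# Sort the listing: os.listdir order is arbitrary, and a fixed order makes a seeded shuffle reproducible
for file in sorted(os.listdir(images_src_path)):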
    if file.lower().endswith(('.png', '.jpg', '.jpeg')):
        all_image_files.append(file)

print(f"Found {len(all_image_files)} image files.")
print(f"First 5 image files: {all_image_files[:5]}")

## Create Split Directories

### Subtask:
Create `train`, `val`, and `test` subdirectories within `/content/bubble-detection0/images` if they don't already exist.


**Reasoning**:
The subtask requires creating 'train', 'val', and 'test' subdirectories within the images directory. I will use the `os` module to achieve this, ensuring that existing directories are not overwritten.
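The key detail is `exist_ok=True`: calling `os.makedirs` on an existing directory then does nothing instead of raising `FileExistsError`, and files already inside the directory are left alone. The sketch below (with a hypothetical `ensure_split_dirs` helper) shows that the step is safe to re-run.

```python
import os
import tempfile


def ensure_split_dirs(base_path: str, splits=("train", "val", "test")) -> list:
    """Create base_path/<split> for each split and return the paths; safe to call repeatedly."""
    paths = []
    for split in splits:
        path = os.path.join(base_path, split)
        # exist_ok=True: no error if the directory already exists,
        # and its current contents are not touched
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_split_dirs(tmp)
    second = ensure_split_dirs(tmp)  # second call is a no-op, not an error
    print(first == second, sorted(os.listdir(tmp)))  # True ['test', 'train', 'val']
```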



In [None]:
import os

images_base_path = '/content/bubble-detection0/images'
subdirectories = ['train', 'val', 'test']

for sub_dir in subdirectories:
    path = os.path.join(images_base_path, sub_dir)
    os.makedirs(path, exist_ok=True)
    print(f"Ensured directory exists: {path}")


## Split Images and Move

### Subtask:
Randomly split the identified image files into 70% for training, 20% for validation, and 10% for testing. Then, move these images to their corresponding newly created subdirectories.


**Reasoning**:
The subtask requires randomly splitting the image files into training, validation, and testing sets and then moving them to their respective directories. This code block will perform these actions using `random` for shuffling and `shutil` for moving files, following the specified ratios.
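The split logic can be isolated into a small, reproducible function. This is a sketch under our own assumptions (the name `split_files` and the default seed `42` are illustrative): it sorts first, shuffles with a local `random.Random(seed)` so global random state is untouched, truncates the train and val sizes with `int()`, and gives the remainder to test so that every file lands in exactly one split.

```python
import random


def split_files(files, train_ratio=0.7, val_ratio=0.2, seed=42):
    """Shuffle a copy of `files` with a fixed seed and cut it into train/val/test."""
    shuffled = sorted(files)               # deterministic starting order
    random.Random(seed).shuffle(shuffled)  # local RNG, reproducible for a given seed
    n_train = int(train_ratio * len(shuffled))
    n_val = int(val_ratio * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder, so nothing is dropped
    }


splits = split_files([f"img_{i:03d}.jpg" for i in range(10)])
print({name: len(items) for name, items in splits.items()})
# prints {'train': 7, 'val': 2, 'test': 1}
```

Because the input is sorted before shuffling, the same set of file names always produces the same split for a given seed, regardless of the order in which the directory listing returned them.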



In [None]:
import os
import random
import shutil

images_src_path = '/content/bubble-detection0/images'
images_base_path = '/content/bubble-detection0/images'

# Shuffle with a fixed seed so the split is reproducible across runs
random.seed(42)
random.shuffle(all_image_files)

# Calculate split sizes
total_images = len(all_image_files)
train_split = int(0.7 * total_images)
val_split = int(0.2 * total_images)
test_split = total_images - train_split - val_split # Ensure all images are accounted for

# Slice the list to create train, val, and test sets
train_files = all_image_files[:train_split]
val_files = all_image_files[train_split : train_split + val_split]
test_files = all_image_files[train_split + val_split :]

splits = {
    'train': train_files,
    'val': val_files,
    'test': test_files
}

print(f"Total images: {total_images}")
print(f"Train images: {len(train_files)}")
print(f"Validation images: {len(val_files)}")
print(f"Test images: {len(test_files)}")

# Move files to their respective directories
for split_name, files in splits.items():
    dest_dir = os.path.join(images_base_path, split_name)
    for img_file in files:
        src_path = os.path.join(images_src_path, img_file)
        dest_path = os.path.join(dest_dir, img_file)
        if os.path.exists(src_path):
            shutil.move(src_path, dest_path)
        else:
            print(f"Warning: Source file not found: {src_path}")
    print(f"Moved {len(files)} images to {dest_dir}/")

print("Image files split and moved successfully!")

**Reasoning**:
The previous step split and moved the image files. The corresponding label files (`.txt`) must now be moved into matching `train`, `val`, and `test` subdirectories under `labels`. First, consolidate all existing label files, including any left in old split subdirectories, into the root of the `labels` directory. Then list the images actually present in each split directory and move each image's label file into the matching split.
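Separating *planning* from *moving* lets you inspect what will happen before any file is touched. The sketch below (a hypothetical `plan_label_moves` helper; the example file names are made up) computes the `(source, destination)` pairs for each split and collects images that have no label. Note that Ultralytics counts an image without a label file as missing and treats it as containing no objects, so a missing label is not always an error.

```python
import os


def plan_label_moves(splits, labels_root, existing_labels):
    """Return (moves, missing) without touching the filesystem.

    splits: dict of split name -> list of image file names
    existing_labels: set of label file names currently in labels_root
    """
    moves, missing = [], []
    for split_name, image_names in splits.items():
        for image_name in image_names:
            label_name = os.path.splitext(image_name)[0] + ".txt"
            if label_name not in existing_labels:
                missing.append(image_name)  # no label: image has no annotated objects
                continue
            moves.append((os.path.join(labels_root, label_name),
                          os.path.join(labels_root, split_name, label_name)))
    return moves, missing


# Hypothetical file names, for illustration only
moves, missing = plan_label_moves(
    {"train": ["bubble_01.jpg", "bubble_02.png"], "val": ["bubble_03.jpg"]},
    "/content/bubble-detection0/labels",
    existing_labels={"bubble_01.txt", "bubble_03.txt"},
)
for src, dst in moves:
    print(f"{src} -> {dst}")
print("No label file for:", missing)
```

Once the plan looks right, each pair can be passed to `shutil.move(src, dst)`.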



In [None]:
import os
import shutil

labels_base_path = '/content/bubble-detection0/labels'
images_base_path = '/content/bubble-detection0/images'

# Ensure label subdirectories exist
for sub_dir in ['train', 'val', 'test']:
    path = os.path.join(labels_base_path, sub_dir)
    os.makedirs(path, exist_ok=True)
    print(f"Ensured directory exists: {path}")

# Consolidate every label file into the root of labels_base_path.
# Labels left in old train/val/test subfolders are moved up; if a same-named
# file already exists at the root, the root copy is kept and the duplicate is reported.
for sub_dir in ['train', 'val', 'test']:
    sub_dir_path = os.path.join(labels_base_path, sub_dir)
    for file in os.listdir(sub_dir_path):
        if file.lower().endswith('.txt'):
            root_path = os.path.join(labels_base_path, file)
            if os.path.exists(root_path):
                print(f"Warning: duplicate label {file} left in {sub_dir_path}/")
            else:
                shutil.move(os.path.join(sub_dir_path, file), root_path)

# Count each consolidated label once
root_labels = [f for f in os.listdir(labels_base_path) if f.lower().endswith('.txt')]
print(f"Consolidated {len(root_labels)} label files in {labels_base_path}/")

# Re-list the images actually present in each split directory, so this step
# does not depend on train_files/val_files/test_files from an earlier cell
image_exts = ('.png', '.jpg', '.jpeg')
new_splits = {
    split: [f for f in os.listdir(os.path.join(images_base_path, split))
            if f.lower().endswith(image_exts)]
    for split in ['train', 'val', 'test']
}

# Move each image's label into the labels/<split> folder that mirrors images/<split>
for split_name, image_files_in_split in new_splits.items():
    dest_dir = os.path.join(labels_base_path, split_name)
    moved = 0
    for img_file_name in image_files_in_split:
        label_file_name = os.path.splitext(img_file_name)[0] + '.txt'
        src_path = os.path.join(labels_base_path, label_file_name)
        if os.path.exists(src_path):
            shutil.move(src_path, os.path.join(dest_dir, label_file_name))
            moved += 1
        else:
            print(f"Warning: Label file {label_file_name} not found in {labels_base_path}/")
    print(f"Moved {moved} of {len(image_files_in_split)} label files to {dest_dir}/")

print("Label files reorganized successfully to match image split!")
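Before training, it is worth confirming that every split has matching image and label stems. The sketch below (a hypothetical `check_split_pairing` helper) compares the file stems in `images/<split>` and `labels/<split>` and returns what is unmatched on each side.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def check_split_pairing(dataset_root, split):
    """Return (images_without_labels, labels_without_images) as sorted lists of stems."""
    image_dir = os.path.join(dataset_root, "images", split)
    label_dir = os.path.join(dataset_root, "labels", split)
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
                   if f.lower().endswith(IMAGE_EXTENSIONS)}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(label_dir)
                   if f.lower().endswith(".txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Demo on a throwaway dataset: img_b has no label, img_c.txt has no image
with tempfile.TemporaryDirectory() as root:
    for folder in ("images/train", "labels/train"):
        os.makedirs(os.path.join(root, folder))
    for rel in ("images/train/img_a.jpg", "images/train/img_b.png",
                "labels/train/img_a.txt", "labels/train/img_c.txt"):
        open(os.path.join(root, rel), "w").close()
    print(check_split_pairing(root, "train"))  # (['img_b'], ['img_c'])
```

On Colab you would call it as `check_split_pairing('/content/bubble-detection0', split)` for each of `train`, `val`, and `test`. Images without labels may be intentional background images; labels without images usually indicate a file that ended up in the wrong split.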

In [None]:
import yaml

# Path to the dataset YAML file
yaml_file_path = '/content/bubble-detection0/bubble-detection.yaml'

# Read the YAML file
with open(yaml_file_path, 'r') as f:
    data = yaml.safe_load(f)

# Update the 'path' key to the correct absolute path in Colab
data['path'] = '/content/bubble-detection0'

# Update train, val, and test paths to be relative to the new 'path'
data['train'] = 'images/train'
data['val'] = 'images/val'
data['test'] = 'images/test'

# Write the updated YAML back to the file
with open(yaml_file_path, 'w') as f:
    yaml.dump(data, f)

print(f"Updated {yaml_file_path} with new path: {data['path']}")
print(f"Updated train path: {data['train']}")
print(f"Updated val path: {data['val']}")
print(f"Updated test path: {data['test']}")
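Ultralytics resolves relative `train`/`val`/`test` entries against the `path` key, so a quick pre-flight check is to join them the same way and confirm each directory exists. This is a minimal sketch; `resolve_split_dirs` is a hypothetical helper, and it only handles entries given as single strings (a YAML list of directories is skipped).

```python
import os
import tempfile


def resolve_split_dirs(data):
    """Join the YAML 'path' root with each split entry and report whether it exists."""
    root = data["path"]
    resolved = {}
    for split in ("train", "val", "test"):
        entry = data.get(split)
        # Only single-string entries are handled; lists of directories are skipped
        if isinstance(entry, str):
            full = os.path.join(root, entry)
            resolved[split] = (full, os.path.isdir(full))
    return resolved


# Demo with a dict shaped like the one written above, on a throwaway root
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "images", "train"))
    data = {"path": root, "train": "images/train", "val": "images/val", "test": "images/test"}
    for split, (full, exists) in resolve_split_dirs(data).items():
        print(f"{split}: {full} exists={exists}")
```

In the notebook, passing the `data` dict from the previous cell should report `exists=True` for all three splits once the images have been moved.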

In [None]:
from ultralytics import YOLO

!yolo train model=yolov8s.pt data=/content/bubble-detection0/bubble-detection.yaml epochs=100 imgsz=640 device=0 patience=20

In [None]:
!yolo detect predict model=/content/runs/detect/train/weights/best.pt source=/content/bubble-detection0/images/test

In [None]:
import shutil
from google.colab import files

# Path to the directory containing predicted images
prediction_output_dir = '/content/runs/detect/predict'

# Create a zip archive of the predicted images
shutil.make_archive('yolo_predictions', 'zip', prediction_output_dir)

print("Predicted images zipped to yolo_predictions.zip. Download it with files.download('yolo_predictions.zip').")

## Inspect Training Results

YOLOv8 saves various artifacts from the training process, including metrics, plots, and the best-performing model. Let's list the contents of the training run directory to see what's available for evaluation.

In [None]:
import os

# Ultralytics saves each training run under 'runs/detect/'. The first run goes to
# 'train'; later runs go to 'train2', 'train3', and so on.
# This cell assumes a single run, saved to 'train'.

train_results_dir = '/content/runs/detect/train'

if os.path.exists(train_results_dir):
    print(f"Contents of {train_results_dir}:")
    for item in os.listdir(train_results_dir):
        print(item)
else:
    print(f"Training results directory not found: {train_results_dir}")
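If training has been run more than once, the hard-coded `train` folder may not be the newest run. The sketch below (a hypothetical `latest_run_dir` helper) picks the folder with the highest numeric suffix, treating a bare `train` as run 1, and ignores unrelated names such as `predict`.

```python
import os
import re
import tempfile


def latest_run_dir(detect_dir, prefix="train"):
    """Return the run folder with the highest suffix (train < train2 < train3 ...), or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d*)$")
    best, best_index = None, -1
    for name in os.listdir(detect_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(detect_dir, name)):
            # A bare 'train' has no suffix; treat it as run 1 so 'train2' outranks it
            index = int(match.group(1)) if match.group(1) else 1
            if index > best_index:
                best, best_index = name, index
    return os.path.join(detect_dir, best) if best is not None else None


# Demo on a throwaway 'runs/detect' layout
with tempfile.TemporaryDirectory() as detect:
    for name in ("train", "train2", "train10", "predict"):
        os.makedirs(os.path.join(detect, name))
    print(os.path.basename(latest_run_dir(detect)))  # train10
```

Numeric comparison matters here: sorting the names as strings would rank `train2` above `train10`.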

In [None]:
from google.colab import files

files.download('yolo_predictions.zip')

In [None]:
from ultralytics import YOLO

# Load a trained model
model = YOLO('/content/runs/detect/train/weights/best.pt')

# Export the model to ONNX format
model.export(format='onnx', opset=11)

In [None]:
import shutil
from google.colab import files

# Path to the folder to be zipped
folder_to_zip = '/content/bubble-detection0'

# Name of the output zip file (without .zip extension)
output_zip_name = 'bubble-detection0'

# Create the zip archive
shutil.make_archive(output_zip_name, 'zip', folder_to_zip)

print(f"Folder '{folder_to_zip}' has been zipped to '{output_zip_name}.zip'")

In [None]:
import shutil
from google.colab import files
import os

# Original file path on Colab's file system
source_file_path = 'bubble-detection0.zip'

# New name for the downloaded file on your local machine
desired_filename = 'bubble.zip'

# Create a temporary copy of the file with the desired name
# This ensures the download function receives a single, unambiguous path
# and the user gets the desired name on their local machine.

# Check if the source file exists
if os.path.exists(source_file_path):
    # Copy the file to the desired temporary name (on Colab)
    shutil.copy(source_file_path, desired_filename)

    # Download the temporarily named file
    files.download(desired_filename)

    # Clean up the temporary file on Colab
    os.remove(desired_filename)
    print(f"File '{source_file_path}' downloaded as '{desired_filename}' and temporary file removed.")
else:
    print(f"Error: Source file '{source_file_path}' not found on Colab.")