# 🚀 Dataset Splitter for YOLO11

This notebook will:
1. Read all images (`.jpg`/`.png`) and their matching YOLO `.txt` labels from a **single** folder.
2. Shuffle and split them into **train (70%)**, **val (20%)**, and **test (10%)** subfolders.
3. Generate a `drone.yaml` pointing at your `images/*` splits and setting up 2 classes (`bird` and `drone`).

In [1]:
import os, random, shutil
try:
    from tqdm import tqdm   # optional progress bar
except ImportError:
    tqdm = lambda iterable, **kwargs: iterable   # fallback: plain iteration, no progress bar

In [2]:
# 1. SETTINGS — edit these:
SRC_DIR     = 'datasets'      # folder containing both .jpg/.png and .txt files
TRAIN_RATIO = 0.7           # 70% train
VAL_RATIO   = 0.2           # 20% val
# (test will be the remaining 10%)

### 📂 Create the `train/val/test` subfolders

We need `datasets/train/images`, `datasets/train/labels`, and the same pair of folders for `val` and `test`.

In [3]:
# Cell 2: Create target folders
for split in ('train','val','test'):
    os.makedirs(os.path.join(SRC_DIR, split, 'images'), exist_ok=True)
    os.makedirs(os.path.join(SRC_DIR, split, 'labels'), exist_ok=True)
print("✓ Created train/val/test folders under", SRC_DIR)

✓ Created train/val/test folders under datasets


### 🔍 Gather & Shuffle Filenames

List out all image files, shuffle with a fixed seed for reproducibility, and compute split sizes.


In [4]:
# Cell 3: Gather & shuffle image filenames
all_files = os.listdir(SRC_DIR)
images    = [f for f in all_files if f.lower().endswith(('.jpg','.png'))]
random.seed(42)
random.shuffle(images)

n = len(images)
n_train = int(TRAIN_RATIO * n)
n_val   = int(VAL_RATIO   * n)

train_imgs = images[:n_train]
val_imgs   = images[n_train:n_train + n_val]
test_imgs  = images[n_train + n_val:]

print(f"Total images: {n}")
print(f" → Train: {len(train_imgs)}, Val: {len(val_imgs)}, Test: {len(test_imgs)}")


Total images: 0
 → Train: 0, Val: 0, Test: 0
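The recorded output above reports zero images, which can happen when the cell runs against a folder with no loose images in it (for example, after an earlier run already moved them). As a minimal sketch, the same shuffle-and-cut logic can be wrapped in a function that fails loudly on an empty list. The name `split_names` and the demo filenames are hypothetical, not part of the notebook.

```python
import random

def split_names(names, train_ratio=0.7, val_ratio=0.2, seed=42):
    """Shuffle a copy of `names` and cut it into train/val/test lists."""
    if not names:
        raise ValueError("no images found; check SRC_DIR")
    # Sort first so the result does not depend on directory listing order
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_ratio * len(shuffled))
    n_val = int(val_ratio * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],   # test gets whatever remains
    )

# Hypothetical demo list of 25 filenames
demo = [f"img_{i:03d}.jpg" for i in range(25)]
train, val, test = split_names(demo)
print(len(train), len(val), len(test))  # 17 5 3
```

Because both counts are truncated with `int()`, any rounding remainder lands in the test split, so the three lists always cover every file exactly once.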


### 🚚 Move Images & Labels into Splits

For each subset, move the image and its `.txt` label into the proper folder.


In [5]:
# Cell 4: Move files into their splits
def move_subset(img_list, subset):
    for img in tqdm(img_list, desc=f"Moving {subset}"):
        base = os.path.splitext(img)[0]
        lbl  = base + '.txt'
        # source
        img_src = os.path.join(SRC_DIR, img)
        lbl_src = os.path.join(SRC_DIR, lbl)
        # destination
        img_dst = os.path.join(SRC_DIR, subset, 'images', img)
        lbl_dst = os.path.join(SRC_DIR, subset, 'labels', lbl)
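        # move the image
        shutil.move(img_src, img_dst)
        # an image without a label is a background image; move the label only if it exists
        if os.path.exists(lbl_src):
            shutil.move(lbl_src, lbl_dst)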

move_subset(train_imgs, 'train')
move_subset(val_imgs,   'val')
move_subset(test_imgs,  'test')

print("✅ Files moved!")


Moving train: 0it [00:00, ?it/s]
Moving val: 0it [00:00, ?it/s]
Moving test: 0it [00:00, ?it/s]

✅ Files moved!
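Before moving anything, it can help to list the images that have no matching label file. The sketch below is one way to do that check. `find_unlabeled` is a hypothetical helper, and the demo runs in a temporary folder rather than in `SRC_DIR`.

```python
import os
import tempfile

def find_unlabeled(src_dir, exts=('.jpg', '.png')):
    """Return image filenames in src_dir that have no same-named .txt label."""
    names = set(os.listdir(src_dir))
    return sorted(
        f for f in names
        if f.lower().endswith(exts)
        and os.path.splitext(f)[0] + '.txt' not in names
    )

# Demo in a throwaway folder: a.jpg has a label, b.png does not
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.jpg', 'a.txt', 'b.png'):
        open(os.path.join(tmp, name), 'w').close()
    missing = find_unlabeled(tmp)

print(missing)  # ['b.png']
```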





### 📝 Generate the `drone.yaml` Config

This YAML tells YOLO where to find each split and what classes you have.


In [6]:
# 1) Normalize SRC_DIR to a POSIX style path
posix_dir = SRC_DIR.replace('\\', '/')

# 2) Build the YAML text without backslashes in the f-string expressions
yaml_text = (
    f"train: {posix_dir}/train/images\n"
    f"val:   {posix_dir}/val/images\n"
    f"test:  {posix_dir}/test/images\n\n"
    "nc: 2\n"
    "names:\n"
    "  0: bird\n"
    "  1: drone\n"
)

# 3) Write it out
yaml_path = os.path.join(SRC_DIR, 'drone.yaml')
with open(yaml_path, 'w') as f:
    f.write(yaml_text)

# 4) Verify
print("Created:", yaml_path)
print()
print(yaml_text)


Created: datasets\drone.yaml

train: datasets/train/images
val:   datasets/val/images
test:  datasets/test/images

nc: 2
names:
  0: bird
  1: drone
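If the class list changes, `nc` and the `names` entries have to change together. A small sketch, assuming the same layout as the YAML above, derives both from a single Python list. `build_data_yaml` is a hypothetical helper name.

```python
def build_data_yaml(root, class_names):
    """Return dataset YAML text with nc and names derived from class_names."""
    root = root.replace('\\', '/')   # keep forward slashes on Windows too
    lines = [
        f"train: {root}/train/images",
        f"val: {root}/val/images",
        f"test: {root}/test/images",
        "",
        f"nc: {len(class_names)}",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"

yaml_demo = build_data_yaml('datasets', ['bird', 'drone'])
print(yaml_demo)
```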



### 📦 Install Ultralytics and Import YOLO
Install the `ultralytics` package (which includes YOLO11) and import its Python API.

In [7]:
!pip install ultralytics --quiet

from ultralytics import YOLO
import os
# path to the dataset config written in the previous step
DATA_YAML = os.path.join(SRC_DIR, 'drone.yaml')

### 🚀 Train the YOLO11 Model
Start from the nano (`yolo11n.pt`) pretrained weights, which is what the cells below load.  
Feel free to adjust `epochs`, `imgsz`, `batch`, and `lr0` to suit your GPU.


In [8]:
# Imports, device check, and model load (ultralytics was installed in the previous cell)

import torch
from ultralytics import YOLO

# detect device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

model= YOLO('yolo11n.pt')  # load a pretrained model
model.to(device)          # move to GPU if available
print(model.device)




Using device: cuda
cuda:0
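The hyperparameters mentioned above (`epochs`, `imgsz`, `batch`, `lr0`) can be collected in one place and passed to a single `model.train()` call. This is a sketch under stated assumptions, not the notebook's own training cell. `build_train_args` and `run_training` are hypothetical helper names, and the defaults mirror values that appear elsewhere in this notebook (10 epochs, `imgsz=320`, `batch=16`) plus the `lr0=0.01` shown in the trainer log.

```python
def build_train_args(data_yaml, device, epochs=10, imgsz=320, batch=16, lr0=0.01):
    """Collect the keyword arguments for one training run."""
    return {
        "data": data_yaml,   # path to the dataset YAML
        "epochs": epochs,    # total epochs handled by a single trainer
        "imgsz": imgsz,      # training image size
        "batch": batch,      # lower this if the GPU runs out of memory
        "lr0": lr0,          # initial learning rate
        "device": device,    # e.g. "cuda" or "cpu"
    }

def run_training(weights, train_args):
    """Load pretrained weights and train once with the given arguments."""
    # Imported here so build_train_args works even without ultralytics installed
    from ultralytics import YOLO
    model = YOLO(weights)
    return model.train(**train_args)

args = build_train_args("datasets/drone.yaml", device="cpu")
print(args)
# run_training("yolo11n.pt", args)   # uncomment to start training
```

Running all epochs in one call lets a single trainer own the optimizer state and the learning-rate schedule for the whole run.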


In [9]:
import torch

device1 = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device1}")  # report whether CUDA is available

# These queries raise on a CPU-only machine, so run them only when CUDA is present
if torch.cuda.is_available():
    dev_number = torch.cuda.current_device()
    print(f"Device number: {dev_number}")  # index of the active GPU
    dev_name = torch.cuda.get_device_name(dev_number)
    print(f"Gpu name: {dev_name}")  # model name of the active GPU

# plain string form of the device, reused by later cells
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

Using device: cuda
Device number: 0
Gpu name: NVIDIA GeForce RTX 3060 Laptop GPU
Using device: cuda


In [None]:
!pip install matplotlib --quiet



In [None]:
# ── Cell 2: Epoch-wise training with explicit GPU use ───────────────

import torch
from ultralytics import YOLO


batch_size = 16

# Hard-coded figure used only in the message below; it is not passed to the trainer
total_iterations_per_epoch = 1800
print(f"Total iterations per epoch: {total_iterations_per_epoch}")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
model = YOLO("yolo11n.pt")

# Each model.train() call starts a separate one-epoch run (new trainer, new output folder).
# The trainer prints its own per-batch progress, so no manual iteration loop is needed.
for epoch in range(10):
    print(f"Starting Epoch {epoch + 1}/10")
    model.train(
        data="drone.yaml",
        epochs=1,
        imgsz=320,
        batch=batch_size,
        device=device
    )
    print(f"Completed Epoch {epoch + 1}/10")




Total iterations per epoch: 1800
Using device: cuda
Starting Epoch 1/10
New https://pypi.org/project/ultralytics/8.3.131 available  Update with 'pip install -U ultralytics'
[34m[1mengine\trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=drone.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=1, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=320, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train3, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto

[34m[1mtrain: [0mScanning E:\Yolo 11 Model Train\datasets\train\labels... 40306 images, 0 backgrounds, 0 corrupt: 100%|██████████| 40306/40306 [01:56<00:00, 345.94it/s]


[34m[1mtrain: [0mNew cache created: E:\Yolo 11 Model Train\datasets\train\labels.cache
[34m[1mval: [0mFast image access  (ping: 0.20.2 ms, read: 1.00.3 MB/s, size: 7.3 KB)


[34m[1mval: [0mScanning E:\Yolo 11 Model Train\datasets\val\labels... 11516 images, 0 backgrounds, 0 corrupt: 100%|██████████| 11516/11516 [00:33<00:00, 341.22it/s]


[34m[1mval: [0mNew cache created: E:\Yolo 11 Model Train\datasets\val\labels.cache
Plotting labels to runs\detect\train3\labels.jpg... 
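
The training cell prints a hard-coded 1800 iterations per epoch, while the log above reports 40306 training images at a batch size of 16. As a worked check, the number of batches per epoch can be computed from those two values. `steps_per_epoch` is a hypothetical helper, and the sketch assumes the final partial batch is kept rather than dropped.

```python
from math import ceil

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return ceil(num_images / batch_size)

# 40306 / 16 = 2519.125, so the last batch holds the remaining 2 images
print(steps_per_epoch(40306, 16))  # 2520
```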
