**GitHub repo used as a reference for the following steps**: [link](https://github.com/WyattAutomation/Train-YOLOv3-with-OpenImagesV4)

In [1]:
!git clone https://github.com/EscVM/OIDv4_ToolKit.git

fatal: destination path 'OIDv4_ToolKit' already exists and is not an empty directory.


In [2]:
!pip install -r "/home/khaled-ekramy/OIDv4_ToolKit/requirements.txt"



**Ensure that the kernel is using the GPU to process the model and the data**

In [3]:
import torch
torch.cuda.is_available()  # Should be True if the kernel can use the GPU

True
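
A boolean alone does not say which device was found. As a minimal sketch, the helper below (`describe_device` is a hypothetical name, not part of the original notebook) reports the device name and total memory when CUDA is available, and falls back gracefully when PyTorch or a GPU is missing.

```python
import importlib.util


def describe_device() -> str:
    """Return a short description of the device PyTorch would use."""
    # Avoid a hard failure on machines where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        return "cpu"

    # Query the first visible GPU (index 0).
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"cuda:0 {props.name} ({total_gib:.1f} GiB)"


print(describe_device())
```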

In [4]:
!nvidia-smi #GPU information

Sun Nov 23 07:28:54 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Quadro T1000                   Off |   00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8              1W /   50W |       8MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+


## Our main plan

### Final Plan Summary:

**Model**: YOLOv8n  
**Dataset**: ALL Woman class images from OpenImages V7 (~40K-60K images)  
**Resolution**: 416x416 (45-50 FPS target)  
**Classes**: Woman only (class 0)  
**Training**: Google Colab Pro  
**Deployment**: NVIDIA T1000 4GB  
**Goal**: Very low false positive rate  
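
The 45-50 FPS target translates into a per-frame time budget, which is easier to compare against measured latency. The sketch below computes that budget and times an arbitrary callable; `frame_budget_ms`, `measure_fps`, and `dummy_infer` are hypothetical names, and the dummy workload only stands in for a real model call.

```python
import time


def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / target_fps


def measure_fps(infer, warmup: int = 5, iters: int = 50) -> float:
    """Call `infer` repeatedly and return the average calls per second."""
    for _ in range(warmup):  # warm-up calls are excluded from the timing
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters / elapsed


def dummy_infer() -> None:
    # Placeholder workload; replace with a real inference call.
    sum(i * i for i in range(10_000))


print(f"Budget at 45 FPS: {frame_budget_ms(45):.1f} ms")
print(f"Budget at 50 FPS: {frame_budget_ms(50):.1f} ms")
print(f"Measured: {measure_fps(dummy_infer):.0f} FPS")
```

When timing a GPU model, the callable should wait for the device to finish (for example with `torch.cuda.synchronize()`), because CUDA work is launched asynchronously and the timer would otherwise stop too early.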

### Implementation Steps We'll Build:

1. **Data Preparation**:
   - Download only Woman class images from OpenImages V7 using OID toolkit
   - Convert annotations to YOLO format
   - Train/val/test split (80/15/5)

2. **Training Pipeline**:
   - Load YOLOv8n pretrained on COCO
   - Fine-tune on Woman class
   - Strong augmentation to reduce false positives
   - Train for sufficient epochs (~50-100)

3. **Optimization**:
   - Export to TensorRT FP16
   - Optimize for T1000 inference

4. **Validation**:
   - Test precision/recall
   - Measure FPS on T1000
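
The annotation conversion in step 1 can be sketched as follows. This assumes the toolkit writes one box per line as `<class name> <left> <top> <right> <bottom>` in pixel coordinates, which should be verified against a real label file. `oid_line_to_yolo` is a hypothetical helper, and the image size is passed in explicitly (in practice it would be read from the image file).

```python
def oid_line_to_yolo(line: str, img_w: int, img_h: int, class_id: int = 0) -> str:
    """Convert one pixel-coordinate box line to a normalized YOLO line.

    Assumed input:  "<class name> <left> <top> <right> <bottom>"
    YOLO output:    "<class_id> <x_center> <y_center> <width> <height>"
    """
    # rsplit keeps multi-word class names intact in the first field.
    _name, left, top, right, bottom = line.strip().rsplit(maxsplit=4)
    left, top, right, bottom = map(float, (left, top, right, bottom))

    # YOLO expects the box center and size, normalized by the image size.
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(oid_line_to_yolo("Woman 100.0 50.0 300.0 450.0", img_w=800, img_h=600))
# prints: 0 0.250000 0.416667 0.250000 0.666667
```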

## Downloading Woman Class dataset

### Step 1: Importing required libraries

In [2]:
import subprocess
import sys
import os
from pathlib import Path

### Step 2: Creating directory structure

In [5]:
base_dir = "./openimages_woman"  # Local directory
os.makedirs(base_dir, exist_ok=True)
os.chdir(base_dir)  # Change the working directory of the Python session to this folder

print("=" * 60)
print("OpenImages V7 - Woman Class Download")
print("=" * 60)

OpenImages V7 - Woman Class Download


### Step 3: Download Woman class images
- Class name in OpenImages: "Woman"
- This will download train, validation, and test sets

**Terminal commands, if you prefer not to run the download inside the notebook**
- Make sure the OID toolkit files are inside your working directory.
- In short, the notebook should be in the same directory as the `main.py` file.
```bash
python main.py downloader --classes Woman --type_csv train --limit 50000
python main.py downloader --classes Woman --type_csv validation --limit 5000
python main.py downloader --classes Woman --type_csv test --limit 2000
```

#### Downloading Train Images

In [None]:
print("\n Downloading TRAIN images with Woman class...")
try:
    subprocess.run([
        sys.executable, "OIDv4_ToolKit/main.py",
        "downloader",
        "--classes", "Woman",
        "--type_csv", "train",
        "--limit", "50000",
        "--multiclasses", "0",
        "--yes"
    ], check=True)

except subprocess.CalledProcessError as e:
    print(f" Error during train download (return code {e.returncode}): {e}")

#### Downloading Validation Images

In [None]:
print("\n Downloading VALIDATION images with Woman class...")
try:
    subprocess.run([
        sys.executable, "OIDv4_ToolKit/main.py",
        "downloader",
        "--classes", "Woman",
        "--type_csv", "validation",
        "--limit", "5000",
        "--multiclasses", "0",
        "--yes"
    ], check=True)

except subprocess.CalledProcessError as e:
    print(f" Error during validation download (return code {e.returncode}): {e}")

#### Downloading Test Images

In [None]:
print("\n Downloading TEST images with Woman class...")
try:
    subprocess.run([
        sys.executable, "OIDv4_ToolKit/main.py",
        "downloader",
        "--classes", "Woman",
        "--type_csv", "test",
        "--limit", "2000",
        "--multiclasses", "0",
        "--yes"
    ], check=True)

except subprocess.CalledProcessError as e:
    print(f" Error during test download (return code {e.returncode}): {e}")
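
The three download cells differ only in the split name and the limit, so they could be driven from one table. The sketch below builds the same argument lists as the cells above; `SPLIT_LIMITS` and `build_download_command` are hypothetical names, and the actual `subprocess.run` call is left commented out so the example does not start a download.

```python
import sys

# Split name -> image limit, as used in the download cells.
SPLIT_LIMITS = {"train": 50000, "validation": 5000, "test": 2000}


def build_download_command(split: str, limit: int,
                           toolkit_main: str = "OIDv4_ToolKit/main.py") -> list:
    """Return the argument list for one toolkit download run."""
    return [
        sys.executable, toolkit_main,
        "downloader",
        "--classes", "Woman",
        "--type_csv", split,
        "--limit", str(limit),
        "--multiclasses", "0",
        "--yes",
    ]


for split, limit in SPLIT_LIMITS.items():
    cmd = build_download_command(split, limit)
    print(" ".join(cmd[1:]))
    # import subprocess
    # subprocess.run(cmd, check=True)  # uncomment to actually download
```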

### Checking Download Summary

In [9]:
print("\n" + "=" * 60)
print("DOWNLOAD SUMMARY\n")
print("=" * 60)

for split in ['train', 'validation', 'test']:
    split_path = Path(f"OID/Dataset/{split}/Woman")
    if split_path.exists():
        img_count = len(list(split_path.glob("*.jpg")))
        label_count = len(list(split_path.glob("Label/*.txt")))
        print(f"{split.upper()}: {img_count} images, {label_count} labels")
    else:
        print(f"{split.upper()}: Not found")

print("\n Dataset location: ./openimages_woman/OID/Dataset/")
print(" Download complete!")


DOWNLOAD SUMMARY

TRAIN: 49928 images, 49928 labels
VALIDATION: 1936 images, 1936 labels
TEST: 1998 images, 1998 labels

 Dataset location: ./openimages_woman/OID/Dataset/
 Download complete!
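
Equal image and label counts do not guarantee that every image has its own label file. As a minimal sketch (the helper name `find_unpaired` is hypothetical), the check below compares file stems in each class folder against its `Label/` subfolder, using the directory layout shown in the summary cell.

```python
from pathlib import Path


def find_unpaired(class_dir: Path):
    """Return (images without a label, labels without an image) by file stem."""
    image_stems = {p.stem for p in class_dir.glob("*.jpg")}
    label_stems = {p.stem for p in (class_dir / "Label").glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


for split in ["train", "validation", "test"]:
    class_dir = Path(f"OID/Dataset/{split}/Woman")
    if not class_dir.exists():
        print(f"{split.upper()}: Not found")
        continue
    no_label, no_image = find_unpaired(class_dir)
    print(f"{split.upper()}: {len(no_label)} images without labels, "
          f"{len(no_image)} labels without images")
```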


### Step 4: Display dataset structure

In [29]:
print("\n" + "=" * 60)
print("📂 DIRECTORY STRUCTURE")
print("=" * 60)

# Show each split, its class folders, and the label/image counts
for split in ['train', 'validation', 'test']:
    split_path = Path(f"OID/Dataset/{split}")
    if not split_path.exists():
        continue
    print(f"\n{split}/")

    for item in split_path.iterdir():
        if not item.is_dir():
            continue
        print(f"  └── {item.name}/")

        # List subfolders (e.g. Label/) inside this class folder
        for p in item.iterdir():
            if p.is_dir():
                print(f"    └── {p.name}/")
                print(f"        └── {len(list(p.glob('*.txt')))} txt files")

        jpg_count = len(list(item.glob("*.jpg")))
        print(f"    Images Count = {jpg_count} Images")


📂 DIRECTORY STRUCTURE

train/
  └── Woman/
    └── Label/
        └── 49928 txt files
    Images Count = 49928 Images

validation/
  └── Woman/
    └── Label/
        └── 1936 txt files
    Images Count = 1936 Images

test/
  └── Woman/
    └── Label/
        └── 1998 txt files
    Images Count = 1998 Images
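
The downloaded counts (49928 / 1936 / 1998) work out to roughly 92.7% / 3.6% / 3.7%, which differs from the 80/15/5 split in the plan. One possible way to reach the planned ratio, sketched below as an assumption rather than the notebook's actual method, is to pool all image stems and re-split them with a fixed seed; `split_items` is a hypothetical helper.

```python
import random


def split_items(items, fractions=(0.80, 0.15, 0.05), seed=42):
    """Shuffle deterministically and cut into train/val/test lists."""
    items = sorted(items)  # fixed starting order, so the seed alone decides the shuffle
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Stand-in identifiers; in practice these would be the image file stems.
total = 49928 + 1936 + 1998
train, val, test = split_items(range(total))
print(len(train), len(val), len(test))
```

Pooling the official splits changes which images the model is evaluated on, so whether to do this depends on whether results need to stay comparable with the original validation and test sets.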
