# Phase 1: Dataset Preparation & Environment Setup
Outdoor Object Detection & Face Recognition System

This notebook handles:
- Environment setup and dependency installation
- Dataset download (LFW, WiderFace, RTTS, BDD100K)
- Preprocessing (resize to 640x640, train/val/test split)
- Data augmentation (fog, rain, low-light, motion blur)
- Dataset statistics and verification

**Runtime**: GPU (T4) recommended for faster processing

**Storage**: Results saved to Google Drive
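
Before running the heavier cells, it can help to confirm which GPU the runtime actually has. The following is a minimal sketch, assuming the `nvidia-smi` command-line tool is on the `PATH` (as it usually is on Colab GPU runtimes); the helper name `gpu_name` is hypothetical and not part of this project.

```python
import shutil
import subprocess


def gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    # No nvidia-smi binary usually means a CPU-only runtime.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


name = gpu_name()
print(f"GPU: {name}" if name else "No GPU detected; processing will run on CPU")
```

If this reports no GPU, the runtime type can be changed in Colab under Runtime > Change runtime type.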

In [5]:
from google.colab import drive
drive.mount('/content/drive')

import os
PROJECT_DIR = '/content/drive/MyDrive/computer_vision'
os.makedirs(PROJECT_DIR, exist_ok=True)
print(f"Project directory: {PROJECT_DIR}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Project directory: /content/drive/MyDrive/computer_vision


In [6]:
%cd /content
!rm -rf computer_vision_expirement
!git clone https://github.com/Ib-Programmer/computer_vision_expirement.git
%cd computer_vision_expirement
!pip install -q -r requirements.txt
!pip install -q gdown

/content
Cloning into 'computer_vision_expirement'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 33 (delta 7), reused 28 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (33/33), 37.05 KiB | 2.18 MiB/s, done.
Resolving deltas: 100% (7/7), done.
/content/computer_vision_expirement

## 1.1 Dataset Download

In [7]:
!python scripts/download_datasets.py

Phase 1: Dataset Download
Base directory: /content/computer_vision_expirement
Datasets directory: /content/computer_vision_expirement/datasets

DOWNLOADING: LFW (Labeled Faces in the Wild)
  Trying sklearn.datasets.fetch_lfw_people...
  Downloaded via sklearn: 13233 images
  LFW download complete!
  Location: /content/computer_vision_expirement/datasets/lfw

DOWNLOADING: WiderFace
  Downloading WIDER_train.zip from Google Drive...
Downloading...
From (original): https://drive.google.com/uc?id=15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M
From (redirected): https://drive.google.com/uc?id=15hGDLhsx8bLgLcIRD5DhYt5iBxnjNF1M&confirm=t&uuid=0ff59cff-9fb1-4d5b-83be-ebad0c51697a
To: /content/computer_vision_expirement/datasets/widerface/WIDER_train.zip
100% 1.47G/1.47G [00:20<00:00, 70.8MB/s]
  Extracting WIDER_train.zip...
  Extracted to: /content/computer_vision_expirement/datasets/widerface
  Downloading WIDER_val.zip from Google Drive...
Downloading...
From (original): https://drive.google.com/uc?id=1

## 1.2 Preprocessing

In [8]:
!python scripts/preprocess_data.py

Phase 1: Data Preprocessing
Target size: (640, 640)
Split ratio: {'train': 0.7, 'val': 0.15, 'test': 0.15}
Chunk size: 200 images
JPEG quality: 90

PREPROCESSING: lfw
  Found 13233 images
  Chunk size: 200 images | JPEG quality: 90

  train: 9263 images in 47 chunks
    train: 100%|██████████████████████████| 9263/9263 [01:06<00:00, 139.40img/s]
    -> processed: 9263 | skipped: 0 | failed: 0

  val: 1984 images in 10 chunks
    val: 100%|████████████████████████████| 1984/1984 [00:12<00:00, 160.82img/s]
    -> processed: 1984 | skipped: 0 | failed: 0

  test: 1986 images in 10 chunks
    test: 100%|████████████████████████████| 1986/1986 [00:20<00:00, 97.01img/s]
    -> processed: 1986 | skipped: 0 | failed: 0

  ────────────────────────────────────────
  lfw DONE
    Total processed: 13233
    Already existed: 0
    Failed/corrupt:  0
    train: 9263 images
    val: 1984 images
    test: 1986 images
  Output: /content/computer_vision_expirement/datasets/lfw_processed

PREPROCESSING: 
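
The split and chunk counts in the log above can be reproduced with simple arithmetic. The sketch below is an illustration, not the code in `scripts/preprocess_data.py`: it assumes the train and val sizes are truncated to integers and the test split takes the remainder, which is consistent with the logged 9263 / 1984 / 1986 figures. The shuffle and its `seed` are assumptions, and the function names are hypothetical.

```python
import math
import random

SPLIT_RATIO = {"train": 0.7, "val": 0.15, "test": 0.15}


def split_counts(n_images, ratios=SPLIT_RATIO):
    """Truncate the train and val sizes; test takes whatever remains."""
    n_train = int(n_images * ratios["train"])
    n_val = int(n_images * ratios["val"])
    return {"train": n_train, "val": n_val, "test": n_images - n_train - n_val}


def split_paths(paths, seed=42, ratios=SPLIT_RATIO):
    """Shuffle deterministically (assumed), then slice into train/val/test."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    counts = split_counts(len(shuffled), ratios)
    train_end = counts["train"]
    val_end = train_end + counts["val"]
    return {
        "train": shuffled[:train_end],
        "val": shuffled[train_end:val_end],
        "test": shuffled[val_end:],
    }


def num_chunks(n_images, chunk_size):
    """Number of fixed-size chunks needed to cover n_images."""
    return math.ceil(n_images / chunk_size)


counts = split_counts(13233)
print(counts)  # {'train': 9263, 'val': 1984, 'test': 1986}
print({name: num_chunks(n, 200) for name, n in counts.items()})
# {'train': 47, 'val': 10, 'test': 10}
```

Sorting before shuffling keeps the split reproducible even if the file system lists images in a different order between sessions.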

## 1.3 Data Augmentation

In [None]:
!python scripts/augment_data.py

  A.RandomFog(
  A.RandomRain(
  A.RandomFog(fog_coef_lower=0.2, fog_coef_upper=0.5, alpha_coef=0.08, p=1.0),
  A.RandomRain(slant_lower=-10, slant_upper=10, drop_length=20, drop_width=1,
  result = _ensure_odd_values(result, info.field_name)
  A.GaussNoise(var_limit=(10.0, 40.0), p=0.3),
Phase 1: Data Augmentation
Output directory: /content/computer_vision_expirement/outputs/augmented
Chunk size: 150 images
Augmentation types: ['fog', 'low_light', 'motion_blur', 'rain', 'combined']

AUGMENTING: lfw
  Split: train (9263 images)
  Chunk size: 150 | Augmentations: ['fog', 'low_light', 'motion_blur', 'rain', 'combined']
    fog: 9263 images in 62 chunks
      fog:   0%|                                      | 0/9263 [00:00<?, ?img/s]
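
To make the augmentation types more concrete, here is a library-free sketch of three of them (fog, low light, motion blur) using only NumPy. The `A.RandomFog` and `A.RandomRain` calls echoed in the log suggest that `scripts/augment_data.py` relies on Albumentations, so the code below is a simplified stand-in rather than the script's implementation. All function names and default parameter values are assumptions chosen for illustration; rain and the `combined` mode are not covered.

```python
import numpy as np


def add_fog(image, fog_coef=0.35):
    """Blend the image toward white; a higher fog_coef gives denser fog."""
    img = image.astype(np.float32)
    fogged = img * (1.0 - fog_coef) + 255.0 * fog_coef
    return np.clip(np.rint(fogged), 0, 255).astype(np.uint8)


def low_light(image, gamma=2.5):
    """Darken with a gamma curve; gamma > 1 pushes mid-tones toward black."""
    normalized = image.astype(np.float32) / 255.0
    darkened = 255.0 * normalized ** gamma
    return np.clip(np.rint(darkened), 0, 255).astype(np.uint8)


def motion_blur(image, kernel_size=9):
    """Horizontal motion blur: average each pixel with its row neighbours."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    img = image.astype(np.float32)
    pad = kernel_size // 2
    # Repeat edge pixels so the output keeps the input width.
    padded = np.pad(img, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    blurred = np.zeros_like(img)
    for offset in range(kernel_size):
        blurred += padded[:, offset:offset + img.shape[1], :]
    return np.clip(np.rint(blurred / kernel_size), 0, 255).astype(np.uint8)


# Example on a synthetic 640x640 RGB image (height, width, channels).
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(640, 640, 3), dtype=np.uint8)
for name, fn in [("fog", add_fog), ("low_light", low_light), ("motion_blur", motion_blur)]:
    out = fn(sample)
    print(f"{name}: shape={out.shape}, mean={out.mean():.1f}")
```

Each function returns a new `uint8` array with the same shape as its input, so the transforms can be chained on one image.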

## 1.4 Dataset Statistics

In [None]:
!python scripts/dataset_stats.py
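
Since this cell has no recorded output, a quick independent check of the processed data can be useful. The sketch below counts image files per split and makes no claim about what `scripts/dataset_stats.py` reports. It assumes each processed dataset directory (for example `datasets/lfw_processed`) contains `train`, `val` and `test` subfolders; that layout, the extension list and the function name are assumptions.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_split(processed_dir, splits=("train", "val", "test")):
    """Count image files under <processed_dir>/<split>, searching recursively."""
    root = Path(processed_dir)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            counts[split] = 0  # a missing split is reported as empty
            continue
        counts[split] = sum(
            1
            for path in split_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


lfw_counts = count_images_per_split(
    "/content/computer_vision_expirement/datasets/lfw_processed"
)
print(lfw_counts, "total:", sum(lfw_counts.values()))
```

For LFW, the totals should match the preprocessing log (9263 / 1984 / 1986) if the assumed layout holds.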

## 1.5 Save to Google Drive

In [None]:
import os
import shutil

# PROJECT_DIR is defined in the Drive-mount cell at the top of the notebook.
src = '/content/computer_vision_expirement/datasets'
dst = f'{PROJECT_DIR}/datasets'

# copytree fails if the destination exists, so skip the copy in that case.
if os.path.exists(dst):
    print(f"Datasets already exist at {dst}, skipping copy")
else:
    print("Copying datasets to Google Drive...")
    shutil.copytree(src, dst)
    print("Done!")

# Also copy augmented outputs, if the augmentation step produced any
src_aug = '/content/computer_vision_expirement/outputs'
dst_aug = f'{PROJECT_DIR}/outputs'
if os.path.exists(src_aug):
    if os.path.exists(dst_aug):
        print(f"Augmented data already exists at {dst_aug}, skipping copy")
    else:
        print("Copying augmented data to Google Drive...")
        shutil.copytree(src_aug, dst_aug)
        print("Done!")

print("\nPhase 1 Complete! Data saved to Google Drive.")
print(f"Location: {PROJECT_DIR}")
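
One limitation of skipping the copy whenever the destination exists is that an interrupted copy is never completed on a later run. A possible alternative, sketched here under the assumption that comparing file sizes is a good enough check, copies only files that are missing or differ in size. The helper `copy_missing` is hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def copy_missing(src, dst):
    """Copy files from src to dst that are absent or differ in size.

    Returns the number of files copied.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"Source directory not found: {src}")
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        # Skip files that already exist with the same size (assumed unchanged).
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied += 1
    return copied
```

A call such as `copy_missing(src, dst)` could replace the `os.path.exists` check; running it a second time copies nothing if the first run finished.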

## Next Steps
- Open **Phase2_Image_Enhancement.ipynb** to evaluate enhancement models
- Datasets are saved in Google Drive and will persist across sessions