This notebook performs data augmentation on the cleaned and split dataset produced in:
- 01_data_merge_and_cleaning.ipynb
- 02_dataset_splitting.ipynb

Augmentation follows the LEAD-CNN paper methodology:
- 90° rotation
- Horizontal flipping

Augmented images are saved to a separate directory, data/augmented_data/, so the original cleaned dataset is left untouched.

In this notebook we:

1. Load the cleaned dataset (train / val / test)
2. Apply two augmentation techniques:
   - 90° rotation
   - Horizontal flipping
3. Save augmented images to a new directory structure
4. Preserve the original cleaned dataset

This replicates the augmentation strategy used in the LEAD-CNN paper to reduce overfitting and improve generalization.

In [1]:
from pathlib import Path
import cv2
from tqdm import tqdm

In [2]:
# Paths (built with pathlib operators so they resolve on Windows, macOS, and Linux)
CLEAN_DIR = Path("..") / "data" / "cleaned_data"
AUG_DIR = Path("..") / "data" / "augmented_data"


# Dataset configuration
CLASSES = ['glioma', 'meningioma', 'notumor', 'pituitary']
SPLITS = ['train', 'val', 'test']


IMG_EXTENSIONS = ['.jpg', '.jpeg', '.png']

In [3]:
# Creating Augmented Data Directory Folder Structure

for split in SPLITS:
  for cls in CLASSES:
    path = AUG_DIR / split / cls
    path.mkdir(parents=True, exist_ok=True)

print("Augmented dataset folders created at:", AUG_DIR.resolve())

Augmented dataset folders created at: C:\Users\ekowd\Desktop\FYP\FYP\data\augmented_data


In [5]:
# Sanity check: count the files in each split of the cleaned dataset before augmenting

for split in SPLITS:
  print(f"\n{split.upper()} set:")
  for cls in CLASSES:
    folder = CLEAN_DIR / split / cls
    count = len(list(folder.glob('*')))
    print(f" {cls}: {count} images")


TRAIN set:
 glioma: 1296 images
 meningioma: 1316 images
 notumor: 1600 images
 pituitary: 1405 images

VAL set:
 glioma: 162 images
 meningioma: 164 images
 notumor: 200 images
 pituitary: 175 images

TEST set:
 glioma: 163 images
 meningioma: 165 images
 notumor: 200 images
 pituitary: 177 images


In [6]:
# Augmentation functions

def rotate_90(image):
  """Rotate image 90 degrees clockwise."""
  return cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)


def horizontal_flip(image):
  """Flip image horizontally."""
  return cv2.flip(image, 1)
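
To make the effect of the two transforms concrete, here is a small sketch on a 2x3 array. It uses the NumPy equivalents `np.rot90(..., k=-1)` and `np.fliplr` instead of OpenCV so that it runs without any image files; the assumption is that these match `cv2.ROTATE_90_CLOCKWISE` and `cv2.flip(image, 1)` on an array like this one.

```python
import numpy as np

# Tiny 2x3 "image" so the effect of each transform is easy to read.
img = np.array([[1, 2, 3],
                [4, 5, 6]])

rotated = np.rot90(img, k=-1)  # 90 degrees clockwise
flipped = np.fliplr(img)       # mirror left-to-right

print(rotated.shape)  # (3, 2): height and width swap
print(flipped.shape)  # (2, 3): shape is unchanged
```

Rotation swaps height and width, so non-square images change shape. That matters if a later step assumes a fixed input size before resizing.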

In [7]:
# Augmenting and saving function

def augment_and_save_split(split_name):
  print(f"\nProcessing split: {split_name}")


  for cls in CLASSES:
    input_dir = CLEAN_DIR / split_name / cls
    output_dir = AUG_DIR / split_name / cls


    images = [p for p in input_dir.glob('*') if p.suffix.lower() in IMG_EXTENSIONS]


    if not images:
      print(f" {cls}: No images found, skipping.")
      continue


    print(f" {cls}: {len(images)} images")


    for img_path in tqdm(images, desc=f"{split_name}/{cls}", leave=False):
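      img = cv2.imread(str(img_path))
      if img is None:
        # cv2.imread returns None for unreadable files; report them instead of skipping silently
        tqdm.write(f"  Skipping unreadable file: {img_path.name}")
        continue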


      base_name = img_path.stem
      ext = img_path.suffix


      # Save original image
      cv2.imwrite(str(output_dir / img_path.name), img)


      # Rotation augmentation
      rotated = rotate_90(img)
      cv2.imwrite(str(output_dir / f"{base_name}_rot90{ext}"), rotated)


      # Horizontal flip augmentation
      flipped = horizontal_flip(img)
      cv2.imwrite(str(output_dir / f"{base_name}_flip{ext}"), flipped)

In [8]:
# Augmented Pipeline Execution

for split in SPLITS:
  augment_and_save_split(split)


print("\nAugmentation completed successfully.")


Processing split: train
 glioma: 1296 images
 meningioma: 1316 images
 notumor: 1600 images
 pituitary: 1405 images

Processing split: val
 glioma: 162 images
 meningioma: 164 images
 notumor: 200 images
 pituitary: 175 images

Processing split: test
 glioma: 163 images
 meningioma: 165 images
 notumor: 200 images
 pituitary: 177 images

Augmentation completed successfully.

In [9]:
for split in SPLITS:
  print(f"\nAUGMENTED {split.upper()} set:")
  for cls in CLASSES:
    folder = AUG_DIR / split / cls
    count = len(list(folder.glob('*')))
    print(f" {cls}: {count} images")


AUGMENTED TRAIN set:
 glioma: 3888 images
 meningioma: 3948 images
 notumor: 4800 images
 pituitary: 4215 images

AUGMENTED VAL set:
 glioma: 486 images
 meningioma: 492 images
 notumor: 600 images
 pituitary: 525 images

AUGMENTED TEST set:
 glioma: 489 images
 meningioma: 495 images
 notumor: 600 images
 pituitary: 531 images
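
As a quick arithmetic check, every augmented folder should hold exactly three files per cleaned image. The sketch below hard-codes the counts printed by the cells in this notebook and confirms the 3x relationship; the dictionary names and `FILES_PER_IMAGE` are illustrative and not part of the pipeline.

```python
# Per-class counts copied from the cell outputs in this notebook.
clean = {
    "train": {"glioma": 1296, "meningioma": 1316, "notumor": 1600, "pituitary": 1405},
    "val":   {"glioma": 162,  "meningioma": 164,  "notumor": 200,  "pituitary": 175},
    "test":  {"glioma": 163,  "meningioma": 165,  "notumor": 200,  "pituitary": 177},
}
augmented = {
    "train": {"glioma": 3888, "meningioma": 3948, "notumor": 4800, "pituitary": 4215},
    "val":   {"glioma": 486,  "meningioma": 492,  "notumor": 600,  "pituitary": 525},
    "test":  {"glioma": 489,  "meningioma": 495,  "notumor": 600,  "pituitary": 531},
}

FILES_PER_IMAGE = 3  # original copy + rot90 + flip

# Collect every (split, class) pair whose augmented count is not 3x the clean count.
mismatches = [
    (split, cls)
    for split, classes in clean.items()
    for cls, n in classes.items()
    if augmented[split][cls] != FILES_PER_IMAGE * n
]
print("mismatches:", mismatches)  # prints: mismatches: []
```

An empty list means no image was skipped as unreadable in any split.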


Expected Output for This Notebook

After execution, the following directory structure exists:

```

data/augmented_data/
  train/<class>/
  val/<class>/
  test/<class>/

```

Each original image now yields three files:
- 1 original copy
- 1 rotated copy (_rot90)
- 1 flipped copy (_flip)
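
The naming scheme can be sketched as a small helper that mirrors how the save function builds its output names. `augmented_names` and the file name `scan_001.jpg` are hypothetical and only serve as illustration.

```python
from pathlib import Path


def augmented_names(src: Path) -> list[str]:
    """Return the three file names written for one source image."""
    return [
        src.name,                          # original copy
        f"{src.stem}_rot90{src.suffix}",   # 90-degree clockwise rotation
        f"{src.stem}_flip{src.suffix}",    # horizontal flip
    ]


print(augmented_names(Path("scan_001.jpg")))
# prints: ['scan_001.jpg', 'scan_001_rot90.jpg', 'scan_001_flip.jpg']
```

Because the names are derived deterministically from the source file, rerunning the notebook overwrites the same files rather than creating additional copies.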