# 256x256 Extractor

This notebook takes color images from several online dataset sources (stored as one folder with nested subfolders of images), combines them into a single directory, renames them uniformly, and downsizes every image whose width and height are both at least 256 pixels so that it fits within 256x256. The images live in offline directories on my local machine, outside of this repo, because GitHub causes issues if I make more than 10,000 changes at once.

Further dataset processing will occur in `image-preprocess_256.ipynb`.

In [7]:
import glob
import shutil
import os
from PIL import Image
import random

# Source directory with subfolders
raw_images_source = "C:/Users/ziven/OneDrive/School/UBC/Fourth Year/CPSC 440/Final Project/Offline/random_images"

color_raw_dir = "C:/Users/ziven/OneDrive/School/UBC/Fourth Year/CPSC 440/Final Project/Offline/extracted_256s"

color_dir = "C:/Users/ziven/OneDrive/School/UBC/Fourth Year/CPSC 440/Final Project/Offline/extracted_downsized_256s"

The following code block extracts images from a nested sub-directory structure (the layout used by the aerial images dataset) and places them in a single directory, naming each file with a unique number starting from 1.
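
The extraction cell matches only files ending in `.jpeg`. If some of the source datasets use `.jpg` or upper-case extensions, those files are skipped silently. A minimal sketch of a case-insensitive alternative, assuming a hypothetical helper `find_images` and a suffix set that are not part of the original notebook:

```python
from pathlib import Path

# Hypothetical set of accepted extensions; adjust to match the source datasets.
IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def find_images(root, suffixes=IMAGE_SUFFIXES):
    """Return sorted paths of all files under root whose suffix is in suffixes.

    The suffix comparison is case-insensitive, so "photo.JPG" is also found.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

Formats such as PNG are left out on purpose, since the extraction step copies files byte-for-byte under a `.jpg` name without converting them.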

In [8]:
# Get a list of all image files in nested folders
files = glob.glob(raw_images_source + '/**/*.jpeg', recursive=True)

# Randomly shuffle the images, so we can partition them into training and testing based on number alone
random.shuffle(files)

counter = 1

# Copy each image to the destination folder
for file in files:
    filename = os.path.basename(file)
    new_filename = f"{counter}.jpg"
    destination_path = os.path.join(color_raw_dir, new_filename)

    # Ensure unique filenames
    while os.path.exists(destination_path):
        counter += 1
        new_filename = f"{counter}.jpg"
        destination_path = os.path.join(color_raw_dir, new_filename)

    # Open the image to read its dimensions (the file is closed on exit)
    with Image.open(file) as img:
        width, height = img.size

    # Keep the image only if both dimensions are at least 256 pixels
    # (note: this does not check that the image is square)
    if height >= 256 and width >= 256:
        # Copy the image to the destination folder
        shutil.copy(file, destination_path)
        # Increment the counter
        counter += 1
    else:
        print(f"Discarded {filename} because it is not square or has a dimension that is too small.")

# counter is one past the last number used, so subtract 1 for the total
print("All files copied with unique names. Total number extracted:", counter - 1)

Discarded Glacier-Train (1971).jpeg because it is not square or has a dimension that is too small.
Discarded Glacier-Train (885).jpeg because it is not square or has a dimension that is too small.
Discarded Glacier-Train (786).jpeg because it is not square or has a dimension that is too small.
Discarded Glacier-Train (552).jpeg because it is not square or has a dimension that is too small.
Discarded Forest-Train (1130).jpeg because it is not square or has a dimension that is too small.
Discarded Mountain (3205).jpeg because it is not square or has a dimension that is too small.
Discarded Mountain-Valid (211).jpeg because it is not square or has a dimension that is too small.
Discarded Coast-Train (634).jpeg because it is not square or has a dimension that is too small.
Discarded Coast-Train (164).jpeg because it is not square or has a dimension that is too small.
Discarded Coast-Test (130).jpeg because it is not square or has a dimension that is too small.
Discarded Mountain (3805).jpe
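
Because the files are shuffled before they are numbered, any contiguous range of numbers is a random sample of the dataset. A minimal sketch of partitioning by number alone, assuming the numbers run from 1 to N without gaps and using a hypothetical helper `split_by_number` with an 80/20 split (neither is specified in the original notebook):

```python
def split_by_number(total, train_fraction=0.8):
    """Split image numbers 1..total into contiguous train and test ranges."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cutoff = int(total * train_fraction)
    train = range(1, cutoff + 1)          # 1 .. cutoff
    test = range(cutoff + 1, total + 1)   # cutoff + 1 .. total
    return train, test


train_ids, test_ids = split_by_number(10)
print(list(train_ids))  # numbers 1 through 8
print(list(test_ids))   # numbers 9 and 10
```

Each number maps back to a file name such as `f"{n}.jpg"`, so no separate index file is needed.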

In [None]:
target_size = (256, 256)

# Iterate through all files in the source directory
for filename in os.listdir(color_raw_dir):
    if filename.lower().endswith(".jpg"):
        try:
            # Open the image
            img_path = os.path.join(color_raw_dir, filename)
            img = Image.open(img_path)

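            # Resize the image in place to fit within target_size, preserving aspect ratio
            # (Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter)
            img.thumbnail(target_size, Image.LANCZOS)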

            # Save the resized image to the destination folder
            new_filename = os.path.splitext(filename)[0] + ".jpg"
            img.save(os.path.join(color_dir, new_filename), "JPEG")

            # print(f"Resized {filename} to {target_size[0]}x{target_size[1]}")
        except Exception as e:
            print(f"Error processing {filename}: {e}")

print("All images resized")
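
`Image.thumbnail` preserves aspect ratio, so a non-square input ends up shorter than 256 pixels along one side (a 400x300 image becomes 256x192). If exactly 256x256 outputs are needed, one option is to center-crop to a square before resizing. A minimal sketch, assuming hypothetical helpers `center_square_box` and `crop_and_resize` that are not part of the original notebook:

```python
def center_square_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_and_resize(img, side=256):
    """Center-crop a PIL image to a square, then resize it to side x side."""
    # Imported here so center_square_box stays usable without Pillow installed.
    from PIL import Image

    box = center_square_box(*img.size)
    return img.crop(box).resize((side, side), Image.LANCZOS)


print(center_square_box(400, 300))  # (50, 0, 350, 300)
```

Cropping discards the image edges, so whether this is preferable to letting `thumbnail` produce non-square outputs depends on what the later preprocessing expects.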