# Image Augmentation Documentation

## Introduction
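This document describes the image augmentation code, which takes a collection of input images and, for each one, produces a downscaled, background-free copy at every rotation angle in a configured range. Images are processed in fixed-size batches that are submitted to a `ThreadPoolExecutor`, and they are loaded and written one at a time, so only one image's data needs to be held in memory at once.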

## Code Overview
The code can be divided into the following sections:

### 1. Importing Required Libraries
- `os`: For file and directory operations.
- `csv`: For writing data to a CSV file.
- `numpy`: For numerical operations.
- `PIL`: For image manipulation.
- `rembg`: For background removal.
- `concurrent.futures.ThreadPoolExecutor`: For multithreading support.

### 2. Setting Input and Output Directories
- `input_dir`: The directory containing the input images.
- `output_dir`: The directory to store the augmented images.
- `csv_file`: The path to the CSV file for storing augmented data.

### 3. Creating the Output Directory
- Checks if the output directory exists. If not, it creates the directory.

### 4. Configuring Rotation Parameters
- `min_angle`: The minimum rotation angle (in degrees).
- `max_angle`: The maximum rotation angle (in degrees).
- `angle_increment`: The increment value for rotation (in degrees).

### 5. Setting Batch Size
- `batch_size`: The number of images to be processed in each batch.
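
As a rough illustration of how a batch size partitions the input, the sketch below splits a list of file names into consecutive batches. The `chunked` helper and the placeholder file names are hypothetical and not part of the original code. With 1326 inputs and a batch size of 100 it yields 13 full batches and a final batch of 26.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder names standing in for the contents of the input directory.
filenames = [f"image_{i:04d}.png" for i in range(1326)]
batches = list(chunked(filenames, batch_size=100))

print(len(batches))      # 14
print(len(batches[-1]))  # 26
```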

### 6. Initializing Augmented Data List
- Creates an empty list to store augmented image data.

### 7. Function to Process a Batch of Images
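- The `process_batch` function takes a batch of filenames as input.
- Iterates over the filenames and processes every `.png` or `.jpg` file, printing its progress through the batch.
- Resizes each image to 10% of its original width and height, removes the background with `rembg`, rotates the result through every configured angle, and saves each rotation as a PNG.
- Records the file name, path, and rotation angle of each augmented image in a batch-specific list, which it returns.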
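
The resize step scales both dimensions by a factor of 0.1 and truncates to whole pixels. The sketch below shows that calculation in isolation. The `scaled_size` helper is hypothetical, and the clamp to a minimum of 1 pixel is an added safeguard for very small images that the original code does not include.

```python
def scaled_size(width, height, scale=0.1):
    """Return (width, height) scaled by `scale`, never smaller than 1 pixel."""
    return (max(1, int(scale * width)), max(1, int(scale * height)))


print(scaled_size(1920, 1080))  # (192, 108)
print(scaled_size(7, 5))        # (1, 1); plain truncation would give (0, 0)
```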

### 8. Processing Images in Batches
- Iterates over the input images in batches.
- Collects a batch of filenames and passes them to the `process_batch` function.
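- Submits each batch to a `ThreadPoolExecutor`. As written, the code waits for each batch's result before submitting the next one, so batches run one after another rather than concurrently.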
- Collects the augmented data from each batch and appends it to the `augmented_data` list.
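
For comparison, here is a minimal sketch of one way batches could be run concurrently: every batch is submitted before any result is awaited. It is not the code used in this notebook. `fake_process_batch`, the file names, and `max_workers=4` are placeholders chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_process_batch(batch_filenames):
    """Stand-in for the real batch function: returns one row per file."""
    return [[f"augmented_0_{name}", name, 0] for name in batch_filenames]


# Placeholder inputs split into batches of at most 100 names.
filenames = [f"image_{i}.png" for i in range(250)]
batch_size = 100
batches = [filenames[i:i + batch_size] for i in range(0, len(filenames), batch_size)]

augmented_data = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() submits every batch up front and yields results in input order.
    for batch_rows in executor.map(fake_process_batch, batches):
        augmented_data.extend(batch_rows)

print(len(batches), len(augmented_data))  # 3 250
```

Because results come back in input order, the combined list has the same row order as a sequential run would produce.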

### 9. Writing Augmented Data to CSV File
- Opens the CSV file in write mode.
- Creates a CSV writer object.
- Writes the header row.
- Writes the augmented data rows.
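
To consume the file later, the rows can be read back with `csv.DictReader`, keyed by the header names the code writes (`Image Name`, `Image Address`, `Rotation Angle`). The sketch below is illustrative only: it writes two made-up rows to an in-memory buffer instead of the real CSV file so that it runs on its own.

```python
import csv
import io

# Made-up rows in the same [name, path, angle] layout as the augmented data.
rows = [
    ["augmented_-90_cat.png", "./augmented/augmented_-90_cat.png", -90],
    ["augmented_-85_cat.png", "./augmented/augmented_-85_cat.png", -85],
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Image Name", "Image Address", "Rotation Angle"])
writer.writerows(rows)

# Rewind and read the rows back as dictionaries keyed by the header.
buffer.seek(0)
records = list(csv.DictReader(buffer))

# csv returns every field as a string, so convert the angle explicitly.
angles = [int(record["Rotation Angle"]) for record in records]
print(records[0]["Image Name"], angles)
```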

### 10. Removing Resized Images
- Removes the resized images (prefixed with "resized_") to clean up the output directory.
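
One possible way to express this cleanup is with `pathlib`, matching only files whose names start with `resized_`. The sketch below runs against a temporary directory with made-up file names so that it is safe to execute. It is an alternative illustration, not the notebook's own code.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp)

    # Made-up files: two intermediates and one augmented image.
    for name in ("resized_a.png", "resized_b.png", "augmented_0_a.png"):
        (output_dir / name).touch()

    # Delete only the intermediate files; augmented images are left alone.
    # list() takes a snapshot so the directory is not modified mid-iteration.
    for path in list(output_dir.glob("resized_*")):
        if path.is_file():
            path.unlink()

    remaining = sorted(p.name for p in output_dir.iterdir())

print(remaining)  # ['augmented_0_a.png']
```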

## Conclusion
The image augmentation code produces a rotated, background-free dataset from a collection of input images. It processes the images in batches submitted to a `ThreadPoolExecutor`, handling one image at a time within each batch to keep memory usage low. The augmented images are saved in the specified output directory, and the name, path, and rotation angle of each one are recorded in a CSV file. The rotation range, angle increment, and batch size are all configurable.


### Input target: 1326 images
### Output target: 1326 * 37 = 49062 images (37 rotation angles per image)
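
The factor of 37 can be checked directly from the rotation settings: the range from -90 to 90 degrees in steps of 5 contains 37 angles. The short check below assumes those values and the 1326-image input count stated above.

```python
min_angle, max_angle, angle_increment = -90, 90, 5

# max_angle + 1 makes the upper bound inclusive, matching the rotation loop.
angles = list(range(min_angle, max_angle + 1, angle_increment))

input_images = 1326
print(len(angles))                 # 37
print(input_images * len(angles))  # 49062
```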

In [1]:
import os
import csv
import numpy as np
from PIL import Image
from rembg import remove
from concurrent.futures import ThreadPoolExecutor

In [2]:
# Set input and output directories
input_dir = './data'
output_dir = './augmented'
csv_file = './augmented_data.csv'


In [3]:
# Create the output directory if it doesn't exist
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

In [4]:
# Set rotation angle range and increment
min_angle = -90
max_angle = 90
angle_increment = 5

# Set batch size
batch_size = 100

# Create a list to store the augmented data
augmented_data = []

In [5]:
def process_batch(batch_filenames):
    total_files = len(batch_filenames)
    processed_files = 0

    batch_augmented_data = []
    for filename in batch_filenames:
        processed_files += 1
        progress = processed_files / total_files * 100
        print(f"Processing file {processed_files}/{total_files} ({progress:.2f}% complete)")

        if filename.endswith(('.png', '.jpg')):
            # Load the input image
            input_path = os.path.join(input_dir, filename)
            input_image = Image.open(input_path)

            # Get the width and height of the input image
            width, height = input_image.size

            # Calculate the target size based on 0.1 times the height and width
            target_width = int(0.1 * width)
            target_height = int(0.1 * height)
            target_size = (target_width, target_height)

            # Resize the input image
            resized_image = input_image.resize(target_size)

            # Convert the resized image to RGBA format
            resized_image_rgba = resized_image.convert("RGBA")

            # Save the resized image as PNG, replacing the original extension with .png
            resized_output_filename = f"resized_{os.path.splitext(filename)[0]}.png"
            resized_output_path = os.path.join(output_dir, resized_output_filename)
            resized_image_rgba.save(resized_output_path, "PNG")

            # Remove the background from the resized image
            resized_image_array = np.array(resized_image_rgba)
            output_image_array = remove(resized_image_array)
            output_image = Image.fromarray(output_image_array, 'RGBA')

            # Apply rotation to create augmented images
            for angle in range(min_angle, max_angle + 1, angle_increment):
                # Rotate the image
                rotated_image = output_image.rotate(angle, expand=True)

                # Save the augmented image as PNG, using a .png extension to match the format
                output_filename = f"augmented_{angle}_{os.path.splitext(filename)[0]}.png"
                output_path = os.path.join(output_dir, output_filename)
                rotated_image.save(output_path, "PNG")

                # Store the augmented data in the list
                batch_augmented_data.append([output_filename, output_path, angle])

    return batch_augmented_data

In [6]:
# Iterate over the input images in batches
batch_filenames = []
for filename in os.listdir(input_dir):
    batch_filenames.append(filename)
    if len(batch_filenames) == batch_size:
        with ThreadPoolExecutor() as executor:
            batch_augmented_data = executor.submit(process_batch, batch_filenames).result()
            augmented_data.extend(batch_augmented_data)
        batch_filenames = []

Processing file 1/100 (1.00% complete)
Processing file 2/100 (2.00% complete)
Processing file 3/100 (3.00% complete)
Processing file 4/100 (4.00% complete)
Processing file 5/100 (5.00% complete)
Processing file 6/100 (6.00% complete)
Processing file 7/100 (7.00% complete)
Processing file 8/100 (8.00% complete)
Processing file 9/100 (9.00% complete)
Processing file 10/100 (10.00% complete)
Processing file 11/100 (11.00% complete)
Processing file 12/100 (12.00% complete)
Processing file 13/100 (13.00% complete)
Processing file 14/100 (14.00% complete)
Processing file 15/100 (15.00% complete)
Processing file 16/100 (16.00% complete)
Processing file 17/100 (17.00% complete)
Processing file 18/100 (18.00% complete)
Processing file 19/100 (19.00% complete)
Processing file 20/100 (20.00% complete)
Processing file 21/100 (21.00% complete)
Processing file 22/100 (22.00% complete)
Processing file 23/100 (23.00% complete)
Processing file 24/100 (24.00% complete)
Processing file 25/100 (25.00% complete)
...

In [7]:
# Process any remaining images in the last batch
if len(batch_filenames) > 0:
    with ThreadPoolExecutor() as executor:
        batch_augmented_data = executor.submit(process_batch, batch_filenames).result()
        augmented_data.extend(batch_augmented_data)

Processing file 1/26 (3.85% complete)
Processing file 2/26 (7.69% complete)
Processing file 3/26 (11.54% complete)
Processing file 4/26 (15.38% complete)
Processing file 5/26 (19.23% complete)
Processing file 6/26 (23.08% complete)
Processing file 7/26 (26.92% complete)
Processing file 8/26 (30.77% complete)
Processing file 9/26 (34.62% complete)
Processing file 10/26 (38.46% complete)
Processing file 11/26 (42.31% complete)
Processing file 12/26 (46.15% complete)
Processing file 13/26 (50.00% complete)
Processing file 14/26 (53.85% complete)
Processing file 15/26 (57.69% complete)
Processing file 16/26 (61.54% complete)
Processing file 17/26 (65.38% complete)
Processing file 18/26 (69.23% complete)
Processing file 19/26 (73.08% complete)
Processing file 20/26 (76.92% complete)
Processing file 21/26 (80.77% complete)
Processing file 22/26 (84.62% complete)
Processing file 23/26 (88.46% complete)
Processing file 24/26 (92.31% complete)
Processing file 25/26 (96.15% complete)
Processing file 26/26 (100.00% complete)

In [8]:
# Write the augmented data to the CSV file
with open(csv_file, mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Image Name', 'Image Address', 'Rotation Angle'])
    writer.writerows(augmented_data)

In [9]:
# Remove the resized images
for filename in os.listdir(output_dir):
    if filename.startswith("resized_"):
        resized_image_path = os.path.join(output_dir, filename)
        if os.path.exists(resized_image_path):
            os.remove(resized_image_path)