# Random image selector

This code randomly selects images from classes organised in subfolders and moves them into a separate test-set directory. The number of images selected from each subfolder depends on the number of images available for that class. An output CSV file is generated with the filenames of the randomly selected images and their labels (the name of the subdirectory they were taken from).
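The selection rule can be sketched on its own before reading the full function. The thresholds below are copied from the code further down; the helper name `tier_sample_size` is hypothetical and only serves this illustration.

```python
def tier_sample_size(num_files):
    """Return how many images to select for a class with num_files images."""
    if num_files > 1000:
        return 100
    if num_files >= 500:
        return 50
    if num_files >= 100:
        return 20
    if num_files >= 50:
        return 7
    if num_files >= 10:
        return 3
    return 0  # classes with fewer than 10 images are left untouched


# Print the selection size at each tier boundary
for n in (5, 10, 49, 50, 100, 500, 1000, 1001):
    print(f'{n:>5} images -> {tier_sample_size(n)} selected')
```

Note that the selected fraction is not constant across tiers: a class with exactly 10 images gives up 3 of them, while a class with 499 images gives up 20.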

In [1]:
# Import necessary libraries
import os
import shutil
import random
import csv

In [2]:
# Function to select and move .tif files based on given criteria and generate a CSV
def select_move_files_create_csv(source_directory, dest_directory, csv_file):
    # Create destination directory if it doesn't exist
    os.makedirs(dest_directory, exist_ok=True)

    # Open the CSV file for writing
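    # (the handle is named csv_handle so the loop variable `file` below does not shadow it)
    with open(csv_file, mode='w', newline='') as csv_handle:
        writer = csv.writer(csv_handle)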
        writer.writerow(['Filename', 'Subdirectory'])

        for subdir, _, files in os.walk(source_directory):
            tif_files = [file for file in files if file.endswith('.tif')]
            num_files = len(tif_files)

            # Define the number of images selected for the test set,
            # depending on the number of images available for a class.
            if num_files > 1000:
                num_to_move = 100
            elif num_files >= 500:
                num_to_move = 50
            elif num_files >= 100:
                num_to_move = 20
            elif num_files >= 50:
                num_to_move = 7
            elif num_files >= 10:
                num_to_move = 3
            else:
                num_to_move = 0

            if num_to_move > 0:
                selected_files = random.sample(tif_files, num_to_move)
                subdirectory_name = os.path.basename(subdir)
                for file in selected_files:
                    src_path = os.path.join(subdir, file)
                    dest_path = os.path.join(dest_directory, file)
                    shutil.move(src_path, dest_path)
                    print(f'Moved {src_path} to {dest_path}')
                    # Write the file and subdirectory to the CSV
                    writer.writerow([file, subdirectory_name])
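The function above draws from the global `random` state, so each run picks a different test set. If a repeatable split is wanted, one option is a seeded generator, as in the minimal sketch below. The helper `reproducible_sample`, the default seed of 42, and the file names are assumptions made for illustration and are not part of the notebook. The names are sorted first because the order in which `os.walk` lists files is not guaranteed.

```python
import random


def reproducible_sample(filenames, k, seed=42):
    """Pick k filenames so that the same inputs always give the same selection."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(sorted(filenames), k)  # sort to remove listing-order effects


# Made-up file names, listed in two different orders
names = [f'img_{i:03d}.tif' for i in range(60)]
first = reproducible_sample(names, 7)
second = reproducible_sample(list(reversed(names)), 7)
print(first == second)  # True: same seed and same sorted input
```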

In [3]:
# Paths to the source directory and destination directory (update these paths)
source_directory = 'data/DETAILED_merged'
dest_directory = 'data/DETAILED_test'
csv_file = 'data/DETAILED_test.csv'
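Because every selected file is moved into one flat destination directory, two classes that happen to contain a file with the same name would collide there. Depending on the platform, the second move may overwrite the first file or raise an error. The sketch below is an optional pre-flight check under that assumption; the helper name `find_duplicate_names` is hypothetical.

```python
import os
from collections import defaultdict


def find_duplicate_names(source_directory, suffix='.tif'):
    """Map each filename found in more than one subdirectory to those subdirectories."""
    locations = defaultdict(list)
    for subdir, _, files in os.walk(source_directory):
        for name in files:
            if name.endswith(suffix):
                locations[name].append(subdir)
    # Keep only names that occur in at least two places
    return {name: dirs for name, dirs in locations.items() if len(dirs) > 1}
```

Calling `find_duplicate_names(source_directory)` before moving anything should return an empty dictionary when all filenames are unique across classes.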

In [4]:
# Execute the function to select and move files and create the CSV
select_move_files_create_csv(source_directory, dest_directory, csv_file)

Moved data/DETAILED_merged/Cnidaria_Hydrozoa-polyp/pia6.2023-05-31.1810+N00033187.tif to data/DETAILED_test/pia6.2023-05-31.1810+N00033187.tif
Moved data/DETAILED_merged/Cnidaria_Hydrozoa-polyp/pia7.2024-01-11.0610+N00002101.tif to data/DETAILED_test/pia7.2024-01-11.0610+N00002101.tif
Moved data/DETAILED_merged/Cnidaria_Hydrozoa-polyp/pia1.2023-08-15.0720+N00014880.tif to data/DETAILED_test/pia1.2023-08-15.0720+N00014880.tif
Moved data/DETAILED_merged/Crustacea_Cirripedia-larvae/pia6.2023-06-08.1600+N00075447.tif to data/DETAILED_test/pia6.2023-06-08.1600+N00075447.tif
Moved data/DETAILED_merged/Crustacea_Cirripedia-larvae/pia6.2023-06-09.0200+N00073892.tif to data/DETAILED_test/pia6.2023-06-09.0200+N00073892.tif
Moved data/DETAILED_merged/Crustacea_Cirripedia-larvae/pia6.2023-06-08.1630+N00095666.tif to data/DETAILED_test/pia6.2023-06-08.1630+N00095666.tif
Moved data/DETAILED_merged/Crustacea_Cirripedia-larvae/pia6.2023-06-01.0800+N00084517.tif to data/DETAILED_test/pia6.2023-06-01.08

Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-06-06.1230+N00047277.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-06.1230+N00047277.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-05-31.1712+N00022834.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-05-31.1712+N00022834.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-05-30.1810+N00025190.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-05-30.1810+N00025190.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-06-06.1340+N00024030.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-06.1340+N00024030.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-06-08.2310+N00084112.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-08.2310+N00084112.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_echinopluteus_type_2/pia6.2023-05-30.1510+N

Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-06-05.1430+N00071825.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-05.1430+N00071825.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-05-31.1720+N00048581.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-05-31.1720+N00048581.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-06-05.1700+N00087896.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-05.1700+N00087896.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-06-06.0520+N00006444.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-06.0520+N00006444.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-06-01.0500+N00092217.tif to data/test_train_WMR_PI10_learning_set_v4/pia6.2023-06-01.0500+N00092217.tif
Moved data/WMR_PI10_learning_set_v4/echinoderm_bipinnaria/pia6.2023-06-09.1200+N00059110.tif to data/test_train_WMR_PI10_learning_set_v4/pia