# COD Workshop - Data Sampling
This notebook is a guide to sampling data from the COD10K dataset.
For the workshop, sampling reduces the number of classes and images used to test the models, since participants' devices may be limited in time and performance.

## Instructions
1. Download the COD10K dataset from this link: https://dengpingfan.github.io/pages/COD.html
2. Extract the COD10K dataset into a folder and place it in the `Datasets` directory.
3. Set the `COD10K` variable in this notebook to the relative path of the extracted COD10K dataset.

## 1. Importing the Dataset

In [34]:
import os

COD10K = "Datasets/COD10K-v3"
SAMPLED_COD10K = "Datasets/Sampled_COD10K"
if not os.path.exists(COD10K):
    print("COD10K Directory not found. Please check the folder path again.")


train_path = os.path.join(COD10K, "Train")
test_path = os.path.join(COD10K, "Test")

output_train_path = os.path.join(SAMPLED_COD10K, "Train")
output_test_path = os.path.join(SAMPLED_COD10K, "Test")
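
The check above only prints a message and only covers the dataset root. As a minimal sketch, a helper can report every expected folder that is missing. The name `find_missing_dirs` is hypothetical and not part of the notebook; it assumes only the `Train` and `Test` subfolders used in the paths above.

```python
import os

def find_missing_dirs(root, subdirs=("Train", "Test")):
    """Return the expected paths that are not existing directories."""
    expected = [root] + [os.path.join(root, name) for name in subdirs]
    return [path for path in expected if not os.path.isdir(path)]

# Same relative path as the COD10K variable above.
missing = find_missing_dirs("Datasets/COD10K-v3")
if missing:
    print("Missing directories:", ", ".join(missing))
```

An empty result means the root and both split folders were found.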

The helper function `get_image_categories` lists the classes available in the COD10K dataset so that you can choose which ones to sample from.

In [None]:
def get_image_categories(dataset_path):
    """Show the category levels in the dataset. The levels are separated by "-" in each image's filename."""

get_image_categories(train_path)


{0: ['COD10K'],
 1: ['CAM', 'NonCAM'],
 2: ['1', '2', '3', '4', '5'],
 3: ['Aquatic',
  'Terrestrial',
  'Flying',
  'Amphibian',
  'Other',
  'Terrestial',
  'Background'],
 4: ['1',
  '10',
  '11',
  '12',
  '13',
  '14',
  '15',
  '16',
  '17',
  '18',
  '19',
  '2',
  '20',
  '3',
  '4',
  '5',
  '6',
  '7',
  '8',
  '9',
  '21',
  '22',
  '23',
  '24',
  '25',
  '26',
  '27',
  '28',
  '29',
  '30',
  '31',
  '32',
  '33',
  '34',
  '35',
  '36',
  '37',
  '38',
  '39',
  '40',
  '41',
  '42',
  '43',
  '44',
  '45',
  '46',
  '47',
  '48',
  '49',
  '50',
  '51',
  '52',
  '53',
  '54',
  '55',
  '56',
  '57',
  '58',
  '59',
  '60',
  '61',
  '62',
  '63',
  '64',
  '65',
  '66',
  '67',
  '68',
  '69'],
 5: ['BatFish',
  'LeafySeaDragon',
  'Octopus',
  'Pagurian',
  'Pipefish',
  'ScorpionFish',
  'SeaHorse',
  'Shrimp',
  'Slug',
  'StarFish',
  'Stingaree',
  'ClownFish',
  'Turtle',
  'Crab',
  'Crocodile',
  'CrocodileFish',
  'Fish',
  'Flounder',
  'FrogFish',
  'GhostPi
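
The body of `get_image_categories` is not shown in this notebook. The sketch below is one way to build a similar level-to-values mapping from a list of filenames, and it is not the workshop's implementation. The name `collect_category_levels` is hypothetical, and treating the last token as a per-image index to be dropped is an assumption based on filenames such as `COD10K-CAM-2-Terrestrial-23-Cat-1298.jpg`.

```python
import os

def collect_category_levels(filenames):
    """Map each "-"-separated position to the distinct values seen there."""
    levels = {}
    for filename in filenames:
        stem = os.path.splitext(filename)[0]
        # Assumption: the last token is the per-image index, so it is dropped.
        for level, value in enumerate(stem.split("-")[:-1]):
            seen = levels.setdefault(level, [])
            if value not in seen:
                seen.append(value)  # keep first-seen order
    return levels

names = [
    "COD10K-CAM-2-Terrestrial-23-Cat-1298.jpg",
    "COD10K-CAM-2-Terrestrial-23-Cat-1302.jpg",
]
print(collect_category_levels(names))
```

To run it on the dataset, pass the result of `os.listdir` on the folder that holds the images; where that folder sits under `train_path` depends on how the archive was extracted.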

List the classes you want to include in the sampling in this list.
Make sure they are the same classes you plan to evaluate in `Model_Eval.ipynb`.

In [36]:
classes_to_include = ["Chameleon", "Cat", "Dog", "Wolf", "Owl"]
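
As a rough illustration of how this list can be applied, the sketch below keeps only the filenames whose class token is in the list. The helper names `class_of` and `filter_by_class` are hypothetical, and the class position (index 5 after splitting on "-") is an assumption taken from filenames such as `COD10K-CAM-2-Terrestrial-23-Cat-1298.jpg`.

```python
import os

def class_of(filename, class_position=5):
    """Return the class token of a "-"-separated filename, or None if it is too short."""
    tokens = os.path.splitext(filename)[0].split("-")
    return tokens[class_position] if len(tokens) > class_position else None

def filter_by_class(filenames, classes):
    """Keep only the filenames whose class token is in `classes`."""
    wanted = set(classes)
    return [name for name in filenames if class_of(name) in wanted]

classes_to_include = ["Chameleon", "Cat", "Dog", "Wolf", "Owl"]
print(filter_by_class(["COD10K-CAM-2-Terrestrial-23-Cat-1298.jpg"], classes_to_include))
```

Class names are matched exactly, so the spelling and capitalization must match the names listed by `get_image_categories`.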

Once the classes are chosen, sampling is performed: images from each selected class are sampled together with their corresponding ground truths.

In [None]:
import random
from shutil import copyfile

def sample_images(dataset_path, output_path, sample_count=5):
    global classes_to_include

    # 1. Image and GT paths

    # 2. Output directories for sampled images and GTs

    # 3. Create the output directories if they don't exist

    # 4. Get all images and remove those not in the specified classes

    # 5. Sample images randomly

    # 6. Save the images
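
The six steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the workshop's implementation: the name `sample_images_sketch`, the `Image` and `GT_Object` folder names, the `.png` ground-truth extension, the `seed` parameter, and sampling up to `sample_count` images per class (rather than in total) are all assumptions. The classes are passed as an argument instead of being read from a global.

```python
import os
import random
from collections import defaultdict
from shutil import copyfile

def sample_images_sketch(dataset_path, output_path, classes, sample_count=5,
                         image_dir="Image", gt_dir="GT_Object", gt_ext=".png",
                         seed=None):
    # 1. Image and GT paths (folder names are assumptions)
    image_path = os.path.join(dataset_path, image_dir)
    gt_path = os.path.join(dataset_path, gt_dir)

    # 2. Output directories for sampled images and GTs
    out_image_path = os.path.join(output_path, image_dir)
    out_gt_path = os.path.join(output_path, gt_dir)

    # 3. Create the output directories if they don't exist
    os.makedirs(out_image_path, exist_ok=True)
    os.makedirs(out_gt_path, exist_ok=True)

    # 4. Group images by class, skipping classes that were not requested
    #    (assumes the class is the sixth "-"-separated token of the filename)
    by_class = defaultdict(list)
    for name in sorted(os.listdir(image_path)):
        tokens = os.path.splitext(name)[0].split("-")
        if len(tokens) > 5 and tokens[5] in classes:
            by_class[tokens[5]].append(name)

    # 5. Sample up to sample_count images per class
    rng = random.Random(seed)
    sampled = []
    for label in sorted(by_class):
        names = by_class[label]
        sampled.extend(rng.sample(names, min(sample_count, len(names))))

    # 6. Copy each sampled image and, if present, its ground truth
    for name in sampled:
        copyfile(os.path.join(image_path, name), os.path.join(out_image_path, name))
        gt_name = os.path.splitext(name)[0] + gt_ext
        gt_file = os.path.join(gt_path, gt_name)
        if os.path.exists(gt_file):
            copyfile(gt_file, os.path.join(out_gt_path, gt_name))
    return sampled
```

Passing a fixed `seed` makes the selection repeatable between runs, which helps when several participants need the same subset.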

The sampling is run below. Change the last argument of each call to set the number of images to sample for the training and testing datasets.

In [38]:
sample_images(train_path, output_train_path, 250)

['COD10K-CAM-2-Terrestrial-23-Cat-1298.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1302.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1303.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1305.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1306.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1307.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1308.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1309.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1311.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1312.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1313.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1314.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1315.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1316.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1317.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1318.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1319.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1321.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1322.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1323.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1324.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1326.jpg', 'COD10K-CAM-2-Terrestrial-23-Ca

In [39]:
sample_images(test_path, output_test_path, 30)

['COD10K-CAM-2-Terrestrial-23-Cat-1299.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1300.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1301.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1304.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1310.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1320.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1325.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1327.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1328.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1329.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1330.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1331.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1333.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1337.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1339.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1344.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1350.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1352.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1358.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1359.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1360.jpg', 'COD10K-CAM-2-Terrestrial-23-Cat-1363.jpg', 'COD10K-CAM-2-Terrestrial-23-Ca
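
After sampling, it can be useful to confirm how many images each class ended up with. The sketch below counts files per class in a folder. The helper name `count_images_per_class` is hypothetical, the class position (index 5) is assumed from the filenames printed above, and the `Image` subfolder in the usage lines is an assumption about where the sampled images are written.

```python
import os
from collections import Counter

def count_images_per_class(image_dir, class_position=5):
    """Count files in image_dir by the class token in their filename."""
    counts = Counter()
    for name in os.listdir(image_dir):
        tokens = os.path.splitext(name)[0].split("-")
        if len(tokens) > class_position:
            counts[tokens[class_position]] += 1
    return dict(counts)

# Assumed location of the sampled training images; adjust to your layout.
sampled_dir = os.path.join("Datasets/Sampled_COD10K", "Train", "Image")
if os.path.isdir(sampled_dir):
    print(count_images_per_class(sampled_dir))
```

A class that is missing from the result, or has far fewer images than requested, usually means its name in `classes_to_include` does not match the dataset's spelling or that the dataset has few images of it.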