## Input Preparation
In this notebook, we take directories of sea lion image chips and convert them into a format ready for machine learning training.

In [10]:
import cv2
import pathlib
import pickle
import random

##### Use pathlib to list the class folders in the bbox_chips directory, then glob all the chip images into a single list

In [5]:
images_root = pathlib.Path('../../results/bbox_chips')
print('Chip Folders: ', '\n', '-' * 50)

for item in images_root.iterdir():
    print(item)
    
# One level down: <class folder>/<chip file>
all_image_paths = [str(path) for path in images_root.glob('*/*')]

random.seed(42)
random.shuffle(all_image_paths)  # shuffle now for train test split later

print('-' * 50)
print(f'Total image count is {len(all_image_paths)}')

Chip Folders:  
 --------------------------------------------------
..\..\results\bbox_chips\adult_females
..\..\results\bbox_chips\adult_males
..\..\results\bbox_chips\juveniles
..\..\results\bbox_chips\pups
..\..\results\bbox_chips\subadult_males
--------------------------------------------------
Total image count is 1229
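
`Path.glob` yields entries in whatever order the filesystem returns them, so `random.seed(42)` alone does not guarantee the same shuffle on a different machine. The minimal sketch below sorts the paths before doing a seeded shuffle and skips stray non-image files. It assumes the chips are `.jpg`/`.jpeg`/`.png` files. The extension set and the function name `collect_image_paths` are hypothetical.

```python
import pathlib
import random
import tempfile

# Hypothetical whitelist of chip file extensions; adjust to the real formats.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def collect_image_paths(root, seed=42):
    """Return image paths under root/<class>/, sorted, then shuffled with a fixed seed."""
    root = pathlib.Path(root)
    paths = sorted(
        str(p) for p in root.glob('*/*')
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    rng = random.Random(seed)  # private RNG, unaffected by other random calls
    rng.shuffle(paths)
    return paths


# Demo on a throwaway folder tree: two classes, three chips each, plus a stray file.
with tempfile.TemporaryDirectory() as tmp:
    for label in ('juveniles', 'pups'):
        class_dir = pathlib.Path(tmp) / label
        class_dir.mkdir()
        for i in range(3):
            (class_dir / f'chip_{i}.jpg').touch()
    (pathlib.Path(tmp) / 'pups' / 'Thumbs.db').touch()  # not an image, so skipped

    first = collect_image_paths(tmp)
    second = collect_image_paths(tmp)

print(len(first), first == second)  # same seed and sorted input -> same order
```

Calling `collect_image_paths(images_root)` could replace the glob-and-shuffle lines above. The resulting order may differ from the one this notebook produced, though, so the sample labels printed later would not necessarily match.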


##### Create integer label codes, taking the class names from the folder names.

In [6]:
label_names = sorted(item.name for item in images_root.iterdir()
                     if item.is_dir())
label_to_index = {name: index for index, name in enumerate(label_names)}
label_to_index

{'adult_females': 0,
 'adult_males': 1,
 'juveniles': 2,
 'pups': 3,
 'subadult_males': 4}
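
A trained model will output these integer codes, so it helps to keep the inverse mapping as well for decoding predictions. The sketch below uses the five class names shown above. The name `index_to_label` is hypothetical.

```python
label_names = sorted(['pups', 'adult_males', 'juveniles', 'subadult_males', 'adult_females'])

# Forward mapping: class name -> integer code (alphabetical order, as above).
label_to_index = {name: index for index, name in enumerate(label_names)}

# Inverse mapping: integer code -> class name, for decoding model predictions.
index_to_label = {index: name for name, index in label_to_index.items()}

print(label_to_index['pups'], index_to_label[3])  # 3 pups
```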

##### Label every chip using the name of the folder it sits in

In [7]:
all_image_labels = [
    label_to_index[pathlib.Path(path).parent.name] for path in all_image_paths
]
all_image_labels[:10]  # first ten labels, as a sample

[3, 2, 0, 1, 0, 3, 2, 0, 0, 0]
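
Before training, it is worth checking how many chips fall into each class, because imbalanced classes change how an overall accuracy figure should be read. The sketch below uses `collections.Counter` on the ten sample labels printed above. It is a stand-in only; in the notebook you would count `all_image_labels` instead.

```python
from collections import Counter

index_to_label = {0: 'adult_females', 1: 'adult_males', 2: 'juveniles',
                  3: 'pups', 4: 'subadult_males'}

# The ten sample labels shown above; in the notebook, use all_image_labels.
sample_labels = [3, 2, 0, 1, 0, 3, 2, 0, 0, 0]

counts = Counter(sample_labels)
for index, count in sorted(counts.items()):
    print(f'{index_to_label[index]:>15}: {count}')
```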

##### Read each chip file into an image array with OpenCV, then resize it, convert it to grayscale, and flatten it.

In [9]:
all_image_arrays = []
for path in all_image_paths:
    img = cv2.imread(path)  # BGR image array, or None if the file can't be read
    if img is None:
        raise ValueError(f'Could not read image: {path}')
    img = cv2.resize(img, (80, 80))  # make every chip a uniform 80 x 80 pixels
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale (one channel)
    all_image_arrays.append(gray.ravel())  # flatten to a 1-d array of 6,400 values
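
After this step, each chip is a 1-D vector of 80 × 80 = 6,400 pixel intensities. The sketch below mimics the grayscale-and-flatten step in NumPy so the shapes are easy to see. It uses the luma weights OpenCV documents for `COLOR_BGR2GRAY` (Y = 0.299 R + 0.587 G + 0.114 B). OpenCV's own fixed-point rounding may differ from this by one intensity level. `to_feature_vector` is a hypothetical helper, not part of OpenCV.

```python
import numpy as np


def to_feature_vector(bgr_chip):
    """Approximate cv2.cvtColor(chip, cv2.COLOR_BGR2GRAY).ravel() for an 80x80x3 uint8 chip."""
    # OpenCV orders colour channels as B, G, R.
    b = bgr_chip[..., 0].astype(np.float64)
    g = bgr_chip[..., 1].astype(np.float64)
    r = bgr_chip[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # standard luma weights, shape (80, 80)
    return np.round(gray).astype(np.uint8).ravel()  # row-major flatten: 80 * 80 = 6400 values


rng = np.random.default_rng(0)
chip = rng.integers(0, 256, size=(80, 80, 3)).astype(np.uint8)  # random stand-in chip

vector = to_feature_vector(chip)
print(vector.shape)  # (6400,)

# ravel() is row-major, so reshape(80, 80) recovers the 2-D grayscale image.
restored = vector.reshape(80, 80)
print(restored[1, 0] == vector[80])  # row 1 starts at index 80
```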

##### Save the image labels and image arrays to pickle files for the next step: training and testing a machine learning model.

In [11]:
with open('image_labels.pkl', 'wb') as f:
    pickle.dump(all_image_labels, f)
with open('image_arrays.pkl', 'wb') as f:
    pickle.dump(all_image_arrays, f)
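
In the next notebook, the two lists can be read back with `pickle.load` and stacked into the `(n_samples, n_features)` matrix that scikit-learn estimators expect. The round-trip sketch below uses small stand-in data and a temporary directory rather than the real files. Unpickling can execute code embedded in the file, so only load pickles you created yourself.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in data with the same structure as the notebook's lists.
labels = [3, 2, 0]
arrays = [np.zeros(6400, dtype=np.uint8) for _ in labels]

with tempfile.TemporaryDirectory() as tmp:
    labels_path = os.path.join(tmp, 'image_labels.pkl')
    arrays_path = os.path.join(tmp, 'image_arrays.pkl')

    # Save, as in the cell above.
    with open(labels_path, 'wb') as f:
        pickle.dump(labels, f)
    with open(arrays_path, 'wb') as f:
        pickle.dump(arrays, f)

    # Load, as the next notebook would.
    with open(labels_path, 'rb') as f:
        loaded_labels = pickle.load(f)
    with open(arrays_path, 'rb') as f:
        loaded_arrays = pickle.load(f)

X = np.stack(loaded_arrays)  # feature matrix, shape (n_samples, 6400)
y = np.array(loaded_labels)  # label vector, shape (n_samples,)
print(X.shape, y.shape)  # (3, 6400) (3,)
```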

## Move on to the next notebook, where we load these .pkl files to retrieve the labels and arrays.