# Creating an Image Dataset for the Peltarion Platform

This Jupyter notebook guides you through creating your own Peltarion-compatible image dataset.

## 1. Data and Folder Structure

The required folder structure for your data is shown below. Divide the images into subfolders; the name of each subfolder is used as the category for the images it contains.

Create __your own__ dataset with the following folder structure and save the path to the data folder for future use.

<img src="data_folder.png" alt="Data folder structure" title="Data folder structure" />
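Before continuing, it can help to confirm that your folder really follows this layout. The following is a minimal sketch, not part of Sidekick: the helper name `summarize_data_folder` is hypothetical, and it assumes images end in `.jpg` or `.png`, the same extensions collected later in this notebook.

```python
import os

# Assumption: same extensions as the glob patterns used later in this notebook
IMAGE_EXTENSIONS = ('.jpg', '.png')

def summarize_data_folder(data_dir):
    """Return {category: image_count} for each subfolder of data_dir."""
    summary = {}
    for entry in sorted(os.listdir(data_dir)):
        subfolder = os.path.join(data_dir, entry)
        if not os.path.isdir(subfolder):
            continue  # loose files at the top level carry no category
        images = [f for f in os.listdir(subfolder) if f.endswith(IMAGE_EXTENSIONS)]
        summary[entry] = len(images)
    return summary

# Example call (hypothetical path):
# print(summarize_data_folder('example_data/'))
```

A category with a count of zero usually means the images are nested one level too deep or use a different extension.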

## 2. Load Dependencies

In [18]:
import os
import pandas as pd
import functools
import sidekick
from glob import glob
from PIL import Image

## 3. Dataset Paths

Set the __full__ path of your data folder and the __full__ path of the ZIP-file to be created, including its file name.

In [19]:
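# Path to the dataset
input_path = 'example_data/'  ##TODO

# Path to the zip output (built before changing directory, with a proper separator)
output_path = os.path.join(os.getcwd(), 'dataset.zip')  ##TODO

# Work from inside the data folder so that image paths are relative to it
os.chdir(input_path)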

## 4. Save Image Paths

Save __relative__ image paths to a list.

In [20]:
images_rel_path = glob(os.path.join('*', '*.jpg')) + glob(os.path.join('*', '*.png'))
print("Images found: ", len(images_rel_path))

Images found:  30
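On case-sensitive file systems, the `*.jpg` and `*.png` patterns do not match files such as `photo.JPG` or `photo.jpeg`. If the count above is lower than expected, a case-insensitive search may help. This is a sketch only: `find_images` is a hypothetical helper, and the extension list is an assumption you should adapt to your data.

```python
import os

def find_images(data_dir, extensions=('.jpg', '.jpeg', '.png')):
    """Return sorted image paths relative to data_dir, one subfolder deep."""
    found = []
    for category in sorted(os.listdir(data_dir)):
        subfolder = os.path.join(data_dir, category)
        if not os.path.isdir(subfolder):
            continue
        for name in sorted(os.listdir(subfolder)):
            # Compare the lower-cased extension so .JPG and .jpg both match
            if os.path.splitext(name)[1].lower() in extensions:
                found.append(os.path.join(category, name))
    return found

# Example call from inside the data folder:
# images_rel_path = find_images('.')
```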


## 5. Create Pandas Dataframe

The dataframe has two columns: the __relative__ image path (`image`) and the category (`class`). The category is derived from the name of the subfolder that contains the image.

In [21]:
df = pd.DataFrame({'image': images_rel_path})
df['class'] = df['image'].apply(lambda path: os.path.basename(os.path.dirname(path)))

##Check Dataframe
print("Dataframe has classes:", df['class'].unique())

Dataframe has classes: ['oranges' 'apples']
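The class is simply the name of the folder that directly contains each image. The sketch below repeats that derivation without pandas and counts images per class, which is a quick way to spot an unbalanced or misnamed category. The helper names `class_of` and `count_classes` are hypothetical.

```python
import os
from collections import Counter

def class_of(path):
    """Return the name of the folder that directly contains the file."""
    return os.path.basename(os.path.dirname(path))

def count_classes(paths):
    """Count how many paths fall into each class."""
    return Counter(class_of(p) for p in paths)

# Example call with the list built earlier in this notebook:
# print(count_classes(images_rel_path))
```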


## 6. Validate Dataframe

Validate that __all__ the pictures share the same color mode, e.g. RGB.

In [22]:
def get_mode(path):
    # Read the color mode (e.g. RGB, RGBA, L); the file is closed on exit
    with Image.open(path) as im:
        return im.mode

df['image_mode'] = df['image'].apply(get_mode)
print("Image mode:", df['image_mode'].value_counts())

if len(df['image_mode'].unique()) > 1:
    print("Dataset contains more than one color model")

df = df.drop(['image_mode'], axis=1)

Image mode: RGB    30
Name: image_mode, dtype: int64
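If the check reports more than one color mode, the next question is which files are responsible. The following sketch takes `(path, mode)` pairs, for example `[(p, get_mode(p)) for p in df['image']]`, and returns the most common mode together with the paths that deviate from it. `find_mode_outliers` is a hypothetical helper, not part of Pillow or Sidekick.

```python
from collections import Counter

def find_mode_outliers(path_modes):
    """Return (majority_mode, paths whose mode differs from it)."""
    path_modes = list(path_modes)
    if not path_modes:
        return None, []
    # The most frequent mode is treated as the intended one
    majority_mode, _ = Counter(mode for _, mode in path_modes).most_common(1)[0]
    outliers = [path for path, mode in path_modes if mode != majority_mode]
    return majority_mode, outliers
```

The returned paths can then be converted or removed before the bundle is created.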


## 7. Create the Peltarion Dataset Bundle

Create the Peltarion dataset bundle using [Sidekick](https://github.com/Peltarion/sidekick), a helper tool for creating Peltarion platform compatible datasets.

In [23]:
'''
Available modes:
- crop_and_resize
- center_crop_or_pad
- resize_image
'''
image_processor = functools.partial(sidekick.process_image, mode='crop_and_resize', size=(100, 100), file_format='png')
sidekick.create_dataset(
    output_path,
    df,
    path_columns=['image'],
    preprocess={
        'image': image_processor
    }
)
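To build intuition for what a crop-then-resize step generally involves, the sketch below computes the largest centered crop box that has the same aspect ratio as the target size; resizing that crop then introduces no distortion. This illustrates the general technique only. It is an assumption that Sidekick's `crop_and_resize` mode behaves similarly, and `center_crop_box` is a hypothetical helper, not a Sidekick function.

```python
def center_crop_box(source_size, target_size):
    """Return (left, top, right, bottom) of the largest centered crop
    of source_size that matches the aspect ratio of target_size."""
    width, height = source_size
    target_w, target_h = target_size
    if width * target_h > height * target_w:
        # Source is relatively wider than the target: trim the sides
        new_w = round(height * target_w / target_h)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is relatively taller (or equal): trim top and bottom
    new_h = round(width * target_h / target_w)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```

A box in this form matches the `(left, upper, right, lower)` convention that Pillow's `Image.crop` accepts.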

## 8. Peltarion Platform Format

The ZIP-file created by Sidekick has the file structure shown below, which is the format the Peltarion platform expects.

<img src="zip_folder.png" alt="Peltarion Dataset Structure" title="Dataset Structure" />

File structure for Peltarion platform.
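You can also inspect the bundle from Python before uploading it. The sketch below uses only the standard-library `zipfile` module; `list_bundle` is a hypothetical helper, and it makes no assumption about which file names Sidekick writes into the archive.

```python
import zipfile

def list_bundle(zip_path, limit=10):
    """Return (first_corrupt_member_or_None, first `limit` member names)."""
    with zipfile.ZipFile(zip_path) as bundle:
        corrupt = bundle.testzip()  # None when every member passes its CRC check
        names = bundle.namelist()
    return corrupt, names[:limit]

# Example call with the path set earlier in this notebook:
# print(list_bundle(output_path))
```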

## 9. Uploading the ZIP-file to the Peltarion Platform

Now you can upload your ZIP-file to the [Peltarion platform](https://platform.peltarion.com) and start deep learning.