As the first step of this tutorial, we will generate some synthetic data to train a model.

This code generates solid color swatches and striped color swatches.

You can see the code in `generate_swatches.py`.

Note: this code will create a directory called `data` filled with image files.

In [1]:
%load_ext autoreload

%autoreload 2

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
from generate_swatches import generate_color_swatches, generate_pattern_swatches

Generate synthetic data
===

In [4]:
generate_color_swatches()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  final_df['image_locs'] = image_locs
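
The message above is pandas' `SettingWithCopyWarning`, triggered inside `generate_swatches.py` when a column is assigned on a DataFrame that was derived from a slice of another one. The internals of that script are not shown in this notebook, so the following is only a sketch of one common way to avoid the warning: take an explicit `.copy()` of the filtered frame before adding the column. The data and file paths here are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the color table; the real script's data is not shown here.
colors = pd.DataFrame({
    'complex_color': ['cloudy blue', 'electric lime', 'hot pink'],
    'simple_color': ['blue', 'green', 'pink'],
})

# Filtering produces a derived frame. An explicit copy makes it independent,
# so the later column assignment is unambiguous and does not warn.
final_df = colors[colors['simple_color'].isin(['blue', 'green'])].copy()

# Hypothetical image locations, one per remaining row.
image_locs = [f'data/color_swatches/{name}.jpg' for name in final_df['complex_color']]
final_df['image_locs'] = image_locs
```

The original frame `colors` is left untouched, and `final_df` carries the new column.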


In [5]:
generate_pattern_swatches()

Note: if you changed the default storage location above, you will need to alter the next two lines.

In [6]:
COLOR_DATASET_PATH = 'data/color_swatches'

In [7]:
PATTERN_DATASET_PATH = 'data/pattern_swatches'

Format color data
===

Now we will put the data in the format needed for Octopod.

In [8]:
color_df = pd.read_csv(f'{COLOR_DATASET_PATH}/color_dataset.csv')

In [9]:
color_df.head()

Unnamed: 0,complex_color,simple_color,rgb,image_locs
0,cloudy blue,blue,"(172, 194, 217)",/home/ubuntu/octopod/notebooks/synthetic_data/da...
1,dark pastel green,green,"(86, 174, 87)",/home/ubuntu/octopod/notebooks/synthetic_data/da...
2,electric lime,green,"(168, 255, 4)",/home/ubuntu/octopod/notebooks/synthetic_data/da...
3,fresh green,green,"(105, 216, 79)",/home/ubuntu/octopod/notebooks/synthetic_data/da...
4,nasty green,green,"(112, 178, 63)",/home/ubuntu/octopod/notebooks/synthetic_data/da...


You can see the generated csv `color_dataset.csv` contains four columns:
- complex_color: the value we will use for the text
- simple_color: the value we will try to predict
- rgb: the raw RGB value used to generate the swatch
- image_locs: the location of where each image is stored
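
The `Unnamed: 0` column in the output above is not one of the four. It is most likely the DataFrame index that was written out when the csv was saved, although that is an assumption since the saving code is not shown here. A minimal sketch, using a small in-memory csv rather than the real file, of how `index_col=0` reads such a column back as the index:

```python
import io

import pandas as pd

# A tiny csv with an unnamed leading column, mimicking a saved DataFrame index.
csv_text = (
    ',complex_color,simple_color\n'
    '0,cloudy blue,blue\n'
    '1,dark pastel green,green\n'
)

# Default read: the unnamed column shows up as 'Unnamed: 0'.
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```

This is optional; the rest of this notebook keeps the default read and simply carries the extra column along.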

We set `simple_color` to be a pandas category type.

In [10]:
color_df['simple_color'] = color_df['simple_color'].astype('category')

We create a new column called `simple_color_cat` that contains the category to predict as an integer.
This is an important step because Octopod does not natively handle categories as text.

In [11]:
color_df['simple_color_cat'] = color_df['simple_color'].cat.codes
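
As a small illustration of what `.cat.codes` returns, the sketch below uses a hypothetical toy Series rather than the real data. For string labels converted with `astype('category')`, pandas sorts the categories, so the codes follow alphabetical order, and a missing label is encoded as -1.

```python
import pandas as pd

# Toy labels, including one missing value.
labels = pd.Series(['green', 'blue', None, 'green']).astype('category')

print(labels.cat.categories.tolist())  # ['blue', 'green']
print(labels.cat.codes.tolist())       # [1, 0, -1, 1]
```

A -1 in the integer column is therefore a sign of a missing label and is worth checking for before training.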

In [12]:
# Category mapping
{label: i for i, label in enumerate(color_df['simple_color'].cat.categories)}

{'blue': 0, 'green': 1}

Note: you will need to save this mapping for later so that you can use your model to make predictions.
Since this is a simple model, we won't save it, but for a real project, we would store it somewhere.
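
For a real project, one simple way to store the mapping is as a JSON file next to the dataset. The following is a sketch only: the file name `simple_color_mapping.json` and the temporary directory are hypothetical, and the labels are toy data.

```python
import json
import os
import tempfile

import pandas as pd

# Toy labels standing in for color_df['simple_color'].
labels = pd.Series(['blue', 'green', 'green', 'blue']).astype('category')
mapping = {label: code for code, label in enumerate(labels.cat.categories)}

# Hypothetical location; in practice this would live alongside the dataset.
mapping_path = os.path.join(tempfile.mkdtemp(), 'simple_color_mapping.json')
with open(mapping_path, 'w') as f:
    json.dump(mapping, f)

# Later, at prediction time, load it back.
with open(mapping_path) as f:
    restored = json.load(f)

print(restored)  # {'blue': 0, 'green': 1}
```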

In [13]:
color_df.head()

Unnamed: 0,complex_color,simple_color,rgb,image_locs,simple_color_cat
0,cloudy blue,blue,"(172, 194, 217)",/home/ubuntu/octopod/notebooks/synthetic_data/da...,0
1,dark pastel green,green,"(86, 174, 87)",/home/ubuntu/octopod/notebooks/synthetic_data/da...,1
2,electric lime,green,"(168, 255, 4)",/home/ubuntu/octopod/notebooks/synthetic_data/da...,1
3,fresh green,green,"(105, 216, 79)",/home/ubuntu/octopod/notebooks/synthetic_data/da...,1
4,nasty green,green,"(112, 178, 63)",/home/ubuntu/octopod/notebooks/synthetic_data/da...,1


Now our color data is in the right format for Octopod, so we'll split it into training and validation sets and save each one.

In [14]:
color_train_df, color_valid_df = train_test_split(color_df, train_size=0.75, random_state=17)
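
With a small dataset, a purely random split can leave the validation set with a different label balance than the training set. The sketch below is not part of the original notebook; it shows the `stratify` argument of `train_test_split` on hypothetical toy data, which keeps the class proportions the same in both splits.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 rows of class 0 and 12 rows of class 1.
toy_df = pd.DataFrame({
    'simple_color_cat': [0] * 8 + [1] * 12,
    'image_locs': [f'img_{i}.jpg' for i in range(20)],
})

# Same split sizes as above, but stratified on the label column.
train_df, valid_df = train_test_split(
    toy_df,
    train_size=0.75,
    random_state=17,
    stratify=toy_df['simple_color_cat'],
)

print(len(train_df), len(valid_df))
print(valid_df['simple_color_cat'].value_counts().to_dict())
```

Whether stratification matters for the swatch data depends on how balanced the generated labels are, which this notebook does not check.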

In [15]:
color_train_df.to_csv(f'{COLOR_DATASET_PATH}/color_train.csv', index=False)

In [16]:
color_valid_df.to_csv(f'{COLOR_DATASET_PATH}/color_valid.csv', index=False)

Format pattern data
===

In [17]:
pattern_df = pd.read_csv(f'{PATTERN_DATASET_PATH}/pattern_dataset.csv')

In [18]:
pattern_df.head()

Unnamed: 0,pattern_type,image_locs
0,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...
1,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...
2,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...
3,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...
4,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...


You can see the generated csv `pattern_dataset.csv` contains two columns:
- pattern_type: the value we will try to predict
- image_locs: the location of where each image is stored

We set `pattern_type` to be a pandas category type.

In [19]:
pattern_df['pattern_type'] = pattern_df['pattern_type'].astype('category')

We create a new column called `pattern_type_cat` that contains the category to predict as an integer.
This is an important step because Octopod does not natively handle categories as text.

In [20]:
pattern_df['pattern_type_cat'] = pattern_df['pattern_type'].cat.codes

In [21]:
# Category mapping
{label: i for i, label in enumerate(pattern_df['pattern_type'].cat.categories)}

{'solid': 0, 'striped': 1}

Note: you will need to save this mapping for later so that you can use your model to make predictions.
Since this is a simple model, we won't save it, but for a real project, we would store it somewhere.
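
At prediction time the mapping is needed in the other direction, from integer code back to label. A minimal sketch, using toy labels and a hypothetical list of predicted codes in place of real model output:

```python
import pandas as pd

# Toy labels standing in for pattern_df['pattern_type'].
pattern_type = pd.Series(['striped', 'solid', 'striped']).astype('category')

# Inverse of the mapping above: integer code -> label.
code_to_label = dict(enumerate(pattern_type.cat.categories))

# Hypothetical model output: predicted integer classes.
predicted_codes = [1, 0, 0]
predicted_labels = [code_to_label[code] for code in predicted_codes]

print(predicted_labels)  # ['striped', 'solid', 'solid']
```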

Since we don't have any text for our pattern data, we will fake it by using the image file paths as text, with the slashes replaced by spaces.
Note: this would be a terrible idea for a real model, since the target variable is contained in the file name.

In [22]:
pattern_df['fake_text'] = [i.replace('/', ' ') for i in pattern_df['image_locs']]

In [23]:
pattern_df.loc[0, 'image_locs']

'/home/ubuntu/octopod/notebooks/synthetic_data/data/pattern_swatches/striped_0.jpg'

In [24]:
pattern_df.loc[0, 'fake_text']

' home ubuntu octopod notebooks synthetic_data data pattern_swatches striped_0.jpg'
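
To make the leak concrete, here is a small sketch showing that the label can be read straight out of the fake text without any model. The paths are hypothetical, written in the same style as the one above, and the `solid_0.jpg` naming is an assumption.

```python
import pandas as pd

# Hypothetical paths that follow the <pattern>_<n>.jpg naming seen above.
toy_df = pd.DataFrame({
    'pattern_type': ['striped', 'solid'],
    'image_locs': [
        '/tmp/pattern_swatches/striped_0.jpg',
        '/tmp/pattern_swatches/solid_0.jpg',
    ],
})
toy_df['fake_text'] = toy_df['image_locs'].str.replace('/', ' ', regex=False)

# The last whitespace-separated token is the file name; its prefix is the label.
recovered = toy_df['fake_text'].str.split().str[-1].str.split('_').str[0]

print(recovered.tolist())
print(bool((recovered == toy_df['pattern_type']).all()))
```

Any text model trained on this column can score well by learning the file naming scheme, which is fine for a tutorial but says nothing about real performance.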

In [25]:
pattern_df.head()

Unnamed: 0,pattern_type,image_locs,pattern_type_cat,fake_text
0,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...,1,home ubuntu octopod notebooks synthetic_data da...
1,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...,1,home ubuntu octopod notebooks synthetic_data da...
2,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...,1,home ubuntu octopod notebooks synthetic_data da...
3,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...,1,home ubuntu octopod notebooks synthetic_data da...
4,striped,/home/ubuntu/octopod/notebooks/synthetic_data/da...,1,home ubuntu octopod notebooks synthetic_data da...


Now our pattern data is in the right format for Octopod, so we'll split it into training and validation sets and save each one.

In [26]:
pattern_train_df, pattern_valid_df = train_test_split(pattern_df, train_size=0.75, random_state=17)

In [27]:
pattern_train_df.to_csv(f'{PATTERN_DATASET_PATH}/pattern_train.csv', index=False)

In [28]:
pattern_valid_df.to_csv(f'{PATTERN_DATASET_PATH}/pattern_valid.csv', index=False)

Now that we have some data, move on to the notebook labeled `Step2_train_image_model.ipynb`.