# Fashion product categorisation and attribute extraction from images

### Paper: 

**A Unified Model with Structured Output for Fashion Images Classification - Ferreira *et al* (2018)**

**Abstract**: A picture is worth a thousand words. Albeit a cliché, for the fashion industry, an image of a clothing piece allows one to perceive its category (e.g., dress), sub-category (e.g., day dress) and properties (e.g., white colour with floral patterns). The seasonal nature of the fashion industry creates a highly dynamic and creative domain with evermore data, making it unpractical to manually describe a large set of images (of products). In this paper, we explore the concept of visual recognition for fashion images through an end-to-end architecture embedding the hierarchical nature of the annotations directly into the model. Towards that goal, and inspired by the work of [7], we have modified and adapted the original architecture proposal. Namely, we have removed the message passing layer symmetry to cope with Farfetch category tree, added extra layers for hierarchy level specificity, and moved the message passing layer into an enriched latent space. We compare the proposed unified architecture against state-of-the-art models and demonstrate the performance advantage of our model for structured multi-level categorization on a dataset of about 350k fashion product images.


https://arxiv.org/abs/1806.09445


Throughout this notebook, I have interspersed passages from the paper which describe the architecture, training strategy, etc.

### TensorBoard

If using TensorFlow 2.0, it is possible to run TensorBoard inline within this notebook.

However, the model architecture defined below works best with TensorFlow 1.14. In that case, TensorBoard can still be run inline using the `tensorboardcolab` package, or in a separate TensorFlow 2.0 notebook pointed at the correct log directory.

## Imports etc

In [None]:
# Specify whether to use TF2.0 (which enables inline tensorboard)
use_tf_2 = False


# Specify whether to use the "small" or "large" images
IMAGE_SIZE = "large"

batch_size = 16

target_num_samples = 25000

TRAIN_SAMPLE = 0.85

AUGMENT_FLAG = True

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import glob
import datetime
# from PIL import Image
from skimage.io import imread
from skimage.transform import resize
import cv2
from albumentations import (
    Compose, HorizontalFlip, CLAHE, HueSaturationValue,
    RandomBrightness, RandomContrast, RandomGamma,
    ToFloat, ShiftScaleRotate)

if use_tf_2:
  from tensorflow.keras.applications.resnet50 import ResNet50
  from tensorflow.keras.preprocessing import image
  from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
  from tensorflow.keras.models import Sequential, Model, load_model
  from tensorflow.keras.layers import Input, Dense, Add, Dropout
  from tensorflow.keras.activations import softmax, sigmoid
  from tensorflow.keras import regularizers
  from tensorflow.keras.optimizers import Adam as adam
  from tensorflow.keras.losses import categorical_crossentropy, binary_crossentropy
  from tensorflow.keras.metrics import categorical_accuracy, top_k_categorical_accuracy
  from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
  from tensorflow.keras.utils import to_categorical, Sequence
  from sklearn.preprocessing import MultiLabelBinarizer
  
else:
  from keras.applications.resnet50 import ResNet50
  from keras.preprocessing import image
  from keras.applications.resnet50 import preprocess_input, decode_predictions
  from keras.models import Sequential, Model, load_model
  from keras.layers import Input, Dense, Add, Dropout
  from keras.activations import softmax, sigmoid
  from keras import regularizers
  from keras.optimizers import adam
  from keras.losses import categorical_crossentropy, binary_crossentropy
  from keras.metrics import categorical_accuracy, top_k_categorical_accuracy
  from keras.callbacks import ModelCheckpoint, TensorBoard
  from keras.utils import to_categorical, Sequence
  from sklearn.preprocessing import MultiLabelBinarizer
  import keras.backend as K

In [None]:
if use_tf_2:
  !pip install -q tf-nightly-2.0-preview
  # # Load the TensorBoard notebook extension
  %load_ext tensorboard
  
# else:
#   !pip install tensorboardcolab
#   from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback


In [None]:
today = datetime.datetime.now().strftime("%Y_%m_%d")

OUTPUT_DIR = f'./outputs_{IMAGE_SIZE}/{today}/'
if not os.path.exists(OUTPUT_DIR + 'models'):
  os.makedirs(OUTPUT_DIR + 'models')

## Data ingestion and prep

**Potential datasets:**

Clothing attributes dataset: https://exhibits.stanford.edu/data/catalog/tb980qz1002

Fashionista: https://github.com/grahamar/fashion_dataset/tree/master/fashionista

Fashion mnist: https://github.com/zalandoresearch/fashion-mnist

Deepfashion: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html, http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/AttributePrediction.html


Product attributes: https://rloganiv.github.io/mae/

**Dataset used below:**

Product cats (with hierarchy) and attrs: https://www.kaggle.com/paramaggarwal/fashion-product-images-dataset



In [None]:
from google.colab import drive

drive.mount('/content/gdrive')

In [None]:
print(os.getcwd())
os.chdir('gdrive/My Drive/Colab Notebooks/fashion_images')
print(os.getcwd())

Here we will assume you are working in Google Colab, with the top level directory being: 

`gdrive/My Drive/Colab Notebooks/fashion_images`

Within this top level directory, download and unzip the product images dataset (and associated labels csv) to `fashion-dataset-small` or `fashion-dataset-large`, depending on the dataset used. These are the folder names that the cells below read from.

Google Drive has a [known issue](https://github.com/googlecolab/colabtools/issues/382) whereby it will repeatedly time out if a folder contains many thousands of files - the fashion images dataset contains circa 50,000 images.

In order to mitigate this, this repository includes a shell script which can be run from the `move_images.ipynb` notebook to batch the images into subdirectories of ~1500 files each.
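
The repository's shell script is not reproduced here. As a rough Python equivalent, the sketch below moves the images in a flat folder into numbered subdirectories. The function name `batch_images`, the `batch_NNN` folder naming and the default of 1500 files per folder are assumptions made for illustration, not the script's actual behaviour.

```python
import shutil
from pathlib import Path


def batch_images(image_dir, files_per_dir=1500, pattern="*.jpg"):
    """Move files matching `pattern` in `image_dir` into numbered subdirectories.

    Returns the number of subdirectories created.
    """
    image_dir = Path(image_dir)
    # Sort so that the assignment of files to batches is deterministic
    files = sorted(p for p in image_dir.glob(pattern) if p.is_file())
    n_dirs = 0
    for start in range(0, len(files), files_per_dir):
        batch_dir = image_dir / f"batch_{n_dirs:03d}"
        batch_dir.mkdir(exist_ok=True)
        for path in files[start:start + files_per_dir]:
            shutil.move(str(path), str(batch_dir / path.name))
        n_dirs += 1
    return n_dirs
```

Because the data loading cell further down uses a recursive glob (`**/*.jpg`), images nested one level deeper like this are still found.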

In [None]:
# from google.colab import files
# files.upload()  #this will prompt you to upload the kaggle.json

In [None]:
# !pip install -q kaggle
# !mkdir -p ~/.kaggle
# !cp kaggle.json ~/.kaggle/
# !ls ~/.kaggle
# !chmod 600 /root/.kaggle/kaggle.json  # set permission

# if IMAGE_SIZE == 'small':
#   ! kaggle datasets download paramaggarwal/fashion-product-images-small -p /content/gdrive/My\ Drive/Colab\ Notebooks/fashion_images/fashion-dataset-small/
#   ! unzip fashion-dataset-small/fashion-product-images-small.zip

# else:
#   # Large version
#   ! kaggle datasets download paramaggarwal/fashion-product-images-dataset -p /content/gdrive/My\ Drive/Colab\ Notebooks/fashion_images/fashion-dataset-large/
#   ! unzip fashion-dataset-large/fashion-product-images-dataset.zip

In [None]:
raw_data_df = pd.read_csv(f'./fashion-dataset-{IMAGE_SIZE}/labels.csv')

In [None]:
raw_data_df.head()

In [None]:
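# Copy the selection so that adding a column does not trigger pandas' SettingWithCopyWarning
labels_df = raw_data_df[['id', 'subCategory', 'articleType', 'season', 'usage']].copy()
labels_df['attributes'] = labels_df[['season','usage']].values.tolist()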



# Get the filepaths for all the images in their subfolders
image_base_dir = f'./fashion-dataset-{IMAGE_SIZE}/images/'

image_fnames = [f for f in glob.glob(image_base_dir + "**/*.jpg", recursive=True)]

image_ids = [int(f.split('/')[-1][:-4]) for f in image_fnames]



image_ids_df = pd.DataFrame({'id':image_ids,'fname':image_fnames}, index=None)



# Inner join to make sure we have both the image and its labels
labels_df = labels_df.merge(image_ids_df, how='inner', on='id')

labels_df.reset_index(drop=True, inplace=True)


# Optionally take a random subset of the data to speed up prototyping
if target_num_samples is not None:
  
  # Limit the classes so that no 1 class has > 4 x the mean class size
  max_category_num = int(4 * np.mean(labels_df.subCategory.value_counts().values))
  labels_df = labels_df.groupby('subCategory', group_keys=False).apply(lambda x: x.sample(min(len(x), max_category_num),
                                                                                         random_state=42))
  
  # Calculate what fraction of the total dataset we need to get the target num of samples
  # Also ensure this doesn't go above 1
  downsample_proportion = min(1,target_num_samples/labels_df.shape[0])
  
  # Perform the downsampling
  labels_df = labels_df.groupby('subCategory', group_keys=False).apply(lambda x: x.sample(int(len(x)*downsample_proportion),
                                                                                         random_state=42))
  
  



#   Shuffle the dataframe
labels_df = labels_df.sample(frac=1, random_state=42).reset_index(drop=True)


  
# Recover the list of image filenames from the dataframe, to ensure orders match
image_fnames = list(labels_df['fname'])

In [None]:
#  define data augmentation

if AUGMENT_FLAG:
  AUGMENTATIONS_TRAIN = Compose([
      HorizontalFlip(p=0.5),
      RandomContrast(limit=0.2, p=0.5),
      RandomBrightness(limit=0.2, p=0.5),
      ShiftScaleRotate(
          shift_limit=0.0625, scale_limit=0.1, 
          rotate_limit=15, border_mode=cv2.BORDER_REFLECT_101, p=0.8), 
      ToFloat(max_value=255)
  ])

  AUGMENTATIONS_TEST = Compose([
      ToFloat(max_value=255)
  ])
  
else:
  AUGMENTATIONS_TRAIN = None
  AUGMENTATIONS_TEST = None

## Create train/test split, and sequence generator

In [None]:
# Keras Sequence that loads and resizes the images batch by batch.
# `x_set` is a list of paths to the images; `y_cat`, `y_subcat` and `y_attr`
# are the matching label arrays for the three model outputs.

class FashionSequence(Sequence):

    def __init__(self, x_set, y_cat, y_subcat, y_attr, batch_size, augmentations=None):
        self.x = x_set
        self.y_cat = y_cat
        self.y_subcat = y_subcat
        self.y_attr = y_attr
        self.batch_size = batch_size
        self.augment = augmentations

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller)
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)

        # Note: skimage's resize already returns floats scaled to [0, 1]
        images = [resize(imread(file_name), (224, 224, 3)) for file_name in self.x[batch]]

        if self.augment is not None:
            images = [self.augment(image=img)["image"] for img in images]

        return np.array(images), [self.y_cat[batch], self.y_subcat[batch], self.y_attr[batch]]
      


In [None]:
num_samples = len(image_fnames)
print(f"{num_samples} samples successfully loaded")

In [None]:
num_samples_train = int(num_samples * TRAIN_SAMPLE)
num_samples_test = num_samples - num_samples_train

print(f"{num_samples_train} training and {num_samples_test} testing samples")

In [None]:
# Get category labels
cat_names = pd.Categorical(labels_df.subCategory).categories
cat_labels_all = to_categorical(pd.Categorical(labels_df.subCategory).codes)
num_categories = cat_labels_all.shape[1]


# Get subcategory labels
subcat_names = pd.Categorical(labels_df.articleType).categories
subcat_labels_all = to_categorical(pd.Categorical(labels_df.articleType).codes)
num_subcats = subcat_labels_all.shape[1]


# Get attribute labels
all_attributes = labels_df.attributes.values
attributes_set = list(set(list(labels_df.season) + list(labels_df.usage)))


mlb = MultiLabelBinarizer(classes=attributes_set)
mlb.fit([tuple(f) for f in all_attributes])
attribute_labels_all = mlb.transform(all_attributes)

attribute_names = mlb.classes_
num_attributes = attribute_labels_all.shape[1]

In [None]:
print(f"{num_categories} categories and {num_subcats} subcategories")

print(f"{num_attributes} possible attributes")

print(f"Possible attributes: {mlb.classes_}")

In [None]:
#  Get the train and test sample indices, stratified by category

train_indices = sorted(list(labels_df.groupby('subCategory', group_keys=False).apply(lambda x: x.sample(int(len(x)*TRAIN_SAMPLE), random_state=42)).index))

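# Use a set for membership tests; checking against the list is quadratic in the number of samples
train_index_set = set(train_indices)
test_indices = [x for x in labels_df.index if x not in train_index_set]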


In [None]:
cat_labels_train = cat_labels_all[train_indices]
subcat_labels_train = subcat_labels_all[train_indices]
attribute_labels_train = attribute_labels_all[train_indices]


cat_labels_test = cat_labels_all[test_indices]
subcat_labels_test = subcat_labels_all[test_indices]
attribute_labels_test = attribute_labels_all[test_indices]

img_list_train = list(np.array(image_fnames)[train_indices])
img_list_test = list(np.array(image_fnames)[test_indices])

In [None]:
train_generator = FashionSequence(x_set = img_list_train,
                                  y_cat = cat_labels_train,
                                  y_subcat = subcat_labels_train,
                                  y_attr = attribute_labels_train, 
                                  batch_size = batch_size,
                                  augmentations = AUGMENTATIONS_TRAIN)

test_generator = FashionSequence(x_set = img_list_test,
                                  y_cat = cat_labels_test,
                                  y_subcat = subcat_labels_test,
                                  y_attr = attribute_labels_test, 
                                  batch_size = batch_size,
                                  augmentations = AUGMENTATIONS_TEST)

## Define model architecture:


The message passing block (that encodes the category tree) is built on top of the off-the-shelf convolutional neural network ResNet-50 [5], pre-trained (i.e., with weights initialised as the weights learned after training the network) on the ImageNet [4].


Additionally, three parallel dense layers (one per hierarchy level) of dimension 1024 are connected to the output of the ResNet-50. These will be the inputs of the Message Propagation block.


Every Dense layer defined in the Message Propagation block is of dimension 1024 with a L2-norm regularization and regularization factor of 0.0005 (promoting the learning of more uniform weights, thus reducing the risk of over-fitting) followed by ReLU activation layers. 
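
To make the regularisation term concrete, here is a minimal NumPy sketch of a dense layer with ReLU activation and the L2 penalty it contributes to the loss. It assumes the Keras convention for `regularizers.l2`, where the penalty is the factor times the sum of squared weights. The helper names `dense_relu` and `l2_penalty` are hypothetical and not part of the notebook.

```python
import numpy as np


def dense_relu(x, weights, bias):
    """Forward pass of a dense layer followed by ReLU."""
    return np.maximum(0.0, x @ weights + bias)


def l2_penalty(weights, factor=0.0005):
    """L2 penalty added to the loss: factor * sum of squared weights."""
    return factor * np.sum(np.square(weights))


# Toy layer with 2 inputs and 3 units, all weights set to 1
toy_weights = np.ones((2, 3))
toy_penalty = l2_penalty(toy_weights)  # 0.0005 * 6 squared weights
```

Large weights are penalised quadratically, which is what pushes the layer towards smaller, more uniform weights.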


The final Dense layers of this block (the Dense layers shown in Figure 7) are also followed by a Dropout [6] of rate 0.3.

The full architecture (encompassing the ResNet-50, intermediate dense layers for each level and the message passing block) totals 46,915,690 trainable parameters.

Output activations for each level predictions depend on the problem at hand, i.e., as the category and sub-category level predictions are multi-class problems we use a softmax function as activation, while at the attribute level we have a multi-label problem and thus we use a sigmoid activation function.
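
The difference between the two activations can be seen on a small example. The sketch below is plain NumPy rather than the Keras layers used in this notebook, and the names `softmax_np` and `sigmoid_np` are chosen only for illustration.

```python
import numpy as np


def softmax_np(logits):
    """Multi-class: scores compete and sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sigmoid_np(logits):
    """Multi-label: each score is an independent probability."""
    return 1.0 / (1.0 + np.exp(-logits))


logits = np.array([2.0, 1.0, -1.0])
class_probs = softmax_np(logits)   # exactly one class is expected to be correct
label_probs = sigmoid_np(logits)   # any number of labels may be active
```

With softmax the scores always sum to one, so raising one class lowers the others. With sigmoid several labels can be close to one at the same time, which is what a multi-label attribute output needs.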

In [None]:
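# Use the globally pooled 2048-d ResNet-50 features as the backbone output,
# rather than the 1000-way ImageNet softmax produced when the classifier head is kept.
base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg', input_shape=(224, 224, 3))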


x_cat = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(base_model.output)
x_subcat = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(base_model.output)
x_attr = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(base_model.output)


### Forward pass (left hand side)
dense_1 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_cat)
dense_cat_b= Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_1)
dense_cat_b = Dropout(0.3)(dense_cat_b)

dense_2 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_subcat)
dense_3 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_cat_b)
add_1 = Add()([dense_2, dense_3])
dense_subcat_b = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(add_1)
dense_subcat_b = Dropout(0.3)(dense_subcat_b)

dense_4 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_attr)
dense_5 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_cat_b)
add_2 = Add()([dense_4, dense_5])
dense_attr_b = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(add_2)
dense_attr_b = Dropout(0.3)(dense_attr_b)


### Backward pass (right hand side)
dense_6 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_attr)
dense_attr_g = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_6)
dense_attr_g = Dropout(0.3)(dense_attr_g)

dense_7 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_subcat)
dense_subcat_g = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_7)
dense_subcat_g = Dropout(0.3)(dense_subcat_g)

dense_8 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(x_cat)
dense_9 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_subcat_g)
dense_10 = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(dense_attr_g)
add_3 = Add()([dense_9, dense_10])
add_4 = Add()([dense_8, add_3])
dense_cat_g = Dense(units=1024, activation='relu', kernel_regularizer=regularizers.l2(0.0005))(add_4)
dense_cat_g = Dropout(0.3)(dense_cat_g)


add_cat = Add()([dense_cat_b, dense_cat_g])
cat_out = Dense(units=num_categories, activation='softmax', kernel_regularizer=regularizers.l2(0.0005), name="Category_output")(add_cat)

add_subcat = Add()([dense_subcat_b, dense_subcat_g])
subcat_out = Dense(units=num_subcats, activation='softmax', kernel_regularizer=regularizers.l2(0.0005), name="Subcategory_output")(add_subcat)

add_attr = Add()([dense_attr_b, dense_attr_g])
attr_out = Dense(units=num_attributes, activation='sigmoid', kernel_regularizer=regularizers.l2(0.0005), name="Attributes_output")(add_attr)




In [None]:
model = Model(inputs=base_model.input, outputs=[cat_out, subcat_out, attr_out])

## Training strategy and evaluation metrics

The network is trained by minimising a weighted cross-entropy loss for each level in order to estimate the parameters that originate the most correct predictions for the category, sub-category and attribute levels.

A weighting mechanism is used to address class imbalance, a common issue that also arises in our dataset. In particular, we compute the occurrence frequency of each class/label and apply a customised cross-entropy loss where the penalisation is weighted by the inverse of its frequency. 

Hence, the loss for predicting more frequent classes is down-weighted while when predicting more rare classes the loss is penalised. This way, all classes per level should be equally important during the training process of the model. 
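
The paper does not spell out the exact weighting formula in the passages quoted here, so the following is only a sketch under the assumption that each class weight is the reciprocal of its relative frequency. The function names are hypothetical. Further down, this notebook takes a similar route by passing inverse class counts to Keras through `class_weight`.

```python
import numpy as np


def inverse_frequency_weights(one_hot_labels):
    """Weight per class = 1 / relative frequency (assumed formula)."""
    counts = one_hot_labels.sum(axis=0)
    freqs = counts / counts.sum()
    # Guard against division by zero for classes that never occur
    return 1.0 / np.maximum(freqs, 1e-12)


def weighted_cross_entropy(y_true, y_pred, class_weights, eps=1e-7):
    """Mean categorical cross-entropy, each sample scaled by its class weight."""
    y_pred = np.clip(y_pred, eps, 1.0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    sample_weights = y_true @ class_weights
    return np.mean(sample_weights * per_sample)


# Three samples of class 0 and one sample of class 1
y_true = np.eye(2)[[0, 0, 0, 1]]
weights = inverse_frequency_weights(y_true)
```

In this toy case the three class-0 samples get weight 4/3 each and the single class-1 sample gets weight 4, so both classes contribute the same total weight to the loss.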

Also, contrarily to what is presented in [7], we train our model in a single-shot fashion (end-to-end). 

The loss functions are optimised via backpropagation and batched-based Adam [8], with a batch size of 32 images and a learning rate of 0.001.


**Evaluation Metrics**: For category and sub-category classification we choose the class with highest estimated confidence score. 

For the multi-label attribute classification, the labels are predicted as positive if the predicted label confidence is greater than 0.75. This threshold was chosen in conjunction with the business (to find a good ratio between adding new attributes without making serious mistakes). Nevertheless, we will present some metrics that are threshold independent to allow a better comparison between each approach.

For the multi-class problems, for category and sub-category levels, we report overall precision (OP), recall (OR), and F1-score (OF1), weighted by class support, i.e., the number of true instances for each class.

Therefore, given that we are using a weighted version of the macro precision and recall, the resulting F1-scores may not be between precision and recall.
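
A small worked example shows how a support-weighted F1 can fall outside the range spanned by the weighted precision and recall. This is a NumPy sketch with a hypothetical helper `weighted_prf`. It assumes per-class scores are computed first and then averaged with class support as weights, which is how the passage above reads.

```python
import numpy as np


def weighted_prf(y_true, y_pred, num_classes):
    """Support-weighted precision, recall and F1 over integer class labels."""
    precisions, recalls, f1s, supports = [], [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
        supports.append(tp + fn)  # number of true instances of class c
    w = np.array(supports) / np.sum(supports)
    return float(w @ np.array(precisions)), float(w @ np.array(recalls)), float(w @ np.array(f1s))


y_true_toy = np.array([0, 0, 1, 1])
y_pred_toy = np.array([0, 1, 1, 1])
op, orec, of1 = weighted_prf(y_true_toy, y_pred_toy, num_classes=2)
```

Here class 0 has precision 1 and recall 0.5, while class 1 has precision 2/3 and recall 1. The weighted precision is 5/6 and the weighted recall is 3/4, yet the weighted F1 is 11/15, which is lower than both.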

For the multi-label classification, at attributes level, we also employ overall precision (OP), recall (OR), and F1-score (OF1) for performance comparison. Moreover, we use precision (P@k), recall (R@k) and F1-score (F1@k) @ top K labels (where K is the number of ground truth labels that each product is annotated with). 
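
As an illustration of the top-K metrics, the sketch below computes precision and recall at K per sample, with K set to that sample's number of ground truth labels, and then averages over samples. The per-sample averaging and the helper name `precision_recall_at_k` are assumptions, since the quoted text does not say how the paper aggregates these values.

```python
import numpy as np


def precision_recall_at_k(y_true, scores):
    """Mean P@K and R@K, where K is each sample's number of true labels."""
    precisions, recalls = [], []
    for truth, score in zip(y_true, scores):
        k = int(truth.sum())
        if k == 0:
            continue  # samples without any true label are skipped
        top_k = np.argsort(score)[::-1][:k]  # indices of the K highest scores
        hits = truth[top_k].sum()
        precisions.append(hits / k)
        recalls.append(hits / truth.sum())
    return float(np.mean(precisions)), float(np.mean(recalls))


toy_truth = np.array([[1, 0, 1, 0],
                      [0, 1, 0, 0]])
toy_scores = np.array([[0.9, 0.8, 0.3, 0.1],
                       [0.2, 0.7, 0.1, 0.4]])
p_at_k, r_at_k = precision_recall_at_k(toy_truth, toy_scores)
```

Under these definitions, choosing K equal to the number of true labels makes the per-sample precision and recall identical, because both divide the number of hits by the same count.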

We also report the average precision (AP), which summarises the precision-recall curve. The previous metrics allow us to assess the method performance irrespective of defining a threshold on the confidence scores for positive/negative classification. For all these metrics, the larger value, the better the performance. 
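
Average precision for a single label can be sketched as the mean of the precision values measured at the rank of each positive example. The NumPy version below assumes there are no tied scores, and the function name `average_precision` is chosen for illustration only.

```python
import numpy as np


def average_precision(y_true, scores):
    """AP for one label: mean of precision at the rank of each positive."""
    order = np.argsort(scores)[::-1]          # highest score first
    ranked = np.asarray(y_true)[order]
    n_pos = ranked.sum()
    if n_pos == 0:
        return float("nan")                   # undefined without positives
    cum_hits = np.cumsum(ranked)
    precision_at_rank = cum_hits / np.arange(1, len(ranked) + 1)
    return float(np.sum(precision_at_rank * ranked) / n_pos)


toy_ap = average_precision(np.array([1, 0, 1, 0]),
                           np.array([0.9, 0.8, 0.7, 0.1]))
```

In the toy example the positives sit at ranks 1 and 3, where the precision is 1 and 2/3 respectively, so the AP is 5/6. No confidence threshold is involved, only the ordering of the scores.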

In [None]:
opt = adam(lr=0.001)


losses = {'Category_output':'categorical_crossentropy',
         'Subcategory_output':'categorical_crossentropy',
         'Attributes_output':'binary_crossentropy'}

In [None]:
metrics = {"Category_output":"categorical_accuracy",
          "Subcategory_output":"categorical_accuracy",
          "Attributes_output":"top_k_categorical_accuracy"}

## Model compilation and summary

In [None]:
model.compile(optimizer=opt, loss=losses, metrics=metrics)

In [None]:
model.summary()

In [None]:
# Inverse-frequency class weights for each output.
# The 0.001 offset keeps the denominator non-zero for classes absent from the training set.
cat_weights = dict(zip(np.arange(num_categories), 1 / np.sum(cat_labels_train + 0.001, axis=0)))
subcat_weights = dict(zip(np.arange(num_subcats), 1 / np.sum(subcat_labels_train + 0.001, axis=0)))
attr_weights = dict(zip(np.arange(num_attributes), 1 / np.sum(attribute_labels_train + 0.001, axis=0)))

weights_dict = {'Category_output': cat_weights,
                'Subcategory_output': subcat_weights,
                'Attributes_output': attr_weights}

In [None]:
checkpoint_callback = ModelCheckpoint(filepath=OUTPUT_DIR + 'models/model-e-{epoch:02d}-loss-{val_loss:.2f}.hdf5', monitor='val_loss')


# if use_tf_2:
tb_callback = TensorBoard(log_dir=OUTPUT_DIR + 'logs',
                          histogram_freq=0,
                          batch_size=batch_size,
                          update_freq=2048,
                          write_grads=True,
                          write_images=True)

# else:
#   tbc=TensorBoardColab()
#   tb_callback = TensorBoardColabCallback(tbc)

## Training

In [None]:
if use_tf_2:
  # Point TensorBoard at the directory the callback writes to
  %tensorboard --logdir {OUTPUT_DIR}logs

In [None]:
model.fit_generator(train_generator, 
                    steps_per_epoch=None, 
                    epochs=25, 
                    verbose=1, 
                    callbacks=[checkpoint_callback, tb_callback],
                    validation_data=test_generator, 
                    validation_steps=None, 
                    class_weight=weights_dict, 
                    max_queue_size=64, 
                    workers=8, 
                    use_multiprocessing=True, 
                    shuffle=True,
                    initial_epoch=0)

In [None]:
for k in model.history.history.keys():
    print (f"Final {k}:  {model.history.history[k][-1]}")
    

## Predicting with trained model

In [None]:
predictions = model.predict_generator(test_generator)

In [None]:
# Show at most 250 examples, without running past the end of the test set
for i in range(min(250, len(img_list_test))):
    print(f"Input image {i}:")

    plt.figure(figsize=(7,7))
    plt.imshow(plt.imread(img_list_test[i]))
    plt.show()
    plt.close()

    cat_pred = cat_names[np.argmax(predictions[0][i])]
    cat_pred_score = str(max(predictions[0][i]))[:5]
    
    subcat_pred = subcat_names[np.argmax(predictions[1][i])]
    subcat_pred_score = str(max(predictions[1][i]))[:5]
    
    attr_preds = attribute_names[np.where(predictions[2][i] > 0.7)]
    attr_pred_scores = np.around(predictions[2][i][np.where(predictions[2][i] > 0.7)],3)
    
    gt_cat = cat_names[np.where(cat_labels_test[i])][0]
    gt_subcat = subcat_names[np.where(subcat_labels_test[i])][0]
    gt_attr = attribute_names[np.where(attribute_labels_test[i])]

    print(f"Actual category:       {gt_cat}")
    print(f"Predicted category:    {cat_pred} ({cat_pred_score})")

    print()
    
    print(f"Actual subcategory:    {gt_subcat}")
    print(f"Predicted subcategory: {subcat_pred} ({subcat_pred_score})")
    
    print()

    print("Actual attributes:    ", ", ".join(gt_attr))
    print("Predicted attributes: ", ", ".join("{} ({})".format(x, str(y)[:5]) for x, y in zip(attr_preds, attr_pred_scores)))

    
    print()
    print()