Assume you are a team of machine learning engineers working for an e-commerce furniture shop where users can browse interior furniture items. You are required to build a Furniture Recommender that lets users who have recently moved explore furniture on your e-commerce system with ease. Your system should help users navigate to the category of the furniture item they want to buy. In most current online shops, users must type the name of an item and browse the list of results. To improve the quality of the search results, our system provides an image-based search function: users upload an image of the furniture item they are looking for, and the system performs an image search and returns a list of similarly styled furniture from our dataset.
In the Furniture dataset, there are 6 categories: beds - 6578 images; chairs - 22053 images; dressers - 7871 images; lamps - 32402 images; sofas - 4080 images; tables - 17100 images, for a total of 90084 images. Each category contains 17 interior styles:
- (a) Asian; (b) Beach; (c) Contemp; (d) Craftsman; (e) Eclectic; (f) Farmhouse; 
- (g) Industrial; (h) Media; (i) Midcentury; (j) Modern; (k) Rustic; (l) Scandinavian; 
- (m) Southwestern; (n) Traditional; (o) Transitional; (p) Tropical and (q) Victorian

You have three tasks in this project:
- **Task 1:** Classify images according to furniture category (beds; chairs; dressers; lamps; sofas; tables)
- **Task 2:** Recommend 10 furniture items from our dataset that are similar to the furniture image supplied by the user. You are required to define a metric of “similarity” between two furniture items.
- **Task 3:** (only for those aiming for HD) Extend the model from Task 2 so that the recommended furniture items are in the same interior style as the input image. To fulfill this task, you are required to build a model that recognizes the style of a furniture item.
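
Task 2 leaves the definition of “similarity” open. One common choice, shown here only as a minimal sketch and not as the metric this project settles on, is cosine similarity between fixed-length feature vectors (for example, embeddings taken from a CNN). The names `cosine_similarity` and `top_k_similar` and the toy vectors are hypothetical, and the code is plain Python to stay dependency-free.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Assumption: an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, catalogue, k=10):
    """Return the ids of the k catalogue items most similar to the query, best first."""
    scored = ((cosine_similarity(query, vec), item_id) for item_id, vec in catalogue.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional "embeddings"; real feature vectors would be much longer.
catalogue = {
    "chair_a": [1.0, 0.0, 0.0],
    "chair_b": [0.9, 0.1, 0.0],
    "lamp_a": [0.0, 1.0, 0.0],
}
nearest = top_k_similar([1.0, 0.05, 0.0], catalogue, k=2)
```

With `k=10` the same function would return the 10 recommendations Task 2 asks for. For Task 3, the catalogue could first be filtered to items whose style matches the predicted style of the query.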

In [3]:
from PIL import Image
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import os
import hashlib
import shutil
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

In [2]:
dataset_folder = "../Data/Raw/Furniture_Data"

In [3]:
desired_size = (256, 256)

image_data = []
image_hashes = set()
image_color_hist = []

main_folder_name = os.path.basename(dataset_folder)

In [4]:
for root, dirs, files in os.walk(dataset_folder):
    for parent_folder in dirs:
        parent_folder_path = os.path.join(root, parent_folder)
        
        for filename in os.listdir(parent_folder_path):
            if filename == ".DS_Store":
                continue
                
            file_path = os.path.join(parent_folder_path, filename)
            
            if os.path.isdir(file_path):
                continue
                
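            try:
                # Open in a context manager so the file handle is closed promptly;
                # resize() returns a new, fully loaded image that outlives the file.
                with Image.open(file_path) as img:
                    resized_img = img.resize(desired_size)
                # Hash the resized pixel data so pixel-identical images can be detected
                image_hash = hashlib.md5(resized_img.tobytes()).hexdigest()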
                parent_folder_dir = os.path.dirname(parent_folder_path)
                parent_folder_name = os.path.basename(parent_folder_dir)
                
                # Ignore duplicates
                if image_hash not in image_hashes:
                    # Add new img to hash
                    image_hashes.add(image_hash)
                    subfolder_name = os.path.basename(parent_folder_path)
                    image_data.append((parent_folder_name, subfolder_name, resized_img))

            except Exception as e:
                print(f"Error loading image {file_path}: {e}")

In [5]:
print(len(image_data))
print(len(image_hashes))

85165
85165
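
The count above is 4919 lower than the 90084 images listed in the brief. Some of that gap may come from files that failed to load, but the rest are images whose resized pixel data hashed to a value already seen. The idea can be sketched in isolation as below; `dedupe_by_content` is a hypothetical helper, and the byte strings stand in for `resized_img.tobytes()`.

```python
import hashlib


def dedupe_by_content(blobs):
    """Keep the first occurrence of each distinct byte string, preserving order."""
    seen_hashes = set()
    unique = []
    for blob in blobs:
        digest = hashlib.md5(blob).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique.append(blob)
    return unique


# Stand-in byte strings; in the notebook these would be the resized images' pixel bytes.
pixels = [b"\x00\x01\x02", b"\xff\xff\xff", b"\x00\x01\x02"]
unique_pixels = dedupe_by_content(pixels)
```

Hashing exact bytes only catches images that are pixel-identical after resizing. Near-duplicates, such as the same photo saved at a different compression level, would still be kept.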


In [6]:
# Only using these 2 categories: I picked chairs back when we were doing the EDA, and sofas is the smallest category
only_these_cats = ["chairs", "sofas"]
mini_dataset = [(category, style, img) for category, style, img in image_data if category in only_these_cats]

In [7]:
print(len(mini_dataset))

25924


In [22]:
# Preprocesses data: resizes each image and normalizes pixel values to [0, 1].
# Also serves the data in batches since the dataset's gonna be humongous.
# Instead of cramming the whole dataset into training at once, each step takes a portion of the data
# as big as the specified batch_size and uses it for training.
class CustomDataset(tf.keras.utils.Sequence):
    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.indexes = np.arange(len(self.data))

        # Had to encode label because it was still "chairs" and "sofas".
        # Note: each split fits its own encoder, so the integer codes only agree
        # across splits while every split contains every category.
        self.label_encoder = LabelEncoder()
        self.labels = [category for category, _, _ in self.data]
        self.labels_encoded = self.label_encoder.fit_transform(self.labels)

    def __len__(self):
        # Round up so the final, smaller batch is not silently dropped
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, idx):
        batch_indexes = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        images = []

        # Ignoring the furniture's style for now
        for i in batch_indexes:
            _, _, image = self.data[i]
            # Force 3 channels (some files may be grayscale or RGBA), then
            # resize to 224 x 224 because ResNet likes it that way
            image = image.convert("RGB").resize((224, 224))
            images.append(np.array(image) / 255.0)
        labels = self.labels_encoded[batch_indexes]
        return np.array(images), np.array(labels)
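
How many batches one epoch contains depends on what happens to the final, partial batch. The sketch below compares the two options on toy sizes; `batch_slices` and its `drop_last` flag are hypothetical names used only for illustration.

```python
import math


def batch_slices(num_samples, batch_size, drop_last):
    """Return (start, stop) index pairs for each batch in one epoch."""
    if drop_last:
        num_batches = num_samples // batch_size
    else:
        num_batches = math.ceil(num_samples / batch_size)
    return [
        (i * batch_size, min((i + 1) * batch_size, num_samples))
        for i in range(num_batches)
    ]


# Toy sizes chosen for readability, not taken from the dataset.
dropped = batch_slices(num_samples=10, batch_size=4, drop_last=True)
kept = batch_slices(num_samples=10, batch_size=4, drop_last=False)
```

With floor division, samples 8 and 9 are never served. That is harmless during training but means a few images are skipped when evaluating on the validation or test split.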

In [23]:
# Split the dataset into training (60%), validation (20%), and testing (20%)
train_data, remaining_data = train_test_split(mini_dataset, test_size=0.4, shuffle=True)
val_data, test_data = train_test_split(remaining_data, test_size=0.5, shuffle=True)
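
As a sanity check on the 60/20/20 split, the expected sizes can be worked out by hand. This sketch assumes the test share is rounded up and the remainder goes to training, which is my understanding of how a fractional `test_size` is resolved; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Sizes of (train, test) when only a fractional test_size is given.

    Assumption: the test part is rounded up and the rest goes to train.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 25924  # size of mini_dataset printed earlier
n_train, n_remaining = split_sizes(n_total, 0.4)
n_val, n_test = split_sizes(n_remaining, 0.5)
```

Under that assumption the split gives 15554 training, 5185 validation, and 5185 test images. Comparing these numbers with `len(train_data)`, `len(val_data)`, and `len(test_data)` is a quick way to confirm the split behaved as intended.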

In [24]:
# Define batch size
batch_size = 32

# Data generators based on original dataset and batch size
train_generator = CustomDataset(train_data, batch_size)
val_generator = CustomDataset(val_data, batch_size)
test_generator = CustomDataset(test_data, batch_size)

In [25]:
# Define the model
# Might have to be tuned later on
model = tf.keras.applications.ResNet50(
    include_top=False,
    weights='imagenet',
    input_shape=(224, 224, 3)
)

In [26]:
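# Freeze the pretrained ResNet50 backbone so only the new head is trained
for layer in model.layers:
    layer.trainable = False

# Classification head: flatten the feature map, one hidden layer, softmax over the categories
layer_flat_dense = layers.Flatten()(model.output)
layer_flat_dense = layers.Dense(256, activation='relu')(layer_flat_dense)
output = layers.Dense(len(only_these_cats), activation='softmax')(layer_flat_dense)

# Rebind `model` to the full network (backbone + head)
model = models.Model(inputs=model.input, outputs=output)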
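
Since the model "might have to be tuned later on", it helps to know how large the new head is. The arithmetic below assumes the headless ResNet50 produces a 7 x 7 x 2048 feature map for a 224 x 224 x 3 input. The pooled variant is a hypothetical alternative for comparison, not something this notebook uses.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Assumption: a 224 x 224 x 3 input yields a 7 x 7 x 2048 feature map from the backbone.
feature_map = (7, 7, 2048)
flattened = feature_map[0] * feature_map[1] * feature_map[2]

# Head used in the notebook: Flatten -> Dense(256) -> Dense(2)
head_flatten = dense_params(flattened, 256) + dense_params(256, 2)

# Hypothetical alternative: global average pooling keeps one value per channel.
head_pooled = dense_params(feature_map[2], 256) + dense_params(256, 2)
```

Under these assumptions the flattened head has about 25.7 million trainable parameters, while a pooled head would have about 0.5 million. That gap is worth keeping in mind if the model overfits or training is slow.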

In [27]:
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
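
The `sparse_categorical_crossentropy` loss takes integer class labels directly, which is why the labels were encoded as integers rather than one-hot vectors. For a single sample it is the negative log of the probability the model assigned to the true class. The sketch below uses made-up softmax outputs and a hypothetical standalone function rather than the Keras implementation.

```python
import math


def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probabilities[label])


# Hypothetical softmax outputs for two classes.
confident_right = sparse_categorical_crossentropy([0.9, 0.1], label=0)
confident_wrong = sparse_categorical_crossentropy([0.9, 0.1], label=1)
```

A confident correct prediction costs roughly 0.105, while the same confidence on the wrong class costs roughly 2.303, so the loss penalizes confident mistakes much more heavily than it rewards confident hits.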

In [28]:
# Train the model
num_epochs = 10
results = model.fit(train_generator, epochs=num_epochs, validation_data=val_generator)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x26a43447eb0>

In [None]:
plt.plot(results.history['accuracy'], label = 'train_acc')
plt.plot(results.history['val_accuracy'], label = 'val_acc')
plt.legend()
plt.show()

In [None]:
plt.plot(results.history['loss'], label = 'train_loss')
plt.plot(results.history['val_loss'], label = 'val_loss')
plt.legend()
plt.show()