# B1_Data_Preprocessing

This workbook creates image samples from the **"Attribute Prediction"** dataset of the **"DeepFashion"** database below:

"DeepFashion" [Link](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html)
\
"Attribute Prediction" [Link](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/AttributePrediction.html)

While workbook A creates a sample of **12,000** images, this workbook creates a sample of **1,500** images. A smaller sample may lead to **lower test accuracy**, but it also requires **less processing capacity**, so code can be tried out before moving to a larger sample.

In this section, **relevant images** are placed in **separate category folders** and **prepared** for further processing. 

This entails the following steps:

| No.    | Step                                          |
|:-------|:----------------------------------------------|
| B1.1   | Import Libraries                              |
| B1.2   | Analyze As-Is Repository Structure            |
| B1.3   | Create New Category Folders in New Repository |
| B1.4   | Copy Subfolders into New Repository           |
| B1.5   | Select Images                                 |
| B1.5.1 | *Select Images - Tops*                        |
| B1.5.2 | *Select Images - Skirts*                      |
| B1.5.3 | *Select Images - Dresses*                     |
| B1.5.4 | *Combine Tops, Skirts and Dresses*            |

The **final file structure** is set to be the following:
\
\
|--deepfashionextract2 (folder)
\
|--|--img (folder)
\
|--|--|--top (folder)
\
|--|--|--|--a01.jpg
\
|--|--|--|--b02.jpg
\
|--|--|--|--...
\
|--|--|--skirt (folder)
\
|--|--|--|--c01.jpg
\
|--|--|--|--d02.jpg
\
|--|--|--|--...
\
|--|--|--dress (folder)
\
|--|--|--|--e01.jpg
\
|--|--|--|--f02.jpg
\
|--|--|--|--...

## B1.1 Import Libraries 

In [1]:
#Import libraries
import shutil
import os
from os import listdir
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
import pickle

## B1.2 Analyze As-Is Repository Structure

The Deep Fashion Repository contains **4,807** folders.

In [2]:
#Find out number of folders 
search_path = '../DeepFashion/img/'
root, dirs, files = next(os.walk(search_path), ([],[],[]))
len(dirs)

4807

The folders have the following **names**.

In [3]:
#Get folder names
dirs

['1981_Graphic_Ringer_Tee',
 '2-in-1_Space_Dye_Athletic_Tank',
 '25_Mesh-Paneled_Jersey_Dress',
 '36_Plaid_Shirt_Dress',
 'Above_Average_Linen_Tee',
 'Abstract-Embroidered_Glitter_Shorts',
 'Abstract-Geo_Print_Mini_Skirt',
 'Abstract-Paneled_Running_Shorts',
 'Abstract-Patterned_Blouse',
 'Abstract-Plaid_Ruffled_Bell_Sleeve_Top',
 'Abstract-Printed_Capri_Leggings',
 'Abstract-Quilted_Drawstring_Hoodie',
 'Abstract-Striped_Ladder-Back_Dress',
 'Abstract-Stripe_Fuzzy_Sweater',
 'Abstract-Trimmed_Knit_Tank',
 'Abstract-Watercolor_Print_Blouse',
 'Abstract_Animal_Print_Dress',
 'Abstract_Arrow_Flounce_Romper',
 'Abstract_Asymmetrical_Hem_Top',
 'Abstract_Bodycon_Dress',
 'Abstract_Brushstroke_Pocket_Top',
 'Abstract_Brushstroke_Print_Pencil_Skirt',
 'Abstract_Brushstroke_Sweater',
 'Abstract_Buttoned_Top',
 'Abstract_Chevron_Draped_Dress',
 'Abstract_Chevron_Henley_Dress',
 'Abstract_Chevron_Print_Culottes',
 'Abstract_Chevron_Print_Kimono',
 'Abstract_Chevron_Print_Shorts',
 'Abstract_Che

## B1.3 Create New Category Folders in New Repository

A **new repository 'deepfashionextract2'** is created to allow for better file control. In the new repository, subfolders are grouped as **'top'**, **'skirt'** and **'dress'**.

In [4]:
#Create category folders
dirName1 = '../deepfashionextract2/img/top'
dirName2 = '../deepfashionextract2/img/skirt'
dirName3 = '../deepfashionextract2/img/dress'

for dirName in (dirName1, dirName2, dirName3):
    try:
        #makedirs also creates missing parent folders such as 'img'
        os.makedirs(dirName)
        print('Directory', dirName, 'created')
    except FileExistsError:
        print('Directory', dirName, 'already exists')

Directory ../deepfashionextract2/img/top already exists
Directory ../deepfashionextract2/img/skirt already exists
Directory ../deepfashionextract2/img/dress already exists


## B1.4 Copy Subfolders into New Repository      

Files are **copied** into the new repository.

In [5]:
#Create copies
src = '../DeepFashion/img/'

for d in dirs:
    s = os.path.join(src, d)

    #Pick the target category folder from the folder-name suffix
    if d.endswith('Top'):
        dst = os.path.join(dirName1, d)
    elif d.endswith('Skirt'):
        dst = os.path.join(dirName2, d)
    elif d.endswith('Dress'):
        dst = os.path.join(dirName3, d)
    else:
        continue

    try:
        shutil.copytree(s, dst)
        print('Files in', dst, 'copied')
    except FileExistsError:
        print('Files in', dst, 'already exist')

Files in ../deepfashionextract2/img/dress\25_Mesh-Paneled_Jersey_Dress already exist
Files in ../deepfashionextract2/img/dress\36_Plaid_Shirt_Dress already exist
Files in ../deepfashionextract2/img/skirt\Abstract-Geo_Print_Mini_Skirt already exist
Files in ../deepfashionextract2/img/top\Abstract-Plaid_Ruffled_Bell_Sleeve_Top already exist
Files in ../deepfashionextract2/img/dress\Abstract-Striped_Ladder-Back_Dress already exist
Files in ../deepfashionextract2/img/dress\Abstract_Animal_Print_Dress already exist
Files in ../deepfashionextract2/img/top\Abstract_Asymmetrical_Hem_Top already exist
Files in ../deepfashionextract2/img/dress\Abstract_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/top\Abstract_Brushstroke_Pocket_Top already exist
Files in ../deepfashionextract2/img/skirt\Abstract_Brushstroke_Print_Pencil_Skirt already exist
Files in ../deepfashionextract2/img/top\Abstract_Buttoned_Top already exist
Files in ../deepfashionextract2/img/dress\Abstract_Chevron_Drap

Files in ../deepfashionextract2/img/top\Amour_Peplum_Top already exist
Files in ../deepfashionextract2/img/top\Angel-Sleeved_Top already exist
Files in ../deepfashionextract2/img/top\Art_Nouveau_Floral_Print_Top already exist
Files in ../deepfashionextract2/img/skirt\Asymmetrical_Chiffon_Skirt already exist
Files in ../deepfashionextract2/img/top\Asymmetrical_Hem_Top already exist
Files in ../deepfashionextract2/img/top\Asymmetrical_Loose_Knit_Top already exist
Files in ../deepfashionextract2/img/skirt\Asymmetrical_Pencil_Skirt already exist
Files in ../deepfashionextract2/img/top\Asymmetrical_Twist-Front_Top already exist
Files in ../deepfashionextract2/img/top\Asymmetric_Batwing_Top already exist
Files in ../deepfashionextract2/img/dress\Asymmetric_Printed_Wrap_Dress already exist
Files in ../deepfashionextract2/img/skirt\Asymmetric_Rope-Textured_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Athletic-Striped_Pencil_Skirt already exist
Files in ../deepfashionextract2/i

Files in ../deepfashionextract2/img/top\Boho_Babe_Crop_Top already exist
Files in ../deepfashionextract2/img/top\Boho_Suite_Peasant_Top already exist
Files in ../deepfashionextract2/img/dress\Bold_Moves_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/dress\Bon_Voyage_Maxi_Dress already exist
Files in ../deepfashionextract2/img/top\Botanical_Babe_Top already exist
Files in ../deepfashionextract2/img/dress\Botanical_Floral_Shift_Dress already exist
Files in ../deepfashionextract2/img/dress\Botanical_Print_A-Line_Dress already exist
Files in ../deepfashionextract2/img/dress\Botanical_Print_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Botanical_Print_Shift_Dress already exist
Files in ../deepfashionextract2/img/top\Bout_That_Graphic_Crop_Top already exist
Files in ../deepfashionextract2/img/top\Bow-Back_Abstract_Print_Top already exist
Files in ../deepfashionextract2/img/top\Bow-Back_Cutout_Top already exist
Files in ../deepfashionextract2/img/t

Files in ../deepfashionextract2/img/top\Cat_Print_Crop_Top already exist
Files in ../deepfashionextract2/img/top\Chained_Cutout-Back_Top already exist
Files in ../deepfashionextract2/img/top\Chambray_Halter_Top already exist
Files in ../deepfashionextract2/img/dress\Chambray_Peasant_Dress already exist
Files in ../deepfashionextract2/img/dress\Chambray_Shirt_Dress already exist
Files in ../deepfashionextract2/img/top\Chambray_Tank_Top already exist
Files in ../deepfashionextract2/img/dress\Chambray_Y-Back_Cami_Dress already exist
Files in ../deepfashionextract2/img/top\Champagne_Slub_Knit_Top already exist
Files in ../deepfashionextract2/img/dress\Charming_Boucl&eacute;_Dress already exist
Files in ../deepfashionextract2/img/dress\Checked_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Checkered_Metallic_Dress already exist
Files in ../deepfashionextract2/img/top\Chenille_&_Metallic_Knit_Crop_Top already exist
Files in ../deepfashionextract2/img/skirt\Chenille

Files in ../deepfashionextract2/img/dress\Crepe_Buttoned_Pocket_Dress already exist
Files in ../deepfashionextract2/img/dress\Crepe_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Crepe_Ladder-Cutout_Sheath_Dress already exist
Files in ../deepfashionextract2/img/skirt\Crepe_Midi_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Crepe_Pencil_Skirt already exist
Files in ../deepfashionextract2/img/dress\Crepe_Shift_Dress already exist
Files in ../deepfashionextract2/img/dress\Crepe_Woven_A-line_Dress already exist
Files in ../deepfashionextract2/img/dress\Crepe_Woven_Cami_Dress already exist
Files in ../deepfashionextract2/img/dress\Crepe_Woven_Shift_Dress already exist
Files in ../deepfashionextract2/img/skirt\Crepe_Woven_Skater_Skirt already exist
Files in ../deepfashionextract2/img/dress\Crepe_Woven_Surplice_Dress already exist
Files in ../deepfashionextract2/img/dress\Crinkled_Chiffon_Cami_Dress already exist
Files in ../deepfashionextract2/img/d

Files in ../deepfashionextract2/img/dress\Damask_Pattern_A-Line_Dress already exist
Files in ../deepfashionextract2/img/dress\Dancing_Darling_Skater_Dress already exist
Files in ../deepfashionextract2/img/dress\Daring_Mesh-Trimmed_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/dress\Daring_Mesh_Panel_Skater_Dress already exist
Files in ../deepfashionextract2/img/dress\Darted_Sheath_Dress already exist
Files in ../deepfashionextract2/img/dress\Daydreamer_Crocheted_Dress already exist
Files in ../deepfashionextract2/img/dress\Day_Trip_Tribal_Print_Dress already exist
Files in ../deepfashionextract2/img/dress\Deep_V-Neck_Halter_Dress already exist
Files in ../deepfashionextract2/img/dress\Deep_V-Neck_Pleated_Dress already exist
Files in ../deepfashionextract2/img/skirt\Denim_A-Line_Skirt already exist
Files in ../deepfashionextract2/img/dress\Denim_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/dress\Denim_Cami_Dress already exist
Files in ../deepfashione

Files in ../deepfashionextract2/img/dress\Fan_Print_A-Line_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux-Wrap_Surplice_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux-Wrap_Surplice_Printed_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux_Gem_Shift_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux_Leather_&_Tulle_Combo_Dress already exist
Files in ../deepfashionextract2/img/skirt\Faux_Leather_A-Line_Skirt already exist
Files in ../deepfashionextract2/img/dress\Faux_Leather_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux_Leather_Cami_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux_Leather_Combo_Dress already exist
Files in ../deepfashionextract2/img/dress\Faux_Leather_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/skirt\Faux_Leather_Flared_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Faux_Leather_Fringe_Skirt already exist
Files in

Files in ../deepfashionextract2/img/dress\Floral_Print_Skater_Dress already exist
Files in ../deepfashionextract2/img/skirt\Floral_Print_Skater_Skirt already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Slit_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Smocked_Dress already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Smock_Dress already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Strapless_Dress already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Strappy_Dress already exist
Files in ../deepfashionextract2/img/dress\Floral_Print_Surplice_Dress already exist
Files in ../deepfashionextract2/img/skirt\Floral_Scuba_Knit_Skater_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Floral_Scuba_Knit_Skirt already exist
Files in ../deepfashionextract2/img/dress\Floral_Side-Slit_Maxi_Dress already exist
Files in ../deepfashionextract2/img/skirt\Floral_Stripe_Midi_Skirt already exist
Fil

Files in ../deepfashionextract2/img/skirt\Knit_Skater_Skirt already exist
Files in ../deepfashionextract2/img/dress\Knit_Trapeze_Dress already exist
Files in ../deepfashionextract2/img/dress\Knotted_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Knotted_Sheath_Dress already exist
Files in ../deepfashionextract2/img/dress\Knotted_Stripe_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Cutout_Sheath_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Babydoll_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Cami_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Combo_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Crepe_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Lace-Paneled_Floral_Print_Dress already exist
Files in ../deepfashionextract2/img

Files in ../deepfashionextract2/img/dress\Marled_Ringer_Tank_Dress already exist
Files in ../deepfashionextract2/img/dress\Marled_Side-Slit_Dress already exist
Files in ../deepfashionextract2/img/dress\Marled_Side-Slit_Maxi_Dress already exist
Files in ../deepfashionextract2/img/skirt\Marled_Skater_Skirt already exist
Files in ../deepfashionextract2/img/dress\Marled_T-Shirt_Dress already exist
Files in ../deepfashionextract2/img/dress\Matchstick_Print_Surplice_Dress already exist
Files in ../deepfashionextract2/img/dress\Matelass&eacute;_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/skirt\Matelass&eacute;_Pencil_Skirt already exist
Files in ../deepfashionextract2/img/dress\Matelass&eacute;_Rose_Skater_Dress already exist
Files in ../deepfashionextract2/img/dress\Matelass&eacute;_Skater_Dress already exist
Files in ../deepfashionextract2/img/skirt\Matelass&eacute;_Stripe_Panel_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Mateless&eacute;_Origami_Sk

Files in ../deepfashionextract2/img/dress\Painted_Floral_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Painted_Floral_Slip_Dress already exist
Files in ../deepfashionextract2/img/dress\Paint_It_Red_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Paint_It_Red_Lace_Midi_Dress already exist
Files in ../deepfashionextract2/img/skirt\Paint_It_Red_Marble_Print_Skirt already exist
Files in ../deepfashionextract2/img/dress\Paint_It_Red_Morning_Petal_Dress already exist
Files in ../deepfashionextract2/img/dress\Paint_It_Red_Valentina_Dress already exist
Files in ../deepfashionextract2/img/dress\Paint_It_Red_Wanderlust_Dress already exist
Files in ../deepfashionextract2/img/dress\Paisley-Embroidered_A-Line_Dress already exist
Files in ../deepfashionextract2/img/dress\Paisley_Cami_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Paisley_Cami_Trapeze_Dress already exist
Files in ../deepfashionextract2/img/skirt\Paisley_Crochet_Skirt 

Files in ../deepfashionextract2/img/dress\Pom_Trim_Cutout_Dress already exist
Files in ../deepfashionextract2/img/dress\Posh_Flounce_Mini_Dress already exist
Files in ../deepfashionextract2/img/dress\Posh_Tube_Dress already exist
Files in ../deepfashionextract2/img/dress\Power_Brunch_Flounce_Dress already exist
Files in ../deepfashionextract2/img/dress\Printed_Drop-Waist_Cami_Dress already exist
Files in ../deepfashionextract2/img/dress\Printed_Flutter-Sleeve_Dress already exist
Files in ../deepfashionextract2/img/dress\Quilted_Drop-Waist_Dress already exist
Files in ../deepfashionextract2/img/skirt\Quilted_Faux_Leather_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Quilted_Faux_Leather_Trim_Skirt already exist
Files in ../deepfashionextract2/img/dress\Racerback_Bodycon_Dress already exist
Files in ../deepfashionextract2/img/dress\Racerback_Bodycon_Midi_Dress already exist
Files in ../deepfashionextract2/img/dress\Racerback_Chiffon_Maxi_Dress already exist
Files in ../de

Files in ../deepfashionextract2/img/skirt\Self-Tie_Knit_Skirt already exist
Files in ../deepfashionextract2/img/dress\Self-Tie_Shift_Dress already exist
Files in ../deepfashionextract2/img/dress\Semi-Sheer_Curved_Hem_Dress already exist
Files in ../deepfashionextract2/img/dress\Semi-Sheer_Ornate_Crochet_Dress already exist
Files in ../deepfashionextract2/img/dress\Semi-Sheer_Ribbed_Midi_Dress already exist
Files in ../deepfashionextract2/img/dress\Sequined_Abstract_Pattern_Dress already exist
Files in ../deepfashionextract2/img/skirt\Sequined_Chevron-Pattern_Skirt already exist
Files in ../deepfashionextract2/img/skirt\Sequined_Chevron_Bodycon_Skirt already exist
Files in ../deepfashionextract2/img/dress\Sequined_Chiffon_Maxi_Dress already exist
Files in ../deepfashionextract2/img/skirt\Sequined_Geo-Embroidered_Mini_Skirt already exist
Files in ../deepfashionextract2/img/dress\Sequined_Geo_Pattern_Dress already exist
Files in ../deepfashionextract2/img/skirt\Sequined_Mini_Skirt already

Files in ../deepfashionextract2/img/dress\Strapless_Sweetheart_Dress already exist
Files in ../deepfashionextract2/img/dress\Strapless_Tribal_Print_Dress already exist
Files in ../deepfashionextract2/img/dress\Strapless_Tribal_Print_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Strapless_X-Ray_Roses_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy-Back_Pleated_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy_Chiffon_Maxi_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy_Fit_&_Flare_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy_Floral-Embroidered_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy_Flounce_Gauze_Dress already exist
Files in ../deepfashionextract2/img/dress\Strappy_Tribal_Print_Maxi_Dress already exist
Files in ../deepfashionextract2/img/skirt\Stretch-Knit_A-Line_Skirt already exist
Files in ../deepfashionextract2/img/dress\Stretch-Knit_Bodycon_Dres
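
The suffix-based routing used in this step can also be expressed as a lookup table. The following is a minimal sketch under stated assumptions: `SUFFIX_TO_CATEGORY`, `category_for` and `copy_by_category` are hypothetical names that do not appear in the notebook, the demonstration runs on a throwaway directory rather than the real dataset, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import os
import shutil
import tempfile

# Hypothetical mapping from folder-name suffix to category folder (assumption).
SUFFIX_TO_CATEGORY = {'Top': 'top', 'Skirt': 'skirt', 'Dress': 'dress'}


def category_for(folder_name):
    """Return the category for a style folder, or None if it is not selected."""
    for suffix, category in SUFFIX_TO_CATEGORY.items():
        if folder_name.endswith(suffix):
            return category
    return None


def copy_by_category(src_root, dst_root):
    """Copy each matching style folder into dst_root/<category>/<folder>."""
    copied = []
    for name in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, name)
        category = category_for(name)
        if category is None or not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(dst_root, category, name)
        # dirs_exist_ok=True makes the copy safe to re-run (Python 3.8+).
        shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
        copied.append((category, name))
    return copied


# Demonstration on a temporary directory tree, not on the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'img')
    for name in ['Abstract_Buttoned_Top', 'Crepe_Midi_Skirt',
                 'Denim_Cami_Dress', 'Abstract_Chevron_Print_Shorts']:
        os.makedirs(os.path.join(src, name))
    result = copy_by_category(src, os.path.join(tmp, 'extract'))
    top_exists = os.path.isdir(
        os.path.join(tmp, 'extract', 'top', 'Abstract_Buttoned_Top'))
```

Folders that match none of the suffixes, such as the shorts folder in the demonstration, are skipped.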

## B1.5 Select Images

Due to **limited processing capacity**, the number of images has to be reduced. For each category, a **selection** is drawn from the following pool:

| Category | Available Images | Selected Images |
|:---------|:-----------------|:----------------|
| Tops     | 10,078           | 500             |
| Skirts   | 12,742           | 500             |
| Dresses  | 60,768           | 500             |

### B1.5.1 Select Images -- Tops

In [6]:
#Find image names in 'top'
file_root_list = []
for root, dirs, files in os.walk(dirName1):
    for dr in dirs:
        dr_root = root + '/' + dr
        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            file_root_list.append(file_root)
file_root_list

['../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000001.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000002.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000003.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000004.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000005.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000006.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000007.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000008.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000009.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000010.jpg',
 '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000011.jpg',
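
As an alternative to the nested `os.walk` / `os.listdir` loop, the same listing could be produced with `pathlib`. This is a sketch only: `list_images` is a hypothetical helper, it assumes that images sit exactly one folder below the category folder and carry a lowercase `.jpg` extension, and it runs on a temporary directory.

```python
import tempfile
from pathlib import Path


def list_images(category_dir):
    """Return sorted POSIX-style paths of all .jpg files one level below category_dir."""
    return sorted(p.as_posix() for p in Path(category_dir).glob('*/*.jpg'))


# Demonstration on a temporary directory tree, not on the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    style_dir = Path(tmp) / 'top' / 'Abstract_Buttoned_Top'
    style_dir.mkdir(parents=True)
    for n in (1, 2):
        (style_dir / f'img_{n:08d}.jpg').touch()
    (style_dir / 'notes.txt').touch()  # non-image file, ignored by the pattern

    images = list_images(Path(tmp) / 'top')
    image_names = [Path(p).name for p in images]
```

Unlike the loop in the notebook, the glob pattern skips files that are not `.jpg` images.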

In [7]:
#Clean and create a unique image identifier
new_file_root_list = []

for i in file_root_list:
    i = (i.replace('../deepfashionextract2/img/', '')
          .split('/'))
    i[2] = '../deepfashionextract2/img/' + 'top/' + i[1] + '/' + i[2]
    new_file_root_list.append(i)
    
#Create a dataframe
df = pd.DataFrame(new_file_root_list)
df = df.rename(columns = {0:'category', 1:'style', 2:'image_name'})
df.head()

|    | category | style                                  | image_name                                        |
|:---|:---------|:---------------------------------------|:--------------------------------------------------|
| 0  | top      | Abstract-Plaid_Ruffled_Bell_Sleeve_Top | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 1  | top      | Abstract-Plaid_Ruffled_Bell_Sleeve_Top | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 2  | top      | Abstract-Plaid_Ruffled_Bell_Sleeve_Top | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 3  | top      | Abstract-Plaid_Ruffled_Bell_Sleeve_Top | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 4  | top      | Abstract-Plaid_Ruffled_Bell_Sleeve_Top | ../deepfashionextract2/img/top/Abstract-Plaid_... |
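
The category and style columns can also be read directly from the path components, without stripping the prefix and rebuilding the path. A minimal sketch, assuming forward-slash paths of the form `.../<category>/<style>/<image>`; `df_sketch` is a hypothetical name used only here.

```python
from pathlib import PurePosixPath

import pandas as pd

paths = [
    '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000001.jpg',
    '../deepfashionextract2/img/top/Abstract-Plaid_Ruffled_Bell_Sleeve_Top/img_00000002.jpg',
]

rows = []
for path in paths:
    parts = PurePosixPath(path).parts
    # The last three components are category, style folder and file name.
    rows.append({'category': parts[-3], 'style': parts[-2], 'image_name': path})

df_sketch = pd.DataFrame(rows, columns=['category', 'style', 'image_name'])
```

Indexing from the end of the path keeps the parsing independent of how deep the repository root is.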


The dataset contains **10,078** images of tops.

In [8]:
#See numbers of rows and columns
df.shape

(10078, 3)

In [9]:
#Split features
df['style'] = df['style'].str.split('_')
df.head()

|    | category | style                                        | image_name                                        |
|:---|:---------|:---------------------------------------------|:--------------------------------------------------|
| 0  | top      | [Abstract-Plaid, Ruffled, Bell, Sleeve, Top] | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 1  | top      | [Abstract-Plaid, Ruffled, Bell, Sleeve, Top] | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 2  | top      | [Abstract-Plaid, Ruffled, Bell, Sleeve, Top] | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 3  | top      | [Abstract-Plaid, Ruffled, Bell, Sleeve, Top] | ../deepfashionextract2/img/top/Abstract-Plaid_... |
| 4  | top      | [Abstract-Plaid, Ruffled, Bell, Sleeve, Top] | ../deepfashionextract2/img/top/Abstract-Plaid_... |


**'Boxy', 'Abstract', 'Classic' and 'Beaded'** are the most frequent first tokens of the style names.

In [10]:
#Find out frequent styles in style1
df['style1'] = df['style'].apply(lambda x: x[0])
df['style1'].value_counts()

Boxy         2102
Abstract     1859
Classic       799
Beaded        366
Boat          284
             ... 
Botanical      32
Bout           32
Caged          31
Art            29
Bleached       27
Name: style1, Length: 65, dtype: int64

**'Print', 'Floral' and 'Lace'** are the most frequent second tokens of the style names.

In [11]:
#Find out frequent styles in style2
df['style2'] = df['style'].apply(lambda x: x[1])
df['style2'].value_counts()

Print       952
Floral      406
Lace        386
Crop        364
Neck        284
           ... 
Basket       28
Nautical     27
Plaid        24
Cuffed       23
Marled       19
Name: style2, Length: 109, dtype: int64

Among tops whose style name starts with 'Abstract', **'Print', 'Paisley', 'Geo' (geometric), 'Tile' and 'Floral'** are the most frequent second tokens.

In [12]:
#Find out frequent feature combinations of 'Abstract' 
df.loc[df['style1'] == 'Abstract']['style2'].value_counts()

Print              643
Paisley            134
Geo                120
Tile               114
Floral             103
Stripe              99
Grid                87
Dot                 72
Asymmetrical        52
Zippered            48
Striped             46
Satin-Front         43
Zigzag              40
Chevron             39
Brushstroke         37
Cutout-Back         34
Grid-Patterned      31
Buttoned            31
Varsity-Striped     30
Mandala             29
Slub                27
Name: style2, dtype: int64

Among tops whose style name starts with 'Classic', **'Crop', 'Scoop', 'Dolman' and 'Striped'** are the most frequent second tokens.

In [13]:
#Find out frequent feature combinations of 'Classic' 
df.loc[df['style1'] == 'Classic']['style2'].value_counts()

Crop                84
Scoop               77
Dolman              69
Striped             66
Long                64
Halter              63
Knit                61
Thermal             58
Georgette           57
Heathered           44
Ribbed              43
Woven               42
Stripe-Patterned    41
Slub                30
Name: style2, dtype: int64

Among tops whose style name starts with 'Boxy', **'Striped', 'Lace', 'Crepe', 'Floral' and 'Tank'** are the most frequent second tokens.

In [14]:
#Find out frequent feature combinations of 'Boxy' 
df.loc[df['style1'] == 'Boxy']['style2'].value_counts()

Striped            114
Lace                97
Crepe               88
Floral              81
Tank                77
Cropped             73
Pocket              67
Eyelash             66
Palm                63
Open                63
Denim               61
V-Neck              61
Polka               59
Textured            58
Slit-Sleeve         55
Crochet             51
Embroidered         50
Grid                50
Dolman              49
Belted              49
Open-Knit           49
Cutout              47
Tribal              45
Ribbed              45
Heathered           42
Texture-Striped     42
Pintucked           41
Beaded              41
Woven               41
Illusion            41
Chiffon             40
Cuff-Sleeve         37
Mesh                36
Medallion           36
Chambray            35
Knit                31
Basket              28
Nautical            27
Plaid               24
Cuffed              23
Marled              19
Name: style2, dtype: int64
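
The per-prefix breakdowns above filter one `style1` value at a time. A grouped count yields all combinations in one pass. This is a sketch on toy data; the frame `toy` and the names `pair_counts` and `abstract_counts` are assumptions for illustration.

```python
import pandas as pd

# Toy data standing in for the 'style1' and 'style2' columns.
toy = pd.DataFrame({
    'style1': ['Abstract', 'Abstract', 'Abstract', 'Boxy', 'Boxy', 'Classic'],
    'style2': ['Print', 'Print', 'Geo', 'Lace', 'Striped', 'Crop'],
})

# Count every (style1, style2) combination; the result has a two-level index.
pair_counts = toy.groupby('style1')['style2'].value_counts()

# Selecting one first token gives the same kind of breakdown as the filtered counts.
abstract_counts = pair_counts.loc['Abstract']
```

The result is indexed by both tokens, so a single lookup such as `pair_counts.loc['Boxy']` replaces a separate filtered cell.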

Based on the analysis above, **7 styles** are selected for their **popularity** and for how easily they can be **recognized**.

In [15]:
#Select features
df['selected_style'] = 0
df.loc[df['style2']=='Print', 'selected_style'] = 'Print'
df.loc[df['style2']=='Floral', 'selected_style'] = 'Floral'
df.loc[df['style2']=='Lace', 'selected_style'] = 'Lace'
df.loc[df['style2']=='Paisley', 'selected_style'] = 'Paisley'
df.loc[df['style2']=='Geo', 'selected_style'] = 'Geo'
df.loc[df['style2']=='Striped', 'selected_style'] = 'Striped'
df.loc[df['style2']=='Tank', 'selected_style'] = 'Tank'

In [16]:
#Select relevant rows and columns
df = df[['category', 'selected_style', 'image_name']]
df = df.dropna()
df = df.loc[df['selected_style']!=0].reset_index(drop=True)
df.head()

|    | category | selected_style | image_name                                        |
|:---|:---------|:---------------|:--------------------------------------------------|
| 0  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 1  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 2  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 3  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 4  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
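
The style selection and the row filter can be combined with `isin`. This is a minimal sketch on toy data, not the notebook's own code: `SELECTED`, `toy` and `selected` are hypothetical names, and missing values replace the `0` placeholder so that the column keeps a single data type.

```python
import pandas as pd

# The seven styles selected above.
SELECTED = ['Print', 'Floral', 'Lace', 'Paisley', 'Geo', 'Striped', 'Tank']

# Toy data standing in for the 'style2' column.
toy = pd.DataFrame({'style2': ['Print', 'Crop', 'Lace', 'Neck', 'Tank']})

# Keep the token where it is a selected style, otherwise leave a missing value.
toy['selected_style'] = toy['style2'].where(toy['style2'].isin(SELECTED))

# Dropping the missing values removes all rows without a selected style.
selected = toy.dropna(subset=['selected_style']).reset_index(drop=True)
```

Adding or removing a style then only requires changing the `SELECTED` list.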


The current selection contains **2,523** images of tops.

In [17]:
#Get numbers of images
df.groupby('selected_style')['image_name'].count().sum()

2523

The following shows the split of **image numbers per style**.

In [18]:
#Get numbers of images per style
df.groupby('selected_style')['image_name'].count()

selected_style
Floral     406
Geo        177
Lace       386
Paisley    134
Print      952
Striped    226
Tank       242
Name: image_name, dtype: int64

A **further selection** is made to arrive at exactly 500 images. In this process, **Geo** and **Paisley** are discarded, and the first 100 images of each of the remaining five styles are kept.

In [19]:
#Create separate dataframes
df_floral = df.loc[df['selected_style']=='Floral']
df_lace = df.loc[df['selected_style']=='Lace']
df_print_d = df.loc[df['selected_style']=='Print']
df_striped = df.loc[df['selected_style']=='Striped']
df_tank = df.loc[df['selected_style']=='Tank']

#Number of rows reduced
df_floral2 = df_floral.iloc[:100,:]
df_lace2 = df_lace.iloc[:100,:]
df_print_d2 = df_print_d.iloc[:100,:]
df_striped2 = df_striped.iloc[:100,:]
df_tank2 = df_tank.iloc[:100,:]

In [20]:
#Concatenate separate dataframes
new_df = pd.concat([df_floral2, df_lace2, df_print_d2, df_striped2, df_tank2])
new_df.head()

|    | category | selected_style | image_name                                        |
|:---|:---------|:---------------|:--------------------------------------------------|
| 0  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 1  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 2  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 3  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |
| 4  | top      | Floral         | ../deepfashionextract2/img/top/Abstract_Floral... |


The new selection contains **500** images of tops.

In [21]:
new_df.shape

(500, 3)

The following shows the new split of **image numbers per style**.

In [22]:
#Get numbers of images per style
new_df.groupby('selected_style')['image_name'].count()

selected_style
Floral     100
Lace       100
Print      100
Striped    100
Tank       100
Name: image_name, dtype: int64
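
Capping each style at a fixed number of images can be written as a single grouped operation. The sketch below uses toy data and hypothetical names (`KEEP`, `PER_STYLE`, `first_n`, `random_n`). The random variant is an alternative that the notebook does not use, and `groupby(...).sample` assumes pandas 1.1 or later.

```python
import pandas as pd

# Toy data standing in for the filtered dataframe of tops.
toy = pd.DataFrame({
    'selected_style': ['Floral'] * 3 + ['Lace'] * 2 + ['Geo'] * 4,
    'image_name': [f'img_{i}.jpg' for i in range(9)],
})

KEEP = ['Floral', 'Lace']   # styles that remain after discarding the others
PER_STYLE = 2               # 100 in the notebook, 2 here to keep the toy data small

kept = toy[toy['selected_style'].isin(KEEP)]

# First PER_STYLE rows of each style, in the original row order.
first_n = kept.groupby('selected_style').head(PER_STYLE)

# Alternative: a reproducible random draw of PER_STYLE rows per style.
random_n = kept.groupby('selected_style').sample(n=PER_STYLE, random_state=0)
```

Taking the first rows tends to favor the style folders that sort first alphabetically, whereas a random draw spreads the sample across folders.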

In [23]:
#Save to csv
new_df.to_csv('../deepfashionextract3/deepfashion_tops.csv')
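
By default, `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again with default settings. The sketch below shows the difference on toy data, using an in-memory buffer instead of a file; whether the index is needed downstream is an assumption to verify.

```python
import io

import pandas as pd

toy = pd.DataFrame({
    'category': ['top', 'top'],
    'selected_style': ['Floral', 'Lace'],
    'image_name': ['a.jpg', 'b.jpg'],
})

# Default behavior: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
```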

### B1.5.2 Select Images -- Skirts

In [24]:
#Find image names in 'skirt'
file_root_list2 = []
for root, dirs, files in os.walk(dirName2):
    for dr in dirs:
        dr_root = root + '/' + dr
        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            file_root_list2.append(file_root)
file_root_list2

['../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000001.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000002.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000003.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000004.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000005.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000006.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000007.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000008.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000009.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000010.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_00000011.jpg',
 '../deepfashionextract2/img/skirt/Abstract-Geo_Print_Mini_Skirt/img_0000001

In [25]:
#Clean and create a unique image identifier
new_file_root_list2 = []

for i in file_root_list2:
    i = (i.replace('../deepfashionextract2/img/', '')
          .split('/'))
    i[2] = '../deepfashionextract2/img/' + 'skirt/' + i[1] + '/' + i[2]
    new_file_root_list2.append(i)
    
#Create a dataframe
df2 = pd.DataFrame(new_file_root_list2)
df2 = df2.rename(columns = {0:'category', 1:'style', 2:'image_name'})
df2.head()

|    | category | style                         | image_name                                        |
|:---|:---------|:------------------------------|:--------------------------------------------------|
| 0  | skirt    | Abstract-Geo_Print_Mini_Skirt | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 1  | skirt    | Abstract-Geo_Print_Mini_Skirt | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 2  | skirt    | Abstract-Geo_Print_Mini_Skirt | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 3  | skirt    | Abstract-Geo_Print_Mini_Skirt | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 4  | skirt    | Abstract-Geo_Print_Mini_Skirt | ../deepfashionextract2/img/skirt/Abstract-Geo_... |


The dataset contains **12,742** images of skirts.

In [26]:
#See numbers of rows and columns
df2.shape

(12742, 3)

In [27]:
#Split features
df2['style'] = df2['style'].str.split('_')
df2.head()

|    | category | style                              | image_name                                        |
|:---|:---------|:-----------------------------------|:--------------------------------------------------|
| 0  | skirt    | [Abstract-Geo, Print, Mini, Skirt] | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 1  | skirt    | [Abstract-Geo, Print, Mini, Skirt] | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 2  | skirt    | [Abstract-Geo, Print, Mini, Skirt] | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 3  | skirt    | [Abstract-Geo, Print, Mini, Skirt] | ../deepfashionextract2/img/skirt/Abstract-Geo_... |
| 4  | skirt    | [Abstract-Geo, Print, Mini, Skirt] | ../deepfashionextract2/img/skirt/Abstract-Geo_... |


**'Pleated', 'Floral', 'Faux', 'Classic' and 'Buttoned'** are the most frequent styles in the first position of the folder name.

In [28]:
#Find out frequent styles in style1
df2['style1'] = df2['style'].apply(lambda x: x[0])
df2['style1'].value_counts()

Pleated             1171
Floral               801
Faux                 550
Classic              412
Buttoned             354
                    ... 
Damask                17
Self-Tie              15
Front                 15
Stretch               12
Mateless&eacute;       6
Name: style1, Length: 115, dtype: int64

**'Print', 'Pencil', 'Mini', 'Lace' and 'Leather'** are the most frequent styles in the second position of the folder name.

In [29]:
#Find out frequent styles in style2
df2['style2'] = df2['style'].apply(lambda x: x[1])
df2['style2'].value_counts()

Print               1439
Pencil               931
Mini                 881
Lace                 674
Leather              550
                    ... 
Zippered              20
Suspender             20
Crepe                 19
Matelass&eacute;      17
Origami                6
Name: style2, Length: 88, dtype: int64

**'Skirt', 'Mini', 'Maxi' and 'Pencil'** are the most frequent styles in the third position of the folder name.

In [30]:
#Find out frequent styles in style3
df2['style3'] = df2['style'].apply(lambda x: x[2])
df2['style3'].value_counts()

Skirt               5501
Mini                1266
Maxi                 916
Pencil               878
Skater               782
Print                533
Midi                 352
A-Line               325
Dot                  279
Leather              274
Denim                186
Knit                 170
Pleated              105
Zippered             105
Chiffon               84
Flared                70
Fringe                67
Progress              60
Crepe                 59
Plaid                 56
Bodycon               54
Scuba                 52
Gauze                 48
Linen                 47
Bandage               45
Faux                  39
Stripe                37
Leather-Trimmed       37
Corduroy              37
Metallic              36
Twirly                35
Panel                 34
Overlay               33
Knee-Length           28
Matelass&eacute;      27
Red                   24
Heathered             21
Layered               20
Drawstring            20
Name: style3, dtype: int64
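
The `lambda x: x[2]` lookups above assume that every folder name has at least three tokens. A minimal sketch of a more forgiving variant uses `Series.str.get`, which returns a missing value when a list is too short. The toy names are for illustration; only the first comes from the listing above.

```python
import pandas as pd

# Toy folder names; the shorter ones are made-up edge cases
styles = pd.Series(['Abstract-Geo_Print_Mini_Skirt', 'Pleated_Skirt', 'Skirt'])
tokens = styles.str.split('_')

# .str.get(i) yields NaN instead of raising IndexError for short lists
style1 = tokens.str.get(0)
style2 = tokens.str.get(1)
style3 = tokens.str.get(2)

print(pd.DataFrame({'style1': style1, 'style2': style2, 'style3': style3}))
```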

Based on the analysis above, **5 styles** are selected for their **popularity** and **level of recognition**, all taken from the first position of the folder name. The aim is a sample size of 500.

In [31]:
#Select features
#Start from None rather than 0 so the column only holds strings or missing
#values; newer pandas versions warn when strings are written into an integer column
df2['selected_style'] = None
df2.loc[df2['style1']=='Pleated', 'selected_style'] = 'Pleated'
df2.loc[df2['style1']=='Floral', 'selected_style'] = 'Floral'
df2.loc[df2['style1']=='Faux', 'selected_style'] = 'Faux'
df2.loc[df2['style1']=='Classic', 'selected_style'] = 'Classic'
df2.loc[df2['style1']=='Buttoned', 'selected_style'] = 'Buttoned'

In [32]:
#Select relevant rows and columns
df2 = df2[['category', 'selected_style', 'image_name']]
df2 = df2.dropna()
df2 = df2.loc[df2['selected_style']!=0].reset_index(drop=True)
df2.head()

Unnamed: 0,category,selected_style,image_name
0,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
1,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
2,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
3,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
4,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
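
Since all five skirt styles come from the same column, the selection and filtering above could also be written with `isin`. The following is a sketch on a toy frame, not the notebook's code; `toy`, `selected` and `subset` are illustrative names.

```python
import pandas as pd

# Toy stand-in for df2; only the column names follow the notebook
toy = pd.DataFrame({
    'style1': ['Pleated', 'Damask', 'Floral', 'Stretch', 'Buttoned'],
    'image_name': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'],
})

selected = ['Pleated', 'Floral', 'Faux', 'Classic', 'Buttoned']

# Keep rows whose first token is a selected style, then copy it over
subset = toy.loc[toy['style1'].isin(selected)].copy()
subset['selected_style'] = subset['style1']
subset = subset.reset_index(drop=True)

print(subset)
```

This avoids the placeholder value and the later filter on it, at the cost of only working when every selected style lives in one column.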


The selection currently contains **3,288** images of skirts.

In [33]:
#Get numbers of images
df2.groupby('selected_style')['image_name'].count().sum()

3288

The following shows the split of **image numbers per style**.

In [34]:
#Get numbers of images per style
df2.groupby('selected_style')['image_name'].count()

selected_style
Buttoned     354
Classic      412
Faux         550
Floral       801
Pleated     1171
Name: image_name, dtype: int64

A **further selection** keeps the first 100 images of each style to arrive at exactly 500.

In [35]:
#Create separate dataframes
df2_buttoned = df2.loc[df2['selected_style']=='Buttoned']
df2_classic = df2.loc[df2['selected_style']=='Classic']
df2_faux = df2.loc[df2['selected_style']=='Faux']
df2_floral = df2.loc[df2['selected_style']=='Floral']
df2_pleated = df2.loc[df2['selected_style']=='Pleated']

In [36]:
#Row and column counts per style (for inspection only)
a2, b2 = df2_buttoned.shape
c2, d2 = df2_classic.shape
e2, f2 = df2_faux.shape
g2, h2 = df2_floral.shape
i2, j2 = df2_pleated.shape

#Keep the first 100 rows of each style
df2_buttoned2 = df2_buttoned.iloc[:100,:]
df2_classic2 = df2_classic.iloc[:100,:]
df2_faux2 = df2_faux.iloc[:100,:]
df2_floral2 = df2_floral.iloc[:100,:]
df2_pleated2 = df2_pleated.iloc[:100,:]

In [37]:
#Concatenate separate dataframes
new_df2 = pd.concat([df2_buttoned2, df2_classic2, df2_faux2, df2_floral2, df2_pleated2])
new_df2.head()

Unnamed: 0,category,selected_style,image_name
0,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
1,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
2,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
3,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...
4,skirt,Buttoned,../deepfashionextract2/img/skirt/Buttoned_Dais...


The new selection contains **500** images of skirts.

In [38]:
new_df2.shape

(500, 3)
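
The per-style slicing and concatenation above can be sketched more compactly with `groupby`. `head(n)` keeps the same rows as `.iloc[:n]` on each style, while `sample` is an alternative that draws rows at random with a fixed seed. The toy data and the value of `n_per_style` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for df2 with uneven group sizes
toy = pd.DataFrame({
    'selected_style': ['Buttoned'] * 5 + ['Pleated'] * 8,
    'image_name': [f'img_{i}.jpg' for i in range(13)],
})
n_per_style = 3  # the notebook uses 100

# Same rows as slicing each style with .iloc[:n], kept in original row order
first_n = toy.groupby('selected_style').head(n_per_style)

# Alternative: a reproducible random draw per style (requires pandas >= 1.1)
random_n = toy.groupby('selected_style').sample(n=n_per_style, random_state=0)

print(first_n['selected_style'].value_counts())
```

Taking the first rows is simple and deterministic, but the images may then come from only a few style folders; a random draw spreads the sample across folders.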

The following shows the new split of **image numbers per style**.

In [39]:
#Get numbers of images per style
new_df2.groupby('selected_style')['image_name'].count()

selected_style
Buttoned    100
Classic     100
Faux        100
Floral      100
Pleated     100
Name: image_name, dtype: int64

In [40]:
#Save to csv
new_df2.to_csv('../deepfashionextract2/deepfashion_skirts.csv')

### B1.5.3 Select Images -- Dresses

In [41]:
#Find image names in 'dress'
file_root_list3 = []
for root, dirs, files in os.walk(dirName3):
    for dr in dirs:
        dr_root = root + '/' + dr
        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            file_root_list3.append(file_root)
file_root_list3

['../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000001.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000002.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000003.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000004.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000005.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000006.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000007.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000008.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000009.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000010.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000011.jpg',
 '../deepfashionextract2/img/dress/25_Mesh-Paneled_Jersey_Dress/img_00000012.jpg',
 '..
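
The nested `os.walk` / `os.listdir` loop above collects every file one level below the category folder. A minimal alternative sketch uses `pathlib` globbing and sorts the result, since directory listings come back in arbitrary order. The temporary folder tree and its names are made up so that the example runs on its own.

```python
import tempfile
from pathlib import Path

# Build a tiny throwaway tree: <tmp>/dress/<style>/<image>
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'dress'
    layout = {'Style_A': ['img_1.jpg', 'img_2.jpg'], 'Style_B': ['img_1.jpg']}
    for style, names in layout.items():
        (root / style).mkdir(parents=True)
        for name in names:
            (root / style / name).touch()

    # One level of style folders, then the .jpg files inside each
    found = sorted(p.relative_to(root).as_posix() for p in root.glob('*/*.jpg'))

print(found)
```

Sorting makes the later "first 100 rows per style" step reproducible across machines.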

In [42]:
#Clean and create a unique image identifier
new_file_root_list3 = []

for i in file_root_list3:
    i = (i.replace('../deepfashionextract2/img/', '')
          .split('/'))
    i[2] = '../deepfashionextract2/img/' + 'dress/' + i[1] + '/' + i[2]
    new_file_root_list3.append(i)
    
#Create a dataframe
df3 = pd.DataFrame(new_file_root_list3)
df3 = df3.rename(columns = {0:'category', 1:'style', 2:'image_name'})
df3.head()

Unnamed: 0,category,style,image_name
0,dress,25_Mesh-Paneled_Jersey_Dress,../deepfashionextract2/img/dress/25_Mesh-Panel...
1,dress,25_Mesh-Paneled_Jersey_Dress,../deepfashionextract2/img/dress/25_Mesh-Panel...
2,dress,25_Mesh-Paneled_Jersey_Dress,../deepfashionextract2/img/dress/25_Mesh-Panel...
3,dress,25_Mesh-Paneled_Jersey_Dress,../deepfashionextract2/img/dress/25_Mesh-Panel...
4,dress,25_Mesh-Paneled_Jersey_Dress,../deepfashionextract2/img/dress/25_Mesh-Panel...


The dataset contains **60,768** images of dresses.

In [43]:
#See numbers of rows and columns
df3.shape

(60768, 3)

In [44]:
#Split features
df3['style'] = df3['style'].str.split('_')
df3.head()

Unnamed: 0,category,style,image_name
0,dress,"[25, Mesh-Paneled, Jersey, Dress]",../deepfashionextract2/img/dress/25_Mesh-Panel...
1,dress,"[25, Mesh-Paneled, Jersey, Dress]",../deepfashionextract2/img/dress/25_Mesh-Panel...
2,dress,"[25, Mesh-Paneled, Jersey, Dress]",../deepfashionextract2/img/dress/25_Mesh-Panel...
3,dress,"[25, Mesh-Paneled, Jersey, Dress]",../deepfashionextract2/img/dress/25_Mesh-Panel...
4,dress,"[25, Mesh-Paneled, Jersey, Dress]",../deepfashionextract2/img/dress/25_Mesh-Panel...


**'Floral', 'Abstract', 'Embroidered', 'Belted' and 'Lace'** are the most frequent styles in the first position of the folder name.

In [45]:
#Find out frequent styles in style1
df3['style1'] = df3['style'].apply(lambda x: x[0])
df3['style1'].value_counts()

Floral           5105
Abstract         2775
Embroidered      2344
Belted           1493
Lace             1376
                 ... 
Clashist           25
Patriotic          24
Matchstick         23
Ladder-Cutout      20
Striped            10
Name: style1, Length: 309, dtype: int64

**'Print', 'Lace', 'Floral', 'Maxi' and 'Knit'** are the most frequent styles in the second position of the folder name.

In [46]:
#Find out frequent styles in style2
df3['style2'] = df3['style'].apply(lambda x: x[1])
df3['style2'].value_counts()

Print        7426
Lace         3423
Floral       2872
Maxi         2378
Knit         1818
             ... 
Longline       21
Prints         20
Ringer         18
Rib-Knit       17
Side-Vent       4
Name: style2, Length: 257, dtype: int64

Among dresses whose first style is 'Abstract', **'Print', 'Floral' and 'Stripe'** are the most frequent second styles.

In [47]:
#Find out frequent feature combinations of 'Abstract' 
df3.loc[df3['style1'] == 'Abstract']['style2'].value_counts()

Print           928
Floral          292
Stripe          146
Chevron         121
Splatter         88
Ikat             75
Diamond          71
Pattern          70
Wave             70
Drop             67
Plaid            67
Animal           67
Mirrored         66
Paisley          65
V-Cut            62
Zigzag           59
Surplice         56
Dotted           54
Geo              50
Sweater          49
Mosaic           46
Tile             46
Patterned        41
Self-Tie         38
Tribal           35
Southwestern     24
Bodycon          22
Name: style2, dtype: int64
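
The conditional count above (second token given a first token) can be illustrated without pandas. This is a small sketch using `collections.Counter`; the folder names are invented and the counts have no relation to the dataset.

```python
from collections import Counter

# Toy folder names, made up for illustration
names = [
    'Abstract_Print_Dress',
    'Abstract_Floral_Dress',
    'Abstract_Print_Maxi_Dress',
    'Floral_Lace_Dress',
    'Abstract_Stripe_Dress',
]

# Count (first token, second token) pairs
pairs = Counter()
for name in names:
    tokens = name.split('_')
    if len(tokens) >= 2:
        pairs[(tokens[0], tokens[1])] += 1

# Second-token frequencies for names that start with 'Abstract'
abstract = {second: n for (first, second), n in pairs.items() if first == 'Abstract'}
print(abstract)
```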

Based on the analysis above, **5 styles** are selected for their **popularity** and **level of recognition**: four from the second position of the folder name and 'Embroidered' from the first. The aim is a sample size of 500.

In [48]:
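#Select features
#Start from None rather than 0 so the column only holds strings or missing
#values; newer pandas versions warn when strings are written into an integer column
#Assignments run in order, so a later match overwrites an earlier one
df3['selected_style'] = None
df3.loc[df3['style2']=='Print', 'selected_style'] = 'Print'
df3.loc[df3['style2']=='Floral', 'selected_style'] = 'Floral'
df3.loc[df3['style1']=='Embroidered', 'selected_style'] = 'Embroidered'
df3.loc[df3['style2']=='Lace', 'selected_style'] = 'Lace'
df3.loc[df3['style2']=='Maxi', 'selected_style'] = 'Maxi'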

In [49]:
#Select relevant rows and columns
df3 = df3[['category', 'selected_style', 'image_name']]
df3 = df3.dropna()
df3 = df3.loc[df3['selected_style']!=0].reset_index(drop=True)
df3.head()

Unnamed: 0,category,selected_style,image_name
0,dress,Floral,../deepfashionextract2/img/dress/Abstract_Flor...
1,dress,Floral,../deepfashionextract2/img/dress/Abstract_Flor...
2,dress,Floral,../deepfashionextract2/img/dress/Abstract_Flor...
3,dress,Floral,../deepfashionextract2/img/dress/Abstract_Flor...
4,dress,Floral,../deepfashionextract2/img/dress/Abstract_Flor...


The selection currently contains **17,894** images of dresses.

In [50]:
#Get numbers of images
df3.groupby('selected_style')['image_name'].count().sum()

17894

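The following shows the split of **image numbers per style**. Because the selection cell assigns styles in sequence, a later assignment overwrites an earlier one for images that match both, so 'Floral' (2,651) and 'Embroidered' (2,016) fall below their raw counts of 2,872 and 2,344.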

In [51]:
#Get numbers of images per style
df3.groupby('selected_style')['image_name'].count()

selected_style
Embroidered    2016
Floral         2651
Lace           3423
Maxi           2378
Print          7426
Name: image_name, dtype: int64
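
The dress selection mixes conditions on two columns, so the order of the `.loc` assignments decides which label an image keeps when it matches more than one rule. A toy sketch of that effect follows; the four rows are invented and `None` is used as the placeholder for unselected rows.

```python
import pandas as pd

# Toy rows; row 0 matches both the 'Floral' and the 'Embroidered' rule
toy = pd.DataFrame({
    'style1': ['Embroidered', 'Abstract', 'Embroidered', 'Belted'],
    'style2': ['Floral', 'Floral', 'Lace', 'Knit'],
})

toy['selected_style'] = None
toy.loc[toy['style2'] == 'Floral', 'selected_style'] = 'Floral'
toy.loc[toy['style1'] == 'Embroidered', 'selected_style'] = 'Embroidered'  # overwrites row 0
toy.loc[toy['style2'] == 'Lace', 'selected_style'] = 'Lace'  # overwrites row 2

print(toy['selected_style'].tolist())
```

Reordering the assignments changes the per-style counts, so the order is effectively a priority list.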

A **further selection** keeps the first 100 images of each style to arrive at exactly 500.

In [52]:
#Create separate dataframes
df3_embroidered = df3.loc[df3['selected_style']=='Embroidered']
df3_floral = df3.loc[df3['selected_style']=='Floral']
df3_lace = df3.loc[df3['selected_style']=='Lace']
df3_maxi = df3.loc[df3['selected_style']=='Maxi']
df3_print_d = df3.loc[df3['selected_style']=='Print']

In [53]:
#Keep the first 100 rows of each style
df3_embroidered2 = df3_embroidered.iloc[:100,:]
df3_floral2 = df3_floral.iloc[:100,:]
df3_lace2 = df3_lace.iloc[:100,:]
df3_maxi2 = df3_maxi.iloc[:100,:]
df3_print_d2 = df3_print_d.iloc[:100,:]

In [54]:
#Concatenate separate dataframes
new_df3 = pd.concat([df3_embroidered2, df3_floral2, df3_lace2, df3_maxi2, df3_print_d2])
new_df3.head()

Unnamed: 0,category,selected_style,image_name
5416,dress,Embroidered,../deepfashionextract2/img/dress/Embroidered_B...
5417,dress,Embroidered,../deepfashionextract2/img/dress/Embroidered_B...
5418,dress,Embroidered,../deepfashionextract2/img/dress/Embroidered_B...
5419,dress,Embroidered,../deepfashionextract2/img/dress/Embroidered_B...
5420,dress,Embroidered,../deepfashionextract2/img/dress/Embroidered_B...


The new selection contains **500** images of dresses.

In [55]:
new_df3.shape

(500, 3)

The following shows the new split of **image numbers per style**.

In [56]:
#Get numbers of images per style
new_df3.groupby('selected_style')['image_name'].count()

selected_style
Embroidered    100
Floral         100
Lace           100
Maxi           100
Print          100
Name: image_name, dtype: int64

In [57]:
#Save to csv
new_df3.to_csv('../deepfashionextract2/deepfashion_dresses.csv')

### B1.5.4 Combine Tops, Skirts and Dresses

In [58]:
#Concatenate the tops, skirts and dresses selections
combined = pd.concat([new_df, new_df2, new_df3])

combined.head()

Unnamed: 0,category,selected_style,image_name
0,top,Floral,../deepfashionextract2/img/top/Abstract_Floral...
1,top,Floral,../deepfashionextract2/img/top/Abstract_Floral...
2,top,Floral,../deepfashionextract2/img/top/Abstract_Floral...
3,top,Floral,../deepfashionextract2/img/top/Abstract_Floral...
4,top,Floral,../deepfashionextract2/img/top/Abstract_Floral...


In [59]:
#See number of images
combined.groupby('category')['image_name'].count()

category
dress    500
skirt    500
top      500
Name: image_name, dtype: int64

In [60]:
#Save to csv
combined.to_csv('../deepfashionextract2/combined.csv')
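
When frames are concatenated and saved as above, the original row labels are carried along and written as an extra unnamed column. A minimal sketch, assuming that column is not needed downstream, renumbers the rows with `ignore_index=True` and writes the file with `index=False`. The toy frames and the in-memory buffer stand in for the real data and file path.

```python
import io
import pandas as pd

# Toy stand-ins for the per-category selections
tops = pd.DataFrame({'category': ['top', 'top'], 'image_name': ['t1.jpg', 't2.jpg']})
skirts = pd.DataFrame({'category': ['skirt', 'skirt'], 'image_name': ['s1.jpg', 's2.jpg']})

# ignore_index=True renumbers rows 0..n-1 instead of repeating 0, 1, 0, 1
combined_toy = pd.concat([tops, skirts], ignore_index=True)

# index=False keeps the row labels out of the file
buffer = io.StringIO()
combined_toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded.columns.tolist())
```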