# B2_Extract_Transform

This workbook **extracts** and **transforms** the data for the model. 

This entails the following steps:

| No.    | Step                                          |
| :------| :---------------------------------------------|
| B2.1   | Import Libraries                              |
| B2.2   | Load Combined Dataset                         |
| B2.3   | Extract and Rename Images from Subfolders     |
| B2.4   | Delete Unnecessary Subfolders                 |

## B2.1 Import Libraries

In [1]:
#Import libraries
import shutil
import os
from os import listdir
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
import pickle

## B2.2 Load Combined Dataset

In [2]:
#Read csv
combined = pd.read_csv('../deepfashionextract2/combined.csv')
#Try dataset
combined.iloc[0,3]

'../deepfashionextract2/img/top/Abstract_Floral_Fringe_Crop_Top/img_00000001.jpg'

In [3]:
#Find out numbers of images
combined.groupby('category')['image_name'].count()

category
dress    500
skirt    500
top      500
Name: image_name, dtype: int64

## B2.3 Extract and Rename Images from Subfolders

The images are moved out of their subfolders and renamed so that each file name also carries the **name of its subfolder**. This creates a **unique identifier**, since images in different subfolders previously shared the same file names. Because of the volume of data, the renaming is done in **batches**, one per category.
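To make the naming rule concrete, here is a minimal sketch of how a flattened name could be derived from an original path. The helper `flattened_name` is hypothetical (it is not part of the notebook) and assumes paths of the form `<category>/<subfolder>/<file>`, as in the first row of `combined.csv`.

```python
from pathlib import PurePosixPath

def flattened_name(image_path: str) -> str:
    """Return '<subfolder>_<file name>' for a path like '.../<subfolder>/<file>'.

    Hypothetical helper for illustration; assumes '/'-separated paths.
    """
    p = PurePosixPath(image_path)
    return f"{p.parent.name}_{p.name}"

# Path taken from the first row of combined.csv shown earlier in the notebook
example = '../deepfashionextract2/img/top/Abstract_Floral_Fringe_Crop_Top/img_00000001.jpg'
print(flattened_name(example))
# → Abstract_Floral_Fringe_Crop_Top_img_00000001.jpg
```

Two images named `img_00000001.jpg` in different subfolders therefore end up with different names once the subfolder is prefixed.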

In [4]:
#Rename in dirName1
dirName1 = '../deepfashionextract2/img/top'

#Paths listed in the combined dataset (column 3 holds the image path)
wanted = set(combined.iloc[:, 3])

for root, dirs, files in os.walk(dirName1):

    for dr in dirs:
        dr_root = root + '/' + dr

        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            dst = dirName1 + '/' + dr + '_' + file

            #Only rename images that are listed in the combined dataset
            if file_root in wanted:
                try:
                    os.rename(file_root, dst)
                    print('File', dst, 'now renamed')
                except FileExistsError:
                    print('File', dst, 'already renamed')

In [5]:
#Rename in dirName2
dirName2 = '../deepfashionextract2/img/skirt'

#Paths listed in the combined dataset (column 3 holds the image path)
wanted = set(combined.iloc[:, 3])

for root, dirs, files in os.walk(dirName2):

    for dr in dirs:
        dr_root = root + '/' + dr

        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            dst = dirName2 + '/' + dr + '_' + file

            #Only rename images that are listed in the combined dataset
            if file_root in wanted:
                try:
                    os.rename(file_root, dst)
                    print('File', dst, 'now renamed')
                except FileExistsError:
                    print('File', dst, 'already renamed')

In [6]:
#Rename in dirName3
dirName3 = '../deepfashionextract2/img/dress'

#Paths listed in the combined dataset (column 3 holds the image path)
wanted = set(combined.iloc[:, 3])

for root, dirs, files in os.walk(dirName3):

    for dr in dirs:
        dr_root = root + '/' + dr

        for file in os.listdir(dr_root):
            file_root = dr_root + '/' + file
            dst = dirName3 + '/' + dr + '_' + file

            #Only rename images that are listed in the combined dataset
            if file_root in wanted:
                try:
                    os.rename(file_root, dst)
                    print('File', dst, 'now renamed')
                except FileExistsError:
                    print('File', dst, 'already renamed')
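The three cells above repeat the same loop with a different directory each time. As a sketch only, the logic could be folded into one function. `flatten_category` and its parameters are hypothetical names, and the sketch assumes images sit exactly one level below the category folder and that the paths in the dataset use `/` as separator, as in the cells above.

```python
import os

def flatten_category(category_dir, wanted_paths):
    """Move each file listed in wanted_paths from a direct subfolder of
    category_dir up into category_dir, prefixing the subfolder name.

    Hypothetical helper; returns the number of files that were moved.
    """
    moved = 0
    for entry in sorted(os.listdir(category_dir)):
        sub_dir = category_dir + '/' + entry
        if not os.path.isdir(sub_dir):
            continue  # skip files that were already moved to the top level
        for file in sorted(os.listdir(sub_dir)):
            src = sub_dir + '/' + file
            dst = category_dir + '/' + entry + '_' + file
            # A set lookup replaces the row-by-row comparison against the dataset
            if src in wanted_paths and not os.path.exists(dst):
                os.rename(src, dst)
                moved += 1
    return moved
```

In this notebook it might be called as `flatten_category(dirName1, set(combined.iloc[:, 3]))`, and likewise for `dirName2` and `dirName3`. Building the set once keeps each lookup cheap, whereas comparing every file against the rows one at a time grows with the size of the dataset.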

## B2.4 Delete Unnecessary Subfolders

In [7]:
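#Delete unnecessary folders (including any files still inside them)
category_dirs = [dirName1, dirName2, dirName3]
for i in category_dirs:
    for root, dirs, files in os.walk(i):
        for dr in dirs:
            try:
                shutil.rmtree(os.path.join(root, dr))
            except OSError as e:
                print('Could not delete', os.path.join(root, dr), ':', e)
        #Do not descend into the subfolders that were just deleted
        dirs.clear()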
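Because `shutil.rmtree` removes a folder together with everything still inside it, it can be worth listing what would be lost before running the deletion. The following is a minimal sketch; `leftover_files` is a hypothetical helper, not part of the notebook.

```python
import os

def leftover_files(category_dir):
    """List files that still sit inside subfolders of category_dir.

    Hypothetical helper; files directly in category_dir are the renamed
    images and are not reported.
    """
    leftovers = []
    for root, dirs, files in os.walk(category_dir):
        if root == category_dir:
            continue  # top level: already-flattened images
        leftovers.extend(os.path.join(root, f) for f in files)
    return sorted(leftovers)
```

Calling `leftover_files(dirName1)` before the deletion cell would show any images that were not renamed and are about to be removed with their subfolder.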

In [8]:
#Find number of images
category_dirs = [dirName1, dirName2, dirName3]
for i in category_dirs:
    print(len(os.listdir(i)))

500
500
500
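The counts printed above can also be compared with the dataset programmatically instead of by eye. The sketch below is one way to do that. `count_images` and `count_mismatches` are hypothetical helpers, and the sketch assumes that each category name in the dataset matches a folder name under the image directory and that the images use the `.jpg` extension.

```python
import os

def count_images(category_dir, extensions=('.jpg',)):
    """Count files in category_dir whose name ends with one of the extensions."""
    return sum(
        1
        for name in os.listdir(category_dir)
        if os.path.isfile(os.path.join(category_dir, name))
        and name.lower().endswith(extensions)
    )

def count_mismatches(expected, base_dir):
    """Return {category: (expected, found)} for every category whose counts differ."""
    mismatches = {}
    for category, n_expected in expected.items():
        n_found = count_images(os.path.join(base_dir, category))
        if n_found != n_expected:
            mismatches[category] = (n_expected, n_found)
    return mismatches
```

With `expected = combined.groupby('category')['image_name'].count().to_dict()` and `base_dir = '../deepfashionextract2/img'`, an empty dictionary would indicate that every category folder holds as many images as the dataset lists.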


In [9]:
os.listdir(dirName1)

['Abstract_Floral_Fringe_Crop_Top_img_00000001.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000002.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000003.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000004.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000005.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000006.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000007.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000008.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000009.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000010.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000011.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000012.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000013.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000014.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000015.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000016.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000017.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000018.jpg',
 'Abstract_Floral_Fringe_Crop_Top_img_00000019