## Overview

This notebook contains the code I wrote for the DSO 560 Text Analytics & NLP final project, which predicts **fit** for women's clothing. The client is ThreadTogether, an Australian non-profit organization.
- Part I focuses on data preprocessing and model building.
- Part II is the final prediction program. The main() function asks the user for a CSV file and writes a CSV file with an additional fit column.

Created on: 5.2.2020

Created by: Xinyi (Alex) Guo

In [1]:
import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer, PorterStemmer
import spacy

from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from numpy import asarray
from numpy import zeros
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.models import load_model

import pickle

import warnings
warnings.filterwarnings('ignore')
startTime = pd.datetime.now()

Using TensorFlow backend.


## Part 1: Data Preprocessing and Model building

### Load data
- The full data and the tagged attribute data were merged in Postico

In [2]:
mergedData = pd.read_csv("merged_data_159k.csv")

In [3]:
mergedData.shape

(159013, 18)

In [4]:
mergedData.head()

Unnamed: 0,product_id,brand,mpn,product_full_name,description,brand_category,created_at,updated_at,deleted_at,brand_canonical_url,details,labels,bc_product_id,product_id.1,product_color_id,attribute_name,attribute_value,file
0,01DPGV4YRP3Z8J85DASGZ1Y99W,Frame,LWAX0056,Les Second - Medium--NOIR,"Minimal, Modern Styling Meets Refined Luxury I...",Accessories,2019-10-06 15:31:31.730524+00,2019-12-19 20:40:30.786144+00,,https://frame-store.com/products/les-second-me...,,{},,01DPGV4YRP3Z8J85DASGZ1Y99W,01DPGVGBK6YGNYGNF2S6FSH02T,style,Casual,initial_tags
1,01DPGV4YRP3Z8J85DASGZ1Y99W,Frame,LWAX0056,Les Second - Medium--NOIR,"Minimal, Modern Styling Meets Refined Luxury I...",Accessories,2019-10-06 15:31:31.730000+00:00,2020-04-06 23:19:53.216000+00:00,2020-04-06 23:19:53.216000+00:00,https://frame-store.com/products/les-second-me...,,[],185.0,01DPGV4YRP3Z8J85DASGZ1Y99W,01DPGVGBK6YGNYGNF2S6FSH02T,style,Casual,initial_tags
2,01DSE8Z2ZDAZKZ2SKCS1E3B3HK,Banana Republic,491075,Madison 12-Hour Loafer Pump,Everything you love about our original Madison...,Unknown,2019-11-11 22:22:21.664000+00:00,2020-03-25 23:24:44.823000+00:00,2020-03-23 21:06:15.953000+00:00,https://bananarepublic.gap.com/browse/product....,Everything you love about our original Madison...,[],431.0,01DSE8Z2ZDAZKZ2SKCS1E3B3HK,01DSE8ZG8Y3FR8KWE2TY1QDWBF,shoe_width,Medium,initial_tags
3,01DSE8Z2ZDAZKZ2SKCS1E3B3HK,Banana Republic,491075,Madison 12-Hour Loafer Pump,Everything you love about our original Madison...,Unknown,2019-11-11 22:22:21.664425+00,2019-12-19 20:40:30.786144+00,,https://bananarepublic.gap.com/browse/product....,Everything you love about our original Madison...,{},,01DSE8Z2ZDAZKZ2SKCS1E3B3HK,01DSE8ZG8Y3FR8KWE2TY1QDWBF,shoe_width,Medium,initial_tags
4,01E2C3YN4KQ36A0REWZJ89ZN73,FREDA SALVADOR,5229129,Ace Bootie,Edgy style and expert craftsmanship combine on...,Unknown,2020-03-01 22:37:32.169000+00:00,2020-04-15 21:46:03.512000+00:00,2020-03-18 23:00:31.558000+00:00,https://shop.nordstrom.com/s/freda-salvador-ac...,"True to size. 2 1/4"" (57mm) heel (size 8.5) 5""...",[],1051.0,01E2C3YN4KQ36A0REWZJ89ZN73,01E2C3YN56ZCJ8TN45V3EC8CPS,Primary Color,Blacks,initial_tags


In [5]:
mergedData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 159013 entries, 0 to 159012
Data columns (total 18 columns):
product_id             159013 non-null object
brand                  159013 non-null object
mpn                    159013 non-null object
product_full_name      159013 non-null object
description            148089 non-null object
brand_category         153172 non-null object
created_at             159013 non-null object
updated_at             159013 non-null object
deleted_at             108946 non-null object
brand_canonical_url    158979 non-null object
details                133971 non-null object
labels                 159013 non-null object
bc_product_id          120574 non-null float64
product_id.1           159013 non-null object
product_color_id       159013 non-null object
attribute_name         159013 non-null object
attribute_value        159013 non-null object
file                   159013 non-null object
dtypes: float64(1), object(17)
memory usage: 21.8+ MB


### Fit

In [6]:
#copy the filtered rows so later in-place edits do not touch a view of mergedData
fit_df = mergedData[mergedData.attribute_name == 'fit'].copy()

In [7]:
fit_df.shape

(6361, 18)

In [8]:
fit_df = fit_df.reset_index(drop = True)

In [9]:
#remove duplicates: keep only the last row for each product_color_id
fit_df.drop_duplicates('product_color_id', keep = 'last', inplace = True)

In [10]:
fit_df.shape

(3686, 18)
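Deduplication shrinks the fit data from 6361 to 3686 rows. A minimal sketch on toy rows (all values below are made up for illustration) shows what `keep = 'last'` retains, and how one might check whether duplicate rows for the same `product_color_id` ever disagree on the label. Whether such conflicts exist in the real data is not shown in this notebook.

```python
import pandas as pd

# Toy rows (hypothetical): the same product_color_id can appear more than once.
rows = pd.DataFrame({
    "product_color_id": ["A1", "A1", "B2", "C3", "C3"],
    "attribute_value": ["relaxed", "oversized", "semifitted", "relaxed", "relaxed"],
})

# keep="last" retains the final occurrence of each product_color_id.
deduped = rows.drop_duplicates("product_color_id", keep="last").reset_index(drop=True)

# Ids whose duplicate rows carry more than one distinct label.
label_counts = rows.groupby("product_color_id")["attribute_value"].nunique()
conflicting_ids = label_counts[label_counts > 1].index.tolist()

print(deduped)
print(conflicting_ids)
```

In this toy frame, `A1` keeps the label `oversized` and silently loses `relaxed`, which is the kind of case the conflict check would surface.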

In [11]:
fit_df.isnull().sum()

product_id               0
brand                    0
mpn                      0
product_full_name        0
description            325
brand_category         338
created_at               0
updated_at               0
deleted_at             536
brand_canonical_url      0
details                303
labels                   0
bc_product_id          323
product_id.1             0
product_color_id         0
attribute_name           0
attribute_value          0
file                     0
dtype: int64

### Preprocessing 

In [12]:
#Define function to remove punctuation
import string 
def removePunctuation(text, punctuations=string.punctuation+"``"+"’"+"”"):
    words=nltk.word_tokenize(text)
    newWords = [word for word in words if word.lower() not in punctuations]
    cleanedText = " ".join(newWords)
    return cleanedText

In [13]:
#the corpus file is named "english"; the lowercase name also works on case-sensitive file systems
nltk_stopwords = set(stopwords.words("english"))

In [14]:
#Define function to remove stopwords
def removeStopwords(text, stopwords=nltk_stopwords):
    words = nltk.word_tokenize(text)
    newWords = [word for word in words if word.lower() not in stopwords]
    cleanedText = " ".join(newWords)
    return cleanedText

In [15]:
#Define function for lemmatization
def lemmatize(text):
    lemmatizer = WordNetLemmatizer()
    words = nltk.word_tokenize(text)
    lemmatizedWords = [lemmatizer.lemmatize(word.lower()) for word in words]
    lemmatizedText = " ".join(lemmatizedWords)
    return lemmatizedText

In [16]:
#Create dummy variables for each fit category
dummies = pd.get_dummies(fit_df['attribute_value'])
fit_df = pd.concat([fit_df, dummies], axis = 1)
fit_df = fit_df.reset_index(drop = True)
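The dummy columns become the five one-vs-rest targets used later. A small sketch with made-up labels shows the shape of the result. The `.astype(int)` call is an addition here so the output is 0/1 regardless of pandas version, since newer releases return booleans.

```python
import pandas as pd

# Made-up labels standing in for fit_df['attribute_value'].
labels = pd.Series(["relaxed", "oversized", "relaxed", "semifitted"], name="attribute_value")

# One indicator column per category, in alphabetical order.
dummies = pd.get_dummies(labels).astype(int)
print(dummies)
```

Each row has exactly one 1, so every product is a positive example for one model and a negative example for the others.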

In [17]:
def preprocessing(df, columns = ["brand", "product_full_name", "description", "details"]):
    df['details'] = df['details'].str.replace("\n", "")
    #replace null values with UNKNOWN_TOKEN
    df['brand'] = df['brand'].fillna('UNKNOWN_TOKEN')
    df['description'] = df['description'].fillna('UNKNOWN_TOKEN')
    df['details'] = df['details'].fillna('UNKNOWN_TOKEN')
    df['product_full_name'] = df['product_full_name'].fillna('UNKNOWN_TOKEN')
    #remove punctuation and stopwords then lemmatize
    for col in columns: 
        df[col] = df[col].apply(removePunctuation)
        df[col] = df[col].apply(removeStopwords)
        df[col] = df[col].apply(lemmatize)
    return df

In [18]:
fit_df = preprocessing(fit_df)
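To see roughly what the cleaning steps do to one string, here is a dependency-free approximation. It is not the notebook's pipeline: it uses a whitespace split instead of `nltk.word_tokenize`, a tiny hand-picked stopword set instead of the NLTK list, and skips lemmatization. The function name `clean` and the sample sentence are illustrative only.

```python
import string

# Hand-picked stand-in for the NLTK English stopword list (assumption: tiny subset).
STOPWORDS = {"a", "an", "the", "with", "and", "in", "for", "is", "this"}

def clean(text):
    # Whitespace split, then strip punctuation from token edges and lowercase.
    tokens = [tok.strip(string.punctuation).lower() for tok in text.split()]
    # Drop tokens that were pure punctuation and tokens in the stopword set.
    return " ".join(tok for tok in tokens if tok and tok not in STOPWORDS)

sample = "A relaxed-fit tee with a rounded hem, cut for an easy drape."
print(clean(sample))  # relaxed-fit tee rounded hem cut easy drape
```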

In [19]:
columnsToDrop = ['mpn', 'created_at', 'updated_at', 'brand_category', 'deleted_at','product_id.1', 'bc_product_id', 'labels', 'attribute_name', 'file']
fit_df = fit_df.drop(columnsToDrop, axis = 1)

In [20]:
fit_df["input_doc"] = fit_df.brand + " " + fit_df.product_full_name + " " + fit_df.description + " " + fit_df.details 
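The `fillna('UNKNOWN_TOKEN')` calls in `preprocessing` matter for this concatenation: adding strings column-wise yields a missing value whenever any piece is missing. A short sketch with two made-up rows illustrates the difference.

```python
import pandas as pd

# Two made-up products; the second has no description.
df = pd.DataFrame({
    "brand": ["Frame", "Loewe"],
    "description": ["slim jean", None],
})

# Without filling, a single missing piece makes the whole document missing.
raw_doc = df.brand + " " + df.description

# Filling first, as preprocessing() does, keeps every row usable.
filled_doc = df.brand + " " + df.description.fillna("UNKNOWN_TOKEN")

print(raw_doc.isna().tolist())
print(filled_doc.tolist())
```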

In [21]:
fit_df.isnull().sum()

product_id             0
brand                  0
product_full_name      0
description            0
brand_canonical_url    0
details                0
product_color_id       0
attribute_value        0
fittedtailored         0
oversized              0
relaxed                0
semifitted             0
straightregular        0
input_doc              0
dtype: int64

In [22]:
fit_df.shape

(3686, 14)

### Build Keras Deep Learning Model
- During this step, I trained and tested the Keras model

In [23]:
def integer_encode_documents(docs, tokenizer):
    return tokenizer.texts_to_sequences(docs)

In [24]:
def get_max_token_length_per_doc(docs):
    return max(list(map(lambda x: len(x.split()), docs)))
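The two helpers above feed Keras's `Tokenizer` and `pad_sequences`. The following pure-Python sketch imitates that encoding on toy documents so the behaviour is easy to inspect. It is a simplified stand-in, not the Keras implementation, and the helper names `fit_vocab`, `encode`, and `pad_post` are hypothetical. It mirrors two defaults worth knowing: words unseen at fit time are dropped when no `oov_token` is set, and sequences longer than `maxlen` lose tokens from the front.

```python
def fit_vocab(docs):
    # Rank words by frequency (ties keep first-seen order); index 0 is reserved for padding.
    counts = {}
    for doc in docs:
        for word in doc.lower().split():
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: i + 1 for i, word in enumerate(ranked)}

def encode(doc, vocab):
    # Words that were not seen while fitting are silently dropped.
    return [vocab[w] for w in doc.lower().split() if w in vocab]

def pad_post(seq, max_length):
    # Keep the last max_length tokens, then right-pad with zeros.
    seq = seq[-max_length:]
    return seq + [0] * (max_length - len(seq))

train_docs = ["relaxed fit tee", "slim fit jean"]
vocab = fit_vocab(train_docs)
max_length = max(len(doc.split()) for doc in train_docs)

short_doc = pad_post(encode("oversized relaxed tee", vocab), max_length)
long_doc = pad_post(encode("relaxed slim fit tee jean", vocab), max_length)
print(vocab)
print(short_doc, long_doc)
```

Because `max_length` comes from the training documents only, a longer document at prediction time is truncated, and the unseen word "oversized" contributes nothing to the encoded input.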

In [25]:
def prediction(df, target, EMBEDDING_SIZE = 100):
    X = df.loc[:, "input_doc"].values
    y = df.loc[:, target].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0, stratify = y)
    docs = list(X_train)
    labels = y_train
    
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(docs)
    vocab_size = len(tokenizer.word_index) + 1 
    # integer encode the documents
    encoded_docs = integer_encode_documents(docs, tokenizer)
    # get the max length in terms of token length
    max_length = get_max_token_length_per_doc(docs)
    # pad documents to a max length of words
    padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
    # define the model
    model = Sequential()
    model.add(Embedding(vocab_size, EMBEDDING_SIZE, input_length=max_length))
    model.add(Flatten()) 
    model.add(Dense(1, activation='sigmoid')) 
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    # fit the model
    model.fit(padded_docs, labels, epochs=50, verbose=0)
    # evaluate the model
    loss, trainingAccuracy = model.evaluate(padded_docs, labels, verbose=0)
    
    #testing
    test_docs = list(X_test)
    test_labels = y_test
    encoded_test_docs = integer_encode_documents(test_docs, tokenizer)
    padded_test_docs = pad_sequences(encoded_test_docs, maxlen=max_length, padding='post')
    loss, testAccuracy = model.evaluate(padded_test_docs, test_labels, verbose=0)
    prediction_proba = model.predict(padded_test_docs, verbose = 0)
    
    score_dict = {"target": target, "trainingAccuracy": round(trainingAccuracy, 2), 
                  "testAccuracy": round(testAccuracy,2)}
    
    return score_dict, prediction_proba
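The model in `prediction` is small in structure but not in parameter count, which helps put the later training accuracies of 1.0 in context. The arithmetic below uses `EMBEDDING_SIZE = 100` and a `max_length` of 185 (one of the values the notebook reports); the vocabulary size of 5000 is a hypothetical figure, since the notebook does not print it.

```python
def count_parameters(vocab_size, embedding_size, max_length):
    # Embedding layer: one vector of length embedding_size per vocabulary index.
    embedding = vocab_size * embedding_size
    # Flatten yields max_length * embedding_size features; Dense(1) has one weight each plus a bias.
    dense = max_length * embedding_size + 1
    return embedding, dense

# vocab_size=5000 is an assumption for illustration only.
embedding, dense = count_parameters(vocab_size=5000, embedding_size=100, max_length=185)
print(embedding, dense, embedding + dense)  # 500000 18501 518501
```

With roughly three quarters of 3686 rows used for training, a model of this size has ample capacity to memorize the training set.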

In [26]:
#Predict for each fit type 
fitType = list(fit_df.attribute_value.unique())
scoreList = []
prob_df = pd.DataFrame()
for fit in fitType:
    score_dict, prediction_proba = prediction(fit_df, target = fit)
    scoreList.append(score_dict)
    prob_df[fit] = prediction_proba.flatten()
    print(fit, "prediction done")

straightregular prediction done
semifitted prediction done
relaxed prediction done
oversized prediction done
fittedtailored prediction done
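One caveat about the loop above: `prediction` calls `train_test_split` with `stratify = y`, where `y` is a different binary column for each fit type. The shared `random_state` therefore does not guarantee the same test rows across the five calls, and the differing `max_length` values recorded further down (185 and 202) indicate that the training sets did differ. If so, row i of `prob_df` may describe different products in different columns. One possible remedy is to split once on the multi-class label and derive every binary target from that single split. The sketch below illustrates the idea with a hand-rolled stratified split in plain Python; it is a stand-in for `train_test_split(..., stratify = ...)`, and the function name and toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.25, seed=0):
    # Group row positions by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    # Shuffle within each class and send the same fraction of every class to the test set.
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_size))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy multi-class labels standing in for fit_df['attribute_value'].
labels = ["relaxed"] * 8 + ["oversized"] * 4 + ["semifitted"] * 8
train_idx, test_idx = stratified_split(labels)

# Every one-vs-rest target is derived from the same held-out rows.
targets = {fit: [int(labels[i] == fit) for i in test_idx] for fit in sorted(set(labels))}
print(len(train_idx), len(test_idx))
```

With a single split, comparing the five probabilities row by row is meaningful because each row refers to one product.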


In [27]:
#Training and testing accuracy
score_df = pd.DataFrame(scoreList)

In [28]:
score_df

Unnamed: 0,target,trainingAccuracy,testAccuracy
0,straightregular,1.0,0.91
1,semifitted,1.0,0.84
2,relaxed,1.0,0.81
3,oversized,1.0,0.97
4,fittedtailored,1.0,0.93
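Each of these scores is the accuracy of a one-vs-rest binary model, so it helps to compare it with the accuracy of always predicting the more common class. The sketch below computes that baseline. The counts are hypothetical, because the notebook does not show the per-class counts, and the names `rare_fit` and `common_fit` are placeholders.

```python
def majority_baseline(n_positive, n_total):
    # Accuracy of always predicting the more common of the two classes.
    positive_rate = n_positive / n_total
    return max(positive_rate, 1 - positive_rate)

# Hypothetical counts out of 1000 test rows (not taken from the data).
hypothetical_positives = {"rare_fit": 28, "common_fit": 300}
for name, n_pos in hypothetical_positives.items():
    print(name, round(majority_baseline(n_pos, 1000), 3))
```

For a rare category, a high test accuracy can sit close to this baseline, so per-class precision and recall would give a clearer picture than accuracy alone.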


In [29]:
#Assign the fit with the highest probability out of the 5 fit categories
prob_df['predict_fit'] = prob_df.idxmax(axis=1)

In [30]:
prob_df

Unnamed: 0,straightregular,semifitted,relaxed,oversized,fittedtailored,predict_fit
0,0.408755,0.000096,0.106553,5.602837e-06,9.595108e-01,fittedtailored
1,0.000038,0.008466,0.000012,4.380941e-05,5.960464e-08,semifitted
2,0.000034,0.042026,0.947667,4.172325e-07,3.181696e-04,relaxed
3,0.000794,0.000487,0.952116,9.179115e-06,1.043081e-06,relaxed
4,0.000057,0.998580,0.000108,8.642673e-07,1.490116e-07,semifitted
...,...,...,...,...,...,...
917,0.000013,0.000041,0.897436,3.087223e-04,1.391259e-02,relaxed
918,0.003296,0.920671,0.000728,5.862117e-05,8.940697e-08,semifitted
919,0.000013,0.000000,0.011393,8.940697e-08,3.546476e-06,relaxed
920,0.999871,0.014819,0.000656,1.556547e-05,8.845456e-04,straightregular
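`idxmax(axis=1)` always returns a label, even when none of the five models is confident. The sketch below replays the first two rows of the table above and also records the winning probability. The `top_prob` column is an addition for illustration, not part of the notebook.

```python
import pandas as pd

# First two rows of the probability table shown above.
prob_df = pd.DataFrame({
    "straightregular": [0.408755, 0.000038],
    "semifitted": [0.000096, 0.008466],
    "relaxed": [0.106553, 0.000012],
    "oversized": [5.602837e-06, 4.380941e-05],
    "fittedtailored": [9.595108e-01, 5.960464e-08],
})

fit_cols = list(prob_df.columns)
# Column name holding the largest probability in each row.
prob_df["predict_fit"] = prob_df[fit_cols].idxmax(axis=1)
# Illustrative extra column: how large that winning probability is.
prob_df["top_prob"] = prob_df[fit_cols].max(axis=1)
print(prob_df[["predict_fit", "top_prob"]])
```

Row 1 is labelled `semifitted` with a winning probability below 0.01, so keeping `top_prob` alongside the label would let a reviewer flag low-confidence predictions.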


### Save Keras Model & build automated prediction
- Since the Keras models reached test accuracies of 81-97%, I saved the models and tokenizers. I then reran the test-set predictions from the saved files, using the same random state, to confirm that the code reproduces the earlier scores.

#### Generate test data

In [31]:
def generateTestData(df, target):
    #use the df argument rather than the global fit_df
    X = df.loc[:, ["brand", "product_full_name", "description", "details"]]
    y = df.loc[:, target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0, stratify = y)
    X_test.to_csv("x_test_{}.csv".format(target))
    pd.DataFrame(y_test).to_csv("y_test_{}.csv".format(target))

In [32]:
fitType = list(fit_df.attribute_value.unique())
fitType

['straightregular', 'semifitted', 'relaxed', 'oversized', 'fittedtailored']

In [33]:
for fit in fitType:
    generateTestData(fit_df, target = fit)

In [34]:
def buildKerasModel(df, target, EMBEDDING_SIZE = 100):
    X = df.loc[:, "input_doc"].values
    y = df.loc[:, target].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0, stratify = y)
    docs = list(X_train)
    labels = y_train
    
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(docs)
    vocab_size = len(tokenizer.word_index) + 1 
    # integer encode the documents
    encoded_docs = integer_encode_documents(docs, tokenizer)
    # get the max length in terms of token length
    max_length = get_max_token_length_per_doc(docs)
    # pad documents to a max length of words
    padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
    # define the model
    model = Sequential()
    model.add(Embedding(vocab_size, EMBEDDING_SIZE, input_length=max_length))
    model.add(Flatten()) 
    model.add(Dense(1, activation='sigmoid')) 
    # compile the model
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    # fit the model
    model.fit(padded_docs, labels, epochs=50, verbose=0)
    # evaluate the model
    loss, trainingAccuracy = model.evaluate(padded_docs, labels, verbose=0)
    
    #save model
    model.save("{}_model.h5".format(target))
    
    
    #save tokenizer
    with open('{}_tokenizer.pickle'.format(target), 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
    
    return max_length

In [35]:
maxLengthDict = {}
for fit in fitType:
    maxLengthDict[fit] = buildKerasModel(df = fit_df, target = fit, EMBEDDING_SIZE = 100)
    print(fit, "done")

straightregular done
semifitted done
relaxed done
oversized done
fittedtailored done


In [36]:
def test(input_X_test_file, input_y_test_file, target, max_length):
    #load data
    X_test_df = pd.read_csv(input_X_test_file, index_col = 0)
    X_test_df["input_doc"] = X_test_df.brand + " " + X_test_df.product_full_name + " " + X_test_df.description + " " + X_test_df.details 
    X_test = X_test_df.loc[:, "input_doc"].values
    y_test_df = pd.read_csv(input_y_test_file, index_col = 0)
    y_test = y_test_df.loc[:, target].values
    test_docs = list(X_test)
    test_labels = y_test
    
    #load model
    model = load_model("{}_model.h5".format(target))
    with open('{}_tokenizer.pickle'.format(target), 'rb') as handle:
        tokenizer = pickle.load(handle)
        
    #predict
    encoded_test_docs = integer_encode_documents(test_docs, tokenizer)
    padded_test_docs = pad_sequences(encoded_test_docs, maxlen=max_length, padding='post')
    loss, testAccuracy = model.evaluate(padded_test_docs, test_labels, verbose=0)
    prediction_proba = model.predict(padded_test_docs, verbose = 0)

    score_dict = {"target": target, "testAccuracy": round(testAccuracy,2)}
    
    return score_dict, prediction_proba

In [37]:
maxLengthDict

{'straightregular': 185,
 'semifitted': 185,
 'relaxed': 202,
 'oversized': 202,
 'fittedtailored': 202}
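These padding lengths are needed again at prediction time, and Part II hard-codes them. One option, sketched here as a suggestion rather than something the notebook does, is to save the dictionary next to the models and reload it. The file name `max_lengths.json` is hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

maxLengthDict = {
    "straightregular": 185,
    "semifitted": 185,
    "relaxed": 202,
    "oversized": 202,
    "fittedtailored": 202,
}

# Hypothetical file name; a temp directory keeps this example from touching real files.
path = os.path.join(tempfile.mkdtemp(), "max_lengths.json")
with open(path, "w") as handle:
    json.dump(maxLengthDict, handle, indent=2)

# Reload at prediction time instead of retyping the values.
with open(path) as handle:
    restored = json.load(handle)
print(restored)
```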

In [38]:
scoreList = []
prob_df = pd.DataFrame()
for fit in fitType:
    input_X_test_file = "x_test_{}.csv".format(fit)
    input_y_test_file = "y_test_{}.csv".format(fit)
    score_dict, prediction_proba = test(input_X_test_file,input_y_test_file, target = fit, max_length = maxLengthDict[fit])
    scoreList.append(score_dict)
    prob_df[fit] = prediction_proba.flatten()
    print(fit, "done")

straightregular done
semifitted done
relaxed done
oversized done
fittedtailored done


In [39]:
score_df = pd.DataFrame(scoreList)

In [40]:
score_df

Unnamed: 0,target,testAccuracy
0,straightregular,0.91
1,semifitted,0.84
2,relaxed,0.81
3,oversized,0.97
4,fittedtailored,0.93


In [41]:
prob_df['predict_fit'] = prob_df.idxmax(axis=1)

In [42]:
prob_df

Unnamed: 0,straightregular,semifitted,relaxed,oversized,fittedtailored,predict_fit
0,0.466508,1.281500e-04,0.065370,2.604723e-05,9.627302e-01,fittedtailored
1,0.000039,4.094937e-02,0.000017,5.698204e-05,2.086163e-07,semifitted
2,0.000020,1.686387e-01,0.983514,1.788139e-07,7.907152e-04,relaxed
3,0.000819,2.659559e-04,0.767777,1.189113e-05,4.231930e-06,relaxed
4,0.000077,9.986877e-01,0.000117,7.748604e-07,7.152557e-07,semifitted
...,...,...,...,...,...,...
917,0.000030,3.337860e-05,0.889967,2.555847e-04,7.956237e-03,relaxed
918,0.005344,9.615009e-01,0.000728,2.273917e-05,2.980232e-08,semifitted
919,0.000017,1.490116e-07,0.041814,3.576279e-07,4.053116e-06,relaxed
920,0.999934,4.871343e-03,0.000805,1.118699e-05,5.359103e-04,straightregular


## Part II: Fit prediction on full data

- The main() function predicts the fit of each clothing item. It takes a CSV file as input; the file must have "brand", "product_full_name", "description", and "details" columns. The function writes a CSV file with an additional fit column and returns the dataframe.
- Link to the saved Keras models and tokenizers: https://drive.google.com/file/d/1QZ1hVUlboyCnOFILYwfMbsKlhQHJAIBb/view?usp=sharing Download the Models.zip file before running the main() function. The code loads each `{fit}_model.h5` and `{fit}_tokenizer.pickle` by relative file name, so these files need to be in the working directory.
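Since main() depends on four specific columns, a quick header check before loading the models can save a failed run. This is a minimal sketch using only the standard library; the helper name `missing_columns` and the sample CSV text are hypothetical and not part of the notebook.

```python
import csv
import io

REQUIRED_COLUMNS = ["brand", "product_full_name", "description", "details"]

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    # Read only the header row and report which required columns are absent.
    header = next(csv.reader(csv_file), [])
    return [col for col in required if col not in header]

# In-memory stand-ins for real files.
good = io.StringIO("brand,product_full_name,description,details\nFrame,Jean,Slim,Cotton\n")
bad = io.StringIO("brand,description\nFrame,Slim\n")
print(missing_columns(good))
print(missing_columns(bad))
```

With a real file, the same helper can be called on `open(inputFile, newline="")` before `pd.read_csv`.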

In [43]:
import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer, PorterStemmer

from sklearn.model_selection import train_test_split

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from numpy import asarray
from numpy import zeros
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.models import load_model

import pickle

import warnings
warnings.filterwarnings('ignore')
startTime = pd.datetime.now()

In [44]:
import string 
def removePunctuation(text, punctuations=string.punctuation+"``"+"’"+"”"):
    words=nltk.word_tokenize(text)
    newWords = [word for word in words if word.lower() not in punctuations]
    cleanedText = " ".join(newWords)
    return cleanedText

In [45]:
#the corpus file is named "english"; the lowercase name also works on case-sensitive file systems
nltk_stopwords = set(stopwords.words("english"))
def removeStopwords(text, stopwords=nltk_stopwords):
    words = nltk.word_tokenize(text)
    newWords = [word for word in words if word.lower() not in stopwords]
    cleanedText = " ".join(newWords)
    return cleanedText

In [46]:
def lemmatize(text):
    lemmatizer = WordNetLemmatizer()
    words = nltk.word_tokenize(text)
    lemmatizedWords = [lemmatizer.lemmatize(word.lower()) for word in words]
    lemmatizedText = " ".join(lemmatizedWords)
    return lemmatizedText

In [47]:
def preprocessing(df, columns = ["brand", "product_full_name", "description", "details"]):
    df['details'] = df['details'].str.replace("\n", "")
    #replace null values with UNKNOWN_TOKEN
    df['brand'] = df['brand'].fillna('UNKNOWN_TOKEN')
    df['description'] = df['description'].fillna('UNKNOWN_TOKEN')
    df['details'] = df['details'].fillna('UNKNOWN_TOKEN')
    df['product_full_name'] = df['product_full_name'].fillna('UNKNOWN_TOKEN')
    #remove punctuation and stopwords then lemmatize
    for col in columns: 
        df[col] = df[col].apply(removePunctuation)
        df[col] = df[col].apply(removeStopwords)
        df[col] = df[col].apply(lemmatize)
    return df

In [48]:
def integer_encode_documents(docs, tokenizer):
    return tokenizer.texts_to_sequences(docs)

In [49]:
def get_max_token_length_per_doc(docs):
    return max(list(map(lambda x: len(x.split()), docs)))

In [50]:
def predict(X_test_df, target, max_length):
    #load data
    X_test_df["input_doc"] = X_test_df.brand + " " + X_test_df.product_full_name + " " \
                                + X_test_df.description + " " + X_test_df.details 
    X_test = X_test_df.loc[:, "input_doc"].values
    test_docs = list(X_test)

    #load model
    model = load_model("{}_model.h5".format(target))
    with open('{}_tokenizer.pickle'.format(target), 'rb') as handle:
        tokenizer = pickle.load(handle)
        
    #predict
    encoded_test_docs = integer_encode_documents(test_docs, tokenizer)
    padded_test_docs = pad_sequences(encoded_test_docs, maxlen=max_length, padding='post')
    prediction_proba = model.predict(padded_test_docs, verbose = 0)
    
    return prediction_proba

In [51]:
def main():
    '''
    This function will predict the fit of the clothing. It takes a CSV file as an input. The CSV file needs to have 
    "brand", "product_full_name", "description", and "details" columns. The function will output a CSV file with an 
    additional fit column and return the dataframe. 
    '''
    #load data
    inputFile = input("What's the name of the csv file? (ex. full_data.csv)")
    fullData = pd.read_csv(inputFile)
    #Preprocess data
    print("Start preprocessing data...")
    testData = fullData.copy()
    testData = testData.loc[:, ["brand", "product_full_name", "description", "details"]]
    testData = preprocessing(testData)
    print("Start predicting fit...")
    #Predict fit
    maxLengthDict = {'straightregular': 185,
                 'semifitted': 185,
                 'relaxed': 202,
                 'oversized': 202,
                 'fittedtailored': 202}
    prob_df = pd.DataFrame()
    fitType = ['straightregular', 'semifitted', 'relaxed', 'oversized', 'fittedtailored']
    for fit in fitType:
        prediction_proba = predict(testData, target = fit, max_length = maxLengthDict[fit])
        prob_df[fit] = prediction_proba.flatten()
        print(fit, "fit prediction done")
    prob_df['predict_fit'] = prob_df.idxmax(axis=1)
    fullData['fit'] = prob_df['predict_fit']
    fullData.to_csv("full_data with fit prediction.csv")
    return fullData

In [52]:
df = main()

What's the name of the csv file? (ex. full_data.csv)full_data_final version.csv
Start preprocessing data...
Start predicting fit...
straightregular fit prediction done
semifitted fit prediction done
relaxed fit prediction done
oversized fit prediction done
fittedtailored fit prediction done


In [55]:
df.head()

Unnamed: 0,product_id,brand,mpn,product_full_name,description,brand_category,created_at,updated_at,deleted_at,brand_canonical_url,details,labels,bc_product_id,fit
0,01DSE9TC2DQXDG6GWKW9NMJ416,Banana Republic,514683.0,Ankle-Strap Pump,"A modern pump, in a rounded silhouette with an...",Unknown,2019-11-11 22:37:15.719107+00,2019-12-19 20:40:30.786144+00,,https://bananarepublic.gap.com/browse/product....,"A modern pump, in a rounded silhouette with an...","{""Needs Review""}",,straightregular
1,01DSE9SKM19XNA6SJP36JZC065,Banana Republic,526676.0,Petite Tie-Neck Top,Dress it down with jeans and sneakers or dress...,Unknown,2019-11-11 22:36:50.682513+00,2019-12-19 20:40:30.786144+00,,https://bananarepublic.gap.com/browse/product....,Dress it down with jeans and sneakers or dress...,"{""Needs Review""}",,semifitted
2,01DSJX8GD4DSAP76SPR85HRCMN,Loewe,400100000000.0,52MM Padded Leather Round Sunglasses,Padded leather covers classic round sunglasses.,JewelryAccessories/SunglassesReaders/RoundOval...,2019-11-13 17:33:59.581661+00,2019-12-19 20:40:30.786144+00,,https://www.saksfifthavenue.com/loewe-52mm-pad...,100% UV protection Case and cleaning cloth inc...,"{""Needs Review""}",,semifitted
3,01DSJVKJNS6F4KQ1QM6YYK9AW2,Converse,400012000000.0,Baby's & Little Kid's All-Star Two-Tone Mid-To...,The iconic mid-top design gets an added dose o...,"JustKids/Shoes/Baby024Months/BabyGirl,JustKids...",2019-11-13 17:05:05.203733+00,2019-12-19 20:40:30.786144+00,,https://www.saksfifthavenue.com/converse-babys...,Canvas upper Round toe Lace-up vamp SmartFOAM ...,"{""Needs Review""}",,semifitted
4,01DSK15ZD4D5A0QXA8NSD25YXE,Alexander McQueen,400011000000.0,64MM Rimless Sunglasses,Hexagonal shades offer a rimless view with int...,JewelryAccessories/SunglassesReaders/RoundOval,2019-11-13 18:42:30.941321+00,2019-12-19 20:40:30.786144+00,,https://www.saksfifthavenue.com/alexander-mcqu...,100% UV protection Gradient lenses Adjustable ...,"{""Needs Review""}",,relaxed


In [53]:
print("Total runtime: ", pd.datetime.now() - startTime)

Total runtime:  0:02:14.868643
