
# Predicting Customer Satisfaction on Rent the Runway

##  II. Data Cleaning for Garment Data 
### Katrin Ayrapetov



### Overview of the Notebook: 

Data was scraped from the Rent the Runway website, yielding 208,747 observations in total. The previous notebook, I_Data_Cleaning_Customer_Information, cleaned the features describing the customer; the resulting data set has 161,418 observations. 

**Features describing the customer:**  Nickname, Type_of_customer, Size of the garment customer rented, Size the customer usually wears, Height, Age, Bust Size, Body Type, Weight, Date the customer rented the garment, Reason the garment was rented, Overall fit of the garment  

**Features describing the dress:** Retail Price of the garment, Rent price of the garment, Product Details, Number of Reviews left for that Garment 

In this notebook, the features describing the dress will be cleaned.  
* The retail price of the dress, the rent price of the dress, and the number of reviews left for the dress will be stripped of extra symbols and converted to integers. 
* The product description will be broken up into three additional features: Sleeves, Neckline and Dress Style. 
* For example:
<br> &emsp;&emsp; **Product Detail:** 'Blue printed cotton (69% Cotton, 27% Nylon, 4% Spandex). Hourglass. Sleeveless. Square neckline. 45" from shoulder to hemline.' <br>
becomes 
<br> &emsp;&emsp; **New Features:** 
<br> &emsp;&emsp; **Sleeves:** Sleeveless, 
<br>&emsp;&emsp; **Neckline:** Square Neckline, 
<br>&emsp;&emsp; **Dress_Style:** Hourglass 
* At the end of the garment data cleaning, the data set has 156,433 observations. 
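
The two steps listed above can be sketched in plain Python. This is only an illustration, not the code used in the notebook: `price_to_int` and `split_product_detail` are hypothetical helper names, and the sketch assumes scraped prices look like `"$1,295"` (whole dollars, no cents).

```python
import re

def price_to_int(raw):
    """Keep only the digits of a scraped value such as '$1,295' (assumed format)."""
    digits = re.sub(r"[^\d]", "", str(raw))
    return int(digits) if digits else None

def split_product_detail(detail):
    """Lower-case the description, drop the fabric parentheses, split into segments."""
    no_fabric = re.sub(r"\(.*?\)", "", detail.lower())
    segments = [seg.strip(" .") for seg in no_fabric.split(". ")]
    return [seg for seg in segments if seg]

example = ('Blue printed cotton (69% Cotton, 27% Nylon, 4% Spandex). Hourglass. '
           'Sleeveless. Square neckline. 45" from shoulder to hemline.')
segments = split_product_detail(example)
print(segments)
print(price_to_int("$1,295"))
```

Each segment can then be matched against keyword rules to fill the Sleeves and Neckline columns.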

In [884]:
#Import the libraries needed
import pandas as pd
import re
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)

In [885]:
#Import the Data Set 
df =  pd.read_csv('../Data/df_customer_data_cleaned.csv')

In [886]:
df.shape

(151499, 21)

In [887]:
# Keep the first row for each distinct Dress_Description, then renumber the index
df = df.drop_duplicates(subset='Dress_Description', keep='first')
df = df.reset_index(drop=True)

In [888]:
# Drop garments whose description mentions "maternity", then renumber the index
maternity_index = [i for i in range(df.shape[0]) if "maternity" in df['Dress_Description'][i]]
df = df.drop(labels=maternity_index, axis=0)
df = df.reset_index(drop=True)

In [889]:
#Fix typos in the product details and remove qualifiers (e.g. "organic", "recycled") 
#so that the fabric text reads as "percentage fabric" pairs. 
df['Product_details'] = df['Product_details'].str.lower()
df["Product_details"]=df["Product_details"].replace("sleeve slit","", regex=True)
df["Product_details"]=df["Product_details"].replace("sleees","sleeves", regex=True)
df["Product_details"]=df["Product_details"].replace("sleves","sleeves", regex=True)
df["Product_details"]=df["Product_details"].replace("necline","neckline", regex=True)
df["Product_details"]=df["Product_details"].replace("slevees","sleeves", regex=True)
df["Product_details"]=df["Product_details"].replace("shoudler","shoulder", regex=True)
df["Product_details"]=df["Product_details"].replace("sleevleess","sleeveless", regex=True)
df["Product_details"]=df["Product_details"].replace("sleevess","sleeveless", regex=True)
df["Product_details"]=df["Product_details"].replace("sleevelss","sleeveless", regex=True)
df["Product_details"]=df["Product_details"].replace("sleevless","sleeveless", regex=True)
df["Product_details"]=df["Product_details"].replace("recycled","", regex=True)
df["Product_details"]=df["Product_details"].replace("organic ","", regex=True)
df["Product_details"]=df["Product_details"].replace("merino ","", regex=True)
df["Product_details"]=df["Product_details"].replace("virgin ","", regex=True)
df["Product_details"]=df["Product_details"].replace("metallic fibers","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("metallic fiber","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("metallic threads","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("metallic thread","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("tencel modal","tencel_modal", regex=True)
df["Product_details"]=df["Product_details"].replace("other fibers","other_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("bci","", regex=True)
df["Product_details"]=df["Product_details"].replace("sustainable ","", regex=True)
df["Product_details"]=df["Product_details"].replace("extrafine","", regex=True)
df["Product_details"]=df["Product_details"].replace("mettalized fiber","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("elite","", regex=True)
df["Product_details"]=df["Product_details"].replace("alpaca","", regex=True)
df["Product_details"]=df["Product_details"].replace("other fibers","metallic_fibers", regex=True)
df["Product_details"]=df["Product_details"].replace("recycled ","", regex=True)
df["Product_details"]=df["Product_details"].replace("cady","", regex=True)
df["Product_details"]=df["Product_details"].replace("french","", regex=True)
df["Product_details"]=df["Product_details"].replace("995% polyester","(95% polyester", regex=True)
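
Each replacement in the cell above runs on the output of the previous one, so order matters whenever one pattern is a prefix of another. The sketch below shows one way to keep such corrections in an ordered table; `CORRECTIONS` is a shortened, hypothetical subset and `apply_corrections` is an illustrative helper, not part of the notebook.

```python
# Hypothetical, shortened correction table. Longer patterns come before
# their prefixes so the shorter pattern cannot leave a stray suffix behind.
CORRECTIONS = [
    ("metallic threads", "metallic_fibers"),
    ("metallic thread", "metallic_fibers"),
    ("metallic fibers", "metallic_fibers"),
    ("metallic fiber", "metallic_fibers"),
    ("sleevless", "sleeveless"),
    ("necline", "neckline"),
]

def apply_corrections(text, corrections=CORRECTIONS):
    """Lower-case the text and apply each literal replacement in table order."""
    text = text.lower()
    for old, new in corrections:
        text = text.replace(old, new)
    return text

fixed = apply_corrections("Gold metallic threads. Sleevless. V-necline.")
# Applying the shorter pattern first would leave a trailing "s" behind.
wrong_order = "metallic threads".replace("metallic thread", "metallic_fibers")
print(fixed)
print(wrong_order)
```

The same function could be applied to the whole column with `df["Product_details"].apply(apply_corrections)`.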

In [890]:
product_correction_df =  pd.read_csv('../Data/product_details_correction_df.csv')

In [891]:
for i in range(product_correction_df.shape[0]):
    num = df[df['Dress_Description'] ==product_correction_df["dress"][i]].index[0]
    df.at[num, "Product_details"] = product_correction_df["details"][i]
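
The lookup `df[...].index[0]` raises an `IndexError` when a dress name in the correction file has no match in `df`. A minimal sketch of a more forgiving lookup is shown below on plain dictionaries rather than a DataFrame; `apply_overrides` and the sample rows are hypothetical. Unlike the loop above, it updates every matching row, which is equivalent here only because `Dress_Description` was de-duplicated earlier.

```python
def apply_overrides(rows, overrides, key="Dress_Description", target="Product_details"):
    """Overwrite `target` where `key` is in `overrides`; return the unmatched names."""
    seen = set()
    for row in rows:
        name = row[key]
        if name in overrides:
            row[target] = overrides[name]
            seen.add(name)
    # Names from the correction file that never matched a row
    return sorted(set(overrides) - seen)

# Made-up rows and corrections, for illustration only
rows = [
    {"Dress_Description": "dress a", "Product_details": "old text"},
    {"Dress_Description": "dress b", "Product_details": "keep me"},
]
overrides = {"dress a": "sheath. sleeveless. crew neckline.", "dress z": "unused"}
unmatched = apply_overrides(rows, overrides)
print(unmatched)
```

Reporting the unmatched names makes it easier to spot spelling differences between the correction file and the data.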

In [892]:
for i in range(df.shape[0]):
    if "opening ceremony" in df["Brand"][i] and "novelty rib knit dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "black knit (40% wool, 33% viscose, 27% nylon). sheath. long sleeves. crew neckline, 46 from shoulder to hemline. imported."
    if "badgley mischka" in df["Brand"][i] and "rosalind peplum gown" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "black stretch crepe (98% polyester, 2% spandex). strapless. sweetheart neckline. 58"
    if "parker" in df["Brand"][i] and "black rory dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = 'black crepe (89% polyester, 11% elastane). cap sleeves. 35 '
    if "area stars" in df["Brand"][i] and "grey tia dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = 'grey and white cotton(100% cotton). hourglass. long sleeves. v-neckline. 33" from shoulder to hemline. imported.'
    if "hervé léger" in df["Brand"][i] and "cutout gown" in df["Dress_Description"][i]:   
         df.at[i, "Product_details"] = 'brown knit (22% viscose, 22% nylon, 3% spandex). sheath. halter neck. sleeveless.  49" from shoulder to hemline. '
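    # Note: same condition as the "grey tia dress" entry above; this later assignment is the one that takes effect.
    if "area stars" in df["Brand"][i] and "grey tia dress" in df["Dress_Description"][i]:    
        df.at[i, "Product_details"] = "embroidery details (100% cotton). hourglass. long sleeves. v-neckline. 33 from"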
    if "j.crew" in df["Brand"][i] and "pink turtleneck dress" in df["Dress_Description"][i]:     
        df.at[i, "Product_details"] = "pink knit (56% nylon, 30% wool, 4% spandex). shift. long sleeves. high neck. 43 shoulder"
    if "goen. j" in df["Brand"][i] and "vegas python printed dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "python printed chiffon (100% polyester). shift. sleeveless. v-neckline. hidden back zipper closure.  40 from"
    if "badgley mischka" in df["Brand"][i] and "ivy gown" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] ='green . (100% polyester, 97% polyester, 3% spandex). sleeveless. crew neckline. 61'
    if "ml monique lhuillier" in df["Brand"][i] and "ruffle skirt day dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] ='cayenne multi woven (100% viscose). blouson. three quarter sleeves. crew neckline. 44" from shoulder to hemline. imported.'


#### Fabrics 

In [893]:
#Create a column for things that are inside the parentheses
df["Fabric"] = "unknown"
df["Fabric"] = df["Fabric"].astype('object')
for i in range(df.shape[0]):
    s = df["Product_details"][i]
    result = re.findall('\\(.*?\\)', s)
    df.at[i, 'Fabric'] = result
    for item in result:
           s = s.replace(item,"")
    df.at[i, 'Product_details'] = s   


In [894]:
for i in range(df.shape[0]):
    if len(df["Fabric"][i])>0:
        df.at[i, "Fabric"] = df["Fabric"][i][0].replace(")","").replace("(","")
    else:
        df.at[i, "Fabric"] = "unknown"
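
As a compact illustration of what the two cells above extract, the sketch below parses the first parenthesised group of a description into fabric percentages. `parse_fabric` is a hypothetical helper, and the pattern assumes entries of the form `69% cotton`.

```python
import re

def parse_fabric(detail):
    """Return {fabric: percent} from the first parenthesised group, or {} if there is none."""
    match = re.search(r"\((.*?)\)", detail)
    if not match:
        return {}
    # Each pair looks like "69% cotton"; underscores allow names such as metallic_fibers
    pairs = re.findall(r"(\d+)\s*%\s*([a-z_]+)", match.group(1).lower())
    return {fabric: int(pct) for pct, fabric in pairs}

composition = parse_fabric("blue printed cotton (69% cotton, 27% nylon, 4% spandex). hourglass.")
print(composition)
print(parse_fabric("sheath. sleeveless. crew neck."))
```

If the same fabric appears twice inside the parentheses, the later percentage overwrites the earlier one in this sketch.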

In [895]:
Missing_Fabric =  pd.read_csv('../Data/Missing_Fabric.csv')
Missing_Fabric['dress'] = Missing_Fabric['dress'].str.lower()
for i in range(Missing_Fabric.shape[0]):
    num = df[df['Dress_Description'] == Missing_Fabric["dress"][i]].index[0]
    df.at[num, "Fabric"] = Missing_Fabric["fabric"][i]

In [896]:
Missing_Fabric_2 =  pd.read_csv('../Data/Missing_Fabric_2.csv')
for i in range(Missing_Fabric_2.shape[0]):
    num = df[df['Dress_Description'] == Missing_Fabric_2["dress"][i]].index[0]
    df.at[num, "Fabric"] = Missing_Fabric_2["fabrics"][i]

In [897]:
# Remove "%", "," and ";" so each entry becomes alternating "percentage fabric" tokens, then split into a list
df["Fabric"] = df["Fabric"].replace("%", " ", regex=True).replace("[,;]", "", regex=True)
for i in range(df.shape[0]):
    df.at[i, "Fabric"] = df["Fabric"][i].split()

In [898]:
for i in range(df.shape[0]):
    df.at[i,"Fabric"] = df["Fabric"][i][:6]

In [899]:
df_fabrics =  pd.read_csv('../Data/fabrics.csv')

In [901]:
current_fabrics = list(df_fabrics["Fabric"])
edited_fabrics = list(df_fabrics["Translate"])
dict_of_fabrics = {k:v for k,v in zip(current_fabrics,edited_fabrics)}

In [902]:
dict_of_fabrics

{'cotton': 'cotton',
 'nylon': 'nylon',
 'spandex': 'spandex',
 'polyurethane': 'polyester',
 'viscose': 'rayon',
 'rayon': 'rayon',
 'silk': 'silk',
 'linen': 'linen',
 'polyester': 'polyester',
 'elastane': 'spandex',
 'tencel_modal': 'tencel',
 'polyster': 'polyester',
 'polyamide': 'nylon',
 'metallic': 'metallic_fibers',
 'acetate': 'rayon',
 'tencel': 'tencel',
 'sp': 'spandex',
 'lyocell': 'rayon',
 'wool': 'wool',
 'metallic_fibers': 'metallic_fibers',
 'modal': 'tencel',
 'polyamine': 'polyester',
 'polyamide-nylon': 'polyester',
 'elastance': 'spandex',
 'ramie': 'linen',
 'lurex': 'metallic_fibers',
 'poly': 'polyester',
 'model': 'tencel',
 'cupro': 'cotton',
 'bamboo': 'polyester',
 'acrylic': 'polyester',
 'lycra': 'spandex',
 'triacetate': 'cellulose',
 'leather': 'leather',
 'elasthane': 'spandex',
 'polyester.': 'polyester',
 'filming': 'nylon',
 'hemp': 'linen',
 'polyethylene': 'polyester',
 'denim': 'cotton',
 'cashmere': 'cashmere',
 'woven': 'wool',
 'merino': 'wo

In [903]:
for i in range(df.shape[0]):
    res = " ".join(dict_of_fabrics.get(ele, ele) for ele in df["Fabric"][i])
    df.at[i,"Fabric"] = str(res)

In [904]:
for i in range(df.shape[0]):
    df.at[i,"Fabric"] = df["Fabric"][i].split()

In [905]:
# Create a column for each fabric type in the dictionary. 
fabrics_columns = list(set(edited_fabrics))
for fabric in fabrics_columns:
    df[fabric] = 0
    df[fabric] = df[fabric].astype('int')

In [906]:
# Each Fabric list alternates percentage and fabric name; write each percentage into its fabric column
for i in range(df.shape[0]):
    tokens = df["Fabric"][i]
    if len(tokens) in (2, 4, 6):
        for pct, name in zip(tokens[0::2], tokens[1::2]):
            df.at[i, name] = pct
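
The token handling in the cells above (keep at most six tokens, translate fabric names, then write percentages into columns) can be summarised in one function. This is a sketch under assumptions: `TRANSLATE` is a small hypothetical subset of the table loaded from `fabrics.csv`, and `fabric_columns` additionally skips tokens whose percentage is not numeric, which the notebook does not do.

```python
# Hypothetical subset of the translation table
TRANSLATE = {"viscose": "rayon", "elastane": "spandex", "cotton": "cotton", "nylon": "nylon"}

def fabric_columns(tokens, translate=TRANSLATE, max_fabrics=3):
    """Turn ['55', 'cotton', '45', 'viscose'] into {'cotton': 55, 'rayon': 45}."""
    tokens = tokens[: 2 * max_fabrics]
    if len(tokens) % 2:
        # Odd length: a percentage or a name is missing, so nothing is assigned
        return {}
    columns = {}
    for pct, name in zip(tokens[0::2], tokens[1::2]):
        if pct.isdigit():
            columns[translate.get(name, name)] = int(pct)
    return columns

print(fabric_columns(["55", "cotton", "45", "viscose"]))
print(fabric_columns(["100", "cotton", "lining"]))
```

Fabrics that translate to the same column (for example viscose and rayon) overwrite each other, so only the last percentage is kept.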

In [907]:
df["Length"] = "unknown"
df["Length"] = df["Length"].astype('object')
for i in range(df.shape[0]):
    s = df["Product_details"][i]
    result = re.findall(r'\d+', s)
    if len(result)>0:
        df.at[i, 'Length'] = int(result[0])

In [908]:
df['Brand'] = df['Brand'].str.lower()

In [909]:
#Fill in values without product descriptions by hand. 
for i in range(df.shape[0]):
    if "victor alfaro collective" in df["Brand"][i] and "tea length shirtdress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "Shirtdress. three_quarter_sleeves. shirt_collar_neckline"
    if "nicole miller" in df["Brand"][i] and "velvet mini dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "sheath. high neckline. long sleeves"
    if "adam lippes collective" in df["Brand"][i] and "birds of prey dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "hourglass.crew neckline. short sleeves"
    if "area stars" in df["Brand"][i] and "lara leopard print dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "maxi.v-neckline. short sleeves"
    if "tibi" in df["Brand"][i] and "rolled sleeve shirtdress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "shirt dress. collared neckline.  short sleeves" 
    if "jonathan simkhai" in df["Brand"][i] and "lucy cutout midi dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "sheath. straight neckline. sleeveless"   
    if "allsaints" in df["Brand"][i] and "juela dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "shift dress. mock neck. long sleeves."   
    if "allsaints" in df["Brand"][i] and "rosetta tinsel dress" in df["Dress_Description"][i]:       
        df.at[i, "Product_details"] = "maxi. crew neck. long sleeves."     
    if "tibi" in df["Brand"][i] and "serpentine tank dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = " Sheath. sleeveless. crew neck."
    if "marchesa notte" in df["Brand"][i] and "off the shoulder corset gown" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = " gown. sleeveless. off_shoulder. " 
    if "fifteen twenty" in df["Brand"][i] and "green handkerchief hem dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "Hourglass. sleeveless. Shift. V-neckline."
    if "elliatt" in df["Brand"][i] and "paige sequin mesh dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "sheath. sleeveless. Square neckline."
    if "badgley mischka" in df["Brand"][i] and "butterfly belted gown" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "gown. sleeveless. v-neck."
    if "rejina pyo" in df["Brand"][i] and "astrid trench dress" in df["Dress_Description"][i]:
         df.at[i, "Product_details"] = "wrap. sleeveless. collared neckline."
    if "aidan aidan mattox" in df["Brand"][i] and "floral print satin midi dress" in df["Dress_Description"][i]: 
        df.at[i, "Product_details"] = "sleeveless. hourglass. square neck."
    if "hugo boss" in df["Brand"][i] and "dalissa dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "short_sleeves. hourglass.  v-neckline."
    if "allsaints" in df["Brand"][i] and "lilliana kettu dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "hourglass. short_sleeves. Hourglass. v-neckline."
    if "shoshanna" in df["Brand"][i] and "catalaya sequin dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Sheath. sleeveless. off the shoulder."
    if "bardot" in df["Brand"][i] and "beckett sequin dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Sheath. sleeveless. scoop neck."
    if "officine générale" in df["Brand"][i] and "camo bonnie dress"in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Shirtdress. Long sleeves. Shirt collar neckline."
    if "badgley mischka" in df["Brand"][i] and "glitz gown" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "gown. V-neckline. sleeveless."
    if "badgley mischka" in df["Brand"][i] and "midnight glamour dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "hourglass. Sleeveless. V neckline."
    if "aijek" in df["Brand"][i] and "rylee lace toga dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Hourglass. One shoulder neckline. Sleeveless."
    if "mark & james by badgley mischka" in df["Brand"][i] and "dancing til daylight dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Hourglass. Sleeveless. V neckline."
    if "slate & willow" in df["Brand"][i] and "corey wrap dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "Wrap dress. Long sleeves. V-neckline."
    if "badgley mischka" in df["Brand"][i] and "miss mysterious gown" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "gown. sleeveless. one shoulder"
    if "badgley mischka" in df["Brand"][i] and "sequin garden dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "hourglass. sleeveless. V neckline"
    if "parker" in df["Brand"][i] and "clarisa dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "hourglass. sleeveless. One shoulder."
    if "badgley mischka" in df["Brand"][i] and "draped in gold dress" in df["Dress_Description"][i]:
        df.at[i, "Product_details"] = "hourglass. sleeveless. V neckline"
df['Product_details'] = df['Product_details'].str.lower()

In [910]:
#Break up the garment description into a list and call it "Details". 
df["Details"] = "unknown" 
df["Details"] = df["Details"].astype('object')
for i in range(df.shape[0]):
    x = df["Product_details"][i].split(". ")
    df.at[i, 'Details'] = x

In [911]:
df["Sleeves"] = "unknown"
df["Sleeves"] = df["Sleeves"].astype('object')

In [912]:
for i in range(df.shape[0]):
     for k in reversed(range(len(df["Details"][i]))):
        if "sleeveless" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "sleeveless"
        elif "quarter" in df["Details"][i][k] or "3/4" in df["Details"][i][k] or "¾" in df["Details"][i][k] or "elbow" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "three_quarter_sleeves"
        elif "short" in df["Details"][i][k] and "sleeve" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "short_sleeves"
        elif "long" in df["Details"][i][k] and "sleeve" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "long_sleeves"
        elif "cap" in df["Details"][i][k] and "sleeve" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "cap_sleeves"
        elif "flutter" in df["Details"][i][k] and "sleeve" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "cap_sleeves"
        elif "three" in df["Details"][i][k] and "sleeve" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "three_quarter_sleeves"  
        elif "strapless" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "strapless"  
        elif "halter" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "halter" 
        elif "convertible" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "sleeveless"
        elif "one shoulder" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "sleeveless"
        elif "white cotton with eyelet" in df["Details"][i][k]:
            df.at[i, "Sleeves"] = "sleeveless"
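
Because the loop above walks the details in reverse and overwrites on every match, the earliest matching segment of the description determines the final label. The sketch below expresses the same precedence as a forward scan that stops at the first match. `SLEEVE_RULES` is a hypothetical, shortened subset of the rules, and `classify_sleeves` is an illustrative name.

```python
# Ordered (predicate, label) pairs; order within the list mirrors the if/elif chain
SLEEVE_RULES = [
    (lambda d: "sleeveless" in d, "sleeveless"),
    (lambda d: any(t in d for t in ("quarter", "3/4", "¾", "elbow")), "three_quarter_sleeves"),
    (lambda d: "short" in d and "sleeve" in d, "short_sleeves"),
    (lambda d: "long" in d and "sleeve" in d, "long_sleeves"),
    (lambda d: ("cap" in d or "flutter" in d) and "sleeve" in d, "cap_sleeves"),
    (lambda d: "strapless" in d, "strapless"),
    (lambda d: "halter" in d, "halter"),
]

def classify_sleeves(details, rules=SLEEVE_RULES, default="unknown"):
    """Return the label of the first segment that matches any rule."""
    for segment in details:
        for matches, label in rules:
            if matches(segment):
                return label
    return default

print(classify_sleeves(["blue printed cotton", "hourglass", "sleeveless", "square neckline"]))
print(classify_sleeves(["sheath", "long sleeves", "crew neckline"]))
```

Keeping the rules in a list makes the precedence explicit and lets new keywords be added without extending the `elif` chain.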

In [913]:
df["Neckline"] = "unknown"
df["Neckline"] = df["Neckline"].astype('object')

In [914]:
for i in range(df.shape[0]):
     for k in reversed(range(len(df["Details"][i]))):
        if "neck" in df["Details"][i][k] and "v" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "v_neckline"
        elif "sweetheart" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "sweetheart_neckline"
        elif "scoop" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "scoop_neckline"
        elif "turtle" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "turtleneck"  
        elif "mock" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "turtleneck" 
        elif "surplice" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "v_neckline" 
        elif "straight" in df["Details"][i][k] and "neck" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "straight_neckline"    
        elif "strapless" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "straight_neckline"    
        elif "crew" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "crew_neckline"
        elif "halter" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "halter"
        elif "high" in df["Details"][i][k] and "neck" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "high_neckline"
        elif "asymmetric" in df["Details"][i][k] and "neck" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "assymetric_neckline"
        elif "one" in df["Details"][i][k] and "shoulder" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "one_shoulder"
        elif "boat" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "boat_neckline"
        elif "cowl" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "cowl_neckline" 
        elif "square" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "square_neckline" 
        elif "shirt" in df["Details"][i][k] or "collar" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "shirt_collar_neckline" 
        elif "off" in df["Details"][i][k] and "shoulder" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "off_shoulder_neckline" 
        elif "plunge" in df["Details"][i][k]:
            df.at[i, "Neckline"] = "plunge_neckline"     
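
One caveat about the first rule above: testing for `"neck"` and `"v"` separately also matches any segment that mentions a neck and happens to contain the letter v elsewhere. How often this occurs in the data has not been checked here. The sketch below contrasts that loose test with a stricter regular expression; both function names are hypothetical.

```python
import re

# "v" at a word boundary, optionally followed by spaces or hyphens, then "neck"
V_NECK = re.compile(r"\bv[\s-]*neck")

def is_v_neck_loose(segment):
    """The test used in the loop above: both substrings appear anywhere."""
    return "neck" in segment and "v" in segment

def is_v_neck_strict(segment):
    """Require the v to be attached to the word neck."""
    return bool(V_NECK.search(segment))

sample = "scoop neckline with velvet trim"
print(is_v_neck_loose(sample), is_v_neck_strict(sample))
print(is_v_neck_strict("v-neckline"), is_v_neck_strict("v neckline"))
```

The stricter pattern still accepts the spellings that appear in the hand-written entries, such as `v-neckline`, `v neckline` and `v-neck`.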


In [915]:
Missing_Desc =  pd.read_csv('../Data/missing_description.csv')
for i in range(Missing_Desc.shape[0]):
    num = df[df['Dress_Description'] == Missing_Desc["dress"][i]].index[0]
    df.at[num, "Neckline"] = Missing_Desc["neck"][i]
    df.at[num, "Sleeves"] = Missing_Desc["sleeves"][i]

In [916]:
df.columns

Index(['Type_of_Customer', 'Size', 'Overall_fit', 'Rented_for',
       'Size_usually_worn', 'Height', 'Age', 'Bust_size', 'Body_type',
       'Weight', 'Rating', 'Date', 'Brand', 'Dress_Description',
       'Retail_price', 'Rent_price', 'Product_details', 'Number_of_reviews',
       'Band_Size', 'Cup_Size', 'BMI', 'Fabric', 'cotton', 'nylon',
       'cellulose', 'silk', 'spandex', 'tencel', 'rayon', 'linen', 'cashmere',
       'wool', 'leather', 'polyester', 'metallic_fibers', 'Length', 'Details',
       'Sleeves', 'Neckline'],
      dtype='object')

In [917]:
for i in range(df.shape[0]):
    if isinstance(df["Length"][i],int):
        if df["Length"][i] <=35: 
            df.at[i,"Length"] = "mini" 
        elif df["Length"][i] > 35 and df["Length"][i] <=45:
            df.at[i,"Length"] = "midi" 
        elif df["Length"][i]>45:
            df.at[i,"Length"] = "maxi"
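
The length extraction and bucketing can be read as a single rule: take the first number left in the product details (after the fabric parentheses were removed) and compare it with the 35 and 45 inch thresholds. A minimal sketch, with `length_category` as a hypothetical helper name:

```python
import re

def length_category(detail, mini_max=35, midi_max=45):
    """Map the first number in the text to mini, midi or maxi; 'unknown' if there is none."""
    match = re.search(r"\d+", detail)
    if match is None:
        return "unknown"
    inches = int(match.group())
    if inches <= mini_max:
        return "mini"
    if inches <= midi_max:
        return "midi"
    return "maxi"

print(length_category('hourglass. sleeveless. square neckline. 45" from shoulder to hemline.'))
print(length_category("gown. sleeveless. v-neck."))
```

The rule depends on the length being the first number in the text, so any other leading number would be misread as a length.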

In [918]:
missing_dress_lengths_df =  pd.read_csv('../Data/missing_dress_lengths_df.csv')
for i in range(missing_dress_lengths_df.shape[0]):
    num = df[df['Dress_Description'] == missing_dress_lengths_df["dresses"][i]].index[0]
    df.at[num, "Length"] = missing_dress_lengths_df["lengths"][i]

In [919]:
for fabric in fabrics_columns:
    df[fabric] = df[fabric].astype('int')

In [922]:
df.drop(columns=['Type_of_Customer', 'Size', 'Overall_fit', 'Rented_for','Size_usually_worn', 'Height', 'Age','Bust_size', 'Body_type', 'Product_details','Weight', 'Rating', 'Date','Band_Size', 'Cup_Size', 'BMI', 'Fabric',"Details"],inplace=True)

In [923]:
df.head()

| | Brand | Dress_Description | Retail_price | Rent_price | Number_of_reviews | cotton | nylon | cellulose | silk | spandex | tencel | rayon | linen | cashmere | wool | leather | polyester | metallic_fibers | Length | Sleeves | Neckline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tory burch | painted roses smocked dress | 478 | 70 | 33 | 69 | 27 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | midi | sleeveless | square_neckline |
| 1 | tanya taylor | printed claudia dress | 495 | 68 | 97 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | maxi | sleeveless | scoop_neckline |
| 2 | warm | wax poetic garden dress | 350 | 60 | 83 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mini | long_sleeves | crew_neckline |
| 3 | farm rio | macaw mix maxi | 250 | 36 | 142 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | maxi | short_sleeves | v_neckline |
| 4 | peter som collective | chambray midi dress | 395 | 33 | 59 | 55 | 0 | 0 | 0 | 0 | 0 | 45 | 0 | 0 | 0 | 0 | 0 | 0 | maxi | short_sleeves | shirt_collar_neckline |


In [924]:
#Export as a CSV file 
df.to_csv('../Data/df_clean_dresses.csv', header=True, index=False)

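### Summary of this notebook: 

The garment features were cleaned: typos in the product details were corrected, the fabric composition was split into one percentage column per fabric, and Length (mini, midi or maxi), Sleeves and Neckline were derived from the description. The cleaned garment data was exported to `df_clean_dresses.csv`. 

In the next notebook, EDA, some visualizations are presented to get a better idea of the feature distributions. 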