# How to choose gifts from Sephora if you know nothing about makeup?
Girls love makeup, and they also like surprises. So, naturally, they sometimes want their boyfriends (or guy friends in general) to give them makeup as a surprise gift. Unfortunately, guys usually know very little about makeup, if anything at all, so choosing the right products can be a real challenge. My girlfriend loves makeup too, and I am one of those guys who knows nothing about it. So, to save my future self some "trouble" when the time comes, I developed a method that automatically picks the best makeup products within a given budget.

# Overview
In this project, I scraped the rating, number of reviews, and price of each newly arrived makeup product on the Sephora website (her favorite brand). I then filtered out items with low ratings, few reviews, or prices above my budget. After that, I built a table of all two-product combinations whose total price is below my budget. My final pick is the combination whose total comes closest to the budget.

In [2]:
import time
import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import pandas as pd

In [8]:
# Web scraper for an infinitely scrolling page
# Note: Selenium 4 deprecates executable_path in favor of passing a Service object
driver = webdriver.Chrome(executable_path=r"E:\Chromedriver\chromedriver_win32\chromedriver.exe")
# Makeup
driver.get("https://www.sephora.com/ca/en/beauty/new-makeup")
time.sleep(5)
scroll_pause_time = 2
screen_height = driver.execute_script("return window.screen.height;")   # get the screen height of the web
i = 1
while True:
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))  # scroll one screen height each time
    i += 1
    time.sleep(scroll_pause_time)
    scroll_height = driver.execute_script("return document.body.scrollHeight;")  # update scroll height each time after scrolled
    if (screen_height) * i > scroll_height:
        break
urls = []
base = "https://www.sephora.com/"
soup = BeautifulSoup(driver.page_source, "html.parser")
for parent in soup.find_all(class_="css-1s223mm"):
    a_tag = parent.find("a", class_="css-ix8km1")
    if a_tag is None or not a_tag.get("href"):
        continue  # skip tiles that carry no product link
    urls.append(urljoin(base, a_tag["href"]))
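
As a small aside on the `urljoin` call above: it resolves a relative `href` against the base and leaves an already absolute one unchanged, so the loop works whichever form the page uses. A minimal sketch, assuming a made-up product path (the `example-item-P000000` slug is hypothetical, not a real Sephora page):

```python
from urllib.parse import urljoin

base = "https://www.sephora.com/"

# Hypothetical hrefs, for illustration only
relative_href = "/ca/en/product/example-item-P000000"
absolute_href = "https://www.sephora.com/ca/en/product/example-item-P000000"

# A relative href is resolved against the base URL
resolved_relative = urljoin(base, relative_href)
# An absolute href is returned as-is; the base is ignored
resolved_absolute = urljoin(base, absolute_href)

print(resolved_relative)
print(resolved_relative == resolved_absolute)
```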

In [10]:
makeup = {'name':[], 'rating_stars':[], 'num_reviews':[], 'price_dollar':[]}
for url in urls:
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, "html.parser")    
    name = soup.find("span", class_="css-0").text
    rating = soup.find(class_ = "css-ychh9y").attrs['aria-label']
    reviews = soup.find("span", class_="css-2rg6q7").text
    price = soup.select(".css-slwsq8 span")[0].text
    # Store each element in dictionary
    makeup['name'].append(name)
    makeup['rating_stars'].append(rating)
    makeup['num_reviews'].append(reviews)
    makeup['price_dollar'].append(price)

In [11]:
makeup = pd.DataFrame(makeup)
makeup

| | name | rating_stars | num_reviews | price_dollar |
| --- | --- | --- | --- | --- |
| 0 | Hydrating Fruity Face Set | 4.5 stars | 21 reviews | $51.00 |
| 1 | Lip Maestro Midi Set | No stars | 0 reviews | $68.00 |
| 2 | NightLite Bronzing Powder | 4.5 stars | 9 reviews | $45.00 |
| 3 | Rose Petal-Soft Deep Hydration Lip Balm | 4.5 stars | 268 reviews | $32.00 |
| 4 | Go Big or Go Home Mascara Set | 4 stars | 2 reviews | $32.00 |
| ... | ... | ... | ... | ... |
| 130 | PRO Brow Brush #20 | 4.5 stars | 243 reviews | $24.00 |
| 131 | PRO Lip Brush #85 | 4.5 stars | 58 reviews | $21.00 |
| 132 | PRO Concealer Brush #71 | 4.5 stars | 85 reviews | $30.00 |
| 133 | Amor Eterno Illuminacion Highlighter | 3 stars | 22 reviews | $51.00 |


My web scraper collected information for 135 items. Now I need to filter out the items with low ratings, few reviews, or prices higher than my budget ($100 before tax).

In [13]:
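makeup2 = makeup.copy()
# "4.5 stars" -> 4.5, "1 star" -> 1.0, "No stars" -> 0.0
makeup2['rating_stars'] = makeup2['rating_stars'].str.replace(r' stars?', '', regex=True).str.replace('No', '0', regex=False).astype(float)
# "21 reviews" -> 21.0; a trailing "K" means thousands, so "1.2K" becomes 1200.0 rather than 1.2
reviews = makeup2['num_reviews'].str.replace(r' reviews?', '', regex=True)
makeup2['num_reviews'] = reviews.str.rstrip('K').astype(float) * reviews.str.endswith('K').map({True: 1000, False: 1})
# regex=False so that "$" is treated as a literal character, not a regex anchor
makeup2['price_dollar'] = makeup2['price_dollar'].str.replace('$', '', regex=False).astype(float)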
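
The string cleanup in the cell above can also be illustrated without pandas, which makes the edge cases easier to see. The sketch below is only an illustration: the helper names `parse_rating`, `parse_review_count`, and `parse_price` are hypothetical, and the abbreviated form "1.2K reviews" is an assumption about how large counts might be displayed.

```python
import re

def parse_rating(label: str) -> float:
    """'4.5 stars' -> 4.5, '1 star' -> 1.0, anything else (e.g. 'No stars') -> 0.0"""
    match = re.match(r"(\d+(?:\.\d+)?) stars?$", label.strip())
    return float(match.group(1)) if match else 0.0

def parse_review_count(label: str) -> float:
    """'21 reviews' -> 21.0, '1 review' -> 1.0; a 'K' suffix multiplies by 1000."""
    match = re.match(r"(\d+(?:\.\d+)?)(K?) reviews?$", label.strip())
    if not match:
        raise ValueError(f"unrecognized review label: {label!r}")
    number, thousands = match.groups()
    return float(number) * (1000 if thousands else 1)

def parse_price(label: str) -> float:
    """'$51.00' -> 51.0"""
    return float(label.strip().lstrip("$"))

print(parse_rating("4.5 stars"), parse_review_count("268 reviews"), parse_price("$32.00"))
```

Multiplying by 1000 rather than appending zeros matters only for fractional abbreviations: "2K" is 2000 either way, but "1.2K" with three zeros appended would parse as 1.2.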

In [14]:
# Filter: rating_stars above 4, num_reviews above 20, price_dollar below 100
min_rating = 4
min_num_reviews = 20
budget = 100
makeup2 = makeup2.loc[(makeup2['rating_stars']>min_rating) & (makeup2['num_reviews']>min_num_reviews) & (makeup2['price_dollar']<budget)].reset_index(drop=True)
makeup2

| | name | rating_stars | num_reviews | price_dollar |
| --- | --- | --- | --- | --- |
| 0 | Hydrating Fruity Face Set | 4.5 | 21.0 | 51.0 |
| 1 | Rose Petal-Soft Deep Hydration Lip Balm | 4.5 | 268.0 | 32.0 |
| 2 | Veil™ Setting Spray | 4.5 | 210.0 | 62.0 |
| 3 | Naked In Neon Nail Polish set | 5.0 | 25.0 | 25.0 |
| 4 | Tatouage Couture Velvet Cream Liquid Lipstick | 4.5 | 72.0 | 49.0 |
| 5 | Lash & Go- Eyeliner and Lash Adhesive Hybrid | 4.5 | 192.0 | 31.0 |
| 6 | Melatonin Overnight Lip Mask | 4.5 | 53.0 | 29.0 |
| 7 | Ignite Liquified Light Face & Body Highlighter | 4.5 | 363.0 | 50.0 |
| 8 | Mini Bloom Blush | 5.0 | 25.0 | 26.0 |
| 9 | Superhero No-Tug Gel Eyeliner | 4.5 | 148.0 | 29.0 |
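
One detail of the filter worth spelling out: all three comparisons are strict, so an item rated exactly 4 stars, or with exactly 20 reviews, is dropped. A minimal plain-Python sketch of the same rule, using a few rows copied from the raw scrape (with the text already converted to numbers):

```python
# (name, rating_stars, num_reviews, price_dollar), taken from the scraped table
rows = [
    ("Hydrating Fruity Face Set", 4.5, 21, 51.0),
    ("Lip Maestro Midi Set", 0.0, 0, 68.0),
    ("NightLite Bronzing Powder", 4.5, 9, 45.0),
    ("Go Big or Go Home Mascara Set", 4.0, 2, 32.0),
    ("PRO Brow Brush #20", 4.5, 243, 24.0),
]

min_rating, min_num_reviews, budget = 4, 20, 100

# Same three strict conditions as the pandas filter
kept = [
    name
    for name, rating, reviews, price in rows
    if rating > min_rating and reviews > min_num_reviews and price < budget
]
print(kept)
```

Here the mascara set fails on both rating (4.0 is not above 4) and review count. Switching to `>=` would be the change to make if 4-star items should stay in.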


Only 28 items satisfy my criteria. I know that my girlfriend does not need any new brushes, so I filter those out.

In [21]:
makeup2 = makeup2.loc[~makeup2['name'].str.contains("Brush")].reset_index(drop=True)
makeup2

| | name | rating_stars | num_reviews | price_dollar |
| --- | --- | --- | --- | --- |
| 0 | Hydrating Fruity Face Set | 4.5 | 21.0 | 51.0 |
| 1 | Rose Petal-Soft Deep Hydration Lip Balm | 4.5 | 268.0 | 32.0 |
| 2 | Veil™ Setting Spray | 4.5 | 210.0 | 62.0 |
| 3 | Naked In Neon Nail Polish set | 5.0 | 25.0 | 25.0 |
| 4 | Tatouage Couture Velvet Cream Liquid Lipstick | 4.5 | 72.0 | 49.0 |
| 5 | Lash & Go- Eyeliner and Lash Adhesive Hybrid | 4.5 | 192.0 | 31.0 |
| 6 | Melatonin Overnight Lip Mask | 4.5 | 53.0 | 29.0 |
| 7 | Ignite Liquified Light Face & Body Highlighter | 4.5 | 363.0 | 50.0 |
| 8 | Mini Bloom Blush | 5.0 | 25.0 | 26.0 |
| 9 | Superhero No-Tug Gel Eyeliner | 4.5 | 148.0 | 29.0 |


Only 16 items are left in my list. The most expensive ones are "Rust Eyeshadow Palette" and "Amor Eterno Eyeshadow Palette", both of which cost 76 dollars. That still leaves some room below my budget of 100 dollars, so I want to know whether a combination of two products can make better use of it.

In [36]:
makeup_two_combinations = {'combination':[], 'total_price_dollar':[]}

for i in range(0, makeup2.shape[0]):
    # Pair item i with every item that comes after it
    total_price_dollar = makeup2.loc[i, 'price_dollar'] + makeup2.loc[(i+1):, 'price_dollar']
    combination = makeup2.loc[i, 'name'] + ' & ' + makeup2.loc[(i+1):, 'name']
    # Iterate over the Series' own index so the last item is paired as well
    # (range(i+1, len(total_price_dollar)+i) stops one row short)
    for k in total_price_dollar.index:
        makeup_two_combinations['total_price_dollar'].append(total_price_dollar[k])
        makeup_two_combinations['combination'].append(combination[k])
makeup_two_combinations = pd.DataFrame(makeup_two_combinations)
makeup_two_combinations = makeup_two_combinations.loc[makeup_two_combinations['total_price_dollar']<budget].reset_index(drop=True)
makeup_two_combinations.sort_values('total_price_dollar', inplace=True, ascending=False)
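
The same pairing step can be sketched with `itertools.combinations`, which yields every unordered pair exactly once and avoids manual index arithmetic. This is an alternative illustration rather than the code used above; the helper name `pairs_under_budget` is hypothetical, and the items are just the first five rows of the filtered table.

```python
from itertools import combinations

# (name, price_dollar) for the first five rows of the filtered table
items = [
    ("Hydrating Fruity Face Set", 51.0),
    ("Rose Petal-Soft Deep Hydration Lip Balm", 32.0),
    ("Veil™ Setting Spray", 62.0),
    ("Naked In Neon Nail Polish set", 25.0),
    ("Tatouage Couture Velvet Cream Liquid Lipstick", 49.0),
]
budget = 100

def pairs_under_budget(items, budget):
    """Return (total, 'A & B') for every pair strictly below budget, priciest first."""
    pairs = [
        (price_a + price_b, f"{name_a} & {name_b}")
        for (name_a, price_a), (name_b, price_b) in combinations(items, 2)
        if price_a + price_b < budget
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

ranked = pairs_under_budget(items, budget)
for total, names in ranked[:3]:
    print(f"{names}: ${total:.2f}")
```

With n items there are n(n-1)/2 pairs, so 16 items give only 120 candidates and a brute-force search is more than fast enough.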

In [38]:
for i in range(5):
    print("Combination %d: %s, the total price is %d" % (i + 1, makeup_two_combinations.iloc[i, 0], makeup_two_combinations.iloc[i, 1]))

Combination 1: Lash & Go- Eyeliner and Lash Adhesive Hybrid  & The Liquid Silk Canvas: Featherweight Protective Primer, the total price is 99
Combination 2: Dewy Blush & Most Wanted Lip Color Lipstick Collection, the total price is 99
Combination 3: Mini Bloom Blush & Most Wanted Lip Color Lipstick Collection, the total price is 99
Combination 4: Tatouage Couture Velvet Cream Liquid Lipstick & Ignite Liquified Light Face & Body Highlighter, the total price is 99
Combination 5: Naked In Neon Nail Polish set & Most Wanted Lip Color Lipstick Collection, the total price is 98


Here it is! Now I don't need to go through hundreds of items that I know nothing about. I only need to check the few pairs at the top of my list!