### Colour Palettes
In this notebook we are going to scrape colour palettes from [`colorhunt`](https://colorhunt.co).

We start by installing `selenium` in the following code cell.

In [1]:
pip install selenium -q

Note: you may need to restart the kernel to use updated packages.


In the following code cell we are going to import all the packages that we are going to use in this notebook.

In [143]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import json  # needed later to save the scraped data
from bs4 import BeautifulSoup
import requests


In the following code cell we are going to initialize the `webdriver`.

In [9]:
driver = webdriver.Chrome()
driver

<selenium.webdriver.chrome.webdriver.WebDriver (session="242af63ac6bed336db9ffd892da2a449")>

We first collect the names of all the colour `palettes` categories from the website, so that we can later scrape the individual colour palettes of each category.

In [31]:
driver.get('https://colorhunt.co/palettes/pastel')
soup = BeautifulSoup(driver.page_source, 'html.parser')
tags = soup.find('div', {'class': 'tags'}).find_all("a", {"class": "tab"})
palettes = [tag.text for tag in tags]
palettes[:3]

['Pastel', 'Vintage', 'Retro']

The following function will get the `rgb` and `hex` values of every colour in each `palette` on the page.

In [125]:
def get_palettes(soup):
    # The feed holds every palette currently loaded on the page.
    feed = soup.find("div", {"class": "feed global"})
    # Keep only the items that carry a data-code attribute.
    items = [i for i in feed.find_all("div", {"class": "item"}) if i.has_attr("data-code")]
    pts = []
    for item in items:
        p = item.find("div", {"class": "palette"})
        palette = []
        for div in p.find_all("div"):
            # Take the value after the last ':' in the inline style and strip the rgb(...) wrapper.
            rgb = tuple(
                int(i) for i in div["style"].replace(";", '').split(":")[-1].replace("rgb", '').replace(')', '').replace('(', '').split(',')
            )
            hex_code = div.find("span").text  # renamed to avoid shadowing the built-in hex()
            palette.append({"rgb": rgb, "hex": hex_code})
        pts.append({
            "id": item["data-code"],
            "index": int(item["data-index"]),
            "colors": palette
        })
    return pts
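
The chain of `replace` calls used to extract the `rgb` values can be hard to read. As a minimal sketch of an alternative, the same parsing could be done with a regular expression. The helper names `parse_rgb` and `rgb_to_hex` and the sample style string are assumptions made for illustration; they are not part of the notebook or of the website.

```python
import re

# Matches an "rgb(r, g, b)" colour inside an inline style string.
RGB_PATTERN = re.compile(r"rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_rgb(style):
    """Return the (r, g, b) tuple found in an inline style, or None."""
    match = RGB_PATTERN.search(style)
    if match is None:
        return None
    return tuple(int(channel) for channel in match.groups())


def rgb_to_hex(rgb):
    """Format an (r, g, b) tuple as a lowercase '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Assumed sample value, shaped like the style the scraper expects.
style = "background-color: rgb(255, 240, 200);"
rgb = parse_rgb(style)
print(rgb, rgb_to_hex(rgb))  # (255, 240, 200) #fff0c8
```

Returning `None` for a style without an `rgb(...)` value lets the caller skip that `div` instead of failing on `int('')`.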

The following function returns a list of unique colour `palettes` that belong to a specific category.

In [197]:
def scrap_palettes(palette, pages=2):
    data = []
    driver.get(f'https://colorhunt.co/palettes/{palette}')
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "palette"))
    )
    for page in range(pages):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(10)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        plts = get_palettes(soup)
        data.extend(plts)
        print(f"Got {len(plts)} for page {page+1}/{pages}...")
    unique = {v['id']: v for v in data}.values()
    return list(unique)
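
Because the whole page source is parsed again after every scroll, each pass most likely returns the palettes from the earlier passes as well, which is why the last step removes duplicates by `id`. The sketch below shows that step on its own; the two snapshots are made-up data, not scraped results.

```python
# Hypothetical snapshots: each one re-reads the whole page, so the second
# snapshot repeats everything the first one already contained.
snapshot_1 = [{"id": "a1", "index": 0}, {"id": "b2", "index": 1}]
snapshot_2 = snapshot_1 + [{"id": "c3", "index": 2}]

data = []
for snapshot in (snapshot_1, snapshot_2):
    data.extend(snapshot)

# Keying by id keeps exactly one entry per palette. A dict remembers the
# position where each key was first inserted, so the order is preserved.
unique = list({item["id"]: item for item in data}.values())
print(len(data), len(unique))  # 5 3
```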

In the following code cell we scrape the palettes of every category and store them in a Python list.

In [199]:
data = []
for pe in palettes:
    print(f"Scrapping '{pe}' palettes...")
    pe = pe.lower()
    plts = scrap_palettes(pe, 10)
    data.append({pe: plts})
    print("\n\n")

Scrapping 'Pastel' palettes...
Got 80 for page 1/10...
Got 120 for page 2/10...
Got 160 for page 3/10...
Got 200 for page 4/10...
Got 240 for page 5/10...
Got 280 for page 6/10...
Got 320 for page 7/10...
Got 360 for page 8/10...
Got 400 for page 9/10...
Got 440 for page 10/10...



Scrapping 'Vintage' palettes...
Got 80 for page 1/10...
Got 120 for page 2/10...
Got 160 for page 3/10...
Got 200 for page 4/10...
Got 240 for page 5/10...
Got 280 for page 6/10...
Got 320 for page 7/10...
Got 360 for page 8/10...
Got 400 for page 9/10...
Got 440 for page 10/10...



Scrapping 'Retro' palettes...
Got 80 for page 1/10...
Got 120 for page 2/10...
Got 160 for page 3/10...
Got 200 for page 4/10...
Got 240 for page 5/10...
Got 280 for page 6/10...
Got 320 for page 7/10...
Got 360 for page 8/10...
Got 400 for page 9/10...
Got 412 for page 10/10...



Scrapping 'Neon' palettes...
Got 80 for page 1/10...
Got 120 for page 2/10...
Got 160 for page 3/10...
Got 200 for page 4/10...
Got 240 for page 5/1

In the following code cell we are going to save the scraped colours in a `json` file.

In [201]:
with open('colorhunt.json', 'w') as f:
    json.dump(data, f, indent=2)

print("Done!")

Done!
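
The saved data is a list of one-key dictionaries, one per category. As a rough sketch of how it could be read back, the example below round-trips a small made-up sample through JSON and merges it into a single dictionary keyed by category name. The sample values and the name `by_category` are assumptions for illustration. Note that JSON has no tuple type, so the `rgb` tuples come back as lists.

```python
import json

# Hypothetical miniature of the scraped structure: a list of one-key dicts.
data = [
    {"pastel": [{"id": "a1", "index": 0,
                 "colors": [{"rgb": (255, 240, 200), "hex": "#fff0c8"}]}]},
    {"retro": []},
]

# Round-trip through JSON text, as writing and re-reading the file would do.
restored = json.loads(json.dumps(data, indent=2))

# Merge the one-key dicts into a single mapping: category -> palettes.
by_category = {
    name: palettes
    for entry in restored
    for name, palettes in entry.items()
}

first_rgb = by_category["pastel"][0]["colors"][0]["rgb"]
print(sorted(by_category), first_rgb)  # ['pastel', 'retro'] [255, 240, 200]
```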
