# Coffee Availability By Roaster - Web Scraping Project
## Using BeautifulSoup to gather information about coffees for sale from select roasters

The goal of this project is to build a DataFrame of information compiled from the websites of selected U.S. coffee roasters, so that their available coffees can be compared side by side.

Once the following info is in a pandas DataFrame, it will be possible to search and filter coffees based on preferred origin, price, etc. 

 - Information I hope to include:
     + Roaster Name
     + Coffee Name
     + Country/Countries of Origin
     + Description
     + Tasting Notes
     + Variety
     + Process
     + Size Options
     + Avg Price Per Pound or Ounce
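
As a rough illustration of the search-and-filter goal described above, here is a minimal sketch of how the finished DataFrame might be queried. The rows, the column names (`Origin`, `Price Per Ounce`), and the helper `filter_coffees` are all hypothetical placeholders, not scraped data or part of the project code.

```python
import pandas as pd

# Hypothetical rows shaped like the table this project aims to build;
# every value here is made up for illustration.
sample = pd.DataFrame([
    {"Roaster": "Roaster A", "Coffee Name": "Blend One",
     "Origin": "Colombia, Ethiopia", "Price Per Ounce": 1.50},
    {"Roaster": "Roaster B", "Coffee Name": "Single Two",
     "Origin": "Burundi", "Price Per Ounce": 2.10},
    {"Roaster": "Roaster A", "Coffee Name": "Single Three",
     "Origin": "Ethiopia", "Price Per Ounce": 1.90},
])

def filter_coffees(df, origin=None, max_price=None):
    """Return rows matching an origin substring and/or a price ceiling."""
    mask = pd.Series(True, index=df.index)
    if origin is not None:
        # Plain substring match; rows with a missing origin never match.
        mask &= df["Origin"].str.contains(origin, case=False, na=False, regex=False)
    if max_price is not None:
        mask &= df["Price Per Ounce"] <= max_price
    return df[mask]

ethiopian_under_2 = filter_coffees(sample, origin="Ethiopia", max_price=2.00)
print(ethiopian_under_2["Coffee Name"].tolist())
```

A substring match is used for origin because a blend can list several countries in one cell.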

In [183]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv

In [184]:
onyx_url = 'https://onyxcoffeelab.com/collections/coffee'
equator_url = 'https://www.equatorcoffees.com/collections/coffees'
ruby_url = 'https://rubycoffeeroasters.com/collections/coffee'

In [None]:
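def gather_coffee_links(url):
    """Return the product-page links found on a roaster's collection page."""
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

    coffee_link_list = []
    # href=True skips anchors without an href, which would otherwise raise a KeyError
    for link in soup.find_all('a', href=True):
        # Each href is appended to the collection URL as-is
        full_link = url + link['href']
        # Keep only the links that point at a product page
        if '/product' in full_link:
            coffee_link_list.append(full_link)

    return coffee_link_list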
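
The function above builds each link by string concatenation, which can produce odd URLs when an `href` is already absolute, and it can return the same product more than once. One possible alternative, sketched here under the assumption that product pages contain `/products/` in their path, is to resolve each `href` with `urllib.parse.urljoin` and skip duplicates. The helper name `extract_product_links` and the sample HTML are hypothetical; the sketch works on an HTML string so it can be tried without a network request.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def extract_product_links(html, base_url):
    """Return unique absolute product links, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    product_links = []
    for anchor in soup.find_all("a", href=True):
        # urljoin handles relative and absolute hrefs alike
        absolute = urljoin(base_url, anchor["href"])
        # Assumption: product pages live under a /products/ path
        if "/products/" in absolute and absolute not in seen:
            seen.add(absolute)
            product_links.append(absolute)
    return product_links

# Made-up markup standing in for a collection page
sample_html = """
<a href="/products/example-coffee">Example</a>
<a href="/products/example-coffee">Example again</a>
<a href="https://example.com/pages/about">About</a>
<a>No href</a>
"""
links = extract_product_links(sample_html, "https://example.com/collections/coffee")
print(links)
```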

In [186]:
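def get_coffee_attrs(url):
    """Scrape one product page into a dict of attribute name -> value."""
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    coffee_data = {}
    # .find() returns None when a tag is missing, so .text raises AttributeError
    try:
        name = soup.find('h1').text.strip()
    except AttributeError:
        name = None
    try:
        coffee_description = soup.find('div', {'class': 'main-blurb'}).getText().strip()
    except AttributeError:
        coffee_description = None
    # Each feature block holds a label/value pair, e.g. 'Origin:' and 'Colombia, Ethiopia'
    for div in soup.find_all('div', {'class': 'a-feature'}):
        label = div.find('div', {'class': 'label'})
        label_value = div.find('div', {'class': 'value'})
        # Skip feature blocks that are missing either half of the pair
        if label is not None and label_value is not None:
            coffee_data[label.text] = label_value.text
    coffee_data.update({'Name': name, 'Description': coffee_description})
    return coffee_data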
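
To make the label/value extraction easier to follow, here is a small offline sketch of the same idea applied to an HTML string. The markup is made up and only mirrors the class names used above (`a-feature`, `label`, `value`); the helper `parse_features` is hypothetical. Unlike the function above, it also strips the trailing colon from each label, which is one way to get cleaner column names.

```python
from bs4 import BeautifulSoup

# Made-up markup that mirrors the class names targeted above
sample_html = """
<h1> Example Coffee </h1>
<div class="main-blurb">A made-up description.</div>
<div class="a-feature"><div class="label">Origin:</div><div class="value">Colombia</div></div>
<div class="a-feature"><div class="label">Process:</div><div class="value">Washed</div></div>
<div class="a-feature"><div class="label">Broken</div></div>
"""

def parse_features(html):
    """Collect label/value pairs from every 'a-feature' block."""
    soup = BeautifulSoup(html, "html.parser")
    features = {}
    for div in soup.find_all("div", class_="a-feature"):
        label = div.find("div", class_="label")
        value = div.find("div", class_="value")
        # Skip blocks that lack a label or a value
        if label is None or value is None:
            continue
        key = label.get_text(strip=True).rstrip(":")
        features[key] = value.get_text(strip=True)
    return features

features = parse_features(sample_html)
print(features)
```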

In [187]:
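# Collect the product links for Onyx, then scrape each product page
onyx_links = gather_coffee_links(onyx_url)

coffee_data = []
for link in onyx_links:
    coffee_data.append(get_coffee_attrs(link))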

In [188]:
df_coffee_info = pd.DataFrame(coffee_data)
df_coffee_info.head(25)

| | Name | Description | Origin: | Process: | Elevation: | Cup: | Variety: |
|---|---|---|---|---|---|---|---|
| 0 | 404 | | | | | | |
| 1 | Finest coffee in the worldevery month for the ... | | | | | | |
| 2 | | | | | | | |
| 3 | Southern Weather | Southern Weather embodies everything we love a... | Colombia, Ethiopia | Washed | 1850 | Milk Chocolate, Plum, Candied Walnuts, Juicy &... | |
| 4 | Geometry | Geometry has been defined as "describing space... | Colombia, Ethiopia | Washed | 1950 - 2100 | Berries, Stone Fruit, Earl Grey, Honeysuckle, ... | |
| 5 | Monarch | Monarch is our most developed roast that conve... | | Washed, Natural | 1800 | Dark Chocolate, Molasses, Red Wine, Dried Berr... | Colombia, Ethiopia |
| 6 | Tropical Weather | Tropical Weather is a seasonal blend that cele... | Ethiopia | Natural, Washed | 1900 | Mixed Berries, Sweet Tea, Raw Honey, Plum | |
| 7 | Power Nap | OK, so you need a quick burst of energy, but y... | | Washed, Raised-Bed Dreid | 1950 - 2000 | Brown Sugar, Cocoa, Silky, Floral, Peach | Colombia, Ethiopia |
| 8 | Cold Brew | This coffee is intentionally sourced and roast... | Colombia, Ethiopia | Washed, Patio Dried | 1850 | Cocoa, Dates, Brown Sugar, Stone Fruit, Creamy | |
| 9 | Silverstein | Prepare to embark on a sensory journey that ha... | | Honey, Patio Dried | 1450 | Apple Cider, Cherry, Cacao Nib, Hibiscus | Catuai, Caturra |


In [189]:
df_coffee_info = df_coffee_info.drop([0,1,2], axis=0).reset_index(drop = True)
df_coffee_info

| | Name | Description | Origin: | Process: | Elevation: | Cup: | Variety: |
|---|---|---|---|---|---|---|---|
| 0 | Southern Weather | Southern Weather embodies everything we love a... | Colombia, Ethiopia | Washed | 1850 | Milk Chocolate, Plum, Candied Walnuts, Juicy &... | |
| 1 | Geometry | Geometry has been defined as "describing space... | Colombia, Ethiopia | Washed | 1950 - 2100 | Berries, Stone Fruit, Earl Grey, Honeysuckle, ... | |
| 2 | Monarch | Monarch is our most developed roast that conve... | | Washed, Natural | 1800 | Dark Chocolate, Molasses, Red Wine, Dried Berr... | Colombia, Ethiopia |
| 3 | Tropical Weather | Tropical Weather is a seasonal blend that cele... | Ethiopia | Natural, Washed | 1900 | Mixed Berries, Sweet Tea, Raw Honey, Plum | |
| 4 | Power Nap | OK, so you need a quick burst of energy, but y... | | Washed, Raised-Bed Dreid | 1950 - 2000 | Brown Sugar, Cocoa, Silky, Floral, Peach | Colombia, Ethiopia |
| 5 | Cold Brew | This coffee is intentionally sourced and roast... | Colombia, Ethiopia | Washed, Patio Dried | 1850 | Cocoa, Dates, Brown Sugar, Stone Fruit, Creamy | |
| 6 | Silverstein | Prepare to embark on a sensory journey that ha... | | Honey, Patio Dried | 1450 | Apple Cider, Cherry, Cacao Nib, Hibiscus | Catuai, Caturra |
| 7 | Colombia El Vergel Java Koji | This is the coffee that has sparked debates, n... | | Koji Inoculated Natural, Raised Bed Dried | 1550 | Raspberry, Watermelon Candy, Winey, Mango | Java |
| 8 | Colombia Wilder Lasso Citric Gesha | This silky and refined Gesha gets its citric p... | | Washed, Raised-Bed Dried | 1900 MASL | Lemon, Black Tea, Orange Blossom, Honey | Gesha |
| 9 | Burundi Long Miles Gaharo Natural | This natural processed coffee comes to us from... | | Natural, Raised-Bed Dried | 1950 | Dried Cherry, Milk Chocolate, Nectarine, Black... | Red Bourbon |
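
Dropping rows 0 to 2 by position works for this particular scrape, but the positions may shift whenever the collection page changes. A possible alternative, sketched below on made-up rows, is to drop every row that has no `Description`, on the assumption that non-product pages (such as the 404 page) never yield one.

```python
import pandas as pd

# Made-up rows: two non-product pages followed by one real product
raw = pd.DataFrame([
    {"Name": "404", "Description": None, "Process:": None},
    {"Name": None, "Description": None, "Process:": None},
    {"Name": "Example Coffee", "Description": "A made-up description.",
     "Process:": "Washed"},
])

# Keep only rows where a description was actually scraped
clean = raw.dropna(subset=["Description"]).reset_index(drop=True)
print(clean)
```

Whether a missing description is a reliable signal depends on the roaster's page layout, so it is worth checking the dropped rows by eye at least once.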


In [190]:
df_coffee_info.insert(0, 'Roaster', 'Onyx')
df_coffee_info

| | Roaster | Name | Description | Origin: | Process: | Elevation: | Cup: | Variety: |
|---|---|---|---|---|---|---|---|---|
| 0 | Onyx | Southern Weather | Southern Weather embodies everything we love a... | Colombia, Ethiopia | Washed | 1850 | Milk Chocolate, Plum, Candied Walnuts, Juicy &... | |
| 1 | Onyx | Geometry | Geometry has been defined as "describing space... | Colombia, Ethiopia | Washed | 1950 - 2100 | Berries, Stone Fruit, Earl Grey, Honeysuckle, ... | |
| 2 | Onyx | Monarch | Monarch is our most developed roast that conve... | | Washed, Natural | 1800 | Dark Chocolate, Molasses, Red Wine, Dried Berr... | Colombia, Ethiopia |
| 3 | Onyx | Tropical Weather | Tropical Weather is a seasonal blend that cele... | Ethiopia | Natural, Washed | 1900 | Mixed Berries, Sweet Tea, Raw Honey, Plum | |
| 4 | Onyx | Power Nap | OK, so you need a quick burst of energy, but y... | | Washed, Raised-Bed Dreid | 1950 - 2000 | Brown Sugar, Cocoa, Silky, Floral, Peach | Colombia, Ethiopia |
| 5 | Onyx | Cold Brew | This coffee is intentionally sourced and roast... | Colombia, Ethiopia | Washed, Patio Dried | 1850 | Cocoa, Dates, Brown Sugar, Stone Fruit, Creamy | |
| 6 | Onyx | Silverstein | Prepare to embark on a sensory journey that ha... | | Honey, Patio Dried | 1450 | Apple Cider, Cherry, Cacao Nib, Hibiscus | Catuai, Caturra |
| 7 | Onyx | Colombia El Vergel Java Koji | This is the coffee that has sparked debates, n... | | Koji Inoculated Natural, Raised Bed Dried | 1550 | Raspberry, Watermelon Candy, Winey, Mango | Java |
| 8 | Onyx | Colombia Wilder Lasso Citric Gesha | This silky and refined Gesha gets its citric p... | | Washed, Raised-Bed Dried | 1900 MASL | Lemon, Black Tea, Orange Blossom, Honey | Gesha |
| 9 | Onyx | Burundi Long Miles Gaharo Natural | This natural processed coffee comes to us from... | | Natural, Raised-Bed Dried | 1950 | Dried Cherry, Milk Chocolate, Nectarine, Black... | Red Bourbon |
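
The scraped columns still carry trailing colons, and `Elevation:` mixes single values, ranges, and a unit suffix ("1850", "1950 - 2100", "1900 MASL"). Before filtering on these fields, some cleanup is likely needed. The sketch below shows one possible approach on a tiny stand-in DataFrame; the helper `elevation_bounds` and the `Elevation Min` / `Elevation Max` column names are hypothetical, and it assumes every number in the cell is an elevation in the same unit.

```python
import re
import pandas as pd

# Stand-in frame using the three elevation formats seen in the output above
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Elevation:": ["1850", "1950 - 2100", "1900 MASL"],
})

# Strip the trailing colon from every column name
df.columns = [col.rstrip(":") for col in df.columns]

def elevation_bounds(text):
    """Return (lowest, highest) integer found in the text, or (None, None)."""
    numbers = [int(n) for n in re.findall(r"\d+", str(text))]
    if not numbers:
        return (None, None)
    return (min(numbers), max(numbers))

bounds = df["Elevation"].map(elevation_bounds)
df["Elevation Min"] = bounds.map(lambda pair: pair[0])
df["Elevation Max"] = bounds.map(lambda pair: pair[1])
print(df)
```

Keeping both bounds avoids choosing between the low and high end of a range, so a filter such as "grown above 1900" can be written against either column.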
