# Website Scraping
### [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) vs [APIs](https://www.ibm.com/cloud/learn/api)
___
Website scraping means extracting data from websites, yet it is still done manually too often. Automating it speeds up the workflow, adds convenience, and reduces both costs and human error. There are several ways to automate it; two of the most common are the BeautifulSoup Python module and APIs.

Below I have set up two separate environments, one for BeautifulSoup and another for the CoinGecko API. Each retrieves the price in USD of Bitcoin, Ethereum and Dogecoin from CoinGecko.

###### Scraping CoinGecko with BeautifulSoup

In [14]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime

In [15]:
page = requests.get('https://www.coingecko.com/en') # query and get url
soup = BeautifulSoup(page.content, 'html.parser') # parse html and store in 'soup'
now = datetime.datetime.now()

In [16]:
# BTC

BTC_title_box = soup.find('a', attrs={'class': 'd-lg-none font-bold', 'href': '/en/coins/bitcoin'}) # find the <a> tag with class 'd-lg-none font-bold' that links to the Bitcoin page
BTC_title = BTC_title_box.text.strip() # get the text inside the tag

BTC_price_box = soup.find('span', attrs={'class': 'no-wrap', 'data-price-btc': '1.0'}) # find the <span> that holds the Bitcoin price
BTC_price = BTC_price_box.text.strip() # .strip() removes leading and trailing whitespace

In [17]:
# ETH

ETH_title_box = soup.find('a', attrs={'class': 'd-lg-none font-bold', 'href': '/en/coins/ethereum'})
ETH_title = ETH_title_box.text.strip()

ETH_price_box = soup.find('span', attrs={'class': 'no-wrap', 'data-coin-symbol': 'eth'})
ETH_price = ETH_price_box.text.strip()

In [18]:
# Doge

DOGE_title_box = soup.find('a', attrs={'class': 'd-lg-none font-bold', 'href': '/en/coins/dogecoin'})
DOGE_title = DOGE_title_box.text.strip()

DOGE_price_box = soup.find('span', attrs={'class': 'no-wrap', 'data-coin-symbol': 'doge'})
DOGE_price = DOGE_price_box.text.strip()

In [32]:
# --- Using Pandas for data visualisation

dataset = {
    'Coin': [BTC_title, ETH_title, DOGE_title],
    '     Price (USD)': [BTC_price, ETH_price, DOGE_price]
}

woop = pd.DataFrame(dataset)

print("Here are the prices at " + str(now.strftime("%H:%M:%S on %A, %B the %dth, %Y")) + ":")
print("")
print(woop)

Here are the prices at 12:23:55 on Friday, November the 05th, 2021:

   Coin      Price (USD)
0   BTC          $61,457
1   ETH        $4,510.19
2  DOGE        $0.260496
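
The prices scraped above are display strings such as `$61,457`, so they cannot be used in arithmetic as they are. A minimal sketch of one way to convert them, assuming the values always use a `$` prefix and `,` as the thousands separator (the helper name `parse_usd` is hypothetical, not part of the notebook):

```python
from decimal import Decimal

def parse_usd(price_text):
    """Convert a display string such as '$4,510.19' into a Decimal."""
    # Assumption: '$' prefix and ',' thousands separators, as in the output above.
    cleaned = price_text.strip().replace('$', '').replace(',', '')
    return Decimal(cleaned)

# The strings printed by the BeautifulSoup cells above.
scraped = {'BTC': '$61,457', 'ETH': '$4,510.19', 'DOGE': '$0.260496'}

# Parse every scraped string into a number.
numeric = {coin: parse_usd(text) for coin, text in scraped.items()}

for coin, price in numeric.items():
    print(coin, price)
```

`Decimal` is used here instead of `float` so that the digits shown on the page are kept exactly; `float(cleaned)` would also work if small rounding differences are acceptable.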


BeautifulSoup can take some time to get used to, especially if you aren't familiar with HTML and CSS, since you have to navigate the page and find specific tags to get the data you need. Even once you are used to it, a lot of time is spent going back to the source code of the chosen website, which is annoying and time consuming. There are also a few steps needed before you can start writing the code for what you actually want: requesting the page, parsing its HTML, and storing the parsed result (in `soup`, for my code). The code isn't very pretty either. That being said, once you get the hang of it and know what to look for, BeautifulSoup is a fairly simple module to use.
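
The three coin cells above repeat the same two `soup.find()` calls, and each would raise an `AttributeError` if a tag were missing, because `soup.find()` returns `None` when nothing matches. The sketch below shows one possible way to fold the lookups into a helper. It runs against a small, made-up HTML snippet rather than the live page, and the names `SAMPLE_HTML`, `find_text` and `scrape_coin` are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real CoinGecko page.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a class="d-lg-none font-bold" href="/en/coins/bitcoin"> BTC </a></td>
    <td><span class="no-wrap" data-coin-symbol="btc"> $61,457 </span></td>
  </tr>
  <tr>
    <td><a class="d-lg-none font-bold" href="/en/coins/ethereum"> ETH </a></td>
    <td><span class="no-wrap" data-coin-symbol="eth"> $4,510.19 </span></td>
  </tr>
</table>
"""

def find_text(soup, tag, attrs):
    """Return the stripped text of the first matching tag, or None if absent."""
    element = soup.find(tag, attrs=attrs)
    return element.text.strip() if element is not None else None

def scrape_coin(soup, slug, symbol):
    """Look up the ticker and price of one coin using the same attributes as above."""
    title = find_text(soup, 'a', {'class': 'd-lg-none font-bold', 'href': '/en/coins/' + slug})
    price = find_text(soup, 'span', {'class': 'no-wrap', 'data-coin-symbol': symbol})
    return title, price

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Dogecoin is deliberately missing from the sample to show the None case.
coins = [('bitcoin', 'btc'), ('ethereum', 'eth'), ('dogecoin', 'doge')]
rows = [scrape_coin(soup, slug, symbol) for slug, symbol in coins]
print(rows)
```

Returning `None` for a missing tag keeps the loop running when the site changes its markup, at the cost of having to check for `None` before using the values.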

###### Scraping CoinGecko with the CoinGecko API

In [7]:
import requests
import pandas as pd
import datetime
from pycoingecko import CoinGeckoAPI # importing the CoinGecko API
cg = CoinGeckoAPI()
now = datetime.datetime.now()

In [8]:
# BTC

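BTC_USD = cg.get_price('bitcoin', 'usd') # returns a nested dict of the form {'bitcoin': {'usd': <price>}}
BTC = str(BTC_USD) # convert the dict to its string form
BTC_price = BTC.strip("{'bitcoin': {'usd': }}") # .strip() removes any of these characters from both ends, leaving only the number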

In [9]:
# ETH

ETH_USD = cg.get_price('ethereum', 'usd')
ETH = str(ETH_USD)
ETH_price = ETH.strip("{'ethereum': {'usd': }}")

In [11]:
# DOGE

DOGE_USD = cg.get_price('dogecoin', 'usd')
DOGE = str(DOGE_USD)
DOGE_price = DOGE.strip("{'dogecoin': {'usd': }}")

In [31]:
# Data visualisation

dataset = {
    'Coin': ['BTC', 'ETH', 'DOGE'],
    '     Price (USD)': [BTC_price, ETH_price, DOGE_price]
}

woopwoop = pd.DataFrame(dataset)

print("Here are the prices at " + str(now.strftime("%H:%M:%S on %A, %B the %dth, %Y") + ":"))
print("")
print(woopwoop)

Here are the prices at 12:23:55 on Friday, November the 05th, 2021:

   Coin      Price (USD)
0   BTC          $61,457
1   ETH        $4,510.19
2  DOGE        $0.260496
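
Both print statements hard-code `th` after the day number, which reads correctly for the 5th but would give "01th" or "22th" on other days. A small sketch of one way to pick the suffix from the day itself (the helpers `ordinal` and `format_timestamp` are hypothetical names, and the date below is the one from the output above):

```python
import datetime

def ordinal(day):
    """Return the day number with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 11 <= day % 100 <= 13:
        suffix = 'th'  # 11th, 12th and 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(day % 10, 'th')
    return str(day) + suffix

def format_timestamp(moment):
    """Format a datetime in the same style as the print statements above."""
    return (moment.strftime('%H:%M:%S on %A, %B the ')
            + ordinal(moment.day)
            + moment.strftime(', %Y'))

example = datetime.datetime(2021, 11, 5, 12, 23, 55)
print("Here are the prices at " + format_timestamp(example) + ":")
```

In the notebook, `format_timestamp(now)` could replace the `now.strftime(...)` expression in either print statement.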


For this code, I used the CoinGecko API. As each API is designed by its own provider, the
functions, utilities and limitations can vary significantly from one API to the next. For
example, a limitation of the CoinGecko API was how annoying and time consuming it was to
isolate the price of a coin (extracting BTC_price from BTC_USD with `str()` and `.strip()`).
An advantage was that I could get all the data I needed for a single coin in one fairly
short line of code, `cg.get_price()`. Also, there were no steps required before I could
start writing useful code. I personally chose to add one extra step, `cg = CoinGeckoAPI()`,
because it helped shorten later lines of code. Note that not all websites have public APIs,
in which case you would have to fall back on BeautifulSoup.
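
Since `cg.get_price()` returns a nested dictionary, the price can also be read by indexing it directly, which avoids the `str()` and `.strip()` round trip. The sketch below uses a hand-written sample dictionary in that shape, so it runs without a network call; the values are copied from the output above purely for illustration, and `extract_prices` is a hypothetical helper:

```python
# Sample data shaped like the dictionary cg.get_price() returns.
# Assumption: the values are illustrative, copied from the output above.
# Per the pycoingecko README, a live call for several coins would look like:
#   cg.get_price(ids=['bitcoin', 'ethereum', 'dogecoin'], vs_currencies='usd')
sample_response = {
    'bitcoin': {'usd': 61457},
    'ethereum': {'usd': 4510.19},
    'dogecoin': {'usd': 0.260496},
}

def extract_prices(response, currency='usd'):
    """Map each coin id to its price in the given currency."""
    return {coin: quote[currency] for coin, quote in response.items()}

prices = extract_prices(sample_response)

# A single price is just two dictionary lookups.
BTC_price = sample_response['bitcoin']['usd']

for coin, price in prices.items():
    print(coin, price)
```

Indexing keeps the price as a number rather than a string, so it can be compared or summed without further parsing.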