## Web Scraping with Beautiful Soup

A simple example of how a web scraper can be used to interact with a webpage.
This proof of concept (POC) draws on a mixture of articles:
- Python vs. R comparison: [article](https://www.dataquest.io/blog/python-vs-r/#:~:text=Python%20is%20more%20object%2Doriented%2C%20and%20R%20is%20more%20functional,approaches%20can%20work%20very%20well)
- Web scraping with Beautiful Soup examples: [article](https://www.jcchouinard.com/web-scraping-with-beautifulsoup-in-python/#:~:text=To%20parse%20a%20web%20page,downloading%20from%20the%20web%20browser.)

In [None]:
# Make an HTTP request with the requests library
import requests

In [None]:
# URL to retrieve data
url = "https://www.basketball-reference.com/boxscores/201506140GSW.html"
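
The box score URL appears to encode the game date (20150614), a single digit (0) and the home team's code (GSW). Assuming that pattern holds for other games (it is inferred from this one URL, not from any documentation), a hypothetical helper such as `boxscore_url` could build URLs for other dates:

```python
from datetime import date


def boxscore_url(game_date: date, home_team: str, game_number: int = 0) -> str:
    """Build a box score URL.

    Hypothetical helper: assumes the pattern YYYYMMDD + game digit + home team code,
    inferred from the single example URL in this notebook.
    """
    return (
        "https://www.basketball-reference.com/boxscores/"
        f"{game_date:%Y%m%d}{game_number}{home_team.upper()}.html"
    )


# Rebuilds the URL used in this notebook
print(boxscore_url(date(2015, 6, 14), "gsw"))
```

Only the 2015-06-14 GSW case is confirmed by the URL above, so it is worth opening any generated URL in a browser before scraping it.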

In [None]:
# Download the page; timeout (in seconds) stops the request from hanging indefinitely
response = requests.get(url, timeout=10)

In [None]:
# Confirm the request succeeded (200 means OK) and print the first 100 characters
print(f'Status code: {response.status_code}')
print(f'Text: {response.text[:100]}')
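
Printing the status code works for exploration; in a reusable script it is usually safer to fail loudly. A minimal sketch, using a hypothetical helper name `fetch_html`: it sets a timeout and calls `raise_for_status()`, which raises `requests.HTTPError` for 4xx/5xx responses. The optional `session` argument (for example a `requests.Session`) is an illustrative design choice that allows connections to be reused across requests.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and return its HTML text, raising for 4xx/5xx responses."""
    # Use the supplied Session if there is one, otherwise the requests module itself
    client = session if session is not None else requests
    # The timeout stops the call from hanging on an unresponsive server
    response = client.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.text
```

When the request succeeds, `fetch_html(url)` returns the same text as `response.text` above.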

In [None]:
# Parsing response with BeautifulSoup
from bs4 import BeautifulSoup

In [None]:
# Parse HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the <title> tag
soup.find('title')
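
`soup.find('title')` returns a `Tag`, so displaying it shows the markup as well as the text. The sketch below uses a small made-up HTML string (not the real page) to show the difference between the tag itself, `get_text()` and `.string`:

```python
from bs4 import BeautifulSoup

# A made-up snippet standing in for the downloaded page
html = "<html><head><title> Box Score </title></head><body><h1>Game 5</h1></body></html>"
demo = BeautifulSoup(html, "html.parser")

title_tag = demo.find("title")
print(title_tag)                       # the whole Tag, including <title>...</title>
print(title_tag.get_text(strip=True))  # only the text, surrounding whitespace removed
print(repr(demo.title.string))         # .string: the tag's single text child, whitespace kept
```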

In [None]:
# Find elements by tag name:
# find() returns the first match, find_all() returns a list of every match
title = soup.find('title')
h1 = soup.find('h1')
links = soup.find_all('a', href=True)

# Print outputs
print(f'Title: {title}')
print(f'h1: {h1}')
print(f'Example link: {links[1]["href"]}')
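
Two behaviours matter when extracting tags: `find()` returns `None` when nothing matches, and indexing into the list from `find_all()` (as `links[1]` does) raises `IndexError` if the page has too few matches. A sketch on a made-up snippet:

```python
from bs4 import BeautifulSoup

demo = BeautifulSoup(
    '<p><a href="/teams/GSW/2015.html">Warriors</a>'
    '<a name="top">no href</a>'
    '<a href="/teams/CLE/2015.html">Cavaliers</a></p>',
    "html.parser",
)

# find() returns None when no tag matches, so check before calling methods on it
missing = demo.find("h2")
print(missing is None)

# href=True keeps only <a> tags that actually carry an href attribute
hrefs = [a["href"] for a in demo.find_all("a", href=True)]

# Guard the index instead of assuming at least two links were found
second_href = hrefs[1] if len(hrefs) > 1 else None
print(second_href)
```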

In [None]:
# Number of links found
len(links)

In [None]:
# Show the first 10 links
links[:10]
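
Many `href` values on a page are relative (they start with `/` or `#`). A minimal sketch, with a hypothetical helper `absolute_links`, that resolves them against the page URL using the standard library's `urllib.parse.urljoin` and drops duplicates while keeping first-seen order:

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve relative hrefs against base_url and remove duplicates, preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base_url, href)  # "/path" and "#frag" become full URLs
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


base = "https://www.basketball-reference.com/boxscores/201506140GSW.html"
print(absolute_links(base, ["/teams/GSW/2015.html", "#content", "/teams/GSW/2015.html"]))
```

On the scraped page this would be called as `absolute_links(url, [a['href'] for a in links])`.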

In [None]:
# Find elements by ID
soup.find(id='header')
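
Besides `id`, elements can be matched by class or by any other attribute. Because `class` is a reserved word in Python, Beautiful Soup uses the `class_` keyword; `attrs={...}` handles attribute names that are not valid Python identifiers, such as `data-*` attributes. The HTML here is invented for illustration:

```python
from bs4 import BeautifulSoup

demo = BeautifulSoup(
    '<div id="header"><span class="score">12</span></div>'
    '<div class="scorebox"><span class="score">34</span></div>'
    '<table><tr><td data-stat="pts">34</td></tr></table>',
    "html.parser",
)

# By id: ids are meant to be unique, so find() is the natural call
header = demo.find(id="header")

# By class: class_ (with an underscore) because "class" is a Python keyword
scores = [span.get_text() for span in demo.find_all("span", class_="score")]

# By an arbitrary attribute whose name contains a hyphen
points = demo.find("td", attrs={"data-stat": "pts"})
```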

In [None]:
# List the distinct HTML tag names used on the page
set(tag.name for tag in soup.find_all())
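
A set only shows which tag names occur. `collections.Counter` also shows how often each one appears, which hints at where the data lives (many `tr` and `td` tags point to tables). On the real page the expression would be `Counter(tag.name for tag in soup.find_all())`; here it runs on a small made-up snippet:

```python
from collections import Counter

from bs4 import BeautifulSoup

demo = BeautifulSoup("<div><p>a</p><p>b</p><a href='#'>c</a></div>", "html.parser")

# find_all() with no arguments returns every tag; count them by name
tag_counts = Counter(tag.name for tag in demo.find_all())
print(tag_counts.most_common())
```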

In [None]:
# Select data from tables in webpage
tables = soup.find_all('table')
len(tables)
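
Before picking a table by id (as a later cell does with `box-CLE-q1-basic`), it can help to list the ids of every table that was parsed. `Tag.get()` returns `None` for tables without an `id` instead of raising `KeyError`. If an id visible in the browser's developer tools is missing from this list, the table may be added by JavaScript or sit inside an HTML comment, and in neither case does `html.parser` produce a `table` tag for it. A sketch on made-up HTML:

```python
from bs4 import BeautifulSoup

demo = BeautifulSoup(
    '<table id="box-A-game-basic"></table>'
    "<table></table>"
    '<table id="box-B-game-basic"></table>',
    "html.parser",
)

# One entry per table; None marks a table without an id attribute
table_ids = [t.get("id") for t in demo.find_all("table")]
print(table_ids)
```

On the real page the same list comprehension can be run over `tables`.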

In [None]:
# pandas, for turning scraped tables into DataFrames
import pandas as pd
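
`pandas` is imported here, but none of the following cells use it yet. One way to turn a parsed `<table>` into a DataFrame is sketched below. It assumes the first row holds the column headers and that every row has the same number of cells, which may not hold for tables with multi-level headers. `table_to_dataframe` is a hypothetical helper and the HTML is made up.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(table_tag):
    """Build a DataFrame from a <table>: first row = column names, other rows = data."""
    rows = table_tag.find_all("tr")
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    data = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(data, columns=header)


demo = BeautifulSoup(
    "<table><tr><th>Player</th><th>PTS</th></tr>"
    "<tr><th>Player A</th><td>10</td></tr>"
    "<tr><th>Player B</th><td>7</td></tr></table>",
    "html.parser",
)
df = table_to_dataframe(demo.find("table"))
print(df)
```

Cell values come back as strings, so numeric columns would need something like `pd.to_numeric`. `pandas.read_html` is an alternative that parses every table in an HTML string at once, but it requires an extra parser backend such as lxml to be installed.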

In [None]:
# Inspect the second table on the page (index 1)
tables[1]

In [None]:
# IPython/Jupyter help syntax: show the docstring of soup.select (CSS selector search)
soup.select?
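
`soup.select` searches with CSS selectors and always returns a list; `select_one` returns the first match or `None`. On the real page, `soup.select('table#box-CLE-q1-basic tr')` selects the same `<tr>` elements as the lookup in the next cell. Illustrated here on made-up HTML with a similar id:

```python
from bs4 import BeautifulSoup

demo = BeautifulSoup(
    '<table id="box-X-q1-basic"><tr><th>Player</th></tr><tr><td>A</td></tr></table>'
    '<table id="other"><tr><td>B</td></tr></table>',
    "html.parser",
)

# Every <tr> inside the table whose id is box-X-q1-basic
rows_css = demo.select("table#box-X-q1-basic tr")

# First <td> inside that table, or None if there is none
first_cell = demo.select_one("#box-X-q1-basic td")
print(len(rows_css), first_cell.get_text())
```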

In [None]:
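# Locate the table with id "box-CLE-q1-basic" (by its name, Cleveland's first-quarter
# basic box score) and collect its rows (<tr> elements); use an empty list if it is missing
box_table = soup.find('table', id='box-CLE-q1-basic')
rows = box_table.find_all('tr') if box_table is not None else []

In [None]:
rows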
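
Each element of the row list collected above is a `<tr>` tag. A sketch of turning such rows into plain lists of cell text, assuming (as in the made-up HTML below) that header rows contain only `<th>` cells and data rows contain at least one `<td>`. A table may repeat a header row partway down, and this check skips such rows too. `rows_to_records` is a hypothetical helper:

```python
from bs4 import BeautifulSoup


def rows_to_records(row_tags):
    """Turn <tr> tags into lists of cell text, skipping header-only rows (no <td>)."""
    records = []
    for row in row_tags:
        if row.find("td") is None:
            continue  # every cell is a <th>, so treat it as a header row
        records.append([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])
    return records


demo = BeautifulSoup(
    "<table>"
    "<tr><th>Starters</th><th>MP</th></tr>"
    "<tr><th>Player A</th><td>12:00</td></tr>"
    "<tr><th>Reserves</th><th>MP</th></tr>"
    "<tr><th>Player B</th><td>5:30</td></tr>"
    "</table>",
    "html.parser",
)
records = rows_to_records(demo.find_all("tr"))
print(records)
```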