# Chapter 3: Web Scraping Techniques
## Basic Web Scraping with Beautiful Soup
A recipe that guides you through scraping data from a simple HTML website.

In the code below, we use Beautiful Soup and requests to find chess game data for the player Judit Polgar on 365chess.com, scrape it, and save it into a pandas DataFrame.

**Note:** Before scraping any site, open its `robots.txt` file. It lives at the root of the domain, so you add `/robots.txt` to the site's base address (for example, `https://www.365chess.com/robots.txt`). The file outlines which paths automated crawlers may and may not access.
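Python's standard library can read these rules for you through `urllib.robotparser`. The sketch below feeds the parser a small, hypothetical rule set, which is not 365chess.com's actual file, so it runs offline. For a real site you would call `parser.set_url("https://www.365chess.com/robots.txt")` followed by `parser.read()` in place of `parser.parse(...)`.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts the file's lines as a list of strings
    return parser.can_fetch(user_agent, url)

# Hypothetical rules for illustration only; NOT the real robots.txt of any site
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(sample_rules, "https://www.example.com/private/page.html"))  # False
print(is_allowed(sample_rules, "https://www.example.com/chess-games.php"))    # True
```

Keep in mind that `robots.txt` is advisory. A `True` result means the rules do not forbid the path. It does not override the site's own terms.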

#### Import Necessary Libraries and Set Up Variables
In this step, we import the required libraries, set the base URL for 365chess.com, and define the player's name (Judit Polgar) in the form the site's search URL uses, with `+` in place of the space.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Set the base URL and player name ('+' encodes the space in a query string)
base_url = "https://www.365chess.com/chess-games.php"
player_name = 'Polgar+Judit'

# Full search URL; player_name is substituted into the wlname parameter
url = (f'{base_url}?wid=&bid=&wlname={player_name}&openn=&blname=&eco=&nocolor=on'
       '&yeari=&yeare=&sply=1&ply=&res=&submit_search=1')
```
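Typing the query string by hand is error-prone. An alternative is to build it from a dictionary with `urllib.parse.urlencode`. By default, `urlencode` encodes a space as `+`, so the plain name `"Polgar Judit"` becomes `Polgar+Judit` without any manual work. The parameter names below are copied from the URL above. What each one does on the site is not documented here, and this sketch makes no assumptions about it.

```python
from urllib.parse import urlencode

base_url = "https://www.365chess.com/chess-games.php"

# Same parameters, in the same order, as the hand-written search URL
params = {
    "wid": "", "bid": "", "wlname": "Polgar Judit", "openn": "", "blname": "",
    "eco": "", "nocolor": "on", "yeari": "", "yeare": "", "sply": "1",
    "ply": "", "res": "", "submit_search": "1",
}

# urlencode uses quote_plus by default, so the space in the name becomes '+'
url = f"{base_url}?{urlencode(params)}"
print(url)
```

With requests you can also pass `params=params` to `requests.get(base_url, ...)`, and it encodes the dictionary for you. That way, switching to another player only means editing one dictionary value.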

#### Send a Request and Parse HTML Content
Here, we send a GET request to the URL and parse the HTML content using BeautifulSoup.

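```python
# Send a GET request; raise_for_status() stops early on HTTP errors such as 403 or 404
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
```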
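If you go on to fetch several pages, such as other players or further result pages, a `requests.Session` lets you reuse one connection and send the same headers on every request. Pausing between requests also reduces the load on the server. The sketch below is one way to organise that. The helper names `make_session` and `polite_get`, the 2-second delay, and the User-Agent string are all hypothetical choices, not anything 365chess.com requires.

```python
import time
import requests

def make_session(user_agent):
    """Create a Session that sends an identifying User-Agent header on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session

def polite_get(session, url, delay_seconds=2.0, timeout=30):
    """Wait delay_seconds, then GET url and raise on HTTP error statuses."""
    time.sleep(delay_seconds)
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response

# Hypothetical User-Agent; replace it with your own project name and contact details
session = make_session("chess-notebook/0.1 (contact: you@example.com)")
```

Usage would then be `response = polite_get(session, url)`, followed by the same `BeautifulSoup(response.content, 'html.parser')` call as before.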


#### Find the Game Data Table
We locate the table containing the game information on the webpage. Beautiful Soup's `.find()` method returns the first tag that matches a tag name and attributes, so we can search for a `div` with a specific `id`. Inspecting the page with the browser's developer tools shows that the game rows sit in the `tbody` tag inside the `div` with id `'result1'`.

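```python
result = soup.find('div', id='result1')  # the div that wraps the games table
if result is None:
    raise ValueError("div#result1 not found; the page layout may have changed")
table_body = result.find('tbody')  # one <tr> per game inside
```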
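To try these lookups without sending any requests, you can run them on a small HTML string. The markup below is a hypothetical stand-in that copies only the `div#result1 > table > tbody` nesting described above. The real page contains much more. It also shows that the same lookup can be written as one CSS selector with `select_one`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real 365chess.com page may differ
sample_html = """
<html><body>
  <div id="sidebar"><table><tbody><tr><td>not a game</td></tr></tbody></table></div>
  <div id="result1">
    <table>
      <thead><tr><th>White</th><th>Black</th></tr></thead>
      <tbody>
        <tr><td>Polgar, J</td><td>Goryachkina, A</td></tr>
        <tr><td>Gabrielian, A</td><td>Polgar, J</td></tr>
      </tbody>
    </table>
  </div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# .find() returns the first match (or None); id= filters on the id attribute,
# so the sidebar's tbody is skipped
result = soup.find("div", id="result1")
table_body = result.find("tbody")

# Equivalent lookup as a single CSS selector
same_body = soup.select_one("div#result1 tbody")

rows = table_body.find_all("tr")
print(len(rows))                # 2 (the header row lives in <thead>, not <tbody>)
print(same_body == table_body)  # True
```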

#### Extract Game Information
In this step, we iterate through each game's row (`<tr>`) and extract the text of its cells (`<td>`): player names, Elo ratings, result, number of moves, ECO opening code, year, and tournament.

```python
games = table_body.find_all('tr')
games_data = []
for game in games:
    details = game.find_all('td')
    if len(details) < 9:
        continue  # skip rows that are not complete game rows (avoids IndexError)
    game_info = {
        'white': details[0].text.strip(),
        'white_elo': details[1].text.strip(),
        'black': details[2].text.strip(),
        'black_elo': details[3].text.strip(),
        'result': details[4].text.strip(),
        'moves': details[5].text.strip(),
        'eco': details[6].text.strip(),
        'year': details[7].text.strip(),
        'tournament': details[8].text.strip(),
    }
    games_data.append(game_info)
```
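The nine `details[i]` lookups can also come from one list of column names paired with the cells through `zip`. In this version the column order is defined in a single place, and any row whose cell count differs is skipped. The sketch below assumes the same nine-column layout as above. It runs on a hypothetical HTML snippet that contains one game row (taken from the output shown in this recipe) and one spacer row. The helper name `extract_games` is illustrative.

```python
from bs4 import BeautifulSoup

COLUMNS = ["white", "white_elo", "black", "black_elo", "result",
           "moves", "eco", "year", "tournament"]

def extract_games(table_body):
    """Turn each <tr> with exactly one <td> per column into a dict; skip other rows."""
    games_data = []
    for row in table_body.find_all("tr"):
        cells = [td.get_text().strip() for td in row.find_all("td")]
        if len(cells) != len(COLUMNS):
            continue  # e.g. spacer, header, or advert rows
        games_data.append(dict(zip(COLUMNS, cells)))
    return games_data

# Hypothetical markup: one game row plus a spacer row with a single merged cell
sample_html = """
<table><tbody>
  <tr><td>Polgar, J</td><td>2693</td><td>Goryachkina, A</td><td>2424</td>
      <td>1-0</td><td>48</td><td>B12</td><td>2014</td><td>15th ch-EUR Indiv 2014</td></tr>
  <tr><td colspan="9">advertisement</td></tr>
</tbody></table>
"""
body = BeautifulSoup(sample_html, "html.parser").find("tbody")
games = extract_games(body)
print(len(games))         # 1
print(games[0]["black"])  # Goryachkina, A
```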


#### Create a DataFrame and Display Data
Finally, we convert the extracted game data into a pandas DataFrame for easy analysis and display it.

```python
df = pd.DataFrame(games_data)
df  # in a notebook, the last expression in a cell is displayed
```

|   | white | white_elo | black | black_elo | result | moves | eco | year | tournament |
|---|-------|-----------|-------|-----------|--------|-------|-----|------|------------|
| 0 | Polgar, J | 2693 | Goryachkina, A | 2424 | 1-0 | 48 | B12 | 2014 | 15th ch-EUR Indiv 2014 |
| 1 | Gabrielian, A | 2560 | Polgar, J | 2693 | ½-½ | 71 | B33 | 2014 | 15th ch-EUR Indiv 2014 |
| 2 | Polgar, J | 2693 | Jianu, V | 2574 | 1-0 | 46 | C07 | 2014 | 15th ch-EUR Indiv 2014 |
| 3 | Tregubov, P | 2614 | Polgar, J | 2693 | ½-½ | 59 | E17 | 2014 | 15th ch-EUR Indiv 2014 |
| 4 | Polgar, J | 2693 | Pashikian, A | 2612 | ½-½ | 48 | C65 | 2014 | 15th ch-EUR Indiv 2014 |
| 5 | Schlosser, P | 2612 | Polgar, J | 2693 | ½-½ | 60 | A33 | 2014 | 15th ch-EUR Indiv 2014 |
| 6 | Polgar, J | 2693 | Hovhannisyan, R | 2611 | ½-½ | 41 | C67 | 2014 | 15th ch-EUR Indiv 2014 |
| 7 | Dvirnyy, D | 2575 | Polgar, J | 2693 | 0-1 | 50 | A36 | 2014 | 15th ch-EUR Indiv 2014 |
| 8 | Polgar, J | 2693 | Grigoriants, S | 2574 | 0-1 | 36 | C84 | 2014 | 15th ch-EUR Indiv 2014 |
| 9 | Ter Sahakyan, S | 2572 | Polgar, J | 2693 | 1-0 | 52 | B48 | 2014 | 15th ch-EUR Indiv 2014 |
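Every value scraped this way is a string, including the Elo ratings, move counts, and years. Before doing numeric analysis it helps to convert those columns. The sketch below rebuilds three rows copied from the output above and converts the numeric columns with `pd.to_numeric`. With `errors="coerce"`, any cell that cannot be parsed, such as a missing rating, becomes `NaN` and no exception is raised. The filter at the end is only an example analysis.

```python
import pandas as pd

# Three rows copied from the output above; every scraped value arrives as a string
df = pd.DataFrame([
    {"white": "Polgar, J", "white_elo": "2693", "black": "Goryachkina, A", "black_elo": "2424",
     "result": "1-0", "moves": "48", "eco": "B12", "year": "2014", "tournament": "15th ch-EUR Indiv 2014"},
    {"white": "Gabrielian, A", "white_elo": "2560", "black": "Polgar, J", "black_elo": "2693",
     "result": "½-½", "moves": "71", "eco": "B33", "year": "2014", "tournament": "15th ch-EUR Indiv 2014"},
    {"white": "Polgar, J", "white_elo": "2693", "black": "Jianu, V", "black_elo": "2574",
     "result": "1-0", "moves": "46", "eco": "C07", "year": "2014", "tournament": "15th ch-EUR Indiv 2014"},
])

# Convert numeric columns; unparseable cells become NaN instead of raising
for col in ["white_elo", "black_elo", "moves", "year"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Example analysis: games in which Polgar had the white pieces
as_white = df[df["white"] == "Polgar, J"]
print(len(as_white))             # 2
print(as_white["moves"].mean())  # 47.0
```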
