# Follow-Along Activity



## Step 0: Check the `robots.txt` File

Before scraping, check the `robots.txt` file of the demo website to see which paths automated clients are allowed to fetch:

https://drmichaelguo.github.io/web-scraping-demo-PFAAFAMMA/robots.txt
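
The same check can be done in code. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample rules and the `example.com` URLs are made up for illustration and are not the contents of the demo site's actual `robots.txt`; the helper name `is_allowed` is also an assumption.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, page_url):
    """Return True if the given robots.txt rules let user_agent fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules that are already in memory
    return parser.can_fetch(user_agent, page_url)


# Hypothetical rules for illustration only (NOT the demo site's real file)
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(sample_rules, "WebScrapingTutorial", "https://example.com/index.html"))         # True
print(is_allowed(sample_rules, "WebScrapingTutorial", "https://example.com/private/data.html"))  # False
```

Against a live site, calling `set_url()` with the `robots.txt` address and then `read()` on the `RobotFileParser` downloads the file instead of parsing in-memory lines.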

In [1]:
# Step 1: Fetch the page
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time

url = "https://drmichaelguo.github.io/web-scraping-demo-PFAAFAMMA/"
# Identify the scraper to the server
headers = {'User-Agent': 'Mozilla/5.0 (compatible; WebScrapingTutorial/1.0)'}
response = requests.get(url, headers=headers, timeout=10)  # give up if the server does not respond in time

# Handle request errors: stop here rather than parse an error page
if response.status_code != 200:
    raise SystemExit(f"Request failed with status code: {response.status_code}")

# Rate limiting: pause so that any follow-up request is not sent immediately
time.sleep(1)

html_content = response.text
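
A single `time.sleep(1)` only matters when more requests follow. When scraping several pages, the pause belongs between consecutive requests. The following is a minimal sketch, assuming a hypothetical helper `fetch_politely` and a stand-in `fake_get` function in place of `requests.get` so that the example runs offline; the `example.com` URLs are placeholders.

```python
import time


def fetch_politely(urls, get, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `get` is any callable that takes a URL and returns the page content.
    """
    pages = {}
    for index, page_url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[page_url] = get(page_url)
    return pages


# Offline stand-in (hypothetical) so the sketch runs without network access
def fake_get(page_url):
    return f"<html>content of {page_url}</html>"


pauses = []
pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    get=fake_get,
    delay_seconds=1.0,
    sleep=pauses.append,  # record the pauses instead of actually sleeping
)
print(len(pages), pauses)  # 3 [1.0, 1.0]
```

Passing the fetch and sleep functions in as arguments keeps the pacing logic separate from the HTTP library, which also makes it easy to try out without touching a real server.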


In [2]:
# Step 2: Parse HTML
soup = BeautifulSoup(html_content, "html.parser")
rows = soup.find_all("tr")


In [3]:
# Step 3: Create function to parse numbers
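def parse_number(text):
    """Convert cell text such as "1,245", "(757)" or "35.2" to a number."""
    text = text.strip().replace(",", "")
    # Accounting notation: parentheses mark a negative value
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text) if "." in text else int(text)
    except ValueError:
        return None  # blank or non-numeric cell
    return -value if negative else value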


In [4]:
# Step 4: Extract data
data = []
for row in rows[1:]:  # skip the header row
    cells = row.find_all("td")
    if len(cells) == 7:  # keep only complete data rows (one cell per column)
        data.append({
            "Year": int(cells[0].text.strip()),
            "Revenue (£m)": parse_number(cells[1].text),
            "Cost of Sales (£m)": parse_number(cells[2].text),
            "Gross Profit (£m)": parse_number(cells[3].text),
            "Operating Profit (£m)": parse_number(cells[4].text),
            "Net Profit (£m)": parse_number(cells[5].text),
            "EPS (p)": parse_number(cells[6].text),
        })


In [5]:
# Step 5: Convert to DataFrame
df = pd.DataFrame(data)
print(df)

   Year  Revenue (£m)  Cost of Sales (£m)  Gross Profit (£m)  \
0  2021          1245                -757                488   
1  2022          1398                -832                566   
2  2023          1561                -903                658   

   Operating Profit (£m)  Net Profit (£m)  EPS (p)  
0                    312              215     35.2  
1                    358              249     40.7  
2                    421              297     48.3  
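
A quick sanity check on the scraped figures is that gross profit should equal revenue plus cost of sales (which is stored as a negative number). The sketch below hard-codes the three rows from the output above so that it runs on its own; the helper name `check_gross_profit` is an illustrative assumption. In the notebook, the same check could be run directly on `data`.

```python
# Rows copied from the printed DataFrame (only the columns needed here)
scraped_rows = [
    {"Year": 2021, "Revenue (£m)": 1245, "Cost of Sales (£m)": -757, "Gross Profit (£m)": 488},
    {"Year": 2022, "Revenue (£m)": 1398, "Cost of Sales (£m)": -832, "Gross Profit (£m)": 566},
    {"Year": 2023, "Revenue (£m)": 1561, "Cost of Sales (£m)": -903, "Gross Profit (£m)": 658},
]


def check_gross_profit(records):
    """Return the years where Revenue + Cost of Sales != Gross Profit."""
    mismatches = []
    for record in records:
        expected = record["Revenue (£m)"] + record["Cost of Sales (£m)"]
        if expected != record["Gross Profit (£m)"]:
            mismatches.append(record["Year"])
    return mismatches


print(check_gross_profit(scraped_rows))  # [] means every row is consistent
```

An empty list suggests the columns were mapped to the right cells; a non-empty list would point to a parsing or column-order problem worth investigating.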


In [6]:
# Step 6: Log what was scraped, from where, and when
print(f"Scraped {len(data)} records from {url} on {pd.Timestamp.now()}")


Scraped 3 records from https://drmichaelguo.github.io/web-scraping-demo-PFAAFAMMA/ on 2025-08-03 09:50:25.180621
