# Solution to Exercise 08 - Web Scraping

In today's exercise we're using the Python libraries *BeautifulSoup* and *owlready2* to create an ontology from data scraped from the Web.
[BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) is a library for extracting data from HTML or XML files by accessing concrete elements in the tree structure.
The ontology and the web scraping focus on extracting data about Pokémon from a wiki-like website called *[Bulbapedia](https://bulbapedia.bulbagarden.net/wiki/Main_Page)*.
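
To make "accessing concrete elements in the tree structure" more tangible, here is a minimal sketch of the two BeautifulSoup calls used most often, `find()` and `find_all()`. The HTML fragment, its `id`, and its class names are invented for illustration and are not copied from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, made up for this example (not real page markup).
html = """
<div id="infobox">
  <h1 class="name">Bulbasaur</h1>
  <ul class="types">
    <li><a href="/wiki/Grass">Grass</a></li>
    <li><a href="/wiki/Poison">Poison</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None when nothing matches,
# so guard against None before reading the text.
name_tag = soup.find("h1", class_="name")
name = name_tag.get_text(strip=True) if name_tag else "N/A"

# find_all() returns every match; attributes are read like dictionary keys.
type_links = soup.find("ul", class_="types").find_all("a")
types = [a.get_text(strip=True) for a in type_links]
hrefs = [a["href"] for a in type_links]

print(name, types, hrefs)
```

The `None` guard matters on real pages: an element that is present on one page may be missing on the next, and calling `.get_text()` on `None` raises an `AttributeError`.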

## Setup

In [None]:
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2



[notice] A new release of pip is available: 24.3.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip







In [None]:
# Next we import the necessary libraries
import requests
import time
import json
import re
import csv

from bs4 import BeautifulSoup

In [None]:
import requests
from bs4 import BeautifulSoup
import re
import time
import csv

def get_top_100_urls():
    list_url = "https://store.steampowered.com/search/?filter=mostplayed&count=100"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"}
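    res = requests.get(list_url, headers=headers, timeout=10)
    res.raise_for_status()  # Fail loudly instead of parsing an error page
    soup = BeautifulSoup(res.text, 'html.parser')

    # Each search result is an <a class="search_result_row"> whose href is the game's store page
    rows = soup.find_all('a', class_='search_result_row')
    return [row['href'] for row in rows[:100]]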

def scrape_game_data(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    }
    # Included age-gate bypass cookies
    cookies = {'birthtime': '283996801', 'lastagecheckage': '1-0-1991', 'wants_mature_content': '1'}
    
    try:
        page_res = requests.get(url, headers=headers, cookies=cookies, timeout=10)
        gs = BeautifulSoup(page_res.text, 'html.parser')

        # --- Name ---
        name_tag = gs.find('div', id='appHubAppName') or gs.find('div', class_='apphub_AppName')
        NAME = name_tag.get_text(strip=True) if name_tag else "N/A"
        
        # --- Price ---
        price_section = gs.find('div', class_='game_purchase_price') or gs.find('div', class_='discount_final_price')
        PRICE = price_section.get_text(strip=True) if price_section else "N/A"

        # --- PEGI ---
        pegi_div = gs.find('div', class_='game_rating_icon')
        PEGI = pegi_div.find('img').get('alt', 'N/A') if pegi_div and pegi_div.find('img') else "N/A"

        # --- Modes (read from the game_area_features_list_ctn container) ---
        mode_keywords = [
            "Single-player", "Multi-player", "Online Co-op", "LAN Co-op", 
            "Shared/Split Screen Co-op", "MMO", "Online PvP", "LAN PvP",
            "Cross-Platform Multiplayer", "Co-op", "PvP"
        ]
        
        found_modes = []
        # The feature list, which includes the play modes, is expected in this container
        feature_container = gs.find('div', class_='game_area_features_list_ctn')
        
        if feature_container:
            # Find all links within that container
            feature_links = feature_container.find_all('a')
            for link in feature_links:
                # Prefer the dedicated label element (e.g., "Multi-player"); fall back to the link text
                label = link.find(class_='label')
                label_text = label.get_text(strip=True) if label is not None else link.get_text(strip=True)

                # Case-insensitive substring match; this also covers exact matches
                if any(kw.lower() in label_text.lower() for kw in mode_keywords):
                    found_modes.append(label_text)

        # Remove duplicates, sort, and join; default to "Single-player" when no mode was found
        MODE = ", ".join(sorted(set(found_modes))) if found_modes else "Single-player"

        # --- Hardware Helper ---
        def get_spec(label):
            tag = gs.find('strong', string=re.compile(label, re.IGNORECASE))
            if tag:
                text = tag.next_sibling
                if text and isinstance(text, str):
                    return text.strip().strip(':').strip()
                return tag.parent.get_text().replace(tag.get_text(), "").strip().strip(':').strip()
            return "N/A"

        OS = get_spec("OS:")
        CPU = get_spec("Processor:")
        RAM = get_spec("Memory:")
        GPU = get_spec("Graphics:")
        STORAGE = get_spec("Storage:")

        return {
            "NAME": NAME, "PRICE": PRICE, "PEGI": PEGI, "MODE": MODE,
            "OS": OS, "CPU": CPU, "GPU": GPU, "RAM": RAM, "STORAGE": STORAGE
        }
    except Exception as e:
        # Report the failure instead of silently dropping the game
        print(f"Failed to scrape {url}: {e}")
        return None

# --- EXECUTION ---
top_links = get_top_100_urls()
all_games_data = []

print(f"Found {len(top_links)} games. Starting deep scrape...")

for i, link in enumerate(top_links):
    data = scrape_game_data(link)
    if data:
        all_games_data.append(data)
        print(f"[{i+1}/{len(top_links)}] {data['NAME'][:25]} | Modes: {data['MODE']}")
    time.sleep(0.7)

# --- CSV SAVE ---
if all_games_data:
    keys = ["NAME", "PRICE", "PEGI", "MODE", "OS", "CPU", "GPU", "RAM", "STORAGE"]
    with open('steam_top_100.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(all_games_data)
    print("\nExtraction complete. Check 'steam_top_100.csv'.")

Found 100 games. Starting deep scrape...
[1/100] Counter-Strike 2 | Modes: Cross-Platform Multiplayer
[2/100] ARC Raiders | Modes: Cross-Platform Multiplayer, Online Co-op, Online PvP
[3/100] Where Winds Meet | Modes: Cross-Platform Multiplayer, Online Co-op, Online PvP, Single-player
[4/100] Euro Truck Simulator 2 | Modes: Online Co-op, Single-player
[5/100] StarRupture | Modes: Online Co-op, Single-player
[6/100] Warframe | Modes: Cross-Platform Multiplayer, Online Co-op, Single-player
[7/100] Grand Theft Auto V Enhanc | Modes: Online Co-op, Online PvP, Single-player
[8/100] War Thunder | Modes: Cross-Platform Multiplayer, MMO, Online Co-op, Online PvP, Single-player
[9/100] ARK: Survival Ascended | Modes: Cross-Platform Multiplayer, LAN Co-op, LAN PvP, MMO, Online Co-op, Online PvP, Shared/Split Screen Co-op, Single-player
[10/100] Path of Exile 2 | Modes: MMO, Online Co-op, Single-player
[11/100] Dead by Daylight | Modes: Cross-Platform Multiplayer, Online Co-op, Online PvP
[12/100