# Data Collection: Web Scraping Baldur's Gate 3 Guides

## <ins>Objective</ins>
- The goal of this notebook is to scrape the main content and associated meta tags (e.g., title, description, keywords) of Baldur's Gate 3 guide pages from various websites.
- This enriched dataset will serve as the foundation for understanding how each website optimizes its content for search engines and user engagement.

## <ins>Workflow</ins>
1. Define the selected questions and their corresponding top-ranked URLs.
2. Verify compliance with each site's `robots.txt` directives.
3. Import necessary libraries and configure the Selenium WebDriver.
4. Scrape the content and meta tags for each URL using Selenium.
5. Save the scraped data in `.pkl` and `.csv` formats for use in subsequent notebooks.

## <ins>Selected Questions and Top Links</ins>
- **Question 1**: How to multiclass
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/How_to_Multiclass)
  - [The Gamer](https://www.thegamer.com/baldurs-gate-3-bg3-multiclass-explained-how-to/)
  - [Polygon](https://www.polygon.com/baldurs-gate-3-guides/24011302/multiclass-how-to-best-builds-bg3)
  - [Rock Paper Shotgun](https://www.rockpapershotgun.com/baldurs-gate-3-multiclass)

- **Question 2**: How to install mods
  - [BG3 Wiki](https://bg3.wiki/wiki/Modding:Installing_mods)
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/How_to_Install_Mods)
  - [The Gamer](https://www.thegamer.com/baldurs-gate-3-install-mods-guide/)
  - [Siliconera](https://www.siliconera.com/how-to-get-the-baldurs-gate-3-toolkit-and-install-mods/)

- **Question 3**: How to change spells
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/How_to_Prepare_and_Change_Spells)
  - [Game Rant](https://gamerant.com/baldurs-gate-3-bg3-change-characters-spells/)
  - [Game Leap](https://www.gameleap.com/articles/baldurs-gate-3-how-to-prepare-and-change-spells)
  - [Screen Rant](https://screenrant.com/baldurs-gate-3-equip-spells/)

- **Question 4**: How to respec
  - [PC Gamer](https://www.pcgamer.com/baldurs-gate-3-respec-guide/)
  - [GameSpot](https://www.gamespot.com/articles/baldurs-gate-3-respec-change-class-guide/1100-6516514/)
  - [Game Rant](https://gamerant.com/baldurs-gate-3-change-class-respec-level-1-reset-stats-bg3/)
  - [Fextralife](https://baldursgate3.wiki.fextralife.com/Respec)

- **Question 5**: How to get Minthara
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/Where_to_Find_and_Recruit_Minthara)
  - [Screen Rant](https://screenrant.com/baldurs-gate-3-recruit-minthara/)
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/Minthara_Companion_Guide)
  - [PCGamesN](https://www.pcgamesn.com/baldurs-gate-3/minthara)

- **Question 6**: How to get the owlbear cub
  - [The Gamer](https://www.thegamer.com/baldurs-gate-3-owlbear-cub-guide-tips/)
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/How_to_Get_the_Owlbear_Cub)
  - [GamesRadar+](https://www.gamesradar.com/baldurs-gate-3-owlbear-cub-cave/)
  - [Fextralife](https://baldursgate3.wiki.fextralife.com/Owlbear+Cub)

- **Question 7**: How to pickpocket
  - [The Gamer](https://www.thegamer.com/baldurs-gate-3-how-to-pickpocket-and-steal/)
  - [Destructoid](https://www.destructoid.com/how-to-pickpocket-in-baldurs-gate-3/)
  - [Gamestegy](https://gamestegy.com/post/bg3/858/pickpocket-guide)
  - [Gameranx](https://gameranx.com/features/id/210625/article/baldurs-gate-3-how-to-rob-everything-perfect-pickpocket-guide/)

- **Question 8**: How to save Mayrina
  - [Polygon](https://www.polygon.com/baldurs-gate-3-guides/23825850/save-mayrina-ancient-abode-auntie-ethel-bg3)
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/Save_Mayrina)
  - [Game Rant](https://gamerant.com/baldurs-gate-3-bg3-how-to-save-mayrina-ethel/)
  - [BG3 Wiki](https://bg3.wiki/wiki/Save_Mayrina)

- **Question 9**: How to get all companions
  - [GameSpot](https://www.gamespot.com/articles/baldurs-gate-3-companions-recruit-guide/1100-6515889/)
  - [IGN](https://www.ign.com/wikis/baldurs-gate-3/How_to_Find_and_Recruit_All_Companions)
  - [Eurogamer](https://www.eurogamer.net/all-companions-baldurs-gate-3-where-to-find-location-recruit-9351)
  - [Sports Illustrated](https://www.si.com/videogames/guides/baldurs-gate-3-companions-list)
  
## <ins>Check URLs for Compliance</ins>
Each selected URL was checked against its site's `robots.txt` file before scraping. None of the URLs used in this project is disallowed by the directives in those files, so the scraping stays within each site's published crawler rules.
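
The check described above can be sketched with the standard library's `urllib.robotparser`. This is a minimal illustration, not necessarily how the check was performed for this project: the helper name `is_allowed` and the sample rules are hypothetical, and the rules are passed in as text so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://example.com/wikis/some-guide"))
print(is_allowed(sample_rules, "https://example.com/private/page"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` fetches and parses the real file instead of a text sample.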

## <ins>Imports Used for This Notebook</ins>

In [1]:
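from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait  # Explicit waits in the scraping function
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time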

## <ins>List of URLs</ins>
Here are the selected URLs for scraping, grouped by their associated questions.

In [2]:
urls = [
    # Question 1: How to multiclass
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Multiclass",
    "https://www.thegamer.com/baldurs-gate-3-bg3-multiclass-explained-how-to/",
    "https://www.polygon.com/baldurs-gate-3-guides/24011302/multiclass-how-to-best-builds-bg3",
    "https://www.rockpapershotgun.com/baldurs-gate-3-multiclass",

    # Question 2: How to install mods
    "https://bg3.wiki/wiki/Modding:Installing_mods",
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Install_Mods",
    "https://www.thegamer.com/baldurs-gate-3-install-mods-guide/",
    "https://www.siliconera.com/how-to-get-the-baldurs-gate-3-toolkit-and-install-mods/",

    # Question 3: How to change spells
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Prepare_and_Change_Spells",
    "https://gamerant.com/baldurs-gate-3-bg3-change-characters-spells/",
    "https://www.gameleap.com/articles/baldurs-gate-3-how-to-prepare-and-change-spells",
    "https://screenrant.com/baldurs-gate-3-equip-spells/",

    # Question 4: How to respec
    "https://www.pcgamer.com/baldurs-gate-3-respec-guide/",
    "https://www.gamespot.com/articles/baldurs-gate-3-respec-change-class-guide/1100-6516514/",
    "https://gamerant.com/baldurs-gate-3-change-class-respec-level-1-reset-stats-bg3/",
    "https://baldursgate3.wiki.fextralife.com/Respec",

    # Question 5: How to get Minthara
    "https://www.ign.com/wikis/baldurs-gate-3/Where_to_Find_and_Recruit_Minthara",
    "https://screenrant.com/baldurs-gate-3-recruit-minthara/",
    "https://www.ign.com/wikis/baldurs-gate-3/Minthara_Companion_Guide",
    "https://www.pcgamesn.com/baldurs-gate-3/minthara",

    # Question 6: How to get the Owlbear Cub
    "https://www.thegamer.com/baldurs-gate-3-owlbear-cub-guide-tips/",
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Get_the_Owlbear_Cub",
    "https://www.gamesradar.com/baldurs-gate-3-owlbear-cub-cave/",
    "https://baldursgate3.wiki.fextralife.com/Owlbear+Cub",

    # Question 7: How to pickpocket
    "https://www.thegamer.com/baldurs-gate-3-how-to-pickpocket-and-steal/",
    "https://www.destructoid.com/how-to-pickpocket-in-baldurs-gate-3/",
    "https://gamestegy.com/post/bg3/858/pickpocket-guide",
    "https://gameranx.com/features/id/210625/article/baldurs-gate-3-how-to-rob-everything-perfect-pickpocket-guide/",

    # Question 8: How to save Mayrina
    "https://www.polygon.com/baldurs-gate-3-guides/23825850/save-mayrina-ancient-abode-auntie-ethel-bg3",
    "https://www.ign.com/wikis/baldurs-gate-3/Save_Mayrina",
    "https://gamerant.com/baldurs-gate-3-bg3-how-to-save-mayrina-ethel/",
    "https://bg3.wiki/wiki/Save_Mayrina",

    # Question 9: How to get all companions
    "https://www.gamespot.com/articles/baldurs-gate-3-companions-recruit-guide/1100-6515889/",
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Find_and_Recruit_All_Companions",
    "https://www.eurogamer.net/all-companions-baldurs-gate-3-where-to-find-location-recruit-9351",
    "https://www.si.com/videogames/guides/baldurs-gate-3-companions-list"
]
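
Because the list is ordered in groups of four URLs per question, each URL can be paired with its question and site domain by position. The sketch below assumes that ordering holds. The helper `label_urls` and the `Question` and `Domain` field names are hypothetical, and only the first two question groups are shown to keep the example short.

```python
from urllib.parse import urlparse

URLS_PER_QUESTION = 4  # Assumption: every question has exactly four URLs, in order


def label_urls(urls, questions, per_question=URLS_PER_QUESTION):
    """Pair each URL with its question and site domain, based on list position."""
    if len(urls) != len(questions) * per_question:
        raise ValueError("URL count does not match questions * per_question")
    rows = []
    for index, url in enumerate(urls):
        netloc = urlparse(url).netloc
        # Drop a leading "www." so the same site always gets the same label.
        domain = netloc[4:] if netloc.startswith("www.") else netloc
        rows.append(
            {"URL": url, "Question": questions[index // per_question], "Domain": domain}
        )
    return rows


sample_questions = ["How to multiclass", "How to install mods"]
sample_urls = [
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Multiclass",
    "https://www.thegamer.com/baldurs-gate-3-bg3-multiclass-explained-how-to/",
    "https://www.polygon.com/baldurs-gate-3-guides/24011302/multiclass-how-to-best-builds-bg3",
    "https://www.rockpapershotgun.com/baldurs-gate-3-multiclass",
    "https://bg3.wiki/wiki/Modding:Installing_mods",
    "https://www.ign.com/wikis/baldurs-gate-3/How_to_Install_Mods",
    "https://www.thegamer.com/baldurs-gate-3-install-mods-guide/",
    "https://www.siliconera.com/how-to-get-the-baldurs-gate-3-toolkit-and-install-mods/",
]

labeled = label_urls(sample_urls, sample_questions)
for row in labeled:
    print(row["Question"], "|", row["Domain"])
```

Keeping the question label next to each URL makes it possible to compare sites per question later without relying on comments in the list.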

## <ins>Scraping Using Selenium</ins>

Selenium is configured to simulate a browser and scrape the following data for each URL:
- **Main Body Content**: The primary text content of the guide.
- **Meta Title**: The page title from the `<title>` tag.
- **Meta Description**: The page description from the `<meta name="description">` tag.
- **Meta Keywords**: The keywords from the `<meta name="keywords">` tag (if available).

This data will be stored in a structured format for subsequent processing and analysis.
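
To illustrate which fields are collected and how an absent tag can be handled, here is a small offline sketch that extracts the same three meta fields from an HTML string, such as the value of `driver.page_source`. It uses the standard library's `html.parser` rather than the notebook's Selenium calls. `MetaTagParser`, `extract_meta`, and the sample HTML are hypothetical names and data used only for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and the content of named <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            # Store by lowercase name, e.g. "description" or "keywords".
            self.meta[attributes["name"].lower()] = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_meta(html):
    """Return the three meta fields, using None for any tag that is missing."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        "Meta_Title": parser.title.strip() if parser.title else None,
        "Meta_Description": parser.meta.get("description"),
        "Meta_Keywords": parser.meta.get("keywords"),
    }


# Hypothetical page with no keywords tag.
sample_html = """<html><head><title>Sample Guide</title>
<meta name="description" content="A short description."></head>
<body><p>Guide text.</p></body></html>"""

result = extract_meta(sample_html)
print(result)
```

The sample page has no `keywords` tag, so that field comes back as `None` while the other two are still captured.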

In [5]:
# Configure Selenium WebDriver
chrome_options = Options()
chrome_options.add_argument("--headless")  # Run Chrome in headless mode
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chromedriver_path = r"C:\Users\candy\Desktop\chromedriver-win64\chromedriver.exe"  # Update with your path
service = Service(chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)

# Set timeouts
driver.implicitly_wait(30)  # Implicit wait for elements
driver.set_page_load_timeout(120)  # Page load timeout

# Function to scrape a single URL
def scrape_url(url):
    try:
        driver.get(url)
        time.sleep(5)  # Wait for the page to load
        WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.TAG_NAME, "body")))
        content = driver.find_element(By.TAG_NAME, "body").text
        meta_title = driver.execute_script("return document.title;")
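        # Return None instead of raising when a meta tag is absent, so the page content is still kept
        meta_description = driver.execute_script("var m = document.querySelector('meta[name=\"description\"]'); return m ? m.getAttribute('content') : null;")
        meta_keywords = driver.execute_script("var m = document.querySelector('meta[name=\"keywords\"]'); return m ? m.getAttribute('content') : null;")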
        print(f"Successfully scraped: {url}")
        return {"URL": url, "Content": content, "Meta_Title": meta_title, "Meta_Description": meta_description, "Meta_Keywords": meta_keywords}
    except Exception as e:
        print(f"Error scraping {url}: {e}")
        return {"URL": url, "Content": None, "Meta_Title": None, "Meta_Description": None, "Meta_Keywords": None}

# Scrape all URLs with the updated function
scraped_data = []
for url in urls:
    scraped_data.append(scrape_url(url))

# Convert the scraped data into a DataFrame
scraped_dataframe = pd.DataFrame(scraped_data)

# Close the Selenium driver
driver.quit()

# Preview the first few rows
print(scraped_dataframe.head())

Successfully scraped: https://www.ign.com/wikis/baldurs-gate-3/How_to_Multiclass
Successfully scraped: https://www.thegamer.com/baldurs-gate-3-bg3-multiclass-explained-how-to/
Successfully scraped: https://www.polygon.com/baldurs-gate-3-guides/24011302/multiclass-how-to-best-builds-bg3
Successfully scraped: https://www.rockpapershotgun.com/baldurs-gate-3-multiclass
Successfully scraped: https://bg3.wiki/wiki/Modding:Installing_mods
Successfully scraped: https://www.ign.com/wikis/baldurs-gate-3/How_to_Install_Mods
Successfully scraped: https://www.thegamer.com/baldurs-gate-3-install-mods-guide/
Successfully scraped: https://www.siliconera.com/how-to-get-the-baldurs-gate-3-toolkit-and-install-mods/
Successfully scraped: https://www.ign.com/wikis/baldurs-gate-3/How_to_Prepare_and_Change_Spells
Successfully scraped: https://gamerant.com/baldurs-gate-3-bg3-change-characters-spells/
Successfully scraped: https://www.gameleap.com/articles/baldurs-gate-3-how-to-prepare-and-change-spells
Succes

## <ins>Save the Scraped DataFrame</ins>
The scraped data, including content and meta tags, is saved as a `.pkl` file for use in subsequent notebooks. A `.csv` version is also saved for review purposes.

In [6]:
import os
os.makedirs("data", exist_ok=True)  # Create the output folder if it does not exist yet
scraped_dataframe.to_pickle("data/scraped_data_with_meta.pkl")
scraped_dataframe.to_csv("data/scraped_data_with_meta.csv", index=False)
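
Since the `.csv` copy is meant for review, one quick check is to list the URLs whose `Content` field is empty, which is how failed scrapes appear once `None` values are written to CSV. The sketch below uses the standard library's `csv` module and a tiny hypothetical demo file with the same column names. The helper name `find_failed_urls` is an assumption, not part of the notebook.

```python
import csv
import os
import tempfile


def find_failed_urls(csv_path):
    """Return the URLs whose Content column is empty in a saved CSV file."""
    # Page text can be long, so raise the per-field size limit above the default.
    csv.field_size_limit(10**7)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row["URL"] for row in csv.DictReader(handle) if not row["Content"]]


# Demo on a small hypothetical file that mirrors the notebook's columns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["URL", "Content", "Meta_Title", "Meta_Description", "Meta_Keywords"])
        writer.writerow(["https://example.com/ok", "Guide text", "Title", "Desc", ""])
        writer.writerow(["https://example.com/failed", "", "", "", ""])
    failed = find_failed_urls(demo_path)

print(failed)
```

Pointing the same helper at `data/scraped_data_with_meta.csv` would show which pages need to be scraped again.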