I made this project to learn how to use the Scrapy framework with Python for web scraping. The spider extracts all the board games listed on the BoardGameGeek website (around 137 thousand games) and exports them to a CSV file with the following data:
- Id
- Rank
- Name
- Url
- Rating
- Number of Votes
- Year
- Description
- Date
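
As a rough illustration of the exported file's layout, the sketch below writes game records to CSV using only the standard `csv` module. The column names in `FIELDS` and the helper `games_to_csv` are assumptions made for this example, not the names used by the spider; in a Scrapy project the file would more likely be produced by Scrapy's feed export.

```python
import csv
import io

# Hypothetical column names, one per field in the list above.
FIELDS = ["id", "rank", "name", "url", "rating",
          "num_votes", "year", "description", "date"]


def games_to_csv(games):
    """Serialize an iterable of game dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for game in games:
        # Missing fields become empty cells; unknown keys are dropped.
        writer.writerow({field: game.get(field, "") for field in FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"id": 1, "rank": 10, "name": "Example Game", "year": 1999}]
    print(games_to_csv(sample))
```

Fields that a record does not provide are left empty, so every row keeps the same nine columns.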
The code here then uses the BoardGameGeek XML API to fetch the following additional details for each game previously scraped from the website:
- Thumbnail
- Language dependency
- Minimum number of players
- Maximum number of players
- Best player number
- Minimum playing time
- Maximum playing time
- Number of users rating the game
- Number of users that own the game
- Number of comments
- Number of votes for the weight of the game
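
A minimal sketch of how some of these fields might be read from an API response follows. It parses an inline sample with the standard `xml.etree.ElementTree` module rather than BeautifulSoup so that it runs without extra packages. The element names (`minplayers`, `usersrated`, `numweights`, and so on) and the sample document are assumptions about the response layout, and the helpers `parse_games` and `_int_value` are hypothetical. Language dependency and best player number are left out because they would have to be derived from poll results.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the element names are an assumed response layout.
SAMPLE_XML = """<items>
  <item type="boardgame" id="1">
    <thumbnail>https://example.com/thumb.jpg</thumbnail>
    <minplayers value="2"/>
    <maxplayers value="4"/>
    <minplaytime value="30"/>
    <maxplaytime value="60"/>
    <statistics>
      <ratings>
        <usersrated value="100"/>
        <owned value="250"/>
        <numcomments value="40"/>
        <numweights value="12"/>
      </ratings>
    </statistics>
  </item>
</items>"""


def _int_value(item, path):
    """Return the integer 'value' attribute at path, or None if absent."""
    node = item.find(path)
    if node is None:
        return None
    value = node.get("value", "")
    return int(value) if value.isdigit() else None


def parse_games(xml_text):
    """Turn an XML response into a list of per-game dicts."""
    root = ET.fromstring(xml_text)
    games = []
    for item in root.findall("item"):
        games.append({
            "id": int(item.get("id")),
            "thumbnail": item.findtext("thumbnail"),
            "min_players": _int_value(item, "minplayers"),
            "max_players": _int_value(item, "maxplayers"),
            "min_playtime": _int_value(item, "minplaytime"),
            "max_playtime": _int_value(item, "maxplaytime"),
            "users_rated": _int_value(item, "statistics/ratings/usersrated"),
            "owned": _int_value(item, "statistics/ratings/owned"),
            "num_comments": _int_value(item, "statistics/ratings/numcomments"),
            "num_weights": _int_value(item, "statistics/ratings/numweights"),
        })
    return games


if __name__ == "__main__":
    print(parse_games(SAMPLE_XML))
```

Returning `None` for a missing element keeps one incomplete game from aborting a run over the full list.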
The following Python packages need to be installed:
- scrapy
- pandas
- beautifulsoup4
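
One way to check that these packages are available before running anything is sketched below. The helper `missing_packages` is hypothetical; note that `beautifulsoup4` is imported under the module name `bs4`.

```python
import importlib.util

# Import names for the packages listed above (beautifulsoup4 imports as bs4).
REQUIRED_MODULES = ["scrapy", "pandas", "bs4"]


def missing_packages(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```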