
A set of scripts that crawls STEAM website to download game reviews.


aesuli/steam-crawler


STEAM crawler

This set of scripts crawls the STEAM website to download game reviews.

These scripts are aimed at students who want to experiment with text mining on review data.

The scripts must be run in the following order:

  • steam-game-crawler.py downloads the pages that list games into ./data/games/

  • steam-game-extractor.py extracts game ids from the downloaded pages, saving them into ./data/games.csv

  • steam-review-crawler.py uses the above list to download game review pages into ./data/reviews. This process can take a long time: there is a lot of data, and the script sleeps between requests to be fair to the server. When the script is stopped and restarted, it skips games for which all reviews were downloaded on a previous run (it does not download new reviews for such games).

  • steam-review-extractor.py extracts reviews and other info from the downloaded pages, saving them into ./data/reviews.csv
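
The resume-and-sleep behaviour described for steam-review-crawler.py can be sketched as follows. This is a minimal illustration, not the script's actual code: the `fetch_page` callable, the per-game directory layout, the `page-N.html` file names, and the `DONE` marker file are all hypothetical names introduced here.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def crawl_reviews(
    game_ids: Iterable[str],
    fetch_page: Callable[[str, int], Optional[str]],
    out_dir: Path,
    delay_seconds: float = 1.0,
) -> List[str]:
    """Download review pages game by game, skipping games completed earlier.

    fetch_page(game_id, page_number) is a hypothetical callable that returns
    the page body, or None when the game has no more review pages.
    Returns the ids of the games crawled during this call.
    """
    crawled = []
    for game_id in game_ids:
        game_dir = out_dir / game_id
        done_marker = game_dir / "DONE"  # hypothetical completion marker
        if done_marker.exists():
            # All reviews were downloaded on a previous run: skip this game.
            continue
        game_dir.mkdir(parents=True, exist_ok=True)
        page = 1
        while True:
            body = fetch_page(game_id, page)
            if body is None:
                break
            (game_dir / f"page-{page}.html").write_text(body, encoding="utf-8")
            page += 1
            # Sleep between requests to be fair to the server.
            time.sleep(delay_seconds)
        # Written only after the last page, so an interrupted game is retried.
        done_marker.touch()
        crawled.append(game_id)
    return crawled
```

Because the marker is written only after the last page of a game, an interrupted run redoes at most one game when restarted. Completed games are never revisited, which matches the note that new reviews are not downloaded for them.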

Columns in the reviews.csv file:

  • game id
  • number of people that found the review to be useful
  • number of people that found the review to be funny
  • username of the reviewer
  • number of games owned by the reviewer
  • number of reviews written by the reviewer
  • 1=recommended, -1=not recommended
  • hours played by the reviewer on the game
  • date of creation of the review
  • text of the review
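
For text mining experiments, the file can be loaded with Python's standard csv module. The sketch below assumes that the columns appear in the order listed above, that the file is comma-delimited, and that it has no header row. None of this is confirmed here, so check it against your own reviews.csv. The `Review` field names are hypothetical labels chosen for this example.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Review(NamedTuple):
    # Field names are illustrative; the order follows the column list above.
    game_id: str
    useful: int
    funny: int
    username: str
    games_owned: int
    reviews_written: int
    recommended: int  # 1 = recommended, -1 = not recommended
    hours_played: float
    date: str
    text: str


def read_reviews(lines: Iterable[str]) -> Iterator[Review]:
    """Parse CSV lines into Review records, skipping rows without 10 fields."""
    for row in csv.reader(lines):
        if len(row) != 10:
            continue
        yield Review(
            row[0], int(row[1]), int(row[2]), row[3], int(row[4]),
            int(row[5]), int(row[6]), float(row[7]), row[8], row[9],
        )
```

A possible use, assuming the default output path: `with open("./data/reviews.csv", newline="", encoding="utf-8") as f: reviews = list(read_reviews(f))`. For the full dataset, iterating over the generator instead of building a list keeps memory use low.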

The last script, steam-reviews-stats.py, is a sample script that processes the reviews.csv file and writes some basic info and stats to JSON files:

  • ./data/games.json number of reviews and played hours for every game.

  • ./data/users.json number of games owned (as reported by the user's badge on STEAM) and number of played hours.

  • ./data/summary.json number of reviews, number of played hours, number of users, number of games.
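
The kind of aggregation behind summary.json can be sketched as below. This is an illustration under assumptions, not the code of steam-reviews-stats.py: the input is taken to be (game id, username, hours played) triples, and the JSON key names are hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple


def summarize(reviews: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Count reviews, distinct users, distinct games, and total played hours."""
    games = set()
    users = set()
    total_hours = 0.0
    review_count = 0
    for game_id, username, hours in reviews:
        games.add(game_id)
        users.add(username)
        total_hours += hours
        review_count += 1
    return {
        "reviews": review_count,
        "played hours": total_hours,
        "users": len(users),
        "games": len(games),
    }


def to_json(summary: Dict[str, float]) -> str:
    """Serialize the summary as it might be written to a JSON file."""
    return json.dumps(summary, indent=2)
```

Sets are used so that a user who reviews several games, or a game with many reviews, is counted once.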

As of March 15, 2018, the summary statistics were:

reviews        6614765
played hours 554702535
users          2720777
games            26677
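
A few averages follow directly from these totals. The snippet below only divides the published figures; the `ratios` helper is a hypothetical name for this example.

```python
# Totals reported above (March 15, 2018).
REVIEWS = 6614765
PLAYED_HOURS = 554702535
USERS = 2720777
GAMES = 26677


def ratios() -> dict:
    """Derive simple per-game, per-review, and per-user averages."""
    return {
        "reviews per game": REVIEWS / GAMES,
        "played hours per review": PLAYED_HOURS / REVIEWS,
        "reviews per user": REVIEWS / USERS,
    }


if __name__ == "__main__":
    for name, value in ratios().items():
        print(f"{name}: {value:.2f}")
```

These are means over the whole dataset and say nothing about how reviews are distributed across individual games or users.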
