# Steam Game Review Scraper

Web-scrape game review data from Steam, including the reviewer's username, their profile link, and the review itself.

Based on: https://www.reddit.com/r/Python/comments/j42rv5/i_created_a_web_scraper_for_steam_game_reviews/

In [1]:
from selenium import webdriver
from time import sleep

## Getting started

Look up the game ID by searching for the game on Steam, opening its store page, and copying the number in the URL just before the game title.
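If you would rather not copy the number by hand, you can pull it out of the store page URL with a regular expression and drop it into the same reviews template used in the next cell. This is a minimal sketch. It assumes store URLs have the form `https://store.steampowered.com/app/<id>/<title>/`. The helpers `app_id_from_store_url` and `reviews_url` are hypothetical, and the title slug in the example URL is illustrative.

```python
import re

# Same reviews-page template as in the notebook; only the app ID changes.
REVIEWS_TEMPLATE = (
    "https://steamcommunity.com/app/{}/reviews/"
    "?browsefilter=toprated&snr=1_5_100010_&p=1"
)


def app_id_from_store_url(store_url: str) -> int:
    """Return the number that follows '/app/' in a Steam store URL."""
    match = re.search(r"/app/(\d+)", store_url)
    if match is None:
        raise ValueError(f"no /app/<id> segment in {store_url!r}")
    return int(match.group(1))


def reviews_url(store_url: str) -> str:
    """Build the top-rated reviews URL for the game behind a store URL."""
    return REVIEWS_TEMPLATE.format(app_id_from_store_url(store_url))


# Hypothetical example URL; the slug after the ID does not affect the result.
store = "https://store.steampowered.com/app/335300/DARK_SOULS_II_Scholar_of_the_First_Sin/"
print(app_id_from_store_url(store))  # 335300
```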

In [2]:
# Dark Souls II: Scholar of the First Sin (335300)
game_id = 335300
template = 'https://steamcommunity.com/app/{}/reviews/?browsefilter=toprated&snr=1_5_100010_&p=1'

url = template.format(game_id)

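from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object instead of executable_path
service = Service(executable_path="C:/Users/bdion/Downloads/Chrome WebDriver/chromedriver-win64/chromedriver.exe")
driver = webdriver.Chrome(service=service)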
driver.maximize_window()
driver.get(url)



## Scrape via WebDriver

The reviews page uses infinite scrolling: more reviews load each time you reach the bottom. So you need to grab the cards currently on the page, scroll to the bottom, and repeat until no new reviews appear. For this project, we are going to collect the following information:
- Steam ID
- Review text
- Review recommendation (Recommended / Not Recommended)
- Date posted

Notes on the page:
- There are 181 pages of DS2 reviews.
- Each review is a `div class="apphub_Card"`, and the review text is contained under `div class="apphub_CardTextContent"`.
In [3]:
from selenium.webdriver.common.by import By

# Open the URL
driver.get(url)

reviews = []
review_ids = set()

# get current position of y scrollbar
last_position = driver.execute_script("return window.pageYOffset;")

# Scroll down repeatedly to load more reviews, stopping once the page stops growing
t = 3  # seconds to wait after each scroll
while True:
    # get the review cards currently on the page
    cards = driver.find_elements(By.CLASS_NAME, 'apphub_Card')

    for card in cards[-20:]:  # only the tail end holds newly loaded cards
        # reviewer's profile link; its text is the username
        profile_link = card.find_element(By.XPATH, './/div[@class="apphub_friend_block"]/div/a[2]')
        profile_url = profile_link.get_attribute('href')

        # steam id: second-to-last piece of the profile URL (assumes a trailing slash)
        steam_id = profile_url.split('/')[-2]

        # skip reviews already collected on an earlier pass
        if steam_id in review_ids:
            continue
        review_ids.add(steam_id)

        # username
        user_name = profile_link.text

        # date posted, then the review text with the date line removed
        date_posted = card.find_element(By.XPATH, './/div[@class="apphub_CardTextContent"]/div').text
        review_content = card.find_element(By.XPATH, './/div[@class="apphub_CardTextContent"]').text.replace(date_posted, '').strip()

        # recommendation (whether the reviewer liked the game)
        thumb_text = card.find_element(By.XPATH, './/div[@class="reviewInfo"]/div[2]').text

        # save review
        review = (steam_id, review_content, thumb_text, date_posted)
        reviews.append(review)
    
    # scroll down
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(t)  # wait t seconds for the new reviews to load

    # get current position after scrolling
    curr_position = driver.execute_script("return window.pageYOffset;")

    # Break the loop if no new content is loaded
    if curr_position == last_position:
        break

    last_position = curr_position

# Close the webdriver
driver.quit()
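The `SteamId` column in the results mixes custom profile names (such as `Xilirite`) with 64-bit numeric IDs. This happens because Steam profile URLs come in two forms: `/id/<custom-name>/` and `/profiles/<number>/`. Also, `profile_url.split('/')[-2]` only returns the ID when the URL ends with a slash. Below is a slightly more robust sketch that keeps both the URL form and the value. The helper `parse_profile_url` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse


def parse_profile_url(profile_url: str) -> tuple[str, str]:
    """Split a Steam profile URL into (kind, value).

    kind is 'id' for custom profile names or 'profiles' for numeric IDs;
    a trailing slash is optional.
    """
    segments = [s for s in urlparse(profile_url).path.split("/") if s]
    if len(segments) < 2 or segments[0] not in ("id", "profiles"):
        raise ValueError(f"unexpected profile URL: {profile_url!r}")
    return segments[0], segments[1]


print(parse_profile_url("https://steamcommunity.com/profiles/76561197996536125/"))
# ('profiles', '76561197996536125')
print(parse_profile_url("https://steamcommunity.com/id/Xilirite"))
# ('id', 'Xilirite')
```

In the scraping loop, `kind, steam_id = parse_profile_url(profile_url)` could replace the `split` call. Keeping `kind` also lets you tell custom names apart from numeric IDs later.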



## Save the results

Put the reviews into a pandas DataFrame:

In [4]:
import pandas as pd

In [5]:
reviews = pd.DataFrame(reviews)
reviews.columns = ['SteamId', 'Review', 'Recommended?', 'DatePosted']
reviews

| | SteamId | Review | Recommended? | DatePosted |
|---|---|---|---|---|
| 0 | Xilirite | TL;DR\nBuy it if you prefer the individual lev... | Recommended | Posted: June 2, 2017 |
| 1 | 76561197996536125 | Elden Ring waiting room | Recommended | Posted: June 20, 2021 |
| 2 | 76561198143997912 | I thought this was supposed to be the bad one | Recommended | Posted: August 12 |
| 3 | 76561198088449870 | Little known fact: it's called dark souls 2 be... | Recommended | Posted: July 4 |
| 4 | 76561198960403801 | port bloodborne to pc | Recommended | Posted: April 30, 2022 |
| ... | ... | ... | ... | ... |
| 35173 | saihchotic | Game's so hard steam wont even let me redownlo... | Recommended | Posted: January 13, 2017 |
| 35174 | TheArmedMadMan | This game is wank. It's basically Dark Souls b... | Not Recommended | Posted: January 15, 2017 |
| 35175 | 76561197971446226 | Flawed camera but one of the best games I've e... | Recommended | Posted: January 14, 2017 |
| 35176 | sporadicity | If you liked the first Dark Souls, then great!... | Recommended | Posted: January 14, 2017 |
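`DatePosted` is still raw text. Some rows appear to omit the year (`Posted: August 12`, `Posted: July 4`), which seems to happen for reviews from the current year. Here is a minimal sketch that converts the column to `datetime.date` values. It assumes a missing year means the year the scrape was run. That year is passed in as `default_year`, a hypothetical parameter, because the notebook does not record when it ran.

```python
from datetime import date, datetime


def parse_posted(posted: str, default_year: int) -> date:
    """Turn 'Posted: June 2, 2017' or 'Posted: August 12' into a date.

    Assumption: when the year is missing, the review is from `default_year`.
    """
    text = posted.removeprefix("Posted:").strip()
    if "," not in text:  # no year given, e.g. 'August 12'
        text = f"{text}, {default_year}"
    # %B expects English month names, which holds under the default C locale.
    return datetime.strptime(text, "%B %d, %Y").date()


print(parse_posted("Posted: June 2, 2017", default_year=2023))  # 2017-06-02
print(parse_posted("Posted: August 12", default_year=2023))     # 2023-08-12
```

With pandas, you can apply it column-wise with `reviews['DatePosted'].map(lambda s: parse_posted(s, 2023))`, where 2023 is only a placeholder year.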


Save the reviews to a CSV file:

In [6]:
# save the DataFrame to a CSV file; index=False leaves out the row-number index column
reviews.to_csv('reviews.csv', index=False)
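Some reviews contain line breaks (see the `TL;DR\n...` row), and every dated `Posted:` value contains a comma. The CSV writer therefore has to quote those fields. Below is a stdlib-only sketch using Python's `csv` module that checks such rows survive a round trip. The helpers `reviews_to_csv` and `reviews_from_csv` are hypothetical. The sample rows are copied from the displayed output, with the review text still truncated.

```python
import csv
import io

COLUMNS = ["SteamId", "Review", "Recommended?", "DatePosted"]


def reviews_to_csv(rows, stream) -> None:
    """Write a header plus one record per review tuple; csv quotes as needed."""
    writer = csv.writer(stream)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


def reviews_from_csv(stream) -> list[tuple[str, ...]]:
    """Read the records back as tuples, checking the header first."""
    reader = csv.reader(stream)
    header = next(reader)
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    return [tuple(row) for row in reader]


rows = [
    ("Xilirite", "TL;DR\nBuy it if you prefer the individual lev...",
     "Recommended", "Posted: June 2, 2017"),
    ("76561198143997912", "I thought this was supposed to be the bad one",
     "Recommended", "Posted: August 12"),
]
buffer = io.StringIO()
reviews_to_csv(rows, buffer)
buffer.seek(0)
print(reviews_from_csv(buffer) == rows)  # True
```

When writing to a real file instead of `io.StringIO`, open it with `newline=''`, as the `csv` module documentation recommends. This keeps embedded line breaks intact.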