# Steam Game Review Scraper

Webscrape game review data from Steam, including the user, profile link, and the review itself

In [1]:
import requests as re
import pandas as pd

## Requesting reviews from Steam:

The page is continously scrolling, so you'll need to grab the cards, then scroll down to the bottom and repeat until finished. For this project, we are going to collect the following information:
- Steam ID
- Review Text
- Review Recommendation
- Date Posted
- There are 181 pages of DS2 reviews
- Each review is contained under:
    - div class="apphub_CardTextContent"

In [2]:
init_request = re.get('https://store.steampowered.com/appreviews/335300?json=1?filter=all&language=english&num_per_page=100').json()

In [3]:
init_request['success']

1

In [4]:
init_request['query_summary']

{'num_reviews': 91,
 'review_score': 8,
 'review_score_desc': 'Very Positive',
 'total_positive': 21986,
 'total_negative': 4471,
 'total_reviews': 26457}

- Using cursors to get more batches of reviews:

In [5]:
cursor = init_request['cursor']
cursor 

'AoIIPwAAAHyKjuEE'

In [6]:
request = re.get(f'https://store.steampowered.com/appreviews/335300?json=1?filter=all&language=english&num_per_page=100&cursor={cursor}').json()
request['cursor']

'AoIIPwYYanm+09sE'

- Convert to dataframe:

In [7]:
def review_batches(app_id=335300):
    # Initial request:
    init_request = re.get(f'https://store.steampowered.com/appreviews/{app_id}?json=1?filter=all&language=english&num_per_page=100&cursor=*').json()

    # Used to get further batches of reviews:
    cursor = init_request['cursor']

    # Creating the actual df:
    init_reviews = init_request['reviews']

    columns = ['review', 'init_date', 'update_date', 'in_early_access']
    reviews_data = {
        'review': [review['review'] for review in init_reviews],
        'init_date': [review['timestamp_created'] for review in init_reviews],
        'update_date': [review['timestamp_updated'] for review in init_reviews],
        'in_early_access': [review['written_during_early_access'] for review in init_reviews]
    }
    reviews = pd.DataFrame(reviews_data, columns=columns)

    # Keeping track if a request was successful:
    success = init_request['success']
    while success:
        request = re.get(f'https://store.steampowered.com/appreviews/{app_id}?json=1?filter=all&language=english&num_per_page=100&cursor={cursor}').json()
        if 'cursor' in request:
            cursor = request['cursor']
        else:
            break

        more_data = {
            'review': [review['review'] for review in init_reviews],
            'init_date': [review['timestamp_created'] for review in init_reviews],
            'update_date': [review['timestamp_updated'] for review in init_reviews],
            'in_early_access': [review['written_during_early_access'] for review in init_reviews]
        }
        more = pd.DataFrame(more_data, columns=columns)
        reviews = pd.concat([reviews, more], ignore_index=True)

        success = request['success']
    
    return reviews

In [8]:
reviews = review_batches()
reviews

Unnamed: 0,review,init_date,update_date,in_early_access
0,Dark Souls II is many things its siblings are ...,1710199956,1710199995,False
1,"Reddit told me this game sucks, don't trust a ...",1710453440,1710453440,False
2,I try and play this game and it gives me a hea...,1709438268,1709438268,False
3,"""The lord souls killed my wife"" - John Darksoul",1709070496,1709070496,False
4,"dark souls 1 felt like you were truly lost, in...",1710202744,1710202856,False
...,...,...,...,...
177,Not enjoyable enough to finish. Made by people...,1709557574,1709557574,False
178,"I wanted to like this game, but it just doesn'...",1709551481,1710296156,False
179,Every other game in the Souls series is better...,1710588447,1710588447,False
180,One of the worst games ive ever had the disple...,1710560570,1710560570,False


## Saving the reviews into a csv file:

In [9]:
# save the df to a CSV file
