---
# 1/ Finding the AppID of the selected game (The Crew 2 in this case)
---

To access the reviews page of a Steam game, we need the game's AppID.  
To find it, we query Steam's game search page with the game name.  
Then we display (in the Streamlit app) a list of the top 5 results (or fewer/more), so that the user can select the matching game.  
Based on this selection, we scrape the AppID of the chosen game from the HTML content of Steam's game search page.

## Importing libraries

In [1]:
import requests

from bs4 import BeautifulSoup

import urllib.parse

In [2]:
# URL of the game search page on Steam
# The game name must be URL-encoded first (for example, spaces become %20)

game_name = "the crew 2"
search_page_url = "https://store.steampowered.com/search/?term=" + urllib.parse.quote(game_name)
print(search_page_url)

https://store.steampowered.com/search/?term=the%20crew%202
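
The same URL can also be assembled with `urllib.parse.urlencode`, which handles the `key=value` formatting and the escaping in one step. The sketch below is only an illustration and not part of the original notebook; the helper name `build_search_url` is hypothetical. Passing `quote_via=urllib.parse.quote` keeps spaces encoded as `%20`, matching the output above (the default, `quote_plus`, would produce `+` instead).

```python
import urllib.parse

SEARCH_BASE_URL = "https://store.steampowered.com/search/"


def build_search_url(game_name: str) -> str:
    """Return the Steam search URL for a game name (hypothetical helper)."""
    # urlencode escapes both the key and the value; quote_via=quote encodes
    # spaces as %20 rather than the default '+'.
    query = urllib.parse.urlencode({"term": game_name}, quote_via=urllib.parse.quote)
    return SEARCH_BASE_URL + "?" + query


print(build_search_url("the crew 2"))
```

Wrapping the logic in a function makes it easier to reuse once the game name comes from user input in the Streamlit app.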


In [3]:
# Request the URL of the search page
search_page_response = requests.get(search_page_url)

In [4]:
# Retrieve the HTML content of the page
search_page_soup = BeautifulSoup(search_page_response.text, 'lxml')

## Scrape the game names and their corresponding AppIDs

In [5]:
game_names_list = []
appIDs_list = []

games_list_result = search_page_soup.find_all("a", class_ = "search_result_row ds_collapse_flag", limit = 5)

for game in games_list_result :
    game_names_list.append(game.find(class_ = "title").get_text(strip = True))
    appIDs_list.append(game.get("data-ds-appid"))

# Dictionary of game names and their corresponding AppIDs
dict_game = dict(zip(game_names_list, appIDs_list))
print(dict_game)

{'The Crew™ 2': '646910', 'The Crew 2 - Season Pass': '889890', 'The Crew 2 Demo': '1075340', 'The Crew 2 - Mazda RX8 Starter Pack': '2183882', 'The Crew Motorfest | Year 2 Pass': '3251420'}
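
Two details of this pattern are worth keeping in mind. `dict(zip(...))` keeps only the last AppID when two results share the same title, and `game.get("data-ds-appid")` returns `None` for any row that lacks that attribute. The following sketch shows one possible way to guard against both cases; it is an assumption, not part of the original notebook, and it works on plain lists so that it runs without a network request. The helper name `build_game_dict` and the sample values are hypothetical.

```python
def build_game_dict(names, app_ids):
    """Map display names to AppIDs, skipping rows without an AppID and
    disambiguating repeated titles (hypothetical helper)."""
    result = {}
    for name, app_id in zip(names, app_ids):
        if app_id is None:
            # Row without a data-ds-appid attribute: nothing to link to.
            continue
        key = name
        if key in result:
            # Same title seen before: append the AppID to keep both entries.
            key = f"{name} ({app_id})"
        result[key] = app_id
    return result


# Made-up values for illustration only.
names = ["The Crew™ 2", "Some Bundle", "The Crew™ 2"]
app_ids = ["646910", None, "123456"]
print(build_game_dict(names, app_ids))
```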


In [6]:
# We select the game "The Crew 2"
final_AppID = dict_game[game_names_list[0]]
print(final_AppID)

646910


In [7]:
# We now have the URL of the Steam reviews page for the selected game
reviews_page_url = "https://steamcommunity.com/app/" + final_AppID + "/reviews"
print(reviews_page_url)

https://steamcommunity.com/app/646910/reviews


---
# 2/ Set the parameters for the reviews search
---

To filter the reviews, we can set parameters such as the review sorting method, the review language, and the maximum number of reviews to scrape.  
In our case, the user chooses these parameters in the Streamlit app.  
Once they are set, we build the URL of the reviews page from the selected filters.  
To list the available filter options, we use BeautifulSoup once again, this time on the reviews page.

In [10]:
# Request the URL of the initial reviews page
reviews_page_response = requests.get(reviews_page_url)

In [11]:
# Retrieve the HTML content of the page
reviews_page_soup = BeautifulSoup(reviews_page_response.text, 'lxml')

## Set the review sorting method

In [124]:
# Names of the sorting methods, to be displayed in the Streamlit app
sorting_methods_name_list = []

# Identifiers of the sorting methods, to be appended to the final reviews page URL
sorting_methods_identifier_list = []

sorting_methods_result = reviews_page_soup.find("div", class_ = "filterselect_options shadow_content").find_all("div", class_ = "option")

for sorting_method in sorting_methods_result :
    sorting_methods_identifier_list.append(sorting_method.get("onclick").replace("javascript:SelectContentFilter( '", "").replace("' );", ""))
    sorting_method_text = sorting_method.get_text(strip = True)
    sorting_methods_name_list.append(sorting_method_text)

# Dictionary of sorting method names and their corresponding identifiers
dict_sorting_methods = dict(zip(sorting_methods_name_list, sorting_methods_identifier_list))
print(dict_sorting_methods)

{'Most Helpful(All Time)': '?p=1&browsefilter=toprated', 'Most Helpful(Today)': '?p=1&browsefilter=trendday', 'Most Helpful(Week)': '?p=1&browsefilter=trendweek', 'Most Helpful(Month)': '?p=1&browsefilter=trendmonth', 'Most Helpful(Three Months)': '?p=1&browsefilter=trendthreemonths', 'Most Helpful(Six Months)': '?p=1&browsefilter=trendsixmonths', 'Most Helpful(Year)': '?p=1&browsefilter=trendyear', 'Most Recent': '?p=1&browsefilter=mostrecent', 'Recently Updated': '?p=1&browsefilter=recentlyupdated', 'Funny': '?p=1&browsefilter=funny'}
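
The chained `.replace()` calls depend on the exact spacing of the `onclick` string. A regular expression that captures whatever sits between the single quotes is a slightly more tolerant alternative. This is a sketch under the assumption that the attribute always contains exactly one single-quoted argument, as in the output above; `extract_filter_argument` is a hypothetical helper name.

```python
import re

# Matches the single-quoted argument of a call such as
# javascript:SelectContentFilter( '?p=1&browsefilter=toprated' );
_QUOTED_ARGUMENT = re.compile(r"\(\s*'([^']*)'\s*\)")


def extract_filter_argument(onclick: str) -> str:
    """Return the quoted argument of an onclick handler (hypothetical helper)."""
    match = _QUOTED_ARGUMENT.search(onclick)
    if match is None:
        raise ValueError(f"No quoted argument found in: {onclick!r}")
    return match.group(1)


onclick = "javascript:SelectContentFilter( '?p=1&browsefilter=toprated' );"
print(extract_filter_argument(onclick))
```

The same helper would apply to the language options, where the leading `?` can then be dropped with `.lstrip("?")`. Raising an error on a missing match makes a change in Steam's markup visible immediately instead of silently storing an unparsed string.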


In [132]:
# In our case, we choose the sorting method 'Most Helpful(Year)'
sorting_method_identifier = dict_sorting_methods['Most Helpful(Year)']
print(sorting_method_identifier)

?p=1&browsefilter=trendyear


## Set the language for the reviews

In [130]:
# Names of the languages, to be displayed in the Streamlit app
languages_name_list = []

# Identifiers of the languages, to be appended to the final reviews page URL
languages_identifier_list = []

languages_result = reviews_page_soup.find("div", class_ = "filterselect_options language shadow_content").find_all("div", class_ = "option")

for language in languages_result :
    languages_identifier_list.append(language.get("onclick").replace("javascript:SelectLanguageFilter( '?", "").replace("' );", ""))
    language_text = language.get_text(strip = True)
    languages_name_list.append(language_text)

# Dictionary of language names and their corresponding identifiers
dict_languages = dict(zip(languages_name_list, languages_identifier_list))
print(dict_languages)

{'All Languages': 'filterLanguage=all', 'Simplified Chinese': 'filterLanguage=schinese', 'Traditional Chinese': 'filterLanguage=tchinese', 'Japanese': 'filterLanguage=japanese', 'Korean': 'filterLanguage=koreana', 'Thai': 'filterLanguage=thai', 'Bulgarian': 'filterLanguage=bulgarian', 'Czech': 'filterLanguage=czech', 'Danish': 'filterLanguage=danish', 'German': 'filterLanguage=german', 'English': 'filterLanguage=english', 'Spanish - Spain': 'filterLanguage=spanish', 'Spanish - Latin America': 'filterLanguage=latam', 'Greek': 'filterLanguage=greek', 'French': 'filterLanguage=french', 'Italian': 'filterLanguage=italian', 'Indonesian': 'filterLanguage=indonesian', 'Hungarian': 'filterLanguage=hungarian', 'Dutch': 'filterLanguage=dutch', 'Norwegian': 'filterLanguage=norwegian', 'Polish': 'filterLanguage=polish', 'Portuguese - Portugal': 'filterLanguage=portuguese', 'Portuguese - Brazil': 'filterLanguage=brazilian', 'Romanian': 'filterLanguage=romanian', 'Russian': 'filterLanguage=russian',

In [131]:
# In our case, we choose the English language
language_identifier = dict_languages['English']
print(language_identifier)

filterLanguage=english


## Set the maximum number of reviews to scrape

In [139]:
max_review = 100

## Define the final URL with the selected filters

In [133]:
final_reviews_page_url = reviews_page_url + sorting_method_identifier + "&" + language_identifier
print(final_reviews_page_url)

https://steamcommunity.com/app/646910/reviews?p=1&browsefilter=trendyear&filterLanguage=english
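
String concatenation works here because the sorting identifier already starts with `?p=1&`. An alternative, sketched below as an assumption and not as the notebook's method, is to keep only the filter values (`trendyear`, `english`) and let `urllib.parse.urlencode` build the query string. The helper name `build_reviews_url` and its parameters are hypothetical.

```python
from urllib.parse import urlencode


def build_reviews_url(app_id: str, browse_filter: str, language: str, page: int = 1) -> str:
    """Assemble a Steam Community reviews URL (hypothetical helper)."""
    base_url = f"https://steamcommunity.com/app/{app_id}/reviews"
    # Dictionaries keep insertion order, so the parameters appear in the
    # same order as in the URL printed above.
    query = urlencode({"p": page, "browsefilter": browse_filter, "filterLanguage": language})
    return f"{base_url}?{query}"


print(build_reviews_url("646910", "trendyear", "english"))
```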


Inspecting this page shows that the HTML content does not contain all the reviews.  
This is because more reviews are loaded only as the page is scrolled toward the bottom.  
To solve this, we will use the Selenium library to scroll down the page, so that the reviews we need are loaded into the HTML content.

---
# 3/ Scroll the page using Selenium
---

## Importing libraries

In [140]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains

from webdriver_manager.chrome import ChromeDriverManager

## Create and launch the Selenium driver

In [150]:
# Set the options to launch the driver in the background
chrome_options = Options()
chrome_options.add_argument("--headless")

# Launch the driver
driver = webdriver.Chrome(options = chrome_options, service=Service(ChromeDriverManager().install()))

# Access the Steam reviews page
driver.get(final_reviews_page_url)

In [151]:
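end = False

# Elements used to drive the scrolling: the loading indicator, the button that
# loads more content, and the link shown when there is no more content
action_wait = driver.find_element(by = By.XPATH, value = '//*[@id="action_wait"]/img')
getMoreContent = driver.find_element(by = By.ID, value = 'GetMoreContentBtn')
noMoreContent = driver.find_element(by = By.XPATH, value = '//*[@id="NoMoreContent"]/div[2]/a')

# Keep scrolling until the end of the reviews is reached or enough reviews are loaded
while not end and len(driver.find_elements(By.CLASS_NAME, 'apphub_CardTextContent')) < max_review :
    try :
        # This scroll is expected to fail while the "no more content" link is still hidden
        ActionChains(driver).scroll_to_element(noMoreContent).perform()
        end = True
    except Exception :
        # Otherwise, scroll to the "load more" button and wait for the loading indicator to disappear
        ActionChains(driver).scroll_to_element(getMoreContent).perform()
        WebDriverWait(driver, 10).until(EC.invisibility_of_element(action_wait))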
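
The loop above stops on one of two conditions: enough reviews are loaded, or the end-of-content marker is reached. If neither ever happens (for example, if the page stops loading new reviews), it would keep running. The sketch below isolates that control flow from Selenium so that it can be run anywhere. `load_more` and `count_loaded` are hypothetical callables standing in for the scroll action and the `find_elements` count, and `max_rounds` is an added safety cap that is not in the original. The stand-in page and its numbers are made up.

```python
from typing import Callable


def scroll_until(
    load_more: Callable[[], bool],
    count_loaded: Callable[[], int],
    max_items: int,
    max_rounds: int = 50,
) -> int:
    """Call load_more() until enough items are loaded, the source is
    exhausted, or max_rounds is reached. Returns the number of rounds used."""
    rounds = 0
    while count_loaded() < max_items and rounds < max_rounds:
        rounds += 1
        # load_more() returns False when there is nothing left to load.
        if not load_more():
            break
    return rounds


# Stand-in for a page that serves 10 reviews per scroll, 35 in total.
loaded = []
TOTAL_AVAILABLE = 35


def fake_load_more() -> bool:
    remaining = TOTAL_AVAILABLE - len(loaded)
    loaded.extend(["review"] * min(10, remaining))
    return len(loaded) < TOTAL_AVAILABLE


rounds = scroll_until(fake_load_more, lambda: len(loaded), max_items=100)
print(rounds, len(loaded))
```

In the Selenium version, `load_more` would wrap the scroll and the `WebDriverWait` call, and `count_loaded` would wrap the `find_elements` count.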

In [152]:
# Parse the HTML of the page as it stands after scrolling
result_soup = BeautifulSoup(driver.page_source, "lxml")

In [145]:
# Each review body sits in a div with the class "apphub_CardTextContent"
reviews = result_soup.find_all("div", class_ = "apphub_CardTextContent")

In [153]:
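liste_review = []
# Keep at most max_review reviews, since the last scroll may load more than requested
for review in reviews[:max_review] :
    # Remove the metadata blocks (posting date, compensation and refund notices) so that only the review text remains
    for elt_a_suppr in review.find_all('div', class_ = ["date_posted", "received_compensation", "refunded"]) :
        elt_a_suppr.decompose()

    # Extract the text, separating the remaining elements with line breaks
    review = review.get_text(strip = True, separator = "\n")

    liste_review.append(review)
    print(review, end = "\n\n")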

Justice for The Crew 1

Worth 1$*
* be aware Ubisoft can disable The Crew 2 servers!

I used to be able to say "yes, get this game," but now that you've taken it's predecessor offline, I cannot and won't. How long do we have, maybe another 2 to 3 years for this game? Possibly even less because of your focus on the other game? I don't like modern Forza games, but at least they let you play their older games even without an online connection. If you can't even do that for your consumers, then why bother make anymore games?

Eh, for a $1 USD you can't complain. The game isn't bad, with the little time I played today I drove all the way from California to New York and it was a fun drive. At least Ubisoft has stated that they are going to give this game an offline mode which is fantastic considering they up and left The Crew 1 and shutdown all servers. You can't play it anymore, and they need some good reputation at the moment.
Ubisoft/Ivory Tower haven't stated anything that the game is go