# Steam Game Reviews

Steam is a leading online gaming platform and community that hosts thousands of games along with user-generated reviews. These reviews provide valuable insights into the gaming experience, developer performance, and overall game quality. In this analysis, we aim to explore and extract meaningful information about the games, publishers, and user sentiments from Steam reviews. 

Through this notebook, we will:

- Analyze game reviews to uncover trends in user satisfaction and common themes.
- Evaluate publishers and developers based on user feedback.
- Explore correlations between game features (genre, price, release date) and review sentiments.
- Highlight key statistics and patterns that inform game development and user preferences.

By delving into this data, we aim to gain a deeper understanding of what makes a game successful and identify patterns in the gaming community's feedback.

In [1]:
import pickle
import sys
import os
import pandas as pd
import glob
import random

# Get the parent directory and add it to sys.path
parent_dir = os.path.abspath(os.path.join(os.path.dirname("./"), '..'))
sys.path.append(parent_dir)

from src.LLM_analysis import *

In [2]:

# Define the path to the CSV files
file_paths = glob.glob('./data/steam_reviews/*.csv')

# Initialize an empty list to store the DataFrames
dataframes = []

# Load each file into a separate DataFrame
for file_path in file_paths[:3]:  # Limit to the first three files
    # pd.read_csv opens the path itself, so no separate open() call is needed
    df = pd.read_csv(file_path)
    dataframes.append(df)

# Assign each DataFrame to separate variables for clarity.
# Note: this relies on the order in which glob returned the files, which is not guaranteed.
games_ranking_df, games_description_df, steam_game_reviews_df = dataframes

print("Games Ranking Columns: " + str(games_ranking_df.columns.tolist()))
print("Games Description Columns: " + str(games_description_df.columns.tolist()))
print("Steam Game Reviews Columns: " + str(steam_game_reviews_df.columns.tolist()))

Games Ranking Columns: ['game_name', 'genre', 'rank_type', 'rank']
Games Description Columns: ['name', 'short_description', 'long_description', 'genres', 'minimum_system_requirement', 'recommend_system_requirement', 'release_date', 'developer', 'publisher', 'overall_player_rating', 'number_of_reviews_from_purchased_people', 'number_of_english_reviews', 'link']
Steam Game Reviews Columns: ['review', 'hours_played', 'helpful', 'funny', 'recommendation', 'date', 'game_name', 'username']
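
Because the unpacking above depends on file order, one hedged alternative is to recognize each table by its columns instead. The sketch below is not part of the original notebook: `SIGNATURE_COLUMNS` and `identify_table` are hypothetical names, and the signature columns are simply picked from the column lists printed above.

```python
# Hypothetical helper: recognize each table by a few distinctive columns.
# The column names come from the printed output above; the table keys are
# illustrative labels, not file names.
SIGNATURE_COLUMNS = {
    "games_ranking": {"rank_type", "rank"},
    "games_description": {"short_description", "publisher"},
    "steam_game_reviews": {"review", "hours_played"},
}


def identify_table(columns):
    """Return the table label whose signature columns all appear in `columns`."""
    present = set(columns)
    matches = [
        name
        for name, signature in SIGNATURE_COLUMNS.items()
        if signature <= present
    ]
    if len(matches) != 1:
        raise ValueError(f"Could not identify table from columns: {sorted(present)}")
    return matches[0]
```

With the `dataframes` list from the cell above, something like `tables = {identify_table(df.columns): df for df in dataframes}` would then give access by name (for example `tables["steam_game_reviews"]`) regardless of the order in which the files were read.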



In [3]:
# If you want to preview the data
print("Games Ranking: ")
print(games_ranking_df.head())
print("\n\n")
print("Games Description: ")
print(games_description_df.head())
print("\n\n")
print("Steam Game Reviews: ")
print(steam_game_reviews_df.head())

Games Ranking: 
                          game_name   genre rank_type  rank
0                  Counter-Strike 2  Action     Sales     1
1  Warhammer 40,000: Space Marine 2  Action     Sales     2
2                    Cyberpunk 2077  Action     Sales     3
3                Black Myth: Wukong  Action     Sales     4
4                        ELDEN RING  Action     Sales     5



Games Description: 
                               name  \
0                Black Myth: Wukong   
1                  Counter-Strike 2   
2  Warhammer 40,000: Space Marine 2   
3                    Cyberpunk 2077   
4                        ELDEN RING   

                                   short_description  \
0  Black Myth: Wukong is an action RPG rooted in ...   
1  For over two decades, Counter-Strike has offer...   
2  Embody the superhuman skill and brutality of a...   
3  Cyberpunk 2077 is an open-world, action-advent...   
4  THE CRITICALLY ACCLAIMED FANTASY ACTION RPG. R...   

                             

In [14]:
initialize_llm(api_key="YOUR API KEY")
# Verify the setup; verify_setup() returns True if we can continue
print("Ready to go!" if verify_setup() else "Something went wrong")

Ready to go!


### Let's start easy!

Let's take a random subset of 5 reviews from a random game and summarize the reviews using the `summarize_text` function.

In [11]:
random_game = random.choice(steam_game_reviews_df['game_name'].unique())
game_reviews = steam_game_reviews_df[steam_game_reviews_df['game_name'] == random_game]
# Sample up to 5 reviews; asking for more rows than a game has would raise a ValueError
random_game_reviews = game_reviews.sample(min(5, len(game_reviews)), random_state=42)

# Join the sampled reviews into a single string for the summarizer
reviews_concat = " ".join(random_game_reviews['review'].tolist())
reviews_concat

'2020 I adore this game. it\'s deceptive simple, yet actually somewhat difficult. there\'s a variety of time trials to keep the vibe going post-story--took me 10 hours to play through the story. so if you\'re into speedrunning or anything like that, this will be hella fun 2020 very chiil love the vibesskate fasteat trash 2020 just a great game 2020 A Bird\'s Eye View Curator 🦜IntroductionTanuki Sunset is a game in which you play as a rebel raccoon not only disobeying their mother by not wearing a helm but also all of the Tony Hawk knowledge I possess that you don\'t do tricks on a longboard AND the titular Tanuki even does it without looking ahead!Visuals and MechanicsThe visuals of Tanuki Sunset are a gorgeous depiction of the 80\'s synthwavey style with dancing to the music suns and buildings and overall vibrant and un-photorealistic style of the entire game and it\'s menus. As for the mechanics boy is this game an example of easy to learn and hard to master. Much further than steeri
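
Joining with a single space, as above, blurs where one review ends and the next begins (note how "hella fun 2020 very chiil" runs together). A minimal sketch of an alternative follows, assuming we want a visible separator and a rough cap on input length. `join_reviews`, its separator, and the `max_chars` default are illustrative choices, not part of the original notebook or of `summarize_text`.

```python
def join_reviews(reviews, separator="\n---\n", max_chars=4000):
    """Join review strings with a visible separator.

    Only whole reviews are kept; joining stops before the first review
    that would push the result past `max_chars`. Non-string entries
    (for example missing values) and blank reviews are skipped.
    """
    kept = []
    total = 0
    for text in reviews:
        if not isinstance(text, str):
            continue
        text = text.strip()
        if not text:
            continue
        # Account for the separator only when something precedes this review
        extra = len(text) + (len(separator) if kept else 0)
        if total + extra > max_chars:
            break
        kept.append(text)
        total += extra
    return separator.join(kept)
```

Calling `join_reviews(random_game_reviews['review'].tolist())` would produce a string that could be passed to `summarize_text` in place of `reviews_concat`. Whether the separator improves the summary is untested here.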

In [13]:
summary = summarize_text(reviews_concat)
summary

'Key Points:\n\n1. **Game Overview**: "Tanuki Sunset" features a rebel raccoon character and combines elements of skateboarding with a vibrant 80\'s synthwave aesthetic.\n\n2. **Gameplay Mechanics**: The game is easy to learn but challenging to master, with a focus on steering and timing rather than drifting, which can slow players down.\n\n3. **Visuals and Soundtrack**: The visuals are colorful and stylized, while the soundtrack varies by location, enhancing the gaming experience, though some transitions between songs could be smoother.\n\n4. **Time Trials and Replayability**: After completing the 10-hour story mode, players can engage in various time trials, making it appealing for speedrunners and completionists.\n\n5. **Recommendation**: Overall, the game is highly recommended for its enjoyable experience, despite minor issues with controls and menu navigation.'

### Different Customer Sentiments Among Various Satisfaction Levels
Now we'll move on to something more interesting. Let's say we have a game and a distribution of reviews in one of the following categories:
- Overwhelmingly Positive
- Very Positive
- Positive
- Mostly Positive
- Mixed
- Mostly Negative
- Negative
- Very Negative
- Overwhelmingly Negative

Here is how Steam assigns these ratings, if you are curious, though it isn't relevant for this first exercise. The score is the percentage of positive reviews.

| Score Range (%) | Reviews | Sentiment | Intensity    |
|-----------------|---------|-----------|--------------|
| 95 - 100        | 500+    | Positive  | Overwhelming |
| 85 - 100        | 50+     | Positive  | Very         |
| 80 - 100        | 1+      | Positive  |              |
| 70 - 79         | 1+      | Positive  | Mostly       |
| 40 - 69         | 1+      | Mixed     |              |
| 20 - 39         | 1+      | Negative  | Mostly       |
| 0 - 19          | 1+      | Negative  |              |
| 0 - 19          | 50+     | Negative  | Very         |
| 0 - 19          | 500+    | Negative  | Overwhelming |
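
The table can be read as a small lookup function. The sketch below is one possible encoding, not Steam's actual implementation: `steam_rating_label` is a hypothetical name, overlapping rows are resolved by checking the strictest review-count requirement first, and each range's lower bound is treated as a threshold so that fractional scores such as 79.5 still get a label.

```python
def steam_rating_label(percent_positive, n_reviews):
    """Map a positive-review percentage and review count to a rating label.

    Follows the table above; rows that overlap are resolved by testing
    the row with the highest review-count requirement first.
    """
    if n_reviews < 1:
        raise ValueError("n_reviews must be at least 1")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")

    # Positive side: strictest requirement first
    if percent_positive >= 95 and n_reviews >= 500:
        return "Overwhelmingly Positive"
    if percent_positive >= 85 and n_reviews >= 50:
        return "Very Positive"
    if percent_positive >= 80:
        return "Positive"
    if percent_positive >= 70:
        return "Mostly Positive"
    if percent_positive >= 40:
        return "Mixed"
    if percent_positive >= 20:
        return "Mostly Negative"

    # Below 20%: the label depends only on how many reviews there are
    if n_reviews >= 500:
        return "Overwhelmingly Negative"
    if n_reviews >= 50:
        return "Very Negative"
    return "Negative"
```

For example, `steam_rating_label(97, 100)` falls short of the 500-review requirement for the top tier and comes out as "Very Positive".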


We want to show a game's distribution of rankings by those categories on a plot. That shouldn't be too difficult.