# Machine Learning Group Project 

User game rating prediction & systematic discount offering on Steam. Project developed by Team XX, composed of:
| Student Name | Student Number | Class Group |
| --- | --- | --- |
| **Alessandro Maugeri** | 53067 | TA |
| **Frank Andreas Bauer** | XXXX | XX |
|  **Johannes Rahn** | XXXX | XX |
| **Nicole Zoppi** | XXXX | XX |
| **Yannick von der Heyden** | XXXX | XX |

## Importing Packages

In [2]:
import ast
import pandas as pd
import numpy as np
from datetime import datetime

## Importing Data

The data for this project was retrieved from [Kaggle](https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam?select=games.csv) and stored in the "data" folder found in the notebook's directory. The folder includes **four data files**:

The CSV file **[games.csv](data/games.csv)** presents data concerning individual games in the Steam library:

| Column | Description | Example|
| --- | --- | --- |
| **app_id** | Product ID on Steam | 620 |
| **title** | Product Commercial Title | Portal 2|
|  **date_release** | Release Date of Title (y-m-d) | 2011-04-18 |
| **win** | Boolean Denoting Compatibility to Windows Computers | True |
| **mac** | Boolean Denoting Compatibility to Mac Computers  | True | 
| **linux** | Boolean Denoting Compatibility to Linux Computers  | True |
| **rating** | Categorical Rating of Product (e.g. "Positive")| Overwhelmingly Positive |
| **positive_ratio** | Percentage of Positive Feedback for Game | 98 |
| **user_reviews** | Number of Reviews  | 267142 |
| **price_final** | Final Price in USD | 9.99 |
| **price_original** | Price Before Discounts in USD | 9.99 |
| **discount** | Discount Percentage | 0 |
| **steam_deck** | Boolean Denoting Compatibility with Steam Deck | True |



In [3]:
df_games_data = pd.read_csv("data/games.csv")
df_games_data.head(2)

Unnamed: 0,app_id,title,date_release,win,mac,linux,rating,positive_ratio,user_reviews,price_final,price_original,discount,steam_deck
0,10090,Call of Duty: World at War,2008-11-18,True,False,False,Very Positive,92,37039,19.99,19.99,0.0,True
1,13500,Prince of Persia: Warrior Within™,2008-11-21,True,False,False,Very Positive,84,2199,9.99,9.99,0.0,True


----
The **CSV file [users.csv](data/users.csv)** presents data concerning individual users found in the datasets:

| Column | Description | Example|
| --- | --- | --- |
| **user_id** | User ID on Steam | 5693478 |
| **products** | Number of Products from Steam Library Owned | 156 |
|  **reviews** | Number of Reviews Published | 1 |

In [4]:
df_users = pd.read_csv("data/users.csv")
df_users.head(2)

Unnamed: 0,user_id,products,reviews
0,5693478,156,1
1,3595958,329,3


----
The **CSV file [recommendations.csv](data/recommendations.csv)** links users.csv and games.csv: each row is one user's review of one game, so the file has a many-to-one relationship to each of the two tables and resolves the many-to-many relationship between users and games:

| Column | Description | Example|
| --- | --- | --- |
| **app_id** | Product ID on Steam | 620 |
| **helpful** | Number of Users Who Found Review Helpful | 0 |
|  **funny** | Number of Users Who Found Review Funny | 0 |
| **date** | Date in Which Review was Published (y-m-d) | 2022-12-12 |
| **is_recommended** | Does the User Recommend the Title | True | 
| **hours** | Hours Spent by User Playing Game  | 36.3 |
| **user_id** | User ID of Review Author | 19954 |
| **review_id** | ID of Individual Review  | 0 |

In [5]:
df_redommendations = pd.read_csv("data/recommendations.csv")
df_redommendations.head(2)

Unnamed: 0,app_id,helpful,funny,date,is_recommended,hours,user_id,review_id
0,975370,0,0,2022-12-12,True,36.3,19954,0
1,304390,4,0,2017-02-17,False,11.5,1098,1
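
The relationship between the three CSV files can be checked before any merging. The snippet below is a minimal sketch on toy data (the values and the names `games`, `users`, `recs` are illustrative stand-ins, not the project's DataFrames): it confirms that each review appears once and that every `app_id` and `user_id` in the reviews table exists in its parent table.

```python
import pandas as pd

# Toy stand-ins for games.csv, users.csv and recommendations.csv (illustrative values)
games = pd.DataFrame({"app_id": [10, 20]})
users = pd.DataFrame({"user_id": [1, 2, 3]})
recs = pd.DataFrame({
    "review_id": [0, 1, 2, 3],
    "app_id": [10, 10, 20, 20],
    "user_id": [1, 2, 2, 3],
})

# Each review should appear exactly once
reviews_unique = recs["review_id"].is_unique

# Count reviews whose keys are missing from the parent tables
orphan_games = int((~recs["app_id"].isin(games["app_id"])).sum())
orphan_users = int((~recs["user_id"].isin(users["user_id"])).sum())

print(reviews_unique, orphan_games, orphan_users)
```

The same three checks could be run on the real DataFrames after loading; non-zero orphan counts would show up later as missing values after a left merge.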


----
Finally, the folder includes a **JSON file [games_metadata.json](data/games_metadata.json)** containing metadata on individual games:

| Column | Description | Example|
| --- | --- | --- |
| **app_id** | Product ID on Steam | 304430 |
| **description** | Game Description on Steam | "Hunted and alone, a boy finds himself drawn into the center of a dark project. INSIDE is a dark, narrative-driven platformer combining intense action with challenging puzzles. It has been critically acclaimed for its moody art style, ambient soundtrack and unsettling atmosphere." |
|  **tags** | Additional Tags on Steam Platform | ["2.5D", "Story Rich", "Puzzle Platformer" , "Atmospheric" , "Adventure" , "Indie" , "Dark" , "Horror" , "Singleplayer" , "Action-Adventure" , "Puzzle" , "Multiple Endings" , "Exploration" , "2D Platformer" , "Platformer" , "Controller" , "Soundtrack" , "Ambient" , "Action" , "Narrative"] |

In [6]:
df_games_meta_data = pd.read_json('data/games_metadata.json', lines=True)
df_games_meta_data.head(2)

Unnamed: 0,app_id,description,tags
0,10090,"Call of Duty is back, redefining war like you'...","[Zombies, World War II, FPS, Multiplayer, Acti..."
1,13500,Enter the dark underworld of Prince of Persia ...,"[Action, Adventure, Parkour, Third Person, Gre..."
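
The `lines=True` argument tells pandas that the file holds one JSON object per line (JSON Lines) rather than a single JSON array. A small sketch with an in-memory string (the two records are made up for illustration) shows what this looks like and that the `tags` field arrives as a Python list:

```python
import io
import pandas as pd

# Two made-up records in JSON Lines form: one object per line
raw = (
    '{"app_id": 1, "description": "", "tags": ["Action"]}\n'
    '{"app_id": 2, "description": "A toy game", "tags": []}\n'
)

toy_meta = pd.read_json(io.StringIO(raw), lines=True)

# The tags column already contains Python lists after parsing
first_tags = toy_meta.loc[0, "tags"]
print(type(first_tags), toy_meta.shape)
```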


## Data Exploration

## Data Preparation

Below we prepare the individual datasets for the analysis.

#### Games Data

In [7]:
# Turn date_release column to Pandas DateTime
df_games_data["date_release"] = pd.to_datetime(df_games_data["date_release"])

df_games_data["date_release"][0]

Timestamp('2008-11-18 00:00:00')

#### User Data

#### Recommendations Data

In [8]:
df_redommendations["date"] = pd.to_datetime(df_redommendations["date"])

#### Games Metadata

In [9]:
# Turn the Description Column to a String
df_games_meta_data['description'] = df_games_meta_data['description'].astype(str)

# Ensure the Tags Column Holds Python Lists (round-trip through str so ast.literal_eval can parse it)
df_games_meta_data["tags"] = df_games_meta_data["tags"].astype(str).apply(ast.literal_eval)

df_games_meta_data

Unnamed: 0,app_id,description,tags
0,10090,"Call of Duty is back, redefining war like you'...","[Zombies, World War II, FPS, Multiplayer, Acti..."
1,13500,Enter the dark underworld of Prince of Persia ...,"[Action, Adventure, Parkour, Third Person, Gre..."
2,22364,,[Action]
3,113020,Monaco: What's Yours Is Mine is a single playe...,"[Co-op, Stealth, Indie, Heist, Local Co-Op, St..."
4,226560,Escape Dead Island is a Survival-Mystery adven...,"[Zombies, Adventure, Survival, Action, Third P..."
...,...,...,...
46063,758560,"Welcome to Versus World! Shoot, stab, snipe, a...","[Action, Indie, Early Access, Gore, Violent, F..."
46064,886910,,"[Simulation, Free to Play, Multiplayer, Single..."
46065,1477870,Fire and Water what is that an online game mad...,"[Casual, Action, Adventure, Action-Adventure, ..."
46066,1638430,A modern turn-based deckbuilding JRPG involvin...,"[RPG, Pixel Graphics, Party-Based RPG, JRPG, A..."
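
Since the tags may already be lists after loading, a conversion that accepts both lists and their string form avoids the string round-trip. The helper `to_tag_list` below is a hypothetical name introduced for this sketch, and the toy rows are illustrative; it also shows how list-valued tags can be counted with `explode`.

```python
import ast
import pandas as pd

def to_tag_list(value):
    """Return a list of tags whether `value` is a list or the string form of one."""
    if isinstance(value, list):
        return value
    return ast.literal_eval(value)

# Toy column mixing real lists and stringified lists (illustrative values)
toy = pd.DataFrame({"tags": [["Action", "Indie"], "['RPG', 'JRPG']", "[]"]})
toy["tags"] = toy["tags"].apply(to_tag_list)

# One row per tag, then count occurrences; empty lists become NaN and are dropped
tag_counts = toy["tags"].explode().value_counts()
print(tag_counts)
```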


## Merging Datasets

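We merge the data into a single DataFrame with one row per review, enriched with the information on the reviewed game and on the review's author.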

In [10]:
# Merge all information on games to one DataFrame (inner join on the shared app_id key)
games_df = df_games_data.merge(df_games_meta_data, how = "inner", on = "app_id")

# Merge game information into the recommendations DataFrame
recs_df = df_redommendations.merge(games_df, how = "left", on = "app_id")

# Merge all information on users into a final DataFrame
final_df = recs_df.merge(df_users, how="left", on = "user_id")
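
A left merge silently duplicates rows if the right-hand key is not unique, and silently produces missing values if a key has no match. The sketch below, on toy data with illustrative values, uses the `validate` and `indicator` parameters of `DataFrame.merge` to surface both problems; it is one possible safeguard, not part of the original pipeline.

```python
import pandas as pd

# Toy tables: the review with app_id 30 has no matching game (illustrative values)
games = pd.DataFrame({"app_id": [10, 20], "title": ["A", "B"]})
recs = pd.DataFrame({"review_id": [0, 1, 2], "app_id": [10, 20, 30]})

merged = recs.merge(
    games,
    how="left",
    on="app_id",
    validate="many_to_one",  # raises MergeError if app_id repeats in `games`
    indicator=True,          # adds a `_merge` column: "both" or "left_only"
)

# Reviews that found no game information
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), unmatched["app_id"].tolist())
```

With `validate="many_to_one"` the row count of the result always equals the row count of the left table, which is the property the review-level DataFrame relies on.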

In [11]:
final_df.head(2)

Unnamed: 0,app_id,helpful,funny,date,is_recommended,hours,user_id,review_id,title,date_release,...,positive_ratio,user_reviews,price_final,price_original,discount,steam_deck,description,tags,products,reviews
0,975370,0,0,2022-12-12,True,36.3,19954,0,Dwarf Fortress,2022-12-06,...,95,17773,29.99,29.99,0.0,True,"The deepest, most intricate simulation of a wo...","[Colony Sim, Indie, Pixel Graphics, Simulation...",28,3
1,304390,4,0,2017-02-17,False,11.5,1098,1,FOR HONOR™,2017-02-13,...,68,76071,14.99,14.99,0.0,True,Carve a path of destruction through an intense...,"[Medieval, Swordplay, Action, Multiplayer, PvP...",269,1


## Feature Engineering

**Elapsed Time:** A new feature which tracks the amount of time that has elapsed between the game's release and the review being logged. This could be interesting because people who purchase a game right after its release are likely to be bigger fans of the genre or franchise.

In [12]:
final_df["elapsed_time"] = final_df["date"] - final_df["date_release"]
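
The subtraction yields a `Timedelta` column, which many models cannot consume directly. As a sketch, the difference can be turned into a plain number of days with the `.dt.days` accessor; the column name `elapsed_days` is an assumption introduced here, and the two rows reuse the dates of the first two reviews shown earlier.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-12-12", "2017-02-17"]),
    "date_release": pd.to_datetime(["2022-12-06", "2017-02-13"]),
})

# Timedelta between review and release
toy["elapsed_time"] = toy["date"] - toy["date_release"]

# Whole days as integers (hypothetical extra column for modelling)
toy["elapsed_days"] = toy["elapsed_time"].dt.days
print(toy["elapsed_days"].tolist())
```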

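**Relative Recommendation:** A new feature which measures how a given review deviates from its author's average recommendation rate: the review's `is_recommended` value (1 or 0) minus the share of games that user recommended across all of their reviews.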

In [13]:
usr_avg_rating = final_df[["user_id","is_recommended"]].groupby("user_id").mean()
usr_avg_rating.rename(columns = {"is_recommended":"avg_rating"}, inplace = True)

In [14]:
final_df = final_df.merge(usr_avg_rating, how = "left", on = "user_id")
final_df["rel_rec"] = (final_df["is_recommended"] - final_df["avg_rating"])
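
A small worked example may help to read this feature. The sketch below uses toy data with illustrative values and computes the same quantity with `groupby(...).transform("mean")`, an alternative to the groupby-then-merge approach used in the cells above.

```python
import pandas as pd

# User 1 recommends 3 of 4 games; user 2 wrote a single positive review
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2],
    "is_recommended": [True, True, True, False, True],
})

# Work on a numeric copy so the mean and the subtraction are unambiguous
toy["rec_num"] = toy["is_recommended"].astype(float)

# Per-user average, broadcast back to every review of that user
toy["avg_rating"] = toy.groupby("user_id")["rec_num"].transform("mean")
toy["rel_rec"] = toy["rec_num"] - toy["avg_rating"]
print(toy[["user_id", "avg_rating", "rel_rec"]])
```

Note that a user with a single review always has `rel_rec` equal to 0, because the review is its own average; the feature only carries information for users with several reviews of mixed sentiment.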

In [17]:
final_df

Unnamed: 0,app_id,helpful,funny,date,is_recommended,hours,user_id,review_id,title,date_release,...,price_original,discount,steam_deck,description,tags,products,reviews,elapsed_time,avg_rating,rel_rec
0,975370,0,0,2022-12-12,True,36.3,19954,0,Dwarf Fortress,2022-12-06,...,29.99,0.0,True,"The deepest, most intricate simulation of a wo...","[Colony Sim, Indie, Pixel Graphics, Simulation...",28,3,6 days,1.0,0.0
1,304390,4,0,2017-02-17,False,11.5,1098,1,FOR HONOR™,2017-02-13,...,14.99,0.0,True,Carve a path of destruction through an intense...,"[Medieval, Swordplay, Action, Multiplayer, PvP...",269,1,4 days,0.0,0.0
2,1085660,2,0,2019-11-17,True,336.5,91207,2,Destiny 2,2019-10-01,...,0.00,0.0,True,Destiny 2 is an action MMO with a single evolv...,"[Free to Play, Open World, FPS, Looter Shooter...",237,2,47 days,1.0,0.0
3,703080,0,0,2022-09-23,True,27.4,93054,3,Planet Zoo,2019-11-05,...,44.99,0.0,True,Build a world for wildlife in Planet Zoo. From...,"[Management, Simulation, Building, Sandbox, Na...",5,2,1053 days,1.0,0.0
4,526870,0,0,2021-01-10,True,7.9,9106,4,Satisfactory,2020-06-08,...,29.99,0.0,True,Satisfactory is a first-person open-world fact...,"[Base Building, Automation, Open World, Multip...",13,2,216 days,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10072265,225540,0,0,2020-10-28,True,200.1,5465564,10072265,Just Cause™ 3,2015-11-30,...,19.99,0.0,True,With over 1000 km² of complete freedom from sk...,"[Open World, Action, Destruction, Third-Person...",352,1,1794 days,1.0,0.0
10072266,225540,0,0,2022-07-31,True,187.0,3903623,10072266,Just Cause™ 3,2015-11-30,...,19.99,0.0,True,With over 1000 km² of complete freedom from sk...,"[Open World, Action, Destruction, Third-Person...",129,1,2435 days,1.0,0.0
10072267,225540,9,0,2015-12-19,True,44.7,2465684,10072267,Just Cause™ 3,2015-11-30,...,19.99,0.0,True,With over 1000 km² of complete freedom from sk...,"[Open World, Action, Destruction, Third-Person...",25,1,19 days,1.0,0.0
10072268,225540,0,0,2021-10-15,True,11.9,2173819,10072268,Just Cause™ 3,2015-11-30,...,19.99,0.0,True,With over 1000 km² of complete freedom from sk...,"[Open World, Action, Destruction, Third-Person...",225,2,2146 days,1.0,0.0
