## Main Task
Building a Game Recommendation System with Steam Platform Data!

### Data Understanding
The dataset contains over 41 million cleaned and preprocessed user recommendations (reviews) from the Steam Store, a leading online platform for purchasing and downloading video games, DLC, and other gaming-related content.  
Additionally, it contains detailed information about games and add-ons.  

The dataset consists of three main entities:

1. **games.csv** - a table of game (and add-on) information: ratings, pricing in US dollars, release date, etc. Extra non-tabular details on games, such as descriptions and tags, are stored in a separate metadata file;
2. **users.csv** - a table of public user-profile information: the number of products purchased and the number of reviews published;
3. **recommendations.csv** - a table of user reviews recording whether the user recommends a product. The table represents a many-to-many relationship between the game entity and the user entity.
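Because recommendations.csv links users to games, it can serve as the join table between the other two files. The sketch below uses made-up toy rows (only the `app_id` and `user_id` key columns match the real schema) to show the join and the "many" on each side; it is an illustration, not the notebook's own code.

```python
import pandas as pd

# Made-up toy rows; only the key columns (app_id, user_id) mirror the dataset.
games = pd.DataFrame({"app_id": [10, 20], "title": ["Game A", "Game B"]})
users = pd.DataFrame({"user_id": [1, 2], "products": [359, 156]})
recs = pd.DataFrame({
    "app_id": [10, 20, 10],
    "user_id": [1, 1, 2],
    "is_recommended": [True, False, True],
})

# recommendations acts as the link table between users and games.
# validate="many_to_one" raises if app_id or user_id were duplicated in the
# games/users tables, which would otherwise silently multiply rows.
joined = (
    recs.merge(games, on="app_id", how="left", validate="many_to_one")
        .merge(users, on="user_id", how="left", validate="many_to_one")
)
print(joined[["user_id", "title", "is_recommended"]])

# "Many" on both sides: a user reviews several games, a game has several reviewers.
games_per_user = recs.groupby("user_id")["app_id"].nunique()
users_per_game = recs.groupby("app_id")["user_id"].nunique()
```

Applied to the real data, the same pattern would merge `recommendations_df` with `games_df` on `app_id` and with `users_df` on `user_id`.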

The dataset does not contain any personal information about users of the Steam platform. A preprocessing pipeline anonymized all user IDs, and all collected data is accessible to any member of the general public.

Link to the dataset: <https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam/data>

In [1]:
import pandas as pd

In [2]:
games_df = pd.read_csv('datasets/games.csv')
users_df = pd.read_csv('datasets/users.csv')
recommendations_df = pd.read_csv('datasets/recommendations.csv')
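With over 41 million rows, recommendations.csv can take a lot of memory under pandas' default dtypes (64-bit numbers, and the `date` column left as strings). A minimal sketch of a more compact load, assuming the column names shown in the `head()` output below and assuming every ID and count fits in a 32-bit integer; `REC_DTYPES` and `load_recommendations` are hypothetical names, not part of the original notebook.

```python
import io
import pandas as pd

# Assumed narrower dtypes for recommendations.csv. int32 assumes every ID and
# count stays below 2**31 - 1; switch back to int64 if that does not hold.
# float32 for hours trades a little precision for half the memory.
REC_DTYPES = {
    "app_id": "int32",
    "helpful": "int32",
    "funny": "int32",
    "is_recommended": "bool",
    "hours": "float32",
    "user_id": "int32",
    "review_id": "int32",
}

def load_recommendations(path_or_buffer):
    """Read recommendations.csv with compact dtypes and a parsed `date` column."""
    return pd.read_csv(path_or_buffer, dtype=REC_DTYPES, parse_dates=["date"])

# Tiny in-memory sample built from the two rows shown by recommendations_df.head(2).
sample = io.StringIO(
    "app_id,helpful,funny,date,is_recommended,hours,user_id,review_id\n"
    "975370,0,0,2022-12-12,True,36.3,51580,0\n"
    "304390,4,0,2017-02-17,False,11.5,2586,1\n"
)
demo = load_recommendations(sample)
print(demo.dtypes)
```

On the real file the call would be `load_recommendations('datasets/recommendations.csv')`; comparing `memory_usage(deep=True)` before and after shows whether the narrower dtypes are worth it.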

In [3]:
games_df.head(2)

| | app_id | title | date_release | win | mac | linux | rating | positive_ratio | user_reviews | price_final | price_original | discount | steam_deck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13500 | Prince of Persia: Warrior Within™ | 2008-11-21 | True | False | False | Very Positive | 84 | 2199 | 9.99 | 9.99 | 0.0 | True |
| 1 | 22364 | BRINK: Agents of Change | 2011-08-03 | True | False | False | Positive | 85 | 21 | 2.99 | 2.99 | 0.0 | True |


In [4]:
users_df.head(2)

| | user_id | products | reviews |
|---|---|---|---|
| 0 | 7360263 | 359 | 0 |
| 1 | 14020781 | 156 | 1 |


In [5]:
recommendations_df.head(2)

| | app_id | helpful | funny | date | is_recommended | hours | user_id | review_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 975370 | 0 | 0 | 2022-12-12 | True | 36.3 | 51580 | 0 |
| 1 | 304390 | 4 | 0 | 2017-02-17 | False | 11.5 | 2586 | 1 |


### Data Preprocessing

In [6]:
games_df.isnull().sum()

app_id            0
title             0
date_release      0
win               0
mac               0
linux             0
rating            0
positive_ratio    0
user_reviews      0
price_final       0
price_original    0
discount          0
steam_deck        0
dtype: int64

In [7]:
users_df.isnull().sum()

user_id     0
products    0
reviews     0
dtype: int64

In [8]:
recommendations_df.isnull().sum()

app_id            0
helpful           0
funny             0
date              0
is_recommended    0
hours             0
user_id           0
review_id         0
dtype: int64
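The three `isnull().sum()` calls above follow one pattern, so a small helper can report every table in a single view. A minimal sketch; `missing_summary` is a hypothetical helper, not from the original notebook, and the toy frames contain one deliberately missing value.

```python
import numpy as np
import pandas as pd

def missing_summary(frames):
    """Stack per-column null counts of several DataFrames into one Series
    indexed by (table, column)."""
    return pd.concat(
        {name: df.isnull().sum() for name, df in frames.items()},
        names=["table", "column"],
    ).rename("n_missing")

# Toy frames with one deliberately missing value.
toy_games = pd.DataFrame({"app_id": [1, 2], "title": ["A", np.nan]})
toy_users = pd.DataFrame({"user_id": [7, 8], "reviews": [0, 1]})
summary = missing_summary({"games": toy_games, "users": toy_users})
print(summary)
print("any missing:", bool(summary.any()))
```

For the notebook's own frames the call would be `missing_summary({'games': games_df, 'users': users_df, 'recommendations': recommendations_df})`; given the outputs above, every count would be 0.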

### EDA on Games Dataset

In [9]:
games_df.columns

Index(['app_id', 'title', 'date_release', 'win', 'mac', 'linux', 'rating',
       'positive_ratio', 'user_reviews', 'price_final', 'price_original',
       'discount', 'steam_deck'],
      dtype='object')