# Recommendation Systems using Steam Data - Preprocessing<a id='top'></a>

Steam is the ultimate destination for playing, discussing, and creating games. As a result, many developers and players use the platform to share their games and experiences. To improve each user's experience, we decided to build recommendation systems that enhance game recommendations, predict a game's success, optimize game ratings, evaluate the impact of each update, and provide better insights to developers and to the company itself.


Our work is divided into **XXXXXXXXXXXXXXXx** notebooks. The first two notebooks extract the data needed to train and evaluate our recommendation systems. Next, we explore and prepare the data to train our models. The final step is to create the recommendation systems, which are then evaluated.

In this first notebook, we start by extracting the data using the Steam Web API, as documented at https://partner.steamgames.com/doc/webapi_overview.

The structure of this notebook is as follows:

[0. Import Libraries](#libraries) <br>
[1. Select Games](#select) <br>
&emsp; [1.1. Create Games List](#create_list) <br>
&emsp; [1.2. Filtered Games](#filtered) <br>
[2. Data Exploration](#explore) <br>
&emsp; [2.1. Current List of Games](#current) <br>
&emsp; [2.2. Find by Term](#term) <br>
&emsp; [2.3. Find by App ID](#app_id) <br>
[3. Extract Games Data](#extract) <br>
&emsp; [3.1. Select Games](#select_data) <br>
&emsp; [3.2. Extract Data](#final_extract) <br>


# 0. Import Libraries<a id='libraries'></a>
[to the top](#top)  

The first step is to import the necessary libraries.

In [2]:
import polars as pl
from utils import *

# 1. Select Games<a id='select'></a>
[to the top](#top)  

Before extracting any game data, our first step is to select the games whose data we will use.

## 1.1. Create Games List<a id='create_list'></a>
[to the top](#top)  

To select the games, we build a list from a JSON file created from the information returned by http://api.steampowered.com/ISteamApps/GetAppList/v2/.

In [2]:
games_list = get_steam_gamesIDs()

Below, we can see that the link above provides app IDs of 198607 games.

In [3]:
print(len(games_list))

198607
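
The body of `get_steam_gamesIDs` lives in `utils` and is not shown in this notebook. As a rough sketch only, the step could look like the code below. It assumes the `GetAppList/v2` payload nests its entries under `applist` and then `apps`, and the names `parse_app_list` and `fetch_app_list` are hypothetical, not the helpers we actually used.

```python
import json
from urllib.request import urlopen

APP_LIST_URL = "http://api.steampowered.com/ISteamApps/GetAppList/v2/"


def parse_app_list(payload: dict) -> list[dict]:
    """Return the {'appid': ..., 'name': ...} entries of a GetAppList payload."""
    # Assumption: entries are nested under "applist" -> "apps".
    return payload.get("applist", {}).get("apps", [])


def fetch_app_list(url: str = APP_LIST_URL, timeout: float = 30.0) -> list[dict]:
    """Download the app list and return its entries (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_app_list(json.load(response))


# Offline example using a hand-written payload with the assumed shape.
sample_payload = {"applist": {"apps": [{"appid": 10, "name": "Counter-Strike"}]}}
sample_games = parse_app_list(sample_payload)
```

Keeping the parsing separate from the download makes it possible to work from a saved JSON file, which is how the list in this notebook was created.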


## 1.2. Filtered Games<a id='filtered'></a>
[to the top](#top)  

Because extracting data for 198607 games would require substantial computational power or a much longer time span, we decided to filter out the games whose names contain specific words or characters that are uncommon in most European languages.

In [4]:
filtered_games = clean_games_list(games_list)

Below, we can see that the size of the dataset was reduced to 152312 games.

In [5]:
print(len(filtered_games))

152312
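
The exact rules inside `clean_games_list` are defined in `utils` and are not reproduced here. The sketch below only illustrates the kind of filter described above, under two loud assumptions: the blocked words are placeholders we made up for the example, and "characters common in European languages" is approximated as "letters from the Latin script". The function name `clean_games_list_sketch` is hypothetical.

```python
import re
import unicodedata

# Hypothetical blocklist; the real words used by clean_games_list are not shown here.
BLOCKED_WORDS = {"demo", "soundtrack", "dlc"}


def uses_latin_script(name: str) -> bool:
    """True if every alphabetic character in the name is a Latin-script letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in name
        if ch.isalpha()
    )


def clean_games_list_sketch(games: list[dict]) -> list[dict]:
    """Keep entries with a non-empty, Latin-script name and no blocked word."""
    kept = []
    for game in games:
        name = game.get("name", "").strip()
        if not name:
            continue  # drop entries without a usable name
        # Compare whole words so that "Demon" is not caught by "demo".
        words = set(re.findall(r"[a-z0-9]+", name.lower()))
        if words & BLOCKED_WORDS:
            continue
        if not uses_latin_script(name):
            continue
        kept.append(game)
    return kept


# Made-up entries for illustration; the app IDs are not real Steam IDs.
sample = [
    {"appid": 1, "name": "Lone King"},
    {"appid": 2, "name": "Lone King Demo"},
    {"appid": 3, "name": "\u6e38\u620f"},  # two CJK characters
    {"appid": 4, "name": ""},
    {"appid": 5, "name": "Caf\u00e9 Noir"},  # accented Latin letter is kept
]
kept_ids = [game["appid"] for game in clean_games_list_sketch(sample)]
```

Matching on whole words rather than substrings is a design choice of this sketch: it avoids discarding a title simply because a blocked word appears inside a longer word.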


# 2. Data Exploration<a id='explore'></a>
[to the top](#top)  

To learn a little more about the data we are going to extract, we carried out a short initial data exploration.

## 2.1. Current List of Games<a id='current'></a>
[to the top](#top)  

Below, we can see the last ten entries of the list we created, each of which contains a game's app ID and name.

In [6]:
filtered_games[-10:]

[{'appid': 1978590, 'name': 'Anger Foot'},
 {'appid': 2892660, 'name': "DON'T EXIST"},
 {'appid': 1534840, 'name': 'Hyper Light Breaker'},
 {'appid': 1243850, 'name': 'This Bed We Made'},
 {'appid': 701160, 'name': 'Kingdom Two Crowns'},
 {'appid': 564230, 'name': 'Fire Pro Wrestling World'},
 {'appid': 2307350, 'name': 'Into the Radius 2'},
 {'appid': 1494560, 'name': 'Rogue Voltage'},
 {'appid': 981750, 'name': 'Crystar'},
 {'appid': 1113010, 'name': 'Chornobyl Liquidators'}]

## 2.2. Find by Term<a id='term'></a>
[to the top](#top)  

Next, we created a function that lets us search for any term in a game's name.

In [7]:
search = find_games_by_term(filtered_games, " ")
search[:5]

[{'appid': 1344950, 'name': 'Lone King'},
 {'appid': 1344970, 'name': 'Castle Heck'},
 {'appid': 1344980, 'name': 'GUNPIG: Firepower For Hire'},
 {'appid': 1344990, 'name': "Zefyr: A Thief's Melody"},
 {'appid': 1345030, 'name': 'Bomber Games'}]
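
For readers without access to `utils`, a minimal sketch of such a search is given below. It assumes a case-insensitive substring match on the `name` field; whether `find_games_by_term` is case-sensitive is not stated in this notebook, and the name `find_games_by_term_sketch` is hypothetical.

```python
def find_games_by_term_sketch(games: list[dict], term: str) -> list[dict]:
    """Return the entries whose name contains the term, ignoring case."""
    needle = term.casefold()
    return [game for game in games if needle in game.get("name", "").casefold()]


# A few entries taken from the outputs shown in this notebook.
sample = [
    {"appid": 1344950, "name": "Lone King"},
    {"appid": 1344970, "name": "Castle Heck"},
    {"appid": 981750, "name": "Crystar"},
]

king_matches = find_games_by_term_sketch(sample, "KING")
space_matches = find_games_by_term_sketch(sample, " ")
```

Under this assumption, searching for a single space returns every game whose name has more than one word, which is consistent with the output of the cell above.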

## 2.3. Find by App ID<a id='app_id'></a>
[to the top](#top)  

Next, we created another function that finds a game's name from its app ID.

In [8]:
appid = 10
find_game_name(filtered_games, appid)

'Counter-Strike'
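
A lookup like this can be sketched as a simple scan over the list. The version below is an illustration only: returning `None` for an unknown app ID is our assumption, and both `find_game_name_sketch` and `build_name_index` are hypothetical names rather than the functions in `utils`.

```python
from typing import Optional


def find_game_name_sketch(games: list[dict], appid: int) -> Optional[str]:
    """Return the name of the first entry with this app ID, or None if absent."""
    for game in games:
        if game.get("appid") == appid:
            return game.get("name")
    return None  # assumption: unknown IDs yield None instead of raising


def build_name_index(games: list[dict]) -> dict[int, str]:
    """Map app ID to name once, so that repeated lookups avoid rescanning the list."""
    return {game["appid"]: game["name"] for game in games}


# Entries taken from the outputs shown in this notebook.
sample = [
    {"appid": 10, "name": "Counter-Strike"},
    {"appid": 981750, "name": "Crystar"},
]

name = find_game_name_sketch(sample, 10)
index = build_name_index(sample)
```

With roughly 150 thousand entries, a single scan is fast enough for occasional lookups, while the dictionary index is the better fit when many app IDs need to be resolved.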