# Org, Jupyter Compatibility

First, activate the environment that has a Jupyter installation. Then
load the `emacs/jupyter` package.

``` commonlisp
(myPython/activate-conda-env "~/.opt/miniconda3/envs/nielit-project")
```

Pandoc can then convert the `org` file into an `ipynb` file.

``` shell
pandoc game_recommender.org -o game_recommender.ipynb
```

# The Report

## The Problem Statement

> Given a list of games previously played by a user, recommend new
> games.

## Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


## The Data Set

The dataset was obtained from
[Kaggle](https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam?select=games.csv).
It comprises four files:

games.csv  
A record of games. Here, we are mainly interested in the title, ratings,
and the final price.

games_metadata.json  
Provides the tags/genres for the games. This can be used to find
similarities between games.

users.csv  
A list of users, the number of games each has bought, and the number of
reviews each has written. The number of games bought can be used to
exclude some users from consideration.

recommendations.csv  
A list of user reviews, each recording whether the user recommends a
given game.
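
As a rough illustration of how tags could be compared, the sketch below uses Jaccard similarity (shared tags divided by all tags). This is only one possible measure, not necessarily the one used later in the report. The function name and the tag sets are made up for the example.

```python
def jaccard_similarity(tags_a: set[str], tags_b: set[str]) -> float:
    """Share of tags two games have in common (0.0 = none, 1.0 = identical)."""
    union = tags_a | tags_b
    if not union:
        # Two games without any tags: treat them as not similar.
        return 0.0
    return len(tags_a & tags_b) / len(union)


# Hypothetical tag sets, invented for illustration only.
game_a_tags = {"Roguelike", "Strategy", "Indie"}
game_b_tags = {"Co-op", "Stealth", "Indie"}

# One shared tag out of five distinct tags.
similarity = jaccard_similarity(game_a_tags, game_b_tags)
```

A score closer to 1 means the two games share most of their tags.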

## Data Cleaning

It is worth noting that these files, `users` and `recommendations` in
particular, are far too large to use in full, and quite a few fields are
of no use to us.
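
One common way to keep memory use down with a large CSV is to read only the columns that are needed and to give them compact types. The sketch below shows the idea on a tiny in-memory sample. The column names (`user_id`, `products`, `reviews`) and the values are assumptions made for the example, not taken from the actual files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV; column names and values are assumed.
sample_csv = io.StringIO(
    "user_id,products,reviews\n"
    "1,156,1\n"
    "2,329,4\n"
    "3,12,2\n"
)

# Read only the columns we need, using 32-bit integers instead of the default 64-bit.
users_sample = pd.read_csv(
    sample_csv,
    usecols=["user_id", "products"],
    dtype={"user_id": "int32", "products": "int32"},
)
```

With a real file, the `io.StringIO` object would be replaced by the file path.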

### Games

In [2]:
games = pd.read_csv("games.csv", index_col="app_id")
games.head()


                                    title date_release   win    mac  linux  \
app_id                                                                       
13500   Prince of Persia: Warrior Within™   2008-11-21  True  False  False   
22364             BRINK: Agents of Change   2011-08-03  True  False  False   
113020       Monaco: What's Yours Is Mine   2013-04-24  True   True   True   
226560                 Escape Dead Island   2014-11-18  True  False  False   
249050            Dungeon of the ENDLESS™   2014-10-27  True   True  False   

               rating  positive_ratio  user_reviews  price_final  \
app_id                                                             
13500   Very Positive              84          2199         9.99   
22364        Positive              85            21         2.99   
113020  Very Positive              92          3722        14.99   
226560          Mixed              61           873        14.99   
249050  Very Positive              88        

We are not concerned with the release dates, discounts or the original
prices, only the final price of the game; we can safely drop the
corresponding fields.

In [None]:
games = games.drop(["date_release", "price_original", "discount"], axis=1)


It's probably a lot more efficient to target a single platform, so
let's choose the one with the largest number of compatible games.

In [None]:
def plot_games_in_platform():
    platforms: list[str] = ["win", "steam_deck", "mac", "linux"]
    games_in_platform = games[platforms].sum()
    plt.bar(games_in_platform.keys(), games_in_platform, color=sns.color_palette("pastel"))
    plt.title("Number of games compatible with each platform")
    plt.xlabel("Platform")
    plt.ylabel("Compatible games")
    plt.show()

plot_games_in_platform()


We note that the Steam Deck has the best compatibility, so, for
simplicity's sake, we assume that every user owns one and drop the
columns for the other three platforms.

In [None]:
games = games.drop(["win", "linux", "mac"], axis=1)


We also drop the games that are not compatible with the Steam Deck.

In [None]:
# Keep only the rows where `steam_deck` is True.
games = games[games["steam_deck"]]


This leaves us with,

In [None]:
remaining_game_entries = games.shape[0]
f"{remaining_game_entries} Entries"


This is still very large, so we keep the first 70% of the entries and discard the remaining 30%.

In [None]:
games = games.head(int(remaining_game_entries * 70 / 100))
f"{games.shape[0]} Entries"
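
Because `head` keeps the first rows, the result depends on the order of the file. If an unbiased subset were preferred, a random sample would be one alternative. The sketch below shows this on a small made-up table; `games_demo` and the seed value are placeholders, and this is not what the report itself does.

```python
import pandas as pd

# Hypothetical miniature table standing in for `games`.
games_demo = pd.DataFrame(
    {"title": [f"Game {i}" for i in range(10)]},
    index=pd.Index(range(100, 110), name="app_id"),
)

# Keep a random 70% of the rows; the fixed seed makes the sample reproducible.
games_sampled = games_demo.sample(frac=0.7, random_state=42)
```

The sampled rows come back in shuffled order; `games_sampled.sort_index()` would restore the ordering by `app_id`.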


Finally, we write this file to disk for future use.

In [None]:
games.to_csv("games_processed.csv")
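
Since `app_id` is the index, `to_csv` writes it as the first column, so the file can be read back with the same `index_col` argument used earlier. The sketch below checks the round trip on a made-up two-row table in a temporary directory. The titles, prices, and file name are placeholders.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row table standing in for the processed `games` frame.
games_demo = pd.DataFrame(
    {"title": ["Game A", "Game B"], "price_final": [9.99, 14.99]},
    index=pd.Index([13500, 113020], name="app_id"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "games_demo.csv"
    # The index is written as the `app_id` column.
    games_demo.to_csv(csv_path)
    # Restore `app_id` as the index when reading the file back.
    games_reloaded = pd.read_csv(csv_path, index_col="app_id")
```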
