# FAST AI JOURNEY: PART 1. LESSON 4. 
## 20 YEARS OF GAMES PROJECT. COLLABORATIVE FILTERING AND TABULAR MODELS.

In this project, we analyze the '20 Years of Games' dataset, available on Kaggle, using what we have learned about collaborative filtering and tabular data.

Fastai course notebooks typically start with three lines, `%reload_ext autoreload`, `%autoreload 2` and `%matplotlib inline`, which ensure that any edits you make to libraries are reloaded automatically and that any charts or images are displayed in the notebook.

# Tabular Models.

In [1]:
from fastai import *
from fastai.tabular import *

## Getting the Data.

The 20 Years of Games dataset isn't available on the [fastai dataset page](https://course.fast.ai/datasets) due to copyright restrictions, but you can download it from Kaggle. Let's see how to do this with the [Kaggle API](https://github.com/Kaggle/kaggle-api); it will be useful if you later want to join a competition or use other Kaggle datasets.

First, install the Kaggle API by uncommenting the following line and executing it, or by running it in your terminal. Depending on your platform, you may need to modify it slightly, either by adding `source activate fastai` (or similar) or by prefixing `pip` with a path; have a look at how `conda install` is called for your platform in the appropriate *Returning to work* section of https://course-v3.fast.ai/. Depending on your environment, you may also need to append `--user` to the command.

In [2]:
#! pip install kaggle --upgrade

Then you need to upload your Kaggle credentials to your instance. Log in to Kaggle, click on your profile picture in the top left corner, then on 'My account'. Scroll down until you find a button named 'Create New API Token' and click on it. This will trigger the download of a file named 'kaggle.json'.

Upload this file to the directory this notebook is running in, by clicking "Upload" on your main Jupyter page, then uncomment and execute the next two commands (or run them in a terminal).

In [3]:
#! mkdir -p ~/.kaggle/
#! mv kaggle.json ~/.kaggle/

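Restrict the permissions of `kaggle.json` so that only you can read it (the Kaggle CLI warns if other users can); then you're all set to download the data from [20 Years of Games](https://www.kaggle.com/egrinstein/20-years-of-games/version/2).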

In [4]:
#! chmod 600 ~/.kaggle/kaggle.json
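If you prefer to stay in Python, the same credential setup (create `~/.kaggle`, move `kaggle.json` into it, restrict it to owner read/write) can be sketched with the standard library. The helper `install_kaggle_token` is hypothetical, and the sketch assumes a POSIX system where `chmod`-style permissions apply.

```python
import os
import shutil
import stat
from pathlib import Path
from typing import Optional


def install_kaggle_token(src: Path, home: Optional[Path] = None) -> Path:
    """Hypothetical helper: move kaggle.json into <home>/.kaggle and make it owner-only (600)."""
    home = Path.home() if home is None else home
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/.kaggle/`
    dest = kaggle_dir / "kaggle.json"
    shutil.move(str(src), str(dest))                # like `mv kaggle.json ~/.kaggle/`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)     # like `chmod 600`
    return dest
```

For example, `install_kaggle_token(Path('kaggle.json'))` after uploading the file to the notebook's directory.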

In [5]:
path = Path('data/ign')
path.mkdir(parents=True, exist_ok=True)
path

PosixPath('data/ign')

In [6]:
#! kaggle datasets download -d egrinstein/20-years-of-games -f ign.csv -p {path}
#! unzip -q -n {path}/ign.csv.zip -d {path}
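The `unzip -q -n` call extracts the archive quietly and never overwrites files that already exist, so the cell is safe to re-run. A standard-library sketch of the same behaviour, assuming the download is a regular zip file; the helper `extract_if_missing` is hypothetical:

```python
import zipfile
from pathlib import Path
from typing import List


def extract_if_missing(zip_path: Path, dest: Path) -> List[Path]:
    """Extract members of zip_path into dest, skipping files that already exist
    (mirrors `unzip -n`). Returns the paths that were newly written."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            target = dest / name
            if target.exists():
                continue  # never overwrite, like `unzip -n`
            zf.extract(name, dest)
            written.append(target)
    return written
```

Called as `extract_if_missing(path/'ign.csv.zip', path)`, a second run returns an empty list and leaves `ign.csv` untouched.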

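Tabular data should be in a pandas `DataFrame`. After loading the CSV, we add the target column `one_hot_score`: despite its name, it is a single binary label, 1 for games scored 7 or higher and 0 otherwise.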

In [7]:
df = pd.read_csv(path/'ign.csv')
# Binary target: 1 if the game scored 7 or more, 0 otherwise
df['one_hot_score'] = df['score'].map(lambda x: 0 if x < 7 else 1)

# Keep only the columns we use, in a readable order
cols = ['score_phrase', 'title', 'url', 'platform', 'score', 'one_hot_score', 'genre',
        'editors_choice', 'release_year', 'release_month', 'release_day']
df = df[cols]

df.head()

| | score_phrase | title | url | platform | score | one_hot_score | genre | editors_choice | release_year | release_month | release_day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amazing | LittleBigPlanet PS Vita | /games/littlebigplanet-vita/vita-98907 | PlayStation Vita | 9.0 | 1 | Platformer | Y | 2012 | 9 | 12 |
| 1 | Amazing | LittleBigPlanet PS Vita -- Marvel Super Hero E... | /games/littlebigplanet-ps-vita-marvel-super-he... | PlayStation Vita | 9.0 | 1 | Platformer | Y | 2012 | 9 | 12 |
| 2 | Great | Splice: Tree of Life | /games/splice/ipad-141070 | iPad | 8.5 | 1 | Puzzle | N | 2012 | 9 | 12 |
| 3 | Great | NHL 13 | /games/nhl-13/xbox-360-128182 | Xbox 360 | 8.5 | 1 | Sports | N | 2012 | 9 | 11 |
| 4 | Great | NHL 13 | /games/nhl-13/ps3-128181 | PlayStation 3 | 8.5 | 1 | Sports | N | 2012 | 9 | 11 |
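The `map(lambda ...)` call turns the IGN score (out of 10) into a binary label. A minimal pandas sketch on a few made-up scores (not rows from the dataset) shows a vectorized equivalent, `(score >= 7).astype(int)`, which avoids a Python-level loop. It assumes `score` has no missing values: the lambda maps `NaN` to 1, whereas the comparison maps it to 0.

```python
import pandas as pd

# Made-up scores for illustration, including the boundary value 7.0
scores = pd.Series([9.0, 8.5, 7.0, 6.9, 3.0], name="score")

# Same rule as the notebook: below 7 -> 0, otherwise 1
label_map = scores.map(lambda x: 0 if x < 7 else 1)

# Vectorized equivalent
label_vec = (scores >= 7).astype(int)

print(label_vec.tolist())  # [1, 1, 1, 0, 0]
print(label_vec.value_counts(normalize=True))  # share of each class
```

On the real data, `df['one_hot_score'].value_counts(normalize=True)` gives the class balance, a useful baseline for the accuracy reported after training.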


In [8]:
dep_var = 'one_hot_score'

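# Leave out 'score' and 'score_phrase' (both encode the target) as well as
# 'editors_choice' (closely tied to the score) and 'url'
cat_names = ['title', 'platform', 'genre',
             'release_year', 'release_month', 'release_day']

# There are no continuous columns, so FillMissing and Normalize have nothing to do here
procs = [FillMissing, Categorify, Normalize]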
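`Categorify` replaces each categorical column with integer codes learned from the training rows. The rough pandas sketch below illustrates the idea; it is a hypothetical function, not fastai's implementation, and it assumes (as the embedding sizes in the model printout suggest) that index 0 is reserved for a `#na#` placeholder covering missing or unseen values.

```python
import pandas as pd


def categorify(train: pd.Series, other: pd.Series):
    """Hypothetical sketch of the idea behind Categorify, not fastai's code.
    Categories are learned on `train` only; code 0 means missing or unseen."""
    cats = pd.Categorical(train).categories  # sorted unique non-null training values

    def to_codes(s: pd.Series):
        # pandas uses -1 for NaN and for values outside `cats`; shift so that becomes 0
        return pd.Categorical(s, categories=cats).codes.astype("int64") + 1

    classes = ["#na#"] + list(cats)
    return classes, to_codes(train), to_codes(other)


# Made-up platform values: training has a missing entry, validation has an unseen one
train = pd.Series(["PC", "iPad", "PC", None, "Xbox 360"])
valid = pd.Series(["PC", "Wii", "iPad"])
classes, train_codes, valid_codes = categorify(train, valid)
print(classes)      # ['#na#', 'PC', 'Xbox 360', 'iPad']
print(train_codes)  # [1 3 1 0 2]
print(valid_codes)  # [1 0 3]
```

`len(classes)` (4 here) plays the role of `n` in each `Embedding(n, ...)` line of the model printout.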

In [9]:
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names)

In [10]:
data = (TabularList.from_df(df, path=path, cat_names=cat_names, procs=procs)
                           .split_by_idx(list(range(800,1000)))
                           .label_from_df(cols=dep_var)
                           .add_test(test, label=0)
                           .databunch())
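`split_by_idx(list(range(800, 1000)))` sends the rows at positions 800 to 999 to the validation set and keeps the rest for training; the test list uses the same 200 rows via `df.iloc[800:1000]`. A plain pandas sketch of a positional split on a made-up frame (the helper `split_by_positions` is hypothetical):

```python
import numpy as np
import pandas as pd


def split_by_positions(df: pd.DataFrame, valid_idx):
    """Hypothetical helper: rows at the given positions go to validation, the rest to training."""
    mask = np.zeros(len(df), dtype=bool)
    mask[np.asarray(valid_idx)] = True
    return df.iloc[~mask], df.iloc[mask]


# Made-up frame with 1,000 rows
toy = pd.DataFrame({"x": np.arange(1000)})
train_df, valid_df = split_by_positions(toy, list(range(800, 1000)))
print(len(train_df), len(valid_df))  # 800 200
```

Because the block is contiguous, it is not a random sample of the data; passing something like `np.random.default_rng(42).choice(len(df), 200, replace=False)` as the indices would give a random one.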

In [11]:
data.show_batch(rows=10)

| title | platform | genre | release_year | release_month | release_day | target |
|---|---|---|---|---|---|---|
| Battlefield: Bad Company 2 | PC | Shooter | 2010 | 3 | 2 | 1 |
| Tetris Worlds | PC | Puzzle | 2002 | 1 | 9 | 0 |
| WWE SmackDown vs. Raw 2008 | PlayStation Portable | Wrestling | 2007 | 11 | 1 | 0 |
| Mortal Kombat: Shaolin Monks | PlayStation 2 | Fighting, Action | 2005 | 9 | 16 | 1 |
| Moon Diver | PlayStation 3 | Action | 2011 | 4 | 4 | 1 |
| Magic: The Gathering -- Duels of the Planeswalkers 2013 | iPad | Card, Battle | 2012 | 6 | 25 | 1 |
| Chop Chop Runner | iPhone | Action | 2010 | 4 | 7 | 0 |
| Watchmen: The End is Nigh -- Part 2 | PC | Action | 2009 | 8 | 26 | 0 |
| Mario Bros.-e | Game Boy Advance | Platformer | 2002 | 11 | 15 | 0 |
| Serious Sam: The Second Encounter | PC | Shooter | 2002 | 2 | 6 | 1 |


In [12]:
learn = tabular_learner(data, layers=[200,100], metrics=accuracy)

In [13]:
learn.model

TabularModel(
  (embeds): ModuleList(
    (0): Embedding(12442, 50)
    (1): Embedding(60, 31)
    (2): Embedding(113, 50)
    (3): Embedding(23, 12)
    (4): Embedding(13, 7)
    (5): Embedding(32, 17)
  )
  (emb_drop): Dropout(p=0.0)
  (bn_cont): BatchNorm1d(0, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=167, out_features=200, bias=True)
    (1): ReLU(inplace)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Linear(in_features=200, out_features=100, bias=True)
    (4): ReLU(inplace)
    (5): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): Linear(in_features=100, out_features=2, bias=True)
  )
)
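The embedding widths in the printout are consistent with the rule `min(50, n // 2 + 1)`, where `n` is the number of categories in a column, presumably including the extra `#na#` slot (13 for `release_month`: 12 months plus `#na#`). This rule is inferred from the numbers, and later fastai releases switched to a different one. Since there are no continuous columns, the first layer's `in_features=167` should simply be the sum of the widths. A quick check:

```python
# Column cardinalities taken from the Embedding(n, ...) lines printed above
cardinalities = {
    "title": 12442,
    "platform": 60,
    "genre": 113,
    "release_year": 23,
    "release_month": 13,
    "release_day": 32,
}


def emb_size(n_cat: int, max_size: int = 50) -> int:
    """Rule that reproduces the widths above (assumed, inferred from the printout)."""
    return min(max_size, n_cat // 2 + 1)


sizes = {name: emb_size(n) for name, n in cardinalities.items()}
print(sizes)

# The embeddings are concatenated, so the first Linear layer sees their total width
n_cont = 0  # no continuous columns in this model
print(sum(sizes.values()) + n_cont)  # 167
```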

In [14]:
learn.fit(1, 1e-2)

Total time: 00:02
epoch  train_loss  valid_loss  accuracy
1      0.530561    0.692435    0.600000  (00:02)
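`accuracy` is the fraction of rows whose highest-scoring class matches the target, and the losses are cross-entropy, which we assume is the default for a classification target with this two-output final layer. As a reference point, a model that always predicts 50/50 has a cross-entropy of ln 2 ≈ 0.693, very close to the validation loss after one epoch. A NumPy sketch of both quantities on made-up logits:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-probability of the true class."""
    probs = softmax(logits)
    return float(-np.log(probs[np.arange(len(targets)), targets]).mean())


def accuracy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Fraction of rows where the argmax class equals the target."""
    return float((logits.argmax(axis=1) == targets).mean())


# Made-up logits for 4 rows and 2 classes, with their true class indices
logits = np.array([[2.0, 0.5], [0.1, 1.2], [1.0, 1.0], [0.3, -0.4]])
targets = np.array([0, 1, 1, 1])
print(accuracy(logits, targets))

# Equal logits mean a 50/50 prediction for every row
uniform = np.zeros((4, 2))
print(cross_entropy(uniform, targets))  # ln 2 ≈ 0.6931
```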



## Inference.

In [15]:
row = df.iloc[1]
row

score_phrase                                                Amazing
title             LittleBigPlanet PS Vita -- Marvel Super Hero E...
url               /games/littlebigplanet-ps-vita-marvel-super-he...
platform                                           PlayStation Vita
score                                                             9
one_hot_score                                                     1
genre                                                    Platformer
editors_choice                                                    Y
release_year                                                   2012
release_month                                                     9
release_day                                                      12
Name: 1, dtype: object

In [16]:
learn.predict(row)

(1, tensor(0), tensor([0.9383, 0.0617]))
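In fastai v1, `learn.predict` returns the predicted label, its index into the list of classes, and the probability of each class. Read together, `1` and `tensor(0)` imply that index 0 corresponds to label `1` here, so the class list presumably starts with 1; `data.classes` will confirm it. A NumPy sketch of the decoding step, where `classes = [1, 0]` is that assumption rather than a value printed by the notebook and `decode_prediction` is a hypothetical helper:

```python
import numpy as np

# Assumption: class order as it would appear in data.classes (check in your notebook)
classes = [1, 0]
# Probabilities copied from the learn.predict output above
probs = np.array([0.9383, 0.0617])


def decode_prediction(probs: np.ndarray, classes: list):
    """Return (label, index, probability of that label) for one row of class probabilities."""
    idx = int(probs.argmax())
    return classes[idx], idx, float(probs[idx])


label, idx, p = decode_prediction(probs, classes)
print(label, idx, p)  # 1 0 0.9383
```

Under that assumption, the model predicts that this game scores 7 or more with a probability of about 0.94, matching its actual `one_hot_score` of 1.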