In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ast import literal_eval

import warnings
warnings.filterwarnings('ignore')

In [2]:
rawg_data = pd.read_csv('01_rawg_clean.csv', parse_dates = ['released'])

rawg_data.head()

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,"['PC', 'Xbox Series S/X', 'PlayStation 5', 'Pl...","['Action', 'Adventure']","['Epic Games', 'PlayStation Store', 'Xbox Stor...","['Singleplayer', 'Steam Achievements', 'Multip...",Mature
1,portal-2,Portal 2,2011-04-18,95.0,582,"['Xbox One', 'PlayStation 3', 'PC', 'Xbox 360'...","['Shooter', 'Puzzle']","['Xbox Store', 'Xbox 360 Store', 'PlayStation ...","['Singleplayer', 'Steam Achievements', 'Multip...",Everyone 10+
2,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,2015-05-18,92.0,678,"['PC', 'Xbox One', 'Nintendo Switch', 'PlaySta...","['Action', 'Adventure', 'RPG']","['GOG', 'Xbox Store', 'Steam', 'PlayStation St...","['Singleplayer', 'Atmospheric', 'Full controll...",Mature
3,tomb-raider,Tomb Raider (2013),2013-03-05,86.0,664,"['PC', 'PlayStation 4', 'PlayStation 3', 'Xbox...","['Action', 'Adventure']","['App Store', 'Google Play', 'PlayStation Stor...","['Singleplayer', 'Multiplayer', 'Atmospheric',...",Mature
4,the-elder-scrolls-v-skyrim,The Elder Scrolls V: Skyrim,2011-11-11,94.0,621,"['PC', 'PlayStation 3', 'Xbox 360', 'Nintendo ...","['Action', 'RPG']","['Xbox 360 Store', 'Nintendo Store', 'Steam', ...","['Singleplayer', 'Steam Achievements', 'steam-...",Mature


# Unpacking columns

We want to unpack the values inside the list-like columns, especially `platforms` (the rest can wait, since we can unpack them directly during feature engineering).

The idea is to copy the row once for every platform listed in the column, so that it looks like the following:

| slug | name | released | metacritic | suggestions_count | platforms | (...) |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| grand-theft-auto-v | Grand Theft Auto V | 2013-09-17 | 97.0 | 416 | PC | (...) |
| grand-theft-auto-v | Grand Theft Auto V | 2013-09-17 | 97.0 | 416 | Xbox Series S/X | (...) |
| grand-theft-auto-v | Grand Theft Auto V | 2013-09-17 | 97.0 | 416 | PlayStation 5 | (...) |


## But first...

CSV files store lists as strings, so we have to convert them back to be able to access the data inside them. To do so, we will apply the `literal_eval` function from the `ast` module.

After the first cleaning, some values in the list-like columns are not lists: single-element lists were saved as plain strings. We will not apply `literal_eval` to them, as doing so raises an error.
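
As a small illustration of why the plain strings have to be skipped, the sketch below parses one stringified list and one bare platform name. The two sample strings are made up for the example; only `literal_eval` comes from the notebook.

```python
from ast import literal_eval

# A stringified list parses back into a real Python list.
parsed = literal_eval("['PC', 'PlayStation 4']")

# A bare platform name is not a valid Python literal, so parsing fails.
try:
    literal_eval("PlayStation 4")
    bare_failed = False
except (ValueError, SyntaxError):
    bare_failed = True

print(parsed, bare_failed)
```

This is why the cell below only calls `literal_eval` on values that contain a `[`.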

In [3]:
def to_list(x):
    # Only strings that look like a list are parsed; NaN and plain strings are left untouched.
    return literal_eval(x) if pd.notnull(x) and ('[' in x) else x

for col in ['platforms', 'genres', 'stores', 'tags']:
    rawg_data[col] = rawg_data[col].apply(to_list)

---

## NaN values

Now that lists are lists again, let's look at the `NaN` values in this dataset to study them.

In [4]:
rawg_data.isna().sum()

slug                      0
name                      0
released              24608
metacritic           516685
suggestions_count         0
platforms              4077
genres               116156
stores                30070
tags                  62068
esrb_rating          463781
dtype: int64

---
### Platforms' NaNs


At first glance, we may think that it would be a good idea to remove the rows with `NaN` in the `platforms` column, as a game without a platform cannot be merged into the VGC dataset.

Let's see what kind of games have `NaN` in the `platforms` column.

In [5]:
rawg_data[rawg_data['platforms'].isna()]

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
5211,half-life-2-downfall,Half-Life 2: Downfall,2017-04-17,,311,,"[Action, Shooter]",Steam,"[Mod, destroy]",
6967,minerva-metastasis,MINERVA: Metastasis,2005-09-02,,216,,Shooter,Steam,Mod,
15281,witchfire,Witchfire,NaT,,185,,Shooter,,,
19904,spacewar-2,Spacewar!,1962-01-01,,37,,,,"[Space, combat]",
21229,urban-legends,Urban Legends,2015-03-02,,201,,Puzzle,itch.io,"[Space, puzzles, gun, Gravity]",
...,...,...,...,...,...,...,...,...,...,...
510641,le-yuan-sheng-huo-hitsuzicun,楽園生活ひつじ村,2014-01-01,,386,,,,Free to Play,
510649,circle-of-mana,Circle of Mana,2013-03-05,,213,,,,"[battle, sword, balance, tree]",
510769,antraxx,Antraxx,NaT,,359,,Shooter,,"[Character Customization, Isometric, Mechs]",
510771,the-pyramid-gate,The Pyramid Gate,2014-05-06,,42,,Adventure,,"[Exploration, Pixel Graphics, Psychedelic, Abs...",


What we can see here is that when a game has `NaN` in the `platforms` column, some other list-like columns can also be `NaN`.

However, if we look at, for example, the `stores` column in more detail...

In [6]:
rawg_data[rawg_data['platforms'].isna()]['stores'].value_counts()

itch.io              3273
PlayStation Store      34
Google Play             7
Steam                   4
Xbox Store              1
Name: stores, dtype: int64

We see that there are some potential problems there:

- A game that can be bought on Steam should have `PC` in the `platforms` column.

- The same applies to `PlayStation Store` and `Xbox Store`. The problem here is that they are DLCs, as we can see below, and they already have their console counterpart in the dataset.

In [7]:
nan_plat = rawg_data[rawg_data['platforms'].isna()]
# Build the mask on nan_plat itself so that its index matches the frame being filtered.
ps_store = nan_plat['stores'] == 'PlayStation Store'

nan_plat[ps_store]

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
54769,werewolf-the-apocalypse-earthblood-ps4-and-ps5,Werewolf: The Apocalypse – Earthblood PS4 and PS5,2021-02-04,,408,,,PlayStation Store,"[War, Blood, Destruction, combat, console, pla...",
55152,wrc-9-fia-world-rally-championship-ps4-and-ps5,WRC 9 FIA World Rally Championship PS4 and PS5,2020-11-12,,274,,Racing,PlayStation Store,"[Multiplayer, cars, console, japan, offline]",
55154,borderlands-3-ps4-and-ps5,Borderlands 3 PS4 and PS5,2020-11-12,,275,,"[Action, Shooter]",PlayStation Store,"[Multiplayer, online, friends, console, skill,...",
71297,dreaming-sarah-ps4-and-ps5,Dreaming Sarah PS4 and PS5,2021-03-05,,178,,Adventure,PlayStation Store,"[environment, console, brain, girl, collect, w...",
73545,yakuza-like-a-dragon-ps4-and-ps5,Yakuza: Like a Dragon PS4 and PS5,2021-03-02,,366,,,PlayStation Store,"[RPG, Crime, combat, party, city, race, hero, ...",
75403,thunderflash-ps4-and-ps5,Thunderflash PS4 and PS5,2021-02-26,,228,,"[Action, Arcade]",PlayStation Store,"[Multiplayer, Retro, War, combat, console, wav...",
80966,anodyne-2-return-to-dust-ps4-and-ps5,Anodyne 2: Return to Dust PS4 and PS5,2021-02-18,,274,,Adventure,PlayStation Store,"[friends, explore, console, car, art, offline,...",
83968,ultragoodness-2-ps4-and-ps5,UltraGoodness 2 PS4 and PS5,2021-02-09,,155,,"[Action, Arcade]",PlayStation Store,"[Dark, Blood, battle, fun, console, Traps, bra...",
85228,nioh-2-remastered-ps5-upgrade,Nioh 2 Remastered (PS5 Upgrade),2021-02-05,,206,,Action,PlayStation Store,"[Multiplayer, RPG, combat, online, death, Mons...",
91736,atelier-ryza-2-lost-legends-and-the-secret-fai...,Atelier Ryza 2: Lost Legends and the Secret Fa...,2021-01-26,,183,,RPG,PlayStation Store,"[Story, battle, Underwater, island, console, t...",


In [8]:
rawg_data[rawg_data['name'].str.contains('Assassin\'s Creed Valhalla')]

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
954,assassins-creed-valhalla,Assassin's Creed Valhalla,2020-11-10,83.0,449,"[PlayStation 5, Xbox One, PC, Xbox Series S/X,...","[Action, Adventure, RPG]","[PlayStation Store, Xbox Store, Epic Games]","[Fantasy, vikings]",Mature
121688,assassins-creed-valhalla-ultimate-ps4-and-ps5,Assassin's Creed Valhalla Ultimate PS4 and PS5,2020-11-12,,512,,RPG,PlayStation Store,"[Assassin, character, Epic, console, collectio...",


In [9]:
rawg_data[rawg_data['name'].str.contains('Borderlands 3')]

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
495,borderlands-3,Borderlands 3,2019-09-13,83.0,693,"[PC, Xbox Series S/X, PlayStation 5, PlayStati...","[Action, Shooter, Adventure, RPG]","[Steam, PlayStation Store, Xbox Store, Epic Ga...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
22381,borderlands-3-moxxis-heist-of-the-handsome-jac...,Borderlands 3: Moxxi’s Heist of the Handsome J...,2019-12-19,78.0,373,"[Xbox One, PlayStation 4, PC]","[Action, Shooter, Adventure]","[PlayStation Store, Xbox Store, Epic Games]",,Mature
55154,borderlands-3-ps4-and-ps5,Borderlands 3 PS4 and PS5,2020-11-12,,275,,"[Action, Shooter]",PlayStation Store,"[Multiplayer, online, friends, console, skill,...",


---
Let's see if we can gain more insight into the `Steam` entries.

In [10]:
# Build the mask on nan_plat itself so that its index matches the frame being filtered.
steam = nan_plat['stores'] == 'Steam'

nan_plat[steam]

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
5211,half-life-2-downfall,Half-Life 2: Downfall,2017-04-17,,311,,"[Action, Shooter]",Steam,"[Mod, destroy]",
6967,minerva-metastasis,MINERVA: Metastasis,2005-09-02,,216,,Shooter,Steam,Mod,
32216,endless-dungeons,Endless Dungeon,NaT,,326,,Action,Steam,,
54669,zenith-the-last-city,Zenith: The Last City,NaT,,260,,"[Action, Adventure, RPG]",Steam,,


What we see is that two of them are mods (free content) and the other two are games **not released yet**. Thus, we can remove all these rows without losing information. We also do not want mobile-exclusive games, and we will not consider itch.io for the moment: the VGC dataset does not contain data from there, so there is no sales data for those games.

---

## Dropping rows

Let's then proceed with the removal of rows having one or more of the following characteristics:

- `platforms` is `NaN`

- Mobile-only games, i.e. `platforms` contains only `Android` or `iOS`

- `stores` has only `itch.io`
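
The three conditions above can also be expressed as one boolean mask. The sketch below is a variation on a made-up toy frame, not the notebook's own cells: the helper `is_mobile_only` and the `MOBILE` set are hypothetical names, and unlike the `== 'Android'` comparison used below, the helper also catches mobile-only values stored as lists.

```python
import numpy as np
import pandas as pd

MOBILE = {'Android', 'iOS'}  # assumption: the only mobile platform labels

def is_mobile_only(platforms):
    """True when every listed platform is a mobile one (string or list input)."""
    if isinstance(platforms, str):
        platforms = [platforms]
    if not isinstance(platforms, list):
        return False  # NaN is covered by its own condition
    return set(platforms) <= MOBILE

# Toy data mixing lists, plain strings and NaN, like the real columns.
toy = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e'],
    'platforms': [['PC', 'iOS'], 'Android', ['iOS', 'Android'], np.nan, 'PC'],
    'stores': ['Steam', 'Google Play', 'App Store', 'Steam', 'itch.io'],
})

to_drop = (
    toy['platforms'].isna()
    | toy['platforms'].apply(is_mobile_only)
    | (toy['stores'] == 'itch.io')
)
kept = toy[~to_drop].reset_index(drop=True)
print(kept)
```

Note that a multi-platform game such as `['PC', 'iOS']` is kept here, so mobile platforms can still show up once the lists are unpacked.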


In [11]:
plat_nan = list(rawg_data[rawg_data['platforms'].isna()].index)
android = list(rawg_data[rawg_data['platforms']=='Android'].index)
ios = list(rawg_data[rawg_data['platforms']=='iOS'].index)

plat_to_drop = plat_nan + android + ios

rawg_data = rawg_data.drop(index=plat_to_drop)\
                     .reset_index(drop=True)

rawg_data.shape[0]

439480

In [12]:
itchio = list(rawg_data[rawg_data['stores']=='itch.io'].index)

rawg_data = rawg_data.drop(index=itchio)\
                     .reset_index(drop=True)

rawg_data.shape[0]

94591

We have drastically reduced the size of the dataset to almost 20% of the original number of rows.

Let's check how many `NaN` values we still have:

In [13]:
rawg_data.isna().sum()

slug                     0
name                     0
released             13657
metacritic           89673
suggestions_count        0
platforms                0
genres               10253
stores               27811
tags                 19760
esrb_rating          79698
dtype: int64

In [14]:
nan_release = list(rawg_data[rawg_data['released'].isna()].index)

rawg_data = rawg_data.drop(index = nan_release)\
                     .reset_index(drop=True)

rawg_data

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platforms,genres,stores,tags,esrb_rating
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,"[PC, Xbox Series S/X, PlayStation 5, PlayStati...","[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
1,portal-2,Portal 2,2011-04-18,95.0,582,"[Xbox One, PlayStation 3, PC, Xbox 360, Linux,...","[Shooter, Puzzle]","[Xbox Store, Xbox 360 Store, PlayStation Store...","[Singleplayer, Steam Achievements, Multiplayer...",Everyone 10+
2,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,2015-05-18,92.0,678,"[PC, Xbox One, Nintendo Switch, PlayStation 4]","[Action, Adventure, RPG]","[GOG, Xbox Store, Steam, PlayStation Store]","[Singleplayer, Atmospheric, Full controller su...",Mature
3,tomb-raider,Tomb Raider (2013),2013-03-05,86.0,664,"[PC, PlayStation 4, PlayStation 3, Xbox 360, X...","[Action, Adventure]","[App Store, Google Play, PlayStation Store, St...","[Singleplayer, Multiplayer, Atmospheric, Full ...",Mature
4,the-elder-scrolls-v-skyrim,The Elder Scrolls V: Skyrim,2011-11-11,94.0,621,"[PC, PlayStation 3, Xbox 360, Nintendo Switch]","[Action, RPG]","[Xbox 360 Store, Nintendo Store, Steam, PlaySt...","[Singleplayer, Steam Achievements, steam-tradi...",Mature
...,...,...,...,...,...,...,...,...,...,...
80929,crumble-zone,Crumble Zone,2012-11-29,,95,"[Android, iOS]","[Action, Arcade]","[App Store, Google Play]","[Multiplayer, Space, Colorful, achievements, f...",
80930,delta-strike-first-assault,Delta Strike: First Assault,2016-05-31,,356,PS Vita,Action,PlayStation Store,"[combat, online, Tanks, drone]",Everyone 10+
80931,siege-hero-wizards,Siege Hero Wizards,2013-09-05,,193,"[iOS, Android]","[Action, Puzzle]","[App Store, Google Play]","[Physics, Cartoon, hero, Monsters, wizard]",Everyone 10+
80932,velocispider,Velocispider,2011-06-01,,90,"[iOS, Android]","[Action, Arcade, Casual]","[App Store, Google Play]","[Retro, Robots, character, fun, shoot, art, sp...",Teen


---

Now we want to unpack the rows so that each one holds a single platform... but how do we accomplish that?

We could do the following:

- Create an auxiliary DataFrame `rawg_aux`

- Iterate over every row with `for idx, row in rawg_data.iterrows()`

- For every platform, add a copy of the row, holding that single platform, to `rawg_aux`

\* **Ideally**, we would like to do this step with a function applied to the DataFrame. 

However, we have not figured out how to do it yet.
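
One candidate for that vectorised version is `pd.DataFrame.explode` (available since pandas 0.25), which repeats a row once per element of a list-like cell and leaves plain strings as they are. The sketch below runs on a made-up two-row frame and is only an illustration under that assumption; it has not been run against the full dataset here.

```python
import pandas as pd

# Toy frame: one game with a list of platforms, one with a plain string.
toy = pd.DataFrame({
    'slug': ['grand-theft-auto-v', 'delta-strike-first-assault'],
    'platforms': [['PC', 'Xbox Series S/X', 'PlayStation 5'], 'PS Vita'],
})

unpacked = (
    toy.explode('platforms')                      # one row per platform
       .rename(columns={'platforms': 'platform'})
       .reset_index(drop=True)
)
print(unpacked)
```

The other columns are copied unchanged into every new row, which is the same result the loop aims for.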

In [15]:
rawg_aux = pd.DataFrame(columns = rawg_data.columns)

In [16]:
rows = []  # unpacked rows are collected here (DataFrame.append was removed in pandas 2.0)

for idx, row in rawg_data.iterrows():

    plats = row['platforms']

    # A game with a single platform holds a plain string instead of a list.
    if isinstance(plats, str):
        plats = [plats]

    for element in plats:

        aux = row.copy()
        aux['platforms'] = element

        rows.append(aux)

    if idx % 10000 == 0:
        print(idx)


# Build the auxiliary DataFrame in one go, which is much faster than growing it row by row.
rawg_aux = pd.DataFrame(rows)
rawg_aux = rawg_aux.rename(columns = {'platforms': 'platform',
                                      'esrb_rating': 'esrb'})

print('Finished!')

0
10000
20000
30000
40000
50000
60000
70000
80000
Finished!


In [17]:
rawg_data = rawg_aux.reset_index(drop=True)
rawg_data

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platform,genres,stores,tags,esrb
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PC,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
1,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,Xbox Series S/X,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
2,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 5,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
3,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 4,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
4,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 3,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
...,...,...,...,...,...,...,...,...,...,...
140611,siege-hero-wizards,Siege Hero Wizards,2013-09-05,,193,Android,"[Action, Puzzle]","[App Store, Google Play]","[Physics, Cartoon, hero, Monsters, wizard]",Everyone 10+
140612,velocispider,Velocispider,2011-06-01,,90,iOS,"[Action, Arcade, Casual]","[App Store, Google Play]","[Retro, Robots, character, fun, shoot, art, sp...",Teen
140613,velocispider,Velocispider,2011-06-01,,90,Android,"[Action, Arcade, Casual]","[App Store, Google Play]","[Retro, Robots, character, fun, shoot, art, sp...",Teen
140614,kitten-sanctuary,Kitten Sanctuary,2009-03-13,,365,iOS,"[Family, Puzzle]",App Store,"[Cute, Aliens, achievements, Story, fun, cats,...",Everyone 10+


---
# Checking for mobile games after the unpacking

In [18]:
sorted(rawg_data['platform'].unique())

['3DO',
 'Android',
 'Apple II',
 'Atari 2600',
 'Atari 5200',
 'Atari 7800',
 'Atari 8-bit',
 'Atari Flashback',
 'Atari Lynx',
 'Atari ST',
 'Atari XEGS',
 'Classic Macintosh',
 'Commodore / Amiga',
 'Dreamcast',
 'Game Boy',
 'Game Boy Advance',
 'Game Boy Color',
 'Game Gear',
 'GameCube',
 'Genesis',
 'Jaguar',
 'Linux',
 'NES',
 'Neo Geo',
 'Nintendo 3DS',
 'Nintendo 64',
 'Nintendo DS',
 'Nintendo DSi',
 'Nintendo Switch',
 'PC',
 'PS Vita',
 'PSP',
 'PlayStation',
 'PlayStation 2',
 'PlayStation 3',
 'PlayStation 4',
 'PlayStation 5',
 'SEGA 32X',
 'SEGA CD',
 'SEGA Master System',
 'SEGA Saturn',
 'SNES',
 'Web',
 'Wii',
 'Wii U',
 'Xbox',
 'Xbox 360',
 'Xbox One',
 'Xbox Series S/X',
 'iOS',
 'macOS']

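We can see that after unpacking the `platforms` column into `platform`, we get `iOS` and `Android` again. The earlier filter only matched rows whose `platforms` value was the single string `Android` or `iOS`, so games that listed a mobile platform among several others were kept.

We are going to drop the rows containing those values by getting their indexes and using the `pd.DataFrame.drop()` method.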
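
For reference, the same index-based drop can be written with a single `Series.isin` test instead of one comparison per platform. This is a sketch on a made-up frame; the variable names are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    'slug': ['a', 'a', 'b', 'c'],
    'platform': ['PC', 'iOS', 'Android', 'PS Vita'],
})

# Indexes of every row whose platform is a mobile one.
mobile_idx = toy[toy['platform'].isin(['iOS', 'Android'])].index

filtered = toy.drop(index=mobile_idx).reset_index(drop=True)
print(filtered)
```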

In [19]:
ios = list(rawg_data[rawg_data['platform']=='iOS'].index)
android = list(rawg_data[rawg_data['platform']=='Android'].index)

drop = ios + android

rawg_data = rawg_data.drop(index = drop)\
                     .reset_index(drop=True)

rawg_data

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platform,genres,stores,tags,esrb
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PC,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
1,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,Xbox Series S/X,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
2,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 5,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
3,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 4,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
4,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 3,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature
...,...,...,...,...,...,...,...,...,...,...
122922,docs-for-playstation,Docs for PlayStation,2016-05-18,,0,PlayStation 3,,PlayStation Store,"[Music, online, console, art]",
122923,docs-for-playstation,Docs for PlayStation,2016-05-18,,0,PlayStation 4,,PlayStation Store,"[Music, online, console, art]",
122924,bee-leader,Bee Leader,2012-05-24,,45,macOS,"[Action, Arcade, Casual]",App Store,"[Cute, Physics, achievements, Music, city, Fli...",
122925,delta-strike-first-assault,Delta Strike: First Assault,2016-05-31,,356,PS Vita,Action,PlayStation Store,"[combat, online, Tanks, drone]",Everyone 10+


# Getting the url for the metacritic page for each game

## Listing the platforms available in Metacritic into a dictionary

Remember that the Metacritic page for a game has the following format:

https://www.metacritic.com/game/PLATFORM/SLUG

However, not every platform in this dataset is available on Metacritic, as the site did not exist when some of them were released.

We will list the platforms available in Metacritic inside a dictionary, using the RAWG `platform` value as the key:
`RAWG-platform: Metacritic-platform`

In [20]:
platforms_metacritic = {
    
    # Sony
    
    "PlayStation 5": "playstation-5",
    "PlayStation 4": "playstation-4",
    "PlayStation 3": "ps3",
    "PlayStation 2": "ps2",
    "PlayStation": "ps",
    "PS Vita": "vita",
    "PSP": "psp",
    
    # Microsoft
    
    "Xbox One": "xbox-one",
    "Xbox Series S/X": "xbox-series-x",
    "Xbox 360": "xbox-360",
    "Xbox": "xbox",
    
    # Nintendo
    
    "Nintendo Switch": "switch",
    "Wii U": "wii-u",
    "Wii": "wii",
    "GameCube": "gamecube",
    "Nintendo 64": "n64",
    "Nintendo 3DS": "3ds",
    "Nintendo DS": "ds",
    "Nintendo DSi": "ds",
    "Game Boy Advance": "gba",
    
    # Others
    
    "PC": "pc",
    "Dreamcast": "dreamcast"
    
}

---

We are going to add a new column `plat_mc` to `rawg_data` with the platform in Metacritic format. Any platform not in the `platforms_metacritic` dictionary will be assigned `NaN`.
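
As a side note, `pd.Series.map` accepts a dictionary directly and returns `NaN` for any value that is not one of its keys, which is the behaviour described above. The sketch below uses a shortened, illustrative copy of the dictionary rather than the full one.

```python
import pandas as pd

# Shortened, illustrative copy of the platform dictionary.
mapping = {'PC': 'pc', 'PlayStation 3': 'ps3'}

platform = pd.Series(['PC', 'macOS', 'PlayStation 3'])

# Keys found in the dictionary are translated; everything else becomes NaN.
plat_mc = platform.map(mapping)
print(plat_mc)
```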

In [21]:
def plat_metacritic(x):

    # Platforms missing from the dictionary get NaN.
    return platforms_metacritic.get(x, np.nan)

In [22]:
rawg_data['plat_mc'] = rawg_data['platform'].apply(plat_metacritic)

rawg_data

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platform,genres,stores,tags,esrb,plat_mc
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PC,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,pc
1,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,Xbox Series S/X,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,xbox-series-x
2,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 5,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-5
3,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 4,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-4
4,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 3,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,ps3
...,...,...,...,...,...,...,...,...,...,...,...
122922,docs-for-playstation,Docs for PlayStation,2016-05-18,,0,PlayStation 3,,PlayStation Store,"[Music, online, console, art]",,ps3
122923,docs-for-playstation,Docs for PlayStation,2016-05-18,,0,PlayStation 4,,PlayStation Store,"[Music, online, console, art]",,playstation-4
122924,bee-leader,Bee Leader,2012-05-24,,45,macOS,"[Action, Arcade, Casual]",App Store,"[Cute, Physics, achievements, Music, city, Fli...",,
122925,delta-strike-first-assault,Delta Strike: First Assault,2016-05-31,,356,PS Vita,Action,PlayStation Store,"[combat, online, Tanks, drone]",Everyone 10+,vita


In [23]:
rawg_data['plat_mc'].value_counts()

pc               57716
playstation-4     5495
switch            4193
xbox-one          4149
ps3               3485
xbox-360          2476
wii               2283
ds                2182
vita              1895
3ds               1691
psp               1546
ps2               1513
ps                1356
wii-u             1251
gba                854
xbox               671
gamecube           625
n64                338
dreamcast          324
playstation-5      117
xbox-series-x       96
Name: plat_mc, dtype: int64

---

The indexes seem to have all been set to 0 somehow, so we will reset them with the `pd.DataFrame.reset_index()` method.

In [24]:
rawg_data.head()

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platform,genres,stores,tags,esrb,plat_mc
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PC,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,pc
1,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,Xbox Series S/X,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,xbox-series-x
2,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 5,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-5
3,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 4,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-4
4,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 3,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,ps3


In [25]:
rawg_data = rawg_data.reset_index(drop=True)

We will list the indexes of the rows with `NaN` in the `plat_mc` column in order to drop them.

In [26]:
no_metacritic = list(rawg_data[rawg_data['plat_mc'].isna()].index)

In [27]:
rawg_data = rawg_data.drop(index = no_metacritic)\
                     .reset_index(drop = True)

---
We are creating a new column named `url`, which will include the hyperlink to the games' Metacritic page (if it exists). This will ease the web scraping process.
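
The sketch below, on a made-up two-row frame, shows what the string concatenation produces and why the rows without a Metacritic platform are dropped first: a missing `plat_mc` propagates and leaves the whole `url` missing.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'slug': ['portal-2', 'bee-leader'],
    'plat_mc': ['pc', np.nan],  # the second game has no Metacritic platform
})

# Element-wise concatenation; a missing platform yields a missing url.
toy['url'] = 'https://www.metacritic.com/game/' + toy['plat_mc'] + '/' + toy['slug']
print(toy['url'].tolist())
```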

In [28]:
rawg_data['url'] = 'https://www.metacritic.com/game/' + rawg_data['plat_mc'] + '/' + rawg_data['slug']

rawg_data.head()

Unnamed: 0,slug,name,released,metacritic,suggestions_count,platform,genres,stores,tags,esrb,plat_mc,url
0,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PC,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,pc,https://www.metacritic.com/game/pc/grand-theft...
1,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,Xbox Series S/X,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,xbox-series-x,https://www.metacritic.com/game/xbox-series-x/...
2,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 5,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-5,https://www.metacritic.com/game/playstation-5/...
3,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 4,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,playstation-4,https://www.metacritic.com/game/playstation-4/...
4,grand-theft-auto-v,Grand Theft Auto V,2013-09-17,97.0,416,PlayStation 3,"[Action, Adventure]","[Epic Games, PlayStation Store, Xbox Store, Xb...","[Singleplayer, Steam Achievements, Multiplayer...",Mature,ps3,https://www.metacritic.com/game/ps3/grand-thef...


We save `rawg_data` to a `.csv` file so that it can be used later in a new notebook, which keeps each notebook easier to read.

In [29]:
rawg_data.to_csv("02_rawg_metacritic_url.csv", encoding='utf-8', index=False)