# 400_load_videogame_datasets

## Purpose

In this notebook we load both of our console datasets, take a brief look at the data contained in them, and save them as pickle files.

## Datasets

- Input: Console_Sales_2008-2017.csv and Console_Sales_All_Time.csv
- Output: Console_Sales_2008-2017.pkl and Console_Sales_All_Time.pkl

Importing the required libraries.

In [1]:
import os
import pandas as pd  

In [2]:
# check that the 2008-2017 file exists; print a warning if it does not
if not os.path.exists("../../data/raw/Console_Sales_2008-2017.csv"):
    print("Missing Dataset File")

In [3]:
# check that the all-time file exists; print a warning if it does not
if not os.path.exists("../../data/raw/Console_Sales_All_Time.csv"):
    print("Missing Dataset File")
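The two checks above only print a message, so a missing file would still cause the later `pd.read_csv` cells to fail. A minimal sketch of a stricter check follows, assuming a hypothetical helper named `require_files` (it is not part of the original notebook) that raises `FileNotFoundError` and lists every missing path:

```python
from pathlib import Path


def require_files(paths):
    """Raise FileNotFoundError if any of the given paths is not a file.

    `require_files` is a hypothetical helper, not part of the notebook.
    """
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("Missing dataset file(s): " + ", ".join(missing))


# Paths as used in the notebook; adjust if the directory layout differs.
RAW_FILES = [
    "../../data/raw/Console_Sales_2008-2017.csv",
    "../../data/raw/Console_Sales_All_Time.csv",
]
```

Calling `require_files(RAW_FILES)` before loading would stop the notebook at the first cell that can detect the problem, rather than at the load step.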

## Loading the datasets

In [4]:
# load the 2008-2017 dataset and check its shape
consoles1 = pd.read_csv("../../data/raw/Console_Sales_2008-2017.csv")
consoles1.shape

(11, 12)

In [5]:
# load the all-time dataset and check its shape
consoles2 = pd.read_csv("../../data/raw/Console_Sales_All_Time.csv")
consoles2.shape

(26, 9)

## Getting an overview of the datasets

Now that both datasets are loaded, we take a brief look at the data they contain.

In [6]:
consoles1.head()

Unnamed: 0,Console,Abbreviation,Sales_2008,Sales_2009,Sales_2010,Sales_2011,Sales_2012,Sales_2013,Sales_2014,Sales_2015,Sales_2016,Sales_2017
0,Playstation 4,PS4,,,,,,4.49,14.59,17.51,17.59,19.64
1,Nintendo switch,NS,,,,,,,,,,11.85
2,Xbox one,XOne,,,,,,3.08,7.91,8.63,8.37,8.21
3,Nintendo 3ds,3DS,,,,12.56,13.48,14.31,9.74,7.33,7.59,6.19
4,Playstation vita,PSV,,,,0.48,3.69,4.3,2.3,2.68,2.04,0.72


In [7]:
# unique consoles
consoles1["Console"].unique()

array(['Playstation 4', 'Nintendo switch', 'Xbox one', 'Nintendo 3ds',
       'Playstation vita', 'Playstation 3', 'Nintendo wii u', 'Xbox 360',
       'PSP', 'Nintendo wii', 'Nintendo ds'], dtype=object)

In [8]:
consoles2.head()

Unnamed: 0,Console,Abbreviation,Sales_NA,Sales_Europe,Sales_Japan,Sales_Rest_Of_World,Sales_Total,Release_Year,Developer
0,Playstation 2,PS2,53.65,55.28,23.18,25.57,157.68,2000,Sony
1,Nintendo DS,DS,57.39,52.07,33.01,12.43,154.9,2004,Nintendo
2,Game Boy,GB,43.18,40.05,32.47,2.99,118.69,1989,Nintendo
3,Playstation,PS,38.94,36.91,19.36,9.04,104.25,1994,Sony
4,Nintendo Wii,Wii,45.51,33.88,12.77,9.48,101.64,2006,Nintendo


In [9]:
# unique consoles
consoles2["Console"].unique()

array(['Playstation 2', 'Nintendo DS', 'Game Boy', 'Playstation ',
       'Nintendo Wii', 'Playstation 3', 'Xbox 360', 'Game Boy Advance',
       'Playstation Portable', 'Playstation 4', 'Nintendo 3DS',
       'Nintendo Entertainment System',
       'Super Nintendo Entertainment System', 'Xbox One', 'Nintendo 64',
       'Sega Genesis', 'Atari 2600', 'Xbox', 'GameCube',
       'Playstation Vita', 'Nintendo Switch', 'Wii U', 'Game Gear',
       'Sega Saturn', 'Dreamcast', 'Atari 7800'], dtype=object)

The outputs above give a brief view of the data we have for each console and of which consoles each dataset covers. This information will be useful when analysing the data.
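The `unique()` outputs also show that the two datasets do not spell console names the same way: for example 'Nintendo switch' against 'Nintendo Switch', 'PSP' against 'Playstation Portable', and 'Playstation ' with a trailing space. A minimal sketch of how the names could be compared once case and surrounding whitespace are ignored is given here; `normalise_name` is a hypothetical helper, and the lists hold only a few names copied from the outputs:

```python
def normalise_name(name):
    """Strip surrounding whitespace and ignore case (hypothetical helper)."""
    return name.strip().casefold()


# A few names copied from the two unique() outputs above.
names_2008_2017 = ["Playstation 4", "Nintendo switch", "Xbox one", "PSP"]
names_all_time = [
    "Playstation ",
    "Playstation 4",
    "Nintendo Switch",
    "Xbox One",
    "Playstation Portable",
]

set_a = {normalise_name(n) for n in names_2008_2017}
set_b = {normalise_name(n) for n in names_all_time}

# Names present in both lists once case and whitespace are ignored.
shared = sorted(set_a & set_b)
# Names from the 2008-2017 list with no counterpart; these need a manual mapping.
unmatched = sorted(set_a - set_b)

print(shared)
print(unmatched)
```

Normalisation of this kind cannot match 'PSP' with 'Playstation Portable'. The `Abbreviation` column might be a more reliable key for that, although this notebook does not verify that the abbreviations agree between the two files.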

## Cleaning the datasets

Both datasets are small and are unlikely to have missing data, but we will check just in case.

In [10]:
# null values
consoles1.isnull().sum()

Console         0
Abbreviation    0
Sales_2008      6
Sales_2009      6
Sales_2010      6
Sales_2011      4
Sales_2012      3
Sales_2013      1
Sales_2014      2
Sales_2015      3
Sales_2016      4
Sales_2017      3
dtype: int64

In [11]:
# null values
consoles2.isnull().sum()

Console                0
Abbreviation           0
Sales_NA               0
Sales_Europe           1
Sales_Japan            2
Sales_Rest_Of_World    1
Sales_Total            0
Release_Year           0
Developer              0
dtype: int64

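Contrary to expectation, both datasets contain null values. In the 2008-2017 dataset every yearly sales column has some nulls, which is consistent with consoles that have no sales recorded for those years (for example, the Playstation 4 has no figures before 2013). In the all-time dataset, Sales_Europe, Sales_Japan and Sales_Rest_Of_World have one or two nulls each. No cleaning is done in this notebook: the datasets are saved as loaded, so later analysis should account for these nulls.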
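The per-column counts above do not show which consoles the missing values belong to. A minimal sketch of one way to find out follows, using a small stand-in frame built from three rows of the `consoles1.head()` output. The names `sample`, `rows_with_nulls` and `nulls_per_console` are illustrative and not part of the notebook:

```python
import numpy as np
import pandas as pd

# Three rows and three sales columns copied from the consoles1.head() output.
sample = pd.DataFrame(
    {
        "Console": ["Playstation 4", "Nintendo switch", "Nintendo 3ds"],
        "Sales_2012": [np.nan, np.nan, 13.48],
        "Sales_2013": [4.49, np.nan, 14.31],
        "Sales_2017": [19.64, 11.85, 6.19],
    }
)

# Rows that contain at least one null value.
rows_with_nulls = sample[sample.isnull().any(axis=1)]

# Number of null cells per console.
nulls_per_console = sample.set_index("Console").isnull().sum(axis=1)

print(rows_with_nulls["Console"].tolist())
print(nulls_per_console)
```

In the notebook, the same two expressions could be applied to `consoles1` and `consoles2` directly in place of `sample`.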

## Saving the datasets

In [12]:
# columns to save
cols = consoles1.columns
cols1 = consoles2.columns

Saving as pickle files.

In [13]:
# pickle format saving
consoles1[cols].to_pickle("../../data/prep/Console_Sales_2008-2017.pkl")
consoles2[cols1].to_pickle("../../data/prep/Console_Sales_All_Time.pkl")
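To confirm that a pickle file holds the same data that was saved, the file can be read back and compared with the original frame. The sketch below is a minimal illustration that uses a small stand-in frame and a temporary directory in place of `consoles1` and `../../data/prep/`; the file name `roundtrip_check.pkl` is made up for the example:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in frame; in the notebook this would be consoles1 or consoles2.
df = pd.DataFrame(
    {
        "Console": ["Playstation 4", "Xbox one"],
        "Abbreviation": ["PS4", "XOne"],
        "Sales_2012": [np.nan, np.nan],
        "Sales_2013": [4.49, 3.08],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_check.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# DataFrame.equals treats NaNs in the same position as equal.
round_trip_ok = df.equals(restored)
print(round_trip_ok)
```

`DataFrame.equals` is used rather than `==` because an element-wise comparison would report the NaN cells as unequal.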