# Data Cleaning – Console & Game Sales

This notebook loads the raw console sales and video game sales datasets,
performs basic cleaning and standardization, and outputs cleaned CSV files
for use in later analysis, SQL, machine learning, and dashboarding.
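This section does not spell out what "standardization" covers. One common interpretation, assumed here, is normalizing column names to snake_case. That way the raw headers (`Console Name`, `Units sold (million)`, `Units(m)`, `Image_URL`) become consistent and easy to use from SQL. `to_snake_case` and `standardize_columns` are hypothetical helpers, not part of the original notebook.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs into underscores."""
    # Replace every run of characters that is not a letter or digit with "_"
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores left behind by brackets or spaces
    return cleaned.strip("_").lower()


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with snake_case column names."""
    return df.rename(columns=to_snake_case)


# Column names taken from the raw console and game previews
example = pd.DataFrame(columns=["Console Name", "Units sold (million)", "Units(m)", "Image_URL"])
print(list(standardize_columns(example).columns))
# ['console_name', 'units_sold_million', 'units_m', 'image_url']
```

Renaming is not collision-proof. Two different raw headers could map to the same snake_case name, so it is worth checking `df.columns.is_unique` after applying it.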

Raw data is read from:
- data/raw/

Cleaned data is saved to:
- data/processed/
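The paths listed above are relative to the project root. The loading cell uses `../data/raw/`, so the notebook presumably runs one directory below the root. The sketch below assumes that layout. It keeps both locations in one place and writes cleaned output with `index=False`, so a processed file read back later does not gain an extra `Unnamed: 0` column. `RAW_DIR`, `PROCESSED_DIR`, and `save_processed` are hypothetical names.

```python
from pathlib import Path

import pandas as pd

# Assumption: the notebook runs one directory below the project root,
# matching the "../data/raw/" paths used when loading the CSVs
RAW_DIR = Path("../data/raw")
PROCESSED_DIR = Path("../data/processed")


def save_processed(df: pd.DataFrame, filename: str, out_dir: Path = PROCESSED_DIR) -> Path:
    """Write a cleaned DataFrame to out_dir as CSV and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if it is missing
    out_path = out_dir / filename
    # index=False stops the RangeIndex from being written as an unnamed extra column
    df.to_csv(out_path, index=False)
    return out_path
```

Usage would look like `save_processed(df_console, "console_clean.csv")`, where the output filename is a placeholder.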

In [5]:
import pandas as pd
import numpy as np

# Load Raw CSV Files

In [10]:
# Load raw datasets as pandas DataFrames
df_console = pd.read_csv("../data/raw/console_data.csv")
df_games = pd.read_csv("../data/raw/game_data.csv")

# Quick check: preview the first five rows of each dataset
display(df_console.head())
display(df_games.head())

| | Console Name | Type | Company | Gen | Gen Years | Released Year | Generation | Discontinuation Year | Units sold (million) | Remarks | Link to gif |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Magnavox Odyssey | Home | Magnavox | 1st Gen | 1972-1978 | 1972 | 1 | 1975 | 0.35 | The Magnavox Odyssey is the first commercial ... | https://s12.gifyu.com/images/SVLO3.gif |
| 1 | Home Pong | Home | Atari | 1st Gen | 1972-1978 | 1975 | 1 | 1978 | 0.15 | Atari's Home Pong was a dedicated console that... | https://s12.gifyu.com/images/SVz99.gif |
| 2 | Atari 2600 | Home | Atari | 2nd Gen | 1978-1982 | 1977 | 2 | 1992 | 30.0 | Atari2600 is often credited with popularizing ... | https://s12.gifyu.com/images/SVz3U.gif |
| 3 | Magnavox Odyssey 2 | Home | Magnavox | 2nd Gen | 1978-1982 | 1978 | 2 | 1984 | 2.0 | The Odyssey² featured a built-in keyboard for ... | https://s12.gifyu.com/images/SVz70.gif |
| 4 | Intellivision | Home | Mattel | 2nd Gen | 1978-1982 | 1979 | 2 | 1990 | 3.0 | The Intellivision boasted superior graphics an... | https://s12.gifyu.com/images/SVLBF.gif |
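The `info()` output under *Inspect Structure* reports `Discontinuation Year` as `object`, while `Released Year` is `int64`. The preview rows all hold plain years, so at least one entry elsewhere in the column is probably non-numeric. The actual values are not shown in this section.

The sketch below assumes such entries should become missing values. It coerces the column with `pd.to_numeric(errors="coerce")` and stores the result in pandas' nullable `Int64` dtype, so missing years do not force the column to float. `coerce_year` is a hypothetical helper. The sample value `"Still in production"` is invented for illustration and is not taken from the dataset.

```python
import pandas as pd


def coerce_year(series: pd.Series) -> pd.Series:
    """Convert a year column to nullable integers; non-numeric entries become <NA>."""
    numeric = pd.to_numeric(series, errors="coerce")  # unparseable values -> NaN
    return numeric.round().astype("Int64")  # nullable integer dtype keeps <NA>


# Hypothetical sample: two plain years plus one non-numeric placeholder
sample = pd.Series(["1975", "1992", "Still in production"], name="Discontinuation Year")
years = coerce_year(sample)

# Raw values that failed to parse, worth reviewing before accepting the coercion
unparsed = sample[years.isna() & sample.notna()]
print(unparsed.tolist())
```

On the real data, this would be applied as `df_console["Discontinuation Year"] = coerce_year(df_console["Discontinuation Year"])`. Inspect the `unparsed` values first, in case some of them should map to a specific year rather than to a missing value.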


| | Game Name | System Full | Units(m) | Publisher | Developer | Image_URL | Release Date |
|---|---|---|---|---|---|---|---|
| 0 | Pac-Man | Atari 2600 | 7.7 | Atari | Atari | https://www.vgchartz.com/games/boxart/3878609c... | 01/03/1982 |
| 1 | Pitfall! | Atari 2600 | 4.0 | Activision | Activision | https://www.vgchartz.com/games/boxart/127822cc... | 20/04/1982 |
| 2 | Frogger | Atari 2600 | 4.0 | Parker Bros. | Konami | https://www.vgchartz.com/games/boxart/7351891c... | 01/01/1982 |
| 3 | Missile Command | Atari 2600 | 2.5 | Atari | Atari | https://www.vgchartz.com/games/boxart/8855822c... | 01/01/1981 |
| 4 | Space Invaders | Atari 2600 | 2.0 | Atari | Atari | https://www.vgchartz.com/games/boxart/7131076c... | 01/01/1978 |
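`Release Date` is stored as text in a day-first `DD/MM/YYYY` layout. The `Pitfall!` value `20/04/1982` can only be read that way, since there is no 20th month. The sketch below passes an explicit format, so pandas does not have to guess between day-first and month-first. Using `errors="coerce"` turns any value that does not match the format into `NaT`; that is an assumption about how malformed dates should be handled. `parse_release_date` is a hypothetical helper.

```python
import pandas as pd


def parse_release_date(series: pd.Series) -> pd.Series:
    """Parse DD/MM/YYYY strings into datetimes; non-matching values become NaT."""
    return pd.to_datetime(series, format="%d/%m/%Y", errors="coerce")


# Values copied from the game preview above
sample = pd.Series(["01/03/1982", "20/04/1982", "01/01/1978"], name="Release Date")
dates = parse_release_date(sample)
print(dates.dt.year.tolist(), dates.dt.month.tolist())
# [1982, 1982, 1978] [3, 4, 1]
```

After applying it to `df_games["Release Date"]`, compare `isna().sum()` before and after. This shows how many rows did not match the expected format.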


# Inspect Structure
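`info()` prints dtypes and non-null counts as text, which is convenient to read but awkward to filter or compare between the two datasets. The sketch below returns the same kind of per-column information as a DataFrame and adds a distinct-value count. `column_summary` is a hypothetical helper, and the small demo frame only stands in for `df_console` and `df_games`.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, null count, and number of distinct non-null values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_null": df.isna().sum(),
            "n_unique": df.nunique(),  # nunique() ignores NaN by default
        }
    )


# Small hypothetical frame standing in for df_console / df_games
demo = pd.DataFrame({"Console Name": ["Atari 2600", "Intellivision", None], "Generation": [2, 2, 2]})
print(column_summary(demo))
```

Fully duplicated rows can be counted separately with `df.duplicated().sum()`.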

In [11]:
# Check data types, nulls, and column info
df_console.info()
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Console Name          26 non-null     object 
 1   Type                  26 non-null     object 
 2   Company               26 non-null     object 
 3   Gen                   26 non-null     object 
 4   Gen Years             26 non-null     object 
 5   Released Year         26 non-null     int64  
 6   Generation            26 non-null     int64  
 7   Discontinuation Year  26 non-null     object 
 8   Units sold (million)  26 non-null     float64
 9   Remarks               26 non-null     object 
 10  Link to gif           26 non-null     object 
dtypes: float64(1), int64(2), object(8)
memory usage: 2.4+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27308 entries, 0 to 27307
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --