### Ice Sales Analysis

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [10]:
# Import the DataFrame

url = 'https://raw.githubusercontent.com/DHE42/ice-sales-analysis/refs/heads/main/games.csv'
games_df = pd.read_csv(url)

# Print Head

print(games_df.head())



                       Name Platform  Year_of_Release         Genre  NA_sales  \
0                Wii Sports      Wii           2006.0        Sports     41.36   
1         Super Mario Bros.      NES           1985.0      Platform     29.08   
2            Mario Kart Wii      Wii           2008.0        Racing     15.68   
3         Wii Sports Resort      Wii           2009.0        Sports     15.61   
4  Pokemon Red/Pokemon Blue       GB           1996.0  Role-Playing     11.27   

   EU_sales  JP_sales  Other_sales  Critic_Score User_Score Rating  
0     28.96      3.77         8.45          76.0          8      E  
1      3.58      6.81         0.77           NaN        NaN    NaN  
2     12.76      3.79         3.29          82.0        8.3      E  
3     10.93      3.28         2.95          80.0          8      E  
4      8.89     10.22         1.00           NaN        NaN    NaN  


The Games DataFrame has 11 columns. These go over 

1) Name
2) Gaming Platform
3) Release Year
4) Genre
5) North America Sales
6) Europe Sales
7) Japan Sales
8) Other Sales
9) Critic Score
10) User Score
and
11) Rating.

Before performing any operations, let's see what the data type for each column is.

In [11]:
print("Data Types of Each Column")
print()
print(games_df.dtypes)
print()

Data Types of Each Column

Name                object
Platform            object
Year_of_Release    float64
Genre               object
NA_sales           float64
EU_sales           float64
JP_sales           float64
Other_sales        float64
Critic_Score       float64
User_Score          object
Rating              object
dtype: object



Name, Platform, Genre, User_Score, and Rating are all stored as an object. User_Score and Rating should all be stored as string, and all values should be lower case, as well as removing leading and trailing spaces. Rating values are all stored as an object. User_Score and Rating should all be stored as string, and all values should be lower case, as well as removing leading and trailing spaces. Year_of_Release should also be stored as int64 since there's no need for a decimal in a year.

In [13]:

# These columns are currently stored as object dtype, which is a generic type in pandas.
# Converting them to string dtype ensures consistency and allows for string-specific operations.
games_df['Name'] = games_df['Name'].astype(str)
games_df['Platform'] = games_df['Platform'].astype(str)
games_df['Genre'] = games_df['Genre'].astype(str)
games_df['Rating'] = games_df['Genre'].astype(str)

# Step 2: Convert User_Score and Rating to float64
# User_Score and Rating are currently stored as object dtype, which is not suitable for numerical operations.
# We use pd.to_numeric to convert these columns to float64. The 'errors="coerce"' argument ensures that
# any non-numeric values are replaced with NaN, making the data easier to handle for numerical analysis.
games_df['User_Score'] = pd.to_numeric(games_df['User_Score'], errors='coerce')


# Step 3: Convert Year_of_Release to int64
# Year_of_Release is currently stored as float64, which is not suitable for representing years.
# We use pd.to_numeric to convert this column to int64. The 'errors="coerce"' argument ensures that
# any non-numeric values are replaced with NaN, and then we drop these NaN values before conversion.
games_df['Year_of_Release'] = pd.to_numeric(games_df['Year_of_Release'], errors='coerce').dropna().astype('int64')



In [14]:
# Step 4: Clean string dtype columns
# Convert all string values to lower case, strip leading/trailing spaces, and replace spaces with underscores
string_columns = games_df.select_dtypes(include=['object']).columns
for col in string_columns:
    games_df[col] = games_df[col].str.lower().str.strip().str.replace(' ', '_')

print("Data types after conversion:")
print(games_df.dtypes)
print()
print("DF Head After Cleaning:")
print(games_df.head())
print()

Data types after conversion:
Name                object
Platform            object
Year_of_Release    float64
Genre               object
NA_sales           float64
EU_sales           float64
JP_sales           float64
Other_sales        float64
Critic_Score       float64
User_Score         float64
Rating              object
dtype: object

DF Head After Cleaning:
                       Name Platform  Year_of_Release         Genre  NA_sales  \
0                wii_sports      wii           2006.0        sports     41.36   
1         super_mario_bros.      nes           1985.0      platform     29.08   
2            mario_kart_wii      wii           2008.0        racing     15.68   
3         wii_sports_resort      wii           2009.0        sports     15.61   
4  pokemon_red/pokemon_blue       gb           1996.0  role-playing     11.27   

   EU_sales  JP_sales  Other_sales  Critic_Score  User_Score        Rating  
0     28.96      3.77         8.45          76.0         8.0        spo