---

> # <center>EDA on Video Game Sales 🎮 Using Vega-Altair </center>

---

> <center><img src="https://tenor.com/en-GB/view/gamer-insomnia-video-game-gif-5395296.gif"></center>

---

## Table of Contents

* Description
* Objective
* Library & Data Loading
    * Importing Library
    * Data Loading
* Data Preprocessing
    * Checking for missing values
    * There are missing values in the 'Year' and 'Publisher' columns
    * Dropping missing values
* Exploratory Data Analysis
    * Which genre has the most video games?
    * Which year had the most video games released (1980-2015)?
    * Top 5 years with the most video game releases, by genre
    * Which year had the highest game sales worldwide?
    * Which genre had the most releases in a single year?
    * Which genre sold the most in a single year?
    * Best Selling Genre Globally
    * Best Selling Platform Globally
    * Top 20 Publisher Sales Globally
    * Top 20 Game Sales Globally
    * Best Selling Games per region
    * Best Selling Genre per region
    * Best Selling Publisher per region
    * Best Selling Platform per region
* Conclusion

# Description

This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of [vgchartz.com](https://www.vgchartz.com/).

<b>Fields include:</b>

   - <b>Name:</b> The game's name
   - <b>Platform:</b> Platform of the game's release (e.g., PS2, PS3, PC, Xbox, etc.)
   - <b>Year:</b> Year of the game's release
   - <b>Genre:</b> Genre of the game
   - <b>Publisher:</b> Publisher of the game
   - <b>NA_Sales:</b> Sales in North America (in millions)
   - <b>EU_Sales:</b> Sales in Europe (in millions)
   - <b>JP_Sales:</b> Sales in Japan (in millions)
   - <b>Other_Sales:</b> Sales in the rest of the world (in millions)
   - <b>Global_Sales:</b> Total worldwide sales (in millions)

The script to scrape the data is available at [vgchartzScrape](https://github.com/GregorUT/vgchartzScrape). It is based on BeautifulSoup using Python. There are 16,598 records. 2 records were dropped due to incomplete information.


# Objective

This project analyzes <b>video game sales</b> data to compare sales performance across regions, platforms, publishers, and genres. Using Python with <b>Altair and Pandas</b>, it looks at release and sales trends over time, the top-selling titles, and the leading publishers and platforms.


# Library & Data Loading

## Importing Library

In [1]:
!pip install "vegafusion[embed]>=1.4.0"


In [2]:
#import libraries
import numpy as np
import pandas as pd
import math

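import altair as alt
# Use the VegaFusion data transformer so charts built on the full DataFrame
# are not blocked by Altair's default 5,000-row limit
alt.data_transformers.enable("vegafusion")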

DataTransformerRegistry.enable('vegafusion')

## Data Loading

In [3]:
# read data file and save to df
df = pd.read_csv('/kaggle/input/videogamesales/vgsales.csv')

In [4]:
df.head(10)

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |
| 5 | 6 | Tetris | GB | 1989.0 | Puzzle | Nintendo | 23.2 | 2.26 | 4.22 | 0.58 | 30.26 |
| 6 | 7 | New Super Mario Bros. | DS | 2006.0 | Platform | Nintendo | 11.38 | 9.23 | 6.5 | 2.9 | 30.01 |
| 7 | 8 | Wii Play | Wii | 2006.0 | Misc | Nintendo | 14.03 | 9.2 | 2.93 | 2.85 | 29.02 |
| 8 | 9 | New Super Mario Bros. Wii | Wii | 2009.0 | Platform | Nintendo | 14.59 | 7.06 | 4.7 | 2.26 | 28.62 |
| 9 | 10 | Duck Hunt | NES | 1984.0 | Shooter | Nintendo | 26.93 | 0.63 | 0.28 | 0.47 | 28.31 |


In [5]:
df.shape

(16598, 11)
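
The four regional columns should add up to `Global_Sales`, apart from rounding in the source data. Below is a minimal sketch of that sanity check on three rows copied from the `df.head(10)` output; the 0.02 tolerance is an assumption chosen to absorb two-decimal rounding, not a value from the dataset documentation.

```python
import pandas as pd

# Three rows copied from the df.head(10) output
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii"],
    "NA_Sales": [41.49, 29.08, 15.85],
    "EU_Sales": [29.02, 3.58, 12.88],
    "JP_Sales": [3.77, 6.81, 3.79],
    "Other_Sales": [8.46, 0.77, 3.31],
    "Global_Sales": [82.74, 40.24, 35.82],
})

regional_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Absolute gap between the reported global figure and the regional sum
gap = (sample[regional_cols].sum(axis=1) - sample["Global_Sales"]).abs()

# Tolerance is an assumption: every column is rounded to two decimals
TOLERANCE = 0.02
consistent = gap <= TOLERANCE
print(sample.assign(gap=gap.round(2), consistent=consistent))
```

The same expression can be run on the full `df` to list any rows where the regional figures and the global total disagree by more than the tolerance.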

# Data Preprocessing

In [6]:
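# Keep only games released through 2015. Rows with a missing Year fail the
# comparison, so they stay in the frame and are handled in the next step.
drop_row_index = df[df['Year'] > 2015].index
df = df.drop(drop_row_index)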

In [7]:
df.shape

(16250, 11)
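
Note that `df['Year'] > 2015` evaluates to `False` for a missing year, so this filter leaves rows with `NaN` in `Year` in place. That is why the null check in the next cell still reports missing years. A small sketch on made-up rows (the titles and years are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one early release, one after 2015, one with no year
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, 2016.0, np.nan],
})

# Same pattern as the cell above: drop rows whose Year exceeds 2015
late_rows = toy[toy["Year"] > 2015].index
kept = toy.drop(late_rows)

# NaN > 2015 is False, so "Game C" survives the filter
print(kept)
```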

## Checking for missing values

In [8]:
df.isnull().sum()

Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        56
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

## There are missing values in the 'Year' and 'Publisher' columns


In [9]:
#Checking for missing values in Year
df[df['Year'].isnull()]

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 179 | 180 | Madden NFL 2004 | PS2 | NaN | Sports | Electronic Arts | 4.26 | 0.26 | 0.01 | 0.71 | 5.23 |
| 377 | 378 | FIFA Soccer 2004 | PS2 | NaN | Sports | Electronic Arts | 0.59 | 2.36 | 0.04 | 0.51 | 3.49 |
| 431 | 432 | LEGO Batman: The Videogame | Wii | NaN | Action | Warner Bros. Interactive Entertainment | 1.86 | 1.02 | 0.00 | 0.29 | 3.17 |
| 470 | 471 | wwe Smackdown vs. Raw 2006 | PS2 | NaN | Fighting | NaN | 1.57 | 1.02 | 0.00 | 0.41 | 3.00 |
| 607 | 608 | Space Invaders | 2600 | NaN | Shooter | Atari | 2.36 | 0.14 | 0.00 | 0.03 | 2.53 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16307 | 16310 | Freaky Flyers | GC | NaN | Racing | Unknown | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16327 | 16330 | Inversion | PC | NaN | Shooter | Namco Bandai Games | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16366 | 16369 | Hakuouki: Shinsengumi Kitan | PS3 | NaN | Adventure | Unknown | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16427 | 16430 | Virtua Quest | GC | NaN | Role-Playing | Unknown | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |


In [10]:
# Checking for missing values in Publisher
df[df['Publisher'].isnull()]

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 470 | 471 | wwe Smackdown vs. Raw 2006 | PS2 | NaN | Fighting | NaN | 1.57 | 1.02 | 0.0 | 0.41 | 3.0 |
| 1303 | 1305 | Triple Play 99 | PS | NaN | Sports | NaN | 0.81 | 0.55 | 0.0 | 0.1 | 1.46 |
| 1662 | 1664 | Shrek / Shrek 2 2-in-1 Gameboy Advance Video | GBA | 2007.0 | Misc | NaN | 0.87 | 0.32 | 0.0 | 0.02 | 1.21 |
| 2222 | 2224 | Bentley's Hackpack | GBA | 2005.0 | Misc | NaN | 0.67 | 0.25 | 0.0 | 0.02 | 0.93 |
| 3159 | 3161 | Nicktoons Collection: Game Boy Advance Video V... | GBA | 2004.0 | Misc | NaN | 0.46 | 0.17 | 0.0 | 0.01 | 0.64 |
| 3166 | 3168 | SpongeBob SquarePants: Game Boy Advance Video ... | GBA | 2004.0 | Misc | NaN | 0.46 | 0.17 | 0.0 | 0.01 | 0.64 |
| 3766 | 3768 | SpongeBob SquarePants: Game Boy Advance Video ... | GBA | 2004.0 | Misc | NaN | 0.38 | 0.14 | 0.0 | 0.01 | 0.53 |
| 4145 | 4147 | Sonic the Hedgehog | PS3 | NaN | Platform | NaN | 0.0 | 0.48 | 0.0 | 0.0 | 0.48 |
| 4526 | 4528 | The Fairly Odd Parents: Game Boy Advance Video... | GBA | 2004.0 | Misc | NaN | 0.31 | 0.11 | 0.0 | 0.01 | 0.43 |
| 4635 | 4637 | The Fairly Odd Parents: Game Boy Advance Video... | GBA | 2004.0 | Misc | NaN | 0.3 | 0.11 | 0.0 | 0.01 | 0.42 |


## Dropping missing values

In [11]:
df = df.dropna(subset=['Year', 'Publisher']).reset_index(drop=True)


In [12]:
df.isnull().sum()

Rank            0
Name            0
Platform        0
Year            0
Genre           0
Publisher       0
NA_Sales        0
EU_Sales        0
JP_Sales        0
Other_Sales     0
Global_Sales    0
dtype: int64
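
The counts reported in this notebook imply some overlap between the two columns: 271 + 56 = 327 missing cells, yet the frame only shrinks from 16,250 rows to the 15,945 rows reported by `df.info()` in the next cell, a loss of 305 rows. So 22 rows lack both fields. The sketch below shows how to count this before dropping anything; the four rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one complete row, one missing Year,
# one missing Publisher, one missing both
toy = pd.DataFrame({
    "Year": [2004.0, np.nan, 2005.0, np.nan],
    "Publisher": ["Atari", "Atari", None, None],
})

cols = ["Year", "Publisher"]
missing_cells = toy[cols].isnull().sum().sum()
rows_to_drop = toy[cols].isnull().any(axis=1).sum()
rows_missing_both = toy[cols].isnull().all(axis=1).sum()

# Inclusion-exclusion for two columns:
# rows dropped = missing cells - rows missing both
assert rows_to_drop == missing_cells - rows_missing_both

cleaned = toy.dropna(subset=cols).reset_index(drop=True)
print(len(toy), "->", len(cleaned))
```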

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15945 entries, 0 to 15944
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          15945 non-null  int64  
 1   Name          15945 non-null  object 
 2   Platform      15945 non-null  object 
 3   Year          15945 non-null  float64
 4   Genre         15945 non-null  object 
 5   Publisher     15945 non-null  object 
 6   NA_Sales      15945 non-null  float64
 7   EU_Sales      15945 non-null  float64
 8   JP_Sales      15945 non-null  float64
 9   Other_Sales   15945 non-null  float64
 10  Global_Sales  15945 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.3+ MB
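
`df.info()` shows `Year` stored as `float64`, which is how pandas represents a numeric column that contained missing values when it was read. Once those rows are gone, the column can be cast to integers so that values print as `2006` rather than `2006.0`. A sketch on a stand-in Series (not the real column):

```python
import pandas as pd

# Stand-in for df['Year'] after the missing rows were dropped
years = pd.Series([2006.0, 1985.0, 2008.0], name="Year")

# Safe only when no NaN remains; astype(int) raises on missing values
assert years.notnull().all()
years_int = years.astype(int)

print(years_int.tolist())
```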


In [14]:
#different games genre 
df['Genre'].value_counts()

Genre
Action          3132
Sports          2266
Misc            1668
Role-Playing    1428
Shooter         1250
Adventure       1241
Racing          1205
Platform         865
Simulation       838
Fighting         822
Strategy         660
Puzzle           570
Name: count, dtype: int64
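
Raw counts are easier to compare as shares of the catalogue. The sketch below rebuilds the counts printed above as a Series and converts them to percentages; Action works out to roughly a fifth of all titles.

```python
import pandas as pd

# Counts copied from the value_counts() output above
genre_counts = pd.Series({
    "Action": 3132, "Sports": 2266, "Misc": 1668, "Role-Playing": 1428,
    "Shooter": 1250, "Adventure": 1241, "Racing": 1205, "Platform": 865,
    "Simulation": 838, "Fighting": 822, "Strategy": 660, "Puzzle": 570,
})

# Percentage share of all titles per genre, rounded to one decimal
genre_share = (genre_counts / genre_counts.sum() * 100).round(1)
print(genre_share)
```

On the real frame, `df['Genre'].value_counts(normalize=True)` returns the same proportions directly.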


# Exploratory Data Analysis

## Which genre has the most video games?

<img src="https://giffiles.alphacoders.com/162/162991.gif">

In [15]:
genre_counts = df['Genre'].value_counts().reset_index()
genre_counts.columns = ['Genre', 'Number of Games']

# Create Altair Chart
chart = alt.Chart(genre_counts).mark_bar(opacity=1,size=25).encode(
    y=alt.Y('Genre:N', sort='-x'),
    x='Number of Games:Q',
    color=alt.Color('Number of Games:Q', scale=alt.Scale(scheme='category20')),
    tooltip=['Genre:N', 'Number of Games:Q']
)

# Add text labels at the end of each (horizontal) bar
text = chart.mark_text(
    align='center',
    baseline='middle',
    dx=20, 
    dy=0,
    fontSize=14,
    fontWeight='bold'
).encode(
    text='Number of Games:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text).properties(
    title='Distribution of Game Genres',
    width=600,
    height=400
)

# Customize the view background and border
chart_with_labels = chart_with_labels.configure_view(
    fill='#FFFFF0',  # ivory; a hex color needs the leading '#'
    stroke='opaque' 
)


chart_with_labels.display()


## Which year had the most video games released (1980-2015)?

<img src="https://tenor.com/en-GB/view/ghost-simon-riley-simon-ghost-riley-call-of-duty-mw2-gif-16654519695300043533.gif">

In [16]:
df['Year'] = pd.to_numeric(df['Year'], errors='coerce')  
year_counts = df.groupby('Year').size().reset_index(name='Number of Games')

# Create Altair Chart
chart = alt.Chart(year_counts).mark_bar(opacity=0.8,size=20).encode(
    x=alt.X('Year:O', sort=list(year_counts['Year'])),
    y='Number of Games:Q',
    color=alt.Color('Number of Games:Q', scale=alt.Scale(scheme='category20')),
    tooltip=['Year:O', 'Number of Games:Q']
).properties(
    title='Number of Games Released per Year',
    width=900,
    height=600
)

# Add text labels on top of each bar
text = chart.mark_text(
    align='center',
    baseline='middle',
    dx=-2,
    dy=-10,
    fontSize=9,
    fontWeight='bold'
).encode(
    text='Number of Games:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text)

# Customize background and gridlines
chart_with_labels = chart_with_labels.configure_view(
    fill='#FFFFF0',
    stroke='opaque'
)

chart_with_labels.display()


## Top 5 years with the most video game releases, by genre

In [17]:
# Filter the DataFrame to the 5 years with the most games.
# The chart counts rows with count(), so no helper column is needed.
top_years = df['Year'].value_counts().nlargest(5).index
filtered_df = df[df['Year'].isin(top_years)]

# Create Altair Chart
chart = alt.Chart(filtered_df).mark_bar(opacity=0.8,size=8).encode(
    x=alt.X('Genre:N', title='', axis=alt.Axis(labels=False, ticks=False)), 
    y=alt.Y('count()', title='Number of Games'),
    color=alt.Color('Genre:N', title='', scale=alt.Scale(scheme='category20')),
    column=alt.Column('Year:N', title='', header=alt.Header(labelOrient='bottom')),  
    tooltip=['Genre:N', 'count()', 'Year:N']
).properties(
    width=140,  
    height=300  
).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)

chart.display()


## Which year had the highest game sales worldwide?

In [18]:
df_year = df.groupby(by=['Year'])['Global_Sales'].sum().reset_index()

# Create Altair Chart
chart = alt.Chart(df_year).mark_bar(opacity=0.8,size=20).encode(
    x=alt.X('Year:N', title='Year'),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Year:N', title='Year', scale=alt.Scale(scheme='category20')),
    tooltip=['Year:N', 'Global_Sales:Q']
).properties(
    width=800,
    height=400
)

# Add text labels on top of each bar
text = chart.mark_text(
    align='center',
    baseline='top',
    dx=25,
    dy=-5,
    angle=270,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q'
)


# Combine the bar chart and text labels
chart_with_labels = (chart + text).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)


chart_with_labels.display()


## Which genre had the most releases in a single year?

In [19]:
year_max_genre = (
    df.groupby(['Year', 'Genre'])
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
    .drop_duplicates(subset='Year')
    .reset_index(drop=True)
)

# Create Altair Chart
chart = alt.Chart(year_max_genre).mark_bar(opacity=0.8,size=20).encode(
    x=alt.X('Year:N', title='Year'),
    y=alt.Y('count:Q', title='Genre Count'),
    color=alt.Color('Genre:N', title='Genre', scale=alt.Scale(scheme='category20')),
    tooltip=['Year:N', 'Genre:N', 'count:Q']
).properties(
    width=800,
    height=400
)

# Add text labels on top of each bar
text = chart.mark_text(
    align='center',
    baseline='middle',
    dx=-2,
    dy=-5,
    fontSize=10,
    fontWeight='bold'
).encode(
    text='count:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)


chart_with_labels.display()
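
The sort-then-`drop_duplicates` pattern in this cell keeps the first (largest) row for each year. An equivalent formulation, sketched here on hypothetical counts, uses `groupby(...).idxmax()` to look up the winning row per year directly, which also leaves the result ordered by year.

```python
import pandas as pd

# Hypothetical release counts per (Year, Genre)
counts = pd.DataFrame({
    "Year": [2008, 2008, 2009, 2009],
    "Genre": ["Action", "Sports", "Action", "Misc"],
    "count": [10, 12, 15, 9],
})

# Row label of the largest count within each year, then select those rows
top_idx = counts.groupby("Year")["count"].idxmax()
year_max_genre = counts.loc[top_idx].reset_index(drop=True)
print(year_max_genre)
```

When two genres tie within a year, `idxmax` returns the first one it encounters.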


## Which genre sold the most in a single year?

In [20]:
year_sale_max = (
    df.groupby(['Year', 'Genre'])
    ['Global_Sales'].sum()
    .reset_index(name='Global_Sales')
    .sort_values(['Year', 'Global_Sales'], ascending=[True, False])
    .drop_duplicates(subset='Year')
    .reset_index(drop=True)
)

# Create Altair Chart with custom styling
chart = alt.Chart(year_sale_max).mark_bar(opacity=0.8,size=20).encode(
    x=alt.X('Year:N', title='Year'),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Genre:N', title='Genre', scale=alt.Scale(scheme='category20')),  # Choose a color scheme
    tooltip=['Genre:N', 'Global_Sales:Q', 'Year:N']
).properties(
    width=900,
    height=600,
    title='Global Sales by Genre for Each Year'
)

# Add text labels on top of each bar
text = chart.mark_text(
    align='center',
    baseline='top',
    dx=25,
    dy=-5,
    angle=270,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)
# Show the chart
chart_with_labels.display()


## Best Selling Genre Globally

<img src="https://i.pinimg.com/originals/a5/5d/6a/a55d6ae0bbe127e2692a801d843dd46c.png">

In [21]:
df_genre = (
    df.groupby(by=['Genre'])['Global_Sales'].sum()
    .reset_index()
    .sort_values(by=['Global_Sales'], ascending=False)
)

# Create Altair Chart
chart = alt.Chart(df_genre).mark_bar(opacity=0.8,size=25).encode(
    x=alt.X('Genre:N', title='Genre', sort=alt.EncodingSortField(field='Global_Sales', order='descending')),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Genre:N', title='Genre', scale=alt.Scale(scheme='set1')),
    tooltip=['Genre:N', 'Global_Sales:Q']
).properties(
    width=600,
    height=400,
    title='Global Sales by Genre'
)

text= chart.mark_text(
    align='center',
    baseline='bottom',
    dy=-5,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)



chart_with_labels.display()

## Best Selling Platform Globally

<img src="https://techcrunch.com/wp-content/uploads/2015/10/vidya.jpg?w=1390&crop=1">

In [22]:
df_platform = (
    df.groupby(by=['Platform'])['Global_Sales'].sum()
    .reset_index()
    .sort_values(by=['Global_Sales'], ascending=False)
)

# Create Altair Chart
chart = alt.Chart(df_platform).mark_bar(opacity=1,size=20).encode(
    x=alt.X('Platform:N', title='Platform', sort=alt.EncodingSortField(field='Global_Sales', order='descending')),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Platform:N', title='Platform', scale=alt.Scale(scheme='category20')),
    tooltip=['Platform:N', 'Global_Sales:Q']
).properties(
    width=900,
    height=600,
    title='Global Sales by Platform'
)

text= chart.mark_text(
    align='center',
    baseline='top',
    dx=25,
    dy=-5,
    angle=270,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q'
)

# Combine the bar chart and text labels
chart_with_labels = (chart + text).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)



chart_with_labels.display()

## Top 20 Publisher Sales Globally

<img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2019/11/Most-Successful-Game-Publishers-of-the-Decade-by-Revenue-Feature-Images.jpg">

In [23]:
top_publishers = (
    df.groupby(by=['Publisher'])['Global_Sales'].sum()
    .reset_index()
    .sort_values(by=['Global_Sales'], ascending=False)
    .head(20)
)

chart_publisher_top20 = alt.Chart(top_publishers).mark_bar(opacity=1, size=20).encode(
    x=alt.X('Publisher:N', title='Publisher', sort=alt.EncodingSortField(field='Global_Sales', order='descending')),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Publisher:N', title='Publisher', scale=alt.Scale(scheme='category20')),
    tooltip=['Publisher:N', 'Global_Sales:Q']
).properties(
    width=900,
    height=600,
    title='Top 20 Publishers by Global Sales'
)

text_publisher_top20 = chart_publisher_top20.mark_text(
    align='center',
    baseline='top',
    dx=25,
    dy=-5,
    angle=270,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q'
)

chart_with_labels_publisher_top20 = (chart_publisher_top20 + text_publisher_top20).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)

chart_with_labels_publisher_top20.display()


## Top 20 Game Sales Globally

In [24]:
top_game_sale = (
    df[['Name', 'Year', 'Global_Sales']]
    .sort_values(by=['Global_Sales'], ascending=False)
    .head(20)
)

# Create Altair Chart
chart = alt.Chart(top_game_sale).mark_bar(opacity=0.8, size=35).encode(
    x=alt.X('Name:N', title='Game', sort=alt.EncodingSortField(field='Global_Sales', order='descending')),
    y=alt.Y('Global_Sales:Q', title='Global Sales'),
    color=alt.Color('Year:N', title='Year', scale=alt.Scale(scheme='category20')),
    tooltip=['Name:N', 'Global_Sales:Q', 'Year:N']
).properties(
    width=800,
    height=400,
    title='Top 20 Game Sales'
)


text1 = chart.mark_text(
    align='center',
    baseline='bottom',
    dx=0,
    dy=-5,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Global_Sales:Q',
)

text2 = chart.mark_text(
    align='center',
    baseline='top',
    dx=-40,
    dy=-5,
    angle=270,
    fontSize=12,
    fontWeight='bold'
).encode(
    text='Year:N',
    color=alt.value('black')
)


chart_with_labels = (chart + text1 + text2).configure_axis(
    labelFontSize=12,
    titleFontSize=14
).configure_legend(
    labelFontSize=12,
    titleFontSize=14
)

chart_with_labels.display()


## Best Selling Games per region

<img src="https://tenor.com/en-GB/view/super-mario-bros-2d-platformer-lakitu-spiny-mario-gif-23109872.gif"> 

In [25]:
top_game_sale = (
    df[['Name', 'Year', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']]
    .sort_values(by=['Global_Sales'], ascending=False)
    .head(15)
)

# Build one chart per region for the same 15 global best-sellers,
# always ordered by global sales so the charts are directly comparable
def create_chart(data, region, title):
    chart = alt.Chart(data).mark_bar(opacity=0.8, size=35).encode(
        x=alt.X('Name:N', title='Game', sort=alt.EncodingSortField(field='Global_Sales', order='descending')),
        y=alt.Y(f'{region}:Q', title=f'{region} (in millions)'),
        color=alt.Color('Year:N', title='Year', scale=alt.Scale(scheme='category20')),
        tooltip=['Name:N', f'{region}:Q', 'Global_Sales:Q', 'Year:N']
    ).properties(
        width=800,
        height=650,
        title=title
    )

    text1 = chart.mark_text(
        align='center',
        baseline='bottom',
        dx=0,
        dy=-5,
        fontSize=12,
        fontWeight='bold'
    ).encode(
        text=f'{region}:Q',
    )

    text2 = chart.mark_text(
        align='center',
        baseline='top',
        dx=0,
        dy=5,
        angle=360,
        fontSize=12,
        fontWeight='bold'
    ).encode(
        text='Year:N',
        color=alt.value('black')
    )

    chart_with_labels = (chart + text1 + text2)

    return chart_with_labels

NA_Sales_Chart = create_chart(top_game_sale, 'NA_Sales', 'North America Sales of the Top 15 Global Best-Sellers')
EU_Sales_Chart = create_chart(top_game_sale, 'EU_Sales', 'Europe Sales of the Top 15 Global Best-Sellers')
JP_Sales_Chart = create_chart(top_game_sale, 'JP_Sales', 'Japan Sales of the Top 15 Global Best-Sellers')
Other_Sales_Chart = create_chart(top_game_sale, 'Other_Sales', 'Rest-of-World Sales of the Top 15 Global Best-Sellers')

grid = alt.vconcat(NA_Sales_Chart, EU_Sales_Chart, JP_Sales_Chart, Other_Sales_Chart)

grid.display()
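
The cell above selects games by global sales and then shows their regional figures. A different question is which games rank highest within each region, and the two rankings need not agree. The sketch below uses only the ten rows from the `df.head(10)` output earlier in the notebook, so it ranks those ten games, not the full dataset.

```python
import pandas as pd

# Ten rows copied from the df.head(10) output earlier in the notebook
top10 = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue", "Tetris",
             "New Super Mario Bros.", "Wii Play",
             "New Super Mario Bros. Wii", "Duck Hunt"],
    "NA_Sales": [41.49, 29.08, 15.85, 15.75, 11.27, 23.2, 11.38, 14.03, 14.59, 26.93],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22, 4.22, 6.5, 2.93, 4.7, 0.28],
})

# Rank the same games separately for each region
top3_na = top10.nlargest(3, "NA_Sales")["Name"].tolist()
top3_jp = top10.nlargest(3, "JP_Sales")["Name"].tolist()
print(top3_na)
print(top3_jp)
```

On the full frame, `df.nlargest(15, 'JP_Sales')` would give a true per-region top 15 that could be passed to the same chart function.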


## Best Selling Genre per region

In [26]:
comp_genre = df[['Genre', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].groupby('Genre').sum().reset_index()

def create_chart(data, region, title):

    bar_chart = alt.Chart(data).mark_bar(opacity=0.8, size=5).encode(
        x=alt.X('Genre:N', title='Genre', sort=alt.EncodingSortField(field=f'{region}_Sales', order='descending')),
        y=alt.Y(f'{region}_Sales:Q', title='Sales'),
        color=alt.Color('Genre:N', title='Genre', scale=alt.Scale(scheme='category20')),
        tooltip=['Genre:N', f'{region}_Sales:Q']
    ).properties(
        width=800,
        height=400,
        title=title
    )

    # Create the line chart
    line_chart = alt.Chart(data).mark_line(
        size=1.1,
        color='red'
    ).encode(
        x=alt.X('Genre:N', title='Genre'),
        y=alt.Y(f'{region}_Sales:Q'),
        strokeDash=alt.value([10])
    )

    # Create the point chart
    point_chart = alt.Chart(data).mark_point(
        size=100,
        filled=True,
        shape='circle'
    ).encode(
        x=alt.X('Genre:N', title='Genre'),
        y=alt.Y(f'{region}_Sales:Q'),
        color=alt.Color('Genre:N', title='Genre', scale=alt.Scale(scheme='category20')),
        tooltip=['Genre:N', f'{region}_Sales:Q']
    )

    # Combine the bar, line, and point charts
    combined_chart = bar_chart + line_chart + point_chart

    return combined_chart


# Create charts for each region
NA_Sales_Chart = create_chart(comp_genre, 'NA', 'Number of sales by genre in North America (in millions)')
EU_Sales_Chart = create_chart(comp_genre, 'EU', 'Number of sales by genre in Europe (in millions)')
JP_Sales_Chart = create_chart(comp_genre, 'JP', 'Number of sales by genre in Japan (in millions)')
Other_Sales_Chart = create_chart(comp_genre, 'Other', 'Number of sales by genre in rest of the world (in millions)')

# Display the charts separately
NA_Sales_Chart.display()
EU_Sales_Chart.display()
JP_Sales_Chart.display()
Other_Sales_Chart.display()
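
The wide `comp_genre` frame has one column per region, which is why the cell above builds four separate charts. Reshaping it to long format gives one row per genre and region, which makes questions like "which genre leads in each region" a single `groupby`. A sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical regional totals for two genres (in millions)
comp_genre = pd.DataFrame({
    "Genre": ["Action", "Sports"],
    "NA_Sales": [10.0, 8.0],
    "EU_Sales": [6.0, 5.0],
    "JP_Sales": [2.0, 1.5],
    "Other_Sales": [1.0, 0.5],
})

# One row per (Genre, Region) pair instead of one column per region
long_genre = comp_genre.melt(id_vars="Genre", var_name="Region", value_name="Sales")

# Best-selling genre in each region
best = long_genre.loc[long_genre.groupby("Region")["Sales"].idxmax()]
print(best)
```

The long frame could also feed a single faceted Altair chart using a `column='Region:N'` encoding, as in the top-5-years chart earlier in this notebook.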



## Best Selling Publisher per region

<img src="https://preview.redd.it/f0h5rwpn7a171.jpg?width=1080&crop=smart&auto=webp&s=e453b97ab8c739db20f00eff0ac79f0fff58dc2a">

In [27]:
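# Select the 20 publishers with the most titles in the dataset
# (ranked by number of games, not by sales)
top_publishers = df.groupby(by=['Publisher'])['Year'].count().sort_values(ascending=False).head(20).index
df_top_publishers = df[df['Publisher'].isin(top_publishers)]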

comp_publisher = df_top_publishers.groupby(['Publisher']).agg({
    'NA_Sales': 'sum',
    'EU_Sales': 'sum',
    'JP_Sales': 'sum',
    'Other_Sales': 'sum',
    'Global_Sales': 'sum'
}).reset_index()

def create_chart(data, region, title):

    bar_chart = alt.Chart(data).mark_bar(opacity=0.8, size=25).encode(
        x=alt.X('Publisher:N', sort='-y'),
        y=alt.Y(f'{region}:Q', title=f'{region}'),
        color=alt.Color('Publisher:N', title='Publisher', scale=alt.Scale(scheme='category20')),
        tooltip=['Publisher:N',f'{region}:Q']
    ).properties(
        width=800,
        height=400,
        title=title
    )

     
    text_labels = alt.Chart(data).mark_text(
        align='center',
        baseline='top',
        fontSize=12,
        fontWeight='bold',
        dx=0,
        dy=-20
    ).encode(
        x=alt.X('Publisher:N', sort='-y'),
        y=alt.Y(f'{region}:Q'),
        text=alt.Text(f'{region}:Q'),
        color=alt.Color('Publisher:N', title='Publisher', scale=alt.Scale(scheme='category20')),
        tooltip=['Publisher:N', f'{region}:Q']
    )

    combined_chart = bar_chart + text_labels

    return combined_chart

NA_Sales_Chart = create_chart(comp_publisher, 'NA_Sales', 'Number of sales by Publisher in North America (in millions)')
EU_Sales_Chart = create_chart(comp_publisher, 'EU_Sales', 'Number of sales by Publisher in Europe (in millions)')
JP_Sales_Chart = create_chart(comp_publisher, 'JP_Sales', 'Number of sales by Publisher in Japan (in millions)')
Other_Sales_Chart = create_chart(comp_publisher, 'Other_Sales', 'Number of sales by Publisher in the rest of the world (in millions)')

grid = alt.vconcat(NA_Sales_Chart, EU_Sales_Chart, JP_Sales_Chart, Other_Sales_Chart)

grid.display()


## Best Selling Platform per region

In [28]:
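# Select the 10 platforms with the most titles in the dataset
# (ranked by number of games, not by sales)
top_platforms = df.groupby(by=['Platform'])['Name'].count().sort_values(ascending=False).head(10).index
df_top_platforms = df[df['Platform'].isin(top_platforms)]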

comp_platform = df_top_platforms.groupby(['Platform']).agg({
    'NA_Sales': 'sum',
    'EU_Sales': 'sum',
    'JP_Sales': 'sum',
    'Other_Sales': 'sum',
    'Global_Sales': 'sum'
}).reset_index()

def create_chart_platform(data, region, title):
    bar_chart = alt.Chart(data).mark_bar(opacity=0.8, size=25).encode(
        x=alt.X('Platform:N', sort='-y'),
        y=alt.Y(f'{region}:Q', title=f'{region} (in millions)'),
        color=alt.Color('Platform:N', title='Platform', scale=alt.Scale(scheme='category20')),
        tooltip=['Platform:N', f'{region}:Q']
    ).properties(
        width=800,
        height=400,
        title=title
    )

    text_labels = alt.Chart(data).mark_text(
        align='center',
        baseline='top',
        fontSize=12,
        fontWeight='bold',
        dx=0,
        dy=-20
    ).encode(
        x=alt.X('Platform:N', sort='-y'),
        y=alt.Y(f'{region}:Q'),
        text=alt.Text(f'{region}:Q'),
        color=alt.Color('Platform:N', title='Platform', scale=alt.Scale(scheme='category20')),
        tooltip=['Platform:N', f'{region}:Q']
    )

    combined_chart = bar_chart + text_labels

    return combined_chart

NA_Sales_Chart_Platform = create_chart_platform(comp_platform, 'NA_Sales', 'Number of sales by Platform in North America (in millions)')
EU_Sales_Chart_Platform = create_chart_platform(comp_platform, 'EU_Sales', 'Number of sales by Platform in Europe (in millions)')
JP_Sales_Chart_Platform = create_chart_platform(comp_platform, 'JP_Sales', 'Number of sales by Platform in Japan (in millions)')
Other_Sales_Chart_Platform = create_chart_platform(comp_platform, 'Other_Sales', 'Number of sales by Platform in the rest of the world (in millions)')

grid_platform = alt.vconcat(NA_Sales_Chart_Platform, EU_Sales_Chart_Platform, JP_Sales_Chart_Platform, Other_Sales_Chart_Platform)

grid_platform.display()


# Conclusion


Wrapping up our deep dive into video game sales data, I must say, it's been quite the adventure, especially with our bold move to use <b>Vega-Lite and Altair</b> for visualization. But hey, why stick to the ordinary, right? Here's what we've unraveled:

<b>Genre Dominance:</b> Action and Sports genres lead in popularity and sales.<br>
<b>Trends:</b> Growth trends over time indicate the industry's dynamism.<br>
<b>Global Impact:</b> The global gaming market demonstrates substantial revenue.<br>
<b>Regional Preferences:</b> Preferences for genres, platforms, and publishers vary across regions.<br>
<b>Publisher Performance:</b> Top publishers drive significant global sales.<br>
    
These findings provide valuable intelligence for industry stakeholders to strategize effectively and cater to diverse market demands, ensuring continued success and innovation in the ever-evolving video game industry.

 <center><b>Please upvote if you like the work!</b></center>
 <center><b>Any sort of feedback would be appreciated!</b></center>
 <center><b>Thank you!</b></center>
