<a href="https://colab.research.google.com/github/alighieris/data-is-beautiful/blob/main/games/Projeto_Python%26Pandas_DIO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The dataset used here lists video games that sold more than 100,000 copies.
It is available on [Kaggle](https://www.kaggle.com/gregorut/videogamesales).

In [1]:
!pip install --upgrade plotly
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt

Collecting plotly
  Downloading plotly-5.2.1-py2.py3-none-any.whl (21.8 MB)
Collecting tenacity>=6.2.0
  Downloading tenacity-8.0.1-py3-none-any.whl (24 kB)
Installing collected packages: tenacity, plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 4.4.1
    Uninstalling plotly-4.4.1:
      Successfully uninstalled plotly-4.4.1
Successfully installed plotly-5.2.1 tenacity-8.0.1


In [9]:
df = pd.read_csv("./drive/MyDrive/Colab Notebooks/data/video_games_sales.csv")

In [10]:
df.head()

,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


The dataset contains the following fields:

- Rank - Ranking by total sales
- Name
- Platform - Release platform
- Year
- Genre
- Publisher

And the **sales** figures (in millions):
  - NA_Sales
  - EU_Sales
  - JP_Sales
  - Other_Sales - sales outside the NA, EU, and JP markets
  - Global_Sales
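
The field list suggests that `Global_Sales` is the sum of the four regional columns. The sketch below checks that on the five rows printed by `df.head()` above, copied by hand into a small frame. The name `sample` and the 0.02 tolerance (to absorb rounding to two decimals) are assumptions made for this illustration, not part of the original notebook.

```python
import pandas as pd

# The five rows shown by df.head() above, copied by hand for illustration.
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "NA_Sales": [41.49, 29.08, 15.85, 15.75, 11.27],
    "EU_Sales": [29.02, 3.58, 12.88, 11.01, 8.89],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
    "Other_Sales": [8.46, 0.77, 3.31, 2.96, 1.00],
    "Global_Sales": [82.74, 40.24, 35.82, 33.00, 31.37],
})

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Absolute gap between the regional sum and the reported global figure.
sample["diff"] = (sample[regions].sum(axis=1) - sample["Global_Sales"]).abs()

# Assumed tolerance: each column is rounded to two decimals,
# so small gaps are expected.
consistent = sample["diff"] <= 0.02

print(sample[["Name", "diff"]].round(2))
print("All rows consistent:", bool(consistent.all()))
```

Running the same check on the full `df` would show whether the relationship holds beyond these five rows.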

In [67]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


In [None]:
len(df['Publisher'].unique())

579

In [None]:
len(df['Platform'].unique())

31

In [None]:
len(df['Genre'].unique())

12

In [None]:
df['Year'].describe()

count    16327.000000
mean      2006.406443
std          5.828981
min       1980.000000
25%       2003.000000
50%       2007.000000
75%       2010.000000
max       2020.000000
Name: Year, dtype: float64

In total there are 579 distinct Publisher values (one of them is NaN, since `unique()` counts missing entries and `Publisher` has 58 nulls), 31 platforms, and 12 game genres, covering 1980 to 2020.
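
The counts above come from `len(df[col].unique())`, which treats a missing value as one more distinct entry. A minimal sketch of the difference from `nunique()`, using a hypothetical four-element series rather than the real `Publisher` column:

```python
import numpy as np
import pandas as pd

# Hypothetical column: two named publishers and one missing value.
publishers = pd.Series(["Nintendo", "Sega", np.nan, "Nintendo"])

with_nan = len(publishers.unique())   # unique() keeps NaN as a distinct value
without_nan = publishers.nunique()    # nunique() drops NaN by default (dropna=True)

print(with_nan, without_nan)
```

For columns without nulls, such as `Platform` and `Genre` in the `df.info()` output, both approaches give the same number.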

In [11]:
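# Sum only the numeric columns per platform, then keep the 8 platforms with the highest global sales
platf_top8 = df.groupby(by='Platform').sum(numeric_only=True).sort_values('Global_Sales', ascending=False).drop(['Rank','Year'], axis=1).head(8)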

In [12]:
fig = px.bar(platf_top8, x=platf_top8.index, y=['NA_Sales', 'EU_Sales', 'JP_Sales','Other_Sales'], 
       title='Video games sales worldwide - TOP 8 Platforms', labels={'value':'Sales (mi)'},
       template='plotly_dark')

fig.update_layout(font={'color':'white', 'family':'Balto', 'size':16}, legend={'title':'Region'})

fig.show()

The platform with the most sales worldwide is the PlayStation 2, with North America and Europe as its largest markets.

We can analyze how total video game sales grew over the years:

In [38]:
# Sum global sales per (Year, Platform); groupby sorts by its keys, so rows come out ordered by year
total_yearly = df.groupby(by=['Year', 'Platform'], as_index=False)['Global_Sales'].sum()

In [39]:
total_yearly.head(10)

,Year,Platform,Global_Sales
0,1980.0,2600,11.38
1,1981.0,2600,35.77
2,1982.0,2600,28.86
3,1983.0,2600,5.83
4,1983.0,NES,10.96
5,1984.0,2600,0.27
6,1984.0,NES,50.09
7,1985.0,2600,0.45
8,1985.0,DS,0.02
9,1985.0,NES,53.44


In [82]:
fig1 = go.Figure()

for platf in df['Platform'].unique():
  ty = total_yearly[total_yearly['Platform']==platf]
  fig1.add_trace(go.Scatter(x=ty['Year'], y=ty['Global_Sales'], mode='lines', name=platf,
                            hovertemplate= 'Year: %{x:.0f}<br>'+
                                           'Sales: %{y}'
                            ))


fig1.update_layout(template='plotly_dark', xaxis={'showgrid':False, 'title':'Year'}, yaxis={'showgrid':False, 'title':'Sales (mi)'},
                   title = 'Video Game Sales per Platform, 1980-2020',font={'color':'white', 'family':'Balto', 'size':16},
                   legend={'title':'Platform','font_size':12})
fig1.show()

However, it is clear that this data is not up to date: from 2015 onward there is a sharp drop in sales and records are missing on every platform. That cannot reflect reality, since the games market has been growing for several years.
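
One way to check the coverage gap described above is to count how many records exist per release year. The sketch below is an illustration only: `records_per_year` is a hypothetical helper, and `toy` is synthetic data standing in for `df`.

```python
import pandas as pd


def records_per_year(frame: pd.DataFrame) -> pd.Series:
    """Count rows per release year, ignoring rows with a missing Year."""
    years = frame["Year"].dropna().astype(int)
    return years.value_counts().sort_index()


# Synthetic stand-in for df (not real data); the last row has no year.
toy = pd.DataFrame({"Year": [2014.0, 2014.0, 2014.0, 2015.0, 2015.0, 2016.0, None]})

counts = records_per_year(toy)
print(counts)
```

Applied to the real `df`, a steep fall in the per-year counts toward the end of the range would support the reading that the later years are incomplete rather than weak.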

In [101]:
# Total global sales per platform (select the column before summing to skip non-numeric columns)
df_total = df.groupby(by='Platform', as_index=False)['Global_Sales'].sum()

In [107]:
fig2 = px.pie(df_total,values='Global_Sales', names='Platform', template='plotly_dark')
fig2.update_traces(textposition='inside', textinfo='percent+label')
fig2.update_layout(title='Percentage of Sales by Platform', font={'color':'white', 'family':'Balto', 'size':16})
fig2.show()

In [110]:
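# Sum the numeric sales columns per genre; Rank and Year totals are meaningless, so drop them
df_genre = df.groupby(by='Genre').sum(numeric_only=True).drop(['Rank', 'Year'], axis=1)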

In [111]:
df_genre

Genre,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
Action,877.83,525.0,159.95,187.38,1751.18
Adventure,105.8,64.13,52.07,16.81,239.04
Fighting,223.59,101.32,87.35,36.68,448.91
Misc,410.24,215.98,107.76,75.32,809.96
Platform,447.05,201.63,130.77,51.59,831.37
Puzzle,123.78,50.78,57.31,12.55,244.95
Racing,359.42,238.39,56.69,77.27,732.04
Role-Playing,327.28,188.06,352.31,59.61,927.37
Shooter,582.6,313.27,38.28,102.69,1037.37
Simulation,183.31,113.38,63.7,31.52,392.2


In [122]:
fig3 = px.bar(df_genre, x=['NA_Sales', 'EU_Sales', 'JP_Sales','Other_Sales'],y=df_genre.index, 
       title='Video games sales worldwide by Genre', labels={'value': 'Sales (mi)'},
       template='plotly_dark')

fig3.update_layout(font={'color':'white', 'family':'Balto', 'size':16}, legend={'title':'Region'}, yaxis={'categoryorder':'total ascending'})

fig3.show()
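
To read the genre chart numerically, the sketch below finds the best-selling genre in each region. It uses only the ten genre rows printed in the `df_genre` output above (the notebook reports 12 genres, so two are not shown there), copied by hand. The name `genre_sales` is an assumption for this illustration.

```python
import pandas as pd

# Values copied from the df_genre output above (only the ten rows shown there).
genre_sales = pd.DataFrame(
    {
        "NA_Sales": [877.83, 105.80, 223.59, 410.24, 447.05,
                     123.78, 359.42, 327.28, 582.60, 183.31],
        "EU_Sales": [525.00, 64.13, 101.32, 215.98, 201.63,
                     50.78, 238.39, 188.06, 313.27, 113.38],
        "JP_Sales": [159.95, 52.07, 87.35, 107.76, 130.77,
                     57.31, 56.69, 352.31, 38.28, 63.70],
        "Other_Sales": [187.38, 16.81, 36.68, 75.32, 51.59,
                        12.55, 77.27, 59.61, 102.69, 31.52],
    },
    index=["Action", "Adventure", "Fighting", "Misc", "Platform", "Puzzle",
           "Racing", "Role-Playing", "Shooter", "Simulation"],
)

# idxmax returns, for each column, the index label holding the largest value.
top_genre = genre_sales.idxmax()
print(top_genre)
```

Among the rows shown, Action leads in NA, EU, and Other, while Role-Playing leads in Japan. The two genres missing from the printed table could change this, so the full `df_genre` should be used for any real conclusion.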