Dataset obtained from

# Importing the libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Reading the dataset

In [None]:
vgsales = pd.read_csv("/content/drive/MyDrive/PANDA/vgsales.csv")

In [None]:
vgsales

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


Since only a small percentage of rows contain NA values, those rows can be dropped from the analyses. If this percentage were larger, the rows could be dropped only from the analyses that use the affected columns and kept in the analyses that do not use them.
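A minimal sketch of the two strategies on a hypothetical four-row table (the names and missing values are invented for illustration; `by_year`, `by_sales` and `all_clean` are illustrative names): drop missing values only for the analyses that need the affected column, or drop them all at once.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table (not real data): one missing Year, one missing Publisher
df = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Year': [2006.0, np.nan, 2008.0, 2009.0],
    'Publisher': ['Nintendo', 'Nintendo', None, 'Activision'],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00],
})

# Percentage of missing values per column (same expression as in the cleaning section)
na_pct = df.isna().mean().round(4).mul(100)
print(na_pct)

# Analysis by year: drop only the rows with a missing Year
by_year = df.dropna(subset=['Year'])

# Analysis of sales alone: neither column is used, so every row is kept
by_sales = df[['Name', 'Global_Sales']]

# Dropping both at once, as this notebook does
all_clean = df.dropna(subset=['Year', 'Publisher'])

print(len(by_year), len(by_sales), len(all_clean))  # 3 4 2
```

Dropping per analysis keeps more rows (here `by_year` still contains Game C), at the cost of each analysis working on a slightly different subset.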

Depending on the analysis, some rows may be kept or discarded. For an analysis of game sales, the consistency of the sales columns must be checked: for example, games with zero sales in every region (no sales anywhere) but a non-zero global sales value, or games that sold in only some regions and therefore were not necessarily sold worldwide. Analyses that do not take sales into account do not need this cleaning, since those columns are not used.
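The two sales checks can be sketched as boolean flags. The rows below are copied from outputs shown in this notebook; `flag_sales_issues` and `REGIONS` are hypothetical helpers, not part of the original code.

```python
import pandas as pd

REGIONS = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

def flag_sales_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per sales check (hypothetical helper)."""
    regional = df[REGIONS]
    return pd.DataFrame({
        # no sales in any region, yet a non-zero global value
        'inconsistent': (regional.sum(axis=1) == 0) & (df['Global_Sales'] > 0),
        # zero sales in at least one region (not sold in every region)
        'partial_coverage': (regional == 0).any(axis=1),
    }, index=df.index)

# Rows copied from outputs shown in this notebook
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Just Dance 3', 'World of Warcraft',
             'SCORE International Baja 1000: The Official Game'],
    'NA_Sales': [41.49, 6.05, 0.07, 0.00],
    'EU_Sales': [29.02, 3.15, 6.21, 0.00],
    'JP_Sales': [3.77, 0.00, 0.00, 0.00],
    'Other_Sales': [8.46, 1.07, 0.00, 0.00],
    'Global_Sales': [82.74, 10.26, 6.28, 0.01],
})

flags = flag_sales_issues(sample)
print(pd.concat([sample['Name'], flags], axis=1))
```

Only the last row is inconsistent; the last three are flagged as not sold in every region.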

In [None]:
vgsales.shape

(16598, 11)

In [None]:
vgsales.columns

Index(['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales',
       'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales'],
      dtype='object')

In [None]:
vgsales['Platform'].unique()

array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
       '3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC',
       'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
       '3DO', 'GG', 'PCFX'], dtype=object)

## Attributes in the dataset

- Rank: ranking of the game based on global sales.

- Name: name of the game.

- Platform: platform on which the game was released.

- Year: release year of the game.

- Genre: genre of the game.

- Publisher: publisher of the game.

- NA_Sales: sales of the game in North America (in millions).

- EU_Sales: sales of the game in Europe (in millions).

- JP_Sales: sales of the game in Japan (in millions).

- Other_Sales: sales of the game in the rest of the world (in millions).

- Global_Sales: worldwide sales of the game (in millions).

# Data cleaning

## Removing missing data (NAs)

Since the percentage of missing data turned out to be small, the rows with missing values were removed from the dataset.

In [None]:
# percentage of NA values per column
vgsales.isna().mean().round(4).mul(100).sort_values(ascending = False)

Year            1.63
Publisher       0.35
Rank            0.00
Name            0.00
Platform        0.00
Genre           0.00
NA_Sales        0.00
EU_Sales        0.00
JP_Sales        0.00
Other_Sales     0.00
Global_Sales    0.00
dtype: float64

In [None]:
# drop rows with a missing Year or Publisher
vgsales = vgsales.dropna(subset = ['Year', 'Publisher'])

In [None]:
# percentage of NA values per column after the filtering
vgsales.isna().mean().round(4).mul(100).sort_values(ascending = False)

Rank            0.0
Name            0.0
Platform        0.0
Year            0.0
Genre           0.0
Publisher       0.0
NA_Sales        0.0
EU_Sales        0.0
JP_Sales        0.0
Other_Sales     0.0
Global_Sales    0.0
dtype: float64

In [None]:
vgsales

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


## Removing inconsistent data

Inconsistent records were removed, that is, games with no sales in any region but a non-zero global sales value. In this case, only one game was removed.

In [None]:
# sum of the four regional sales columns for each game
partial_sales = vgsales[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].sum(axis = 1)
partial_sales

0        82.74
1        40.24
2        35.83
3        33.00
4        31.38
         ...  
16593     0.01
16594     0.01
16595     0.00
16596     0.01
16597     0.01
Length: 16291, dtype: float64

In [None]:
# games with zero sales in every region
vgsales[partial_sales == 0]

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01


In [None]:
vgsales[vgsales.Name == 'SCORE International Baja 1000: The Official Game']

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
9674,9676,SCORE International Baja 1000: The Official Game,X360,2008.0,Racing,Activision,0.11,0.01,0.0,0.01,0.12
10620,10622,SCORE International Baja 1000: The Official Game,Wii,2008.0,Racing,Activision,0.09,0.0,0.0,0.01,0.1
11723,11725,SCORE International Baja 1000: The Official Game,PS3,2008.0,Racing,Activision,0.07,0.0,0.0,0.01,0.08
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01


In [None]:
# drop the inconsistent entry found above (PS2 version, Rank 16598)
vgsales = vgsales[vgsales.Rank != 16598]
vgsales

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16592,16595,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.00,0.00,0.00,0.01
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [None]:
# keep the regional totals aligned with the rows that remain in vgsales
partial_sales = partial_sales[partial_sales > 0]
partial_sales

0        82.74
1        40.24
2        35.83
3        33.00
4        31.38
         ...  
16592     0.01
16593     0.01
16594     0.01
16596     0.01
16597     0.01
Length: 16290, dtype: float64

## Data with rounding errors

Comparing the sum of the regional sales (North America, Europe, Japan and the rest of the world) with the global sales shows a maximum difference of 0.02. This may be caused by the numeric representation adopted in the dataset, that is, values in millions with two decimal places.

The smallest representable value is therefore 0.01 million, which corresponds to 10 thousand units; smaller amounts that may have been used to compute the global sales before rounding to the current representation are lost.

For this reason, these records were kept in the dataset.
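A minimal sketch of this effect, assuming each regional value and the global total were rounded independently (half up) to two decimals from unrounded figures; the values are hypothetical. Each rounding moves a value by at most 0.005, so the four regional errors plus the error of the total stay strictly below 0.025 in absolute value; since the gap between two-decimal numbers is a multiple of 0.01, it can be at most 0.02, which is the maximum difference mentioned above.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import product

CENT = Decimal('0.01')  # 0.01 million = 10 thousand units, the dataset's resolution

def to_two_decimals(x: Decimal) -> Decimal:
    """Round a value in millions to two decimal places (half up)."""
    return x.quantize(CENT, rounding=ROUND_HALF_UP)

# Hypothetical unrounded sales: 4 thousand units in each of the four regions
regional = [Decimal('0.004')] * 4  # NA, EU, JP, Other

partial = sum(to_two_decimals(x) for x in regional)  # sum of the rounded columns
global_sales = to_two_decimals(sum(regional))        # total rounded once
print(partial, global_sales, global_sales - partial)  # 0.00 0.02 0.02

# Brute-force check over regional values from 0.000 to 0.009 million
grid = [Decimal(k) / 1000 for k in range(10)]
max_gap = max(
    abs(to_two_decimals(sum(r)) - sum(to_two_decimals(x) for x in r))
    for r in product(grid, repeat=4)
)
print(max_gap)  # 0.02
```

`Decimal` is used instead of floats so that the rounding itself introduces no binary floating-point noise.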

In [None]:
# rows whose regional sum differs from Global_Sales by less than 0.1 million
vgsales[abs(partial_sales - vgsales['Global_Sales']) < 0.1]

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16592,16595,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.00,0.00,0.00,0.01
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [None]:
# stricter tolerance: differences below 0.01 million
vgsales[abs(partial_sales - vgsales['Global_Sales']) < 0.01]

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
5,6,Tetris,GB,1989.0,Puzzle,Nintendo,23.20,2.26,4.22,0.58,30.26
6,7,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.38,9.23,6.50,2.90,30.01
...,...,...,...,...,...,...,...,...,...,...,...
16592,16595,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.00,0.00,0.00,0.01
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [None]:
abs(partial_sales - vgsales['Global_Sales']).max()

0.020000000000000462

In [None]:
partial_sales = vgsales[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].sum(axis = 1)
abs(partial_sales - vgsales['Global_Sales']).max()

0.020000000000000462

In [None]:
# keep rows whose regional sum is within 0.1 million of Global_Sales
vgsales_clean = vgsales[abs(partial_sales - vgsales['Global_Sales']) < 0.1]
# keep only games with positive sales in at least one region
vgsales_clean = vgsales_clean[partial_sales.loc[vgsales_clean.index] > 0]
vgsales_clean

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16592,16595,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.00,0.00,0.00,0.01
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


## Splitting the dataset

From the original dataset, three different sets can be analyzed:

- The original dataset.

- Games that did not sell in some countries/continents.

- Games that sold globally (in every country/continent).
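A minimal sketch of the split on rows copied from outputs shown in this notebook, assuming sales are never negative: under that assumption, "zero in at least one region" (`.any`) and "positive in every region" (`.all`) are complementary, so each game lands in exactly one of the two subsets. `local` and `global_` are illustrative names.

```python
import pandas as pd

REGIONS = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

# Rows copied from outputs shown earlier in this notebook
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Just Dance 3', 'Final Fantasy XII', 'Tetris'],
    'NA_Sales': [41.49, 6.05, 1.88, 23.20],
    'EU_Sales': [29.02, 3.15, 0.00, 2.26],
    'JP_Sales': [3.77, 0.00, 2.33, 4.22],
    'Other_Sales': [8.46, 1.07, 1.74, 0.58],
})

zero_somewhere = (sample[REGIONS] == 0).any(axis=1)  # criterion used for vgsales_local
sold_everywhere = (sample[REGIONS] > 0).all(axis=1)  # criterion used for vgsales_global

local = sample[zero_somewhere]
global_ = sample[sold_everywhere]

# With non-negative sales the two masks are exact complements
assert (zero_somewhere == ~sold_everywhere).all()
print(local['Name'].tolist())    # ['Just Dance 3', 'Final Fantasy XII']
print(global_['Name'].tolist())  # ['Wii Sports', 'Tetris']
```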

In [None]:
# games with zero sales in at least one region (country/continent)
vgsales_local = vgsales_clean[(vgsales_clean[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']] == 0).any(axis = 1)]
vgsales_local.head(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
60,61,Just Dance 3,Wii,2011.0,Misc,Ubisoft,6.05,3.15,0.0,1.07,10.26
83,84,The Sims 3,PC,2009.0,Simulation,Electronic Arts,0.98,6.42,0.0,0.71,8.11
89,90,Pac-Man,2600,1982.0,Puzzle,Atari,7.28,0.45,0.0,0.08,7.81
98,99,Call of Duty: World at War,X360,2008.0,Shooter,Activision,4.79,1.9,0.0,0.69,7.37
102,103,Just Dance,Wii,2009.0,Misc,Ubisoft,3.51,3.03,0.0,0.73,7.27
111,112,Just Dance 4,Wii,2012.0,Misc,Ubisoft,4.14,2.21,0.0,0.56,6.91
117,118,Zumba Fitness,Wii,2010.0,Sports,505 Games,3.5,2.64,0.0,0.67,6.81
137,138,World of Warcraft,PC,2004.0,Role-Playing,Activision,0.07,6.21,0.0,0.0,6.28
147,148,Final Fantasy XII,PS2,2006.0,Role-Playing,Square Enix,1.88,0.0,2.33,1.74,5.95
150,151,LEGO Star Wars: The Complete Saga,Wii,2007.0,Action,LucasArts,3.66,1.63,0.0,0.53,5.83


In [None]:
# keep only games with positive sales in every region
vgsales_global = vgsales_clean[(vgsales_clean[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']] > 0).all(axis = 1)]
vgsales_global.head(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
5,6,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
6,7,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
7,8,Wii Play,Wii,2006.0,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
8,9,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
9,10,Duck Hunt,NES,1984.0,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31


In [None]:
vgsales_limpo = pd.read_csv('/content/drive/MyDrive/Datasets/vendas_videogame_limpo.csv')
vgsales_limpo

Unnamed: 0,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...
16286,Woody Woodpecker in Crazy Castle 5,GBA,2002,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16287,Men in Black II: Alien Escape,GC,2003,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16288,SCORE International Baja 1000: The Official Game,PS2,2008,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16289,Know How 2,DS,2010,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


# Exploratory data analysis

## Histogram of game platforms

Each bar represents a platform, and its height is the number of games released for that platform.

### Observations

- PS2 and DS are the platforms with the most games, with similar counts.
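With a categorical `x`, `px.histogram` counts rows per category, so the same ranking can be read as a table with `value_counts`. A sketch on the platforms of the first rows displayed above (on the full data, `vgsales['Platform'].value_counts().head()` would list the platforms with the most games):

```python
import pandas as pd

# Platforms of seven rows displayed earlier (Wii Sports, Super Mario Bros., Mario Kart Wii,
# Wii Sports Resort, Pokemon Red/Pokemon Blue, Tetris, New Super Mario Bros.)
platforms = pd.Series(['Wii', 'NES', 'Wii', 'Wii', 'GB', 'GB', 'DS'], name='Platform')

# Bar heights of the histogram: number of rows per platform
counts = platforms.value_counts()

# categoryorder='total ascending' sorts the bars from smallest to largest count
ascending = counts.sort_values(ascending=True)
print(ascending)
```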

In [None]:
# number of games per platform, ordered from least to most
fig = px.histogram(vgsales, x = 'Platform').update_xaxes(categoryorder = 'total ascending')
fig.show()

In [None]:
# total global sales (in millions) per platform
sales_per_plataform = vgsales.groupby('Platform')['Global_Sales'].sum()
fig = px.histogram(sales_per_plataform,
                   x = sales_per_plataform.index,
                   y = sales_per_plataform.values,
                   color_discrete_sequence = ['darkgreen']).update_xaxes(categoryorder = 'total ascending')
fig.show()

## Histogram of game release years

Each bar corresponds to a year, and its height is the number of games released in that year.

### Observations

- Game releases peaked between 2005 and 2011.
- Practically no games released after 2016 are recorded.
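The per-year charts in this section rely on two aggregates: the number of games released (a count) and the total global sales (a sum). A sketch computing both with one named aggregation, on rows copied from earlier outputs (`per_year`, `games_released` and `total_sales` are illustrative names):

```python
import pandas as pd

# Rows copied from outputs shown earlier in this notebook
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Mario Kart Wii', 'Wii Sports Resort',
             'New Super Mario Bros.', 'Plushees'],
    'Year': [2006.0, 2008.0, 2009.0, 2006.0, 2008.0],
    'Global_Sales': [82.74, 35.82, 33.00, 30.01, 0.01],
})

per_year = sample.groupby('Year').agg(
    games_released=('Name', 'size'),      # bar heights of the count histogram
    total_sales=('Global_Sales', 'sum'),  # bar heights of the sales histogram
)
# Year is stored as a float (2006.0); convert it to labels such as '2006'
per_year.index = per_year.index.astype(int).astype(str)
print(per_year)
```

Converting the years to strings makes Plotly treat each year as its own category, which is what the plotting cells below do with `map(int)` followed by `map(str)`.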

In [None]:
fig = px.histogram(vgsales, x='Year')
fig.show()

In [None]:
# total global sales per year; years become strings so that each one is a separate bar
sales_per_year = vgsales.groupby(by = 'Year')['Global_Sales'].sum()
sales_per_year.index = sales_per_year.index.map(int)
sales_per_year.index = sales_per_year.index.map(str)
fig = px.histogram(sales_per_year,
                   x = sales_per_year.index,
                   y = sales_per_year.values,
                   color_discrete_sequence = ['darkgreen'],
                   nbins = int(sales_per_year.count()))

fig.show()

In [None]:
sales_per_year = vgsales.groupby(by = 'Year')['Global_Sales'].sum()
sales_per_year.index = sales_per_year.index.map(int)
sales_per_year.index = sales_per_year.index.map(str)

fig = go.Figure()
fig.add_trace(go.Histogram(histfunc="count", y=vgsales['Year'], x=vgsales['Year'], name="Games released per year", marker_color='#5555FF'))
fig.add_trace(go.Histogram(histfunc="sum", y=sales_per_year.values, x=sales_per_year.index, name="Total sales per year", marker_color='#008800'))

fig.update_layout(
    title_text = 'Number of games released and total sales per year',
    xaxis_title_text = 'Ano',
    bargap = 0.2,
)

fig.show()

In [None]:
vgsales['Platform'].unique()

array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA',
       '3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC',
       'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16',
       '3DO', 'GG', 'PCFX'], dtype=object)

In [None]:
# total global sales per year for the PS3 and X360 platforms only
ps3_x360 = vgsales[vgsales['Platform'].isin(['PS3', 'X360'])]
sales_per_year = ps3_x360.groupby(by = ['Year', 'Platform'])['Global_Sales'].sum().reset_index()
sales_per_year['Year'] = sales_per_year['Year'].astype(int).astype(str)
fig = px.bar(sales_per_year,
             x = 'Year',
             y = 'Global_Sales',
             color = 'Platform',
             barmode = 'group')

fig.show()

In [None]:
vgsales[vgsales['Year'] > 2008]

Histogram of game genres: each bar is a genre, and its height is the number of games in that genre.

In [None]:
fig = px.histogram(vgsales, x='Genre').update_xaxes(categoryorder = 'total ascending')
fig.show()

Histogram of game publishers: each bar is a publisher, and its height is the number of games released by that publisher.

A few publishers released many games, while the vast majority released only a few.
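One way to quantify this concentration is the share of all games released by the k most prolific publishers. A minimal sketch on a hypothetical publisher column (`top_share` and the publisher names are illustrative); on the real data it could be called as `top_share(vgsales['Publisher'], 10)`.

```python
import pandas as pd

def top_share(labels: pd.Series, k: int) -> float:
    """Fraction of rows that belong to the k most frequent labels."""
    counts = labels.value_counts()  # sorted from most to least frequent
    return float(counts.head(k).sum() / counts.sum())

# Hypothetical publisher column: one large publisher and four small ones
publishers = pd.Series(['BigPub'] * 6 + ['Small1', 'Small2', 'Small3', 'Small4'])

print(top_share(publishers, k=1))  # 0.6
print(top_share(publishers, k=2))  # 0.7
```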

In [None]:
fig = px.histogram(vgsales, x='Publisher').update_xaxes(categoryorder = 'total ascending')
fig.show()

In [None]:
vgsales['Publisher'].value_counts()

Boxplot of the number of games per publisher: as seen above, a few publishers released many games, so in the boxplot they appear as outliers relative to the majority, which published only a few games.
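Points are drawn as outliers when they fall far outside the interquartile range. A sketch of Tukey's 1.5×IQR rule, the usual convention for box-plot whiskers; Plotly's exact quartile computation may differ slightly, and the counts below are hypothetical (`iqr_outliers` is an illustrative helper).

```python
import numpy as np

def iqr_outliers(values, whisker=1.5):
    """Return the values outside [Q1 - whisker*IQR, Q3 + whisker*IQR] (Tukey's rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation between ranks
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return values[(values < low) | (values > high)]

# Hypothetical games-per-publisher counts: most publish few games, one publishes many
games_per_publisher = [1, 1, 2, 2, 3, 3, 4, 5, 120]
outliers = iqr_outliers(games_per_publisher)
print(outliers)
```

Here Q1 = 2 and Q3 = 4, so anything above 7 games is flagged, and only the large publisher is.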

In [None]:
# number of games released by each publisher
games_per_publisher = vgsales['Publisher'].value_counts()
fig = px.box(x = games_per_publisher.values, labels = {'x': 'Games per publisher'})
fig.show()

In [None]:
vgsales['Global_Sales'].value_counts()

Boxplot of global sales: there are many outliers, with values far above those of most games.

This may indicate that only a few games sell a lot, while most are not as successful.
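A sketch that quantifies this: the share of total sales that comes from the best-selling fraction of games. The values are `Global_Sales` figures copied from rows shown above, mixed in a hypothetical proportion (five hits and five games with 0.01 million); `top_fraction_share` is an illustrative helper.

```python
import pandas as pd

def top_fraction_share(sales: pd.Series, fraction: float) -> float:
    """Share of total sales that comes from the top `fraction` of games."""
    n_top = max(1, int(len(sales) * fraction))  # at least one game
    return float(sales.nlargest(n_top).sum() / sales.sum())

# Global_Sales (millions) of five top sellers and five 0.01-million games shown above
sales = pd.Series([82.74, 40.24, 35.82, 33.00, 31.37, 0.01, 0.01, 0.01, 0.01, 0.01])

share = top_fraction_share(sales, 0.1)
print(round(share, 3))  # 0.371
```

On the real data, `top_fraction_share(vgsales['Global_Sales'], 0.01)` would give the share of sales from the top 1% of games.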

In [None]:
# distribution of the global sales of each game (in millions)
fig = px.box(vgsales, x = 'Global_Sales')
fig.show()

Histogram of global sales: most games sold very little, while a few sold a lot.

In [None]:
fig = px.histogram(vgsales, x='Global_Sales')
fig.show()