# Exploratory analysis

This is an exploratory analysis of the Steam Games Dataset, which is available here: https://www.kaggle.com/datasets/fronkongames/steam-games-dataset

Before running the analysis, place the dataset files in the `datasets` folder next to this notebook (the code reads them from `./datasets`).

## Setting up the environment

In [15]:
import csv
import sys
import numpy
import pandas
import sqlalchemy

In [16]:
# Remove the csv field size limit
csv.field_size_limit(sys.maxsize)

9223372036854775807
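
On some platforms (notably Windows builds where a C `long` is 32 bits), `csv.field_size_limit(sys.maxsize)` can raise `OverflowError`. A minimal sketch of a more portable variant follows; the helper name `raise_csv_field_limit` is hypothetical and not part of the original notebook. It halves the value until the platform accepts it:

```python
import csv
import sys


def raise_csv_field_limit(start: int = sys.maxsize) -> int:
    """Set the largest csv field size limit the platform accepts.

    Hypothetical helper, not part of the original notebook.
    """
    limit = start
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long on this platform; try a smaller one.
            limit //= 2


applied_limit = raise_csv_field_limit()
print(applied_limit)
```

Calling `csv.field_size_limit()` with no argument afterwards returns the limit currently in effect, which is a quick way to confirm the change.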

In [17]:
dataset_csv_path = "./datasets/games.csv"
dataset_csv_fixed_path = "./datasets/games_fixed.csv"
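
Since the analysis depends on the CSV being downloaded by hand, it may help to fail early with a clear message when the file is missing. A small sketch, assuming a hypothetical helper named `require_file` (not part of the original notebook):

```python
from pathlib import Path


def require_file(path: str) -> Path:
    """Return `path` as a Path, or raise if the file is missing.

    Hypothetical helper, not part of the original notebook.
    """
    resolved = Path(path)
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {resolved}; download it from Kaggle first."
        )
    return resolved


# Intended use in the notebook (left commented so the sketch runs anywhere):
# require_file(dataset_csv_path)
```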


## Preparing the datasets

The `games.csv` file is malformed, so it has to be fixed before it can be loaded into pandas. The problem is that the header has a different number of columns than the data rows: a comma is missing between `Discount` and `DLC count`. To fix this, I applied the following correction:

In [18]:
# Read the file
with open(dataset_csv_path, 'r', encoding='utf-8') as f:
    linhas = f.readlines()

# Fix the header (insert the missing comma)
linhas[0] = linhas[0].replace('DiscountDLC count', 'Discount,DLC count')

# Save the fixed file
with open(dataset_csv_fixed_path, 'w', encoding='utf-8') as f:
    f.writelines(linhas)

print("✅ Fixed file saved as: games_fixed.csv")

✅ Fixed file saved as: games_fixed.csv
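
One way to confirm the mismatch described above, and that the fix removes it, is to compare the number of header fields with the number of fields in the first data row. The sketch below runs on a tiny made-up sample rather than on `games.csv`, and the helper name `header_and_row_widths` is hypothetical:

```python
import csv
import io


def header_and_row_widths(text: str) -> tuple[int, int]:
    """Return (number of header fields, number of fields in the first data row)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    first_row = next(reader)
    return len(header), len(first_row)


# Tiny made-up sample that mimics the missing comma in the header.
broken = 'AppID,Name,Price,DiscountDLC count\n10,"Foo, Bar",1.99,0,2\n'
fixed = broken.replace("DiscountDLC count", "Discount,DLC count", 1)

print(header_and_row_widths(broken))  # (4, 5)
print(header_and_row_widths(fixed))   # (5, 5)
```

Using `csv.reader` instead of a plain `split(",")` keeps quoted fields that contain commas, such as `"Foo, Bar"`, as a single field.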


Loading the fixed dataset into pandas:

In [19]:
games_dataset = pandas.read_csv(
  dataset_csv_fixed_path,
  sep=",",
  quotechar='"',
  quoting=csv.QUOTE_MINIMAL,
  engine="python",
  encoding="utf-8-sig",
)
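
The `utf-8-sig` codec used above differs from plain `utf-8` only when the input starts with a byte order mark (BOM). Whether `games.csv` actually contains one is an assumption here; the sketch below only illustrates the difference between the two codecs on made-up bytes:

```python
# Made-up header bytes prefixed with a UTF-8 byte order mark (BOM).
raw = b"\xef\xbb\xbfAppID,Name\n"

plain = raw.decode("utf-8")      # BOM survives as U+FEFF at the start
clean = raw.decode("utf-8-sig")  # BOM is stripped

print(repr(plain.split(",")[0]))  # '\ufeffAppID'
print(repr(clean.split(",")[0]))  # 'AppID'
```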

## Initial analysis

In [20]:
games_dataset.info() # Check dtypes and null counts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111452 entries, 0 to 111451
Data columns (total 40 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   AppID                       111452 non-null  int64  
 1   Name                        111446 non-null  object 
 2   Release date                111452 non-null  object 
 3   Estimated owners            111452 non-null  object 
 4   Peak CCU                    111452 non-null  int64  
 5   Required age                111452 non-null  int64  
 6   Price                       111452 non-null  float64
 7   Discount                    111452 non-null  int64  
 8   DLC count                   111452 non-null  int64  
 9   About the game              104969 non-null  object 
 10  Supported languages         111452 non-null  object 
 11  Full audio languages        111452 non-null  object 
 12  Reviews                     10624 non-null   object 
 13  Header image  

We can see that the table has 111452 rows and 40 columns.
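
The `info()` output also shows that some columns are sparsely populated (for example, `Reviews` has 10624 non-null values out of 111452 rows). A minimal sketch for ranking columns by their share of missing values, using a tiny made-up frame in place of `games_dataset`:

```python
import pandas

# Tiny made-up frame standing in for `games_dataset`.
sample = pandas.DataFrame(
    {
        "AppID": [1, 2, 3, 4],
        "Name": ["a", "b", None, "d"],
        "Reviews": [None, None, None, "great"],
    }
)

# Fraction of missing values per column, most sparse first.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to the real frame, the same expression would be `games_dataset.isna().mean().sort_values(ascending=False)`.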

In [21]:
games_dataset.head() # View the first rows

,AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,Discount,DLC count,About the game,...,Average playtime two weeks,Median playtime forever,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,0,Galactic Bowling is an exaggerated and stylize...,...,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,0,THE LAW!! Looks to be a showdown atop a train....,...,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,0,Jolt Project: The army now has a new robotics ...,...,0,0,0,Campião Games,Campião Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
3,1355720,Henosis™,"Jul 23, 2020",0 - 20000,0,0,5.99,0,0,HENOSIS™ is a mysterious 2D Platform Puzzler w...,...,0,0,0,Odd Critter Games,Odd Critter Games,"Single-player,Full controller support","Adventure,Casual,Indie","2D Platformer,Atmospheric,Surreal,Mystery,Puzz...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
4,1139950,Two Weeks in Painland,"Feb 3, 2020",0 - 20000,0,0,0.0,0,0,ABOUT THE GAME Play as a hacker who has arrang...,...,0,0,0,Unusual Games,Unusual Games,"Single-player,Steam Achievements","Adventure,Indie","Indie,Adventure,Nudity,Violent,Sexual Content,...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
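
In the rows above, `Release date` and `Estimated owners` are plain strings (`"Oct 21, 2008"`, `"0 - 20000"`), so they need parsing before they can be used numerically or chronologically. A minimal sketch follows, with hypothetical helper names. It assumes every value follows the shapes visible here, which has not been verified against the full dataset, and it relies on English month abbreviations in the active locale:

```python
from datetime import date, datetime


def parse_release_date(text: str) -> date:
    """Parse dates shaped like 'Oct 21, 2008' (assumes English month names)."""
    return datetime.strptime(text, "%b %d, %Y").date()


def parse_owner_range(text: str) -> tuple[int, int]:
    """Split a range shaped like '0 - 20000' into (lower, upper)."""
    lower, upper = text.split(" - ")
    return int(lower), int(upper)


print(parse_release_date("Oct 21, 2008"))  # 2008-10-21
print(parse_owner_range("0 - 20000"))      # (0, 20000)
```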


In [22]:
games_dataset.describe() # Basic statistics

,AppID,Peak CCU,Required age,Price,Discount,DLC count,Metacritic score,User score,Positive,Negative,Score rank,Achievements,Recommendations,Average playtime forever,Average playtime two weeks,Median playtime forever,Median playtime two weeks
count,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,44.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0
mean,1716972.0,177.7215,0.254208,7.061568,0.464209,0.44953,2.623354,0.030408,754.3525,125.859177,98.909091,17.511144,616.3715,81.24729,9.174954,72.65133,9.891038
std,920385.9,8390.462,2.035653,12.563246,3.503658,12.006677,13.736245,1.565136,21394.1,4002.844431,0.857747,150.139008,15738.54,999.935906,168.20103,1321.333137,183.232812
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,936255.0,0.0,0.0,0.99,0.0,0.0,0.0,0.0,0.0,0.0,98.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1665065.0,0.0,0.0,3.99,0.0,0.0,0.0,0.0,3.0,1.0,99.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2453585.0,1.0,0.0,9.99,0.0,0.0,0.0,0.0,29.0,8.0,100.0,17.0,0.0,0.0,0.0,0.0,0.0
max,3671840.0,1311366.0,21.0,999.98,92.0,2366.0,97.0,100.0,5764420.0,895978.0,100.0,9821.0,3441592.0,145727.0,19159.0,208473.0,19159.0
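
Several columns in this summary are 0 up to the 75th percentile (`Metacritic score`, for instance), which pulls their means toward zero. If 0 stands for "no score available", which is an assumption and not something the dataset description here confirms, masking the zeros before aggregating gives a different picture. A sketch on made-up values:

```python
import pandas

# Made-up scores; zeros are treated as "missing" for illustration only.
scores = pandas.Series([0, 0, 0, 97, 80], name="Metacritic score")

raw_mean = scores.mean()

# Replace zeros with NaN so they are skipped by the aggregation.
masked = scores.where(scores > 0)
masked_mean = masked.mean()

print(raw_mean, masked_mean)
```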


In [23]:
games_dataset.shape # Dimensions

(111452, 40)

In [24]:
games_dataset.columns.tolist() # List of columns

['AppID',
 'Name',
 'Release date',
 'Estimated owners',
 'Peak CCU',
 'Required age',
 'Price',
 'Discount',
 'DLC count',
 'About the game',
 'Supported languages',
 'Full audio languages',
 'Reviews',
 'Header image',
 'Website',
 'Support url',
 'Support email',
 'Windows',
 'Mac',
 'Linux',
 'Metacritic score',
 'Metacritic url',
 'User score',
 'Positive',
 'Negative',
 'Score rank',
 'Achievements',
 'Recommendations',
 'Notes',
 'Average playtime forever',
 'Average playtime two weeks',
 'Median playtime forever',
 'Median playtime two weeks',
 'Developers',
 'Publishers',
 'Categories',
 'Genres',
 'Tags',
 'Screenshots',
 'Movies']
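
Columns such as `Genres`, `Categories`, and `Tags` pack several values into one comma-separated string, as the `head()` output shows. A minimal sketch for counting individual values, using a small frame built from three of the `Genres` values shown earlier rather than the full dataset:

```python
import pandas

# Three `Genres` values copied from the head() output.
sample = pandas.DataFrame(
    {"Genres": ["Casual,Indie,Sports", "Action,Indie", "Adventure,Casual,Indie"]}
)

genre_counts = (
    sample["Genres"]
    .str.split(",")   # one list of genres per game
    .explode()        # one row per (game, genre) pair
    .str.strip()      # guard against stray whitespace
    .value_counts()   # missing values are dropped by default
)
print(genre_counts)
```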