# Exploratory Analysis

This notebook presents the initial exploratory analysis of the Steam Games Dataset, available at:
https://www.kaggle.com/datasets/fronkongames/steam-games-dataset

To keep the project reproducible and organized, all files used in the analysis were stored beforehand in the `./datasets` directory.

---

## Environment Setup

The required libraries are loaded and a few settings are adjusted to make the data easier to process and inspect. In particular, the field-size limit of the `csv` module is raised so that very long fields do not cause a read error, and pandas is configured to display every DataFrame column during inspection.

In [1]:
%pip install pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
import csv
import sys
import pandas

# Raise the csv module's default field-size limit
csv.field_size_limit(sys.maxsize)

# Show all DataFrame columns when displaying
pandas.set_option('display.max_columns', None)

dataset_csv_path = "./datasets/games.csv"
dataset_csv_fixed_path = "./datasets/games_fixed.csv"
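
On platforms where a C `long` is 32 bits wide (notably Windows), `csv.field_size_limit(sys.maxsize)` can raise `OverflowError`. A minimal sketch of a portable alternative follows; `raise_csv_field_limit` is a hypothetical helper name, not part of the notebook.

```python
import csv
import sys


def raise_csv_field_limit() -> int:
    """Set the csv field-size limit as high as the platform accepts."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # The value does not fit in a C long here; retry with a smaller one.
            limit //= 2


applied_limit = raise_csv_field_limit()
print(applied_limit)
```

Where `sys.maxsize` is accepted directly, the loop exits on its first iteration and the behavior matches the cell above.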

## Fixing the Dataset

The original file (`games.csv`) has structural inconsistencies that prevent pandas from reading it directly. Specifically, the number of columns declared in the header does not match the number of fields in the data rows. This kind of problem is common in large, heterogeneous datasets produced by web scraping or automated integrations.

During the initial inspection, the header was found to contain a single column named `DiscountDLC count`, the result of a missing delimiter that fused two distinct attributes. The header was therefore corrected manually so that `discount` and `dlc_count` are separate columns.
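
A mismatch of this kind can be confirmed by comparing the header's field count with the field counts of the data rows. The sketch below runs on a small in-memory sample invented for illustration (its rows and reduced column set are assumptions, not the real file contents).

```python
import csv
import io
from collections import Counter

# Hypothetical miniature of the problem: the header fuses two column
# names, so it declares one field fewer than each data row carries.
sample = io.StringIO(
    'AppID,Name,Price,DiscountDLC count\n'
    '20200,"Galactic Bowling",19.99,0,0\n'
    '655370,"Train Bandit",0.99,0,0\n'
)

reader = csv.reader(sample)
header = next(reader)

# Tally how many fields each data row has.
field_counts = Counter(len(row) for row in reader)

print("header fields:", len(header))
print("row field counts:", dict(field_counts))
```

To run the same check on the real file, `sample` could be replaced with `open(dataset_csv_path, newline='', encoding='utf-8')`.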

In addition, the column names were normalized to a form better suited to processing in Python: all characters were converted to lowercase and spaces were replaced with underscores, following the `snake_case` convention. The code below does not handle accents explicitly, which is sufficient here because the original header contains only ASCII characters.
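
For headers that do contain accents or punctuation, a more general normalization could look like the sketch below. The helper name `to_snake_case` and the accented example string are illustrative assumptions; the notebook itself uses the simpler list comprehension in the next cell.

```python
import re
import unicodedata


def to_snake_case(name: str) -> str:
    """Lowercase a column name, strip accents, and join words with underscores."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^0-9a-z]+", "_", ascii_only.strip().lower()).strip("_")


print(to_snake_case("DLC count"))         # dlc_count
print(to_snake_case(" Média de preço "))  # media_de_preco
```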

In [3]:
# Print the first lines of the original file for inspection
with open(dataset_csv_path, 'r', encoding='utf-8') as f:
    for i in range(5):
        print(f.readline())

# Read the whole file into memory
with open(dataset_csv_path, 'r', encoding='utf-8') as f:
    linhas = f.readlines()

# Fix the header
header = linhas[0]

# Split the fused column name into its two columns
header = header.replace('DiscountDLC count', 'Discount,DLC count')

# Normalize the column names
colunas = header.strip().split(',')
colunas_normalizadas = [
    c.strip().lower().replace(" ", "_")
    for c in colunas
]

# Rebuild the corrected header
linhas[0] = ",".join(colunas_normalizadas) + "\n"

# Save the corrected file
with open(dataset_csv_fixed_path, 'w', encoding='utf-8') as f:
    f.writelines(linhas)

AppID,Name,Release date,Estimated owners,Peak CCU,Required age,Price,DiscountDLC count,About the game,Supported languages,Full audio languages,Reviews,Header image,Website,Support url,Support email,Windows,Mac,Linux,Metacritic score,Metacritic url,User score,Positive,Negative,Score rank,Achievements,Recommendations,Notes,Average playtime forever,Average playtime two weeks,Median playtime forever,Median playtime two weeks,Developers,Publishers,Categories,Genres,Tags,Screenshots,Movies

20200,"Galactic Bowling","Oct 21, 2008","0 - 20000",0,0,19.99,0,0,"Galactic Bowling is an exaggerated and stylized bowling game with an intergalactic twist. Players will engage in fast-paced single and multi-player competition while being submerged in a unique new universe filled with over-the-top humor, wild characters, unique levels, and addictive game play. The title is aimed at players of all ages and skill sets. Through accessible and intuitive controls and game-play, Galactic Bowling allows you to j
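
The cell above loads every line into memory before rewriting the file. If memory is a concern, one possible alternative is to rewrite only the header and stream the rest. The sketch below is an assumption-laden variant, not the notebook's method: `fix_header_streaming` is a hypothetical name, and the demo uses a tiny temporary file rather than the real dataset.

```python
import shutil
import tempfile
from pathlib import Path


def fix_header_streaming(src: Path, dst: Path) -> None:
    """Rewrite only the header line, then copy the remaining text unchanged."""
    with open(src, "r", encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        header = fin.readline().replace("DiscountDLC count", "Discount,DLC count")
        columns = [c.strip().lower().replace(" ", "_") for c in header.strip().split(",")]
        fout.write(",".join(columns) + "\n")
        # Stream the data rows in chunks instead of holding them all in memory.
        shutil.copyfileobj(fin, fout)


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "games.csv"
    dst = Path(tmp) / "games_fixed.csv"
    src.write_text("AppID,Price,DiscountDLC count\n10,9.99,0,0\n", encoding="utf-8")
    fix_header_streaming(src, dst)
    fixed_text = dst.read_text(encoding="utf-8")

print(fixed_text)
```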

## Loading the Dataset

After the structural fix, the dataset is loaded into pandas to begin the exploratory analysis. The `engine="python"` parameter selects pandas' Python parsing engine, which is slower but more tolerant of irregular CSV files, and `encoding="utf-8-sig"` decodes the file as UTF-8 while discarding a leading byte order mark (BOM), if one is present.

In [4]:
games_dataset = pandas.read_csv(
  dataset_csv_fixed_path,
  sep=",",
  quotechar='"',
  quoting=csv.QUOTE_MINIMAL,
  engine="python",
  encoding="utf-8-sig",
)
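
The effect of the `utf-8-sig` codec can be seen in isolation with plain byte strings. The sample bytes below are made up for illustration; the notebook does not state whether `games.csv` actually begins with a BOM.

```python
import codecs

# Bytes of a CSV header as some tools save it: UTF-8 with a leading BOM.
raw = codecs.BOM_UTF8 + "appid,name\n".encode("utf-8")

as_utf8 = raw.decode("utf-8")          # keeps the BOM as U+FEFF
as_utf8_sig = raw.decode("utf-8-sig")  # drops the BOM if present

print(repr(as_utf8.split(",")[0]))      # '\ufeffappid'
print(repr(as_utf8_sig.split(",")[0]))  # 'appid'
```

Because `utf-8-sig` also decodes files that have no BOM, choosing it is harmless when the presence of a BOM is unknown.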

## Initial Analysis

In [5]:
games_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111452 entries, 0 to 111451
Data columns (total 40 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   appid                       111452 non-null  int64  
 1   name                        111446 non-null  object 
 2   release_date                111452 non-null  object 
 3   estimated_owners            111452 non-null  object 
 4   peak_ccu                    111452 non-null  int64  
 5   required_age                111452 non-null  int64  
 6   price                       111452 non-null  float64
 7   discount                    111452 non-null  int64  
 8   dlc_count                   111452 non-null  int64  
 9   about_the_game              104969 non-null  object 
 10  supported_languages         111452 non-null  object 
 11  full_audio_languages        111452 non-null  object 
 12  reviews                     10624 non-null   object 
 13  header_image  

In [6]:
games_dataset.describe()

Unnamed: 0,appid,peak_ccu,required_age,price,discount,dlc_count,metacritic_score,user_score,positive,negative,score_rank,achievements,recommendations,average_playtime_forever,average_playtime_two_weeks,median_playtime_forever,median_playtime_two_weeks
count,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0,44.0,111452.0,111452.0,111452.0,111452.0,111452.0,111452.0
mean,1716972.0,177.7215,0.254208,7.061568,0.464209,0.44953,2.623354,0.030408,754.3525,125.859177,98.909091,17.511144,616.3715,81.24729,9.174954,72.65133,9.891038
std,920385.9,8390.462,2.035653,12.563246,3.503658,12.006677,13.736245,1.565136,21394.1,4002.844431,0.857747,150.139008,15738.54,999.935906,168.20103,1321.333137,183.232812
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,936255.0,0.0,0.0,0.99,0.0,0.0,0.0,0.0,0.0,0.0,98.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1665065.0,0.0,0.0,3.99,0.0,0.0,0.0,0.0,3.0,1.0,99.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2453585.0,1.0,0.0,9.99,0.0,0.0,0.0,0.0,29.0,8.0,100.0,17.0,0.0,0.0,0.0,0.0,0.0
max,3671840.0,1311366.0,21.0,999.98,92.0,2366.0,97.0,100.0,5764420.0,895978.0,100.0,9821.0,3441592.0,145727.0,19159.0,208473.0,19159.0


In [7]:
games_dataset.shape

(111452, 40)

In [8]:
games_dataset.head()

Unnamed: 0,appid,name,release_date,estimated_owners,peak_ccu,required_age,price,discount,dlc_count,about_the_game,supported_languages,full_audio_languages,reviews,header_image,website,support_url,support_email,windows,mac,linux,metacritic_score,metacritic_url,user_score,positive,negative,score_rank,achievements,recommendations,notes,average_playtime_forever,average_playtime_two_weeks,median_playtime_forever,median_playtime_two_weeks,developers,publishers,categories,genres,tags,screenshots,movies
0,20200,Galactic Bowling,"Oct 21, 2008",0 - 20000,0,0,19.99,0,0,Galactic Bowling is an exaggerated and stylize...,['English'],[],,https://cdn.akamai.steamstatic.com/steam/apps/...,http://www.galacticbowling.net,,,True,False,False,0,,0,6,11,,30,0,,0,0,0,0,Perpetual FX Creative,Perpetual FX Creative,"Single-player,Multi-player,Steam Achievements,...","Casual,Indie,Sports","Indie,Casual,Sports,Bowling",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
1,655370,Train Bandit,"Oct 12, 2017",0 - 20000,0,0,0.99,0,0,THE LAW!! Looks to be a showdown atop a train....,"['English', 'French', 'Italian', 'German', 'Sp...",[],,https://cdn.akamai.steamstatic.com/steam/apps/...,http://trainbandit.com,,support@rustymoyher.com,True,True,False,0,,0,53,5,,12,0,,0,0,0,0,Rusty Moyher,Wild Rooster,"Single-player,Steam Achievements,Full controll...","Action,Indie","Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
2,1732930,Jolt Project,"Nov 17, 2021",0 - 20000,0,0,4.99,0,0,Jolt Project: The army now has a new robotics ...,"['English', 'Portuguese - Brazil']",[],,https://cdn.akamai.steamstatic.com/steam/apps/...,,,ramoncampiaof31@gmail.com,True,False,False,0,,0,0,0,,0,0,,0,0,0,0,Campião Games,Campião Games,Single-player,"Action,Adventure,Indie,Strategy",,https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
3,1355720,Henosis™,"Jul 23, 2020",0 - 20000,0,0,5.99,0,0,HENOSIS™ is a mysterious 2D Platform Puzzler w...,"['English', 'French', 'Italian', 'German', 'Sp...",[],,https://cdn.akamai.steamstatic.com/steam/apps/...,https://henosisgame.com/,https://henosisgame.com/,info@henosisgame.com,True,True,True,0,,0,3,0,,0,0,,0,0,0,0,Odd Critter Games,Odd Critter Games,"Single-player,Full controller support","Adventure,Casual,Indie","2D Platformer,Atmospheric,Surreal,Mystery,Puzz...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
4,1139950,Two Weeks in Painland,"Feb 3, 2020",0 - 20000,0,0,0.0,0,0,ABOUT THE GAME Play as a hacker who has arrang...,"['English', 'Spanish - Spain']",[],,https://cdn.akamai.steamstatic.com/steam/apps/...,https://www.unusual-games.com/home/,https://www.unusual-games.com/contact/,welistentoyou@unusual-games.com,True,True,False,0,,0,50,8,,17,0,This Game may contain content not appropriate ...,0,0,0,0,Unusual Games,Unusual Games,"Single-player,Steam Achievements","Adventure,Indie","Indie,Adventure,Nudity,Violent,Sexual Content,...",https://cdn.akamai.steamstatic.com/steam/apps/...,http://cdn.akamai.steamstatic.com/steam/apps/2...
