### Additional CSV

In this exploratory data analysis (EDA), we face a common challenge in our data: columns that pack several values into a single field as a comma-separated list. We use data transformation and cleaning techniques to split these fields into one row per item, which lets us explore the distribution and frequency of the items within these lists in greater detail.
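As a minimal sketch of the split-and-expand idea described above, the snippet below applies it to a tiny made-up table. The values and the `Genre` column name are assumptions for illustration only; they do not come from the real dataset.

```python
import pandas as pd

# Hypothetical toy data: the column names mimic the Steam dataset, the values are made up.
toy = pd.DataFrame({
    "AppID": [1, 2],
    "Genres": ["Action, Indie", "RPG"],
})

# Split each comma-separated string into a list of items.
toy["Genre"] = toy["Genres"].str.split(",")

# Give every list item its own row; the other columns are repeated.
exploded = toy.explode("Genre")

# Remove the whitespace left around each item by the split.
exploded["Genre"] = exploded["Genre"].str.strip()

print(exploded[["AppID", "Genre"]].to_string(index=False))
```

A game with two genres ends up on two rows and a game with one genre stays on one row, so counting rows per genre gives the number of games that carry it.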

Libraries and modules

In [3]:
import pandas as pd
import os
from dotenv import load_dotenv

Set the working environment

In [None]:
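# Load variables from a local .env file into the process environment
load_dotenv()

# Base directory used below to build the output paths
work_dir = os.getenv("WORK_DIR")

# Load the source dataset
file_path = "../data/final_games.csv"
df = pd.read_csv(file_path, encoding='utf-8')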
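`os.getenv` returns `None` when `WORK_DIR` is not defined, and the export paths built later would then start with the literal text `None`. The sketch below shows one possible guard. The helper name `resolve_work_dir` is hypothetical and not part of the original notebook.

```python
import os
from pathlib import Path


def resolve_work_dir(env_var: str = "WORK_DIR") -> Path:
    """Return the working directory named by env_var, or fail with a clear message.

    Hypothetical helper: the notebook itself reads the variable with a bare os.getenv call.
    """
    value = os.getenv(env_var)
    if not value:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; define it in the .env file."
        )
    return Path(value)


# Possible usage in the notebook, after load_dotenv() has run:
# work_dir = resolve_work_dir()
```

Failing early keeps a missing variable from surfacing later as a confusing file-not-found error at export time.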


### Supported Languages

The cell below prepares our Steam game dataset for language analysis: it splits each game's comma-separated list of supported languages into one row per language, then records how often each language appears across the dataset.

In [4]:
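# Split each comma-separated string into a list of languages
df['SupportedLanguagesSepareted'] = df['SupportedLanguages'].str.split(',')

# Expand each list so that every language gets its own row
df_languages = df.explode('SupportedLanguagesSepareted')

# Remove the whitespace left around each language by the split
df_languages['SupportedLanguagesSepareted'] = df_languages['SupportedLanguagesSepareted'].str.strip()

# Attach to every row the number of times its language appears overall
df_languages['LanguageFrequency'] = df_languages['SupportedLanguagesSepareted'].map(df_languages['SupportedLanguagesSepareted'].value_counts())

# Keep only the columns needed for the export
view = ['AppID', 'Name', 'SupportedLanguages', 'SupportedLanguagesSepareted', 'LanguageFrequency']

df_languages[view].to_csv(f'{work_dir}/data/SupportedLanguages.csv', index=False, encoding='utf-8')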
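The cell above repeats the frequency on every exploded row. For a quick look at the distribution, a compact table with one row per language can be easier to read. The sketch below builds such a table on made-up data. The output column names `Language` and `Games` are assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical toy data: the values are made up for illustration.
toy = pd.DataFrame({
    "AppID": [10, 20, 30],
    "SupportedLanguages": ["English, Spanish", "English", "English,French"],
})

# Split, expand to one language per row, and trim whitespace in a single chain.
languages = toy["SupportedLanguages"].str.split(",").explode().str.strip()

# Count occurrences; value_counts sorts from most to least frequent by default.
summary = (
    languages.value_counts()
    .rename_axis("Language")
    .reset_index(name="Games")
)

print(summary.to_string(index=False))
```

Because `value_counts` skips missing values by default, games with no language information would simply not contribute to the counts.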

### Categories

This cell explores the categories assigned to games in the Steam dataset. We split each comma-separated category string into a list and expand the list into separate rows, which gives a detailed view of the category distribution.

In [5]:
df['CategoriesSepareted'] = df['Categories'].str.split(',')

df_categories = df.explode('CategoriesSepareted')

df_categories['CategoriesSepareted'] = df_categories['CategoriesSepareted'].str.strip()

view = ['AppID', 'Name', 'Categories', 'CategoriesSepareted']

df_categories[view].to_csv(f'{work_dir}/data/Categories.csv', index=False, encoding='utf-8')

### Genres

This cell explores the genres assigned to games in the Steam dataset. We split each comma-separated genre string into a list and expand the list into separate rows, which gives a more detailed view of the genre distribution.

In [6]:
df['GenresSepareted'] = df['Genres'].str.split(',')

df_genres = df.explode('GenresSepareted')

df_genres['GenresSepareted'] = df_genres['GenresSepareted'].str.strip()

view = ['AppID', 'Name', 'Genres', 'GenresSepareted']

df_genres[view].to_csv(f'{work_dir}/data/Genres.csv', index=False, encoding='utf-8')

### Tags

This cell explores the tags assigned to games in the Steam dataset. We split each comma-separated tag string into a list and expand the list into separate rows, which gives a more detailed view of how tags are distributed across games on the Steam platform.

In [None]:
df['TagsSepareted'] = df['Tags'].str.split(',')

df_tags = df.explode('TagsSepareted')

df_tags['TagsSepareted'] = df_tags['TagsSepareted'].str.strip()

view = ['AppID', 'Name', 'Tags', 'TagsSepareted']

df_tags[view].to_csv(f'{work_dir}/data/Tags.csv', index=False, encoding='utf-8')
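The four sections above apply the same three steps (split, explode, strip) to different columns. As a sketch of how that repetition could be factored out, the hypothetical helper `explode_list_column` below wraps the steps in one function. It is not part of the original notebook, and the toy data is made up.

```python
import pandas as pd


def explode_list_column(df: pd.DataFrame, column: str, new_column: str, sep: str = ",") -> pd.DataFrame:
    """Return a copy of df with one row per item found in df[column].

    Hypothetical helper: it mirrors the split / explode / strip steps used in the cells above.
    """
    out = df.copy()                                # leave the caller's frame untouched
    out[new_column] = out[column].str.split(sep)   # string -> list of items
    out = out.explode(new_column)                  # one row per list item
    out[new_column] = out[new_column].str.strip()  # trim whitespace around each item
    return out


# Made-up example data to show the call.
toy = pd.DataFrame({"AppID": [1, 2], "Tags": ["Indie, Action ,2D", "Puzzle"]})
result = explode_list_column(toy, "Tags", "Tag")
print(result[["AppID", "Tag"]].to_string(index=False))
```

Working on a copy also avoids adding helper columns to the shared `df`, which the cells above do as a side effect.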