Exploratory Data Analysis (EDA) Process


Exploratory Data Analysis (EDA) is a key step in developing a game recommendation system for Steam. It makes it possible to:

* Understand the game and user data.
* Identify patterns and trends.
* Gather the information needed to build the recommendation model.

Key steps:

* Data collection and understanding: gather and explore the relevant datasets.
* Descriptive analysis: describe and visualize the characteristics of the variables.
* Relationship identification: find user and game segments, purchase patterns, and similarity between games.
* Knowledge discovery: summarize findings and generate hypotheses.
* Communication of results: document and communicate insights to stakeholders.

EDA is iterative and should adapt to the needs of the project.

---

**Syncing with Drive**

---

In [None]:
from google.colab import drive
#drive.mount('/content/drive', force_remount=True)
carpeta_compartida_path = '/content/drive/MyDrive/CarpetaCompartida'

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


---

**Import Libraries**

---

In [None]:
import pandas as pd
import numpy as np
import ast
import json
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

---

**Function to load the JSON files**

---

In [None]:
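def load_json_lines(file_path):
    """Load a file with one record per line into a DataFrame."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            # Parse each line as a Python literal (dict) rather than strict JSON
            data.append(ast.literal_eval(line))
    return pd.DataFrame(data)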
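
The loader parses each line with `ast.literal_eval` rather than `json.loads`, which suggests the files hold Python-style dict literals (single quotes, `True`/`False`) rather than strict JSON. The following is a minimal sketch of that difference. The sample line is made up for illustration and is not taken from the dataset.

```python
import ast
import json

# Hypothetical line in the style of the reviews file: single-quoted keys and a
# Python boolean, which is valid Python syntax but not valid JSON.
sample_line = "{'user_id': 'example_user', 'reviews': [{'item_id': '1250', 'recommend': True}]}"

try:
    json.loads(sample_line)
    parsed_as_json = True
except json.JSONDecodeError:
    # Strict JSON requires double-quoted strings and lowercase true/false.
    parsed_as_json = False

# literal_eval only evaluates Python literals, so it does not execute code.
record = ast.literal_eval(sample_line)

print(parsed_as_json)                        # False
print(record["reviews"][0]["recommend"])     # True
```

If the files were strict JSON Lines, `pd.read_json(path, lines=True)` would be enough, as it is for `output_steam_games.json` later in the notebook.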

---

**EDA:** *australian_user_reviews*

---




In [None]:
# Load and display the file
df_reviews = load_json_lines('/content/drive/MyDrive/P_I_1/dateset/australian_user_reviews.json')
df_reviews

Unnamed: 0,user_id,user_url,reviews
0,76561197970982479,http://steamcommunity.com/profiles/76561197970...,"[{'funny': '', 'posted': 'Posted November 5, 2..."
1,js41637,http://steamcommunity.com/id/js41637,"[{'funny': '', 'posted': 'Posted June 24, 2014..."
2,evcentric,http://steamcommunity.com/id/evcentric,"[{'funny': '', 'posted': 'Posted February 3.',..."
3,doctr,http://steamcommunity.com/id/doctr,"[{'funny': '', 'posted': 'Posted October 14, 2..."
4,maplemage,http://steamcommunity.com/id/maplemage,"[{'funny': '3 people found this review funny',..."
...,...,...,...
25794,76561198306599751,http://steamcommunity.com/profiles/76561198306...,"[{'funny': '', 'posted': 'Posted May 31.', 'la..."
25795,Ghoustik,http://steamcommunity.com/id/Ghoustik,"[{'funny': '', 'posted': 'Posted June 17.', 'l..."
25796,76561198310819422,http://steamcommunity.com/profiles/76561198310...,"[{'funny': '1 person found this review funny',..."
25797,76561198312638244,http://steamcommunity.com/profiles/76561198312...,"[{'funny': '', 'posted': 'Posted July 21.', 'l..."


In [None]:
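# Explore the DataFrame. Note: `info` is referenced without parentheses, so the
# output is the bound method's repr (the frame itself); call df_reviews.info() for the summary.
df_reviews.info  # The reviews column turns out to be nested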

<bound method DataFrame.info of                  user_id                                           user_url  \
0      76561197970982479  http://steamcommunity.com/profiles/76561197970...   
1                js41637               http://steamcommunity.com/id/js41637   
2              evcentric             http://steamcommunity.com/id/evcentric   
3                  doctr                 http://steamcommunity.com/id/doctr   
4              maplemage             http://steamcommunity.com/id/maplemage   
...                  ...                                                ...   
25794  76561198306599751  http://steamcommunity.com/profiles/76561198306...   
25795           Ghoustik              http://steamcommunity.com/id/Ghoustik   
25796  76561198310819422  http://steamcommunity.com/profiles/76561198310...   
25797  76561198312638244  http://steamcommunity.com/profiles/76561198312...   
25798        LydiaMorley           http://steamcommunity.com/id/LydiaMorley   

                   

In [None]:
# Expand the nested reviews column (one column per review position)
data_reviews = pd.json_normalize(df_reviews.reviews)
data_reviews.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,"{'funny': '', 'posted': 'Posted November 5, 20...","{'funny': '', 'posted': 'Posted July 15, 2011....","{'funny': '', 'posted': 'Posted April 21, 2011...",,,,,,,
1,"{'funny': '', 'posted': 'Posted June 24, 2014....","{'funny': '', 'posted': 'Posted September 8, 2...","{'funny': '', 'posted': 'Posted November 29, 2...",,,,,,,
2,"{'funny': '', 'posted': 'Posted February 3.', ...","{'funny': '', 'posted': 'Posted December 4, 20...","{'funny': '', 'posted': 'Posted November 3, 20...","{'funny': '', 'posted': 'Posted October 15, 20...","{'funny': '', 'posted': 'Posted October 15, 20...","{'funny': '', 'posted': 'Posted October 15, 20...",,,,
3,"{'funny': '', 'posted': 'Posted October 14, 20...","{'funny': '', 'posted': 'Posted July 28, 2012....","{'funny': '', 'posted': 'Posted June 2, 2012.'...","{'funny': '', 'posted': 'Posted June 29, 2014....","{'funny': '', 'posted': 'Posted November 22, 2...","{'funny': '', 'posted': 'Posted February 23, 2...",,,,
4,"{'funny': '3 people found this review funny', ...","{'funny': '1 person found this review funny', ...","{'funny': '2 people found this review funny', ...","{'funny': '', 'posted': 'Posted July 11, 2013....",,,,,,
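
Each cell of `reviews` holds a list of dicts, so expanding it yields one column per list position, and users with fewer reviews get empty cells in the later columns. The toy sketch below reproduces that shape with the plain `pd.DataFrame` constructor (used here as a stand-in for illustration, not the call from the notebook). All values are invented.

```python
import pandas as pd

# Toy stand-in for df_reviews.reviews: lists of review dicts of different lengths.
toy_reviews = pd.Series(
    [
        [{"item_id": "10"}, {"item_id": "20"}],
        [{"item_id": "30"}],
        [],  # user without reviews
    ]
)

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(toy_reviews.tolist())

print(wide.shape)
print(wide.isnull().sum().tolist())
```

This is why the null counts grow from column 0 to column 9 in the cells that follow: column `k` is filled only for users with more than `k` reviews.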


In [None]:
# Show the non-null counts of the DataFrame
data_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25799 entries, 0 to 25798
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       25771 non-null  object
 1   1       12106 non-null  object
 2   2       7425 non-null   object
 3   3       4864 non-null   object
 4   4       3335 non-null   object
 5   5       2331 non-null   object
 6   6       1585 non-null   object
 7   7       1022 non-null   object
 8   8       597 non-null    object
 9   9       269 non-null    object
dtypes: object(10)
memory usage: 2.0+ MB


In [None]:
# Show the percentage of null values per column
porcentajes_nulos = (data_reviews.isnull().mean() * 100).round(2)
porcentajes_nulos

0     0.11
1    53.08
2    71.22
3    81.15
4    87.07
5    90.96
6    93.86
7    96.04
8    97.69
9    98.96
dtype: float64
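
The percentage works because `isnull()` returns booleans and the mean of a boolean column is the fraction of `True` values. For column `0`, 25,771 of 25,799 rows are non-null, so 28 are missing, which is about 0.11%. A small sketch on invented data:

```python
import pandas as pd

# Toy frame with known gaps: column "a" is half empty, column "b" a quarter empty.
toy = pd.DataFrame({"a": [1, None, 3, None], "b": ["x", "y", "z", None]})

# Mean of the boolean mask = share of nulls; scale to a percentage and round.
null_pct = (toy.isnull().mean() * 100).round(2)

print(null_pct)
```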

In [None]:
# Disabled: drop every column whose share of null values exceeds 85%
# columnas_a_eliminar = porcentajes_nulos[porcentajes_nulos > 85].index
# data_reviews_elim_col = data_reviews.drop(columns=columnas_a_eliminar)
# data_reviews_elim_col

In [None]:
# Explore column 0 of the DataFrame
data_reviews[0]  # Each cell holds a nested dict

0        {'funny': '', 'posted': 'Posted November 5, 20...
1        {'funny': '', 'posted': 'Posted June 24, 2014....
2        {'funny': '', 'posted': 'Posted February 3.', ...
3        {'funny': '', 'posted': 'Posted October 14, 20...
4        {'funny': '3 people found this review funny', ...
                               ...                        
25794    {'funny': '', 'posted': 'Posted May 31.', 'las...
25795    {'funny': '', 'posted': 'Posted June 17.', 'la...
25796    {'funny': '1 person found this review funny', ...
25797    {'funny': '', 'posted': 'Posted July 21.', 'la...
25798    {'funny': '1 person found this review funny', ...
Name: 0, Length: 25799, dtype: object

In [None]:
# Flatten nested column 0 into one column per review field
data_reviews_col_0 = pd.json_normalize(data_reviews[0])
data_reviews_col_0.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...
1,,"Posted June 24, 2014.",,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...
2,,Posted February 3.,,248820,No ratings yet,True,A suitably punishing roguelike platformer. Wi...
3,,"Posted October 14, 2013.",,250320,2 of 2 people (100%) found this review helpful,True,This game... is so fun. The fight sequences ha...
4,3 people found this review funny,"Posted April 15, 2014.",,211420,35 of 43 people (81%) found this review helpful,True,Git gud


In [None]:
# Explore column 1 of the DataFrame
data_reviews[1]  # Each cell holds a nested dict (or None)

0        {'funny': '', 'posted': 'Posted July 15, 2011....
1        {'funny': '', 'posted': 'Posted September 8, 2...
2        {'funny': '', 'posted': 'Posted December 4, 20...
3        {'funny': '', 'posted': 'Posted July 28, 2012....
4        {'funny': '1 person found this review funny', ...
                               ...                        
25794                                                 None
25795                                                 None
25796                                                 None
25797    {'funny': '', 'posted': 'Posted July 10.', 'la...
25798    {'funny': '', 'posted': 'Posted July 20.', 'la...
Name: 1, Length: 25799, dtype: object

In [None]:
# Flatten nested column 1 into one column per review field
data_reviews_col_1 = pd.json_normalize(data_reviews[1])
data_reviews_col_1.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,"Posted July 15, 2011.",,22200,No ratings yet,True,It's unique and worth a playthrough.
1,,"Posted September 8, 2013.",,227300,0 of 1 people (0%) found this review helpful,True,For a simple (it's actually not all that simpl...
2,,"Posted December 4, 2015.","Last edited December 5, 2015.",370360,No ratings yet,True,"""Run for fun? What the hell kind of fun is that?"""
3,,"Posted July 28, 2012.",,20920,1 of 1 people (100%) found this review helpful,True,"Really Really Really Great Game, very good sto..."
4,1 person found this review funny,"Posted December 23, 2013.",,211820,12 of 16 people (75%) found this review helpful,True,"It's like Terraria, you play for 9 hours strai..."


In [None]:
# Explore column 2 of the DataFrame
data_reviews[2]  # Each cell holds a nested dict (or None)

0        {'funny': '', 'posted': 'Posted April 21, 2011...
1        {'funny': '', 'posted': 'Posted November 29, 2...
2        {'funny': '', 'posted': 'Posted November 3, 20...
3        {'funny': '', 'posted': 'Posted June 2, 2012.'...
4        {'funny': '2 people found this review funny', ...
                               ...                        
25794                                                 None
25795                                                 None
25796                                                 None
25797    {'funny': '', 'posted': 'Posted July 10.', 'la...
25798    {'funny': '', 'posted': 'Posted July 2.', 'las...
Name: 2, Length: 25799, dtype: object

In [None]:
# Flatten nested column 2 into one column per review field
data_reviews_col_2 = pd.json_normalize(data_reviews[2])
data_reviews_col_2.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,"Posted April 21, 2011.",,43110,No ratings yet,True,Great atmosphere. The gunplay can be a bit chu...
1,,"Posted November 29, 2013.",,239030,1 of 4 people (25%) found this review helpful,True,Very fun little game to play when your bored o...
2,,"Posted November 3, 2014.",,237930,No ratings yet,True,"Elegant integration of gameplay, story, world ..."
3,,"Posted June 2, 2012.",,204100,1 of 1 people (100%) found this review helpful,True,"Just buy it already. Great Story, Great Multip..."
4,2 people found this review funny,"Posted March 14, 2014.",,730,5 of 5 people (100%) found this review helpful,True,"Hold shift to win, Hold CTRL to lose."


In [None]:
# Explore column 3 of the DataFrame
data_reviews[3]  # Each cell holds a nested dict (or None)

0                                                     None
1                                                     None
2        {'funny': '', 'posted': 'Posted October 15, 20...
3        {'funny': '', 'posted': 'Posted June 29, 2014....
4        {'funny': '', 'posted': 'Posted July 11, 2013....
                               ...                        
25794                                                 None
25795                                                 None
25796                                                 None
25797    {'funny': '', 'posted': 'Posted July 8.', 'las...
25798                                                 None
Name: 3, Length: 25799, dtype: object

In [None]:
# Flatten nested column 3 into one column per review field
data_reviews_col_3 = pd.json_normalize(data_reviews[3])
data_reviews_col_3.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,,,,,,
1,,,,,,,
2,,"Posted October 15, 2014.",,263360.0,No ratings yet,True,"Random drops and random quests, with stat poin..."
3,,"Posted June 29, 2014.",,224600.0,1 of 2 people (50%) found this review helpful,True,"It was a great game from what I played, right ..."
4,,"Posted July 11, 2013.",,204300.0,No ratings yet,True,"OH YES, THIS GAME IS THE BEST, THEY ADD STUFF ..."


In [None]:
# Explore column 4 of the DataFrame
data_reviews[4]  # Each cell holds a nested dict (or None)

0                                                     None
1                                                     None
2        {'funny': '', 'posted': 'Posted October 15, 20...
3        {'funny': '', 'posted': 'Posted November 22, 2...
4                                                     None
                               ...                        
25794                                                 None
25795                                                 None
25796                                                 None
25797                                                 None
25798                                                 None
Name: 4, Length: 25799, dtype: object

In [None]:
# Flatten nested column 4 into one column per review field
data_reviews_col_4 = pd.json_normalize(data_reviews[4])
data_reviews_col_4.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,,,,,,
1,,,,,,,
2,,"Posted October 15, 2014.",,107200.0,No ratings yet,True,Fun balance of tactics and strategy. Potentia...
3,,"Posted November 22, 2012.",,207610.0,No ratings yet,True,The ending to this game is.... ♥♥♥♥♥♥♥.... Jus...
4,,,,,,,


In [None]:
# Explore column 5 of the DataFrame
data_reviews[5]  # Each cell holds a nested dict (or None)

0                                                     None
1                                                     None
2        {'funny': '', 'posted': 'Posted October 15, 20...
3        {'funny': '', 'posted': 'Posted February 23, 2...
4                                                     None
                               ...                        
25794                                                 None
25795                                                 None
25796                                                 None
25797                                                 None
25798                                                 None
Name: 5, Length: 25799, dtype: object

In [None]:
# Flatten nested column 5 into one column per review field
data_reviews_col_5 = pd.json_normalize(data_reviews[5])
data_reviews_col_5.head()

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,,,,,,
1,,,,,,,
2,,"Posted October 15, 2014.",,224500.0,No ratings yet,True,"Fun world builder, with plenty of option of ho..."
3,,"Posted February 23, 2012.",,108710.0,No ratings yet,True,"Alan wake is a really good game, the light eff..."
4,,,,,,,


In [None]:
df_reviews

Unnamed: 0,user_id,user_url,reviews
0,76561197970982479,http://steamcommunity.com/profiles/76561197970...,"[{'funny': '', 'posted': 'Posted November 5, 2..."
1,js41637,http://steamcommunity.com/id/js41637,"[{'funny': '', 'posted': 'Posted June 24, 2014..."
2,evcentric,http://steamcommunity.com/id/evcentric,"[{'funny': '', 'posted': 'Posted February 3.',..."
3,doctr,http://steamcommunity.com/id/doctr,"[{'funny': '', 'posted': 'Posted October 14, 2..."
4,maplemage,http://steamcommunity.com/id/maplemage,"[{'funny': '3 people found this review funny',..."
...,...,...,...
25794,76561198306599751,http://steamcommunity.com/profiles/76561198306...,"[{'funny': '', 'posted': 'Posted May 31.', 'la..."
25795,Ghoustik,http://steamcommunity.com/id/Ghoustik,"[{'funny': '', 'posted': 'Posted June 17.', 'l..."
25796,76561198310819422,http://steamcommunity.com/profiles/76561198310...,"[{'funny': '1 person found this review funny',..."
25797,76561198312638244,http://steamcommunity.com/profiles/76561198312...,"[{'funny': '', 'posted': 'Posted July 21.', 'l..."


In [None]:
# Drop the nested reviews column
df_reviews.drop(columns='reviews', inplace=True)

In [None]:
# Stack the flattened DataFrames vertically
data_reviews_filas = pd.concat([data_reviews_col_0, data_reviews_col_1, data_reviews_col_2, data_reviews_col_3, data_reviews_col_4, data_reviews_col_5], ignore_index=True)
data_reviews_filas

Unnamed: 0,funny,posted,last_edited,item_id,helpful,recommend,review
0,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...
1,,"Posted June 24, 2014.",,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...
2,,Posted February 3.,,248820,No ratings yet,True,A suitably punishing roguelike platformer. Wi...
3,,"Posted October 14, 2013.",,250320,2 of 2 people (100%) found this review helpful,True,This game... is so fun. The fight sequences ha...
4,3 people found this review funny,"Posted April 15, 2014.",,211420,35 of 43 people (81%) found this review helpful,True,Git gud
...,...,...,...,...,...,...,...
154789,,,,,,,
154790,,,,,,,
154791,,,,,,,
154792,,,,,,,


In [None]:
# Join the DataFrames side by side (axis=1), aligned on the row index
data_reviews_final = pd.concat([df_reviews, data_reviews_filas],axis=1)
data_reviews_final

Unnamed: 0,user_id,user_url,funny,posted,last_edited,item_id,helpful,recommend,review
0,76561197970982479,http://steamcommunity.com/profiles/76561197970...,,"Posted November 5, 2011.",,1250,No ratings yet,True,Simple yet with great replayability. In my opi...
1,js41637,http://steamcommunity.com/id/js41637,,"Posted June 24, 2014.",,251610,15 of 20 people (75%) found this review helpful,True,I know what you think when you see this title ...
2,evcentric,http://steamcommunity.com/id/evcentric,,Posted February 3.,,248820,No ratings yet,True,A suitably punishing roguelike platformer. Wi...
3,doctr,http://steamcommunity.com/id/doctr,,"Posted October 14, 2013.",,250320,2 of 2 people (100%) found this review helpful,True,This game... is so fun. The fight sequences ha...
4,maplemage,http://steamcommunity.com/id/maplemage,3 people found this review funny,"Posted April 15, 2014.",,211420,35 of 43 people (81%) found this review helpful,True,Git gud
...,...,...,...,...,...,...,...,...,...
154789,,,,,,,,,
154790,,,,,,,,,
154791,,,,,,,,,
154792,,,,,,,,,


In [None]:
data_reviews_final.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154794 entries, 0 to 154793
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   user_id      25799 non-null  object
 1   user_url     25799 non-null  object
 2   funny        55832 non-null  object
 3   posted       55832 non-null  object
 4   last_edited  55832 non-null  object
 5   item_id      55832 non-null  object
 6   helpful      55832 non-null  object
 7   recommend    55832 non-null  object
 8   review       55832 non-null  object
dtypes: object(9)
memory usage: 10.6+ MB


In [None]:
# Show the percentage of null values per column
porcentajes_nulos = (data_reviews_final.isnull().mean() * 100).round(2)
porcentajes_nulos

user_id        83.33
user_url       83.33
funny          63.93
posted         63.93
last_edited    63.93
item_id        63.93
helpful        63.93
recommend      63.93
review         63.93
dtype: float64
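
These percentages follow from the two concatenations. Stacking six 25,799-row frames gives 154,794 rows, and joining the 25,799-row user frame by index leaves `user_id` empty in 5 of every 6 rows (83.33%). As a result, a review taken from positions 1 to 5 no longer sits next to the user who wrote it, and the 3,473 reviews in positions 6 to 9 (by the non-null counts shown earlier) are not included at all. One possible alternative, sketched on invented data, is to apply `DataFrame.explode` before the `reviews` column is dropped, so that every review row keeps its own user. This is a suggestion under those assumptions, not the method used in the notebook.

```python
import pandas as pd

# Toy stand-in for df_reviews before the 'reviews' column is dropped
# (values are illustrative, not taken from the dataset).
toy_reviews = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3"],
        "reviews": [
            [{"item_id": "10", "recommend": True}, {"item_id": "20", "recommend": False}],
            [{"item_id": "30", "recommend": True}],
            [],  # user without reviews
        ],
    }
)

# One row per (user, review); users with empty lists become NaN and are dropped.
exploded = (
    toy_reviews.explode("reviews")
    .dropna(subset=["reviews"])
    .reset_index(drop=True)
)

# Expand each review dict into columns and place it next to its user columns.
review_cols = pd.DataFrame(exploded["reviews"].tolist())
flat_reviews = pd.concat([exploded.drop(columns="reviews"), review_cols], axis=1)

print(flat_reviews)
```

With this layout there is one row per review regardless of how many reviews a user has, so no fixed number of positional columns is needed.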

---

**EDA:** *output_steam_games*

---

In [None]:
# Load and display the file
df_games = pd.read_json('/content/drive/MyDrive/P_I_1/dateset/output_steam_games.json', lines=True)
df_games

Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,id,developer
0,,,,,,,,,,,,,
1,,,,,,,,,,,,,
2,,,,,,,,,,,,,
3,,,,,,,,,,,,,
4,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
120440,Ghost_RUS Games,"[Casual, Indie, Simulation, Strategy]",Colony On Mars,Colony On Mars,http://store.steampowered.com/app/773640/Colon...,2018-01-04,"[Strategy, Indie, Casual, Simulation]",http://steamcommunity.com/app/773640/reviews/?...,"[Single-player, Steam Achievements]",1.99,0.0,773640.0,"Nikita ""Ghost_RUS"""
120441,Sacada,"[Casual, Indie, Strategy]",LOGistICAL: South Africa,LOGistICAL: South Africa,http://store.steampowered.com/app/733530/LOGis...,2018-01-04,"[Strategy, Indie, Casual]",http://steamcommunity.com/app/733530/reviews/?...,"[Single-player, Steam Achievements, Steam Clou...",4.99,0.0,733530.0,Sacada
120442,Laush Studio,"[Indie, Racing, Simulation]",Russian Roads,Russian Roads,http://store.steampowered.com/app/610660/Russi...,2018-01-04,"[Indie, Simulation, Racing]",http://steamcommunity.com/app/610660/reviews/?...,"[Single-player, Steam Achievements, Steam Trad...",1.99,0.0,610660.0,Laush Dmitriy Sergeevich
120443,SIXNAILS,"[Casual, Indie]",EXIT 2 - Directions,EXIT 2 - Directions,http://store.steampowered.com/app/658870/EXIT_...,2017-09-02,"[Indie, Casual, Puzzle, Singleplayer, Atmosphe...",http://steamcommunity.com/app/658870/reviews/?...,"[Single-player, Steam Achievements, Steam Cloud]",4.99,0.0,658870.0,"xropi,stev3ns"


In [None]:
# Show the non-null counts of the DataFrame
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120445 entries, 0 to 120444
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   publisher     24083 non-null  object 
 1   genres        28852 non-null  object 
 2   app_name      32133 non-null  object 
 3   title         30085 non-null  object 
 4   url           32135 non-null  object 
 5   release_date  30068 non-null  object 
 6   tags          31972 non-null  object 
 7   reviews_url   32133 non-null  object 
 8   specs         31465 non-null  object 
 9   price         30758 non-null  object 
 10  early_access  32135 non-null  float64
 11  id            32133 non-null  float64
 12  developer     28836 non-null  object 
dtypes: float64(2), object(11)
memory usage: 11.9+ MB


In [None]:
# Show the percentage of null values per column
porcentajes_nulos = (df_games.isnull().mean() * 100).round(2)
porcentajes_nulos

publisher       80.00
genres          76.05
app_name        73.32
title           75.02
url             73.32
release_date    75.04
tags            73.46
reviews_url     73.32
specs           73.88
price           74.46
early_access    73.32
id              73.32
developer       76.06
dtype: float64
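
Every column is at least 73.32% null and the first displayed rows are entirely empty, which suggests (but does not prove) that a large block of rows carries no data at all. A minimal sketch of how that hypothesis could be checked and handled, using an invented toy frame rather than `df_games`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_games: two fully empty rows followed by two real ones.
toy_games = pd.DataFrame(
    {
        "app_name": [np.nan, np.nan, "Game A", "Game B"],
        "price": [np.nan, np.nan, 1.99, np.nan],
    }
)

# Rows where every column is null.
fully_empty = toy_games.isnull().all(axis=1)
print(int(fully_empty.sum()))  # 2

# Drop only the rows that are empty in every column; partial rows are kept.
cleaned = toy_games.dropna(how="all").reset_index(drop=True)
print(len(cleaned))  # 2
```

Using `how="all"` keeps rows such as "Game B", which has a name but no price, so the remaining per-column gaps can still be studied afterwards.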

---

**EDA:** *australian_users_items*

---



In [None]:
# Load and display the file
df_items = load_json_lines('/content/drive/MyDrive/P_I_1/dateset/australian_users_items.json')
df_items

Unnamed: 0,user_id,items_count,steam_id,user_url,items
0,76561197970982479,277,76561197970982479,http://steamcommunity.com/profiles/76561197970...,"[{'item_id': '10', 'item_name': 'Counter-Strik..."
1,js41637,888,76561198035864385,http://steamcommunity.com/id/js41637,"[{'item_id': '10', 'item_name': 'Counter-Strik..."
2,evcentric,137,76561198007712555,http://steamcommunity.com/id/evcentric,"[{'item_id': '1200', 'item_name': 'Red Orchest..."
3,Riot-Punch,328,76561197963445855,http://steamcommunity.com/id/Riot-Punch,"[{'item_id': '10', 'item_name': 'Counter-Strik..."
4,doctr,541,76561198002099482,http://steamcommunity.com/id/doctr,"[{'item_id': '300', 'item_name': 'Day of Defea..."
...,...,...,...,...,...
88305,76561198323066619,22,76561198323066619,http://steamcommunity.com/profiles/76561198323...,"[{'item_id': '413850', 'item_name': 'CS:GO Pla..."
88306,76561198326700687,177,76561198326700687,http://steamcommunity.com/profiles/76561198326...,"[{'item_id': '11020', 'item_name': 'TrackMania..."
88307,XxLaughingJackClown77xX,0,76561198328759259,http://steamcommunity.com/id/XxLaughingJackClo...,[]
88308,76561198329548331,7,76561198329548331,http://steamcommunity.com/profiles/76561198329...,"[{'item_id': '304930', 'item_name': 'Unturned'..."


In [None]:
# Show the non-null counts of the DataFrame
df_items.info()

In [None]:
# Show the percentage of null values per column
porcentajes_nulos = (df_items.isnull().mean() * 100).round(2)
porcentajes_nulos

In [None]:
# Explore the columns of the DataFrame
df_items.head()   # The items column turns out to be nested


In [None]:
# Initialize a list to hold the normalized DataFrames
lista_dataframes = []

# Iterate over the rows of the 'items' column
# (Series.iteritems() was removed in pandas 2.0; items() is the equivalent)
for index, fila in df_items['items'].items():
    # Normalize the current row and append the resulting DataFrame to the list
    df_normalizado = pd.json_normalize(fila)
    lista_dataframes.append(df_normalizado)

In [None]:
# Concatenate all the normalized DataFrames into a single one
data_items_col = pd.concat(lista_dataframes, ignore_index=True)
data_items_col

In [None]:
# Drop the nested items column from the DataFrame
df_items.drop(columns=['items'], inplace=True)

In [None]:
# Concatenate the original DataFrame with the flattened version of the items column
data_items_final = pd.concat([df_items, data_items_col], axis=1)
data_items_final

In [None]:
data_items_final.info()

In [None]:
# Show the percentage of null values per column
porcentajes_nulos = (data_items_final.isnull().mean() * 100).round(2)
porcentajes_nulos
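
Because `data_items_col` has one row per item while `df_items` has one row per user, joining them with `axis=1` pairs rows by position, so an item row is generally not placed next to the user who owns it. A possible alternative, sketched here on invented data, is to let `pd.json_normalize` carry the user fields along through its `record_path` and `meta` parameters. The toy column values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for df_items before the 'items' column is dropped.
toy_items = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3"],
        "items_count": [2, 1, 0],
        "items": [
            [{"item_id": "10", "item_name": "Game A"}, {"item_id": "20", "item_name": "Game B"}],
            [{"item_id": "10", "item_name": "Game A"}],
            [],  # user without items produces no rows
        ],
    }
)

# record_path selects the nested list to expand into rows;
# meta lists the top-level fields to repeat on each of those rows.
flat_items = pd.json_normalize(
    toy_items.to_dict("records"),
    record_path="items",
    meta=["user_id", "items_count"],
)

print(flat_items)
```

This also replaces the per-row loop with a single call, which tends to matter on a frame with tens of thousands of users.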

---

**Export CSVs**

---

In [None]:
# Set the CSV file name and the folder where it will be saved
nombre_archivo_csv = 'data_reviews_final.csv'
ruta_guardado = '/content/drive/MyDrive/P_I_1/CSVs/'

# Combine the folder and the file name
ruta_completa = ruta_guardado + nombre_archivo_csv

# Export the DataFrame to a CSV file
data_reviews_final.to_csv(ruta_completa, index=False)


# Print a confirmation message
print(f"DataFrame exported successfully to: {ruta_completa}")

In [None]:
# Set the CSV file name and the folder where it will be saved
nombre_archivo_csv = 'data_games_final.csv'
ruta_guardado = '/content/drive/MyDrive/P_I_1/CSVs/'

# Combine the folder and the file name
ruta_completa = ruta_guardado + nombre_archivo_csv

# Export the DataFrame to a CSV file
df_games.to_csv(ruta_completa, index=False)


# Print a confirmation message
print(f"DataFrame exported successfully to: {ruta_completa}")

In [None]:
# Set the CSV file name and the folder where it will be saved
nombre_archivo_csv = 'data_items_final.csv'
ruta_guardado = '/content/drive/MyDrive/P_I_1/CSVs/'

# Combine the folder and the file name
ruta_completa = ruta_guardado + nombre_archivo_csv

# Export the DataFrame to a CSV file
data_items_final.to_csv(ruta_completa, index=False)


# Print a confirmation message
print(f"DataFrame exported successfully to: {ruta_completa}")
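
The three export cells differ only in the DataFrame and the file name, so they could be folded into one helper. The sketch below is one way to do it. `export_frames` is a hypothetical name, and the demo writes a toy frame to a temporary directory instead of the Drive folder.

```python
from pathlib import Path
import tempfile

import pandas as pd


def export_frames(frames, output_dir):
    """Write each DataFrame in `frames` (name -> DataFrame) to `<output_dir>/<name>.csv`."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in frames.items():
        target = output_dir / f"{name}.csv"
        frame.to_csv(target, index=False)
        written.append(target)
    return written


# Demo with a toy frame and a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = export_frames({"data_toy_final": pd.DataFrame({"a": [1, 2]})}, tmp_dir)
    exported_names = [p.name for p in paths]
    round_trip = pd.read_csv(paths[0])

print(exported_names)
```

In the notebook, the call would take a dict such as `{'data_reviews_final': data_reviews_final, 'data_games_final': df_games, 'data_items_final': data_items_final}` together with the existing `ruta_guardado` folder.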