### **1. _ETL_ process for the Individual Project 01 data**
### **(_Extract, Transform, Load_)** 

<br> <br> <br> 

This _Jupyter notebook_ contains the _Python_ code for extracting, transforming and loading the Individual Project 01 dataset. This step is the foundation for the later stages of the project: the _EDA_ and the implementation of the Machine Learning (_ML_) model.

<br> <br> <br> <br>

**1.1. Importing the _Python_ libraries used in the _ETL_ process of the project data**

In [1]:

import pandas as pd
import numpy as np
import json
import ast
import matplotlib

#----------------------------------------------------------------------------------------------------

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

#----------------------------------------------------------------------------------------------------

%load_ext autoreload
import FuncionesDA

    # The file FuncionesDA.py contains helper functions that simplify the
    # ETL process of the data
#----------------------------------------------------------------------------------------------------

<br> <br> <br>

**1.2. _ETL_ process for the project data files**



The data are extracted from the .json file and converted into a _DataFrame_ object (handled with the _pandas_ library). Its contents are then inspected, the variables relevant to the project are transformed, and the transformed data are loaded into a suitable format.

<br> <br>

**1.2.1.  File `output_steam_games.json`**

In [2]:

filepath_games = "Data/Raw Data/output_steam_games.json" 
  # Path to the output_steam_games.json dataset,
  # stored in the variable filepath_games

rows_games = []                          
with open(filepath_games) as file:
    for line in file:
        Data_01 = json.loads(line)
        rows_games.append(Data_01)    
   # Each line of the file is parsed as a JSON object
   # and appended to the list rows_games

Data_Games = pd.DataFrame(rows_games)
   # Build a pandas DataFrame from the rows stored in rows_games

Data_Games


Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,id,developer
0,,,,,,,,,,,,,
1,,,,,,,,,,,,,
2,,,,,,,,,,,,,
3,,,,,,,,,,,,,
4,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
120440,Ghost_RUS Games,"[Casual, Indie, Simulation, Strategy]",Colony On Mars,Colony On Mars,http://store.steampowered.com/app/773640/Colon...,2018-01-04,"[Strategy, Indie, Casual, Simulation]",http://steamcommunity.com/app/773640/reviews/?...,"[Single-player, Steam Achievements]",1.99,False,773640,"Nikita ""Ghost_RUS"""
120441,Sacada,"[Casual, Indie, Strategy]",LOGistICAL: South Africa,LOGistICAL: South Africa,http://store.steampowered.com/app/733530/LOGis...,2018-01-04,"[Strategy, Indie, Casual]",http://steamcommunity.com/app/733530/reviews/?...,"[Single-player, Steam Achievements, Steam Clou...",4.99,False,733530,Sacada
120442,Laush Studio,"[Indie, Racing, Simulation]",Russian Roads,Russian Roads,http://store.steampowered.com/app/610660/Russi...,2018-01-04,"[Indie, Simulation, Racing]",http://steamcommunity.com/app/610660/reviews/?...,"[Single-player, Steam Achievements, Steam Trad...",1.99,False,610660,Laush Dmitriy Sergeevich
120443,SIXNAILS,"[Casual, Indie]",EXIT 2 - Directions,EXIT 2 - Directions,http://store.steampowered.com/app/658870/EXIT_...,2017-09-02,"[Indie, Casual, Puzzle, Singleplayer, Atmosphe...",http://steamcommunity.com/app/658870/reviews/?...,"[Single-player, Steam Achievements, Steam Cloud]",4.99,False,658870,"xropi,stev3ns"
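
The line-by-line loading step can be wrapped in a small reusable function. The following is a minimal sketch, not the notebook's own code: the name `load_json_lines` and the in-memory sample are hypothetical, and it assumes one JSON object per line, as in the cell above. It also skips blank lines, which the original loop does not do.

```python
import io
import json

import pandas as pd


def load_json_lines(handle):
    """Parse one JSON object per line and return a DataFrame (hypothetical helper)."""
    rows = []
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            rows.append(json.loads(line))
    return pd.DataFrame(rows)


# Small in-memory sample standing in for the real file (made-up values)
sample = io.StringIO(
    '{"id": "1", "app_name": "Game A", "price": 4.99}\n'
    '{"id": NaN, "app_name": NaN, "price": NaN}\n'
    '\n'
    '{"id": "2", "app_name": "Game B", "price": "Free to Play"}\n'
)
demo_games = load_json_lines(sample)
```

With a real file, the same function would be called inside `with open(filepath_games) as file:`. Python's `json.loads` accepts the `NaN` literal, which is why fully empty rows can survive the load and have to be removed afterwards.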


In [3]:

print(f"Initially, the dataset contains {Data_Games.shape[0]} rows and {Data_Games.shape[1]} variables")

Initially, the dataset contains 120445 rows and 13 variables


 A first look at the DataFrame _Data_Games_ shows rows in which every variable is empty. These fully empty rows are removed with `dropna(how='all')`, which leaves rows with only some missing values untouched.

In [4]:
Data_Games = Data_Games.dropna(how='all').reset_index(drop=True)

Data_Games

Unnamed: 0,publisher,genres,app_name,title,url,release_date,tags,reviews_url,specs,price,early_access,id,developer
0,Kotoshiro,"[Action, Casual, Indie, Simulation, Strategy]",Lost Summoner Kitty,Lost Summoner Kitty,http://store.steampowered.com/app/761140/Lost_...,2018-01-04,"[Strategy, Action, Indie, Casual, Simulation]",http://steamcommunity.com/app/761140/reviews/?...,[Single-player],4.99,False,761140,Kotoshiro
1,"Making Fun, Inc.","[Free to Play, Indie, RPG, Strategy]",Ironbound,Ironbound,http://store.steampowered.com/app/643980/Ironb...,2018-01-04,"[Free to Play, Strategy, Indie, RPG, Card Game...",http://steamcommunity.com/app/643980/reviews/?...,"[Single-player, Multi-player, Online Multi-Pla...",Free To Play,False,643980,Secret Level SRL
2,Poolians.com,"[Casual, Free to Play, Indie, Simulation, Sports]",Real Pool 3D - Poolians,Real Pool 3D - Poolians,http://store.steampowered.com/app/670290/Real_...,2017-07-24,"[Free to Play, Simulation, Sports, Casual, Ind...",http://steamcommunity.com/app/670290/reviews/?...,"[Single-player, Multi-player, Online Multi-Pla...",Free to Play,False,670290,Poolians.com
3,彼岸领域,"[Action, Adventure, Casual]",弹炸人2222,弹炸人2222,http://store.steampowered.com/app/767400/2222/,2017-12-07,"[Action, Adventure, Casual]",http://steamcommunity.com/app/767400/reviews/?...,[Single-player],0.99,False,767400,彼岸领域
4,,,Log Challenge,,http://store.steampowered.com/app/773570/Log_C...,,"[Action, Indie, Casual, Sports]",http://steamcommunity.com/app/773570/reviews/?...,"[Single-player, Full controller support, HTC V...",2.99,False,773570,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
32130,Ghost_RUS Games,"[Casual, Indie, Simulation, Strategy]",Colony On Mars,Colony On Mars,http://store.steampowered.com/app/773640/Colon...,2018-01-04,"[Strategy, Indie, Casual, Simulation]",http://steamcommunity.com/app/773640/reviews/?...,"[Single-player, Steam Achievements]",1.99,False,773640,"Nikita ""Ghost_RUS"""
32131,Sacada,"[Casual, Indie, Strategy]",LOGistICAL: South Africa,LOGistICAL: South Africa,http://store.steampowered.com/app/733530/LOGis...,2018-01-04,"[Strategy, Indie, Casual]",http://steamcommunity.com/app/733530/reviews/?...,"[Single-player, Steam Achievements, Steam Clou...",4.99,False,733530,Sacada
32132,Laush Studio,"[Indie, Racing, Simulation]",Russian Roads,Russian Roads,http://store.steampowered.com/app/610660/Russi...,2018-01-04,"[Indie, Simulation, Racing]",http://steamcommunity.com/app/610660/reviews/?...,"[Single-player, Steam Achievements, Steam Trad...",1.99,False,610660,Laush Dmitriy Sergeevich
32133,SIXNAILS,"[Casual, Indie]",EXIT 2 - Directions,EXIT 2 - Directions,http://store.steampowered.com/app/658870/EXIT_...,2017-09-02,"[Indie, Casual, Puzzle, Singleplayer, Atmosphe...",http://steamcommunity.com/app/658870/reviews/?...,"[Single-player, Steam Achievements, Steam Cloud]",4.99,False,658870,"xropi,stev3ns"
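
The difference between `how='all'` and `how='any'` matters here, because many of the remaining rows still have some missing values. A minimal sketch on a made-up DataFrame (the names `demo`, `only_empty_removed` and `any_missing_removed` are illustrative only):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "app_name": ["Game A", np.nan, "Game C"],
    "price": [4.99, np.nan, np.nan],
})

# how="all": drop a row only when every column is missing (the choice used above)
only_empty_removed = demo.dropna(how="all").reset_index(drop=True)

# how="any": drop a row as soon as one column is missing (much more aggressive)
any_missing_removed = demo.dropna(how="any").reset_index(drop=True)
```

In this sample `how="all"` keeps "Game C" even though its price is missing, while `how="any"` would discard it.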


In [5]:
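Data_Games.shape, Data_Games.info
   # Note: info is referenced without parentheses, so the second element of the
   # output is the repr of the bound method (a preview of the DataFrame), not
   # the column summary that Data_Games.info() would print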

((32135, 13),
 <bound method DataFrame.info of               publisher                                             genres  \
 0             Kotoshiro      [Action, Casual, Indie, Simulation, Strategy]   
 1      Making Fun, Inc.               [Free to Play, Indie, RPG, Strategy]   
 2          Poolians.com  [Casual, Free to Play, Indie, Simulation, Sports]   
 3                  彼岸领域                        [Action, Adventure, Casual]   
 4                   NaN                                                NaN   
 ...                 ...                                                ...   
 32130   Ghost_RUS Games              [Casual, Indie, Simulation, Strategy]   
 32131            Sacada                          [Casual, Indie, Strategy]   
 32132      Laush Studio                        [Indie, Racing, Simulation]   
 32133          SIXNAILS                                    [Casual, Indie]   
 32134               NaN                                                NaN   
 
    

In [6]:
FuncionesDA.Data_Type(Data_Games)

| | Variable | Type | NaN | No_NaN | NaN_(%) | No_NaN_(%) |
|---|---|---|---|---|---|---|
| 0 | publisher | `[<class 'str'>, <class 'float'>]` | 8052 | 24083 | 25.057 | 74.943 |
| 1 | genres | `[<class 'list'>, <class 'float'>]` | 3283 | 28852 | 10.216 | 89.784 |
| 2 | app_name | `[<class 'str'>, <class 'float'>]` | 2 | 32133 | 0.006 | 99.994 |
| 3 | title | `[<class 'str'>, <class 'float'>]` | 2050 | 30085 | 6.379 | 93.621 |
| 4 | url | `[<class 'str'>]` | 0 | 32135 | 0.0 | 100.0 |
| 5 | release_date | `[<class 'str'>, <class 'float'>]` | 2067 | 30068 | 6.432 | 93.568 |
| 6 | tags | `[<class 'list'>, <class 'float'>]` | 163 | 31972 | 0.507 | 99.493 |
| 7 | reviews_url | `[<class 'str'>, <class 'float'>]` | 2 | 32133 | 0.006 | 99.994 |
| 8 | specs | `[<class 'list'>, <class 'float'>]` | 670 | 31465 | 2.085 | 97.915 |
| 9 | price | `[<class 'float'>, <class 'str'>]` | 1377 | 30758 | 4.285 | 95.715 |
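
The source of `FuncionesDA.Data_Type` is not shown in this notebook. Judging only from its output columns, a function along the following lines could produce a similar summary. This is a hypothetical stand-in (`summarize_types` is an invented name), not the actual implementation.

```python
import numpy as np
import pandas as pd


def summarize_types(df):
    """Per-column Python types and missing-value counts (hypothetical stand-in)."""
    n_rows = len(df)
    records = []
    for column in df.columns:
        n_nan = int(df[column].isna().sum())
        records.append({
            "Variable": column,
            # Python types found in the column; NaN shows up as float
            "Type": list(df[column].apply(type).unique()),
            "NaN": n_nan,
            "No_NaN": n_rows - n_nan,
            "NaN_(%)": round(100 * n_nan / n_rows, 3) if n_rows else 0.0,
            "No_NaN_(%)": round(100 * (n_rows - n_nan) / n_rows, 3) if n_rows else 0.0,
        })
    return pd.DataFrame(records)


# Made-up sample with the same kind of mixed columns as Data_Games
demo = pd.DataFrame({
    "publisher": ["Kotoshiro", np.nan, "Sacada", np.nan],
    "genres": [["Action"], np.nan, ["Indie"], ["Casual"]],
})
summary = summarize_types(demo)
```

A column that reports both `str` and `float` is usually a text column with `NaN` entries, since pandas stores missing values in object columns as floats.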


In [7]:
Data_Games['publisher']

0               Kotoshiro
1        Making Fun, Inc.
2            Poolians.com
3                    彼岸领域
4                     NaN
               ...       
32130     Ghost_RUS Games
32131              Sacada
32132        Laush Studio
32133            SIXNAILS
32134                 NaN
Name: publisher, Length: 32135, dtype: object

In [8]:
Data_Games['price'].unique()

array([4.99, 'Free To Play', 'Free to Play', 0.99, 2.99, 3.99, 9.99,
       18.99, 29.99, nan, 'Free', 10.99, 1.59, 14.99, 1.99, 59.99, 8.99,
       6.99, 7.99, 39.99, 19.99, 7.49, 12.99, 5.99, 2.49, 15.99, 1.25,
       24.99, 17.99, 61.99, 3.49, 11.99, 13.99, 'Free Demo',
       'Play for Free!', 34.99, 74.76, 1.49, 32.99, 99.99, 14.95, 69.99,
       16.99, 79.99, 49.99, 5.0, 44.99, 13.98, 29.96, 119.99, 109.99,
       149.99, 771.71, 'Install Now', 21.99, 89.99,
       'Play WARMACHINE: Tactics Demo', 0.98, 139.92, 4.29, 64.99,
       'Free Mod', 54.99, 74.99, 'Install Theme', 0.89, 'Third-party',
       0.5, 'Play Now', 299.99, 1.29, 3.0, 15.0, 5.49, 23.99, 49.0, 20.99,
       10.93, 1.39, 'Free HITMAN™ Holiday Pack', 36.99, 4.49, 2.0, 4.0,
       9.0, 234.99, 1.95, 1.5, 199.0, 189.0, 6.66, 27.99, 10.49, 129.99,
       179.0, 26.99, 399.99, 31.99, 399.0, 20.0, 40.0, 3.33, 199.99,
       22.99, 320.0, 38.85, 71.7, 59.95, 995.0, 27.49, 3.39, 6.0, 19.95,
       499.99, 16.06, 4.68, 131
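
The `price` column mixes numbers with text labels such as 'Free to Play' or 'Install Now', so it has to be normalized before any numeric analysis. The sketch below shows one possible rule, which is an assumption and not something this notebook states: labels containing the word "free" become 0.0 and every other label becomes `NaN`. The function name `price_to_float` is hypothetical.

```python
import numpy as np
import pandas as pd


def price_to_float(value):
    """Map one raw price entry to a float (assumed rule, see the text above)."""
    if value is None:
        return np.nan
    if isinstance(value, str):
        # Assumption: any label mentioning "free" means a price of zero;
        # other labels ("Install Now", "Third-party", ...) carry no price
        return 0.0 if "free" in value.lower() else np.nan
    return float(value)  # numeric entries pass through; NaN stays NaN


demo_price = pd.Series(
    [4.99, "Free To Play", "Free to Play", np.nan, "Install Now", "Third-party"]
)
clean_price = demo_price.apply(price_to_float)
```

Labels such as 'Free Demo' or 'Free Mod' would also map to 0.0 under this rule, which may or may not be desirable; whichever rule is chosen is worth recording alongside the transformed data.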

In [9]:
Data_Games['release_date']

0        2018-01-04
1        2018-01-04
2        2017-07-24
3        2017-12-07
4               NaN
            ...    
32130    2018-01-04
32131    2018-01-04
32132    2018-01-04
32133    2017-09-02
32134           NaN
Name: release_date, Length: 32135, dtype: object
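
`release_date` is stored as text (dtype `object`). A minimal sketch of converting it to dates, assuming the `YYYY-MM-DD` layout seen in the values displayed above; entries in any other layout, if the column contains them, would become `NaT` under this sketch. The sample values and variable names are illustrative.

```python
import numpy as np
import pandas as pd

demo_dates = pd.Series(["2018-01-04", "2017-07-24", np.nan, "not a date"])

# errors="coerce" turns missing or unparseable entries into NaT instead of raising
parsed = pd.to_datetime(demo_dates, format="%Y-%m-%d", errors="coerce")

# The year can then be extracted for later grouping; missing dates give NaN
release_year = parsed.dt.year
```

Comparing `parsed.isna().sum()` with the count of missing values before conversion shows how many entries were lost because of their format rather than because they were empty.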

In [10]:
Data_Games['early_access']

0        False
1        False
2        False
3        False
4        False
         ...  
32130    False
32131    False
32132    False
32133    False
32134     True
Name: early_access, Length: 32135, dtype: object
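
`early_access` holds only `True`/`False` in the values shown, yet its dtype is `object`. A possible cast to a proper boolean column is sketched below on a made-up sample; it assumes the column has no missing values, which should be checked first because `astype(bool)` would silently turn `NaN` into `True`.

```python
import pandas as pd

demo_flag = pd.Series([False, False, True], dtype=object, name="early_access")

# Guard: refuse to cast while missing values are present
if demo_flag.isna().any():
    raise ValueError("early_access contains missing values; handle them before casting")

flag_bool = demo_flag.astype(bool)
```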

In [None]:
FuncionesDA.Duplicates(Data_Games, "id")
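
The cell above has not been run, and the source of `FuncionesDA.Duplicates` is not shown. As a hypothetical equivalent, duplicated identifiers can be listed with plain pandas. In this sketch `find_duplicates` is an invented name and the sample ids are made up.

```python
import numpy as np
import pandas as pd


def find_duplicates(df, column):
    """Return every row whose value in `column` occurs more than once (hypothetical helper)."""
    # Missing ids are set aside first: pandas would otherwise treat NaN == NaN as a duplicate
    present = df[df[column].notna()]
    mask = present.duplicated(subset=[column], keep=False)  # keep=False flags all copies
    return present[mask].sort_values(column)


demo = pd.DataFrame({
    "id": ["761140", "643980", "761140", np.nan, np.nan],
    "app_name": ["Game A", "Game B", "Game A (copy)", "Game C", "Game D"],
})
dups = find_duplicates(demo, "id")
```

Listing all copies side by side, rather than dropping them immediately, makes it easier to decide which row of each group should be kept.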

<br> <br> <br><br>
**1.2.2.  File `australian_users_items.json`**

<br> <br> <br> <br>
**1.2.3.  File `australian_users_reviews.json`**