# Steam Library Analyzer
### GitHub repository: [https://github.com/saulTejeda117/Steam-Data-Analyzer](https://github.com/saulTejeda117/Steam-Data-Analyzer)

Steam Library Analyzer is a data science project focused mainly on the predictive analysis of the gaming habits of [_`Steam`_](https://store.steampowered.com) users. Its main goal is to estimate the time needed to complete every game in a player's library. To do so, it draws on data sources such as the video game website [_`How Long To Beat`_](https://howlongtobeat.com) and the [_`Steam API`_](https://steamcommunity.com/dev), which give access to information such as:

- **Completion Rate:** The proportion of games a player has completed relative to the total number of games in their library.
  
- **Total games:** The total number of games a user currently has in their Steam library.
  
- **Perfect Games:** Games whose goals and achievements have been 100% completed, according to the statistics provided by Steam.
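
As a rough illustration of the metrics above and of the overall goal, the sketch below computes them from a small list of per-game records. The function names, the sample values, and the rule that a game counts as completed only when 100% of its achievements are unlocked are assumptions made for this example. The keys `playtime_forever`, `achievement_percentage`, and `beat_time` mirror the columns used later in the notebook, with both times expressed in minutes.

```python
def library_metrics(games):
    """Return total games, perfect games and completion rate for a library."""
    total_games = len(games)
    # Assumption: a "perfect game" has every achievement unlocked (100%).
    perfect_games = sum(
        1 for g in games if g.get("achievement_percentage") == 100.0
    )
    completion_rate = perfect_games / total_games if total_games else 0.0
    return {
        "total_games": total_games,
        "perfect_games": perfect_games,
        "completion_rate": completion_rate,
    }


def estimated_remaining_minutes(games):
    """Sum the time still needed per game, ignoring games with no beat time."""
    remaining = 0
    for g in games:
        beat_time = g.get("beat_time")
        if beat_time is None:
            continue  # no estimate available for this game
        # Never count negative time when the player already exceeded the estimate.
        remaining += max(beat_time - g.get("playtime_forever", 0), 0)
    return remaining


# Hypothetical sample records, not real library data.
sample_library = [
    {"playtime_forever": 245, "achievement_percentage": 100.0, "beat_time": 600},
    {"playtime_forever": 0, "achievement_percentage": 0.0, "beat_time": 960},
    {"playtime_forever": 142, "achievement_percentage": None, "beat_time": None},
    {"playtime_forever": 700, "achievement_percentage": 50.0, "beat_time": 600},
]

print(library_metrics(sample_library))
print(estimated_remaining_minutes(sample_library), "minutes remaining")
```

Treating a missing beat time as "unknown" rather than zero keeps the estimate from silently understating the work left, but it also means the result only covers games for which an estimate exists.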

<img src="SteamAnalyzerCover2.jpg">


</img>

<hr>

In [1]:
import requests
import json
import time
import pandas as pd
from IPython import display
import matplotlib.pyplot as plt
import numpy as np
import re

## 1.1 Obtain User Steam Profile Data

The analysis begins by retrieving the essential information for the user account to be evaluated. The relevant data is read from the _JSON_ file _`"steam_credentials.json"`_, which holds the following key information:

- **Steam API key:** a unique identifier that Steam issues to developers and applications that want to access the Steam API.
  
- **Steam ID:** a unique identifier for a user and their profile on the Steam platform.
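
A minimal sketch of what the credentials file might contain and how it could be validated before use. The key names `api_key` and `steam_id` are the ones the notebook reads; the helper `load_credentials` and the placeholder values are hypothetical.

```python
import json
import os
import tempfile


def load_credentials(path):
    """Read the credentials file and fail early if a key is missing or empty."""
    with open(path, encoding="utf-8") as json_file:
        credentials = json.load(json_file)
    missing = [key for key in ("api_key", "steam_id") if not credentials.get(key)]
    if missing:
        raise KeyError(f"Missing credential(s): {', '.join(missing)}")
    return credentials["api_key"], credentials["steam_id"]


# Demo with placeholder values written to a temporary file.
example_credentials = {
    "api_key": "YOUR_STEAM_API_KEY",
    "steam_id": "YOUR_STEAM_ID",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "steam_credentials.json")
    with open(demo_path, "w", encoding="utf-8") as demo_file:
        json.dump(example_credentials, demo_file)
    demo_key, demo_id = load_credentials(demo_path)

print(demo_key, demo_id)
```

Failing at load time gives a clearer error than a rejected API request later on. The real file should stay out of version control, since the API key is private.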

In [2]:
# Load the steam credentials JSON file 
with open('steam_credentials.json') as json_file:
    credentials = json.load(json_file)

api_key = credentials.get('api_key')
steam_id = credentials.get('steam_id')


Next, the [_`Steam API`_](https://steamcommunity.com/dev) is queried to retrieve the data for the user account that the credentials belong to, and the response to the request is checked.


In [3]:
# Build the player-information URL for the Steam API
player_info_url = f'http://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key={api_key}&steamids={steam_id}'
response = requests.get(player_info_url)

# requests.get() never returns None, so check the HTTP status code instead
if response.status_code == 200:
    data = response.json()
    # print(data, "\n\n\n")
    print("Username: ", data['response']['players'][0]['personaname'])
    print("Avatar: ", data['response']['players'][0]['avatarfull'])
    print("Link: ", data['response']['players'][0]['profileurl'])

else:
    print("Something went wrong!")


Username:  Grabma
Avatar:  https://avatars.steamstatic.com/af32b9e84f67edb7cdacc52177c5f8f05ce0fded_full.jpg
Link:  https://steamcommunity.com/id/saultejm/
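
The cell above indexes `players[0]` directly, which would raise an `IndexError` if the API returned an empty `players` list. The sketch below shows one defensive way to read the same three fields. The helper name `extract_profile` is hypothetical, and the sample payload simply reuses the values printed above.

```python
def extract_profile(payload):
    """Return username, avatar and link, or None when no player is present."""
    players = payload.get("response", {}).get("players", [])
    if not players:
        return None
    player = players[0]
    return {
        "username": player.get("personaname"),
        "avatar": player.get("avatarfull"),
        "link": player.get("profileurl"),
    }


# Sample payload shaped like the fields the notebook reads.
sample_payload = {
    "response": {
        "players": [
            {
                "personaname": "Grabma",
                "avatarfull": "https://avatars.steamstatic.com/af32b9e84f67edb7cdacc52177c5f8f05ce0fded_full.jpg",
                "profileurl": "https://steamcommunity.com/id/saultejm/",
            }
        ]
    }
}

print(extract_profile(sample_payload))
print(extract_profile({"response": {"players": []}}))
```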


<hr>

## 1.2 Obtain User Steam Library Data

After obtaining the Steam account's user data, we retrieve the game data for the user's library. The main identifiers of interest at this stage are:

- **appid:** the unique identifier that Steam assigns to each game or application in its catalog.
  
- **steam_id:** a unique identifier for a user and their profile on the Steam platform.


### 1.2.1 Obtain AppID and Playtime Data

This step requests the list of games owned by the account, together with the playtime recorded for each one. The main fields of interest are:

- **appid:** the unique identifier that Steam assigns to each game or application in its catalog.
  
- **playtime_forever:** the total time, in minutes, that the user has played a given game.

- **total_playtime:** the total playtime accumulated by a user across all the games in their Steam library. This figure can be essential for understanding a player's dedication to, and engagement with, their game collection.
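
Because the notebook reports playtime in minutes, a small helper can make the totals easier to read. The sketch below assumes records with `appid` and `playtime_forever` keys, as in the library data. The function name `summarize_playtime` and the sample values are illustrative only.

```python
def summarize_playtime(games):
    """Aggregate per-game playtime (minutes) into library-level figures."""
    total_minutes = sum(g.get("playtime_forever", 0) for g in games)
    never_played = sum(1 for g in games if g.get("playtime_forever", 0) == 0)
    hours, minutes = divmod(total_minutes, 60)
    return {
        "total_minutes": total_minutes,
        "hours_and_minutes": (hours, minutes),
        "never_played": never_played,
    }


# Hypothetical records shaped like the library data.
sample_games = [
    {"appid": 9050, "playtime_forever": 142},
    {"appid": 9070, "playtime_forever": 0},
    {"appid": 400, "playtime_forever": 245},
]

print(summarize_playtime(sample_games))
```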

In [4]:
# Get data from my Steam library
games_endpoint = f"https://api.steampowered.com/IPlayerService/GetOwnedGames/v1/?key={api_key}&steamid={steam_id}"
response_games = requests.get(games_endpoint)
data_games = response_games.json()
df_games = pd.json_normalize(data_games['response']['games'])

df_games['game_name'] = None
df_games['achievement_percentage'] = None
df_games['achievement_completed'] = None
df_games['total_achievements'] = None
df_games['beat_time'] = None
df_games

Unnamed: 0,appid,playtime_forever,playtime_windows_forever,playtime_mac_forever,playtime_linux_forever,rtime_last_played,playtime_disconnected,playtime_2weeks,game_name,achievement_percentage,achievement_completed,total_achievements,beat_time
0,9050,142,142,0,0,1597370032,0,,,,,,
1,9070,0,0,0,0,0,0,,,,,,
2,208200,0,0,0,0,0,0,,,,,,
3,400,245,245,0,0,1594507407,0,,,,,,
4,20900,0,0,0,0,0,0,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
238,1144770,3,3,0,0,1698369396,0,,,,,,
239,544610,0,0,0,0,0,0,,,,,,
240,226620,0,0,0,0,0,0,,,,,,
241,43160,0,0,0,0,0,0,,,,,,


### 1.2.2 Obtain Games' names

This step is extremely important for the data extraction process: the data comes from different sources, and the games on [_`How Long To Beat`_](https://howlongtobeat.com) cannot be looked up using only the AppID from the [_`Steam API`_](https://steamcommunity.com/dev), so the name of each game must be retrieved as well.

In [5]:
errors = 0
df_games['appid'] = df_games['appid'].astype(str)

# Look up each game's name through the Steam store "appdetails" endpoint
for game in range(len(df_games)):
    appid = df_games.iloc[game]['appid']
    app_details_endpoint = f"https://store.steampowered.com/api/appdetails/?appids={appid}"
    response_app_details = requests.get(app_details_endpoint)
    
    if response_app_details.status_code == 200:
        data_app_details = response_app_details.json()
        
        try:
            game_name = data_app_details[str(appid)]['data']['name']
            df_games.loc[game, 'game_name'] = game_name
            
        except (KeyError, TypeError):
            # No usable "data" entry was returned for this appid
            errors += 1
        
    else:
        errors += 1
    # Pause between requests to avoid flooding the endpoint
    time.sleep(1)
print(f"Process Completed. Errors {errors}")
df_games

Process Completed. Errors 6


Unnamed: 0,appid,playtime_forever,playtime_windows_forever,playtime_mac_forever,playtime_linux_forever,rtime_last_played,playtime_disconnected,playtime_2weeks,game_name,achievement_percentage,achievement_completed,total_achievements,beat_time
0,9050,142,142,0,0,1597370032,0,,DOOM 3,,,,
1,9070,0,0,0,0,0,0,,DOOM 3 Resurrection of Evil,,,,
2,208200,0,0,0,0,0,0,,DOOM 3,,,,
3,400,245,245,0,0,1594507407,0,,Portal,,,,
4,20900,0,0,0,0,0,0,,The Witcher: Enhanced Edition Director's Cut,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
238,1144770,3,3,0,0,1698369396,0,,SLUDGE LIFE,,,,
239,544610,0,0,0,0,0,0,,Battlestar Galactica Deadlock,,,,
240,226620,0,0,0,0,0,0,,Desktop Dungeons,,,,
241,43160,0,0,0,0,0,0,,,,,,
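
Some rows, such as the last one above, end up without a name. The sketch below isolates the lookup logic in a helper that returns `None` instead of raising. It assumes a response layout of `{appid: {"success": ..., "data": {"name": ...}}}`: the `data` and `name` path is the one the notebook reads, while the `success` flag and the helper name `extract_game_name` are assumptions for this example.

```python
def extract_game_name(payload, appid):
    """Return the game's name from an appdetails-style payload, or None."""
    entry = (payload or {}).get(str(appid)) or {}
    # Assumption: unsuccessful lookups carry a falsy "success" flag and no "data".
    if not entry.get("success"):
        return None
    return (entry.get("data") or {}).get("name")


# Hypothetical payloads for a successful and a failed lookup.
sample_found = {"400": {"success": True, "data": {"name": "Portal"}}}
sample_missing = {"43160": {"success": False}}

print(extract_game_name(sample_found, 400))
print(extract_game_name(sample_missing, 43160))
```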


In [6]:
total_playtime = df_games['playtime_forever'].sum()

# Row with the highest recorded playtime
max_playtime_index = df_games['playtime_forever'].idxmax()
favorite_game_name = df_games.loc[max_playtime_index, 'game_name']

print("Total playtime: ", total_playtime, "minutes")
print("Favorite Game: ", favorite_game_name)

Total playtime:  204694 minutes
Favorite Game:  Brawlhalla


### 1.2.3 Obtain Games' Achievement Information


- **achievement_completion:** the progress a player has made in unlocking or completing the achievements of a game.

- **achievement_percentage:** the percentage of achievements a player has unlocked or completed in a game, relative to the total number of achievements available.

- **achievement_completed:** the number of achievements the player has unlocked in a game.

- **total_games:** the number of games in the library for which achievement data could be retrieved.
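
The per-game calculation behind these fields can be sketched on its own, without any network access. The payload layout (`playerstats`, `success`, `achievements`, `achieved`) follows the keys the notebook reads. The helper name `achievement_stats` and the sample payload are hypothetical.

```python
def achievement_stats(payload):
    """Return achievement counts and percentage, or None if unavailable."""
    playerstats = payload.get("playerstats", {})
    achievements = playerstats.get("achievements")
    if not playerstats.get("success") or not achievements:
        return None  # no achievement data for this game
    total = len(achievements)
    completed = sum(1 for a in achievements if a.get("achieved") == 1)
    return {
        "total_achievements": total,
        "achievement_completed": completed,
        "achievement_percentage": completed * 100 / total,
    }


# Hypothetical payload: 15 achievements, the first 5 unlocked.
sample_stats_payload = {
    "playerstats": {
        "success": True,
        "achievements": [{"achieved": 1 if i < 5 else 0} for i in range(15)],
    }
}

print(achievement_stats(sample_stats_payload))
```

Returning `None` for games without achievement data keeps them distinguishable from games where the player simply has 0% progress.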


In [7]:
total_games = 0
for game in range(len(df_games)):
    appid = df_games.iloc[game]['appid']
    url_achievements = f'http://api.steampowered.com/ISteamUserStats/GetPlayerAchievements/v0001/?key={api_key}&steamid={steam_id}&appid={appid}'
    response = requests.get(url_achievements)
    data_achievements = response.json()

    # Skip games for which the API reports no achievement data
    if data_achievements['playerstats']['success']:

        try:
            achievements = data_achievements['playerstats']['achievements']
            total_achievements = len(achievements)
            achievements_unlocked = sum(1 for achievement in achievements if achievement['achieved'] == 1)

            achievement_percentage = (achievements_unlocked * 100) / total_achievements
        
            total_games += 1

            df_games.loc[game, 'total_achievements'] = total_achievements
            df_games.loc[game, 'achievement_completed'] = achievements_unlocked
            df_games.loc[game, 'achievement_percentage'] = achievement_percentage
            
        except (KeyError, ZeroDivisionError):
            # The game has no achievements list, or the list is empty
            pass
        
print("\n**********PROCESS FINISHED************\n")
print("Total Games with Achievements:", total_games)

df_games


**********PROCESS FINISHED************

Total Games with Achievements: 206


Unnamed: 0,appid,playtime_forever,playtime_windows_forever,playtime_mac_forever,playtime_linux_forever,rtime_last_played,playtime_disconnected,playtime_2weeks,game_name,achievement_percentage,achievement_completed,total_achievements,beat_time
0,9050,142,142,0,0,1597370032,0,,DOOM 3,,,,
1,9070,0,0,0,0,0,0,,DOOM 3 Resurrection of Evil,,,,
2,208200,0,0,0,0,0,0,,DOOM 3,0.0,0,65,
3,400,245,245,0,0,1594507407,0,,Portal,33.333333,5,15,
4,20900,0,0,0,0,0,0,,The Witcher: Enhanced Edition Director's Cut,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
238,1144770,3,3,0,0,1698369396,0,,SLUDGE LIFE,0.0,0,14,
239,544610,0,0,0,0,0,0,,Battlestar Galactica Deadlock,0.0,0,26,
240,226620,0,0,0,0,0,0,,Desktop Dungeons,0.0,0,35,
241,43160,0,0,0,0,0,0,,,0.0,0,70,


<hr>

## 1.3 Data sets: Get the howlongtobeat data

In [12]:
games = 0

for game in range(len(df_games)):
    game_name = df_games.iloc[game]['game_name']
    if game_name is not None:
        # Strip characters such as ™, ® or parentheses before searching by name
        game_name1 = re.sub(r'[^a-zA-Z0-9\s\:\.\-\,]', '', game_name)
    
        beat_time_url = f"https://hltb-api.vercel.app/api?name={game_name1}"
        beat_time_response = requests.get(beat_time_url)
        
        if beat_time_response.status_code == 200:
            beat_time_data = beat_time_response.json()
            
            try:
                # Prefer the completionist time and fall back to the main-story time.
                # Multiplying by 60 presumably converts hours to minutes,
                # the unit used by playtime_forever.
                if beat_time_data[0]['gameplayCompletionist'] != 0:
                    
                    df_games.loc[game, 'beat_time'] = (beat_time_data[0]['gameplayCompletionist'])*60

                else:
                    df_games.loc[game, 'beat_time'] = (beat_time_data[0]['gameplayMain'])*60
                    
                games += 1
            except (IndexError, KeyError, TypeError):
                # No matching entry was returned for this name
                print("ERROR:", game_name1)
                
print("\n**********PROCESS FINISHED************\n")

ERROR: The Witcher: Enhanced Edition Directors Cut
ERROR: Dead Space 2008
ERROR: LEGO Star Wars - The Complete Saga
ERROR: The Witcher 2: Assassins of Kings Enhanced Edition
ERROR: Batman: Arkham Asylum Game of the Year Edition
ERROR: Warhammer 40,000: Dawn of War - Game of the Year Edition
ERROR: Tom Clancys Ghost Recon Phantoms - NA
ERROR: Warhammer 40,000: Dawn of War II: Retribution
ERROR: The Walking Dead: Season Two
ERROR: Godot Engine
ERROR: Aseprite
ERROR: Wallpaper Engine
ERROR: Driver Booster for Steam
ERROR: Between Two Castles - Digital Edition
ERROR: Kao the Kangaroo: Round 2 2003 re-release
ERROR: GameGuru Classic
ERROR: The Dream Machine: Chapter 1  2
ERROR: Warhammer Underworlds - Shadespire Edition

**********PROCESS FINISHED************
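
The two pure steps of the cell above, cleaning the name and choosing which time to keep, can be tried in isolation. In the sketch below the regular expression is equivalent to the one used in the notebook, and the keys `gameplayCompletionist` and `gameplayMain` are those read from the response. The helper names and sample entries are hypothetical, and returning `None` when neither time is available is a choice made for this example rather than the notebook's behavior.

```python
import re


def clean_game_name(name):
    """Drop every character except letters, digits, whitespace and : . - ,"""
    return re.sub(r"[^a-zA-Z0-9\s:.\-,]", "", name)


def pick_beat_time_minutes(entry):
    """Prefer the completionist time, fall back to the main-story time."""
    value = entry.get("gameplayCompletionist") or entry.get("gameplayMain")
    # Same multiplication by 60 as in the notebook.
    return value * 60 if value else None


print(clean_game_name("Dead Space (2008)"))
print(clean_game_name("Prince of Persia®: The Sands of Time"))
print(pick_beat_time_minutes({"gameplayCompletionist": 0, "gameplayMain": 10}))
```

The first call reproduces one of the names in the error list: removing the parentheses leaves the year attached to the title, which may be why that search finds no match.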



In [21]:
# Show the first 100 rows of the DataFrame
pd.options.display.max_rows = 100
df_games.head(100)

Unnamed: 0,appid,playtime_forever,playtime_windows_forever,playtime_mac_forever,playtime_linux_forever,rtime_last_played,playtime_disconnected,playtime_2weeks,game_name,achievement_percentage,achievement_completed,total_achievements,beat_time
0,9050,142,142,0,0,1597370032,0,,DOOM 3,,,,960.0
1,9070,0,0,0,0,0,0,,DOOM 3 Resurrection of Evil,,,,360.0
2,208200,0,0,0,0,0,0,,DOOM 3,0.0,0.0,65.0,960.0
3,400,245,245,0,0,1594507407,0,,Portal,33.333333,5.0,15.0,600.0
4,20900,0,0,0,0,0,0,,The Witcher: Enhanced Edition Director's Cut,,,,
5,13500,0,0,0,0,0,0,,Prince of Persia: Warrior Within™,,,,1020.0
6,13530,0,0,0,0,0,0,,Prince of Persia: The Two Thrones™,,,,720.0
7,13600,0,0,0,0,0,0,,Prince of Persia®: The Sands of Time,,,,600.0
8,19980,0,0,0,0,0,0,,Prince of Persia®,,,,1080.0
9,17470,37,37,0,0,1665022863,0,,Dead Space (2008),,,,
