# 03 - Visualizations and Insights: Steam Games

**Objective**: Generate meaningful visualizations that reveal patterns, trends, and key insights in the Steam Games dataset.

**Input**: `data/processed/steam_games_features_YYYY-MM-DD.csv` (falls back to `steam_games_clean_YYYY-MM-DD.csv`)  
**Output**: 10+ figures exported to `reports/figures/`

## Visualizations to Generate

1. Price distribution (histogram + KDE)
2. Top genres and categories (bar chart)
3. Releases per year (time series)
4. Distribution of positive vs. negative reviews
5. Correlation between numeric variables (heatmap)
6. Price vs. reviews (scatter plot)
7. Platform comparison (Windows/Mac/Linux)
8. Top publishers/developers by volume
9. Metacritic score distribution
10. Boxplots of price or reviews by genre
11. Average price by year
12. Free vs. paid games

---

## 0. Setup and Imports

In [None]:
import sys
import logging
from pathlib import Path

import numpy as np
import pandas as pd

# Select the Agg backend BEFORE importing pyplot (headless/nbconvert environment)
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns
from matplotlib.patches import Patch

# ---------------------------------------------------------------------------
# Project-root detection (centralized in src/utils/paths)
# ---------------------------------------------------------------------------
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # python-dotenv is optional

# Fallback: put the project root (the directory containing src/ and data/) on sys.path so `src.*` can be imported
for _candidate in [Path.cwd()] + list(Path.cwd().parents):
    if (_candidate / "src").is_dir() and (_candidate / "data").is_dir():
        sys.path.insert(0, str(_candidate))
        break

from src.utils.paths import get_project_root, get_data_dirs
from src.data.loader import load_data
from src.visualization.plots import set_style, plot_distribution, plot_correlation

PROJECT_ROOT = get_project_root()
dirs = get_data_dirs(PROJECT_ROOT)

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
logger = logging.getLogger(__name__)

# set_style() centralizes DPI, font, and palette; do not override it afterwards
set_style()

# Paths
PROCESSED_DATA_DIR = dirs['processed']
FIGURES_DIR = dirs['figures']
TABLES_DIR = dirs['tables']

FIGURES_DIR.mkdir(parents=True, exist_ok=True)
TABLES_DIR.mkdir(parents=True, exist_ok=True)

print(f'PROJECT_ROOT detectado: {PROJECT_ROOT}')
print(f'Figuras se guardaran en: {FIGURES_DIR}')

## 1. Loading the Dataset

In [2]:
from typing import Optional


def find_latest_file(directory: Path, pattern: str) -> Optional[Path]:
    """Return the most recent file that matches a glob pattern.

    Files are sorted by name. This relies on the YYYY-MM-DD date suffix
    sorting chronologically.

    Args:
        directory: Directory to search.
        pattern: Glob pattern.

    Returns:
        Path to the most recent file, or None if nothing matches.
    """
    files = sorted(directory.glob(pattern))
    return files[-1] if files else None


# Try the features file first; otherwise fall back to the clean dataset
feat_file = find_latest_file(PROCESSED_DATA_DIR, 'steam_games_features_*.csv')
clean_file = find_latest_file(PROCESSED_DATA_DIR, 'steam_games_clean_*.csv')

if feat_file:
    logger.info(f'Cargando features: {feat_file}')
    df = load_data(str(feat_file))
elif clean_file:
    logger.info(f'Cargando dataset limpio: {clean_file}')
    df = load_data(str(clean_file))
else:
    logger.warning('No se encontraron archivos procesados. Generando datos de muestra...')
    
    # Representative sample data (used only when no processed file is found)
    np.random.seed(2024)
    n = 2000
    
    genres = ['Action', 'Indie', 'RPG', 'Strategy', 'Adventure', 'Simulation',
              'Sports', 'Racing', 'Casual', 'Puzzle', 'Horror', 'Shooter']
    publishers = ['Valve', 'Ubisoft', 'EA', 'Activision', 'Bethesda', 'CD Projekt',
                  'Square Enix', '2K Games', 'Capcom', 'Bandai Namco'] + [f'Indie_{i}' for i in range(30)]
    
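    # Relative weights for release years 2000-2024, normalized by their own sum so the
    # probabilities add up to exactly 1 (np.random.choice raises ValueError otherwise)
    year_weights = np.array([0.5, 0.5, 0.7, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0,
                             4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 10.0, 9.0, 8.5,
                             8.0, 7.5, 6.0, 4.0, 1.0])
    years = np.random.choice(range(2000, 2025), n, p=year_weights / year_weights.sum())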
    
    prices_raw = np.random.choice(
        [0.0, 1.99, 4.99, 9.99, 14.99, 19.99, 24.99, 29.99, 39.99, 49.99, 59.99],
        n, p=[0.25, 0.03, 0.10, 0.18, 0.12, 0.12, 0.07, 0.06, 0.04, 0.02, 0.01]
    )
    
    reviews_base = np.random.exponential(500, n).astype(int)
    pos_ratio = np.clip(np.random.beta(8, 2, n), 0.1, 1.0)
    
    df = pd.DataFrame({
        'AppID': [str(i) for i in range(n)],
        'name': [f'Game_{i}' for i in range(n)],
        'price': prices_raw,
        'positive': (reviews_base * pos_ratio).astype(int),
        'negative': (reviews_base * (1 - pos_ratio)).astype(int),
        'total_reviews': reviews_base,
        'positive_ratio': pos_ratio,
        'metacritic_score': np.random.choice(
            list(range(40, 100)) + [0] * 50, n
        ),
        'peak_ccu': np.random.exponential(1000, n).astype(int),
        'average_playtime_forever': np.random.exponential(300, n).astype(int),
        'dlc_count': np.random.poisson(1.5, n),
        'achievements': np.random.poisson(25, n),
        'windows': np.random.choice([True, False], n, p=[0.95, 0.05]),
        'mac': np.random.choice([True, False], n, p=[0.25, 0.75]),
        'linux': np.random.choice([True, False], n, p=[0.20, 0.80]),
        'genre_primary': np.random.choice(genres, n,
                                           p=[0.18, 0.22, 0.10, 0.08, 0.09, 0.08,
                                              0.05, 0.04, 0.07, 0.04, 0.03, 0.02]),
        'release_year': years,
        'release_month': np.random.randint(1, 13, n),
        'developer_primary': np.random.choice(
            publishers + [f'SmallDev_{i}' for i in range(100)], n
        ),
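        # 40 publisher names (10 majors + 30 indies): 10 explicit weights plus 30 equal
        # shares of the remaining 0.51, so that len(p) == len(publishers)
        'publisher_primary': np.random.choice(publishers, n,
                                               p=[0.08, 0.07, 0.06, 0.05, 0.05, 0.04,
                                                  0.04, 0.04, 0.03, 0.03] + [0.51/30]*30),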
        'n_languages': np.random.randint(1, 30, n),
        'n_platforms': np.random.choice([1, 2, 3], n, p=[0.6, 0.25, 0.15]),
        'is_free': prices_raw == 0.0,
        'has_multiplayer': np.random.choice([True, False], n, p=[0.35, 0.65]),
        'has_singleplayer': np.random.choice([True, False], n, p=[0.75, 0.25]),
        'game_age_years': (2025 - years).astype(float) + np.random.uniform(0, 1, n),
        'log_price': np.log1p(prices_raw),
        'log_total_reviews': np.log1p(reviews_base),
        'owners_midpoint': np.random.choice(
            [10000, 35000, 75000, 150000, 350000, 750000, 1500000], n
        ),
    })
    print(f'Dataset de muestra creado: {df.shape}')

# --- Convert boolean columns (CSV loads them as strings) ---
bool_cols = ['windows', 'mac', 'linux', 'is_free', 'has_multiplayer', 'has_singleplayer']
for col in bool_cols:
    if col in df.columns and df[col].dtype == object:
        df[col] = df[col].map({'True': True, 'False': False, True: True, False: False})

# Ensure correct dtypes and derive missing columns
if 'release_year' in df.columns:
    df['release_year'] = pd.to_numeric(df['release_year'], errors='coerce')
if 'metacritic_score' in df.columns:
    df['metacritic_score'] = pd.to_numeric(df['metacritic_score'], errors='coerce')
if 'total_reviews' not in df.columns and all(c in df.columns for c in ['positive', 'negative']):
    df['total_reviews'] = df['positive'] + df['negative']
if 'positive_ratio' not in df.columns and 'total_reviews' in df.columns:
    df['positive_ratio'] = np.where(df['total_reviews'] > 0, df['positive'] / df['total_reviews'], np.nan)
if 'is_free' not in df.columns and 'price' in df.columns:
    df['is_free'] = df['price'] == 0.0

print(f'Dataset listo: {df.shape}')
print(f'Columnas disponibles: {list(df.columns)}')


INFO: Cargando features: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\data\processed\steam_games_features_2026-02-12.csv


Dataset listo: (122610, 115)
Columnas disponibles: ['AppID', 'required_age', 'price', 'dlc_count', 'reviews', 'windows', 'mac', 'linux', 'metacritic_score', 'achievements', 'recommendations', 'notes', 'full_audio_languages', 'user_score', 'positive', 'negative', 'average_playtime_forever', 'average_playtime_2weeks', 'median_playtime_forever', 'median_playtime_2weeks', 'discount', 'peak_ccu', 'release_year', 'release_month', 'release_quarter', 'genre_primary', 'genre_count', 'has_multiplayer', 'has_coop', 'has_singleplayer', 'developer_primary', 'publisher_primary', 'n_languages', 'total_reviews', 'positive_ratio', 'log_total_reviews', 'review_label', 'game_age_years', 'era', 'is_recent', 'is_free', 'log_price', 'price_tier', 'price_per_hour', 'n_platforms', 'is_multiplatform', 'platform_combo', 'owners_midpoint', 'log_owners', 'popularity_score', 'review_label_encoded', 'era_encoded', 'genre_action', 'genre_adventure', 'genre_casual', 'genre_indie', 'genre_unknown', 'genre_simulation',
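
The boolean conversion above recognizes only the exact strings `'True'` and `'False'`. Any other spelling (for example `'true'` or `'1'`) becomes NaN. The sketch below is a more tolerant variant. It uses a hypothetical helper, `to_bool`, which is not part of `src/`. The helper normalizes common spellings and returns pandas' nullable `'boolean'` dtype, so unknown values become `pd.NA` instead of silently turning into floats:

```python
import numpy as np
import pandas as pd

# Hypothetical helper (not part of src/): normalize boolean-like values read from a CSV.
_TRUE_STRINGS = {'true', 't', 'yes', 'y', '1'}
_FALSE_STRINGS = {'false', 'f', 'no', 'n', '0'}


def to_bool(s: pd.Series) -> pd.Series:
    """Convert bools, 0/1 numbers, and common strings to the nullable 'boolean' dtype.

    Anything unrecognized (including NaN/None) becomes pd.NA instead of raising.
    """
    def convert(v):
        if isinstance(v, (bool, np.bool_)):
            return bool(v)
        if pd.isna(v):
            return pd.NA
        if isinstance(v, (int, float, np.integer, np.floating)):
            return bool(v) if v in (0, 1) else pd.NA
        text = str(v).strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
        return pd.NA

    return s.map(convert).astype('boolean')


raw = pd.Series(['True', 'false', ' YES ', 1, 0.0, None, 'maybe', True])
print(to_bool(raw).tolist())
```

In the notebook, `df[col] = to_bool(df[col])` could take the place of the `.map(...)` call inside the conversion loop.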

## Visualization 1: Price Distribution (Histogram + KDE)

In [3]:
if 'price' in df.columns:
    price_paid = df[df['price'] > 0]['price']
    
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    fig.suptitle('Distribucion de Precios en Steam', fontsize=16, fontweight='bold', y=1.01)
    
    # Panel 1: all prices (clipped at $200)
    price_bins = [0, 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 100, 200]
    axes[0].hist(df['price'].clip(0, 200), bins=price_bins, edgecolor='white', alpha=0.85, color='#2196F3')
    axes[0].set_title('Todos los Precios (incluyendo gratuitos)', fontweight='bold')
    axes[0].set_xlabel('Precio (USD)')
    axes[0].set_ylabel('Numero de Juegos')
    axes[0].axvline(df['price'].median(), color='red', linestyle='--', linewidth=2,
                    label=f'Mediana: ${df["price"].median():.2f}')
    axes[0].axvline(df['price'].mean(), color='orange', linestyle='--', linewidth=2,
                    label=f'Media: ${df["price"].mean():.2f}')
    axes[0].legend()
    axes[0].xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:.0f}'))
    
    # Panel 2: paid games only, with KDE (clipped at $65)
    if len(price_paid) > 1:
        sns.histplot(price_paid.clip(0, 65), bins=30, kde=True, ax=axes[1],
                     color='#4CAF50', alpha=0.7, line_kws={'linewidth': 2})
        axes[1].set_title(f'Juegos de Pago (n={len(price_paid):,}) con KDE', fontweight='bold')
        axes[1].set_xlabel('Precio (USD)')
        axes[1].set_ylabel('Numero de Juegos')  # histplot uses stat='count' by default, so the y-axis is a count
        axes[1].axvline(price_paid.median(), color='red', linestyle='--', linewidth=2,
                         label=f'Mediana: ${price_paid.median():.2f}')
        axes[1].legend()
        axes[1].xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:.0f}'))
    
    # Summary statistics as a caption
    pct_free = (df['price'] == 0).mean() * 100
    fig.text(0.5, -0.02,
             f'Juegos gratuitos: {pct_free:.1f}% | Precio max pagado: ${df["price"].max():.2f} | '
             f'P75 precio: ${df["price"].quantile(0.75):.2f}',
             ha='center', fontsize=10, style='italic')
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz1_distribucion_precios.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 1 guardada: {FIGURES_DIR / "viz1_distribucion_precios.png"}')


Figura 1 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz1_distribucion_precios.png
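
The caption statistics in this figure can also be computed without plotting, for example to feed a summary table in `reports/tables/`. Below is a minimal sketch. The helper `price_summary` and its dictionary keys are hypothetical names chosen for illustration:

```python
import pandas as pd


def price_summary(price: pd.Series) -> dict:
    """Price statistics matching the Visualization 1 caption (hypothetical helper)."""
    paid = price[price > 0]
    return {
        'pct_free': float((price == 0).mean() * 100),   # share of free games, in %
        'median_all': float(price.median()),
        'median_paid': float(paid.median()) if len(paid) else float('nan'),
        'p75': float(price.quantile(0.75)),              # linear interpolation (pandas default)
        'max': float(price.max()),
    }


# Usage on a toy series
prices = pd.Series([0.0, 0.0, 4.99, 9.99, 19.99])
print(price_summary(prices))
```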


## Visualization 2: Top Genres (Bar Chart)

In [4]:
genre_col = 'genre_primary' if 'genre_primary' in df.columns else None

if genre_col:
    genre_counts = df[genre_col].value_counts().head(15)
    
    fig, ax = plt.subplots(figsize=(14, 7))
    
    palette = sns.color_palette('husl', len(genre_counts))
    bars = ax.barh(range(len(genre_counts)), genre_counts.values, color=palette, edgecolor='white')
    
    ax.set_yticks(range(len(genre_counts)))
    ax.set_yticklabels(genre_counts.index, fontsize=11)
    ax.set_xlabel('Numero de Juegos', fontsize=12)
    ax.set_title('Top 15 Generos Primarios en Steam', fontsize=14, fontweight='bold')
    
    # Value and percentage labels (percentage of all games)
    total = len(df)
    for bar, val in zip(bars, genre_counts.values):
        pct = val / total * 100
        ax.text(val + total * 0.002, bar.get_y() + bar.get_height() / 2,
                f'{val:,} ({pct:.1f}%)', va='center', fontsize=9)
    
    ax.set_xlim(0, genre_counts.max() * 1.2)
    ax.invert_yaxis()
    ax.grid(axis='x', alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz2_top_generos.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 2 guardada: {FIGURES_DIR / "viz2_top_generos.png"}')


Figura 2 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz2_top_generos.png
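
The percentages in the bar labels are each genre's share of all games (`val / len(df)`). `value_counts(normalize=True)` computes shares directly, but by default it drops missing values from the denominator. The two results therefore differ whenever `genre_primary` contains NaN. A small illustration on toy data:

```python
import pandas as pd

genres = pd.Series(['Indie', 'Action', 'Indie', 'RPG', None])

# Share over all rows, as in the bar labels (the denominator includes missing values)
share_all = genres.value_counts() / len(genres)

# Share over non-missing rows only (value_counts drops NaN by default)
share_non_missing = genres.value_counts(normalize=True)

print(share_all['Indie'], share_non_missing['Indie'])  # 0.4 0.5
```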


## Visualization 3: Releases per Year

In [5]:
if 'release_year' in df.columns:
    yearly = df.dropna(subset=['release_year'])
    yearly = yearly[yearly['release_year'] >= 2000]
    launches_by_year = yearly['release_year'].value_counts().sort_index()
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 10))
    fig.suptitle('Evolucion de Lanzamientos de Juegos en Steam', fontsize=14, fontweight='bold')
    
    # Panel 1: bars per year
    colors = ['#FF5722' if y >= 2020 else '#2196F3' if y >= 2010 else '#9E9E9E'
              for y in launches_by_year.index]
    bars = axes[0].bar(launches_by_year.index, launches_by_year.values, color=colors, edgecolor='white')
    axes[0].set_ylabel('Numero de Lanzamientos')
    axes[0].set_title('Lanzamientos por Año', fontweight='bold')
    axes[0].xaxis.set_major_locator(mticker.MultipleLocator(2))
    plt.setp(axes[0].get_xticklabels(), rotation=45)
    axes[0].grid(axis='y', alpha=0.3)
    
    # Color legend by era
    legend_elements = [
        Patch(facecolor='#9E9E9E', label='Pre-2010'),
        Patch(facecolor='#2196F3', label='2010-2019'),
        Patch(facecolor='#FF5722', label='2020+')
    ]
    axes[0].legend(handles=legend_elements, loc='upper left')
    
    # Panel 2: cumulative total
    cumulative = launches_by_year.cumsum()
    axes[1].fill_between(cumulative.index, cumulative.values, alpha=0.4, color='#4CAF50')
    axes[1].plot(cumulative.index, cumulative.values, color='#388E3C', linewidth=2)
    axes[1].set_xlabel('Año')
    axes[1].set_ylabel('Total Acumulado')
    axes[1].set_title('Catalogo Acumulado de Steam por Año', fontweight='bold')
    axes[1].xaxis.set_major_locator(mticker.MultipleLocator(2))
    plt.setp(axes[1].get_xticklabels(), rotation=45)
    axes[1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:,.0f}'))
    axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz3_lanzamientos_por_anio.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 3 guardada: {FIGURES_DIR / "viz3_lanzamientos_por_anio.png"}')


Figura 3 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz3_lanzamientos_por_anio.png
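
The cumulative panel shows catalog size. Year-over-year change in releases can be read from the same kind of `launches_by_year` series with `pct_change()`. A sketch on toy counts (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy release counts indexed by year (illustrative values only)
launches_by_year = pd.Series([100, 150, 120], index=[2018, 2019, 2020])

yoy_growth = launches_by_year.pct_change() * 100   # % change vs. the previous year (first is NaN)
cumulative = launches_by_year.cumsum()              # same quantity as the second panel

print(yoy_growth.round(1).tolist())
print(cumulative.tolist())  # [100, 250, 370]
```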


## Visualization 4: Positive vs. Negative Reviews

In [6]:
if all(c in df.columns for c in ['positive', 'negative', 'positive_ratio']):
    df_reviews = df[df['total_reviews'] >= 10].copy() if 'total_reviews' in df.columns else df.copy()
    
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    fig.suptitle('Analisis de Reviews en Steam', fontsize=14, fontweight='bold')
    
    # Panel 1: histogram of the positive-review ratio
    ratio_data = df_reviews['positive_ratio'].dropna()
    sns.histplot(ratio_data, bins=40, kde=True, ax=axes[0], color='#4CAF50', alpha=0.7)
    axes[0].axvline(ratio_data.median(), color='red', linestyle='--', linewidth=2,
                    label=f'Mediana: {ratio_data.median():.2%}')
    axes[0].set_title('Distribucion del Ratio de Reviews Positivas', fontweight='bold')
    axes[0].set_xlabel('Proporcion de Reviews Positivas')
    axes[0].xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:.0%}'))
    axes[0].legend()
    
    # Panel 2: positive vs. negative reviews, log1p-transformed (random sample of up to 1,000 games)
    sample_size = min(1000, len(df_reviews))
    sample = df_reviews.sample(sample_size, random_state=42)
    scatter = axes[1].scatter(
        np.log1p(sample['positive']),
        np.log1p(sample['negative']),
        c=sample['positive_ratio'],
        cmap='RdYlGn', alpha=0.5, s=15
    )
    plt.colorbar(scatter, ax=axes[1], label='Ratio Positivas')
    axes[1].set_xlabel('log(Reviews Positivas + 1)')
    axes[1].set_ylabel('log(Reviews Negativas + 1)')
    axes[1].set_title('Reviews Positivas vs Negativas (escala log)', fontweight='bold')
    
    # Panel 3: games per review category (thresholds defined here; Steam's own labels also depend on the number of reviews)
    review_cats = [
        ('Overwhelmingly Pos. (>=95%)', ratio_data >= 0.95),
        ('Very Positive (85-95%)', (ratio_data >= 0.85) & (ratio_data < 0.95)),
        ('Mostly Positive (80-85%)', (ratio_data >= 0.80) & (ratio_data < 0.85)),
        ('Positive (70-80%)', (ratio_data >= 0.70) & (ratio_data < 0.80)),
        ('Mixed (40-70%)', (ratio_data >= 0.40) & (ratio_data < 0.70)),
        ('Negative (<40%)', ratio_data < 0.40),
    ]
    cat_labels = [r[0] for r in review_cats]
    cat_counts = [r[1].sum() for r in review_cats]
    cat_colors = ['#1B5E20', '#388E3C', '#66BB6A', '#AED581', '#FFF176', '#EF5350']
    
    wedges, texts, autotexts = axes[2].pie(
        cat_counts, labels=None, colors=cat_colors,
        autopct=lambda pct: f'{pct:.1f}%' if pct > 3 else '',
        startangle=90, pctdistance=0.75
    )
    axes[2].legend(wedges, [f'{l} ({c:,})' for l, c in zip(cat_labels, cat_counts)],
                   loc='center left', bbox_to_anchor=(1, 0.5), fontsize=8)
    axes[2].set_title('Distribucion por Categoria de Review', fontweight='bold')
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz4_distribucion_reviews.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 4 guardada: {FIGURES_DIR / "viz4_distribucion_reviews.png"}')


Figura 4 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz4_distribucion_reviews.png
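
Panel 3 builds its six review categories from a list of boolean masks. The same half-open bins can be written as a single `pd.cut(..., right=False)` call, which guarantees that every ratio lands in exactly one category. A sketch using the thresholds defined in this notebook:

```python
import numpy as np
import pandas as pd

# Left-closed bins [a, b): same boundaries as the masks in panel 3
bins = [-np.inf, 0.40, 0.70, 0.80, 0.85, 0.95, np.inf]
labels = ['Negative (<40%)', 'Mixed (40-70%)', 'Positive (70-80%)',
          'Mostly Positive (80-85%)', 'Very Positive (85-95%)',
          'Overwhelmingly Pos. (>=95%)']

ratio = pd.Series([0.10, 0.40, 0.75, 0.84, 0.95, 1.00])
categories = pd.cut(ratio, bins=bins, labels=labels, right=False)
print(categories.value_counts(sort=False))
```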


## Visualization 5: Correlation Matrix

In [7]:
numeric_cols = ['price', 'positive', 'negative', 'metacritic_score',
                'average_playtime_forever', 'peak_ccu', 'achievements',
                'dlc_count', 'n_languages', 'game_age_years']

existing_num = [c for c in numeric_cols if c in df.columns]

if len(existing_num) >= 4:
    corr_data = df[existing_num].dropna()
    corr_matrix = corr_data.corr()
    
    fig, ax = plt.subplots(figsize=(12, 10))
    
    mask = np.triu(np.ones_like(corr_matrix, dtype=bool), k=1)
    
    sns.heatmap(
        corr_matrix,
        annot=True,
        fmt='.2f',
        cmap='coolwarm',
        center=0,
        vmin=-1, vmax=1,
        square=True,
        linewidths=0.5,
        ax=ax,
        mask=mask,
        annot_kws={'size': 9}
    )
    
    ax.set_title('Matriz de Correlacion - Variables Numericas del Dataset Steam\n(triangulo inferior)',
                 fontsize=13, fontweight='bold')
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz5_correlacion_numericas.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    
    # Strongest correlations
    print('\nCorrelaciones mas altas (abs > 0.3):')
    corr_pairs = corr_matrix.unstack().reset_index()
    corr_pairs.columns = ['var1', 'var2', 'corr']
    corr_pairs = corr_pairs[corr_pairs['var1'] < corr_pairs['var2']]
    corr_pairs = corr_pairs[corr_pairs['corr'].abs() > 0.3].sort_values('corr', ascending=False, key=abs)
    print(corr_pairs.to_string(index=False))
    print(f'\nFigura 5 guardada: {FIGURES_DIR / "viz5_correlacion_numericas.png"}')



Correlaciones mas altas (abs > 0.3):
    var1     var2     corr
peak_ccu positive 0.846709
negative peak_ccu 0.830596
negative positive 0.818396

Figura 5 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz5_correlacion_numericas.png
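
Two caveats apply to the Pearson matrix above:

- Counts such as `positive`, `negative`, and `peak_ccu` are heavy-tailed, so a few very popular games can dominate the coefficients.
- `metacritic_score` uses 0 for games without a score (only about 3.5% of games have one, per Visualization 9). Its coefficients therefore largely reflect whether a game has a score at all.

One alternative is rank-based (Spearman) correlation computed pairwise. The sketch below assumes that 0 means "missing" and uses a hypothetical helper, `spearman_with_missing`:

```python
import numpy as np
import pandas as pd


def spearman_with_missing(df: pd.DataFrame, cols: list,
                          zero_as_nan: tuple = ('metacritic_score',)) -> pd.DataFrame:
    """Pairwise Spearman correlation; zeros in `zero_as_nan` columns are treated as missing.

    Hypothetical helper: DataFrame.corr excludes NaN pairwise, so each pair of
    columns uses only the rows where both values are present.
    """
    data = df[cols].copy()
    for col in zero_as_nan:
        if col in data.columns:
            data[col] = data[col].replace(0, np.nan)
    return data.corr(method='spearman')


toy = pd.DataFrame({'price': [1, 2, 3, 4, 5],
                    'positive': [10, 20, 30, 40, 500],
                    'metacritic_score': [0, 60, 0, 70, 80]})
print(spearman_with_missing(toy, ['price', 'positive', 'metacritic_score']))
```

In the notebook this would be called as `spearman_with_missing(df, existing_num)`. Ranks make the outlier in `positive` irrelevant here: any monotonic relationship gives a coefficient of 1.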


## Visualization 6: Price vs. Reviews (Scatter Plot)

In [8]:
if all(c in df.columns for c in ['price', 'total_reviews', 'genre_primary']):
    df_scatter = df[(df['price'] > 0) & (df['total_reviews'] > 10)].copy()
    df_scatter['log_total_reviews'] = np.log1p(df_scatter['total_reviews'])
    df_scatter['log_price'] = np.log1p(df_scatter['price'])
    
    sample_size = min(1500, len(df_scatter))
    sample = df_scatter.sample(sample_size, random_state=42)
    
    top_genres = sample['genre_primary'].value_counts().head(8).index
    sample = sample.copy()
    sample['genre_plot'] = sample['genre_primary'].where(
        sample['genre_primary'].isin(top_genres), 'Otros'
    )
    
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    fig.suptitle('Relacion Precio vs Reviews en Steam', fontsize=14, fontweight='bold')
    
    # Panel 1: basic scatter
    axes[0].scatter(
        sample['log_price'], sample['log_total_reviews'],
        alpha=0.3, s=10, color='#2196F3'
    )
    # Trend line: least-squares linear fit of log(reviews + 1) on log(price + 1)
    x_clean = sample['log_price'].dropna()
    y_clean = sample['log_total_reviews'][x_clean.index]
    coefs = np.polyfit(x_clean, y_clean, 1)
    x_line = np.linspace(x_clean.min(), x_clean.max(), 100)
    axes[0].plot(x_line, np.polyval(coefs, x_line), 'r-', linewidth=2, label=f'Tendencia (slope={coefs[0]:.2f})')
    axes[0].set_xlabel('log(Precio + 1)')
    axes[0].set_ylabel('log(Total Reviews + 1)')
    axes[0].set_title('Precio vs Volumen de Reviews', fontweight='bold')
    axes[0].legend()
    
    # Panel 2: by genre
    genre_palette = dict(zip(
        list(top_genres) + ['Otros'],
        sns.color_palette('husl', len(top_genres) + 1)
    ))
    
    for genre in list(top_genres) + ['Otros']:
        mask = sample['genre_plot'] == genre
        if mask.sum() > 0:
            axes[1].scatter(
                sample.loc[mask, 'log_price'],
                sample.loc[mask, 'log_total_reviews'],
                alpha=0.5, s=15, label=genre, color=genre_palette[genre]
            )
    
    axes[1].set_xlabel('log(Precio + 1)')
    axes[1].set_ylabel('log(Total Reviews + 1)')
    axes[1].set_title('Precio vs Reviews por Genero', fontweight='bold')
    axes[1].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz6_precio_vs_reviews.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 6 guardada: {FIGURES_DIR / "viz6_precio_vs_reviews.png"}')


Figura 6 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz6_precio_vs_reviews.png
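
Panel 2 keeps the 8 most frequent genres in the sample and uses `Series.where` to relabel the rest as `'Otros'`. Below, the same pattern is wrapped in a small hypothetical helper, `lump_rare`, so it can be reused in other figures:

```python
import pandas as pd


def lump_rare(s: pd.Series, top_n: int, other: str = 'Otros') -> pd.Series:
    """Keep the `top_n` most frequent values of `s` and replace the rest with `other`."""
    top = s.value_counts().head(top_n).index
    # where() keeps values where the condition is True and substitutes `other` elsewhere
    return s.where(s.isin(top), other)


genres = pd.Series(['Indie', 'Indie', 'Action', 'Action', 'RPG', 'Racing'])
print(lump_rare(genres, top_n=2).tolist())
```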


## Visualization 7: Platform Comparison

In [None]:
plat_cols = [c for c in ['windows', 'mac', 'linux'] if c in df.columns]

if plat_cols:
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    fig.suptitle('Analisis de Soporte por Plataforma', fontsize=14, fontweight='bold')
    
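    # Panel 1: % of games supporting each platform.
    # Platforms overlap (a game can support all three), so a pie chart would show shares
    # of platform-support entries rather than shares of games; bars show % of games instead.
    plat_support = {p.capitalize(): df[p].astype(bool).sum() for p in plat_cols}
    colors_plat = ['#1565C0', '#C62828', '#2E7D32']
    pct_support = [v / len(df) * 100 for v in plat_support.values()]
    bars_plat = axes[0].bar(range(len(plat_cols)), pct_support,
                            color=colors_plat[:len(plat_cols)], edgecolor='white', alpha=0.8)
    for bar, count, pct in zip(bars_plat, plat_support.values(), pct_support):
        axes[0].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
                     f'{count:,}\n({pct:.1f}%)', ha='center', va='bottom', fontsize=9)
    axes[0].set_xticks(range(len(plat_cols)))
    axes[0].set_xticklabels(list(plat_support.keys()))
    axes[0].set_ylim(0, 120)
    axes[0].set_ylabel('% de Juegos')
    axes[0].set_title('Juegos por Plataforma Soportada', fontweight='bold')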
    
    # Panel 2: distribution of the positive-review ratio per platform
    if 'positive_ratio' in df.columns:
        plat_review_data = []
        for plat in plat_cols:
            supported = df[df[plat].astype(bool)]['positive_ratio'].dropna()
            plat_review_data.append(supported)
        
        bp = axes[1].boxplot(plat_review_data, tick_labels=[p.capitalize() for p in plat_cols],
                              patch_artist=True, notch=True)
        for patch, color in zip(bp['boxes'], colors_plat[:len(plat_cols)]):
            patch.set_facecolor(color)
            patch.set_alpha(0.7)
        axes[1].set_ylabel('Ratio de Reviews Positivas')
        axes[1].set_title('Reviews Positivas por Plataforma', fontweight='bold')
        axes[1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:.0%}'))
        axes[1].grid(axis='y', alpha=0.3)
    
    # Panel 3: median price per platform (paid games only)
    if 'price' in df.columns:
        plat_prices = []
        plat_labels_price = []
        for plat in plat_cols:
            prices = df[df[plat].astype(bool)]['price']
            prices = prices[prices > 0]
            plat_prices.append(prices.median())
            plat_labels_price.append(f'{plat.capitalize()}\n(med: ${prices.median():.2f})')
        
        bars = axes[2].bar(range(len(plat_cols)), plat_prices,
                            color=colors_plat[:len(plat_cols)], edgecolor='white', alpha=0.8)
        axes[2].set_xticks(range(len(plat_cols)))
        axes[2].set_xticklabels(plat_labels_price)
        axes[2].set_ylabel('Precio Mediano (USD)')
        axes[2].set_title('Precio Mediano por Plataforma (juegos de pago)', fontweight='bold')
        axes[2].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:.2f}'))
        axes[2].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz7_comparativa_plataformas.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 7 guardada: {FIGURES_DIR / "viz7_comparativa_plataformas.png"}')
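
Because a single game can support several platforms, per-platform counts overlap. Counting exact combinations (Windows only, Windows + Mac, and so on) instead yields a partition in which every game is counted once. The dataset already has a `platform_combo` column, but its encoding is not shown in this notebook. The minimal sketch below therefore recomputes combinations from the boolean flags on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    'windows': [True, True, True, True, False],
    'mac':     [False, True, True, False, True],
    'linux':   [False, False, True, False, False],
})
plat_cols = ['windows', 'mac', 'linux']

# One readable label per row, e.g. "Windows+Mac"; "None" if no flag is set
combo = toy[plat_cols].apply(
    lambda row: '+'.join(p.capitalize() for p in plat_cols if row[p]) or 'None',
    axis=1,
)
combo_counts = combo.value_counts()
print(combo_counts)
```

Unlike the per-platform totals, `combo_counts` always sums to the number of games.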

## Visualization 8: Top Publishers and Developers

In [10]:
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
fig.suptitle('Top Publishers y Developers por Volumen de Juegos', fontsize=14, fontweight='bold')

for i, (col, title) in enumerate([('publisher_primary', 'Top 15 Publishers'),
                                     ('developer_primary', 'Top 15 Developers')]):
    if col in df.columns:
        # Drop unknown/empty entities first, then keep the top 15 (matches the panel title)
        counts = df[col].value_counts()
        top_entities = counts[~counts.index.isin(['', 'Unknown', 'unknown'])].head(15)
        
        palette = sns.color_palette('husl', len(top_entities))
        axes[i].barh(range(len(top_entities)), top_entities.values,
                     color=palette, edgecolor='white')
        axes[i].set_yticks(range(len(top_entities)))
        axes[i].set_yticklabels(
            [str(name)[:30] + '...' if len(str(name)) > 30 else str(name) for name in top_entities.index],
            fontsize=9
        )
        axes[i].set_xlabel('Numero de Juegos')
        axes[i].set_title(title, fontweight='bold')
        axes[i].invert_yaxis()
        
        # Value labels
        for j, val in enumerate(top_entities.values):
            axes[i].text(val + top_entities.max() * 0.01, j, f'{val:,}',
                         va='center', fontsize=8)
        
        axes[i].set_xlim(0, top_entities.max() * 1.15)
        axes[i].grid(axis='x', alpha=0.3)
        axes[i].spines['top'].set_visible(False)
        axes[i].spines['right'].set_visible(False)

plt.tight_layout()
fig.savefig(FIGURES_DIR / 'viz8_top_publishers_developers.png', dpi=150, bbox_inches='tight')
plt.close(fig)
print(f'Figura 8 guardada: {FIGURES_DIR / "viz8_top_publishers_developers.png"}')


Figura 8 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz8_top_publishers_developers.png
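
The bars show absolute volume. A complementary single number is how concentrated the catalog is, for example the share of games released by the top k publishers. A sketch with a hypothetical helper, `top_k_share`, that uses the same exclusion list as the figure:

```python
import pandas as pd


def top_k_share(s: pd.Series, k: int, exclude=('', 'Unknown', 'unknown')) -> float:
    """Percentage of non-missing, non-excluded rows that belong to the k most frequent values."""
    counts = s[~s.isin(exclude)].value_counts()   # value_counts also drops NaN
    if counts.sum() == 0:
        return float('nan')
    return float(counts.head(k).sum() / counts.sum() * 100)


publishers = pd.Series(['Valve', 'Valve', 'EA', 'Indie_1', 'Indie_2', 'Unknown'])
print(top_k_share(publishers, k=2))
```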


## Visualization 9: Metacritic Score Distribution

In [11]:
if 'metacritic_score' in df.columns:
    mc_data = df[df['metacritic_score'] > 0]['metacritic_score']
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    fig.suptitle(f'Distribucion de Metacritic Scores\n(n={len(mc_data):,} juegos con score > 0)',
                 fontsize=13, fontweight='bold')
    
    # Panel 1: histogram with KDE
    sns.histplot(mc_data, bins=30, kde=True, ax=axes[0], color='#9C27B0', alpha=0.7,
                 line_kws={'linewidth': 2})
    axes[0].axvline(mc_data.mean(), color='red', linestyle='--', linewidth=2,
                     label=f'Media: {mc_data.mean():.1f}')
    axes[0].axvline(mc_data.median(), color='orange', linestyle='--', linewidth=2,
                     label=f'Mediana: {mc_data.median():.1f}')
    axes[0].axvline(75, color='green', linestyle=':', linewidth=2, alpha=0.7,
                     label='Nota 75 (referencia)')
    axes[0].set_xlabel('Metacritic Score')
    axes[0].set_ylabel('Frecuencia')
    axes[0].set_title('Histograma de Scores', fontweight='bold')
    axes[0].legend()
    
    # Panel 2: bars by score range
    score_ranges = pd.cut(mc_data, bins=[0, 50, 60, 70, 80, 90, 101],
                           labels=['0-50 (Flop)', '51-60 (Malo)', '61-70 (Regular)',
                                   '71-80 (Bueno)', '81-90 (Muy Bueno)', '91-100 (Excepcional)'])
    score_counts = score_ranges.value_counts(sort=False)
    
    range_colors = ['#F44336', '#FF9800', '#FFC107', '#8BC34A', '#4CAF50', '#2196F3']
    bars = axes[1].bar(range(len(score_counts)), score_counts.values,
                        color=range_colors, edgecolor='white', alpha=0.85)
    axes[1].set_xticks(range(len(score_counts)))
    axes[1].set_xticklabels(score_counts.index, rotation=35, ha='right', fontsize=9)
    axes[1].set_ylabel('Numero de Juegos')
    axes[1].set_title('Juegos por Rango de Metacritic Score', fontweight='bold')
    
    for bar, val in zip(bars, score_counts.values):
        pct = val / len(mc_data) * 100
        axes[1].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + score_counts.max() * 0.01,
                     f'{val:,}\n({pct:.1f}%)', ha='center', va='bottom', fontsize=8)
    
    axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz9_metacritic_scores.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    
    pct_with_score = len(mc_data) / len(df) * 100
    print(f'Figura 9 guardada: {FIGURES_DIR / "viz9_metacritic_scores.png"}')
    print(f'\nInsight: Solo el {pct_with_score:.1f}% de juegos en Steam tiene Metacritic score.')
    print(f'Score promedio: {mc_data.mean():.1f} | Mediana: {mc_data.median():.1f}')


Figura 9 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz9_metacritic_scores.png

Insight: Solo el 3.5% de juegos en Steam tiene Metacritic score.
Score promedio: 73.9 | Mediana: 76.0
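
By default `pd.cut` uses right-closed intervals `(a, b]`. With the edges `[0, 50, 60, 70, 80, 90, 101]`, this places the boundary scores as follows:

- 50 falls in '0-50', 60 in '51-60', and 100 in '91-100'.
- 0 falls outside every bin and becomes NaN, which is one reason the cell keeps only scores above 0.

A quick check with shortened labels:

```python
import pandas as pd

bins = [0, 50, 60, 70, 80, 90, 101]
labels = ['0-50', '51-60', '61-70', '71-80', '81-90', '91-100']

scores = pd.Series([0, 50, 51, 60, 100])
ranges = pd.cut(scores, bins=bins, labels=labels)  # right=True by default: (a, b]
print(ranges.tolist())
```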


## Visualization 10: Boxplots of Price and Reviews by Genre

In [None]:
if all(c in df.columns for c in ['genre_primary', 'price']):
    top_genres_box = df['genre_primary'].value_counts().head(10).index
    df_box = df[df['genre_primary'].isin(top_genres_box)].copy()
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 12))
    fig.suptitle('Distribucion de Precio y Reviews por Genero (Top 10)', fontsize=14, fontweight='bold')
    
    # Panel 1: price by genre (paid games only). Order genres by the median of the
    # same paid games that are plotted, so the order matches the boxes.
    df_paid = df_box[df_box['price'] > 0]
    order_price = (df_paid.groupby('genre_primary')['price']
                   .median()
                   .sort_values(ascending=False)
                   .index.tolist())
    if len(df_paid) > 0:
        sns.boxplot(
            data=df_paid,
            x='genre_primary',
            y='price',
            hue='genre_primary',
            order=order_price,
            palette='husl',
            legend=False,
            ax=axes[0],
            showfliers=False,
            notch=True
        )
        axes[0].set_xlabel('')
        axes[0].set_ylabel('Precio (USD)')
        axes[0].set_title('Precio por Genero (juegos de pago, sin outliers extremos)', fontweight='bold')
        axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:.0f}'))
        axes[0].tick_params(axis='x', rotation=30)
        axes[0].grid(axis='y', alpha=0.3)
    
    # Panel 2: positive-review ratio by genre (violin plot)
    if 'positive_ratio' in df.columns:
        if 'total_reviews' in df_box.columns:
            df_box_reviews = df_box[df_box['total_reviews'] >= 10].copy()
        else:
            df_box_reviews = df_box.copy()
        
        order_reviews = (df_box_reviews.groupby('genre_primary')['positive_ratio']
                         .median()
                         .sort_values(ascending=False)
                         .index.tolist())
        
        sns.violinplot(
            data=df_box_reviews,
            x='genre_primary',
            y='positive_ratio',
            hue='genre_primary',
            order=order_reviews,
            palette='husl',
            legend=False,
            ax=axes[1],
            inner='quartile',
            cut=0
        )
        axes[1].set_xlabel('Genero')
        axes[1].set_ylabel('Ratio de Reviews Positivas')
        axes[1].set_title('Distribucion de Reviews Positivas por Genero (violin plot)', fontweight='bold')
        axes[1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:.0%}'))
        axes[1].tick_params(axis='x', rotation=30)
        axes[1].grid(axis='y', alpha=0.3)
        axes[1].axhline(0.7, color='red', linestyle='--', alpha=0.5, label='70% (Steam "Mostly Positive" threshold)')
        axes[1].legend()
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz10_boxplots_genero.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 10 guardada: {FIGURES_DIR / "viz10_boxplots_genero.png"}')
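
The price boxplot above uses `notch=True`. As far as we know, matplotlib's default (non-bootstrapped) notch spans median ± 1.57 × IQR / √n, and seaborn draws its boxes through matplotlib. The sketch below recomputes that interval with NumPy for two hypothetical genres; the `median_notch` helper and the toy prices are illustrative and not part of the project's `src` package.

```python
import numpy as np


def median_notch(values, factor: float = 1.57):
    """Approximate notch bounds: median +/- factor * IQR / sqrt(n).

    Mirrors the default (non-bootstrap) notch formula used by matplotlib
    boxplots, to our understanding. NaN values are dropped first.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    half_width = factor * (q3 - q1) / np.sqrt(len(x))
    return med - half_width, med + half_width


# Hypothetical paid prices (USD) for two genres
genre_a = [5, 10, 15, 20, 25]
genre_b = [30, 35, 40, 45, 50]

lo_a, hi_a = median_notch(genre_a)
lo_b, hi_b = median_notch(genre_b)
print(f'A: [{lo_a:.2f}, {hi_a:.2f}]  B: [{lo_b:.2f}, {hi_b:.2f}]')

# Two intervals overlap when the larger lower bound is <= the smaller upper bound
overlap = max(lo_a, lo_b) <= min(hi_a, hi_b)
print('notches overlap' if overlap else 'notches do not overlap')
```

Non-overlapping notches are commonly read as rough evidence (about 95%) that two medians differ. Note that `showfliers=False` only hides the outliers: they still count toward n and the IQR.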

## Visualization 11: Evolution of Paid-Game Prices by Year (Median and Mean)

In [13]:
if all(c in df.columns for c in ['release_year', 'price']):
    yearly_price = df[
        (df['release_year'] >= 2005) & (df['price'] > 0)
    ].groupby('release_year').agg(
        price_median=('price', 'median'),
        price_mean=('price', 'mean'),
        price_q25=('price', lambda x: x.quantile(0.25)),
        price_q75=('price', lambda x: x.quantile(0.75)),
        n_games=('price', 'count')
    ).reset_index()
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 9))
    fig.suptitle('Evolucion del Precio de Juegos de Pago en Steam', fontsize=14, fontweight='bold')
    
    # Panel 1: median with the interquartile (Q25-Q75) band
    axes[0].fill_between(
        yearly_price['release_year'],
        yearly_price['price_q25'],
        yearly_price['price_q75'],
        alpha=0.3, color='#2196F3', label='Rango IQR (Q25-Q75)'
    )
    axes[0].plot(yearly_price['release_year'], yearly_price['price_median'],
                  color='#1565C0', linewidth=2.5, marker='o', markersize=4, label='Mediana')
    axes[0].plot(yearly_price['release_year'], yearly_price['price_mean'],
                  color='#FF5722', linewidth=1.5, linestyle='--', label='Media')
    axes[0].set_ylabel('Precio (USD)')
    axes[0].set_title('Precio Mediano y Media por Año de Lanzamiento', fontweight='bold')
    axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'${x:.0f}'))
    axes[0].legend()
    axes[0].grid(alpha=0.3)
    
    # Panel 2: number of paid games released per year
    axes[1].bar(yearly_price['release_year'], yearly_price['n_games'],
                 color='#4CAF50', alpha=0.7, edgecolor='white')
    axes[1].set_xlabel('Año de Lanzamiento')
    axes[1].set_ylabel('Juegos de Pago Lanzados')
    axes[1].set_title('Numero de Juegos de Pago por Año', fontweight='bold')
    axes[1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:,.0f}'))
    axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz11_evolucion_precio_anio.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 11 guardada: {FIGURES_DIR / "viz11_evolucion_precio_anio.png"}')


Figura 11 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz11_evolucion_precio_anio.png
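The yearly table in Visualization 11 computes each quantile with its own lambda, which pandas evaluates group by group in Python. A possible alternative, sketched here on hypothetical toy prices, asks `SeriesGroupBy.quantile` for all three quantiles in a single call; both paths use linear interpolation by default, so the numbers should agree.

```python
import pandas as pd

# Toy paid-game prices (hypothetical values)
toy = pd.DataFrame({
    'release_year': [2010, 2010, 2010, 2011, 2011],
    'price':        [5.0, 10.0, 20.0, 8.0, 12.0],
})

grouped = toy.groupby('release_year')['price']

# One call for all three quantiles; unstack() turns the quantile level into columns
quantiles = (grouped.quantile([0.25, 0.5, 0.75])
                    .unstack()
                    .rename(columns={0.25: 'price_q25',
                                     0.5: 'price_median',
                                     0.75: 'price_q75'}))

# Named aggregation on a SeriesGroupBy for the remaining statistics
yearly_price = (quantiles
                .join(grouped.agg(price_mean='mean', n_games='count'))
                .reset_index())
print(yearly_price)
```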


## Visualization 12: Free vs Paid - Comparative Analysis

In [14]:
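if 'is_free' in df.columns:
    # Ensure a clean boolean (no NaN) for the analysis
    is_free_clean = df['is_free'].fillna(False).astype(bool)
    # assign() returns a new DataFrame instead of inserting a column into df in
    # place, which sidesteps pandas warnings on sliced or fragmented frames
    df = df.assign(price_category=is_free_clean.map({True: 'Gratuito', False: 'De Pago'}))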
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    fig.suptitle('Juegos Gratuitos vs De Pago: Analisis Comparativo', fontsize=14, fontweight='bold')
    
    # Panel 1: donut chart of the free/paid split
    free_count = int(is_free_clean.sum())
    paid_count = int((~is_free_clean).sum())
    
    axes[0, 0].pie(
        [free_count, paid_count],
        labels=[
            f'Gratuitos\n{free_count:,} ({free_count/len(df)*100:.1f}%)',
            f'De Pago\n{paid_count:,} ({paid_count/len(df)*100:.1f}%)'
        ],
        colors=['#4CAF50', '#2196F3'],
        startangle=90,
        pctdistance=0.75,
        wedgeprops={'width': 0.6}
    )
    axes[0, 0].set_title('Proporcion Free vs Paid', fontweight='bold')
    
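    # Panel 2: mean positive-review ratio. groupby() sorts labels alphabetically
    # ('De Pago' first), so reindex to match the donut's order and colors
    if 'positive_ratio' in df.columns:
        review_by_type = (df.groupby('price_category')['positive_ratio']
                          .mean()
                          .reindex(['Gratuito', 'De Pago']))
        bars = axes[0, 1].bar(review_by_type.index, review_by_type.values,
                              color=['#4CAF50', '#2196F3'], edgecolor='white', alpha=0.85)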
        axes[0, 1].set_ylabel('Ratio Positivo Promedio')
        axes[0, 1].set_title('Review Score Promedio por Tipo', fontweight='bold')
        axes[0, 1].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, p: f'{x:.0%}'))
        for bar in bars:
            axes[0, 1].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.002,
                             f'{bar.get_height():.1%}', ha='center', va='bottom', fontweight='bold')
        axes[0, 1].grid(axis='y', alpha=0.3)
    
    # Panel 3: median playtime (minutes -> hours), in the same order/colors as the donut
    if 'average_playtime_forever' in df.columns:
        playtime_by_type = (df.groupby('price_category')['average_playtime_forever']
                            .median()
                            .reindex(['Gratuito', 'De Pago']) / 60)
        bars = axes[1, 0].bar(playtime_by_type.index, playtime_by_type.values,
                              color=['#4CAF50', '#2196F3'], edgecolor='white', alpha=0.85)
        axes[1, 0].set_ylabel('Tiempo de Juego Mediano (horas)')
        axes[1, 0].set_title('Playtime Mediano por Tipo', fontweight='bold')
        for bar in bars:
            axes[1, 0].text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.1,
                             f'{bar.get_height():.1f}h', ha='center', va='bottom', fontweight='bold')
        axes[1, 0].grid(axis='y', alpha=0.3)
    
    # Panel 4: releases per year (since 2008), stacked by type
    if 'release_year' in df.columns:
        yearly_type = (df[df['release_year'] >= 2008]
                       .groupby(['release_year', 'price_category'])
                       .size()
                       .unstack(fill_value=0))
        
        if 'Gratuito' in yearly_type.columns and 'De Pago' in yearly_type.columns:
            bottom = yearly_type['De Pago'].values
            axes[1, 1].bar(yearly_type.index, yearly_type['De Pago'],
                            color='#2196F3', alpha=0.8, label='De Pago')
            axes[1, 1].bar(yearly_type.index, yearly_type['Gratuito'],
                            bottom=bottom, color='#4CAF50', alpha=0.8, label='Gratuito')
            axes[1, 1].set_xlabel('Año')
            axes[1, 1].set_ylabel('Numero de Juegos')
            axes[1, 1].set_title('Lanzamientos por Año y Tipo', fontweight='bold')
            axes[1, 1].legend()
            axes[1, 1].tick_params(axis='x', rotation=45)
            axes[1, 1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    fig.savefig(FIGURES_DIR / 'viz12_free_vs_paid.png', dpi=150, bbox_inches='tight')
    plt.close(fig)
    print(f'Figura 12 guardada: {FIGURES_DIR / "viz12_free_vs_paid.png"}')




Figura 12 guardada: C:\Users\Christian Ruiz\Maestria_DS\Gestion_Datos\reports\figures\viz12_free_vs_paid.png
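`fillna(False).astype(bool)` is safe when `is_free` holds real booleans and NaN. If the column ever arrived as text (for example `'False'` strings from a hand-edited CSV; this is an assumption, not something observed here), `astype(bool)` would turn every non-empty string into `True`. A more defensive sketch, with a hypothetical `to_clean_bool` helper and `TRUTHY` set:

```python
import numpy as np
import pandas as pd

# Assumption: textual encodings that should count as "free"
TRUTHY = {'true', '1', 'yes'}


def to_clean_bool(s: pd.Series) -> pd.Series:
    """Map mixed bool/str/number values to bool; missing or unknown -> False."""
    return s.map(lambda v: pd.notna(v) and str(v).strip().lower() in TRUTHY).astype(bool)


# Hypothetical raw column mixing real booleans, text and missing values
raw = pd.Series([True, False, np.nan, 'False', 'true', '0', 1], dtype=object)

naive = raw.fillna(False).astype(bool)  # non-empty strings such as 'False' become True
clean = to_clean_bool(raw)
print(naive.tolist())  # [True, False, False, True, True, True, True]
print(clean.tolist())  # [True, False, False, False, True, False, True]
```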


## Summary of Exported Figures

In [15]:
# List all exported figures
figuras_generadas = sorted(FIGURES_DIR.glob('viz*.png'))

print('=== FIGURAS EXPORTADAS ===')
print(f'Total: {len(figuras_generadas)}\n')

descripciones = {
    'viz1_distribucion_precios.png': 'Histograma + KDE de precios (todos y solo de pago)',
    'viz2_top_generos.png': 'Top 15 generos primarios con conteos y porcentajes',
    'viz3_lanzamientos_por_anio.png': 'Lanzamientos por año y catalogo acumulado',
    'viz4_distribucion_reviews.png': 'Ratio de reviews, scatter positivas vs negativas, pie de categorias',
    'viz5_correlacion_numericas.png': 'Heatmap de correlaciones entre variables numericas',
    'viz6_precio_vs_reviews.png': 'Scatter log-precio vs log-reviews con tendencia y por genero',
    'viz7_comparativa_plataformas.png': 'Soporte, reviews y precio por plataforma (Win/Mac/Lin)',
    'viz8_top_publishers_developers.png': 'Top 12 publishers y developers por volumen de juegos',
    'viz9_metacritic_scores.png': 'Distribucion de scores Metacritic y rangos de calidad',
    'viz10_boxplots_genero.png': 'Boxplot precio y violin de reviews por genero',
    'viz11_evolucion_precio_anio.png': 'Evolucion de precio mediano y media por año con IQR',
    'viz12_free_vs_paid.png': 'Comparativa free vs paid: reviews, playtime, lanzamientos'
}

for fig_path in figuras_generadas:
    desc = descripciones.get(fig_path.name, 'Sin descripcion')
    size_kb = fig_path.stat().st_size / 1024
    print(f'[{fig_path.name}]')
    print(f'  Descripcion: {desc}')
    print(f'  Tamano: {size_kb:.1f} KB')
    print()


=== FIGURAS EXPORTADAS ===
Total: 12

[viz10_boxplots_genero.png]
  Descripcion: Boxplot precio y violin de reviews por genero
  Tamano: 281.2 KB

[viz11_evolucion_precio_anio.png]
  Descripcion: Evolucion de precio mediano y media por año con IQR
  Tamano: 142.3 KB

[viz12_free_vs_paid.png]
  Descripcion: Comparativa free vs paid: reviews, playtime, lanzamientos
  Tamano: 136.4 KB

[viz1_distribucion_precios.png]
  Descripcion: Histograma + KDE de precios (todos y solo de pago)
  Tamano: 111.8 KB

[viz2_top_generos.png]
  Descripcion: Top 15 generos primarios con conteos y porcentajes
  Tamano: 104.4 KB

[viz3_lanzamientos_por_anio.png]
  Descripcion: Lanzamientos por año y catalogo acumulado
  Tamano: 108.4 KB

[viz4_distribucion_reviews.png]
  Descripcion: Ratio de reviews, scatter positivas vs negativas, pie de categorias
  Tamano: 280.5 KB

[viz5_correlacion_numericas.png]
  Descripcion: Heatmap de correlaciones entre variables numericas
  Tamano: 145.5 KB

[viz6_precio_vs_reviews
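
`sorted()` on the paths compares filenames as strings, which is why the listing above prints `viz10`-`viz12` before `viz1`. A minimal sketch of a numeric sort key, assuming every figure keeps the `vizN_` prefix (the `viz_number` helper is hypothetical):

```python
import re
from pathlib import Path


def viz_number(path: Path) -> int:
    """Extract N from 'vizN_...' filenames; names without that prefix sort last."""
    match = re.match(r'viz(\d+)_', path.name)
    return int(match.group(1)) if match else 10**9


names = ['viz10_boxplots_genero.png', 'viz1_distribucion_precios.png',
         'viz2_top_generos.png', 'viz12_free_vs_paid.png']
paths = [Path(n) for n in names]

lexicographic = [p.name for p in sorted(paths)]
numeric = [p.name for p in sorted(paths, key=viz_number)]
print(lexicographic)  # viz10, viz12, viz1, viz2
print(numeric)        # viz1, viz2, viz10, viz12
```

In the summary cell this would become `sorted(FIGURES_DIR.glob('viz*.png'), key=viz_number)`.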

## Key Insights

### Prices:
- Most Steam games are free or low-priced (< $10)
- For paid games, the most common price point falls in the $5-$20 range
- Games priced above $40 (typically AAA titles) are a small minority of the catalog
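The price bullets can be checked directly against `df['price']`. One way to do it is sketched below; the band edges ($10, $40) come from the bullets, while how the edges are assigned (exactly $10 counts as `$10-$40`) is our assumption, and the toy prices are hypothetical.

```python
import numpy as np
import pandas as pd


def price_band_shares(price: pd.Series) -> pd.Series:
    """Share of games per price band; missing prices are excluded."""
    p = price.dropna()
    bands = np.select(
        [p == 0, p < 10, p <= 40],        # the first matching condition wins
        ['Free', '< $10', '$10-$40'],
        default='> $40',
    )
    return pd.Series(bands, index=p.index).value_counts(normalize=True)


# Hypothetical prices in USD
prices = pd.Series([0, 0, 4.99, 9.99, 14.99, 19.99, 59.99, 0, np.nan])
shares = price_band_shares(prices)
print(shares)
print(f"Free or < $10: {shares.get('Free', 0) + shares.get('< $10', 0):.1%}")
```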

### Genres:
- **Indie** and **Action** dominate the Steam catalog, together accounting for more than 40% of games
- **RPG** and **Adventure** are the next most frequent genres
- Genre diversity has increased over time
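A quick way to check the combined Indie + Action share, sketched on hypothetical data. It assumes the share is measured on `genre_primary` (the column used in Visualization 10) and that games with no genre are left out of the denominator.

```python
import pandas as pd


def combined_share(genres: pd.Series, selected) -> float:
    """Fraction of games (with a known genre) whose primary genre is in `selected`."""
    shares = genres.value_counts(normalize=True)  # NaN/None excluded by default
    return float(shares.reindex(list(selected), fill_value=0).sum())


# Hypothetical primary genres
genres = pd.Series(['Indie', 'Indie', 'Action', 'RPG', 'Indie',
                    'Adventure', 'Action', 'Casual', 'Indie', None])
print(f"Indie + Action: {combined_share(genres, ['Indie', 'Action']):.0%}")
```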

### Releases:
- The Steam catalog grew sharply from 2014 onward, and growth continued after Steam Direct replaced Steam Greenlight in 2017
- Annual releases peaked around 2018-2020
- The most recent years show release volume leveling off
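To pin down the peak year and the slowdown, yearly counts and year-over-year growth can be computed from `release_year`. A minimal sketch on hypothetical counts; it assumes consecutive years are present, since `shift(1)` compares against the previous row, not necessarily the previous calendar year.

```python
import pandas as pd


def yearly_release_stats(years: pd.Series, start: int = 2008) -> pd.DataFrame:
    """Releases per year from `start` onward, with year-over-year growth."""
    counts = years[years >= start].value_counts().sort_index()
    stats = counts.rename('n_games').to_frame()
    # Growth relative to the previous row (assumes no missing years)
    stats['yoy_growth'] = stats['n_games'] / stats['n_games'].shift(1) - 1
    return stats


# Hypothetical release years
years = pd.Series([2016] * 3 + [2017] * 6 + [2018] * 9 + [2019] * 9 + [2020] * 8)
stats = yearly_release_stats(years)
print(stats)
print('Peak year:', stats['n_games'].idxmax())  # first year reaching the maximum
```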

### Reviews:
- Most games with enough reviews to be meaningful (e.g. at least 10, the filter used in Visualization 10) have a positive ratio above 70%
- Free games tend to receive more polarized reviews
- Review volume is strongly correlated with playtime
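The first review bullet depends on what counts as "enough reviews". This sketch makes the threshold explicit, reusing the `total_reviews >= 10` filter and the 70% line from Visualization 10; `share_positive` and the toy data are hypothetical.

```python
import pandas as pd


def share_positive(df: pd.DataFrame, min_reviews: int = 10, threshold: float = 0.7) -> float:
    """Fraction of games with >= min_reviews reviews whose positive_ratio exceeds threshold.

    A NaN ratio compares as False, so it counts as "not above".
    """
    eligible = df[df['total_reviews'] >= min_reviews]
    return float((eligible['positive_ratio'] > threshold).mean())


toy = pd.DataFrame({
    'total_reviews':  [5, 12, 40, 100, 8, 300],
    'positive_ratio': [0.2, 0.9, 0.75, 0.6, 0.95, 0.8],
})
print(f'{share_positive(toy):.0%}')  # 75%
```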

### Platforms:
- Windows dominates, with ~95% of the catalog supporting it
- Mac and Linux are supported by ~25-30% of games
- Multiplatform games tend to have better reviews
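Platform support shares and the multiplatform comparison could be computed as follows. The column names `windows`, `mac` and `linux` are assumptions (Visualization 7's actual columns are not shown in this section), and missing flags are treated as no support.

```python
import numpy as np
import pandas as pd

PLATFORM_COLS = ('windows', 'mac', 'linux')  # hypothetical column names


def platform_support(df: pd.DataFrame, cols=PLATFORM_COLS) -> pd.Series:
    """Share of games flagged as supporting each platform (NaN counts as no support)."""
    return df[list(cols)].eq(True).mean()


def median_ratio_by_multiplatform(df: pd.DataFrame, cols=PLATFORM_COLS) -> pd.Series:
    """Median positive_ratio for games on all platforms (True) vs the rest (False)."""
    multi = df[list(cols)].eq(True).all(axis=1).rename('multiplatform')
    return df.groupby(multi)['positive_ratio'].median()


toy = pd.DataFrame({
    'windows': [True, True, True, True],
    'mac':     [True, False, True, False],
    'linux':   [True, False, False, np.nan],
    'positive_ratio': [0.9, 0.7, 0.8, 0.6],
})
print(platform_support(toy))
print(median_ratio_by_multiplatform(toy))
```

A median comparison like this only describes the catalog; it does not show that multiplatform support causes better reviews.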

### Metacritic:
- Only a small fraction of the catalog has a Metacritic score (3.5% in Visualization 9), mostly well-known titles
- Scores tend to cluster in the 65-85 range (mean 73.9, median 76.0)
- Indie games rarely appear on Metacritic despite dominating the catalog
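A sketch of the Metacritic figures, assuming a score column where 0 or NaN means "no score" (the filter behind `mc_data` in Visualization 9 is not shown here, so this is a guess). `metacritic_summary` is a hypothetical helper and the scores are toy values.

```python
import numpy as np
import pandas as pd


def metacritic_summary(scores: pd.Series, low: int = 65, high: int = 85) -> dict:
    """Coverage and spread of Metacritic scores; assumes 0 or NaN means 'no score'."""
    rated = scores[scores > 0]  # NaN > 0 is False, so NaN is excluded too
    return {
        'coverage_pct': 100 * len(rated) / len(scores),
        'share_in_range': float(rated.between(low, high).mean()),  # inclusive bounds
        'median': float(rated.median()),
    }


scores = pd.Series([0, np.nan, 70, 80, 90, 0, 60, 0, 0, 0])
print(metacritic_summary(scores))
# {'coverage_pct': 40.0, 'share_in_range': 0.5, 'median': 75.0}
```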