# 🚀 Enhanced Future CCU Prediction Model (with Fortnite API)

**ENHANCED APPROACH:** Predict future CCU using both fncreate.gg and the Fortnite Ecosystem API!

**Target:** Future CCU (`future_ccu_7d`): the mean CCU over the final 15% of each map's 7-day series, predicted from the first 85%  
**Data Sources:**
- fncreate.gg: Creator stats, CCU trends, discovery
- Fortnite API: Retention, engagement, virality

**NEW FEATURES:**
- Retention rate (% players who return)
- Session engagement (avg minutes per player)
- Play frequency (repeat play behavior)
- Virality score (favorites + recommendations)

**Expected R²:** 0.80+ (improvement from 0.76!)


## 1. Import Libraries


In [18]:
import json
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import joblib

# Set display options
pd.set_option('display.max_columns', None)
sns.set_style('whitegrid')

print("✅ Libraries imported successfully")


✅ Libraries imported successfully


## 2. Set Up Data Paths


In [19]:
# Paths
FNCREATE_DIR = Path('../data/raw')
FORTNITE_DIR = Path('../data/fortnite_metrics')
MODEL_DIR = Path('../data/models')
MODEL_DIR.mkdir(exist_ok=True, parents=True)

print(f"📁 fncreate.gg data: {FNCREATE_DIR}")
print(f"📁 Fortnite API data: {FORTNITE_DIR}")
print(f"📁 Model output: {MODEL_DIR}")

# Count files
fncreate_files = list(FNCREATE_DIR.glob('map_*.json'))
fortnite_files = list(FORTNITE_DIR.glob('fortnite_*.json'))

print(f"\n📊 fncreate.gg maps: {len(fncreate_files)}")
print(f"📊 Fortnite API maps: {len(fortnite_files)}")


📁 fncreate.gg data: ../data/raw
📁 Fortnite API data: ../data/fortnite_metrics
📁 Model output: ../data/models

📊 fncreate.gg maps: 962
📊 Fortnite API maps: 962


## 3. Feature Extraction Functions

We'll create two functions:
1. **extract_fncreate_features()** - Extract from fncreate.gg data (11 features)
2. **extract_fortnite_features()** - Extract from Fortnite API data (9 NEW features)


In [20]:
def extract_fncreate_features(map_file):
    """
    Extract features from fncreate.gg data.
    Returns dict with 11 features + target.
    """
    with open(map_file, 'r') as f:
        data = json.load(f)
    
    map_data = data.get('map_data', {})
    stats_7d_raw = data.get('stats_7d', {})
    
    # Extract map code
    map_code = map_data.get('mnemonic', '')
    
    # Basic features
    features = {
        'map_code': map_code,
        'creator_followers': map_data.get('creator', {}).get('followers', 0),
        'in_discovery': 1 if map_data.get('discovery', False) else 0,
        'xp_enabled': 1 if map_data.get('xpEnabled', False) else 0,
        'num_tags': len(map_data.get('tags', [])),
        'max_players': map_data.get('maxPlayers', 0),
        'version': map_data.get('version', 1),
    }
    
    # Time-series features from stats_7d
    # stats_7d is a dict with structure: {"success": true, "data": {"stats": [...]}}
    ccu_values = []
    if stats_7d_raw and stats_7d_raw.get('success'):
        stats_data = stats_7d_raw.get('data', {})
        ccu_values = stats_data.get('stats', [])
    
    if ccu_values and len(ccu_values) >= 50:
        # Split into training (first 85%) and prediction (last 15%)
        # This ensures we're truly predicting the FUTURE
        split_point = int(len(ccu_values) * 0.85)
        
        training_data = ccu_values[:split_point]  # Use first 85% for features
        future_data = ccu_values[split_point:]    # Last 15% is "future"
        
        # Features from TRAINING data only (past)
        features['baseline_ccu'] = np.mean(training_data)
        
        # Target: Average CCU in the "future" period
        features['future_ccu_7d'] = np.mean(future_data)
        
        # Trend slope (from training data)
        if len(training_data) > 1:
            x = np.arange(len(training_data))
            slope, _ = np.polyfit(x, training_data, 1)
            features['trend_slope'] = slope
        else:
            features['trend_slope'] = 0
        
        # Recent momentum (last 20% vs first 20% of training data)
        recent_idx = int(len(training_data) * 0.8)
        early_idx = int(len(training_data) * 0.2)
        recent_avg = np.mean(training_data[recent_idx:])
        early_avg = np.mean(training_data[:early_idx])
        features['recent_momentum'] = recent_avg - early_avg if early_avg > 0 else 0
        
        # Volatility (from training data)
        features['volatility'] = np.std(training_data)
    else:
        features['baseline_ccu'] = 0
        features['future_ccu_7d'] = 0
        features['trend_slope'] = 0
        features['recent_momentum'] = 0
        features['volatility'] = 0
    
    # Map age
    created_at = map_data.get('createdAt')
    if created_at:
        try:
            created_date = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
            features['map_age_days'] = (datetime.now() - created_date).days
        except:
            features['map_age_days'] = 0
    else:
        features['map_age_days'] = 0
    
    return features

print("✅ fncreate.gg feature extractor created")


✅ fncreate.gg feature extractor created
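
To make the 85/15 split concrete, here is a minimal sketch on a synthetic series. The series `ccu`, its length of 100, and its linear shape are invented for illustration. Only the split ratio and the `np.polyfit` slope mirror the extractor above.

```python
import numpy as np

# Hypothetical CCU series: 100 samples rising linearly from 100 to 199.
ccu = np.arange(100, 200, dtype=float)

split_point = int(len(ccu) * 0.85)       # first 85% feeds the features
past, future = ccu[:split_point], ccu[split_point:]

baseline_ccu = past.mean()               # feature: mean of the past window
future_ccu = future.mean()               # target: mean of the held-out tail

# A degree-1 fit returns (slope, intercept); slope is CCU change per sample.
slope, intercept = np.polyfit(np.arange(len(past)), past, 1)

print(f"past samples: {len(past)}, future samples: {len(future)}")
print(f"baseline_ccu: {baseline_ccu}, future_ccu: {future_ccu}")
print(f"trend_slope: {slope:.3f}")
```

On this series the past window averages 142 and the held-out tail averages 192, so a model that only saw `baseline_ccu` would need `trend_slope` to recover the difference.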


In [21]:
def extract_fortnite_features(fortnite_file):
    """
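    Extract features from Fortnite Ecosystem API data.
    Returns a dict with map_code and 10 metrics, or None if the fetch failed.
    Nine of the metrics are used as model features; total_minutes_played
    only feeds engagement_per_player.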
    """
    with open(fortnite_file, 'r') as f:
        data = json.load(f)
    
    # Check if data was successfully fetched
    if data.get('status') != 'success':
        return None
    
    metrics = data.get('metrics', {})
    
    features = {
        'map_code': data.get('map_code', '')
    }
    
    # Helper: get average value from time-series
    def get_avg(metric_name, default=0):
        metric_data = metrics.get(metric_name, [])
        if metric_data:
            values = [m.get('value', 0) for m in metric_data]
            return np.mean(values) if values else default
        return default
    
    # Helper: get latest value
    def get_latest(metric_name, default=0):
        metric_data = metrics.get(metric_name, [])
        if metric_data:
            return metric_data[-1].get('value', default)
        return default
    
    # Extract features
    features['avg_session_length'] = get_avg('averageMinutesPerPlayer', 0)
    features['retention_rate'] = get_latest('retention', 0)
    features['favorites_count'] = get_latest('favorites', 0)
    features['recommendations_count'] = get_latest('recommendations', 0)
    features['unique_players'] = get_latest('uniquePlayers', 0)
    features['total_plays'] = get_latest('plays', 0)
    features['total_minutes_played'] = get_latest('minutesPlayed', 0)
    
    # Derived features
    # Play frequency: avg plays per player
    if features['unique_players'] > 0:
        features['play_frequency'] = features['total_plays'] / features['unique_players']
    else:
        features['play_frequency'] = 0
    
    # Virality score
    features['virality_score'] = features['favorites_count'] + features['recommendations_count']
    
    # Engagement per player
    if features['unique_players'] > 0:
        features['engagement_per_player'] = features['total_minutes_played'] / features['unique_players']
    else:
        features['engagement_per_player'] = 0
    
    return features

print("✅ Fortnite API feature extractor created")


✅ Fortnite API feature extractor created
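
As a quick check on the derived features, the sketch below applies the same formulas to one made-up map. The numbers in `latest` and the helper `safe_ratio` are hypothetical; the metric names follow the keys read by the extractor above.

```python
# Hypothetical latest values for one map (not taken from the dataset).
latest = {
    "favorites": 120,
    "recommendations": 30,
    "uniquePlayers": 2000,
    "plays": 3000,
    "minutesPlayed": 50000,
}


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0 when the denominator is not positive."""
    return numerator / denominator if denominator > 0 else 0


# Same definitions as in extract_fortnite_features.
play_frequency = safe_ratio(latest["plays"], latest["uniquePlayers"])
engagement_per_player = safe_ratio(latest["minutesPlayed"], latest["uniquePlayers"])
virality_score = latest["favorites"] + latest["recommendations"]

print(f"play_frequency: {play_frequency}")                # plays per unique player
print(f"engagement_per_player: {engagement_per_player}")  # minutes per unique player
print(f"virality_score: {virality_score}")                # favorites + recommendations
```

Note that `virality_score` is an exact sum of two other features, so a linear model receives it as redundant information.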


## 4. Load and Merge Data

Now we'll loop through all maps, extract features from both sources, and merge them!


In [22]:
print("🔄 Loading and merging datasets...\n")

all_features = []

for i, fncreate_file in enumerate(fncreate_files, 1):
    if i % 100 == 0:
        print(f"  Processed {i}/{len(fncreate_files)} maps...")
    
    # Get corresponding Fortnite file
    # map_8530_0110_2817.json -> fortnite_8530_0110_2817.json
    fortnite_filename = fncreate_file.name.replace('map_', 'fortnite_')
    fortnite_file = FORTNITE_DIR / fortnite_filename
    
    # Extract fncreate features
    try:
        fncreate_feat = extract_fncreate_features(fncreate_file)
    except Exception:
        # Skip unreadable or malformed files (a bare except would also swallow Ctrl+C)
        continue
    
    # Extract Fortnite features (if available)
    if fortnite_file.exists():
        try:
            fortnite_feat = extract_fortnite_features(fortnite_file)
            if fortnite_feat:
                # Merge both feature sets
                merged = {**fncreate_feat, **fortnite_feat}
                all_features.append(merged)
        except Exception:
            continue

print(f"\n✅ Loaded {len(all_features)} maps with both datasets")

# Create DataFrame
df = pd.DataFrame(all_features)
print(f"📊 Dataset shape: {df.shape}")
print(f"\n📋 Columns ({len(df.columns)}): {', '.join(df.columns.tolist())}")


🔄 Loading and merging datasets...

  Processed 100/962 maps...
  Processed 200/962 maps...
  Processed 300/962 maps...
  Processed 400/962 maps...
  Processed 500/962 maps...
  Processed 600/962 maps...
  Processed 700/962 maps...
  Processed 800/962 maps...
  Processed 900/962 maps...

✅ Loaded 951 maps with both datasets
📊 Dataset shape: (951, 23)

📋 Columns (23): map_code, creator_followers, in_discovery, xp_enabled, num_tags, max_players, version, baseline_ccu, future_ccu_7d, trend_slope, recent_momentum, volatility, map_age_days, avg_session_length, retention_rate, favorites_count, recommendations_count, unique_players, total_plays, total_minutes_played, play_frequency, virality_score, engagement_per_player
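
The loop above drops any map that is missing from either source without saying which one (962 files in, 951 rows out). One possible way to audit that, sketched here with two tiny made-up tables, is an outer `merge` on `map_code` with `indicator=True`. The frames `fncreate_df` and `fortnite_df` and their values are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical per-source feature tables keyed by map_code.
fncreate_df = pd.DataFrame({
    "map_code": ["1111-1111-1111", "2222-2222-2222", "3333-3333-3333"],
    "baseline_ccu": [120.0, 45.5, 9.0],
})
fortnite_df = pd.DataFrame({
    "map_code": ["1111-1111-1111", "3333-3333-3333", "4444-4444-4444"],
    "unique_players": [2400, 310, 75],
})

# An outer join keeps every map; the _merge column records each row's origin.
audit = fncreate_df.merge(fortnite_df, on="map_code", how="outer", indicator=True)

matched = audit[audit["_merge"] == "both"].drop(columns="_merge")
unmatched = audit.loc[audit["_merge"] != "both", ["map_code", "_merge"]]

print(f"matched: {len(matched)}, unmatched: {len(unmatched)}")
print(unmatched.to_string(index=False))
```

Rows tagged `left_only` exist only in the fncreate.gg table and rows tagged `right_only` only in the Fortnite table, which makes it easy to list the maps that never reach training.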


## 5. Data Filtering


In [23]:
print(f"📊 Before filtering: {len(df)} maps\n")

# Remove maps with insufficient data
df = df[df['baseline_ccu'] > 0]
df = df[df['future_ccu_7d'] > 0]

print(f"✅ After filtering: {len(df)} maps")
print(f"\n📈 CCU range:")
print(f"  Baseline: {df['baseline_ccu'].min():.0f} - {df['baseline_ccu'].max():.0f}")
print(f"  Future (target): {df['future_ccu_7d'].min():.0f} - {df['future_ccu_7d'].max():.0f}")

# Show first few rows
df.head()


📊 Before filtering: 951 maps

✅ After filtering: 950 maps

📈 CCU range:
  Baseline: 1 - 11760
  Future (target): 4 - 7563


Unnamed: 0,map_code,creator_followers,in_discovery,xp_enabled,num_tags,max_players,version,baseline_ccu,future_ccu_7d,trend_slope,recent_momentum,volatility,map_age_days,avg_session_length,retention_rate,favorites_count,recommendations_count,unique_players,total_plays,total_minutes_played,play_frequency,virality_score,engagement_per_player
0,9228-8994-1362,0,0,0,3,0,425,97.580986,90.0,-0.161506,-31.180451,52.866465,0,31.675714,0,35,0,2373,2738,73685,1.153814,35,31.051412
1,2898-7886-8847,0,0,0,4,0,404,8934.507042,5875.54902,-15.7707,-1943.719298,4307.768295,0,32.867143,0,3157,2053,274599,393086,8533841,1.431491,5210,31.077466
2,3428-5975-3171,0,0,0,4,0,59,187.672535,121.941176,-0.122525,20.692043,164.953606,0,30.677143,0,87,4,4212,6592,113064,1.565052,91,26.843305
3,7340-5853-5689,0,0,0,4,0,63,90.5,102.196078,-0.231812,-40.288534,36.455694,0,44.687143,0,124,30,3593,4396,146963,1.22349,154,40.902588
4,8398-4381-0561,0,0,0,4,0,38,96.447183,103.803922,-0.305097,-73.967419,74.470861,0,42.271429,0,82,88,2295,3509,99665,1.528976,170,43.427015


## 6. Train Models with Enhanced Features (20 total!)


In [24]:
# Define all 20 features
feature_columns = [
    # fncreate.gg features (11)
    'baseline_ccu', 'trend_slope', 'recent_momentum', 'volatility',
    'map_age_days', 'in_discovery', 'creator_followers', 'xp_enabled',
    'num_tags', 'max_players', 'version',
    # Fortnite API features (9 NEW!)
    'avg_session_length', 'retention_rate', 'favorites_count',
    'recommendations_count', 'unique_players', 'total_plays',
    'play_frequency', 'virality_score', 'engagement_per_player'
]

X = df[feature_columns]
y = df['future_ccu_7d']

print(f"✅ Features: {X.shape[1]}")
print(f"✅ Samples: {X.shape[0]}")
print(f"\n📋 Feature List:")
for i, col in enumerate(feature_columns, 1):
    marker = "🆕" if i > 11 else "  "
    print(f"  {marker} {i:2d}. {col}")

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"\n📊 Training set: {X_train.shape[0]} maps")
print(f"📊 Test set: {X_test.shape[0]} maps")


✅ Features: 20
✅ Samples: 950

📋 Feature List:
      1. baseline_ccu
      2. trend_slope
      3. recent_momentum
      4. volatility
      5. map_age_days
      6. in_discovery
      7. creator_followers
      8. xp_enabled
      9. num_tags
     10. max_players
     11. version
  🆕 12. avg_session_length
  🆕 13. retention_rate
  🆕 14. favorites_count
  🆕 15. recommendations_count
  🆕 16. unique_players
  🆕 17. total_plays
  🆕 18. play_frequency
  🆕 19. virality_score
  🆕 20. engagement_per_player

📊 Training set: 760 maps
📊 Test set: 190 maps


In [25]:
# Train 3 models
models = {
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1),
    'Gradient Boosting': GradientBoostingRegressor(n_estimators=100, random_state=42),
    'Linear Regression': LinearRegression()
}

results = {}

print("🚀 Training models...\n")

for name, model in models.items():
    print(f"Training {name}...")
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    
    # Metrics
    r2 = r2_score(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    
    results[name] = {
        'model': model,
        'r2': r2,
        'mae': mae,
        'rmse': rmse
    }
    
    print(f"  R² Score: {r2:.4f}")
    print(f"  MAE: {mae:.2f} CCU")
    print(f"  RMSE: {rmse:.2f} CCU\n")

print("✅ All models trained!")


🚀 Training models...

Training Random Forest...
  R² Score: 0.7546
  MAE: 90.97 CCU
  RMSE: 444.17 CCU

Training Gradient Boosting...
  R² Score: 0.7677
  MAE: 92.45 CCU
  RMSE: 432.13 CCU

Training Linear Regression...
  R² Score: 0.8008
  MAE: 78.58 CCU
  RMSE: 400.14 CCU

✅ All models trained!
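
RMSE comes out several times larger than MAE for every model above, which suggests a few large misses dominate the squared error. The sketch below computes the three metrics by hand on five made-up maps and checks them against scikit-learn. The arrays `y_true` and `y_hat` are hypothetical and chosen so that one map is badly mispredicted.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted CCU for five maps.
y_true = np.array([100.0, 250.0, 40.0, 900.0, 60.0])
y_hat = np.array([110.0, 230.0, 55.0, 700.0, 65.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))                     # average absolute miss
rmse = np.sqrt(np.mean(errors ** 2))              # squares weight large misses more
ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # spread of the target around its mean
r2 = 1 - ss_res / ss_tot                          # share of that spread explained

# The hand-computed values should agree with scikit-learn.
assert np.isclose(mae, mean_absolute_error(y_true, y_hat))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2, r2_score(y_true, y_hat))

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.4f}")
```

Here the single 200-CCU miss on the largest map pushes RMSE well above MAE, the same pattern seen in the results above, where the target ranges from 4 to 7563 CCU.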


## 7. Compare Results


In [26]:
# Comparison table
comparison = pd.DataFrame({
    'Model': list(results.keys()),
    'R² Score': [r['r2'] for r in results.values()],
    'MAE (CCU)': [r['mae'] for r in results.values()],
    'RMSE (CCU)': [r['rmse'] for r in results.values()]
}).sort_values('R² Score', ascending=False)

print("🏆 Model Comparison:")
print(comparison.to_string(index=False))

# Best model
best_model_name = comparison.iloc[0]['Model']
best_model = results[best_model_name]['model']

print(f"\n🥇 Best Model: {best_model_name}")
print(f"   R² Score: {comparison.iloc[0]['R² Score']:.4f}")
print(f"   MAE: {comparison.iloc[0]['MAE (CCU)']:.2f} CCU")
print(f"   RMSE: {comparison.iloc[0]['RMSE (CCU)']:.2f} CCU")

# Compare with old model (R² = 0.76)
print(f"\n📈 Improvement from old model (R² = 0.76):")
old_r2 = 0.76
new_r2 = comparison.iloc[0]['R² Score']
improvement = ((new_r2 - old_r2) / old_r2) * 100
print(f"   R² improved by: {improvement:+.1f}%")


🏆 Model Comparison:
            Model  R² Score  MAE (CCU)  RMSE (CCU)
Linear Regression  0.800803  78.577983  400.140230
Gradient Boosting  0.767677  92.453631  432.132738
    Random Forest  0.754559  90.974508  444.165135

🥇 Best Model: Linear Regression
   R² Score: 0.8008
   MAE: 78.58 CCU
   RMSE: 400.14 CCU

📈 Improvement from old model (R² = 0.76):
   R² improved by: +5.4%
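
The +5.4% figure is a relative change, not a gain of 5.4 points of R². The arithmetic can be reproduced from the two rounded scores quoted above:

```python
old_r2 = 0.76      # previous model's score, as quoted in the notebook
new_r2 = 0.8008    # best score from the comparison table, rounded

absolute_gain = new_r2 - old_r2                 # difference in R² itself
relative_gain = absolute_gain / old_r2 * 100    # percent change relative to the old score

print(f"absolute gain: {absolute_gain:+.4f}")
print(f"relative gain: {relative_gain:+.1f}%")
```

In absolute terms the score moved by about 0.04, from 0.76 to roughly 0.80.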


## 8. Save Enhanced Model


In [27]:
# Save model
model_path = MODEL_DIR / 'enhanced_future_ccu_predictor.pkl'
joblib.dump(best_model, model_path)
print(f"✅ Model saved: {model_path}")

# Save metadata
metadata = {
    'model_type': best_model_name,
    'features': feature_columns,
    'num_features': len(feature_columns),
    'fncreate_features': 11,
    'fortnite_api_features': 9,
    'r2_score': float(results[best_model_name]['r2']),
    'mae': float(results[best_model_name]['mae']),
    'rmse': float(results[best_model_name]['rmse']),
    'training_samples': len(X_train),
    'test_samples': len(X_test),
    'trained_at': datetime.now().isoformat(),
    'data_sources': ['fncreate.gg', 'fortnite_ecosystem_api']
}

metadata_path = MODEL_DIR / 'enhanced_future_ccu_predictor_metadata.json'
with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=2)

print(f"✅ Metadata saved: {metadata_path}")
print(f"\n🎉 Enhanced model training complete!")
print(f"\n📊 Final Results:")
print(f"   Model: {best_model_name}")
print(f"   R² Score: {metadata['r2_score']:.4f}")
print(f"   MAE: {metadata['mae']:.2f} CCU")
print(f"   Features: {metadata['num_features']} ({metadata['fncreate_features']} fncreate + {metadata['fortnite_api_features']} Fortnite API)")


✅ Model saved: ../data/models/enhanced_future_ccu_predictor.pkl
✅ Metadata saved: ../data/models/enhanced_future_ccu_predictor_metadata.json

🎉 Enhanced model training complete!

📊 Final Results:
   Model: Linear Regression
   R² Score: 0.8008
   MAE: 78.58 CCU
   Features: 20 (11 fncreate + 9 Fortnite API)
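
To use the saved files later, the model and its metadata would typically be reloaded together so that columns are passed in the order recorded under `features`. The sketch below is self-contained and therefore trains a tiny stand-in model with two hypothetical features in a temporary directory; the file names and data are placeholders, and only the `joblib.dump` / `joblib.load` and JSON round trip mirrors the cell above.

```python
import json
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for the notebook's artifacts: a tiny model with two hypothetical features.
features = ["baseline_ccu", "trend_slope"]
train = pd.DataFrame({
    "baseline_ccu": [10.0, 50.0, 200.0, 400.0],
    "trend_slope": [0.1, -0.2, 0.0, 0.5],
})
target = [12.0, 45.0, 190.0, 420.0]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    meta_path = Path(tmp) / "model_metadata.json"
    joblib.dump(LinearRegression().fit(train[features], target), model_path)
    meta_path.write_text(json.dumps({"features": features}))

    # Reload both files, then order the columns exactly as recorded at training time.
    model = joblib.load(model_path)
    feature_order = json.loads(meta_path.read_text())["features"]

    # New rows may arrive with columns in any order; indexing by feature_order fixes that.
    new_maps = pd.DataFrame({"trend_slope": [0.3], "baseline_ccu": [100.0]})
    prediction = model.predict(new_maps[feature_order])

print(prediction)
```

Reading the feature order from the metadata rather than hard-coding it keeps inference aligned with training if the feature list changes between runs.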
