# 🏆 S6E1 | Advanced Blend: Top 3 Target

## Strategy: Power Mean + Rank Average + Multi-Model Diversity

**Current Rank: #17 (8.54838) → Target: Top 3 (≤8.54277)**

### Key Innovations:
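1. **Power Mean Blending** - Uses `((p1^k + p2^k + ...)/n)^(1/k)` instead of the arithmetic mean; `k` corresponds to the `p` argument of `power_mean`
2. **Rank Average Blending** - Converts each model's predictions to ranks and averages the ranks, keeping only the relative ordering
3. **Model Diversity** - LightGBM + SENet + RidgeCV, with optional Transformer-based models
4. **Grid Search** - Compares a fixed set of candidate blends to pick a blending strategy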
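
To make the power-mean formula concrete, here is a minimal sketch on three made-up prediction values. The numbers and the helper name `generalized_mean` are illustrative assumptions, not taken from any submission. It shows that exponent 1 reproduces the arithmetic mean and that the result grows as the exponent grows.

```python
import numpy as np

def generalized_mean(values, k):
    """Power mean of positive values; k=0 is treated as the geometric mean."""
    values = np.asarray(values, dtype=float)
    if k == 0:
        return float(np.exp(np.mean(np.log(values))))
    return float(np.mean(values ** k) ** (1.0 / k))

# Illustrative predictions from three hypothetical models for a single row
example = [60.0, 70.0, 80.0]

means_by_k = {k: generalized_mean(example, k) for k in (-1, 0, 1, 2, 3)}
for k, m in means_by_k.items():
    print(f"k={k:>2}: {m:.4f}")
```

When the models nearly agree, as strong public submissions usually do, the spread between these means is small, so the choice of exponent only nudges the blend.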

### Acknowledgements:
This solution builds on excellent public notebooks:
- Student Scores | from LightGBM to SENet
- PS s6e1 | hb13g
- S6E1 - Hill Climbing & RidgeCV

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import rankdata
from scipy.optimize import minimize
from pathlib import Path
import os
import warnings
warnings.filterwarnings('ignore')

print("🚀 Advanced Blend Notebook Initialized")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")

---
## 📁 Configuration & Data Loading

In [None]:
# Configuration
KAGGLE_ENV = os.path.exists('/kaggle/input')

if KAGGLE_ENV:
    # Kaggle paths - add your notebook inputs here
    submission_paths = {
        'senet': '/kaggle/input/student-scores-from-lightgbm-to-senet/submission.csv',
        'hb13g': '/kaggle/input/ps-s6e1-hb13g/submission.csv',
        'hill_ridge': '/kaggle/input/s6e1-hill-climbing-ridgecv-lb-8-54853/submission.csv',
        # Add more diverse notebooks here:
        # 'ft_transformer': '/kaggle/input/ens-ft-transformer-tabm-autogluon-xgboost-resnet/submission.csv',
        # 'score_pred': '/kaggle/input/ps-s6e1-score-prediction/submission.csv',
    }
else:
    # Local paths for testing
    submission_paths = {
        'model1': './submission1.csv',
        'model2': './submission2.csv',
    }

print(f"Environment: {'Kaggle' if KAGGLE_ENV else 'Local'}")
print(f"Available notebooks: {len(submission_paths)}")

In [None]:
# Load all submissions
submissions = {}
valid_paths = {}

for name, path in submission_paths.items():
    if os.path.exists(path):
        df = pd.read_csv(path)
        submissions[name] = df
        valid_paths[name] = path
        print(f"✅ Loaded {name}: {len(df)} rows, mean={df['exam_score'].mean():.4f}")
    else:
        print(f"⚠️ Not found: {name} at {path}")

if len(submissions) < 2:
    print("\n⚠️ Need at least 2 submissions for blending!")
else:
    print(f"\n✅ {len(submissions)} submissions ready for blending")
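
Blending is row-wise, so it only makes sense if every submission lists the same `id` values in the same order. The loading cell above does not verify this. A minimal sketch of such a check follows, using a hypothetical helper `align_on_id` and toy frames in place of the real files:

```python
import pandas as pd

def align_on_id(frames, id_col="id", target_col="exam_score"):
    """Return one DataFrame indexed by id with one prediction column per model.

    Raises ValueError if any submission has duplicate or mismatched ids.
    """
    columns = {}
    reference_ids = None
    for name, df in frames.items():
        if df[id_col].duplicated().any():
            raise ValueError(f"{name}: duplicate ids")
        # Sorting by id removes any dependence on the original row order
        series = df.set_index(id_col)[target_col].sort_index()
        if reference_ids is None:
            reference_ids = series.index
        elif not series.index.equals(reference_ids):
            raise ValueError(f"{name}: ids differ from the first submission")
        columns[name] = series
    return pd.DataFrame(columns)

# Toy data: same ids, different row order
toy = {
    "a": pd.DataFrame({"id": [1, 2, 3], "exam_score": [50.0, 60.0, 70.0]}),
    "b": pd.DataFrame({"id": [3, 1, 2], "exam_score": [72.0, 51.0, 59.0]}),
}
aligned = align_on_id(toy)
print(aligned)
```

With real data, the `submissions` dict could be passed in directly, and the columns of the returned frame could serve as the aligned prediction arrays.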

---
## 🔬 Blending Functions

In [None]:
def arithmetic_mean(preds_list):
    """Simple arithmetic mean - baseline"""
    return np.mean(preds_list, axis=0)

def power_mean(preds_list, p=2.0):
    """
    Power mean (generalized mean):
    - p=1: arithmetic mean
    - p=2: quadratic mean (emphasizes larger values)
    - p=-1: harmonic mean
    - p→0: geometric mean

    Assumes strictly positive predictions; negative values give NaN for fractional p.
    """
    preds = np.array(preds_list)
    if p == 0:
        # Geometric mean
        return np.exp(np.mean(np.log(np.maximum(preds, 1e-10)), axis=0))
    else:
        # Standard power mean
        return np.power(np.mean(np.power(preds, p), axis=0), 1/p)

def rank_average(preds_list):
    """
    Rank averaging:
    1. Convert each prediction set to ranks
    2. Average the ranks
    3. Result is rank-based, preserving relative ordering
    """
    ranks = [rankdata(pred) for pred in preds_list]
    avg_ranks = np.mean(ranks, axis=0)
    return avg_ranks

def weighted_blend(preds_list, weights):
    """Weighted average with custom weights"""
    weights = np.array(weights) / np.sum(weights)  # Normalize
    return np.sum([w * p for w, p in zip(weights, preds_list)], axis=0)

def rank_then_scale(preds_list, ref_pred):
    """
    Rank average, then scale back to original prediction range
    using a reference prediction for mean and std
    """
    avg_ranks = rank_average(preds_list)
    # Scale ranks to match reference prediction distribution
    ref_mean, ref_std = np.mean(ref_pred), np.std(ref_pred)
    rank_mean, rank_std = np.mean(avg_ranks), np.std(avg_ranks)
    scaled = (avg_ranks - rank_mean) / rank_std * ref_std + ref_mean
    return scaled

print("✅ Blending functions defined")
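
The three steps of rank averaging, followed by the rescaling step, can be traced on a tiny example. This is a sketch with two hypothetical models and four made-up rows. It mirrors the logic of `rank_average` and `rank_then_scale` but is written out inline so each intermediate value is visible.

```python
import numpy as np
from scipy.stats import rankdata

# Two hypothetical models scoring the same four rows
model_a = np.array([55.0, 62.0, 71.0, 90.0])
model_b = np.array([57.0, 60.0, 75.0, 88.0])

# Steps 1-2: rank each model's predictions, then average the ranks
avg_ranks = np.mean([rankdata(model_a), rankdata(model_b)], axis=0)

# Step 3: map the averaged ranks back onto model_a's mean and standard deviation
z = (avg_ranks - avg_ranks.mean()) / avg_ranks.std()
rescaled = z * model_a.std() + model_a.mean()

print("averaged ranks:", avg_ranks)
print("rescaled:", np.round(rescaled, 2))
```

The averaged ranks live on a 1..n scale, which is why they need rescaling before being submitted for an error-based metric. Note also that the rescaled values come out evenly spaced here: rank blending keeps the ordering but discards the original gaps between predictions.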

---
## 📊 Analyze Submissions

In [None]:
if len(submissions) >= 2:
    # Get predictions as numpy arrays
    names = list(submissions.keys())
    preds_list = [submissions[name]['exam_score'].values for name in names]
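    ids = submissions[names[0]]['id'].values
    # Blending is row-wise, so every submission must list the same ids in the same order
    for name in names[1:]:
        assert np.array_equal(submissions[name]['id'].values, ids), f"id mismatch in {name}"
    
    # Correlation analysis
    print("📊 Prediction Correlation Matrix:")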
    corr_df = pd.DataFrame({name: submissions[name]['exam_score'] for name in names})
    print(corr_df.corr().round(4))
    print()
    
    # Stats
    print("📈 Prediction Statistics:")
    for i, name in enumerate(names):
        pred = preds_list[i]
        print(f"  {name}: mean={pred.mean():.4f}, std={pred.std():.4f}, min={pred.min():.2f}, max={pred.max():.2f}")

---
## 🎯 Grid Search for Optimal Blending

In [None]:
if len(submissions) >= 2:
    blends = {}
    
    # 1. Arithmetic Mean (baseline)
    blends['arithmetic'] = arithmetic_mean(preds_list)
    
    # 2. Power Means with different p values
    for p in [0.5, 1.5, 2.0, 2.5, 3.0]:
        blends[f'power_p{p}'] = power_mean(preds_list, p=p)
    
    # 3. Geometric Mean (power mean with p→0)
    blends['geometric'] = power_mean(preds_list, p=0)
    
    # 4. Rank Average (scaled to first submission's range)
    blends['rank_scaled'] = rank_then_scale(preds_list, preds_list[0])
    
    # 5. Weighted blends (emphasis on different models)
    if len(preds_list) == 3:
        blends['weight_emphasis_1'] = weighted_blend(preds_list, [0.5, 0.3, 0.2])
        blends['weight_emphasis_2'] = weighted_blend(preds_list, [0.3, 0.5, 0.2])
        blends['weight_emphasis_3'] = weighted_blend(preds_list, [0.2, 0.3, 0.5])
        blends['weight_equal'] = weighted_blend(preds_list, [0.33, 0.33, 0.34])
    
    # Display statistics for each blend
    print("📊 Blend Statistics:")
    print(f"{'Blend':<20} {'Mean':>10} {'Std':>10} {'Min':>10} {'Max':>10}")
    print("-" * 60)
    
    for name, pred in blends.items():
        print(f"{name:<20} {pred.mean():>10.4f} {pred.std():>10.4f} {pred.min():>10.2f} {pred.max():>10.2f}")
    
    print(f"\n✅ Created {len(blends)} blend variations")
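
The cell above tries a handful of hand-picked weight vectors. If a denser sweep is wanted, one option is to enumerate every weight vector on a regular grid whose entries sum to 1. The sketch below does that with a hypothetical generator `simplex_grid`; the step count of 10 is an arbitrary choice for illustration.

```python
from itertools import product

def simplex_grid(n_models, steps=10):
    """Yield weight tuples with n_models non-negative entries summing to 1.

    Each weight is a multiple of 1/steps.
    """
    for combo in product(range(steps + 1), repeat=n_models - 1):
        remainder = steps - sum(combo)
        if remainder >= 0:
            # The last weight is whatever is left after the first n-1 are fixed
            yield tuple(c / steps for c in combo) + (remainder / steps,)

grid = list(simplex_grid(3, steps=10))
print(len(grid), "candidate weight vectors, e.g.", grid[:3])
```

Each tuple could be passed to `weighted_blend(preds_list, weights)`. Ranking the resulting candidates still needs a validation signal, since this notebook has no OOF predictions to score against.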

---
## 🔧 Optimize Blend with Constrained Weights

In [None]:
# Use scipy.optimize to find optimal weights with constraints
if len(submissions) >= 2:
    n_models = len(preds_list)
    
    def std_objective(weights):
        """Standard deviation of the blended predictions (a heuristic proxy, not RMSE)"""
        pred = np.sum([w * p for w, p in zip(weights, preds_list)], axis=0)
        return pred.std()
    
    # Optimization with constraints (minimize is imported in the first cell)
    
    # Start with equal weights
    x0 = np.ones(n_models) / n_models
    
    # Constraints: weights sum to 1
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
    
    # Bounds: weights between -0.2 and 1.2 (allow slight negative)
    bounds = [(-0.2, 1.2)] * n_models
    
    # Note: Without OOF we can't optimize for RMSE directly
    # This is a heuristic optimization
    result = minimize(std_objective, x0, bounds=bounds, constraints=constraints, method='SLSQP')
    
    opt_weights = result.x
    print("🎯 Optimized Weights (minimizing prediction variance):")
    for name, w in zip(names, opt_weights):
        print(f"  {name}: {w:.4f}")
    
    blends['optimized'] = weighted_blend(preds_list, opt_weights)
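
The note in the cell above says that RMSE cannot be optimized without out-of-fold (OOF) predictions. For comparison, here is a minimal sketch of what that optimization could look like if OOF predictions and true targets were available. Everything in it is an assumption: `y_true` and `oof_preds` are synthetic arrays generated for illustration, and the bounds restrict weights to [0, 1] rather than the [-0.2, 1.2] used above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic stand-ins for OOF data (assumption: not available in this notebook)
y_true = rng.normal(60.0, 15.0, size=500)
oof_preds = np.stack([
    y_true + rng.normal(0.0, 8.0, size=500),   # hypothetical model 1
    y_true + rng.normal(0.0, 9.0, size=500),   # hypothetical model 2
    y_true + rng.normal(0.0, 10.0, size=500),  # hypothetical model 3
])

def rmse(weights):
    """RMSE of the weighted blend against the synthetic target."""
    blend = weights @ oof_preds
    return float(np.sqrt(np.mean((blend - y_true) ** 2)))

n = oof_preds.shape[0]
res = minimize(
    rmse,
    x0=np.full(n, 1.0 / n),            # start from equal weights
    bounds=[(0.0, 1.0)] * n,           # non-negative weights only
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    method="SLSQP",
)

single_rmse = [rmse(np.eye(n)[i]) for i in range(n)]
print("weights:", np.round(res.x, 3))
print("blend RMSE:", round(res.fun, 4), "best single:", round(min(single_rmse), 4))
```

Unlike the standard-deviation heuristic, this objective is tied to the metric itself, so the resulting weights reflect how much each model's errors cancel against the others.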

---
## 💾 Generate Final Submission

In [None]:
# Select the blend strategy
# The blends can't be scored locally without OOF data, so this is a manual choice;
# arithmetic is the default baseline

if len(submissions) >= 2:
    # Choose blend strategy (can be changed based on results)
    BEST_BLEND = 'arithmetic'  # Any key of `blends`, e.g. arithmetic, power_p2.0, rank_scaled, optimized
    
    final_pred = blends[BEST_BLEND]
    
    print(f"\n🏆 Selected Blend: {BEST_BLEND}")
    print(f"  Mean: {final_pred.mean():.4f}")
    print(f"  Std: {final_pred.std():.4f}")
    print(f"  Range: [{final_pred.min():.2f}, {final_pred.max():.2f}]")

In [None]:
if len(submissions) >= 2:
    # Create submission DataFrame
    submission = pd.DataFrame({
        'id': ids,
        'exam_score': final_pred
    })
    
    # Save
    submission.to_csv('submission.csv', index=False)
    
    print("\n✅ Submission saved!")
    print(submission.head(10))
    print(f"\n📊 Final Statistics:")
    print(f"  Total predictions: {len(submission)}")
    print(f"  Mean: {submission['exam_score'].mean():.4f}")
    print(f"  Std: {submission['exam_score'].std():.4f}")
else:
    print("⚠️ Cannot create submission - need at least 2 input notebooks")
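
Before uploading, it can help to run a few sanity checks on the frame that was written. Some blends can silently produce invalid values; for example, a power mean with a fractional exponent returns NaN for negative inputs. The sketch below uses a hypothetical helper `check_submission` and toy frames. The column names `id` and `exam_score` match the ones used in this notebook.

```python
import numpy as np
import pandas as pd

def check_submission(df, expected_rows=None, id_col="id", target_col="exam_score"):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(df.columns) != [id_col, target_col]:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems  # later checks need both columns
    if df[id_col].duplicated().any():
        problems.append("duplicate ids")
    if not np.isfinite(df[target_col].to_numpy(dtype=float)).all():
        problems.append("NaN or infinite predictions")
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    return problems

good = pd.DataFrame({"id": [1, 2, 3], "exam_score": [55.0, 61.5, 70.2]})
bad = pd.DataFrame({"id": [1, 1, 3], "exam_score": [55.0, np.nan, 70.2]})
print(check_submission(good))
print(check_submission(bad, expected_rows=4))
```

In this notebook the call would be `check_submission(submission, expected_rows=len(ids))`, with an empty list meaning no problems were found.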

---
## 📝 Usage Instructions

### To run this notebook on Kaggle:

1. **Add Input Notebooks:**
   - student-scores-from-lightgbm-to-senet
   - ps-s6e1-hb13g
   - s6e1-hill-climbing-ridgecv-lb-8-54853
   - (Add more diverse notebooks for better results)

2. **Run the notebook**

3. **Submit the generated `submission.csv`**

### Tips for Top 3:
- Add notebooks with **different model types** (CatBoost, Neural Networks)
- Try different blend strategies (`power_p2.0`, `rank_scaled`)
- Look for notebooks with **low correlation** to existing ones
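
One way to act on the low-correlation tip is to rank candidate submissions by their average correlation with the others. The sketch below uses a hypothetical helper `mean_pairwise_correlation` and synthetic predictions; the model names and noise levels are made up for illustration.

```python
import numpy as np
import pandas as pd

def mean_pairwise_correlation(pred_frame):
    """For each model column, average its Pearson correlation with every other column."""
    corr = pred_frame.corr().to_numpy()
    n = corr.shape[0]
    # Subtract the diagonal (self-correlation of 1) before averaging
    off_diag_sum = corr.sum(axis=1) - np.diag(corr)
    return pd.Series(off_diag_sum / (n - 1), index=pred_frame.columns).sort_values()

# Synthetic predictions: two near-duplicate models and one noisier, more distinct one
rng = np.random.default_rng(42)
base = rng.normal(60.0, 15.0, size=1000)
toy_preds = pd.DataFrame({
    "gbm_a": base + rng.normal(0.0, 1.0, size=1000),
    "gbm_b": base + rng.normal(0.0, 1.0, size=1000),
    "nn_c": base + rng.normal(0.0, 6.0, size=1000),
})
diversity = mean_pairwise_correlation(toy_preds)
print(diversity)
```

Models at the top of the sorted output are the most distinct from the rest. Low correlation alone is not enough, though: a distinct but much weaker model can still hurt the blend.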