<div style="padding:18px;border-radius:12px;background:linear-gradient(135deg,#f8fafc,#eef2ff);border:1px solid #e5e7eb">
  <h1 style="margin:0;font-size:34px;font-weight:800;">🍹 Minimal Submission Blender</h1>
  <p style="margin:8px 0 0 0;color:#334155;">Loads multiple submission CSVs, performs lightweight checks, blends with a weight dict, and exports the result.</p>
</div>


## 1) Setup
Import libraries and define the weight dictionary.


In [1]:
# Import libraries
import numpy as np
import pandas as pd

In [2]:
# Define the weights dictionary (path -> weight)
weights = {
    "/kaggle/input/predicting-road-accident-risk-vault/submission.csv": 1.2,
    "/kaggle/input/predicting-road-accident-risk-vault/test_tabm_plus_origcol_tuned.csv": 0.1,
}
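
The weights above do not need to sum to 1, because they are rescaled before blending. As a quick worked sketch of what that rescaling does to these two values (the short labels `submission` and `tabm` are illustrative stand-ins for the full paths):

```python
# Raw weights from the cell above, keyed by short illustrative labels
# instead of the full file paths.
raw = {"submission": 1.2, "tabm": 0.1}

# Divide each weight by the total so the results sum to 1.0
total = sum(raw.values())
normalized = {name: w / total for name, w in raw.items()}

for name, w in normalized.items():
    print(f"{name}: {w:.4f}")
```

With these values the first file contributes roughly 92% of the blend and the second roughly 8%.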

## 2) Helpers
Small utilities for weight normalization, prediction-column inference, CSV loading, and a minimal EDA printout.

In [3]:
# Normalize a weight map to sum to 1.0
def normalize_weights(weight_map):
    # Compute sum of weights
    total = sum(weight_map.values())
    # Validate total is non-zero
    if total == 0:
        # Raise an error for zero-sum weights
        raise ValueError("Weights sum to zero.")
    # Return normalized weights
    return {k: v / total for k, v in weight_map.items()}

# Infer the prediction column name
def infer_prediction_column(df):
    # Define candidate column names
    candidates = ["accident_risk"]
    # Return the first candidate that exists
    for c in candidates:
        if c in df.columns:
            return c
    # Fall back to the first numeric column that is not the id column
    # (otherwise the numeric "id" column would be picked as the prediction)
    numeric_cols = [
        c for c in df.select_dtypes(include=[np.number]).columns if c != "id"
    ]
    # Validate that a numeric column exists
    if not numeric_cols:
        # Raise error when nothing numeric is found
        raise ValueError("No numeric columns available to infer predictions.")
    # Return the first non-id numeric column as a fallback
    return numeric_cols[0]

# Load a CSV and return the frame and its prediction column
def load_csv(path):
    # Read the CSV
    df = pd.read_csv(path)
    # Infer the prediction column
    pred_col = infer_prediction_column(df)
    # Return the frame and prediction column name
    return df, pred_col

# Minimal EDA just for submission columns
def minimal_submission_eda(name, df, pred_col):
    # Print file header
    print(f"\n=== {name} ===")
    # Print shape
    print("Shape:", df.shape)
    # Print prediction column
    print("Prediction column:", pred_col)
    # Print missing values for prediction
    print("Missing in prediction:", df[pred_col].isna().sum())
    # Print simple numeric stats for prediction
    print(df[pred_col].describe())

## 3) Load
Read each CSV and identify the prediction column.

In [4]:
# Normalize weights
norm_weights = normalize_weights(weights)

# Prepare containers
dfs = {}
pred_cols = {}
pred_series = {}

# Iterate through the files in the weight map (weights are not needed here)
for path in norm_weights:
    # Load the CSV and infer the prediction column
    df, pred_col = load_csv(path)
    # Store the DataFrame
    dfs[path] = df
    # Store the prediction column name
    pred_cols[path] = pred_col
    # Store the prediction Series
    pred_series[path] = df[pred_col]

# Display first few rows for a quick glance
for path, df in dfs.items():
    # Show a small preview
    display(df.head(3))

| | id | accident_risk |
|---|---|---|
| 0 | 517754 | 0.296114 |
| 1 | 517755 | 0.11963 |
| 2 | 517756 | 0.18167 |


| | id | accident_risk |
|---|---|---|
| 0 | 517754 | 0.296001 |
| 1 | 517755 | 0.117532 |
| 2 | 517756 | 0.180968 |
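
Blending row by row only makes sense if every file lists the same ids in the same order. The notebook does not check this, so here is a minimal sketch of such a check. The helper name `ids_aligned` and its `id_col` parameter are hypothetical, and the toy frames stand in for the loaded submissions:

```python
import pandas as pd

def ids_aligned(frames, id_col="id"):
    """Return True if every frame has the same ids in the same row order."""
    frames = list(frames)
    reference = frames[0][id_col].reset_index(drop=True)
    return all(
        reference.equals(f[id_col].reset_index(drop=True)) for f in frames[1:]
    )

# Toy frames standing in for the loaded submissions (values are made up)
a = pd.DataFrame({"id": [1, 2, 3], "accident_risk": [0.1, 0.2, 0.3]})
b = pd.DataFrame({"id": [1, 2, 3], "accident_risk": [0.2, 0.1, 0.4]})
c = pd.DataFrame({"id": [3, 2, 1], "accident_risk": [0.4, 0.1, 0.2]})

print(ids_aligned([a, b]))  # same ids, same order
print(ids_aligned([a, c]))  # same ids, different order
```

In this notebook the equivalent call would be `ids_aligned(dfs.values())`, run before the blend step.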


## 4) Simple Checks
Lightweight sanity checks on each submission's prediction column: shape, missing values, and summary statistics.

In [5]:
# Run minimal EDA for each submission
for path, df in dfs.items():
    # Retrieve the prediction column for this file
    pred_col = pred_cols[path]
    # Execute minimal EDA
    minimal_submission_eda(path, df, pred_col)


=== /kaggle/input/predicting-road-accident-risk-vault/submission.csv ===
Shape: (172585, 2)
Prediction column: accident_risk
Missing in prediction: 0
count    172585.000000
mean          0.351653
std           0.157366
min           0.014443
25%           0.241393
50%           0.335573
75%           0.454288
max           0.872705
Name: accident_risk, dtype: float64

=== /kaggle/input/predicting-road-accident-risk-vault/test_tabm_plus_origcol_tuned.csv ===
Shape: (172585, 2)
Prediction column: accident_risk
Missing in prediction: 0
count    172585.000000
mean          0.351481
std           0.157728
min           0.007166
25%           0.240518
50%           0.335231
75%           0.454553
max           0.872947
Name: accident_risk, dtype: float64
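
The two summaries above are nearly identical, which suggests the files may be highly correlated. One optional extra check, not part of the original notebook, is to measure that directly. The sketch below uses made-up values and hypothetical column names `model_a` and `model_b` in place of the real prediction Series:

```python
import pandas as pd

# Toy prediction columns standing in for pred_series; values are made up
preds = pd.DataFrame({
    "model_a": [0.10, 0.20, 0.30, 0.40],
    "model_b": [0.12, 0.19, 0.33, 0.38],
})

# Pairwise Pearson correlation between the prediction columns
corr = preds.corr()
print(corr.round(4))

# Largest absolute row-wise disagreement between the two files
max_gap = (preds["model_a"] - preds["model_b"]).abs().max()
print("Max abs difference:", round(max_gap, 4))
```

A correlation very close to 1 usually means the blend will move the predictions only slightly.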


## 5) Blend
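Apply the normalized weights to the predictions and write the weighted average to the `accident_risk` column of the output frame. The Series are combined by row index, so every file must list the same ids in the same order.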

In [6]:
# Initialize blended series
blended = None

# Iterate through paths and normalized weights
for path, w in norm_weights.items():
    # Retrieve the current prediction series
    s = pred_series[path].astype(float)
    # Initialize blended or accumulate weighted sum
    if blended is None:
        # Start with weighted base
        blended = s * float(w)
    else:
        # Add weighted component
        blended = blended + s * float(w)

# Choose a base DataFrame to attach the blended column
base_path = list(dfs.keys())[0]

# Create a copy as the output DataFrame
out_df = dfs[base_path].copy()

# Assign the blended prediction
out_df["accident_risk"] = blended

# Show a small preview
display(out_df.head(10))

| | id | accident_risk |
|---|---|---|
| 0 | 517754 | 0.296105 |
| 1 | 517755 | 0.119468 |
| 2 | 517756 | 0.181616 |
| 3 | 517757 | 0.311009 |
| 4 | 517758 | 0.399585 |
| 5 | 517759 | 0.463091 |
| 6 | 517760 | 0.264101 |
| 7 | 517761 | 0.195304 |
| 8 | 517762 | 0.3754 |
| 9 | 517763 | 0.326356 |
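
The blend above relies on the rows of every file lining up by position. If that cannot be guaranteed, one alternative is to match rows on `id` before averaging. This is a sketch of that variant, not the notebook's method. The function `blend_on_id` and its parameters are hypothetical names, and the toy frames use made-up values:

```python
import pandas as pd

def blend_on_id(frames, weights, id_col="id", pred_col="accident_risk"):
    """Weighted average of pred_col across frames, matched on id_col."""
    total = sum(weights)
    if total == 0:
        raise ValueError("Weights sum to zero.")
    # Keep the id order of the first frame
    merged = frames[0][[id_col]].copy()
    blended = 0.0
    for i, (frame, w) in enumerate(zip(frames, weights)):
        col = f"pred_{i}"
        part = frame[[id_col, pred_col]].rename(columns={pred_col: col})
        # validate="one_to_one" raises if an id is duplicated on either side
        merged = merged.merge(part, on=id_col, how="left", validate="one_to_one")
        blended = blended + merged[col] * (w / total)
    merged[pred_col] = blended
    return merged[[id_col, pred_col]]

# Toy frames with the same ids in a different row order
a = pd.DataFrame({"id": [1, 2, 3], "accident_risk": [0.2, 0.4, 0.6]})
b = pd.DataFrame({"id": [3, 1, 2], "accident_risk": [0.9, 0.3, 0.5]})

result = blend_on_id([a, b], weights=[3, 1])
print(result)
```

Because the merge is a left join on the first frame, an id missing from a later file yields `NaN` in the blended column rather than a silently misaligned value.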


## 6) Export
Write the blended submission to disk.

In [7]:
# Define output path
output_path = "/kaggle/working/submission.csv"

# Save without index
out_df.to_csv(output_path, index=False)

# Print confirmation
print(f"✅ Saved: {output_path}")

✅ Saved: /kaggle/working/submission.csv
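
As an optional last step, the written file can be read back to confirm it has the expected layout. The sketch below writes a toy frame to a temporary directory rather than `/kaggle/working`, and the expected column list is an assumption based on the previews shown earlier:

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for out_df; the prediction values are made up
toy = pd.DataFrame({"id": [517754, 517755], "accident_risk": [0.25, 0.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "submission.csv")
    toy.to_csv(path, index=False)
    # Read the file back while the temporary directory still exists
    check = pd.read_csv(path)

# The file should contain exactly the expected columns (no index column),
# no missing predictions, and no duplicated ids.
assert list(check.columns) == ["id", "accident_risk"]
assert check["accident_risk"].notna().all()
assert check["id"].is_unique
print(check.shape)
```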
