# **Instructions**

This notebook generates predictions by combining the outputs of two models. To run the code, you need:

1. The predicted probabilities for the test set from both models, in the format: (ID, probability).
2. The predicted probabilities for the validation set from both models, in the format: (ID, probability).
3. The true labels for the validation set, in the format: (ID, label).

**Note:** This code assumes that the training set is split such that the first 80% is used for training and the last 20% for validation, which matches the team’s current setup.
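The positional split described in the note can be sketched as follows. This is a minimal illustration, not the team's actual splitting code: the function name `positional_split` and the choice to round the boundary down are assumptions.

```python
import numpy as np

def positional_split(n_samples, train_frac=0.8):
    """Return (train_idx, val_idx) for a first-80% / last-20% split.

    Assumption: the boundary is rounded down; the real pipeline may differ.
    """
    n_train = int(n_samples * train_frac)
    indices = np.arange(n_samples)
    return indices[:n_train], indices[n_train:]

# Toy example with 10 samples: rows 0..7 train, rows 8..9 validate
train_idx, val_idx = positional_split(10)
print(train_idx, val_idx)
```

The validation arrays loaded below must follow the same row order as the last 20% of the training set, otherwise probabilities and labels will not line up.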

The combination strategies implemented include:

1. **Averaging Probabilities:** Directly average the probabilities from the two models and use the result to make final predictions.
2. **Rule-Based Combination:** Predict an edge (label = 1) only if both models agree that the probability is above 0.5.
3. **Meta-classifier:** Use the probabilities from both models (on the validation set) as input features, train a logistic regression to learn how much to trust each model, and apply the learned weighting to the test set probabilities for final prediction.
4. **Confidence-Weighted Averaging:** Weight the predictions of each model by its validation AUC, giving more weight to the better-performing model.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from tqdm import tqdm

## Prerequisite: upload the necessary data

In [2]:
# Load the probabilities saved from both models
gcn_probs = np.load("/content/gcn_probs.npy")
rf_probs = np.load("/content/rf_probs.npy")

gcn_val_probs = np.load("/content/gcn_val_probs.npy")
rf_val_probs = np.load("/content/rf_val_probs.npy")
y_val = np.load("/content/val_labels.npy")
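Because every strategy below combines the arrays element-wise, it can help to verify their shapes and value ranges right after loading. The sketch below is one possible check, assuming the `.npy` files hold 1-D arrays of probabilities and 0/1 labels; `check_aligned` is a hypothetical helper and the arrays are toy stand-ins, not the real data.

```python
import numpy as np

def check_aligned(test_a, test_b, val_a, val_b, val_labels):
    """Raise ValueError if the five arrays cannot be combined element-wise."""
    if test_a.shape != test_b.shape:
        raise ValueError(f"Test shapes differ: {test_a.shape} vs {test_b.shape}")
    if not (val_a.shape == val_b.shape == val_labels.shape):
        raise ValueError("Validation probabilities and labels differ in shape")
    for name, probs in [("test_a", test_a), ("test_b", test_b),
                        ("val_a", val_a), ("val_b", val_b)]:
        if np.any((probs < 0) | (probs > 1)):
            raise ValueError(f"{name} contains values outside [0, 1]")
    if not np.isin(val_labels, [0, 1]).all():
        raise ValueError("Labels must be 0 or 1")

# Toy stand-ins for the loaded arrays
demo_test_a = np.array([0.2, 0.7, 0.9])
demo_test_b = np.array([0.1, 0.6, 0.8])
demo_val_a = np.array([0.3, 0.8])
demo_val_b = np.array([0.4, 0.9])
demo_val_labels = np.array([0, 1])

check_aligned(demo_test_a, demo_test_b, demo_val_a, demo_val_b, demo_val_labels)
print("All checks passed")
```

With the real data, the call would take `gcn_probs, rf_probs, gcn_val_probs, rf_val_probs, y_val` in that order.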

## Averaging Probabilities

In [3]:
# Ensure both have the same length (sanity check)
assert len(gcn_probs) == len(rf_probs), "Mismatch in prediction lengths!"

# Simple average ensemble
final_probs = (gcn_probs + rf_probs) / 2

# Convert to binary prediction (threshold at 0.5)
final_preds = (final_probs > 0.5).astype(int)

# Prepare submission DataFrame
submission = pd.DataFrame({
    "ID": np.arange(len(final_preds)),
    "Predicted": final_preds
})

# Save final submission
submission_file = "/content/ensemble_submission.csv"
submission.to_csv(submission_file, index=False)
print(f"Ensemble submission saved to {submission_file}")

Ensemble submission saved to /content/ensemble_submission.csv
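The cell above writes a submission without first scoring the averaged probabilities on the validation set. A minimal sketch of such a check is shown here on synthetic stand-in data (one informative model, one pure-noise model); all variable names and the data are assumptions for illustration, not the real GCN and RF outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_val_demo = rng.integers(0, 2, size=200)

# Informative model: probabilities shifted towards the true label
strong_val = np.clip(0.5 + 0.3 * (y_val_demo - 0.5) + rng.normal(0, 0.15, 200), 0, 1)
# Uninformative model: probabilities unrelated to the label
weak_val = rng.uniform(0, 1, 200)

# Same averaging rule as the cell above, applied to validation data
avg_val = (strong_val + weak_val) / 2
avg_auc = roc_auc_score(y_val_demo, avg_val)
avg_acc = accuracy_score(y_val_demo, (avg_val > 0.5).astype(int))

print(f"Strong model AUC: {roc_auc_score(y_val_demo, strong_val):.4f}")
print(f"Averaged AUC:     {avg_auc:.4f}")
print(f"Averaged accuracy at 0.5: {avg_acc:.4f}")
```

With the real arrays, replacing the stand-ins with `gcn_val_probs`, `rf_val_probs` and `y_val` would show whether plain averaging helps or hurts relative to the better single model.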


## Rule-Based Combination

In [4]:
# Convert probabilities to binary predictions (threshold = 0.5)
gcn_preds = (gcn_probs > 0.5).astype(int)
rf_preds = (rf_probs > 0.5).astype(int)

# Rule-based ensemble: predict an edge only if both models agree there is an edge
final_preds = (gcn_preds & rf_preds).astype(int)

# Prepare submission DataFrame
submission = pd.DataFrame({
    "ID": np.arange(len(final_preds)),
    "Predicted": final_preds
})

# Save final submission
submission_file = "/content/rule_based_submission.csv"
submission.to_csv(submission_file, index=False)

print(f"Rule-based ensemble submission saved to {submission_file}")

Rule-based ensemble submission saved to /content/rule_based_submission.csv
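The AND rule is equivalent to thresholding the smaller of the two probabilities, which makes it stricter than averaging. The toy values below are made up purely to illustrate the difference.

```python
import numpy as np

# Made-up probabilities from two models for five candidate edges
p_a = np.array([0.9, 0.6, 0.4, 0.2, 0.55])
p_b = np.array([0.8, 0.3, 0.7, 0.1, 0.51])

# Rule used above: both models must exceed 0.5
and_rule = (p_a > 0.5).astype(int) & (p_b > 0.5).astype(int)
# Equivalent formulation: the minimum probability must exceed 0.5
min_rule = (np.minimum(p_a, p_b) > 0.5).astype(int)
# Plain averaging, for comparison
avg_rule = ((p_a + p_b) / 2 > 0.5).astype(int)

print(and_rule)  # [1 0 0 0 1]
print(min_rule)  # [1 0 0 0 1]
print(avg_rule)  # [1 0 1 0 1]
```

The third edge (0.4 and 0.7) is accepted by averaging but rejected by the AND rule, so the rule-based ensemble never predicts more positives than averaging at the same threshold.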


## Meta-classifier

In [5]:
# Combine into meta features
meta_X_train = np.column_stack([gcn_val_probs, rf_val_probs])
meta_y_train = y_val

# Train meta-classifier
meta_clf = LogisticRegression()
meta_clf.fit(meta_X_train, meta_y_train)

# Evaluate on the validation set. This is the same data the meta-classifier was
# trained on, so the AUC is an in-sample (optimistic) estimate
val_preds = meta_clf.predict_proba(meta_X_train)[:, 1]
print(f"Meta-Classifier Validation AUC: {roc_auc_score(meta_y_train, val_preds):.4f}")

Meta-Classifier Validation AUC: 0.8308
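Since the AUC above is computed on the data the meta-classifier was fitted on, an out-of-fold estimate is a fairer check. The sketch below uses scikit-learn's `cross_val_predict` on synthetic stand-in features; the data, the variable names and the choice of 5 folds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
strong = np.clip(0.5 + 0.3 * (y_demo - 0.5) + rng.normal(0, 0.15, 300), 0, 1)
weak = rng.uniform(0, 1, 300)
X_demo = np.column_stack([strong, weak])

# Each row is scored by a model that never saw it during fitting
oof_probs = cross_val_predict(
    LogisticRegression(), X_demo, y_demo, cv=5, method="predict_proba"
)[:, 1]
oof_auc = roc_auc_score(y_demo, oof_probs)

# In-sample AUC, computed the same way as in the cell above
in_sample_probs = LogisticRegression().fit(X_demo, y_demo).predict_proba(X_demo)[:, 1]
in_sample_auc = roc_auc_score(y_demo, in_sample_probs)

print(f"In-sample AUC:   {in_sample_auc:.4f}")
print(f"Out-of-fold AUC: {oof_auc:.4f}")
```

With only two input features the gap between the two numbers is usually small, but the out-of-fold value is the one comparable to each base model's validation AUC.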


In [6]:
# Combine into meta features for test set
meta_X_test = np.column_stack([gcn_probs, rf_probs])

# Predict using meta-classifier
test_preds = meta_clf.predict(meta_X_test)

# Save final submission
submission = pd.DataFrame({
    "ID": np.arange(len(test_preds)),
    "Predicted": test_preds
})

submission.to_csv("/content/meta_submission.csv", index=False)
print("Meta-Classifier predictions saved to /content/meta_submission.csv")

Meta-Classifier predictions saved to /content/meta_submission.csv
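Two details of the meta-classifier are worth inspecting: how much weight it assigns to each model, and the fact that `predict` returns hard labels, which for binary logistic regression corresponds to thresholding the positive-class probability at 0.5. The sketch below shows both on synthetic stand-in data; the data and variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
strong = np.clip(0.5 + 0.3 * (y_demo - 0.5) + rng.normal(0, 0.15, 300), 0, 1)
weak = rng.uniform(0, 1, 300)
X_demo = np.column_stack([strong, weak])

meta = LogisticRegression().fit(X_demo, y_demo)

# One coefficient per input column: a larger magnitude means more trust
w_strong, w_weak = meta.coef_[0]
print(f"Weight on informative model:   {w_strong:.3f}")
print(f"Weight on uninformative model: {w_weak:.3f}")
print(f"Intercept: {meta.intercept_[0]:.3f}")

# Hard labels from predict() versus thresholded probabilities
hard_labels = meta.predict(X_demo)
thresholded = (meta.predict_proba(X_demo)[:, 1] > 0.5).astype(int)
agreement = np.mean(hard_labels == thresholded)
print(f"Agreement between predict() and the 0.5 threshold: {agreement:.3f}")
```

On the real data, `meta_clf.coef_` would show whether the regression relies almost entirely on one model, which is plausible given that the meta-classifier's validation AUC (0.8308) is essentially the same as the RF's own (0.8309).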


## Confidence-Weighted Averaging

In [7]:
# Validation AUC of each model; used below to derive the ensemble weights
gcn_val_auc = roc_auc_score(y_val, gcn_val_probs)
rf_val_auc = roc_auc_score(y_val, rf_val_probs)

print(f"GCN Validation AUC: {gcn_val_auc:.4f}")
print(f"RF Validation AUC: {rf_val_auc:.4f}")

GCN Validation AUC: 0.4995
RF Validation AUC: 0.8309


In [8]:
# GCN weight = its share of the summed validation AUCs; the RF gets 1 - alpha
alpha = gcn_val_auc / (gcn_val_auc + rf_val_auc)
print(f"GCN Weight (α): {alpha:.4f}")

GCN Weight (α): 0.3754
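An AUC of 0.5 corresponds to random ranking, so the GCN's validation AUC of 0.4995 carries essentially no signal, yet the ratio above still gives it roughly 37.5% of the weight. One possible alternative, sketched below, weights each model by how far its AUC exceeds chance. This is not the notebook's method; `excess_auc_weight` and its fallback to equal weights are assumptions.

```python
def auc_ratio_weight(auc_a, auc_b):
    """Weight of model A as its share of the summed AUCs (rule used above)."""
    return auc_a / (auc_a + auc_b)

def excess_auc_weight(auc_a, auc_b, chance=0.5):
    """Weight of model A as its share of the AUC in excess of chance.

    Models at or below chance get zero weight. If neither model beats
    chance, fall back to equal weights (an arbitrary choice).
    """
    excess_a = max(auc_a - chance, 0.0)
    excess_b = max(auc_b - chance, 0.0)
    total = excess_a + excess_b
    if total == 0:
        return 0.5
    return excess_a / total

# AUC values as printed in the cell above (rounded to four decimals)
print(auc_ratio_weight(0.4995, 0.8309))   # about 0.375
print(excess_auc_weight(0.4995, 0.8309))  # 0.0
```

Under the excess-over-chance rule the ensemble reduces to the RF alone for these validation scores, which is worth comparing against the weighted average before submitting.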


In [9]:
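# Weighted average of the test-set probabilities
final_probs = alpha * gcn_probs + (1 - alpha) * rf_probs

In [10]:
# Convert to binary predictions (threshold at 0.5)
final_preds = (final_probs > 0.5).astype(int)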

submission = pd.DataFrame({
    "ID": np.arange(len(final_preds)),
    "Predicted": final_preds
})

submission.to_csv("/content/confidence_weighted_submission.csv", index=False)
print("Confidence-Weighted Averaging submission saved to /content/confidence_weighted_submission.csv")

Confidence-Weighted Averaging submission saved to /content/confidence_weighted_submission.csv
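All four strategies end with the same steps: build an (ID, Predicted) table and write it to CSV. A small helper could remove that repetition. The sketch below assumes, as the cells above do, that IDs are simply the row positions 0..n-1; `write_submission` is a hypothetical name, and the demo writes to a temporary directory rather than `/content`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

def write_submission(preds, path):
    """Save binary predictions as a CSV with columns ID and Predicted."""
    preds = np.asarray(preds).astype(int)
    submission = pd.DataFrame({
        "ID": np.arange(len(preds)),  # assumption: IDs are row positions
        "Predicted": preds,
    })
    submission.to_csv(path, index=False)
    return submission

# Demo with made-up predictions, written to a temporary directory
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo_submission.csv")
demo_submission = write_submission([1, 0, 1], demo_path)
print(pd.read_csv(demo_path))
```

Each strategy could then end with a single call such as `write_submission(final_preds, "/content/ensemble_submission.csv")`.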
