# Loop 4 Analysis: XGBoost vs RF Predictions

## Key Question
exp_003 (XGBoost) uses the same features as exp_000 (RF, best LB 0.7799).
- How many predictions differ?
- Which passengers changed?
- Should we submit exp_003 or create an ensemble?

In [1]:
import pandas as pd
import numpy as np

# Load all candidate predictions
exp_000 = pd.read_csv('/home/code/submission_candidates/candidate_000.csv')  # RF with Age - BEST LB 0.7799
exp_002 = pd.read_csv('/home/code/submission_candidates/candidate_002.csv')  # RF no Age - LB 0.7703
exp_003 = pd.read_csv('/home/code/submission_candidates/candidate_003.csv')  # XGBoost with Age

# Load test data for analysis
test = pd.read_csv('/home/data/test.csv')

print("Prediction distributions:")
print(f"exp_000 (RF, best LB): {exp_000['Survived'].value_counts().to_dict()}")
print(f"exp_002 (RF no Age):   {exp_002['Survived'].value_counts().to_dict()}")
print(f"exp_003 (XGBoost):     {exp_003['Survived'].value_counts().to_dict()}")

Prediction distributions:
exp_000 (RF, best LB): {0: 264, 1: 154}
exp_002 (RF no Age):   {0: 264, 1: 154}
exp_003 (XGBoost):     {0: 273, 1: 145}


In [2]:
# Compare exp_000 vs exp_003 (XGBoost)
diff_003 = exp_000['Survived'] != exp_003['Survived']
print(f"\nexp_000 vs exp_003 (XGBoost):")
print(f"  Predictions that differ: {diff_003.sum()} ({diff_003.sum()/len(diff_003)*100:.1f}%)")
print(f"  Predictions that agree: {(~diff_003).sum()} ({(~diff_003).sum()/len(diff_003)*100:.1f}%)")

# Direction of changes
changed_to_0 = ((exp_000['Survived'] == 1) & (exp_003['Survived'] == 0)).sum()
changed_to_1 = ((exp_000['Survived'] == 0) & (exp_003['Survived'] == 1)).sum()
print(f"\n  Changed 1→0 (survived→died): {changed_to_0}")
print(f"  Changed 0→1 (died→survived): {changed_to_1}")


exp_000 vs exp_003 (XGBoost):
  Predictions that differ: 27 (6.5%)
  Predictions that agree: 391 (93.5%)

  Changed 1→0 (survived→died): 18
  Changed 0→1 (died→survived): 9
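
The agreement and direction counts above can be cross-checked with a single 2x2 contingency table. The sketch below does not read the candidate files: it rebuilds two synthetic prediction vectors purely from the printed counts (9 and 18 changed, 391 agreeing, 154 RF survivors), so the arrays `rf` and `xgb` are illustrative stand-ins rather than the real submissions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins reconstructed from the printed counts only:
# 255 rows both predict 0, 9 rows flip 0->1, 18 rows flip 1->0, 136 rows both predict 1.
rf = np.array([0] * 255 + [0] * 9 + [1] * 18 + [1] * 136)
xgb = np.array([0] * 255 + [1] * 9 + [0] * 18 + [1] * 136)

# Rows are exp_000 (RF) predictions, columns are exp_003 (XGBoost) predictions.
table = pd.crosstab(pd.Series(rf, name="exp_000"), pd.Series(xgb, name="exp_003"))
print(table)

# Off-diagonal cells are the disagreements; margins are each model's class totals.
n_changed = table.loc[0, 1] + table.loc[1, 0]
print(f"changed: {n_changed}, RF survivors: {table.loc[1].sum()}, XGB survivors: {table[1].sum()}")
```

On the real data, the same call would be `pd.crosstab(exp_000['Survived'], exp_003['Survived'])`, assuming both files list passengers in the same order.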


In [3]:
# Analyze which passengers changed between exp_000 and exp_003
changed_idx = exp_000[diff_003]['PassengerId'].values
print(f"\nPassengers with different predictions (exp_000 vs exp_003):")
print(f"PassengerIds: {changed_idx}")

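# Get their characteristics
# NOTE: the positional .values assignments below are only correct if test.csv and
# both candidate files list passengers in the same PassengerId order.
changed_passengers = test[test['PassengerId'].isin(changed_idx)].copy()
changed_passengers['exp_000_pred'] = exp_000[diff_003]['Survived'].values
changed_passengers['exp_003_pred'] = exp_003[diff_003]['Survived'].values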

print(f"\nCharacteristics of changed passengers:")
print(changed_passengers[['PassengerId', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'exp_000_pred', 'exp_003_pred']].to_string())


Passengers with different predictions (exp_000 vs exp_003):
PassengerIds: [ 896  898  910  911  920  933  967  982 1010 1023 1050 1051 1057 1073
 1094 1098 1117 1126 1141 1175 1183 1205 1215 1239 1251 1259 1274]

Characteristics of changed passengers:
     PassengerId  Pclass     Sex   Age  SibSp  Parch      Fare  exp_000_pred  exp_003_pred
4            896       3  female  22.0      1      1   12.2875             1             0
6            898       3  female  30.0      0      0    7.6292             1             0
18           910       3  female  27.0      1      0    7.9250             0             1
19           911       3  female  45.0      0      0    7.2250             1             0
28           920       1    male  41.0      0      0   30.5000             0             1
41           933       1    male   NaN      0      0   26.5500             0             1
75           967       1    male  32.5      0      0  211.5000             0             1
90           982   
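
The cell above attaches predictions to passengers by position, which silently depends on all three files sharing one row order. A minimal sketch of an order-independent variant is shown here: it joins on `PassengerId` instead. The helper name `changed_rows` and the toy frames are hypothetical and not part of the original notebook.

```python
import pandas as pd

def changed_rows(base, other, features, id_col="PassengerId", target="Survived"):
    """Return rows where two submissions disagree, joined to passenger features by ID."""
    both = base[[id_col, target]].merge(
        other[[id_col, target]],
        on=id_col,
        suffixes=("_base", "_other"),
        validate="one_to_one",  # fail loudly on duplicated IDs
    )
    changed = both[both[f"{target}_base"] != both[f"{target}_other"]]
    return changed.merge(features, on=id_col, how="left")

# Toy frames with deliberately different row orders (illustrative data only).
base = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [1, 0, 1]})
other = pd.DataFrame({"PassengerId": [3, 1, 2], "Survived": [0, 1, 0]})
features = pd.DataFrame({"PassengerId": [2, 3, 1], "Sex": ["male", "female", "male"]})

out = changed_rows(base, other, features)
print(out)
```

With the notebook's variables, the equivalent call would be `changed_rows(exp_000, exp_003, test)`.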

In [4]:
# Analyze patterns in changed predictions
print("\nAnalysis of changed predictions:")
print("\nBy Sex:")
print(changed_passengers.groupby('Sex').size())

print("\nBy Pclass:")
print(changed_passengers.groupby('Pclass').size())

print("\nBy direction of change:")
for idx, row in changed_passengers.iterrows():
    direction = "1→0" if row['exp_000_pred'] == 1 else "0→1"
    print(f"  {row['PassengerId']}: {row['Sex']}, Pclass={row['Pclass']}, Age={row['Age']}, Fare={row['Fare']:.2f} - {direction}")


Analysis of changed predictions:

By Sex:
Sex
female    17
male      10
dtype: int64

By Pclass:
Pclass
1    10
3    17
dtype: int64

By direction of change:
  896: female, Pclass=3, Age=22.0, Fare=12.29 - 1→0
  898: female, Pclass=3, Age=30.0, Fare=7.63 - 1→0
  910: female, Pclass=3, Age=27.0, Fare=7.92 - 0→1
  911: female, Pclass=3, Age=45.0, Fare=7.22 - 1→0
  920: male, Pclass=1, Age=41.0, Fare=30.50 - 0→1
  933: male, Pclass=1, Age=nan, Fare=26.55 - 0→1
  967: male, Pclass=1, Age=32.5, Fare=211.50 - 0→1
  982: female, Pclass=3, Age=22.0, Fare=13.90 - 1→0
  1010: male, Pclass=1, Age=36.0, Fare=75.24 - 0→1
  1023: male, Pclass=1, Age=53.0, Fare=28.50 - 1→0
  1050: male, Pclass=1, Age=42.0, Fare=26.55 - 0→1
  1051: female, Pclass=3, Age=26.0, Fare=13.78 - 1→0
  1057: female, Pclass=3, Age=26.0, Fare=22.02 - 1→0
  1073: male, Pclass=1, Age=37.0, Fare=83.16 - 0→1
  1094: male, Pclass=1, Age=47.0, Fare=227.53 - 1→0
  1098: female, Pclass=3, Age=35.0, Fare=7.75 - 1→0
  1117: female, Pcla

In [5]:
# Create ensemble predictions
print("\n" + "="*60)
print("ENSEMBLE OPTIONS")
print("="*60)

# Option 1: Majority vote of all 3
ensemble_vote = ((exp_000['Survived'] + exp_002['Survived'] + exp_003['Survived']) >= 2).astype(int)
print(f"\nOption 1: Majority vote (exp_000, exp_002, exp_003)")
print(f"  Distribution: {pd.Series(ensemble_vote).value_counts().to_dict()}")
print(f"  Differs from exp_000: {(ensemble_vote != exp_000['Survived']).sum()}")

# Option 2: Weighted toward exp_000 (best LB)
# With hard 0/1 labels, "use exp_000 whenever the models disagree" always yields
# exp_000 itself, so this option is identical to the baseline.
ensemble_weighted = exp_000['Survived'].copy()
print(f"\nOption 2: Favor exp_000 (best LB) when disagreement")
print(f"  Same as exp_000 by design")

# Option 3: Keep exp_000 unless all 3 models agree
# Where all 3 agree the prediction already equals exp_000, so this is also the
# baseline; the agreement rate is still useful as a stability measure.
all_agree = (exp_000['Survived'] == exp_002['Survived']) & (exp_002['Survived'] == exp_003['Survived'])
ensemble_conservative = exp_000['Survived'].copy()
print(f"\nOption 3: Use exp_000 unless all 3 agree")
print(f"  All 3 agree on: {all_agree.sum()} predictions ({all_agree.sum()/len(all_agree)*100:.1f}%)")


ENSEMBLE OPTIONS

Option 1: Majority vote (exp_000, exp_002, exp_003)
  Distribution: {0: 264, 1: 154}
  Differs from exp_000: 2

Option 2: Favor exp_000 (best LB) when disagreement
  Same as exp_000 by design

Option 3: Use exp_000 unless all 3 agree
  All 3 agree on: 385 predictions (92.1%)
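
The majority vote in Option 1 sums the `Survived` columns row by row, so it relies on every candidate file having the same row order. As a hedged alternative, the sketch below aligns the submissions on `PassengerId` before voting. The function `majority_vote` and the toy frames are hypothetical illustrations, not code from the original run.

```python
import pandas as pd

def majority_vote(frames, id_col="PassengerId", target="Survived"):
    """Majority vote over binary submissions, aligned on the ID column."""
    merged = None
    for i, df in enumerate(frames):
        part = df[[id_col, target]].rename(columns={target: f"pred_{i}"})
        if merged is None:
            merged = part
        else:
            merged = merged.merge(part, on=id_col, how="inner", validate="one_to_one")
    pred_cols = [c for c in merged.columns if c.startswith("pred_")]
    votes = merged[pred_cols].sum(axis=1)
    # Strict majority: more than half of the models must predict 1.
    merged[target] = (votes * 2 > len(pred_cols)).astype(int)
    return merged[[id_col, target]]

# Toy submissions in three different row orders (illustrative data only).
a = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [1, 0, 1]})
b = pd.DataFrame({"PassengerId": [3, 2, 1], "Survived": [1, 1, 0]})
c = pd.DataFrame({"PassengerId": [2, 1, 3], "Survived": [0, 1, 0]})

result = majority_vote([a, b, c])
print(result)
```

Using an odd number of models avoids ties; with an even number, the strict-majority rule above resolves ties to 0.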


In [6]:
# Key insight: How different is XGBoost from RF?
print("\n" + "="*60)
print("KEY INSIGHTS")
print("="*60)

print(f"\n1. exp_003 (XGBoost) differs from exp_000 (RF) by {diff_003.sum()} predictions ({diff_003.sum()/len(diff_003)*100:.1f}%)")
print(f"   - This is significant enough to potentially change LB")
print(f"   - XGBoost predicts fewer survivors (145 vs 154)")

print(f"\n2. exp_002 (RF no Age) differs from exp_000 by only 8 predictions (1.9%)")
print(f"   - Those 8 changes HURT LB by ~1% (0.7799→0.7703)")

print(f"\n3. XGBoost gives lower importance to Age (3.3% vs 8% for RF)")
print(f"   - This might help with distribution shift")
print(f"   - But we learned that Age is IMPORTANT for LB")

print(f"\n4. DECISION FACTORS:")
print(f"   - exp_003 is a reasonable test: same features as best LB, different model")
print(f"   - Risk: XGBoost's changes might hurt like exp_002's did")
print(f"   - Opportunity: XGBoost might capture different patterns")
print(f"   - With 2 submissions left, we can test exp_003 and still have 1 for final attempt")


KEY INSIGHTS

1. exp_003 (XGBoost) differs from exp_000 (RF) by 27 predictions (6.5%)
   - This is significant enough to potentially change LB
   - XGBoost predicts fewer survivors (145 vs 154)

2. exp_002 (RF no Age) differs from exp_000 by only 8 predictions (1.9%)
   - Those 8 changes HURT LB by ~1% (0.7799→0.7703)

3. XGBoost gives lower importance to Age (3.3% vs 8% for RF)
   - This might help with distribution shift
   - But we learned that Age is IMPORTANT for LB

4. DECISION FACTORS:
   - exp_003 is a reasonable test: same features as best LB, different model
   - Risk: XGBoost's changes might hurt like exp_002's did
   - Opportunity: XGBoost might capture different patterns
   - With 2 submissions left, we can test exp_003 and still have 1 for final attempt
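
Insight 2 can be made concrete with a little arithmetic. The sketch below ASSUMES the LB score is plain accuracy over all 418 test rows; if the leaderboard is scored on a subset, the row counts differ. Under that assumption, and because flipping a binary prediction always flips its correctness, the 8 changes in exp_002 can be split into predictions it fixed and predictions it broke.

```python
# ASSUMPTION: LB accuracy is computed over all 418 test rows.
N_TEST = 418

def correct_count(lb_score: float, n_rows: int = N_TEST) -> int:
    """Convert an accuracy score into the nearest whole number of correct rows."""
    return round(lb_score * n_rows)

def split_changes(n_changed: int, net_gain: int) -> tuple[int, int]:
    """Split changed binary predictions into (fixed, broken).

    Each change either fixes an error (+1) or breaks a correct row (-1), so
    fixed + broken = n_changed and fixed - broken = net_gain.
    """
    if abs(net_gain) > n_changed or (n_changed + net_gain) % 2 != 0:
        raise ValueError("net_gain is not achievable with n_changed flips")
    fixed = (n_changed + net_gain) // 2
    return fixed, n_changed - fixed

best = correct_count(0.7799)    # exp_000 (RF with Age)
no_age = correct_count(0.7703)  # exp_002 (RF no Age)
net = no_age - best
fixed, broken = split_changes(8, net)
print(f"exp_000 correct: {best}, exp_002 correct: {no_age}, net: {net}")
print(f"Of exp_002's 8 changes: {fixed} fixed, {broken} broken")

# Bounds for exp_003 if all 27 changes were right or all were wrong.
print(f"exp_003 LB range: {(best - 27) / N_TEST:.4f} to {(best + 27) / N_TEST:.4f}")
```

The same split can be applied to exp_003 once its LB score is known, with `n_changed=27`.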


In [7]:
# Final recommendation
print("\n" + "="*60)
print("RECOMMENDATION")
print("="*60)

print("""
1. DO NOT submit exp_003 yet - the 27 prediction changes are risky
   - exp_002 changed only 8 predictions and HURT LB by 1%
   - exp_003 changes 27 predictions - could hurt more

2. BETTER APPROACH: Create a more conservative ensemble
   - Use exp_000 as base (best LB)
   - Only change predictions where XGBoost has HIGH confidence
   - Or: Try hyperparameter tuning on exp_000's RF

3. ALTERNATIVE: Submit exp_003 to test XGBoost
   - If LB improves: XGBoost is better, continue with it
   - If LB worsens: Stick with RF, try other improvements
   - This uses 1 submission but gives valuable information

4. REALISTIC GOAL: Beat 0.7799 (current best)
   - Target of 1.0 is impossible
   - Top Kaggle scores are 80-82%
   - We're at 78%, need ~2% improvement
""")


RECOMMENDATION

1. DO NOT submit exp_003 yet - the 27 prediction changes are risky
   - exp_002 changed only 8 predictions and HURT LB by 1%
   - exp_003 changes 27 predictions - could hurt more

2. BETTER APPROACH: Create a more conservative ensemble
   - Use exp_000 as base (best LB)
   - Only change predictions where XGBoost has HIGH confidence
   - Or: Try hyperparameter tuning on exp_000's RF

3. ALTERNATIVE: Submit exp_003 to test XGBoost
   - If LB improves: XGBoost is better, continue with it
   - If LB worsens: Stick with RF, try other improvements
   - This uses 1 submission but gives valuable information

4. REALISTIC GOAL: Beat 0.7799 (current best)
   - Target of 1.0 is impossible
   - Top Kaggle scores are 80-82%
   - We're at 78%, need ~2% improvement
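
Recommendation 2 (only change predictions where XGBoost has high confidence) could be sketched as follows. The candidate CSVs contain hard 0/1 labels only, so this assumes class-1 probabilities from the XGBoost model are available separately. The function `gated_override`, the `hi`/`lo` thresholds of 0.9 and 0.1, and the toy arrays are all hypothetical choices for illustration.

```python
import numpy as np

def gated_override(base_pred, other_proba, hi=0.9, lo=0.1):
    """Keep base_pred except where the other model is very confident.

    base_pred   : 0/1 labels from the trusted model (here, the RF baseline).
    other_proba : class-1 probabilities from the second model (assumed available).
    hi, lo      : hypothetical confidence thresholds; tune on validation data.
    """
    base_pred = np.asarray(base_pred).astype(int)
    other_proba = np.asarray(other_proba, dtype=float)
    out = base_pred.copy()
    out[other_proba >= hi] = 1  # confident "survived" overrides the baseline
    out[other_proba <= lo] = 0  # confident "died" overrides the baseline
    return out

# Toy example (illustrative values only).
base = [1, 0, 1, 0]
proba = [0.05, 0.95, 0.50, 0.40]
blended = gated_override(base, proba)
print(blended)
print("overridden rows:", int((blended != np.asarray(base)).sum()))
```

Tightening the thresholds toward 1.0 and 0.0 shrinks the number of overrides, which keeps the ensemble closer to exp_000 than the 27 changes in exp_003.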



In [8]:
# Save ensemble prediction for potential use
ensemble_df = pd.DataFrame({
    'PassengerId': exp_000['PassengerId'],
    'Survived': ensemble_vote
})
ensemble_df.to_csv('/home/code/submission_candidates/candidate_ensemble.csv', index=False)
print("Saved ensemble prediction to candidate_ensemble.csv")
print(f"Ensemble differs from exp_000 by: {(ensemble_vote != exp_000['Survived']).sum()} predictions")

Saved ensemble prediction to candidate_ensemble.csv
Ensemble differs from exp_000 by: 2 predictions
