# Evolver Loop 2 Analysis: Strategic Direction for Titanic

## Current Status Summary
- **Best CV Score**: 84.17% (exp_000, Gradient Boosting baseline)
- **Target Score**: 1.0 (placeholder - needs clarification)
- **Remaining Submissions**: 9/10
- **Phase**: Early feature engineering, ready for systematic improvement

## Key Findings from Loop 1
1. **Data leakage impact is minimal** (0.23% difference) - preprocessing approach is acceptable
2. **Sex dominates predictions** (46.2% feature importance) - must respect this in feature engineering
3. **Cabin deck letters not significant** (p=0.172) - binary HasCabin is sufficient
4. **Strong survival patterns identified**: Women (74.2%), 1st class (63.0%), HasCabin (66.7%)

## Competitive Intelligence Reviewed
- **Top kernels accessed**: ldfreeman3 framework ("99% accuracy" is the kernel's title, not a leaderboard score), startupsci solutions, alexisbcook tutorial
- **Key techniques identified**: Title extraction, family size features, ensemble methods, hyperparameter tuning
- **Winning patterns**: Gradient boosting + logistic regression ensembles, careful feature engineering
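
The gradient boosting plus logistic regression blend mentioned above could be prototyped roughly as follows. This is a minimal sketch, not the kernels' actual code: the data is a synthetic stand-in generated with `make_classification`, and soft voting is one assumed way to combine the two models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered Titanic feature matrix (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Soft voting averages the predicted class probabilities of both models.
blend = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        # Logistic regression is scaled inside its own pipeline so the
        # scaler is refit on each training fold.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
blend_scores = cross_val_score(blend, X, y, cv=cv, scoring="accuracy")
print(f"Blend CV accuracy: {blend_scores.mean():.3f} +/- {blend_scores.std():.3f}")
```

Comparing this score against each base model under the same `cv` object would show whether the blend actually helps before spending a submission on it.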

## Next Steps
This notebook will analyze the gap between current performance and competitive benchmarks, then identify specific high-ROI improvements.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')

# Load data
train = pd.read_csv('/home/data/train.csv')
test = pd.read_csv('/home/data/test.csv')

print("Dataset shapes:", train.shape, test.shape)
print("\nCurrent CV score: 84.17%")
print("Competitive benchmark: 78-82% on LB (typical for Titanic)")
print("Gap analysis: We're performing at competitive level, but can improve further")

Dataset shapes: (891, 12) (418, 11)

Current CV score: 84.17%
Competitive benchmark: 78-82% on LB (typical for Titanic)
Gap analysis: We're performing at competitive level, but can improve further


In [3]:
# Analyze current feature set from baseline
print("=== Current Feature Analysis ===")
print("\nMissing value patterns:")
print(train.isnull().sum())

print("\n=== Target Distribution ===")
survival_rate = train['Survived'].mean()
print(f"Overall survival rate: {survival_rate:.1%}")

print("\n=== Key Patterns from Loop 1 ===")
print("Women survival: 74.2% (vs men 18.9%)")
print("1st class survival: 63.0% (vs 3rd class 24.2%)")
print("HasCabin survival: 66.7% (vs no cabin 30.0%)")

# Check for additional opportunities
print("\n=== Untapped Opportunities ===")
print("Name titles (Mr, Mrs, Miss, Master, etc.)")
print("Ticket patterns (shared tickets = families?)")
print("Fare per person (Fare / FamilySize)")
print("Age groups (child vs adult vs elderly)")
print("Embarked port differences")

=== Current Feature Analysis ===

Missing value patterns:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

=== Target Distribution ===
Overall survival rate: 38.4%

=== Key Patterns from Loop 1 ===
Women survival: 74.2% (vs men 18.9%)
1st class survival: 63.0% (vs 3rd class 24.2%)
HasCabin survival: 66.7% (vs no cabin 30.0%)

=== Untapped Opportunities ===
Name titles (Mr, Mrs, Miss, Master, etc.)
Ticket patterns (shared tickets = families?)
Fare per person (Fare / FamilySize)
Age groups (child vs adult vs elderly)
Embarked port differences
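
Two of the opportunities listed above, name titles and shared tickets, can be sketched with the standard library alone. The rows below are illustrative values in the Kaggle `Name`/`Ticket` format; the regular expression and the `extract_title` helper are assumptions for this sketch, not code from the baseline.

```python
import re
from collections import Counter

# Illustrative rows; in the notebook these would come from train.csv.
rows = [
    {"Name": "Braund, Mr. Owen Harris", "Ticket": "A/5 21171"},
    {"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Ticket": "PC 17599"},
    {"Name": "Palsson, Master. Gosta Leonard", "Ticket": "349909"},
    {"Name": "Palsson, Miss. Torborg Danira", "Ticket": "349909"},
]

# The title sits between the comma after the surname and the next period.
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the title in a 'Surname, Title. Given names' string."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else "Unknown"


titles = [extract_title(row["Name"]) for row in rows]

# Passengers sharing a ticket number are candidates for travelling together.
ticket_counts = Counter(row["Ticket"] for row in rows)
group_sizes = [ticket_counts[row["Ticket"]] for row in rows]

print(titles)       # ['Mr', 'Mrs', 'Master', 'Miss']
print(group_sizes)  # [1, 1, 2, 2]
```

Whether a shared ticket really indicates a family is the open question flagged in the list; the group size is only a proxy.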


In [4]:
# Analyze competitive intelligence gaps
print("=== Gap Analysis: Current vs Competitive ===")
print("\nCurrent features (from baseline):")
print("- Title (Mr, Mrs, Miss, Master, Other)")
print("- FamilySize (SibSp + Parch + 1)")
print("- HasCabin (binary)")
print("- Sex, Pclass, Age, Fare (basic)")

print("\nCompetitive techniques not yet tried:")
print("1. IsAlone flag (FamilySize == 1)")
print("2. FarePerPerson (Fare / FamilySize)")
print("3. Age bins (Child/Teen/Adult/Senior)")
print("4. Ticket prefix extraction")
print("5. Embarked port engineering")
print("6. Hyperparameter tuning")
print("7. Model ensembling (GB + Logistic Regression)")
print("8. Advanced models (XGBoost, LightGBM)")

print("\n=== Priority Ranking (ROI) ===")
print("HIGH: IsAlone, Age bins, Hyperparameter tuning")
print("MEDIUM: FarePerPerson, Embarked engineering")
print("LOW: Ticket prefix (complex, may not help)")
print("ENSEMBLE: GB + Logistic Regression blend")

=== Gap Analysis: Current vs Competitive ===

Current features (from baseline):
- Title (Mr, Mrs, Miss, Master, Other)
- FamilySize (SibSp + Parch + 1)
- HasCabin (binary)
- Sex, Pclass, Age, Fare (basic)

Competitive techniques not yet tried:
1. IsAlone flag (FamilySize == 1)
2. FarePerPerson (Fare / FamilySize)
3. Age bins (Child/Teen/Adult/Senior)
4. Ticket prefix extraction
5. Embarked port engineering
6. Hyperparameter tuning
7. Model ensembling (GB + Logistic Regression)
8. Advanced models (XGBoost, LightGBM)

=== Priority Ranking (ROI) ===
HIGH: IsAlone, Age bins, Hyperparameter tuning
MEDIUM: FarePerPerson, Embarked engineering
LOW: Ticket prefix (complex, may not help)
ENSEMBLE: GB + Logistic Regression blend
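
The three engineered features ranked above (IsAlone, FarePerPerson, Age bins) could be derived along these lines. This is a sketch on a tiny hand-made frame; the age bin edges (12, 18, 60) are assumptions, since the analysis names the bins but not their boundaries.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with the raw columns the features depend on.
df = pd.DataFrame(
    {
        "SibSp": [1, 0, 3],
        "Parch": [0, 0, 1],
        "Fare": [7.25, 8.05, 21.075],
        "Age": [22.0, 35.0, 2.0],
    }
)

# FamilySize as defined in the baseline: siblings/spouses + parents/children + self.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# IsAlone flags passengers travelling without family.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# FarePerPerson spreads the fare across the family.
df["FarePerPerson"] = df["Fare"] / df["FamilySize"]

# Age bins; edges are assumed. Missing ages stay NaN and need imputing first.
df["AgeBin"] = pd.cut(
    df["Age"],
    bins=[0, 12, 18, 60, np.inf],
    labels=["Child", "Teen", "Adult", "Senior"],
)

print(df[["FamilySize", "IsAlone", "FarePerPerson", "AgeBin"]])
```
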


# Evolver Loop 2 Analysis: Post-Fix Optimization Strategy

## Current Status (Updated)
- **Best CV Score**: 83.84% (exp_001, XGBoost, clean pipeline)
- **Previous CV**: 84.17% (exp_000, leaky pipeline)  
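- **Score drop**: 0.33 percentage points (consistent with the leakage fix, though exp_001 also changed the model and features, so the drop is not attributable to leakage alone)
- **Pipeline**: verified TRUSTWORTHY by the evaluator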
- **Submissions used**: 0/10
- **Remaining**: 9 submissions
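
To judge how much weight a 0.33-point difference between two CV scores can bear, a rough noise estimate helps. The sketch below uses the binomial standard error of an accuracy measured on the 891 training rows. This is an approximation (cross-validation folds are not independent trials), and `accuracy_standard_error` is a hypothetical helper name.

```python
import math


def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy estimated on n_samples rows."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


n_train = 891                      # rows in train.csv
cv_leaky, cv_clean = 0.8417, 0.8384

se = accuracy_standard_error(cv_clean, n_train)
drop = cv_leaky - cv_clean

print(f"Standard error: {se:.4f}")  # about 0.012, i.e. ~1.2 points
print(f"Score drop:     {drop:.4f}")
print("Drop is within one standard error:", drop < se)
```

Under this approximation the drop is well inside the noise band, so it is compatible with the leakage fix but does not prove it on its own.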

## Key Findings from exp_001
1. **Data leakage FIXED** - sklearn pipelines properly implemented
2. **Model upgraded** - XGBoost with n_estimators=500, max_depth=4, learning_rate=0.05
3. **New features added**: IsAlone, AgeBin, FarePerPerson
4. **Feature importance**: Title_Mr (19.4%), Sex_female (14.9%), Sex_male (10.0%) dominate
5. **Pipeline is production-ready** - no more leakage concerns
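
The leakage fix in finding 1 rests on fitting every preprocessing step inside the cross-validation loop. A minimal sketch of that pattern follows, under stated assumptions: the frame is synthetic, the column choices are illustrative, and `GradientBoostingClassifier` stands in for the XGBoost model so the example depends only on scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the Titanic training frame (assumption).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "Age": rng.normal(30, 12, n),
        "Fare": rng.gamma(2.0, 15.0, n),
        "Sex": rng.choice(["male", "female"], n),
        "Embarked": rng.choice(["S", "C", "Q"], n),
    }
)
demo.loc[rng.choice(n, 40, replace=False), "Age"] = np.nan  # missing ages
# Target: mostly determined by Sex, with 20% of labels flipped.
target = ((demo["Sex"] == "female") ^ (rng.random(n) < 0.2)).astype(int)

preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
        (
            "cat",
            Pipeline(
                [
                    ("impute", SimpleImputer(strategy="most_frequent")),
                    ("onehot", OneHotEncoder(handle_unknown="ignore")),
                ]
            ),
            ["Sex", "Embarked"],
        ),
    ]
)

# Because imputers and encoders live inside the pipeline, cross_val_score
# refits them on each training fold and never sees validation rows.
model = Pipeline(
    [("prep", preprocess), ("clf", GradientBoostingClassifier(random_state=0))]
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pipe_scores = cross_val_score(model, demo, target, cv=cv, scoring="accuracy")
print(f"Leak-free CV accuracy: {pipe_scores.mean():.3f}")
```

The leaky alternative would be to impute or encode the full frame once and then cross-validate, which lets validation rows influence the medians and category sets.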

In [None]:
# Load updated session state
import json
import pandas as pd
import numpy as np

with open('/home/code/session_state.json', 'r') as f:
    session_state = json.load(f)

print("=== Experiment History ===")
for i, exp in enumerate(session_state['experiments']):
    print(f"{i}. {exp['name']}: {exp['score']:.2f}% ({exp['model_type']})")

print("\n=== Submission Status ===")
print(f"Used: {len(session_state['submissions'])}/10")
print(f"Remaining: {session_state['remaining_submissions']}")

print("\n=== Feature Importance from exp_001 ===")
print("1. Title_Mr: 19.4%")
print("2. Sex_female: 14.9%")
print("3. Sex_male: 10.0%")
print("4. Pclass_3: 7.7%")
print("5. Title_Other: 5.7%")
print("6. Title_Master: 4.9%")
print("7. FamilySize: 3.6%")
print("8. HasCabin: 3.1%")

print("\n=== Key Insights ===")
print("- Sex features combined: ~25% importance (dominant)")
print("- Title features combined: ~30% importance from the three listed titles (critical)")
print("- Title_Other at 5.7% suggests splitting may help")
print("- Family structure features: ~6% combined")
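
The combined figures can be recomputed from the eight importances listed in the cell rather than typed by hand. The sketch below groups them by the prefix before the first underscore; `group_importance` is a hypothetical helper, and because only the top eight features are known here, each total is a lower bound for its group.

```python
from collections import defaultdict

# The eight importances reported for exp_001, in percent.
importances = {
    "Title_Mr": 19.4,
    "Sex_female": 14.9,
    "Sex_male": 10.0,
    "Pclass_3": 7.7,
    "Title_Other": 5.7,
    "Title_Master": 4.9,
    "FamilySize": 3.6,
    "HasCabin": 3.1,
}


def group_importance(values):
    """Sum importances by the source column (text before the first '_')."""
    totals = defaultdict(float)
    for name, value in values.items():
        totals[name.split("_", 1)[0]] += value
    return {group: round(total, 1) for group, total in totals.items()}


grouped = group_importance(importances)
print(grouped["Title"])  # 30.0
print(grouped["Sex"])    # 24.9
```
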

## 1. CV-LB Gap Analysis: URGENT

We have 9 submissions remaining and NO leaderboard feedback. The evaluator correctly identified this as critical.

**Decision**: Submit exp_001 NOW to get a first CV-LB calibration point. Benefits:
- Validate the trustworthy pipeline on the held-out test set
- Determine if CV is optimistic/pessimistic  
- Inform optimization strategy
- Low risk (1 submission of 9 remaining)

**Risk assessment**: Current model is trustworthy and competitive. Worth using 1 submission.

**Expected LB score**: rough bands around the 83.84% CV score (the typical Titanic LB range cited earlier is 78-82%):
- If CV is optimistic: 81-83% LB
- If CV is realistic: 83-85% LB
- If CV is pessimistic: 85-87% LB
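
Once the leaderboard score comes back, the three bands above can be applied mechanically. The sketch below encodes them with thresholds at 83% and 85%; `interpret_lb` is a hypothetical helper, the handling of the shared boundary values is an assumption, and the example scores are placeholders rather than real results.

```python
def interpret_lb(lb_score, lower=0.83, upper=0.85):
    """Classify the CV estimate given a leaderboard accuracy (as a fraction).

    Below `lower` the CV was optimistic, above `upper` it was pessimistic,
    and anything in between counts as realistic.
    """
    if lb_score < lower:
        return "optimistic"
    if lb_score <= upper:
        return "realistic"
    return "pessimistic"


cv_score = 0.8384  # exp_001 CV accuracy

# Placeholder leaderboard scores, one per band.
for lb in (0.82, 0.84, 0.86):
    gap = cv_score - lb
    print(f"LB {lb:.2%}: CV looks {interpret_lb(lb)} (CV - LB = {gap:+.2%})")
```

A single submission gives one calibration point, so the label should be treated as provisional until more CV-LB pairs are available.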