# Evolver Loop 3: Winning Solution Analysis & Strategy Refinement

## Summary of Findings

**Current Status:**
- Best CV: 0.3311 (baseline)
- Last experiment: 0.3217 (regression of 0.0094 vs. best CV)
- Target: 0.3865 (gap: 0.0554)

**What Failed:**
- Binning numerical features (10 bins) destroyed information: each feature has only 14-43 unique values, so 10 bins merge distinct values
- Target encoding had implementation issues (2D array flattening)
- Interactions may have added noise
- No hyperparameter tuning for new feature types
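
The "2D array flattening" problem noted above is typical of multiclass target encoding, where each key needs one encoded column per class rather than a single vector. The following is a minimal sketch of an out-of-fold version, assuming integer-encoded classes and a single key column. The helper name `oof_target_encode`, the `smoothing` parameter, and the toy data are illustrative and not taken from the failed experiment.

```python
import numpy as np
import pandas as pd

def oof_target_encode(keys, y, n_classes, n_splits=5, smoothing=10.0, seed=42):
    """Out-of-fold multiclass target encoding (illustrative helper).

    Returns an (n_rows, n_classes) array: column k holds the smoothed rate of
    class k for the row's key, estimated only from the other folds.
    """
    keys = pd.Series(keys).reset_index(drop=True)
    y = np.asarray(y)
    onehot = np.eye(n_classes)[y]                    # shape (n_rows, n_classes)
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(len(y)) % n_splits     # simple random fold assignment
    encoded = np.zeros_like(onehot)

    for f in range(n_splits):
        fit, apply = fold_id != f, fold_id == f
        prior = onehot[fit].mean(axis=0)             # class rates in the fitting folds
        grp = pd.DataFrame(onehot[fit]).groupby(keys[fit].to_numpy())
        sums, counts = grp.sum(), grp.size()         # per-key class counts and row counts
        smoothed = (sums + smoothing * prior).div(counts + smoothing, axis=0)
        # Keys unseen in the fitting folds become NaN rows; fall back to the prior
        enc = smoothed.reindex(keys[apply].to_numpy())
        encoded[apply] = enc.fillna(pd.Series(prior, index=enc.columns)).to_numpy()
    return encoded

# Toy data (made up): key "a" always has class 0, key "b" always has class 1
keys = ["a", "a", "b", "b", "a", "b", "a", "b"]
y = [0, 0, 1, 1, 0, 1, 0, 1]
enc = oof_target_encode(keys, y, n_classes=2, n_splits=4)
print(enc.shape)
```

The result stays two-dimensional, so it can be attached as `n_classes` separate feature columns. For feature combinations, one option is to build the key by concatenating the columns as strings before calling the helper.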

**What Winners Did:**
- Treat ALL features as categorical (no binning for low-cardinality)
- Use shallower trees (depth 7-8 for categorical vs 16-18 for numerical)
- Extensive target encoding on feature combinations
- 50-350 OOF models with stacking/hill climbing
- Original data augmentation (4:1 weighting)

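**Key Insight:** Pure categorical treatment (convert to category dtype) keeps every distinct value intact, so no information is lost to bin boundaries. It does not encode order (category dtype is unordered by default); instead, tree models are free to group values in whatever way gives the best splits.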
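
To make the trade-off concrete, here is a small sketch on synthetic data (the values are made up and only mimic a feature with about 40 distinct integers). It compares how many distinct values survive 10-bin binning versus a cast to category dtype.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a low-cardinality integer feature (values are made up)
rng = np.random.default_rng(0)
moisture = pd.Series(rng.integers(25, 66, size=10_000), name="Moisture")

binned = pd.cut(moisture, bins=10, labels=False)   # 10 equal-width bins
as_cat = moisture.astype(str).astype("category")   # one category per distinct value

# Distinct values: original, after binning, after the categorical cast
print(moisture.nunique(), binned.nunique(), as_cat.nunique())
print(as_cat.cat.ordered)  # False: pandas categoricals are unordered by default
```

Binning collapses several raw values into each bin, while the categorical cast keeps a one-to-one mapping with the raw values.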

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import xgboost as xgb
import lightgbm as lgb
from sklearn.metrics import average_precision_score
import warnings
warnings.filterwarnings('ignore')

# Load data
train = pd.read_csv('/home/data/train.csv')
test = pd.read_csv('/home/data/test.csv')

print("Dataset shapes:")
print(f"Train: {train.shape}")
print(f"Test: {test.shape}")

# Check cardinality of numerical features
numerical_cols = ['Temparature', 'Humidity', 'Moisture', 'Nitrogen', 'Potassium', 'Phosphorous']
print("\nCardinality of numerical features:")
for col in numerical_cols:
    unique_vals = train[col].nunique()
    print(f"{col}: {unique_vals} unique values")
    
print("\nTarget distribution:")
print(train['Fertilizer Name'].value_counts())

Dataset shapes:
Train: (750000, 10)
Test: (250000, 9)

Cardinality of numerical features:
Temparature: 14 unique values
Humidity: 23 unique values
Moisture: 41 unique values
Nitrogen: 39 unique values
Potassium: 20 unique values
Phosphorous: 43 unique values

Target distribution:
Fertilizer Name
14-35-14    114436
10-26-26    113887
17-17-17    112453
28-28       111158
20-20       110889
DAP          94860
Urea         92317
Name: count, dtype: int64


## 1. Pure Categorical Treatment (No Binning)

In [2]:
# Convert all features to categorical WITHOUT binning
# This preserves the original values as categories

cat_train = train.copy()
cat_test = test.copy()

# Convert numerical features to categorical using original values as strings
for col in numerical_cols:
    cat_train[col] = cat_train[col].astype(str)
    cat_test[col] = cat_test[col].astype(str)

# Label encode categorical features
cat_features = ['Soil Type', 'Crop Type']
for col in cat_features:
    le = LabelEncoder()
    cat_train[col] = le.fit_transform(cat_train[col])
    cat_test[col] = le.transform(cat_test[col])

print("Feature types after processing:")
print(cat_train.dtypes)

# Prepare data for modeling
X = cat_train.drop(['Fertilizer Name'], axis=1)  # Note: the `id` column is still present in X
y = cat_train['Fertilizer Name']
X_test = cat_test.copy()  # Test also still contains `id`

# Encode target to integers
le_target = LabelEncoder()
y_encoded = le_target.fit_transform(y)

print(f"\nTraining data shape: {X.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Target classes: {le_target.classes_}")
print("\nSample of processed data:")
print(X.head())

Feature types after processing:
id                  int64
Temparature        object
Humidity           object
Moisture           object
Soil Type           int64
Crop Type           int64
Nitrogen           object
Potassium          object
Phosphorous        object
Fertilizer Name    object
dtype: object

Training data shape: (750000, 9)
Test data shape: (250000, 9)

Sample of processed data:
   id Temparature Humidity Moisture  Soil Type  Crop Type Nitrogen Potassium  \
0   0          37       70       36          1          8       36         4   
1   1          27       69       65          4          4       30         6   
2   2          29       63       32          4          4       24        12   
3   3          35       62       54          4          0       39        12   
4   4          35       58       43          3          6       37         2   

  Phosphorous  
0           5  
1          18  
2          16  
3           4  
4          16  
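
One caveat when the string columns are later cast to category dtype: if train and test are cast independently and their value sets differ, the integer codes behind the categories can disagree, and depending on the library version a model may then misread test rows. A minimal sketch of one way to guard against this, assuming a shared `pd.CategoricalDtype` per column; the helper `align_categories` and the toy frames are hypothetical.

```python
import pandas as pd

def align_categories(train_df, test_df, cols):
    """Cast `cols` to one shared CategoricalDtype per column (illustrative helper)."""
    train_df, test_df = train_df.copy(), test_df.copy()
    for col in cols:
        # Union of the string values seen in either frame, in a fixed order
        values = sorted(set(train_df[col].astype(str)) | set(test_df[col].astype(str)))
        dtype = pd.CategoricalDtype(categories=values)
        train_df[col] = train_df[col].astype(str).astype(dtype)
        test_df[col] = test_df[col].astype(str).astype(dtype)
    return train_df, test_df

# Toy frames (made-up values): the test set lacks 27 and adds 41
toy_train = pd.DataFrame({"Temparature": [25, 27, 30]})
toy_test = pd.DataFrame({"Temparature": [30, 41, 25]})
tr, te = align_categories(toy_train, toy_test, ["Temparature"])
print(tr["Temparature"].cat.codes.tolist())  # [0, 1, 2]
print(te["Temparature"].cat.codes.tolist())  # [2, 3, 0]
```

With a shared dtype, the same raw value maps to the same code in both frames.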


## 2. Test Pure Categorical Approach

In [None]:
# Quick test with XGBoost (depth=7, as per winning solutions)

# Stratified 5-fold CV - split on ENCODED labels
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []

# Get number of classes
n_classes = len(y.unique())
print(f"Number of classes: {n_classes}")

# Encode target labels to integers (do this once, outside the loop)
le_target = LabelEncoder()
y_encoded = le_target.fit_transform(y)
print(f"Target classes: {le_target.classes_}")
print(f"y_encoded type: {type(y_encoded)}")
print(f"y_encoded shape: {y_encoded.shape}")
print(f"y_encoded sample: {y_encoded[:10]}")

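# Convert features to category dtype for XGBoost.
# Drop the row identifier first: `id` should not be used as a feature.
X_cat = X.drop(columns=['id'], errors='ignore')
# Include the label-encoded Soil Type / Crop Type so they are not treated as numeric
for col in numerical_cols + cat_features:
    X_cat[col] = X_cat[col].astype('category')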

print("\nRunning 5-fold CV with pure categorical features...")

for fold, (train_idx, val_idx) in enumerate(skf.split(X_cat, y_encoded)):
    X_train, X_val = X_cat.iloc[train_idx], X_cat.iloc[val_idx]
    y_train, y_val = y_encoded[train_idx], y_encoded[val_idx]
    
    print(f"Fold {fold+1}: y_train type={type(y_train)}, unique={np.unique(y_train)[:5]}")
    
    # XGBoost with categorical support
    model = xgb.XGBClassifier(
        objective='multi:softprob',
        eval_metric='mlogloss',
        tree_method='hist',
        enable_categorical=True,  # Enable native categorical support
        max_depth=7,
        learning_rate=0.05,
        n_estimators=500,
        subsample=0.8,
        colsample_bytree=0.8,
        random_state=42,
        device='cuda'
    )
    
    model.fit(X_train, y_train, 
              eval_set=[(X_val, y_val)],
              verbose=False)
    
    # Predict probabilities
    val_pred = model.predict_proba(X_val)
    
    # Calculate MAP@3 for this fold: each row earns 1/rank if the true class
    # is among the top-3 predicted classes, and 0 otherwise
    top3 = np.argsort(-val_pred, axis=1)[:, :3]
    fold_map3 = ((top3 == y_val[:, None]) / np.arange(1, 4)).sum(axis=1).mean()
    fold_scores.append(fold_map3)
    
    print(f"Fold {fold+1}: MAP@3 = {fold_map3:.4f}")

print(f"\nCV MAP@3: {np.mean(fold_scores):.4f} ± {np.std(fold_scores):.4f}")
print(f"Fold scores: {fold_scores}")

# Compare to baseline
baseline_score = 0.3311
improvement = np.mean(fold_scores) - baseline_score
print(f"\nImprovement over baseline: {improvement:.4f}")

if improvement > 0:
    print("✅ Pure categorical treatment IMPROVES performance!")
else:
    print("❌ Pure categorical treatment does NOT improve performance")
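
For reference, MAP@3 with a single true label per row gives credit of 1/rank when the true class appears among the three highest-probability predictions and nothing otherwise. Unlike a full-ranking score such as mean reciprocal rank, ranks beyond 3 earn zero. A minimal standalone sketch follows; the helper name `map_at_k` and the toy probabilities are illustrative.

```python
import numpy as np

def map_at_k(y_true, proba, k=3):
    """MAP@k for single-label problems; y_true holds integer class indices."""
    y_true = np.asarray(y_true)
    topk = np.argsort(-proba, axis=1)[:, :k]      # top-k class indices, best first
    hits = topk == y_true[:, None]                # at most one True per row
    weights = 1.0 / np.arange(1, k + 1)           # credit 1, 1/2, 1/3, ...
    return float((hits * weights).sum(axis=1).mean())

proba = np.array([
    [0.6, 0.3, 0.1, 0.0],   # true class 0 is ranked 1st -> 1
    [0.1, 0.5, 0.4, 0.0],   # true class 2 is ranked 2nd -> 1/2
    [0.4, 0.3, 0.2, 0.1],   # true class 3 is ranked 4th -> 0
])
y_true = [0, 2, 3]
print(map_at_k(y_true, proba))  # 0.5
```

The column order of `proba` must match the integer encoding of the labels, which holds for `predict_proba` when the model was fit on labels `0..n_classes-1`.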

## 3. Hyperparameter Tuning for Categorical Features

In [None]:
# Test different depths and learning rates
# Winning solutions found depth 7-8 optimal for categorical treatment

param_grid = {
    'max_depth': [6, 7, 8],
    'learning_rate': [0.03, 0.05, 0.07]
}

results = []

for depth in param_grid['max_depth']:
    for lr in param_grid['learning_rate']:
        fold_scores = []
        
        for fold, (train_idx, val_idx) in enumerate(skf.split(X_cat, y_encoded)):
            # Use the category-dtype frame and integer labels, as in the CV cell above
            X_train, X_val = X_cat.iloc[train_idx], X_cat.iloc[val_idx]
            y_train, y_val = y_encoded[train_idx], y_encoded[val_idx]
            
            model = xgb.XGBClassifier(
                objective='multi:softprob',
                eval_metric='mlogloss',
                tree_method='hist',
                enable_categorical=True,
                max_depth=depth,
                learning_rate=lr,
                n_estimators=500,
                subsample=0.8,
                colsample_bytree=0.8,
                random_state=42,
                device='cuda'
            )
            
            model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
            val_pred = model.predict_proba(X_val)
            
            # MAP@3: 1/rank credit if the true class is in the top 3, else 0
            top3 = np.argsort(-val_pred, axis=1)[:, :3]
            fold_map3 = ((top3 == y_val[:, None]) / np.arange(1, 4)).sum(axis=1).mean()
            fold_scores.append(fold_map3)
        
        cv_score = np.mean(fold_scores)
        results.append({
            'max_depth': depth,
            'learning_rate': lr,
            'cv_score': cv_score,
            'std': np.std(fold_scores)
        })
        
        print(f"Depth {depth}, LR {lr}: CV = {cv_score:.4f} ± {np.std(fold_scores):.4f}")

# Find best parameters
results_df = pd.DataFrame(results)
best_params = results_df.loc[results_df['cv_score'].idxmax()]

print(f"\nBest parameters: {best_params.to_dict()}")
print(f"Best CV score: {best_params['cv_score']:.4f}")

## 4. Key Insights & Recommendations

### What Works:
1. **Pure categorical treatment** (convert to category dtype, no binning)
2. **Shallower trees** (depth 7-8) for categorical features
3. **XGBoost native categorical support** (enable_categorical=True)
4. **Stratified 5-fold CV** (already correct)

### What Doesn't Work:
1. ❌ **Binning low-cardinality features** (merges distinct values and discards resolution)
2. ❌ **Target encoding without proper implementation** (multiclass encodings are 2D, one column per class, and were flattened incorrectly)
3. ❌ **Adding interactions without validation** (adds noise)
4. ❌ **No hyperparameter tuning** (suboptimal performance)

### Next Steps:
1. Implement pure categorical treatment in exp_003
2. Tune hyperparameters (depth 6-8, lr 0.03-0.07)
3. Add CatBoost with native categorical support
4. If successful (>0.340), carefully add back target encoding
5. Then add selective interactions if needed
6. Finally, build diverse ensemble for stacking
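
For the final ensembling step, hill climbing is commonly implemented as greedy forward selection over out-of-fold (OOF) predictions: repeatedly add whichever model most improves the blended score, allowing repeats. The sketch below is one simple variant under that assumption, not the winners' exact procedure; `map3`, `hill_climb`, and the synthetic predictions are illustrative.

```python
import numpy as np

def map3(y, proba):
    """MAP@3 for integer labels and an (n_rows, n_classes) probability array."""
    top3 = np.argsort(-proba, axis=1)[:, :3]
    return float(((top3 == np.asarray(y)[:, None]) / np.arange(1, 4)).sum(axis=1).mean())

def hill_climb(oof_preds, y, max_steps=20):
    """Greedy forward selection with replacement over OOF probability arrays."""
    selected, best = [], -np.inf
    running_sum = np.zeros_like(oof_preds[0])
    for _ in range(max_steps):
        # Score the blend obtained by adding each candidate to the current blend
        scores = [map3(y, (running_sum + p) / (len(selected) + 1)) for p in oof_preds]
        i = int(np.argmax(scores))
        if scores[i] <= best:          # stop when no candidate improves the blend
            break
        best = scores[i]
        selected.append(i)
        running_sum += oof_preds[i]
    # Blend weight of each model = share of times it was selected
    weights = np.bincount(selected, minlength=len(oof_preds)) / len(selected)
    return weights, best

# Synthetic OOF predictions (made up): three models of decreasing quality
rng = np.random.default_rng(0)
y = rng.integers(0, 7, size=500)
onehot = np.eye(7)[y]
preds = [onehot * s + rng.random((500, 7)) for s in (0.3, 0.2, 0.0)]
weights, best = hill_climb(preds, y)
print(weights, round(best, 4))
```

Because the selection is scored on the same OOF predictions it optimizes, the resulting score is optimistic; validating the final weights on held-out folds is a reasonable safeguard.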

### Expected Improvement:
- Conservative: 0.340 (+0.009 from baseline)
- Optimistic: 0.350 (+0.019 from baseline)
- With proper ensembling: 0.360-0.370
- Target: 0.3865 (requires full stacking pipeline)