# 🧪 Final Spell: Ensembling the Magic Potions 🧙‍♂️✨
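After crafting multiple powerful models, each with its own strengths, it’s time to blend their wisdom. Ensembling averages the predictions of several models: when their errors are not perfectly correlated, the average has lower variance than any single model, which smooths out individual errors and often leads to better generalization.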
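
To make the variance claim concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the `truth` values and the two "models" (`pred_a`, `pred_b`) are simulated, with independent unit-variance errors, and have nothing to do with the actual competition models.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(size=100_000)  # hypothetical target values

# Two hypothetical models: same signal, independent unit-variance errors
pred_a = truth + rng.normal(size=truth.size)
pred_b = truth + rng.normal(size=truth.size)

# Equal-weight blend of the two predictions
blend = 0.5 * pred_a + 0.5 * pred_b

def mse(pred):
    """Mean squared error against the simulated truth."""
    return float(np.mean((pred - truth) ** 2))

mse_a, mse_b, mse_blend = mse(pred_a), mse(pred_b), mse(blend)
print(mse_a, mse_b, mse_blend)
```

With fully independent errors, the blend's error variance is roughly half that of either model. Real submissions are usually correlated, so the gain in practice tends to be smaller.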

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

## 🧩 What’s in the Mix?

In [2]:
sub1 = pd.read_csv('/kaggle/input/1006-amini-soil-no-log/lgbm_submission_no_log.csv')
sub2 = pd.read_csv('/kaggle/input/1020-lgbm-amini-soil/lgbm-new-sub.csv')
sub3 = pd.read_csv('/kaggle/input/1036-sub-amini-soil/submission7 (1).csv')
sub4 = pd.read_csv('/kaggle/input/1031-sub-amini-soil-rf/submission-rf5.csv')
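
The blends later in this notebook combine the `Gap` columns row by row, which only makes sense if every submission lists the same IDs in the same order. A quick sanity check could look like the sketch below. The helper name `check_aligned` is hypothetical, and the toy frames stand in for `sub1` to `sub4`; the only thing assumed from the real files is an `ID` column, as shown in the outputs.

```python
import pandas as pd

def check_aligned(*subs, id_col="ID"):
    """Raise ValueError unless every frame has the same IDs in the same order."""
    reference = subs[0][id_col].reset_index(drop=True)
    for i, sub in enumerate(subs[1:], start=2):
        ids = sub[id_col].reset_index(drop=True)
        if len(ids) != len(reference) or not ids.equals(reference):
            raise ValueError(f"submission {i} is not aligned with submission 1")
    return True

# Toy frames standing in for the real submissions
a = pd.DataFrame({"ID": ["ID_x_N", "ID_x_P"], "Gap": [1.0, 2.0]})
b = pd.DataFrame({"ID": ["ID_x_N", "ID_x_P"], "Gap": [3.0, 4.0]})
c = pd.DataFrame({"ID": ["ID_x_P", "ID_x_N"], "Gap": [5.0, 6.0]})  # shuffled rows

aligned_ok = check_aligned(a, b)  # same IDs, same order
try:
    check_aligned(a, c)
    shuffled_ok = True
except ValueError:
    shuffled_ok = False  # the shuffled frame is rejected
```

In the notebook this would be called as `check_aligned(sub1, sub2, sub3, sub4)` right after loading. If it fails, sorting each frame by `ID` (or merging on `ID`) before blending is one way to fix the alignment.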

In [3]:
sub1

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3890.073357
1,ID_NGS9Bx_P,21.454337
2,ID_NGS9Bx_K,-329.416736
3,ID_NGS9Bx_Ca,-12450.208686
4,ID_NGS9Bx_Mg,-3851.013127
...,...,...
26593,ID_oMn2Yb_Fe,-384.605610
26594,ID_oMn2Yb_Mn,-395.487203
26595,ID_oMn2Yb_Zn,-10.898126
26596,ID_oMn2Yb_Cu,-4.304426


In [4]:
sub2 

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3282.220573
1,ID_NGS9Bx_P,34.322758
2,ID_NGS9Bx_K,-234.209918
3,ID_NGS9Bx_Ca,-12538.606794
4,ID_NGS9Bx_Mg,-3612.015590
...,...,...
26593,ID_oMn2Yb_Fe,-356.691786
26594,ID_oMn2Yb_Mn,-325.503982
26595,ID_oMn2Yb_Zn,-8.902568
26596,ID_oMn2Yb_Cu,-3.918015


In [5]:
sub3

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3668.840000
1,ID_NGS9Bx_P,17.153920
2,ID_NGS9Bx_K,-351.728000
3,ID_NGS9Bx_Ca,-15872.114400
4,ID_NGS9Bx_Mg,-4140.016000
...,...,...
26593,ID_oMn2Yb_Fe,-416.284960
26594,ID_oMn2Yb_Mn,-380.442080
26595,ID_oMn2Yb_Zn,-9.044672
26596,ID_oMn2Yb_Cu,-4.142488


In [6]:
sub4

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3968.000000
1,ID_NGS9Bx_P,13.772320
2,ID_NGS9Bx_K,-397.088000
3,ID_NGS9Bx_Ca,-12913.344000
4,ID_NGS9Bx_Mg,-3729.328000
...,...,...
26593,ID_oMn2Yb_Fe,-418.660720
26594,ID_oMn2Yb_Mn,-435.567920
26595,ID_oMn2Yb_Zn,-19.269560
26596,ID_oMn2Yb_Cu,-5.012992


## 🎯 Strategy Behind the Weights

In [7]:
# First blend: combine two LightGBM variants equally

lgbm_sub = sub1.copy()
lgbm_sub.Gap = (sub1.Gap * 0.5) + (sub2.Gap * 0.5)

In [8]:
lgbm_sub

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3586.146965
1,ID_NGS9Bx_P,27.888548
2,ID_NGS9Bx_K,-281.813327
3,ID_NGS9Bx_Ca,-12494.407740
4,ID_NGS9Bx_Mg,-3731.514358
...,...,...
26593,ID_oMn2Yb_Fe,-370.648698
26594,ID_oMn2Yb_Mn,-360.495592
26595,ID_oMn2Yb_Zn,-9.900347
26596,ID_oMn2Yb_Cu,-4.111220


>✅ Why? These two LGBM submissions likely share a similar modeling pipeline but may capture slightly different interactions or hyperparameter sweet spots. A 50/50 blend smooths out their differences. You may ask: "But you said sub1 is the best of the best, so why give sub1 and sub2 equal weight?" Honestly, the public LB and the private LB sometimes tell different stories, so I was just playing it safe 😜
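
The equal-weight blend is a special case of a weighted average, so trying other weights is easy once the step is wrapped in a function. The `blend` helper below is a hypothetical sketch, not part of the original notebook. It assumes a `Gap` column and row-aligned frames, and it is demonstrated on the first two rows of `sub1` and `sub2` copied from the outputs above.

```python
import pandas as pd

def blend(subs, weights, target="Gap"):
    """Weighted average of the target column; weights must sum to 1."""
    if len(subs) != len(weights):
        raise ValueError("need one weight per submission")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    out = subs[0].copy()
    out[target] = sum(w * s[target] for s, w in zip(subs, weights))
    return out

# First two rows of sub1 and sub2, copied from the displayed outputs
s1 = pd.DataFrame({"ID": ["ID_NGS9Bx_N", "ID_NGS9Bx_P"],
                   "Gap": [-3890.073357, 21.454337]})
s2 = pd.DataFrame({"ID": ["ID_NGS9Bx_N", "ID_NGS9Bx_P"],
                   "Gap": [-3282.220573, 34.322758]})

lgbm_demo = blend([s1, s2], [0.5, 0.5])
print(lgbm_demo)
```

The resulting `Gap` values should agree with the first two rows of `lgbm_sub`. Requiring the weights to sum to 1 keeps the blend on the same scale as the inputs.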

In [9]:
# Second blend: combine the two random forest variants equally

rf_sub = sub3.copy()
rf_sub.Gap = (sub3.Gap * 0.5) + (sub4.Gap * 0.5)

In [10]:
# Final blend: bring in the random forest blend (rf_sub) with 35% weight

final_sub = lgbm_sub.copy()
final_sub.Gap = ((lgbm_sub.Gap * 0.65) + (rf_sub.Gap * 0.35))

>🧠 Why 65/35? The random forest blend (sub3 and sub4) might have demonstrated better leaderboard or cross-validation performance in some areas. Giving it a 35% say in the final blend helps inject diversity and correction without overpowering the stable LGBM foundation.
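
Because the final blend is a blend of blends, it can help to work out how much each original submission actually contributes. The sketch below multiplies the stage weights used in this notebook and checks the result against the first row of each submission, copied from the outputs above. The dictionary names are illustrative only.

```python
# Stage weights used in the notebook: two 50/50 blends, then 65/35 on top
stage1 = {"sub1": 0.5, "sub2": 0.5}  # LightGBM pair -> lgbm_sub
stage2 = {"sub3": 0.5, "sub4": 0.5}  # second pair   -> rf_sub
top = {"lgbm": 0.65, "rf": 0.35}

# Effective weight of each original submission in final_sub
effective = {name: top["lgbm"] * w for name, w in stage1.items()}
effective.update({name: top["rf"] * w for name, w in stage2.items()})
print(effective)

# First row (ID_NGS9Bx_N) of each submission, copied from the outputs
first_row = {"sub1": -3890.073357, "sub2": -3282.220573,
             "sub3": -3668.84, "sub4": -3968.0}
blended = sum(effective[k] * first_row[k] for k in effective)
print(round(blended, 4))
```

So sub1 and sub2 each carry 32.5% of the final prediction, while sub3 and sub4 each carry 17.5%. The blended first-row value should come out at about -3667.44, in line with the first row of `final_sub`.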

In [11]:
final_sub

Unnamed: 0,ID,Gap
0,ID_NGS9Bx_N,-3667.442527
1,ID_NGS9Bx_P,23.539648
2,ID_NGS9Bx_K,-314.221463
3,ID_NGS9Bx_Ca,-13158.820251
4,ID_NGS9Bx_Mg,-3802.619533
...,...,...
26593,ID_oMn2Yb_Fe,-387.037148
26594,ID_oMn2Yb_Mn,-377.123885
26595,ID_oMn2Yb_Zn,-11.390216
26596,ID_oMn2Yb_Cu,-4.274502


In [12]:
final_sub.to_csv('final_lgbm_rf_sub.csv', index=False)

## Don't Just Have a Good Day, Have A Great Day!
# THE END!