# 1. Introduction

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>1.1 Context</b></p>
</div>

This notebook ensembles the predictions of two notebooks, [Safe Driver Prediction - LightGBM Submission](https://www.kaggle.com/code/andir16/safe-driver-prediction-lightgbm-submission) and [Safe Driver Prediction - Denoising Autoencoder](https://www.kaggle.com/code/andir16/safe-driver-prediction-denoising-autoencoder), for Porto Seguro's Safe Driver Prediction competition.

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>1.2 Used code</b></p>
</div>

The code is taken from [The Kaggle Workbook](https://www.amazon.com/Kaggle-Workbook-Self-learning-exercises-competitions/dp/1804611212) [(Banachewicz & Massaron)](#3.-References).

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>1.3 Libraries</b></p>
</div>

In [1]:
import pandas as pd
import numpy as np
from numba import jit

# 2. Implementation

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>2.1 Evaluation Function</b></p>
</div>

In [2]:
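@jit(nopython=True)
def _gini_from_sorted(y_sorted):
    # y_sorted: labels ordered by ascending prediction.
    ntrue = 0.0  # positives seen so far
    gini = 0.0   # running count of (negative ranked above positive) pairs
    delta = 0.0  # negatives seen so far
    n = len(y_sorted)
    # Walk from the highest-scored row down to the lowest.
    for i in range(n - 1, -1, -1):
        y_i = y_sorted[i]
        ntrue += y_i
        gini += y_i * delta
        delta += 1 - y_i
    return 1 - 2 * gini / (ntrue * (n - ntrue))

def eval_gini(y_true, y_pred):
    # Convert pandas inputs to NumPy arrays first: Numba cannot type a Series
    # in nopython mode.
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return _gini_from_sorted(y_true[np.argsort(y_pred)])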
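
The loop in `eval_gini` counts, for every positive label, how many negatives received a higher score. Ignoring ties, the result is therefore `1 - 2 * discordant / (n_pos * n_neg)`, which equals `2 * AUC - 1`. The sketch below cross-checks that reading by brute force on a toy example; `gini_from_pairs` and the toy arrays are illustrative names and data, not part of the original notebook.

```python
import numpy as np

def gini_from_pairs(y_true, y_pred):
    """Brute-force normalized Gini: 1 - 2 * (discordant pairs) / (n_pos * n_neg)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    pos = y_pred[y_true == 1]  # scores of the positive rows
    neg = y_pred[y_true == 0]  # scores of the negative rows
    # A pair is discordant when the negative is scored above the positive.
    discordant = np.sum(neg[None, :] > pos[:, None])
    return 1.0 - 2.0 * discordant / (len(pos) * len(neg))

toy_true = np.array([0, 0, 1, 0, 1, 1])
toy_pred = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])
toy_gini = gini_from_pairs(toy_true, toy_pred)
print(f"{toy_gini:0.4f}")  # 3 discordant pairs out of 9 -> 0.3333
```

The pairwise version needs memory proportional to `n_pos * n_neg`, so it is only suitable for small sanity checks, not for the full training set.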

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>2.2 Loading the out-of-fold Predictions</b></p>
</div>

In [3]:
lgb_oof = pd.read_csv("../input/safe-driver-prediction-files/lgb_oof.csv")
dnn_oof = pd.read_csv("../input/safe-driver-prediction-files/dnn_oof.csv")
target = pd.read_csv("../input/porto-seguro-safe-driver-prediction/train.csv", usecols=['id','target'])
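
The blending step combines the two out-of-fold files row by row and scores them against `target.target`, so all three frames have to list the same rows in the same order. A minimal alignment check could look like the sketch below. It assumes the out-of-fold files carry an `id` column next to `target`, which the notebook does not show; `check_alignment` and the toy frames are hypothetical.

```python
import pandas as pd

def check_alignment(oof, train, key="id"):
    """Raise if `oof` and `train` differ in length or in row-by-row `key` values."""
    if len(oof) != len(train):
        raise ValueError(f"length mismatch: {len(oof)} vs {len(train)}")
    if not (oof[key].to_numpy() == train[key].to_numpy()).all():
        raise ValueError(f"'{key}' columns are not in the same order")

# Toy stand-ins for train.csv and an out-of-fold file (assumed to have an `id` column).
toy_train = pd.DataFrame({"id": [7, 9, 13], "target": [0, 1, 0]})
toy_oof = pd.DataFrame({"id": [7, 9, 13], "target": [0.02, 0.31, 0.05]})
check_alignment(toy_oof, toy_train)  # passes silently

# Reversing the rows keeps the length but breaks the order.
shuffled = toy_oof.iloc[::-1].reset_index(drop=True)
try:
    check_alignment(shuffled, toy_train)
    misaligned_detected = False
except ValueError:
    misaligned_detected = True
```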

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>2.3 Blending</b></p>
</div>

In [4]:
lgb_oof_ranks = (lgb_oof.target.rank() / len(lgb_oof))
dnn_oof_ranks = (dnn_oof.target.rank() / len(dnn_oof))
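
Dividing the ranks by the number of rows maps each model's predictions onto the same (0, 1] scale, so a weighted average is not dominated by whichever model outputs larger raw values. Because the Gini score depends only on the ordering, the transformation leaves each model's individual score unchanged. The toy values below are made up for illustration:

```python
import pandas as pd

raw_a = pd.Series([0.02, 0.30, 0.05, 0.05])  # e.g. probabilities, with one tie
raw_b = pd.Series([1.5, 9.0, 2.0, 4.0])      # e.g. scores on a different scale

# Series.rank() assigns tied values the average of their ranks by default.
ranks_a = raw_a.rank() / len(raw_a)
ranks_b = raw_b.rank() / len(raw_b)

print(ranks_a.tolist())  # [0.25, 1.0, 0.625, 0.625]
print(ranks_b.tolist())  # [0.25, 1.0, 0.5, 0.75]
```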

In [5]:
# Gini of the LightGBM out-of-fold ranks alone, i.e. alpha = 1.0.
baseline = eval_gini(y_true=target.target, y_pred=lgb_oof_ranks)
print(f"Starting from an OOF LightGBM baseline of {baseline:0.5f}\n")
best_alpha = 1.0
# Grid-search the LightGBM weight; the DNN receives the remainder.
for alpha in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    ensemble = alpha * lgb_oof_ranks + (1.0 - alpha) * dnn_oof_ranks
    score = eval_gini(y_true=target.target, y_pred=ensemble)
    print(f"lgb={alpha:0.1f} dnn={(1.0 - alpha):0.1f} -> {score:0.5f}")

    # `baseline` doubles as the best score seen so far.
    if score > baseline:
        baseline = score
        best_alpha = alpha

print(f"\nBest alpha is {best_alpha:0.1f}")

Starting from an OOF LightGBM baseline of 0.28812

lgb=0.1 dnn=0.9 -> 0.26765
lgb=0.2 dnn=0.8 -> 0.27266
lgb=0.3 dnn=0.7 -> 0.27711
lgb=0.4 dnn=0.6 -> 0.28092
lgb=0.5 dnn=0.5 -> 0.28403
lgb=0.6 dnn=0.4 -> 0.28638
lgb=0.7 dnn=0.3 -> 0.28795
lgb=0.8 dnn=0.2 -> 0.28875
lgb=0.9 dnn=0.1 -> 0.28880

Best alpha is 0.9
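
The search above tries nine weights and treats the LightGBM-only score as the starting point. As a sketch of the same idea in reusable form, the function below scans a finer grid that includes both endpoints. Everything in it is hypothetical: the helper names, the 21-point grid, and the synthetic data that stands in for the out-of-fold files.

```python
import numpy as np

def gini(y_true, y_pred):
    """Normalized Gini from the discordant-pair count, vectorized with cumsum."""
    y = np.asarray(y_true, dtype=float)[np.argsort(y_pred)][::-1]  # highest score first
    neg_above = np.cumsum(1.0 - y) - (1.0 - y)  # negatives ranked above each row
    n_pos = y.sum()
    return 1.0 - 2.0 * np.sum(y * neg_above) / (n_pos * (len(y) - n_pos))

def rank01(x):
    """Ranks scaled to (0, 1]; assumes no ties, which holds for the data below."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

def search_blend_weight(y_true, pred_a, pred_b, alphas):
    """Score alpha * pred_a + (1 - alpha) * pred_b for each alpha; return the best."""
    scores = [gini(y_true, a * pred_a + (1.0 - a) * pred_b) for a in alphas]
    best = int(np.argmax(scores))
    return float(alphas[best]), scores[best], scores

# Synthetic stand-in: two noisy views of the same latent signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=2000)
y = (signal + rng.normal(size=2000) > 1.0).astype(int)
model_a = rank01(signal + rng.normal(scale=1.0, size=2000))
model_b = rank01(signal + rng.normal(scale=1.5, size=2000))

alphas = np.linspace(0.0, 1.0, 21)
best_alpha_toy, best_score_toy, scores = search_blend_weight(y, model_a, model_b, alphas)
print(f"best alpha {best_alpha_toy:0.2f} -> {best_score_toy:0.5f}")
```

Including `alpha = 0.0` and `alpha = 1.0` in the grid means the single-model scores come out of the same loop, so no separate baseline variable is needed.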


In [6]:
lgb_submission = pd.read_csv("../input/safe-driver-prediction-files/lgb_submission.csv")
dnn_submission = pd.read_csv("../input/safe-driver-prediction-files/dnn_submission.csv")
submission = pd.read_csv(
    "../input/porto-seguro-safe-driver-prediction/sample_submission.csv")

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>2.4 50/50 Weight Distribution</b></p>
</div>

In [7]:
# Rank-normalize the test predictions, then average them with equal weights.
lgb_ranks = (lgb_submission.target.rank() / len(lgb_submission))
dnn_ranks = (dnn_submission.target.rank() / len(dnn_submission))
submission.target = lgb_ranks * 0.5 + dnn_ranks * 0.5
submission.to_csv("equal_blend_rank.csv", index=False)

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>2.5 Weight Distribution from out-of-fold Predictions</b></p>
</div>

In [8]:
# Rank-normalize the test predictions, then weight them by the alpha
# selected on the out-of-fold predictions.
lgb_ranks = (lgb_submission.target.rank() / len(lgb_submission))
dnn_ranks = (dnn_submission.target.rank() / len(dnn_submission))
submission.target = lgb_ranks * best_alpha + dnn_ranks * (1.0 - best_alpha)
submission.to_csv("blend_rank.csv", index=False)
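
Sections 2.4 and 2.5 differ only in the weight they apply, so the shared steps could be factored into one helper. The sketch below shows one way to do that; `rank_blend` and the toy Series are illustrative and do not appear in the original code.

```python
import pandas as pd

def rank_blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction Series after scaling ranks to (0, 1]."""
    ranks_a = pred_a.rank() / len(pred_a)
    ranks_b = pred_b.rank() / len(pred_b)
    return weight_a * ranks_a + (1.0 - weight_a) * ranks_b

# Toy stand-ins for the `target` columns of the two submission files.
toy_lgb = pd.Series([0.10, 0.40, 0.20, 0.30])
toy_dnn = pd.Series([0.9, 0.7, 0.1, 0.5])

equal_blend = rank_blend(toy_lgb, toy_dnn, 0.5)
lgb_only = rank_blend(toy_lgb, toy_dnn, 1.0)

print(equal_blend.tolist())  # [0.625, 0.875, 0.375, 0.625]
print(lgb_only.tolist())     # [0.25, 1.0, 0.5, 0.75]
```

With such a helper, the two submissions would reduce to calls with `0.5` and `best_alpha` as the weight.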

# 3. References

<div style="color:white;display:fill;
            background-color:#48AFFF;font-size:160%;
            font-family:Arial">
    <p style="padding: 4px;color:white;"><b>3.1 References</b></p>
</div>

* Banachewicz, Konrad; Massaron, Luca. [The Kaggle Workbook](https://www.amazon.com/Kaggle-Workbook-Self-learning-exercises-competitions/dp/1804611212): Self-learning exercises and valuable insights for Kaggle data science competitions. Packt Publishing.
* [Safe Driver Prediction - LightGBM Submission](https://www.kaggle.com/code/andir16/safe-driver-prediction-lightgbm-submission) by [Andreas Renz](https://www.kaggle.com/andir16)
* [Safe Driver Prediction - Denoising Autoencoder](https://www.kaggle.com/code/andir16/safe-driver-prediction-denoising-autoencoder) by [Andreas Renz](https://www.kaggle.com/andir16)
