In [1]:
import os, sys
sys.path.append(os.path.abspath('../utils'))
import pandas as pd
import pickle

# Comparison of results of each of the net benefit feature selection methods

Load the results of each of the methods:

In [2]:
with open('../data/skl_selection_path.pkl', 'rb') as f:
    skl_selection_path = pickle.load(f)

with open('../data/forward_selection.pkl', 'rb') as f:
    forward_selection = pickle.load(f)

with open('../data/backward_selection.pkl', 'rb') as f:
    backward_selection = pickle.load(f)

with open('../data/backward_selection_pt_0.pkl', 'rb') as f:
    backward_selection_pt_0 = pickle.load(f)

with open('../data/backward_selection_pt_1.pkl', 'rb') as f:
    backward_selection_pt_1 = pickle.load(f)

The weighted regularization path method from [03_net_benefit_regularization_method.ipynb](./03_net_benefit_regularization_method.ipynb) selects $\{x_0, x_1, x_3, x_4\}$ when features are chosen by mean net benefit, and selects all five features when they are chosen by net benefit at a single threshold probability of $p_t = 0.2$ or $p_t = 0.8$.

In [3]:
pd.DataFrame(skl_selection_path["optimal"]).loc[["x0_used", "x1_used", "x2_used", "x3_used", "x4_used"], ["mnb", "net_benefit_pt_0", "net_benefit_pt_1"] ]


Unnamed: 0,mnb,net_benefit_pt_0,net_benefit_pt_1
x0_used,1.0,1.0,1.0
x1_used,1.0,1.0,1.0
x2_used,0.0,1.0,1.0
x3_used,1.0,1.0,1.0
x4_used,1.0,1.0,1.0
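
The indicator table can be turned into explicit feature lists with a small helper. The sketch below is illustrative only: it rebuilds the table by hand from the values printed above rather than reading `skl_selection_path`, and the names `used` and `selected_features` are hypothetical, not part of the original notebooks.

```python
import pandas as pd

# Indicator values copied from the table above (1.0 = feature kept).
# Rebuilt by hand so the example runs without the pickle files.
used = pd.DataFrame(
    {
        "mnb": [1.0, 1.0, 0.0, 1.0, 1.0],
        "net_benefit_pt_0": [1.0, 1.0, 1.0, 1.0, 1.0],
        "net_benefit_pt_1": [1.0, 1.0, 1.0, 1.0, 1.0],
    },
    index=["x0_used", "x1_used", "x2_used", "x3_used", "x4_used"],
)


def selected_features(flags):
    """Return the feature names whose indicator is non-zero (hypothetical helper)."""
    kept = flags[flags > 0].index
    return [name.replace("_used", "") for name in kept]


# One feature list per selection criterion (one per column).
selected = {criterion: selected_features(used[criterion]) for criterion in used.columns}
for criterion, features in selected.items():
    print(criterion, features)
```

Under mean net benefit this drops `x2` only, while both single-threshold criteria keep every feature.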


The iterative method, similar to forward stepwise selection, demonstrated in [04_feature_importance_iterative_method.ipynb](./04_feature_importance_iterative_method.ipynb) selects $\{x_1, x_4\}$:

In [4]:
forward_selection.rename(columns = {"net_benefit_pt_0": "net_benefit p_t=0.8",
                                    "net_benefit_pt_1": "net_benefit p_t=0.2"}).sort_values("mnb")

Unnamed: 0,features,mnb,net_benefit p_t=0.8,net_benefit p_t=0.2
4,"[x1, x4, x3, x0, x2]",0.142756,0.055,0.2575
3,"[x1, x4, x3, x0]",0.172733,0.08,0.28875
2,"[x1, x4, x3]",0.1786,0.09,0.285
0,[x1],0.209659,0.03,0.3475
1,"[x1, x4]",0.219324,0.105,0.32375
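
Rather than reading the winner off the sorted table, the best step can be picked programmatically. This is a minimal sketch that assumes the frame has the `features` and `mnb` columns shown above; the values are copied by hand from that output, and the names `forward`, `best_row` and `best_features` are illustrative.

```python
import pandas as pd

# Forward-selection steps copied from the table above, in step order (index 0..4).
forward = pd.DataFrame(
    {
        "features": [
            ["x1"],
            ["x1", "x4"],
            ["x1", "x4", "x3"],
            ["x1", "x4", "x3", "x0"],
            ["x1", "x4", "x3", "x0", "x2"],
        ],
        "mnb": [0.209659, 0.219324, 0.1786, 0.172733, 0.142756],
    }
)

# idxmax returns the index label of the row with the largest mean net benefit.
best_row = forward.loc[forward["mnb"].idxmax()]
best_features = best_row["features"]

print(best_features, best_row["mnb"])
```

The same pattern would apply to the backward-selection frames by swapping `mnb` for their `nb` column.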


The backward stepwise selection method demonstrated in [05_backward_stepwise_method.ipynb](./05_backward_stepwise_method.ipynb) selects $\{ x_1, x_4\}$:

In [5]:
# selected based on mean net benefit
backward_selection.sort_values("nb")

Unnamed: 0,features,nb
1,"[x0, x1, x2, x4]",0.124883
0,"[x0, x1, x2, x3, x4]",0.142756
2,"[x0, x1, x4]",0.180078
4,[x1],0.209659
3,"[x1, x4]",0.210887


In [6]:
# Selected based on a threshold probability of 0.8
backward_selection_pt_0.sort_values("nb")

Unnamed: 0,features,nb
1,"[x0, x1, x3, x4]",-0.09
2,"[x0, x1, x4]",-0.065
4,[x4],0.03
3,"[x1, x4]",0.035
0,"[x0, x1, x2, x3, x4]",0.055


In [7]:
# Selected based on a threshold probability of 0.2
backward_selection_pt_1.sort_values("nb")

Unnamed: 0,features,nb
0,"[x0, x1, x2, x3, x4]",0.2575
1,"[x0, x1, x2, x4]",0.27875
4,[x1],0.29375
2,"[x0, x1, x4]",0.315
3,"[x0, x1]",0.345
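
The three backward-selection tables above pick different subsets depending on the criterion. One way to make that sensitivity explicit is to compare the winning subsets as sets. The sketch below hard-codes the best row of each table as read from the outputs above; `best_subsets` and `common` are illustrative names.

```python
# Highest-net-benefit subset from each backward-selection table above.
best_subsets = {
    "mean net benefit": {"x1", "x4"},
    "p_t = 0.8": {"x0", "x1", "x2", "x3", "x4"},
    "p_t = 0.2": {"x0", "x1"},
}

# Features that every criterion agrees on.
common = set.intersection(*best_subsets.values())

for criterion, features in best_subsets.items():
    print(f"{criterion}: {sorted(features)}")
print("selected under every criterion:", sorted(common))
```

Only `x1` is retained under all three criteria in this example.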


### Conclusion

In the example datasets to which we have applied the net benefit feature selection methods, the forward and backward selection procedures based on mean net benefit select the same features, $\{x_1, x_4\}$. The weighted LASSO method selects more features, $\{x_0, x_1, x_3, x_4\}$. We also note that when a specific probability threshold is used as the basis of feature selection, the results are sensitive to that threshold: backward selection keeps all five features at $p_t = 0.8$ but selects $\{x_0, x_1\}$ at $p_t = 0.2$.