# Manual Feature Elimination
We remove features based on submission results for the $\chi^2$ selection:
1. Drop the features from `top20chi2` that lower the submission score
   - plus the `Value` column
2. Drop the features from `top20chi2` that do not raise the submission score
   - plus the `Value` column
3. Drop all features with `sum`
4. Keep only the features that increase the score compared with the previous result
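Step 4 amounts to a greedy forward selection. The sketch below is only an illustration under stated assumptions: `greedy_forward_selection`, `toy_gain`, and `toy_score` are hypothetical names, and the toy scoring function stands in for the leaderboard score, which in this notebook comes from manual submissions rather than from code.

```python
from typing import Callable, List, Sequence


def greedy_forward_selection(
    base: Sequence[str],
    candidates: Sequence[str],
    score_fn: Callable[[List[str]], float],
) -> List[str]:
    """Keep a candidate only if it strictly improves the current best score."""
    selected = list(base)
    best = score_fn(selected)
    for feature in candidates:
        trial = selected + [feature]
        score = score_fn(trial)
        if score > best:  # strictly better than the previous result
            selected, best = trial, score
    return selected


# Hypothetical per-feature gains; a stand-in for a real submission score.
toy_gain = {'a': 5, 'b': 2, 'c': -1, 'd': 0}


def toy_score(features: List[str]) -> float:
    return sum(toy_gain[f] for f in features)


kept = greedy_forward_selection(['a'], ['b', 'c', 'd'], toy_score)
print(kept)  # ['a', 'b']
```

A feature that leaves the score unchanged (`'d'` here) is rejected, which matches the "increase the score" wording of step 4.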

In [1]:
import numpy  as np
import pandas as pd

In [2]:
df_trn = pd.read_csv('../data/train-agg-cut.csv')
df_tst = pd.read_csv('../data/test-agg-cut.csv')
df_sbm = pd.read_csv('../data/sample_submission.csv')

In [3]:
X_trn = df_trn.drop(columns=['FraudResult'])
y_trn = df_trn['FraudResult']

X_tst = df_tst

In [4]:
import os
from collections import Counter
from sklearn.ensemble import BaggingClassifier

In [5]:
def prediction(X_trn, y_trn, X_tst, name):
    classifier = BaggingClassifier(n_estimators=1000, n_jobs=-1)
    predict = classifier.fit(X_trn, y_trn).predict(X_tst)
    print('Predict = ', Counter(predict))
    df_sbm['FraudResult'] = predict

    # check whether exactly the same result was produced before
    current_subm_set = set(df_sbm[df_sbm['FraudResult'] == 1].index.tolist())

    # scan all files in the submitted folder
    is_exist = False
    files = os.listdir('../submitted')
    files.sort()
    for f in files:
        f_csv = pd.read_csv('../submitted/' + f)
        if set(f_csv[f_csv['FraudResult'] == 1].index.tolist()) == current_subm_set:
            print('It is the same as in: ' + f)
            is_exist = True
    if not is_exist:
        print('New result! Write it')
        df_sbm.to_csv('../submitted/AlBo0807_Manually_Feature_Elimination_'+name+'.csv', encoding='utf-8', index=False)
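The duplicate check inside `prediction` compares the sets of rows predicted as fraud. A minimal in-memory sketch of that comparison follows; `positive_rows`, `find_identical`, and the sample predictions are made up for illustration. Unlike the function above, it stops at the first match instead of reporting every matching file.

```python
from typing import Dict, Optional, Sequence, Set


def positive_rows(predictions: Sequence[int]) -> Set[int]:
    """Row positions predicted as fraud (label 1)."""
    return {i for i, label in enumerate(predictions) if label == 1}


def find_identical(current: Sequence[int],
                   previous: Dict[str, Sequence[int]]) -> Optional[str]:
    """Return the name of the first earlier submission with the same positives."""
    target = positive_rows(current)
    for name in sorted(previous):
        if positive_rows(previous[name]) == target:
            return name
    return None


# Made-up predictions, for illustration only.
earlier = {'sub_a.csv': [0, 1, 0, 1], 'sub_b.csv': [1, 0, 0, 0]}
print(find_identical([0, 1, 0, 1], earlier))  # sub_a.csv
print(find_identical([0, 0, 1, 0], earlier))  # None
```

Two submissions with the same number of positives can still differ, so the comparison is on row positions, not on counts.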

In [6]:
top20chi2 = [
    'AmountPositive',                                                           #01
    'Value',                                                                    #02
    'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
    'account_product_category_transactions__AmountPositive_global_sum',         #05
    'account_provider_transactions__Value_global_avg',                          #06
    'account_provider_transactions__AmountPositive_global_sum',                 #07
    'account_channel_transactions__AmountPositive_global_sum',                  #08
    'account_product_transactions__AmountPositive_global_avg',                  #09
    'account_transactions__AmountPositive_global_sum',                          #10
    'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
    'account_product_category_transactions__AmountPositive_global_avg',         #12
    'account_product_transactions__AmountPositive_week_sum',                    #13
    'account_provider_transactions__AmountPositive_week_avg',                   #14
    'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
    'account_product_transactions__Value_global_avg',                           #16
    'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
    'account_transactions__AmountPositive_global_avg',                          #19
    'account_provider_transactions__AmountPositive_week_sum'                    #20
]

### Version #1
Drop the features from `top20chi2` that lower the submission score.

In [7]:
top20chi2_ver1 = [
    'AmountPositive',                                                           #01
#     'Value',                                                                    #02
    'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
#     'account_product_category_transactions__AmountPositive_global_sum',         #05
    'account_provider_transactions__Value_global_avg',                          #06
    'account_provider_transactions__AmountPositive_global_sum',                 #07
    'account_channel_transactions__AmountPositive_global_sum',                  #08
#     'account_product_transactions__AmountPositive_global_avg',                  #09
    'account_transactions__AmountPositive_global_sum',                          #10
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
#     'account_product_category_transactions__AmountPositive_global_avg',         #12
    'account_product_transactions__AmountPositive_week_sum',                    #13
#     'account_provider_transactions__AmountPositive_week_avg',                   #14
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
#     'account_product_transactions__Value_global_avg',                           #16
#     'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
#     'account_transactions__AmountPositive_global_avg',                          #19
#     'account_provider_transactions__AmountPositive_week_sum'                    #20
]

In [9]:
prediction(X_trn[top20chi2_ver1], y_trn, X_tst[top20chi2_ver1], 'ver1')

Predict =  Counter({0: 44947, 1: 72})
It is the same as in: AlBo0726_top18chi2_BaggingClassifier.csv


**Result:** `0.827586206896552` (same as `top18chi2`)

### Version #1 with column `Value`
Drop the features from `top20chi2` that lower the submission score, plus the `Value` column.

In [10]:
top20chi2_ver1_Value = [
    'AmountPositive',                                                           #01
    'Value',                                                                    #02
    'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
#     'account_product_category_transactions__AmountPositive_global_sum',         #05
    'account_provider_transactions__Value_global_avg',                          #06
    'account_provider_transactions__AmountPositive_global_sum',                 #07
    'account_channel_transactions__AmountPositive_global_sum',                  #08
#     'account_product_transactions__AmountPositive_global_avg',                  #09
    'account_transactions__AmountPositive_global_sum',                          #10
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
#     'account_product_category_transactions__AmountPositive_global_avg',         #12
    'account_product_transactions__AmountPositive_week_sum',                    #13
#     'account_provider_transactions__AmountPositive_week_avg',                   #14
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
#     'account_product_transactions__Value_global_avg',                           #16
#     'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
#     'account_transactions__AmountPositive_global_avg',                          #19
#     'account_provider_transactions__AmountPositive_week_sum'                    #20
]

In [11]:
prediction(X_trn[top20chi2_ver1_Value], y_trn, X_tst[top20chi2_ver1_Value], 'ver1_Value')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0724_top20chi2_BaggingClassifier.csv


**Result:** `0.813559322033898` (same as `top20chi2`, worse than `top18chi2`)

### Version #2
Drop the features from `top20chi2` that do not raise the submission score.

In [12]:
top20chi2_ver2 = [
    'AmountPositive',                                                           #01
#     'Value',                                                                    #02
    'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
#     'account_product_category_transactions__AmountPositive_global_sum',         #05
#     'account_provider_transactions__Value_global_avg',                          #06
#     'account_provider_transactions__AmountPositive_global_sum',                 #07
#     'account_channel_transactions__AmountPositive_global_sum',                  #08
#     'account_product_transactions__AmountPositive_global_avg',                  #09
#     'account_transactions__AmountPositive_global_sum',                          #10
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
#     'account_product_category_transactions__AmountPositive_global_avg',         #12
    'account_product_transactions__AmountPositive_week_sum',                    #13
#     'account_provider_transactions__AmountPositive_week_avg',                   #14
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
#     'account_product_transactions__Value_global_avg',                           #16
#     'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
#     'account_transactions__AmountPositive_global_avg',                          #19
#     'account_provider_transactions__AmountPositive_week_sum'                    #20
]

In [13]:
prediction(X_trn[top20chi2_ver2], y_trn, X_tst[top20chi2_ver2], 'ver2')

Predict =  Counter({0: 44946, 1: 73})
New result! Write it


**Result:** `0.827586206896552` (same as `top18chi2` and `ver1`)

### Version #2 with column `Value`
Drop the features from `top20chi2` that do not raise the submission score, plus the `Value` column.

In [14]:
top20chi2_ver2_Value = [
    'AmountPositive',                                                           #01
    'Value',                                                                    #02
    'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
#     'account_product_category_transactions__AmountPositive_global_sum',         #05
#     'account_provider_transactions__Value_global_avg',                          #06
#     'account_provider_transactions__AmountPositive_global_sum',                 #07
#     'account_channel_transactions__AmountPositive_global_sum',                  #08
#     'account_product_transactions__AmountPositive_global_avg',                  #09
#     'account_transactions__AmountPositive_global_sum',                          #10
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
#     'account_product_category_transactions__AmountPositive_global_avg',         #12
    'account_product_transactions__AmountPositive_week_sum',                    #13
#     'account_provider_transactions__AmountPositive_week_avg',                   #14
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
#     'account_product_transactions__Value_global_avg',                           #16
#     'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
#     'account_transactions__AmountPositive_global_avg',                          #19
#     'account_provider_transactions__AmountPositive_week_sum'                    #20
]

In [15]:
prediction(X_trn[top20chi2_ver2_Value], y_trn, X_tst[top20chi2_ver2_Value], 'ver2_Value')

Predict =  Counter({0: 44945, 1: 74})
New result! Write it


**Result:** `0.813559322033898` (worse than `top18chi2`, `ver1`, and `ver2`)

### Version #3
Drop all features with `sum`.
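The same subset could also be derived by filtering on the name suffix instead of commenting lines out by hand. This is a sketch only: it assumes every aggregate to drop ends in `_sum`, and `features` holds just a few names copied from `top20chi2` to keep the example short.

```python
# A few names copied from `top20chi2`; the full list would be filtered the same way.
features = [
    'AmountPositive',
    'Value',
    'account_product_transactions__AmountPositive_global_sum',
    'account_provider_transactions__AmountPositive_global_avg',
    'account_provider_transactions__AmountPositive_week_sum',
]

# Keep everything whose name does not end with the `_sum` aggregate suffix.
without_sum = [f for f in features if not f.endswith('_sum')]
print(without_sum)
```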

In [16]:
top20chi2_ver3 = [
    'AmountPositive',                                                           #01
    'Value',                                                                    #02
#     'account_product_transactions__AmountPositive_global_sum',                  #03
    'account_provider_transactions__AmountPositive_global_avg',                 #04
#     'account_product_category_transactions__AmountPositive_global_sum',         #05
    'account_provider_transactions__Value_global_avg',                          #06
#     'account_provider_transactions__AmountPositive_global_sum',                 #07
#     'account_channel_transactions__AmountPositive_global_sum',                  #08
    'account_product_transactions__AmountPositive_global_avg',                  #09
#     'account_transactions__AmountPositive_global_sum',                          #10
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',         #11
    'account_product_category_transactions__AmountPositive_global_avg',         #12
#     'account_product_transactions__AmountPositive_week_sum',                    #13
    'account_provider_transactions__AmountPositive_week_avg',                   #14
    'account_pricing_strategy_transactions__AmountPositive_global_avg',         #15
    'account_product_transactions__Value_global_avg',                           #16
    'account_product_category_transactions__Value_global_avg',                  #17
    'account_channel_transactions__AmountPositive_global_avg',                  #18
    'account_transactions__AmountPositive_global_avg',                          #19
#     'account_provider_transactions__AmountPositive_week_sum'                    #20
]

In [17]:
prediction(X_trn[top20chi2_ver3], y_trn, X_tst[top20chi2_ver3], 'ver3')

Predict =  Counter({0: 44953, 1: 66})
New result! Write it


**Result:** `0.714285714285714` (much worse)

### Version #4
Keep only the features that increase the score compared with the previous result.

We start with 6 features, 5 of which are the `ver2` set.

In [18]:
top200chi2inc6 = [
    'AmountPositive',                                                      #001
#     'Value',                                                               #002
    'account_product_transactions__AmountPositive_global_sum',             #003
    'account_provider_transactions__AmountPositive_global_avg',            #004
#     'account_product_category_transactions__AmountPositive_global_sum',    #005
    'account_provider_transactions__Value_global_avg',                     #006
#     'account_provider_transactions__AmountPositive_global_sum',            #007
#     'account_channel_transactions__AmountPositive_global_sum',             #008
#     'account_product_transactions__AmountPositive_global_avg',             #009
#     'account_transactions__AmountPositive_global_sum',                     #010
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',    #011
#     'account_product_category_transactions__AmountPositive_global_avg',    #012
    'account_product_transactions__AmountPositive_week_sum',               #013
#     'account_provider_transactions__AmountPositive_week_avg',              #014
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',    #015
#     'account_product_transactions__Value_global_avg',                      #016
#     'account_product_category_transactions__Value_global_avg',             #017
    'account_channel_transactions__AmountPositive_global_avg',             #018
#     'account_transactions__AmountPositive_global_avg',                     #019 
#     'account_provider_transactions__AmountPositive_week_sum',              #020
#     'account_product_category_transactions__AmountPositive_week_sum',      #021
#     'account_pricing_strategy_transactions__Value_global_avg',             #022
#     'account_channel_transactions__Value_global_avg',                      #023
#     'account_transactions__Value_global_avg',                              #024
#     'product_transactions__AmountPositive_global_sum',                     #025
#     'product_transactions__AmountPositive_week_sum',                       #026
#     'account_provider_transactions__Value_week_avg',                       #027
#     'product_transactions__AmountPositive_month_sum',                      #028
#     'provider_transactions__AmountPositive_global_avg',                    #029
#     'account_channel_transactions__AmountPositive_week_sum',               #030
#     'account_transactions__AmountPositive_week_sum',
#     'account_transactions__AmountPositive_global_min',
#     'account_transactions__Value_global_min',
#     'account_transactions__AmountNegative_global_min',
#     'account_channel_transactions__AmountPositive_global_min',
#     'account_channel_transactions__Value_global_min',
#     'account_channel_transactions__AmountNegative_global_min',
#     'account_product_category_transactions__AmountPositive_week_min',
#     'account_product_category_transactions__Value_week_min',
#     'account_product_category_transactions__AmountNegative_week_min',
#     'account_pricing_strategy_transactions__AmountPositive_global_min',
#     'account_pricing_strategy_transactions__Value_global_min',
#     'account_pricing_strategy_transactions__AmountNegative_global_min',
#     'account_transactions__AmountNegative_week_min',
#     'account_transactions__AmountPositive_week_min',
#     'account_transactions__Value_week_min',
#     'account_channel_transactions__AmountNegative_week_min',
#     'account_channel_transactions__AmountPositive_week_min',
#     'account_channel_transactions__Value_week_min',
#     'account_pricing_strategy_transactions__AmountPositive_week_min',
#     'account_pricing_strategy_transactions__Value_week_min',
#     'account_pricing_strategy_transactions__AmountNegative_week_min',
#     'account_pricing_strategy_transactions__AmountPositive_week_avg',
#     'account_product_transactions__AmountPositive_week_avg',
#     'account_provider_transactions__AmountPositive_week_min',
#     'account_provider_transactions__Value_week_min',
#     'account_provider_transactions__AmountNegative_week_min',
#     'pricing_strategy_transactions__AmountPositive_global_avg',
#     'account_pricing_strategy_transactions__AmountPositive_week_sum',
#     'account_product_transactions__AmountPositive_week_min',
#     'account_product_transactions__Value_week_min',
#     'account_product_transactions__AmountNegative_week_min',
#     'provider_transactions__AmountPositive_month_avg',
#     'account_provider_transactions__AmountPositive_global_min',
#     'account_provider_transactions__Value_global_min',
#     'account_provider_transactions__AmountNegative_global_min',
#     'account_product_category_transactions__AmountPositive_global_min',
#     'account_product_category_transactions__Value_global_min',
#     'account_product_category_transactions__AmountNegative_global_min',
#     'account_channel_transactions__AmountPositive_week_avg',
#     'account_transactions__AmountPositive_week_avg',
#     'account_product_category_transactions__AmountPositive_week_avg',
#     'provider_transactions__Value_global_avg',
#     'provider_transactions__AmountPositive_week_avg',
#     'pricing_strategy_transactions__AmountNegative_global_min',
#     'pricing_strategy_transactions__AmountPositive_global_min',
#     'pricing_strategy_transactions__Value_global_min',
#     'product_transactions__Value_week_sum',
#     'pricing_strategy_transactions__Value_global_avg',
#     'product_transactions__Value_global_sum',
#     'account_pricing_strategy_transactions__Value_week_avg',
#     'account_product_transactions__Value_week_avg',
#     'product_transactions__Value_month_sum',
#     'provider_transactions__Value_month_avg',
#     'account_channel_transactions__Value_week_avg',
#     'account_transactions__Value_week_avg',
#     'provider_transactions__Value_week_avg',
#     'account_product_category_transactions__Value_week_avg',
#     'account_product_transactions__AmountPositive_global_min',
#     'account_product_transactions__Value_global_min',
#     'account_product_transactions__AmountNegative_global_min',
#     'product_transactions__AmountPositive_global_avg',
#     'pricing_strategy_transactions__AmountPositive_week_avg',
#     'account_product_category_transactions__AmountNegative_week_max',
#     'account_product_category_transactions__AmountPositive_week_max',
#     'account_product_category_transactions__Value_week_max',
#     'account_provider_transactions__AmountPositive_month_avg',
#     'account_product_transactions__AmountNegative_week_max',
#     'account_provider_transactions__AmountNegative_week_max',
#     'account_product_transactions__AmountPositive_week_max',
#     'account_product_transactions__Value_week_max',
#     'account_provider_transactions__AmountPositive_week_max',
#     'account_provider_transactions__Value_week_max',
#     'account_product_category_transactions__AmountNegative_month_max',
#     'account_product_category_transactions__AmountPositive_month_max',
#     'account_product_category_transactions__Value_month_max',
#     'account_pricing_strategy_transactions__AmountNegative_week_max',
#     'account_pricing_strategy_transactions__AmountPositive_week_max',
#     'account_pricing_strategy_transactions__Value_week_max',
#     'account_product_transactions__AmountNegative_month_max',
#     'account_product_transactions__AmountPositive_month_max',
#     'account_product_transactions__Value_month_max',
#     'provider_transactions__AmountNegative_week_count',
#     'provider_transactions__AmountPositive_week_count',
#     'provider_transactions__Value_week_count',
#     'account_transactions__AmountNegative_week_max',
#     'account_transactions__AmountPositive_week_max',
#     'account_transactions__Value_week_max',
#     'account_channel_transactions__AmountNegative_week_max',
#     'account_channel_transactions__AmountPositive_week_max',
#     'account_channel_transactions__Value_week_max',
#     'account_pricing_strategy_transactions__Value_week_count',
#     'account_pricing_strategy_transactions__AmountPositive_week_count',
#     'account_pricing_strategy_transactions__AmountNegative_week_count',
#     'account_product_transactions__AmountPositive_week_count',
#     'account_product_transactions__Value_week_count',
#     'account_product_transactions__AmountNegative_week_count',
#     'account_provider_transactions__Value_week_count',
#     'account_provider_transactions__AmountNegative_week_count',
#     'account_provider_transactions__AmountPositive_week_count',
#     'account_product_category_transactions__AmountPositive_week_count',
#     'account_product_category_transactions__Value_week_count',
#     'account_product_category_transactions__AmountNegative_week_count',
#     'account_transactions__Value_week_count',
#     'account_transactions__AmountPositive_week_count',
#     'account_transactions__AmountNegative_week_count',
#     'account_channel_transactions__AmountNegative_week_count',
#     'account_channel_transactions__Value_week_count',
#     'account_channel_transactions__AmountPositive_week_count',
#     'product_transactions__AmountPositive_month_avg',
#     'account_transactions__AmountNegative_month_max',
#     'account_transactions__AmountPositive_month_max',
#     'account_transactions__Value_month_max',
#     'account_channel_transactions__AmountNegative_month_max',
#     'account_channel_transactions__AmountPositive_month_max',
#     'account_channel_transactions__Value_month_max',
#     'account_pricing_strategy_transactions__AmountNegative_month_max',
#     'account_pricing_strategy_transactions__AmountPositive_month_max',
#     'account_pricing_strategy_transactions__Value_month_max',
#     'account_provider_transactions__AmountNegative_month_max',
#     'account_provider_transactions__AmountPositive_month_max',
#     'account_provider_transactions__Value_month_max',
#     'pricing_strategy_transactions__Value_week_avg',
#     'channel_transactions__AmountPositive_global_sum',
#     'provider_transactions__AmountNegative_global_avg',
#     'channel_transactions__AmountNegative_global_avg',
#     'product_transactions__AmountPositive_week_avg',
#     'provider_transactions__AmountNegative_month_count',
#     'provider_transactions__AmountPositive_month_count',
#     'provider_transactions__Value_month_count',
#     'account_product_category_transactions__AmountNegative_month_min',
#     'account_product_category_transactions__AmountPositive_month_min',
#     'account_product_category_transactions__Value_month_min',
#     'provider_transactions__AmountNegative_global_sum',
#     'product_transactions__AmountNegative_global_sum',
#     'provider_transactions__AmountNegative_month_sum',
#     'channel_transactions__AmountNegative_global_sum',
#     'product_transactions__Value_global_avg',
#     'channel_transactions__AmountNegative_month_sum',
#     'account_product_transactions__AmountPositive_month_count',
#     'account_product_transactions__Value_month_count',
#     'account_product_transactions__AmountNegative_month_count',
#     'account_pricing_strategy_transactions__Value_month_count',
#     'account_pricing_strategy_transactions__AmountPositive_month_count',
#     'account_pricing_strategy_transactions__AmountNegative_month_count',
#     'account_provider_transactions__Value_month_count',
#     'account_provider_transactions__AmountNegative_month_count',
#     'account_provider_transactions__AmountPositive_month_count',
#     'account_transactions__Value_month_count',
#     'account_transactions__AmountPositive_month_count',
#     'account_transactions__AmountNegative_month_count',
#     'account_channel_transactions__AmountNegative_month_count',
#     'account_channel_transactions__Value_month_count',
#     'account_channel_transactions__AmountPositive_month_count',
#     'account_product_category_transactions__AmountPositive_month_count',
#     'account_product_category_transactions__Value_month_count',
#     'account_product_category_transactions__AmountNegative_month_count',
#     'provider_transactions__AmountNegative_month_avg',
#     'account_transactions__AmountNegative_month_min',
#     'account_transactions__AmountPositive_month_min',
#     'account_transactions__Value_month_min',
#     'account_channel_transactions__AmountNegative_month_min',
#     'account_channel_transactions__AmountPositive_month_min',
#     'account_channel_transactions__Value_month_min',
#     'account_product_transactions__AmountNegative_month_min',
#     'account_product_transactions__AmountPositive_month_min',
#     'account_product_transactions__Value_month_min',
#     'provider_transactions__AmountNegative_global_count',
#     'provider_transactions__AmountPositive_global_count',
#     'provider_transactions__Value_global_count'
]
# 001, 003, 004, 006, 013, 018 (same as `ver2` plus 006)
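Since each variant is identified by the position numbers in the trailing comments, a subset could be built from those numbers directly instead of repeating the whole commented list. The helper `pick_by_number` is hypothetical, and `ranked` contains only the first six names of the ranking for brevity.

```python
def pick_by_number(ranked, numbers):
    """Select features by their 1-based position in a ranked list."""
    wanted = set(numbers)
    return [f for i, f in enumerate(ranked, start=1) if i in wanted]


# First six entries of the chi2 ranking, copied from `top20chi2`.
ranked = [
    'AmountPositive',                                                    # 001
    'Value',                                                             # 002
    'account_product_transactions__AmountPositive_global_sum',           # 003
    'account_provider_transactions__AmountPositive_global_avg',          # 004
    'account_product_category_transactions__AmountPositive_global_sum',  # 005
    'account_provider_transactions__Value_global_avg',                   # 006
]

subset = pick_by_number(ranked, [1, 3, 4, 6])
print(subset)
```

With the full ranking in `ranked`, the six-feature variant above would correspond to `pick_by_number(ranked, [1, 3, 4, 6, 13, 18])`.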

In [19]:
prediction(X_trn[top200chi2inc6], y_trn, X_tst[top200chi2inc6], 'ver4inc6')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0807_Manually_Feature_Elimination_ver2.csv


7 features

In [20]:
top200chi2inc7 = [
    'AmountPositive',                                                      #001
#     'Value',                                                               #002
    'account_product_transactions__AmountPositive_global_sum',             #003
    'account_provider_transactions__AmountPositive_global_avg',            #004
#     'account_product_category_transactions__AmountPositive_global_sum',    #005
    'account_provider_transactions__Value_global_avg',                     #006
#     'account_provider_transactions__AmountPositive_global_sum',            #007
#     'account_channel_transactions__AmountPositive_global_sum',             #008
#     'account_product_transactions__AmountPositive_global_avg',             #009
    'account_transactions__AmountPositive_global_sum',                     #010
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',    #011
#     'account_product_category_transactions__AmountPositive_global_avg',    #012
    'account_product_transactions__AmountPositive_week_sum',               #013
#     'account_provider_transactions__AmountPositive_week_avg',              #014
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',    #015
#     'account_product_transactions__Value_global_avg',                      #016
#     'account_product_category_transactions__Value_global_avg',             #017
    'account_channel_transactions__AmountPositive_global_avg',             #018
#     'account_transactions__AmountPositive_global_avg',                     #019 
#     'account_provider_transactions__AmountPositive_week_sum',              #020
#     'account_product_category_transactions__AmountPositive_week_sum',      #021
#     'account_pricing_strategy_transactions__Value_global_avg',             #022
#     'account_channel_transactions__Value_global_avg',                      #023
#     'account_transactions__Value_global_avg',                              #024
#     'product_transactions__AmountPositive_global_sum',                     #025
#     'product_transactions__AmountPositive_week_sum',                       #026
#     'account_provider_transactions__Value_week_avg',                       #027
#     'product_transactions__AmountPositive_month_sum',                      #028
#     'provider_transactions__AmountPositive_global_avg',                    #029
#     'account_channel_transactions__AmountPositive_week_sum',               #030
]

In [21]:
prediction(X_trn[top200chi2inc7], y_trn, X_tst[top200chi2inc7], 'ver4inc7')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0807_Manually_Feature_Elimination_ver2.csv


8 features

In [23]:
top200chi2inc8 = [
    'AmountPositive',                                                      #001
#     'Value',                                                               #002
    'account_product_transactions__AmountPositive_global_sum',             #003
    'account_provider_transactions__AmountPositive_global_avg',            #004
#     'account_product_category_transactions__AmountPositive_global_sum',    #005
    'account_provider_transactions__Value_global_avg',                     #006
#     'account_provider_transactions__AmountPositive_global_sum',            #007
#     'account_channel_transactions__AmountPositive_global_sum',             #008
#     'account_product_transactions__AmountPositive_global_avg',             #009
    'account_transactions__AmountPositive_global_sum',                     #010
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',    #011
#     'account_product_category_transactions__AmountPositive_global_avg',    #012
    'account_product_transactions__AmountPositive_week_sum',               #013
#     'account_provider_transactions__AmountPositive_week_avg',              #014
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',    #015
#     'account_product_transactions__Value_global_avg',                      #016
#     'account_product_category_transactions__Value_global_avg',             #017
    'account_channel_transactions__AmountPositive_global_avg',             #018
#     'account_transactions__AmountPositive_global_avg',                     #019 
    'account_provider_transactions__AmountPositive_week_sum',              #020
#     'account_product_category_transactions__AmountPositive_week_sum',      #021
#     'account_pricing_strategy_transactions__Value_global_avg',             #022
#     'account_channel_transactions__Value_global_avg',                      #023
#     'account_transactions__Value_global_avg',                              #024
#     'product_transactions__AmountPositive_global_sum',                     #025
#     'product_transactions__AmountPositive_week_sum',                       #026
#     'account_provider_transactions__Value_week_avg',                       #027
#     'product_transactions__AmountPositive_month_sum',                      #028
#     'provider_transactions__AmountPositive_global_avg',                    #029
#     'account_channel_transactions__AmountPositive_week_sum',               #030
]

In [24]:
prediction(X_trn[top200chi2inc8], y_trn, X_tst[top200chi2inc8], 'ver4inc8')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0807_Manually_Feature_Elimination_ver2.csv


9 features

In [26]:
# 9 active features; #030 is enabled here (it is commented out in top200chi2inc8)
top200chi2inc9 = [
    'AmountPositive',                                                      #001
#     'Value',                                                               #002
    'account_product_transactions__AmountPositive_global_sum',             #003
    'account_provider_transactions__AmountPositive_global_avg',            #004
#     'account_product_category_transactions__AmountPositive_global_sum',    #005
    'account_provider_transactions__Value_global_avg',                     #006
#     'account_provider_transactions__AmountPositive_global_sum',            #007
#     'account_channel_transactions__AmountPositive_global_sum',             #008
#     'account_product_transactions__AmountPositive_global_avg',             #009
    'account_transactions__AmountPositive_global_sum',                     #010
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',    #011
#     'account_product_category_transactions__AmountPositive_global_avg',    #012
    'account_product_transactions__AmountPositive_week_sum',               #013
#     'account_provider_transactions__AmountPositive_week_avg',              #014
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',    #015
#     'account_product_transactions__Value_global_avg',                      #016
#     'account_product_category_transactions__Value_global_avg',             #017
    'account_channel_transactions__AmountPositive_global_avg',             #018
#     'account_transactions__AmountPositive_global_avg',                     #019 
    'account_provider_transactions__AmountPositive_week_sum',              #020
#     'account_product_category_transactions__AmountPositive_week_sum',      #021
#     'account_pricing_strategy_transactions__Value_global_avg',             #022
#     'account_channel_transactions__Value_global_avg',                      #023
#     'account_transactions__Value_global_avg',                              #024
#     'product_transactions__AmountPositive_global_sum',                     #025
#     'product_transactions__AmountPositive_week_sum',                       #026
#     'account_provider_transactions__Value_week_avg',                       #027
#     'product_transactions__AmountPositive_month_sum',                      #028
#     'provider_transactions__AmountPositive_global_avg',                    #029
    'account_channel_transactions__AmountPositive_week_sum',               #030
]

In [27]:
prediction(X_trn[top200chi2inc9], y_trn, X_tst[top200chi2inc9], 'ver4inc9')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0807_Manually_Feature_Elimination_ver2.csv


10 features

In [28]:
# 10 active features: the same as top200chi2inc9 plus #031
top200chi2inc10 = [
    'AmountPositive',                                                      #001
#     'Value',                                                               #002
    'account_product_transactions__AmountPositive_global_sum',             #003
    'account_provider_transactions__AmountPositive_global_avg',            #004
#     'account_product_category_transactions__AmountPositive_global_sum',    #005
    'account_provider_transactions__Value_global_avg',                     #006
#     'account_provider_transactions__AmountPositive_global_sum',            #007
#     'account_channel_transactions__AmountPositive_global_sum',             #008
#     'account_product_transactions__AmountPositive_global_avg',             #009
    'account_transactions__AmountPositive_global_sum',                     #010
#     'account_pricing_strategy_transactions__AmountPositive_global_sum',    #011
#     'account_product_category_transactions__AmountPositive_global_avg',    #012
    'account_product_transactions__AmountPositive_week_sum',               #013
#     'account_provider_transactions__AmountPositive_week_avg',              #014
#     'account_pricing_strategy_transactions__AmountPositive_global_avg',    #015
#     'account_product_transactions__Value_global_avg',                      #016
#     'account_product_category_transactions__Value_global_avg',             #017
    'account_channel_transactions__AmountPositive_global_avg',             #018
#     'account_transactions__AmountPositive_global_avg',                     #019 
    'account_provider_transactions__AmountPositive_week_sum',              #020
#     'account_product_category_transactions__AmountPositive_week_sum',      #021
#     'account_pricing_strategy_transactions__Value_global_avg',             #022
#     'account_channel_transactions__Value_global_avg',                      #023
#     'account_transactions__Value_global_avg',                              #024
#     'product_transactions__AmountPositive_global_sum',                     #025
#     'product_transactions__AmountPositive_week_sum',                       #026
#     'account_provider_transactions__Value_week_avg',                       #027
#     'product_transactions__AmountPositive_month_sum',                      #028
#     'provider_transactions__AmountPositive_global_avg',                    #029
    'account_channel_transactions__AmountPositive_week_sum',               #030
    'account_transactions__AmountPositive_week_sum',                       #031
#     'account_transactions__AmountPositive_global_min',
#     'account_transactions__Value_global_min',
#     'account_transactions__AmountNegative_global_min',
#     'account_channel_transactions__AmountPositive_global_min',
#     'account_channel_transactions__Value_global_min',
#     'account_channel_transactions__AmountNegative_global_min',
#     'account_product_category_transactions__AmountPositive_week_min',
#     'account_product_category_transactions__Value_week_min',
#     'account_product_category_transactions__AmountNegative_week_min',
]

In [29]:
prediction(X_trn[top200chi2inc10], y_trn, X_tst[top200chi2inc10], 'ver4inc10')

Predict =  Counter({0: 44946, 1: 73})
It is the same as in: AlBo0807_Manually_Feature_Elimination_ver2.csv
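
The hand-edited lists above differ from one step to the next by a single uncommented line. A minimal sketch of building the same lists incrementally is shown below. The names `inc9` and `inc10` are hypothetical stand-ins for `top200chi2inc9` and `top200chi2inc10`; the feature names are copied from the active lines of those lists.

```python
# Active (uncommented) features of the 9-feature list, copied from above.
inc9 = [
    'AmountPositive',
    'account_product_transactions__AmountPositive_global_sum',
    'account_provider_transactions__AmountPositive_global_avg',
    'account_provider_transactions__Value_global_avg',
    'account_transactions__AmountPositive_global_sum',
    'account_product_transactions__AmountPositive_week_sum',
    'account_channel_transactions__AmountPositive_global_avg',
    'account_provider_transactions__AmountPositive_week_sum',
    'account_channel_transactions__AmountPositive_week_sum',
]

# The 10-feature list adds exactly one candidate (#031).
inc10 = inc9 + ['account_transactions__AmountPositive_week_sum']

# Sanity checks: sizes match the headings and no feature is repeated.
assert len(inc9) == 9 and len(inc10) == 10
assert len(set(inc10)) == len(inc10)

# Show which feature was added at this step.
added = sorted(set(inc10) - set(inc9))
print(added)
```

Either list could then be passed to `prediction` in the same way as before, for example `prediction(X_trn[inc10], y_trn, X_tst[inc10], 'ver4inc10')`.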


| k  | F1-score on Zindi |
|----|-------------------|
| 01 | 0.679245283018868 |
| 02 | 0.666666666666667 |
| 03 | 0.766666666666667 |
| 04 | 0.8               |
| 05 | 0.766666666666667 |
| 06 | 0.8               |
| 07 | 0.8               |
| 08 | 0.8               |
| 10 | 0.8               |
| 13 | 0.813559322033898 |
| 18 | 0.827586206896552 |
| 20 | 0.813559322033898 |
| 25 | 0.793103448275862 |
| 26 | 0.793103448275862 |
| 27 | 0.793103448275862 |
| 28 | 0.793103448275862 |
| 29 | 0.696969696969697 |
| 30 | 0.73015873015873  |
| 31 | 0.793103448275862 |
| 32 | 0.754098360655738 |
| 35 | 0.741935483870968 |
| 38 | 0.721311475409836 |
| 39 | 0.754098360655738 |
| 40 | 0.676470588235294 |
| 43 | 0.741935483870968 |
| 46 | 0.733333333333333 |
| 53 | 0.754098360655738 |
| 54 | 0.741935483870968 |
| 62 | 0.741935483870968 |
| 63 | 0.779661016949153 |
| 71 | 0.733333333333333 |
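
The scores listed above can be put into a small dictionary to find which `k` scored best. The sketch below is only an illustration: the variable names (`scores`, `best_k`, `smallest_k_at_08`) are hypothetical, and the values are copied from the list above without re-running any submission.

```python
# k (number of features) -> F1-score on Zindi, copied from the list above.
scores = {
    1: 0.679245283018868,  2: 0.666666666666667,  3: 0.766666666666667,
    4: 0.8,                5: 0.766666666666667,  6: 0.8,
    7: 0.8,                8: 0.8,               10: 0.8,
    13: 0.813559322033898, 18: 0.827586206896552, 20: 0.813559322033898,
    25: 0.793103448275862, 26: 0.793103448275862, 27: 0.793103448275862,
    28: 0.793103448275862, 29: 0.696969696969697, 30: 0.73015873015873,
    31: 0.793103448275862, 32: 0.754098360655738, 35: 0.741935483870968,
    38: 0.721311475409836, 39: 0.754098360655738, 40: 0.676470588235294,
    43: 0.741935483870968, 46: 0.733333333333333, 53: 0.754098360655738,
    54: 0.741935483870968, 62: 0.741935483870968, 63: 0.779661016949153,
    71: 0.733333333333333,
}

# k with the highest recorded F1-score.
best_k = max(scores, key=scores.get)
best_f1 = scores[best_k]

# Smallest k that already reaches an F1-score of at least 0.8.
smallest_k_at_08 = min(k for k, f1 in scores.items() if f1 >= 0.8)

print(best_k, best_f1, smallest_k_at_08)
```

In this listing the best recorded score is at `k = 18`, and `k = 4` is the smallest feature count that reaches 0.8.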