## TalkingData AdTracking Fraud Detection Challenge
https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection


------------------


### This notebook demonstrates a hyperparameter search algorithm: BayesianOptimization

BayesianOptimization repo: [BayesianOptimization Github Link](https://github.com/fmfn/BayesianOptimization)

Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not, as seen in the picture below.

![BayesianOptimization in action](https://github.com/fmfn/BayesianOptimization/blob/master/examples/bo_example.png)

As you iterate, the algorithm balances exploration against exploitation, taking into account what it knows about the target function. At each step a Gaussian process is fitted to the known samples (the points explored so far), and the posterior distribution, combined with an exploration strategy such as UCB (Upper Confidence Bound) or EI (Expected Improvement), is used to determine the next point to explore (see the gif below).

![BayesianOptimization in action](https://github.com/fmfn/BayesianOptimization/blob/master/examples/bayesian_optimization.gif)
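
For reference, the two exploration strategies named above are usually written as follows for a maximization problem. The notation is an assumption of this note rather than something taken from the library: $\mu(x)$ and $\sigma(x)$ are the Gaussian process posterior mean and standard deviation at $x$, $f^{+}$ is the best value observed so far, $\kappa \ge 0$ and $\xi \ge 0$ are exploration parameters, and $\Phi$ and $\phi$ are the standard normal CDF and PDF.

$$
\alpha_{\mathrm{UCB}}(x) = \mu(x) + \kappa\,\sigma(x)
$$

$$
\alpha_{\mathrm{EI}}(x) = \bigl(\mu(x) - f^{+} - \xi\bigr)\,\Phi(z) + \sigma(x)\,\phi(z),
\qquad z = \frac{\mu(x) - f^{+} - \xi}{\sigma(x)} \quad (\sigma(x) > 0)
$$

Larger $\kappa$ or $\xi$ pushes the search toward uncertain regions (exploration), while smaller values favor points where the predicted mean is already high (exploitation).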

This process is designed to minimize the number of steps required to find a combination of parameters that is close to the optimal one. To do so, it solves a proxy optimization problem (finding the maximum of the acquisition function) that, although still hard, is computationally cheaper and can be tackled with common tools. Bayesian optimization is therefore best suited to situations where sampling the function to be optimized is very expensive. See the references for a proper discussion of this method.
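
The loop described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the `bayes_opt` implementation: the names `rbf_kernel`, `gp_posterior`, `toy_target` and `bayes_opt_1d` are hypothetical, the Gaussian process is zero-mean with a fixed RBF kernel, the acquisition function is UCB, and the proxy problem is solved by a plain grid search over a single parameter in [0, 1].

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)        # K_oo^{-1} K_on
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)   # the RBF prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_target(x):
    """Hypothetical stand-in for an expensive objective (e.g. a validation AUC)."""
    return np.exp(-(x - 0.7) ** 2 / 0.02) + 0.5 * np.exp(-(x - 0.2) ** 2 / 0.05)


def bayes_opt_1d(target, n_init=3, n_iter=12, kappa=2.0, seed=0):
    """Maximize target on [0, 1] with a GP surrogate and a UCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)           # candidate points for the proxy problem
    x_obs = rng.uniform(0.0, 1.0, size=n_init)  # random initial samples
    y_obs = target(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        ucb = mean + kappa * std                # high mean OR high uncertainty
        x_next = grid[np.argmax(ucb)]           # cheap proxy optimization
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, target(x_next))  # the expensive evaluation
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best], x_obs, y_obs


best_x, best_y, xs, ys = bayes_opt_1d(toy_target)
print(f"best x = {best_x:.3f}, best value = {best_y:.3f} after {len(xs)} evaluations")
```

Each iteration spends one call to the expensive target and many cheap evaluations of the acquisition function, which is the trade-off the paragraph above describes. Raising `kappa` makes the search more exploratory.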

In [3]:
import os
import psutil
import time
import pandas as pd
import gc
# sklearn imports
from sklearn.metrics import roc_auc_score
import lightgbm
# bayes_opt imports
from bayes_opt import BayesianOptimization

# memory
process = psutil.Process(os.getpid())
memused = process.memory_info().rss
print('Total memory in use before reading data: {:.02f} GB'.format(memused/(2**30))) 

Total memory in use before reading data: 0.10 GB


In [5]:
# read data
df_train = pd.read_hdf('../insample_iterations/insample_data/train.hdf').astype('float32')
df_test = pd.read_hdf('../insample_iterations/insample_data/test.hdf').astype('float32')
# target and feature columns
target = 'is_attributed'
features = [
    'app',
    'device',
    'os',
    'channel',
    'hour',
    'in_test_hh',
    'ip_day_hour_clicks',
    'ip_app_day_hour_clicks',
    'ip_os_day_hour_clicks',
    'ip_device_day_hour_clicks',
    'ip_day_test_hh_clicks',
    'ip_app_device_clicks',
    'ip_app_device_day_clicks',
    'ip_day_nunique_app',
    'ip_day_nunique_device',
    'ip_day_nunique_channel',
    'ip_day_nunique_hour',
    'ip_nunique_app',
    'ip_nunique_device',
    'ip_nunique_channel',
    'ip_nunique_hour',
    'app_day_nunique_channel',
    'app_nunique_channel',
    'ip_app_day_nunique_os',
    'ip_app_nunique_os',
    'ip_device_os_day_nunique_app',
    'ip_device_os_nunique_app',
    'ip_app_day_var_hour',
    'ip_device_day_var_hour',
    'ip_os_day_var_hour',
    'ip_channel_day_var_hour',
    'ip_app_os_var_hour',
    'ip_app_channel_var_day',
    'ip_app_channel_mean_hour',
    'ip_day_cumcount',
    'ip_cumcount',
    'ip_app_day_cumcount',
    'ip_app_cumcount',
    'ip_device_os_day_cumcount',
    'ip_device_os_cumcount',
    'next_click',
    'previous_click',
]
# categorical
categorical_features = [
    'app',
    'device',
    'os',
    'channel',
    'hour',
    'in_test_hh',
]
# prep data
dtrain = lightgbm.Dataset(
    df_train[features].values,
    label=df_train[target].values,
    feature_name=features,
    categorical_feature=categorical_features,
    free_raw_data=True,
)
dtest = lightgbm.Dataset(
    df_test[features].values,
    label=df_test[target].values,
    feature_name=features,
    categorical_feature=categorical_features
)
# cleanup
del df_train
gc.collect()
print('done data prep!!!')
# memory status
memused = process.memory_info().rss
print('Total memory in use after reading data: {:.02f} GB '
      ''.format(memused / (2 ** 30)))

done data prep!!!
Total memory in use after reading data: 26.62 GB 


## Bayes Search Results Record
-----------------------------------------------------

| Time | Value | eta | n_rounds | num_leaves | max_depth | subsample | colsample_bytree | min_child_samples | scale_pos_weight |
|---|---|---|---|---|---|---|---|---|---|
| 01m56s | 0.97951 | 0.3 | 25 | 54 | 8 | 0.9 | 1 | 100 | 100 |
| 01m50s | 0.97935 | 0.3 | 25 | 40 | 8 | 0.9 | 1 | 100 | 100 |
| 01m04s | 0.97431 | 0.3 | 25 | 4 | 5 | 0.9 | 0.6 | 100 | 100 |


In [None]:
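def lightgbm_objective(num_leaves, max_depth, colsample_bytree):
    # BayesianOptimization proposes continuous values, so the integer-valued
    # parameters are rounded before being passed to LightGBM. The return value
    # (AUC on the held-out test frame) is what the optimizer maximizes.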
    lightgbm_params = {
        'boosting_type': 'gbdt',
        'objective': 'binary',
        'learning_rate': 0.3,
        'num_leaves': int(round(num_leaves, 0)),
        'max_depth': int(round(max_depth, 0)),
        'min_split_gain': 0,
        'subsample': 0.9,
        'subsample_freq': 1,
        'colsample_bytree': round(colsample_bytree, 1),
        'min_child_samples': 100,
        'min_child_weight': 0,
        'max_bin': 100,
        'subsample_for_bin': 200000,
        'reg_alpha': 0,
        'reg_lambda': 0,
        'scale_pos_weight': 100,
        'metric': 'auc',
        'nthread': 22,
        'verbose': 0,
    }
    model = lightgbm.train(
        params=lightgbm_params, 
        train_set=dtrain,
        num_boost_round=25,
        feature_name=features,
        categorical_feature=categorical_features,
        verbose_eval=1
    )
    proba = model.predict(df_test[features], num_iteration=model.best_iteration)
    roc_score = roc_auc_score(y_true=df_test[target], y_score=proba)
    return roc_score

# Search bounds (min, max) for the LightGBM parameters being tuned
params = {
    'num_leaves': (4, 64),
    'max_depth': (4, 8),
    'colsample_bytree': (0.5, 1.0)
}

# Initialize BO optimizer
lightgbm_bayesopt = BayesianOptimization(
    f=lightgbm_objective, 
    pbounds=params,
    random_state=1,
    verbose=1
)
# Maximize auc score
lightgbm_bayesopt.maximize(init_points=5, n_iter=20)

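# get best params (this is the pre-1.0 bayes_opt API; from bayes_opt 1.0 onward
# the equivalent is lightgbm_bayesopt.max['params'])
best_params = lightgbm_bayesopt.res['max']['max_params']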

Initialization
-------------------------------------------------------------------------------
 Step |   Time |      Value |   colsample_bytree |   max_depth |   num_leaves | 
    1 | 01m48s |    0.97831 |             0.7096 |      4.3694 |      29.0213 | 
    2 | 01m21s |    0.97905 |             0.8426 |      4.7450 |      47.2195 | 
    3 | 01m04s |    0.97431 |             0.6022 |      5.3822 |       4.0069 | 
    4 | 01m24s |    0.97921 |             0.9391 |      5.5871 |      22.1400 | 
    5 | 01m18s |    0.97793 |             0.5137 |      6.1553 |      12.8054 | 




Bayesian Optimization
-------------------------------------------------------------------------------
 Step |   Time |      Value |   colsample_bytree |   max_depth |   num_leaves | 
    6 | 01m50s |    0.97935 |             0.9939 |      7.9925 |      39.8044 | 
    7 | 01m56s |    0.97925 |             0.9867 |      7.9415 |      63.9839 | 
    8 | 01m24s |    0.97807 |             0.9729 |      4.0254 |      63.8512 | 
    9 | 01m58s |    0.97930 |             0.5071 |      7.9855 |      55.2732 | 
   10 | 01m38s |    0.97881 |             0.9792 |      7.9976 |      18.8534 | 
   11 | 01m56s |    0.97951 |             0.9979 |      7.9647 |      53.6639 | 
   12 | 01m25s |    0.97824 |             0.9873 |      4.0340 |      14.9677 | 
   13 | 01m54s |    0.97904 |             0.5082 |      7.9859 |      45.3286 | 
   14 | 01m54s |    0.97950 |             0.9992 |      6.9632 |      58.1132 | 
   15 | 01m54s |    0.97817 |             0.8539 |      7.1605 |      56.1544 | 
   16 | 01m21s |    0.97506 |             0.9605 |      7.9282 |       4.2543 | 
   17 | 01m26s |    0.97807 |             0.9989 |      4.0060 |      38.4673 | 
   18 | 01m43s |    0.97947 |             0.9387 |      7.9877 |      28.7524 | 
   19 | 01m26s |    0.97760 |             0.5070 |      4.0176 |      18.7738 | 
   20 | 01m59s |    0.97927 |             0.5428 |      7.9978 |      61.2742 | 
   21 | 01m26s |    0.97758 |             0.9929 |      4.0024 |       7.8988 | 
   22 | 01m51s |    0.97914 |             0.5017 |      7.9966 |      34.6797 | 