Public code for the paper "DMBGN: Deep Multi-Behavior Graph Networks for Voucher Redemption Rate Prediction", submitted to SIGKDD 2021. This code reproduces the experimental results in Chapter 5 of the submitted paper. Note that the experiments here are run on a randomly sampled, desensitized subset of the original dataset (Region C) described in the paper; all ID features are hashed for public release.

This notebook is organized into 5 parts:
1. Data Processing: generates the training data from the original log table (for a description of the logs, see the README.md file under the ./data directory).
2. Baseline Models: the 5 baseline models compared in the submitted paper: LR, GBDT, DNN, WDL and DIN.
3. Proposed Method: DMBGN: our proposed model, covering experiments on 2 variants of DMBGN (AvgPooling and Pretrained) alongside our final DMBGN model.
4. Summary: a summary of the experiment results.
5. Reference

The contents of this notebook are as follows:
- 1 Data Processing
  - 1.1 Logs Processing
  - 1.2 Label Encoding and Normalization
- 2 Baseline Models
  - 2.1 LR
  - 2.2 GBDT
  - 2.3 DNN
  - 2.4 WDL
  - 2.5 DIN
- 3 Proposed Method: DMBGN
  - 3.1 DMBGN-AvgPooling
  - 3.2 DMBGN-Pretrained
    - 3.2.1 UVG Graphs
    - 3.2.2 GNN Networks
      - 3.2.2.1 Get Pretrained Item Embedding
      - 3.2.2.2 Train GNN
      - 3.2.2.3 Generate Pretrained UVG Embedding
      - 3.2.2.4 DMBGN Log Processing
  - 3.3 DMBGN
- 4 Summary
- 5 Reference


Note that you can run all the code directly to reproduce all the results. DMBGN may take longer to train; we used 8 GPUs for acceleration.


Author: \
    Lin Li (boolean.ll@alibaba-inc.com) \
    Fengtong Xiao (fengtong.xiao@alibaba-inc.com) \
    Weinan Xu (stella.xu@lazada.com)

References: \
    DMBGN: Deep Multi-Behavior Graph Networks for Voucher Redemption Rate Prediction

In [31]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:70% !important; }</style>"))

import numpy as np
import pandas as pd
import pandasql as ps

import warnings
import logging
import sys
import pickle
import csv
import os

from datetime import datetime, timedelta
from sklearn.metrics import log_loss, roc_auc_score
from prettytable import PrettyTable

import torch
from deepctr_torch.inputs import SparseFeat, DenseFeat,VarLenSparseFeat, get_feature_names
from deepctr_torch.models import *
from tqdm import tqdm

sys.setrecursionlimit(9000000) 
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s]: %(message)s')
warnings.filterwarnings('ignore')

from models.util import *
from models import LabelEncoderExt
from models.din import DIN
from models.DMBGN import DMBGN

In [2]:
device = torch.device('cpu')
# def get_device(idx=0):
#     if torch.cuda.is_available():
#         device = torch.device('cuda')
#     else:
#         device = torch.device('cpu')
#     return device
# device_count = torch.cuda.device_count()
# device = get_device()
device

device(type='cpu')

# Data Processing

In [3]:
# Load the pickle file, note you might need to unzip the kdd_data.pkl.zip to recover the pkl file
# To unzip, use the following linux commands:
# $ cd ./data
# $ unzip kdd_data.pkl.zip

file_path = './data/kdd_data.pkl'
with open(file_path, "rb") as f:
    log_df = pickle.load(f)
    session_df = pickle.load(f)
    item_df = pickle.load(f)
log_df.shape, session_df.shape, item_df.shape

DeepCTR-PyTorch version 0.2.9 detected. Your version is 0.2.3.
Use `pip install -U deepctr-torch` to upgrade.Changelog: https://github.com/shenweichen/DeepCTR-Torch/releases/tag/v0.2.9


((62068, 19), (1118593, 14), (286735, 6))
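
The cell above reads three objects from a single pickle file with three consecutive `pickle.load` calls. This works when the file was written with the same number of consecutive `pickle.dump` calls, in the same order. A minimal round-trip sketch with toy objects (the temporary path and the objects are made up for illustration):

```python
import os
import pickle
import tempfile

# Hypothetical toy file; the real data lives in ./data/kdd_data.pkl.
path = os.path.join(tempfile.mkdtemp(), "toy.pkl")

# Write three objects back to back into one file.
with open(path, "wb") as f:
    for obj in ({"a": 1}, [1, 2, 3], "third"):
        pickle.dump(obj, f)

# Read them back in the same order they were written.
with open(path, "rb") as f:
    first = pickle.load(f)
    second = pickle.load(f)
    third = pickle.load(f)
```

A fourth `pickle.load` on the same handle would raise `EOFError`, which is a quick way to confirm how many objects a file holds.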

##  Logs Processing
Construct the historical UVG sequence in chronological order.


In [4]:
log_df

Unnamed: 0,session_id,label,user_id,promotion_id,voucher_min_spend,voucher_discount,voucher_collect_time,voucher_redeem_time,campaign_name,user_age_level,user_gender,user_purchase_level,user_trd__orders_cnt_hist,user_trd__actual_gmv_usd_hist,user_trd__orders_cnt_platform_discount_hist,user_trd__max_gmv_usd_hist,user_trd__avg_gmv_usd_hist,user_trd__min_gmv_usd_hist,dtype
0,12130_38,1,12130,38,888,80,14363,28292,C3,5.0,1,8.0,52.0,706.648652,4.0,79.950388,11.584404,0.827519,train
1,12130_85,0,12130,85,4999,500,49983,49983,C2,5.0,1,8.0,69.0,822.218764,9.0,60.165341,8.747008,0.601405,test
2,12156_64,0,12156,64,7799,700,21441,21441,C3,4.0,0,9.0,26.0,349.797257,4.0,118.703723,8.531640,0.928380,test
3,12156_91,1,12156,91,799,80,46418,52254,C2,4.0,0,9.0,61.0,793.370613,7.0,215.294128,7.778143,0.517295,test
4,12156_310,1,12156,310,349,30,121387,131549,C1,4.0,0,9.0,65.0,842.410759,8.0,215.294128,7.589286,0.517295,train
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62063,11873_319,0,11873,319,1999,200,120575,120575,C1,2.0,1,9.0,18.0,884.002906,16.0,260.133633,21.047688,0.000000,train
62064,11873_305,1,11873,305,4999,500,120576,131797,C1,2.0,1,9.0,18.0,884.002906,16.0,260.133633,21.047688,0.000000,train
62065,11879_159,0,11879,159,799,70,72071,72071,C0,4.0,0,8.0,10.0,340.424720,1.0,53.094769,28.368727,3.595659,test
62066,11880_38,0,11880,38,888,80,3867,4451,C3,2.0,1,9.0,169.0,5085.468023,24.0,501.537557,14.009554,0.000000,train


In [5]:
q1 = "select *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY CAST(voucher_collect_time AS BIGINT) ASC) as rk from {logdf}".format(logdf="log_df")
log_df = ps.sqldf(q1, locals())
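
For readers who prefer to stay in pandas, the `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` ranking can be approximated with a sort followed by `groupby(...).cumcount()`. The sketch below runs on a made-up toy frame (the `collect_ts` column name is an assumption for illustration); how ties are ordered may differ from the SQL engine.

```python
import pandas as pd

# Toy log with timestamps stored as strings, to show why the cast matters.
toy = pd.DataFrame({
    "user_id": [7, 7, 7, 9],
    "voucher_collect_time": ["300", "100", "200", "50"],
})

# Mirror CAST(... AS BIGINT) so that times sort numerically, not as text.
toy["collect_ts"] = toy["voucher_collect_time"].astype("int64")

# Rank each user's vouchers by collection time, starting at 1.
toy = toy.sort_values(["user_id", "collect_ts"])
toy["rk"] = toy.groupby("user_id").cumcount() + 1

# Restore the original row order.
toy = toy.sort_index()
```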

In [6]:
sql = " SELECT L.* \
            , coalesce(R.session_id, '') AS hist_session_id \
            , coalesce(R.promotion_id, '') AS hist_promotion_id \
            , coalesce(R.rk, '') AS hist_rk \
            , R.voucher_collect_time AS hist_voucher_collect_time \
        FROM {df} L \
        LEFT JOIN {df} R \
        ON L.user_id = R.user_id \
        AND L.session_id != R.session_id \
        AND L.campaign_name != R.campaign_name \
        AND R.label = 1 \
        AND L.rk > R.rk \
        ORDER BY L.rk ASC, R.rk ASC".format(df='log_df')
log_df_tmp = ps.sqldf(sql, locals())
log_df_tmp

Unnamed: 0,session_id,label,user_id,promotion_id,voucher_min_spend,voucher_discount,voucher_collect_time,voucher_redeem_time,campaign_name,user_age_level,...,user_trd__orders_cnt_platform_discount_hist,user_trd__max_gmv_usd_hist,user_trd__avg_gmv_usd_hist,user_trd__min_gmv_usd_hist,dtype,rk,hist_session_id,hist_promotion_id,hist_rk,hist_voucher_collect_time
0,0_82,0,0,82,299,30,45451,45521,C2,4.0,...,0.0,6.376755,3.917888,2.518198,train,1,,,,
1,1_425,0,1,425,40,4,100489,132869,C1,0.0,...,1.0,27.976711,8.408431,0.739039,train,1,,,,
2,10_82,0,10,82,299,30,46827,48182,C2,,...,0.0,4.972645,3.039102,1.498169,train,1,,,,
3,100_82,0,100,82,299,30,46227,46227,C2,3.0,...,34.0,397.755770,17.843147,0.003525,train,1,,,,
4,1000_386,0,1000,386,0,60,96836,96836,C1,3.0,...,0.0,0.000000,0.000000,0.000000,train,1,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
83457,6777_391,0,6777,391,199,30,128346,128346,C1,4.0,...,50.0,93.471440,7.419820,0.000000,train,102,6777_248,248,39,74788
83458,6777_383,0,6777,383,500,50,128346,128346,C1,4.0,...,50.0,93.471440,7.419820,0.000000,train,103,6777_138,138,2,39379
83459,6777_383,0,6777,383,500,50,128346,128346,C1,4.0,...,50.0,93.471440,7.419820,0.000000,train,103,6777_206,206,17,59004
83460,6777_383,0,6777,383,500,50,128346,128346,C1,4.0,...,50.0,93.471440,7.419820,0.000000,train,103,6777_215,215,27,64992


In [7]:
user_voucher_log = log_df_tmp.groupby([ 'session_id', 'label', 'user_id', 'promotion_id', 'voucher_min_spend', 
                       'voucher_discount', 'voucher_collect_time', 'voucher_redeem_time',
                       'user_age_level', 'user_gender', 'user_purchase_level',
                       'user_trd__orders_cnt_hist', 'user_trd__actual_gmv_usd_hist',
                       'user_trd__orders_cnt_platform_discount_hist',
                       'user_trd__max_gmv_usd_hist', 'user_trd__avg_gmv_usd_hist',
                       'user_trd__min_gmv_usd_hist', 'dtype']) \
                .agg({'hist_session_id': lambda x: "%s" % ','.join(x),
                      'hist_promotion_id': lambda x: "%s" % ','.join(x),
                      'hist_rk': lambda x: list(x),
                      'hist_voucher_collect_time': "count"}).reset_index()

user_voucher_log['keys_length'] = user_voucher_log['hist_voucher_collect_time']
user_voucher_log = user_voucher_log.drop(columns=['hist_rk', 'hist_voucher_collect_time'])
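
The self-join and aggregation above can be hard to follow in SQL form. The sketch below reproduces the same idea on a toy frame in pandas: for each session, collect the user's earlier sessions (`rk` smaller) that were redeemed (`label = 1`) and belong to a different campaign, then join their promotion ids into a comma-separated string and count them. The toy data and the `_hist` suffix are assumptions for illustration only.

```python
import pandas as pd

logs = pd.DataFrame({
    "session_id": ["u1_a", "u1_b", "u1_c", "u2_a"],
    "user_id": [1, 1, 1, 2],
    "promotion_id": ["a", "b", "c", "a"],
    "campaign_name": ["C1", "C2", "C3", "C1"],
    "label": [1, 1, 0, 0],
    "rk": [1, 2, 3, 1],
})

# Pair every session with every session of the same user.
pairs = logs.merge(logs, on="user_id", suffixes=("", "_hist"))

# Keep only earlier, redeemed sessions from a different campaign.
pairs = pairs[
    (pairs["rk"] > pairs["rk_hist"])
    & (pairs["label_hist"] == 1)
    & (pairs["campaign_name"] != pairs["campaign_name_hist"])
].sort_values(["session_id", "rk_hist"])

# Collapse the history of each session into one row.
hist = (
    pairs.groupby("session_id")
    .agg(hist_promotion_id=("promotion_id_hist", ",".join),
         keys_length=("promotion_id_hist", "size"))
    .reset_index()
)

# Sessions without any history get an empty string and length 0.
out = logs.merge(hist, on="session_id", how="left")
out["hist_promotion_id"] = out["hist_promotion_id"].fillna("")
out["keys_length"] = out["keys_length"].fillna(0).astype(int)
```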

In [8]:
user_voucher_log

Unnamed: 0,session_id,label,user_id,promotion_id,voucher_min_spend,voucher_discount,voucher_collect_time,voucher_redeem_time,user_age_level,user_gender,...,user_trd__orders_cnt_hist,user_trd__actual_gmv_usd_hist,user_trd__orders_cnt_platform_discount_hist,user_trd__max_gmv_usd_hist,user_trd__avg_gmv_usd_hist,user_trd__min_gmv_usd_hist,dtype,hist_session_id,hist_promotion_id,keys_length
0,0_82,0,0,82,299,30,45451,45521,4.0,0,...,4.0,23.507327,0.0,6.376755,3.917888,2.518198,train,,,0
1,10000_38,0,10000,38,888,80,13253,13254,3.0,0,...,6.0,218.105824,9.0,42.643151,12.116990,3.581674,train,,,0
2,10001_159,1,10001,159,799,70,80001,89306,0.0,0,...,39.0,373.530611,3.0,32.090705,6.225510,0.561611,train,,,0
3,10001_319,0,10001,319,1999,200,115663,115663,0.0,0,...,42.0,381.810513,3.0,32.090705,5.614860,0.487420,train,10001_159,159,1
4,10001_38,0,10001,38,888,80,12867,19625,0.0,0,...,11.0,133.172685,0.0,26.419591,11.097724,1.811817,train,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58046,99_82,0,99,82,299,30,46577,46577,3.0,0,...,44.0,2462.818244,50.0,329.698604,18.944756,0.000000,train,,,0
58047,99_83,1,99,83,1999,200,35523,53242,3.0,0,...,43.0,2397.249462,46.0,329.698604,19.025789,0.000000,test,,,0
58048,99_91,0,99,91,799,80,35523,46578,3.0,0,...,43.0,2397.249462,46.0,329.698604,19.025789,0.000000,train,,,0
58049,9_61,1,9,61,2999,300,24981,28017,3.0,1,...,22.0,715.893947,23.0,150.129713,9.419657,1.116494,train,,,0


In [9]:
# distribution of historical UVG sequence lengths
user_voucher_log.groupby(['keys_length']).agg({'session_id': "count"}).reset_index()

Unnamed: 0,keys_length,session_id
0,0,39908
1,1,8720
2,2,4236
3,3,2326
4,4,1325
5,5,721
6,6,267
7,7,218
8,8,115
9,9,72


## Label Encoding and Normalization
Label encoding for sparse features and normalization for dense features.

In [10]:
df = user_voucher_log.copy()
df = df.take(np.random.permutation(len(df)))

In [11]:
sparse_feature = ['promotion_id','session_id','user_gender','user_age_level','user_purchase_level'] #['promotion_id','voucher_min_spend','voucher_discount_amount']

hist_list_features = ['hist_promotion_id','hist_session_id','keys_length']

ignore_features=['dtype','venture','ds','user_id','label','voucher_collect_time','voucher_redeem_time','campaign_name','rk']
ignore_features_key_words = ['out', 'emb']

dense_feature = []
train_features = []

for feat in df.columns:
    flag = True 
    for key in ignore_features_key_words:
        if key in feat:
            flag = False
            break
    if feat not in ignore_features and flag is True:
        if feat not in hist_list_features:
            train_features.append(feat)
        if feat not in sparse_feature and feat not in hist_list_features:
            dense_feature.append(feat)

target = 'label'
df[['session_id','promotion_id','user_gender','user_age_level','user_purchase_level']]=df[['session_id','promotion_id','user_gender','user_age_level','user_purchase_level']].astype('str')

In [12]:
label_encoder = {}
for feat in sparse_feature:
    df[feat] = df[feat].fillna(0)
    print("lbe {}".format(feat))
    
    lbe = LabelEncoderExt()
    lbe.fit(df[feat])
        
    df[feat] = lbe.transform(df[feat])
    label_encoder[feat] = lbe
    logging.warning('LabelEncoder encoding ' + feat + " len " + str(len(lbe)))
print("done")



lbe promotion_id
LabelEncoderExt fitting...
LabelEncoderExt transforming...
lbe session_id
LabelEncoderExt fitting...
LabelEncoderExt transforming...




lbe user_gender
LabelEncoderExt fitting...
LabelEncoderExt transforming...
lbe user_age_level
LabelEncoderExt fitting...
LabelEncoderExt transforming...
lbe user_purchase_level
LabelEncoderExt fitting...
LabelEncoderExt transforming...
done
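
`LabelEncoderExt` is imported from the repository's `models` package and its source is not shown in this notebook. As a rough illustration of what label encoding does, the sketch below maps each distinct value to an integer index and reserves index 0 for values not seen during fitting. `SimpleLabelEncoder` is a hypothetical stand-in, not the repository's implementation.

```python
class SimpleLabelEncoder:
    """Hypothetical encoder: value -> integer id, with 0 reserved for unknowns."""

    UNKNOWN = "<unk>"

    def fit(self, values):
        # Sort the distinct values so the mapping is deterministic.
        vocab = sorted({str(v) for v in values})
        self.classes_ = [self.UNKNOWN] + vocab
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Unseen values fall back to the reserved unknown id (0).
        return [self._index.get(str(v), 0) for v in values]

    def __len__(self):
        # Vocabulary size, usable as the embedding table size.
        return len(self.classes_)


enc = SimpleLabelEncoder().fit(["b", "a", "b"])
codes = enc.transform(["a", "b", "z"])
```

The length of the encoder is what an embedding layer would need as its number of rows.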


In [13]:
mean_kv = {}
std_kv = {}
for feat in dense_feature:
    print(feat)
    mean_kv[feat] = df[feat].mean()
    std_kv[feat] = df[feat].std()
    df[feat] = (df[feat] - mean_kv[feat]) / std_kv[feat]

voucher_min_spend
voucher_discount
user_trd__orders_cnt_hist
user_trd__actual_gmv_usd_hist
user_trd__orders_cnt_platform_discount_hist
user_trd__max_gmv_usd_hist
user_trd__avg_gmv_usd_hist
user_trd__min_gmv_usd_hist
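
The cell above standardizes each dense feature with the mean and standard deviation computed over all rows. A common variant, shown here only as a sketch and not as what this notebook does, fits the statistics on the training rows and reuses them for the test rows, with a guard for constant columns. The function names and the toy frame are assumptions for illustration.

```python
import pandas as pd


def fit_zscore(frame, columns):
    """Return {column: (mean, std)}; a zero or undefined std is replaced by 1."""
    stats = {}
    for col in columns:
        std = frame[col].std()
        stats[col] = (frame[col].mean(), std if std > 0 else 1.0)
    return stats


def apply_zscore(frame, stats):
    """Standardize the columns in `stats` and return a new frame."""
    out = frame.copy()
    for col, (mean, std) in stats.items():
        out[col] = (out[col] - mean) / std
    return out


toy = pd.DataFrame({
    "spend": [1.0, 2.0, 3.0, 10.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
    "dtype": ["train", "train", "train", "test"],
})
stats = fit_zscore(toy[toy["dtype"] == "train"], ["spend", "flat"])
scaled = apply_zscore(toy, stats)
```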


In [14]:
deep_ctr_df = df 

dctr_train = deep_ctr_df[deep_ctr_df['dtype'] == 'train']
dctr_v1 = deep_ctr_df[deep_ctr_df['dtype'] == 'test']

# Baseline Models

In [15]:
results={}

In [16]:
embedding_dim = 16
sparse_feature_columns = [SparseFeat(feat, len(label_encoder[feat].classes_), embedding_dim = embedding_dim) for feat in sparse_feature]
dense_feature_columns = [DenseFeat(feat, 1, ) for feat in dense_feature]

dnn_feature_columns = sparse_feature_columns + dense_feature_columns
linear_feature_columns = sparse_feature_columns + dense_feature_columns

## LR
LR: Logistic Regression [1] is a shallow model.

In [17]:
def gen_model_input_data(feature_names, raw_features, target):
    model_input = {}
    for name in feature_names:
        if name in sparse_feature:
            model_input[name] = raw_features[name]
        else:
            model_input[name] = raw_features[name].fillna(0).astype(np.float32)
    return raw_features[target], model_input

feature_names = get_feature_names(dense_feature_columns + sparse_feature_columns)
train_label, train_model_input = gen_model_input_data(feature_names, dctr_train, target)
test_label1, test_model_input1 = gen_model_input_data(feature_names, dctr_v1, target)

In [18]:
model_name = 'LR'
epoch = 50
batch_size = 300

model = WDL(linear_feature_columns=dnn_feature_columns,
            dnn_feature_columns=[], 
            dnn_use_bn=True,
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            init_std=1, 
            dnn_dropout=0.5, 
            task='binary', 
            dnn_activation='relu', 
            device=device)

model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.1, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy',metrics=['auc','logloss'])
model.fit(train_model_input,  train_label.values.astype(int), batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1)
res = model.evaluate(test_model_input1,test_label1)
pred1 = model.predict(test_model_input1)
results[model_name] = model, res, pred1, test_label1

cpu
Train on 46361 samples, validate on 11690 samples, 155 steps per epoch


155it [00:02, 70.88it/s]


Epoch 1/50
2s - loss:  0.5892 - auc:  0.6761 - logloss:  0.5889 - val_auc:  0.6808 - val_logloss:  0.5127


155it [00:01, 101.31it/s]


Epoch 2/50
1s - loss:  0.4733 - auc:  0.7048 - logloss:  0.4732 - val_auc:  0.6906 - val_logloss:  0.4478


155it [00:01, 101.15it/s]


Epoch 3/50
1s - loss:  0.4313 - auc:  0.7231 - logloss:  0.4311 - val_auc:  0.7018 - val_logloss:  0.4242


155it [00:01, 102.56it/s]


Epoch 4/50
1s - loss:  0.4148 - auc:  0.7346 - logloss:  0.4148 - val_auc:  0.7109 - val_logloss:  0.4142


155it [00:01, 102.26it/s]


Epoch 5/50
1s - loss:  0.4068 - auc:  0.7434 - logloss:  0.4068 - val_auc:  0.7177 - val_logloss:  0.4087


155it [00:01, 101.90it/s]


Epoch 6/50
1s - loss:  0.4018 - auc:  0.7484 - logloss:  0.4017 - val_auc:  0.7228 - val_logloss:  0.4050


155it [00:01, 101.52it/s]


Epoch 7/50
1s - loss:  0.3982 - auc:  0.7526 - logloss:  0.3983 - val_auc:  0.7270 - val_logloss:  0.4022


155it [00:01, 102.81it/s]


Epoch 8/50
1s - loss:  0.3954 - auc:  0.7560 - logloss:  0.3954 - val_auc:  0.7307 - val_logloss:  0.3998


155it [00:01, 92.58it/s]


Epoch 9/50
1s - loss:  0.3930 - auc:  0.7572 - logloss:  0.3932 - val_auc:  0.7333 - val_logloss:  0.3979


155it [00:01, 84.30it/s]


Epoch 10/50
1s - loss:  0.3911 - auc:  0.7604 - logloss:  0.3911 - val_auc:  0.7356 - val_logloss:  0.3962


155it [00:01, 83.57it/s]


Epoch 11/50
1s - loss:  0.3895 - auc:  0.7610 - logloss:  0.3895 - val_auc:  0.7372 - val_logloss:  0.3949


155it [00:01, 78.37it/s]


Epoch 12/50
2s - loss:  0.3881 - auc:  0.7628 - logloss:  0.3880 - val_auc:  0.7387 - val_logloss:  0.3937


155it [00:01, 85.34it/s]


Epoch 13/50
1s - loss:  0.3869 - auc:  0.7635 - logloss:  0.3870 - val_auc:  0.7398 - val_logloss:  0.3927


155it [00:01, 85.20it/s]


Epoch 14/50
1s - loss:  0.3859 - auc:  0.7647 - logloss:  0.3859 - val_auc:  0.7410 - val_logloss:  0.3919


155it [00:01, 83.97it/s]


Epoch 15/50
1s - loss:  0.3851 - auc:  0.7644 - logloss:  0.3852 - val_auc:  0.7419 - val_logloss:  0.3910


155it [00:01, 85.04it/s]


Epoch 16/50
1s - loss:  0.3844 - auc:  0.7652 - logloss:  0.3842 - val_auc:  0.7428 - val_logloss:  0.3904


155it [00:01, 86.32it/s]


Epoch 17/50
1s - loss:  0.3837 - auc:  0.7660 - logloss:  0.3839 - val_auc:  0.7437 - val_logloss:  0.3898


155it [00:01, 78.45it/s]


Epoch 18/50
2s - loss:  0.3832 - auc:  0.7662 - logloss:  0.3831 - val_auc:  0.7439 - val_logloss:  0.3892


155it [00:01, 83.94it/s]


Epoch 19/50
1s - loss:  0.3827 - auc:  0.7664 - logloss:  0.3828 - val_auc:  0.7438 - val_logloss:  0.3889


155it [00:01, 79.57it/s]


Epoch 20/50
2s - loss:  0.3823 - auc:  0.7664 - logloss:  0.3823 - val_auc:  0.7448 - val_logloss:  0.3884


155it [00:01, 86.14it/s]


Epoch 21/50
1s - loss:  0.3819 - auc:  0.7667 - logloss:  0.3818 - val_auc:  0.7450 - val_logloss:  0.3881


155it [00:01, 86.91it/s]


Epoch 22/50
1s - loss:  0.3816 - auc:  0.7666 - logloss:  0.3814 - val_auc:  0.7452 - val_logloss:  0.3878


155it [00:01, 87.91it/s]


Epoch 23/50
1s - loss:  0.3813 - auc:  0.7669 - logloss:  0.3814 - val_auc:  0.7455 - val_logloss:  0.3875


155it [00:01, 86.74it/s]


Epoch 24/50
1s - loss:  0.3811 - auc:  0.7668 - logloss:  0.3811 - val_auc:  0.7459 - val_logloss:  0.3872


155it [00:01, 86.64it/s]


Epoch 25/50
1s - loss:  0.3808 - auc:  0.7676 - logloss:  0.3808 - val_auc:  0.7462 - val_logloss:  0.3870


155it [00:01, 86.84it/s]


Epoch 26/50
1s - loss:  0.3806 - auc:  0.7680 - logloss:  0.3806 - val_auc:  0.7464 - val_logloss:  0.3868


155it [00:01, 85.66it/s]


Epoch 27/50
1s - loss:  0.3804 - auc:  0.7673 - logloss:  0.3805 - val_auc:  0.7462 - val_logloss:  0.3867


155it [00:01, 80.51it/s]


Epoch 28/50
2s - loss:  0.3803 - auc:  0.7673 - logloss:  0.3802 - val_auc:  0.7468 - val_logloss:  0.3865


155it [00:01, 85.80it/s]


Epoch 29/50
1s - loss:  0.3801 - auc:  0.7667 - logloss:  0.3801 - val_auc:  0.7468 - val_logloss:  0.3864


155it [00:01, 85.86it/s]


Epoch 30/50
1s - loss:  0.3800 - auc:  0.7674 - logloss:  0.3800 - val_auc:  0.7470 - val_logloss:  0.3862


155it [00:01, 85.95it/s]


Epoch 31/50
1s - loss:  0.3799 - auc:  0.7679 - logloss:  0.3799 - val_auc:  0.7470 - val_logloss:  0.3861


155it [00:01, 87.07it/s]


Epoch 32/50
1s - loss:  0.3798 - auc:  0.7671 - logloss:  0.3796 - val_auc:  0.7472 - val_logloss:  0.3860


155it [00:01, 87.11it/s]


Epoch 33/50
1s - loss:  0.3797 - auc:  0.7674 - logloss:  0.3800 - val_auc:  0.7472 - val_logloss:  0.3859


155it [00:01, 86.29it/s]


Epoch 34/50
1s - loss:  0.3796 - auc:  0.7673 - logloss:  0.3798 - val_auc:  0.7476 - val_logloss:  0.3857


155it [00:01, 87.05it/s]


Epoch 35/50
1s - loss:  0.3795 - auc:  0.7677 - logloss:  0.3793 - val_auc:  0.7475 - val_logloss:  0.3858


155it [00:01, 80.48it/s]


Epoch 36/50
2s - loss:  0.3794 - auc:  0.7681 - logloss:  0.3794 - val_auc:  0.7475 - val_logloss:  0.3857


155it [00:01, 86.08it/s]


Epoch 37/50
1s - loss:  0.3793 - auc:  0.7676 - logloss:  0.3795 - val_auc:  0.7474 - val_logloss:  0.3856


155it [00:01, 85.48it/s]


Epoch 38/50
1s - loss:  0.3793 - auc:  0.7676 - logloss:  0.3792 - val_auc:  0.7478 - val_logloss:  0.3855


155it [00:01, 86.59it/s]


Epoch 39/50
1s - loss:  0.3792 - auc:  0.7673 - logloss:  0.3792 - val_auc:  0.7480 - val_logloss:  0.3855


155it [00:01, 87.07it/s]


Epoch 40/50
1s - loss:  0.3792 - auc:  0.7677 - logloss:  0.3790 - val_auc:  0.7479 - val_logloss:  0.3855


155it [00:01, 86.48it/s]


Epoch 41/50
1s - loss:  0.3791 - auc:  0.7676 - logloss:  0.3792 - val_auc:  0.7480 - val_logloss:  0.3854


155it [00:01, 86.06it/s]


Epoch 42/50
1s - loss:  0.3790 - auc:  0.7687 - logloss:  0.3789 - val_auc:  0.7482 - val_logloss:  0.3853


155it [00:01, 84.61it/s]


Epoch 43/50
1s - loss:  0.3790 - auc:  0.7683 - logloss:  0.3790 - val_auc:  0.7480 - val_logloss:  0.3853


155it [00:01, 79.15it/s]


Epoch 44/50
2s - loss:  0.3790 - auc:  0.7683 - logloss:  0.3789 - val_auc:  0.7476 - val_logloss:  0.3853


155it [00:01, 83.61it/s]


Epoch 45/50
1s - loss:  0.3789 - auc:  0.7677 - logloss:  0.3789 - val_auc:  0.7478 - val_logloss:  0.3853


155it [00:01, 84.74it/s]


Epoch 46/50
1s - loss:  0.3789 - auc:  0.7677 - logloss:  0.3791 - val_auc:  0.7481 - val_logloss:  0.3852


155it [00:01, 85.53it/s]


Epoch 47/50
1s - loss:  0.3789 - auc:  0.7681 - logloss:  0.3786 - val_auc:  0.7482 - val_logloss:  0.3851


155it [00:01, 85.53it/s]


Epoch 48/50
1s - loss:  0.3788 - auc:  0.7687 - logloss:  0.3791 - val_auc:  0.7483 - val_logloss:  0.3851


155it [00:01, 85.25it/s]


Epoch 49/50
1s - loss:  0.3788 - auc:  0.7686 - logloss:  0.3787 - val_auc:  0.7484 - val_logloss:  0.3851


155it [00:02, 73.75it/s]


Epoch 50/50
2s - loss:  0.3788 - auc:  0.7683 - logloss:  0.3786 - val_auc:  0.7481 - val_logloss:  0.3852
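
The two metrics tracked in the log above, AUC and logloss, can be written out by hand to make their meaning concrete. The sketch below uses plain Python on made-up labels and scores; the function names are chosen for illustration, and the quadratic AUC loop is only suitable for small inputs.

```python
import math


def binary_log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = pairwise_auc(labels, scores)
loss_value = binary_log_loss(labels, scores)
```

AUC depends only on how the scores rank positives against negatives, while logloss also penalizes poorly calibrated probabilities, which is why both are reported.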


## GBDT
GBDT: Gradient Boosting Decision Tree [2] is used to assess the performance of non-deep-learning algorithms. Only dense features are used.

In [19]:
import xgboost as xgb
model_name = "xgBoost"
xgb_data_train = dctr_train
df_xgb_train =  xgb.DMatrix(xgb_data_train[dense_feature], 
                            label=xgb_data_train[target])

xgb_data_test1 = dctr_v1
df_xgb_test1 = xgb.DMatrix(xgb_data_test1[dense_feature],
                            label=xgb_data_test1[target])

params = {'objective': 'binary:logistic',
          'eval_metric': ['auc', 'logloss'],
          'learning_rate': 0.06,
          'num_leaves':256,  # LightGBM-style name; XGBoost does not recognize it (its counterpart is max_leaves)
          'max_depth':7,
          'max_bin':64}

evallist = [(df_xgb_test1, 'eval'), (df_xgb_train, 'train')]
num_boost_round = 130
# Keep the trained booster under its own name so the `xgb` module is not shadowed
# (otherwise re-running this cell fails on xgb.DMatrix).
xgb_model = xgb.train(params,
                      df_xgb_train,
                      num_boost_round,
                      evallist)

xgb_pred_v1 = xgb_model.predict(df_xgb_test1)
xgb_label_v1 = xgb_data_test1[target]
auc_t1 = roc_auc_score(xgb_label_v1, xgb_pred_v1)
logloss_t1 = log_loss(xgb_label_v1, xgb_pred_v1)
res = {'eval_auc':auc_t1, "eval_logloss":logloss_t1}

results[model_name] = xgb_model, res, xgb_pred_v1, xgb_label_v1

Parameters: { "num_leaves" } might not be used.

  This could be a false alarm, with some parameters getting used by language bindings but
  then being mistakenly passed down to XGBoost core, or some parameter actually being used
  but getting flagged wrongly here. Please open an issue if you find any such cases.


[0]	eval-auc:0.74426	eval-logloss:0.66234	train-auc:0.75173	train-logloss:0.66225
[1]	eval-auc:0.74628	eval-logloss:0.63497	train-auc:0.75559	train-logloss:0.63484
[2]	eval-auc:0.74539	eval-logloss:0.61074	train-auc:0.75743	train-logloss:0.61042
[3]	eval-auc:0.74809	eval-logloss:0.58899	train-auc:0.76167	train-logloss:0.58850
[4]	eval-auc:0.74831	eval-logloss:0.56951	train-auc:0.76250	train-logloss:0.56880
[5]	eval-auc:0.74857	eval-logloss:0.55195	train-auc:0.76321	train-logloss:0.55111
[6]	eval-auc:0.74915	eval-logloss:0.53605	train-auc:0.76352	train-logloss:0.53515
[7]	eval-auc:0.74995	eval-logloss:0.52171	train-auc:0.76406	train-logloss:0.52070
[8]	eval-auc:0.75152	eval-l

## DNN
A Deep Neural Network (DNN) is used as the first baseline. It takes both the dense features and the embeddings of the sparse ID features as input.

In [33]:
model_name = 'DNN'
epoch = 20
batch_size = 300

model = WDL(linear_feature_columns=[], 
            dnn_feature_columns=dnn_feature_columns, 
            dnn_use_bn=True,
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            init_std=1, 
            dnn_dropout=0.5, 
            task='binary', 
            dnn_activation='relu', 
            device=device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.1, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy',metrics=['auc','logloss'])
model.fit(train_model_input,  train_label.values.astype(int), batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1)
res = model.evaluate(test_model_input1,test_label1)
pred1 = model.predict(test_model_input1)
results[model_name] = model, res, pred1, test_label1

cpu
Train on 46361 samples, validate on 11690 samples, 155 steps per epoch


1it [00:00, 33.92it/s]


RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

## WDL
The Wide and Deep model [3] is widely used in industrial applications. Compared with DNN, it adds a linear (wide) component alongside the deep component.
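
To make the relation between LR, DNN and WDL concrete, here is a minimal NumPy sketch of a Wide and Deep forward pass: the prediction is the sigmoid of a linear logit plus an MLP logit. It deliberately omits embeddings, batch normalization and dropout, so it is a simplification and not the `deepctr_torch` implementation; all names and shapes are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def wide_and_deep(x, wide_w, deep_w1, deep_b1, deep_w2, bias):
    """x: (batch, features). Returns predicted probabilities of shape (batch,)."""
    wide_logit = x @ wide_w                          # linear (wide) part
    hidden = np.maximum(0.0, x @ deep_w1 + deep_b1)  # one ReLU hidden layer
    deep_logit = hidden @ deep_w2                    # deep part
    return sigmoid(wide_logit + deep_logit + bias)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
wide_w = rng.normal(size=3)
deep_w1 = rng.normal(size=(3, 5))
deep_b1 = np.zeros(5)
deep_w2 = rng.normal(size=5)

full = wide_and_deep(x, wide_w, deep_w1, deep_b1, deep_w2, bias=0.1)
# Zeroing the deep output weights leaves plain logistic regression.
lr_only = wide_and_deep(x, wide_w, deep_w1, deep_b1, np.zeros(5), bias=0.1)
```

This loosely mirrors how the notebook builds its LR and DNN baselines from the same `WDL` class, by passing an empty list for `dnn_feature_columns` or `linear_feature_columns` respectively.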

In [None]:
model_name = 'WDL'
epoch = 20
batch_size = 300

model = WDL(linear_feature_columns=linear_feature_columns, 
            dnn_feature_columns=dnn_feature_columns, 
            dnn_use_bn=True,
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            init_std=1, 
            dnn_dropout=0.5, 
            task='binary', 
            dnn_activation='relu', 
            device=device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.05, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy',metrics=['auc','logloss'])
model.fit(train_model_input,  train_label.values.astype(int), batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1)

res = model.evaluate(test_model_input1,test_label1)
pred1 = model.predict(test_model_input1)

results[model_name] = model, res, pred1, test_label1

## DIN
Deep Interest Network [4] is an attention-based recommendation model that has proven successful at Alibaba. We use it as our second baseline, replacing the user’s historical item sequences with the user’s historical voucher sequences to adapt it to the VRR prediction task.
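
The core of DIN is attention pooling: the candidate voucher acts as a query, each historical voucher is scored against it, and the history is summarized as a weighted sum in which padded positions are masked out. The NumPy sketch below illustrates that idea under a simplifying assumption: it scores with a plain dot product, whereas DIN uses a small MLP attention unit. The function name and toy vectors are made up for illustration.

```python
import numpy as np


def masked_attention_pool(query, keys, keys_length, normalize=False):
    """query: (d,), keys: (T, d). Only the first `keys_length` keys are real."""
    scores = (keys @ query).astype(float)            # one score per position
    mask = np.arange(keys.shape[0]) < keys_length    # True for real positions
    if normalize and mask.any():
        # Softmax over the real positions only; padded ones get weight 0.
        masked = np.where(mask, scores, -np.inf)
        weights = np.exp(masked - scores[mask].max())
        weights = weights / weights.sum()
    else:
        # Raw scores as weights, with padded positions zeroed out.
        weights = np.where(mask, scores, 0.0)
    return weights @ keys                            # (d,)


query = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])  # last row is padding

pooled_raw = masked_attention_pool(query, keys, keys_length=2)
pooled_soft = masked_attention_pool(query, keys, keys_length=2, normalize=True)
pooled_empty = masked_attention_pool(query, keys, keys_length=0)
```

The `normalize=False` default corresponds in spirit to the `att_weight_normalization=False` setting used for the DIN model in this notebook, where the attention scores are not passed through a softmax.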

In [34]:
def get_xy_fd(dataset):
    print ("start generating x y data")
    
    dnn_feature_columns = []
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = 16) 
                            for feat in ['session_id','user_gender','user_age_level','user_purchase_level', 'promotion_id']]
    
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_promotion_id', len(label_encoder["promotion_id"]), embedding_dim=16), sequence_size)]
    
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in dense_feature]
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in ['keys_length']]

    feature_names = get_feature_names(dnn_feature_columns)
    print (feature_names)
    
    behavior_feature_list = ['promotion_id']

    y, x = gen_dmbgn_input_data(feature_names, dataset, target, label_encoder, 6, sparse_feature, hist_list_features)
    return x, y, dnn_feature_columns, behavior_feature_list

sequence_size = 6
embedding_dim1 = 16

x, y, dnn_feature_columns, behavior_feature_list = get_xy_fd(dctr_train)
test_model_input1, test_label1, _, _ = get_xy_fd(dctr_v1)
print("done")

start generating x y data
['session_id', 'user_gender', 'user_age_level', 'user_purchase_level', 'promotion_id', 'hist_promotion_id', 'voucher_min_spend', 'voucher_discount', 'user_trd__orders_cnt_hist', 'user_trd__actual_gmv_usd_hist', 'user_trd__orders_cnt_platform_discount_hist', 'user_trd__max_gmv_usd_hist', 'user_trd__avg_gmv_usd_hist', 'user_trd__min_gmv_usd_hist', 'keys_length']
handling hist_list_features Feature: hist_promotion_id


100%|██████████| 46361/46361 [00:00<00:00, 90532.25it/s] 


LabelEncoderExt transforming...
start generating x y data
['session_id', 'user_gender', 'user_age_level', 'user_purchase_level', 'promotion_id', 'hist_promotion_id', 'voucher_min_spend', 'voucher_discount', 'user_trd__orders_cnt_hist', 'user_trd__actual_gmv_usd_hist', 'user_trd__orders_cnt_platform_discount_hist', 'user_trd__max_gmv_usd_hist', 'user_trd__avg_gmv_usd_hist', 'user_trd__min_gmv_usd_hist', 'keys_length']
handling hist_list_features Feature: hist_promotion_id


100%|██████████| 11690/11690 [00:00<00:00, 110155.48it/s]

LabelEncoderExt transforming...





done
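
`gen_dmbgn_input_data` comes from `models.util` and is not shown here. Sequence features such as `hist_promotion_id` generally need to be turned from the comma-separated strings built earlier into fixed-length id lists before they can be fed to a `VarLenSparseFeat`. The sketch below is a hypothetical illustration of that step: the function name, the choice to keep the most recent entries, and the use of 0 as the padding id are all assumptions.

```python
def pad_history(hist, vocab, max_len=6, pad_id=0):
    """Turn 'a,b,c' into a fixed-length id list plus its true length."""
    tokens = [t for t in hist.split(",") if t]       # '' -> no history
    ids = [vocab.get(t, pad_id) for t in tokens]
    ids = ids[-max_len:]                             # keep the most recent entries
    length = len(ids)
    return ids + [pad_id] * (max_len - length), length


vocab = {"a": 1, "b": 2, "c": 3}
short_ids, short_len = pad_history("a,b", vocab, max_len=4)
empty_ids, empty_len = pad_history("", vocab, max_len=4)
long_ids, long_len = pad_history("a,b,c,a,b", vocab, max_len=4)
```

The returned length plays the role of `keys_length`: it tells the attention layer how many positions hold real history rather than padding.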


In [35]:
model_name = 'DIN'

model = DIN(dnn_feature_columns, 
            behavior_feature_list, 
            # target_emb_dim_aft=0, 
            device=device, 
            att_activation='prelu', 
            att_weight_normalization=False, 
            dnn_activation='relu', 
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            att_hidden_size=(64,), 
            init_std=1, 
            dnn_dropout=0.5, 
            dnn_use_bn=True)


model.embedding_dict['promotion_id'].requires_grad = True
model.embedding_dict['promotion_id'].weight.requires_grad = True

model.embedding_dict['hist_promotion_id'].requires_grad = True
model.embedding_dict['hist_promotion_id'].weight.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.05, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy', metrics=['auc', 'logloss'])

loss_func = model.loss_func
optim = model.optim
metrics = model.metrics
feature_index = model.feature_index

model

DIN(
  (embedding_dict): ModuleDict(
    (session_id): Embedding(58052, 16)
    (user_gender): Embedding(4, 16)
    (user_age_level): Embedding(10, 16)
    (user_purchase_level): Embedding(12, 16)
    (promotion_id): Embedding(462, 16)
    (hist_promotion_id): Embedding(462, 16)
  )
  (linear_model): Linear(
    (embedding_dict): ModuleDict()
  )
  (out): PredictionLayer()
  (attention): HistAttentionSeqPoolingLayer(
    (local_att): AttentionUnit(
      (dnn): DNN(
        (dropout): Dropout(p=0, inplace=False)
        (linears): ModuleList(
          (0): Linear(in_features=64, out_features=64, bias=True)
        )
        (activation_layers): ModuleList(
          (0): PReLU(num_parameters=1)
        )
      )
      (dense): Linear(in_features=64, out_features=1, bias=True)
    )
  )
  (dnn): DNN(
    (dropout): Dropout(p=0.5, inplace=False)
    (linears): ModuleList(
      (0): Linear(in_features=104, out_features=128, bias=True)
      (1): Linear(in_features=128, out_features=64, 

In [36]:
device_ids = [i for i in range(torch.cuda.device_count())]
device_count = len(device_ids)
if torch.cuda.device_count() > 1:
    print("Use", torch.cuda.device_count(), "GPUs!")
    model = torch.nn.DataParallel(model, device_ids)
    
model = model.to(device)

In [37]:
epoch = 20
batch_size = 300
res, pred1 = fit(model, feature_index, optim, metrics, loss_func, x, y, batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1, device=device, device_count=device_count)

results[model_name] = model, res, pred1, test_label1

Train on 46361 samples, validate on 11690 samples, 155 steps per epoch


155it [00:03, 47.30it/s]


Epoch 1/20
Epoch Time : 3
3s - loss:  0.4593 - auc:  0.6290 - logloss:  0.4590 - val_auc:  0.7100 - val_logloss:  0.4010


155it [00:03, 49.40it/s]


Epoch 2/20
Epoch Time : 3
3s - loss:  0.3933 - auc:  0.7319 - logloss:  0.3930 - val_auc:  0.7457 - val_logloss:  0.3845


155it [00:03, 49.96it/s]


Epoch 3/20
Epoch Time : 3
3s - loss:  0.3807 - auc:  0.7584 - logloss:  0.3803 - val_auc:  0.7609 - val_logloss:  0.3768


155it [00:03, 49.74it/s]


Epoch 4/20
Epoch Time : 3
3s - loss:  0.3731 - auc:  0.7730 - logloss:  0.3728 - val_auc:  0.7646 - val_logloss:  0.3752


155it [00:03, 50.96it/s]


Epoch 5/20
Epoch Time : 3
3s - loss:  0.3682 - auc:  0.7818 - logloss:  0.3679 - val_auc:  0.7697 - val_logloss:  0.3737


155it [00:03, 50.15it/s]


Epoch 6/20
Epoch Time : 3
3s - loss:  0.3644 - auc:  0.7867 - logloss:  0.3642 - val_auc:  0.7718 - val_logloss:  0.3719


155it [00:03, 49.69it/s]


Epoch 7/20
Epoch Time : 3
3s - loss:  0.3600 - auc:  0.7948 - logloss:  0.3598 - val_auc:  0.7714 - val_logloss:  0.3707


155it [00:03, 47.23it/s]


Epoch 8/20
Epoch Time : 3
3s - loss:  0.3566 - auc:  0.8003 - logloss:  0.3564 - val_auc:  0.7768 - val_logloss:  0.3684


155it [00:03, 49.00it/s]


Epoch 9/20
Epoch Time : 3
3s - loss:  0.3533 - auc:  0.8058 - logloss:  0.3532 - val_auc:  0.7761 - val_logloss:  0.3687


155it [00:03, 49.17it/s]


Epoch 10/20
Epoch Time : 3
3s - loss:  0.3490 - auc:  0.8110 - logloss:  0.3489 - val_auc:  0.7783 - val_logloss:  0.3704


155it [00:03, 48.08it/s]


Epoch 11/20
Epoch Time : 3
3s - loss:  0.3460 - auc:  0.8164 - logloss:  0.3463 - val_auc:  0.7716 - val_logloss:  0.3717


155it [00:03, 46.49it/s]


Epoch 12/20
Epoch Time : 3
3s - loss:  0.3415 - auc:  0.8246 - logloss:  0.3415 - val_auc:  0.7771 - val_logloss:  0.3708


155it [00:03, 47.47it/s]


Epoch 13/20
Epoch Time : 3
3s - loss:  0.3337 - auc:  0.8329 - logloss:  0.3338 - val_auc:  0.7723 - val_logloss:  0.3738


155it [00:03, 47.93it/s]


Epoch 14/20
Epoch Time : 3
3s - loss:  0.3238 - auc:  0.8484 - logloss:  0.3237 - val_auc:  0.7674 - val_logloss:  0.3868


155it [00:03, 48.14it/s]


Epoch 15/20
Epoch Time : 3
3s - loss:  0.3064 - auc:  0.8688 - logloss:  0.3065 - val_auc:  0.7647 - val_logloss:  0.3901


155it [00:03, 44.60it/s]


Epoch 16/20
Epoch Time : 3
3s - loss:  0.2729 - auc:  0.9015 - logloss:  0.2728 - val_auc:  0.7461 - val_logloss:  0.4346


155it [00:03, 47.79it/s]


Epoch 17/20
Epoch Time : 3
3s - loss:  0.2097 - auc:  0.9439 - logloss:  0.2096 - val_auc:  0.7324 - val_logloss:  0.5094


155it [00:03, 43.02it/s]


Epoch 18/20
Epoch Time : 3
3s - loss:  0.1313 - auc:  0.9755 - logloss:  0.1311 - val_auc:  0.7107 - val_logloss:  0.6225


155it [00:03, 48.53it/s]


Epoch 19/20
Epoch Time : 3
3s - loss:  0.0694 - auc:  0.9913 - logloss:  0.0693 - val_auc:  0.7056 - val_logloss:  0.6991


155it [00:03, 47.91it/s]


Epoch 20/20
Epoch Time : 3
3s - loss:  0.0341 - auc:  0.9972 - logloss:  0.0340 - val_auc:  0.7061 - val_logloss:  0.8405


# Proposed Method: DMBGN

## DMBGN-AvgPooling
This variant does not use Higher-order Graph Neural Networks to model user-voucher-item relationships. Instead, it directly averages the pre-trained item embeddings of the user behaviors that happen before and after voucher collection. For the target UVG, only the pre-collection item embeddings are averaged.
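
As a rough illustration of the pooling described above, the following dependency-free sketch averages toy item embeddings separately for pre- and post-collection behavior and concatenates the two means. The helper `mean_pool`, the 2-dimensional vectors, and the zero-vector fallback for an empty side are assumptions made for this example; the notebook cells below work on 16-dimensional pre-trained embeddings with NumPy.

```python
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of the vectors; a zero vector if there are none (assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy item embeddings (hypothetical values, dim = 2).
before_items = [[1.0, 3.0], [3.0, 5.0]]   # behavior before voucher collection
after_items = [[2.0, 2.0]]                # behavior after voucher collection

emb_before = mean_pool(before_items, dim=2)
emb_after = mean_pool(after_items, dim=2)

# Historical UVG: concatenate both sides. Target UVG: pre-collection side only.
hist_uvg_emb = emb_before + emb_after
target_uvg_emb = emb_before

print(hist_uvg_emb)    # [2.0, 4.0, 2.0, 2.0]
print(target_uvg_emb)  # [2.0, 4.0]
```

The concatenated vector is twice as wide as a single item embedding, which is why the historical session embedding table in this variant has 32 columns while the item embeddings have 16.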

In [38]:
def process_pretrain_emb(emb, emb_size = 16, spliter = " "):
    return np.zeros(emb_size, dtype=np.float32) if emb == 'nan' else np.array(emb.split(spliter), dtype=np.float32)

item_df[['atc_emb', 'ord_emb']] = item_df[['atc_emb', 'ord_emb']].astype('str')
item_emb_dict = {}
for index, row in tqdm(item_df.iterrows()):
    item_emb_dict[row['item_id']] = process_pretrain_emb(row['atc_emb']), process_pretrain_emb(row['ord_emb'])

286735it [00:15, 18552.69it/s]


In [39]:
emb_size = 16
sid_emb_dict_tmp = {}

session_df['sid_enc'] = label_encoder['session_id'].transform(session_df['session_id'])

for index, row in tqdm(session_df.iterrows()):
    sid = row['sid_enc']
    if sid not in sid_emb_dict_tmp:
        sid_emb_dict_tmp[row['sid_enc']] = np.zeros(emb_size, dtype=np.float32), np.zeros(emb_size, dtype=np.float32), 0, 0
    
    if row['rk'] > 6:
        continue
    
    emb_bef, emb_aft, cnt_bef, cnt_aft = sid_emb_dict_tmp.get(sid)
    item_atc_emb, item_ord_emb = item_emb_dict.get(row['item_id'])
    if row['type'] == 'bef':
        emb_bef += item_atc_emb if row['action_type'] == 'cart' else item_ord_emb
        cnt_bef += 1
    else:
        emb_aft += item_atc_emb if row['action_type'] == 'cart' else item_ord_emb
        cnt_aft += 1
    sid_emb_dict_tmp[sid] = emb_bef, emb_aft, cnt_bef, cnt_aft

sid_emb_dict = {}
for sid, value in tqdm(sid_emb_dict_tmp.items()):
    emb_bef, emb_aft, cnt_bef, cnt_aft = value
    # max(cnt, 1) avoids 0/0 = NaN when a session has no behavior on one side
    sid_emb_dict[sid] = np.concatenate((emb_bef / max(cnt_bef, 1), emb_aft / max(cnt_aft, 1)), axis=0)

sid_emb_dict_bef = {}
for sid, value in tqdm(sid_emb_dict_tmp.items()):
    emb_bef, emb_aft, cnt_bef, cnt_aft = value
    sid_emb_dict_bef[sid] = emb_bef / max(cnt_bef, 1)

LabelEncoderExt transforming...


1118593it [00:51, 21626.99it/s]
100%|██████████| 58052/58052 [00:00<00:00, 183708.32it/s]
100%|██████████| 58052/58052 [00:00<00:00, 421114.80it/s]


In [40]:
def init_emb_ts(emb_dic, requires_grad = False, emb_size = 16, lbe = None):
    if lbe is None:
        raise Exception("Encoder is empty")
        
    indices = lbe.transform([str(val) for val in emb_dic.keys()])
    session_size = int(len(lbe))
    
    ts_emb = torch.rand(session_size, emb_size, dtype = torch.float)
    for i, (key, emb) in tqdm(enumerate(emb_dic.items())):
        ts_emb[indices[i]] = torch.FloatTensor(emb)
    emb_ts = torch.nn.Embedding.from_pretrained(ts_emb)
    emb_ts.weight.requires_grad = requires_grad
    return emb_ts

In [41]:
sid_emb_ts = init_emb_ts(sid_emb_dict, emb_size = 32, lbe=label_encoder['session_id'])
sid_emb_ts

LabelEncoderExt transforming...


58052it [00:00, 91819.28it/s]


Embedding(58052, 32)

In [42]:
def get_xy_fd(dataset):
    print ("start generating x y data")
    
    dnn_feature_columns = []
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = embedding_dim1) for feat in ['user_gender','user_age_level','user_purchase_level']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder['promotion_id']), embedding_dim = embedding_dim2) for feat in ['session_id']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = embedding_dim2) for feat in ['promotion_id']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder['session_id']), embedding_dim = embedding_dim1) for feat in ['sid']]
    
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_promotion_id', len(label_encoder["promotion_id"]), embedding_dim=embedding_dim2), sequence_size)]
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_session_id', len(label_encoder["session_id"]), embedding_dim=embedding_dim2), sequence_size)]

    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in dense_feature]
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in ['keys_length']]

    feature_names = get_feature_names(dnn_feature_columns)
    print ("feature_names:", feature_names)
    
    behavior_feature_list = ['promotion_id', 'session_id']
    dataset['sid'] = dataset['session_id']
    y, x = gen_dmbgn_input_data(feature_names, dataset, target, label_encoder, 6, sparse_feature, hist_list_features)
    x['session_id'] = x['promotion_id']
    return x, y, dnn_feature_columns, behavior_feature_list

In [43]:
sequence_size = 6
embedding_dim1 = 16 # pid
embedding_dim2 = sid_emb_ts.weight.shape[1] #sid
print(embedding_dim2)

x, y, dnn_feature_columns, behavior_feature_list = get_xy_fd(dctr_train)
test_model_input1, test_label1, _, _ = get_xy_fd(dctr_v1)
print("done")

32
start generating x y data
feature_names: ['user_gender', 'user_age_level', 'user_purchase_level', 'session_id', 'promotion_id', 'sid', 'hist_promotion_id', 'hist_session_id', 'voucher_min_spend', 'voucher_discount', 'user_trd__orders_cnt_hist', 'user_trd__actual_gmv_usd_hist', 'user_trd__orders_cnt_platform_discount_hist', 'user_trd__max_gmv_usd_hist', 'user_trd__avg_gmv_usd_hist', 'user_trd__min_gmv_usd_hist', 'keys_length']
handling hist_list_features Feature: hist_promotion_id


100%|██████████| 46361/46361 [00:00<00:00, 92672.55it/s] 


LabelEncoderExt transforming...
handling hist_list_features Feature: hist_session_id


100%|██████████| 46361/46361 [00:00<00:00, 96228.61it/s] 


LabelEncoderExt transforming...
start generating x y data
feature_names: ['user_gender', 'user_age_level', 'user_purchase_level', 'session_id', 'promotion_id', 'sid', 'hist_promotion_id', 'hist_session_id', 'voucher_min_spend', 'voucher_discount', 'user_trd__orders_cnt_hist', 'user_trd__actual_gmv_usd_hist', 'user_trd__orders_cnt_platform_discount_hist', 'user_trd__max_gmv_usd_hist', 'user_trd__avg_gmv_usd_hist', 'user_trd__min_gmv_usd_hist', 'keys_length']
handling hist_list_features Feature: hist_promotion_id


100%|██████████| 11690/11690 [00:00<00:00, 82541.69it/s]

LabelEncoderExt transforming...





handling hist_list_features Feature: hist_session_id


100%|██████████| 11690/11690 [00:00<00:00, 91495.64it/s]

LabelEncoderExt transforming...





done


In [44]:
model_name = 'DMBGN_AvgPooling'

model = DMBGN(dnn_feature_columns, 
            behavior_feature_list, 
            target_emb_dim_aft=0, 
            sequence_size=6,
            device=device, 
            att_activation='prelu', 
            att_weight_normalization=False, 
            dnn_activation='relu', 
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            att_hidden_size=(64,), 
            init_std=1, 
            dnn_dropout=0.5, 
            dnn_use_bn=True)

model.embedding_dict['hist_session_id'] = sid_emb_ts
model.embedding_dict['hist_session_id'].requires_grad = False
model.embedding_dict['hist_session_id'].weight.requires_grad = False

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.05, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy', metrics=['auc', 'logloss'])


loss_func = model.loss_func
optim = model.optim
metrics = model.metrics
feature_index = model.feature_index

model

DMBGN(
  (embedding_dict): ModuleDict(
    (user_gender): Embedding(4, 16)
    (user_age_level): Embedding(10, 16)
    (user_purchase_level): Embedding(12, 16)
    (session_id): Embedding(462, 32)
    (promotion_id): Embedding(462, 32)
    (sid): Embedding(58052, 16)
    (hist_promotion_id): Embedding(462, 32)
    (hist_session_id): Embedding(58052, 32)
  )
  (linear_model): Linear(
    (embedding_dict): ModuleDict()
  )
  (out): PredictionLayer()
  (attention): HistAttentionSeqPoolingLayer(
    (local_att): AttentionUnit(
      (dnn): DNN(
        (dropout): Dropout(p=0, inplace=False)
        (linears): ModuleList(
          (0): Linear(in_features=256, out_features=64, bias=True)
        )
        (activation_layers): ModuleList(
          (0): PReLU(num_parameters=1)
        )
      )
      (dense): Linear(in_features=64, out_features=1, bias=True)
    )
  )
  (dnn): DNN(
    (dropout): Dropout(p=0.5, inplace=False)
    (linears): ModuleList(
      (0): Linear(in_features=209, out_

In [45]:
device_ids = [i for i in range(torch.cuda.device_count())]
device_count = len(device_ids)
if torch.cuda.device_count() > 1:
    print("Use", device_count, "GPUs!")
    model = torch.nn.DataParallel(model, device_ids)
    
model = model.to(device)

In [46]:
epoch = 20
batch_size = 300
res, pred1 = fit(model, feature_index, optim, metrics, loss_func, x, y, batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1, device=device, device_count=device_count)

results[model_name] = model, res, pred1, test_label1

Train on 46361 samples, validate on 11690 samples, 155 steps per epoch


155it [00:04, 38.55it/s]


Epoch 1/20
Epoch Time : 4
4s - loss:  0.5247 - auc:  0.6246 - logloss:  0.4686 - val_auc:  0.7161 - val_logloss:  0.3991


155it [00:03, 39.41it/s]


Epoch 2/20
Epoch Time : 4
4s - loss:  0.4366 - auc:  0.7438 - logloss:  0.3874 - val_auc:  0.7542 - val_logloss:  0.3805


155it [00:03, 39.18it/s]


Epoch 3/20
Epoch Time : 4
4s - loss:  0.4232 - auc:  0.7638 - logloss:  0.3775 - val_auc:  0.7642 - val_logloss:  0.3757


155it [00:04, 37.73it/s]


Epoch 4/20
Epoch Time : 4
4s - loss:  0.4152 - auc:  0.7749 - logloss:  0.3717 - val_auc:  0.7691 - val_logloss:  0.3725


155it [00:03, 41.07it/s]


Epoch 5/20
Epoch Time : 4
4s - loss:  0.4085 - auc:  0.7825 - logloss:  0.3674 - val_auc:  0.7698 - val_logloss:  0.3737


155it [00:03, 39.88it/s]


Epoch 6/20
Epoch Time : 4
4s - loss:  0.4037 - auc:  0.7881 - logloss:  0.3642 - val_auc:  0.7737 - val_logloss:  0.3706


155it [00:03, 39.12it/s]


Epoch 7/20
Epoch Time : 4
4s - loss:  0.3990 - auc:  0.7945 - logloss:  0.3601 - val_auc:  0.7728 - val_logloss:  0.3809


155it [00:03, 39.54it/s]


Epoch 8/20
Epoch Time : 4
4s - loss:  0.3953 - auc:  0.7997 - logloss:  0.3573 - val_auc:  0.7711 - val_logloss:  0.3744


155it [00:03, 39.35it/s]


Epoch 9/20
Epoch Time : 4
4s - loss:  0.3914 - auc:  0.8039 - logloss:  0.3545 - val_auc:  0.7729 - val_logloss:  0.3761


155it [00:04, 38.08it/s]


Epoch 10/20
Epoch Time : 4
4s - loss:  0.3878 - auc:  0.8091 - logloss:  0.3522 - val_auc:  0.7746 - val_logloss:  0.3703


155it [00:04, 37.93it/s]


Epoch 11/20
Epoch Time : 4
4s - loss:  0.3841 - auc:  0.8131 - logloss:  0.3487 - val_auc:  0.7730 - val_logloss:  0.3709


155it [00:04, 36.64it/s]


Epoch 12/20
Epoch Time : 4
4s - loss:  0.3818 - auc:  0.8166 - logloss:  0.3468 - val_auc:  0.7769 - val_logloss:  0.3723


155it [00:04, 38.12it/s]


Epoch 13/20
Epoch Time : 4
4s - loss:  0.3775 - auc:  0.8222 - logloss:  0.3425 - val_auc:  0.7721 - val_logloss:  0.3735


155it [00:04, 37.38it/s]


Epoch 14/20
Epoch Time : 4
4s - loss:  0.3720 - auc:  0.8295 - logloss:  0.3370 - val_auc:  0.7706 - val_logloss:  0.3791


155it [00:04, 37.73it/s]


Epoch 15/20
Epoch Time : 4
4s - loss:  0.3642 - auc:  0.8420 - logloss:  0.3295 - val_auc:  0.7629 - val_logloss:  0.3870


155it [00:04, 37.52it/s]


Epoch 16/20
Epoch Time : 4
4s - loss:  0.3466 - auc:  0.8627 - logloss:  0.3122 - val_auc:  0.7494 - val_logloss:  0.4106


155it [00:04, 37.62it/s]


Epoch 17/20
Epoch Time : 4
4s - loss:  0.3103 - auc:  0.9003 - logloss:  0.2756 - val_auc:  0.7308 - val_logloss:  0.4587


155it [00:04, 37.59it/s]


Epoch 18/20
Epoch Time : 4
4s - loss:  0.2410 - auc:  0.9494 - logloss:  0.2066 - val_auc:  0.7236 - val_logloss:  0.5424


155it [00:04, 36.63it/s]


Epoch 19/20
Epoch Time : 4
4s - loss:  0.1621 - auc:  0.9806 - logloss:  0.1273 - val_auc:  0.7061 - val_logloss:  0.7095


155it [00:04, 37.46it/s]


Epoch 20/20
Epoch Time : 4
4s - loss:  0.0978 - auc:  0.9942 - logloss:  0.0630 - val_auc:  0.6987 - val_logloss:  0.8358


## DMBGN-Pretrained
This variant reuses the Higher-order GNN weight parameters learned during the voucher embedding pre-training, as mentioned in Section 3.3. These weights are frozen: they are not updated further during main-task training.

In [None]:
gnn_session_df = session_df.copy()
sid_lbe = LabelEncoderExt()
gnn_session_df['session_id'] = sid_lbe.fit_transform(gnn_session_df['session_id'])
gnn_session_df = gnn_session_df.fillna(0)
gnn_session_df

### UVG Graphs
Load the related UVG graphs into an InMemoryDataset for training.

In [None]:
from torch_geometric.data import DataLoader
batch_size= 512

geometric_data_path = './data/voucher_geometric/'
processed_file_name = 'graph_cache'
gnn_dat = VoucherGraphDataset(root=geometric_data_path, processed_file_name=processed_file_name, gnn_session_df=gnn_session_df)

data_loader = DataLoader(gnn_dat, batch_size=batch_size)
data_loader
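
The internal layout of `VoucherGraphDataset` is not shown in this notebook. As a hypothetical illustration only, the sketch below groups toy session-log rows into the four behavior buckets implied by the `bef`/`aft` types and the cart/order actions used elsewhere in this notebook. The row layout, the `build_buckets` helper, and the bucket keys are assumptions, not the repository's actual graph construction.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (session_id, item_id, type, action_type) rows; all values are made up.
rows = [
    ("s1", 101, "bef", "cart"),
    ("s1", 102, "bef", "order"),
    ("s1", 103, "aft", "cart"),
    ("s2", 201, "bef", "cart"),
]


def build_buckets(log_rows) -> Dict[str, Dict[Tuple[str, str], List[int]]]:
    """Map each session to item lists keyed by (bef/aft, atc/ord)."""
    buckets: Dict[str, Dict[Tuple[str, str], List[int]]] = defaultdict(lambda: defaultdict(list))
    for session_id, item_id, phase, action in log_rows:
        # Same rule as the averaging cell: 'cart' is add-to-cart, anything else is an order.
        action_key = "atc" if action == "cart" else "ord"
        buckets[session_id][(phase, action_key)].append(item_id)
    return buckets


graphs = build_buckets(rows)
print(dict(graphs["s1"]))
```

Each session ends up with at most four item lists, one per combination of collection phase and action type.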

### GNN Networks
In this section, we first define the User-behavior Voucher Graph (UVG) network with VoucherGraphNet, then train it on the VoucherGraphDataset loaded through data_loader, and finally output the generated UVG embedding $e_{UVG}$ together with the UVG score $s_{UVG}$.

#### Get Pretrained Item Embeddings
Load the pretrained add-to-cart (atc) and order (ord) item embeddings into embedding tensors.

In [None]:
item_size = int(len(item_df) * 1.2)
emb_size = 16
atc_emb_ts = torch.zeros(item_size, emb_size, dtype = torch.double)
ord_emb_ts = torch.zeros(item_size, emb_size, dtype = torch.double)

for i in tqdm(range(len(item_df))):
    item_id = int(item_df['item_id'][i])
    idx = hash_func(item_id, item_size)
    
    if not pd.isna(item_df['atc_emb'][i]) and len(item_df['atc_emb'][i].split(' ')) > 3:
        atc_emb = [float(val) for val in item_df['atc_emb'][i].split(' ')]
        atc_emb_ts[idx] = torch.FloatTensor(atc_emb)
    
    if not pd.isna(item_df['ord_emb'][i]) and len(item_df['ord_emb'][i].split(' ')) > 3:
        ord_emb = [float(val) for val in item_df['ord_emb'][i].split(' ')]
        ord_emb_ts[idx] = torch.FloatTensor(ord_emb) 
        
atc_emb_ts = torch.nn.Embedding.from_pretrained(atc_emb_ts)
atc_emb_ts.weight.requires_grad = False
ord_emb_ts = torch.nn.Embedding.from_pretrained(ord_emb_ts) 
ord_emb_ts.weight.requires_grad = False

atc_emb_ts, ord_emb_ts

#### Train GNN
Train the UVG network with a Higher-order GNN. The AUC printed here represents the performance of the GNN alone, which can be regarded as another ablation study. The trained GNN is saved to the ./data/gnet.pretrained_{epochs}.bin file so that it can be loaded for fine-tuning later.

In [None]:
item_features = ['item_id', 'item_category_id', 'item_brand_id', 'item_price_level']
promotion_features = ['promotion_id', 'session_id', 'voucher_min_spend', 'voucher_discount_amount']
all_features = item_features + promotion_features + ['action_type', 'label']

after_prefix = 'a_'
before_prefix = 'b_'

In [None]:
torch.set_default_tensor_type(torch.FloatTensor)

emb_info = {
    'item_category_id': (6000, 1),
    'item_price_level': (300, 1),
    'promotion_id': (600, int(atc_emb_ts.weight.shape[1])),
    'voucher_min_spend': (500, 1),
    'voucher_discount_amount': (500, 1),
}

emb_dict = {
    'atc': atc_emb_ts,
    'ord': ord_emb_ts,
}

gnet = VoucherGraphNet(item_features, promotion_features, emb_info, emb_dict, 
                       gprefix = [before_prefix, after_prefix], gactions = ['atc', 'ord'],
                       device=device)
gnet = gnet.float().to(device)

optimizer = torch.optim.Adam(gnet.parameters(), lr=0.001)
crit = torch.nn.BCELoss()

epochs = 3
gnn_file = './data/gnet.pretrained_{epochs}.bin'.format(epochs=epochs)

train_ready = True if os.path.exists(gnn_file) else False

if train_ready is False:
    gstep = 0
    gnet.train()
    loader_cnt = 0
    for epoch in range(epochs):
        print("# Epoch - {epoch}".format(epoch=epoch))
        try:
            for dat in tqdm(data_loader):
                optimizer.zero_grad()
                graph_dicts = dat.graph_dict
                out = None
                valid_list = []
                gstep += 1
                for graph_dict in graph_dicts:
                    res, promotion_id, emb_promotion, session_id, emb_session_id = gnet(graph_dict)
                    if res is None or torch.isnan(res):
                        valid_list.append(False)
                        continue
                    valid_list.append(True)
                    out = res if out is None else torch.cat([out, res], dim=0)

                pred = out.cpu()
                lbe = dat.label.cpu()[valid_list]
                loss = crit(pred, lbe)
                loss.backward()
                optimizer.step()

                if gstep % 25 == 0:
                    auc = roc_auc_score(lbe.detach().numpy().ravel(), pred.detach().numpy().ravel())
                    print('GNN[{gstep}] - auc {auc}; loss {loss}'.format(auc=auc ,gstep=gstep ,loss=loss))
        except Exception as e:
            print(e)

    torch.save(gnet.state_dict(), gnn_file)

In [None]:
# Load the pretrained GNN network parameter
gnet = VoucherGraphNet(item_features, promotion_features, emb_info, emb_dict, 
                       gprefix = [before_prefix, after_prefix], gactions = ['atc', 'ord'], 
                       device=device)
gnet.load_state_dict(torch.load(gnn_file))
gnet = gnet.float().to(device)


# Use Before UVG Only
gnet_before = VoucherGraphNet(item_features, promotion_features, emb_info, emb_dict, device=device)
gnet_before.load_state_dict(torch.load(gnn_file))
gnet_before = gnet_before.float().to(device)
gnet_before.gprefix = [before_prefix, 'unknown'] 

gnet

#### Generate Pretrained UVG Embeddings
Generate a pretrained embedding from the UVG network for each UVG. Here gnet uses both 'bef' and 'aft' user behaviors, while gnet_before uses 'bef' user behaviors only. In the AUC output, gnet performs better than gnet_before, which indicates the importance of taking 'aft' user behaviors into account. This can be regarded as another ablation study.

In [None]:
import pickle

pickle_protocol = 4

gnet.eval()
gnet_before.eval()
gstep = 0


raw_sid_uvg_graphs_dic = {}

promotion_emb_dic = {}
session_emb_raw_dic = {}
session_emb_raw_dic_before = {}

gnn_pretrained_emb_file = 'data/gnet.pretrained.{epochs}.emb.pkl'.format(epochs=epochs)

gnn_emb_ready = True if os.path.exists(gnn_pretrained_emb_file) else False

def hash_func(ts, item_size):
    return ts % item_size

if gnn_emb_ready is False:
    session_emb_dic = {}
    sid_uvg_graphs_dic = {}
    session_emb_dic_before = {}
    for dat in DataLoader(gnn_dat, batch_size=batch_size * 30):
        graph_dicts = dat.graph_dict
        out = None
        out2 = None
        valid_list = []
        valid_list2 = []
        gstep += 1
        for graph_dict in graph_dicts :
            res, promotion_id, emb_promotion, session_id, emb_session_id = gnet(graph_dict)
            res_before, _, _, _, emb_session_id_before = gnet_before(graph_dict)
            if res is None or torch.isnan(res):
                # Keep both masks the same length as dat.label
                valid_list.append(False)
                valid_list2.append(False)
                continue

            valid_list.append(True)
            out = res if out is None else torch.cat([out, res], dim=0)

            promotion_emb_dic[promotion_id.unsqueeze(0).cpu().detach().numpy()[0]] = emb_promotion.cpu().detach().numpy()
            session_emb_dic[session_id.unsqueeze(0).cpu().detach().numpy()[0]] = emb_session_id.cpu().detach().numpy()
            sid_uvg_graphs_dic[session_id.unsqueeze(0).cpu().detach().numpy()[0]] = graph_dict
            
            if res_before is not None and not torch.isnan(res_before):
                session_emb_dic_before[session_id.unsqueeze(0).cpu().detach().numpy()[0]] = emb_session_id_before.cpu().detach().numpy()
                out2 = res_before if out2 is None else torch.cat([out2, res_before], dim=0)
                valid_list2.append(True)
            else :
                valid_list2.append(False)

        try:
            lbe = dat.label.cpu()[valid_list]
            lbe2 = dat.label.cpu()[valid_list2]
            pred = out.cpu()
            pred2 = out2.cpu()
            loss = crit(pred, lbe)
            loss2 = crit(pred2, lbe2)
            auc = roc_auc_score(lbe.detach().numpy().ravel(), pred.detach().numpy().ravel())
            auc2 = roc_auc_score(lbe2.detach().numpy().ravel(), pred2.detach().numpy().ravel())
            print("gstep = {gstep}; auc = {auc}; loss = {loss}; auc_before = {auc2}; loss_before = {loss2}".format(
                gstep=gstep, auc=auc, loss=loss, auc2=auc2, loss2=loss2))

        except Exception as e:
            print(e)

    for key, emb in tqdm(session_emb_dic.items()):
        session_emb_raw_dic[sid_lbe.label_encoder.classes_[key]] = emb

    # session_emb_raw_dic_before is used for target sessions, whose after-collection behaviors are not known beforehand
    for key, emb in tqdm(session_emb_dic_before.items()):
        session_emb_raw_dic_before[sid_lbe.label_encoder.classes_[key]] = emb

    for key, uvg in tqdm(sid_uvg_graphs_dic.items()):
        raw_sid_uvg_graphs_dic[sid_lbe.label_encoder.classes_[key]] = uvg
        
    with open(gnn_pretrained_emb_file, 'wb') as f:
        pickle.dump(promotion_emb_dic, f, pickle_protocol)
        pickle.dump(session_emb_raw_dic, f, pickle_protocol)
        pickle.dump(session_emb_raw_dic_before, f, pickle_protocol)
        pickle.dump(raw_sid_uvg_graphs_dic, f, pickle_protocol)

else :
    print('loading existing trained embeddings...')
    with open(gnn_pretrained_emb_file, "rb") as f:
        promotion_emb_dic = pickle.load(f)
        session_emb_raw_dic = pickle.load(f)
        session_emb_raw_dic_before = pickle.load(f)
        raw_sid_uvg_graphs_dic = pickle.load(f)
        
len(promotion_emb_dic), len(session_emb_raw_dic), len(session_emb_raw_dic_before), len(raw_sid_uvg_graphs_dic)
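
The cache file above stores four objects by calling `pickle.dump` repeatedly on one file handle, so they have to be read back with the same number of `pickle.load` calls in the same order. A minimal, self-contained sketch of that pattern follows; the temporary file, its name, and the placeholder dictionaries are assumptions for illustration.

```python
import os
import pickle
import tempfile

promotion_emb = {7: [0.1, 0.2]}          # placeholder contents
session_emb = {"sid_a": [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "emb_cache.pkl")  # hypothetical file name

    # Write: one dump per object, written sequentially to the same stream.
    with open(cache_path, "wb") as f:
        pickle.dump(promotion_emb, f, protocol=4)
        pickle.dump(session_emb, f, protocol=4)

    # Read: each load consumes exactly one object, in write order.
    with open(cache_path, "rb") as f:
        loaded_promotion_emb = pickle.load(f)
        loaded_session_emb = pickle.load(f)

print(loaded_promotion_emb == promotion_emb, loaded_session_emb == session_emb)
```

Swapping the order of the two loads would silently assign each dictionary to the wrong name, so the read side should mirror the write side line by line.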

In [None]:
import math

labels = []
scores = []

labels_before = []
scores_before = [] 

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

for iii in tqdm(range(len(session_df))):
    sid = session_df['session_id'].values[iii]
    promotion_id = int(session_df['promotion_id'].values[iii])
    label = session_df['label'].values[iii]
    
    if sid not in session_emb_raw_dic:
        continue
    
    session_emb = session_emb_raw_dic[sid]
    promotion_emb = promotion_emb_dic[promotion_id]
            
    labels.append(label)
    score = sigmoid(np.matmul(session_emb, promotion_emb))
    scores.append(score)
    
    if sid in session_emb_raw_dic_before:
        session_emb_before = session_emb_raw_dic_before[sid]
        score_before = sigmoid(np.matmul(session_emb_before, promotion_emb))
        scores_before.append(score_before)
        labels_before.append(label)
            
auc = roc_auc_score(labels, scores)
auc_before = roc_auc_score(labels_before, scores_before)
len(scores), auc, auc_before
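
The evaluation cell above scores each session as the sigmoid of the inner product between its UVG embedding and the voucher embedding, then computes the AUC over all scored sessions. The following dependency-free sketch reproduces that computation on made-up 2-dimensional embeddings. `stable_sigmoid` and `pairwise_auc` are illustrative helpers (assumptions); the latter is a simple O(n²) stand-in for `roc_auc_score`.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Sigmoid that avoids overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_auc(labels: List[int], scores: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


voucher_emb = [1.0, -1.0]  # toy voucher embedding
session_embs = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.5], [-1.0, 0.0]]
labels = [1, 0, 1, 0]

scores = [stable_sigmoid(dot(e, voucher_emb)) for e in session_embs]
print(pairwise_auc(labels, scores))  # 1.0
```

The branch on the sign of `x` is a design choice for numerical safety: `math.exp` raises `OverflowError` for large positive arguments, which the plain `1 / (1 + math.exp(-x))` form would hit on strongly negative inner products.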

#### DMBGN Log Processing
Log processing for DMBGN training.

In [None]:
df = user_voucher_log.copy()
df = df.take(np.random.permutation(len(df)))
df['hist_sid'] = df['hist_session_id']

sparse_feature = ['promotion_id','session_id','user_gender','user_age_level','user_purchase_level'] #['promotion_id','voucher_min_spend','voucher_discount_amount']

hist_list_features = ['hist_promotion_id','hist_sid','keys_length']

ignore_features=['hist_session_id', 'dtype','venture','ds','user_id','label','voucher_collect_time','voucher_redeem_time','campaign_name','rk']

dense_feature = []
train_features = []

for feat in df.columns:
    if feat in ignore_features:
        continue
    if feat not in hist_list_features:
        train_features.append(feat)
    if feat not in hist_list_features and feat not in sparse_feature:
        dense_feature.append(feat)

target = 'label'
df[['session_id','promotion_id','user_gender','user_age_level','user_purchase_level']]=df[['session_id','promotion_id','user_gender','user_age_level','user_purchase_level']].astype('str')

In [None]:
label_encoder = {}
for feat in sparse_feature:
    df[feat] = df[feat].fillna(0)
    print("lbe {}".format(feat))
    lbe = LabelEncoderExt()
    lbe.fit(df[feat])
    df[feat] = lbe.transform(df[feat])
    label_encoder[feat] = lbe
    logging.warning('LabelEncoder encoding ' + feat + " len " + str(len(lbe)))
    
label_encoder['sid'] = label_encoder['session_id']
df['sid'] = df['promotion_id']
print("done")

In [None]:
mean_kv = {}
std_kv = {}
for feat in dense_feature:
    print(feat)
    mean_kv[feat] = df[feat].mean()
    std_kv[feat] = df[feat].std()
    df[feat] = (df[feat] - mean_kv[feat]) / std_kv[feat]

deep_ctr_df = df 
dctr_train = deep_ctr_df[deep_ctr_df.dtype == 'train']
dctr_v1 = deep_ctr_df[deep_ctr_df.dtype == 'test']
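
The normalization cell above standardizes every dense feature to zero mean and unit standard deviation. A minimal stdlib sketch of the same transform on a toy column follows. The `zscore` helper and the fallback for a constant column (where the standard deviation is zero) are assumptions added for illustration. `statistics.stdev` is the sample standard deviation, which matches the default of pandas `Series.std`.

```python
import statistics
from typing import List


def zscore(values: List[float]) -> List[float]:
    """Standardize values; a constant column maps to all zeros (assumption)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


voucher_min_spend = [10.0, 20.0, 30.0]  # toy values
print(zscore(voucher_min_spend))  # [-1.0, 0.0, 1.0]
```

Without the zero-variance guard, a constant feature would produce a division by zero here (and NaN values in the pandas version).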

In [None]:
def get_xy_fd(dataset):
    print ("start generating x y data")
    
    dnn_feature_columns = []
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = embedding_dim1) for feat in ['promotion_id', 'user_gender','user_age_level','user_purchase_level']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder['promotion_id']), embedding_dim = embedding_dim2) for feat in ['sid']]
    
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_promotion_id', len(label_encoder["promotion_id"]), embedding_dim=embedding_dim1), sequence_size)]
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_sid', len(label_encoder["sid"]), embedding_dim=embedding_dim2), sequence_size)]
    
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in dense_feature]
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in ['keys_length']]

    feature_names = get_feature_names(dnn_feature_columns)
    print ("feature_names:", feature_names)
    
    behavior_feature_list = ['promotion_id', 'sid']

    y, x = gen_dmbgn_input_data(feature_names, dataset, target, label_encoder, 6, sparse_feature, hist_list_features)
 
    return x, y, dnn_feature_columns, behavior_feature_list

sequence_size = 6
embedding_dim1 = 16 # pid
embedding_dim2 = 16 # sid
print(embedding_dim2)

x, y, dnn_feature_columns, behavior_feature_list = get_xy_fd(dctr_train)
test_model_input1, test_label1, _, _ = get_xy_fd(dctr_v1)
print("done")

In [None]:
session_emb_bef_aft_16 = init_emb_ts(session_emb_raw_dic, requires_grad = False, lbe = label_encoder['session_id'])
promotion_emb_bef_aft_16 = init_emb_ts(promotion_emb_dic, requires_grad = True, lbe = label_encoder['promotion_id'])

In [None]:
model_name = 'DMBGN_Pretrained'

model = DMBGN(dnn_feature_columns, 
            behavior_feature_list,
            target_emb_dim_aft=0, 
            sequence_size=6,
            device=device, 
            att_activation='prelu', 
            att_weight_normalization=False, 
            dnn_activation='relu', 
            l2_reg_dnn=0.1, 
            l2_reg_embedding = 0.0001, 
            dnn_hidden_units=(128,64), 
            att_hidden_size=(64,), 
            init_std=1, 
            dnn_dropout=0.5, 
            dnn_use_bn=True)


gnn_fine_tune = False
model.embedding_dict['hist_promotion_id'] = promotion_emb_bef_aft_16 
model.embedding_dict['hist_promotion_id'].requires_grad = gnn_fine_tune
model.embedding_dict['hist_promotion_id'].weight.requires_grad = gnn_fine_tune

model.embedding_dict['hist_sid'] = session_emb_bef_aft_16
model.embedding_dict['hist_sid'].requires_grad = gnn_fine_tune
model.embedding_dict['hist_sid'].weight.requires_grad = gnn_fine_tune


optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.05, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy', metrics=['auc', 'logloss'])


loss_func = model.loss_func
optim = model.optim
metrics = model.metrics
feature_index = model.feature_index

model

In [None]:
device_ids = [i for i in range(torch.cuda.device_count())]
device_count = len(device_ids)
if torch.cuda.device_count() > 1:
    print("Use", device_count, "GPUs!")
    model = torch.nn.DataParallel(model, device_ids)
    
model = model.to(device)

In [None]:
epoch = 20
batch_size = 300
res, pred1 = fit(model, feature_index, optim, metrics, loss_func, x, y, batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1, device=device, device_count=device_count)

results[model_name] = model, res, pred1, test_label1

## DMBGN
This is the model proposed in this work. It loads the pre-trained GNN network, including the item and voucher node embeddings, and further fine-tunes the GNN parameters according to the final training loss. \
Note that training DMBGN may take longer because it also involves training the GNN network. In our work, we used multiple GPUs to accelerate the training process.

In [None]:
def get_xy_fd(dataset):
    print ("start generating x y data")
    
    dnn_feature_columns = []
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = embedding_dim1) for feat in ['promotion_id', 'user_gender','user_age_level','user_purchase_level']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder['promotion_id']), embedding_dim = embedding_dim2) for feat in ['sid']]
    dnn_feature_columns += [SparseFeat(feat, len(label_encoder[feat]), embedding_dim = 1) for feat in ['session_id']]
    
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_promotion_id', len(label_encoder["promotion_id"]), embedding_dim=embedding_dim1), sequence_size)]
    dnn_feature_columns += [VarLenSparseFeat(SparseFeat('hist_sid', len(label_encoder["sid"]), embedding_dim=embedding_dim2), sequence_size)]
    
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in dense_feature]
    dnn_feature_columns += [DenseFeat(feat, 1, )  for feat in ['keys_length']]

    feature_names = get_feature_names(dnn_feature_columns)
    print ("feature_names:", feature_names)
    
    behavior_feature_list = ['promotion_id', 'sid']

    y, x = gen_dmbgn_input_data(feature_names, dataset, target, label_encoder, 6, sparse_feature, hist_list_features)
 
    return x, y, dnn_feature_columns, behavior_feature_list

sequence_size = 6
embedding_dim1 = 16 # pid
embedding_dim2 = 16 # sid
print(embedding_dim2)

x, y, dnn_feature_columns, behavior_feature_list = get_xy_fd(dctr_train)
test_model_input1, test_label1, _, _ = get_xy_fd(dctr_v1)
print("done")

In [None]:
session_emb_bef_aft_16 = init_emb_ts(session_emb_raw_dic, requires_grad = False, lbe = label_encoder['session_id'])
promotion_emb_bef_aft_16 = init_emb_ts(promotion_emb_dic, requires_grad = True, lbe = label_encoder['promotion_id'])

In [None]:
# Re-key the UVG graphs from raw session ids to their label-encoded ids.
hash_sid_uvg_graphs_dic = {}
hash_sids = label_encoder['session_id'].transform([str(val) for val in raw_sid_uvg_graphs_dic.keys()])
for hash_sid, uvgs in tqdm(zip(hash_sids, raw_sid_uvg_graphs_dic.values()), total=len(raw_sid_uvg_graphs_dic)):
    hash_sid_uvg_graphs_dic[hash_sid] = uvgs

len(hash_sid_uvg_graphs_dic)

In [None]:
model_name = 'DMBGN'

model = DMBGN(dnn_feature_columns, 
              behavior_feature_list,
              target_emb_dim_aft=0, 
              sequence_size=6,
              device=device, 
              att_activation='prelu', 
              att_weight_normalization=False, 
              dnn_activation='relu', 
              l2_reg_dnn=0.1, 
              l2_reg_embedding = 0.0001, 
              dnn_hidden_units=(128,64), 
              att_hidden_size=(64,), 
              init_std=1, 
              dnn_dropout=0.5, 
              dnn_use_bn=True,
              gnet_tune=True,
              hist_gnn_dropout=0.6, 
              gnet=gnet, 
              gnet_before=gnet_before,
              hash_sid_uvg_graphs_dic=hash_sid_uvg_graphs_dic)


# Replace the session_id embedding with a frozen all-zero table so that it carries no learned signal.
zeros = torch.zeros(len(label_encoder['session_id']), 1, dtype = torch.float)
model.embedding_dict['session_id'] = torch.nn.Embedding.from_pretrained(zeros)
model.embedding_dict['session_id'].requires_grad = False
model.embedding_dict['session_id'].weight.requires_grad = False

model.embedding_dict['hist_promotion_id'] = promotion_emb_bef_aft_16
model.embedding_dict['hist_promotion_id'].requires_grad = False
model.embedding_dict['hist_promotion_id'].weight.requires_grad = False

model.embedding_dict['hist_sid'] = session_emb_bef_aft_16
model.embedding_dict['hist_sid'].requires_grad = False
model.embedding_dict['hist_sid'].weight.requires_grad = False

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.05, amsgrad=True)
model.compile(optimizer, 'binary_crossentropy', metrics=['auc', 'logloss'])


loss_func = model.loss_func
optim = model.optim
metrics = model.metrics
feature_index = model.feature_index

model

In [None]:
device_ids = [i for i in range(torch.cuda.device_count())]
device_count = len(device_ids)
if torch.cuda.device_count() > 1:
    print("Use", torch.cuda.device_count(), "GPUs!")
    model = torch.nn.DataParallel(model, device_ids)
    
model = model.to(device)

In [None]:
epoch = 20
batch_size = 300
res, pred1 = fit(model, feature_index, optim, metrics, loss_func, x, y, batch_size=batch_size, epochs=epoch, validation_data=(test_model_input1,test_label1),verbose=1, device=device, device_count=device_count)

results[model_name] = model, res, pred1, test_label1

# Summary

In [None]:
results

In [None]:
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
lw = 1

colors = ["cornflowerblue", "darkorange", "red", "blue", "green", "violet", "black", "purple", "olive", "cyan"]

# Plot one ROC curve per model; cycle through the colors if there are more models than colors.
for i, (model_name, (model, res, pred1, test_label1)) in enumerate(results.items()):
    fpr, tpr, _ = roc_curve(test_label1.astype(int), pred1)
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, color=colors[i % len(colors)], lw=lw, label = model_name + ' ROC curve (area = ' + str(round(roc_auc, 4)) + ')')

# Draw the chance-level diagonal and set the axes once, after all curves are plotted.
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
    
plt.title('User-Voucher Redemption Dataset ROC')
plt.legend(loc="lower right")
plt.show()

In [None]:
def relaImpr(a, b):
    x = ((a-0.5)/(b-0.5) - 1.0)*100.0
    return str("%.2f" % x) + '%'

dnn_auc = results['DNN'][1]['eval_auc']
din_auc = results['DIN'][1]['eval_auc']

table = PrettyTable(['Model','AUC','RelaImpr(DNN)','RelaImpr(DIN)', 'Logloss'], digits = 4, rounds=True)
for model_name, (model, res, pred1, test_label1) in results.items():
    table.add_row([model_name, "%.4f" % res['eval_auc'], relaImpr(res['eval_auc'], dnn_auc), relaImpr(res['eval_auc'], din_auc), "%.4f" % res['eval_logloss']])
print(table)

Note that XGBoost achieves a higher AUC than DNN and WDL, mainly due to the small sample size.
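
The RelaImpr columns in the table measure the relative AUC gain over a baseline after subtracting the 0.5 AUC of a random predictor: ((AUC_model - 0.5) / (AUC_base - 0.5) - 1) × 100%. The sketch below works through the arithmetic. The AUC values are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
def rela_impr(model_auc: float, base_auc: float) -> float:
    """Relative AUC improvement over a baseline model, in percent."""
    return ((model_auc - 0.5) / (base_auc - 0.5) - 1.0) * 100.0


# Hypothetical AUC values, used only to illustrate the formula.
base_auc = 0.75
model_auc = 0.80

improvement = rela_impr(model_auc, base_auc)
print(f"{improvement:.2f}%")  # prints 20.00%
```

An absolute AUC gain of 0.05 over a 0.75 baseline therefore corresponds to a 20% relative improvement, since the baseline is only 0.25 above random guessing.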

# Reference

- [1] H. Brendan McMahan, Gary Holt, David Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, et al. 2013. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1222–1230.
- [2] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30 (2017), 3146–3154.
- [3] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. CoRR abs/1606.07792 (2016). arXiv:1606.07792 http://arxiv.org/abs/1606.07792
- [4] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.