[Auto Micro Batch] AUC is unstable when enabling auto_micro_batch in a DIN model implemented based on deepctr #164

Open
amare opened this issue Apr 15, 2022 · 0 comments

amare commented Apr 15, 2022

AUC is unstable when enabling auto_micro_batch in a DIN model implemented based on deepctr.

DeepRec Info

Built by myself; the commit id is 31f83623dde1a1d3792d7f41ba310b29e40abaa7, released as r1.15.5-deeprec2204.

Description

Everything is OK when using the default DeepRec environment: the AUC is consistently around 0.716 across multiple experiments. However, when the Auto Micro Batch feature is enabled, the AUC fluctuates within the range [0.71, 0.74] and training is also slower.
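
For reference, the two configurations are set up to have the same effective batch size. This reflects my understanding of micro_batch_num (please correct me if the semantics are different); the lines below are just the arithmetic behind the batch sizes used in the skeleton:

# assumption: each training step processes batch_size * micro_batch_num samples
effective_batch_baseline = 1024 * 1  # default environment
effective_batch_micro    = 512 * 2   # auto_micro_batch with micro_batch_num = 2, also 1024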

Below is the code skeleton:


import tensorflow as tf
import horovod.tensorflow as hvd

class DIN:
    # implemented based on [deepctr](https://github.com/shenweichen/DeepCTR)
    pass

def prepareDataSet(data_path, batch_size):
    # parsed by tf.data.Dataset with prefetch
    pass

def create_model(data_path='.', batch_size=512, learning_rate=0.01):
    parsed_dataset = prepareDataSet(data_path, batch_size)
    iterator = parsed_dataset.make_one_shot_iterator()
    input_features, label = iterator.get_next()
    label = tf.reshape(label, [-1, 1])

    output = DIN(input_features)  # sigmoid prediction, shape [batch_size, 1]

    # Scale the learning rate by the number of Horovod workers, as usual for synchronous data parallelism.
    optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate * hvd.size(), initial_accumulator_value=1e-30)
    optimizer = hvd.DistributedOptimizer(optimizer)

    loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)(label, output)
    global_step = tf.train.get_or_create_global_step()
    train_op = optimizer.minimize(loss, global_step=global_step)
    # Use the update op of the streaming metric so the AUC accumulates across steps.
    _, auc = tf.metrics.auc(label, output)

    return train_op, auc

def create_sess_config(deeprec_auto_micro_batch):
    sess_config = tf.ConfigProto()
    sess_config.gpu_options.allow_growth = False
    sess_config.gpu_options.visible_device_list = str(hvd.local_rank())

    if deeprec_auto_micro_batch:
        # Enable DeepRec Auto Micro Batch: run 2 micro batches per training step.
        sess_config.graph_options.optimizer_options.micro_batch_num = 2

    return sess_config

def train(deeprec_auto_micro_batch):
    # Halve the per-step batch size when auto micro batch is enabled, so the
    # effective batch size (512 * 2 micro batches) matches the 1024 baseline.
    batch_size = 512 if deeprec_auto_micro_batch else 1024
    train_op, auc = create_model(batch_size=batch_size)

    sess_config = create_sess_config(deeprec_auto_micro_batch=deeprec_auto_micro_batch)
    hooks = [
        hvd.BroadcastGlobalVariablesHook(0),
    ]
    with tf.train.MonitoredTrainingSession(hooks=hooks,
                                           config=sess_config) as mon_sess:
        fetches = {
            "train_op": train_op,
            'auc': auc,
        }

        while not mon_sess.should_stop():
            results = mon_sess.run(fetches)
            print(results['auc'])


if __name__ == "__main__":
    hvd.init()

    deeprec_auto_micro_batch = True
    train(deeprec_auto_micro_batch)
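
As a side note, when re-running the comparison I plan to fix the random seeds so that run-to-run noise is not mistaken for an effect of auto_micro_batch. A minimal sketch of what I mean (the seed value is arbitrary and not part of the runs reported above):

import tensorflow as tf

SEED = 2022  # arbitrary value, only for illustration

# Graph-level seed (TF 1.15 API): makes variable initializers deterministic.
tf.set_random_seed(SEED)

# Inside prepareDataSet, a deterministic shuffle would look like:
#     dataset = dataset.shuffle(buffer_size=10000, seed=SEED)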