This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add tensorboard support in Speedometer. #5345

Merged
merged 6 commits into apache:master on Mar 24, 2017

Conversation

@zihaolucky
Member

@piiswrong

Add TensorBoard logging support in Speedometer via an optional import. Any other suggestions for the callback functions? Should I change the Python setup.py and make dmlc/tensorboard a requirement?
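
For context, the optional-import pattern under discussion looks roughly like this. A minimal sketch, assuming the dmlc/tensorboard package exposes a SummaryWriter class:

    import logging

    try:
        # Optional dependency: only needed when TensorBoard logging is requested.
        from tensorboard import SummaryWriter
    except ImportError:
        SummaryWriter = None
        logging.warning('tensorboard is not installed; '
                        'TensorBoard logging will be disabled.')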

@zihaolucky
Member Author

demo

@zihaolucky
Member Author

zihaolucky commented Mar 14, 2017

@piiswrong I added this feature to Speedometer as it's the most commonly used callback function, and it also logs the evaluation metrics, which is convenient.

@piiswrong
Contributor

I think it's better to make a new file and add another callback. Maybe add it back to Speedometer after things have stabilized.

@zihaolucky
Member Author

Agreed. Can I create a tensorboard_callback.py and put this Speedometer (or rename it) there? BatchEndParam only provides scalar values, for logging training speed and evaluation metrics.
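
For reference, BatchEndParam is a plain namedtuple, so a callback only ever sees scalar (name, value) pairs through its eval_metric field. A minimal sketch, mirroring the namedtuple defined in mxnet/model.py:

    from collections import namedtuple

    # Mirrors the definition in mxnet/model.py.
    BatchEndParam = namedtuple('BatchEndParams',
                               ['epoch', 'nbatch', 'eval_metric', 'locals'])

    def print_metrics_callback(param):
        """A batch-end callback receives scalar (name, value) metric pairs."""
        if param.eval_metric is None:
            return
        for name, value in param.eval_metric.get_name_value():
            print('epoch %d, batch %d: %s=%f'
                  % (param.epoch, param.nbatch, name, value))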

@piiswrong
Contributor

Simply tensorboard.py is fine as the name. Also, it shouldn't log anything via the logging module. We need better documentation on the parameters.

@zihaolucky
Member Author

Refactored to log metrics only; it no longer mimics Speedometer. Added an example, with more detailed documentation about the API and TensorBoard as well.
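
The refactored callback ends up shaped roughly like the following. A sketch reconstructed from this thread (optional import, metrics-only logging, no output via the logging module), not a verbatim copy of the merged file:

    import logging

    class LogMetricsCallback(object):
        """Log evaluation metrics to a TensorBoard event file.

        Parameters
        ----------
        logging_dir : str
            Directory where TensorBoard event files are written.
        prefix : str, optional
            Prepended to each metric name, e.g. to distinguish train/eval runs.
        """
        def __init__(self, logging_dir, prefix=None):
            self.prefix = prefix
            try:
                from tensorboard import SummaryWriter
                self.summary_writer = SummaryWriter(logging_dir)
            except ImportError:
                logging.error('You can install tensorboard via '
                              '`pip install tensorboard`.')

        def __call__(self, param):
            """Invoked with a BatchEndParam; writes each metric as a scalar."""
            if param.eval_metric is None:
                return
            for name, value in param.eval_metric.get_name_value():
                if self.prefix is not None:
                    name = '%s-%s' % (self.prefix, name)
                self.summary_writer.add_scalar(name, value)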

@zihaolucky
Member Author

@piiswrong how about now?

@piiswrong
Contributor

Let's move this to mx.contrib

@zihaolucky
Member Author

Yep. Why is the Jenkins build failing?

@piiswrong
Contributor

lint:

    pylint python/mxnet --rcfile=/workspace/tests/ci_build/pylintrc
    ************* Module mxnet.contrib.init
    C: 10, 0: Final newline missing (missing-final-newline)
    ************* Module mxnet.contrib.tensorboard
    C: 29, 0: Line too long (101/100) (line-too-long)
    make: *** [pylint] Error 16

@zihaolucky
Member Author

All passed now. Any further ideas? I'm now working on graph and embedding visualization, which might take some time.

piiswrong merged commit 1550f17 into apache:master on Mar 24, 2017
@piiswrong
Contributor

Thanks. Merged.

@jmerkow
Contributor

jmerkow commented Mar 27, 2017

Can you provide example source code for this? I am attempting to use it with the image_classification examples and I don't get the graphs one would expect.

I used PyPI to install tensorboard:

    $ pip freeze | grep tensorboard
    tensorboard==1.0.0a6

I added/changed the following in common/fit.py in the image classification example to get a dummy test running.
(FYI, there is an error in the docstring for LogMetricsCallback: mx.tensorboard should be mx.contrib.tensorboard.)

    # starting around line 170 in image_classification/common/fit.py
    evaluation_log = 'logs/eval'
    training_log = 'logs/train'
    eval_end_callbacks = [mx.contrib.tensorboard.LogMetricsCallback(evaluation_log)]
    batch_end_callbacks += [mx.contrib.tensorboard.LogMetricsCallback(training_log)]
    # run
    model.fit(train,
        begin_epoch        = args.load_epoch if args.load_epoch else 0,
        num_epoch          = args.num_epochs,
        eval_data          = val,
        eval_metric        = eval_metrics,
        kvstore            = kv,
        optimizer          = args.optimizer,
        optimizer_params   = optimizer_params,
        initializer        = initializer,
        arg_params         = arg_params,
        aux_params         = aux_params,
        batch_end_callback = batch_end_callbacks, # This was updated
        eval_end_callback  = eval_end_callbacks, # This was added
        epoch_end_callback = checkpoint,
        allow_missing      = True,
        monitor            = monitor)

It appears that 'Step' is not being recorded properly. Attached are some screenshots.

[screenshot: TensorBoard scalar graph in step mode]

If you look at the relative graph it becomes clearer what's happening:

[screenshot: TensorBoard scalar graph in relative mode]

[screenshot: TensorBoard scalar graph in relative mode]

@zihaolucky
Member Author

@jmerkow thanks for your feedback!

To clarify, did you use tensorboard --logdir=logs/train or tensorboard --logdir=logs/? As far as I know, with tensorboard --logdir=logs/ the two runs should show up as two different colors in the graph.

You can use the prefix argument of LogMetricsCallback to plot the train and eval metrics in separate graphs:

    eval_end_callbacks = [mx.contrib.tensorboard.LogMetricsCallback(evaluation_log, prefix='eval')]
    batch_end_callbacks += [mx.contrib.tensorboard.LogMetricsCallback(training_log, prefix='train')]

Otherwise they end up in one graph, as in your case, and we then have to use relative mode rather than step mode. Any suggestions are welcome; let's make this better.

@jmerkow
Contributor

jmerkow commented Mar 28, 2017

I used logs/. It just never got to the eval callback; I killed it before it finished. I am just testing TensorBoard before using it to train on my data.
I solved it in my branch by adding the step manually, i.e. changing

    self.summary_writer.add_scalar(name, value)

to

    self.summary_writer.add_scalar(name, value, global_step=param.nbatch)
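
In callback form, the fix looks roughly like this. A sketch; note that param.nbatch resets at each epoch, so a monotonically increasing global counter may serve better as the step:

    def __call__(self, param):
        if param.eval_metric is None:
            return
        for name, value in param.eval_metric.get_name_value():
            if self.prefix is not None:
                name = '%s-%s' % (self.prefix, name)
            # Pass an explicit step so TensorBoard can order the points.
            self.summary_writer.add_scalar(name, value, global_step=param.nbatch)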

@ysh329
Contributor

ysh329 commented Jul 30, 2017

@jmerkow I ran into the same problem as you: I never got the eval callback curve, and the eval and train curves had the same color. I didn't understand what you meant; could you explain your code change below more clearly? Thanks a lot!

    self.summary_writer.add_scalar(name, value)

to

    self.summary_writer.add_scalar(name, value, global_step=param.nbatch)

Guneet-Dhillon pushed a commit to Guneet-Dhillon/mxnet that referenced this pull request Sep 13, 2017
* Add tensorboard support in Speedometer.

* fix pylint.

* Add tensorboard_callback.

* Refactor.

* fix lint.