This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Building a visualization tool for MXNet #4003

Closed
3 tasks done
zihaolucky opened this issue Nov 27, 2016 · 45 comments

Comments

@zihaolucky
Member

zihaolucky commented Nov 27, 2016

Hi hackers,

I've started working on building a visualization tool for MXNet, like TensorBoard for TensorFlow. As @piiswrong suggested in #3306, we should 'try to strip TensorBoard out of tensorflow', and I'm going to work in this direction. Here are some of my notes after reading TensorBoard's documentation and researching its usage on the web; feel free to comment below.

Motivation and some backgrounds

In my daily work I've visualized data using matplotlib and a bunch of helper tools like t-SNE, and I'm tired of rendering and adjusting the size/color of the images. Besides, it's not easy to share these results with colleagues.

TensorBoard, by contrast, provides good solutions for our daily use cases, such as learning curves and parameter/embedding visualization, and it makes results easy to share. See TensorBoard for more.

Daily use cases

  • Learning Curves. Visualize a scalar metric in training/testing such as accuracy, loss or some custom evaluation metrics.
  • Parameter insights. Visualize parameters such as CNN filters, or their histograms over time, to debug vanishing gradients.
  • Embedding visualization. Some applications visualize high-dimensional embedding data from a layer, like the last FullyConnected layer's output. In this case, t-SNE is commonly used.

I think these could satisfy most people, and they're already supported by TensorBoard via tf.scalar_summary, tf.image_summary, tf.histogram_summary and tensorboard.plugins.projector.

TensorBoard usage

Some snippets from a tutorial on how to use TensorBoard.

# create a summary for our cost and accuracy
tf.scalar_summary("cost", cross_entropy)
tf.scalar_summary("accuracy", accuracy)

# merge all summaries into a single "operation" which we can execute in a session 
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())
        
    # perform training cycles
    for epoch in range(training_epochs):
        
        # number of batches in one epoch
        batch_count = int(mnist.train.num_examples/batch_size)
        
        for i in range(batch_count):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            
            # perform the operations we defined earlier on batch
            _, summary = sess.run([train_op, summary_op], feed_dict={x: batch_x, y_: batch_y})
            
            # write log
            writer.add_summary(summary, epoch * batch_count + i)

The logic above is quite clear: the accuracy and cost are updated every time sess.run is called, returning a Summary that is fed into the log through SummaryWriter.

Feasibility

1. Is it easy to borrow and use directly in MXNet?

I've successfully visualized a made-up curve using the code below:

counter = tf.Variable(1.0)

# create a summary for counter
tf.scalar_summary("counter", counter)

# merge all summaries into a single "operation" which we can execute in a session
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())

    # perform training cycles
    for epoch in range(100):

        # perform the operations we defined earlier on batch
        counter.assign(epoch + np.random.standard_normal()).eval()
        summary = sess.run(summary_op)

        # write log
        writer.add_summary(summary, epoch)

This means we can pass in common types (here a numpy array and a plain int) and reuse most of the code. I'd like to discuss possible routes for creating an interface to connect MXNet and TensorBoard, and I need your advice. But let's keep it simple for now.
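As a starting point for that interface discussion, here is one possible shape, sketched under assumptions: every name below is hypothetical, not an existing MXNet API. The idea is a thin facade that takes plain Python numbers and delegates serialization to a pluggable backend, so a TensorBoard event-file writer can be swapped in later without touching training code.

```python
class ScalarLogger:
    """Thin logging facade on the MXNet side (hypothetical API, not real).

    It accepts plain Python numbers, so no TF session or ops are needed;
    the backend decides how to serialize (e.g. into TensorBoard event files).
    """

    def __init__(self, backend):
        # backend: anything exposing write_scalar(tag, value, step)
        self._backend = backend

    def log_scalar(self, tag, value, step):
        self._backend.write_scalar(tag, float(value), int(step))


class InMemoryBackend:
    """Stand-in backend for illustration; a real one would write event files."""

    def __init__(self):
        self.records = []

    def write_scalar(self, tag, value, step):
        self.records.append((tag, value, step))


# usage: log accuracy once per epoch, mirroring tf.scalar_summary + add_summary
backend = InMemoryBackend()
logger = ScalarLogger(backend)
for epoch in range(3):
    logger.log_scalar("accuracy", 0.8 + 0.05 * epoch, epoch)
```

The point of the backend split is that the training loop never sees TF or protobufs; only the backend would need to change if we adopt TensorBoard's format.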

2. Could it be stripped out on its own?

From this README, I guess TensorBoard could be built independently?

Or, if you are building from source:

bazel build tensorflow/tensorboard:tensorboard
./bazel-bin/tensorflow/tensorboard/tensorboard --logdir=path/to/logs

TODO

To prove we can use TensorBoard in a quick-and-dirty way:

  • Run several MXNet examples with minimal code from TF and visualize the outputs.

To keep our code clean and lightweight:

  • Verify whether TensorBoard can be built independently.

Or could we install the entire TF alongside MXNet? Is that acceptable?
I think it's workable but not good for our users, and it makes this visualization tool too heavy, because we would also be running TensorFlow's core code (the sess and Tensor.eval calls are actually computed by TF). But it depends on our checks; hard to tell.

Or is there any other workaround? Since the summary in writer.add_summary(summary, epoch * batch_count + i) is a proto, we could use SummaryWriter alone without TF's computation. This looks possible according to the docstring of SummaryWriter.add_summary:

  def add_summary(self, summary, global_step=None):
    """Adds a `Summary` protocol buffer to the event file.

    This method wraps the provided summary in an `Event` protocol buffer
    and adds it to the event file.

    You can pass the result of evaluating any summary op, using
    [`Session.run()`](client.md#Session.run) or
    [`Tensor.eval()`](framework.md#Tensor.eval), to this
    function. Alternatively, you can pass a `tf.Summary` protocol
    buffer that you populate with your own data. The latter is
    commonly done to report evaluation results in event files.

    Args:
      summary: A `Summary` protocol buffer, optionally serialized as a string.
      global_step: Number. Optional global step value to record with the
        summary.
    """

If we decide to borrow TensorBoard:

  • Decide where to put the interface and what to do next, based on our experiments above.
@piiswrong
Contributor

@leopd @mli

@piiswrong
Contributor

The way tensorboard works is that it takes a log file printed in a specific format and renders it.
So we don't necessarily need tf.scalar_summary and tf.Session to use it. We simply need to print logs in the same format and run tensorboard.

Here is what I think would be an ideal solution:

  1. We strip a minimal set of tensorboard-related files out of tensorflow and build them with Makefile or cmake, not bazel.
  2. We modify mxnet's logging format so that it conforms to tensorboard's.

But I haven't looked into this in depth, so this might be hard/impossible. Feel free to do whatever works for you first; we can discuss afterwards whether to merge it into mxnet or provide it as a separate solution.

@zihaolucky
Member Author

zihaolucky commented Nov 28, 2016

Yes, tensorboard only requires the proto of the logged results, but I haven't found an entry point to create a Summary object directly; it is returned by scalar_summary (a TensorFlow op), which means we have to call sess.run. I'm trying to work around this.

I would look into this in the coming two weeks.

@tqchen
Member

tqchen commented Nov 28, 2016

I think tensorboard is relatively isolated. Last time I looked at the code, only the proto of the log file was needed.

@leopd
Contributor

leopd commented Nov 28, 2016

My memory of using tensorboard is that those logfiles quickly get extremely large. Do people really share those logfiles with each other? It also made me worry that the huge amount of I/O would limit performance -- which would be more of an issue with MXNet than TF. So that's something else we can experiment/measure: what kind of IO bandwidth would be needed to produce these logfiles.
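For scalar summaries at least, the required bandwidth can be estimated with a rough sketch. The 16-byte figure below is the fixed per-record framing overhead of TF's event-file format; the 40-byte payload is an assumption for a serialized scalar Event proto (wall time, step, short tag, one float), not a measurement.

```python
# Fixed TFRecord framing cost per record: uint64 length + two masked crc32s.
FRAMING_OVERHEAD = 8 + 4 + 4

# Assumed average size of a serialized scalar Event proto; an estimate for
# illustration only, it varies with tag length.
APPROX_EVENT_PAYLOAD = 40

def scalar_log_bandwidth(num_scalars, steps_per_second):
    """Approximate bytes per second written when logging scalar summaries."""
    return num_scalars * steps_per_second * (FRAMING_OVERHEAD + APPROX_EVENT_PAYLOAD)

# e.g. 10 scalar summaries logged 100 times per second
rate = scalar_log_bandwidth(10, 100)  # 56000 bytes/s, about 55 KiB/s
```

Under these assumptions, scalar logging is negligible IO; image and histogram summaries carry far larger payloads and are the likelier source of the huge logfiles.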

@tqchen
Member

tqchen commented Nov 29, 2016

@zihaolucky
Member Author

@tqchen @jermainewang Thanks for the reference. I've found an API for scalar_summary that doesn't run a TensorFlow op here, but it still uses SummaryWriter and EventWriter for the tensorboard log file.

Although it only has scalar_summary now, they seem to be actively working on it. I've emailed Dan to ask whether they have any plans to support more types of summary, but I haven't gotten feedback yet.

@jermainewang
Contributor

MinPy's way of using tensorboard could be migrated to mxnet quite easily. There are three main components:

  1. Proto files: the summary proto and event proto. These can be used directly, and the summary writer can be borrowed directly from TF's Python side.
  2. EventWriter logic. TensorFlow has an EventWriter in C++, but it could easily be rewritten in Python.
  3. RecordFileWriter logic. After serializing the event proto, TF uses recordio to write to disk. This part should be replaced by our own implementation, and that's it.

We plan to put the code here: https://github.com/dmlc/minpy/tree/visualize/minpy/visualize . I'll ping you again once it's updated.
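The record framing in component 3 is small enough to sketch in pure Python. This follows the TFRecord record layout from TF's record_writer.cc (length, masked CRC-32C of the length bytes, data, masked CRC-32C of the data); the bitwise CRC here is a slow, dependency-free stand-in for TF's C++ implementation, just to show the format.

```python
import struct

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial (reflected form)

def crc32c(data):
    """Bitwise CRC-32C; slow but dependency-free (TF uses a C++ version)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def masked_crc32c(data):
    """TFRecord stores a rotated, offset CRC rather than the raw value."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def encode_record(data):
    """Frame one serialized Event proto the TFRecord way (all little-endian):
    uint64 length, uint32 masked crc of the length bytes, data,
    uint32 masked crc of the data."""
    header = struct.pack("<Q", len(data))
    return (header
            + struct.pack("<I", masked_crc32c(header))
            + data
            + struct.pack("<I", masked_crc32c(data)))
```

Appending `encode_record(event.SerializeToString())` for each event to a file named in TensorBoard's `events.out.tfevents.*` convention is essentially all the RecordFileWriter does.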

@jermainewang
Contributor

The related PR is still under review here: dmlc/minpy#87

@zihaolucky
Member Author

@jermainewang That's great!

@mufeili

mufeili commented Dec 1, 2016

Hi, I've finished the scalar summary part and am currently exploring image and histogram summaries. We did not plan to do audio and graph summaries for minpy, as minpy does not use a computational graph, but those should work for mxnet.

I also realized there is a new section in TensorBoard, added after the release of TensorFlow v0.12, for word embeddings, which is super cool: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization.

@zihaolucky
Member Author

Hey guys, I've finished the first item in the TODOs, with generous help from @mufeili and @jermainewang.

But it still requires a writer/RecordFileWriter from TF; I'll submit the code once I finish the writer.

@zihaolucky
Member Author

@mufeili Could you take a look at tensorflow/tensorflow#4181? In it, danmane said it's 'tfrecord' that does the file-writing job. I then dug into the code and found the relevant C++ code in tensorflow/core/lib/io/record_writer.cc and py_record_writer.cc; TensorFlow uses SWIG to wrap them for use in Python.

I think it's too hard to rewrite these in Python, as they have so many dependencies, and a Python rewrite wouldn't help other language bindings, which would force everyone to use the Python interface for visualization.

Can I just pull the related C++ files out, put them in the core library, and use SWIG or something else for the Python interface? @piiswrong could you give me some suggestions? What's your convention for writing a wrapper from C++ to Python?

@mufeili

mufeili commented Dec 8, 2016

@zihaolucky tensorflow/tensorflow/core/lib/io/record_writer.cc is exactly where I got stuck at first. We then decided to use tf.python_io.TFRecordWriter for the time being.

@zihaolucky
Member Author

@mufeili @piiswrong

Good news: someone has already implemented the record writer in Python, see https://github.com/TeamHG-Memex/tensorboard_logger

I migrated the code to MXNet and it works; now we can use TensorBoard without relying on TF. I've pushed the code to my branch https://github.com/zihaolucky/mxnet/tree/feature/tensorboard-support-experiment; please check it out.

@mufeili

mufeili commented Dec 10, 2016

@zihaolucky Awesome! I've had a quick look at it. I think it currently only supports scalar summaries as well, so I'm not sure whether the record_writer function would still work for other kinds of summaries. But many thanks!

@zihaolucky
Member Author

zihaolucky commented Dec 11, 2016

@mufeili It seems it could support other types of summaries too, since it writes a serialized event; it just only provides a scalar summary API.

https://github.com/TeamHG-Memex/tensorboard_logger/blob/master/tensorboard_logger/tensorboard_logger.py#L94-L103
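What that library does boils down to hand-encoding the Summary proto on the protobuf wire format, so no TF op or session is needed. A minimal sketch of the scalar case (field numbers taken from TF's summary.proto; the helper name is mine, not part of any library):

```python
import struct

def encode_scalar_summary(tag, value):
    """Hand-encode Summary{ value: [ Value{ tag, simple_value } ] } on the
    protobuf wire format. Valid for tags under 128 bytes, so each length
    varint fits in a single byte."""
    tag_bytes = tag.encode("utf-8")
    assert len(tag_bytes) < 128
    # Value.tag: field 1, wire type 2 (length-delimited string)
    inner = bytes([0x0A, len(tag_bytes)]) + tag_bytes
    # Value.simple_value: field 2, wire type 5 (fixed 32-bit float)
    inner += b"\x15" + struct.pack("<f", value)
    # Summary.value: field 1, wire type 2 (embedded Value message)
    return bytes([0x0A, len(inner)]) + inner
```

The same trick extends to the Event wrapper around the Summary, which is how the logger avoids running any TF computation at all; richer summary types (images, histograms) just mean encoding more fields.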

@terrytangyuan
Member

Great work - exciting to see the progress! Note that you probably need to include the necessary copyright information if you borrow the code from some other project.

@zihaolucky
Member Author

@terrytangyuan Thanks for the kind reminder; I'll do some research on the copyright issue.

@piiswrong
Contributor

piiswrong commented Dec 11, 2016 via email

@tqchen
Member

tqchen commented Dec 11, 2016

It would be necessary to copy the LICENSE file from the original repo and retain the copyright notice.

@zihaolucky
Member Author

Update, we now provide a PyPI package for TensorBoard fans :)

dmlc/tensorboard#19

@bravomikekilo
Contributor

I made a standalone tensorboard by extracting tensorboard's C++ dependencies from TensorFlow, so we don't need to build the whole of TensorFlow now. Meanwhile, we can use the TensorFlow file system API from Python through this reduced tensorflow library.
bravomikekilo/mxconsole

@zihaolucky
Member Author

@bravomikekilo great work! Any plan to ship it to dmlc/tensorboard? I believe you'll have to make it easy to maintain, as tensorboard changes often and new features keep coming in (as they said at the TF Dev Summit, they're going to provide a more flexible plugin module for tensorboard developers). That's why I focus on the logging part and try not to change the rendering part.

Just my personal opinion.

@bravomikekilo
Contributor

I mostly kept the structure of the tensorboard project, and I'm going to align the structure with the official tensorflow so we can sync changes. I have enabled logging support from C++, so it will be much faster and more reliable.

@bravomikekilo
Contributor

Or maybe we should merge dmlc/tensorboard into mxconsole, since most tensorboard functionality can be enabled from the reduced tensorflow? Meanwhile, we can split mxconsole into smaller modules. The reduced tensorflow can do much more.

@zihaolucky
Member Author

@piiswrong @jermainewang any thoughts?

@bravomikekilo
Contributor

I've already merged dmlc/tensorboard into bravomikekilo/mxconsole, including a summary API powered by the native library, plus a tutorial.
The tutorial works fine now. I will clean up the project tomorrow.

@piiswrong
Contributor

What's the benefit of extracting the code vs cloning tensorflow?

@bravomikekilo
Contributor

The library is smaller and easier to build.

@bravomikekilo
Contributor

Meanwhile, a smaller code base is much clearer and more portable.
The reduced tensorflow (tensorflow_fs) now contains only about 300 source files, while the original TensorFlow contains about 7000.
tensorflow_fs keeps the same project structure as TensorFlow, so it should be easy to sync changes.

@bravomikekilo
Contributor

@piiswrong

@zihaolucky
Member Author

Using the native library to support more language interfaces seems like a good idea, though maintainers would still have to write a wrapper, which is the same workload as writing a logging interface in Scala or any other language.

I encourage you to propose a roadmap for this code-extraction direction and point out some concrete benefits; otherwise, spending time on the 10% that differs while 90% stays the same is not a good idea.

@bravomikekilo
Contributor

OK, I will try to add back the interface files for Go and Java from the original TensorFlow.
Scala can use the Java interface easily, right?
R, Julia, JS, and MATLAB only have protobuf libraries, so we'd need to write the logging part ourselves.
For R, there is SWIG support; we should only need to change a few SWIG files to add native support.
For Julia and MATLAB, we may need to use the C interface.
@zihaolucky

@bravomikekilo
Contributor

Besides, the native library provides a faster implementation of crc32 and protobuf writing, and it's possible to merge in the native PNG encoding support.
Also, considering the differences between TensorFlow and MXNet, the graph and embedding rendering and logging may change a lot; without a standalone tensorboard, that may be hard to achieve.

@bravomikekilo
Contributor

Sadly, the Java and Go interfaces don't have summary support; maybe they just add the summary ops to the graph. It seems all the loggers still need to be written.

@zihaolucky
Member Author

Consider focusing on logging.

@bravomikekilo
Contributor

I can extract just the logging part; that is much smaller.

@bravomikekilo
Contributor

Maybe we should split the logging and the rendering?
The only remaining problems on the rendering side are the graph and embedding views.

@bravomikekilo
Contributor

So, to sum up, we have three ways to go:

  1. Fix the Python code in dmlc/tensorboard
    • easy to achieve
  2. Fix the Python code in dmlc/tensorboard to use the native library from the tensorboard build
    • easy to achieve
    • with native support
  3. Find a way to keep bravomikekilo/mxconsole in sync with changes in tensorflow/tensorflow

An optional extra step is to split tensorflow_fs out of mxconsole; that would make it easier to keep in sync.

@RogerBorras

@zihaolucky @bravomikekilo Are you planning to port Tensorboard to the mxnet R binding?? It would be great!! :)

@bravomikekilo
Contributor

bravomikekilo commented Mar 11, 2017

I'm not good at R, but I will try. It shouldn't be too hard.

@RogerBorras

Great, thanks a lot @bravomikekilo!

@lichen11

lichen11 commented Aug 2, 2017

@bravomikekilo, @zihaolucky, @RogerBorras @thirdwing it would be great if there were a visualization board for mxnetR!

@zihaolucky
Member Author

@lichen11 @bravomikekilo If you can figure out how to write the event file and the summary protobufs in R, then it can be done. Just refer to https://github.com/dmlc/tensorboard/tree/master/python and ping me if you need any help.
