Building a visualization tool for MXNet #4003

Open
zihaolucky opened this Issue Nov 27, 2016 · 21 comments


@zihaolucky
Contributor
zihaolucky commented Nov 27, 2016 edited

Hi hackers,

I've started working on building a visualization tool for MXNet, like TensorBoard for TensorFlow. As @piiswrong suggested in #3306, we could 'try to strip TensorBoard out of tensorflow', and I'm going to work in this direction. Here are some of my notes after reading TensorBoard's documentation and searching for its usage on the web; feel free to comment below.

Motivation and some backgrounds

In my daily work I've visualized data using matplotlib and a bunch of helper tools like t-SNE, and I'm tired of rendering and adjusting the size/color of the images. Besides, it's not easy to share these results with my friends.

TensorBoard, on the other hand, provides good solutions for these daily use cases, such as learning curves and parameter/embedding visualization, and its output is easy to share. See TensorBoard for more.

Daily use cases

  • Learning curves. Visualize a scalar metric during training/testing, such as accuracy, loss or a custom evaluation metric.
  • Parameter insights. Like a CNN's filters, or histograms of parameters over time, e.g. to debug vanishing gradients.
  • Embedding visualization. Some applications visualize the high-dimensional embedding data in a layer, like the last FullyConnected layer's output. In this case, t-SNE is commonly used.

I think these would satisfy most people, and they're already supported by TensorBoard via tf.scalar_summary, tf.image_summary, tf.histogram_summary and tensorboard.plugins.projector.
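For concreteness, the image and histogram variants look roughly like this (a sketch with made-up tensor names, using the TF 0.x API of the time):

# hedged sketch: image and histogram summaries; tensor names are made up
import os
import numpy as np
import tensorflow as tf

logdir = "/tmp/demo_logs"
if not os.path.isdir(logdir):
    os.makedirs(logdir)

images = tf.placeholder(tf.float32, [None, 28, 28, 1])
weights = tf.Variable(tf.truncated_normal([784, 10]))

tf.image_summary("inputs", images, max_images=3)
tf.histogram_summary("weights", weights)
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    writer = tf.train.SummaryWriter(logdir, graph=tf.get_default_graph())
    summary = sess.run(summary_op,
                       feed_dict={images: np.random.rand(3, 28, 28, 1).astype('float32')})
    writer.add_summary(summary, 0)
    writer.close()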

TensorBoard usage

Here are some snippets from a tutorial on how to use TensorBoard (the snippet assumes x, y_, cross_entropy, accuracy, train_op, mnist, logs_path, training_epochs and batch_size are defined as in the tutorial).

# create a summary for our cost and accuracy
tf.scalar_summary("cost", cross_entropy)
tf.scalar_summary("accuracy", accuracy)

# merge all summaries into a single "operation" which we can execute in a session 
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())
        
    # perform training cycles
    for epoch in range(training_epochs):
        
        # number of batches in one epoch
        batch_count = int(mnist.train.num_examples/batch_size)
        
        for i in range(batch_count):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            
            # perform the operations we defined earlier on batch
            _, summary = sess.run([train_op, summary_op], feed_dict={x: batch_x, y_: batch_y})
            
            # write log
            writer.add_summary(summary, epoch * batch_count + i)

The logic above is quite clear: the accuracy and cost get updated every time sess.run is called, and the run returns a Summary that is fed into the log through the SummaryWriter.

Feasibility

1. Is it easy to borrow and directly use in MXNet?

I've successfully visualized a 'made-up' curve using the code below:

import os
import numpy as np
import tensorflow as tf

logs_path = '/tmp/demo_logs'  # any directory for TensorBoard logs
if not os.path.isdir(logs_path):
    os.makedirs(logs_path)

counter = tf.Variable(1.0)

# create a summary for counter
tf.scalar_summary("counter", counter)

# merge all summaries into a single "operation" which we can execute in a session
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())

    # perform training cycles
    for epoch in range(100):

        # perform the operations we defined earlier on batch
        counter.assign(epoch + np.random.standard_normal()).eval()
        summary = sess.run(summary_op)

        # write log
        writer.add_summary(summary, epoch)

So it means we could pass in something common, here a numpy array and a plain int, and reuse most of the code. I'd like to discuss possible routes for creating an interface to connect MXNet and TensorBoard, and I need your advice. But let's keep it simple for now.

2. Could it be stripped out on its own?

From this README, I guess TensorBoard could be built independently:

Or, if you are building from source:

bazel build tensorflow/tensorboard:tensorboard
./bazel-bin/tensorflow/tensorboard/tensorboard --logdir=path/to/logs

TODO

To prove we can use TensorBoard in a quick-and-dirty way:

  • Run several MXNet examples with minimal code from TF and visualize the outputs.

To keep our code clean and lightweight:

  • Verify whether TensorBoard can be built independently.

Or could we install the entire TF together with MXNet? Is that acceptable?
I think it's okay, but it's not good for our users and makes this visualization tool too heavy, because we'd also be running TensorFlow's core code (the sess and Tensor.eval are actually computed by TF). But it depends on our checks; hard to tell.

Or is there any other way to work around this? Since the summary in writer.add_summary(summary, epoch * batch_count + i) is a proto, we could use just the SummaryWriter without using TF's computation. This looks possible according to the docstring of SummaryWriter.add_summary:

  def add_summary(self, summary, global_step=None):
    """Adds a `Summary` protocol buffer to the event file.

    This method wraps the provided summary in an `Event` protocol buffer
    and adds it to the event file.

    You can pass the result of evaluating any summary op, using
    [`Session.run()`](client.md#Session.run) or
    [`Tensor.eval()`](framework.md#Tensor.eval), to this
    function. Alternatively, you can pass a `tf.Summary` protocol
    buffer that you populate with your own data. The latter is
    commonly done to report evaluation results in event files.

    Args:
      summary: A `Summary` protocol buffer, optionally serialized as a string.
      global_step: Number. Optional global step value to record with the
        summary.
    """

If we decide to borrow TensorBoard:

  • Decide where to put the interface and what to do next, based on our experiments above.
@piiswrong
Member

The way tensorboard works is that it takes in a log file written in a specific format and then renders it.
So we don't necessarily need tf.scalar_summary and tf.Session to use it. We simply need to write logs in the same format and run tensorboard.

Here is what I think would be an ideal solution:

  1. we strip a minimum set of files related to tensorboard out of tensorflow and build it with Makefile or cmake, not bazel.
  2. we modify mxnet's logging format so that it conforms with tensorboard.

But I haven't looked into this in depth, so this might be hard/impossible. So feel free to do anything that works for you first. We can discuss whether we want to merge it into mxnet or provide it as a separate solution afterwards.

@zihaolucky
Contributor
zihaolucky commented Nov 28, 2016 edited

Yes, tensorboard only requires the proto of the logged results, but I haven't found an entry point for creating a Summary object: it's returned directly by scalar_summary (a tensorflow op), which means we'd have to run it through a TF session. I'm trying to work around this.

I'll look into this in the coming two weeks.

@tqchen
Member
tqchen commented Nov 28, 2016

I think tensorboard is relatively isolated. Last time I saw the code, only the proto of the log file was needed.

@leopd
Contributor
leopd commented Nov 28, 2016

My memory of using tensorboard is that those logfiles quickly get extremely large. Do people really share those logfiles with each other? It also made me worry that the huge amount of I/O would limit performance -- which would be more of an issue with MXNet than TF. So that's something else we can experiment/measure: what kind of IO bandwidth would be needed to produce these logfiles.
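A rough, untested sketch of how one could measure that (TF 0.x API; the counts and values are arbitrary):

# time how long it takes to write N scalar summaries and how big the log gets
import os
import time
import tensorflow as tf

logdir = "/tmp/io_test"
if not os.path.isdir(logdir):
    os.makedirs(logdir)

writer = tf.train.SummaryWriter(logdir)
start = time.time()
n = 10000
for step in range(n):
    s = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=0.5)])
    writer.add_summary(s, step)
writer.close()  # flushes the event file
elapsed = time.time() - start
size = sum(os.path.getsize(os.path.join(logdir, f)) for f in os.listdir(logdir))
print("%d summaries, %.1f KB, %.2f s" % (n, size / 1024.0, elapsed))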

@zihaolucky
Contributor

@tqchen @jermainewang Thanks for the reference. I've found an API for scalar_summary that doesn't run a tensorflow op here, but it still uses SummaryWriter and EventWriter for the tensorboard log file.

Although it only has scalar_summary for now, it seems they're actively working on it. I've sent an email to ask Dan whether they have any plans in this direction to support more types of summaries, but I haven't gotten feedback yet.

@jermainewang
Member

Minpy's way of using tensorboard could be migrated to mxnet quite easily. There are mainly three components:

  1. proto files: the summary proto and the event proto. They can be used directly. The summary writer can be borrowed directly from TF's Python side.
  2. EventWriter logic. Tensorflow has an EventWriter in C++, but it can easily be rewritten in Python.
  3. RecordFileWriter logic. After serializing the event proto, TF uses recordio to write it to disk. This part should be replaced by our own implementation (see the sketch after this list), and that's it.
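For reference, component 3 boils down to TFRecord framing, which is straightforward to reproduce. A minimal sketch, assuming some CRC-32C (Castagnoli) implementation such as the crc32c package:

# TFRecord framing used by event files:
#   uint64 length | uint32 masked crc32c of length | data | uint32 masked crc32c of data
import struct
import crc32c  # assumed dependency; any CRC-32C implementation would do

def _masked_crc(data):
    # TFRecord masks the CRC as ((crc >> 15) | (crc << 17)) + 0xa282ead8
    crc = crc32c.crc32c(data) & 0xffffffff
    return (((crc >> 15) | (crc << 17)) + 0xa282ead8) & 0xffffffff

def write_record(f, data):
    header = struct.pack('<Q', len(data))
    f.write(header)
    f.write(struct.pack('<I', _masked_crc(header)))
    f.write(data)
    f.write(struct.pack('<I', _masked_crc(data)))

Calling write_record(f, event.SerializeToString()) for each event proto should produce a file TensorBoard can read.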

We plan to put the code here: https://github.com/dmlc/minpy/tree/visualize/minpy/visualize. I will ping you again once it is updated.

@jermainewang
Member

The related PR is still under review here: dmlc/minpy#87

@zihaolucky
Contributor

@jermainewang That's great!

@zihaolucky zihaolucky referenced this issue in dmlc/minpy Dec 1, 2016
Merged

Add Visualization Components #87

@mufeili
mufeili commented Dec 1, 2016

Hi, I've finished the scalar summary part and am currently exploring image summaries and histogram summaries. We did not plan to do audio and graph summaries for minpy, as minpy does not use a computational graph, but those should work for mxnet.

I also noticed that, after the release of TensorFlow v0.12, there is a new section in TensorBoard for word embeddings, which is super cool: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization.

@zihaolucky
Contributor

Hey guys, I've finished the first item in the TODOs, with generous help from @mufeili and @jermainewang.

But it still requires a writer/RecordFileWriter from TF; I'll submit the code once I finish the writer.

@zihaolucky
Contributor

@mufeili Could you take a look at this issue? tensorflow/tensorflow#4181, in which danmane said it's 'tfrecord' that does the file-writing job. I then dug into the code and found the relevant C++ files, tensorflow/core/lib/io/record_writer.cc and py_record_writer.cc; TensorFlow uses SWIG to wrap them for use from Python.

I think it's too hard to rewrite these in Python, as they have so many dependencies. A Python-only rewrite also wouldn't be easy to use from other languages, which means everyone would have to use the Python interface for visualization purposes.

Can I just pull these related C++ files out, put them in the core library, and use SWIG or something else for the Python interface? @piiswrong, could you give me some suggestions? What's your convention for writing a wrapper from C++ to Python?

@mufeili
mufeili commented Dec 8, 2016

@zihaolucky tensorflow/tensorflow/core/lib/io/record_writer.cc is exactly where I got stuck at first. We then decided to use tf.python_io.TFRecordWriter for the time being.
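For reference, that interim path looks roughly like this (a sketch against the TF 0.x API; the Event proto is imported via event_pb2 in case tf.Event isn't exported in a given version):

# serialize Event protos ourselves and write them with tf.python_io.TFRecordWriter
import os
import time
import tensorflow as tf
from tensorflow.core.util import event_pb2

logdir = "/tmp/logs"
if not os.path.isdir(logdir):
    os.makedirs(logdir)

# TensorBoard discovers files whose names contain "tfevents"
writer = tf.python_io.TFRecordWriter(os.path.join(logdir, "events.out.tfevents.demo"))
# real event files start with a version record
writer.write(event_pb2.Event(wall_time=time.time(),
                             file_version="brain.Event:2").SerializeToString())
summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=0.3)])
writer.write(event_pb2.Event(wall_time=time.time(), step=1,
                             summary=summary).SerializeToString())
writer.close()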

@zihaolucky
Contributor

@mufeili @piiswrong

Good news: I've found that someone has already written the record writer in pure Python, see https://github.com/TeamHG-Memex/tensorboard_logger

I've migrated the code to MXNet and it works, so we can now use TensorBoard without relying on TF. I've pushed the code to my branch https://github.com/zihaolucky/mxnet/tree/feature/tensorboard-support-experiment, please check it out.
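For anyone who wants to try the upstream library, its usage is roughly as follows (a sketch based on its README; names in my branch may differ):

# tensorboard_logger usage: no TensorFlow required
from tensorboard_logger import configure, log_value

configure("runs/mxnet-run-1")  # directory that `tensorboard --logdir` points at
for step in range(100):
    loss = 1.0 / (step + 1)   # stand-in for an MXNet metric value
    log_value("loss", loss, step)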

@mufeili
mufeili commented Dec 10, 2016

@zihaolucky Awesome! I've had a quick look at it. I think it currently only supports scalar summaries as well, so I'm not sure whether the record_writer function would still work for other kinds of summaries. But still, lots of thanks!

@zihaolucky
Contributor
zihaolucky commented Dec 11, 2016 edited

@mufeili It seems it could also support other types of summaries, since it writes a serialized event; it just only provides a scalar summary API for now.

https://github.com/TeamHG-Memex/tensorboard_logger/blob/master/tensorboard_logger/tensorboard_logger.py#L94-L103

@terrytangyuan
Member

Great work - exciting to see the progress! Note that you probably need to include the necessary copyright information if you borrow the code from some other project.

@zihaolucky
Contributor

@terrytangyuan Thanks for the kind reminder. I'll do some research on the copyright issue.

@tqchen
Member
tqchen commented Dec 11, 2016

It would be necessary to copy the LICENSE file from the original repo and retain the copyright notice.
