This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Building a visualization tool for MXNet #4003

Closed
3 tasks done
zihaolucky opened this issue Nov 27, 2016 · 45 comments

Comments

@zihaolucky
Member

zihaolucky commented Nov 27, 2016

Hi hackers,

I've started working on building a visualization tool for MXNet, like TensorBoard for TensorFlow. As @piiswrong suggested in #3306, we should 'try to strip TensorBoard out of tensorflow', and I'm going to work in this direction. Here are some of my notes after reading TensorBoard's documentation and researching its usage on the web; feel free to comment below.

Motivation and some backgrounds

In my daily work I've visualized data using matplotlib and a bunch of helper tools like t-SNE, and I'm tired of rendering and adjusting the size/color of the images. Besides, it's not easy to share these results with colleagues.

TensorBoard, by contrast, provides good solutions for our daily use cases, such as learning curves and parameter/embedding visualization, and it makes results easy to share. See TensorBoard for more.

Daily use cases

  • Learning Curves. Visualize a scalar metric in training/testing such as accuracy, loss or some custom evaluation metrics.
  • Parameter insights. Visualize parameters such as CNN filters, or their histograms over time, to debug vanishing gradients.
  • Embedding visualization. Some applications visualize high-dimensional embedding data from a layer, like the last FullyConnected layer's output. In this case, t-SNE is commonly used.

I think these could satisfy most people, and they're already supported by TensorBoard via tf.scalar_summary, tf.image_summary, tf.histogram_summary and tensorboard.plugins.projector.

TensorBoard usage

Some snippets from a tutorial on how to use TensorBoard.

# create a summary for our cost and accuracy
tf.scalar_summary("cost", cross_entropy)
tf.scalar_summary("accuracy", accuracy)

# merge all summaries into a single "operation" which we can execute in a session 
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())
        
    # perform training cycles
    for epoch in range(training_epochs):
        
        # number of batches in one epoch
        batch_count = int(mnist.train.num_examples/batch_size)
        
        for i in range(batch_count):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            
            # perform the operations we defined earlier on batch
            _, summary = sess.run([train_op, summary_op], feed_dict={x: batch_x, y_: batch_y})
            
            # write log
            writer.add_summary(summary, epoch * batch_count + i)

The logic above is quite clear: the accuracy and cost are updated every time sess.run is called, returning a Summary that is fed into the log through SummaryWriter.

Feasibility

1. Is it easy to borrow and use directly in MXNet?

I've successfully visualized a made-up curve using the code below:

counter = tf.Variable(1.0)

# create a summary for counter
tf.scalar_summary("counter", counter)

# merge all summaries into a single "operation" which we can execute in a session
summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    # variables need to be initialized before we can use them
    sess.run(tf.initialize_all_variables())

    # create log writer object
    writer = tf.train.SummaryWriter(logs_path, graph=tf.get_default_graph())

    # perform training cycles
    for epoch in range(100):

        # perform the operations we defined earlier on batch
        counter.assign(epoch + np.random.standard_normal()).eval()
        summary = sess.run(summary_op)

        # write log
        writer.add_summary(summary, epoch)

This means we can pass in common types (here a numpy array and a plain int) and reuse most of the code. I'd like to discuss possible routes for creating an interface to connect MXNet and TensorBoard, and I need your advice. But let's keep it simple for now.
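As a starting point for that interface discussion, here is one possible shape, sketched under assumptions: every name below is hypothetical, not an existing MXNet API. The idea is a thin facade that takes plain Python numbers and delegates serialization to a pluggable backend, so a TensorBoard event-file writer can be swapped in later without touching training code.

```python
class ScalarLogger:
    """Thin logging facade on the MXNet side (hypothetical API, not real).

    It accepts plain Python numbers, so no TF session or ops are needed;
    the backend decides how to serialize (e.g. into TensorBoard event files).
    """

    def __init__(self, backend):
        # backend: anything exposing write_scalar(tag, value, step)
        self._backend = backend

    def log_scalar(self, tag, value, step):
        self._backend.write_scalar(tag, float(value), int(step))


class InMemoryBackend:
    """Stand-in backend for illustration; a real one would write event files."""

    def __init__(self):
        self.records = []

    def write_scalar(self, tag, value, step):
        self.records.append((tag, value, step))


# usage: log accuracy once per epoch, mirroring tf.scalar_summary + add_summary
backend = InMemoryBackend()
logger = ScalarLogger(backend)
for epoch in range(3):
    logger.log_scalar("accuracy", 0.8 + 0.05 * epoch, epoch)
```

The point of the backend split is that the training loop never sees TF or protobufs; only the backend would need to change if we adopt TensorBoard's format.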

2. Could it be stripped out on its own?

From this README, I guess TensorBoard could be built independently?

Or, if you are building from source:

bazel build tensorflow/tensorboard:tensorboard
./bazel-bin/tensorflow/tensorboard/tensorboard --logdir=path/to/logs

TODO

To prove we can use TensorBoard in a quick-and-dirty way:

  • Run several MXNet examples with minimal code from TF and visualize the outputs.

To keep our code clean and lightweight:

  • Verify whether TensorBoard can be built independently.

Or could we install the entire TF alongside MXNet? Is that acceptable?
I think it's workable but not good for our users, and it makes this visualization tool too heavy, because we would also be running TensorFlow's core code (the sess and Tensor.eval calls are actually computed by TF). But it depends on our checks; hard to tell.

Or is there any other workaround? Since the summary in writer.add_summary(summary, epoch * batch_count + i) is a proto, we could use SummaryWriter alone without TF's computation. This looks possible according to the docstring of SummaryWriter.add_summary:

  def add_summary(self, summary, global_step=None):
    """Adds a `Summary` protocol buffer to the event file.

    This method wraps the provided summary in an `Event` protocol buffer
    and adds it to the event file.

    You can pass the result of evaluating any summary op, using
    [`Session.run()`](client.md#Session.run) or
    [`Tensor.eval()`](framework.md#Tensor.eval), to this
    function. Alternatively, you can pass a `tf.Summary` protocol
    buffer that you populate with your own data. The latter is
    commonly done to report evaluation results in event files.

    Args:
      summary: A `Summary` protocol buffer, optionally serialized as a string.
      global_step: Number. Optional global step value to record with the
        summary.
    """

If we decide to borrow TensorBoard:

  • Decide where to put the interface and what to do next, based on our experiments above.
@piiswrong
Contributor

@leopd @mli

@piiswrong
Contributor

The way tensorboard works is that it takes a log file printed in a specific format and renders it.
So we don't necessarily need tf.scalar_summary and tf.Session to use it. We simply need to print logs in the same format and run tensorboard.

Here is what I think would be an ideal solution:

  1. We strip a minimal set of tensorboard-related files out of tensorflow and build them with Makefile or cmake, not bazel.
  2. We modify mxnet's logging format so that it conforms to tensorboard's.

But I haven't looked into this in depth, so this might be hard/impossible. Feel free to do whatever works for you first; we can discuss afterwards whether to merge it into mxnet or provide it as a separate solution.

@zihaolucky
Member Author

zihaolucky commented Nov 28, 2016

Yes, tensorboard only requires the proto of the logged results, but I haven't found an entry point to create a Summary object directly; it is returned by scalar_summary (a TensorFlow op), which means we have to call sess.run. I'm trying to work around this.

I would look into this in the coming two weeks.

@tqchen
Member

tqchen commented Nov 28, 2016

I think tensorboard is relatively isolated. Last time I looked at the code, only the proto of the log file was needed.

@leopd
Contributor

leopd commented Nov 28, 2016

My memory of using tensorboard is that those logfiles quickly get extremely large. Do people really share those logfiles with each other? It also made me worry that the huge amount of I/O would limit performance -- which would be more of an issue with MXNet than TF. So that's something else we can experiment/measure: what kind of IO bandwidth would be needed to produce these logfiles.
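For scalar summaries at least, the required bandwidth can be estimated with a rough sketch. The 16-byte figure below is the fixed per-record framing overhead of TF's event-file format; the 40-byte payload is an assumption for a serialized scalar Event proto (wall time, step, short tag, one float), not a measurement.

```python
# Fixed TFRecord framing cost per record: uint64 length + two masked crc32s.
FRAMING_OVERHEAD = 8 + 4 + 4

# Assumed average size of a serialized scalar Event proto; an estimate for
# illustration only, it varies with tag length.
APPROX_EVENT_PAYLOAD = 40

def scalar_log_bandwidth(num_scalars, steps_per_second):
    """Approximate bytes per second written when logging scalar summaries."""
    return num_scalars * steps_per_second * (FRAMING_OVERHEAD + APPROX_EVENT_PAYLOAD)

# e.g. 10 scalar summaries logged 100 times per second
rate = scalar_log_bandwidth(10, 100)  # 56000 bytes/s, about 55 KiB/s
```

Under these assumptions, scalar logging is negligible IO; image and histogram summaries carry far larger payloads and are the likelier source of the huge logfiles.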

@tqchen
Member

tqchen commented Nov 29, 2016

@zihaolucky
Member Author

@tqchen @jermainewang Thanks for the reference. I've found an API for scalar_summary that doesn't run a TensorFlow op here, but it still uses SummaryWriter and EventWriter for the tensorboard log file.

Although it only has scalar_summary now, they seem to be actively working on it. I've emailed Dan to ask whether they have any plans to support more types of summary, but I haven't gotten feedback yet.

@jermainewang
Contributor

MinPy's way of using tensorboard could be migrated to mxnet quite easily. There are three main components:

  1. Proto files: the summary proto and event proto. These can be used directly, and the summary writer can be borrowed directly from TF's Python side.
  2. EventWriter logic. TensorFlow has an EventWriter in C++, but it could easily be rewritten in Python.
  3. RecordFileWriter logic. After serializing the event proto, TF uses recordio to write to disk. This part should be replaced by our own implementation, and that's it.

We plan to put the code here: https://github.com/dmlc/minpy/tree/visualize/minpy/visualize . I'll ping you again once it's updated.
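The record framing in component 3 is small enough to sketch in pure Python. This follows the TFRecord record layout from TF's record_writer.cc (length, masked CRC-32C of the length bytes, data, masked CRC-32C of the data); the bitwise CRC here is a slow, dependency-free stand-in for TF's C++ implementation, just to show the format.

```python
import struct

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial (reflected form)

def crc32c(data):
    """Bitwise CRC-32C; slow but dependency-free (TF uses a C++ version)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def masked_crc32c(data):
    """TFRecord stores a rotated, offset CRC rather than the raw value."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def encode_record(data):
    """Frame one serialized Event proto the TFRecord way (all little-endian):
    uint64 length, uint32 masked crc of the length bytes, data,
    uint32 masked crc of the data."""
    header = struct.pack("<Q", len(data))
    return (header
            + struct.pack("<I", masked_crc32c(header))
            + data
            + struct.pack("<I", masked_crc32c(data)))
```

Appending `encode_record(event.SerializeToString())` for each event to a file named in TensorBoard's `events.out.tfevents.*` convention is essentially all the RecordFileWriter does.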

@jermainewang
Contributor

The related PR is still under review here: dmlc/minpy#87

@zihaolucky
Member Author

@jermainewang That's great!

@mufeili

mufeili commented Dec 1, 2016

Hi, I've finished the scalar summary part and am currently exploring image and histogram summaries. We did not plan to do audio and graph summaries for minpy, as minpy does not use a computational graph, but those should work for mxnet.

I also realized there is a new section in TensorBoard, added after the release of TensorFlow v0.12, for word embeddings, which is super cool: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization.

@zihaolucky
Member Author

Hey guys, I've finished the first item in the TODOs, with generous help from @mufeili and @jermainewang.

But it still requires a writer/RecordFileWriter from TF; I'll submit the code once I finish the writer.

@zihaolucky
Member Author

@mufeili Could you take a look at tensorflow/tensorflow#4181? In it, danmane said it's 'tfrecord' that does the file-writing job. I then dug into the code and found the relevant C++ code in tensorflow/core/lib/io/record_writer.cc and py_record_writer.cc; TensorFlow uses SWIG to wrap them for use in Python.

I think it's too hard to rewrite these in Python, as they have so many dependencies, and a Python rewrite wouldn't help other language bindings, which would force everyone to use the Python interface for visualization.

Can I just pull the related C++ files out, put them in the core library, and use SWIG or something else for the Python interface? @piiswrong could you give me some suggestions? What's your convention for writing a wrapper from C++ to Python?

@mufeili

mufeili commented Dec 8, 2016

@zihaolucky tensorflow/tensorflow/core/lib/io/record_writer.cc is exactly where I got stuck at first. We then decided to use tf.python_io.TFRecordWriter for the time being.

@zihaolucky
Member Author

@mufeili @piiswrong

Good news: someone has already implemented the record writer in Python, see https://github.com/TeamHG-Memex/tensorboard_logger

I migrated the code to MXNet and it works; now we can use TensorBoard without relying on TF. I've pushed the code to my branch https://github.com/zihaolucky/mxnet/tree/feature/tensorboard-support-experiment; please check it out.

@mufeili

mufeili commented Dec 10, 2016

@zihaolucky Awesome! I've had a quick look at it. I think it currently only supports scalar summaries as well, so I'm not sure whether the record_writer function would still work for other kinds of summaries. But many thanks!

@zihaolucky
Member Author

zihaolucky commented Dec 11, 2016

@mufeili It seems it could support other types of summaries too, since it writes a serialized event; it just only provides a scalar summary API.

https://github.com/TeamHG-Memex/tensorboard_logger/blob/master/tensorboard_logger/tensorboard_logger.py#L94-L103
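What that library does boils down to hand-encoding the Summary proto on the protobuf wire format, so no TF op or session is needed. A minimal sketch of the scalar case (field numbers taken from TF's summary.proto; the helper name is mine, not part of any library):

```python
import struct

def encode_scalar_summary(tag, value):
    """Hand-encode Summary{ value: [ Value{ tag, simple_value } ] } on the
    protobuf wire format. Valid for tags under 128 bytes, so each length
    varint fits in a single byte."""
    tag_bytes = tag.encode("utf-8")
    assert len(tag_bytes) < 128
    # Value.tag: field 1, wire type 2 (length-delimited string)
    inner = bytes([0x0A, len(tag_bytes)]) + tag_bytes
    # Value.simple_value: field 2, wire type 5 (fixed 32-bit float)
    inner += b"\x15" + struct.pack("<f", value)
    # Summary.value: field 1, wire type 2 (embedded Value message)
    return bytes([0x0A, len(inner)]) + inner
```

The same trick extends to the Event wrapper around the Summary, which is how the logger avoids running any TF computation at all; richer summary types (images, histograms) just mean encoding more fields.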

@terrytangyuan
Member

Great work - exciting to see the progress! Note that you probably need to include the necessary copyright information if you borrow the code from some other project.

@zihaolucky
Member Author

@terrytangyuan Thanks for the kind reminder; I'll do some research on the copyright issue.

@piiswrong
Contributor

piiswrong commented Dec 11, 2016 via email

@tqchen
Member

tqchen commented Dec 11, 2016

It would be necessary to copy the LICENSE file from the original repo and retain the copyright notice.

@zihaolucky
Member Author

Update, we now provide a PyPI package for TensorBoard fans :)

dmlc/tensorboard#19

@bravomikekilo
Contributor

I made a standalone tensorboard by extracting tensorboard's C++ dependencies from TensorFlow, so we don't need to build the whole of TensorFlow now. Meanwhile, we can use the TensorFlow file system API from Python through this reduced tensorflow library.
bravomikekilo/mxconsole

@zihaolucky
Member Author

@bravomikekilo great work! Any plan to ship it to dmlc/tensorboard? I believe you'll have to make it easy to maintain, as tensorboard changes often and new features keep coming in (as they said at the TF Dev Summit, they're going to provide a more flexible plugin module for tensorboard developers). That's why I focus on the logging part and try not to change the rendering part.

Just my personal opinion.

@bravomikekilo
Contributor

I mostly kept the structure of the tensorboard project, and I'm going to align the structure with the official tensorflow so we can sync changes. I have enabled logging support from C++, so it will be much faster and more reliable.

@bravomikekilo
Contributor

Or maybe we should merge dmlc/tensorboard into mxconsole, since most tensorboard functionality can be enabled from the reduced tensorflow? Meanwhile, we can split mxconsole into smaller modules. The reduced tensorflow can do much more.

@zihaolucky
Member Author

@piiswrong @jermainewang any thoughts?

@bravomikekilo
Contributor

I've already merged dmlc/tensorboard into bravomikekilo/mxconsole, including a summary API powered by the native library, plus a tutorial.
The tutorial works fine now. I will clean up the project tomorrow.

@piiswrong
Contributor

What's the benefit of extracting the code vs cloning tensorflow?

@bravomikekilo
Contributor

The library is smaller and easier to build.

@bravomikekilo
Contributor

Meanwhile, a smaller code base is much clearer and more portable.
The reduced tensorflow (tensorflow_fs) now contains only about 300 source files, while the original TensorFlow contains about 7000.
tensorflow_fs keeps the same project structure as TensorFlow, so it should be easy to sync changes.

@bravomikekilo
Contributor

@piiswrong

@zihaolucky
Member Author

Using the native library to support more language interfaces seems like a good idea, though maintainers would still have to write a wrapper, which is the same workload as writing a logging interface in Scala or any other language.

I encourage you to propose a roadmap for this code-extraction direction and point out some concrete benefits; otherwise, spending time on the 10% that differs while 90% stays the same is not a good idea.

@bravomikekilo
Contributor

OK, I will try to add back the interface files for Go and Java from the original TensorFlow.
Scala can use the Java interface easily, right?
R, Julia, JS, and MATLAB only have protobuf libraries, so we'd need to write the logging part ourselves.
For R, there is SWIG support; we should only need to change a few SWIG files to add native support.
For Julia and MATLAB, we may need to use the C interface.
@zihaolucky

@bravomikekilo
Contributor

Besides, the native library provides a faster implementation of crc32 and protobuf writing, and it's possible to merge in the native PNG encoding support.
Also, considering the differences between TensorFlow and MXNet, the graph and embedding rendering and logging may change a lot; without a standalone tensorboard, that may be hard to achieve.

@bravomikekilo
Contributor

Sadly, the Java and Go interfaces don't have summary support; maybe they just add the summary ops to the graph. It seems all the loggers still need to be written.

@zihaolucky
Member Author

Consider focusing on logging.

@bravomikekilo
Contributor

I can extract just the logging part; that is much smaller.

@bravomikekilo
Contributor

Maybe we should split the logging and the rendering?
The only remaining problems on the rendering side are the graph and embedding views.

@bravomikekilo
Contributor

So, to sum up, we have three ways to go:

  1. Fix the Python code in dmlc/tensorboard
    • easy to achieve
  2. Fix the Python code in dmlc/tensorboard to use the native library from the tensorboard build
    • easy to achieve
    • with native support
  3. Find a way to keep bravomikekilo/mxconsole in sync with changes in tensorflow/tensorflow

An optional extra step is to split tensorflow_fs out of mxconsole; that would make it easier to keep in sync.

@RogerBorras

@zihaolucky @bravomikekilo Are you planning to port Tensorboard to the mxnet R binding?? It would be great!! :)

@bravomikekilo
Contributor

bravomikekilo commented Mar 11, 2017

I'm not good at R, but I will try. It shouldn't be too hard.

@RogerBorras

Great, thanks a lot @bravomikekilo!

@lichen11

lichen11 commented Aug 2, 2017

@bravomikekilo, @zihaolucky, @RogerBorras @thirdwing it would be great if there were a visualization board for mxnetR!

@zihaolucky
Member Author

@lichen11 @bravomikekilo If you can figure out how to write the event file and the summary protobufs in R, then it can be done. Just refer to https://github.com/dmlc/tensorboard/tree/master/python and ping me if you need any help.
