[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

Closed
wants to merge 15 commits into from

Commits on Apr 17, 2015

  1. Scope all RDD methods

    This commit provides a mechanism to set and unset the call scope
    around each RDD operation defined in RDD.scala. This is useful
    for tagging an RDD with the scope in which it is created. This
    will be extended to similar methods in SparkContext.scala and
    other relevant files in a future commit.
    Andrew Or committed Apr 17, 2015
    6b3403b
  2. Add a few missing scopes to certain RDD methods

    Andrew Or committed Apr 17, 2015
    a9ed4f9
  3. Expose the necessary information in RDDInfo

    This includes the scope field that we added in previous commits,
    and the parent IDs for tracking the lineage through the listener
    API.
    Andrew Or committed Apr 17, 2015
    5143523
  4. Translate RDD information to dot file

    It turns out that the previous scope information is insufficient
    for producing a valid dot file. In particular, the scope hierarchy
    was missing, yet it is crucial for differentiating between a parent
    RDD in the same encompassing scope and one in a completely
    distinct scope. Also, unique scope identifiers are needed to
    simplify the code significantly.
    
    This commit further adds the translation logic in a UI listener
    that converts RDDInfos to dot files.
    Andrew Or committed Apr 17, 2015
    2184348
  5. f22f337
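
The translation described in commits 3 and 4 above (RDD records carrying a scope and parent IDs, rendered as a dot file with nested scopes) can be sketched roughly as follows. This is a Python analogue for illustration only, not the code in the patch, which is Scala. The `Scope` and `RDDInfo` classes, their field names, and the `to_dot` function are all hypothetical stand-ins, and scope ids are assumed to be unique and safe to use in a DOT identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope record: unique id, display name, optional parent."""
    id: str
    name: str
    parent: Optional["Scope"] = None


@dataclass
class RDDInfo:
    """Hypothetical stand-in holding only the fields the commits mention."""
    id: int
    name: str
    parent_ids: List[int] = field(default_factory=list)
    scope: Optional[Scope] = None


def to_dot(rdds: List[RDDInfo]) -> str:
    """Render RDDs as a DOT digraph with one nested cluster per scope."""
    # Collect every scope that appears, including ancestors, keyed by id.
    scopes: Dict[str, Scope] = {}
    for rdd in rdds:
        s = rdd.scope
        while s is not None:
            scopes.setdefault(s.id, s)
            s = s.parent

    def emit(scope: Optional[Scope], indent: str) -> List[str]:
        lines: List[str] = []
        scope_id = scope.id if scope else None
        # Nodes that sit directly in this scope (or in no scope at top level).
        for rdd in rdds:
            if (rdd.scope.id if rdd.scope else None) == scope_id:
                lines.append(f'{indent}{rdd.id} [label="{rdd.name} [{rdd.id}]"];')
        # Child scopes become nested clusters, which preserves the hierarchy.
        for child in scopes.values():
            if (child.parent.id if child.parent else None) == scope_id:
                lines.append(f"{indent}subgraph cluster_{child.id} {{")
                lines.append(f'{indent}  label="{child.name}";')
                lines.extend(emit(child, indent + "  "))
                lines.append(f"{indent}}}")
        return lines

    known = {rdd.id for rdd in rdds}
    out = ["digraph G {"] + emit(None, "  ")
    for rdd in rdds:
        for pid in rdd.parent_ids:
            if pid in known:  # skip parents that are not part of this graph
                out.append(f"  {pid}->{rdd.id};")
    out.append("}")
    return "\n".join(out)
```

Because each scope carries its parent, a parent RDD in the same encompassing scope ends up inside the same outer cluster, while one in an unrelated scope lands in a separate cluster. A flat scope name alone could not express that distinction.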

Commits on Apr 22, 2015

  1. Re-implement scopes through annotations instead

    The previous "working" implementation frequently ran into
    NotSerializableExceptions. Why? ClosureCleaner doesn't like
    closures being wrapped in other closures, and these closures
    are simply not cleaned (details are intentionally omitted here).
    
    This commit reimplements scoping through annotations. All methods
    that should be scoped are now annotated with @RDDScope. Then, on
    creation, each RDD derives its scope from the stack trace, similar
    to how it derives its call site. This is the cleanest approach
    that bypasses NotSerializableExceptions with the least significant
    limitations.
    Andrew Or committed Apr 22, 2015
    9fac6f3
  2. Revert a few unintended style changes

    Andrew Or committed Apr 22, 2015
    494d5c2
  3. Move RDD scope util methods and logic to their own file

    Just a small code re-organization.
    Andrew Or committed Apr 22, 2015
    6a7cdca
  4. 5e22946
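
The idea in commit 1 of this group, marking methods and letting each RDD derive its scope from the stack at creation time, can be illustrated with a small Python analogue. The patch itself uses a Scala/JVM annotation (`@RDDScope`) and the JVM stack trace; here a decorator that only records the function's code object stands in for the annotation, and the toy `RDD` class, `rdd_scope`, and `scope_chain` are hypothetical names. Returning the full chain of enclosing scopes is also an assumption made for this sketch.

```python
import sys
from types import CodeType
from typing import Callable, List, Set

# Code objects of functions marked as scope boundaries.
_SCOPED: Set[CodeType] = set()


def rdd_scope(func: Callable) -> Callable:
    """Mark func as scoped. The function is returned unwrapped, so no
    extra closure is introduced around the user's closure."""
    _SCOPED.add(func.__code__)
    return func


def scope_chain() -> List[str]:
    """Names of marked functions on the caller's stack, outermost first.

    Uses sys._getframe, which is CPython-specific.
    """
    chain: List[str] = []
    frame = sys._getframe(1)  # start at the caller of scope_chain
    while frame is not None:
        if frame.f_code in _SCOPED:
            chain.append(frame.f_code.co_name)
        frame = frame.f_back
    chain.reverse()
    return chain


class RDD:
    """Toy RDD that records the scope it was created in."""

    def __init__(self, name: str):
        self.name = name
        self.scope = scope_chain()  # derived on creation, like a call site

    @rdd_scope
    def map(self, f: Callable) -> "RDD":
        return RDD("MapPartitionsRDD")

    @rdd_scope
    def distinct(self) -> "RDD":
        # Implemented in terms of map, so the new RDD sees both scopes.
        return self.map(lambda x: x)
```

With this sketch, `RDD("ParallelCollectionRDD").distinct().scope` evaluates to `["distinct", "map"]`, while an RDD constructed outside any marked method gets an empty chain. The point of the design is that marking a method does not wrap its body in another closure, which is what the commit message identifies as the trigger for the serialization failures.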

Commits on Apr 23, 2015

  1. Reimplement rendering with dagre-d3 instead of viz.js

    Before this commit, this patch relied on a JavaScript version of
    GraphViz that was compiled from C. Even the minified version of
    this resource was ~2.5M. The main motivation for switching away
    from this library, however, is that it is a complete black box
    over which we have absolutely no control. It is not at all extensible,
    and if something breaks we will have a hard time understanding
    why.
    
    The new library, dagre-d3, is not perfect either. It does not
    officially support clustering of nodes; for certain large graphs,
    the clusters will have a lot of unnecessary whitespace. A few in
    the dagre-d3 community are looking into a solution, but until then
    we will have to live with this (minor) inconvenience.
    Andrew Or committed Apr 23, 2015
    205f838

Commits on Apr 27, 2015

  1. fe7816f
  2. Fill in documentation + miscellaneous minor changes

    For instance, this adds the ability to throw away old stage graphs.
    Andrew Or committed Apr 27, 2015
    8dd5af2
  3. Embed the viz in the UI in a toggleable manner

    Andrew Or committed Apr 27, 2015
    71281fa
  4. Add ID to node label (minor)

    Andrew Or committed Apr 27, 2015
    09d361e

Commits on Apr 28, 2015

  1. Rat excludes

    Andrew Or committed Apr 28, 2015
    52187fc