
Semantic flow graphs

Create semantic dataflow graphs of data science code.

Using this package, you can convert data science code to dataflow graphs with semantic content. The package works in tandem with the Data Science Ontology and our language-specific program analysis tools. Currently Python and R are supported.

For more information, please see our research paper, "Teaching machines to understand data science code by semantic enrichment of dataflow graphs".

Command-line interface

We provide a CLI that supports the recording, semantic enrichment, and visualization of flow graphs. To set up the CLI, install this package and add the bin directory to your PATH. Invoke the CLI by running flowgraphs.jl in your terminal.
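
The PATH step can be sketched as follows. This assumes a POSIX shell (bash, zsh, dash). The variable FLOWGRAPHS_HOME and its default value are hypothetical. Point it at wherever the package actually lives on your machine.

```shell
# Hypothetical package location: override FLOWGRAPHS_HOME to match your install.
FLOWGRAPHS_HOME="${FLOWGRAPHS_HOME:-$HOME/semanticflowgraph}"

# Prepend the package's bin directory so the shell can find flowgraphs.jl,
# skipping the change if that directory is already on PATH.
case ":$PATH:" in
  *":$FLOWGRAPHS_HOME/bin:"*) ;;
  *) PATH="$FLOWGRAPHS_HOME/bin:$PATH" ;;
esac
export PATH
```

To keep the change across sessions, add the same lines to your shell's startup file (for example ~/.bashrc). Afterwards, `command -v flowgraphs.jl` should print the script's path.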

The CLI includes the following commands:

  • record: Record a raw flow graph by running a script.
    Requirements: To record a Python script, you must install the Julia package PyCall.jl and the Python package flowgraph. Likewise, to record an R script, you must install the Julia package RCall.jl and the R package flowgraph.
  • enrich: Convert a raw flow graph to a semantic flow graph.
  • visualize: Visualize a flow graph using Graphviz.
    Requirements: To output an image with the --to switch, you must install Graphviz.
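
Before running record or visualize, you may want to confirm these requirements from the shell. The sketch below uses only standard commands: `command -v`, `julia -e`, `python3 -c` and `Rscript -e`. It checks for Graphviz through its `dot` executable. The helper names are hypothetical and are not part of the CLI. The sketch also assumes that Python 3 is invoked as `python3`.

```shell
# Succeeds if a command-line tool (e.g. dot from Graphviz) is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if a Julia package (e.g. PyCall or RCall) can be loaded.
have_julia_pkg() {
  have_tool julia && julia -e "import $1" >/dev/null 2>&1
}

# Succeeds if a Python module (e.g. flowgraph) can be imported.
have_python_module() {
  have_tool python3 && python3 -c "import $1" >/dev/null 2>&1
}

# Succeeds if an R package (e.g. flowgraph) can be loaded.
have_r_pkg() {
  have_tool Rscript && Rscript -e "library($1)" >/dev/null 2>&1
}
```

For example, `have_julia_pkg PyCall && have_python_module flowgraph` checks the Python recording requirements, and `have_tool dot || echo "Graphviz not found"` checks the requirement for --to.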

Every command takes as its primary argument either a directory, whose files are filtered by extension, or a single file, which may have any name.
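
The directory-versus-file rule can be illustrated with a short shell sketch. This is a hypothetical illustration of the described behavior and not the CLI's implementation. The function name resolve_inputs and the extensions passed to it are assumptions. The sketch also assumes that only files directly inside the directory are considered, with no recursion into subdirectories.

```shell
# resolve_inputs PATH EXT...
# A regular file is accepted as-is, whatever its name. For a directory, print
# each regular file directly inside it whose name ends in one of the EXTs.
resolve_inputs() {
  target=$1
  shift
  if [ -f "$target" ]; then
    printf '%s\n' "$target"
    return 0
  fi
  if [ ! -d "$target" ]; then
    echo "resolve_inputs: no such file or directory: $target" >&2
    return 1
  fi
  for file in "$target"/*; do
    [ -f "$file" ] || continue          # skip subdirectories and empty globs
    for ext in "$@"; do
      case $file in
        *."$ext") printf '%s\n' "$file"; break ;;
      esac
    done
  done
}
```

Under these assumptions, `resolve_inputs . py R` lists the Python and R scripts in the current directory, which roughly matches what `flowgraphs.jl record .` would pick up.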

CLI examples

Record all Python/R scripts in the current directory, yielding raw flow graphs:

flowgraphs.jl record .

Convert a raw flow graph to a semantic flow graph:

flowgraphs.jl enrich --out my_script.graphml

Visualize a semantic flow graph, creating and opening an SVG file:

flowgraphs.jl visualize my_script.graphml --to svg --open