Code Continuity Analysis Framework

The framework is currently composed of the following:

  • parsers for Python, Java, Verilog, Fortran, and C/C++,
  • an AST differencing tool, Diff/AST, based on the parsers,
  • helper scripts for factbase manipulation, and
  • ontologies for the related entities.

The parsers and Diff/AST export the resulting facts, such as abstract syntax trees (ASTs), the changes between them, and other syntactic/semantic information, in XML or N-Triples. In particular, facts in N-Triples format are loaded into an RDF store such as Virtuoso to build a factbase, that is, a database of facts. Factbases are intended to be queried for software engineering tasks such as code comprehension, debugging, change pattern mining, and code homology analysis.
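To make the idea of "AST facts in N-Triples" concrete, here is a minimal sketch that uses Python's standard ast module rather than the framework's own parsers. The namespace http://example.org/fact# and the predicates kind and parent are hypothetical placeholders chosen for illustration; they are not the vocabulary defined by the framework's ontologies.

```python
import ast

# Hypothetical namespace, for illustration only. The framework's real
# ontologies and URI scheme are not reproduced here.
NS = "http://example.org/fact#"


def ast_to_ntriples(source):
    """Return N-Triples lines describing node kinds and parent links."""
    tree = ast.parse(source)
    ids = {}      # maps id(node) to a small sequential number
    lines = []

    def node_uri(node):
        # Assign a stable URI to each AST node object on first sight.
        if id(node) not in ids:
            ids[id(node)] = len(ids)
        return f"<{NS}node{ids[id(node)]}>"

    # ast.walk visits the root first, then descendants breadth-first.
    for node in ast.walk(tree):
        subj = node_uri(node)
        lines.append(f'{subj} <{NS}kind> "{type(node).__name__}" .')
        for child in ast.iter_child_nodes(node):
            lines.append(f"{node_uri(child)} <{NS}parent> {subj} .")
    return lines


triples = ast_to_ntriples("x = 1")
print("\n".join(triples))
```

Each output line is one subject-predicate-object statement terminated by " .", which is the shape an RDF store expects when bulk-loading N-Triples.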

Diff/AST is an experimental implementation of the AST differencing algorithm reported in the following paper:

Masatomo Hashimoto and Akira Mori, "Diff/TS: A Tool for Fine-Grained Structural Change Analysis," In Proc. 15th Working Conference on Reverse Engineering, 2008, pp. 279-288, DOI: 10.1109/WCRE.2008.44.

It compares ASTs node by node, whereas popular diff tools compare (text) files line by line. The algorithm builds on an algorithm for computing the tree edit distance (TED) between two ordered labeled trees. The TED between two trees is the minimum (weighted) number of edit operations needed to transform one tree into the other. Unfortunately, applying TED algorithms directly to real-world ASTs is generally infeasible because their computational complexity is, at best, essentially quadratic in the number of AST nodes. Diff/AST therefore makes only limited use of a TED algorithm, in a divide-and-conquer manner backed by elaborate heuristics, to approximate tree edit distances. Even so, Diff/AST still takes a long time on large, non-trivial inputs, so it always caches its results.
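As a rough illustration of what a tree edit distance between ordered labeled trees measures, the sketch below implements the textbook forest-distance recurrence with unit costs for insertion, deletion, and relabeling. It is not the Diff/AST algorithm, which combines a TED algorithm with heuristics; the helper names tree, forest_distance, and tree_edit_distance are made up for this example, and the memoized recursion is only practical for small trees.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) pair."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Match the two rightmost roots; relabeling costs 1 if labels differ.
    match = (forest_distance(f_kids, g_kids)
             + forest_distance(f[:-1], g[:-1])
             + (f_label != g_label))
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))


old = tree("call", tree("f"), tree("x"))
new = tree("call", tree("f"), tree("y"))
print(tree_edit_distance(old, new))  # 1: relabel x to y
```

Even this small recurrence touches many pairs of subforests, which hints at why running an exact TED computation over whole real-world ASTs is costly and why an approximating, divide-and-conquer strategy is attractive.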


You can see the results of comparing some pairs of source files taken from samples here.

Quick start

You can try Diff/AST right away using Docker and a ready-made container image.

$ docker pull codinuum/cca

The following command line runs Diff/AST within a container to compare sample Java programs and then saves the results in the results directory on the host.

$ ./ diffast -c results samples/java/0/ samples/java/1/

Once you have built DiffViewer, you can inspect the AST differences in a viewer window. See diffviewer/ for details.

$ diffviewer/ -c results samples/java/0/ samples/java/1/

You can run both Diff/AST and DiffViewer with the following command line.

$ ./ diffast -c results --view samples/java/0/ samples/java/1/

Installing parsers and Diff/AST



The following will install parsesrc and diffast.

$ opam install cca

Building parsers and Diff/AST

You can also build the parsers and Diff/AST from source. The build requires the following:

  • GNU make
  • OCaml (>=4.11.1)
  • OPAM (for installing camlzip, cryptokit, csv, git-unix, menhir, ocamlnet, pxp, ulex, uuidm, and volt)


The following commands create ast/analyzing/bin/{parsesrc.opt,diffast.opt}.

$ cd src
$ make

Run them through the shell scripts ast/analyzing/bin/{parsesrc,diffast}, which set some environment variables before invoking the binaries.

Using with Git

If you have built Diff/AST, you can use it with Git. Add the following lines to your .gitconfig. Note that PATH_TO_THIS_REPO should be replaced by your local path to this repository.

[diff]
    tool = diffast
[difftool]
    prompt = false
[difftool "diffast"]
    cmd = PATH_TO_THIS_REPO/git_ext_diff "$LOCAL" "$REMOTE"
[alias]
    diffast = difftool

Then you should be able to use git diffast like git diff. You will be prompted to launch diffast for each source file comparison. Other file comparisons will be ignored.

Building docker image

The following command line creates a docker image named cca. In the image, the framework is installed at /opt/cca.

$ docker build -t cca .


Apache License, Version 2.0