GDeltaTop

NedLetcher edited this page Feb 4, 2016 · 8 revisions

gDelta

gDelta is a tool that aims to provide more immediate feedback on the impact of changes made to DELPH-IN grammars by comparing parser output from two different states of a grammar. It can be thought of as functioning similarly to a diff tool: it compares two versions of the same grammar, but rather than comparing the source code, it compares the parser output produced by both versions when run over the same test suites.

gDelta uses an attribute weighting algorithm to highlight grammar components (currently rule names and lexical types) whose distribution in the parser output has been strongly affected by modifications to the grammar. It also clusters profile items in order to locate related groups of changes. These two techniques are used to build an HTML interface which can be viewed offline.
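To make the idea of comparing component distributions concrete, here is a minimal sketch. It is not gDelta's attribute weighting algorithm; it substitutes a plain difference in relative frequency, and it assumes each parse has already been reduced to a list of rule names. The function names and the rule names in the sample data are hypothetical.

```python
from collections import Counter


def rule_distribution(derivations):
    """Relative frequency of each rule name across a list of derivations.

    Assumption: each derivation is a plain list of rule names.
    """
    counts = Counter(name for derivation in derivations for name in derivation)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def rank_changes(old_derivations, new_derivations):
    """Rank rule names by how much their relative frequency changed.

    This is a simple stand-in for attribute weighting, not gDelta's method.
    """
    old = rule_distribution(old_derivations)
    new = rule_distribution(new_derivations)
    names = set(old) | set(new)
    deltas = {name: new.get(name, 0.0) - old.get(name, 0.0) for name in names}
    # Largest absolute change first; ties are broken alphabetically.
    return sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Hypothetical parser output from two grammar versions over the same two items.
old_run = [["subj-head", "head-comp"], ["subj-head", "head-adj"]]
new_run = [["subj-head", "head-comp"], ["subj-head", "head-comp"]]

ranked = rank_changes(old_run, new_run)
for name, delta in ranked:
    print(f"{name}: {delta:+.2f}")
```

Rules whose share of the output shifts the most appear at the top of the ranking, which is roughly the kind of signal a grammar engineer would want surfaced first.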

By providing a high-level picture of the impact that modifications to the grammar have had, the hope is that grammar engineers can use gDelta to more readily check whether anything unexpected has happened, and to confirm that desired changes have taken effect, earlier rather than later in the grammar development cycle.

Other applications of gDelta that have been suggested but are as yet unexplored:

  • Tracking uncaught regressions by comparing successive versions of grammars.
  • Exploring linguistic phenomena by investigating the impact of systematically switching off types.
  • Grammar documentation.

gDelta was created by NedLetcher, TimBaldwin and RebeccaDridan.

The git repository for gDelta can be obtained here.

How to get it

gDelta is freely available under the MIT License. You can get the latest version as follows:

gDelta is intended to work for all DELPH-IN grammars. If your grammar is not working or you are having any difficulties running gDelta, contact NedLetcher.

Installing gDelta

See the README.txt file for installation notes.

Sample Output

Using gDelta to compare the gold profiles from the 1010 and the 1111 releases:

Using gDelta to investigate attempted changes to the ERG that resulted in breakages (using WeScience profiles):

Using gDelta to investigate the impact of changes made to Jacy from its development history (using profiles from the Tanaka corpus):