Kernighan's Awk a.k.a. "One True Awk"
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
test-results Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
tests Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
.gitignore Add .gitignore Jul 20, 2016
FIXES A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
Makefile Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
README A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
README.md Link to coverage results Jul 24, 2016
awk.1 A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
awk.h A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
awkgram.y A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
b.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
env.sh Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
lex.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
lib.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
main.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
maketab.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
parse.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
proto.h A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
run.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016
run.sh Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
test.sh Automate code coverage and publish HTML reports. Update docs. Jul 23, 2016
tran.c A raw copy of the tarball, except for ytab*.bak files, which were Jun 12, 2016

README.md

This is Kernighan's awk, a.k.a. "One True Awk", with the source code and test suite combined. It includes some fairly extensive cleanups and enhancements related to build and test automation.

Original source: https://www.cs.princeton.edu/~bwk/btl.mirror/ (as of 2016/7/20, the latest update is from 2012/12/20)

Another mirror, without the build and test changes: https://github.com/onetrueawk/awk

Instructions

  1. Download Clang binaries from http://llvm.org/releases/

  2. Modify CLANG_DIR in env.sh as appropriate. (Notice I'm using Clang 3.8 on Ubuntu 14.04.)

Then source env.sh.

  1. Run make all. You should get a bunch of bwk-* executables, which are build variants with runtime instrumentation added by LLVM sanitizers (Address Sanitizer, etc.)

  2. Select one of the 5 test suites (see below), and run it against an instrumented binary. For example:

    $ ./test.sh suite misc asan

This runs the misc suite against an ASAN-instrumented binary. I currently get one heap-use-after-free error (see test-results/asan.log).

This ASAN error can reproduced without the test suite:

$ source env.sh
$ ./bwk-asan -f tests/t.split2a </dev/null

Details

test.sh has a few entry points. They correspond to what was documented in tests/README.TESTS and automated with tests/REGRESS.

These tests have golden output:

  • golden (T.*)
    • This includes T.beebe which runs the scripts in tests/beebe ("tests from Nelson Beebe from gawk test suite")

These tests require another awk binary, e.g. a system awk or previous bwk binary:

  • misc (t.*) -- "random sampling of awk constructions collected over the years"
  • lilly (lilly.*) -- "miscellaneous RE tests from Bruce Lilly"
  • book (p.*) -- tests from "The AWK Programming Environment" book
  • perf (tt.**) -- performance tests

Coverage Results

See test-results/coverage.

Summary of Changes

  • Minor fixes to run on Linux/Ubuntu (e.g. yacc/bison invocation; don't assume '.' is in $PATH). Sorry Windows support was probably broken in the process.
  • Added build variants for LLVM sanitizers (make all). Removed generated files from the repo.
  • Rewrote and cleaned up the test suite, allowing it to run against arbitrary build variants. There were also some cleanups like expanding the "beebe.tar" file into tests/beebe, instead of untarring it on every test run.

The last two changes are can be seen in commit 759a20da421e4a049c7a22e4e9e791e5b82e8b94

The actual source code hasn't changed. Running the tests against instrumented binaries HAS revealed some bugs, which could be fixed. But I haven't fixed them, since I lost the original motivation for hacking on this codebase (seeing if Awk could be turned into a more modern language.)

Notes on Regex Implementation / Algorithm

The file b.c is Awk's regular expression engine. It implements the "McNaughton-Yamada-Thompson algorithm to convert a regular expression to an NFA", algorithm 3.23 in the Dragon Book, as mentioned in the footnotes here:

https://swtch.com/~rsc/regexp/regexp2.html

I see the makedfa() function, which is derived from combinations of NFA states (fatab). This is a guaranteed linear-time algorithm, is in contrast to exponential backtracking algorithm used by Perl/Python/etc.:

https://swtch.com/~rsc/regexp/regexp1.html

TODO

  • Automate the cross product of (test suite, sanitizer) ?