
aho-corasick-lua

C++ and Lua implementation of the Aho-Corasick (AC) string-matching algorithm (http://dl.acm.org/citation.cfm?id=360855).

We began with a pure Lua implementation and found that its performance was not satisfactory, so we switched to a C/C++ implementation.

There are two shared objects provided by this package: libac.so and ahocorasick.so. The former is a regular shared object which can be used directly by a C/C++ application, or by Lua via FFI; the latter is a Lua module. An example usage is shown below:

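-- Load the Lua module (ahocorasick.so).
local ac = require "ahocorasick"
local dict = {"string1", "string", "etc"}
-- Build a matcher instance from the dictionary.
local acinst = ac.create(dict)
-- Search the input for any string in the dictionary.
local r = ac.match(acinst, "mystring")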

For efficiency reasons, the implementation differs slightly from the standard AC algorithm: instead of returning the set of all dictionary strings that match the given string, it returns only one of them when the string matches. The functionality of our implementation can be (precisely) described by the following pseudo-C snippet.

string foo(input-string, dictionary) {
    string ret = the-end-of-input-string;
    for each string s in dictionary {
        // Find the first occurrence of s in the input (a "not found"
        // result is treated as the-end-of-input-string).
        ret = min(ret, strstr(input-string, s));
    }
    return ret;
}
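
The pseudo-C above can be turned into a small runnable reference, shown here as a minimal sketch in plain C++. It only mirrors the described semantics with a naive scan; it is not the automaton used by this package. The function name `first_match_offset` and the choice to skip empty dictionary entries are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reference semantics only (hypothetical helper, not part of this package):
// returns the offset of the earliest occurrence of any dictionary string in
// `input`, or std::string::npos if no dictionary string occurs.
std::size_t first_match_offset(const std::string& input,
                               const std::vector<std::string>& dictionary) {
    // npos is the largest std::size_t value, so it plays the role of
    // "the-end-of-input-string" in the pseudo-code.
    std::size_t best = std::string::npos;
    for (const std::string& s : dictionary) {
        if (s.empty()) continue;            // assumption: ignore empty patterns
        std::size_t pos = input.find(s);    // npos when s does not occur
        if (pos < best) best = pos;         // keep the minimum offset
    }
    return best;
}
```

With the dictionary from the Lua example, `first_match_offset("mystring", {"string1", "string", "etc"})` yields offset 2, where "string" begins. A naive scan like this costs one search per dictionary entry, which is the work an AC automaton avoids by scanning the input once.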

It's pretty easy to get rid of this limitation: just associate each state with a sparse bit-vector depicting the set of strings recognized by that state.
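
The idea of attaching a set of recognized strings to each state can be sketched with a textbook Aho-Corasick automaton. This is an illustration under stated assumptions, not code from this package: the `MultiMatcher` type and its `match_all` method are hypothetical, a plain `std::vector<int>` of pattern ids stands in for the bit-vector mentioned above, and each state uses a dense 256-entry transition table for simplicity rather than efficiency.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: every state carries the ids of all patterns it recognizes.
struct MultiMatcher {
    struct State {
        std::array<int, 256> next;  // transitions, -1 = not yet defined
        int fail = 0;               // failure link
        std::vector<int> out;       // ids of patterns recognized at this state
        State() { next.fill(-1); }
    };
    std::vector<State> states;

    explicit MultiMatcher(const std::vector<std::string>& dict) : states(1) {
        // 1. Build the trie; record each pattern id at its final state.
        for (std::size_t id = 0; id < dict.size(); ++id) {
            if (dict[id].empty()) continue;  // assumption: skip empty patterns
            int cur = 0;
            for (unsigned char c : dict[id]) {
                if (states[cur].next[c] < 0) {
                    states[cur].next[c] = static_cast<int>(states.size());
                    states.emplace_back();
                }
                cur = states[cur].next[c];
            }
            states[cur].out.push_back(static_cast<int>(id));
        }
        // 2. Breadth-first pass: set failure links, fill in missing
        //    transitions, and merge in the output set of the failure state.
        std::queue<int> bfs;
        for (int c = 0; c < 256; ++c) {
            int v = states[0].next[c];
            if (v < 0) states[0].next[c] = 0;
            else bfs.push(v);  // depth-1 states keep fail = 0
        }
        while (!bfs.empty()) {
            int u = bfs.front();
            bfs.pop();
            for (int c = 0; c < 256; ++c) {
                int v = states[u].next[c];
                int via_fail = states[states[u].fail].next[c];
                if (v < 0) { states[u].next[c] = via_fail; continue; }
                states[v].fail = via_fail;
                const std::vector<int>& inherited = states[via_fail].out;
                states[v].out.insert(states[v].out.end(),
                                     inherited.begin(), inherited.end());
                bfs.push(v);
            }
        }
    }

    // Returns every (end_offset, pattern_id) pair, where end_offset is the
    // index of the last byte of the occurrence in `text`.
    std::vector<std::pair<std::size_t, int>> match_all(const std::string& text) const {
        std::vector<std::pair<std::size_t, int>> hits;
        int cur = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            cur = states[cur].next[static_cast<unsigned char>(text[i])];
            for (int id : states[cur].out) hits.emplace_back(i, id);
        }
        return hits;
    }
};
```

For the dictionary {"he", "she", "hers"} and the input "ushers", `match_all` reports "she" and "he" ending at offset 3 and "hers" ending at offset 5. A bit-vector per state would encode the same sets more compactly when the dictionary is large and most states recognize few strings.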
