Subsequential Finite State Transducer

Given an input text, the transducer produces a new text by applying a fixed set of rewrite rules. The algorithm builds a minimal subsequential transducer and uses the "leftmost largest match" replacement strategy with skips. Scanning left to right, the longest rule input that starts at the leftmost possible position is replaced, and scanning resumes right after it. Characters not covered by any match are copied unchanged, so the replaced parts never overlap. Computing the transducer takes time linear in the size of the input dictionary. Rewriting a text t of length |t| is also linear: it takes O(|t| + |t'|) time, where t' denotes the resulting output string.
Check out the Online Sandbox.
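The following self-contained sketch shows the replacement semantics concretely by reproducing leftmost-largest matching with a plain trie. It is an illustration under stated assumptions, not how ssfst works internally. The helpers buildTrie and rewrite are hypothetical. The naive scan also costs O(|t| · k) in the worst case, where k is the length of the longest rule input, rather than the linear bound the subsequential transducer guarantees.

```javascript
// Illustrative sketch only: a trie-based rewriter with "leftmost largest match"
// semantics. This is NOT the minimal subsequential transducer built by ssfst;
// its worst-case cost is O(|t| * k), k = length of the longest rule input.

// Builds a character trie; nodes that end a rule input store its output.
function buildTrie(entries) {
  const root = { children: new Map(), output: null };
  for (const { input, output } of entries) {
    let node = root;
    // Index by UTF-16 code unit so insertion matches how rewrite() reads text.
    for (let k = 0; k < input.length; k++) {
      const ch = input[k];
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), output: null });
      }
      node = node.children.get(ch);
    }
    node.output = output;
  }
  return root;
}

// Rewrites text left to right, replacing the longest match at each position.
function rewrite(trie, text) {
  let result = '';
  let i = 0;
  while (i < text.length) {
    // Walk the trie from position i, remembering the longest match seen so far.
    let node = trie;
    let matchEnd = -1;
    let matchOutput = null;
    for (let j = i; j < text.length; j++) {
      node = node.children.get(text[j]);
      if (node === undefined) break;
      if (node.output !== null) {
        matchEnd = j + 1;
        matchOutput = node.output;
      }
    }
    if (matchEnd === -1) {
      // No rule starts here: copy one character unchanged (a "skip").
      result += text[i];
      i += 1;
    } else {
      // Emit the replacement and resume after the match, so matches never overlap.
      result += matchOutput;
      i = matchEnd;
    }
  }
  return result;
}
```

With the rules a → 1, ab → 2 and bc → 3, the text "abc" becomes "2c": ab wins over bc because it starts further left, and over a because it is longer.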

Usage

npm i --save ssfst

Example: Text Rewriting

const ssfst = require('ssfst');

const spellingCorrector = ssfst.init([
    { input: 'acheive', output: 'achieve' },
    { input: 'arguement', output: 'argument' },
    { input: 'independant', output: 'independent' },
    { input: 'posession', output: 'possession' },
    { input: 'mercy less', output: 'merciless' }
]);

spellingCorrector.process('independant'); // => "independent"
spellingCorrector.process('mercy less arguement'); // => "merciless argument"
spellingCorrector.process('they acheived a lot'); // => "they achieved a lot"

The init factory function takes a collection of { input, output } pairs and returns a transducer. The collection can be any iterable object, for example a generator:

function* dictGen() {
    yield { input: 'dog', output: '<a href="https://en.wikipedia.org/wiki/Dog">dog</a>' };
    yield { input: 'fox', output: '<a href="https://en.wikipedia.org/wiki/Fox">fox</a>' };
}

const transducer = ssfst.init(dictGen());
transducer.process('The quick brown fox jumped over the lazy dog.');
/* => The quick brown <a href="https://en.wikipedia.org/wiki/Fox">fox</a> jumped over the lazy <a href="https://en.wikipedia.org/wiki/Dog">dog</a>. */
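Because init accepts any iterable, existing lookup tables can be adapted with a small generator. The sketch below assumes only what is stated above, namely that init consumes an iterable of { input, output } objects. toRewriteEntries is a hypothetical helper, not part of ssfst.

```javascript
// Hypothetical helper (not part of ssfst): adapts a Map or a plain object
// into the iterable of { input, output } pairs that init expects.
function* toRewriteEntries(source) {
  // Both Map.prototype.entries() and Object.entries() produce [key, value] pairs.
  const pairs = source instanceof Map ? source.entries() : Object.entries(source);
  for (const [input, output] of pairs) {
    yield { input, output };
  }
}
```

A transducer could then be built with ssfst.init(toRewriteEntries(someMap)). Since the pairs are produced lazily, no intermediate array is allocated.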

Working with large datasets

Loading the entire rewrite dictionary into memory before building the transducer is wasteful when working with large datasets. In this case we can build the transducer by adding the entries asynchronously, one at a time, from an async iterable.

For example, if our dataset is stored in a file, we can read its contents one line at a time.

Berlin,Germany
Buenos Aires,Argentina
London,United Kingdom
Sofia,Bulgaria
Tokyo,Japan

This is the dictionary text file, capitals.txt. Each line contains one entry, with its input and output values separated by a comma. We implement an async generator function that reads the file line by line and yields an { input, output } object for each entry, which is consumed during the initialization of the transducer.

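const fs = require('fs');
const path = require('path');
const readline = require('readline');
const ssfst = require('ssfst');

// Reads capitals.txt line by line and yields one { input, output } entry per line.
async function* readLinesGenAsync() {
    const lineReader = readline.createInterface({
        input: fs.createReadStream(path.join(__dirname, 'capitals.txt')),
        crlfDelay: Infinity // treat every '\r\n' as a single line break
    });

    for await (const line of lineReader) {
        const [input, output] = line.split(',');
        yield { input, output };
    }
}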
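The destructuring of line.split(',') keeps only the first two comma-separated fields. An output that itself contains a comma would therefore be silently truncated, and a blank line would produce an entry whose output is undefined. A slightly more defensive parser is sketched below. parseEntry and entriesFromLines are hypothetical helpers, and skipping malformed lines (rather than throwing) is an assumed policy.

```javascript
// Hypothetical helper: parses one "input,output" line of the dictionary file.
// Splitting at the first comma only keeps outputs that contain commas intact.
// Blank or comma-less lines return null so the caller can skip them.
function parseEntry(line) {
  if (line.trim() === '') return null;
  const comma = line.indexOf(',');
  if (comma === -1) return null;
  return { input: line.slice(0, comma), output: line.slice(comma + 1) };
}

// Async generator over any (async or sync) iterable of lines,
// e.g. the readline interface created in readLinesGenAsync.
async function* entriesFromLines(lines) {
  for await (const line of lines) {
    const entry = parseEntry(line);
    if (entry !== null) yield entry;
  }
}
```

With this helper, the transducer could be built with ssfst.initAsync(entriesFromLines(lineReader)) instead of inlining the split in the generator.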

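We pass the async iterable to the initAsync factory function. Since await is only valid inside an async function or at the top level of an ES module, in a CommonJS script like the one above the call has to be placed inside an async function.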

const transducer = await ssfst.initAsync(readLinesGenAsync());

Example: Key-Value Store

The subsequential transducer can also be used as an efficient key-value store. For example, with the capitals transducer built above:

const val = transducer.process('Sofia'); // => Bulgaria
const invalid = transducer.process('Unknown Key'); // => Unknown Key

If there is no value for a given key, process returns the key itself: the lookup simply reduces to processing a text to which no rewrite rule applies.
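A missing key comes back unchanged, so a caller that needs to tell hits from misses can compare the result with the key. The sketch below relies only on the process behaviour shown above. makeLookup is a hypothetical helper, and the Map-based stand-in replaces a real transducer so that the block runs on its own. Two caveats follow from the rewriting semantics. First, a key that maps to itself is indistinguishable from a miss. Second, process rewrites every dictionary input occurring inside its argument (as with "acheived" earlier), so a string that merely contains a key, such as "Sofia Airport", would presumably come back partially rewritten and be reported as a hit.

```javascript
// Hypothetical helper (not part of ssfst): wraps a rewrite function such as
// (text) => transducer.process(text) so that a missing key yields undefined
// instead of the key. Caveat: a key whose value equals the key itself also
// yields undefined.
function makeLookup(rewrite) {
  return (key) => {
    const result = rewrite(key);
    return result === key ? undefined : result;
  };
}

// Stand-in for the capitals transducer: it only handles exact keys,
// unlike a real transducer, which also rewrites keys inside longer text.
const capitals = new Map([['Sofia', 'Bulgaria'], ['Tokyo', 'Japan']]);
const lookup = makeLookup((text) => capitals.get(text) ?? text);
```

With a real transducer, the wrapper would be created as makeLookup((text) => transducer.process(text)).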

Use with TypeScript

import * as ssfst from 'ssfst';

Run Locally

git clone https://github.com/deniskyashif/ssfst.git
cd ssfst
npm i

Sample implementations can be found in examples/.

Run the Tests

npm t

References

This implementation follows the construction presented in "Efficient Dictionary-Based Text Rewriting using Subsequential Transducers" by S. Mihov and K. Schulz.
