Skip to content

cceh/super-collator

Repository files navigation

Super Collator

py39 py310 py311 py312 pypy39 coverage

Collates textual sources with relaxed spelling. Uses Gotoh's variant of the Needleman-Wunsch sequence alignment algorithm.

$ pip install super-collator
>>> from super_collator.aligner import Aligner
>>> from super_collator.ngrams import NGrams
>>> from super_collator.super_collator import to_table

>>> aligner = Aligner(-0.5, -0.5, -0.5)
>>> a = "Lorem ipsum dollar amat adipiscing elit"
>>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
>>>
>>> a = [NGrams(s).load(s, 3) for s in a.split()]
>>> b = [NGrams(s).load(s, 3) for s in b.split()]
>>>
>>> a, b, score = aligner.align(a, b, NGrams.similarity, lambda: NGrams("-"))
>>> print(to_table(list(map(str, a)), list(map(str, b))))  # doctest: +NORMALIZE_WHITESPACE
-   Lorem   ipsum -    dollar -   amat -           adipiscing elit
qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit

Documentation: https://cceh.github.io/super-collator/

PyPi: https://pypi.org/project/super-collator/

About

Collate textual sources with relaxed spelling.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published