Abydos NLP/IR library for Python

Abydos

Abydos NLP/IR library
Copyright 2014-2018 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters including:

  • Phonetic algorithms
    • Robert C. Russell's Index
    • American Soundex
    • Refined Soundex
    • Daitch-Mokotoff Soundex
    • Kölner Phonetik
    • NYSIIS
    • Match Rating Algorithm
    • Metaphone
    • Double Metaphone
    • Caverphone
    • Alpha Search Inquiry System
    • Fuzzy Soundex
    • Phonex
    • Phonem
    • Phonix
    • SfinxBis
    • phonet
    • Standardized Phonetic Frequency Code
    • Statistics Canada
    • Lein
    • Roger Root
    • Oxford Name Compression Algorithm (ONCA)
    • Eudex phonetic hash
    • Haase Phonetik
    • Reth-Schek Phonetik
    • FONEM
    • Parmar-Kumbharana
    • Davidson's Consonant Code
    • SoundD
    • PSHP Soundex/Viewex Coding
    • an early version of Henry Code
    • Norphone
    • Dolby Code
    • Phonetic Spanish
    • Spanish Metaphone
    • MetaSoundex
    • SoundexBR
    • NRL English-to-phoneme
    • Beider-Morse Phonetic Matching
  • String distance metrics
    • Levenshtein distance
    • Optimal String Alignment distance
    • Damerau-Levenshtein distance
    • Hamming distance
    • Tversky index
    • Sørensen–Dice coefficient & distance
    • Jaccard similarity coefficient & distance
    • overlap similarity & distance
    • Tanimoto coefficient & distance
    • Minkowski distance & similarity
    • Manhattan distance & similarity
    • Euclidean distance & similarity
    • Chebyshev distance
    • cosine similarity & distance
    • Jaro distance
    • Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
    • Longest common substring
    • Ratcliff-Obershelp similarity & distance
    • Match Rating Algorithm similarity
    • Normalized Compression Distance (NCD) & similarity
    • Monge-Elkan similarity & distance
    • Matrix similarity
    • Needleman-Wunsch score
    • Smith-Waterman score
    • Gotoh score
    • Length similarity
    • Prefix, Suffix, and Identity similarity & distance
    • Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
    • Bag distance
    • Editex distance
    • Eudex distances
    • Sift4 distance
    • Baystat distance & similarity
    • Typo distance
    • Indel distance
    • Synoname
  • Stemmers
    • the Lovins stemmer
    • the Porter and Porter2 (Snowball English) stemmers
    • Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    • CLEF German, German plus, and Swedish stemmers
    • Caumanns' German stemmer
    • UEA-Lite Stemmer
    • Paice-Husk Stemmer
    • Schinke Latin stemmer
    • S stemmer
  • String Fingerprints
    • string fingerprint
    • q-gram fingerprint
    • phonetic fingerprint
    • Pollock & Zamora's skeleton key
    • Pollock & Zamora's omission key
    • Cisłak & Grabowski's occurrence fingerprint
    • Cisłak & Grabowski's occurrence halved fingerprint
    • Cisłak & Grabowski's count fingerprint
    • Cisłak & Grabowski's position fingerprint
    • Synoname Toolcode
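
To give a feel for two of the categories listed above, here is a minimal, self-contained sketch of a unit-cost Levenshtein distance and a simple string fingerprint. It is not Abydos's implementation and does not use the Abydos API: the function names levenshtein and string_fingerprint are hypothetical, and the fingerprint recipe (lowercase, strip punctuation, sort the unique tokens) is an assumption modelled on common clustering-key approaches. The sketch assumes Python 3.

```python
import string


def levenshtein(src, tar):
    """Return the unit-cost Levenshtein edit distance between two strings."""
    # At the start of iteration i, previous[j] holds the distance between
    # the first i-1 characters of src and the first j characters of tar.
    previous = list(range(len(tar) + 1))
    for i, s_char in enumerate(src, start=1):
        current = [i]
        for j, t_char in enumerate(tar, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def string_fingerprint(phrase):
    """Return a key of lowercase, punctuation-free, sorted unique tokens."""
    # Hypothetical recipe for illustration only (see the note above).
    table = str.maketrans('', '', string.punctuation)
    tokens = phrase.lower().translate(table).split()
    return ' '.join(sorted(set(tokens)))
```

Two strings that differ only in case, punctuation, word order, or repeated words map to the same fingerprint, which is what makes such keys useful for grouping near-duplicate records before applying a costlier distance measure.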

Installation

Required libraries:

  • Numpy
  • Six

Recommended libraries:

  • PylibLZMA (Python 2 only; used for the LZMA compression string distance metric)

To install Abydos (master) from GitHub source:

git clone https://github.com/chrislit/abydos.git --recursive
cd abydos
python setup.py install

If your default python command calls Python 2.7 but you want to install for Python 3, you may instead need to call:

python3 setup.py install

To install Abydos (latest release) from PyPI using pip:

pip install abydos

To install from conda-forge:

conda install -c conda-forge abydos

It should run on Python 2.7 and Python 3.3-3.7.
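
Before installing, it can help to check the interpreter version and whether the required libraries are importable. The following is a rough sketch of such a check, not part of Abydos: the helper names python_supported and missing_modules are hypothetical, the version range is taken from the sentence above, and the import names numpy and six are assumed to match the required libraries. Because it relies on importlib.util.find_spec, the sketch itself needs Python 3.4 or later.

```python
import importlib.util
import sys


def python_supported(version_info=sys.version_info):
    """Return True for Python 2.7 or 3.3 through 3.7, the range stated above."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (2, 7) or (major == 3 and 3 <= minor <= 7)


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    print('Supported interpreter:', python_supported())
    # 'numpy' and 'six' are assumed to be the import names of the
    # required libraries listed under Installation.
    print('Missing required modules:', missing_modules(['numpy', 'six']))
```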

Testing & Contributing

To run the whole test suite, just call tox:

tox

The tox setup has the following environments: py27, py36, doctest, py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8, badges, docs, py27-fuzz, & py36-fuzz. So if you only want to generate documentation (in HTML, EPUB, & PDF formats), just call:

tox -e docs

To run only Flake8 and generate its reports, call:

tox -e flake8
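
Several environments can be run in one invocation by passing a comma-separated list to tox's -e option. As a small illustration, the sketch below builds and runs such a command from Python; the helper names tox_command and run_tox are hypothetical and not part of the Abydos repository, and the sketch assumes tox is installed and on the PATH.

```python
import subprocess


def tox_command(environments):
    """Build the argument list for running the given tox environments."""
    if not environments:
        # Without -e, tox runs its default environment list.
        return ['tox']
    return ['tox', '-e', ','.join(environments)]


def run_tox(environments):
    """Run tox for the given environments and return its exit code."""
    return subprocess.call(tox_command(environments))
```

For example, run_tox(['docs', 'flake8']) would be equivalent to calling tox -e docs,flake8 from the repository root.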

Contributions such as bug reports, PRs, suggestions, and feature requests are welcome through GitHub Issues & Pull Requests.