Commit

Merge f5f96d5 into 165466b
chrislit committed May 29, 2019
2 parents 165466b + f5f96d5 commit d85e9e5
Showing 718 changed files with 93,196 additions and 5,987 deletions.
1 change: 1 addition & 0 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
@@ -34,6 +34,7 @@ jobs:
. venv/bin/activate
echo "tox" >> requirements.txt
pip install -r requirements.txt
pip install -r requirements-dev.txt
- save_cache:
paths:
3 changes: 1 addition & 2 deletions .gitignore
@@ -4,12 +4,11 @@ dist/
flake8/
abydos.egg-info/
.settings/
*.bak
*.pyc
*.log
.coverage
.idea/
.tox/
binder/.ipynb_checkpoints
binder/Untitled*
docs/abydos.bib.bak

8 changes: 4 additions & 4 deletions .travis.yml
@@ -2,7 +2,6 @@ language: python

python:
- 2.7
- 3.3
- 3.4
- 3.5
- 3.6
@@ -12,17 +11,18 @@ matrix:
- python: 3.7
dist: xenial
sudo: true
- python: 3.8-dev
dist: xenial
sudo: true

notifications:
email: false

# Install packages
install:
- if [[ $TRAVIS_PYTHON_VERSION == 2* ]]; then travis_retry pip install pyliblzma; fi
- case "$TRAVIS_PYTHON_VERSION" in 3.5|3.6|3.7) travis_retry pip install cython;; esac
- case "$TRAVIS_PYTHON_VERSION" in 3.5|3.6|3.7) travis_retry pip install paq lzss;; esac
- travis_retry pip install coveralls
- travis_retry pip install -r requirements.txt
- travis_retry pip install -r requirements-dev.txt
- travis_retry python setup.py install

# Run test
33 changes: 28 additions & 5 deletions CODING_STANDARDS.rst
@@ -1,11 +1,13 @@
CODING STANDARDS
----------------

- nosetest will be used for testing
- flake8 will be used for best practice conformance
- pydocstyle will be used to ensure documentation style conformance to PEP257
(for the most part) and NumPy documentation style
- black will be used to keep code style consistent
- Nosetest will be used for testing.
- Flake8 will be used for best practice conformance.
- Pydocstyle will be used to ensure documentation style conformance to PEP257
(for the most part) and NumPy documentation style.
- Black will be used to keep code style consistent.
- 3rd party packages may be used, but must be present in both PyPI and conda
or conda-forge. They must also support all supported Python versions.

----

@@ -28,3 +30,24 @@ A git push should be performed only under the following conditions:
- test coverage is 100% according to nosetests
- flake8 and pydocstyle should report 0 issues
- black code styling has been applied


Notes on architecture
~~~~~~~~~~~~~~~~~~~~~

As of the 0.3.6 release, each major algorithm of the compression, distance,
fingerprint, phonetic, & stemmer subpackages has been moved into a class of its
own. The distance, fingerprint, phonetic, & stemmer classes each inherit from a
common base class for their type, which defines the basic methods shared by
classes of that type.

The old functional API for these subpackages has been retained for backwards
compatibility until the release of version 0.6, but its use is deprecated as
of version 0.4. New classes (those not present at the release of version 0.3.6)
will not be given functional API wrappers.

Although, as of the 0.3.6 release, many of the classes that encapsulate
pre-0.3.6 functions simply consist of a single method that could be a static
method, making these methods static is generally avoided. As development
continues, these classes will take more advantage of object architecture to
store parameters between calls and inherit from base classes.
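
The architecture described in these notes can be sketched roughly as follows.
This is a minimal illustration, not the project's actual code: the names
``_Distance``, ``Hamming``, ``hamming``, and ``diff_lens`` are assumptions
chosen for the example, and the standard-library ``warnings`` module stands in
for whatever deprecation mechanism the project really uses.

```python
import warnings


class _Distance:
    """Hypothetical base class holding methods shared by distance classes."""

    def dist_abs(self, src, tar):
        """Return the absolute distance; subclasses must implement this."""
        raise NotImplementedError

    def dist(self, src, tar):
        """Return the distance normalized to the range [0, 1]."""
        if src == tar:
            return 0.0
        # src != tar here, so at least one string is non-empty.
        longest = max(len(src), len(tar))
        return self.dist_abs(src, tar) / longest


class Hamming(_Distance):
    """Hamming distance; the parameter is stored on the instance."""

    def __init__(self, diff_lens=True):
        # Illustrative parameter: allow strings of different lengths.
        self._diff_lens = diff_lens

    def dist_abs(self, src, tar):
        if not self._diff_lens and len(src) != len(tar):
            raise ValueError("strings must be of equal length")
        # Count mismatched positions, then add the length difference.
        mismatches = sum(a != b for a, b in zip(src, tar))
        return mismatches + abs(len(src) - len(tar))


def hamming(src, tar, diff_lens=True):
    """Deprecated functional wrapper kept only for backwards compatibility."""
    warnings.warn(
        "hamming() is deprecated; use Hamming().dist_abs() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return Hamming(diff_lens).dist_abs(src, tar)
```

Keeping ``dist_abs`` an instance method rather than a static method is what
lets a parameter such as ``diff_lens`` be set once in the constructor and
reused across calls, while the wrapper function only forwards to the class.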
192 changes: 192 additions & 0 deletions HISTORY.rst
@@ -2,6 +2,197 @@ Release History
---------------


0.4.0 (2018-05-30) *dietrich*
+++++++++++++++++++++++++++++

doi:

Version 0.4.0 focuses on distance measures, adding 211 new measures. Attempts
were made to provide normalized versions of measures that did not inherently
range from 0 to 1. The other major focus was the addition of 12 tokenizers, in
service of expanding distance measure options.

Changes:

- Support for Python 3.3 was dropped.
- Deprecated functions that merely wrap class methods to maintain API
compatibility, for removal in 0.6.0
- Added methods to ConfusionTable to return:
- its internal representation
- false negative rate
- false omission rate
- positive & negative likelihood ratios
- diagnostic odds ratio
- error rate
- prevalence
- Jaccard index
- D-measure
- Phi coefficient
- joint, actual, & predicted entropies
- mutual information
- proficiency (uncertainty coefficient)
- information gain ratio
- dependency
- lift
- Deprecated f-measure & g-measure from ConfusionTable for removal in
0.6.0
- Added notes to indicate when functions, classes, & methods were added
- Added the following 12 tokenizers:
- QSkipgrams
- CharacterTokenizer
- RegexpTokenizer, WhitespaceTokenizer, & WordpunctTokenizer
- COrVClusterTokenizer, CVClusterTokenizer, & VCClusterTokenizer
- SonoriPyTokenizer & LegaliPyTokenizer
- NLTKTokenizer
- SAPSTokenizer
- Added the UnigramCorpus class & a facility for downloading data, such as
pre-processed/trained data, from storage on GitHub
- Added the Wåhlin phonetic encoding
- Added the following 211 similarity/distance/correlation measures:
- ALINE
- AMPLE
- Anderberg
- Andres & Marzo's Delta
- Average Linkage
- AZZOO
- Baroni-Urbani & Buser I & II
- Batagelj & Bren
- Baulieu I-XV
- Benini I & II
- Bennet
- Bhattacharyya
- BI-SIM
- BLEU
- Block Levenshtein
- Brainerd-Robinson
- Braun-Blanquet
- Canberra
- Chord
- Clement
- Cohen's Kappa
- Cole
- Complete Linkage
- Consonni & Todeschini I-V
- Cormode's LZ
- Covington
- Dennis
- Dice Asymmetric I & II
- Digby
- Dispersion
- Doolittle
- Dunning
- Eyraud
- Fager & McGowan
- Faith
- Fellegi-Sunter
- Fidelity
- Fleiss
- Fleiss-Levin-Paik
- FlexMetric
- Forbes I & II
- Fossum
- FuzzyWuzzy Partial String
- FuzzyWuzzy Token Set
- FuzzyWuzzy Token Sort
- Generalized Fleiss
- Gilbert
- Gilbert & Wells
- Gini I & II
- Goodall
- Goodman & Kruskal's Lambda
- Goodman & Kruskal's Lambda-r
- Goodman & Kruskal's Tau A & B
- Gower & Legendre
- Guttman's Lambda A & B
- Gwet's AC
- Hamann
- Harris & Lahey
- Hassanat
- Hawkins & Dotson
- Hellinger
- Higuera & Mico
- Hurlbert
- Iterative SubString
- Jaccard-NM
- Jensen-Shannon
- Johnson
- Kendall's Tau
- Kent & Foster I & II
- Koppen I & II
- Kuder & Richardson
- Kuhns I-XII
- Kulczynski I & II
- Longest Common Prefix
- Longest Common Suffix
- Lorentzian
- Maarel
- Marking
- Marking Metric
- MASI
- Matusita
- Maxwell & Pilliner
- McConnaughey
- McEwen & Michael
- MetaLevenshtein
- Michelet
- MinHash
- Mountford
- Mean Squared Contingency
- Mutual Information
- NCD with LZSS
- NCD with PAQ9a
- Ozbay
- Pattern
- Pearson's Chi-Squared
- Pearson & Heron II
- Pearson II & III
- Pearson's Phi
- Peirce
- Positional Q-Gram Dice, Jaccard, & Overlap
- Q-Gram
- Quantitative Cosine, Dice, & Jaccard
- Rees-Levenshtein
- Roberts
- Rogers & Tanimoto
- Rogot & Goldberg
- Rouge-L, -S, -SU, & -W
- Russell & Rao
- SAPS
- Scott's Pi
- Shape
- Shapira & Storer I
- Sift4 Extended
- Single Linkage
- Size
- Soft Cosine
- SoftTF-IDF
- Sokal & Michener
- Sokal & Sneath I-V
- Sorgenfrei
- Steffensen
- Stiles
- Stuart's Tau
- Tarantula
- Tarwid
- Tetrachoric
- TF-IDF
- Tichy
- Tulloss's R, S, T, & U
- Unigram Subtuple
- Unknown A-M
- Upholt
- Warrens I-V
- Weighted Jaccard
- Whittaker
- Yates' Chi-Squared
- YJHHR
- Yujian & Bo
- Yule's Q, Q II, & Y
- Four intersection types are now supported for all distance measures that are
based on _TokenDistance. In addition to basic crisp intersections, soft,
fuzzy, and group linkage intersections have been provided.
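
Several of the ConfusionTable additions listed above have standard textbook
definitions in terms of true/false positive and negative counts. The sketch
below shows those definitions for Python 3. The function ``confusion_stats``
and its dictionary keys are hypothetical and are not the library's API, and
it is only an assumption that the library computes these values the same way.

```python
def confusion_stats(tp, tn, fp, fn):
    """Return standard statistics for a 2x2 confusion table.

    Hypothetical helper for illustration; assumes every denominator is
    nonzero, so no zero-division handling is included.
    """
    population = tp + tn + fp + fn
    tpr = tp / (tp + fn)  # true positive rate (recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    fnr = fn / (fn + tp)  # false negative rate
    fpr = fp / (fp + tn)  # false positive rate
    lr_pos = tpr / fpr    # positive likelihood ratio
    lr_neg = fnr / tnr    # negative likelihood ratio
    return {
        "false_negative_rate": fnr,
        "false_omission_rate": fn / (fn + tn),
        "positive_likelihood_ratio": lr_pos,
        "negative_likelihood_ratio": lr_neg,
        "diagnostic_odds_ratio": lr_pos / lr_neg,
        "error_rate": (fp + fn) / population,
        "prevalence": (tp + fn) / population,
        "jaccard": tp / (tp + fp + fn),
    }
```

The diagnostic odds ratio computed this way is algebraically equal to
``(tp * tn) / (fp * fn)``, which gives a quick sanity check on the result.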


0.3.6 (2018-11-17) *classy carl*
++++++++++++++++++++++++++++++++

@@ -97,6 +288,7 @@ Changes:
- UEA-Lite Stemmer
- Paice-Husk Stemmer
- Schinke Latin stemmer
- S stemmer
- Eliminated ._compat submodule in favor of six
- Transitioned from PEP8 to flake8, etc.
- Phonetic algorithms now consistently use max_length=-1 to indicate that
17 changes: 17 additions & 0 deletions Pipfile
@@ -0,0 +1,17 @@
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
tox = "*"
nose = "*"
coverage = "*"
scipy = "*"
nltk = "*"
syllabipy = "*"

[packages]
numpy = "*"
six = "*"
deprecation = "*"
