Merge 3764967 into 165466b
chrislit committed May 21, 2019
2 parents 165466b + 3764967 commit 036af0c
Showing 711 changed files with 92,938 additions and 5,780 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -4,12 +4,11 @@ dist/
flake8/
abydos.egg-info/
.settings/
*.bak
*.pyc
*.log
.coverage
.idea/
.tox/
binder/.ipynb_checkpoints
binder/Untitled*
docs/abydos.bib.bak

7 changes: 3 additions & 4 deletions .travis.yml
@@ -2,7 +2,6 @@ language: python

python:
- 2.7
- 3.3
- 3.4
- 3.5
- 3.6
@@ -12,17 +11,17 @@ matrix:
- python: 3.7
dist: xenial
sudo: true
- python: 3.8-dev
dist: xenial
sudo: true

notifications:
email: false

# Install packages
install:
- if [[ $TRAVIS_PYTHON_VERSION == 2* ]]; then travis_retry pip install pyliblzma; fi
- if [[ $TRAVIS_PYTHON_VERSION == 3* ]]; then travis_retry pip install pylzss paq; fi
- travis_retry pip install coveralls
- travis_retry pip install -r requirements.txt
- travis_retry pip install -r requirements-dev.txt
- travis_retry python setup.py install

# Run test
33 changes: 28 additions & 5 deletions CODING_STANDARDS.rst
@@ -1,11 +1,13 @@
CODING STANDARDS
----------------

- nosetest will be used for testing
- flake8 will be used for best practice conformance
- pydocstyle will be used to ensure documentation style conformance to PEP257
(for the most part) and NumPy documentation style
- black will be used to keep code style consistent
- Nosetest will be used for testing.
- Flake8 will be used for best practice conformance.
- Pydocstyle will be used to ensure documentation style conformance to PEP257
(for the most part) and NumPy documentation style.
- Black will be used to keep code style consistent.
- 3rd party packages may be used, but must be present in both PyPI and conda
or conda-forge. They must also support all supported Python versions.

----

@@ -28,3 +30,24 @@ A git push should be performed only under the following conditions:
- test coverage is 100% according to nosetests
- flake8 and pydocstyle should report 0 issues
- black code styling has been applied


Notes on architecture
~~~~~~~~~~~~~~~~~~~~~

As of the 0.3.6 release, each major algorithm of the compression, distance,
fingerprint, phonetic, & stemmer subpackages has been moved into a class of its
own. The distance, fingerprint, phonetic, & stemmer classes each inherit from
a common base class for their respective subpackage, which defines the basic
methods for that type of class.

The old functional API for these subpackages has been retained for backwards
compatibility until the release of version 0.6, but its use is deprecated as
of version 0.4. New classes (those not present at the release of version 0.3.6)
will not be given functional API wrappers.

Although, as of the 0.3.6 release, many of the classes that encapsulate
pre-0.3.6 functions simply consist of a single method that could be a static
method, making these methods static is generally avoided. As development
continues, these classes will take more advantage of object architecture to
store parameters between calls and inherit from base classes.
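
The arrangement described in these notes can be sketched roughly as follows.
This is a minimal illustration, not code from the repository: the names
``BaseDistance``, ``PositionalMismatch``, and ``positional_mismatch`` are
hypothetical, and the standard library ``warnings`` module stands in for
whatever deprecation mechanism the project actually uses.

```python
import warnings


class BaseDistance:
    """Hypothetical common base class for distance measures."""

    def dist_abs(self, src, tar):
        """Return the raw distance; subclasses must override this."""
        raise NotImplementedError

    def dist(self, src, tar):
        """Return the raw distance scaled by the longer string's length."""
        longest = max(len(src), len(tar))
        return self.dist_abs(src, tar) / longest if longest else 0.0


class PositionalMismatch(BaseDistance):
    """Illustrative measure: count positions at which two strings differ."""

    def __init__(self, case_sensitive=True):
        # Parameters live on the instance, so they persist between calls.
        self._case_sensitive = case_sensitive

    def dist_abs(self, src, tar):
        if not self._case_sensitive:
            src, tar = src.lower(), tar.lower()
        mismatches = sum(a != b for a, b in zip(src, tar))
        # Unpaired trailing characters each count as one mismatch.
        return mismatches + abs(len(src) - len(tar))


def positional_mismatch(src, tar, case_sensitive=True):
    """Deprecated functional wrapper kept only for API compatibility."""
    warnings.warn(
        "positional_mismatch() is deprecated; use PositionalMismatch instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return PositionalMismatch(case_sensitive).dist_abs(src, tar)
```

Keeping ``dist_abs`` as an instance method rather than a static method lets a
configured object be reused across many comparisons, which is the direction
these notes describe for later releases.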
263 changes: 263 additions & 0 deletions HISTORY.rst
@@ -2,6 +2,268 @@ Release History
---------------


0.4.0 (2018-04-30) *dietrich*
+++++++++++++++++++++++++++++

doi:

Version 0.4.0 focuses on distance measures, adding 211 new measures. Attempts
were made to provide normalized versions of measures that do not inherently
range from 0 to 1. The other major focus was the addition of 12 tokenizers, in
service of expanding distance measure options.
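
As a rough illustration of why additional tokenizers widen the options for
distance measures, the sketch below feeds two different tokenizations into
the same multiset Jaccard similarity. The function names are hypothetical and
are not this package's API, and the q-gram function assumes no start or stop
padding.

```python
from collections import Counter


def qgrams(text, q=2):
    """Return a multiset of overlapping q-character substrings (no padding)."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def whitespace_tokens(text):
    """Return a multiset of whitespace-delimited tokens."""
    return Counter(text.split())


def jaccard(src_tokens, tar_tokens):
    """Return multiset Jaccard similarity, which always lies in [0, 1]."""
    intersection = sum((src_tokens & tar_tokens).values())
    union = sum((src_tokens | tar_tokens).values())
    # Two empty token sets are treated as identical.
    return intersection / union if union else 1.0


# The same measure behaves differently depending on the tokenizer chosen.
bigram_sim = jaccard(qgrams("night"), qgrams("nacht"))
word_sim = jaccard(whitespace_tokens("the cat"), whitespace_tokens("the dog"))
```

Here the measure is written once and the tokenizer is swapped freely, so each
new tokenizer yields a new variant of every token-based measure.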

Changes:

- Deprecated functions that merely wrap class methods to maintain API
compatibility, for removal in 0.6.0
- Added methods to ConfusionTable to return:
- its internal representation
- false negative rate
- false omission rate
- positive & negative likelihood ratios
- diagnostic odds ratio
- error rate
- prevalence
- Jaccard index
- D-measure
- Phi coefficient
- joint, actual, & predicted entropies
- mutual information
- proficiency (uncertainty coefficient)
- information gain ratio
- dependency
- lift
- Deprecated f-measure & g-measure from ConfusionTable for removal in
0.6.0
- Added notes to indicate when functions, classes, & methods were added
- Added the following 12 tokenizers:
- QSkipgrams
- CharacterTokenizer
- RegexpTokenizer, WhitespaceTokenizer, & WordpunctTokenizer
- COrVClusterTokenizer, CVClusterTokenizer, & VCClusterTokenizer
- SonoriPyTokenizer & LegaliPyTokenizer
- NLTKTokenizer
- SAPSTokenizer
- Added the UnigramCorpus class & a facility for downloading data, such as
pre-processed/trained data, from storage on GitHub
- Added the Wåhlin phonetic encoding
- Added the following 211 similarity/distance/correlation measures:
- ALINE
- AMPLE
- Anderberg
- Andres & Marzo's Delta
- Average Linkage
- AZZOO
- Baroni-Urbani & Buser I
- Baroni-Urbani & Buser II
- Batagelj & Bren
- Baulieu I
- Baulieu II
- Baulieu III
- Baulieu IV
- Baulieu V
- Baulieu VI
- Baulieu VII
- Baulieu VIII
- Baulieu IX
- Baulieu X
- Baulieu XI
- Baulieu XII
- Baulieu XIII
- Baulieu XIV
- Baulieu XV
- Benini I
- Benini II
- Bennet
- Bhattacharyya
- BI-SIM
- BLEU
- Block Levenshtein
- Brainerd-Robinson
- Braun-Blanquet
- Canberra
- Chord
- Clement
- Cohen's Kappa
- Cole
- Complete Linkage
- Consonni & Todeschini I
- Consonni & Todeschini II
- Consonni & Todeschini III
- Consonni & Todeschini IV
- Consonni & Todeschini V
- Cormode's LZ
- Covington
- Dennis
- Dice Asymmetric I
- Dice Asymmetric II
- Digby
- Dispersion
- Doolittle
- Dunning
- Eyraud
- Fager & McGowan
- Faith
- Fellegi-Sunter
- Fidelity
- Fleiss
- Fleiss-Levin-Paik
- FlexMetric
- Forbes I
- Forbes II
- Fossum
- FuzzyWuzzy Partial String
- FuzzyWuzzy Token Set
- FuzzyWuzzy Token Sort
- Generalized Fleiss
- Gilbert
- Gilbert & Wells
- Gini I
- Gini II
- Goodall
- Goodman & Kruskal's Lambda
- Goodman & Kruskal's Lambda-r
- Goodman & Kruskal's Tau A
- Goodman & Kruskal's Tau B
- Gower & Legendre
- Guttman's Lambda A
- Guttman's Lambda B
- Gwet's AC
- Hamann
- Harris & Lahey
- Hassanat
- Hawkins & Dotson
- Hellinger
- Higuera & Mico
- Hurlbert
- Iterative SubString
- Jaccard-NM
- Jensen-Shannon
- Johnson
- Kendall's Tau
- Kent & Foster I
- Kent & Foster II
- Koppen I
- Koppen II
- Kuder & Richardson
- Kuhns I
- Kuhns II
- Kuhns III
- Kuhns IV
- Kuhns V
- Kuhns VI
- Kuhns VII
- Kuhns VIII
- Kuhns IX
- Kuhns X
- Kuhns XI
- Kuhns XII
- Kulczynski I
- Kulczynski II
- Longest Common Prefix
- Longest Common Suffix
- Lorentzian
- Maarel
- Marking
- Marking Metric
- MASI
- Matusita
- Maxwell & Pilliner
- McConnaughey
- McEwen & Michael
- MetaLevenshtein
- Michelet
- MinHash
- Mountford
- Mean Squared Contingency
- Mutual Information
- NCD with LZSS
- NCD with PAQ9a
- Ozbay
- Pattern
- Pearson's Chi-Squared
- Pearson & Heron II
- Pearson II
- Pearson III
- Pearson's Phi
- Peirce
- Positional Q-Gram Dice
- Positional Q-Gram Jaccard
- Positional Q-Gram Overlap
- Q-Gram
- Quantitative Cosine
- Quantitative Dice
- Quantitative Jaccard
- Rees-Levenshtein
- Roberts
- Rogers & Tanimoto
- Rogot & Goldberg
- Rouge-L
- Rouge-S
- Rouge-SU
- Rouge-W
- Russell & Rao
- SAPS
- Scott's Pi
- Shape
- Shapira & Storer I
- Sift4 Extended
- Single Linkage
- Size
- Soft Cosine
- SoftTF-IDF
- Sokal & Michener
- Sokal & Sneath I
- Sokal & Sneath II
- Sokal & Sneath III
- Sokal & Sneath IV
- Sokal & Sneath V
- Sorgenfrei
- Steffensen
- Stiles
- Stuart's Tau
- Tarantula
- Tarwid
- Tetrachoric
- TF-IDF
- Tichy
- Tulloss's R
- Tulloss's S
- Tulloss's T
- Tulloss's U
- Unigram Subtuple
- Unknown A
- Unknown B
- Unknown C
- Unknown D
- Unknown E
- Unknown F
- Unknown G
- Unknown H
- Unknown I
- Unknown J
- Unknown K
- Unknown L
- Unknown M
- Upholt
- Warrens I
- Warrens II
- Warrens III
- Warrens IV
- Warrens V
- Weighted Jaccard
- Whittaker
- Yates' Chi-Squared
- YJHHR
- Yujian & Bo
- Yule's Q
- Yule's Q II
- Yule's Y
- Four intersection types are now supported for all distance measures that
are based on _TokenDistance. In addition to basic crisp intersections, soft,
fuzzy, and group linkage intersections have been provided.
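
Several of the ConfusionTable statistics named in the list above have
standard textbook definitions in terms of the four cell counts. The sketch
below uses those definitions. The function name and dictionary keys are
hypothetical, it is not the ConfusionTable implementation, and it does not
guard against division by zero.

```python
def confusion_stats(tp, tn, fp, fn):
    """Compute common statistics from confusion table counts.

    tp, tn, fp, fn are the true positive, true negative, false positive,
    and false negative counts. Assumes no denominator below is zero.
    """
    population = tp + tn + fp + fn
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fnr = fn / (tp + fn)  # false negative rate
    fpr = fp / (fp + tn)  # false positive rate
    tnr = tn / (fp + tn)  # true negative rate (specificity)
    return {
        "false_negative_rate": fnr,
        "false_omission_rate": fn / (fn + tn),
        "positive_likelihood_ratio": tpr / fpr,
        "negative_likelihood_ratio": fnr / tnr,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
        "error_rate": (fp + fn) / population,
        "prevalence": (tp + fn) / population,
        "jaccard_index": tp / (tp + fp + fn),
    }


stats = confusion_stats(tp=40, tn=30, fp=10, fn=20)
```

The diagnostic odds ratio equals the positive likelihood ratio divided by the
negative likelihood ratio, which gives a quick consistency check on the
returned values.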


0.3.6 (2018-11-17) *classy carl*
++++++++++++++++++++++++++++++++

@@ -97,6 +359,7 @@ Changes:
- UEA-Lite Stemmer
- Paice-Husk Stemmer
- Schinke Latin stemmer
- S stemmer
- Eliminated ._compat submodule in favor of six
- Transitioned from PEP8 to flake8, etc.
- Phonetic algorithms now consistently use max_length=-1 to indicate that
17 changes: 17 additions & 0 deletions Pipfile
@@ -0,0 +1,17 @@
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
tox = "*"
nose = "*"
coverage = "*"
scipy = "*"
nltk = "*"
syllabipy = "*"

[packages]
numpy = "*"
six = "*"
deprecation = "*"