chore: update documentation
severinsimmler committed Aug 13, 2018
1 parent 2a576cc commit 18a768e
Showing 1 changed file with 28 additions and 30 deletions.
58 changes: 28 additions & 30 deletions src/cophi_toolbox/__init__.py
@@ -1,46 +1,44 @@
"""
cophi_toolbox
~~~~~~~~~~~~~
This is an NLP preprocessing library for handling, modeling and processing text data. You
**cophi-toolbox** is a Python library for handling, modeling and processing text. You
can easily pipe a collection of text files using the high-level API:
```
corpus, metadata = cophi_toolbox.pipe(directory="british-fiction-corpus",
pathname_pattern="**/*.txt",
encoding="utf-8",
lowercase=True,
ngrams=1,
token_pattern=r"\p{L}+\p{P}?\p{L}+")
```
.. code-block:: python
corpus, metadata = ct.pipe(directory="british-fiction-corpus",
pathname_pattern="**/*.txt",
encoding="utf-8",
lowercase=True,
ngrams=1,
token_pattern=r"\p{L}+\p{P}?\p{L}+")
There are also plenty of complexity metrics for measuring the lexical richness of (literary) texts.
Measures that use sample size and vocabulary size:
* Type-Token Ratio (:func:`ttr`).
* Guiraud’s :math:`R` (:func:`guiraud_r`).
* Herdan’s :math:`C` (:func:`herdan_c`).
* Dugast’s :math:`k` (:func:`dugast_k`).
* Maas’ :math:`a^2` (:func:`maas_a2`).
* Dugast’s :math:`U` (:func:`dugast_u`).
* Tuldava’s :math:`LN` (:func:`tuldava_ln`).
* Brunet’s :math:`W` (:func:`brunet_w`).
* Carroll’s :math:`CTTR` (:func:`cttr`).
* Summer’s :math:`S` (:func:`summer_s`).
* Type-Token Ratio :math:`TTR`
* Guiraud’s :math:`R`
* Herdan’s :math:`C`
* Dugast’s :math:`k`
* Maas’ :math:`a^2`
* Dugast’s :math:`U`
* Tuldava’s :math:`LN`
* Brunet’s :math:`W`
* Carroll’s :math:`CTTR`
* Summer’s :math:`S`
Measures that use part of the frequency spectrum:
* Honoré’s :math:`H` (:func:`honore_h`).
* Sichel’s :math:`S` (:func:`sichel_s`).
* Michéa’s :math:`M` (:func:`michea_m`).
* Honoré’s :math:`H`
* Sichel’s :math:`S`
* Michéa’s :math:`M`
Measures that use the whole frequency spectrum:
* Entropy :math:`S` (:func:`entropy`).
* Yule’s :math:`K` (:func:`yule_k`).
* Simpson’s :math:`D` (:func:`simpson_d`).
* Herdan’s :math:`V_m` (:func:`herdan_vm`).
* Entropy :math:`S`
* Yule’s :math:`K`
* Simpson’s :math:`D`
* Herdan’s :math:`V_m`
Parameters of probabilistic models:
* Orlov’s :math:`Z` (:func:`orlov_z`).
* Orlov’s :math:`Z`
"""

import logging
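
The docstring above groups the measures by the input they need: sample size and vocabulary size, part of the frequency spectrum, or the whole spectrum. As a rough illustration of that distinction, here is a minimal sketch of a few of them using their textbook definitions. The function names mirror the `:func:` references in the docstring, but the signatures and bodies are assumptions made for this example and are not taken from cophi_toolbox itself; `frequency_spectrum` is a hypothetical helper.

```python
import collections
import math


def ttr(num_types, num_tokens):
    """Type-Token Ratio: V / N."""
    return num_types / num_tokens


def guiraud_r(num_types, num_tokens):
    """Guiraud's R: V / sqrt(N)."""
    return num_types / math.sqrt(num_tokens)


def herdan_c(num_types, num_tokens):
    """Herdan's C: log(V) / log(N)."""
    return math.log(num_types) / math.log(num_tokens)


def frequency_spectrum(tokens):
    """Hypothetical helper: map each frequency i to V(i, N),
    the number of types that occur exactly i times."""
    type_counts = collections.Counter(tokens)
    return collections.Counter(type_counts.values())


def sichel_s(tokens):
    """Sichel's S: V(2, N) / V, using only the dislegomena."""
    spectrum = frequency_spectrum(tokens)
    num_types = sum(spectrum.values())
    return spectrum[2] / num_types


def yule_k(tokens):
    """Yule's K: 10^4 * (sum_i i^2 * V(i, N) - N) / N^2,
    using the whole frequency spectrum."""
    num_tokens = len(tokens)
    spectrum = frequency_spectrum(tokens)
    s2 = sum(freq ** 2 * count for freq, count in spectrum.items())
    return 10 ** 4 * (s2 - num_tokens) / num_tokens ** 2


# Toy input (assumed to be already tokenized and lowercased).
tokens = "the cat sat on the mat the cat".split()
num_tokens = len(tokens)      # N = 8
num_types = len(set(tokens))  # V = 5

print(ttr(num_types, num_tokens))
print(guiraud_r(num_types, num_tokens))
print(herdan_c(num_types, num_tokens))
print(sichel_s(tokens))
print(yule_k(tokens))
```

The first three functions only need the two counts N and V, whereas `sichel_s` and `yule_k` need the frequency spectrum, which is why the sketch passes the token list to them instead.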