In [1]:
from lexicalrichness import LexicalRichness

In [2]:
# Enter your own text here if you prefer
text = """Measure of textual lexical diversity, computed as the mean length of sequential words in
                a text that maintains a minimum threshold TTR score.

                Iterates over words until TTR scores falls below a threshold, then increase factor
                counter by 1 and start over. McCarthy and Jarvis (2010, pg. 385) recommends a factor
                threshold in the range of [0.660, 0.750].
                (McCarthy 2005, McCarthy and Jarvis 2010)"""

In [3]:
# Instantiate a new text object (pass the tokenizer=blobber argument to use the TextBlob tokenizer)
lex = LexicalRichness(text)

### Attributes

In [4]:
# Get list of words
list_of_words = lex.wordlist
print(list_of_words[:10], list_of_words[-10:])

['measure', 'of', 'textual', 'lexical', 'diversity', 'computed', 'as', 'the', 'mean', 'length'] ['factor', 'threshold', 'in', 'the', 'range', 'of', 'mccarthy', 'mccarthy', 'and', 'jarvis']
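The word list above is lower-cased, and punctuation and numbers such as `2010` and `0.660` have been dropped. The sketch below approximates that preprocessing with a plain regex. `simple_tokenize` is a hypothetical helper, not part of lexicalrichness. On this sample text it yields the same 57 words and 39 unique terms that `lex.words` and `lex.terms` report, but the library's own preprocessing may differ on other inputs (e.g. hyphenated or non-ASCII words).

```python
import re


def simple_tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lower-case the text and keep runs of ASCII letters.

    Digits and punctuation never match [a-z]+, so numbers such as "2010"
    or "0.660" disappear and "pg." becomes "pg".
    """
    return re.findall(r"[a-z]+", text.lower())


words = simple_tokenize("McCarthy and Jarvis (2010, pg. 385) recommends a factor")
print(words)  # ['mccarthy', 'and', 'jarvis', 'pg', 'recommends', 'a', 'factor']
```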


In [5]:
# Return word count (w).
lex.words

57

In [6]:
# Return (unique) word count (t).
lex.terms

39

**Type-token ratio** (TTR; Chotlos 1944, Templin 1957):
$$
TTR = \frac{t}{w}
$$
where $t$ (or $t(w)$) is the number of unique terms, written as a function of the text length $w$ in words.

In [7]:
# Return type-token ratio (TTR) of text.
lex.ttr

0.6842105263157895
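As a minimal sketch (not the library's implementation), TTR can be computed directly from a word list such as `lex.wordlist`: count the distinct words and divide by the total. For this text that is $39 / 57 \approx 0.684$, matching the output above. The function name `ttr` is illustrative.

```python
def ttr(words: list[str]) -> float:
    """Type-token ratio: unique terms t divided by total words w."""
    if not words:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(words)) / len(words)


print(ttr(["the", "cat", "saw", "the", "dog"]))  # 4 unique / 5 words = 0.8
```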

**Root TTR** (RTTR; Guiraud 1954, 1960):
$$
RTTR = \frac{t}{\sqrt{w}}
$$

In [8]:
# Return root type-token ratio (RTTR) of text.
lex.rttr

5.165676192553671
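Plugging in the counts from the cells above ($t = 39$, $w = 57$) reproduces the output:

$$
RTTR = \frac{39}{\sqrt{57}} \approx \frac{39}{7.5498} \approx 5.1657
$$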

**Corrected TTR** (CTTR; Carroll 1964):
$$
CTTR = \frac{t}{\sqrt{2w}}
$$

In [9]:
# Return corrected type-token ratio (CTTR) of text.
lex.cttr

3.6526846651686067
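With the same counts ($t = 39$, $w = 57$):

$$
CTTR = \frac{39}{\sqrt{2 \cdot 57}} = \frac{39}{\sqrt{114}} \approx \frac{39}{10.6771} \approx 3.6527
$$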

**Herdan's C** (Herdan 1960, 1964):
$$
C = \frac{\log(t)}{\log(w)}
$$

In [10]:
# Return Herdan's C
lex.Herdan

0.9061378160786574
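With $t = 39$ and $w = 57$ (natural logarithms shown; because $C$ is a ratio of two logarithms, the base cancels):

$$
C = \frac{\ln 39}{\ln 57} \approx \frac{3.6636}{4.0431} \approx 0.9061
$$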

**Summer's index** (Summer 1966):
$$
Summer = \frac{\log(\log(t))}{\log(\log(w))}
$$

In [11]:
# Return Summer's index
lex.Summer

0.9294460323356605
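Unlike Herdan's $C$, this index depends on the logarithm base; the output above matches natural logarithms:

$$
Summer = \frac{\ln(\ln 39)}{\ln(\ln 57)} \approx \frac{\ln 3.6636}{\ln 4.0431} \approx \frac{1.2984}{1.3970} \approx 0.9294
$$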

**Dugast's index** (Dugast 1978):
$$
Dugast = \frac{(\log(w))^2}{\log(w) - \log(t)}
$$

In [12]:
# Return Dugast's index
lex.Dugast

43.074336212149774
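With natural logarithms, $\ln 57 \approx 4.0431$ and $\ln 39 \approx 3.6636$, so:

$$
Dugast = \frac{(\ln 57)^2}{\ln 57 - \ln 39} \approx \frac{16.3463}{0.3795} \approx 43.07
$$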

**Maas's index** (Maas 1972):
$$
Maas = \frac{\log(w) - \log(t)}{(\log(w))^2}
$$

In [13]:
# Return Maas's index
lex.Maas

0.023215679867353005
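Maas's index is the reciprocal of Dugast's index, which the two outputs confirm ($1 / 43.0743 \approx 0.0232$). Computed directly with natural logarithms:

$$
Maas = \frac{\ln 57 - \ln 39}{(\ln 57)^2} \approx \frac{0.3795}{16.3463} \approx 0.0232
$$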

### Methods

#### MSTTR: Mean segmental type-token ratio

* Computed as the average of the TTR scores of successive, non-overlapping segments of a text
* Split the text into segments of `segment_window` words and compute the TTR of each segment. The MSTTR score is the sum of these scores divided by the number of segments
* (Johnson 1944)

In [14]:
lex.msttr(
    segment_window=25  # size of each segment
)

0.88
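A minimal sketch of MSTTR over a plain word list, assuming (as the output above suggests) that a trailing segment shorter than `segment_window` is dropped. The 57-word text splits into 25 + 25 + 7 words, and the two full segments have TTRs of 23/25 and 21/25, whose mean is 0.88. The function name and the `drop_partial` flag are hypothetical, not the library's signature.

```python
def msttr(words: list[str], segment_window: int, drop_partial: bool = True) -> float:
    """Mean segmental TTR over non-overlapping segments of `segment_window` words.

    drop_partial (hypothetical flag): if True, a trailing segment shorter
    than segment_window is ignored rather than averaged in.
    """
    if segment_window < 1:
        raise ValueError("segment_window must be a positive integer")
    # Cut the word list into consecutive, non-overlapping chunks.
    segments = [words[i:i + segment_window] for i in range(0, len(words), segment_window)]
    if drop_partial:
        segments = [s for s in segments if len(s) == segment_window]
    if not segments:
        raise ValueError("text is shorter than segment_window")
    scores = [len(set(s)) / len(s) for s in segments]
    return sum(scores) / len(scores)


words = "a b a c d d e f g h".split()
# Segments "a b a c" -> 3/4 and "d d e f" -> 3/4; the partial "g h" is dropped.
print(msttr(words, segment_window=4))  # 0.75
```

Dropping the remainder keeps every score on the same segment length, so a short final segment (whose TTR tends to be high) does not inflate the average.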

#### MATTR: Moving average type-token ratio
* Slide a window of `window_size` words across the text one word at a time and compute the TTR of each window
* MATTR is the average of all the windows' TTRs
* (Covington 2007, Covington and McFall 2010)

In [15]:
# Return moving average type-token ratio (MATTR).
lex.mattr(
    window_size=25  # Size of each sliding window
)

0.8351515151515151
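A minimal sketch of MATTR over a plain word list (the function name is illustrative, not the library's code). Because the window advances one word at a time, a text of $w$ words has $w - \text{window\_size} + 1$ windows; for the 57-word text with `window_size=25` that is 33 windows, and their unique-word counts sum to 689, giving $689 / (25 \cdot 33) \approx 0.8352$ as above.

```python
def mattr(words: list[str], window_size: int) -> float:
    """Moving-average TTR: mean TTR over every window of `window_size` consecutive words."""
    if not 1 <= window_size <= len(words):
        raise ValueError("window_size must be between 1 and the number of words")
    # Windows start at positions 0, 1, ..., len(words) - window_size.
    n_windows = len(words) - window_size + 1
    scores = [len(set(words[i:i + window_size])) / window_size for i in range(n_windows)]
    return sum(scores) / n_windows


words = "a b a c d".split()
# Windows "a b a", "b a c", "a c d" -> (2/3 + 3/3 + 3/3) / 3 = 8/9
print(mattr(words, window_size=3))
```

This version rebuilds a set for every window, costing about $w \cdot \text{window\_size}$ operations; keeping a running word counter that adds the incoming word and removes the outgoing one would make each step constant time, which matters only for long texts.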