In [14]:
%pip install py-readability-metrics lexical-diversity textstat textblob nltk

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


Importing packages

In [57]:
from lexical_diversity import lex_div as ld
from readability import Readability
import textstat
from textblob import TextBlob
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from collections import Counter

nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\aryam\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\aryam\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

Use ld.flemmatize when both lemmatization and tokenization are required; otherwise use ld.tokenize.

In [58]:
# This example has 30 sentences, the minimum SMOG needs. Smaller texts (>100 words) can be used for the rest of the metrics.
text = """The mysterious fog enveloped the old, abandoned mansion. Sparkling stars decorated the night sky in a cosmic dance. Why does the moon seem to follow us wherever we go? Lost in a labyrinth of thoughts, she found solace in poetry. A mischievous squirrel stole my sandwich during the picnic! The aroma of freshly baked cookies filled the cozy kitchen. In a parallel universe, time flows backward, defying logic. Excitement bubbled within her as the roller coaster climbed higher. Have you ever wondered if clouds have secret conversations? The ancient book whispered tales of forgotten civilizations. Laughter echoed through the valleys, creating a symphony of joy. A rainbow painted the horizon after the storm passed. Beware of the talking cat with a penchant for riddles. Enigmatic shadows danced on the walls of the mysterious cave. Moonlight transformed the ordinary forest into a realm of enchantment. Is there a hidden doorway to the land of dreams? Balloons soared into the sky, carrying wishes to unknown destinations. Echoes of a bygone era lingered in the dilapidated castle. The aroma of coffee awakened memories of distant lands. Puzzled by the cryptic message, she embarked on a quest for answers. Sparkling eyes reflected the innocence of a child's laughter. Waves whispered secrets to the curious seashells on the shore. Surrounded by mirrors, the room seemed to stretch into eternity. Chasing fireflies in the summer night brought nostalgic delight. A gentle breeze carried the melody of a distant song. Sudden thunder startled the sleepy town awake. In the heart of the forest, fairies danced under the moonlight. A forgotten key unlocked a chest of ancient artifacts. Reflections in the pond revealed hidden faces of contemplation. The jigsaw puzzle of life slowly revealed its intricate design."""
tok = ld.tokenize(text)
flt = ld.flemmatize(text)


TTR Variants

In [59]:
ld.ttr(flt)

0.6401384083044983

Root TTR

Root TTR is calculated as the number of types divided by the square root of the number of tokens 

In [60]:
ld.root_ttr(flt)

10.882352941176471

Log TTR

Log TTR is calculated by dividing the logarithm of the number of word types by the logarithm of the number of word tokens

In [61]:
ld.log_ttr(flt)

0.9212782786072363
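The three ratios so far differ only in how they scale the type count against text length. A minimal pure-Python sketch, assuming a plain list of tokens; the function names are illustrative and this is not the lexical_diversity implementation. The logarithm base cancels in Log TTR, so natural logs are used here.

```python
import math

def simple_ttr(tokens):
    """Types divided by tokens."""
    return len(set(tokens)) / len(tokens)

def simple_root_ttr(tokens):
    """Types divided by the square root of the token count."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def simple_log_ttr(tokens):
    """log(types) divided by log(tokens); the log base cancels out."""
    return math.log(len(set(tokens))) / math.log(len(tokens))

# Toy input: 8 tokens, 6 distinct types.
toy = "the cat saw the dog and the bird".split()
print(simple_ttr(toy), simple_root_ttr(toy), simple_log_ttr(toy))
```

Plain TTR falls as a text grows, because new tokens increasingly repeat known types; the root and log variants are attempts to dampen that length effect.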

Maas Index

Maas = (log(n_tokens) - log(n_types)) / log(n_tokens)^2

In [62]:
ld.maas_ttr(flt)

0.031989024503587024
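The formula does not state the base of the logarithm. The sketch below assumes base 10, since that choice gives a value close to the output above when fed 185 types and 289 tokens. Those two counts are inferred from the TTR and Root TTR outputs and are not printed by the notebook; maas_index is an illustrative name.

```python
import math

def maas_index(n_tokens, n_types):
    """Maas index from raw counts, using base-10 logarithms (an assumption)."""
    log_tokens = math.log10(n_tokens)
    return (log_tokens - math.log10(n_types)) / log_tokens ** 2

# Counts inferred from the earlier outputs, not reported by the library.
print(maas_index(n_tokens=289, n_types=185))  # approximately 0.032
```

Unlike the TTR variants, a lower Maas value indicates higher lexical diversity.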

Mean-Segmental Type-Token Ratio (MSTTR) 

MSTTR is the average TTR over successive, non-overlapping segments of text, each containing a fixed number of word tokens (25 in the call below).

In [63]:
ld.msttr(flt,window_length=25)

0.869090909090909
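A rough sketch of the segment-and-average idea, assuming that a trailing segment shorter than the window is simply discarded (the library may treat it differently). The function name is illustrative.

```python
def simple_msttr(tokens, window_length=25):
    """Average TTR over consecutive, non-overlapping segments."""
    segment_ttrs = []
    # Step through the text one full window at a time.
    for start in range(0, len(tokens) - window_length + 1, window_length):
        segment = tokens[start:start + window_length]
        segment_ttrs.append(len(set(segment)) / window_length)
    if not segment_ttrs:
        raise ValueError("text is shorter than one window")
    return sum(segment_ttrs) / len(segment_ttrs)

# Two full segments of 4 tokens; the leftover "e" is dropped.
print(simple_msttr("a b a b c d c d e".split(), window_length=4))
```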

Moving-Average Type-Token Ratio

MATTR computes the TTR for a fixed-length window that slides through the text one token at a time, then averages the values over all of these overlapping windows.

In [64]:
ld.mattr(flt,window_length=25)

0.8662641509433964
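For comparison with the segmented version, here is a minimal sketch of a moving-average TTR in which the window advances one token at a time. Falling back to plain TTR for texts shorter than the window is an assumption of this sketch, and the function name is illustrative.

```python
def simple_mattr(tokens, window_length=25):
    """Average TTR over every window of `window_length` consecutive tokens."""
    if len(tokens) < window_length:
        # Assumed fallback: treat the whole text as a single window.
        return len(set(tokens)) / len(tokens)
    window_ttrs = [
        len(set(tokens[start:start + window_length])) / window_length
        for start in range(len(tokens) - window_length + 1)
    ]
    return sum(window_ttrs) / len(window_ttrs)

# Windows: [a, a], [a, b], [b, b]
print(simple_mattr(["a", "a", "b", "b"], window_length=2))
```

Because the windows overlap, every token except those near the edges contributes to the same number of windows, so no part of the text is thrown away.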

Hypergeometric distribution D (HDD)

For each word type in a text, HD-D uses the hypergeometric distribution to calculate the probability of encountering one of its tokens in a random sample of 42 tokens. 

In [65]:
ld.hdd(flt)

0.8087763082825484
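The per-type probabilities can be sketched with binomial coefficients. This version assumes each type contributes its probability of appearing at least once in the sample, scaled by 1/sample_size, which is the usual description of HD-D; it is not taken from the library source, and the function name is illustrative.

```python
from collections import Counter
from math import comb

def simple_hdd(tokens, sample_size=42):
    """Sum over types of P(type appears in a random sample) / sample_size."""
    n_tokens = len(tokens)
    if n_tokens < sample_size:
        raise ValueError("text must contain at least sample_size tokens")
    all_samples = comb(n_tokens, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        # Hypergeometric probability of drawing zero tokens of this type.
        p_absent = comb(n_tokens - freq, sample_size) / all_samples
        score += (1.0 - p_absent) / sample_size
    return score

# 50 distinct tokens: every sampled token is a new type, so the score is 1.
print(simple_hdd([f"w{i}" for i in range(50)]))
```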

Lexical Diversity Scores

MTLD Score

In [66]:
ld.mtld(flt)

127.41249304396217
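The notebook does not spell out how MTLD is obtained. As commonly described, the text is read token by token, a "factor" is closed each time the running TTR falls to a threshold (0.72 is the usual default), and the score is the token count divided by the number of factors, averaged over a forward and a backward pass. The sketch below follows that description; the exact threshold comparison and the handling of texts that never reach the threshold are assumptions.

```python
def _mtld_pass(tokens, threshold=0.72):
    """One directional pass: token count divided by the number of factors."""
    factors = 0.0
    seen = set()
    length = 0
    for token in tokens:
        seen.add(token)
        length += 1
        if len(seen) / length <= threshold:
            # Running TTR reached the threshold: close this factor.
            factors += 1.0
            seen.clear()
            length = 0
    if length:
        # Credit the unfinished stretch as a partial factor.
        factors += (1.0 - len(seen) / length) / (1.0 - threshold)
    if factors == 0:
        raise ValueError("TTR never dropped below 1; MTLD is undefined here")
    return len(tokens) / factors

def simple_mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2

print(simple_mtld(["a"] * 10))
```

A higher score means the text sustains a high TTR for longer stretches before each factor closes.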

Measure of lexical textual diversity (moving average, bi-directional)

A revised MTLD procedure that takes a moving-average approach to computing factors. Bidirectional means that the same procedure is repeated backward, starting from the last token in the text. The final value is the average length of all the factors.

In [67]:
ld.mtld_ma_bid(flt)

100.25120430107526

Measure of lexical textual diversity (moving average, wrap)

Like MTLD-MA-Bi, MTLD-MA-Wrap takes a moving-average approach to create factors. However, instead of working through the text in both directions and calculating partial factors, it wraps around to the beginning of the text to complete the last factors.

In [69]:
ld.mtld_ma_wrap(flt)

33.68333333333333

Readability Scores

Instantiate reader

In [70]:
r = Readability(text)

Flesch-Kincaid Grade Level

In [71]:
fk = r.flesch_kincaid()
print(fk.score)
print(fk.grade_level)

8.152502283105026
8


Flesch Reading Ease

In [72]:
f = r.flesch()
print(f.score)
print(f.ease)
print(f.grade_levels)

53.25155707762559
fairly_difficult
['10', '11', '12']
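Both Flesch scores are linear in two ratios: words per sentence and syllables per word. A small sketch using the standard published coefficients; the counts passed in below are made up for illustration, since the notebook does not print the word, sentence, or syllable counts that the library used.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (higher means easier)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical counts, not taken from the sample text.
print(flesch_kincaid_grade(words=100, sentences=10, syllables=150))
print(flesch_reading_ease(words=100, sentences=10, syllables=150))
```

The two scores move in opposite directions: longer sentences and longer words raise the grade level and lower the reading ease.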


Dale Chall Readability

In [73]:
dc = r.dale_chall()
print(dc.score)
print(dc.grade_levels)

8.283074703196348
['11', '12']


Automated Readability Index (ARI)

In [74]:
ari = r.ari()
print(ari.score)
print(ari.grade_levels)
print(ari.ages)

7.4221803652968035
['8']
[13, 14]


Coleman Liau Index

In [75]:
cl = r.coleman_liau()
print(cl.score)
print(cl.grade_level)

11.102602739726027
11


Gunning Fog

In [76]:
gf = r.gunning_fog()
print(gf.score)
print(gf.grade_level)

9.783744292237444
10


SPACHE

In [77]:
s = r.spache()
print(s.score)
print(s.grade_level)

5.716194520547944
6


Linsear Write

In [78]:
lw = r.linsear_write()
print(lw.score)
print(lw.grade_level)

5.433333333333334
5


SMOG (works for a minimum of 30 sentences)

In [79]:
s = r.smog(all_sentences=True)
print(s.score)
print(s.grade_level)

10.279547748218288
10
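The 30-sentence requirement comes from the SMOG formula itself, which normalises the count of polysyllabic words (three or more syllables) to a 30-sentence sample. A sketch using the standard coefficients follows. The count of 47 polysyllabic words is inferred because it reproduces the score above; the notebook does not print it.

```python
import math

def smog_grade(polysyllable_count, sentence_count):
    """SMOG grade; the count is scaled to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291

# 47 is an inferred count, not a value reported by the library.
print(smog_grade(polysyllable_count=47, sentence_count=30))  # approximately 10.28
```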


LIX Readability Formula 

LIX = (total words / total sentences) + (long words x 100 / total words), where a long word has more than 6 letters

20-25 : Very Easy

30-35 : Easy

40-45 : Medium

50-55 : Difficult

60 above : Very Difficult

Source - https://originality.ai/blog/lix-readability-formula#:~:text=To%20compute%20Lix%20scores%2C%20these,average%20words%20in%20the%20sentence.

https://readable.com/blog/the-lix-and-rix-readability-formulas/

In [80]:
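def calculate_lix(text):
    # Whitespace-split words keep any attached punctuation, so "poetry." counts as 7 characters.
    words = text.split()
    long_words = [word for word in words if len(word) > 6]
    num_words = len(words)
    # Approximate the sentence count by counting terminal punctuation marks.
    num_sentences = text.count('.') + text.count('!') + text.count('?')
    lix = num_words / num_sentences + (float(len(long_words)) * 100) / num_words
    return lix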

# Reuses the sample `text` defined at the top of the notebook.
print(calculate_lix(text))

45.273471741637834
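Because calculate_lix measures whitespace-split words, trailing punctuation adds to a word's length: "poetry." is 7 characters and counts as long, while "poetry" on its own would not. A hedged variant that strips surrounding punctuation before measuring is sketched below. lix_stripped is an illustrative name, and treating each run of '.', '!' or '?' as one sentence end is an assumption.

```python
import re
import string

def lix_stripped(text, long_word_length=6):
    """LIX with surrounding punctuation removed before measuring word length."""
    words = [word.strip(string.punctuation) for word in text.split()]
    words = [word for word in words if word]  # drop tokens that were only punctuation
    sentences = len(re.findall(r"[.!?]+", text))
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence end")
    long_words = sum(1 for word in words if len(word) > long_word_length)
    return len(words) / sentences + long_words * 100 / len(words)

print(lix_stripped("Short words here. Extraordinary vocabulary everywhere!"))
```

On the sample text this variant would be expected to give a somewhat lower score than the cell above, since fewer sentence-final words cross the 6-letter limit.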


McAlpine EFLAW Readability Score

Returns a readability score for an English text aimed at a foreign learner of English, based on the number of miniwords and the length of sentences.

It is recommended to aim for a score equal to or lower than 25.

Source: https://strainindex.wordpress.com/2009/04/30/mcalpine-eflaw-readability-score/

In [81]:
textstat.mcalpine_eflaw(text)

13.2

Reading time for the given text (seconds)

In [82]:
textstat.reading_time(text, ms_per_char=14.69)

22.37
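The ms_per_char argument suggests a simple linear model: a fixed number of milliseconds per character. The sketch below assumes that only non-whitespace characters are counted and that the result is rounded to two decimals. That is an assumption about textstat's behaviour, not a copy of its code, and the function name is illustrative.

```python
def reading_time_seconds(text, ms_per_char=14.69):
    """Estimated reading time in seconds, proportional to character count."""
    # Count characters in whitespace-separated words; spaces are ignored.
    n_chars = sum(len(word) for word in text.split())
    return round(n_chars * ms_per_char / 1000, 2)

# Four characters at 1000 ms each.
print(reading_time_seconds("ab cd", ms_per_char=1000))
```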

Formality Score

F = (noun freq. + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100) / 2

Each frequency is the percentage of words in the excerpt that belong to the given category.

 Source : https://www.researchgate.net/profile/Francis-Heylighen/publication/2420048_Formality_of_Language_definition_measurement_and_behavioral_determinants/links/0912f50584d98e852d000000/Formality-of-Language-definition-measurement-and-behavioral-determinants.pdf

In [83]:
def calculate_formality_score(text):
    # Tokenize and tag
    words = word_tokenize(text)
    pos_tags = pos_tag(words)
    
    pos_counts = Counter(tag for _, tag in pos_tags)
    print(pos_counts)
    
    # Calculate the frequencies as percentages.
    # Note: word_tokenize emits punctuation as separate tokens, so they are included in total_words.
    total_words = len(words)
    noun_freq = (pos_counts['NN'] + pos_counts['NNS'] + pos_counts['NNP'] + pos_counts['NNPS']) / total_words * 100
    adjective_freq = (pos_counts['JJ'] + pos_counts['JJR'] + pos_counts['JJS']) / total_words * 100
    preposition_freq = pos_counts['IN'] / total_words * 100
    article_freq = (pos_counts['DT'] + pos_counts['WDT']) / total_words * 100
    pronoun_freq = (pos_counts['PRP'] + pos_counts['PRP$'] + pos_counts['WP'] + pos_counts['WP$']) / total_words * 100
    verb_freq = (pos_counts['VB'] + pos_counts['VBD'] + pos_counts['VBG'] + pos_counts['VBN'] + pos_counts['VBP'] + pos_counts['VBZ']) / total_words * 100
    adverb_freq = (pos_counts['RB'] + pos_counts['RBR'] + pos_counts['RBS']) / total_words * 100
    interjection_freq = pos_counts['UH'] / total_words * 100
    
    # Formality score formula
    F = (noun_freq + adjective_freq + preposition_freq + article_freq - pronoun_freq - verb_freq - adverb_freq - interjection_freq + 100) / 2
    
    return F

# Reuses the sample `text` defined at the top of the notebook.
formality_score = calculate_formality_score(text)
print(formality_score)

Counter({'NN': 67, 'DT': 49, 'IN': 44, '.': 30, 'JJ': 29, 'NNS': 28, 'VBD': 24, ',': 9, 'VBN': 8, 'VBG': 7, 'PRP': 6, 'TO': 5, 'VBZ': 4, 'VBP': 4, 'RB': 4, 'NNP': 3, 'VB': 2, 'PRP$': 2, 'WRB': 1, 'JJR': 1, 'POS': 1, 'JJS': 1})
74.46808510638299
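The score can be re-derived from the printed tag counts alone, which also shows how much the denominator matters. In the sketch below the counts are copied from the Counter output above. The skip_punctuation option is an illustrative addition, not part of the original function: it drops tags that contain no letters (such as '.' and ',') from the total.

```python
from collections import Counter

# Tag counts copied from the Counter printed above.
pos_counts = Counter({'NN': 67, 'DT': 49, 'IN': 44, '.': 30, 'JJ': 29, 'NNS': 28,
                      'VBD': 24, ',': 9, 'VBN': 8, 'VBG': 7, 'PRP': 6, 'TO': 5,
                      'VBZ': 4, 'VBP': 4, 'RB': 4, 'NNP': 3, 'VB': 2, 'PRP$': 2,
                      'WRB': 1, 'JJR': 1, 'POS': 1, 'JJS': 1})

FORMAL_TAGS = ('NN', 'NNS', 'NNP', 'NNPS', 'JJ', 'JJR', 'JJS', 'IN', 'DT', 'WDT')
INFORMAL_TAGS = ('PRP', 'PRP$', 'WP', 'WP$', 'VB', 'VBD', 'VBG', 'VBN', 'VBP',
                 'VBZ', 'RB', 'RBR', 'RBS', 'UH')

def formality_from_counts(counts, skip_punctuation=False):
    """Formality score F computed from a mapping of POS tag to count."""
    def is_word_tag(tag):
        return any(ch.isalpha() for ch in tag)
    total = sum(n for tag, n in counts.items()
                if not skip_punctuation or is_word_tag(tag))
    formal = sum(counts.get(tag, 0) for tag in FORMAL_TAGS)
    informal = sum(counts.get(tag, 0) for tag in INFORMAL_TAGS)
    return ((formal - informal) / total * 100 + 100) / 2

print(formality_from_counts(pos_counts))                         # approximately 74.47
print(formality_from_counts(pos_counts, skip_punctuation=True))  # approximately 77.76
```

Excluding the 39 punctuation tokens shrinks the denominator from 329 to 290, which raises every percentage and therefore the score.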
