In [1]:
import re  #just in case
import nltk
from nltk.tokenize import SyllableTokenizer  #for tokenizing syllables
from nltk.tokenize import RegexpTokenizer   #for word tokenizing to ignore punctuation
import itertools #used later to flatten/merge a list
import pickle

In [2]:
import phonemizer
from phonemizer.backend.espeak.wrapper import EspeakWrapper

# Must point to eSpeak NG's library dll
EspeakWrapper.set_library(r'C:\Program Files\eSpeak NG\libespeak-ng.dll')

from phonemizer.backend import EspeakBackend
from phonemizer.punctuation import Punctuation

### How Jargon-y is DND?

To take a look, I grabbed my copy of the Player's Handbook (PHB) and converted it to a text file. I won't be sharing the file, since the copyright belongs to Wizards of the Coast, but I will be discussing my findings here.

1. [Flesch–Kincaid readability test](#Flesch–Kincaid-readability-test)
    - [Flesch–Kincaid Calculations](#Running-the-Calculations)
    - [Results](#Results-(Part-2))
<br>
<br>
2. [SMOG Readability Test](#SMOG-Readability-Test)
    - [Results](#SMOG-Results)
<br>
<br>
3. [Readability Thoughts](#Renewed-Thoughts-about-Readability-Scores)

In [3]:
with open('../data/PHB.txt', 'r', encoding='utf-8') as f:
    text = f.read()

In [4]:
len(text)

1220025

In [5]:
text2 = Punctuation(';:,.!\'"?()-').remove(text)

In [6]:
tokenizer = RegexpTokenizer(r'\w+')
phbtext = tokenizer.tokenize(text2)

Our total tokenized word count for the Player's Handbook is below. Unlike the usual NLTK word tokenizing, this excludes punctuation, since the measurements I'm going to use don't need it. There's also a short glance at a small section to get a feel for what we're looking at. The conversion to a .txt file was not perfect because of some text formatting in the original, but I did the best I could without spending an excessive amount of time correcting every single issue.

In [7]:
len(phbtext) #total words (roughly)

215058

In [8]:
phbtext[8000:8020]

['you',
 'a',
 'different',
 'way',
 'to',
 'calculate',
 'your',
 'AC',
 'If',
 'you',
 'have',
 'multiple',
 'features',
 'that',
 'give',
 'you',
 'different',
 'ways',
 'to',
 'calculate']
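
One likely reason the word count is only approximate: a `\w+` pattern treats a typographic apostrophe as a break, so a possessive turns into two tokens. The sketch below uses `re.findall` as a stand-in for `RegexpTokenizer(r'\w+')` (my assumption is that the two behave the same for this pattern), on a sentence from the PHB that appears later in this notebook.

```python
import re

# Stand-in for RegexpTokenizer(r'\w+'): assumed to return the same
# non-overlapping matches that re.findall gives for this pattern.
sample = ("Without armor or a shield, your character’s AC equals 10 "
          "+ his or her Dexterity modifier.")
tokens = re.findall(r'\w+', sample)

print(tokens)       # "character’s" comes out as 'character' and 's'
print(len(tokens))  # 16
```

Each split possessive adds one extra "word" (and, once phonemized, at least one extra syllable), which nudges both ratios in the readability formulas slightly.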

## Flesch–Kincaid readability test
<img src=https://wikimedia.org/api/rest_v1/media/math/render/svg/bd4916e193d2f96fa3b74ee258aaa6fe242e110e>

This readability test was created to measure how easy a text is to read. It was formulated for use on technical manuals for the military and has since been adopted in the educational field as well. The formula above produces a score: the higher the score, the easier the text is to read. Scores are read on a scale of 0-100, ranging from Professional (lowest scores) to 5th grade (highest scores).

[source](https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests)

This comes with a translation of the score: The Grade Level Score
<img src=https://readable.com/wp-content/uploads/2017/01/fleschkincaidchart.png>

[source](https://readable.com/readability/flesch-reading-ease-flesch-kincaid-grade-level/)
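
The chart above can be turned into a small lookup. This is only a sketch: the band boundaries follow the commonly published Flesch Reading Ease table, and `FLESCH_BANDS` and `reading_ease_band` are names I made up for illustration.

```python
# (lower bound, school level, description), highest band first.
# Boundaries are an assumption based on the commonly published table.
FLESCH_BANDS = [
    (90, "5th grade", "very easy to read"),
    (80, "6th grade", "easy to read"),
    (70, "7th grade", "fairly easy to read"),
    (60, "8th-9th grade", "plain English"),
    (50, "10th-12th grade", "fairly difficult to read"),
    (30, "college", "difficult to read"),
    (10, "college graduate", "very difficult to read"),
]

def reading_ease_band(score):
    """Return (school level, description) for a Reading Ease score."""
    for lower, level, description in FLESCH_BANDS:
        if score >= lower:
            return level, description
    # Anything below 10 (the formula can even go negative)
    return "professional", "extremely difficult to read"

print(reading_ease_band(56))  # ('10th-12th grade', 'fairly difficult to read')
```

The formula is not actually capped at 0 or 100, so the lookup simply treats everything at or above 90 as the easiest band and everything below 10 as the hardest.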

#### Syllables

In [9]:
phbtext = [w.lower() for w in phbtext]

In [10]:
backend = EspeakBackend('en-us')
phbtextphon = backend.phonemize(phbtext)

In [11]:
phbtextphon[:10]

['kɑːntɛnts ',
 'pɹɛfəs ',
 'fɔːɹ ',
 'ɪntɹədʌkʃən ',
 'faɪv ',
 'wɜːldz ',
 'ʌv ',
 'ɐdvɛntʃɚ ',
 'faɪv ',
 'juːzɪŋ ']

Cool! This thing phonemizes numerals too! 

I was having a minor issue with cutting a phonemized word down to only its vowels as a stand-in for its number of syllables: diphthongs were being split into two vowels, and each one was adding to the syllable count.

Because of that, I decided to abbreviate every diphthong to just the first of its two vowels. It won't be an exact representation of the vowels, but we don't need one. We only need the count of syllables, and this should give us that.

In [12]:
phbtextphon = [(re.sub(r'(.*)eɪ(.*)', r'\1e\2', w)) for w in phbtextphon]
phbtextphon = [(re.sub(r'(.*)oʊ(.*)', r'\1o\2', w)) for w in phbtextphon]
phbtextphon = [(re.sub(r'(.*)aɪ(.*)', r'\1a\2', w)) for w in phbtextphon]
phbtextphon = [(re.sub(r'(.*)aʊ(.*)', r'\1a\2', w)) for w in phbtextphon]
phbtextphon = [(re.sub(r'(.*)ɔɪ(.*)', r'\1ɔ\2', w)) for w in phbtextphon]
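
One caveat worth checking with this pattern: the leading `(.*)` is greedy, so a single `re.sub` call rewrites only the last occurrence of a diphthong in a word. The sketch below shows the effect on a made-up IPA string and a `str.replace` alternative. `DIPHTHONGS` and `collapse_diphthongs` are hypothetical names, and this alternative was not used to produce any of the numbers in this notebook.

```python
import re

word = 'meɪnteɪn'  # hypothetical IPA string with the same diphthong twice

# Same pattern as the cell above: the greedy (.*) swallows everything up
# to the LAST 'eɪ', so only that final occurrence is rewritten.
greedy = re.sub(r'(.*)eɪ(.*)', r'\1e\2', word)

# Alternative: str.replace rewrites every occurrence.
DIPHTHONGS = {'eɪ': 'e', 'oʊ': 'o', 'aɪ': 'a', 'aʊ': 'a', 'ɔɪ': 'ɔ'}

def collapse_diphthongs(ipa):
    """Replace each listed diphthong with its first vowel, everywhere."""
    for diphthong, first_vowel in DIPHTHONGS.items():
        ipa = ipa.replace(diphthong, first_vowel)
    return ipa

print(greedy)                     # meɪnten
print(collapse_diphthongs(word))  # menten
```

With the greedy version, a word like this would be counted as three syllables instead of two, so words that repeat a diphthong may be slightly overcounted.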

In [13]:
vowels = ['ᵻ', 'ɪ', 'e', 'ɛ', 'æ', 'ʌ', 'ə', 'u', 'ʊ', 'ɔ', 'ɑ', 'i', 'ɐ', 'ɚ', 'ɜ', 'a', 'o']

In [14]:
phbvowels = [[c for c in w if c in vowels] for w in phbtextphon]

In [15]:
phbvowels[:10]

[['ɑ', 'ɛ'],
 ['ɛ', 'ə'],
 ['ɔ'],
 ['ɪ', 'ə', 'ʌ', 'ə'],
 ['a'],
 ['ɜ'],
 ['ʌ'],
 ['ɐ', 'ɛ', 'ɚ'],
 ['a'],
 ['u', 'ɪ']]

Since this phonemizer captures numbers too and they won't be affecting our totals, I'm not going to be using the "no numbers" version of the text like I did in my first test. This captures things like page numbers in the table of contents which is maybe not perfect, but there are a lot of numbers throughout the text that are important parts of the text: calculating armor class, ability scores, rolling dice with different numbers of sides, so I want to keep them in.

In [16]:
count = [len(w) for w in phbvowels]

In [17]:
count[:10]

[2, 2, 1, 4, 1, 1, 1, 3, 1, 2]

In [18]:
sum(count)  #sum of all values = count of all syllables

332504

Now this is interesting! 332,504. 

In the previous attempt for the full text including numbers there were 350,211 syllables counted, and without numbers there were 341,505. 

#### Sentences

NLTK is still the best bet for sentence tokenizing. And, again, we can't be 100% positive that the output is entirely accurate: some of these "sentences" are table of contents entries, some may be sentence fragments, etc. Still, I feel confident this output is more accurate than the syllable tokenizer's.

Remember, we're tokenizing sentences on the original `text`, from before we removed the punctuation.

In [19]:
sents = nltk.sent_tokenize(text)

In [20]:
sents[400:405]

['Not all characters wear armor or carry shields, however.',
 'Without armor or a shield, your character’s AC equals 10 + his or her Dexterity modifier.',
 'If your character wears armor, carries a shield, or both, calculate your AC using the rules in chapter 5.',
 'Record your AC on your character sheet.',
 'Your character needs to be proficient with armor and shields to wear and use them effectively, and your armor and shield proficiencies are determined by your class.']

In [21]:
len(sents)

11046

## Running the Calculations

### First: PHB text including numbers

1. Reading Ease
2. Grade Level Score

In [22]:
# 206.835 is the published Flesch Reading Ease constant
round(206.835-(1.015*(215058/11046))-(84.6*(332504/215058)), 2)

56.27

In [23]:
(0.39*(215058/11046))+(11.8*(332504/215058))-15.59

10.247166031973116
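
For reuse, the two calculations can be wrapped in small functions. This is a sketch using the standard published constants (206.835, 1.015 and 84.6 for Reading Ease; 0.39, 11.8 and 15.59 for Grade Level). The function names are my own, and the counts are the ones from the cells above.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Counts from the cells above: tokens, sentences, syllables
n_words, n_sents, n_syls = 215058, 11046, 332504

print(round(flesch_reading_ease(n_words, n_sents, n_syls), 1))   # 56.3
print(round(flesch_kincaid_grade(n_words, n_sents, n_syls), 1))  # 10.2
```

Both formulas depend only on two ratios, words per sentence and syllables per word, which is why a change in the syllable count alone moves the scores.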

## Results (Part 2)

Now this is really really interesting! 

As a reminder of what the first score looked like, with numbers (as numerals) included there was a rating of 49 and without numbers a more difficult score of 47. That was in the easier end of the "difficult to read" level/College reading level. 

With this method, a score of 56 puts it slightly on the easier end of the "fairly difficult to read" band (50-60), which corresponds to a 10th-12th grade reading level. The grade level score matches reasonably well: a score of 10 = 10th grade level.

Again, a reminder of the scores of some other common texts:
- Time Magazine: 52
- Moby Dick: 57.9
    - one particularly long sentence about sharks in chapter 64 has a readability score of −146.77
- The highest (easiest) possible score is 121.22, reached only when every sentence is a single one-syllable word (think Dr. Seuss!)

### Thoughts

While I still have my reservations about the readability test in general, it's fascinating to see that the score changed this much with a better syllable counting method alone! Everything else about the process is exactly the same. I think the easier score meshes better with my understanding of and familiarity with the PHB.

## SMOG Readability Test

G. Harry McLaughlin created the SMOG (Simple Measure of Gobbledygook) in 1969 to measure text readability. There is a full breakdown of the formula [here](https://readabilityformulas.com/the-smog-readability-formula/), but it functions similarly to the Flesch–Kincaid test, and I'd like to compare.
<img src=https://readabilityformulas.com/wp-content/uploads/01-SMOG-readability-formula.png>

The SMOG test is designed to be run on 3 groups of ten sentences, taken from the beginning, middle, and end of a text, so I'll take some samples rather than use the full text. This is likely because the test was being done by hand at the time of its creation, and sampling is easier than working through a whole book, but these are the instructions and so I will stick to them.

With a 30-sentence sample, the formula simplifies to roughly the square root of the total number of polysyllabic words, plus 3: the sentence factor is 30/30 = 1, and the hand version rounds the constants 1.043 and 3.1291 to 1 and 3. There are simplified instructions linked below.

Link to Ohio State instructions PDF [here](https://ogg.osu.edu/media/documents/health_lit/WRRSMOG_Example.pdf)
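
Both versions of the calculation can be sketched in a few lines. The full formula uses the constants from the formula image above. The simplified one follows the hand instructions (nearest perfect square, square root, plus 3). The function names and the example count of 50 are mine, chosen only for illustration.

```python
import math

def smog_grade(polysyllables, sentences=30):
    """Full SMOG formula; the count is scaled to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def smog_grade_simplified(polysyllables):
    """Hand version: square root of the nearest perfect square, plus 3."""
    root = math.isqrt(polysyllables)
    # Step up to the next root only when its square is strictly closer
    if (root + 1) ** 2 - polysyllables < polysyllables - root ** 2:
        root += 1
    return root + 3

# Illustrative count only, not a result from the PHB
print(smog_grade_simplified(50))  # 10 (nearest perfect square is 49)
print(round(smog_grade(50), 1))   # 10.5
```

The two versions can differ by up to about a grade, because the simplified one rounds both the count and the constants.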

### Sampling from the text

Knowing that the beginning of the book is the table of contents and the end is an appendix, I want to pick a good set of 30 sentences. I'll do some searching and concatenate the samples into a list!

In [24]:
early = sents[84:94] #some introductory stuff about DND
early 

['Because the DM can improvise to react to anything the players attempt, DND is infinitely flexible, and each adventure can be exciting and unexpected.',
 'The game has no real end; when one story or quest wraps up, another one can begin, creating an ongoing story called a campaign.',
 'Many people who play the game keep their campaigns going for months or years, meeting with their friends every week or so to pick up the story where they left off.',
 'The adventurers grow in might as the campaign continues.',
 'Each monster defeated, each adventure completed, and each treasure recovered not only adds to the continuing story, but also earns the adventurers new capabilities.',
 'This increase in power is reflected by an adventurer’s level.',
 'There’s no winning and losing in the Dungeons N Dragons game—at least, not the way those terms are usually understood.',
 'Together, the DM and the players create an exciting story of bold adventurers who confront deadly perils.',
 'sometimes an ad

In [25]:
late = sents[9500:9510] #spells and spell descriptions
late  

['Each target must make a Wisdom saving throw and falls unconscious for 10 minutes on a failed save.',
 'A creature awakens if it takes damage or if someone uses an action to shake or slap it awake.',
 'Stunning.',
 'Each target must make a Wisdom saving throw and becomes stunned for 1 minute on a failed save.',
 'Tasha ’s Hideous Laughter 1st-level enchantment Casting Time: 1 action Range: 30 feet Components: V, S, M (tiny tarts and a feather that is waved in the air) Duration: Concentration, up to 1 minute A creature of your choice that you can see within range perceives everything as hilariously funny and falls into fits of laughter if this spell affects it.',
 'The target must succeed on a Wisdom saving throw or fall prone, becoming incapacitated and unable to stand up for the duration.',
 'A creature with an Intelligence score of 4 or less isn’t affected.',
 'At the end of each of its turns, and each time it takes damage, the target can make another Wisdom saving throw.',
 'The ta

In [26]:
mid = sents[4703:4713] #something around the middle point - looks like information about shopping/money
mid  

['Only merchants, adventurers, and those offering professional services for hire commonly deal in coins.',
 'Coinage Common coins come in several different denominations based on the relative worth of the metal from which they are made.',
 'The three most common coins are the gold piece (gp), the silver piece (sp), and the copper piece (cp).',
 'With one gold piece, a character can buy a belt pouch, 50 feet of good rope, or a goat.',
 'A skilled (but not exceptional) artisan can earn one gold piece a day.',
 'The gold piece is the standard unit of measure for wealth, even if the coin itself is not commonly used.',
 'When merchants discuss deals that involve goods or services worth hundreds or thousands of gold pieces, the transactions don’t usually involve the exchange of individual coins.',
 'Rather, the gold piece is a standard measure of value, and the actual exchange is in gold bars, letters of credit, or valuable goods.',
 'One gold piece is worth ten silver pieces, the most preva

In [27]:
testvals = early+mid+late
len(testvals) #30 total sentences

30

### Polysyllabic Words

For the SMOG test, these are defined as words of 3 or more syllables.

This is the test that I previously ran with the syllablizer and then again by hand to compare. The two counts of 3+ syllable words differed by 30: the syllablizer counted 96, and when I counted by hand I only found 66.

I wanted to try it again this way to see if it's perfect, or if my method here still has room for improvement.

In [28]:
wtoks = [tokenizer.tokenize(s) for s in testvals] #tokenized sents

In [29]:
merged = list(itertools.chain(*wtoks))
len(merged) #merged into one long list, 602 words in 30 sentences

602

In [30]:
polysyl = backend.phonemize(merged)

In [31]:
polysyl[40:50]

['bɪɡɪn ',
 'kɹiːeɪɾɪŋ ',
 'æn ',
 'ɑːŋɡoʊɪŋ ',
 'stɔːɹi ',
 'kɔːld ',
 'eɪ ',
 'kæmpeɪn ',
 'mɛni ',
 'piːpəl ']

In [32]:
polysyl = [(re.sub(r'(.*)eɪ(.*)', r'\1e\2', w)) for w in polysyl]
polysyl = [(re.sub(r'(.*)oʊ(.*)', r'\1o\2', w)) for w in polysyl]
polysyl = [(re.sub(r'(.*)aɪ(.*)', r'\1a\2', w)) for w in polysyl]
polysyl = [(re.sub(r'(.*)aʊ(.*)', r'\1a\2', w)) for w in polysyl]
polysyl = [(re.sub(r'(.*)ɔɪ(.*)', r'\1ɔ\2', w)) for w in polysyl]

In [33]:
polysylvow = [[c for c in w if c in vowels] for w in polysyl]

In [34]:
polycount = [len(w) for w in polysylvow]
polycount[40:50]

[2, 3, 1, 3, 2, 1, 1, 2, 2, 2]

In [35]:
multsyls = [w for w in polycount if w>=3]
multsyls[:25]

[3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 4, 4, 5, 3, 4, 4, 3, 3]

In [36]:
len(multsyls) 

70

## SMOG Results

The next step is to find the nearest perfect square to our count of 70 and take its square root: the nearest perfect square is 64, and the square root of 64 is **8**.

Adding 3, this gives us a grade level of 11.

### The Hand Calculations

When I counted by hand (as the test was designed to be done), I counted 66 3+ syllable words in our 30 sentences. 

It's still imperfect, but only by a difference of 4 and not an entire 30. In this instance, the nearest perfect square and, as a result, the grade level score are the same by both calculation methods. With a little finesse and investigation into spots of inconsistency, this method could be really perfected! It's information I'm going to keep with me for future projects, for sure.
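
One way to investigate those spots of inconsistency is to pair each word with its automatic syllable count and compare the resulting list against the hand-counted one. This is a sketch with hypothetical names. The words and counts are a slice of the sample shown earlier, and the `hand` list is made up purely to demonstrate the comparison.

```python
from collections import Counter

def polysyllabic_words(words, syllable_counts, threshold=3):
    """Words whose automatic syllable count reaches the threshold."""
    return [w for w, n in zip(words, syllable_counts) if n >= threshold]

# Slice of the sampled tokens with their automatic counts
words = ['creating', 'an', 'ongoing', 'story', 'called', 'a', 'campaign']
counts = [3, 1, 3, 2, 1, 1, 2]

auto = polysyllabic_words(words, counts)
hand = ['ongoing']  # hypothetical hand count, for demonstration only

# Counter subtraction keeps repeated words, unlike a plain set difference
auto_only = Counter(auto) - Counter(hand)
hand_only = Counter(hand) - Counter(auto)

print(auto)              # ['creating', 'ongoing']
print(dict(auto_only))   # {'creating': 1}
print(dict(hand_only))   # {}
```

Run over all 30 sentences with the real hand list, the two leftover counters would show exactly which words account for the difference between 70 and 66.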

## Renewed Thoughts about Readability Scores

Really, my thoughts from my first [notebook](https://github.com/Data-Science-for-Linguists-2025/Critical-Role-Analysis/blob/39dc6f876da2527aa64be64ae34435e61daf5a31/notebooks/DND_Jargon.ipynb) are much the same. This methodology of calculating is a huge improvement, but my issues with the test in general are unchanged. The concept is great and I think it's a very cool idea. It's nice that it's fairly simple for application by non-linguists, but I believe there must be a better method for calculating readability than these tests offer. But overall, I'm glad I found these tests!