# Choosing the Best Readability Metric for Dutch: A Data-Driven Approach
Imagine picking up a children's fairy tale, only to find it written in the same dense, academic language as a graduation thesis. Or, conversely, trying to understand a scientific article written with the simplicity of a bedtime story. The mismatch between a text's complexity and its intended audience can make even the most important information inaccessible--or worse, entirely misunderstood.

This is where readability comes in. Readability is the measure of how easy or difficult a text is to read and understand. It's not just about the words on the page; it's about ensuring that the text meets the needs of its audience. Whether it is a fairy tale for a child, a novel for a casual reader, or a technical report for an expert, readability ensures that the message is clear, engaging and appropriate for the reader's level of understanding.

But how do we measure readability? This is where readability metrics come into play. These metrics are tools that analyze factors like sentence length, word complexity, and syllable count to assign a score to a text. They help writers, educators, and content creators tailor their work to the right audience--whether that is simplifying a government letter for non-native speakers or ensuring a children's book is engaging and easy to follow.

In this blog post, we'll explore the world of readability metrics, focusing on their application to Dutch texts. Dutch, with its compound words and flexible sentence structures, presents particular challenges for readability assessment. We'll examine which metrics are available, whether they are applicable and interpretable for Dutch, and how they perform on a variety of texts, in order to determine which metric is best suited to the Dutch language. By the end, you'll have a clear understanding of what readability metrics are, what their strengths and weaknesses are, and which metric works best for Dutch.

## The many types of readability metrics
Over the years, linguists, educators, and researchers have developed a wide range of metrics to measure readability, each with its own unique approach. Some focus on sentence length, others on word complexity, and some even take into account the frequency of certain words or syllables. The diversity of these metrics reflects the complexity of human language and the many factors that influence how easy or difficult a text is to read.<br>
Readability metrics are not one-size-fits-all. Studies show that metrics designed for English often do not translate well to other languages [[1](https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/8b3139ea-3fd2-4e41-806d-27a6d616505b/content)]. Dutch, for example, has long compound words and flexible sentence structures, which require a different approach to readability assessment. Some metrics, such as SMOG, Gunning Fog, ARI, and Dale-Chall, were developed specifically for English and do not account for Dutch linguistic characteristics. Additionally, these metrics report American grade levels, which makes their scores hard to interpret in a Dutch context.<br>
<br>
A more suitable set of metrics for Dutch readability consists of Flesch-Douma, LIX, Lexical Density and CILT. Below we examine these four metrics in detail.

#### **1. Flesch-Douma**
The Flesch reading ease equation is one of the best-known readability equations for English texts. It evaluates readability from the average sentence length (ASL, the average number of words per sentence) and the average word length (AWL, the average number of syllables per word).

$$Score = 206.835 - (1.015 \times ASL) - (84.6 \times AWL)$$
Source: [[2](https://api.semanticscholar.org/CorpusID:39344661)]

However, because Dutch has different syllabic structures and longer compound words, W.H. Douma created a variation that adjusts the coefficients of the Flesch reading ease equation.<br>
$$Score = 206.835 - (0.93 \times ASL) - (77 \times AWL)$$
Source:[[3](https://lirias.kuleuven.be/retrieve/553891)] [[4](https://lint.hum.uu.nl/assets/kleijn-2018.pdf)]<br>
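
To make the difference between the two equations concrete, here is a minimal sketch that evaluates both for the same input. The function names and the sample values (15 words per sentence, 1.5 syllables per word) are assumptions chosen for illustration, not measurements from a real text.

```python
def flesch_reading_ease(asl: float, awl: float) -> float:
    """Original Flesch equation with the English coefficients."""
    return 206.835 - 1.015 * asl - 84.6 * awl


def flesch_douma_score(asl: float, awl: float) -> float:
    """Douma's variant with the coefficients adjusted for Dutch."""
    return 206.835 - 0.93 * asl - 77 * awl


# Hypothetical text statistics: 15 words per sentence, 1.5 syllables per word
asl, awl = 15.0, 1.5
print(flesch_reading_ease(asl, awl))
print(flesch_douma_score(asl, awl))
```

For these inputs the original equation gives roughly 64.7 and Douma's variant roughly 77.4, so identical text statistics land noticeably higher on the Dutch scale.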

Although the original reading ease equation cannot be transferred one-to-one from English to Dutch, Douma's adjusted version allows us to use it for Dutch texts. However, it might still struggle with long compound words such as "arbeidsongeschiktheid".<br>

In practice the score tends to fall between 0 and 100, although it can drop below 0 [[5](https://web.archive.org/web/20160712094308/http://www.mang.canterbury.ac.nz/writing_guide/writing/flesch.shtml)]. A score of 0 indicates a very difficult text and 100 a very easy one. This range will be used as the baseline for the other metrics as well.<br>

#### **2. LIX (Läsbarhetsindex)**
LIX was developed in Sweden by Carl-Hugo Björnsson to measure the complexity of texts in European languages. It combines the average sentence length with the percentage of long words (words with more than 6 letters) as a proxy for difficulty. Note that for LIX a sentence boundary is marked by a period or a colon. The equation for LIX is as follows:

$$\textit{LIX} = \frac{\text{Total words}}{\text{Total sentences}} + \frac{100 \times \text{Long words}}{\text{Total words}}$$
Source: [[6](https://books.google.nl/books?id=QMiXugEACAAJ)]

Similar to the Flesch-Douma score, LIX difficulty might get artificially inflated due to long compound words. 
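
As a rough illustration of this effect, the sketch below computes the raw LIX score directly from word and sentence counts. The helper name `lix_from_counts` and the counts are assumptions for illustration: 11 words in 2 sentences, once without any long word and once with a single long compound.

```python
def lix_from_counts(total_words: int, total_sentences: int, long_words: int) -> float:
    """Raw LIX: average sentence length plus the percentage of long words."""
    if total_words == 0 or total_sentences == 0:
        return 0.0
    return total_words / total_sentences + 100 * long_words / total_words


# Hypothetical counts: the same short text with and without one long compound
without_compound = lix_from_counts(11, 2, 0)
with_compound = lix_from_counts(11, 2, 1)
print(without_compound)
print(with_compound)
```

In such a short text, a single long word raises the raw score from 5.5 to about 14.6. In longer texts the effect of one compound is diluted, but a text with many compounds will still score as more difficult.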

Score interpretation: According to [[7](https://aclanthology.org/2024.lrec-main.558)], LIX scores typically range from 20 to 60, with 20 being very easy and 60 very difficult. To make this comparable to the Flesch-Douma metric, we rescale the score using this minimum and maximum, multiply by 100, and subtract the result from 100 to invert the metric:

$$\textit{LIX scaled} = 100 - \frac{LIX - 20}{60-20} \times 100$$


This ensures that LIX scores also fall between 0 and 100, with 0 being difficult and 100 being easy.
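
The scaling can be sketched in a few lines. The function name `scale_lix` is an assumption for illustration; clamping scores outside the 20 to 60 range is a choice made here so that the result always stays within 0 to 100.

```python
def scale_lix(lix: float, low: float = 20.0, high: float = 60.0) -> float:
    """Map a raw LIX score onto 0 (difficult) to 100 (easy)."""
    clamped = min(max(lix, low), high)  # keep out-of-range scores inside [low, high]
    return 100 - (clamped - low) / (high - low) * 100


for raw in (20, 40, 60, 75):
    print(raw, scale_lix(raw))
```

A raw score of 40 sits exactly halfway and maps to 50, while anything at or above 60 maps to 0.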

#### **3. Lexical Density**
Lexical density is a measure of vocabulary richness, introduced to quantify how much information is packed into a text. It calculates the ratio of content words (nouns, verbs, adjectives, and adverbs) to total words, which serves as an indicator of how difficult a text is. Unlike the other readability metrics, lexical density does not account for sentence structure or word length; it looks purely at vocabulary richness.
Equation:

$$\textit{Lexical density} = \frac{\text{Content words}}{\text{Total words}} \times 100$$
Source: [8]

This metric works well for larger texts, but less so for single sentences. A short greeting like "Good morning" gets a lexical density of 100, i.e. extremely difficult, because "Good" is an adjective and "morning" is a noun. As such, single sentences can score much better or much worse than their actual difficulty warrants.

Score interpretation: As lexical density is already on a scale of 0 to 100 there is no need for any scaling. However, I will invert the score as 100 indicates the text is lexically dense and thus difficult.
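
The small sketch below shows the calculation and the inversion on hand-tagged examples. The part-of-speech tags are assigned by hand for illustration (they are not model output), and the helper name `lexical_density_from_tags` is an assumption.

```python
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


def lexical_density_from_tags(tags: list[str]) -> float:
    """Percentage of tokens whose part-of-speech tag marks a content word."""
    if not tags:
        return 0.0
    content = sum(1 for tag in tags if tag in CONTENT_TAGS)
    return content / len(tags) * 100


# Hand-assigned tags, for illustration only
greeting = ["ADJ", "NOUN"]                                # "Good morning"
sentence = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]  # "De kat zit op de mat"

for tags in (greeting, sentence):
    density = lexical_density_from_tags(tags)
    print(density, 100 - density)  # raw density, then the inverted score
```

The greeting gets a density of 100 and therefore an inverted score of 0, while the six-word sentence ends up at 50 on both scales.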


#### **4. CILT (Cito Index Lees Techniek)**
CILT is a Dutch-specific readability metric. Besides looking at the average word length (in letters), CILT also compares the text against a word frequency list to determine the difficulty. This list contains the words that the average Dutch reader uses most frequently [[9](https://ris.utwente.nl/ws/portalfiles/portal/6765858/Staphorsius97indexering.pdf)].

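$$CILT = 114.49 + 0.28 \times FREQ - 12.33 \times AWL_{letters}$$
Source: [[9](https://ris.utwente.nl/ws/portalfiles/portal/6765858/Staphorsius97indexering.pdf)]

Here FREQ is the term based on the word frequency list and $AWL_{letters}$ is the average word length in letters, rather than the sentence length (ASL) or the syllable-based AWL used by Flesch-Douma.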

This metric would have been ideal for Dutch readability assessment, had the word list not been kept private by Cito. I asked them for access but was refused, as their business model depends on that word list.


So, we will be looking at three readability metrics: Flesch-Douma, LIX, and lexical density.

## Practical application
To apply the readability metrics discussed--Flesch-Douma, LIX, and lexical density--we'll walk through the process of implementing them in Python. This will allow us to analyze Dutch texts and determine their readability scores. Below, we'll outline the steps required to calculate each metric, from preprocessing the text to computing the final scores.

Before diving into the calculations, we need to set up our environment by importing the necessary libraries and tools. We'll use regular expressions (regex) to identify sentences and words, and the SpaCy library and models to classify words as content words (nouns, verbs, adjectives, and adverbs) for the lexical density metric. Additionally, we will clone the [repository](https://github.com/Hellsice/Readability-metrics-blogpost) I created to help identify syllables.
So first things first, clone the repository to the desired folder using:

In [None]:
git clone https://github.com/Hellsice/Readability-metrics-blogpost

This will give you the folder called "dependencies", which contains all files and code needed for syllable detection.
Now that this is set up, we start with importing all required libraries.

In [None]:
import re
from dependencies.word import Word
import spacy
from spacy.cli import download

- Regex will help with detecting sentences and words
- Word is used to count syllables
- SpaCy models are used to classify words.<br>
<br>
Once these are all imported you can download the required SpaCy model using:

In [None]:
download('nl_core_news_sm')

### **Text preprocessing**
The first step in calculating readability metrics is to split the text into individual sentences. However, splitting text into sentences isn't always straightforward, especially in Dutch, where abbreviations, prefixes, and suffixes complicate the process.<br>
To handle this, we define a function that uses regular expressions (regex, re library) to identify sentence boundaries. The function accounts for common prefixes, suffixes, abbreviations and edge cases.

In [None]:
# Building blocks for the sentence splitter (raw strings keep the regex escapes intact)
alphabets = r"([A-Za-z])"
prefixes = r"(Dhr|Mevr|Dr|Prof)[.]"
suffixes = r"(B\.V|N\.V|Jr|Sr|Co)"
starters = r"(Hij\s|Zij\s|Het\s|Wij\s|Jullie\s|Hun\s|Onze\s|Maar\s|Echter\s|Dat\s|Dit\s|Waar\s|Omdat\s|Als\s|Wanneer\s)"
acronyms = r"([A-Z][.][A-Z][.](?:[A-Z][.])?)"
websites = r"[.](nl|be|com|net|org|io|gov|edu|me)"
digits = r"([0-9])"
multiple_dots = r'\.{2,}'

def split_into_sentences(text: str, metrix='lix') -> list[str]:
    """
    Split the text into sentences.

    If the text contains substrings "<prd>" or "<stop>", they would lead 
    to incorrect splitting because they are used as markers for splitting.

    :param text: text to be split into sentences
    :type text: str
    :param metrix: name of the metric; only 'lix' also splits at colons
    :type metrix: str

    :return: list of sentences
    :rtype: list[str]
    """
    text = " " + text + "  "
    text = text.replace("\n"," ")
    text = re.sub(prefixes,"\\1<prd>",text)
    text = re.sub(websites,"<prd>\\1",text)
    text = re.sub(digits + "[.]" + digits,"\\1<prd>\\2",text)
    text = re.sub(multiple_dots, lambda match: "<prd>" * len(match.group(0)) + "<stop>", text)
    if "Ph.D" in text: text = text.replace("Ph.D.","Ph<prd>D<prd>")
    text = re.sub(r"\s" + alphabets + "[.] "," \\1<prd> ",text)
    text = re.sub(acronyms+" "+starters,"\\1<stop> \\2",text)
    text = re.sub(alphabets + "[.]" + alphabets + "[.]" + alphabets + "[.]","\\1<prd>\\2<prd>\\3<prd>",text)
    text = re.sub(alphabets + "[.]" + alphabets + "[.]","\\1<prd>\\2<prd>",text)
    text = re.sub(" "+suffixes+"[.] "+starters," \\1<stop> \\2",text)
    text = re.sub(" "+suffixes+"[.]"," \\1<prd>",text)
    text = re.sub(" " + alphabets + "[.]"," \\1<prd>",text)
    if "”" in text: text = text.replace(".”","”.")
    if "\"" in text: text = text.replace(".\"","\".")
    if "!" in text: text = text.replace("!\"","\"!")
    if "?" in text: text = text.replace("?\"","\"?")
    text = text.replace(".",".<stop>")
    text = text.replace("?","?<stop>")
    text = text.replace("!","!<stop>")
    text = text.replace("<prd>",".")
    if metrix == 'lix':
        text = text.replace(':',':<stop>')
    sentences = text.split("<stop>")
    sentences = [s.strip() for s in sentences]
    if sentences and not sentences[-1]: sentences = sentences[:-1]
    return sentences
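
The core idea of the function above is to first protect periods that do not end a sentence with a placeholder, then mark the real sentence ends, and only then split. The stripped-down sketch below shows that idea in isolation. `split_simple` is a hypothetical helper that handles only titles and decimal numbers, so it is not a replacement for the full function.

```python
import re


def split_simple(text: str) -> list[str]:
    # Step 1: protect periods that do not end a sentence
    text = re.sub(r"\b(Dhr|Mevr|Dr|Prof)\.", r"\1<prd>", text)  # titles
    text = re.sub(r"(\d)\.(\d)", r"\1<prd>\2", text)            # decimal numbers
    # Step 2: mark the remaining sentence-ending punctuation
    text = re.sub(r"([.!?])", r"\1<stop>", text)
    # Step 3: restore the protected periods and split at the markers
    text = text.replace("<prd>", ".")
    return [s.strip() for s in text.split("<stop>") if s.strip()]


print(split_simple("Dr. Jansen mat 3.5 graden. Was dat genoeg? Nee!"))
# ['Dr. Jansen mat 3.5 graden.', 'Was dat genoeg?', 'Nee!']
```

Without the first step, the periods in "Dr." and "3.5" would each be treated as a sentence end.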

The next step is a function that extracts all individual words. We use a regex that looks for word boundaries and counts everything between two boundaries as a word.

In [None]:
def get_words(text):
    words = re.findall(r'\b\w+\b', text)
    return words
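
It is worth knowing how this pattern treats apostrophes, hyphens, and digits. The snippet below repeats the function so that it runs on its own; the example sentence is made up for illustration.

```python
import re


def get_words(text):
    # Everything between two word boundaries counts as one word
    return re.findall(r'\b\w+\b', text)


print(get_words("Oma's e-mail kwam om 9 uur."))
# ['Oma', 's', 'e', 'mail', 'kwam', 'om', '9', 'uur']
```

Apostrophes and hyphens split a word in two, and numbers count as words. For longer texts this has little effect on the averages, but it can shift the scores of very short texts.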

The last preprocessing step needed before any of the metrics can be calculated is the function to get the syllable count. For this we use the dependencies folder that is found on the [github repository](https://github.com/Hellsice/Readability-metrics-blogpost).
This tool breaks down a word into its syllables, which we then split to count how many syllables there are.

In [None]:
def get_syllable_count(sentences):
    total_syllables = 0
    for sentence in sentences:
        input_words = get_words(sentence)
        for input_word in input_words:
            word = Word(input_word)
            syllables = word.get_split_word()
            total_syllables += len(syllables.split('-'))
    return total_syllables
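
If the repository's `Word` class is not available, a crude fallback is to count groups of consecutive vowels. This is explicitly not the method used in this post: `approx_syllables` is a hypothetical helper based on a simple heuristic, and it will miscount words in which one vowel run spans two syllables (such as "mooie").

```python
import re

# One or more consecutive vowels (including common accented forms) count as one group
VOWEL_GROUP = re.compile(r"[aeiouyàáâäèéêëìíîïòóôöùúûü]+", re.IGNORECASE)


def approx_syllables(word: str) -> int:
    """Approximate the syllable count as the number of vowel groups (at least 1)."""
    return max(1, len(VOWEL_GROUP.findall(word)))


for word in ("kat", "lezen", "arbeidsongeschiktheid"):
    print(word, approx_syllables(word))
```

For these three words the heuristic gives 1, 2, and 6, which matches the syllable splits kat, le-zen, and ar-beids-on-ge-schikt-heid.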

### **Metrics**
With all functions for text preprocessing implemented, we can calculate the metrics. The first metric is the Flesch-Douma score, where we use the average sentence length and average word length in syllables to determine readability.<br>
For this we call the functions we created earlier and calculate the average sentence length and the average word length in syllables. With these we determine the Flesch-Douma score. Lastly, we clamp the score to the range 0 to 100, the boundaries established by Rudolf Flesch himself [[5](https://web.archive.org/web/20160712094308/http://www.mang.canterbury.ac.nz/writing_guide/writing/flesch.shtml)].

In [None]:
def flesch_douma(text):
    # Any metric name other than 'lix' keeps colons from ending a sentence
    sentences = split_into_sentences(text, 'flesch_douma')
    words = get_words(text)
    syllable_count = get_syllable_count(sentences)

    # Average sentence length in words and average word length in syllables
    avg_sentence_length = len(words) / len(sentences) if sentences else 0
    avg_syl = syllable_count / len(words) if words else 0

    score = 206.835 - (0.93 * avg_sentence_length) - (77 * avg_syl)
    # Clamp the score to the 0-100 range
    return min(100, max(0, score))

Next we implement the LIX score, which focuses on sentence length and the proportion of long words (words with more than 6 letters). As with the Flesch-Douma metric, we first retrieve all sentences, this time also splitting at each colon, and extract all words. We then count how many words have more than 6 letters, and with these three values we calculate the LIX score. The score is then clamped to the boundaries reported in [[7](https://aclanthology.org/2024.lrec-main.558)], scaled, and inverted so that 0 represents difficult and 100 easy.

In [None]:
def lix(text):
    sentences = split_into_sentences(text, 'lix')
    words = get_words(text)

    total_words = len(words)
    total_sentences = len(sentences)
    long_words = sum(1 for word in words if len(word) > 6)

    lix_score = (total_words / total_sentences) + (100 * long_words / total_words) if total_words > 0 and total_sentences > 0 else 0
    lix_score = max(20, lix_score)
    lix_score = min(60, lix_score)
    scaled = (lix_score-20)/(60-20)*100
    return 100 - scaled

Lastly, we implement lexical density using the SpaCy model we downloaded earlier. With this model we can determine which words are nouns, verbs, adverbs, or adjectives, and with that count the number of lexical words in a text. Here too the score is inverted so that 0 represents difficult texts and 100 easy texts.

In [None]:
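_nlp = None  # SpaCy pipeline, loaded once on first use instead of on every call

def lexical_density(text):
    global _nlp
    if _nlp is None:
        _nlp = spacy.load("nl_core_news_sm")
    doc = _nlp(text)

    # Content words: nouns, verbs, adjectives, and adverbs
    lexical_words = [word for word in doc if word.pos_ in {'NOUN', 'VERB', 'ADJ', 'ADV'}]
    total_words = len(get_words(text))
    if total_words == 0:
        return 0
    density = (len(lexical_words) / total_words) * 100
    return 100 - density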

### **Application to Texts**
With all three metrics implemented, we can now apply them to various Dutch texts to assess their readability. By running these functions on different types of texts--ranging from children's books to technical reports--we can gain valuable insights into how well each metric performs, what some weaknesses might be and which metric is best suited for Dutch readability assessment.
The Dutch texts used were:
- News Articles: 3 from De Volkskrant, 3 from NRC, and 3 from NOS.
- Children's Literature: 5 fairy tales or children's short stories, and 3 books aimed at young readers aged 7-10.
- Novels: 4 full-length novels.
- Academic Texts: 7 research papers.
- Informational Texts: 5 Wikipedia articles.

All texts and files used in this analysis are available on the [github repository](https://github.com/Hellsice/Readability-metrics-blogpost), allowing readers to explore the data and replicate the results.

Applying all metrics to the texts and plotting them next to each other shows the following result:<br>
![image.png](attachment:image.png)

From the results shown in the figure, it is clear that LIX provides the most nuanced and interpretable range of scores for Dutch texts. It effectively differentiates between various text categories, from simple children’s stories to complex research papers, making it highly reliable for readability assessment.

While Flesch-Douma comes close in performance, its results are more compact, offering less differentiation between text types. This makes it a good secondary option, but it falls short of the nuance provided by LIX.

On the other hand, lexical density proves to be unsuitable for Dutch readability assessment. The results show almost no distinction between text types and no distinction between difficulty levels, making it ineffective as a standalone metric.
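
One way to put a number on how well a metric separates text types is to compare the mean score per category and look at the gap between the highest and lowest mean. The sketch below is one possible approach rather than the analysis behind the figure. The helper name `category_spread` is an assumption, and the scores are placeholder numbers, not the results shown above.

```python
from statistics import mean


def category_spread(scores_by_category: dict[str, list[float]]) -> float:
    """Gap between the highest and lowest category mean (larger = more separation)."""
    means = [mean(scores) for scores in scores_by_category.values() if scores]
    return max(means) - min(means) if means else 0.0


# Placeholder numbers for illustration only, NOT the results from the figure
dummy_scores = {
    "children": [80.0, 90.0],
    "news": [50.0, 60.0],
    "academic": [20.0, 30.0],
}
print(category_spread(dummy_scores))
```

Computing this value per metric on the actual scores would give a single figure for comparing how compact or spread out each metric's results are.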

### References
[1] P. L. Carrell, “Readability in ESL”. In: Reading in a Foreign Language 4.1 (1987), pp. 21–40. Available: [https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/8b3139ea-3fd2-4e41-806d-27a6d616505b/content](https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/8b3139ea-3fd2-4e41-806d-27a6d616505b/content)<br>
[2] R. F. Flesch, “A new readability yardstick”. In: The Journal of Applied Psychology 32.3 (1948), pp. 221–233. Available: [https://api.semanticscholar.org/CorpusID:39344661](https://api.semanticscholar.org/CorpusID:39344661).<br>
[3] V. Vandeghinste and B. Bulté, “Linguistic Proxies of Readability: Comparing Easy-to-Read and Regular Newspaper Dutch”. Available: [https://lirias.kuleuven.be/retrieve/553891](https://lirias.kuleuven.be/retrieve/553891)<br>
[4] S. Kleijn, "Clozing in on readability: How linguistic features affect and predict text comprehension and on-line processing". *Utrecht Institute of Linguistics*, 2018 [online]. Available: [https://lint.hum.uu.nl/assets/kleijn-2018.pdf](https://lint.hum.uu.nl/assets/kleijn-2018.pdf)<br>
[5] R. Flesch, "How to write plain english". Accessed via the Internet Archive on December 16, 2024. [Online]. Available: [https://web.archive.org/web/20160712094308/http://www.mang.canterbury.ac.nz/writing_guide/writing/flesch.shtml](https://web.archive.org/web/20160712094308/http://www.mang.canterbury.ac.nz/writing_guide/writing/flesch.shtml).<br>
[6] C.H. Björnsson. "Läsbarhet: Pedagogiskt Utvecklingsarbete vid Stockholms Skolor. 6". Liber; [Solna, Seelig], 1968. URL: [https://books.google.nl/books?id=QMiXugEACAAJ](https://books.google.nl/books?id=QMiXugEACAAJ).<br>
[7] S. Wold, P. Mæhlum, and O. Hove, “Estimating lexical complexity from document-level distributions,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, and N. Xue, Eds., Torino, Italia: ELRA and ICCL, May 2024, pp. 6309–6318. [Online]. Available: [https://aclanthology.org/2024.lrec-main.558](https://aclanthology.org/2024.lrec-main.558)<br>
[8] J. Ure. “Lexical Density and Register Differentiation”. In: Applications of Linguistics. Ed. by G. Perren and J. L. M. Trim. Cambridge: Cambridge University Press, 1971, pp. 443–452.<br>
[9] G. Staphorsius and N.D. Verhelst. “Indexering van de leestechniek”. Dutch. In: Pedagogische studiën 74.3 (1997), pp. 154–164. ISSN: 0165-0645. URL: [https://ris.utwente.nl/ws/portalfiles/portal/6765858/Staphorsius97indexering.pdf](https://ris.utwente.nl/ws/portalfiles/portal/6765858/Staphorsius97indexering.pdf)<br>