# Readability Metrics
This notebook gives an overview of readability metrics and shows how to compute them in Python. Readability metrics measure how easy a text is to read, and they are used in fields such as education, linguistics, and natural language processing. The notebook covers the following topics:

- What are readability metrics?
- How to calculate readability metrics in Python
- How to interpret the results of readability metrics
- How to use readability metrics in practice

## What are readability metrics?
Readability metrics are quantitative measures of how easy or difficult a text is to read and understand. They are computed from surface features of the text, such as sentence length, word length, and vocabulary complexity.
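
To make these surface features concrete, here is a minimal sketch that counts sentences and words with regular expressions and derives the average sentence and word lengths. The function name `basic_text_stats` and its tokenization rules are assumptions made for illustration; they are not part of `whetstone`, whose tokenizer may count sentences and words differently.

```python
import re


def basic_text_stats(text):
    """Return rough surface statistics for a text (illustrative tokenization only)."""
    # Treat runs of ., ! or ? as sentence boundaries and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"sentences": 0, "words": 0,
                "avg_sentence_length": 0.0, "avg_word_length": 0.0}
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


stats = basic_text_stats("The cat sat. The dog barked loudly!")
print(stats["sentences"], stats["words"], stats["avg_sentence_length"])
```

Most readability formulas are weighted combinations of counts like these, which is why differences in tokenization can shift the resulting scores.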

There are many different readability metrics, each with its own formula and interpretation. Some of the most commonly used readability metrics include:
  
- **Simple Measure of Gobbledygook (SMOG)**: This metric estimates the years of formal education required to understand a text. Complex words are conventionally those with three or more syllables. The formula for the SMOG is:

  $1.043 \times \sqrt{\text{number of complex words} \times \frac{30}{\text{number of sentences}}} + 3.1291$
  
- **Dale-Chall Readability Score**: This metric estimates the grade level required to understand a text. The formula for the Dale-Chall Readability Score is:

  $0.1579 \times (\text{percentage of difficult words} + 0.0496 \times \text{average words per sentence})$
  
- **Spache Readability Formula**: This metric estimates the grade level required to understand a text. The formula for the Spache Readability Formula is:

  $0.121 \times \text{average sentence length} + 0.082 \times \text{average syllables per word} - 0.659$
  
- **Linsear Write Formula**: This metric estimates the grade level required to understand a text. The formula for the Linsear Write Formula is:

  $\frac{(\text{number of easy words} + \text{number of hard words}) \times 2}{\text{number of sentences}} - 2$
  
- **FORCAST Readability Formula**: This metric estimates the grade level required to understand a text. The formula for the FORCAST Readability Formula is:

  $20 - \frac{\text{number of syllables} \times 0.1}{\text{number of sentences}}$
  
- **Raygor Readability Estimate**: This metric estimates the grade level required to understand a text. The formula for the Raygor Readability Estimate is:

  $0.1579 \times \text{average words per sentence} + 0.0496 \times \text{percentage of difficult words} + 3.6365$
  
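- **LIX Readability Formula**: This metric produces an index rather than a grade level, with higher values indicating harder text. It combines the average sentence length with the percentage of long words (conventionally, words of more than six letters). The formula for the LIX Readability Formula is: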

  $\frac{\text{number of words}}{\text{number of sentences}} + \frac{\text{number of long words} \times 100}{\text{number of words}}$
  
- **RIX Readability Formula**: This metric estimates the grade level required to understand a text. The formula for the RIX Readability Formula is:

  $\frac{\text{number of long words}}{\text{number of sentences}}$
  
- **Strain Index**: This metric estimates the grade level required to understand a text. The formula for the Strain Index is:

  $\frac{\text{number of long words} \times 100}{\text{number of sentences}}$
  
- **Readability Consensus Grade**: This metric summarizes the other scores in a single number by taking the arithmetic mean of the fourteen metrics listed in its formula. The formula for the Readability Consensus Grade is:

  $\frac{\text{Flesch-Kincaid Grade Level} + \text{Gunning Fog Index} + \text{Coleman-Liau Index} + \text{Automated Readability Index} + \text{SMOG} + \text{Dale-Chall Readability Score} + \text{Spache Readability Formula} + \text{New Dale-Chall Readability Score} + \text{Linsear Write Formula} + \text{FORCAST Readability Formula} + \text{Raygor Readability Estimate} + \text{LIX Readability Formula} + \text{RIX Readability Formula} + \text{Strain Index}}{14}$
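
Several of the count-based expressions above can be sketched directly in Python. The functions below follow the SMOG, LIX, RIX, and Strain Index formulas as written in this list. The function names and the counts passed to them are assumptions chosen for illustration; they are not `whetstone` functions and the counts were not measured from the notebook's texts.

```python
import math


def smog(complex_words, sentences):
    # 1.043 * sqrt(complex words * 30 / sentences) + 3.1291
    return 1.043 * math.sqrt(complex_words * 30 / sentences) + 3.1291


def lix(words, long_words, sentences):
    # Average sentence length plus the percentage of long words.
    return words / sentences + long_words * 100 / words


def rix(long_words, sentences):
    # Long words per sentence.
    return long_words / sentences


def strain_index(long_words, sentences):
    # As written above, this is the RIX value scaled by 100.
    return long_words * 100 / sentences


# Hypothetical counts for a short passage (assumed, for illustration only).
words, sentences, long_words, complex_words = 238, 15, 84, 30

print(round(smog(complex_words, sentences), 2))
print(round(lix(words, long_words, sentences), 2))    # 51.16
print(round(rix(long_words, sentences), 2))           # 5.6
print(round(strain_index(long_words, sentences), 2))  # 560.0
```

Because RIX and the Strain Index share the same ratio, they always rank texts identically; only the scale differs.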

## Notebook Setup

In [1]:
# Import the readability metric functions from whetstone
from whetstone.metrics.text.readability_metrics import (
    calculate_flesch_kincaid_reading_ease,
    calculate_flesch_kincaid_grade_level,
    calculate_gunning_fog_index, 
    calculate_coleman_liau_index,
    calculate_automated_readability_index,
    calculate_smog_index,
    calculate_dale_chall_readability_score,
    calculate_spache_readability_formula,
    calculate_linsear_write_formula,
    calculate_forcast_readability_formula,
    calculate_raygor_readability_estimate,
    calculate_lix_readability_score,
    calculate_rix_readability_score,
    calculate_strain_index,
    calculate_new_dale_chall_readability_score,
    calculate_readability_consensus_grade,
    calculate_all_readability_metrics
)

In [2]:
# Define two sample texts on the same topic: one well written, one poorly written
high_quality_text = """Creating a strong password is your first line of defense in the digital age, where cyber threats constantly evolve. Hackers are increasingly adept at exploiting vulnerabilities, making it crucial to understand the importance of robust security measures. While it may be convenient to use simple and memorable combinations like birthdays or pet names, such choices are perilously insecure. They leave your personal and financial information exposed to malicious attacks.

Think of a password as a digital fortress. To build this fortress, security experts recommend a mix of uppercase and lowercase letters, numbers, and special characters. The complexity of such a password makes it significantly more difficult for hackers to crack using brute-force methods. Additionally, the length of your password plays a pivotal role; each extra character exponentially increases the time required to breach it.

It’s not just about creating a single strong password, though. Using unique passwords for each of your accounts is essential. A single breach could otherwise grant hackers access to multiple platforms, compounding the damage.

Cybercrime costs the global economy billions of dollars annually. The effort required to create and maintain strong passwords pales in comparison to the consequences of a security failure. Just as you would invest in a sturdy lock to protect your home, you should prioritize secure passwords to safeguard your digital assets. In a world where data is increasingly valuable, this small step can make a world of difference."""



low_quality_text = """So, like, passwords are super important for, like, cyber-security and stuff. You know how there are hackers everywhere these days? They’re always trying to get into people’s accounts, which is kinda scary. It reminds me of this one time my cousin got hacked. It was so bad—like, they stole his email and even tried to get into his bank account or whatever. Anyway, this is why you’ve gotta make sure you’re using good passwords, but not something obvious like your birthday or your dog’s name. (By the way, my dog’s name is Max. He’s super cute, but yeah, don’t use that as a password.)

So, like, those IT security experts and computer nerds are always saying stuff like, “Use capitals and lowercase letters and numbers and those weird symbols on your keyboard.” You know, the ones you never use unless you’re doing something nerdy like coding or whatever? Oh, and they say to make your passwords really long. Like, the longer the better, I guess. I don’t know why, but it makes it harder for hackers, which is good.

Passwords are kinda like keys to your house, but for the internet. Or maybe it’s more like a vault or something? I dunno, but it’s super important because if a hacker gets in, they can take all your stuff. And not just money—they can steal your identity or your Instagram account, which would be awful. So yeah, make good passwords. Like, seriously."""

## Calculating Each Readability Metric


In [3]:
# Calculating the Flesch-Kincaid Reading Ease score for the high and low quality text
print('Results for Flesch-Kincaid Reading Ease:')
high_quality_flesch_kincaid_reading_ease_score = calculate_flesch_kincaid_reading_ease(high_quality_text)
low_quality_flesch_kincaid_reading_ease_score = calculate_flesch_kincaid_reading_ease(low_quality_text)
print(f'High quality text: {high_quality_flesch_kincaid_reading_ease_score}')
print(f'Low quality text: {low_quality_flesch_kincaid_reading_ease_score}')


Results for Flesch-Kincaid Reading Ease:
High quality text: [36.36]
Low quality text: [73.09]
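
To help interpret these numbers, the sketch below maps a Flesch Reading Ease score to the descriptive bands commonly quoted for the scale, where higher scores mean easier text. The function name `flesch_reading_ease_band` is a hypothetical helper, not part of `whetstone`, and the band boundaries are the conventional ones rather than anything the library defines.

```python
def flesch_reading_ease_band(score):
    """Map a Flesch Reading Ease score to its conventional difficulty label."""
    # (lower bound, label) pairs, checked from easiest to hardest.
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


print(flesch_reading_ease_band(36.36))  # difficult
print(flesch_reading_ease_band(73.09))  # fairly easy
```

On this scale the high quality text is harder to read than the low quality text, which is a reminder that readability scores measure reading difficulty, not writing quality.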


In [4]:
# Calculating the Flesch-Kincaid Grade Level score for the high and low quality text
print('Results for Flesch-Kincaid Grade Level:')
high_quality_flesch_kincaid_grade_level_score = calculate_flesch_kincaid_grade_level(high_quality_text)
low_quality_flesch_kincaid_grade_level_score = calculate_flesch_kincaid_grade_level(low_quality_text)
print(f'High quality text: {high_quality_flesch_kincaid_grade_level_score}')
print(f'Low quality text: {low_quality_flesch_kincaid_grade_level_score}')

Results for Flesch-Kincaid Grade Level:
High quality text: [12.1]
Low quality text: [6.59]
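
For reference, the two Flesch scores can be sketched from raw counts using the commonly published formulas. This is an illustration under stated assumptions: the function names and the counts are hypothetical, and `whetstone` may count syllables, words, and sentences differently, so its output need not match this sketch exactly.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Published formula: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade_level(words, sentences, syllables):
    # Published formula: 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts (assumed, for illustration only).
words, sentences, syllables = 100, 5, 150

print(round(flesch_reading_ease(words, sentences, syllables), 2))
print(round(flesch_kincaid_grade_level(words, sentences, syllables), 2))
```

Both formulas use the same two ratios with opposite signs, so a text that scores lower on Reading Ease scores higher on Grade Level.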


In [5]:
# Calculating the Gunning Fog Index score for the high and low quality text
print('Results for Gunning Fog Index:')
high_quality_gunning_fog_index_score = calculate_gunning_fog_index(high_quality_text)
low_quality_gunning_fog_index_score = calculate_gunning_fog_index(low_quality_text)
print(f'High quality text: {high_quality_gunning_fog_index_score}')
print(f'Low quality text: {low_quality_gunning_fog_index_score}')

Results for Gunning Fog Index:
High quality text: 15.422296918767508
Low quality text: 8.670723684210527
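
The outputs in this notebook come in two shapes: most functions print a one-element list rounded to two decimals, while the Gunning Fog Index prints a bare, unrounded float. When comparing scores side by side, a small normalizing helper can be convenient. The sketch below is a hypothetical helper named `to_score`, not part of `whetstone`, and it assumes only the two return shapes seen in the printed results.

```python
from numbers import Real


def to_score(result, ndigits=2):
    """Return one rounded float from a bare number or a one-element list/tuple."""
    if isinstance(result, Real):
        return round(float(result), ndigits)
    if isinstance(result, (list, tuple)) and len(result) == 1:
        return round(float(result[0]), ndigits)
    raise ValueError(f"Expected a number or a one-element sequence, got {result!r}")


print(to_score([36.36]))             # 36.36
print(to_score(15.422296918767508))  # 15.42
print(to_score(8.670723684210527))   # 8.67
```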


In [6]:
# Calculating the Coleman-Liau Index score for the high and low quality text
print('Results for Coleman-Liau Index:')
high_quality_coleman_liau_index_score = calculate_coleman_liau_index(high_quality_text)
low_quality_coleman_liau_index_score = calculate_coleman_liau_index(low_quality_text)
print(f'High quality text: {high_quality_coleman_liau_index_score}')
print(f'Low quality text: {low_quality_coleman_liau_index_score}')

Results for Coleman-Liau Index:
High quality text: [14.03]
Low quality text: [6.63]


In [7]:
# Calculating the Automated Readability Index score for the high and low quality text
print('Results for Automated Readability Index:')
high_quality_automated_readability_index_score = calculate_automated_readability_index(high_quality_text)
low_quality_automated_readability_index_score = calculate_automated_readability_index(low_quality_text)
print(f'High quality text: {high_quality_automated_readability_index_score}')
print(f'Low quality text: {low_quality_automated_readability_index_score}')

Results for Automated Readability Index:
High quality text: [11.89]
Low quality text: [5.64]


In [8]:
# Calculating the SMOG Index score for the high and low quality text
print('Results for SMOG Index:')
high_quality_smog_index_score = calculate_smog_index(high_quality_text)
low_quality_smog_index_score = calculate_smog_index(low_quality_text)
print(f'High quality text: {high_quality_smog_index_score}')
print(f'Low quality text: {low_quality_smog_index_score}')

Results for SMOG Index:
High quality text: [13.97]
Low quality text: [9.48]


In [9]:
# Calculating the Dale-Chall Readability Score for the high and low quality text
print('Results for Dale-Chall Readability Score:')
high_quality_dale_chall_readability_score = calculate_dale_chall_readability_score(high_quality_text)
low_quality_dale_chall_readability_score = calculate_dale_chall_readability_score(low_quality_text)
print(f'High quality text: {high_quality_dale_chall_readability_score}')
print(f'Low quality text: {low_quality_dale_chall_readability_score}')

Results for Dale-Chall Readability Score:
High quality text: [3.71]
Low quality text: [1.41]


In [10]:
# Calculating the Spache Readability Formula score for the high and low quality text
print('Results for Spache Readability Formula:')
high_quality_spache_readability_formula_score = calculate_spache_readability_formula(high_quality_text)
low_quality_spache_readability_formula_score = calculate_spache_readability_formula(low_quality_text)
print(f'High quality text: {high_quality_spache_readability_formula_score}')
print(f'Low quality text: {low_quality_spache_readability_formula_score}')

Results for Spache Readability Formula:
High quality text: [1.41]
Low quality text: [1.28]


In [11]:
# Calculating the Linsear Write Formula score for the high and low quality text
print('Results for Linsear Write Formula:')
high_quality_linsear_write_formula_score = calculate_linsear_write_formula(high_quality_text)
low_quality_linsear_write_formula_score = calculate_linsear_write_formula(low_quality_text)
print(f'High quality text: {high_quality_linsear_write_formula_score}')
print(f'Low quality text: {low_quality_linsear_write_formula_score}')

Results for Linsear Write Formula:
High quality text: [29.73]
Low quality text: [28.12]


In [12]:
# Calculating the FORCAST Readability Formula score for the high and low quality text
print('Results for FORCAST Readability Formula:')
high_quality_forcast_readability_formula_score = calculate_forcast_readability_formula(high_quality_text)
low_quality_forcast_readability_formula_score = calculate_forcast_readability_formula(low_quality_text)
print(f'High quality text: {high_quality_forcast_readability_formula_score}')
print(f'Low quality text: {low_quality_forcast_readability_formula_score}')

Results for FORCAST Readability Formula:
High quality text: [17.13]
Low quality text: [17.94]


In [13]:
# Calculating the Raygor Readability Estimate score for the high and low quality text
print('Results for Raygor Readability Estimate:')
high_quality_raygor_readability_estimate_score = calculate_raygor_readability_estimate(high_quality_text)
low_quality_raygor_readability_estimate_score = calculate_raygor_readability_estimate(low_quality_text)
print(f'High quality text: {high_quality_raygor_readability_estimate_score}')
print(f'Low quality text: {low_quality_raygor_readability_estimate_score}')

Results for Raygor Readability Estimate:
High quality text: [7.27]
Low quality text: [6.42]


In [14]:
# Calculating the LIX Readability Score for the high and low quality text
print('Results for LIX Readability Score:')
high_quality_lix_readability_score = calculate_lix_readability_score(high_quality_text)
low_quality_lix_readability_score = calculate_lix_readability_score(low_quality_text)
print(f'High quality text: {high_quality_lix_readability_score}')
print(f'Low quality text: {low_quality_lix_readability_score}')

Results for LIX Readability Score:
High quality text: [51.16]
Low quality text: [29.51]


In [15]:
# Calculating the RIX Readability Score for the high and low quality text
print('Results for RIX Readability Score:')
high_quality_rix_readability_score = calculate_rix_readability_score(high_quality_text)
low_quality_rix_readability_score = calculate_rix_readability_score(low_quality_text)
print(f'High quality text: {high_quality_rix_readability_score}')
print(f'Low quality text: {low_quality_rix_readability_score}')

Results for RIX Readability Score:
High quality text: [5.6]
Low quality text: [2.18]


In [16]:
# Calculating the Strain Index score for the high and low quality text
print('Results for Strain Index:')
high_quality_strain_index_score = calculate_strain_index(high_quality_text)
low_quality_strain_index_score = calculate_strain_index(low_quality_text)
print(f'High quality text: {high_quality_strain_index_score}')
print(f'Low quality text: {low_quality_strain_index_score}')

Results for Strain Index:
High quality text: [560.0]
Low quality text: [217.65]


In [17]:
# Calculating the New Dale-Chall Readability Score for the high and low quality text
print('Results for New Dale-Chall Readability Score:')
high_quality_new_dale_chall_readability_score = calculate_new_dale_chall_readability_score(high_quality_text)
low_quality_new_dale_chall_readability_score = calculate_new_dale_chall_readability_score(low_quality_text)
print(f'High quality text: {high_quality_new_dale_chall_readability_score}')
print(f'Low quality text: {low_quality_new_dale_chall_readability_score}')

Results for New Dale-Chall Readability Score:
High quality text: [8.01]
Low quality text: [5.68]


In [18]:
# Calculating the Readability Consensus Grade for the high and low quality text
print('Results for Readability Consensus Grade:')
high_quality_readability_consensus_grade = calculate_readability_consensus_grade(high_quality_text)
low_quality_readability_consensus_grade = calculate_readability_consensus_grade(low_quality_text)
print(f'High quality text: {high_quality_readability_consensus_grade}')
print(f'Low quality text: {low_quality_readability_consensus_grade}')

Results for Readability Consensus Grade:
High quality text: [53.67]
Low quality text: [24.8]
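
The consensus values can be reproduced by averaging the fourteen scores named in the Readability Consensus Grade formula. The sketch below copies the scores printed earlier in this notebook; the dictionary names and the `consensus_grade` helper are assumptions for illustration and say nothing about how `whetstone` computes the value internally.

```python
# Scores printed earlier in this notebook for each text.
high_quality_scores = {
    "flesch_kincaid_grade_level": 12.1,
    "gunning_fog_index": 15.422296918767508,
    "coleman_liau_index": 14.03,
    "automated_readability_index": 11.89,
    "smog_index": 13.97,
    "dale_chall_readability_score": 3.71,
    "spache_readability_formula": 1.41,
    "new_dale_chall_readability_score": 8.01,
    "linsear_write_formula": 29.73,
    "forcast_readability_formula": 17.13,
    "raygor_readability_estimate": 7.27,
    "lix_readability_score": 51.16,
    "rix_readability_score": 5.6,
    "strain_index": 560.0,
}
low_quality_scores = {
    "flesch_kincaid_grade_level": 6.59,
    "gunning_fog_index": 8.670723684210527,
    "coleman_liau_index": 6.63,
    "automated_readability_index": 5.64,
    "smog_index": 9.48,
    "dale_chall_readability_score": 1.41,
    "spache_readability_formula": 1.28,
    "new_dale_chall_readability_score": 5.68,
    "linsear_write_formula": 28.12,
    "forcast_readability_formula": 17.94,
    "raygor_readability_estimate": 6.42,
    "lix_readability_score": 29.51,
    "rix_readability_score": 2.18,
    "strain_index": 217.65,
}


def consensus_grade(scores):
    """Arithmetic mean of the individual scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(consensus_grade(high_quality_scores))  # 53.67
print(consensus_grade(low_quality_scores))   # 24.8
```

Because the mean mixes grade-level estimates with much larger index values, the Strain Index alone contributes most of the total, so the consensus number is better read as a relative score than as a school grade.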


In [19]:
# Calculating all readability metrics for the high and low quality text
print('Results for all Readability Metrics:')
high_quality_all_readability_metrics = calculate_all_readability_metrics(high_quality_text)
low_quality_all_readability_metrics = calculate_all_readability_metrics(low_quality_text)
print(f'High quality text: {high_quality_all_readability_metrics}')
print(f'Low quality text: {low_quality_all_readability_metrics}')

Results for all Readability Metrics:
High quality text: [{'flesch_kincaid_reading_ease': 36.36, 'flesch_kincaid_grade_level': 12.1, 'gunning_fog_index': 15.422296918767508, 'coleman_liau_index': 14.03, 'automated_readability_index': 11.89, 'smog_index': 13.97, 'dale_chall_readability_score': 3.71, 'spache_readability_formula': 1.41, 'new_dale_chall_readability_score': 8.01, 'linsear_write_formula': 29.73, 'forcast_readability_formula': 17.13, 'raygor_readability_estimate': 7.27, 'lix_readability_score': 51.16, 'rix_readability_score': 5.6, 'strain_index': 560.0, 'readability_consensus_grade': 53.67}]
Low quality text: [{'flesch_kincaid_reading_ease': 73.09, 'flesch_kincaid_grade_level': 6.59, 'gunning_fog_index': 8.670723684210527, 'coleman_liau_index': 6.63, 'automated_readability_index': 5.64, 'smog_index': 9.48, 'dale_chall_readability_score': 1.41, 'spache_readability_formula': 1.28, 'new_dale_chall_readability_score': 5.68, 'linsear_write_formula': 28.12, 'forcast_readability_form