# Readability Metrics
This notebook gives an overview of readability metrics and shows how to compute them in Python. Readability metrics estimate how easy a text is to read, and they are used in fields such as education, linguistics, and natural language processing. The notebook covers the following topics:
- What are readability metrics?
- How to calculate readability metrics in Python
- How to interpret the results of readability metrics
- How to use readability metrics in practice

## What are readability metrics?
Readability metrics are quantitative measures of how easy or difficult a text is to read and understand. They are computed from surface features of the text, such as sentence length, word length, and vocabulary complexity, which act as proxies for linguistic and cognitive difficulty.
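
As a rough illustration of those surface features, the sketch below counts sentences and words with simple regular expressions and derives two averages. The function name `basic_text_features` and the tokenization rules are assumptions made for this example; real readability libraries usually use more careful sentence and word splitting.

```python
import re


def basic_text_features(text: str) -> dict:
    """Compute naive surface features of a text (illustrative only)."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Assumption: a word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    if not words:
        raise ValueError("text contains no words")
    # Count letters only, ignoring apostrophes inside words.
    letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences),
        "avg_letters_per_word": letters / len(words),
    }


sample = "The cat sat on the mat. It was warm."
features = basic_text_features(sample)
print(features)
```

Most of the formulas in this notebook combine a handful of counts like these with fixed coefficients.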

There are many different readability metrics, each with its own formula and interpretation. Some of the most commonly used readability metrics include:

- **Flesch-Kincaid Grade Level**: This metric estimates the grade level required to understand a text. The formula for the Flesch-Kincaid Grade Level is:

  $0.39 \times \text{average words per sentence} + 11.8 \times \text{average syllables per word} - 15.59$
  
- **Gunning Fog Index**: This metric estimates the years of formal education required to understand a text. The formula for the Gunning Fog Index is:

  $0.4 \times \left(\text{average words per sentence} + 100 \times \frac{\text{number of complex words}}{\text{number of words}}\right)$

  Complex words are those with three or more syllables.
  
- **Coleman-Liau Index**: This metric estimates the grade level required to understand a text. The formula for the Coleman-Liau Index is:

  $0.0588 \times \text{average letters per 100 words} - 0.296 \times \text{average sentences per 100 words} - 15.8$
  
- **Automated Readability Index (ARI)**: This metric estimates the grade level required to understand a text. The formula for the ARI is:

  $4.71 \times \text{average characters per word} + 0.5 \times \text{average words per sentence} - 21.43$
  
- **Simple Measure of Gobbledygook (SMOG)**: This metric estimates the years of formal education required to understand a text. The formula for the SMOG is:

  $1.043 \times \sqrt{\text{number of complex words} \times \frac{30}{\text{number of sentences}}} + 3.1291$
  
- **Dale-Chall Readability Score**: This metric estimates the grade level required to understand a text. The formula for the Dale-Chall Readability Score is:

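  $0.1579 \times \text{percentage of difficult words} + 0.0496 \times \text{average words per sentence}$

  Difficult words are those not on the Dale-Chall list of familiar words. If they make up more than 5% of the text, 3.6365 is added to the score.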
  
- **Spache Readability Formula**: This metric estimates the grade level required to understand a text. The formula for the Spache Readability Formula is:

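  $0.121 \times \text{average sentence length} + 0.082 \times \text{percentage of unfamiliar words} + 0.659$

  This is the revised form of the formula, in which unfamiliar words are those not on the Spache word list.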
  
- **Linsear Write Formula**: This metric estimates the grade level required to understand a text. The formula for the Linsear Write Formula is:

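  $r = \frac{\text{number of easy words} + 3 \times \text{number of hard words}}{\text{number of sentences}}$

  The counts are taken from a 100-word sample, where easy words have at most two syllables and hard words have three or more. The grade level is $r / 2$ if $r > 20$ and $r / 2 - 1$ otherwise.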
  
- **FORCAST Readability Formula**: This metric estimates the grade level required to understand a text. The formula for the FORCAST Readability Formula is:

  $20 - \frac{\text{number of single-syllable words in a 150-word sample}}{10}$
  
- **Raygor Readability Estimate**: This metric estimates the grade level required to understand a text. Unlike the other metrics in this list, it has no closed-form formula: the average number of sentences and the average number of long words (six or more letters) per 100 words are plotted on the Raygor graph, and the grade level is read off the region where the point falls.
  
- **LIX Readability Formula**: This metric produces a difficulty score rather than a grade level, with higher scores indicating harder text. Long words are words with more than six letters. The formula for the LIX Readability Formula is:

  $\frac{\text{number of words}}{\text{number of sentences}} + \frac{\text{number of long words} \times 100}{\text{number of words}}$
  
- **RIX Readability Formula**: This metric is a simplification of LIX that produces a difficulty score, which can then be mapped to a grade level. Long words are words with more than six letters. The formula for the RIX Readability Formula is:

  $\frac{\text{number of long words}}{\text{number of sentences}}$
  
- **Strain Index**: This metric estimates the grade level required to understand a text. The formula for the Strain Index is:

  $\frac{\text{number of syllables in three sentences}}{10}$
  
- **Readability Consensus Grade**: This metric estimates the grade level required to understand a text. The formula for the Readability Consensus Grade is:

  $\frac{\text{Flesch-Kincaid Grade Level} + \text{Gunning Fog Index} + \text{Coleman-Liau Index} + \text{Automated Readability Index} + \text{SMOG} + \text{Dale-Chall Readability Score} + \text{Spache Readability Formula} + \text{New Dale-Chall Readability Score} + \text{Linsear Write Formula} + \text{FORCAST Readability Formula} + \text{Raygor Readability Estimate} + \text{LIX Readability Formula} + \text{RIX Readability Formula} + \text{Strain Index}}{14}$
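
To make the count-based formulas above concrete, here is a minimal sketch that implements several of them as plain functions of pre-computed counts. The function names and the example counts are hypothetical and chosen only for illustration; how syllables, complex words, and long words are counted is left to the caller, and different tools count them differently.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    # complex_words is a count; 100 * count / words gives the percentage.
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def coleman_liau(letters, words, sentences):
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def automated_readability_index(characters, words, sentences):
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def smog(complex_words, sentences):
    return 1.043 * math.sqrt(complex_words * 30 / sentences) + 3.1291


def lix(words, sentences, long_words):
    return words / sentences + 100 * long_words / words


# Hypothetical counts for a 100-word, 5-sentence passage.
counts = dict(words=100, sentences=5, syllables=150, letters=450,
              complex_words=10, long_words=20)

print("Flesch-Kincaid Grade Level:", round(flesch_kincaid_grade(100, 5, 150), 2))
print("Gunning Fog Index:", round(gunning_fog(100, 5, 10), 2))
print("Coleman-Liau Index:", round(coleman_liau(450, 100, 5), 2))
print("ARI:", round(automated_readability_index(450, 100, 5), 2))
print("SMOG:", round(smog(10, 5), 2))
print("LIX:", round(lix(100, 5, 20), 2))
```

Even on identical counts the metrics disagree by a grade or two, which is one reason to compare several of them rather than rely on a single score.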

## Notebook Setup

In [1]:
# Import the two readability functions used in this notebook
from whetstone.metrics.text.readability_metrics import (
    calculate_flesch_kincaid_grade_level,
    calculate_flesch_kincaid_reading_ease,
)

In [2]:
# Two sample texts on the same topic: one polished, one deliberately sloppy
high_quality_text = '''Creating a strong password serves as your first line of defense in the digital world, where cyber threats constantly evolve and become more sophisticated. While it might be tempting to use simple, memorable combinations like birthdays or pet names, such choices leave your sensitive information vulnerable to malicious actors. A robust password acts as a virtual fortress, protecting everything from financial data to personal communications.
Security experts recommend using passwords that combine uppercase and lowercase letters, numbers, and special characters, creating a complex sequence that’s significantly harder to breach. The length of your password matters considerably; each additional character exponentially increases the time required for potential hackers to crack it through brute-force methods. Furthermore, using unique passwords for different accounts prevents a single security breach from compromising multiple services.
Consider your password as a digital key to your personal and professional life. Just as you wouldn’t use a flimsy lock to protect valuable physical possessions, you shouldn’t rely on weak passwords to secure your digital assets. With cybercrime causing billions in damages annually, the small effort required to create and maintain strong passwords pales in comparison to the potential consequences of a security breach.'''

low_quality_text = '''so like passwords are super important for your cyber-security and stuff because theres hackers everywhere these days trying to get into your accounts which reminds me of this one time my cousin got hacked and it was really bad but anyway you gotta make sure your using good passwords but not like birthday’s or your dogs name (I have a really cute dog named Max) because those are really easy to crack with modern encryption algorithms and stuff but some people still use them which is crazy lol.
the IT security experts and computer nerds say your supposed to use capitals and lowercase and numbers and those weird symbols above the numbers on your keyboard that nobody ever uses except in PassWords and coding which is really complicated and i tried to learn it once but gave up anyway you should make your password really long because that makes it more secure and stuff and dont use the same password for everything because if a Hacker gets one password theyll get everything which would be terrible.
its kinda like having a key to your house except its for the internet and cyber space and the world wide web which are all different things i think but im not sure exactly how but anyway you should definately make good passwords because otherwise hackers will steal your identity and money and stuff and noone wants that to happen to them because its really annoying and bad.'''

## Calculating Each Readability Metric


### Flesch Reading Ease
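This metric scores readability on a scale that nominally runs from 0 to 100, with higher scores indicating easier text. The scale is not clamped: very long sentences or words can push the score below 0, and very simple text can score above 100. The formula for the Flesch Reading Ease score is: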

  $206.835 - 1.015 \times \text{average words per sentence} - 84.6 \times \text{average syllables per word}$

This score may be interpreted using the table below:

| Score  | Reading Level       | Description           |
|--------|---------------------|-----------------------|
| 90-100 | 5th grade           | Very easy to read     |
| 80-89  | 6th grade           | Easy to read          |
| 70-79  | 7th grade           | Fairly easy           |
| 60-69  | 8th-9th grade       | Plain English         |
| 50-59  | 10th-12th grade     | Fairly difficult      |
| 30-49  | College             | Difficult             |
| 0-29   | College graduate    | Very difficult        |

When writing for a general audience, aim for a score of 60 or higher, which indicates that most adults can understand the text easily. Flesch Reading Ease is widely used in education, often to evaluate the readability of textbooks and other teaching materials.
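
The table above can be turned into a small lookup helper. The sketch below is one way to do it; the function name `flesch_reading_ease_band` is hypothetical, and treating scores below 0 or above 100 as belonging to the nearest band is an assumption made here, since the table does not cover them.

```python
def flesch_reading_ease_band(score: float) -> tuple[str, str]:
    """Map a Flesch Reading Ease score to (reading level, description)."""
    # (lower bound, reading level, description), checked from easiest to hardest.
    bands = [
        (90, "5th grade", "Very easy to read"),
        (80, "6th grade", "Easy to read"),
        (70, "7th grade", "Fairly easy"),
        (60, "8th-9th grade", "Plain English"),
        (50, "10th-12th grade", "Fairly difficult"),
        (30, "College", "Difficult"),
    ]
    for lower_bound, level, description in bands:
        if score >= lower_bound:
            return level, description
    # Anything below 30, including negative scores, falls in the hardest band.
    return "College graduate", "Very difficult"


for example_score in (95, 65, 17.5, -2.27):
    print(example_score, flesch_reading_ease_band(example_score))
```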

In [6]:
text = "I’m looking to do a sentence tokenization using Python; however, my company has banned us from downloading models, which prevents things like nltk or spacy that requires a model to be downloaded to perform its sentence tokenization. What do you think I should do?"

# Score both sample texts with the Reading Ease and Grade Level functions
print('Results for Flesch-Kincaid Reading Ease:')
high_quality_flesch_kincaid_reading_ease_score = calculate_flesch_kincaid_reading_ease(high_quality_text)
low_quality_flesch_kincaid_reading_ease_score = calculate_flesch_kincaid_reading_ease(low_quality_text)
print(f'High quality text: {high_quality_flesch_kincaid_reading_ease_score}')
print(f'Low quality text: {low_quality_flesch_kincaid_reading_ease_score}')

print('\nResults for Flesch-Kincaid Grade Level:')
high_quality_flesch_kincaid_grade_level_score = calculate_flesch_kincaid_grade_level(high_quality_text)
low_quality_flesch_kincaid_grade_level_score = calculate_flesch_kincaid_grade_level(low_quality_text)
print(f'High quality text: {high_quality_flesch_kincaid_grade_level_score}')
print(f'Low quality text: {low_quality_flesch_kincaid_grade_level_score}')


Results for Flesch-Kincaid Reading Ease:
High quality text: 17.5
Low quality text: -2.27

Results for Flesch-Kincaid Grade Level:
High quality text: 16.17
Low quality text: 33.78
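
The extreme scores for the low-quality text are probably driven by sentence length: it contains only three sentence-ending periods, so each "sentence" runs to roughly 80 words. The sketch below applies the two Flesch formulas directly to a few sentence lengths to show the effect. The helper names and the fixed value of 1.4 syllables per word are hypothetical, and this is an inference from the formulas rather than a check against the `whetstone` implementation.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade_level(words_per_sentence, syllables_per_word):
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical value: hold word difficulty fixed and vary only sentence length.
SYLLABLES_PER_WORD = 1.4

for words_per_sentence in (15, 30, 85):
    ease = flesch_reading_ease(words_per_sentence, SYLLABLES_PER_WORD)
    grade = flesch_kincaid_grade_level(words_per_sentence, SYLLABLES_PER_WORD)
    print(f"{words_per_sentence:>3} words/sentence: "
          f"reading ease {ease:6.2f}, grade level {grade:5.2f}")
```

With everything else held constant, going from 15 to 85 words per sentence moves the text from "fairly easy" to a score near zero and a grade level above 30, which is the same pattern seen in the results above. Both formulas reward short sentences, so missing punctuation is penalized heavily even when the vocabulary is simple.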
