# Readability Scores

There are many readability scores available. Some of them are as follows:
<ol>
    <li>Flesch-Kincaid: This score estimates the grade level needed to understand a piece of text. It combines average sentence length (words per sentence) with word complexity (syllables per word). It is useful for educational materials or when targeting a specific reading level.</li>
    <li>Coleman-Liau: This score also produces a grade level, but it measures word length in characters (letters per 100 words) instead of syllables, together with the number of sentences per 100 words. It can be useful for determining the reading level of technical or scientific writing.</li>
    <li>Dale-Chall: This score is useful for identifying difficult words that may challenge readers. It compares the text against a list of about 3,000 familiar words and combines the percentage of unfamiliar words with average sentence length. It is useful when creating materials for readers with a limited vocabulary.</li>
    <li>SMOG: Like Flesch-Kincaid, this score estimates a grade level, but it is driven by the number of polysyllabic words (three or more syllables) in a sample of 30 sentences. It can be useful for determining the reading level of technical or scientific writing.</li>
    <li>Automated Readability Index: This grade-level score is useful for general writing. It takes into account the number of characters per word and the number of words per sentence.</li>
    <li>Flesch Reading Ease: This score is useful for assessing general writing. It typically falls between 0 and 100, with higher scores indicating easier text.</li>
    <li>The Gunning fog formula: This grade-level score is useful for general writing. It takes into account average sentence length and the percentage of complex words (words of three or more syllables).</li>
    <li>Fry readability graph: This method is useful for determining the reading level of children's books. Instead of a single formula, it plots the average number of sentences and syllables per 100 words on a graph, and the region where the point falls gives the grade level.</li>
    <li>The FORCAST formula: This score is useful for determining the reading level of technical material. Unlike the other formulas, it ignores sentence length and uses only the number of single-syllable words in a 150-word sample.</li>
</ol>
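The grade-level formulas in this list differ mainly in which counts they combine. Below is a minimal sketch of several of them, assuming the counts (words, sentences, syllables, characters, complex words) have already been computed. The coefficients are the commonly published ones; the function names are hypothetical. Dale-Chall and Fry are left out because they need a word list and a graph, respectively.

```python
import math

# Sketches of several readability formulas. Each takes pre-computed counts;
# how words, sentences and syllables are counted (tokenizer, syllable
# heuristic) is a separate choice that noticeably changes the result.


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words, sentences, syllables):
    # Higher is easier; usually between 0 and 100
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(characters, words, sentences):
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(letters, words, sentences):
    L = letters / words * 100    # letters per 100 words
    S = sentences / words * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8


def gunning_fog(words, sentences, complex_words):
    # complex_words: words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_grade(polysyllables, sentences):
    # Scales the polysyllable count to a 30-sentence sample
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def forcast_grade(single_syllable_words, words):
    # Scales the single-syllable count to a 150-word sample
    n = single_syllable_words * 150 / words
    return 20 - n / 10


# Usage with hypothetical counts
print(round(flesch_kincaid_grade(words=100, sentences=5, syllables=150), 2))  # 9.91
```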

Based on these descriptions, the following general-purpose grade-level scores can be used to check the readability of a technical document such as a Terms of Service:
<ul>
<li>Flesch-Kincaid</li>
<li>Automated Readability Index</li>
<li>The Gunning fog formula</li>
</ul>

We use the Flesch-Kincaid grade-level score.

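General steps to apply the Flesch-Kincaid formula:
<ul>
    <li>Count the words, sentences, and syllables in the text (for long texts, several 100-word samples taken from throughout the text can be used instead).</li>
    <li>Compute the average sentence length: the number of words divided by the number of sentences.</li>
    <li>Compute the average number of syllables per word: the number of syllables divided by the number of words.</li>
    <li>Combine them as 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59 to get the grade level.</li>
</ul>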
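The steps above can be sketched without any libraries. This is a minimal sketch under stated assumptions: the regex tokenizer, the sentence splitter and the vowel-group syllable heuristic are rough approximations, and `count_syllables`, `text_counts` and `flesch_kincaid_simple` are hypothetical helper names. Its scores will not match the spaCy/textstat pipeline used in the cells below.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption): one syllable per group of vowels
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("close"), but keep "-le" endings ("simple")
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text):
    # Step 1: count words, sentences and syllables
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_kincaid_simple(text):
    n_words, n_sentences, n_syllables = text_counts(text)
    avg_sentence_length = n_words / n_sentences     # step 2
    avg_syllables_per_word = n_syllables / n_words  # step 3
    # Step 4: combine into the grade level
    return 0.39 * avg_sentence_length + 11.8 * avg_syllables_per_word - 15.59


sample = "The cat sat on the mat. It was happy."
print(text_counts(sample))  # (9, 2, 10)
print(round(flesch_kincaid_simple(sample), 2))  # -0.72
```

Very simple text can produce a negative grade level, since the formula has no lower bound.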

In [1]:
!pip install textstat

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting textstat
  Downloading textstat-0.7.3-py3-none-any.whl (105 kB)
Collecting pyphen
  Downloading pyphen-0.14.0-py3-none-any.whl (2.0 MB)
Installing collected packages: pyphen, textstat
Successfully installed pyphen-0.14.0 textstat-0.7.3


In [2]:
import spacy
from textstat.textstat import textstatistics
from textstat import textstat
import pandas as pd

In [3]:
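# Load the spaCy model once; loading it inside the function reloads it on
# every call, which is very slow when applied row by row to a DataFrame.
nlp = spacy.load('en_core_web_sm')

def break_sentences(text):
    """Split text into a list of spaCy sentence spans."""
    doc = nlp(text)
    return list(doc.sents)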

In [4]:
def word_count(text):
    # Counts every spaCy token, so punctuation marks are counted as words too.
    sentences = break_sentences(text)
    return sum(len(sentence) for sentence in sentences)
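`word_count` counts every spaCy token, so sentence-ending periods and other punctuation are counted as words. The sketch below uses hypothetical counts for a two-sentence text to show how much that can shift the grade level (`fk_grade` is an illustrative helper, not part of the notebook). To exclude punctuation, one option is to count only tokens whose spaCy attribute `token.is_punct` is `False`.

```python
def fk_grade(words, sentences, syllables):
    # Flesch-Kincaid grade level from raw counts
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts: 24 real words, 2 sentences, 40 syllables,
# plus 2 sentence-ending periods that a token count would add as "words".
words_only = fk_grade(words=24, sentences=2, syllables=40)
with_punct = fk_grade(words=24 + 2, sentences=2, syllables=40)
print(round(words_only, 2), round(with_punct, 2))  # 8.76 7.63
```

Here counting punctuation lowers the grade, because the extra zero-syllable tokens dilute syllables per word more than they lengthen sentences.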

In [5]:
def sentence_count(text):
    sentences = break_sentences(text)
    return len(sentences)

In [6]:
def syllables_count(text):
    # textstat sums the syllables of every word in the string
    return textstatistics().syllable_count(text)

In [7]:
def flesch_kincaid(text):
    """Flesch-Kincaid grade level of `text`."""
    num_words = word_count(text)
    num_sentences = sentence_count(text)
    num_syllables = syllables_count(text)
    score = 0.39 * (num_words / num_sentences) + 11.8 * (num_syllables / num_words) - 15.59
    return score
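The function above implements the Flesch-Kincaid grade-level formula. Here $W$ is the word count (in this notebook, the spaCy token count, which includes punctuation), $S$ the number of spaCy sentences, and $Y$ the textstat syllable count:

$$
\text{FKGL} = 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59
$$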

In [8]:
text1 = "account termination policy youtube will terminate a user s access to the service if under appropriate circumstances the user is determined to be a repeat infringer. youtube reserves the right to decide whether content violates these terms of service for reasons other than copyright infringement such as but not limited to pornography obscenity or excessive length. youtube may at any time without prior notice and in its sole discretion remove such content and or terminate a user s account for submitting such material in violation of these terms of service."
text2 = "if you infringe copyright multiple times we close your account. if you are in violation of our community guidelines we may do that immediately."
score_text1 = flesch_kincaid(text1)
score_text2 = flesch_kincaid(text2)
print(score_text1, score_text2)

15.786021505376347 7.633846153846157


In [21]:
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/billsum_summarization.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,original text,generated_summary
0,0,welcome to the pokémon go video game services ...,you can delete your account from this service ...
1,1,by using our services you are agreeing to thes...,you agree to the terms and the privacy policy ...
2,2,if you want to use certain features of the ser...,you can create an account with the service if ...
3,3,during game play please be aware of your surro...,the service reserves the right to hold you lia...
4,4,subject to your compliance with these terms ni...,niantic grants you a limited nonexclusive nont...


In [22]:
to_drop = ['Unnamed: 0']
df.drop(to_drop, axis=1, inplace=True)
df.head()

Unnamed: 0,original text,generated_summary
0,welcome to the pokémon go video game services ...,you can delete your account from this service ...
1,by using our services you are agreeing to thes...,you agree to the terms and the privacy policy ...
2,if you want to use certain features of the ser...,you can create an account with the service if ...
3,during game play please be aware of your surro...,the service reserves the right to hold you lia...
4,subject to your compliance with these terms ni...,niantic grants you a limited nonexclusive nont...
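The CSV loaded above carries leftover index columns (`Unnamed: 0.1` and `Unnamed: 0`), which typically appear when a DataFrame is saved with `to_csv` without `index=False`, and the cell above drops only one of them. A minimal sketch that removes every such column in one step, shown on a small hypothetical CSV (`csv_text` and `frame` are illustrative names):

```python
import io

import pandas as pd

# Hypothetical CSV with two leftover "Unnamed" index columns
csv_text = (
    "Unnamed: 0.1,Unnamed: 0,original text,generated_summary\n"
    "0,0,some original text,some summary\n"
)
frame = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns whose name does not start with "Unnamed"
frame = frame.loc[:, ~frame.columns.str.startswith("Unnamed")]
print(list(frame.columns))  # ['original text', 'generated_summary']
```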


In [23]:
df['Original Text Score'] = df['original text'].apply(flesch_kincaid)
df['Generated Text Score'] = df['generated_summary'].apply(flesch_kincaid)
# df['Reference Summary Score'] = df['reference_summary'].apply(flesch_kincaid)

In [24]:
df.head()

Unnamed: 0,original text,generated_summary,Original Text Score,Generated Text Score
0,welcome to the pokémon go video game services ...,you can delete your account from this service ...,12.749048,9.264
1,by using our services you are agreeing to thes...,you agree to the terms and the privacy policy ...,11.404545,19.570588
2,if you want to use certain features of the ser...,you can create an account with the service if ...,15.043356,10.329906
3,during game play please be aware of your surro...,the service reserves the right to hold you lia...,24.990303,10.891852
4,subject to your compliance with these terms ni...,niantic grants you a limited nonexclusive nont...,20.626128,18.285426


In [56]:
min_val = round(df['Original Text Score'].min(), 2)
max_val = round(df['Original Text Score'].max(), 2)
mean_val = round(df['Original Text Score'].mean(), 2)
median_val = round(df['Original Text Score'].median(), 2)

print('For Original Text')
print('Minimum Readability Score', min_val)
print('Maximum Readability Score', max_val)
print('Mean for Readability Score', mean_val)
print('Median for Readability Score', median_val)


For Original Text
Minimum Readability Score 4.0
Maximum Readability Score 30.0
Mean for Readability Score 14.1
Median for Readability Score 12.9


In [57]:
min_val = round(df['Generated Text Score'].min(), 2)
max_val = round(df['Generated Text Score'].max(), 2)
mean_val = round(df['Generated Text Score'].mean(), 2)
median_val = round(df['Generated Text Score'].median(), 2)

print('For Generated Text')
print('Minimum Readability Score', min_val)
print('Maximum Readability Score', max_val)
print('Mean for Readability Score', mean_val)
print('Median for Readability Score', median_val)



For Generated Text
Minimum Readability Score 4.17
Maximum Readability Score 30.0
Mean for Readability Score 15.49
Median for Readability Score 13.22


In [58]:
original_max = round(df['Original Text Score'].max(), 2)
generated_max = round(df['Generated Text Score'].max(), 2)

In [59]:
print("Maximum Readability Score of Original Text: ", original_max)
print("Maximum Readability Score of Generated Text: ", generated_max)

Maximum Readability Score of Original Text:  69.52
Maximum Readability Score of Generated Text:  30.81


In [None]:
# 'Reference Summary Score' exists only if the commented-out line in In [23]
# is enabled and the CSV has a 'reference_summary' column.
if 'Reference Summary Score' in df.columns:
    min_val = round(df['Reference Summary Score'].min(), 2)
    max_val = round(df['Reference Summary Score'].max(), 2)
    mean_val = round(df['Reference Summary Score'].mean(), 2)
    median_val = round(df['Reference Summary Score'].median(), 2)
    print('For Reference Summary')
    print('Minimum Readability Score', min_val)
    print('Maximum Readability Score', max_val)
    print('Mean for Readability Score', mean_val)
    print('Median for Readability Score', median_val)
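The summary cells repeat the same four statistics for each score column. Below is a minimal sketch of a helper that computes them with pandas `Series.agg` and skips score columns that were never created (such as `Reference Summary Score`). `describe_scores` and the demo data are hypothetical, for illustration only; in the notebook, `df` would take the place of `demo`.

```python
import pandas as pd


def describe_scores(scores: pd.Series, label: str) -> pd.Series:
    """Return (and print) the min, max, mean and median of a score column."""
    summary = scores.agg(["min", "max", "mean", "median"]).round(2)
    print(f"For {label}")
    print(summary.to_string())
    return summary


# Hypothetical demo data standing in for the notebook's DataFrame
demo = pd.DataFrame({
    "Original Text Score": [4.0, 10.0, 13.0],
    "Generated Text Score": [5.0, 7.0, 9.0],
})

results = {}
for column in ["Original Text Score", "Generated Text Score", "Reference Summary Score"]:
    if column in demo.columns:  # skip score columns that were never computed
        results[column] = describe_scores(demo[column], column)
```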

