# Readability Scores

There are many readability scores available. Some of them are as follows:
<ol>
    <li>Flesch-Kincaid: This score estimates the US school grade level needed to understand a piece of text. It takes into account sentence length (words per sentence) and word complexity (syllables per word). It can be useful for educational materials or when targeting a specific reading level.</li>
    <li>Coleman-Liau: This score also provides a grade-level score, but it relies on the number of letters per word instead of syllables, together with the number of sentences per 100 words. It can be useful for determining the reading level of technical or scientific writing.</li>
    <li>Dale-Chall: This score is useful for identifying difficult words that may be challenging for readers. It compares the words in the text against a list of about 3,000 familiar words and combines the percentage of unfamiliar words with the average sentence length. This score is useful when creating materials for readers with limited vocabulary.</li>
    <li>SMOG: Like Flesch-Kincaid, this score provides a grade level, but it is based on the number of polysyllabic words (three or more syllables) in a sample of sentences, which gives longer words greater weight. It can be useful for determining the reading level of technical or scientific writing.</li>
    <li>Automated Readability Index: This score is useful for determining the reading level of general writing. It takes into account the number of characters per word and the number of words per sentence.</li>
    <li>Flesch Reading Ease: This score is useful for determining the readability of general writing. It normally falls between 0 and 100, with higher scores indicating easier readability.</li>
    <li>The Gunning fog formula: This score is useful for determining the reading level of general writing. It takes into account sentence length and the percentage of complex words (words with three or more syllables).</li>
    <li>Fry readability graph: This method is useful for determining the reading level of children's books. It plots the average number of syllables and of sentences per 100 words on a graph, from which the grade level is read off.</li>
    <li>The FORCAST formula: This score is useful for determining the reading level of technical or scientific writing. Unlike the formulas above, it ignores sentence length and counts only the single-syllable words in a 150-word sample.</li>
</ol>
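
Most of the formulas above reduce to a handful of counts taken from the text. As a minimal sketch, assuming those counts are already available, the commonly published forms of five of them look like this. `readability_scores` and its parameter names are hypothetical, and libraries such as textstat may use different constants, counting rules, or adjustments. Dale-Chall, Fry and FORCAST are left out because they need a word list, a graph, or a fixed-size sample.

```python
import math


def readability_scores(words, sentences, syllables, letters, complex_words):
    """Commonly published forms of several readability formulas.

    All arguments are raw counts for one text (hypothetical helper):
    letters       -- letters in all words (reused as ARI's character count)
    complex_words -- words with three or more syllables
    """
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        # Flesch Reading Ease: roughly 0-100, higher means easier
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # The remaining scores are grade levels: higher means harder
        "automated_readability_index": 4.71 * (letters / words) + 0.5 * words_per_sentence - 21.43,
        # Coleman-Liau uses letters and sentences per 100 words
        "coleman_liau": 0.0588 * (100 * letters / words) - 0.296 * (100 * sentences / words) - 15.8,
        # Gunning fog: sentence length plus percentage of complex words
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / words),
        # SMOG: polysyllable count normalized to a 30-sentence sample
        "smog": 1.0430 * math.sqrt(complex_words * 30 / sentences) + 3.1291,
    }


# Illustrative counts, not taken from the dataset
scores = readability_scores(words=100, sentences=5, syllables=150, letters=450, complex_words=10)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Using one `complex_words` count for both Gunning fog and SMOG is a simplification: Gunning's original definition excludes some three-syllable words, such as proper nouns.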

Based on the descriptions above, we shortlist the following scores for checking the readability of a technical document such as a Terms of Service:
<ul>
    <li>Flesch-Kincaid</li>
    <li>Automated Readability Index</li>
    <li>The Gunning fog formula</li>
</ul>

We use the Flesch-Kincaid readability score (grade level).

General steps to apply the formula:
<ul>
    <li>Select several 100-word samples throughout the text.</li>
    <li>Compute the average sentence length in words (divide the number of words by the number of sentences).</li>
    <li>Compute the average number of syllables per word (divide the number of syllables by the number of words).</li>
    <li>Substitute both averages into the Flesch-Kincaid grade-level equation.</li>
</ul>
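
Written out, the Flesch-Kincaid grade level computed by `flesch_kincaid` in this notebook is the first equation below, with $W$ the number of words, $S$ the number of sentences and $Y$ the number of syllables. As a worked check, the reference summary `hi.` scores −8.91 in the results table; that value is reproduced if spaCy's trailing period is counted as a word ($W = 2$, $S = 1$, $Y = 1$), which is what `word_count` does:

$$
\text{FK} = 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59
$$

$$
\text{FK}_{\text{hi.}} = 0.39 \cdot \frac{2}{1} + 11.8 \cdot \frac{1}{2} - 15.59 = 0.78 + 5.90 - 15.59 = -8.91
$$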

```python
import spacy
from textstat.textstat import textstatistics
import pandas as pd
```

```python
# Load the model once, not on every call (reloading per text is very slow)
nlp = spacy.load('en_core_web_sm')

def break_sentences(text):
    doc = nlp(text)
    return list(doc.sents)
```

```python
def word_count(text):
    # Note: every spaCy token is counted, including punctuation,
    # so "hi." counts as two words
    sentences = break_sentences(text)
    return sum(len(sentence) for sentence in sentences)
```
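
Because `word_count` counts every spaCy token, punctuation marks are counted as words, which lowers the syllables-per-word ratio and raises the words-per-sentence ratio. The sketch below approximates tokenization with a regular expression to show the size of the difference; this is an assumption (spaCy's tokenizer is more sophisticated), and `count_tokens` and `count_words` are hypothetical helpers.

```python
import re


def count_tokens(text):
    """Approximate spaCy-style tokens: runs of word characters or single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def count_words(text):
    """Count only runs of word characters, ignoring punctuation."""
    return len(re.findall(r"\w+", text))


sample = "if you infringe copyright multiple times we close your account."
print(count_tokens(sample), count_words(sample))
```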

```python
def sentence_count(text):
    sentences = break_sentences(text)
    return len(sentences)
```
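
`sentence_count` relies on spaCy's sentence boundaries. For lowercased text like this dataset, a rough dependency-free alternative is to split after `.`, `!` or `?` when followed by whitespace. This regular-expression splitter is a swapped-in heuristic, not what the notebook uses, and it will miscount abbreviations such as `e.g.`; `split_sentences` is a hypothetical helper.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace (heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the text is empty
    return [part for part in parts if part]


summary = ("if you infringe copyright multiple times we close your account. "
           "if you are in violation of our community guidelines we may do that immediately.")
print(len(split_sentences(summary)))
```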

```python
def syllables_count(text):
    # textstat counts syllables across the whole text, not just one word
    return textstatistics().syllable_count(text)
```
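
`syllables_count` delegates to textstat, which has its own, more careful syllable counting. To make the idea concrete, the sketch below uses a common rough heuristic instead: count groups of consecutive vowels, discount a silent final `e`, and never return less than one. `estimate_syllables` and `estimate_text_syllables` are hypothetical helpers and will disagree with textstat on some words.

```python
import re


def estimate_syllables(word):
    """Rough syllable estimate for a single English word (heuristic)."""
    word = word.lower()
    # Each run of consecutive vowels (including y) is treated as one syllable
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent "e" (as in "service") usually adds no syllable,
    # but "-le" endings (as in "table") usually do
    if word.endswith("e") and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def estimate_text_syllables(text):
    """Sum the estimate over every ASCII-letter word in the text."""
    return sum(estimate_syllables(w) for w in re.findall(r"[a-z]+", text.lower()))


print(estimate_text_syllables("account termination policy youtube"))
```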

```python
def flesch_kincaid(text):
    """Flesch-Kincaid grade level of `text`."""
    num_words = word_count(text)
    num_sentences = sentence_count(text)
    num_syllables = syllables_count(text)
    # 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    score = 0.39 * (num_words / num_sentences) + 11.8 * (num_syllables / num_words) - 15.59
    return score
```

```python
text1 = "account termination policy youtube will terminate a user s access to the service if under appropriate circumstances the user is determined to be a repeat infringer. youtube reserves the right to decide whether content violates these terms of service for reasons other than copyright infringement such as but not limited to pornography obscenity or excessive length. youtube may at any time without prior notice and in its sole discretion remove such content and or terminate a user s account for submitting such material in violation of these terms of service."
text2 = "if you infringe copyright multiple times we close your account. if you are in violation of our community guidelines we may do that immediately."
score_text1 = flesch_kincaid(text1)
score_text2 = flesch_kincaid(text2)
print(score_text1, score_text2)
```

```
15.786021505376347 7.633846153846157
```


```python
df = pd.read_csv('/content/all_v1_transpose.csv')
df.head()
```

| Unnamed: 0 | doc | id | original_text | reference_summary | title | uid | case_code | case_text | note | title_code | title_text | urls | tldr_code | tldr_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pokemon GO Terms of Service | 5786730a6cca83a54c0035b7 | welcome to the pokémon go video game services ... | hi. |  | legalsum01 |  |  |  |  |  |  |  |  |
| 1 | Pokemon GO Terms of Service | 57866df76cca83a54c0035a1 | by using our services you are agreeing to thes... | by playing this game you agree to these terms.... | Agreement To Terms | legalsum02 |  |  |  |  |  |  |  |  |
| 2 | Pokemon GO Terms of Service | 5786730a6cca83a54c0035b6 | if you want to use certain features of the ser... | you have to use google pokemon trainer club or... | Eligibility and Account Registration | legalsum03 |  |  |  |  |  |  |  |  |
| 3 | Pokemon GO Terms of Service | 57866df76cca83a54c0035a0 | during game play please be aware of your surro... | don t die or hurt others and if you do it s no... | Safe Play | legalsum04 |  |  |  |  |  |  |  |  |
| 4 | Pokemon GO Terms of Service | 57866df76cca83a54c00359f | subject to your compliance with these terms ni... | don t copy modify resell distribute or reverse... | Rights in App | legalsum05 |  |  |  |  |  |  |  |  |


```python
# Keep only the text and its reference summary
to_drop = ['id', 'doc', 'title', 'uid', 'case_code', 'case_text', 'note',
           'title_code', 'title_text', 'urls', 'tldr_code', 'tldr_text']
df = df.drop(columns=to_drop)
df.head()
```

| Unnamed: 0 | original_text | reference_summary |
|---|---|---|
| 0 | welcome to the pokémon go video game services ... | hi. |
| 1 | by using our services you are agreeing to thes... | by playing this game you agree to these terms.... |
| 2 | if you want to use certain features of the ser... | you have to use google pokemon trainer club or... |
| 3 | during game play please be aware of your surro... | don t die or hurt others and if you do it s no... |
| 4 | subject to your compliance with these terms ni... | don t copy modify resell distribute or reverse... |


```python
df['Original Text Score'] = df['original_text'].apply(flesch_kincaid)
df['Reference Summary Score'] = df['reference_summary'].apply(flesch_kincaid)
```

```python
df.head()
```

| Unnamed: 0 | original_text | reference_summary | Original Text Score | Reference Summary Score |
|---|---|---|---|---|
| 0 | welcome to the pokémon go video game services ... | hi. | 12.749048 | -8.91 |
| 1 | by using our services you are agreeing to thes... | by playing this game you agree to these terms.... | 11.404545 | 2.501 |
| 2 | if you want to use certain features of the ser... | you have to use google pokemon trainer club or... | 15.043356 | 4.198571 |
| 3 | during game play please be aware of your surro... | don t die or hurt others and if you do it s no... | 24.990303 | 2.45 |
| 4 | subject to your compliance with these terms ni... | don t copy modify resell distribute or reverse... | 20.626128 | 7.773333 |


```python
min_val = round(df['Original Text Score'].min(), 2)
max_val = round(df['Original Text Score'].max(), 2)
mean_val = round(df['Original Text Score'].mean(), 2)
print('For Original Text')
print('Minimum Readability Score', min_val)
print('Maximum Readability Score', max_val)
print('Mean Readability Score', mean_val)
```

```
For Original Text
Minimum Readability Score -0.86
Maximum Readability Score 69.52
Mean Readability Score 14.57
```


```python
min_val = round(df['Reference Summary Score'].min(), 2)
max_val = round(df['Reference Summary Score'].max(), 2)
mean_val = round(df['Reference Summary Score'].mean(), 2)
print('For Reference Summary')
print('Minimum Readability Score', min_val)
print('Maximum Readability Score', max_val)
print('Mean Readability Score', mean_val)
```

```
For Reference Summary
Minimum Readability Score -8.91
Maximum Readability Score 22.79
Mean Readability Score 6.38
```
