# Experimental feature: Detecting grammar errors in poems and assigning a penalty score
Philipp, March 28th 2022

There is a tool called LanguageTool, a grammar checker implemented in Java. It's used, for example, in the OpenOffice suite. language_tool_python (https://github.com/jxmorris12/language_tool_python) is a Python wrapper around that tool. It can be installed and imported, and it starts a local LanguageTool server instance when instantiated.

I implemented a couple of functions that check the lines of a poem for errors. Some matches represent trivial errors, e.g. a lowercase line beginning or a lowercase "i" (should this maybe count as an actual error?), so I added some functionality to filter those matches out.
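An alternative to filtering, sketched here purely as an assumption and not something the functions below do, would be to normalize each line before checking it, so that lowercase line beginnings and a lowercase "i" produce no matches in the first place. The helper name normalize_line is hypothetical.

```python
import re


def normalize_line(line: str) -> str:
    """Capitalize the first character and any standalone 'i' in a line."""
    # \b treats the apostrophe as a word boundary, so "i'll" becomes "I'll"
    line = re.sub(r"\bi\b", "I", line)
    # Uppercase only the first character; str.capitalize() would also
    # lowercase the rest of the line
    return line[:1].upper() + line[1:]


print(normalize_line("though don't climb to a field, and i'll try"))
print(normalize_line("being such an observant was i"))
```

For ASCII text the line length is unchanged, so match offsets would still point at the same positions in the original line. The regular expression is deliberately naive and would also capitalize the "i" in something like "i.e.".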

I also propose some kind of error-to-penalty mapping in which different grammar errors are assigned different weights, so that some poems can be "more wrong" than others. Maybe just filtering out erroneous poems will be enough, though.
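As a rough illustration of the weighting idea, the sketch below scores poems from already-computed error counts and keeps only those at or below a threshold. It runs without a LanguageTool server. The function names, the example counts, the rule name SOME_UNLISTED_RULE, and the threshold value are assumptions made for illustration, not part of the implementation further down.

```python
def penalty_from_counts(error_counts: dict, rule_penalties: dict) -> int:
    """Sum weighted error counts; unknown rules fall back to 'default'."""
    default = rule_penalties.get('default', 0)
    return sum(count * rule_penalties.get(rule_id, default)
               for rule_id, count in error_counts.items())


def keep_below_threshold(counts_per_poem: list, rule_penalties: dict,
                         max_penalty: int) -> list:
    """Return indices of poems whose penalty does not exceed max_penalty."""
    return [i for i, counts in enumerate(counts_per_poem)
            if penalty_from_counts(counts, rule_penalties) <= max_penalty]


# Hypothetical weights and per-poem error counts (ruleId -> count)
example_penalties = {'default': 10, 'MORFOLOGIK_RULE_EN_US': 5,
                     'DOUBLE_PUNCTUATION': 7}
example_counts = [
    {},                                   # no errors
    {'MORFOLOGIK_RULE_EN_US': 1},         # one spelling mistake
    {'DOUBLE_PUNCTUATION': 1, 'SOME_UNLISTED_RULE': 2},
]

scores = [penalty_from_counts(c, example_penalties) for c in example_counts]
print(scores)
print(keep_below_threshold(example_counts, example_penalties, max_penalty=5))
```

Provided all weights are positive, a threshold of 0 reduces this to plain filtering of erroneous poems, so both options mentioned above fit the same interface.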

See below for examples.

In [51]:
pip install --upgrade language_tool_python



In [52]:
import language_tool_python
from collections import defaultdict
tool = language_tool_python.LanguageTool('en-US')

In [53]:
poems_raw = """
I'd rather watch the clouds in the sky
though don't climb to a field, and i'll try
make the sky always clear
because nobody's here
you'll say i'll not fly off the eye

I'd rather watch the clouds in the sky
but ignore it, take care if you'd die
if you fly on a fly
you'd be seen in the sky
or a comerfly, put your way high

I'd rather watch the clouds in the sky
and clouds, if i'd take on a try
of the cloud, and, say
that the clouds had held sway..
being such an observant was i

I'd rather watch the clouds in the sky
that were conically shaped like a pie
and they'd fall in the night
simply fall, not just right
and to fall, they could fall way up high

I'd rather watch the clouds in the sky
for his eyes with a scientist's eye
to observe and observe
to observe, observe, observe
are a change from my mind, smile and sigh

I'd rather watch the clouds in the sky
ae and stars that are bigger than i
far from sea to up high
from a view in the sky
help me up. so thanks to heaven, i'm high

I'd rather watch the clouds in the sky
i look over; my love, so i cry
if i give you my love
and you come up above
if i sit there? bye, dear, goodbye

I'd rather watch the clouds in the sky
like the clouds, though they fly way up high
cloudy arcs in the sky
all that arc as they fly
or the shadow that flies like the sky

I'd rather watch the clouds in the sky
at convenience store, purchase and buy
hop to shop for a day
do some think they're away
at the convenience store, purchase and buy

I'd rather watch the clouds in the sky
i tried hard just to climb up, and then try
to come up and to fly
i'm to get to the sky
it would sure come away with my sigh
"""

In [54]:
def raw_output_to_poems(raw_output: str):
    """Parse a raw text representation of multiple poems into a list of poems

    :param raw_output: poems as a single string, separated by blank lines
                       (like the txt files we sent to Rita)
    :return: list of poems, where each poem is a list of lines
    """
    poems = raw_output.split('\n\n')
    poems = [poem.strip() for poem in poems]
    poems = [poem.split('\n') for poem in poems]
    return poems


def filter_bad_rules(matches, bad_rules):
    """Filter out bad (e.g. common, non-problematic) LanguageTool error matches

    :param matches: list of LanguageTool error matches for a given line
    :param bad_rules: list of ruleId's to ignore
    :return: list of matches with bad matches filtered out
    """
    return [match for match in matches if match.ruleId not in bad_rules]


def count_errors_in_poems(poems: list, bad_rules=None):
    """Count grammar errors of different types in a list of poems

    :param poems: list of poems, where each poem is a list of lines
    :param bad_rules: list of ruleId's of grammar rules to be ignored
    :return: dictionary with ruleId as key and count as value
    """
    error_counter = defaultdict(int)
    for poem in poems:
        for line in poem:
            matches = tool.check(line)
            if bad_rules:
                matches = filter_bad_rules(matches, bad_rules)
            for match in matches:
                error_counter[match.ruleId] += 1
    return dict(error_counter)


def print_erroneous_poems(poems: list, bad_rules=None):
    """Print erroneous poems together with their errors

    :param poems: list of poems, where each poem is a list of lines
    :param bad_rules: list of ruleId's of grammar rules to be ignored
    """
    for poem in poems:
        total_matches = []
        for line in poem:
            total_matches += tool.check(line)
        if bad_rules:
            total_matches = filter_bad_rules(total_matches, bad_rules)
        if len(total_matches) > 0:
            print()
            print('\n'.join(poem))
            print(total_matches)
            print()

def get_poem_error_penalty(poem: list, rule_penalties: dict, bad_rules=None):
    """Calculate grammar error penalty for a given poem

    :param poem: poem as a list of lines
    :param rule_penalties: dictionary where key is ruleId and value is an
                           integer representing the penalty for violating that
                           rule; must contain a 'default' entry, which is used
                           for rules without a specific penalty
    :param bad_rules: list of ruleId's of grammar rules to be ignored
    :return: tuple of penalty (int), list of default penalty ruleIds encountered
    """
    errors = count_errors_in_poems([poem], bad_rules=bad_rules)
    penalty = 0
    default_rules = []
    for ruleId, error_count in errors.items():
        if ruleId in rule_penalties:
            penalty += error_count * rule_penalties[ruleId]
        else:
            penalty += error_count * rule_penalties['default']
            default_rules.append(ruleId)
    return penalty, default_rules

In [55]:
# Count different error types in poems
poems = raw_output_to_poems(poems_raw)
print(f"Found {len(poems)} poems")
print("Non-filtered error counts:", count_errors_in_poems(poems))


# Filter out non-problematic errors by defining "bad rules"
bad_rules = [
    'UPPERCASE_SENTENCE_START',
    'I_LOWERCASE',
]

print("Filtered error counts:",
      count_errors_in_poems(poems, bad_rules=bad_rules))


# Print out the poems that have actual (i.e. problematic) errors in them
print_erroneous_poems(poems, bad_rules)


# Calculate a penalty score for each poem given some error-to-penalty mapping
rule_penalties = {
    'default': 10,
    'UPPERCASE_SENTENCE_START': 1,
    'I_LOWERCASE': 1,
    'MORFOLOGIK_RULE_EN_US': 5,
    'DOUBLE_PUNCTUATION': 7
}
for i, poem in enumerate(poems):
    penalty, default_rules = get_poem_error_penalty(poem, rule_penalties,
                                                    bad_rules=bad_rules)
    print(f"Poem {i+1} error penalty: {penalty}")
    if len(default_rules) > 0:
        print(f"Found rules without specific penalty assigned: {default_rules}")

Found 10 poems
Non-filtered error counts: {'UPPERCASE_SENTENCE_START': 39, 'I_LOWERCASE': 11, 'MORFOLOGIK_RULE_EN_US': 1, 'DOUBLE_PUNCTUATION': 1}
Filtered error counts: {'MORFOLOGIK_RULE_EN_US': 1, 'DOUBLE_PUNCTUATION': 1}

I'd rather watch the clouds in the sky
but ignore it, take care if you'd die
if you fly on a fly
you'd be seen in the sky
or a comerfly, put your way high
[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['comer fly'], 'offsetInContext': 5, 'context': 'or a comerfly, put your way high', 'offset': 5, 'errorLength': 8, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or a comerfly, put your way high'})]


I'd rather watch the clouds in the sky
and clouds, if i'd take on a try
of the cloud, and, say
that the clouds had held sway..
being such an observant was i
[Match({'ruleId': 'DOUBLE_PUNCTUATION', 'message': 'Two consecutive dots', 'replacements': ['.', '…'], 'offsetInContext': 29, 'context': 

In [56]:
tool.close()