In [1]:
# import libraries
import numpy as np
import pandas as pd
import re
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import language_check
import warnings

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import joblib
from sklearn.model_selection import train_test_split

# Customizing Matplotlib with style sheets
# (on Matplotlib >= 3.6 this style is named 'seaborn-v0_8-colorblind')
plt.style.use('seaborn-colorblind')

# Setup Pandas
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_colwidth', 100)

warnings.simplefilter("ignore", DeprecationWarning)

In [2]:
# training data: load the essays and rename the score columns
training_data_set = (
    pd.read_csv('training_set_rel3.tsv', sep='\t', encoding='ISO-8859-1')
    .rename(columns={'essay_set': 'topic', 'domain1_score': 'target_score', 'domain2_score': 'topic2_target'})
)
training_data_set.sample()

Unnamed: 0,essay_id,topic,essay,rater1_domain1,rater2_domain1,rater3_domain1,target_score,rater1_domain2,rater2_domain2,topic2_target,rater1_trait1,rater1_trait2,rater1_trait3,rater1_trait4,rater1_trait5,rater1_trait6,rater2_trait1,rater2_trait2,rater2_trait3,rater2_trait4,rater2_trait5,rater2_trait6,rater3_trait1,rater3_trait2,rater3_trait3,rater3_trait4,rater3_trait5,rater3_trait6
8762,13510,5,That he was happy. He relizes he has a better life in @LOCATION1 than what he would of had in Cu...,1,1,,1,,,,,,,,,,,,,,,,,,,,,


In [3]:
text = 'The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetratevirtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be damned, it\'s a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.'

In [4]:
from textblob import TextBlob
data = TextBlob(text)
print(data.correct())

The tubular threat of The Low has always struck me as the ultimate movie monster: an invariably hungry, amoeba-like mass able to penetratevirtually any safeguard, capable of--as a doomed doctor willingly describes it--"assimilating flesh on contact. Side comparisons to gelatin be damned, it's a concept with the most devastating of potential consequences, not unlike the grey go scenario proposed by technological theorists fearful of artificial intelligence run rampart.


In [5]:
tool = language_check.LanguageTool('en-US')
matches = tool.check(text)
language_check.correct(text, matches)

"The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--”assimilating flesh on contact. Snide comparisons to gelatin be damned, it's a concept with the most devastating of potential consequences, not unlike the Grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant."

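To see exactly which words each tool changed, a word-level diff against the original text can help. The sketch below uses only the standard library's difflib. The helper name `changed_words` is hypothetical (it is not part of the original notebook), and splitting on whitespace is a simplifying assumption that leaves punctuation attached to words.

```python
import difflib

def changed_words(original, corrected):
    """Return (before, after) pairs for every run of words that differs."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # Join each differing run back into a readable string
            changes.append((' '.join(a[i1:i2]), ' '.join(b[j1:j2])))
    return changes

# Short excerpts of the text above and of a corrected version, for illustration
sample = 'The titular threat of The Blob'
sample_fixed = 'The tubular threat of The Low'
for before, after in changed_words(sample, sample_fixed):
    print(f'{before!r} -> {after!r}')
```

Calling `changed_words(text, str(data.correct()))` and `changed_words(text, language_check.correct(text, matches))` on the full passage would list the edits made by each tool side by side.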
In [7]:
text = training_data_set.essay[1600]
print(text)

Dear local newspaper, I think the effects of computers are good. Some reasons I think this is that we get a lot of knowledge from it, contact people you dont live near and haven't seen in a while. My first reason why I think the effects of computers are good because you can get a lot of knowledge from it. One example would be if you want to make a @CAPS1 meal, you wouldn't find it in an ordinary @CAPS2 @CAPS3 @CAPS4 you would have to look up on the internet for the recipe. Another example is if you wanted to travel to @LOCATION1 by car and you didn't have a map how would you get the directions? You would go on the computer, look up directions to @LOCATION1 and print it out. You would have never been able to go to @LOCATION1 if you didn't have a map or a computer. My second reason why I think the effects of computers is good is because you can contact friends or family that you have not seen in a while. One example would be if your sister just graduated from college and you wanted to c

In [8]:
matches = tool.check(text)
language_check.correct(text, matches)

"Dear local newspaper, I think the effects of computers are good. Some reasons I think this is that we get a lot of knowledge from it, contact people you done live near and haven't seen in a while. My first reason why I think the effects of computers are good because you can get a lot of knowledge from it. One example would be if you want to make a @CAPS1 meal, you wouldn't find it in an ordinary @CAPS2 @CAPS3 @CAPS4 you would have to look up on the internet for the recipe. Another example is if you wanted to travel to @LOCATION1 by car and you didn't have a map how would you get the directions? You would go on the computer, look up directions to @LOCATION1 and print it out. You would have never been able to go to @LOCATION1 if you didn't have a map or a computer. My second reason why I think the effects of computers is good is because you can contact friends or family that you have not seen in a while. One example would be if your sister just graduated from college and you wanted to c

Use the language_check Python wrapper for LanguageTool to correct most spelling and grammatical errors in each essay, and count the corrections applied to each one.

In [9]:
tool = language_check.LanguageTool('en-US')

# run LanguageTool on every essay (slow: one check per essay)
training_data_set['matches'] = training_data_set['essay'].apply(tool.check)
# number of issues found per essay
training_data_set['corrections'] = training_data_set['matches'].apply(len)
# apply the suggested corrections to each essay
training_data_set['corrected'] = training_data_set.apply(lambda l: language_check.correct(l['essay'], l['matches']), axis=1)

# save work
training_data_set.to_pickle('training_corr.pkl')
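
Beyond the raw count per essay, it can be useful to see which rules fire most often. The following is a minimal sketch, assuming each match object exposes a `ruleId` attribute (as language_check's match objects appear to). The helper `rule_counts` is hypothetical, and the stand-in matches built with `SimpleNamespace` use illustrative rule IDs so the example runs without LanguageTool installed.

```python
from collections import Counter
from types import SimpleNamespace

def rule_counts(matches):
    """Count how many times each rule ID occurs in a list of matches."""
    return Counter(m.ruleId for m in matches)

# Stand-in match objects (assumption: real matches carry a ruleId attribute)
fake_matches = [
    SimpleNamespace(ruleId='MORFOLOGIK_RULE_EN_US'),
    SimpleNamespace(ruleId='MORFOLOGIK_RULE_EN_US'),
    SimpleNamespace(ruleId='WHITESPACE_RULE'),
]
counts = rule_counts(fake_matches)
print(counts.most_common())
```

On the real data, something like `sum(training_data_set['matches'].apply(rule_counts), Counter())` should aggregate the counts over all essays.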

In [10]:
print('Original:')
print(training_data_set.essay[16])
print('Corrected using languagetool:')
print(training_data_set.corrected[16])

Original:
Dear Local Newspaper, I belive that computers have a negative effect on peoples lives. I belive this because who spend to much time on the computer don't get out as much as they should, don't spend enough time with their family, and the computer can't do everything. My first reason is I belive that people need to get out more. When they don't get out, they don't exersise and that is very unhealthy. Instead of watching the games or the scores they should get out and play the game. I also belive that they should enjoy nature because I feel like they are wasting the beauty of nature all around them. We wouldn't want to waste our abilities and privalges would we? Another reason is that they do not spend enough time with family. If you have family near you, then you should take advantage of that and interact with one another. You can have fun with your family by playing games. You can also have fun by just hanging out, which boost your social skills and the computer can't always d