# Data Analysis

In this notebook, I will begin the process of data analysis. 

In [1]:
pip install language-tool-python

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Start with importing our libraries
import language_tool_python as ltp  # This guy is new! Trying this out as a grammaticality parser
import pandas as pd 
import nltk
import re

In [3]:
# This sets up our parsing tool
tool = ltp.LanguageTool('en-US')

Let's begin by reading in our final CSV files, created in the dataOrganization notebook.

In [143]:
legalAdvice = pd.read_csv("../final-data/finalLegalData.csv")
adulting = pd.read_csv('../final-data/finalAdData.csv')
medicine = pd.read_csv('../final-data/finalMedData.csv')
highschool = pd.read_csv('../final-data/finalHsData.csv')
broadway = pd.read_csv('../final-data/finalBwayData.csv')
pittsburgh = pd.read_csv('../final-data/finalPghData.csv')
rant = pd.read_csv('../final-data/finalRantData.csv')
ccq = pd.read_csv('../final-data/finalCcqData.csv')
anime = pd.read_csv('../final-data/finalAnimeData.csv')
eli5 = pd.read_csv('../final-data/finalElifData.csv')
college = pd.read_csv('../final-data/finalCollegeData.csv')
sports = pd.read_csv('../final-data/finalSportsData.csv')
crypto = pd.read_csv('../final-data/finalCryptoData.csv')
lawyertalk = pd.read_csv('../final-data/finalLawyerData.csv')
gaming = pd.read_csv('../final-data/finalGamingData.csv')
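
The fifteen reads above differ only in the file name, so the paths could also be built from one mapping. The following is a minimal sketch of that idea, not the way this notebook actually loads its data; `DATA_DIR`, `FILE_STEMS`, and `csv_paths` are hypothetical names, while the directory and file stems are the ones used above.

```python
from pathlib import Path

# Hypothetical helper names; the directory and file stems are the ones used above.
DATA_DIR = Path("../final-data")
FILE_STEMS = {
    "legalAdvice": "finalLegalData",
    "adulting": "finalAdData",
    "medicine": "finalMedData",
    "highschool": "finalHsData",
    "broadway": "finalBwayData",
    "pittsburgh": "finalPghData",
    "rant": "finalRantData",
    "ccq": "finalCcqData",
    "anime": "finalAnimeData",
    "eli5": "finalElifData",
    "college": "finalCollegeData",
    "sports": "finalSportsData",
    "crypto": "finalCryptoData",
    "lawyertalk": "finalLawyerData",
    "gaming": "finalGamingData",
}

def csv_paths(data_dir=DATA_DIR):
    """Map each subreddit name to the path of its CSV file."""
    return {name: data_dir / f"{stem}.csv" for name, stem in FILE_STEMS.items()}

paths = csv_paths()
print(len(paths), paths["ccq"])
```

With this mapping, `{name: pd.read_csv(path, index_col=0) for name, path in paths.items()}` would load every file into one dictionary. Passing `index_col=0` tells pandas to treat the first column as the index, so no `Unnamed: 0` column is created.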

Next, drop the leftover `Unnamed: 0` column, which is most likely the old index that was written out when the CSV files were saved.

In [144]:
legalAdvice = legalAdvice.drop('Unnamed: 0', axis = 1)
adulting = adulting.drop('Unnamed: 0', axis = 1)
medicine = medicine.drop('Unnamed: 0', axis = 1)
highschool = highschool.drop('Unnamed: 0', axis = 1)
broadway = broadway.drop('Unnamed: 0', axis = 1)
pittsburgh = pittsburgh.drop('Unnamed: 0', axis = 1)
rant = rant.drop('Unnamed: 0', axis = 1)
ccq = ccq.drop('Unnamed: 0', axis = 1)
anime = anime.drop('Unnamed: 0', axis = 1)
eli5 = eli5.drop('Unnamed: 0', axis = 1)
college = college.drop('Unnamed: 0', axis = 1)
sports = sports.drop('Unnamed: 0', axis = 1)
crypto = crypto.drop('Unnamed: 0', axis = 1)
lawyertalk = lawyertalk.drop('Unnamed: 0', axis = 1)
gaming = gaming.drop('Unnamed: 0', axis = 1)

Let's get a refresher on what the dataframe looks like:

In [15]:
ccq

Unnamed: 0,Title,Id,Text,Author,Number of Comments,Number of upvotes,Ratio of Upvotes
0,"Big N Discussion - March 19, 2023",11ve46y,Please use this thread to have discussions abo...,CSCQMods,7,5,0.73
1,"Daily Chat Thread - March 19, 2023",11ve5o1,"Please use this thread to chat, have casual di...",CSCQMods,0,1,0.60
2,Is it acceptable to do lunch 12-1pm at work? A...,11voie0,Asking as a new grad who is trying to understa...,TheCockatoo,214,225,0.74
3,Number of Open Tech Jobs has increased for 2 c...,11vqmgd,https://www.trueup.io/job-trend\n\nThis is a f...,TheCopyPasteLife,46,95,0.89
4,How to enforce good practices in my workplace?,11viy3c,"My team doesn't enforce good practices, and my...",Old-Fennel9061,58,149,0.91
...,...,...,...,...,...,...,...
1495,Nerves about starting first SWE Role,11k8to9,I’m graduating from a top CS university this s...,BringMeTheBRBS,3,0,0.33
1496,Reaching out to someone for help with a position,11k87kz,"Hello all, to cut to the chase, I was recently...",businessbee89,1,1,1.00
1497,Portfolio projects - better to create somethin...,11k7q3y,So I'm getting started creating a project for ...,GroundFallsOnly,4,5,1.00
1498,"SWEs in the UK, what is your day-to-day actual...",11k7dz7,We see lots of YouTube videos where young SWEs...,nonbog,7,3,0.80


Now, let's focus on one post from that subreddit.

Posts can contain many tokens that the grammar checker does not recognize. Leaving them in could skew the results: these posts are *technically* grammatical, they just contain words such as "Reddit" or "Subreddit". Because there are so many cases to consider, I may check later whether these false positives all fall into the same error category, in which case I could simply ignore that category.

In [56]:
onePost = ccq['Text'][3]
onePost

"https://www.trueup.io/job-trend\n\nThis is a follow up from [last week's post.](https://old.reddit.com/r/cscareerquestions/comments/11odfe7/number_of_open_tech_jobs_has_increased_for_the/) It definitely seems like the market is starting to turn around. I also have anecdotal evidence of my own. Feel free to add yours.\n\nPossible risks include reduced lending to startups due to regional bank liquidity. Also another wave of layoffs, like Facebook, but I think that Facebook's layoffs come from a dying business, not an industry-wide concern."

For each error, the tool returns a `Match` object, printed as `Match({...})`. Each match names the rule that fired, gives a short explanation, and suggests fixes. For the post above, there are two errors.

In [57]:
# Use the tool to see what it will give us for the post above
oneCcqPost = tool.check(onePost)
oneCcqPost

[Match({'ruleId': 'VERB_NOUN_CONFUSION', 'message': 'When ‘follow-up’ is used as a noun or modifier, it needs to be hyphenated.', 'replacements': ['follow-up'], 'offsetInContext': 43, 'context': "...ps://www.trueup.io/job-trend  This is a follow up from [last week's post.](https://old.re...", 'offset': 43, 'errorLength': 9, 'category': 'COMPOUNDING', 'ruleIssueType': 'uncategorized', 'sentence': "This is a follow up from [last week's post.](https://old.reddit.com/r/cscareerquestions/comments/11odfe7/number_of_open_tech_jobs_has_increased_for_the/) It definitely seems like the market is starting to turn around."}),
 Match({'ruleId': 'SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA', 'message': 'A comma may be missing after the conjunctive/linking adverb ‘Also’.', 'replacements': ['Also,'], 'offsetInContext': 43, 'context': '...tartups due to regional bank liquidity. Also another wave of layoffs, like Facebook,...', 'offset': 401, 'errorLength': 4, 'category': 'PUNCTUATION', 'ruleIssueType':

In [58]:
# This syntax will be important: it lets us directly access the errors found in each post
oneCcqPost[0].ruleId

'VERB_NOUN_CONFUSION'

In [59]:
oneCcqPost[1].ruleId

'SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA'
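
Besides `ruleId`, the printed matches carry `offset` and `errorLength`, which appear to be a character position and a length in the checked text. As a rough sketch under that assumption, the flagged words can be sliced back out of the post. `flagged_span` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real `Match`, filled with the values printed for the first error above.

```python
from types import SimpleNamespace

def flagged_span(text, match):
    """Return the substring of `text` that a match points at."""
    return text[match.offset : match.offset + match.errorLength]

# Stand-in for a real Match, using the values printed for the first error above.
text = "https://www.trueup.io/job-trend\n\nThis is a follow up from last week's post."
fake_match = SimpleNamespace(ruleId="VERB_NOUN_CONFUSION", offset=43, errorLength=9)

print(fake_match.ruleId, repr(flagged_span(text, fake_match)))
```

With the real objects this would be called as `flagged_span(onePost, oneCcqPost[0])`.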

In [72]:
onePost = ccq['Text'][1]
onePost

"Please use this thread to chat, have casual discussions, and ask casual questions. Moderation will be light, but don't be a jerk.\n\nThis thread is posted **every day at midnight PST**. Previous Daily Chat Threads can be found [here](https://www.reddit.com/r/cscareerquestions/search?q=Daily+Chat+Thread&restrict_sr=on&sort=new&t=all)."

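If a post has no errors, `tool.check` returns an empty list (`[]`). Note that the output below lists one match: judging by its text and the out-of-order cell numbers, it appears to come from an earlier run on the Big N thread rather than from this post.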

In [71]:
oneCcqPost = tool.check(onePost)
oneCcqPost

[Match({'ruleId': 'OUTSIDE_OF', 'message': 'This phrase is redundant. Consider using “outside”.', 'replacements': ['outside'], 'offsetInContext': 43, 'context': '... Posts focusing solely on Big N created outside of this thread will probably be removed.  ...', 'offset': 233, 'errorLength': 10, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Posts focusing solely on Big N created outside of this thread will probably be removed.'})]

In [83]:
onePost = ccq['Text'][22]
onePost

"Crossposting from r/AskAcademia \\- thought I could get some useful knowledge from a different group of folks.\n\nI'm in a really tough situation and have been feeling acutely ill over making a decision. I would be really grateful for some advice.\n\nI graduated from a prestigious university with a bachelors in CS last spring, and have been working and making a really comfortable salary in a big tech company (return offer on a summer 2021 internship). I had worked with an assistant professor on ML research for the last two years and had a good and productive time; the professor really believed in me, and towards the last semester, suggested that I apply for PhD programs in that area.\n\nI did not believe that I could get into any PhD programs this cycle, as this particular subfield is extremely competitive, and so figured that I would not seriously decide whether or not to do a PhD unless I get in. I only targeted very competitive places and applied to my undergrad's fully-funded, res

As seen below, however, tokens such as 'r/AskAcademia' are flagged as spelling mistakes. This is problematic: many posts refer to r/subredditname, but that does not make them ungrammatical for the purposes of this analysis. This is a limitation of the parser.

In [84]:
oneCcqPost = tool.check(onePost)
oneCcqPost

[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Cross posting'], 'offsetInContext': 0, 'context': 'Crossposting from r/AskAcademia \\- thought I could g...', 'offset': 0, 'errorLength': 12, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Crossposting from r/AskAcademia \\- thought I could get some useful knowledge from a different group of folks.'}),
 Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Academia'], 'offsetInContext': 20, 'context': 'Crossposting from r/AskAcademia \\- thought I could get some useful know...', 'offset': 20, 'errorLength': 11, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Crossposting from r/AskAcademia \\- thought I could get some useful knowledge from a different group of folks.'}),
 Match({'ruleId': 'ENGLISH_WORD_REPEAT_BEGINNING_RULE', 'message': 'Three successive sentences begin with the same word. C

However, if we use regex to replace subreddit references such as r/AskAcademia with a placeholder, the parser no longer flags them. This may help in my pursuit of meaningful grammatical errors.

In [93]:
onePost = re.sub(r"[rR]/\S+", "--", onePost)  # mask subreddit references such as r/AskAcademia
onePost = re.sub(r"[sS]ubreddit", "--", onePost)  # [s/S] would also have matched "/ubreddit"
onePost

"Crossposting from -- \\- thought I could get some useful knowledge from a different group of folks.\n\nI'm in a really tough situation and have been feeling acutely ill over making a decision. I would be really grateful for some advice.\n\nI graduated from a prestigious university with a bachelors in CS last spring, and have been working and making a really comfortable salary in a big tech company (return offer on a summer 2021 internship). I had worked with an assistant professor on ML research for the last two years and had a good and productive time; the professor really believed in me, and towards the last semester, suggested that I apply for PhD programs in that area.\n\nI did not believe that I could get into any PhD programs this cycle, as this particular subfield is extremely competitive, and so figured that I would not seriously decide whether or not to do a PhD unless I get in. I only targeted very competitive places and applied to my undergrad's fully-funded, research-focus

In [91]:
oneCcqPost = tool.check(onePost)
oneCcqPost

[Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Cross posting'], 'offsetInContext': 0, 'context': 'Crossposting from -- \\- thought I could get some use...', 'offset': 0, 'errorLength': 12, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Crossposting from -- \\- thought I could get some useful knowledge from a different group of folks.'}),
 Match({'ruleId': 'ENGLISH_WORD_REPEAT_BEGINNING_RULE', 'message': 'Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.', 'replacements': ['Furthermore, I', 'Likewise, I', 'Not only that, but I'], 'offsetInContext': 43, 'context': '...ld be really grateful for some advice.  I graduated from a prestigious university...', 'offset': 235, 'errorLength': 1, 'category': 'STYLE', 'ruleIssueType': 'style', 'sentence': 'I graduated from a prestigious university with a bachelors in CS last spring, and have been wo
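
The two substitutions above could be wrapped in one reusable function. This is a sketch, not the cleaning step the notebook ends up using: `mask_reddit_terms` and the two pattern names are hypothetical, and the patterns differ slightly from the cell above. `\b` keeps strings like "or/and" from being treated as subreddit references, and `\w+` leaves trailing punctuation in place, where `\S+` would swallow it.

```python
import re

# Hypothetical helper; the placeholder "--" is the one used in the cells above.
SUBREDDIT_REF = re.compile(r"\b[rR]/\w+")
REDDIT_WORD = re.compile(r"(?:sub)?reddit", re.IGNORECASE)

def mask_reddit_terms(text, placeholder="--"):
    """Replace r/name references and the words reddit/subreddit with a placeholder."""
    text = SUBREDDIT_REF.sub(placeholder, text)
    return REDDIT_WORD.sub(placeholder, text)

sample = "Crossposting from r/AskAcademia. This Subreddit is the best part of Reddit."
print(mask_reddit_terms(sample))
```

Because `(?:sub)?reddit` matches the longer word whenever it is present, "subreddit" is masked as a whole instead of being left as "sub--".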

Now, for analysis purposes, we mask the words "reddit" and "subreddit" in every dataframe, then collect each "Text" column into a list for iteration.

In [145]:
# Mask "subreddit" before "reddit": the other way round turns "subreddit"
# into "sub--", and the second replacement then never matches.
allFrames = [ccq, legalAdvice, adulting, medicine, highschool, broadway, pittsburgh, rant,
             anime, eli5, college, sports, crypto, lawyertalk, gaming]
for df in allFrames:
    for term in ("subreddit", "reddit"):
        df['Text'] = df['Text'].str.replace(term, "--", regex=False, case=False)
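
Chained literal replacements are order-sensitive when one term contains the other. The sketch below uses plain `str.replace` rather than pandas, with a hypothetical `replace_all` helper, to show what happens to "subreddit" depending on which term is replaced first.

```python
def replace_all(text, terms, placeholder="--"):
    """Case-sensitive literal replacement of each term, in the order given."""
    for term in terms:
        text = text.replace(term, placeholder)
    return text

post = "this subreddit is on reddit"
short_first = replace_all(post, ["reddit", "subreddit"])
long_first = replace_all(post, ["subreddit", "reddit"])
print(short_first)
print(long_first)
```

Replacing the shorter term first leaves a stray "sub--" that the checker may flag, so the longer term should go first.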

In [126]:
allSportsVals = list(sports['Text'].values)
allLegalVals = list(legalAdvice['Text'].values)
allAdultVals = list(adulting['Text'].values)
allMedVals = list(medicine['Text'].values)
allHsVals = list(highschool['Text'].values)
allBwayVals = list(broadway['Text'].values)
allPghVals = list(pittsburgh['Text'].values)
allRantVals = list(rant['Text'].values)
allCcqVals = list(ccq['Text'].values)
allAnimeVals = list(anime['Text'].values)
allEli5Vals = list(eli5['Text'].values)
allCollegeVals = list(college['Text'].values)
allCryptoVals = list(crypto['Text'].values)
allLawyerVals = list(lawyertalk['Text'].values)
allGamingVals = list(gaming['Text'].values)

In [151]:
# Run the checker over every post; each entry is the list of matches for one post
sportsErrors = [tool.check(x) for x in allSportsVals]
legalErrors = [tool.check(x) for x in allLegalVals]
adultErrors = [tool.check(x) for x in allAdultVals]
medErrors = [tool.check(x) for x in allMedVals]
hsErrors = [tool.check(x) for x in allHsVals]
bwayErrors = [tool.check(x) for x in allBwayVals]
pghErrors = [tool.check(x) for x in allPghVals]
rantErrors = [tool.check(x) for x in allRantVals]
ccqErrors = [tool.check(x) for x in allCcqVals]
animeErrors = [tool.check(x) for x in allAnimeVals]
eli5Errors = [tool.check(x) for x in allEli5Vals]
collegeErrors = [tool.check(x) for x in allCollegeVals]
cryptoErrors = [tool.check(x) for x in allCryptoVals]
lawyerErrors = [tool.check(x) for x in allLawyerVals]
gamingErrors = [tool.check(x) for x in allGamingVals]
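
One thing to watch for when checking every post: pandas reads empty CSV cells as `NaN`, which is a float, and passing that to `tool.check` would likely fail. A minimal sketch of a guard, assuming posts without text should simply count as having no errors; `check_all` is a hypothetical helper and `fake_check` is a stand-in so the example runs without LanguageTool.

```python
def check_all(texts, check):
    """Apply `check` to every string in `texts`; non-string entries get an empty list."""
    return [check(t) if isinstance(t, str) else [] for t in texts]

# Stand-in checker so the sketch runs without LanguageTool: flags each double space.
def fake_check(text):
    return ["DOUBLE_SPACE"] * text.count("  ")

posts = ["fine post", float("nan"), "two  spaces  here"]
print(check_all(posts, fake_check))
```

With the real tool this would be called as `check_all(allSportsVals, tool.check)`.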

In [152]:
collegeErrors[0]  # index assumed; the original cell left it out

[Match({'ruleId': 'MD_BASEFORM', 'message': 'The modal verb ‘will’ requires the verb’s base form.', 'replacements': ['become'], 'offsetInContext': 43, 'context': '...-2023 school year: 2022-2023 FAFSA will became available October 1, 2021. Requires 202...', 'offset': 522, 'errorLength': 6, 'category': 'GRAMMAR', 'ruleIssueType': 'grammar', 'sentence': '2022-2023 school year: 2022-2023 FAFSA will became available October 1, 2021.'}),
 Match({'ruleId': 'ENGLISH_WORD_REPEAT_BEGINNING_RULE', 'message': 'Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.', 'replacements': [], 'offsetInContext': 43, 'context': '...ir own FSA account, they must use that. If your parent does not have an SSN, they ...', 'offset': 997, 'errorLength': 2, 'category': 'STYLE', 'ruleIssueType': 'style', 'sentence': 'If your parent does not have an SSN, they must print and sign the signature page manually, then mail it in.'}),
 Match({'ruleId': '

In [174]:
# Flatten each subreddit's matches into a single list of rule IDs
simplifiedColErrors = [y.ruleId for x in collegeErrors for y in x]
simplifiedSportsErrors = [y.ruleId for x in sportsErrors for y in x]
simplifiedLegalErrors = [y.ruleId for x in legalErrors for y in x]
simplifiedAdultErrors = [y.ruleId for x in adultErrors for y in x]
simplifiedMedErrors = [y.ruleId for x in medErrors for y in x]
simplifiedHsErrors = [y.ruleId for x in hsErrors for y in x]
simplifiedBwayErrors = [y.ruleId for x in bwayErrors for y in x]
simplifiedPghErrors = [y.ruleId for x in pghErrors for y in x]
simplifiedRantErrors = [y.ruleId for x in rantErrors for y in x]
simplifiedCcqErrors = [y.ruleId for x in ccqErrors for y in x]
simplifiedAnimeErrors = [y.ruleId for x in animeErrors for y in x]
simplifiedEli5Errors = [y.ruleId for x in eli5Errors for y in x]
simplifiedCryptoErrors = [y.ruleId for x in cryptoErrors for y in x]
simplifiedLawyerErrors = [y.ruleId for x in lawyerErrors for y in x]
simplifiedGamingErrors = [y.ruleId for x in gamingErrors for y in x]
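
Once the rule IDs are flattened, a natural next step is to count them, optionally leaving out a whole category such as `TYPOS` if it turns out to hold mostly false positives. The following is a sketch of that idea rather than the analysis itself: `rule_counts` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real `Match` objects, reusing rule IDs and categories from the outputs above.

```python
from collections import Counter
from types import SimpleNamespace

def rule_counts(errors_per_post, skip_categories=()):
    """Count rule IDs across all posts, ignoring matches in the skipped categories."""
    return Counter(
        match.ruleId
        for post_matches in errors_per_post
        for match in post_matches
        if match.category not in skip_categories
    )

# Stand-ins for Match objects; rule IDs and categories are taken from outputs above.
m = SimpleNamespace
fake_errors = [
    [m(ruleId="MORFOLOGIK_RULE_EN_US", category="TYPOS"),
     m(ruleId="VERB_NOUN_CONFUSION", category="COMPOUNDING")],
    [],
    [m(ruleId="MORFOLOGIK_RULE_EN_US", category="TYPOS"),
     m(ruleId="OUTSIDE_OF", category="REDUNDANCY"),
     m(ruleId="VERB_NOUN_CONFUSION", category="COMPOUNDING")],
]

print(rule_counts(fake_errors))
print(rule_counts(fake_errors, skip_categories={"TYPOS"}))
```

With the real data this would be called as `rule_counts(gamingErrors, skip_categories={"TYPOS"})`.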

In [175]:
simplifiedGamingErrors[50:]

['COMMA_COMPOUND_SENTENCE',
 'MORFOLOGIK_RULE_EN_US',
 'IDK',
 'MORFOLOGIK_RULE_EN_US',
 'CAUSE_BECAUSE',
 'MORFOLOGIK_RULE_EN_US',
 'WHITESPACE_RULE',
 'NONE_THE_LESS',
 'WHEN_WHERE',
 'EN_SPECIFIC_CASE',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'EN_SPECIFIC_CASE',
 'EN_A_VS_AN',
 'MORFOLOGIK_RULE_EN_US',
 'COMMA_COMPOUND_SENTENCE',
 'COMMA_COMPOUND_SENTENCE',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'HOW_YOU_DOING',
 'I_LOWERCASE',
 'IT_IS',
 'I_LOWERCASE',
 'IT_IS_2',
 'MISSING_HYPHEN',
 'MORFOLOGIK_RULE_EN_US',
 'COMMA_PARENTHESIS_WHITESPACE',
 'TO_NON_BASE',
 'MORFOLOGIK_RULE_EN_US',
 'COMMA_COMPOUND_SENTENCE',
 'MORFOLOGIK_RULE_EN_US',
 'ENGLISH_WORD_REPEAT_BEGINNING_RULE',
 'EN_CONTRACTION_SPELLING',
 'COMMA_COMPOUND_SENTENCE',
 'THERE_WAS_MANY',
 'ENGLISH_WORD_REPEAT_BEGINNING_RULE',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE_EN_US',
 'MORFOLOGIK_RULE