# Data Analysis (Preliminary)
#### John R. Starr; jrs294@pitt.edu
Time for some analysis! Let's load the usual modules and get started.

In [1]:
import nltk
import numpy as np
import pandas as pd

In [2]:
df = pd.read_pickle('ordered_only_df.pkl')

A breakdown on what we've got:

In [3]:
df.head()

Unnamed: 0,ID,Eng,Far,Eng_Tok,Far_Tok,Eng_Len,Far_Len,Eng_Types,Far_Types,Far_POS,Far_Chunks,Eng_POS,Eng_Chunks,Word_Order
4,5,stop please stop,دست نگه داريد خواهش ميکنم دست نگه داريد,"[stop, please, stop]","[دست, نگه, داريد, خواهش, ميکنم, دست, نگه, داريد]",3,8,"{please, stop}","{دست, داريد, خواهش, نگه, ميکنم}","[(دست, N), (نگه, N), (داريد, V), (خواهش, Ne), ...",[دست NP] [نگه داريد VP] [خواهش ميکنم دست NP] [...,"[(stop, JJ), (please, NN), (stop, VB)]","[[(stop, JJ), (please, NN)], [[('stop', 'VB')]]]",SVO
8,9,god damn it put that down,لعنت به تو اونو بذار زمين,"[god, damn, it, put, that, down]","[لعنت, به, تو, اونو, بذار, زمين]",6,6,"{god, it, down, put, damn, that}","{به, تو, اونو, بذار, لعنت, زمين}","[(لعنت, N), (به, P), (تو, PRO), (اونو, PRO), (...",[لعنت NP] [به PP] [تو NP] [اونو NP] [بذار VP] ...,"[(god, NN), (damn, VBZ), (it, PRP), (put, VBD)...","[[(god, NN)], [[('damn', 'VBZ')]], (it, PRP), ...",SOV
10,11,its the last feed weve got,اين آخرين علوفه اي بود که ما داشتيم,"[its, the, last, feed, weve, got]","[اين, آخرين, علوفه, اي, بود, که, ما, داشتيم]",6,8,"{last, got, feed, its, weve, the}","{علوفه, ما, اين, بود, داشتيم, اي, آخرين, که}","[(اين, Ne), (آخرين, NUM), (علوفه, N), (اي, N),...",[اين آخرين علوفه NP] [اي NP] [بود VP] که [ما N...,"[(its, PRP$), (the, DT), (last, JJ), (feed, NN...","[(its, PRP$), [(the, DT), (last, JJ), (feed, N...",SOV
14,15,you lied to me dan,تو به من دروغ گفتي دن,"[you, lied, to, me, dan]","[تو, به, من, دروغ, گفتي, دن]",5,6,"{to, dan, you, me, lied}","{دن, به, من, تو, دروغ, گفتي}","[(تو, PRO), (به, P), (من, PRO), (دروغ, AJ), (گ...",[تو NP] [به PP] [من NP] [دروغ NP] [گفتي VP] دن,"[(you, PRP), (lied, VBD), (to, TO), (me, PRP),...","[(you, PRP), [[('lied', 'VBD')]], (to, TO), (m...",SOV
15,16,you told me we made payments to hollander we did,تو به من گفتي قرضمونو به هلندر پرداخت کرديم ما...,"[you, told, me, we, made, payments, to, hollan...","[تو, به, من, گفتي, قرضمونو, به, هلندر, پرداخت,...",10,12,"{made, to, we, hollander, told, payments, you,...","{به, ما, کرديم, من, هلندر, پرداخت, تو, قرضمونو...","[(تو, PRO), (به, P), (من, PRO), (گفتي, V), (قر...",[تو NP] [به PP] [من NP] [گفتي VP] [قرضمونو NP]...,"[(you, PRP), (told, VBD), (me, PRP), (we, PRP)...","[(you, PRP), [[('told', 'VBD')]], (me, PRP), (...",SOV


In [4]:
df.reset_index(drop = True, inplace = True)

In [5]:
print(len(df))
print()
print(df['Word_Order'].value_counts())

76715

SOV    52756
SVO    23959
Name: Word_Order, dtype: int64


Well, we have more than twice as many SOV sentences as SVO sentences -- this isn't a bad thing, since Persian is underlyingly SOV. Let's separate the two orderings into their own DFs and then examine some of the structures that we find in each.
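To put a number on that imbalance, the counts can be turned into shares. With the totals printed above, 52756 of 76715 sentences is roughly 69% SOV, or about 2.2 SOV sentences for every SVO one. The snippet below is just a sketch on a toy frame (`toy` is a made-up stand-in, not the real data) showing how `value_counts(normalize=True)` gives the same kind of breakdown:

```python
import pandas as pd

# Toy stand-in for the real frame; only the Word_Order column matters here
toy = pd.DataFrame({'Word_Order': ['SOV', 'SOV', 'SVO', 'SOV']})

counts = toy['Word_Order'].value_counts()                 # raw counts per order
shares = toy['Word_Order'].value_counts(normalize=True)   # same, as proportions
ratio = counts['SOV'] / counts['SVO']                     # SOV sentences per SVO sentence

print(shares)
print('SOV : SVO ratio =', ratio)
```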

In [6]:
#  Sorting by the values
df.sort_values(by=['Word_Order'], inplace = True)

In [7]:
# Creating new DFs (.copy() makes them independent of df, so the in-place edits later are safe)
sov_df = df.loc[df.Word_Order == 'SOV'].copy()
svo_df = df.loc[df.Word_Order == 'SVO'].copy()

In [8]:
# Making sure they are the proper lengths
print(len(sov_df))
print(len(svo_df))

52756
23959
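As a side note, the split and the index reset can probably be done in one pass with `groupby`. This is only a sketch on a toy frame (`toy` and `by_order` are names I made up for illustration); each piece comes back as a new frame with a clean 0..n-1 index:

```python
import pandas as pd

# Toy stand-in with the two columns needed for the split
toy = pd.DataFrame({
    'Eng': ['a b c', 'd e', 'f g h'],
    'Word_Order': ['SOV', 'SVO', 'SOV'],
})

# One frame per word order; reset_index returns a new frame each time
by_order = {
    order: group.reset_index(drop=True)
    for order, group in toy.groupby('Word_Order')
}

# Sanity check: the pieces should add up to the whole
print({order: len(frame) for order, frame in by_order.items()})
```

With the real data, `by_order['SOV']` and `by_order['SVO']` would play the roles of `sov_df` and `svo_df`.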


Cool! Let's start with the SOV data first.

### SOV Data
Just a snippet of what we're working with:

In [9]:
sov_df.head()

Unnamed: 0,ID,Eng,Far,Eng_Tok,Far_Tok,Eng_Len,Far_Len,Eng_Types,Far_Types,Far_POS,Far_Chunks,Eng_POS,Eng_Chunks,Word_Order
38357,276474,i will tell your wife he is your son,من به زنت ميگم ، اون پسر توئه,"[i, will, tell, your, wife, he, is, your, son]","[من, به, زنت, ميگم, ،, اون, پسر, توئه]",9,8,"{son, tell, i, will, is, he, wife, your}","{به, زنت, توئه, پسر, من, اون, ،, ميگم}","[(من, PRO), (به, P), (زنت, N), (ميگم, V), (،, ...",[من NP] [به PP] [زنت NP] [ميگم VP] ، [اون پسر ...,"[(i, NN), (will, MD), (tell, VB), (your, PRP$)...","[[(i, NN)], (will, MD), [[('tell', 'VB')]], (y...",SOV
37914,273398,someones trying to shoot us,يكي ميخواد به ما شليك كنه,"[someones, trying, to, shoot, us]","[يكي, ميخواد, به, ما, شليك, كنه]",5,6,"{someones, trying, to, us, shoot}","{به, ما, شليك, كنه, يكي, ميخواد}","[(يكي, DET), (ميخواد, N), (به, P), (ما, PRO), ...",[يكي ميخواد NP] [به PP] [ما NP] [شليك VP] كنه,"[(someones, NNS), (trying, VBG), (to, TO), (sh...","[(someones, NNS), [[('trying', 'VBG')]], (to, ...",SOV
64714,479066,sa yong's right. is there a ransom demand?,حق با سا يونگ است. ممكنه كه نقشه ديگه اي هم در...,"[sa, yong, 's, right, ., is, there, a, ransom,...","[حق, با, سا, يونگ, است, ., ممكنه, كه, نقشه, دي...",11,16,"{ransom, ., is, a, yong, demand, 's, sa, there...","{هم, ممكنه, است, ., سا, كه, اي, ديگه, حق, يونگ...","[(حق, N), (با, P), (سا, N), (يونگ, N), (است, V...",[حق NP] [با PP] [سا يونگ NP] [است VP] . [ممكنه...,"[(sa, NN), (yong, PRP), ('s, POS), (right, JJ)...","[[(sa, NN)], (yong, PRP), ('s, POS), [(right, ...",SOV
37917,273426,actually a thank you would be nice,درحقيقت يک تشكر ميتونه خوب باشه,"[actually, a, thank, you, would, be, nice]","[درحقيقت, يک, تشكر, ميتونه, خوب, باشه]",7,6,"{thank, would, actually, a, be, you, nice}","{خوب, يک, تشكر, باشه, ميتونه, درحقيقت}","[(درحقيقت, Ne), (يک, N), (تشكر, P), (ميتونه, N...",[درحقيقت يک NP] [تشكر PP] [ميتونه خوب NP] [باش...,"[(actually, RB), (a, DT), (thank, NN), (you, P...","[(actually, RB), [(a, DT), (thank, NN)], (you,...",SOV
37918,273430,i dont want an axe thats crazy why would you o...,من يک تبر نميخوام اين احمقانه است,"[i, dont, want, an, axe, thats, crazy, why, wo...","[من, يک, تبر, نميخوام, اين, احمقانه, است]",14,7,"{why, i, would, want, axe, dont, an, offer, yo...","{احمقانه, است, من, يک, تبر, اين, نميخوام}","[(من, PRO), (يک, NUM), (تبر, N), (نميخوام, V),...",[من NP] [يک تبر NP] [نميخوام VP] [اين احمقانه ...,"[(i, NNS), (dont, VBP), (want, VBP), (an, DT),...","[(i, NNS), [[('dont', 'VBP')]], [[('want', 'VB...",SOV


In [10]:
# Making the indexes easier to work with
sov_df.reset_index(drop = True, inplace = True)

Just out of curiosity, I want to see if there's any big difference between the average lengths of the English and Persian SOV sentences:

In [11]:
# Mean of the precomputed English token counts
avg_eng_len = sov_df['Eng_Len'].mean()
print('The average English sentence length in the SOV file is...', avg_eng_len)

The average English sentence length in the SOV file is... 8.479926453863067


In [12]:
# Mean of the precomputed Persian token counts
avg_far_len = sov_df['Far_Len'].mean()
print('The average Persian sentence length in the SOV file is...', avg_far_len)

The average Persian sentence length in the SOV file is... 8.163753885813936


Not too big of a difference, it seems. I figured this wouldn't have much of an impact, but it was worth at least checking.
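For what it's worth, the same comparison can be made for both word orders at once by grouping on `Word_Order`. A minimal sketch on toy numbers (the values below are invented, not taken from the corpus):

```python
import pandas as pd

# Toy stand-in: token counts plus the word-order label
toy = pd.DataFrame({
    'Eng_Len': [3, 6, 6, 5],
    'Far_Len': [8, 6, 8, 6],
    'Word_Order': ['SVO', 'SOV', 'SOV', 'SOV'],
})

# One row per word order, one column per language
mean_lengths = toy.groupby('Word_Order')[['Eng_Len', 'Far_Len']].mean()
print(mean_lengths)
```

Run on the full `df`, this should give one small table with the English and Persian averages for SOV and SVO side by side.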

Let's see how sentences involving the phrase "is like" fared! This phrase is a common (and easily searchable) way of looking for similes.

In [28]:
# Question for graders: how can I do this more efficiently?
is_like_bool = []
for line in sov_df['Eng']:
    if 'is like' in line:
        is_like_bool.append(True)
    else:
        is_like_bool.append(False)
has_like = pd.Series(is_like_bool)

In [18]:
like_sov_df = sov_df[has_like]
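On the efficiency question in the comment above: pandas has a vectorized `Series.str.contains` that should do the same job as the loop in one line. The sketch below runs on a toy frame (the third sentence is invented to show an edge case). The whole-word variant is my own assumption about what might be wanted, since a plain substring test also matches things like "is likely":

```python
import pandas as pd

toy = pd.DataFrame({'Eng': [
    'power is like grains of sand.',
    'you lied to me dan',
    'this is likely wrong',
]})

# Same substring test as the loop; regex=False treats 'is like' literally
has_like = toy['Eng'].str.contains('is like', regex=False)
like_df = toy[has_like]

# Stricter variant (assumption: whole words only), which skips "is likely"
has_like_word = toy['Eng'].str.contains(r'\bis like\b', regex=True)

print(has_like.tolist())
print(has_like_word.tolist())
```

Because the mask is built from the frame's own column, it shares the frame's index, so it keeps working even if the index has not been reset. Neither version can tell a simile from a non-simile use such as "i know what it is like to try", so the hits still need a manual look.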

In [29]:
print(len(like_sov_df))
like_sov_df.head(10)

35


Unnamed: 0,ID,Eng,Far,Eng_Tok,Far_Tok,Eng_Len,Far_Len,Eng_Types,Far_Types,Far_POS,Far_Chunks,Eng_POS,Eng_Chunks,Word_Order
110,272686,capricorns castle is like a fivestar hotel,قلعه كاپريكورن مثل يک هتل 5 ستاره است,"[capricorns, castle, is, like, a, fivestar, ho...","[قلعه, كاپريكورن, مثل, يک, هتل, 5, ستاره, است]",7,8,"{fivestar, hotel, is, a, like, castle, caprico...","{قلعه, است, يک, هتل, ستاره, كاپريكورن, 5, مثل}","[(قلعه, Ne), (كاپريكورن, N), (مثل, ADVe), (يک,...",[قلعه كاپريكورن NP] [مثل PP] [يک هتل NP] [5 ست...,"[(capricorns, NNS), (castle, NN), (is, VBZ), (...","[(capricorns, NNS), [(castle, NN)], [[('is', '...",SOV
1434,478344,you little rascal who is like a mouse where do...,هي آدم حقه باز، فكر مي كني كجا داري ميري؟,"[you, little, rascal, who, is, like, a, mouse,...","[هي, آدم, حقه, باز،, فكر, مي, كني, كجا, داري, ...",16,10,"{'re, who, rascal, where, a, is, going, mouse,...","{آدم, هي, داري, فكر, مي, كجا, باز،, ميري؟, حقه...","[(هي, CONJ), (آدم, Ne), (حقه, AJ), (باز،, Pe),...",[هي آدم حقه NP] [باز، PP] [فكر مي NP] [كني VP]...,"[(you, PRP), (little, JJ), (rascal, JJ), (who,...","[(you, PRP), [(little, JJ), (rascal, JJ)], (wh...",SOV
3810,302406,that woman is like an angelic goddess who only...,اون زن مثل يک فرشته الهيه که به اين دنيا مهربو...,"[that, woman, is, like, an, angelic, goddess, ...","[اون, زن, مثل, يک, فرشته, الهيه, که, به, اين, ...",14,12,"{who, woman, angelic, to, is, this, like, only...","{به, اين, يک, مهربوني, اون, زن, که, فرشته, اله...","[(اون, DET), (زن, N), (مثل, ADVe), (يک, NUM), ...",[اون زن NP] [مثل PP] [يک فرشته NP] [الهيه VP] ...,"[(that, DT), (woman, NN), (is, VBZ), (like, IN...","[[(that, DT), (woman, NN)], [[('is', 'VBZ')], ...",SOV
4892,304178,for sunlight is like gold,For sunlight is like gold براي نور خورشيد که ب...,"[for, sunlight, is, like, gold]","[For, sunlight, is, like, gold, براي, نور, خور...",5,12,"{is, gold, like, for, sunlight}","{به, is, طلاست, like, For, gold, نور, براي, که...","[(For, RES), (sunlight, RES), (is, RES), (like...",[For sunlight is like gold NP] [براي PP] [نور ...,"[(for, IN), (sunlight, NN), (is, VBZ), (like, ...","[[[('for', 'IN')], [('sunlight', 'NN')]], [[('...",SOV
5098,306144,a crime scene is like a code,صحنه جرم رازهاي زيادي توش داره,"[a, crime, scene, is, like, a, code]","[صحنه, جرم, رازهاي, زيادي, توش, داره]",7,6,"{is, a, scene, like, crime, code}","{رازهاي, داره, صحنه, جرم, توش, زيادي}","[(صحنه, Ne), (جرم, N), (رازهاي, V), (زيادي, AD...",[صحنه NP] [جرم NP] [رازهاي VP] [زيادي ADVP] [ت...,"[(a, DT), (crime, NN), (scene, NN), (is, VBZ),...","[[(a, DT), (crime, NN), (scene, NN)], [[('is',...",SOV
7065,494907,power is like grains of sand.,قدرت مثل دانه هاي شنه,"[power, is, like, grains, of, sand, .]","[قدرت, مثل, دانه, هاي, شنه]",7,5,"{., is, like, power, of, grains, sand}","{دانه, قدرت, هاي, مثل, شنه}","[(قدرت, N), (مثل, ADVe), (دانه, N), (هاي, V), ...",[قدرت NP] [مثل PP] [دانه NP] [هاي VP] [شنه NP],"[(power, NN), (is, VBZ), (like, IN), (grains, ...","[[(power, NN)], [[('is', 'VBZ')]], [(like, IN)...",SOV
10086,206774,i know what it is like to try and scrape by wi...,من ميدونم كه زندگي بدون پشتيبان چه جون كندني ميشه,"[i, know, what, it, is, like, to, try, and, sc...","[من, ميدونم, كه, زندگي, بدون, پشتيبان, چه, جون...",14,10,"{it, i, know, to, is, a, like, danna, scrape, ...","{زندگي, چه, من, ميشه, كه, پشتيبان, جون, كندني,...","[(من, PRO), (ميدونم, Ne), (كه, DET), (زندگي, N...",[من NP] [ميدونم كه زندگي NP] [بدون PP] [پشتيبا...,"[(i, NN), (know, VBP), (what, WP), (it, PRP), ...","[[(i, NN)], [[('know', 'VBP')]], (what, WP), (...",SOV
13367,488580,it is like leader who worked so hard with his ...,اين مثل سرپرست هست، کسي که سخت کار مي کنه تا ش...,"[it, is, like, leader, who, worked, so, hard, ...","[اين, مثل, سرپرست, هست،, کسي, که, سخت, کار, مي...",15,15,"{it, with, hammer, who, hard, his, create, is,...","{سرپرست, اين, کنه, مي, کسي, ., هست،, کار, سخت,...","[(اين, N), (مثل, ADVe), (سرپرست, Ne), (هست،, N...",[اين NP] [مثل PP] [سرپرست هست، NP] [کسي VP] که...,"[(it, PRP), (is, VBZ), (like, JJ), (leader, NN...","[(it, PRP), [[('is', 'VBZ')], [('like', 'JJ'),...",SOV
14867,393343,physician lady jang-geum's practice of medicin...,تمرینات بانوی پزشک یانگوم در پزشکی مانند کار ی...,"[physician, lady, jang-geum, 's, practice, of,...","[تمرینات, بانوی, پزشک, یانگوم, در, پزشکی, مانن...",13,11,"{jang-geum, practice, is, a, lady, like, mothe...","{مادر, است, در, کار, بانوی, یانگوم, یک, پزشک, ...","[(تمرینات, Ne), (بانوی, Ne), (پزشک, Ne), (یانگ...",[تمرینات بانوی پزشک یانگوم NP] [در PP] [پزشکی ...,"[(physician, JJ), (lady, NN), (jang-geum, NN),...","[[(physician, JJ), (lady, NN), (jang-geum, NN)...",SOV
16379,367922,duk-gu is like a father to me i'll be right back,داگو مثل پدر من میمونه زود برمیگردم,"[duk-gu, is, like, a, father, to, me, i, 'll, ...","[داگو, مثل, پدر, من, میمونه, زود, برمیگردم]",12,7,"{i, back, to, is, a, right, be, like, duk-gu, ...","{میمونه, برمیگردم, من, زود, پدر, مثل, داگو}","[(داگو, N), (مثل, ADVe), (پدر, Ne), (من, PRO),...",[داگو NP] [مثل PP] [پدر من NP] [میمونه VP] [زو...,"[(duk-gu, NN), (is, VBZ), (like, IN), (a, DT),...","[[(duk-gu, NN)], [[('is', 'VBZ')], [(P like/IN...",SOV


Persian's functional equivalents to the words "like" and "that" are "مثل" and "که" respectively. If you look closely at the data, nearly all the Persian sentences in this dataset use those words! And, conveniently enough, you see that those same sentences maintain the SOV structure that we anticipated before.
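Rather than eyeballing the rows, the check could be automated over the `Far_Tok` column. This is a sketch under a couple of assumptions: `has_marker`, `MARKERS`, and `NORMALIZE` are hypothetical names, and I map the Arabic kaf (U+0643) to the Persian one (U+06A9) because both spellings of "که" show up in the data. The toy rows reuse tokens from the tables above.

```python
import pandas as pd

# Simile markers from the paragraph above, written as escapes to pin the exact code points
MARKERS = {'\u0645\u062B\u0644', '\u06A9\u0647'}  # 'مثل' and 'که'

# Assumption: treat Arabic kaf (U+0643) and Persian kaf (U+06A9) as the same letter
NORMALIZE = str.maketrans({'\u0643': '\u06A9'})

def has_marker(tokens, markers=MARKERS):
    """Return True if any token, after kaf normalization, is one of the markers."""
    normalized = {tok.translate(NORMALIZE) for tok in tokens}
    return not normalized.isdisjoint(markers)

toy = pd.DataFrame({'Far_Tok': [
    ['قدرت', 'مثل', 'دانه', 'هاي', 'شنه'],   # contains the "like" marker
    ['اين', 'هست', 'هنر', 'کفش'],            # no marker
    ['\u0643\u0647'],                        # "that" spelled with an Arabic kaf
]})
toy['Has_Marker'] = toy['Far_Tok'].apply(has_marker)
print(toy['Has_Marker'].tolist())
```

Applied to `like_sov_df`, `Has_Marker.sum()` would give an actual count to back up "nearly all".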

NOTE: More analysis needs to be done!

Now for the SVO data.

### SVO Data
Similar to what we did for the SOV data, let's take a quick look at what we're working with:

In [None]:
svo_df.head()

In [20]:
# Making the indexes easier to work with
svo_df.reset_index(drop = True, inplace = True)

In [21]:
# Mean of the precomputed English token counts (SVO subset)
avg_eng_len2 = svo_df['Eng_Len'].mean()
print('The average English sentence length in the SVO file is...', avg_eng_len2)

The average English sentence length in the SVO file is... 8.849826787428524


In [22]:
# Mean of the precomputed Persian token counts (SVO subset)
avg_far_len2 = svo_df['Far_Len'].mean()
print('The average Persian sentence length in the SVO file is...', avg_far_len2)

The average Persian sentence length in the SVO file is... 8.371760090154012


In [25]:
is_like_bool = []
for line in svo_df['Eng']:
    if 'is like' in line:
        is_like_bool.append(True)
    else:
        is_like_bool.append(False)
has_like = pd.Series(is_like_bool)

In [26]:
like_svo_df = svo_df[has_like]

In [31]:
print(len(like_svo_df))
like_svo_df

13


Unnamed: 0,ID,Eng,Far,Eng_Tok,Far_Tok,Eng_Len,Far_Len,Eng_Types,Far_Types,Far_POS,Far_Chunks,Eng_POS,Eng_Chunks,Word_Order
94,434104,rounds is like being on a game show.,اين ويزيتها شبيه اغاز يه مسابقه تلوزيونيه,"[rounds, is, like, being, on, a, game, show, .]","[اين, ويزيتها, شبيه, اغاز, يه, مسابقه, تلوزيونيه]",9,7,"{rounds, being, on, ., is, a, like, game, show}","{شبيه, يه, اغاز, ويزيتها, تلوزيونيه, اين, مسابقه}","[(اين, DET), (ويزيتها, N), (شبيه, V), (اغاز, N...",[اين ويزيتها NP] [شبيه VP] [اغاز يه مسابقه تلو...,"[(rounds, NNS), (is, VBZ), (like, IN), (being,...","[(rounds, NNS), [[('is', 'VBZ')]], [(like, IN)...",SVO
4192,434767,that song is like a virus.,اين آهنگه مثله خوره ميمونه,"[that, song, is, like, a, virus, .]","[اين, آهنگه, مثله, خوره, ميمونه]",7,5,"{., is, a, song, like, virus, that}","{مثله, ميمونه, آهنگه, اين, خوره}","[(اين, N), (آهنگه, V), (مثله, Ne), (خوره, Ne),...",[اين NP] [آهنگه VP] [مثله خوره ميمونه NP],"[(that, DT), (song, NN), (is, VBZ), (like, IN)...","[[(that, DT), (song, NN)], [[('is', 'VBZ')], [...",SVO
7740,124304,this is like shoe art,اين هست هنر کفش,"[this, is, like, shoe, art]","[اين, هست, هنر, کفش]",5,4,"{art, is, this, like, shoe}","{کفش, اين, هست, هنر}","[(اين, N), (هست, V), (هنر, Ne), (کفش, N)]",[اين NP] [هست VP] [هنر کفش NP],"[(this, DT), (is, VBZ), (like, IN), (shoe, JJ)...","[[(this, DT)], [[('is', 'VBZ')], [(P like/IN),...",SVO
8347,190435,christmas is like me racing back and forth bet...,کريسمس توي کورس من بين مامان و بابا خلاصه ميشه,"[christmas, is, like, me, racing, back, and, f...","[کريسمس, توي, کورس, من, بين, مامان, و, بابا, خ...",12,10,"{between, back, mom, is, christmas, dad, like,...","{کورس, و, ميشه, من, بين, کريسمس, خلاصه, توي, م...","[(کريسمس, N), (توي, V), (کورس, Ne), (من, PRO),...",[کريسمس NP] [توي VP] [کورس من NP] [بين VP] [ما...,"[(christmas, NN), (is, VBZ), (like, IN), (me, ...","[[(christmas, NN)], [[('is', 'VBZ')]], [(like,...",SVO
9775,183359,our love is like a red red rose,عشقمون شبيه يک گل رز قرمزه,"[our, love, is, like, a, red, red, rose]","[عشقمون, شبيه, يک, گل, رز, قرمزه]",8,6,"{red, love, is, a, like, our, rose}","{شبيه, يک, گل, رز, عشقمون, قرمزه}","[(عشقمون, N), (شبيه, V), (يک, NUM), (گل, Ne), ...",[عشقمون NP] [شبيه VP] [يک گل رز قرمزه NP],"[(our, PRP$), (love, NN), (is, VBZ), (like, IN...","[(our, PRP$), [(love, NN)], [[('is', 'VBZ')], ...",SVO
10643,34687,living below is like living in space you dont ...,زندگي اين پايين شبيه زندگي تو فضاست نبايد زياد...,"[living, below, is, like, living, in, space, y...","[زندگي, اين, پايين, شبيه, زندگي, تو, فضاست, نب...",13,11,"{you, very, many, below, is, like, dont, livin...","{شبيه, زندگي, پايين, تو, زياد, نبايد, کني, اين...","[(زندگي, Ne), (اين, DET), (پايين, N), (شبيه, V...",[زندگي اين پايين NP] [شبيه VP] [زندگي تو NP] [...,"[(living, VBG), (below, NN), (is, VBZ), (like,...","[[[('living', 'VBG')], [('below', 'NN')]], [[(...",SVO
12867,82480,to have an indecent behavior is like trampling...,رفتار هاي شرم آور درست شبيه اينه كه خون شهدا ر...,"[to, have, an, indecent, behavior, is, like, t...","[رفتار, هاي, شرم, آور, درست, شبيه, اينه, كه, خ...",14,13,"{trampling, to, on, is, like, of, an, have, in...","{شبيه, آور, پايمال, هاي, كه, شرم, شهدا, رفتار,...","[(رفتار, N), (هاي, V), (شرم, N), (آور, V), (در...",[رفتار NP] [هاي VP] [شرم آور VP] [درست شبيه اي...,"[(to, TO), (have, VB), (an, DT), (indecent, JJ...","[(to, TO), [[('have', 'VB')], [('an', 'DT'), (...",SVO
14413,81908,dying as a martyr is like injecting blood in t...,مردن بعنوان شهيد شبيه تزريق خون در شَريانه جام...,"[dying, as, a, martyr, is, like, injecting, bl...","[مردن, بعنوان, شهيد, شبيه, تزريق, خون, در, شَر...",13,10,"{martyr, injecting, veins, is, a, like, of, so...","{مردن, شبيه, تزريق, ست, شَريانه, جامعه, بعنوان...","[(مردن, N), (بعنوان, Pe), (شهيد, N), (شبيه, V)...",[مردن NP] [بعنوان شهيد شبيه VP] [تزريق خون NP]...,"[(dying, VBG), (as, IN), (a, DT), (martyr, NN)...","[[[('dying', 'VBG')], [(P as/IN), (NP a/DT mar...",SVO
14498,72418,you kidding me this stuff is like waterto me,شوخي ميکني اينجور چيزا براي من مثل آب ميمونن,"[you, kidding, me, this, stuff, is, like, wate...","[شوخي, ميکني, اينجور, چيزا, براي, من, مثل, آب,...",9,9,"{is, waterto, this, like, stuff, kidding, you,...","{ميکني, چيزا, اينجور, من, ميمونن, براي, شوخي, ...","[(شوخي, Ne), (ميکني, AJe), (اينجور, N), (چيزا,...",[شوخي ميکني اينجور NP] [چيزا براي VP] [من NP] ...,"[(you, PRP), (kidding, VBG), (me, PRP), (this,...","[(you, PRP), [[('kidding', 'VBG')]], (me, PRP)...",SVO
15348,339780,exercise is like a religion to me no pun intended,تمرين كردن براي من مثل يک مذهب ميمونه خالي نمي...,"[exercise, is, like, a, religion, to, me, no, ...","[تمرين, كردن, براي, من, مثل, يک, مذهب, ميمونه,...",10,10,"{pun, to, is, a, intended, like, exercise, rel...","{كردن, تمرين, من, يک, نميبندم, خالي, ميمونه, ب...","[(تمرين, N), (كردن, N), (براي, V), (من, PRO), ...",[تمرين كردن NP] [براي VP] [من NP] [مثل PP] [يک...,"[(exercise, NN), (is, VBZ), (like, IN), (a, DT...","[[(exercise, NN)], [[('is', 'VBZ')], [(P like/...",SVO


As mentioned before, مثل and که are the functional words here that indicate an _obvious_ simile. However, only three of the sentences in this DF use those functional words! The other sentences still convey the simile, but render it in a more metaphorical way. This may have an influence on why the structure of the translation is SVO...! However, this is definitely not a conclusion I can make quite yet -- there's much, much, _much_ more to do!
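One way to make this comparison less impressionistic would be a cross-tabulation of word order against marker use. The sketch below uses invented toy rows and a hypothetical `Marker` column (values `'present'` / `'absent'`), so the numbers mean nothing in themselves:

```python
import pandas as pd

# Toy stand-in: one row per "is like" sentence
toy = pd.DataFrame({
    'Word_Order': ['SOV', 'SOV', 'SOV', 'SVO', 'SVO'],
    'Marker': ['present', 'present', 'absent', 'absent', 'present'],
})

# normalize='index' turns each row into proportions that sum to 1
table = pd.crosstab(toy['Word_Order'], toy['Marker'], normalize='index')
print(table)
```

With only 35 SOV and 13 SVO "is like" sentences, any difference in these proportions should be read as a hint rather than a result.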

Plan for the future:
- Build some sort of function that will help me sift through the data faster and search for keywords
- Determine what factors may influence an SVO word order to appear over an SOV one
    - Examine functional words first, and then look for trends
    - Machine learning?
- Potentially train a machine learning model on this data to predict whether a sentence will be SOV or SVO?
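As a starting point for the first item in the plan, a keyword-search helper might look something like this. It is a sketch only: `find_keyword` and its parameters are hypothetical, and it assumes the frame has the `Eng` and `Word_Order` columns used throughout this notebook.

```python
import pandas as pd

def find_keyword(frame, keyword, column='Eng', word_order=None):
    """Return rows whose `column` contains `keyword` as a literal substring.

    If `word_order` is given (e.g. 'SOV'), keep only rows with that label.
    """
    mask = frame[column].str.contains(keyword, regex=False)
    if word_order is not None:
        mask = mask & (frame['Word_Order'] == word_order)
    return frame[mask]

# Toy stand-in built from sentences shown earlier
toy = pd.DataFrame({
    'Eng': ['power is like grains of sand.', 'this is like shoe art', 'you lied to me dan'],
    'Word_Order': ['SOV', 'SVO', 'SOV'],
})

print(len(find_keyword(toy, 'is like')))
print(find_keyword(toy, 'is like', word_order='SVO')['Eng'].tolist())
```

Passing `column='Far'` would search the Persian side instead, which could help with the functional-word item further down the list.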