Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #135 from MukundVarmaT/tense
add tense transform
Showing 5 changed files with 348 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Tense Transformation 🦎 + ⌨️ → 🐍 | ||
This transformation converts a sentence from one tense to another, for example from simple present to simple past. | ||
|
||
Author names: Tanay Dixit, Mukund Varma T | ||
|
||
## What type of a transformation is this? | ||
|
||
In this transformation, we convert a sentence into the target tense by conjugating each verb so that it agrees with its subject. | ||
The content of the sentence is intended to stay the same while only its time attribute changes. | ||
|
||
The following are some representative examples: | ||
|
||
Input: My father goes to gym every day | ||
Target Tense: past | ||
Transformed Text: My father went to gym every day | ||
|
||
Input: I went to the park | ||
Target Tense: future | ||
Transformed Text: I will go to the park | ||
|
||
Input: I will go to the park. | ||
Target Tense: present | ||
Transformed Text: I go to the park. | ||
|
||
## What tasks does it intend to benefit? | ||
|
||
The task is designed to measure language understanding in language models, specifically their ability to understand the tense of a given sentence. | ||
It is nominally simple for humans, who understand time and the ordering of events, but difficult for a language model, which has no explicit prior information about time. | ||
There have been a few attempts at controlled attribute text transformation (Logeswaran et al., 2018), but this has yet to be studied for language models trained in a general setting. | ||
|
||
## Citations | ||
|
||
```bibtex | ||
@article{DBLP:journals/corr/abs-1811-01135, | ||
author = {Lajanugen Logeswaran and | ||
Honglak Lee and | ||
Samy Bengio}, | ||
title = {Content preserving text generation with attribute controls}, | ||
journal = {CoRR}, | ||
volume = {abs/1811.01135}, | ||
year = {2018}, | ||
url = {http://arxiv.org/abs/1811.01135}, | ||
archivePrefix = {arXiv}, | ||
eprint = {1811.01135}, | ||
timestamp = {Thu, 22 Nov 2018 17:58:30 +0100}, | ||
biburl = {https://dblp.org/rec/journals/corr/abs-1811-01135.bib}, | ||
bibsource = {dblp computer science bibliography, https://dblp.org} | ||
} | ||
``` | ||
### Data and Source Code | ||
The tense-change and verb-inflection code is borrowed from https://github.com/bendichter/tenseflow | ||
|
||
## What are the limitations of this transformation? | ||
|
||
The transformation is not robust to complex sentences and handles only simple past, present, and future tense conversions. | ||
Example where it fails: <br> | ||
Input: I will go for dinner after I am done playing tennis. | ||
to_tense: past | ||
Output: I went for dinner after I was did playing tennis. |
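The failure above is consistent with a rule that conjugates each verb token on its own, without checking whether a participle belongs to a larger verb group. The sketch below is a toy illustration only: the mini-lexicon, the tag assignments, and the names `to_past` and `skip_participle_after_be` are hypothetical and are not part of this commit or of the `pattern` library. It reproduces the "was did" output with a token-local rule and shows one possible guard.

```python
# Hypothetical mini-lexicon for illustration: surface form -> (lemma, tag).
LEXICON = {
    "am": ("be", "VBP"),
    "done": ("do", "VBN"),
    "playing": ("play", "VBG"),
}
# Simple-past forms for the lemmas above (singular subject assumed).
SIMPLE_PAST = {"be": "was", "do": "did"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been"}
# Tags that the token-local rule rewrites into the simple past.
REWRITTEN_TAGS = {"VBD", "VBP", "VBZ", "VBN"}


def to_past(sentence, skip_participle_after_be=False):
    """Rewrite each tagged verb token into the simple past, one token at a time."""
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens):
        lemma, tag = LEXICON.get(tok.lower(), (tok, None))
        prev = tokens[i - 1].lower() if i > 0 else ""
        # Optional guard: leave a past participle alone when it follows a form of "be".
        guarded = skip_participle_after_be and tag == "VBN" and prev in BE_FORMS
        if tag in REWRITTEN_TAGS and not guarded:
            out.append(SIMPLE_PAST[lemma])
        else:
            out.append(tok)
    return " ".join(out)


print(to_past("I am done playing tennis"))
print(to_past("I am done playing tennis", skip_participle_after_be=True))
```

The first call prints "I was did playing tennis" and the second prints "I was done playing tennis". Whether a guard of this kind would be appropriate for the real transformation is an open question; it is shown only to make the failure mode concrete.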
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
from .transformation import * |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
pattern @ git+https://github.com/tanay2001/pattern.git |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
{ | ||
"type": "tense_transformation", | ||
"test_cases": [ | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I will go to the park." | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I went to the park." | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "It smells very delicious in the kitchen, what are you cooking?" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "It smelt very delicious in the kitchen, what were you cooking?" | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I can come to the party" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I can came to the party" | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I will go to the park" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I went to the park" | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I go to the park." | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I went to the park." | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I visit the hospital" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I visited the hospital" | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "I will go for dinner after I am done playing tennis" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "I went for dinner after I was did playing tennis" | ||
} | ||
] | ||
}, | ||
{ | ||
"class": "TenseTransformation", | ||
"args": { | ||
"to_tense": "past" | ||
}, | ||
"inputs": { | ||
"sentence": "My father goes to gym every day" | ||
}, | ||
"outputs": [ | ||
{ | ||
"sentence": "My father went to gym every day" | ||
} | ||
] | ||
} | ||
] | ||
} |
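Each test case above names a class, its constructor arguments, an input sentence, and the expected output sentences. The repository's actual test harness is not part of this diff, so the following is only a sketch of how such a file could be checked. `StubTransformation`, `check_cases`, and the canned outputs are hypothetical stand-ins; the real `TenseTransformation` needs spaCy and `pattern` and is not imported here.

```python
import json

# Two cases copied from the test file above, trimmed to keep the sketch short.
TEST_JSON = """
{
  "type": "tense_transformation",
  "test_cases": [
    {
      "class": "TenseTransformation",
      "args": {"to_tense": "past"},
      "inputs": {"sentence": "I go to the park."},
      "outputs": [{"sentence": "I went to the park."}]
    },
    {
      "class": "TenseTransformation",
      "args": {"to_tense": "past"},
      "inputs": {"sentence": "I visit the hospital"},
      "outputs": [{"sentence": "I visited the hospital"}]
    }
  ]
}
"""


class StubTransformation:
    """Hypothetical stand-in with the same constructor and generate() shape."""

    CANNED = {
        ("I go to the park.", "past"): "I went to the park.",
        ("I visit the hospital", "past"): "I visited the hospital",
    }

    def __init__(self, to_tense):
        self.to_tense = to_tense

    def generate(self, sentence):
        # Return a one-element list, mirroring generate() in this commit.
        return [self.CANNED[(sentence, self.to_tense)]]


def check_cases(spec, registry):
    """Run every test case and collect (inputs, expected, actual) mismatches."""
    failures = []
    for case in spec["test_cases"]:
        operation = registry[case["class"]](**case["args"])
        expected = [output["sentence"] for output in case["outputs"]]
        actual = operation.generate(**case["inputs"])
        if actual != expected:
            failures.append((case["inputs"], expected, actual))
    return failures


spec = json.loads(TEST_JSON)
failures = check_cases(spec, {"TenseTransformation": StubTransformation})
print(failures)
```

Note that the file records the transformation's current behaviour, including known-bad outputs such as "I can came to the party", so a check like this guards against regressions rather than asserting grammatical correctness.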
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
from interfaces.SentenceOperation import SentenceOperation | ||
from tasks.TaskTypes import TaskType | ||
import string | ||
from pattern.en import conjugate, PAST, PRESENT, SINGULAR, PLURAL | ||
import spacy | ||
from spacy.symbols import NOUN | ||
import random | ||
from initialize import spacy_nlp | ||
|
||
SUBJ_DEPS = {'agent', 'csubj', 'csubjpass', 'expl', 'nsubj', 'nsubjpass'} | ||
|
||
def _get_conjuncts(tok): | ||
    """ | ||
    Return conjunct dependents of the leftmost conjunct in a coordinated phrase, | ||
    e.g. "Burton, [Dan], and [Josh] ...". | ||
    """ | ||
    return [right for right in tok.rights | ||
            if right.dep_ == 'conj'] | ||
|
||
|
||
def is_plural_noun(token): | ||
    """ | ||
    Returns True if token is a plural noun, False otherwise. | ||
    Args: | ||
        token (``spacy.Token``): parent document must have POS information | ||
    Returns: | ||
        bool | ||
    """ | ||
    if token.doc.is_tagged is False: | ||
        raise ValueError('token is not POS-tagged') | ||
    # a noun whose surface form differs from its lemma is treated as plural | ||
    return token.pos == NOUN and token.lemma != token.lower | ||
|
||
|
||
def get_subjects_of_verb(verb): | ||
    """Return all subjects of a verb according to the dependency parse.""" | ||
    # an auxiliary takes the subjects of its head verb | ||
    if verb.dep_ == "aux" and list(verb.ancestors): | ||
        return get_subjects_of_verb(list(verb.ancestors)[0]) | ||
    subjs = [tok for tok in verb.lefts if tok.dep_ in SUBJ_DEPS] | ||
    # get additional conjunct subjects | ||
    subjs.extend(tok for subj in subjs for tok in _get_conjuncts(subj)) | ||
    if not len(subjs): | ||
        ancestors = list(verb.ancestors) | ||
        if len(ancestors) > 0: | ||
            return get_subjects_of_verb(ancestors[0]) | ||
    return subjs | ||
|
||
|
||
def is_plural_verb(token): | ||
    if token.doc.is_tagged is False: | ||
        raise ValueError('token is not POS-tagged') | ||
    subjects = get_subjects_of_verb(token) | ||
    if not len(subjects): | ||
        return False | ||
    plural_score = sum([is_plural_noun(x) for x in subjects])/len(subjects) | ||
|
||
    return plural_score > .5 | ||
|
||
def preserve_caps(word, newWord): | ||
    """Returns newWord, capitalizing it if word is capitalized.""" | ||
    if word[0] >= 'A' and word[0] <= 'Z': | ||
        newWord = newWord.capitalize() | ||
    return newWord | ||
|
||
''' | ||
change tense function borrowed from https://github.com/bendichter/tenseflow/blob/master/tenseflow/change_tense.py | ||
''' | ||
|
||
class TenseTransformation(SentenceOperation): | ||
    tasks = [ | ||
        TaskType.TEXT_CLASSIFICATION, | ||
        TaskType.TEXT_TO_TEXT_GENERATION | ||
    ] | ||
    languages = ["en"] | ||
|
||
    def __init__(self, to_tense): | ||
        super().__init__() | ||
        assert to_tense in ['past', 'present', 'future', 'random'] | ||
        self.to_tense = to_tense | ||
        self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm") | ||
|
||
    def change_tense(self, text, to_tense): | ||
        """Change the tense of text. | ||
        Args: | ||
            text (str): text to change. | ||
            to_tense (str): 'present', 'past', or 'future' | ||
        Returns: | ||
            str: changed text. | ||
        """ | ||
        tense_lookup = {'future': 'inf', 'present': PRESENT, 'past': PAST} | ||
        tense = tense_lookup[to_tense] | ||
|
||
        doc = self.nlp(text) | ||
        out = list() | ||
        out.append(doc[0].text) | ||
        words = [] | ||
        for word in doc: | ||
            words.append(word) | ||
            if len(words) == 1: | ||
                continue | ||
            if (words[-2].text == 'will' and words[-2].tag_ == 'MD' and words[-1].tag_ == 'VB') or \ | ||
                    words[-1].tag_ in ('VBD', 'VBP', 'VBZ', 'VBN') or \ | ||
                    (not words[-2].text in ('to', 'not') and words[-1].tag_ == 'VB'): | ||
|
||
                if words[-2].text in ('were', 'am', 'is', 'are', 'was') or \ | ||
                        (words[-2].text == 'be' and len(words) > 2 and words[-3].text == 'will'): | ||
                    this_tense = tense_lookup['past'] | ||
                else: | ||
                    this_tense = tense | ||
|
||
                subjects = [x.text for x in get_subjects_of_verb(words[-1])] | ||
                if ('I' in subjects) or ('we' in subjects) or ('We' in subjects): | ||
                    person = 1 | ||
                elif ('you' in subjects) or ('You' in subjects): | ||
                    person = 2 | ||
                else: | ||
                    person = 3 | ||
                if is_plural_verb(words[-1]): | ||
                    number = PLURAL | ||
                else: | ||
                    number = SINGULAR | ||
                if (words[-2].text == 'will' and words[-2].tag_ == 'MD') or words[-2].text == 'had': | ||
                    out.pop(-1) | ||
                if to_tense == 'future': | ||
                    if not (out[-1] == 'will' or out[-1] == 'be'): | ||
                        out.append('will') | ||
                    # handle will as a noun in future tense | ||
                    if words[-2].text == 'will' and words[-2].tag_ == 'NN': | ||
                        out.append('will') | ||
                oldWord = words[-1].text | ||
                out.append(preserve_caps(oldWord, conjugate(oldWord, tense=this_tense, person=person, number=number))) | ||
            else: | ||
                out.append(words[-1].text) | ||
|
||
            # negation | ||
            if words[-2].text + words[-1].text in ('didnot', 'donot', 'willnot', "didn't", "don't", "won't"): | ||
                if tense == PAST: | ||
                    out[-2] = 'did' | ||
                elif tense == PRESENT: | ||
                    out[-2] = 'do' | ||
                else: | ||
                    out.pop(-2) | ||
|
||
            # future perfect | ||
            if words[-1].text in ('have', 'has') and len(list(words[-1].ancestors)) and words[-1].dep_ == 'aux': | ||
                out.pop(-1) | ||
|
||
        text_out = ' '.join(out) | ||
|
||
        # Remove spaces before/after punctuation: | ||
        for char in string.punctuation: | ||
            if char in """(<['""": | ||
                text_out = text_out.replace(char+' ', char) | ||
            else: | ||
                text_out = text_out.replace(' '+char, char) | ||
|
||
        for char in ["-", "“", "‘"]: | ||
            text_out = text_out.replace(char+' ', char) | ||
        for char in ["…", "”", "'s", "n't"]: | ||
            text_out = text_out.replace(' '+char, char) | ||
|
||
        return text_out | ||
|
||
    def generate(self, sentence: str): | ||
        """ | ||
        Takes an input sentence and transforms its tense to the target tense. | ||
        """ | ||
        to_tense = self.to_tense | ||
        if to_tense == 'random': | ||
            # pick a concrete target tense for this call | ||
            to_tense = random.choice(['past', 'present', 'future']) | ||
        perturbed_text = self.change_tense(sentence, to_tense) | ||
        return [perturbed_text] |
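The last step of `change_tense` joins the output tokens with spaces and then removes the spaces that tokenization introduced around punctuation. The sketch below isolates that step so it can be tried without spaCy or `pattern`. The function name `tidy_spacing` is hypothetical; the replacement rules are copied from the method above.

```python
import string


def tidy_spacing(text_out):
    """Remove the spaces around punctuation left by joining tokens with ' '."""
    for char in string.punctuation:
        if char in "(<['":
            # opening characters: drop the space that follows them
            text_out = text_out.replace(char + " ", char)
        else:
            # everything else: drop the space that precedes it
            text_out = text_out.replace(" " + char, char)
    for char in ["-", "“", "‘"]:
        text_out = text_out.replace(char + " ", char)
    for char in ["…", "”", "'s", "n't"]:
        text_out = text_out.replace(" " + char, char)
    return text_out


tokens = ["It", "smelt", "very", "delicious", "in", "the", "kitchen", ",",
          "what", "were", "you", "cooking", "?"]
print(tidy_spacing(" ".join(tokens)))
print(tidy_spacing("I do n't go ."))
```

Because the rules are plain string replacements, they also apply to spacing that was present in the input. For instance, a spaced hyphen such as "well - known" comes out as "well-known".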