Merge pull request #135 from MukundVarmaT/tense
add tense transform
kaustubhdhole committed Oct 2, 2021
2 parents 6a550b1 + eca93fd commit 736fa20
Showing 5 changed files with 348 additions and 0 deletions.
59 changes: 59 additions & 0 deletions transformations/tense/README.md
@@ -0,0 +1,59 @@
# Tense Transformation 🦎 + ⌨️ → 🐍
This transformation converts a sentence from one tense to another, for example from simple present to simple past.

Author names: Tanay Dixit, Mukund Varma T

## What type of a transformation is this?

This transformation rewrites a sentence in the target tense by re-conjugating each verb so that it still agrees with its subject in person and number.
The content of the sentence stays the same; only its time reference changes.
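
To illustrate why the subject matters when a verb is re-conjugated, here is a minimal sketch using a toy lookup table for the verb "be". The table and the helper names (`person_of`, `number_of`, `conjugate_be`) are assumptions made for this illustration; the transformation itself delegates conjugation to the `pattern` library and reads subjects from a dependency parse.

```python
# Toy stand-in for a conjugation library: singular forms of "be" keyed by
# (tense, person). Only this one verb is covered.
BE_SINGULAR = {
    ("present", 1): "am",
    ("present", 2): "are",
    ("present", 3): "is",
    ("past", 1): "was",
    ("past", 2): "were",
    ("past", 3): "was",
}


def person_of(subject):
    """Map a subject word to grammatical person (1, 2 or 3)."""
    word = subject.lower()
    if word in ("i", "we"):
        return 1
    if word == "you":
        return 2
    return 3


def number_of(subject):
    """Very rough plural check on pronouns, for illustration only."""
    return "plural" if subject.lower() in ("we", "they") else "singular"


def conjugate_be(subject, tense):
    """Pick the form of "be" that agrees with the subject in the given tense."""
    if number_of(subject) == "plural":
        return "are" if tense == "present" else "were"
    return BE_SINGULAR[(tense, person_of(subject))]


examples = [
    (s, conjugate_be(s, "present"), conjugate_be(s, "past"))
    for s in ("I", "you", "she", "they")
]
for subject, present, past in examples:
    print(f"{subject}: {present} -> {past}")
```

The same verb maps to different surface forms depending on the subject, which is why the tense change cannot be done by a verb-only substitution.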

The following are some representative examples:

Input: My father goes to gym every day
Target Tense: past
Transformed Text: My father went to gym every day

Input: I went to the park
Target Tense: future
Transformed Text: I will go to the park

Input: I will go to the park.
Target Tense: present
Transformed Text: I go to the park.

## What tasks does it intend to benefit?

This transformation is designed to measure language understanding in language models, specifically how well they handle the tense of a sentence.
The task is nominally simple for humans, who understand time and the ordering of events, but is difficult for a language model, which has no prior information about time.
There have been a few attempts at controlled attribute text transformation (Logeswaran et al., 2018), but this has yet to be studied on language models trained in a general setting.

## Citations

```bibtex
@article{DBLP:journals/corr/abs-1811-01135,
author = {Lajanugen Logeswaran and
Honglak Lee and
Samy Bengio},
title = {Content preserving text generation with attribute controls},
journal = {CoRR},
volume = {abs/1811.01135},
year = {2018},
url = {http://arxiv.org/abs/1811.01135},
archivePrefix = {arXiv},
eprint = {1811.01135},
timestamp = {Thu, 22 Nov 2018 17:58:30 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1811-01135.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Data and Source Code
The tense-change and verb-inflection code is borrowed from https://github.com/bendichter/tenseflow

## What are the limitations of this transformation?

The transformation is not robust to complex sentences and only supports conversions between simple past, present, and future tense.
Example where it fails: <br>
Input: I will go for dinner after I am done playing tennis.
to_tense: past
Output: I went for dinner after I was did playing tennis.
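
One plausible reading of this failure is that the per-token rule treats the participle "done" (tagged `VBN`) like any other verb and conjugates it on its own. The sketch below replays a simplified version of that rule on hand-tagged tokens. The tags are written by hand rather than produced by spaCy, and `TOY_PAST` is a hypothetical stand-in for the conjugation library, so this is an approximation of the behaviour and not the transformation's actual code path.

```python
# Hand-written (text, Penn Treebank tag) pairs; in the real code the tags
# come from a spaCy pipeline.
TOKENS = [
    ("I", "PRP"), ("will", "MD"), ("go", "VB"), ("for", "IN"),
    ("dinner", "NN"), ("after", "IN"), ("I", "PRP"), ("am", "VBP"),
    ("done", "VBN"), ("playing", "VBG"), ("tennis", "NN"),
]

# Hypothetical stand-in for a conjugation call: only the forms needed here.
TOY_PAST = {"go": "went", "am": "was", "done": "did"}

# Tags whose tokens are re-conjugated; note that this includes participles (VBN).
CONJUGATED_TAGS = {"VBD", "VBP", "VBZ", "VBN"}


def to_past(tokens):
    """Apply a simplified per-token past-tense rule to hand-tagged tokens."""
    out = [tokens[0][0]]
    for (prev_text, prev_tag), (text, tag) in zip(tokens, tokens[1:]):
        after_will = prev_text == "will" and prev_tag == "MD"
        is_target = tag in CONJUGATED_TAGS or (
            tag == "VB" and prev_text not in ("to", "not")
        )
        if is_target:
            if after_will:
                out.pop()  # drop the future auxiliary "will"
            out.append(TOY_PAST.get(text, text))
        else:
            out.append(text)
    return " ".join(out)


result = to_past(TOKENS)
print(result)
```

Because each token is decided from its own tag and its immediate predecessor, the rule has no notion that "am done" is a single predicate, so "done" is turned into "did".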
1 change: 1 addition & 0 deletions transformations/tense/__init__.py
@@ -0,0 +1 @@
from .transformation import *
1 change: 1 addition & 0 deletions transformations/tense/requirements.txt
@@ -0,0 +1 @@
pattern @ git+https://github.com/tanay2001/pattern.git
117 changes: 117 additions & 0 deletions transformations/tense/test.json
@@ -0,0 +1,117 @@
{
"type": "tense_transformation",
"test_cases": [
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I will go to the park."
},
"outputs": [
{
"sentence": "I went to the park."
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "It smells very delicious in the kitchen, what are you cooking?"
},
"outputs": [
{
"sentence": "It smelt very delicious in the kitchen, what were you cooking?"
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I can come to the party"
},
"outputs": [
{
"sentence": "I can came to the party"
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I will go to the park"
},
"outputs": [
{
"sentence": "I went to the park"
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I go to the park."
},
"outputs": [
{
"sentence": "I went to the park."
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I visit the hospital"
},
"outputs": [
{
"sentence": "I visited the hospital"
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "I will go for dinner after I am done playing tennis"
},
"outputs": [
{
"sentence": "I went for dinner after I was did playing tennis"
}
]
},
{
"class": "TenseTransformation",
"args": {
"to_tense": "past"
},
"inputs": {
"sentence": "My father goes to gym every day"
},
"outputs": [
{
"sentence": "My father went to gym every day"
}
]
}
]
}
170 changes: 170 additions & 0 deletions transformations/tense/transformation.py
@@ -0,0 +1,170 @@
from interfaces.SentenceOperation import SentenceOperation
from tasks.TaskTypes import TaskType
import string
from pattern.en import conjugate, PAST, PRESENT, SINGULAR, PLURAL
import spacy
from spacy.symbols import NOUN
import random
from initialize import spacy_nlp

SUBJ_DEPS = {'agent', 'csubj', 'csubjpass', 'expl', 'nsubj', 'nsubjpass'}

def _get_conjuncts(tok):
    """
    Return conjunct dependents of the leftmost conjunct in a coordinated phrase,
    e.g. "Burton, [Dan], and [Josh] ...".
    """
    return [right for right in tok.rights
            if right.dep_ == 'conj']


def is_plural_noun(token):
    """
    Returns True if token is a plural noun, False otherwise.

    Args:
        token (``spacy.Token``): parent document must have POS information

    Returns:
        bool
    """
    if token.doc.is_tagged is False:
        raise ValueError('token is not POS-tagged')
    return token.pos == NOUN and token.lemma != token.lower


def get_subjects_of_verb(verb):
    """Return all subjects of a verb according to the dependency parse."""
    # An auxiliary carries no subject of its own; look at its first ancestor.
    if verb.dep_ == "aux" and list(verb.ancestors):
        return get_subjects_of_verb(list(verb.ancestors)[0])
    subjs = [tok for tok in verb.lefts if tok.dep_ in SUBJ_DEPS]
    # get additional conjunct subjects
    subjs.extend(tok for subj in subjs for tok in _get_conjuncts(subj))
    # No subject found: fall back to the subjects of the first ancestor.
    if not len(subjs):
        ancestors = list(verb.ancestors)
        if len(ancestors) > 0:
            return get_subjects_of_verb(ancestors[0])
    return subjs


def is_plural_verb(token):
    """Return True if more than half of the verb's subjects are plural nouns."""
    if token.doc.is_tagged is False:
        raise ValueError('token is not POS-tagged')
    subjects = get_subjects_of_verb(token)
    if not len(subjects):
        return False
    plural_score = sum([is_plural_noun(x) for x in subjects])/len(subjects)

    return plural_score > .5

def preserve_caps(word, newWord):
    """Returns newWord, capitalizing it if word is capitalized."""
    if word[0] >= 'A' and word[0] <= 'Z':
        newWord = newWord.capitalize()
    return newWord

'''
change tense function borrowed from https://github.com/bendichter/tenseflow/blob/master/tenseflow/change_tense.py
'''

class TenseTransformation(SentenceOperation):
    tasks = [
        TaskType.TEXT_CLASSIFICATION,
        TaskType.TEXT_TO_TEXT_GENERATION
    ]
    languages = ["en"]

    def __init__(self, to_tense):
        super().__init__()
        assert to_tense in ['past', 'present', 'future', 'random']
        self.to_tense = to_tense
        self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")

    def change_tense(self, text, to_tense):
        """Change the tense of text.

        Args:
            text (str): text to change.
            to_tense (str): 'present', 'past', or 'future'

        Returns:
            str: changed text.
        """
        tense_lookup = {'future': 'inf', 'present': PRESENT, 'past': PAST}
        tense = tense_lookup[to_tense]

        doc = self.nlp(text)
        out = list()
        out.append(doc[0].text)
        words = []
        for word in doc:
            words.append(word)
            if len(words) == 1:
                continue
            if (words[-2].text == 'will' and words[-2].tag_ == 'MD' and words[-1].tag_ == 'VB') or \
                    words[-1].tag_ in ('VBD', 'VBP', 'VBZ', 'VBN') or \
                    (not words[-2].text in ('to', 'not') and words[-1].tag_ == 'VB'):

                if words[-2].text in ('were', 'am', 'is', 'are', 'was') or \
                        (words[-2].text == 'be' and len(words) > 2 and words[-3].text == 'will'):
                    this_tense = tense_lookup['past']
                else:
                    this_tense = tense

                # person and number are taken from the verb's subjects
                subjects = [x.text for x in get_subjects_of_verb(words[-1])]
                if ('I' in subjects) or ('we' in subjects) or ('We' in subjects):
                    person = 1
                elif ('you' in subjects) or ('You' in subjects):
                    person = 2
                else:
                    person = 3
                if is_plural_verb(words[-1]):
                    number = PLURAL
                else:
                    number = SINGULAR
                if (words[-2].text == 'will' and words[-2].tag_ == 'MD') or words[-2].text == 'had':
                    out.pop(-1)
                if to_tense == 'future':
                    if not (out[-1] == 'will' or out[-1] == 'be'):
                        out.append('will')
                    # handle will as a noun in future tense
                    if words[-2].text == 'will' and words[-2].tag_ == 'NN':
                        out.append('will')
                oldWord = words[-1].text
                out.append(preserve_caps(oldWord, conjugate(oldWord, tense=this_tense, person=person, number=number)))
            else:
                out.append(words[-1].text)

            # negation
            if words[-2].text + words[-1].text in ('didnot', 'donot', 'willnot', "didn't", "don't", "won't"):
                if tense == PAST:
                    out[-2] = 'did'
                elif tense == PRESENT:
                    out[-2] = 'do'
                else:
                    out.pop(-2)

            # future perfect
            if words[-1].text in ('have', 'has') and len(list(words[-1].ancestors)) and words[-1].dep_ == 'aux':
                out.pop(-1)

        text_out = ' '.join(out)

        # Remove spaces before/after punctuation:
        for char in string.punctuation:
            if char in """(<['""":
                text_out = text_out.replace(char+' ', char)
            else:
                text_out = text_out.replace(' '+char, char)

        for char in ["-", "“", "‘"]:
            text_out = text_out.replace(char+' ', char)
        for char in ["…", "”", "'s", "n't"]:
            text_out = text_out.replace(' '+char, char)

        return text_out

    def generate(self, sentence: str):
        """
        Takes an input sentence and transforms its tense to the target tense.
        """
        if self.to_tense == 'random':
            to_tense = random.choice(['past', 'present', 'future'])
        else:
            to_tense = self.to_tense
        perturbed_texts = self.change_tense(sentence, to_tense=to_tense)
        return [perturbed_texts]
