Merge pull request #135 from MukundVarmaT/tense

add tense transform
GEM-benchmark · Oct 2, 2021 · 736fa20
2 parents 6a550b1 + eca93fd
commit 736fa20
Showing 5 changed files with 348 additions and 0 deletions.
diff --git a/transformations/tense/README.md b/transformations/tense/README.md
@@ -0,0 +1,59 @@
+# Tense Transformation 🦎  + ⌨️ → 🐍
+This transformation converts a sentence from one tense to another, for example from simple present to simple past.
+
+Author names: Tanay Dixit, Mukund Varma T
+
+## What type of a transformation is this?
+
+In this transformation, we convert a sentence into the target tense by conjugating each verb to agree with its subject.
+This keeps the rest of the sentence the same while changing only its time attribute.
+
+The following are some representative examples:
+
+    Input: My father goes to gym every day
+    Target Tense: past
+    Transformed Text: My father went to gym every day
+
+    Input: I went to the park
+    Target Tense: future
+    Transformed Text: I will go to the park
+
+    Input: I will go to the park.
+    Target Tense: present
+    Transformed Text: I go to the park.
+
+## What tasks does it intend to benefit?
+
+The task is designed to measure language understanding in language models, specifically whether they understand the tense of a given sentence.
+It is nominally simple for humans, who understand time and the sequence of events, but difficult for a language model, which has no prior information about time.
+There have been a few attempts at controlled attribute text transformation (Logeswaran et al.), but this has yet to be tested on language models trained in a general setting.
+
+## Citations
+
+```bibtex
+@article{DBLP:journals/corr/abs-1811-01135,
+    author    = {Lajanugen Logeswaran and
+                Honglak Lee and
+                Samy Bengio},
+    title     = {Content preserving text generation with attribute controls},
+    journal   = {CoRR},
+    volume    = {abs/1811.01135},
+    year      = {2018},
+    url       = {http://arxiv.org/abs/1811.01135},
+    archivePrefix = {arXiv},
+    eprint    = {1811.01135},
+    timestamp = {Thu, 22 Nov 2018 17:58:30 +0100},
+    biburl    = {https://dblp.org/rec/journals/corr/abs-1811-01135.bib},
+    bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
+### Data and Source Code
+The tense-change and verb-inflection code is borrowed from https://github.com/bendichter/tenseflow
+
+## What are the limitations of this transformation?
+
+The transformation is not robust to complex cases and is limited to simple past/present/future tense conversions.
+Example where it fails: <br>
+Input: I will go for dinner after I am done playing tennis. <br>
+to_tense: past <br>
+Output: I went for dinner after I was did playing tennis.
diff --git a/transformations/tense/__init__.py b/transformations/tense/__init__.py
@@ -0,0 +1 @@
+from .transformation import *
diff --git a/transformations/tense/requirements.txt b/transformations/tense/requirements.txt
@@ -0,0 +1 @@
+pattern @ git+https://github.com/tanay2001/pattern.git
diff --git a/transformations/tense/test.json b/transformations/tense/test.json
@@ -0,0 +1,117 @@
+{
+   "type": "tense_transformation",
+   "test_cases": [
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I will go to the park."
+         },
+         "outputs": [
+            {
+               "sentence": "I went to the park."
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "It smells very delicious in the kitchen, what are you cooking?"
+         },
+         "outputs": [
+            {
+               "sentence": "It smelt very delicious in the kitchen, what were you cooking?"
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I can come to the party"
+         },
+         "outputs": [
+            {
+               "sentence": "I can came to the party"
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I will go to the park"
+         },
+         "outputs": [
+            {
+               "sentence": "I went to the park"
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I go to the park."
+         },
+         "outputs": [
+            {
+               "sentence": "I went to the park."
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I visit the hospital"
+         },
+         "outputs": [
+            {
+               "sentence": "I visited the hospital"
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "I will go for dinner after I am done playing tennis"
+         },
+         "outputs": [
+            {
+               "sentence": "I went for dinner after I was did playing tennis"
+            }
+         ]
+      },
+      {
+         "class": "TenseTransformation",
+         "args": {
+            "to_tense": "past"
+         },
+         "inputs": {
+            "sentence": "My father goes to gym every day"
+         },
+         "outputs": [
+            {
+               "sentence": "My father went to gym every day"
+            }
+         ]
+      }
+   ]
+}
diff --git a/transformations/tense/transformation.py b/transformations/tense/transformation.py
@@ -0,0 +1,170 @@
+from interfaces.SentenceOperation import SentenceOperation
+from tasks.TaskTypes import TaskType
+import string
+from pattern.en import conjugate, PAST, PRESENT, SINGULAR, PLURAL
+import spacy
+from spacy.symbols import NOUN
+import random
+from initialize import spacy_nlp
+
+SUBJ_DEPS = {'agent', 'csubj', 'csubjpass', 'expl', 'nsubj', 'nsubjpass'}
+
+def _get_conjuncts(tok):
+    """
+    Return conjunct dependents of the leftmost conjunct in a coordinated phrase,
+    e.g. "Burton, [Dan], and [Josh] ...".
+    """
+    return [right for right in tok.rights
+            if right.dep_ == 'conj']
+
+
+def is_plural_noun(token):
+    """
+    Returns True if token is a plural noun, False otherwise.
+    Args:
+        token (``spacy.Token``): parent document must have POS information
+    Returns:
+        bool
+    """
+    if token.doc.is_tagged is False:
+        raise ValueError('token is not POS-tagged')
+    return token.pos == NOUN and token.lemma != token.lower
+
+
+def get_subjects_of_verb(verb):
+    """Return all subjects of a verb according to the dependency parse."""
+    if verb.dep_ == "aux" and list(verb.ancestors):
+        return get_subjects_of_verb(list(verb.ancestors)[0])
+    subjs = [tok for tok in verb.lefts if tok.dep_ in SUBJ_DEPS]
+    # get additional conjunct subjects
+    subjs.extend(tok for subj in subjs for tok in _get_conjuncts(subj))
+    if not len(subjs):
+        ancestors = list(verb.ancestors)
+        if len(ancestors) > 0:
+            return get_subjects_of_verb(ancestors[0])
+    return subjs
+
+
+def is_plural_verb(token):
+    if token.doc.is_tagged is False:
+        raise ValueError('token is not POS-tagged')
+    subjects = get_subjects_of_verb(token)
+    if not len(subjects):
+        return False
+    plural_score = sum([is_plural_noun(x) for x in subjects])/len(subjects)
+
+    return plural_score > .5
+
+def preserve_caps(word, newWord):
+    """Returns newWord, capitalizing it if word is capitalized."""
+    if 'A' <= word[0] <= 'Z':
+        newWord = newWord.capitalize()
+    return newWord
+
+'''
+change tense function borrowed from https://github.com/bendichter/tenseflow/blob/master/tenseflow/change_tense.py
+'''
+
+class TenseTransformation(SentenceOperation):
+    tasks = [
+        TaskType.TEXT_CLASSIFICATION,
+        TaskType.TEXT_TO_TEXT_GENERATION
+    ]
+    languages = ["en"]
+
+    def __init__(self, to_tense):
+        super().__init__()
+        assert to_tense in ['past', 'present', 'future', 'random']
+        self.to_tense = to_tense
+        self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")
+
+    def change_tense(self, text, to_tense):
+        """Change the tense of text, parsing it with the spaCy pipeline in self.nlp.
+        Args:
+            text (str): text to change.
+            to_tense (str): 'present', 'past', or 'future'.
+
+        Returns:
+            str: changed text.
+        """
+        tense_lookup = {'future': 'inf', 'present': PRESENT, 'past': PAST}
+        tense = tense_lookup[to_tense]
+
+        doc = self.nlp(text)
+        # Seed the output with the first token; the loop below skips it.
+        out = list()
+        out.append(doc[0].text)
+        words = []
+        for word in doc:
+            words.append(word)
+            if len(words) == 1:
+                continue
+            if (words[-2].text == 'will' and words[-2].tag_ == 'MD' and words[-1].tag_ == 'VB') or \
+                words[-1].tag_ in ('VBD', 'VBP', 'VBZ', 'VBN') or \
+                (words[-2].text not in ('to', 'not') and words[-1].tag_ == 'VB'):
+
+                if words[-2].text in ('were', 'am', 'is', 'are', 'was') or \
+                    (words[-2].text == 'be' and len(words) > 2 and words[-3].text == 'will'):
+                    this_tense = tense_lookup['past']
+                else:
+                    this_tense = tense
+
+                subjects = [x.text for x in get_subjects_of_verb(words[-1])]
+                if ('I' in subjects) or ('we' in subjects) or ('We' in subjects):
+                    person = 1
+                elif ('you' in subjects) or ('You' in subjects):
+                    person = 2
+                else:
+                    person = 3
+                if is_plural_verb(words[-1]):
+                    number = PLURAL
+                else:
+                    number = SINGULAR
+                if (words[-2].text == 'will' and words[-2].tag_ == 'MD') or words[-2].text == 'had':
+                    out.pop(-1)
+                if to_tense == 'future':
+                    if not (out[-1] == 'will' or out[-1] == 'be'):
+                        out.append('will')
+                    # handle will as a noun in future tense
+                    if words[-2].text == 'will' and words[-2].tag_ == 'NN':
+                        out.append('will')
+                oldWord = words[-1].text
+                out.append(preserve_caps(oldWord, conjugate(oldWord, tense=this_tense, person=person, number=number)))
+            else:
+                out.append(words[-1].text)
+
+            # negation
+            if words[-2].text + words[-1].text in ('didnot', 'donot', 'willnot', "didn't", "don't", "won't"):
+                if tense == PAST:
+                    out[-2] = 'did'
+                elif tense == PRESENT:
+                    out[-2] = 'do'
+                else:
+                    out.pop(-2)
+
+            # future perfect
+            if words[-1].text in ('have', 'has') and len(list(words[-1].ancestors)) and words[-1].dep_ == 'aux':
+                out.pop(-1)
+
+        text_out = ' '.join(out)
+
+        # Remove spaces before/after punctuation:
+        for char in string.punctuation:
+            if char in """(<['""":
+                text_out = text_out.replace(char+' ', char)
+            else:
+                text_out = text_out.replace(' '+char, char)
+
+        for char in ["-", "“", "‘"]:
+            text_out = text_out.replace(char+' ', char)
+        for char in ["…", "”", "'s", "n't"]:
+            text_out = text_out.replace(' '+char, char)
+
+        return text_out
+
+    def generate(self, sentence: str): 
+        """
+        Takes an input sentence and transforms its tense to the target tense.
+        """
+        perturbed_texts = self.change_tense(sentence, to_tense = random.choice(['past', 'present', 'future']) if self.to_tense == 'random' else self.to_tense)
+        return [perturbed_texts]
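
Review note: the punctuation clean-up at the end of `change_tense` can be exercised without spaCy or `pattern`. The following is a minimal sketch that copies those replacement loops into a standalone function. The name `detokenize` and the sample token lists are hypothetical and not part of this PR; the real method builds its token list from the spaCy parse.

```python
import string


def detokenize(tokens):
    """Join tokens with spaces, then remove the spaces that joining
    leaves around punctuation (same rules as in change_tense)."""
    text_out = ' '.join(tokens)

    for char in string.punctuation:
        if char in "(<['":
            # These characters attach to the token on their right.
            text_out = text_out.replace(char + ' ', char)
        else:
            # All other punctuation attaches to the token on its left.
            text_out = text_out.replace(' ' + char, char)

    # Hyphens and opening curly quotes also drop the space after them.
    for char in ["-", "“", "‘"]:
        text_out = text_out.replace(char + ' ', char)
    # Ellipses, closing curly quotes and clitics drop the space before them.
    for char in ["…", "”", "'s", "n't"]:
        text_out = text_out.replace(' ' + char, char)

    return text_out


print(detokenize(['I', 'went', 'to', 'the', 'park', '.']))
print(detokenize(['My', 'father', "'s", 'car']))
print(detokenize(['I', 'do', "n't", 'go']))
```

Because the rules are plain string replacements, they do not distinguish opening from closing straight double quotes, and a free-standing hyphen loses the space on both sides, so `['well', '-', 'known']` comes back as `well-known`.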