Merge pull request #80 from tanfiona/main
Add NegationStrengthen
kaustubhdhole committed Aug 2, 2021
2 parents 9c7ab91 + a1836e5 commit 1275179
Showing 5 changed files with 623 additions and 0 deletions.
47 changes: 47 additions & 0 deletions transformations/negate_strengthen/README.md
@@ -0,0 +1,47 @@
# Causal Negation & Strengthen 🦎 + ⌨️ → 🐍
This transformation is targeted at augmenting causal relations in text and adapts the code from the paper ['Causal Augmentation for Causal Sentence Classification'](https://openreview.net/pdf/17eafef9e25b48eb90a9a7f32c4f52e21177cc73.pdf) (Anon, 2021). In a nutshell, we have two operations:
1. **Causal Negation:** We introduce negative words such as "not", "no", and "did not" into sentences to unlink the causal relation.
2. **Causal Strengthening:** We strengthen the causal meaning by converting weaker modal words like "may" to "will" to assert causal strength.

Users can amend the causal meaning either automatically, starting from the root word of the sentence, or by explicitly specifying the index of the word they wish to amend. Additionally, we include WordNet synonyms and tense matching to allow for more natural augmentations.
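
The two operations can be illustrated with a minimal sketch. This is not the code from the paper or from this transformation: the names `negate_at`, `strengthen`, and `MODAL_MAP` are hypothetical, the entry for "might" is an assumption, and the sketch covers only the simplest cases (whitespace tokens, no WordNet lookups, no tense matching).

```python
# Hypothetical, simplified illustration; NOT the transformation's actual implementation.

# Toy mapping from weaker modals to a stronger one.
# "may" -> "will" follows the README; "might" is an added assumption.
MODAL_MAP = {"may": "will", "might": "will"}


def negate_at(sentence: str, signal_id: int) -> str:
    """Insert "not" before the whitespace token at index `signal_id`."""
    tokens = sentence.split()
    if not 0 <= signal_id < len(tokens):
        raise IndexError("signal_id is outside the sentence")
    return " ".join(tokens[:signal_id] + ["not"] + tokens[signal_id:])


def strengthen(sentence: str) -> str:
    """Replace the first weak modal found with its stronger counterpart."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in MODAL_MAP:
            tokens[i] = MODAL_MAP[tok.lower()]
            break
    return " ".join(tokens)


print(negate_at("She is related to John.", 2))
print(strengthen("Moreover, TT genotype may reduce the risk of CAD in diabetic patients."))
```

An explicit index keeps the edit predictable when a sentence contains several candidate words; the automatic, root-word route would need a dependency parse, which this sketch deliberately leaves out.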

##### Example Negation:
```
"TyG is effective to identify individuals at risk for NAFLD." | "Direct Causal"
--> "TyG is ineffective to identify individuals at risk for NAFLD." | "No Relationship"
```

##### Example Strengthen:
```
"Moreover, TT genotype may reduce the risk of CAD in diabetic patients." | "Conditional Causal"
--> "Moreover, TT genotype will reduce the risk of CAD in diabetic patients." | "Direct Causal"
```

Original test sentences are drawn from the [AltLex corpus (Hidey et al., 2016)](https://github.com/chridey/altlex) and the [PubMed corpus (Yu et al., 2019)](https://github.com/junwang4/causal-language-use-in-science). More expected examples and outputs, grouped by grammar method, are available in the Appendix of the [aforementioned paper](https://openreview.net/pdf/17eafef9e25b48eb90a9a7f32c4f52e21177cc73.pdf).

Note: This augmentation may also work for general relations (shown below), but this has not been properly investigated. Longer sentences might result in unnatural edits.
```
"She is related to John." | "Direct Relation"
--> "She is not related to John." | "No Relationship"
```

In summary, the currently available target transformations are:
* "Direct Causal" -> "No Relationship"
* "Conditional Causal" -> "Direct Causal"
* "Direct Relation" -> "No Relationship"
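
The label transitions listed above can be written down as a simple lookup table. The names `TARGET_TRANSITIONS` and `new_target` are hypothetical and serve only as an illustration; the transformation itself may represent this mapping differently.

```python
# Hypothetical helper mirroring the label transitions listed above.
from typing import Optional

TARGET_TRANSITIONS = {
    "Direct Causal": "No Relationship",     # negation
    "Conditional Causal": "Direct Causal",  # strengthening
    "Direct Relation": "No Relationship",   # negation of a general relation
}


def new_target(target: str) -> Optional[str]:
    """Return the label after transformation, or None if the label is unsupported."""
    return TARGET_TRANSITIONS.get(target)


print(new_target("Conditional Causal"))
print(new_target("Some Other Label"))
```

Returning `None` for an unknown label lets a caller skip unsupported examples instead of failing on them.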

**Author name:** Fiona Anting Tan <br>
**Author email:** tan.f@u.nus.edu <br>
**Author Affiliation:** Institute of Data Science, National University of Singapore

## What type of a transformation is this?
This transformation acts as a perturbation to test robustness. The root word or signal word of a sentence is selected for negation or strengthening via insert/replace operations. Generated transformations display high similarity to the source sentences, i.e. the code outputs highly precise generations.
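
One rough way to see how close an output stays to its source is a token-level similarity ratio. The sketch below uses Python's standard `difflib`; this metric is an illustrative choice rather than one reported by the authors, and the helper name `token_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def token_similarity(source: str, output: str) -> float:
    """Ratio in [0, 1] of matching whitespace tokens between two sentences."""
    return SequenceMatcher(None, source.split(), output.split()).ratio()


src = "She is related to John."
out = "She is not related to John."
# Five of the eleven tokens across both sentences pair up, so the ratio is 2 * 5 / 11.
print(round(token_similarity(src, out), 2))  # 0.91
```

A single inserted or replaced word leaves the ratio close to 1, which matches the insert/replace nature of the edits described here.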

## What tasks does it intend to benefit?
This perturbation would benefit tasks that take a cause-effect sentence, paragraph, or document as input, such as text classification and text generation. In the main paper, the authors report improved performance and generalisability to out-of-domain contexts across models.

## Previous Work
This work is predominantly inspired by the aforementioned paper, ['Causal Augmentation for Causal Sentence Classification'](https://openreview.net/pdf/17eafef9e25b48eb90a9a7f32c4f52e21177cc73.pdf) (Anon, 2021), which is related to previous work ['Evaluating Models' Local Decision Boundaries via Contrast Sets'](https://arxiv.org/abs/2004.02709) by Gardner et al., 2020.

## What are the limitations of this transformation?
The transformation's outputs do not apply to all sentence structures, as they are based on grammar rules and pattern matching. Outputs are also not as linguistically diverse as those of paraphrasers.
1 change: 1 addition & 0 deletions transformations/negate_strengthen/__init__.py
@@ -0,0 +1 @@
from .transformation import *
3 changes: 3 additions & 0 deletions transformations/negate_strengthen/requirements.txt
@@ -0,0 +1,3 @@
pattern3==3.0.0
nltk==3.6.2
pandas
122 changes: 122 additions & 0 deletions transformations/negate_strengthen/test.json
@@ -0,0 +1,122 @@
{
"type": "NegateStrengthen",
"test_cases": [
{
"class": "NegateStrengthen",
"inputs": {
"sentence": {
"sentence": "1200 Japanese Canadians were sent to Greenwood because of the Japanese Canadian internment in 1942 .",
"signal_id": 7
},
"target": "Direct Causal"
},
"outputs": [{
"sentence": "1200 Japanese Canadians were sent to Greenwood not because of the Japanese Canadian internment in 1942 .",
"target": "No Relationship"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": {
"sentence": "The extra light made the journey home easier and safer in the absence of street lighting .",
"signal_id": 7
},
"target": "Direct Causal"
},
"outputs": [{
"sentence": "The extra light made the journey home not easier nor safer in the absence of street lighting .",
"target": "No Relationship"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "TyG is effective to identify individuals at risk for NAFLD.",
"target": "Direct Causal"
},
"outputs": [{
"sentence": "TyG is ineffective to identify individuals at risk for NAFLD.",
"target": "No Relationship"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "Moreover, TT genotype may reduce the risk of CAD in diabetic patients.",
"target": "Conditional Causal"
},
"outputs": [{
"sentence": "Moreover, TT genotype will reduce the risk of CAD in diabetic patients.",
"target": "Direct Causal"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "Moreover, TT genotype will reduce the risk of CAD in diabetic patients.",
"target": "Direct Causal"
},
"outputs": [{
"sentence": "Moreover, TT genotype will not reduce the risk of CAD in diabetic patients.",
"target": "No Relationship"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "Collectively, these findings indicate that energy-matched high intensity and moderate intensity exercise are effective at decreasing IHL and NAFLD risk.",
"target": "Direct Causal"
},
"outputs": [{
"sentence": "Collectively, these findings did not indicate that energy-matched high intensity and moderate intensity exercise are effective at decreasing IHL and NAFLD risk.",
"target": "No Relationship"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "The rs7044343 polymorphism could be involved in regulating the production of IL-33.",
"target": "Conditional Causal"
},
"outputs": [{
"sentence": "The rs7044343 polymorphism was involved in regulating the production of IL-33.",
"target": "Direct Causal"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "Thus, mammographic density should possibly influence adjuvant therapy decisions in the future.",
"target": "Conditional Causal"
},
"outputs": [{
"sentence": "Thus, mammographic density would influence adjuvant therapy decisions in the future.",
"target": "Direct Causal"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "hsTnI may have a role in personalizing preventive strategies in patients with diabetes mellitus based on risk.",
"target": "Conditional Causal"
},
"outputs": [{
"sentence": "hsTnI had a role in personalizing preventive strategies in patients with diabetes mellitus based on risk.",
"target": "Direct Causal"
}]
},
{
"class": "NegateStrengthen",
"inputs": {
"sentence": "She is related to John.",
"target": "Direct Relation"
},
"outputs": [{
"sentence": "She is not related to John.",
"target": "No Relationship"
}]
}
]
}
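
The test cases above use two input shapes: `"sentence"` is either a plain string or an object carrying the text together with a `signal_id`. The sketch below shows one way to normalise both shapes. The helper name `unpack_input` is hypothetical, and the JSON is a trimmed copy of two cases from `test.json`, inlined so the example is self-contained.

```python
import json
from typing import Optional, Tuple

# Trimmed copy of two cases from test.json (inputs only).
SAMPLE = """
{
  "test_cases": [
    {"inputs": {"sentence": {"sentence": "The extra light made the journey home easier and safer in the absence of street lighting .",
                             "signal_id": 7},
                "target": "Direct Causal"}},
    {"inputs": {"sentence": "She is related to John.",
                "target": "Direct Relation"}}
  ]
}
"""


def unpack_input(inputs: dict) -> Tuple[str, Optional[int], str]:
    """Return (sentence, signal_id, target); signal_id is None when not given."""
    raw = inputs["sentence"]
    if isinstance(raw, dict):
        return raw["sentence"], raw.get("signal_id"), inputs["target"]
    return raw, None, inputs["target"]


for case in json.loads(SAMPLE)["test_cases"]:
    sentence, signal_id, target = unpack_input(case["inputs"])
    print(signal_id, target, sentence)
```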
