## Word Sense Disambiguation

For the Word Sense Disambiguation (WSD) task, ``textflint`` provides ``WSDSample`` to load and use data from **SensEval** and **SemEval**. The following example shows how to initialize a ``WSDSample``.

Tip:
The WSD task will not be supported until the version 0.0.6 update.

In [2]:
from textflint.input import WSDSample

data={
"sentence":["Your", "Oct.", "6", "editorial", "``", "The", "Ill",
            "Homeless", "``", "referred", "to",
            "research", "by", "us", "and", "six", "of", "our",
            "colleagues", "that", "was", "reported", "in",
            "the", "Sept.", "8", "issue", "of", "the", "Journal", "of",
            "the", "American", "Medical",
            "Association", "."],
            
"pos":["PRON", "NOUN", "NUM", "NOUN", ".", "DET", "NOUN", "NOUN", ".", "VERB",
       "X", "NOUN", "ADP", "PRON", "CONJ", "NUM", "ADP", "PRON", "NOUN", "ADP",
       "VERB", "VERB", "ADP", "DET", "NOUN", "NUM", "NOUN", "ADP", "DET",
       "NOUN", "ADP", "DET", "ADJ", "ADJ", "NOUN", "."],
       
"lemma":["Your", "Oct.", "6", "editorial", "``", "The", "Ill",
         "Homeless", "``", "refer", "to", "research", "by", "us",
         "and", "six", "of", "our", "colleague", "that", "be",
         "report", "in", "the", "Sept.", "8", "issue", "of", "the",
         "Journal", "of", "the", "American", "Medical",
         "Association", "."],
         
"instance":[["d000.s000.t000", 9, 10, "referred", "refer%2:32:01::"],
            ["d000.s000.t001", 11, 12, "research",
             "research%1:04:00::"],
            ["d000.s000.t002", 21, 22, "reported",
             "report%2:32:04::"]],
             
"source": "semeval2007"
}
wsd_sample=WSDSample(data)

ModuleNotFoundError: No module named 'textflint.input.component.sample.wsd_sample'

- ``sentence``: a list of words

- ``pos``: a list of part-of-speech tags, one per word

- ``lemma``: a list of lemmas, one per word

- ``instance``: a list of target instances in the sentence, each recorded as its id, start position, end position, the word itself, and its sense key

- ``source``: the dataset name, one of ``semeval2007``, ``senseval2``, ``senseval3``, ``semeval2013``, ``semeval2015``, or ``ALL``
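
The field layout above can be checked with a few lines of plain Python before a record is handed to ``WSDSample``. The sketch below is not part of ``textflint``; ``check_wsd_record`` is a hypothetical helper, the record is a shortened illustrative one, and two things are assumptions: that the end position is exclusive (as the example above suggests, where ``9, 10`` covers the single token ``referred``) and that multi-token targets are joined with a space.

```python
def check_wsd_record(record):
    """Return a list of problems found in a WSD record (empty if none).

    Hypothetical helper for illustration; not a textflint API.
    """
    problems = []
    n = len(record["sentence"])
    # sentence, pos and lemma are parallel lists with one entry per token.
    for key in ("pos", "lemma"):
        if len(record[key]) != n:
            problems.append(f"{key} has {len(record[key])} entries, expected {n}")
    # Each instance is [id, start, end, word, sense_key]; end is assumed exclusive.
    for inst_id, start, end, word, _sense_key in record["instance"]:
        span = " ".join(record["sentence"][start:end])
        if span != word:
            problems.append(f"{inst_id}: span {start}:{end} is {span!r}, not {word!r}")
    return problems


# Shortened, illustrative record (not taken verbatim from any dataset).
record = {
    "sentence": ["Your", "editorial", "referred", "to", "research", "."],
    "pos": ["PRON", "NOUN", "VERB", "X", "NOUN", "."],
    "lemma": ["Your", "editorial", "refer", "to", "research", "."],
    "instance": [
        ["d000.s000.t000", 2, 3, "referred", "refer%2:32:01::"],
        ["d000.s000.t001", 4, 5, "research", "research%1:04:00::"],
    ],
    "source": "semeval2007",
}

print(check_wsd_record(record))
```

An empty list means the parallel lists have equal length and every instance span points at the word it claims to annotate.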

### Transformations for WSD
The built-in transformations of ``textflint`` fall into two groups: universal transformations and task-specific transformations.
Some **universal transformations**, such as ``Prejudice`` and ``BackTrans``, can render WSD examples meaningless. We therefore recommend using only the following universal transformations for the WSD task.

* **InsertAdv**: Transforms an input by adding an adverb before each verb.
* **AppendIrr**: Extends sentences by appending irrelevant sentences.
* **WordCase**: Converts an input to upper case, lower case, or capitalized form.
* **Contraction**: Replaces phrases like `will not` and `he has` with their contracted forms, namely `won't` and `he's`.
* **Keyboard**: Simulates the mistakes people make when typing by replacing characters with neighboring keys on the keyboard, like `word → worf` and `ambiguous → amviguius`.
* **SwapNamedEnt**: Transforms an input by replacing the named entities in it.
* **SwapNum**: Transforms an input by replacing the numbers in it.
* **Ocr**: Simulates OCR errors by randomly replacing characters.
* **Punctuation**: Transforms an input by adding punctuation at the end of the sentence.
* **ReverseNeg**: Transforms an input by adding or deleting negation words in the sentence.
* **SpellingError**: Uses a pre-defined dictionary of spelling mistakes to simulate misspellings.
* **Tense**: Changes the tense of every verb in the sentence.
* **TwitterType**: Transforms an input by applying abbreviations common on Twitter.
* **Typos**: Randomly inserts, deletes, swaps, or replaces a single letter within one word (Ireland → Irland).
* **SwapSynWordEmbedding**: Transforms an input by replacing its words with similar words found via GloVe embeddings.
* **SwapSynWordNet**: Transforms an input by replacing its words with synonyms provided by WordNet.
* **SwapAntWordNet**: Transforms an input by replacing its words with antonyms provided by WordNet.

In addition to transformations, ``textflint`` also provides ``SubPopulations`` to verify the robustness of NLP models; a quick-start tutorial is available in this [notebook](https://github.com/textflint/textflint/blob/master/docs/source/user/2_SubPopulation.ipynb). Task-specific ``SubPopulations`` are still in progress.
For WSD, we implement one task-specific transformation, ``SwapTarget_syn``. **This transformation replaces each target word with a word from its WordNet synset.**
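
To make the idea behind ``SwapTarget_syn`` concrete, here is a minimal sketch of a target-word swap on the record format described earlier. It is not the ``textflint`` implementation: ``swap_targets`` and ``TOY_SYNSETS`` are hypothetical names, the synonym table is a hand-written stand-in for a real WordNet lookup (and stores already-inflected forms for simplicity), and only single-token targets are handled so that instance positions stay valid.

```python
import random

# Hypothetical synonym table standing in for WordNet synset lookups,
# keyed by sense key. The entries are illustrative only.
TOY_SYNSETS = {
    "refer%2:32:01::": ["mentioned", "cited"],
    "research%1:04:00::": ["inquiry"],
}


def swap_targets(record, synsets, rng):
    """Return a copy of the record with each single-token target replaced
    by a randomly chosen synonym of its annotated sense, when one exists."""
    sentence = list(record["sentence"])
    instances = []
    for inst_id, start, end, word, sense_key in record["instance"]:
        candidates = [s for s in synsets.get(sense_key, []) if s != word]
        # Only one-token spans are swapped, so start/end remain correct.
        if candidates and end - start == 1:
            word = rng.choice(candidates)
            sentence[start] = word
        instances.append([inst_id, start, end, word, sense_key])
    # pos and lemma are left untouched in this sketch.
    return {**record, "sentence": sentence, "instance": instances}


record = {
    "sentence": ["Your", "editorial", "referred", "to", "research", "."],
    "pos": ["PRON", "NOUN", "VERB", "X", "NOUN", "."],
    "lemma": ["Your", "editorial", "refer", "to", "research", "."],
    "instance": [
        ["d000.s000.t000", 2, 3, "referred", "refer%2:32:01::"],
        ["d000.s000.t001", 4, 5, "research", "research%1:04:00::"],
    ],
    "source": "semeval2007",
}

swapped = swap_targets(record, TOY_SYNSETS, random.Random(0))
print(" ".join(swapped["sentence"]))
```

Because the sense key is carried over unchanged, a model that disambiguates well should assign the same sense to the swapped word as to the original.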

In the next cell, we show how to run ``Transformations`` and ``SubPopulations``.

1. Initialize the **Engine** and **Config** of ``textflint``.
2. Select the ``Transformations`` and ``SubPopulations`` you want to apply.
3. Set the config for ``Transformations`` and ``SubPopulations`` so that the **Engine generates a separate flint object for each set of parameters**.
4. Feed the data and **Config** to the **Engine**, then run it.

### Conduct a robustness experiment on the WSD task

In [3]:
import os
from textflint.engine import Engine
from textflint.adapter import auto_config
from textflint.input.config import Config
from textflint.common.utils.install import download_if_needed

engine = Engine()
# initialize the config for WSD task
config = auto_config(task='WSD')
# transformation methods
config.trans_methods = ['SwapTarget', 'WordCase']
# parameters for transformations
config.trans_config = {
    'WordCase': [
        {"case_type": "upper"}
    ],
    'SwapTarget': [
        {"case_type": "syn"}
    ]
}
# subpopulation methods
config.sub_methods = ['LengthSubPopulation', 'PrejudiceSubPopulation']
# parameters for subpopulation
config.sub_config = {
    "LengthSubPopulation": [
        {"intervals": ["0%", "20%"]},
        {"intervals": ["80%", "100%"]}
    ],
    "PrejudiceSubPopulation": [
        {"mode": "man"},
        {"mode": "woman"}
    ]
}
# the dir to save the transformed data
config.out_dir = './output'
# The test datasets are hosted on our server; the next line downloads the WSD one if needed.
engine.run(os.path.normcase(download_if_needed('TEST/WSD.json')), config=config)

KeyError: 'WSD'

The transformed data is stored in ``config.out_dir`` in JSON format and can be used to evaluate WSD models. ``textflint`` also offers a simple way to analyze and visualize the experimental results.

### Generate a robustness report on the WSD task

In [None]:
# ori_f1 is the score on the original dataset; trans_f1 is the score on the transformed one.
evaluate_json = {
    "model_name": "BEM",
    "dataset_name": "SensEval2",
    "transformation": {
        "SwapTarget_syn": {
            "ori_f1": 0.8085,
            "trans_f1": 0.7727,
            "size": 242,
        },
        "WordCase_upper": {
            "ori_f1": 0.8098,
            "trans_f1": 0.7831,
            "size": 242,
        }
    }
}

In [None]:
from textflint.adapter import auto_report_generator
report_generator = auto_report_generator()
report_generator.plot(evaluate_json)
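
Alongside the plot, the same ``evaluate_json`` structure can be summarized numerically. The sketch below computes the absolute and relative F1 drop for each transformation; ``f1_drops`` is a hypothetical helper written for this tutorial, not part of the ``textflint`` report generator, and it assumes only the ``ori_f1`` and ``trans_f1`` keys shown above.

```python
evaluate_json = {
    "model_name": "BEM",
    "dataset_name": "SensEval2",
    "transformation": {
        "SwapTarget_syn": {"ori_f1": 0.8085, "trans_f1": 0.7727, "size": 242},
        "WordCase_upper": {"ori_f1": 0.8098, "trans_f1": 0.7831, "size": 242},
    },
}


def f1_drops(report):
    """Map each transformation name to its absolute and relative F1 drop."""
    rows = {}
    for name, scores in report["transformation"].items():
        drop = scores["ori_f1"] - scores["trans_f1"]
        rows[name] = {
            "abs_drop": round(drop, 4),
            # Relative drop is measured against the original score.
            "rel_drop": round(drop / scores["ori_f1"], 4),
        }
    return rows


for name, row in f1_drops(evaluate_json).items():
    print(f"{name}: -{row['abs_drop']:.4f} F1 ({row['rel_drop']:.1%} relative)")
```

A larger drop indicates that the model is less robust to that transformation.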