# CheckList - Test execution

## Initial install & imports

`conda install python=3.6`

`!pip install checklist`
`!pip install --upgrade checklist`

`!pip install -U spacy`
`!pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz`

`!pip install torch`

`tar xvzf release_data.tar.gz`
`import tarfile`
`tar = tarfile.open('checklist-master/release_data.tar.gz', "r:gz")`
`tar.extractall('checklist-master')`
`tar.close()`

In [1]:
import sys
import checklist
from checklist.test_suite import TestSuite
from checklist.viewer import *
from checklist.viewer.test_summarizer import TestSummarizer

## Loading the suite and the predictions

Suite

In [4]:
suite_path = '/Users/Marta/opt/anaconda3/lib/python3.6/site-packages/checklist/release_data/sentiment/NLP_Tests.pkl'
suite = TestSuite.from_file(suite_path)

Predictions

In [5]:
pred_path = '/Users/Marta/opt/anaconda3/lib/python3.6/site-packages/checklist/release_data/sentiment/predictions/output_NLP_Tests.txt'
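
The prediction file itself is not shown in this notebook. The sketch below writes one under an assumed layout: one line per test example, in suite order, holding the predicted label followed by the three class probabilities. As far as I know this matches CheckList's `pred_and_softmax` format, which should be the default when `run_from_file` gets no `file_format`, but check it against the installed version. `write_predictions`, the file name and the sample values are hypothetical.

```python
from pathlib import Path

def write_predictions(path, labels, probabilities):
    """Write one line per example: predicted label, then class probabilities."""
    lines = []
    for label, probs in zip(labels, probabilities):
        # Assumed layout: "<label> <p_class0> <p_class1> <p_class2>"
        fields = [str(label)] + [f"{p:.4f}" for p in probs]
        lines.append(" ".join(fields))
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)

# Hypothetical model output for two examples and three classes
n_written = write_predictions(
    "example_predictions.txt",
    labels=[2, 0],
    probabilities=[[0.1, 0.0, 0.9], [0.8, 0.0, 0.2]],
)
print(n_written, "predictions written")
```

The number of lines has to match the number of examples the suite expects, so it is worth comparing `n_written` with the size of the exported test file before running the suite.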

## Running the tests

### Customized suite (for English) 

In [6]:
suite.run_from_file(pred_path, overwrite=True)
suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'single positive word…

In [7]:
suite.summary()

Vocabulary

single positive words
Test cases:      54
Fails (rate):    0 (0.0%)


single negative words
Test cases:      134
Fails (rate):    97 (72.4%)

Example fails:
0.1 0.0 0.9 blame
----
0.3 0.0 0.7 gay
----
0.1 0.0 0.9 beanqueen
----


single neutral words
Test cases:      48
Fails (rate):    2 (4.2%)

Example fails:
0.8 0.0 0.2 muslim
----
0.5 0.0 0.5 atheist
----


sentiment-laden words in context
Test cases:      1200
Test cases run:  500
Fails (rate):    128 (25.6%)

Example fails:
0.1 0.0 0.9 She is a terrible witch.
----
0.0 0.0 1.0 You regret that witch.
----
0.0 0.0 1.0 I blame that witch.
----


neutral words in context
Test cases:      600
Test cases run:  500
Fails (rate):    0 (0.0%)


intensifiers for positive samples
Test cases:      400
Fails (rate):    2 (0.5%)

Example fails:
0.0 0.0 1.0 She is a caring woman.
0.2 0.0 0.8 She is an utterly caring woman.

----
0.0 0.0 1.0 She was a fun woman.
0.2 0.0 0.8 She was an utterly fun woman.

----


intensifiers and reduc

In [8]:
for item in suite.tests:
    print(item)

single positive words
single negative words
single neutral words
sentiment-laden words in context
neutral words in context
intensifiers for positive samples
intensifiers and reducers for negative samples
change neutral words with BERT
add positive phrases
add negative phrases
add random urls and handles
punctuation
typos
2 typos
contractions
change with English names
change with german names
change with vietnamese names
change with brazilian names
used to, but now
"used to" should reduce
"before" should reduce
simple negations: negative
simple negations: not negative
simple negations: not neutral is still neutral, i.e. not-hateful
simple negations (negative)
simple negations (neutral or positive)
simple negations (neutral)
hard negations: negative
hard negations: positive or neutral
negation of neutral
my opinion is what matters, not negative
my opinion is what matters, not positive
Q & A: yes, not negative
Q & A: yes, not positive
Q & A: yes (neutral)
M/F failure rates should be simil

In [9]:
stats = {}
for test in suite.tests:
    stats[test] = suite.tests[test].get_stats()

In [10]:
stats

{'single positive words': Munch({'testcases': 54, 'fails': 0, 'fail_rate': 0.0}),
 'single negative words': Munch({'testcases': 134, 'fails': 97, 'fail_rate': 72.38805970149254}),
 'single neutral words': Munch({'testcases': 48, 'fails': 2, 'fail_rate': 4.166666666666667}),
 'sentiment-laden words in context': Munch({'testcases': 1200, 'testcases_run': 500, 'fails': 128, 'fail_rate': 25.6}),
 'neutral words in context': Munch({'testcases': 600, 'testcases_run': 500, 'fails': 0, 'fail_rate': 0.0}),
 'intensifiers for positive samples': Munch({'testcases': 400, 'fails': 2, 'fail_rate': 0.5}),
 'intensifiers and reducers for negative samples': Munch({'testcases': 600, 'testcases_run': 500, 'after_filtering': 162, 'after_filtering_rate': 32.4, 'fails': 4, 'fail_rate': 2.4691358024691357}),
 'change neutral words with BERT': Munch({'testcases': 56, 'fails': 5, 'fail_rate': 8.928571428571429}),
 'add positive phrases': Munch({'testcases': 60, 'fails': 25, 'fail_rate': 41.666666666666664}),
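
The `fail_rate` values above do not all use the same denominator. Judging from the numbers shown, the rate is computed over `after_filtering` when present, otherwise over `testcases_run`, otherwise over `testcases`. This is inferred from the output, not taken from CheckList's source, and the helper name `fail_rate` is hypothetical. A minimal sketch to recompute the rates:

```python
def fail_rate(stats):
    """Recompute the fail rate (in percent) from one test's stats dictionary."""
    # Assumption: the most specific count available is the denominator.
    for key in ("after_filtering", "testcases_run", "testcases"):
        if key in stats:
            denominator = stats[key]
            break
    else:
        raise KeyError("no test-case count found")
    return 100.0 * stats["fails"] / denominator if denominator else 0.0

# Values copied from the stats output above
examples = {
    "single negative words": {"testcases": 134, "fails": 97},
    "sentiment-laden words in context": {
        "testcases": 1200, "testcases_run": 500, "fails": 128},
    "intensifiers and reducers for negative samples": {
        "testcases": 600, "testcases_run": 500, "after_filtering": 162, "fails": 4},
}

for name, values in examples.items():
    print(f"{name}: {fail_rate(values):.1f}%")
```

This matters when comparing tests: a low rate on a filtered test such as the intensifier/reducer one covers only 162 of the 600 generated cases.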
 

In [11]:
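import csv
# csv.writer quotes test names that contain commas (e.g. 'used to, but now'),
# which would otherwise split into extra columns.
with open('/Users/Marta/CheckList - FBK/results_suite_NLP_Tests.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for key, value in stats.items():
        writer.writerow([key, value])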
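
The export above stores each test's statistics as a single text field. If the file is meant for a spreadsheet, one column per statistic may be easier to work with. The following is a sketch, not part of the original notebook: `write_stats_csv` and the output file name are hypothetical, and it assumes each stats entry behaves like a plain dictionary (which `Munch` objects do, since `Munch` subclasses `dict`).

```python
import csv

def write_stats_csv(path, stats):
    """Write one row per test and one column per statistic."""
    # Collect every statistic name that appears, in first-seen order
    columns = []
    for values in stats.values():
        for key in values:
            if key not in columns:
                columns.append(key)
    with open(path, "w", newline="") as f:
        # restval="" leaves a cell empty when a test lacks that statistic
        writer = csv.DictWriter(f, fieldnames=["test"] + columns, restval="")
        writer.writeheader()
        for name, values in stats.items():
            writer.writerow({"test": name, **values})

# Two entries copied from the stats output above
example_stats = {
    "single positive words": {"testcases": 54, "fails": 0, "fail_rate": 0.0},
    "sentiment-laden words in context": {
        "testcases": 1200, "testcases_run": 500, "fails": 128, "fail_rate": 25.6},
}
write_stats_csv("example_stats.csv", example_stats)
```

Tests that were not subsampled or filtered simply get empty `testcases_run` and `after_filtering` cells.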