# Demo1: Role Keyword Extraction

In [11]:
from keywords_extractor_w import KeywordsExtractor
KE = KeywordsExtractor(lang='en')

Language: English
Loading word vectors......


In [12]:
# load training dataset
import pandas as pd
data = pd.read_csv('data/negative2015.csv')
contents = list(data['Text'])
labels = list(data['Label'])

In [13]:
# extract keywords
kws_dict = KE.global_role_kws_extraction_one_line(contents, labels, output_dir='saved_keywords', name='negative2015')
kws_dict.keys()

tokenizing....


100%|██████████| 316/316 [00:00<00:00, 16662.48it/s]


computing Label-Similarity for label: neutral


100%|██████████| 157/157 [00:00<00:00, 545.47it/s]


computing Label-Similarity for label: negative


100%|██████████| 823/823 [00:02<00:00, 362.12it/s]


saved at saved_keywords/global_ls_dict_negative2015.pkl
counting #doc for each word...


100%|██████████| 316/316 [00:00<00:00, 19739.37it/s]


computing Label-Correlation for label: neutral


100%|██████████| 158/158 [00:00<00:00, 478829.50it/s]


computing Label-Correlation for label: negative


100%|██████████| 824/824 [00:00<00:00, 538586.02it/s]

saved at saved_keywords/global_lr_dict_negative2015.pkl
First level keys:  ['neutral', 'negative']
Second level keys:  ['lr', 'ls', 'ccw', 'scw', 'fcw', 'iw']
already saved at saved_keywords/global_kws_dict_negative2015.pkl





dict_keys(['global_ls', 'global_lr', 'global_roles'])
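The log above computes two scores for each word and label: Label-Similarity (`ls`) and Label-Correlation (`lr`). The standalone sketch below shows what such scores can look like. It is not the `keywords_extractor_w` implementation. Using cosine similarity between a word vector and a label vector for `ls` is an assumption (the extractor does load word vectors). The document-frequency ratio used for `lr` is an illustrative stand-in, because the library's exact formula is not shown here. The toy vectors and documents are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_similarity(word_vecs, label_vec):
    """Assumed LS: cosine similarity of every word vector to the label's vector."""
    return {w: cosine(vec, label_vec) for w, vec in word_vecs.items()}


def label_correlation(docs, labels, target):
    """Illustrative LR stand-in: share of the documents containing a word that are labelled `target`."""
    df_all = Counter()     # number of documents containing each word
    df_target = Counter()  # the same count, restricted to documents labelled `target`
    for tokens, label in zip(docs, labels):
        for w in set(tokens):  # count each word at most once per document
            df_all[w] += 1
            if label == target:
                df_target[w] += 1
    return {w: df_target[w] / n for w, n in df_all.items()}


# Hypothetical toy data
docs = [["food", "bad"], ["service", "bad"], ["food", "good"]]
labels = ["negative", "negative", "neutral"]
lr = label_correlation(docs, labels, "negative")
ls = label_similarity({"bad": [0.9, 0.1], "food": [0.0, 1.0]}, [1.0, 0.0])
print(lr)  # bad -> 1.0, service -> 1.0, food -> 0.5, good -> 0.0
print(ls)
```

A word that appears only in one class gets `lr = 1.0` for that class even if it is rare. This is one reason a real extractor would likely weight or smooth such counts.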

In [14]:
for key in kws_dict['global_roles']:
    print(f"keywords for \"{key}\":")
    for each in ['ccw','scw','fcw','iw']:
        print(f"{each}: {kws_dict['global_roles'][key][each][:10]}")

keywords for "neutral":
ccw: ['lowered', 'split', 'mediorce', 'beat', 'generally', 'casual', 'simple', 'settled', 'per', 'depending']
scw: ['rather', 'overpriced', 'price', 'back', 'best', 'away', 'would', 'one']
fcw: ['eaten', 'trip', 'companion', 'huge', 'village', 'son', 'waiter', 'pancakes', 'five', 'requests']
iw: ['restaurant', 'make', 'restaurants', 'worth']
keywords for "negative":
ccw: ['bad', 'unhelpful', 'unfriendly', 'horrible', 'terrible', 'wrong', 'lousy', 'worst', 'dissapointing', 'miserable']
scw: ['good', 'mediocre', 'kind', 'kanish', 'however', 'poor', 'prixe', 'okay', 'guess', 'expect']
fcw: ['filling', 'hampton', 'absurdly', 'penne', 'made', 'cake', 'selection', 'sauce', 'ingredients', 'gig']
iw: ['makes', 'dumplings', 'watering', 'downtown', 'fix', 'sashimi', 'list', 'italian', 'crammed', 'potato']



# Demo2: Selective Text Augmentation

In [55]:
from text_augmenter import TextAugmenter
TA = TextAugmenter(lang='en')

Language: English


The `TextAugmenter` class provides a unified interface for its augmentation operations, such as deletion, replacement, insertion and word-order swapping:
- .aug_by_deletion(text, p, mode, selected_words)
- .aug_by_replacement(text, p, mode, selected_words)
- .aug_by_insertion(text, p, mode, selected_words)
- .aug_by_swap(text, p, mode, selected_words)
- .aug_by_selection(text, selected_words)

Of the five methods above, every method except `aug_by_selection()` lets you choose between random and selective augmentation by setting `mode='random'` or `mode='selective'`.
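The following plain-Python sketch makes the difference between the two modes concrete. It uses hypothetical helpers `delete_words` and `select_words`, which are not part of `text_augmenter`. The sketch assumes that `mode='selective'` restricts the candidates to tokens in `selected_words` and that `mode='random'` treats every token as a candidate. This assumption is consistent with the selective-mode outputs shown later in this demo. A seeded `random.Random` keeps the results reproducible.

```python
import random


def delete_words(tokens, p, mode="random", selected_words=(), rng=None):
    """Delete candidate tokens with probability p.

    mode='random'    -> every token is a candidate
    mode='selective' -> only tokens listed in selected_words are candidates
    """
    if mode not in ("random", "selective"):
        raise ValueError(f"unknown mode: {mode!r}")
    rng = rng or random.Random()
    chosen = set(selected_words)
    kept = []
    for tok in tokens:
        is_candidate = mode == "random" or tok in chosen
        if is_candidate and rng.random() < p:
            continue  # drop this token
        kept.append(tok)
    return kept


def select_words(tokens, selected_words):
    """Keep only the selected tokens, in their original order (no p or mode)."""
    chosen = set(selected_words)
    return [tok for tok in tokens if tok in chosen]


tokens = "Everything is always cooked to perfection , the service is excellent".split()
print(" ".join(delete_words(tokens, 0.2, "random", rng=random.Random(0))))
print(" ".join(delete_words(tokens, 1.0, "selective", ["Everything", "excellent"])))
# is always cooked to perfection , the service is
print(" ".join(select_words(tokens, ["Everything", "cooked"])))
# Everything cooked
```

With `p=1.0` in selective mode, every selected word is removed and nothing else is touched. This is why selective mode is a safe way to decide which words an operation is allowed to change.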

## Random augmentation (`mode='random'`)

In [56]:
contents[192]

'the last two times i ordered from here my food was soo spicy that i could barely eat it and the $t$ took away from the flavor of the dish'

In [57]:
sentence = contents[192]  # the review displayed above
p = 0.2
print(' '.join(TA.aug_by_deletion(text=sentence, p=p, mode='random')))
print(' '.join(TA.aug_by_replacement(text=sentence, p=p, mode='random')))
print(' '.join(TA.aug_by_insertion(text=sentence, p=p, mode='random')))
print(' '.join(TA.aug_by_swap(text=sentence, p=p, mode='random')))

the last two times ordered food was soo spicy that i could barely eat and the $ t $ took away from flavor dish
the last two times i Judge_Clifford_Cretan from here my nutritious_foods was soo garlicky that i could barely eat it and the $ t $ relinquished away from the tart_flavor of the roast_suckling_pig
pathetically the last two times i ordered from here my food was soo spicy that i could barely eat feebly million into it two and the $ t $ took away from the flavor outside of the dish
the last two times the ordered from of my food was that spicy and i could soo eat it barely i $ the $ took away from the flavor here t dish


## Selective augmentation (`mode='selective'`)
Compared with random augmentation, selective augmentation only requires you to specify the corresponding `selected_words`:

In [18]:
example = "Everything is always cooked to perfection, the service is excellent"  # the sentence behind the outputs below
print(' '.join(TA.aug_by_deletion(text=example, p=p, mode='selective', selected_words=['Everything', 'excellent'])))
print(' '.join(TA.aug_by_replacement(text=example, p=p, mode='selective', selected_words=['service', 'excellent'])))
print(' '.join(TA.aug_by_insertion(text=example, p=p, mode='selective', selected_words=['service', 'excellent'])))
print(' '.join(TA.aug_by_swap(text=example, p=p, mode='selective', selected_words=['service', 'excellent'])))
print(' '.join(TA.aug_by_selection(text=example, selected_words=['Everything', 'cooked'])))

is always cooked to perfection , the service is excellent
Everything is always cooked to perfection , the service is unrivaled
Everything is always cooked to perfection , great the service is excellent
Everything is always cooked to perfection excellent the service is ,
Everything cooked
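The outputs above suggest that each selective operation touches only the selected words. 'excellent' is replaced by 'unrivaled', a new word ('great') is inserted, and 'excellent' is swapped with ','. The sketch below reproduces that behaviour for replacement, insertion and swap. `TextAugmenter` appears to draw substitutes from word-vector neighbours, since the random-mode replacements include phrase tokens such as `nutritious_foods`. Here, a hypothetical `TOY_SYNONYMS` dictionary stands in for those neighbours, and all helper names are illustrative.

```python
import random

# Hypothetical stand-in for word-vector nearest neighbours.
TOY_SYNONYMS = {"excellent": ["unrivaled", "great"], "service": ["staff"]}


def candidate_positions(tokens, mode, selected_words):
    """Indices an operation may touch: all of them, or only the selected words."""
    chosen = set(selected_words)
    return [i for i, tok in enumerate(tokens) if mode == "random" or tok in chosen]


def replace_words(tokens, p, mode, selected_words=(), synonyms=TOY_SYNONYMS, rng=None):
    """With probability p, replace each candidate that has a known synonym."""
    rng = rng or random.Random()
    out = list(tokens)
    for i in candidate_positions(tokens, mode, selected_words):
        if tokens[i] in synonyms and rng.random() < p:
            out[i] = rng.choice(synonyms[tokens[i]])
    return out


def insert_words(tokens, p, mode, selected_words=(), synonyms=TOY_SYNONYMS, rng=None):
    """With probability p, insert a synonym of each candidate at a random position."""
    rng = rng or random.Random()
    out = list(tokens)
    for i in candidate_positions(tokens, mode, selected_words):
        if tokens[i] in synonyms and rng.random() < p:
            out.insert(rng.randint(0, len(out)), rng.choice(synonyms[tokens[i]]))
    return out


def swap_words(tokens, p, mode, selected_words=(), rng=None):
    """With probability p, swap each candidate position with a random position.

    Candidate positions are fixed from the original sentence before any swap.
    """
    rng = rng or random.Random()
    out = list(tokens)
    for i in candidate_positions(tokens, mode, selected_words):
        if rng.random() < p:
            j = rng.randrange(len(out))
            out[i], out[j] = out[j], out[i]
    return out


tokens = "Everything is always cooked to perfection , the service is excellent".split()
rng = random.Random(0)
for op in (replace_words, insert_words, swap_words):
    print(" ".join(op(tokens, 0.5, "selective", ["service", "excellent"], rng=rng)))
```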


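In text classification, different words can play different roles. In our paper, we propose the following rules:
- for deletion/replacement operations, avoid gold words
- for insertion operations, avoid venture words
- for selection operations, directly select gold words and punctuation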
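The rules can be written as a small lookup if we assume that gold words correspond to `ccw` and venture words to `scw`. This assumption is inferred from the keyword lists printed in Demo1 and from the role selections used in the cell below. The function name `selected_words_for` is hypothetical. Swap has no rule above, so the function simply mirrors the notebook's choice of `iw`. Leaving `iw` out of replacement and insertion also mirrors the notebook rather than the stated rules.

```python
PUNCTUATION = list(",.，。!?！？;；、")  # ASCII and full-width punctuation, as in the notebook


def selected_words_for(op, kws):
    """Candidate words for one operation, given a role dict with keys ccw/scw/fcw/iw.

    Assumes gold words = kws['ccw'] and venture words = kws['scw'].
    """
    if op == "deletion":
        return kws["scw"] + kws["fcw"] + kws["iw"]  # avoid gold words
    if op == "replacement":
        return kws["scw"] + kws["fcw"]  # avoid gold words (iw also left out, as in the notebook)
    if op == "insertion":
        return kws["ccw"] + kws["fcw"]  # avoid venture words (iw also left out, as in the notebook)
    if op == "swap":
        return list(kws["iw"])  # no rule stated; mirrors the notebook
    if op == "selection":
        return kws["ccw"] + PUNCTUATION  # gold words plus punctuation
    raise ValueError(f"unknown operation: {op!r}")


kws = {"ccw": ["bad"], "scw": ["good"], "fcw": ["penne"], "iw": ["downtown"]}  # hypothetical toy roles
for op in ("deletion", "replacement", "insertion", "swap", "selection"):
    print(op, selected_words_for(op, kws))
```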

In [58]:
# read saved keywords
import pickle
name = 'negative2015'  # same name as used when saving the keywords in Demo1
global_kws_dict_path = f'saved_keywords/global_kws_dict_{name}.pkl'
with open(global_kws_dict_path, 'rb') as f:
    global_kws_dict = pickle.load(f)

In [59]:
category = 'negative'
kws = global_kws_dict[category]
# deletion: every role except ccw
print(' '.join(TA.aug_by_deletion(text=sentence, p=p, mode='selective',
                                  selected_words=kws['scw'] + kws['fcw'] + kws['iw'])))
# replacement: scw and fcw (ccw and iw excluded)
print(' '.join(TA.aug_by_replacement(text=sentence, p=p, mode='selective',
                                     selected_words=kws['scw'] + kws['fcw'])))
# insertion: ccw and fcw (scw and iw excluded)
print(' '.join(TA.aug_by_insertion(text=sentence, p=p, mode='selective',
                                   selected_words=kws['ccw'] + kws['fcw'])))
# swap: iw only
print(' '.join(TA.aug_by_swap(text=sentence, p=p, mode='selective',
                              selected_words=kws['iw'])))

# selection: ccw plus ASCII and full-width punctuation
punc_list = list(',.，。!?！？;；、')
print(' '.join(TA.aug_by_selection(text=sentence, selected_words=kws['ccw'] + punc_list)))