### A) Setting up

In [1]:
import os
# Move up one directory so the relative paths used below ('Models/', 'Data/') resolve
os.chdir('..')

In [8]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as nafc
from nlpaug.util import Action

In [13]:
# Download fasttext model, only run once
#from nlpaug.util.file.download import DownloadUtil
#DownloadUtil.download_fasttext(model_name = 'wiki-news-300d-1M', dest_dir = 'Models')

In [14]:
#import nltk
#nltk.download('averaged_perceptron_tagger')


In [3]:
model_dir = 'Models/'

In [4]:
import pandas as pd
SSOC_2020 = pd.read_csv('Data/Processed/Training/train-aws/SSOC_2020.csv')

### B) Testing different types of augmentation

In [5]:
text = SSOC_2020['Description'][0]

In [6]:
print(text)

Legislator determines, formulates and directs government policies. He/she makes, ratifies, amends or repeals laws, public rules and regulations within a statutory or constitutional framework. . presiding over or participating in the proceedings of parliament. determining, formulating and directing government policies. making, ratifying, amending or repealing laws, public rules and regulations. investigating matters of concern to the public and promoting the interests of the constituencies which they represent. as members of the government, directing senior administrators and officials of government departments and statutory boards in the interpretation and implementation of government policies


#### 1. Using pretrained word embeddings (`fasttext`)

In [23]:
fasttext_aug = naw.WordEmbsAug(model_type = 'fasttext', 
                               model_path = model_dir + 'wiki-news-300d-1M.vec',
                               action = "substitute",
                               top_k = 5,
                               aug_p = 0.5,
                               aug_min = 10,
                               aug_max = None)

In [40]:
# Use the augmenter defined above; the bare name `aug` is not defined in this notebook
fasttext_augmented_text = fasttext_aug.augment(text, num_thread = 4)
print(fasttext_augmented_text)

Politician determine, formulates which directs governement policies. His / --she creates, ratifies, amends either repeals laws, pubic rules. but regulations withing the statutory whether constitutional framework. . supreme nearly whether participating in to hearings the parliament. determining, formulating in directed govenment policies. make, approving, amending even repeals rules, pulic rule with regulations. investigating issues with interest to a public with promoting on interests in the electorate nevertheless they comprise. although memebers of the government, directing juniors admins and officals with govenment departments with statutory boards in a interpretation and implemenation with governemnt procedures
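The settings `aug_p = 0.5`, `aug_min = 10` and `aug_max = None` explain why roughly half of the words above were replaced. The sketch below illustrates how these three parameters could interact. The helper `estimate_aug_count` is hypothetical and only approximates the behaviour; rounding up and the cap at the token count are assumptions, and nlpaug's exact rule may differ.

```python
import math

def estimate_aug_count(n_tokens, aug_p, aug_min=1, aug_max=None):
    """Approximate how many tokens a word-level augmenter would change.

    Hypothetical helper: the rounding and clamping here are assumptions,
    not nlpaug's actual implementation.
    """
    # Start from the requested proportion of tokens (rounded up).
    count = math.ceil(aug_p * n_tokens)
    # Enforce the lower bound, if one is set.
    if aug_min is not None and count < aug_min:
        count = aug_min
    # Enforce the upper bound, if one is set.
    if aug_max is not None and count > aug_max:
        count = aug_max
    # A text cannot have more augmented tokens than it has tokens.
    return min(count, n_tokens)

print(estimate_aug_count(100, aug_p=0.5, aug_min=10, aug_max=None))
print(estimate_aug_count(12, aug_p=0.5, aug_min=10, aug_max=None))
```

For short descriptions `aug_min` dominates: with 12 tokens, the proportion alone would give 6 changes, but the lower bound raises it to 10, so almost every word is replaced.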


#### 2. Using back translation
Back translation translates the whole text into another language (German here, via the `wmt19-en-de` and `wmt19-de-en` models) and then back into English.

In [36]:
back_translation_aug = naw.BackTranslationAug(from_model_name='facebook/wmt19-en-de', 
                                              to_model_name='facebook/wmt19-de-en',
                                              device = 'cuda',
                                              max_length = 2000)

In [39]:
backtransl_augmented_text = back_translation_aug.augment(text, num_thread = 4)
print(backtransl_augmented_text)

The legislator determines, formulates and directs government policy. He / she adopts, ratifies, amends or repeals laws, public rules and ordinances within a legal or constitutional framework. He / she presides or participates in the proceedings of parliament. He / she determines, formulates and directs government policy. He / she adopts, ratifies, amends or repeals laws, public rules and ordinances. He / she investigates matters concerning the public and promotes the interests of the constituencies he / she represents. As members of government, heads senior administrative officers and officials of government departments and legal bodies in the interpretation and implementation of government policy.
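The round trip itself is simple to express. The following is a minimal sketch of the idea, not of nlpaug's implementation: `back_translate`, `toy_translate` and the two tiny word tables are hypothetical stand-ins for the real translation models.

```python
from typing import Callable, Dict

def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate text into a pivot language and back again."""
    return from_pivot(to_pivot(text))

# Toy word-level lookup tables standing in for real models (assumption).
EN_DE: Dict[str, str] = {"makes": "verabschiedet", "laws": "gesetze"}
DE_EN: Dict[str, str] = {"verabschiedet": "adopts", "gesetze": "laws"}

def toy_translate(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a 'translator' that maps known words and keeps the rest."""
    return lambda s: " ".join(table.get(word, word) for word in s.split())

result = back_translate("he makes laws",
                        toy_translate(EN_DE),
                        toy_translate(DE_EN))
print(result)  # he adopts laws
```

Because the two directions are not exact inverses, the text comes back paraphrased ("makes" returns as "adopts", as in the output above) while the meaning is largely preserved.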


#### 3. Using synonyms

In [10]:
synonym_aug = naw.SynonymAug(aug_src = 'ppdb', 
                             model_path = model_dir + 'ppdb-2.0-tldr',
                             aug_p = 0.5,
                             aug_min = 10,
                             aug_max = None)

In [15]:
synonym_augmented_text = synonym_aug.augment(text, num_thread = 4)
print(synonym_augmented_text)

Legislator notes, formulae and directs campaigns directives. He / she enjoys, signs, modifications or repealing acts, public roles and companies inside a obligatory or constitutional foundations. . presiding over or aiding requests the factors of senators. charges, emanating and levelling territories entitled. decisions, concluding, pertaining or discontinuing bylaws, pubic enactments and authorities. checking ingredients of worry to the perceptions and consolidating the interests of the interviewees which they originated. whereas rep of the provinces, advancing high level owners and magistrates of stakeholders parties and regulatory directories during the reinterpretation and applicability of skills agreements


#### 4. Using contextual word embeddings

In [57]:
distilbert_aug = naw.ContextualWordEmbsAug(model_path = 'distilbert-base-uncased', 
                                           action = "substitute",
                                           top_k = 10,
                                           aug_p = 0.7,
                                           aug_min = 5,
                                           aug_max = None,
                                           device = 'cuda')

In [58]:
distilbert_augmented_text = distilbert_aug.augment(text, num_thread = 4)
print(distilbert_augmented_text)

he prepares, determines and manages constitutional policy. he / she establishes, ratifies, proposing or enforcing laws, establishing institutions and regulations within a legal or regulatory context.. sitting over or presiding in the sessions of legislatures. interpreting, modifying and interpreting public regulations. proposing, ratifying, enforcing or modifying laws, enforcing laws and regulations. maintaining laws of relevance to the constitution and maintaining the integrity of the parties whom they contest. as president of the legislature, appointing legislative committees and heads of public institutions and governing boards in the implementation and enforcement of laws ;


#### 5. Using sentence augmentation

In [29]:
sentence_aug = nas.ContextualWordEmbsForSentenceAug(model_path = 'distilgpt2',
                                                    min_length = 100,
                                                    max_length = 300,
                                                    top_k = 50,
                                                    top_p = .9,
                                                    device = 'cuda')

In [30]:
sentence_augmented_text = sentence_aug.augment(text, num_thread = 4)
print(sentence_augmented_text)

Using pad_token, but it is not set yet.


Legislator determines, formulates and directs government policies. He/she makes, ratifies, amends or repeals laws, public rules and regulations within a statutory or constitutional framework. . presiding over or participating in the proceedings of parliament. determining, formulating and directing government policies. making, ratifying, amending or repealing laws, public rules and regulations. investigating matters of concern to the public and promoting the interests of the constituencies which they represent. as members of the government, directing senior administrators and officials of government departments and statutory boards in the interpretation and implementation of government policies, and coordinating, or managing the functioning of government functions. , implementing, or directing legislation, public rules and regulations. investigating matters of concern to the public and promoting the interests of the constituencies which they represent. As members of the government, dire
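The output above is the original description followed by generated text, cut off mid-word at the end. To inspect only the generated part, the original prefix can be stripped. This is a small sketch with a hypothetical helper `split_generated` and shortened sample strings; it assumes the augmenter returns the input unchanged as a prefix, which may not hold if whitespace is normalised.

```python
from typing import Tuple

def split_generated(original: str, augmented: str) -> Tuple[str, str]:
    """Split augmented text into (original prefix, generated continuation).

    If the augmented text does not start with the original, the whole
    string is treated as generated.
    """
    if augmented.startswith(original):
        return original, augmented[len(original):]
    return "", augmented

# Shortened sample strings for illustration only.
original = "directing senior administrators in the implementation of government policies"
augmented = original + ", and coordinating, or managing the functioning of government functions."

prefix, generated = split_generated(original, augmented)
print(generated)
```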

#### 6. Using summarisation

In [41]:
summ_aug = nas.AbstSummAug(model_path = 't5-base', 
                           min_length = 50,
                           max_length = 100,
                           top_k = 20)

In [45]:
summ_augmented_text = summ_aug.augment(text, num_thread = 4)

In [46]:
summ_augmented_text

'Legislator determines, formulates and directs government policies . makes, ratifies, amends or repeals laws, public rules and regulations . investigating matters of concern to the public and promoting the interests of the constituencies which they represent .'
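One rough way to compare how far each method drifts from the original is the word overlap between the two texts. The sketch below uses Jaccard similarity on lowercased word sets; the helpers `tokens` and `jaccard` are hypothetical, the sample strings are the first sentences of the outputs above, and word overlap is only a crude proxy for preserved meaning.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (1.0 = identical sets)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

original = "Legislator determines, formulates and directs government policies."
candidates = {
    "back_translation": "The legislator determines, formulates and directs government policy.",
    "synonym": "Legislator notes, formulae and directs campaigns directives.",
}

for name, augmented in candidates.items():
    print(name, round(jaccard(original, augmented), 2))
```

On these two sentences the back-translated text shares far more words with the original than the synonym-augmented one, which matches the impression from reading the outputs.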

### C) Implementing text augmentation

