# **Weak Supervision in Analysis of News: Application to Economic Policy Uncertainty**
## **Authors**: Paul Trust, Rosane Minghim, Ahmed Zahran


# **Abstract**
The need for timely data for economic decisions has prompted most economists and policy makers to search for supplementary sources of data. In that context, text data is being explored to enrich traditional economic data sources because it is abundant and easy to collect. Our work focuses on studying the capability of textual data, in particular news pieces, for detecting and measuring economic policy uncertainty. Understanding economic policy uncertainty is of great importance to policy makers, economists and investors, since it influences their expectations about future economic fundamentals and therefore their policy, investment and saving decisions. This research tackles the data bottleneck that has hindered the adoption of machine learning in measuring economic policy uncertainty from text data. We test various approaches to classifying news pieces according to whether they present economic uncertainty content. We propose a solution involving a weak supervision approach, which expresses domain knowledge and heuristics through labeling functions. These labeling functions generate probabilistic labels that can be used to train an end model without human-annotated data. We then use the resulting model to build a weak-supervision-based economic policy uncertainty index, and conduct extensive econometric analysis of this index alongside Irish macroeconomic indicators to validate whether it foreshadows weaker macroeconomic performance.
    

# Proposed Weak Supervision Framework
<img src="https://github.com/TrustPaul/data/raw/main/weakfinal22.png" width="500">

The figure above shows our proposed framework, which integrates three key stages. The first stage leverages expert-defined labeling functions to automatically generate a label matrix, in which each article is assigned one vote per labeling function. The second stage is an unsupervised generative model that assigns every article an auto-generated noisy label by observing only the conflicts and correlations in the label matrix. In the third stage, a discriminative model is trained in a supervised fashion on the generated labels to provide the final label. The following sections present these stages in more detail.

# Installing the Necessary Packages for the Project




In [None]:
!pip install transformers
!pip install contractions
!pip install simpletransformers
!pip install tensorboardX
!pip install snorkel
!pip install -U sentence-transformers
!pip install "rubrix[server]==0.14.0" "transformers[torch]" datasets
!python -m pip install git+https://github.com/autonlab/weasel#egg=weasel[all]
!pip install flyingsquid
!pip install pgmpy
# Pin scikit-learn last so that the earlier installs cannot upgrade it
!pip install scikit-learn==0.22
!pip install spherecluster
# Google Colab users must restart the runtime after installing these packages before continuing



# Importing the Necessary Libraries



In [None]:
import numpy as np
import pandas as pd  
import logging
import string
import torch
import re
import scipy
import sklearn
from sentence_transformers import SentenceTransformer, util
from snorkel.labeling import labeling_function
from snorkel.labeling import PandasLFApplier
from snorkel.labeling import LabelingFunction
from snorkel.preprocess import preprocessor
from textblob import TextBlob
from snorkel.labeling.model import MajorityLabelVoter
from snorkel.labeling.model import LabelModel
from sklearn.metrics import accuracy_score
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
from sklearn.model_selection import train_test_split 
from transformers import LongformerTokenizer, TFLongformerForSequenceClassification
from simpletransformers.classification import ClassificationModel
from transformers import BertModel,BertTokenizer
from torch.utils.data import Dataset,DataLoader
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords  
from bs4 import BeautifulSoup    
import contractions
stop_words = stopwords.words("english")
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, naive_bayes, svm
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Flatten, LSTM, Conv1D, MaxPooling1D, Dropout, Activation, Bidirectional, Embedding
import os
from weasel.models.downstream_models.transformers import Transformers
from weasel.models import Weasel
from weasel.datamodules.transformers_datamodule import TransformersDataModule, TransformersCollator
# tokenizer for our transformers end model
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-discriminator")
from spherecluster import SphericalKMeans, VonMisesFisherMixture, sample_vMF
import csv



# Data Preprocessing

# Import Data
Replace these with your own data files.<br>
We use a sample (only 30%) of the USA Newspapers Annotated Dataset.<br>
<br>
The dataset can be obtained from the [EPU Website](https://www.policyuncertainty.com/).

In [None]:
import pandas as pd
train_data = pd.read_excel('https://github.com/TrustPaul/data/raw/main/usa_train.xlsx')
test_data = pd.read_csv('https://github.com/TrustPaul/epu_data/raw/main/test_usa.csv')
validation_data = pd.read_csv('https://github.com/TrustPaul/epu_data/raw/main/validation_usa.csv')
keywords = pd.read_excel('https://github.com/TrustPaul/data/raw/main/keywords_data.xlsx')


# Data Cleaning of the News articles


In [None]:
def clean_data(text):
    # Strip HTML markup and expand contractions first, while the tags and
    # apostrophes they depend on are still present.
    text = BeautifulSoup(str(text), "html.parser").get_text(" ")
    text = contractions.fix(text)

    # Lower-case the text and drop English stop words.
    stop_set = set(stopwords.words("english"))
    text = " ".join(w for w in text.lower().split() if w not in stop_set)

    # Drop possessive endings, then replace everything except letters,
    # digits, "!" and "^" with a space.
    text = re.sub(r"'s\b", " ", text)
    text = re.sub(r"[^a-z0-9!^]", " ", text)
    text = re.sub(r"!", " ! ", text)
    text = re.sub(r"\^", " ^ ", text)

    # Collapse the runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()

In [None]:
clean_train = train_data['article\n'].map(lambda x: clean_data(x))
clean_test = test_data ['article\n'].map(lambda x: clean_data(x))
clean_validation = validation_data['article\n'].map(lambda x: clean_data(x))

In [None]:
train_df = pd.DataFrame({
    'text': clean_train.astype(str), 
    'label':train_data['EPU']
})


eval_df = pd.DataFrame({
   'text': clean_validation.astype(str), 
   'label':validation_data['EPU']
})


test_df = pd.DataFrame({
    'text': clean_test.astype(str), 
    'label':test_data['EPU']
})

# Label Outcomes of News Article
We formulate the problem as binary classification with the following label values:
 - **1 (`PRESENCE`)**: the article describes policy uncertainty
 - **0 (`ABSCENCE`)**: the article does not describe policy uncertainty
 - **-1 (`ABSTAIN`)**: the labeling function abstains from labeling the article

In [None]:

# Snorkel reserves -1 for abstentions; class labels must be 0 and 1
ABSTAIN = -1
PRESENCE = 1
ABSCENCE = 0

# Writing Labeling Functions

This section defines the functions used to construct the labeling functions.<br>
The labeling functions are based on the following:
- Keywords
- Patterns
- Sentiment of News Articles
- Semantic Similarity of News Articles and the Keywords


      

# Keywords
We use keywords that are hypothesised to indicate that an article containing them describes policy uncertainty.<br>

In [None]:
policy_words = keywords['policy_words'].tolist()
uncertain_events = keywords['known_uncertain_events'].tolist()
economy_words = keywords['econony_related_words'].tolist()
uncertain_keywords = keywords['uncertain'].tolist()

In [None]:
policy_words = [x for x in policy_words if str(x) != 'nan']
uncertain_events = [x for x in uncertain_events if str(x) != 'nan']
economy_words = [x for x in economy_words if str(x) != 'nan']
uncertain_keywords = [x for x in uncertain_keywords if str(x) != 'nan']
keyword_query = policy_words + uncertain_events + economy_words + uncertain_keywords

In [None]:
# Example keyword lists, written in lower case because matching is done on lower-cased text
policy = ['legislation', 'deficit', 'congress', 'white house', 'federal reserve', 'the fed', 'regulations', 'regulatory', 'deficits', 'congressional', 'legislative', 'legislature']
uncertain = ['uncertain', 'uncertainty', 'unclear', 'unsure', 'uncertainties', 'turmoil', 'confusion', 'worries']
economy = ['economic', 'economy', 'economics', 'growth', 'economies', 'financial', 'recession', 'slowdown']
known_events = ['lehman brothers', 'brexit', 'trump elections', 'greece debt', 'debt crisis', 'covid', '9/11', 'bombing', 'great depression', 'crisis']
known_uncertain_words = ['doubt', 'fall', 'unclear', 'pressure', 'dropped', 'future', 'pessimistic']

In [None]:
def keyword_lookup(x, keywords, label):
    if any(word in x.text.lower() for word in keywords):
        return label
    return ABSTAIN


def make_keyword_lf(keywords, label=PRESENCE):
    return LabelingFunction(
        name=f"keyword_{keywords[0]}",
        f=keyword_lookup,
        resources=dict(keywords=keywords, label=label),
    )

uncertain_lf = make_keyword_lf(keywords=uncertain_keywords)
policy_lf = make_keyword_lf(keywords=policy_words)
economy_lf = make_keyword_lf(keywords=economy_words)
known_events_lf = make_keyword_lf(keywords=uncertain_events)
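The keyword lookup above uses plain substring matching, so a short keyword such as "the fed" also fires on "the federal budget". The sketch below shows one possible alternative that matches whole words or phrases only. It is an illustration rather than part of the notebook: `compile_keywords`, `keyword_vote` and the three-item keyword list are hypothetical, and the abstain value of -1 is assumed to follow Snorkel's convention.

```python
import re
from types import SimpleNamespace

ABSTAIN, PRESENCE = -1, 1  # assumed Snorkel-style convention: -1 means "no vote"

def compile_keywords(keywords):
    """Build one case-insensitive regex matching any keyword as a whole word or phrase."""
    escaped = [re.escape(str(k).strip()) for k in keywords if str(k).strip()]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", flags=re.IGNORECASE)

def keyword_vote(x, pattern, label=PRESENCE):
    """Return `label` if any keyword occurs in x.text, otherwise abstain."""
    return label if pattern.search(x.text) else ABSTAIN

# Hypothetical keyword list, for illustration only.
policy_pattern = compile_keywords(["the fed", "congress", "regulation"])

hit = keyword_vote(SimpleNamespace(text="The Fed held rates steady."), policy_pattern)
miss = keyword_vote(SimpleNamespace(text="The federal budget grew."), policy_pattern)
print(hit, miss)
```

A function like `keyword_vote` could be passed to `LabelingFunction` through `resources`, in the same way as `keyword_lookup`, at the cost of missing inflected forms that substring matching would catch.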

# Patterns
We also search for the co-occurrence of keywords from different groups.<br>
The following demonstrates just one such labeling function, which can be revised according to the user's needs.


In [None]:
@labeling_function(resources=dict(uncertain=uncertain_keywords, policy=policy_words))
def pattern(x, uncertain, policy):
    # Vote PRESENCE only when the article contains both an uncertainty term and a policy term
    tokens = set(x.text.split())
    if tokens.intersection(uncertain) and tokens.intersection(policy):
        return PRESENCE
    return ABSTAIN
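Token-set intersection cannot match multi-word keywords such as "white house", and it treats two terms as co-occurring even when they are paragraphs apart. As a hedged sketch of a stricter variant, the code below requires both terms to appear in the same sentence. The helper names are hypothetical, and the sketch assumes it is applied to text that still contains sentence punctuation, which the cleaning step above removes.

```python
import re
from types import SimpleNamespace

ABSTAIN, PRESENCE = -1, 1  # assumed Snorkel-style convention

def contains_term(sentence, terms):
    """True if any term occurs in the sentence as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t.lower()) + r"\b", sentence) for t in terms)

def cooccurs_in_sentence(text, group_a, group_b):
    """True if some sentence contains a term from group_a and a term from group_b."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        if contains_term(sentence, group_a) and contains_term(sentence, group_b):
            return True
    return False

def pattern_vote(x, uncertain_terms, policy_terms):
    """Vote PRESENCE only for same-sentence co-occurrence, otherwise abstain."""
    return PRESENCE if cooccurs_in_sentence(x.text, uncertain_terms, policy_terms) else ABSTAIN

# Hypothetical term lists, for illustration only.
uncertain_terms = ["uncertain", "unclear"]
policy_terms = ["white house", "congress"]

same = pattern_vote(SimpleNamespace(text="The White House outlook is uncertain."), uncertain_terms, policy_terms)
apart = pattern_vote(SimpleNamespace(text="Markets are uncertain. The White House met today."), uncertain_terms, policy_terms)
print(same, apart)
```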


# Sentiment Polarity
We also hypothesised that news articles describing policy uncertainty are more likely to have a negative sentiment polarity.<br>
This is our own hypothesis and is not necessarily supported by any economic theory.<br>
The sentiment polarity threshold is a hyperparameter that the policy analyst can adjust.<br>

In [None]:
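# Assumption: a checkpoint fine-tuned for sentiment classification is needed here;
# this one is the Hugging Face default for the 'sentiment-analysis' task.
pipe = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

@preprocessor(memoize=True)
def textblob_sentiment(x):
    # Truncate long articles to the model's maximum input length
    result = pipe(x.text, truncation=True)[0]
    # Signed polarity in [-1, 1]: negative for NEGATIVE predictions, positive otherwise
    x.polarity = -result['score'] if result['label'] == 'NEGATIVE' else result['score']
    return x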


In [None]:
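@labeling_function(pre=[textblob_sentiment])
def textblob_polarity(x):
    # Hypothesis: strongly negative articles describe policy uncertainty.
    # The 0.9 cut-off is a tunable hyperparameter.
    return PRESENCE if x.polarity <= -0.9 else ABSTAIN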

# Semantic Similarity
We computed the semantic similarity of the news articles to the policy keywords to identify the most related articles in the continuous vector space.<br>
This was achieved by first representing both the keywords and the news articles with SBERT sentence embeddings.<br>
We then computed the cosine similarity between the embeddings of the keywords and the embedding of each news article to build the labeling function.
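The score used by this labeling function is the mean cosine similarity between one article embedding and all keyword embeddings. A minimal sketch of that computation in plain Python follows; the two-dimensional vectors are made-up stand-ins for SBERT embeddings, and `mean_keyword_similarity` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_keyword_similarity(doc_vec, keyword_vecs):
    """Average cosine similarity between one document and every keyword."""
    return sum(cosine(k, doc_vec) for k in keyword_vecs) / len(keyword_vecs)

# Toy vectors standing in for sentence embeddings.
keyword_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vec = [1.0, 0.0]

score = mean_keyword_similarity(doc_vec, keyword_vecs)
print(score)
```

Averaging over all keywords means that a long keyword list dilutes a single strong match, which is one reason the similarity threshold has to be tuned together with the keyword list.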

In [None]:
sbert_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
# Embed the keyword query once instead of once per article
query_embeddings = sbert_model.encode(keyword_query)

@preprocessor(memoize=True)
def cosine(x):
    document_embedding = sbert_model.encode(x.text)
    cosine_score = util.cos_sim(query_embeddings, document_embedding)
    # Mean cosine similarity between the article and all keywords
    x.cosine_score1 = cosine_score.mean().item()
    return x


In [None]:
@labeling_function(pre=[cosine])
def cosine_similarity(x):
    return ABSCENCE if x.cosine_score1 <=0.08 else ABSTAIN

# Information Retrieval
We also used a labeling function that retrieves articles that satisfy a certain threshold on the BM25 score.

In [None]:
import math
class BM25:
    """
    Best Match 25.

    Parameters
    ----------
    k1 : float, default 1.5

    b : float, default 0.75

    Attributes
    ----------
    tf_ : list[dict[str, int]]
        Term Frequency per document. So [{'hi': 1}] means
        the first document contains the term 'hi' 1 time.

    df_ : dict[str, int]
        Document Frequency per term. i.e. Number of documents in the
        corpus that contains the term.

    idf_ : dict[str, float]
        Inverse Document Frequency per term.

    doc_len_ : list[int]
        Number of terms per document. So [3] means the first
        document contains 3 terms.

    corpus_ : list[list[str]]
        The input corpus.

    corpus_size_ : int
        Number of documents in the corpus.

    avg_doc_len_ : float
        Average number of terms for documents in the corpus.
    """

    def __init__(self, k1=1.5, b=0.75):
        self.b = b
        self.k1 = k1

    def fit(self, corpus):
        """
        Fit the various statistics that are required to calculate BM25 ranking
        score using the corpus given.

        Parameters
        ----------
        corpus : list[list[str]]
            Each element in the list represents a document, and each document
            is a list of the terms.

        Returns
        -------
        self
        """
        tf = []
        df = {}
        idf = {}
        doc_len = []
        corpus_size = 0
        for document in corpus:
            corpus_size += 1
            doc_len.append(len(document))

            # compute tf (term frequency) per document
            frequencies = {}
            for term in document:
                term_count = frequencies.get(term, 0) + 1
                frequencies[term] = term_count

            tf.append(frequencies)

            # compute df (document frequency) per term
            for term, _ in frequencies.items():
                df_count = df.get(term, 0) + 1
                df[term] = df_count

        for term, freq in df.items():
            idf[term] = math.log(1 + (corpus_size - freq + 0.5) / (freq + 0.5))

        self.tf_ = tf
        self.df_ = df
        self.idf_ = idf
        self.doc_len_ = doc_len
        self.corpus_ = corpus
        self.corpus_size_ = corpus_size
        self.avg_doc_len_ = sum(doc_len) / corpus_size
        return self

    def search(self, query):
        scores = [self._score(query, index) for index in range(self.corpus_size_)]
        return scores

    def _score(self, query, index):
        score = 0.0

        doc_len = self.doc_len_[index]
        frequencies = self.tf_[index]
        for term in query:
            if term not in frequencies:
                continue

            freq = frequencies[term]
            numerator = self.idf_[term] * freq * (self.k1 + 1)
            denominator = freq + self.k1 * (1 - self.b + self.b * doc_len / self.avg_doc_len_)
            score += (numerator / denominator)

        return score

In [None]:
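# Split multi-word keywords into single terms so that they can match article tokens
bm25_query = [term for kw in keyword_query for term in str(kw).lower().split()]

@preprocessor(memoize=True)
def bm25score(x):
    # BM25.fit expects a list of tokenized documents; here the corpus is the
    # single article, so the statistics are computed per article.
    bm25 = BM25().fit([x.text.split()])
    x.bm25_score = bm25.search(bm25_query)[0]
    return x

@labeling_function(pre=[bm25score])
def scorebm25(x):
    return ABSCENCE if x.bm25_score <= 0.1 else ABSTAIN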
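BM25 is normally fitted on a whole corpus, so that the inverse document frequency and the average document length reflect every article. As a hedged illustration of that usage, the compact function below scores each document of a tiny made-up corpus against a two-term query. It uses the same formula as the `BM25` class above; `bm25_scores` and the example sentences are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every tokenized document in `corpus` against the tokenized `query`."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    idf = {t: math.log(1 + (n_docs - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            numerator = idf[term] * tf[term] * (k1 + 1)
            denominator = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += numerator / denominator
        scores.append(score)
    return scores

# Made-up corpus, for illustration only.
corpus = [
    "congress debates uncertain tax policy".split(),
    "local team wins the championship game".split(),
    "markets fall as policy outlook stays uncertain".split(),
]
scores = bm25_scores(corpus, ["uncertain", "policy"])
print(scores)
```

The second document shares no term with the query and scores zero. The first and third contain both query terms once, and the shorter first document scores higher because of the length normalisation controlled by `b`.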

# Generative Model
The multiple noisy sources provided by the labeling functions are combined to form a label matrix.<br>
A generative model is then used to infer a single label for each article from the label matrix by modeling the correlations and conflicts between labeling functions as factors.<br>
We do not have access to the ground truth, so supervision comes only from this generative modeling.
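To make the label matrix concrete, the sketch below applies three toy labeling functions to three toy articles and combines the votes by majority, the simplest baseline for the generative step. The labeling functions, the example texts and the choice to abstain on ties are assumptions made for illustration.

```python
from collections import Counter

ABSTAIN, ABSENT, PRESENT = -1, 0, 1  # assumed Snorkel-style label values

def build_label_matrix(articles, labeling_functions):
    """One row per article, one column per labeling function."""
    return [[lf(text) for lf in labeling_functions] for text in articles]

def majority_vote(row):
    """Most common non-abstain vote; abstain when nobody votes or on a tie."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

# Toy labeling functions and articles, for illustration only.
toy_lfs = [
    lambda t: PRESENT if "uncertain" in t else ABSTAIN,
    lambda t: PRESENT if "policy" in t else ABSTAIN,
    lambda t: ABSENT if "football" in t else ABSTAIN,
]
articles = [
    "tax policy remains uncertain",
    "football season opens",
    "uncertain football transfer",
]

L = build_label_matrix(articles, toy_lfs)
labels = [majority_vote(row) for row in L]
print(L, labels)
```

A generative label model differs from this baseline by weighting each labeling function according to an accuracy that it estimates from the agreements and disagreements in the matrix, rather than counting every vote equally.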



## Snorkel and Majority Vote

In [None]:
m = 8
lfs=[uncertain_lf,policy_lf,economy_lf,known_events_lf,pattern,textblob_polarity,cosine_similarity,scorebm25]

In [None]:
applier = PandasLFApplier(lfs=lfs)
L_train =applier.apply(df=train_df)
L_test = applier.apply(df=test_df)

100%|██████████| 2450/2450 [08:40<00:00,  4.70it/s]
100%|██████████| 817/817 [02:30<00:00,  5.43it/s]


In [None]:
from snorkel.labeling import LFAnalysis
LFAnalysis(L=L_train,lfs=lfs).lf_summary()

| Labeling function | j | Polarity | Coverage | Overlaps | Conflicts |
|---|---|---|---|---|---|
| keyword_uncertain | 0 | [0, 1] | 1.0 | 1.0 | 0.997143 |
| keyword_regulation | 1 | [0, 1] | 1.0 | 1.0 | 0.997143 |
| keyword_economy | 2 | [0, 1] | 1.0 | 1.0 | 0.997143 |
| keyword_gulf wars | 3 | [0, 1] | 1.0 | 1.0 | 0.997143 |
| pattern | 4 | [0, 1] | 1.0 | 1.0 | 0.997143 |
| textblob_polarity | 5 | [] | 0.0 | 0.0 | 0.0 |
| cosine_similarity | 6 | [0] | 1.0 | 1.0 | 0.997143 |
| scorebm25 | 7 | [] | 0.0 | 0.0 | 0.0 |
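The three rate columns of this summary can be recomputed by hand from a label matrix. The sketch below does so under the usual definitions, which we assume match those of `LFAnalysis`: coverage is the share of articles a function labels, overlaps the share it labels together with at least one other function, and conflicts the share where another function disagrees with it. The function `lf_summary` here and the 4 x 3 matrix are hypothetical.

```python
ABSTAIN = -1  # assumed Snorkel-style abstain value

def lf_summary(L):
    """Coverage, overlap and conflict rates for each column of a label matrix."""
    n_rows, n_lfs = len(L), len(L[0])
    summary = []
    for j in range(n_lfs):
        covered = overlaps = conflicts = 0
        for row in L:
            if row[j] == ABSTAIN:
                continue
            covered += 1
            # Votes cast on the same article by the other labeling functions.
            others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
            if others:
                overlaps += 1
            if any(v != row[j] for v in others):
                conflicts += 1
        summary.append({
            "coverage": covered / n_rows,
            "overlaps": overlaps / n_rows,
            "conflicts": conflicts / n_rows,
        })
    return summary

# Toy label matrix: 4 articles, 3 labeling functions.
L = [
    [1, 1, -1],
    [-1, -1, 0],
    [1, -1, 0],
    [-1, -1, -1],
]
stats = lf_summary(L)
print(stats)
```

Under these definitions, a coverage of 1.0 means the function never abstained, and a coverage of 0.0 means it abstained on every article.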


In [None]:
majority_model = MajorityLabelVoter()
preds_train = majority_model.predict(L=L_train)

## Labeling with Snorkel
[paper and code](https://www.snorkel.org/resources/)

In [None]:
label_model = LabelModel(cardinality=2,verbose=True)
label_model.fit(L_train=L_train,n_epochs=10,log_freq=100)

INFO:root:Computing O...
INFO:root:Estimating \mu...
  0%|          | 0/10 [00:00<?, ?epoch/s]INFO:root:[0 epochs]: TRAIN:[loss=18.322]
100%|██████████| 10/10 [00:00<00:00, 478.20epoch/s]
INFO:root:Finished Training


## Labeling with FlyingSquid
[Check out the paper and code](https://github.com/HazyResearch/flyingsquid)

In [None]:
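from flyingsquid.label_model import LabelModel as FlyingSquidLabelModel

# FlyingSquid encodes votes as -1 (negative), 0 (abstain) and +1 (positive)
L_train_fs = np.where(L_train == PRESENCE, 1, np.where(L_train == ABSCENCE, -1, 0))
label_model_flyingsquid = FlyingSquidLabelModel(m)
label_model_flyingsquid.fit(L_train_fs)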



## Labeling with WeaSEL
[Paper and code](https://github.com/autonlab/weasel)

In [None]:
from weasel.models.downstream_models.transformers import Transformers

# instantiate our transformers end model
end_model = Transformers("google/electra-small-discriminator", num_labels=2)


In [None]:
from weasel.models import Weasel

# instantiate our weasel end-to-end model
weasel = Weasel(
    end_model=end_model,
    num_LFs=m,
    n_classes=2,
    encoder={'hidden_dims': [32, 10]},
    optim_encoder={'name': 'adam', 'lr': 1e-4},
    optim_end_model={'name': 'adam', 'lr': 5e-5},
)


In [None]:
from transformers import AutoTokenizer
from weasel.datamodules.transformers_datamodule import TransformersDataModule, TransformersCollator

# tokenizer for our transformers end model
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-discriminator")

# tokenize train and test data
X_train = [
    tokenizer(rec, truncation=True)
    for rec in train_df['text']
]
X_test = [
    tokenizer(rec, truncation=True)
    for rec in test_df['text']
]
Y_test = np.array(test_df['label'])

In [None]:
datamodule = TransformersDataModule(
    label_matrix=L_train,
    X_train=X_train,
    collator=TransformersCollator(tokenizer),
    X_test=X_test,
    Y_test=Y_test,
    batch_size=8
)

In [None]:
import pytorch_lightning as pl

# instantiate the pytorch-lightning trainer
trainer = pl.Trainer(
    gpus=1,  # >= 1 to use GPU(s)
    max_epochs=20,
    logger=None,
    callbacks=[pl.callbacks.ModelCheckpoint(monitor="Val/accuracy", mode="max")]
)

# fit the model end-to-end
trainer.fit(
    model=weasel,
    datamodule=datamodule,
)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
  "The `LightningModule.get_progress_bar_dict` method was deprecated in v1.5 and will be removed in v1.7."
INFO:weasel.datamodules.base_datamodule:Data split sizes for training, validation, testing: 2450, 81, 736
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type         | Params
-----------------------------------------------
0 | end_model     | Transformers | 13.5 M
1 | encoder       | MLPEncoder   | 878   
2 | accuracy_func | Softmax      | 0     
-----------------------------------------------
13.6 M    Trainable params
0         Non-trainable params
13.6 M    Total params
54.201    Total estimated model params size (MB)



# Evaluation

In [None]:
Y_test = test_df['label'].values

majority_acc = majority_model.score(L=L_test, Y=Y_test, tie_break_policy="random")[
    "accuracy"
]
print(f"{'Majority Vote Accuracy:':<25} {majority_acc * 100:.1f}%")

label_model_acc = label_model.score(L=L_test, Y=Y_test, tie_break_policy="random")[
    "accuracy"
]
print(f"{'Label Model Accuracy:':<25} {label_model_acc * 100:.1f}%")

Majority Vote Accuracy:   50.6%



In [None]:
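# FlyingSquid encodes votes as -1 (negative), 0 (abstain) and +1 (positive)
L_test_fs = np.where(L_test == PRESENCE, 1, np.where(L_test == ABSCENCE, -1, 0))
fs_preds = label_model_flyingsquid.predict(L_test_fs).reshape(test_df['label'].shape)
# Map FlyingSquid's -1/+1 predictions back to the 0/1 labels of the test set
preds = (fs_preds == 1).astype(int)
accuracy = np.sum(preds == test_df['label'].values) / test_df['label'].shape[0]

print('Flying Squid accuracy: {}%'.format(int(100 * accuracy)))

Flying Squid accuracy: 37%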


# Discriminative Model
We use the generated noisy labels to train a discriminative model with a noise-aware objective function.<br>
In our case, we use the generated labels to train a RoBERTa model.
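One common way to make the objective noise-aware is to minimise the cross-entropy between the model's predicted probabilities and the probabilistic labels from the label model, instead of hard 0/1 targets. The sketch below illustrates that formulation in plain Python. It is an assumption about what such an objective can look like, not the training code of this notebook, whose cells train on hard labels; `noise_aware_loss` and the example probabilities are hypothetical.

```python
import math

def noise_aware_loss(pred_probs, soft_labels, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and probabilistic labels."""
    total = 0.0
    for p, q in zip(pred_probs, soft_labels):
        # Each class contributes in proportion to the label model's belief q_c.
        total += -sum(q_c * math.log(max(p_c, eps)) for p_c, q_c in zip(p, q))
    return total / len(pred_probs)

# Probabilistic labels [P(absent), P(present)] for two articles (made-up values).
soft_labels = [[0.2, 0.8], [0.9, 0.1]]

agreeing = [[0.2, 0.8], [0.9, 0.1]]     # predictions that match the soft labels
disagreeing = [[0.8, 0.2], [0.1, 0.9]]  # predictions that contradict them

loss_agree = noise_aware_loss(agreeing, soft_labels)
loss_disagree = noise_aware_loss(disagreeing, soft_labels)
print(loss_agree, loss_disagree)
```

With one-hot labels this reduces to the standard cross-entropy loss, so an article on which the label model is nearly certain is treated almost like a hand-labeled one, while an article with a 0.5/0.5 label pulls the model equally toward both classes.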

In [None]:
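from snorkel.labeling import filter_unlabeled_dataframe
from snorkel.utils import probs_to_preds

# Drop articles on which every labeling function abstained, then turn the
# label model's probabilities into hard training labels
probs_train = label_model.predict_proba(L=L_train)
train_filtered, probs_filtered = filter_unlabeled_dataframe(X=train_df, y=probs_train, L=L_train)
preds_train = probs_to_preds(probs=probs_filtered)

In [None]:
train_noisy = pd.DataFrame({
    'text': train_filtered['text'],
    'label': preds_train
})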

In [None]:
model = ClassificationModel('roberta', 'roberta-base', 
                                    args={'fp16': False, 
                                          'evaluate_during_training': False, 
                                          'num_train_epochs':4, 
                                          'train_batch_size': 8, 
                                          'eval_batch_size': 8, 
                                          'max_seq_length': 200, 
                                          'learning_rate': 4e-5, 
                                          'output_dir': 'outputs/roberta/'
                                    })


In [None]:
model.train_model(train_noisy)

In [None]:
# Each extra keyword argument is a metric called as metric(true_labels, predictions)
result_roberta_train, model_outputs_train_rob, wrong_predictions_train_rob = model.eval_model(
    test_df,
    verbose=True,
    test_acc=sklearn.metrics.accuracy_score,
    test_f1=sklearn.metrics.f1_score,
    test_roc_auc=sklearn.metrics.roc_auc_score,
)

# Baselines
We also ran experiments with models trained on human-annotated data.<br>
This differs from the weak supervision approach, which requires no annotated data to train models and relies instead on domain heuristics that domain experts can write and adjust.<br>
The implementations of these baselines were largely adapted from the following website: [Link](https://humboldt-wi.github.io/blog/research/information_systems_1920/uncertainty_identification_transformers/)


## Support Vector Machines

In [None]:
Tfidf_vect = TfidfVectorizer(max_features=200)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(train_df['text'].values.astype('U'))
Tfidf_vect.fit(train_df['text'].values.astype('U'))

In [None]:
Train_X_Tfidf = Tfidf_vect.transform(train_df['text'].values.astype('U'))
Test_X_Tfidf = Tfidf_vect.transform(test_df['text'].values.astype('U'))

In [None]:
SVM = svm.SVC(C=1.0, kernel='linear', degree=1, gamma='auto', probability=True)
SVM.fit(Train_X_Tfidf, train_df['label'])

In [None]:
predictions_SVM = SVM.predict(Test_X_Tfidf)
# scikit-learn metrics take the true labels first, then the predictions
sklearn.metrics.precision_score(test_df['label'], predictions_SVM)

## LSTM

In [None]:
vocabulary_size = 500
tokenizer = Tokenizer(num_words= vocabulary_size)
tokenizer.fit_on_texts(train_df['text'].astype(str))
sequences = tokenizer.texts_to_sequences(train_df['text'].astype(str))
data = pad_sequences(sequences, maxlen=200)
test_sequences= tokenizer.texts_to_sequences(test_df['text'].astype(str))
test_data_LSTM = pad_sequences(test_sequences, maxlen=200)

In [None]:
from keras import backend as K

def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))

In [None]:
## Network architecture
model = Sequential()
model.add(Embedding(5000, 128, input_length=200))
model.add(Bidirectional(LSTM(16)))
model.add(Dense(1, activation='sigmoid'))

# compile and train the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc', f1_m, precision_m, recall_m])
model.fit(data, train_df['label'], validation_data=(test_data_LSTM, test_df['label']), epochs=10, batch_size=128)

In [None]:
# evaluate the model
loss, accuracy, f1_score, precision, recall = model.evaluate(test_data_LSTM, test_df['label'], verbose=0)

## Transformers

In [None]:
model = ClassificationModel('roberta', 'roberta-base', 
                                    args={'fp16': False, 
                                          'evaluate_during_training': False, 
                                          'num_train_epochs':4, 
                                          'train_batch_size': 8, 
                                          'eval_batch_size': 8, 
                                          'max_seq_length': 200, 
                                          'learning_rate': 4e-5, 
                                          'output_dir': 'outputs/roberta/', 
                                          'overwrite_output_dir': True
                                    })

In [None]:
model.train_model(train_df)

In [None]:
# Each extra keyword argument is a metric called as metric(true_labels, predictions)
result_roberta_train, model_outputs_train_rob, wrong_predictions_train_rob = model.eval_model(
    test_df,
    verbose=True,
    test_acc=sklearn.metrics.accuracy_score,
    test_f1=sklearn.metrics.f1_score,
    test_roc_auc=sklearn.metrics.roc_auc_score,
)


# **Conclusions**


In this paper, we presented and evaluated the results of applying neural models to automatic economic policy uncertainty detection from news articles. We used news articles from both the USA and Ireland, and compared weak supervision with models trained on extensive human labels. We find that even though state-of-the-art methods trained with many labels outperform weak supervision in some cases, the gap in performance is small and the trade-off can be accommodated in most economic applications. With the weak supervision setup presented here, we aim at timely results for policy decisions, instead of spending hundreds of hours on data annotation. Our results show that weak supervision can play a significant role in applying ML methods to measuring policy uncertainty from text, with much higher precision than the current practice of constructing EPU indices by counting words from a query. For future work, we intend to explore complementing weak supervision with a small set of carefully selected annotated examples, chosen through active learning or data subset selection, as well as working on strategies for writing labeling functions and on multi-label classification of different types of policy uncertainty.

