# Part 2: Norm Identification

In this notebook, we identify potential norms (powers and obligations) in our filtered set of contracts.

We do this heuristically: we keep every sentence that contains one of the keywords (shall, must, will, may, can, obligated, required).

The result is a large set of candidate norms, which must be inspected manually to filter out false positives.

## Load the filtered set of contracts

```python
import pickle
import re
import random
```

```python
ROOT_PATH = '/content/drive/MyDrive/Masters/Thesis/contracts/rq3_actual'

INPUT_FILE = 'selected_docs.pickle'

OUTPUT_FILE = '2_output_tagged_norms.csv'
```

```python
# Load contract subset
with open(f'{ROOT_PATH}/{INPUT_FILE}', 'rb') as f:
    c_docs = pickle.load(f)

print(len(c_docs))
```

```
109
```


## Extract the potential norms

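```python
def contains_norm(sent):
    """Return True if the sentence text contains any norm keyword.

    This is a case-sensitive substring test, so a keyword also matches
    when it appears inside a longer word.
    """
    keywords = ['shall', 'must', 'will', 'may', 'can', 'obligated', 'required']
    return any(k in sent.text for k in keywords)
```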
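Because `contains_norm` tests for substrings, it also fires on words that merely contain a keyword, such as *significant* (which contains `can`) or *goodwill* (which contains `will`), adding to the false positives that have to be removed by hand. A sketch of a stricter variant, assuming plain sentence strings and a hypothetical helper `contains_norm_word`, matches whole words only with a regular expression:

```python
import re

# Hypothetical stricter variant: match norm keywords as whole words only
NORM_KEYWORDS = ['shall', 'must', 'will', 'may', 'can', 'obligated', 'required']
NORM_PATTERN = re.compile(r'\b(?:' + '|'.join(NORM_KEYWORDS) + r')\b')


def contains_norm_word(text):
    """Return True if `text` contains a norm keyword as a standalone word."""
    return NORM_PATTERN.search(text) is not None


print(contains_norm_word('The Supplier shall deliver the goods.'))   # True
print(contains_norm_word('A significant amount of goodwill.'))        # False
print(contains_norm_word('The Buyer cannot assign this Agreement.'))  # False
```

Whole-word matching no longer catches *cannot*, which the substring test does pick up and which may express a prohibition, so it would have to be listed explicitly. Like the original, the pattern is case-sensitive: adding `re.IGNORECASE` would catch a sentence-initial *Shall*, but also the month *May*.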


```python
def extract_norms(c_doc, k):
    """Return (contract key, sentence text) pairs for sentences that contain a norm keyword."""
    return [(k, sent.text) for sent in c_doc.sents if contains_norm(sent)]
```



```python
def get_all_norms(c_docs):
    """Collect the potential norms of every contract into one flat list."""
    results = []
    for k, c_doc in c_docs.items():
        results.extend(extract_norms(c_doc, k))
    return results
```
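These functions only rely on each document exposing `.sents` and each sentence exposing `.text`. A minimal sketch with hypothetical stand-in classes `Sent` and `Doc` (they are not part of the notebook, and simplified copies of the three functions are re-declared so the block runs on its own) shows the shape of the result without loading the pickle: a flat list of `(contract key, sentence)` tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Sent:
    """Hypothetical stand-in for a sentence span: only `.text` is used."""
    text: str


@dataclass
class Doc:
    """Hypothetical stand-in for a parsed contract: only `.sents` is used."""
    sents: list = field(default_factory=list)


def contains_norm(sent):
    keywords = ['shall', 'must', 'will', 'may', 'can', 'obligated', 'required']
    return any(k in sent.text for k in keywords)


def extract_norms(c_doc, k):
    return [(k, sent.text) for sent in c_doc.sents if contains_norm(sent)]


def get_all_norms(c_docs):
    results = []
    for k, c_doc in c_docs.items():
        results.extend(extract_norms(c_doc, k))
    return results


docs = {
    'contract_a': Doc([Sent('The Tenant shall pay rent monthly.'),
                       Sent('This Agreement is dated 1 March.')]),
    'contract_b': Doc([Sent('Either party may terminate on notice.')]),
}
print(get_all_norms(docs))
# [('contract_a', 'The Tenant shall pay rent monthly.'), ('contract_b', 'Either party may terminate on notice.')]
```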


```python
all_norms = get_all_norms(c_docs)

len(all_norms)
```

```
2736
```

## Add refinement heuristics

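Next, we search each potential norm for keywords that may indicate a refinement: temporal refinements (`before`, `after`, `interval`), conditions (`cond_t`, `cond_a`), and exceptions (`unless`). Matching is case-insensitive, and a sentence can receive several categories, which are stored as a comma-separated string.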

```python
before = ['before', 'prior', 'earlier', 'advance', 'ahead', 'by', 'until']

after = ['after', 'following', 'later']

interval = ['between', 'during', 'from', 'for']

cond_t = ['when', 'whenever']

cond_a = ['if', 'once', 'upon', 'provided', 'in the event', 'in case']

unless = ['unless', 'without', 'except']

keyword_dict = {
    'before': before,
    'after': after,
    'interval': interval,
    'cond_t': cond_t,
    'cond_a': cond_a,
    'unless': unless
}
```
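The tagging cell below checks these keywords as lowercase substrings, so short keywords also fire inside longer words: `by` inside *hereby*, `for` inside *information*, `if` inside *notify*. A sketch of a whole-word alternative, assuming an abbreviated copy of `keyword_dict` and a hypothetical helper `tag_refinements`, compiles one case-insensitive pattern per category; multi-word phrases such as *in the event* are matched as written:

```python
import re

# Abbreviated copy of keyword_dict, for illustration only
keyword_dict = {
    'before': ['before', 'prior', 'by', 'until'],
    'interval': ['between', 'during', 'from', 'for'],
    'cond_a': ['if', 'once', 'upon', 'provided', 'in the event', 'in case'],
}


def tag_substring(sentence):
    """Lowercase substring matching, as in the tagging cell."""
    text = sentence.lower()
    return ','.join(cat for cat, words in keyword_dict.items()
                    if any(w in text for w in words))


# Hypothetical alternative: one case-insensitive whole-word pattern per category
CATEGORY_PATTERNS = {
    cat: re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b',
                    re.IGNORECASE)
    for cat, words in keyword_dict.items()
}


def tag_refinements(sentence):
    """Return the categories whose keywords occur as whole words, comma-separated."""
    return ','.join(cat for cat, pat in CATEGORY_PATTERNS.items()
                    if pat.search(sentence))


s = 'The Seller shall notify the Buyer of any information hereby disclosed.'
print(tag_substring(s))    # before,interval,cond_a
print(tag_refinements(s))  # prints an empty line: no whole-word keyword
print(tag_refinements('In the event of default, the Buyer shall pay before 1 May.'))
# before,cond_a
```

Whole-word matching removes these accidental hits but not every false positive: *for* in "for the avoidance of doubt" still tags `interval`.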

```python
next_res = []

for doc_key, sentence in all_norms:
    text = sentence.lower()
    # Each category is recorded at most once, even if several of its keywords match
    norm_keys = [cat for cat, words in keyword_dict.items()
                 if any(w in text for w in words)]
    # (document key, sentence, comma-separated refinement categories)
    next_res.append((doc_key, sentence, ','.join(norm_keys)))
```

```python
# Number of potential norms that match none of the refinement keywords
len([x for x in next_res if x[2] == ''])
```

```
410
```
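To see how the refinement categories are distributed, the comma-separated tags can be counted. A sketch, assuming rows shaped like `next_res` (document key, sentence, tags); the three rows below are made up for illustration and are not from the dataset:

```python
from collections import Counter

# Made-up rows in the same shape as next_res: (doc key, sentence, tags)
rows = [
    ('c1', 'The Buyer shall pay before delivery.', 'before'),
    ('c1', 'If the goods are damaged, the Seller shall replace them.', 'cond_a'),
    ('c2', 'The Tenant must pay rent.', ''),
]

# Count each category at most once per sentence; an empty tag string means "no refinement"
tag_counts = Counter(tag for _, _, tags in rows for tag in set(tags.split(',')) if tag)
untagged = sum(1 for _, _, tags in rows if not tags)

print(tag_counts)  # Counter({'before': 1, 'cond_a': 1})
print(untagged)    # 1
```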

## Save

```python
# Randomize the row order before saving; no seed is set, so the order differs between runs
random.shuffle(next_res)
```

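```python
import csv

filename = f'{ROOT_PATH}/{OUTPUT_FILE}'

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(filename, 'w', newline='', encoding='utf-8') as out:
    csv_out = csv.writer(out)
    # One row per potential norm: document key, sentence, refinement categories
    csv_out.writerows(next_res)
```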
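For the manual inspection step, the saved file can be read back with `csv.reader`. A sketch, assuming a hypothetical seed value and a temporary file in place of `OUTPUT_FILE`, shows a reproducible shuffle (a dedicated `random.Random(seed)` instance) and checks that the rows survive the round trip, including sentences that contain commas or quotes:

```python
import csv
import os
import random
import tempfile

# Made-up rows in the shape of next_res; sentences may contain commas and quotes
rows = [
    ('c1', 'The Buyer shall pay, before delivery, the full "Price".', 'before'),
    ('c2', 'The Tenant must pay rent.', ''),
    ('c3', 'If notified, the Seller may cancel.', 'cond_a'),
]

SEED = 13  # hypothetical seed; any fixed integer makes the order reproducible
shuffled = list(rows)
random.Random(SEED).shuffle(shuffled)

path = os.path.join(tempfile.mkdtemp(), 'tagged_norms.csv')
with open(path, 'w', newline='', encoding='utf-8') as out:
    csv.writer(out).writerows(shuffled)

# Read the rows back as tuples so they compare equal to the written ones
with open(path, newline='', encoding='utf-8') as f:
    read_back = [tuple(r) for r in csv.reader(f)]

print(read_back == shuffled)  # True
```

Using a separate `random.Random` instance keeps the shuffle reproducible without affecting other code that relies on the global `random` state.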