# Camunda
Camunda appears to be a useful tool for automating business processes. An easy-to-use tool for managing those processes is a significant advantage.

Implementing Camunda with Java and Spring should not be too difficult, since Camunda is designed to work with Java and the Spring Framework. Once Camunda is in place, users can work with it through Spring beans to automate business processes and workloads. Plenty of online tutorials provide step-by-step implementation guides, and the documentation covers further details.

## Reading the Data

In [1]:
# Import libraries
import slate3k
from nltk.tokenize import sent_tokenize
import nltk
nltk.download('punkt')
import re

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\diggy\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
# Read PDF

FILE = '../data/LINCOLN_CONTRACT.pdf'

with open(FILE, mode='rb') as f:
    reader = slate3k.PDF(f)

## Preprocessing

In [3]:
# Extract the text of each page and tokenise it into sentences
token_doc = []
for page in reader:
    token_doc.append(sent_tokenize(page))

In [4]:
# Strip newlines from each sentence
token_doc = [[sentence.replace('\n', '') for sentence in page] for page in token_doc]
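
Removing `\n` outright can glue together words that were separated only by a line break, which the `orThe` in the printed output further down seems to suggest. A minimal sketch of an alternative, assuming it is acceptable to collapse every run of whitespace into a single space; the helper name `normalise_whitespace` and the sample sentence are hypothetical and not part of the notebook:

```python
import re

def normalise_whitespace(sentence):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", sentence).strip()

# Hypothetical sample standing in for the sentences of one tokenised page
sample_page = ["same home \n or land; or\nThe Owner received advice."]
cleaned = [normalise_whitespace(s) for s in sample_page]
print(cleaned)
```

This keeps a space wherever the PDF had a line break, at the cost of also flattening any intentional double spacing.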

## Finding Terms and Conditions

In [112]:
# Words and phrases that may signal terms and conditions
word_list = ["term", "terms", "condition", "conditions", "means", "must", "will", "has to", "required to",
            "acknowledge", "shall have", "entitled", "liable", "obligation", "obligated"]
#r_match =  r"(?:\d\.)+"
# Match clause markers such as (a), (b), ... up to (k)
r_match = r"\([a-k]\)"
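
To see what the clause-marker pattern accepts, here is a small check on made-up strings. The sample sentences are illustrative and are not taken from the contract:

```python
import re

r_match = r"\([a-k]\)"

samples = [
    "(a) the Builder must complete the works",  # lowercase marker inside the a-k range
    "(z) out of range",                         # z lies outside a-k
    "(A) uppercase marker",                     # the pattern is case-sensitive
    "see clause 4.2",                           # no parenthesised letter at all
]
for text in samples:
    # Print whether the pattern is found anywhere in the string
    print(bool(re.search(r_match, text)), text)
```

Only the first sample matches, so markers beyond `(k)` or written in uppercase would be missed unless the character class is widened.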

In [113]:
def match_words(sentence):
    """
    Find potential words in a sentence that may refer to the terms and conditions
    
    sentence: A string from a page
    
    returns None if no word in a word list shows up in the sentence, otherwise, returns the sentence
    """
    sentence_copy = sentence.lower()
    for word in word_list:
        if word in sentence_copy:
            return sentence
    return None
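
Because `match_words` uses substring tests, a short entry such as "term" also fires inside longer words like "determine". A possible refinement, sketched here under the assumption that whole-word matching is what is wanted, compiles the list into one regular expression with word boundaries. The names `build_pattern` and `match_whole_words` are hypothetical, and the word list is shortened for illustration:

```python
import re

def build_pattern(words):
    """Compile a case-insensitive pattern matching any entry as a whole word or phrase."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def match_whole_words(sentence, pattern):
    """Return the sentence if the pattern matches, otherwise None."""
    return sentence if pattern.search(sentence) else None

# Shortened, illustrative word list
pattern = build_pattern(["term", "terms", "must", "has to"])
print(match_whole_words("The Owner must pay.", pattern))
print(match_whole_words("We determine the cost.", pattern))
```

Whether this is preferable depends on the contract: whole-word matching reduces false positives but will no longer catch inflected forms that are not listed explicitly.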

In [119]:
# Find sentences that could refer to the terms and conditions.
terms_cond_list = []
for page in token_doc:
    for sentence in page:
        # Skip all-uppercase lines (headings), then keep keyword or clause-marker matches
        if not sentence.isupper() and (match_words(sentence) is not None or re.search(r_match, sentence)):
            terms_cond_list.append(sentence)
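
If it is useful to know where each candidate came from, the page number can be kept alongside the sentence. The following is a sketch on toy data rather than the notebook's own code: `is_candidate`, `keywords`, `toy_doc`, and `located` are hypothetical names, and the keyword list is shortened.

```python
import re

r_match = r"\([a-k]\)"
keywords = ["must", "liable"]  # shortened, illustrative list

def is_candidate(sentence):
    """True for non-uppercase sentences containing a keyword or an (a)-(k) marker."""
    if sentence.isupper():
        return False
    lowered = sentence.lower()
    return any(k in lowered for k in keywords) or bool(re.search(r_match, sentence))

# Toy stand-in for token_doc: one list of sentences per page
toy_doc = [
    ["TABLE OF CONTENTS", "The Builder must give notice."],
    ["(b) the Owner pays the deposit.", "Nothing relevant here."],
]

# Pair each matching sentence with its 1-based page number
located = [
    (page_no, sentence)
    for page_no, page in enumerate(toy_doc, start=1)
    for sentence in page
    if is_candidate(sentence)
]
print(located)
```

Keeping the page number makes it easier to check a flagged sentence against the original PDF.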

In [120]:
# Checking the sentences to see if they look like terms and conditions.
for sentence in terms_cond_list:
    print(sentence)
    print()

The Builder and the Owner have previously entered into a major domestic building contract  that is in substantially the same terms for the carrying out of work in relation to the same home  or land; orThe Owner received independent advice from an Australian legal practitioner concerning the  Contract before entering the Contract.

SUSPENSION OF WORKS TERMINATION BY BUILDER TERMINATION BY OWNER DUE TO BUILDER’S DEFAULT SECTION 41 TERMINATION TERMINATION BY EITHER PARTY DUE TO BANKRUPTCY OR LIQUIDATION COMPLETION OF THE WORKS LIQUIDATED DAMAGES DEFECTS LIABILITY PERIOD MAINTENANCE OF THE WORKS DISPUTE RESOLUTION NOTICES GST COPYRIGHT MISCELLANEOUS CHECKLIST FOR THE OWNER INSTRUMENT OF AGREEMENT DEED OF GUARANTEE AND INDEMNITY SCHEDULE 1 – SPECIAL CONDITIONS SCHEDULE 2 – NOTICE OF COMMENCEMENT DATE SCHEDULE 3 – EXTENSION OF TIME NOTICE SCHEDULE 4 – REQUEST FOR VARIATION SCHEDULE 5 – AGENT’S AUTHORITY FORM SCHEDULE 6 – NOTICE OF COMPLETION PRE-INSPECTION SCHEDULE 7 – LIST OF DEFECTS SCHEDU

In [121]:
# Write output to text file
NEW_FILE = '../data/terms_conditions.txt'
with open(NEW_FILE, 'w', encoding="utf-8") as f:
    for sentence in terms_cond_list:
        f.write(sentence + "\n")
        f.write("\n")