Author: Ana Carmela Salazar

Filename: parser_demo.ipynb

Description: Data cleaning and classification with pandas

# Independent Study on the Applicability of Heuristics and Natural Language Processing Techniques to Classify and Categorize Statutes of Law

## Part 1. Application of Simple Heuristics

In [1]:
import pandas as pd
import re

## Legal Corpora

The text below is assumed to be scraped data from Book IV of the New Civil Code of the Philippines.

For this independent study, Articles 1200, 1306, 1318, 1319, and 1320 were selected.

### Assumptions

As a rule, an article states its general rule in a positive tone and follows it with an exception.

However, there are times when the statute itself is written in a negative tone.

An article usually contains a general rule, qualifiers, and several exceptions. Keywords such as 'except', 'with the exception that', and 'unless' signpost an exception, while words like 'additionally' and 'provided' signpost a qualifier.

Some statutes are also written in several paragraphs. They may take the form of an enumeration of requisites, as in Art. 1318, or they may cover different yet related topics, as in Art. 1319.

#### Article 1200. The right of choice belongs to the debtor, unless it has been expressly granted to the creditor.

The parser should be able to distinguish the general rule from the exception.


#### Article 1306. The contracting parties may establish such stipulations, clauses, terms and conditions as they may deem convenient, provided they are not contrary to law, morals, good customs, public order, or public policy. (1255a)

The parser should be able to identify the qualifying conditions that must hold for the statute to apply. The word "provided" may act as a qualifier or as an exception, depending on its usage.


#### Article 1318. There is no contract unless the following requisites concur:

#####    (1) Consent of the contracting parties;

#####    (2) Object certain which is the subject matter of the contract;

#####    (3) Cause of the obligation which is established. (1261)

The parser should be able to recognize the enumerated phrases as part of the article. However, this is an example of a statute written in a negative tone. The result may be erroneous in that the parser would accept "there is no contract" as the general rule.
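A negative tone can be flagged with a simple check for negation cues. The following is a minimal sketch under stated assumptions: the cue list and the name `detect_tone` are illustrative, and the check is meant for the main clause only, since a proviso may carry its own negation (as in Art. 1306, "provided they are not contrary to law").

```python
import re

# Assumed negation cues; a real list would need review against the corpus.
NEGATION_PATTERN = re.compile(r"\b(?:no|not|never|cannot|neither|nor)\b")

def detect_tone(clause):
    """Label a clause 'negative' if it contains a negation cue, else 'positive'."""
    return "negative" if NEGATION_PATTERN.search(clause.lower()) else "positive"

print(detect_tone("There is no contract"))
print(detect_tone("The right of choice belongs to the debtor,"))
```

The word boundaries (`\b`) keep the cue "no" from matching inside longer words such as "notice".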


#### Article 1319. Consent is manifested by the meeting of the offer and the acceptance upon the thing and the cause which are to constitute the contract. The offer must be certain and the acceptance absolute. A qualified acceptance constitutes a counter-offer.

#### Acceptance made by letter or telegram does not bind the offerer except from the time it came to his knowledge. The contract, in such a case, is presumed to have been entered into in the place where the offer was made. (1262a)

Here is an example of a statute that covers more than one subject. Consent and acceptance are discussed in two separate paragraphs.


#### Article 1320. An acceptance may be express or implied. (n)

Here is an example of a statute that does not have any exceptions. It names two kinds of acceptance: express and implied.

In [2]:
legal_text = '''
Art. 1200. The right of choice belongs to the debtor, unless it has been expressly granted to the creditor.

Art. 1306. The contracting parties may establish such stipulations, clauses, terms and conditions as they may deem convenient, provided they are not contrary to law, morals, good customs, public order, or public policy. (1255a)

Art. 1318. There is no contract unless the following requisites concur:

    (1) Consent of the contracting parties;

    (2) Object certain which is the subject matter of the contract;

    (3) Cause of the obligation which is established. (1261)
    
Art. 1319. Consent is manifested by the meeting of the offer and the acceptance upon the thing and the cause which are to constitute the contract. The offer must be certain and the acceptance absolute. A qualified acceptance constitutes a counter-offer.

Acceptance made by letter or telegram does not bind the offerer except from the time it came to his knowledge. The contract, in such a case, is presumed to have been entered into in the place where the offer was made. (1262a)

Art. 1320. An acceptance may be express or implied. (n)
'''

#### The raw version:

In [3]:
legal_text

'\nArt. 1200. The right of choice belongs to the debtor, unless it has been expressly granted to the creditor.\n\nArt. 1306. The contracting parties may establish such stipulations, clauses, terms and conditions as they may deem convenient, provided they are not contrary to law, morals, good customs, public order, or public policy. (1255a)\n\nArt. 1318. There is no contract unless the following requisites concur:\n\n    (1) Consent of the contracting parties;\n\n    (2) Object certain which is the subject matter of the contract;\n\n    (3) Cause of the obligation which is established. (1261)\n    \nArt. 1319. Consent is manifested by the meeting of the offer and the acceptance upon the thing and the cause which are to constitute the contract. The offer must be certain and the acceptance absolute. A qualified acceptance constitutes a counter-offer.\n\nAcceptance made by letter or telegram does not bind the offerer except from the time it came to his knowledge. The contract, in such a ca

## Data Cleaning

### Transforming raw text into a DataFrame

Removing the newline characters lets us split the text into a list of Articles, each of which becomes a row in a DataFrame.

In [4]:
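# Remove newlines so each article sits on a single line
articles = legal_text.replace("\n", "")
# Split on the literal marker "Art. " (the dot is escaped so it matches only a period)
articles = re.split(r"Art\. ", articles)
# Drop the empty string left before the first marker
articles = [x for x in articles if x]
data_list = {"Article": articles}
df = pd.DataFrame(data_list)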
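Deleting every newline also joins adjacent paragraphs with no space between them. A minimal sketch of an alternative, assuming it is acceptable to collapse each run of whitespace into one space; the helper name `split_articles` and the sample string are illustrative only:

```python
import re

def split_articles(raw_text):
    """Split raw text into one string per article, keeping words separated."""
    # Collapse newlines, indentation, and repeated spaces into single spaces.
    flattened = re.sub(r"\s+", " ", raw_text).strip()
    # Split on the literal marker "Art. ".
    parts = re.split(r"Art\. ", flattened)
    # Drop the empty string that precedes the first marker.
    return [part.strip() for part in parts if part.strip()]

sample = """
Art. 1. First paragraph.

Second paragraph.

Art. 2. Another article.
"""
print(split_articles(sample))
```

Note that this sketch would also split on an "Art. " cross-reference inside the body of an article, which the sample corpus happens not to contain.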

The data produced:

In [5]:
df

| | Article |
|---|---|
| 0 | 1200. The right of choice belongs to the debto... |
| 1 | 1306. The contracting parties may establish su... |
| 2 | 1318. There is no contract unless the followin... |
| 3 | 1319. Consent is manifested by the meeting of ... |
| 4 | 1320. An acceptance may be express or implied.... |


### Removing the Article number and isolating the Statutory text

To keep the original text intact, the Article column is copied into a new Statute column, where a regular expression (regex) split removes the article number.

In [6]:
df["Statute"] = df["Article"]
# Split once on the leading "<number>. " and keep the text that follows
df["Statute"] = df["Statute"].apply(lambda x: "".join(re.split(r"\d{1,6}\. ", x, maxsplit=1)).strip())

The column produced:

In [7]:
df["Statute"]

0    The right of choice belongs to the debtor, unl...
1    The contracting parties may establish such sti...
2    There is no contract unless the following requ...
3    Consent is manifested by the meeting of the of...
4         An acceptance may be express or implied. (n)
Name: Statute, dtype: object
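The article number removed in this step could instead be kept as its own field. A sketch, assuming every entry starts with the number followed by a period and whitespace; `split_number` and `ARTICLE_PATTERN` are hypothetical names:

```python
import re

# One to six digits, a literal period, whitespace, then the rest of the text.
ARTICLE_PATTERN = re.compile(r"^(?P<number>\d{1,6})\.\s+(?P<text>.*)$", re.DOTALL)

def split_number(article):
    """Return (article number, statutory text); the number is None if no prefix is found."""
    match = ARTICLE_PATTERN.match(article.strip())
    if match is None:
        return None, article.strip()
    return int(match.group("number")), match.group("text").strip()

print(split_number("1320. An acceptance may be express or implied. (n)"))
```

A function like this could be passed to `df["Article"].apply(...)` to fill a number column and the Statute column in one pass.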

### Note

As mentioned earlier, the goal is to check whether a simple heuristic can correctly classify phrases or groups of words into general rule, exception, qualifiers (or qualifying conditions), and tone (positive or negative).

In [8]:
exception_keywords = [
    'except',
    'but',
    'unless',
    'however',
    'with the exception',
    'provided'
]

qualifier_keywords = [
    'in addition',
    'additionally',
    'provided',
]
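A plain substring search matches a keyword anywhere, including inside a longer word ("but" inside "attribute"). A sketch of whole-word matching with regex word boundaries; the names `KEYWORD_PATTERN` and `find_first_keyword` are assumptions for illustration:

```python
import re

exception_keywords = ["except", "but", "unless", "however", "with the exception", "provided"]

# Build one alternation; \b on each side requires the keyword to stand as a whole word or phrase.
_alternatives = "|".join(re.escape(keyword) for keyword in exception_keywords)
KEYWORD_PATTERN = re.compile(rf"\b(?:{_alternatives})\b")

def find_first_keyword(text):
    """Return (keyword, start index) of the earliest whole-word keyword, or None."""
    match = KEYWORD_PATTERN.search(text.lower())
    return (match.group(0), match.start()) if match else None

print(find_first_keyword("The right of choice belongs to the debtor, unless it has been expressly granted"))
print(find_first_keyword("The attribute passes to the heirs"))
```

Because the search runs left to right, the keyword returned is the one that occurs earliest in the text, regardless of its position in the list.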

In [9]:
df["General_Rule"] = df["Statute"]
df["General_Rule"] = df["General_Rule"].str.lower()
df["General_Rule"]

0    the right of choice belongs to the debtor, unl...
1    the contracting parties may establish such sti...
2    there is no contract unless the following requ...
3    consent is manifested by the meeting of the of...
4         an acceptance may be express or implied. (n)
Name: General_Rule, dtype: object

In [10]:
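def categorize_rule(statute):
    """Split a statute into (general rule, exception) at the earliest exception keyword."""
    # Record the position of every keyword found in the text
    hits = [(statute.find(k), k) for k in exception_keywords if k in statute]

    if hits:
        # Keep the keyword that appears first in the statute
        _, keyword = min(hits)
        # partition splits on the first occurrence only, so all text after it is preserved
        rule, _, exception = statute.partition(keyword)
        return pd.Series([rule.strip(), exception.strip()])

    # No keyword: the whole statute is treated as the general rule
    return pd.Series([statute.strip(), ""])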

In [11]:
df[["General_Rule","Exception"]] = df["General_Rule"].apply(categorize_rule)

In [12]:
df

| | Article | Statute | General_Rule | Exception |
|---|---|---|---|---|
| 0 | 1200. The right of choice belongs to the debto... | The right of choice belongs to the debtor, unl... | the right of choice belongs to the debtor, | it has been expressly granted to the creditor. |
| 1 | 1306. The contracting parties may establish su... | The contracting parties may establish such sti... | the contracting parties may establish such sti... | they are not contrary to law, morals, good cus... |
| 2 | 1318. There is no contract unless the followin... | There is no contract unless the following requ... | there is no contract | the following requisites concur: (1) consen... |
| 3 | 1319. Consent is manifested by the meeting of ... | Consent is manifested by the meeting of the of... | consent is manifested by the meeting of the of... | from the time it came to his knowledge. the co... |
| 4 | 1320. An acceptance may be express or implied.... | An acceptance may be express or implied. (n) | an acceptance may be express or implied. (n) | |


## Notes

For Art. 1200, the simple algorithm works because the article follows this sentence structure: [General Rule], ["keyword"] [Exception]. See the output below.

In [13]:
df["General_Rule"][0] # Article 1200 General Rule

'the right of choice belongs to the debtor,'

In [14]:
df["Exception"][0] # Article 1200 Exception

'it has been expressly granted to the creditor.'

For Art. 1306, the algorithm also works. However, the word "provided" appears in both `exception_keywords` and `qualifier_keywords`, which makes it hard to tell whether the text that follows is an exception or a qualifying condition to the general rule.

In [15]:
df["General_Rule"][1] # Article 1306 General Rule

'the contracting parties may establish such stipulations, clauses, terms and conditions as they may deem convenient,'

In [16]:
df["Exception"][1] # Article 1306 Exception

'they are not contrary to law, morals, good customs, public order, or public policy. (1255a)'
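One starting point for the ambiguity of "provided" is to look at the words on either side of it. The sketch below is only a rough heuristic under stated assumptions: the cue words and the labels "reference" and "condition" are hypothetical, and it does not by itself decide between exception and qualifier. It only separates uses such as "as provided by law" from uses that introduce a condition.

```python
import re

def classify_provided(text):
    """Guess how each whole-word 'provided' is used, from its neighboring words."""
    labels = []
    words = re.findall(r"[a-z]+", text.lower())
    for i, word in enumerate(words):
        if word != "provided":
            continue
        previous_word = words[i - 1] if i > 0 else ""
        next_word = words[i + 1] if i + 1 < len(words) else ""
        # Assumed cue: "as provided in/by ..." points to another provision.
        if previous_word in {"as", "otherwise", "herein"} or next_word in {"in", "by", "for"}:
            labels.append("reference")
        else:
            # Assumed default: "provided (that) ..." introduces a condition.
            labels.append("condition")
    return labels

print(classify_provided("as they may deem convenient, provided they are not contrary to law"))
print(classify_provided("except as provided by law"))
```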

For Art. 1318, the algorithm took the phrase "there is no contract" as the `General_Rule`, as expected. The phrase does not make sense as a general rule. The algorithm also erroneously classified the requisites as part of the `Exception`.

In [17]:
df["General_Rule"][2] # Article 1318 General Rule

'there is no contract'
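One way to keep a phrase like this from standing alone as the general rule is to restate the sentence positively before splitting it. A rough sketch, assuming only the construction "there is no X unless Y"; the pattern and the name `restate_positively` are hypothetical, and the paraphrase is meant as an aid to pattern matching, not a replacement for the statutory text:

```python
import re

# Hypothetical pattern covering only the "there is no X unless Y" construction.
NEGATIVE_UNLESS = re.compile(r"^there is no (?P<subject>.+?) unless (?P<condition>.+)$", re.DOTALL)

def restate_positively(statute):
    """Return a positive paraphrase, or the input unchanged if the pattern does not apply."""
    match = NEGATIVE_UNLESS.match(statute.strip().lower())
    if match is None:
        return statute
    subject = match.group("subject")
    # Naive article choice based on the first letter of the subject.
    article = "an" if subject[0] in "aeiou" else "a"
    return f"{article} {subject} exists only if {match.group('condition')}"

print(restate_positively("There is no contract unless the following requisites concur:"))
```

"No X unless Y" and "X only if Y" state the same condition, so the requisites would then read as conditions of the rule instead of as an exception to it.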

In [18]:
df["Exception"][2] # Article 1318 Exception

'the following requisites concur:    (1) consent of the contracting parties;    (2) object certain which is the subject matter of the contract;    (3) cause of the obligation which is established. (1261)'
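The enumerated requisites can be separated from the sentence that introduces them. A sketch, assuming list markers are one- or two-digit numbers in parentheses and that the closing annotation has three or more digits; `extract_requisites` is an illustrative name:

```python
import re

def extract_requisites(text):
    """Return (introductory sentence, list of enumerated items) from a flattened article."""
    # Assumption: a trailing "(1261)"-style annotation has three or more digits.
    body = re.sub(r"\s*\(\d{3,}[a-z]?\)\s*$", "", text.strip())
    # Assumption: list markers are one- or two-digit numbers in parentheses.
    pieces = re.split(r"\s*\(\d{1,2}\)\s*", body)
    lead_in = pieces[0].strip()
    items = [piece.strip(" ;.") for piece in pieces[1:] if piece.strip()]
    return lead_in, items

text = ("the following requisites concur:    (1) consent of the contracting parties;"
        "    (2) object certain which is the subject matter of the contract;"
        "    (3) cause of the obligation which is established. (1261)")
lead_in, items = extract_requisites(text)
print(lead_in)
for item in items:
    print("-", item)
```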

For Art. 1319, the algorithm split the text at the keyword "except" in the second paragraph. Note that data cleaning merged paragraphs 1 and 2.

In [19]:
df["General_Rule"][3] # Article 1319 General Rule

'consent is manifested by the meeting of the offer and the acceptance upon the thing and the cause which are to constitute the contract. the offer must be certain and the acceptance absolute. a qualified acceptance constitutes a counter-offer.acceptance made by letter or telegram does not bind the offerer'

In [20]:
df["Exception"][3] # Article 1319 Exception

'from the time it came to his knowledge. the contract, in such a case, is presumed to have been entered into in the place where the offer was made. (1262a)'

Lastly, the general rule for Art. 1320 was correctly identified because the article contains none of the keywords in `exception_keywords`.

In [21]:
df["General_Rule"][4] # Article 1320 General Rule

'an acceptance may be express or implied. (n)'

In [22]:
df["Exception"][4] # Article 1320 Exception

''

## Way Forward


Research on the following:

- ### How to recognize the tone of the individual articles of law?

 - This is important to be able to apply parsers uniformly.
 
- ### Is it possible to restructure sentences made in the negative tone into a positive tone?

 - This would not replace the original text but would serve as a way to improve pattern recognition.
 
- ### How to recognize the usage of the word "provided" in the context of adjacent phrases and sentences?

 - Could the placement of the word "provided" affect the context?
 
- ### The numbers enclosed in parentheses

 - The numbers enclosed in parentheses point to the article of the old Spanish Civil Code from which the provision was adopted.
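The closing annotation could be split off into its own fields so that it does not end up inside the general rule or the exception. A sketch, assuming the annotation always sits at the very end of the statute; the names are hypothetical, and the meaning of a letter suffix such as "a" or of the "(n)" marker is left uninterpreted here:

```python
import re

# Matches a closing "(1255a)", "(1261)" or "(n)" annotation.
ANNOTATION_PATTERN = re.compile(r"\s*\((?P<ref>\d+|n)(?P<suffix>[a-z]?)\)\s*$")

def split_annotation(statute):
    """Return (text without the annotation, reference, suffix)."""
    match = ANNOTATION_PATTERN.search(statute)
    if match is None:
        return statute.strip(), None, None
    return statute[:match.start()].strip(), match.group("ref"), match.group("suffix")

print(split_annotation("or public policy. (1255a)"))
print(split_annotation("An acceptance may be express or implied. (n)"))
```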