# Legal Text Classification with Blackstone

### Author: Tristan Koh, NUS Law Year 2
### GitHub: https://github.com/TristanKoh

This Jupyter notebook demonstrates the use of Blackstone on Singaporean case law. By doing so, I hope to encourage others to get their hands dirty with basic programming and data science, especially law students who are interested in legal technology. Even for students whose interest lies in the law of technology rather than the technology of law, I personally believe that one cannot discuss "technology" purely in the abstract when formulating the legal rules that govern it.

At the same time, I empathise with those who may be apprehensive of programming / coding, as I was one and a half years ago. Hence, through this notebook, I aim to explain each step in the code as simply as possible, to demonstrate that one does not need to be particularly talented to self-learn programming.



## About Blackstone

Blackstone is a Python package that uses Natural Language Processing (NLP) techniques to detect linguistic features in case law and classify the text into 5 categories.

The five categories are:

AXIOM - The text appears to postulate a well-established principle

CONCLUSION - The text appears to make a finding, holding, determination or conclusion

ISSUE - The text appears to discuss an issue or question

LEGAL_TEST - The text appears to discuss a legal test

UNCAT - The text does not fall into one of the four categories above

## How Blackstone fits into the broader data science context

As computers cannot understand text as humans do, NLP packages like Blackstone provide a set of utilities that allows us to create a mathematical model of the text, such that computers are able to process natural language. We call these models "text representations". The simplest of text representations (not used in Blackstone) is the Bag-of-Words representation. It represents the text as a count of words in the document.

As you may imagine, such a representation loses significant semantic meaning, as it ignores word order and does not account for how informative each word is. This means that commonly used but less meaningful words like "can" and "one" carry more weight in the model than more meaningful words like "technology" and "programming".
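
To make this concrete, here is a minimal sketch of a Bag-of-Words count using only Python's built-in `Counter`. The function name `bag_of_words` and the two example sentences are my own, made up for illustration; this is not how Blackstone represents text.

```python
from collections import Counter

def bag_of_words(text):
    """Return a word -> count mapping, ignoring case and word order."""
    return Counter(text.lower().split())

# Two made-up sentences with opposite meanings but exactly the same words
bow_a = bag_of_words("the appellant owed the respondent a duty")
bow_b = bag_of_words("the respondent owed the appellant a duty")

print(bow_a == bow_b)               # True: the word order is lost
print(bow_a["the"], bow_a["duty"])  # 2 1: "the" counts for more than "duty"
```

Both sentences end up with an identical representation, even though they say opposite things about who owed the duty.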

Therefore, there are other, more complicated text representations that retain more of the semantic meaning of the text, such as a TF-IDF representation (which is essentially a weighted count of words) and word embeddings.
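
As a rough illustration of what "weighted count" means, the sketch below computes one simple textbook variant of TF-IDF by hand: the count of a word in a document, multiplied by the logarithm of (number of documents / number of documents containing the word). The function `tf_idf` and the three short "documents" are assumptions of mine; real libraries use slightly different formulas.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document, using count * log(N / document frequency)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Number of documents in which each word appears at least once
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()})
    return weights

# Three made-up one-line "documents"
docs = [
    "the court applied the test",
    "the court dismissed the appeal",
    "the test for duty of care",
]
weights = tf_idf(docs)

print(weights[0]["the"])      # 0.0, because "the" appears in every document
print(weights[0]["applied"])  # the largest weight in the first document
```

Words that appear everywhere are weighted down to zero, while words that are specific to one document are weighted up.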

Blackstone uses the latter. This article (https://machinelearningmastery.com/what-are-word-embeddings/) explains how word embeddings work much better than I can, so for brevity I shall not go further into the details here.
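
For a flavour of the idea, an embedding represents each word as a list of numbers, such that words with related meanings have vectors pointing in similar directions. The sketch below measures this with cosine similarity. The three-dimensional vectors are numbers I made up by hand purely for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made toy vectors, NOT taken from any real model
toy_vectors = {
    "negligence": [0.9, 0.8, 0.1],
    "tort": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

sim_related = cosine_similarity(toy_vectors["negligence"], toy_vectors["tort"])
sim_unrelated = cosine_similarity(toy_vectors["negligence"], toy_vectors["banana"])
print(sim_related > sim_unrelated)  # True
```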

To demonstrate the usage and performance of Blackstone, I used the seminal Singaporean tort case of Spandeck Engineering (S) Pte Ltd v Defence Science & Technology Agency [2007] 4 SLR(R) 100.

The rest of this Jupyter notebook documents the text cleaning process and the prediction of the above legal categories.

## 1. Importing relevant packages and the data file

Before we begin, certain packages need to be imported to allow neater and more efficient management of the data. Packages are collections of pre-written Python code that provide specific functionality not built into Python itself.

The key packages that are used for this notebook are:

1. Pandas - Enables the structuring of data in a tabular format, with rows and columns (similar to an Excel spreadsheet).

2. Blackstone - As mentioned, an NLP package.

3. Path - Auxiliary class (from Python's built-in pathlib module) that builds the relative file path to the file containing the case that we are going to test Blackstone on.
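
As a small sketch of what Path does, the snippet below builds the same relative path used in this notebook and inspects it. The variable name `data_file` is my own; whether the file actually exists depends on where the notebook is run from, so I print that rather than assume it.

```python
from pathlib import Path

# Build a relative path from its parts; Path inserts the correct separator for the operating system
data_file = Path("..", "data", "spandeck.txt")

print(data_file.name)      # spandeck.txt
print(data_file.suffix)    # .txt
print(data_file.parts)     # ('..', 'data', 'spandeck.txt')
print(data_file.exists())  # True only if the file is present relative to the working directory
```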

In [10]:
# This code checks the location of the working directory, which affects the relative path stored in dataFilePath below for the import of the spandeck file
# I have left the code commented since it only needs to be used for checking the file path before starting the rest of the project
# import os
# os.getcwd()

In [11]:
from pathlib import Path
import pandas as pd

# dataFilePath is a Path object that holds the relative file path to the data file
dataFilePath = Path("..", "data", "spandeck.txt")

# Open the spandeck case as a text file. The "with" block closes the file automatically once we are done reading.
# open() returns a file object (a stream, ie. the text is not yet saved in memory)
# .readlines() reads the stream into memory as a list of strings (one string per line), which we save as "text"
with open(dataFilePath, "r", encoding="utf8") as spandeck:
    text = spandeck.readlines()

In [12]:
# This code loads the blackstone NLP model, and saves it into the object called nlp
import blackstone
import en_blackstone_proto
nlp = en_blackstone_proto.load()

## 2. Text pre-processing

As the text extracted directly from the PDF file is not "clean" (ie. it contains formatting and other characters that do not carry any semantic meaning), we first need to pre-process the text to remove these inessential characters.

There are various packages that come with pre-written functions that can be used for general situations, but here the text only contains line breaks and tabs, and hence I have decided to define my own function that removes such text formatting.

### What is a function?

A function in programming is similar to a mathematical function: there is an input and an output, and a series of pre-defined steps is applied to the input.

Apart from pre-defined functions (such as the "print" function), we can also define our own functions. We do so for our convenience, because we can reuse the same lines of code defined within the function later on just by calling the function name.
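
As a minimal sketch, here is a small function of my own (`count_words` is not used elsewhere in this notebook) that takes a sentence as its input and returns the number of words as its output. Once defined, it can be called as many times as we like.

```python
def count_words(sentence):
    """Return the number of whitespace-separated words in a sentence."""
    return len(sentence.split())

# Calling the function by name reuses the same lines of code on different inputs
print(count_words("The respondent did not owe a duty of care"))  # 9
print(count_words("Rather the question has to be approached in two stages."))  # 10
```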

### Function for text pre-processing

This function removes new lines, replaces tabs with spaces, and drops any strings that are left empty. It returns a list of cleaned strings, one for each non-empty line of the original text.

In [13]:
# Text preprocessing
def text_preprocessing(text):
    """ Accepts a list of unprocessed strings, returns a list of strings without line breaks, tabs or empty strings """
    
    # This creates an empty list
    processed_text = []

    # This is a for loop; it iterates through the strings in the text, and performs some operations on each string. Hence the name for loop: "For" each string, apply X operations on the string.

    # In this case, for each string, we replace new lines ("\n") with an empty string, and replace tabs ("\t") with a space.
    for string in text:
        string = string.replace("\n", "")
        string = string.replace("\t", " ")
        processed_text.append(string)
    
    # This is a list comprehension; a more concise way of expressing a for loop.
    # We iterate through each string in the processed text, and we only retain strings which are not empty strings (since empty strings are meaningless in this context)

    processed_text = [string for string in processed_text if string != ""]

    return processed_text

# Run the function on the list of lines read from the file
text = text_preprocessing(text)
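
To see the effect of the cleaning on a small input, here is a condensed sketch that applies the same steps to three made-up raw lines. The helper name `clean_lines` and the positions of the tabs in the sample lines are my own assumptions for illustration.

```python
def clean_lines(lines):
    """Remove new lines, replace tabs with spaces, and drop empty strings."""
    cleaned = [line.replace("\n", "").replace("\t", " ") for line in lines]
    return [line for line in cleaned if line != ""]

# Made-up raw lines in the style of the case header
raw_lines = [
    "Case Number\t: Civil Appeal No 3 of 2007\n",
    "\n",
    "Decision Date\t: 08 August 2007\n",
]

print(clean_lines(raw_lines))
# ['Case Number : Civil Appeal No 3 of 2007', 'Decision Date : 08 August 2007']
```

One thing to note about this approach: a line that contains only a tab becomes a single space rather than an empty string, so it is kept. Calling `.strip()` on each line before the check would also drop such lines.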

### Function to split the text into sentences and classify them

Since blackstone predicts at a sentence level (ie. we cannot use the entire case as one string as an input to blackstone), this function first splits the text into individual sentences using the sentence boundaries detected by the loaded model, and then predicts a category for each sentence.

The sentence boundaries are exposed through the "sents" attribute of each doc object that the model produces.

In [14]:
def legal_cats(sentences):
    """
    Splits the text into sentences and identifies the highest scoring category prediction generated by the text categoriser for each sentence.

    Arguments:
    sentences: a list of strings

    Each string is converted into a spacy doc object and split into sentences using the sentence boundaries in doc.sents

    Returns a tuple of:
    a list of the split sentences,
    a list of (max cat, max score) tuples, one for each sentence
    """
    doc_sentences = []

    # This passes the input strings through the nlp model, and converts each of them to a doc object
    # Each doc object contains both the original text and the attributes added by the model, such as the sentence boundaries (doc.sents).
    # A doc corresponds to a string.

    docs = nlp.pipe(sentences, disable = ["tagger", "ner", "textcat"])

    # We loop through each document in the documents, and loop again through each sentence in the document, and append the sentence to doc_sentences, an empty list
    for doc in docs:
        for sentence in doc.sents:
            doc_sentences.append(sentence.text)
    
    # We can now categorise each sentence into one of the five abovementioned categories.

    # We convert the newly detected sentences into a doc object again, as it contains the categoriser attribute that we can use to predict
    
    docs = nlp.pipe(doc_sentences, disable = ["tagger", "parser", "ner"])

    # We create a list to store the corresponding category and the score (ie the likelihood of the category that blackstone predicts the sentence to be)
    # The index of this list corresponds to doc_sentences (ie. the first item in cats_list contains the predicted category and score for the first sentence in doc_sentences, the second item in cats_list contains the predicted category and score for the second sentence, and so on)

    cats_list = []

    # We loop through each doc (sentence) in the documents, and keep the highest scoring category and its score for each sentence

    # We have to select the highest scoring category because blackstone provides the probability of all five categories which the sentence can fall under.

    # We are only concerned with blackstone's best prediction, and hence we only save the highest scoring category.
    for doc in docs:
        cats = doc.cats
        max_score = max(cats.values()) 
        max_cats = [k for k, v in cats.items() if v == max_score]
        max_cat = max_cats[0]
        cats_list.append((max_cat, max_score))

    return doc_sentences, cats_list
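
The step that picks the best prediction can be tried on its own, without loading any model. In the sketch below, `example_cats` stands in for the `doc.cats` dictionary of a single sentence; the scores are numbers I made up, and `top_category` is a hypothetical helper rather than part of blackstone.

```python
def top_category(cats):
    """Return (label, score) for the highest scoring label in a {label: score} dictionary."""
    label = max(cats, key=cats.get)
    return label, cats[label]

# Made-up scores for one sentence, in the same shape as doc.cats
example_cats = {
    "AXIOM": 0.02,
    "CONCLUSION": 0.67,
    "ISSUE": 0.05,
    "LEGAL_TEST": 0.20,
    "UNCAT": 0.06,
}

print(top_category(example_cats))  # ('CONCLUSION', 0.67)
```

If two labels tie for the highest score, `max` returns the first one it encounters, which matches the behaviour of taking the first item of `max_cats` in the function above.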

## 3. Predicting on the processed text

With the above function defined, we can now finally use blackstone to predict the categories of the text. This just involves calling the function with the cleaned text as the argument.

The variable "cats" is a tuple containing two lists. The first element of the tuple is a list of all the sentences in the processed text.

The second element of the tuple is a list of pairs, one for each sentence and in the same order as the sentences. Each pair contains the name of the category that the sentence is most likely to fall under (since the function "legal_cats" keeps only the top scoring category), as well as the score that blackstone assigns to that category.
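
The shape of this output can be sketched with a toy example. Here `toy_cats` is a hand-written stand-in for "cats" with only two sentences, and the scores are rounded for illustration.

```python
# Hand-written stand-in for the output of legal_cats: (list of sentences, list of (category, score) pairs)
toy_cats = (
    ["Print", "Case Number : Civil Appeal No 3 of 2007"],
    [("UNCAT", 1.0), ("UNCAT", 0.94)],
)

# Unpack the tuple into its two lists
all_sentences, all_predictions = toy_cats

# zip pairs up the first sentence with the first prediction, the second with the second, and so on
rows = list(zip(all_sentences, all_predictions))
for sentence, (category, score) in rows:
    print(f"{category:<10} {score:.2f}  {sentence}")
```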

In [15]:
cats = legal_cats(text)

print(cats[0][10:30])

['Tort – Negligence – Duty of care – Applicable test to determine existence of duty of care – Relationship between two-stage test and incremental approach – Application of two-stage test comprising first proximity and second policy considerations with threshold consideration of factual foreseeability – Incremental approach as methodological aid in applying specific criterion of two-stage test', 'Tort – Negligence – Duty of care – Applicable test to determine existence of duty of care – Whether type of damage claimed should result in different test – Application of single (two-stage) test irrespective of type of damage claimed', 'Tort – Negligence – Duty of care – Whether there was proximity between contractor and certifier given that contractor could submit disputes for arbitration – Whether there was proximity between contractor and certifier given no direct contractual relationship between contractor and certifier – Whether policy considerations negating finding of duty of care', 'Fa

In [16]:
print(cats[1][:20])

[('UNCAT', 1.0), ('AXIOM', 0.4526105523109436), ('AXIOM', 0.5911222100257874), ('UNCAT', 0.9429351687431335), ('UNCAT', 1.0), ('UNCAT', 0.9965362548828125), ('CONCLUSION', 0.6653698086738586), ('UNCAT', 0.988778293132782), ('UNCAT', 0.9701417088508606), ('UNCAT', 0.8665805459022522), ('LEGAL_TEST', 0.9949630498886108), ('LEGAL_TEST', 0.5165076851844788), ('UNCAT', 0.9995095729827881), ('UNCAT', 1.0), ('UNCAT', 0.9995476603507996), ('UNCAT', 0.9678035974502563), ('UNCAT', 0.9433566927909851), ('UNCAT', 0.9999518394470215), ('UNCAT', 0.9877164959907532), ('UNCAT', 0.9806051254272461)]


## 4. Saving the predictions to a dataframe

For easy visualisation of the results, we create a new dataframe from the variable "cats" with three columns: the individual sentences in a column called "sentence", the name of the top scoring category in a column called "category", and the score of that category in a column called "score".

In [17]:
# This creates a new dataframe with the three columns
df_results = pd.DataFrame({"sentence" : cats[0], "category": [cat[0] for cat in cats[1]], "score": [cat[1] for cat in cats[1]]})

# The first 5 rows of the new dataframe.
df_results.head()

| | sentence | category | score |
|---|---|---|---|
| 0 | Print | UNCAT | 1.0 |
| 1 | Spandeck Engineering (S) Pte Ltd v Defence Sci... | AXIOM | 0.452611 |
| 2 | [2007] 4 SLR(R) 100; [2007] SGCA 37 | AXIOM | 0.591122 |
| 3 | Case Number : Civil Appeal No 3 of 2007 | UNCAT | 0.942935 |
| 4 | Decision Date : 08 August 2007 | UNCAT | 1.0 |


We print the unique categories, and we see that they correspond to the five possible categories that blackstone classifies text into.

We also print the number of sentences that fall into each category: we see that the vast majority of sentences do not relate to any of the four substantive categories (which makes sense, since most sentences in a judgment either describe the facts of the case or of other cases, or apply a test to the facts).

In [18]:
print(df_results["category"].unique())

['UNCAT' 'AXIOM' 'CONCLUSION' 'LEGAL_TEST' 'ISSUE']


In [19]:
df_results["category"].value_counts()

UNCAT         483
LEGAL_TEST     97
AXIOM          66
CONCLUSION     44
ISSUE           6
Name: category, dtype: int64
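
To put "vast majority" into numbers, the counts printed above can be turned into shares of the whole case. This is a small sketch in plain Python; the dictionary `counts` simply copies the figures from the output above.

```python
# Counts copied from the value_counts() output above
counts = {"UNCAT": 483, "LEGAL_TEST": 97, "AXIOM": 66, "CONCLUSION": 44, "ISSUE": 6}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

print(total)  # 696
for label, share in shares.items():
    print(f"{label:<11}{share:.1%}")
```

Roughly 69% of the 696 sentences are uncategorised, which leaves just under a third of the case spread across the four substantive categories.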

### Visualising results of the classification

We then print the first 40 sentences which were classified as most likely to be a legal test.

#### Inspecting the sentences that were classified as a legal test:

We see that the first few results contain the keyword "test", which probably corresponds to how blackstone was trained: The presence of the word "test" in a sentence increases the likelihood that the sentence would be referring to a legal test.

Further, there are sentences that do not explicitly mention the word "test", yet were correctly detected as a "test". For example, this sentence: "The focus was on the closeness of the relationship between the parties, including physical, circumstantial and causal proximity, supported by the twin criteria of voluntary assumption of responsibility and reliance."

This is likely because keywords like "physical, circumstantial and causal proximity" and "voluntary assumption of responsibility and reliance" directly relate to the test for duty of care in negligence. Hence while "test" itself does not appear in this sentence, these associated words that relate to the test for duty of care also increase the likelihood that the sentence refers to a legal test.

Lastly, blackstone also categorises case names like "Caparo Industries Plc v Dickman" as a legal test, probably because the ratio of these cases relates to the test for duty of care in the UK (and blackstone was trained on UK legal data).

In [20]:
for sentence in df_results.loc[df_results["category"] == "LEGAL_TEST", "sentence"][:40]:
    print(sentence)
    print("-" * 40)

Tort – Negligence – Duty of care – Applicable test to determine existence of duty of care – Relationship between two-stage test and incremental approach – Application of two-stage test comprising first proximity and second policy considerations with threshold consideration of factual foreseeability – Incremental approach as methodological aid in applying specific criterion of two-stage test
----------------------------------------
Tort – Negligence – Duty of care – Applicable test to determine existence of duty of care – Whether type of damage claimed should result in different test – Application of single (two-stage) test irrespective of type of damage claimed
----------------------------------------
(1)    A single test should determine the imposition of a duty of care in all claims arising out of negligence, irrespective of the type of the damages claimed.
----------------------------------------
There was no justification for a general exclusionary rule against recovery of all econ

#### Inspecting the sentences that were classified as an axiom:

Generally, we see that the sentences that are classified as axioms contain the phrase "general principle". General principles are generally understood (pun intended) to be axioms, in the sense that both are broad statements that aim to generalise a certain area of knowledge.

However, we also see that blackstone is not so simplistic as to classify a sentence as an axiom just because it contains the phrase "general principle".

For example, we have these sentences that are also classified as axioms (and which a reasonable person would also judge to be axioms):

"There is no escape from the truth that, whatever formula be used, the outcome in a grey area case has to be determined by judicial judgment."

"In the tort of negligence, careless conduct cannot, by itself, be used as a basis for tortious liability."

From these examples, we can see that the language model of blackstone is quite nuanced, as it appears to take into account the tone of the language used as well. For example, "no escape from the truth" signals a broad generalisation, and "in the tort of negligence" refers broadly to some concept in the tort of negligence. The way in which blackstone takes "tone" into account is simply by noting that such a sequence of words is more likely to occur in a sentence that states an axiom about the law. Even so, such a probabilistic model is still relatively successful in this regard.

In [21]:
for sentence in df_results.loc[df_results["category"] == "AXIOM", "sentence"][:40]:
    print(sentence)
    print("-" * 40)

Spandeck Engineering (S) Pte Ltd v Defence Science & Technology Agency
----------------------------------------
[2007] 4 SLR(R) 100; [2007] SGCA 37
----------------------------------------
[Observation: To balance fair and just results and the imposition of indeterminate liability on an indeterminate class of tortfeasors without compromising the tort of negligence as a tool for the fair redistribution of economic wealth was the crucial issue for the courts, and the answer was in legal control mechanisms developed by the courts: at [29] and [30].]
----------------------------------------
Edgeworth Constructions Ltd v ND Lea & Associates Ltd (1993) 107 DLR (4th) 169 (refd)
----------------------------------------
Elguzouli-Daf v Commission of Police of the Metropolis [1995] QB 335 (refd)
----------------------------------------
Governors of the Peabody Donation Fund v Sir Lindsay Parkinson & Co Ltd [1985] AC 210 (refd)
----------------------------------------
Hedley Byrne & Co Ltd v Hell

#### Inspecting the sentences that were classified as an issue:

There are only 6 sentences that were classified as issues, and only the first sentence corresponds to what we would think of as a legal issue.

The tone of the other sentences (apart from "the respondent's arguments") seems to relate to some kind of opinion of the court, usually after it has set out the issue.

The miscategorisation could be because blackstone was trained on UK case law, and (anecdotally speaking) the older judgments of the UK courts tend to use run-on sentences, which link phrases relating to the actual issue to the court's approach to resolving the issue (ie. phrases that include "in our view", "with respect", "the question has to be approached").

Nevertheless, at least the categorisation of these sentences is somewhat related to the context of the legal issue at hand. (In the case of Spandeck, the issue is the first sentence classified as an ISSUE: "the threshold issue was whether there was a duty of care owed by the respondent to the appellant and the applicable test for ascertaining the existence of a duty of care.")

In [22]:
for sentence in df_results.loc[df_results["category"] == "ISSUE", "sentence"][:696]:
    print(sentence)
    print("-" * 40)

On appeal, the threshold issue was whether there was a duty of care owed by the respondent to the appellant and the applicable test for ascertaining the existence of a duty of care.
----------------------------------------
Rather the question has to be approached in two stages.
----------------------------------------
35     In our view, these criticisms have arisen because of the perceived divorce of the particular from the universal (see [28] above).
----------------------------------------
71     As such, in our view, a single test is preferable in order to determine the imposition of a duty of care in all claims arising out of negligence, irrespective of the type of the damages claimed, and this should include claims for pure economic loss, whether they arise from negligent misstatements or acts/omissions.
----------------------------------------
Although this consideration has been incorporated as an element within the ‘three-part test’ itself, its incorporation is, with respect, 

#### Inspecting sentences that were classified as a conclusion:

At first glance, we see that blackstone generally classifies three types of "conclusions":

The first type is the summary of case law that the court cites in its holding.

For example: "That pendulum finally became stationary in favour of an approach which meshed foreseeability with public policy in Anns ([23] supra), where Lord Wilberforce influentially said (at 751–752)"

"A fair reading of the sentence (see [33] above) beginning with “in order to establish … it is not necessary to bring the facts within those of previous situations” does not preclude an incremental approach as the expression “the facts” does necessarily connote all the facts, and is capable of implying that it is only necessary to bring only some of the facts."

"There can be no doubt that to depart from the decision would re-establish a degree of certainty in this field of law which it has done a remarkable amount to upset."

These sentences are summaries of precedent cases that are relevant to the decision of the case at hand.

The second type that blackstone classifies as a conclusion is the conclusion of the current case itself. For example:

"Notwithstanding these judicial views, we agree with Phang, Saw & Chan ([26] supra at 42) that these observations are “puzzling, to say the least”."

"On the facts of the present case, the same reasons above articulated by Russell LJ in Pacific Associates can also be characterised as policy considerations under the second stage of the test in Anns."


The last type of "conclusion" is the CA's summary of the lower court's decision about the current case: "The trial judge found that the respondent did not owe a duty of care to the appellant...."

Overall, blackstone seems to have done a relatively good job at detecting sentences that are "summarial" in nature, without relying on specific key words such as "in conclusion". It also did not classify sentences that are merely factual recounts of cases as conclusions, and instead only classified the CA's summary of the case after the CA had described the relevant facts.

In [23]:
for sentence in df_results.loc[df_results["category"] == "CONCLUSION", "sentence"][:696]:
    print(sentence)
    print("-" * 40)

Coram : Chan Sek Keong CJ; Andrew Phang Boon Leong JA; V K Rajah JA
----------------------------------------
Policy considerations, such as the presence of a contractual matrix which clearly defined the rights and liabilities of the parties and their relative bargaining positions, then arose and were applied to the factual matrix to determine whether or not to negate this prima facie duty: at [77], [81], [83] and [115].
----------------------------------------
However, the absence of a factual precedent in analogous situations of proximity and/or policy considerations should not preclude the court from extending liability where it was just and fair to do so, taking into account the relevant policy consideration against indeterminate liability against a tortfeasor: at [43], [73] and [115].
----------------------------------------
Adopting an incremental approach with respect to the requirement of proximity and in view of cl 34, there was no voluntarily assumption of responsibility nor r

## 5. Concluding remarks

### About the data cleaning process:

We walked through the steps needed to prepare the data such that blackstone can be used: from importing the raw text file of the Spandeck case, to removing the formatting (tabs and new lines) from the text, to passing the cleaned text into spaCy to convert it into spaCy documents, and finally predicting using blackstone on individual sentences.

Generally, the "exciting" part of data analysis (ie getting classification and predictions) is also the easiest part of the process. The majority of the work is spent on data cleaning and preparation.

### About blackstone's performance on the Spandeck case:

We see that overall, blackstone has performed relatively well in its classification. Inspecting the classifications manually did not reveal many predictions by blackstone that did not seem to make sense, given the category. Most classifications could be plausibly explained to belong to a particular category because of certain phrases that indicated a "conclusion" or "axiom".

Interestingly, the classification of conclusions and axioms seems to overlap: a conclusion (or summary) of an often-cited precedent (such as Donoghue v Stevenson) could also be construed as an axiom about the test for duty of care (ie. in that the test is always something about the reasonable foreseeability of harm). Therefore, if one were to use blackstone to conduct data analytics of case precedent and principles, it would be good to look at both the sentences classified as "conclusion" and those classified as "axiom", as this casts a wider "net" when identifying legal principles.
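
As a sketch of what casting this wider "net" could look like, the snippet below keeps every sentence whose category is either AXIOM or CONCLUSION. The list `sample_rows` is a tiny hand-picked sample of sentences and categories taken from the outputs in this notebook; the variable names are my own.

```python
# Hand-picked (sentence, category) pairs from the outputs above, for illustration only
sample_rows = [
    ("In the tort of negligence, careless conduct cannot, by itself, be used as a basis for tortious liability.", "AXIOM"),
    ("Rather the question has to be approached in two stages.", "ISSUE"),
    ("Case Number : Civil Appeal No 3 of 2007", "UNCAT"),
    ("Coram : Chan Sek Keong CJ; Andrew Phang Boon Leong JA; V K Rajah JA", "CONCLUSION"),
]

# The categories that may contain statements of legal principle
principle_categories = {"AXIOM", "CONCLUSION"}

candidate_principles = [sentence for sentence, category in sample_rows if category in principle_categories]

for sentence in candidate_principles:
    print(sentence)
```

With the dataframe from section 4, the same filter could be written as `df_results[df_results["category"].isin(["AXIOM", "CONCLUSION"])]`. As the sample shows, the results would still need a manual check, since lines like the coram are swept in as well.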

### Possible future uses of Blackstone in Singapore's local context:

Blackstone's performance on local case law seems to be promising, especially since it was trained on UK case law. Perhaps blackstone can be further trained on local case law in the future. This would require labelled local case law (ie. each sentence in the case assigned a category by a human) across a wide spectrum of legal issues.

With more robust training of blackstone to adapt it to local case law, one direct application is the partial automation of the writing of case headnotes, which is currently done manually by Justices' Law Clerks (JLCs). At the very least, blackstone can assist JLCs in writing these headnotes by first providing a preliminary classification of the sentences in the case, which could point the JLCs to the relevant parts to further refine the summary.