The data & directive for this project were provided by the Professorship for Legal Tech for the course "Legal Data Science & Informatics (IN2395)", taught during Winter 2021-22 at the Technical University of Munich.
The goal of the project is to develop a sentence-level annotator for case text originating from the US Board of Veterans' Appeals (BVA).
As part of this project, the course participants manually labelled legal cases. As a prerequisite, they received some legal background through lectures and workshops and were then instructed to annotate sentences from a total of 141 BVA cases using the Gloss Legal Annotator Tool. The annotation work was divided among the 50+ participants, so the resulting annotated documents are the shared intellectual property of all course participants. For this reason I have not included any reference to the data in this repo, and I have also removed the .json & .txt files used throughout the notebook LDSI-Project-SHM from the project directory.
Some functions to tokenize and parse the case text were taken from the LDSI_W21_Classifier_Workshop_clear.ipynb notebook provided by the Professorship for Legal Tech.
In the LDSI-Project-SHM notebook I featurized the sentences as TF-IDF vectors & sentence embeddings, then used these features to train and compare 28 machine learning models.
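To illustrate what featurizing sentences as TF-IDF vectors means, here is a minimal standard-library sketch. It is not the code from the notebook: the `tokenize` and `tfidf_vectors` helpers, the smoothed IDF formula, and the example sentences are all assumptions made for illustration, and the notebook's own tokenizer (taken from the workshop notebook) is likely more elaborate.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Hypothetical tokenizer: lowercase, split on whitespace, strip punctuation."""
    tokens = (word.strip(".,;:()\"'") for word in sentence.lower().split())
    return [tok for tok in tokens if tok]


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors) with one L2-normalized TF-IDF vector per sentence."""
    tokenized = [tokenize(s) for s in sentences]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of sentences that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed IDF (an assumed variant), so very common terms keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        vectors.append([v / norm for v in vec])
    return vocabulary, vectors


# Made-up example sentences, not taken from the BVA data.
example_sentences = [
    "The Veteran appeals the decision.",
    "The Board denies the claim.",
    "The evidence supports the claim.",
]
vocabulary, vectors = tfidf_vectors(example_sentences)
print(len(vocabulary), "terms;", len(vectors), "vectors")
```

Each resulting vector has one entry per vocabulary term and can be passed to any classifier that accepts numeric feature vectors.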
The top-performing TF-IDF-based and sentence-embedding-based models (F1-Score: 86% & 85% respectively) have been saved as "best_model.joblib" and "best2_model.joblib". They can be applied to the sample case text provided in "Check.txt" via "analyze.py" and "analyze_second_best.py" respectively:
$ python analyze.py Check.txt
$ python analyze_second_best.py Check.txt
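For readers who want to see roughly what applying a saved model to a case text involves, here is a hedged sketch. It is not the contents of "analyze.py": the `split_sentences` and `annotate` helpers are hypothetical, the naive regex splitter stands in for the workshop parsing functions, and the code assumes the saved object is a pipeline whose `predict` method accepts raw sentence strings, which may not hold for the actual model files.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '?' or '!' followed by whitespace.

    Legal text contains many abbreviations (e.g. citations) that this splits
    incorrectly, so a dedicated legal-text segmenter is preferable in practice.
    """
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


def annotate(text_path, model_path="best_model.joblib"):
    """Return (sentence, predicted_label) pairs for the case text at text_path."""
    # Third-party dependency, imported lazily so split_sentences needs only the stdlib.
    import joblib

    # Assumption: the saved object bundles featurizer and classifier in one pipeline.
    model = joblib.load(model_path)
    with open(text_path, encoding="utf-8") as handle:
        sentences = split_sentences(handle.read())
    return list(zip(sentences, model.predict(sentences)))
```

Under those assumptions, calling `annotate("Check.txt")` would yield one predicted label per sentence of the sample case.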
To install the necessary dependencies, please run the following command:
pip install -r requirements.txt