Project done as a part of "Legal Data Science & Informatics (IN2395)"

Husain0007/BVA-Legal-Annotator

Project: United States Board of Veterans' Appeals Sentence Annotator

The data and directive for this project were outlined by the Professorship for Legal Tech for the course "Legal Data Science & Informatics (IN2395)", taught during Winter 2021-22 at the Technical University of Munich.

In this project, we were tasked with developing a sentence-level annotator for case text originating from the US Board of Veterans' Appeals (BVA).

As part of this project, course participants manually labelled legal cases. As a prerequisite, they received legal background through lectures and workshops and were then instructed to annotate sentences from a total of 141 BVA cases using the Gloss Legal Annotator Tool. The annotation work was divided among the 50+ participants, so the resulting annotated documents are the shared intellectual property of all course participants. For this reason, I have not included any reference to the data in this repo, and I have also removed the .json and .txt files used throughout the LDSI-Project-SHM notebook from the project directory.

Some functions to tokenize and parse the case text were taken from the LDSI_W21_Classifier_Workshop_clear.ipynb notebook provided by the Professorship for Legal Tech.

In the LDSI-Project-SHM notebook, I featurized the sentences as TF-IDF vectors and as sentence embeddings, then used these features as input to 28 machine learning models.
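As a rough illustration of the TF-IDF route, the sketch below trains one sentence classifier with scikit-learn. It is not the notebook's code: the toy sentences, the label names (`Evidence`, `Finding`, `LegalRule`), the n-gram range, and the choice of logistic regression are all assumptions made for this example, since the real annotated data is not part of this repo.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy sentences and labels, invented for illustration only.
# The real project uses annotated BVA sentences that are not published here.
train_sentences = [
    "The Veteran served on active duty from 1968 to 1970.",
    "A VA examiner diagnosed the Veteran with tinnitus in 2015.",
    "Service connection requires evidence of a current disability.",
    "Under the applicable regulation, the benefit of the doubt goes to the claimant.",
    "The Board finds that the tinnitus is related to service.",
    "The Board finds the examiner's opinion to be persuasive.",
]
train_labels = [
    "Evidence",
    "Evidence",
    "LegalRule",
    "LegalRule",
    "Finding",
    "Finding",
]

# Chain the TF-IDF featurizer and the classifier so raw strings go in
# and predicted labels come out.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(train_sentences, train_labels)

# Score on the training sentences only to show the API; a real evaluation
# needs a held-out split of annotated cases.
train_pred = clf.predict(train_sentences)
train_f1 = f1_score(train_labels, train_pred, average="weighted")
print(f"weighted F1 on the toy training set: {train_f1:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means a single object can be saved and later applied directly to raw sentences, which is one plausible way to produce a file such as "best_model.joblib".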

The top-performing TF-IDF-based and sentence-embedding-based models (F1-score: 86% and 85%, respectively) are saved as "best_model.joblib" and "best2_model.joblib". They can be applied to the sample case text in "Check.txt" via "analyze.py" and "analyze_second_best.py":

$ python analyze.py Check.txt
$ python analyze_second_best.py Check.txt
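The internals of "analyze.py" are not reproduced here. The following is only a minimal sketch of what such a script might do, assuming the saved model is a pipeline that accepts raw sentence strings through `predict`. The helper names `load_model`, `split_sentences`, and `annotate` are hypothetical, and the regex splitter is a naive stand-in for the project's actual tokenization and parsing functions.

```python
import re


def load_model(path):
    """Load a saved model from disk (hypothetical helper)."""
    # joblib is a third-party package; it is imported here so the
    # other helpers can be used without it.
    import joblib

    return joblib.load(path)


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace.

    Legal text is full of abbreviations and citations, so a real
    segmenter needs to be far more careful than this.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def annotate(text, model):
    """Return (sentence, predicted_label) pairs for a case text.

    Assumes `model.predict` accepts a list of raw sentence strings.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    labels = model.predict(sentences)
    return list(zip(sentences, labels))


# Example use, assuming the model file and sample text are present:
#
#     from pathlib import Path
#     model = load_model("best_model.joblib")
#     text = Path("Check.txt").read_text(encoding="utf-8")
#     for sentence, label in annotate(text, model):
#         print(f"{label}\t{sentence}")
```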

To install the necessary dependencies, run the following command:

pip install -r requirements.txt
