Developing NLP models for text and sentence classification using legal texts from the Bulgarian constitutional court.

Paulj1989/bulgarian-constitutional-court-decisions

Analyzing Legal Texts from the Bulgarian Constitutional Court

Using natural language processing and deep learning methods for text and sentence classification tasks, applied to legal texts from the Bulgarian Constitutional Court.

Requirements

The Bulgarian Constitutional Court (BCC) project is managed in a virtual environment using pipenv. All packages and their dependencies are listed in Pipfile and Pipfile.lock. To create a pipenv environment and install the packages needed to run the code in this repository, run the following in a terminal:

# install pipenv
pip install pipenv

# navigate to the repository directory
cd ~/path/to/bulgarian-constitutional-court-decisions

# install virtual environment and dependencies
pipenv install

All models currently in development are in the models folder. Text data and annotated documents are in the models/data folder, along with a guide to converting documents from PDF to text and a Jupyter notebook tutorial on doing this in Python.

Current Results

The baseline models so far achieve the following performance on the training and validation data:

| Baseline Model | Test Accuracy |
| --- | --- |
| Logistic Regression | 80% |
| Naive Bayes | 84% |
| Support Vector Machines (SVM) | 81% |
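
To give a sense of how a baseline such as Naive Bayes classifies sentences, here is a minimal sketch written with only the Python standard library. It is not the code used in this repository: the tokenizer, the class name `NaiveBayesClassifier`, the toy sentences, and the labels `holding` and `fact` are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Smoothed log likelihood of the token given the class.
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy sentences and labels, not taken from the BCC data.
train_texts = [
    "the court holds the provision unconstitutional",
    "the court rejects the request",
    "the request was filed by the president",
    "the provision was adopted by parliament",
]
train_labels = ["holding", "holding", "fact", "fact"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
prediction = model.predict("the court holds the request unfounded")
print(prediction)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps words unseen during training from driving a class probability to zero.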

The deep learning models so far achieve the following performance on the training and validation data:

| Deep Learning Model | Test Accuracy | Validation Accuracy |
| --- | --- | --- |
| Convolutional Neural Network (CNN) | 89% | 80% |
| Long Short-Term Memory Neural Network (LSTM) | 89% | 80% |
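
Accuracy figures like these are typically computed by holding out part of the annotated data and scoring the model separately on each portion. The sketch below shows one way to do that with the standard library. The function names, the 80/20 split, the placeholder examples, and the stand-in `predict` function are assumptions for illustration, not the evaluation code used in this repository.

```python
import random


def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def accuracy(predict, examples):
    """Fraction of (text, label) pairs for which predict(text) equals label."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical placeholder data: ten sentences with alternating labels.
examples = [(f"sentence {i}", i % 2) for i in range(10)]
train, validation = train_validation_split(examples)


def predict(text):
    """Stand-in for a trained model: always predicts label 0."""
    return 0


train_accuracy = accuracy(predict, train)
validation_accuracy = accuracy(predict, validation)
print(f"train: {train_accuracy:.2f}, validation: {validation_accuracy:.2f}")
```

A validation score well below the score on the data used for fitting usually indicates overfitting, which more annotated data or regularization can help reduce.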

Project Plans

Status

This project is still in progress. Current models are in the early stages of development.

TODOs

Current TODOs for future development:

  • Tune baseline model hyperparameters to improve performance
  • Improve deep learning models
  • Visualize model performance
  • Further model testing
  • Add more annotated data to improve training process

Resources

If you are interested in using NLP or deep learning methods for analyzing legal texts, the following resources may be useful.

Legal Corpora

Research

Other Resources

License

The data for this project is licensed under the Creative Commons Attribution 3.0 Unported license, and the code used to train the models is licensed under the MIT license.

Contact

If you have any questions or comments, feel free to contact me by email, on Twitter, or in the repository discussions.
