Using natural language processing and deep learning methods for text and sentence classification tasks, applied to legal texts from the Bulgarian Constitutional Court.
The Bulgarian Constitutional Court (BCC) project is managed in a virtual environment using pipenv. All packages and their dependencies are listed in `Pipfile` and `Pipfile.lock`. To create a pipenv environment and install all the packages needed to run the code in this repository, run the following in a terminal:
```shell
# install pipenv
pip install pipenv
# navigate to the repository directory
cd ~/path/to/bulgarian-constitutional-court-decisions
# install virtual environment and dependencies
pipenv install
```
All models currently in development are in the `models` folder. Text data and annotated documents are in the `models/data` folder, along with a guide on converting documents from PDF to text and a Jupyter notebook tutorial showing how to do this in Python.
The baseline models so far achieve the following performance on the training and validation data:
Baseline Model | Test Accuracy |
---|---|
Logistic Regression | 80% |
Naive Bayes | 84% |
Support Vector Machines (SVM) | 81% |
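For readers unfamiliar with the strongest baseline in the table, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the repository's implementation: the whitespace tokenizer, the class and function names, the labels, and the example sentences are all hypothetical, chosen only to show how the model scores a sentence and how accuracy is computed.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split. Real Bulgarian
    # legal text would need proper tokenization.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Bag-of-words multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)         # documents per label
        self.word_counts = defaultdict(Counter)      # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def _log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), ignoring unseen tokens
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, texts):
        preds = []
        for text in texts:
            tokens = tokenize(text)
            preds.append(max(self.class_counts,
                             key=lambda c: self._log_posterior(tokens, c)))
        return preds


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, not taken from the BCC dataset
train_texts = [
    "the court upholds the request",
    "the request is upheld by the court",
    "the court rejects the request",
    "the request is rejected as inadmissible",
]
train_labels = ["upheld", "upheld", "rejected", "rejected"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
preds = model.predict(["the court upholds it", "rejected as inadmissible"])
print(preds, accuracy(["upheld", "rejected"], preds))
```

Working in log space avoids numerical underflow when multiplying many small per-token probabilities over long legal documents.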
The deep learning models so far achieve the following performance on the training and validation data:
Deep Learning Model | Test Accuracy | Validation Accuracy |
---|---|---|
Convolutional Neural Network (CNN) | 89% | 80% |
Long Short-Term Memory Neural Network (LSTM) | 89% | 80% |
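Test and validation accuracy are only meaningful when each is measured on documents the model never saw during training. This README does not describe how the splits were made, so the following is only a hedged sketch of one common approach: a reproducible three-way split with a fixed random seed. The function name and the default fractions are hypothetical.

```python
import random


def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint splits."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so splits are reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100), val_frac=0.2, test_frac=0.1)
print(len(train), len(val), len(test))  # 70 20 10
```

The validation split is typically used for tuning hyperparameters and choosing when to stop training, while the test split is held back for the final reported number.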
This project is still in progress. Current models are in the early stages of development.
Current TODOs for future development:
- Tune baseline model hyperparameters to improve performance
- Improve deep learning models
- Visualize model performance
- Further model testing
- Add more annotated data to improve training process
If you are interested in using NLP or deep learning methods for analyzing legal texts, the following resources may be useful.
The data for this project is licensed under the Creative Commons Attribution 3.0 Unported license, and the code used to train the models is licensed under the MIT license.
If you have any questions or comments, feel free to contact me by email, on Twitter, or in the repository discussions.