This repository contains the code associated with the SARA v2 paper.
The data is hosted on NLLP@JHU. Look for "SARA v2".
The folder `scripts` contains all you need to reproduce the experiments from the paper. The scripts assume that you have completed the requirements below, and will both prepare the data and run the experiments.
To run the scripts in this repository, you will need a number of other pieces of code. Either go through the instructions below, or run `bash scripts/get_dependencies.sh`.
The code in this repository relies on bash and Python 3.5.3. Install all the needed packages into a dedicated virtual environment with `pip install -r requirements.txt`.
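For instance, a minimal setup could look like the sketch below (assuming a `python3.5` interpreter is on your PATH; the exact name may differ on your system):

```bash
# Create and activate a dedicated virtual environment, then install the packages.
python3.5 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```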
Download SARA and SARA v2 and place them in this repository under `sara` and `sara_v2` respectively; or run `bash scripts/get_data.sh`.
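If you download the data manually, the layout might be set up as in this sketch (the archive names are illustrative; use whatever the download actually provides):

```bash
# Unpack the datasets into the folder names the scripts expect.
unzip sara.zip -d sara
unzip sara_v2.zip -d sara_v2
```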
The scorer used for conventional coreference metrics can be found in this GitHub repository. Clone it from within your clone of the SARA v2 repository.
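For instance (a sketch; `<SCORER_REPO_URL>` is a placeholder for the scorer repository linked above):

```bash
# Clone the coreference scorer inside your clone of this repository.
cd /path/to/your/sara-v2-clone   # illustrative path
git clone <SCORER_REPO_URL>
```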
Download Legal BERT from the website of Tax Law NLP Resources or directly from here, and unzip the file into this repository, into a folder named `LegalBert`, e.g. with the command `unzip LegalBert.zip -d LegalBert`.
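The two steps together might look like this sketch, where `<LEGAL_BERT_URL>` is a placeholder for one of the download links above:

```bash
# Fetch the archive, then unpack it into the expected folder.
wget -O LegalBert.zip <LEGAL_BERT_URL>
unzip LegalBert.zip -d LegalBert
```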
This repository uses version 3.9.2 of the parser, timestamped 2018-10-17. Download the parser using the preceding link and unzip it into this repository.
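As a sketch, assuming the downloaded archive carries the name of the official release with that timestamp (an assumption; adjust to the actual file name):

```bash
# The archive name below is an assumption based on the 2018-10-17 timestamp.
unzip stanford-parser-full-2018-10-17.zip
```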
Some of these scripts rely on GPUs. Since there is no universal configuration for these devices, it is up to you to modify the scripts in the right places. For that, grep for 'GPU' in `scripts` and in `code`.
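For example:

```bash
# Locate the GPU-specific settings to adapt to your hardware.
grep -rn 'GPU' scripts code
```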
Some scripts will need to download models from huggingface, which requires an internet connection.
Specifically for the scripts running argument identification with a BERT-based CRF, you need Python 3.7.9, as well as AllenNLP and Huggingface. Those requirements are captured in `requirements_crf.txt`. The requirements for the rest of the codebase are not compatible with `requirements_crf.txt`, so you need a separate environment.
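A minimal sketch of that second environment (interpreter and folder names are illustrative):

```bash
# Separate virtual environment for the BERT-based CRF experiments.
python3.7 -m venv venv_crf
source venv_crf/bin/activate
pip install -r requirements_crf.txt
```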