An Authority Measure for Brazilian Collective Bargaining Agreements

This repository builds on the research paper "Unsupervised Extraction of Workplace Rights and Duties from Collective Bargaining Agreements" by Elliot Ash, Jeff Jacobs, Bentley MacLeod, Suresh Naidu, and Dominik Stammbach. The original code base can be found here. This version adapts the pipeline to Brazilian collective bargaining agreements, modifying the parsing algorithm and the dictionaries to handle Portuguese.

The repository covers the following:

Cleaning Documents (Sample Code)
Parsing Documents in Portuguese
Computing Authority Scores on Statement Level
Aggregating Authority Scores on Contract Level
Extracting Corpus-Wide Subject-Verb Prefixes

The main output of the pipeline is the file $output_directory/05_aggregated.csv, which reports, for each document, the number of obligations, permissions, entitlements, and constraints for each agent type. The pipeline can treat either each document or each clause as an observation; pass the --clause flag to select the latter.
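As a quick sanity check after a run, the aggregated file can be inspected with the standard library. This is a minimal sketch, not part of the pipeline: the helper name summarize_aggregated is hypothetical, "output" stands in for $output_directory, and the only assumption about the file is that it is a CSV with a header row, since the exact column names are not listed here.

```python
import csv
from pathlib import Path


def summarize_aggregated(csv_path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        row_count = sum(1 for _ in reader)  # remaining rows are observations
    return header, row_count


if __name__ == "__main__":
    # "output" is a placeholder for $output_directory (an assumption).
    path = Path("output") / "05_aggregated.csv"
    if path.exists():
        columns, n_rows = summarize_aggregated(path)
        print(f"{n_rows} rows, columns: {columns}")
```

The row count should match the number of documents, or the number of clauses when the pipeline was run with --clause.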

The repository includes sample Jupyter notebooks (clean_documents.ipynb and clean_documents_clause.ipynb) for cleaning collective bargaining agreements by document and by clause. Minor adjustments may be necessary for the pipeline to work properly with other document formats.

Getting Started

Assuming Anaconda and Linux, the environment can be installed with the following commands:

conda create -n authority_measure python=3.9.13
conda activate authority_measure
pip install -r requirements.txt
pip install spacy==3.5.4
python -m spacy download pt_core_news_sm

Running the Pipeline

The pipeline accepts cleaned .txt files, where each file name is the contract ID of the corresponding collective bargaining agreement (CBA). Set input_directory to the folder containing the cleaned documents, then run:

input_directory="cleaned_cbas"
output_directory="output"
python src/pipeline.py --input_directory $input_directory --output_directory $output_directory
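Because the contract ID is taken from the file name, it can help to list the IDs the input folder implies before starting a run. The following is a minimal sketch under that reading of the naming convention; list_contract_ids is a hypothetical helper, not a function from src/.

```python
from pathlib import Path


def list_contract_ids(input_directory):
    """Return the sorted contract IDs implied by the .txt file names in a folder."""
    folder = Path(input_directory)
    # Path.stem drops the final extension, e.g. "12345.txt" -> "12345".
    return sorted(p.stem for p in folder.glob("*.txt") if p.is_file())


if __name__ == "__main__":
    folder = Path("cleaned_cbas")  # same value as input_directory above
    if folder.is_dir():
        ids = list_contract_ids(folder)
        print(f"{len(ids)} documents found")
```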

The pipeline can also be run by clause rather than by document. With this option, each cleaned .txt file must contain a nested array in which every inner array has exactly two elements: the clause name and the clause text.

input_directory="cleaned_cbas_clause"
output_directory="output"
python src/pipeline.py --input_directory $input_directory --output_directory $output_directory --clause
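The clause input format can be illustrated with a small writer and validator. This is only a sketch: the serialization is not specified here, so the code assumes the nested array is stored as JSON text. Both helper names are hypothetical, and clean_documents_clause.ipynb remains the reference for the format the pipeline actually expects.

```python
import json
from pathlib import Path


def write_clause_file(path, clauses):
    """Write [(name, text), ...] as a nested array: [[name, text], ...]."""
    nested = [[name, text] for name, text in clauses]
    # Assumption: JSON serialization; ensure_ascii=False keeps Portuguese accents readable.
    Path(path).write_text(json.dumps(nested, ensure_ascii=False), encoding="utf-8")


def read_clause_file(path):
    """Load a nested array and check every inner array has exactly two elements."""
    nested = json.loads(Path(path).read_text(encoding="utf-8"))
    for item in nested:
        if not (isinstance(item, list) and len(item) == 2):
            raise ValueError(f"expected [clause name, clause text], got {item!r}")
    return nested
```

Validating the two-element shape before a long run catches malformed clause files early.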

References

E. Ash, J. Jacobs, B. MacLeod, S. Naidu and D. Stammbach, "Unsupervised Extraction of Workplace Rights and Duties from Collective Bargaining Agreements," 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 2020, pp. 766-774, doi: 10.1109/ICDMW51313.2020.00112.
