Skip to content

eXascaleInfolab/MARTA

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 

MARTA: Leveraging Human Rationales for Explainable Text Classification

MARTA is a unified Bayesian Framework that integrates an attention-based model with labels and rationales contributed by workers.

Structure of MARTA's repository

  • This repo is composed of four main directories:
    • code: source code of MARTA
    • data: contains two subfolders which are "original data" with the two datasets Amazon and Wiki_tech used in the paper, and "processed_data" which contains the generated files from the original data needed to run MARTA.
    • scripts: contains two scripts to run the code on the datasets used in the paper
    • results: after running the code, the results will be saved in this directory

Using MARTA

Create a virtual environment and install requirements

We are using python3.6 on a Ubuntu 16.04 machine with 32 CPUs and 128GB RAM. You can create a virtual environment for MARTA using the following command:

sudo apt-get install python3-venv
python3.6 -m venv env-marta
source env-marta/bin/activate

Install all requirements using the following command:

pip install --upgrade pip
pip install -r requirements.txt

Running MARTA

To run MARTA on the amazon data, you can use the script: amazon.sh in the scripts folder

chmod u+x ./scripts/amazon.sh
cd code
../scripts/amazon.sh

To run MARTA on the wiki_tech data, you can use the script: wiki_tech.sh in the scripts folder

chmod u+x ./scripts/wiki_tech.sh
cd code
../scripts/wiki_tech.sh

Using MARTA with other datasets:

  • In order to generate the data needed for MARTA, you can use the python script gen_data_marta.py.
  • The script takes as input a csv file with the following header: doc_id,text,label,WorkerId,worker_label,rationale, where:
    • doc_id: the document id
    • text: content of the document in one line
    • label: ground truth binary label of the document
    • WorkerId: the worker id
    • worker_label: the label given by the worker
    • rationale: the part of text selected by the worker as a justification to her label The scripts generates three files:
    • textual_data.csv: contains for each document, the sentences composing it and the ground truth label of the document. The format of the generated file is 'doc_id', 'text','sentence','label'.
    • workers_answers.csv: contains the worker labeling. The format of the file is 'doc_id','WorkerId','worker_label'
    • workers_sentence_label.csv: contains worker labeling at the sentence level. The format of the file is 'doc_id','WorkerId','worker_label','sentence','rationale','sentence_label'

Examples of running the code:

cd ./code/data_process/
  • For Amazon data:
python gen_data_marta.py --original_data '../../data/original_data/amazon.csv' --dir_gen_marta '../../data/processed_data/amazon/'

Citation

Please cite the following paper when using MARTA:

@inproceedings{arous2021aaai,
  title = {MARTA: Leveraging Human Rationales for Explainable Text Classification},
  author = {Arous, Ines and Dolamic, Ljiljana and Yang, Jie and Bhardwaj, Akansha and Cuccu, Giuseppe and Cudr{\'e}-Mauroux, Philippe},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021)},
  year = {2021},
  address = {A Virtual Conference}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published