Quantifying-Intimacy-in-Language

Official GitHub repo for the EMNLP 2020 paper Quantifying Intimacy in Language by Jiaxin Pei and David Jurgens.

Data

Annotated question intimacy data:

data/annotated_question_intimacy_data
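
A minimal sketch for loading one of the splits in Python. This assumes one tab-separated question/score pair per line; inspect the files to confirm the exact format before relying on it:

def load_split(path):
    # Assumed format: question text and intimacy score separated by a tab
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            text, score = line.rstrip("\n").split("\t")
            examples.append((text, float(score)))
    return examples

train = load_split("data/annotated_question_intimacy_data/final_train.txt")
print(len(train), train[0])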

Code

Python package for intimacy prediction

If pip is installed, question-intimacy can be installed directly via pip:

pip3 install question-intimacy
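
After installation, the package can be used roughly as follows (a sketch based on the package's published usage example; consult the question-intimacy documentation if the API differs):

from question_intimacy.predict_intimacy import IntimacyEstimator

# Loads the pre-trained intimacy model; set cuda=False to run on CPU
inti = IntimacyEstimator(cuda=False)

text = "What is your favorite movie?"

# Predict the intimacy score for a single question
print(inti.predict(text, type='list'))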

Pre-trained model

Our model is also available on Hugging Face Transformers:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model; both are downloaded automatically on first use
tokenizer = AutoTokenizer.from_pretrained("pedropei/question-intimacy")
model = AutoModelForSequenceClassification.from_pretrained("pedropei/question-intimacy")
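
A minimal sketch of scoring a single question with the loaded model (the example question is our own; the model is a regressor, so the single output logit is the predicted intimacy score):

import torch

question = "How are you feeling today?"
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# The regression head returns one logit per input: the intimacy score
print(outputs.logits.item())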

Code to train the intimacy regressor

To fine-tune the roberta-base model on our intimacy dataset:

python3 train_intimacy_model.py --mode=train \
--model_name=roberta-base \
--pre_trained_model_name_or_path=roberta-base \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--model_saving_path=outputs 

The best model will be saved to outputs/.

After training, to get scores on our annotated test set and the out-of-domain set, run:

python3 train_intimacy_model.py --mode=internal-test \
--model_name=roberta-base \
--pre_trained_model_name_or_path=outputs \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--predict_data_path=data/annotated_question_intimacy_data/final_external.txt 

To run the fine-tuned model on your own data, prepare a file with a list of input texts like data/inference.txt and run the following command:

python3 train_intimacy_model.py --mode=inference \
--model_name=roberta-base \
--pre_trained_model_name_or_path=outputs \
--predict_data_path=data/inference.txt \
--test_saving_path=ooo.txt
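
For example, the input file can be prepared with a few lines of Python (a sketch assuming one text per line, as in data/inference.txt; my_inference.txt is a hypothetical file name, and predictions are written to the path given by --test_saving_path):

# Write one input text per line (assumed format; compare data/inference.txt)
questions = [
    "What time does the store open?",
    "Do you ever feel lonely?",
]
with open("my_inference.txt", "w", encoding="utf-8") as f:
    for q in questions:
        f.write(q + "\n")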

If you want to do language-modeling fine-tuning of the roberta-base model, please check out the language modeling code from Hugging Face Transformers.

To train this fine-tuned RoBERTa model on our intimacy dataset, put the model under saved_model and run the following command:

python3 train_intimacy_model.py --mode=train \
--model_name=roberta-ft \
--pre_trained_model_name_or_path=saved_model \
--train_path=data/annotated_question_intimacy_data/final_train.txt \
--val_path=data/annotated_question_intimacy_data/final_val.txt \
--test_path=data/annotated_question_intimacy_data/final_test.txt \
--model_saving_path=outputs 

Please email Jiaxin Pei (pedropei@umich.edu) to request the roberta-base model fine-tuned on 3M questions.

Contact

Jiaxin Pei (pedropei@umich.edu)
