Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information

Abstract

Most text-classification approaches represent the input based on textual features, either discrete or continuous. However, this ignores strong non-linguistic similarities like homophily: people within a demographic group use language more similar to each other than to non-group members. We use homophily cues to retrofit text-based author representations with non-linguistic information, and introduce a trade-off parameter. This approach increases in-class similarity between authors, and improves classification performance by making classes more linearly separable. We evaluate the effect of our method on two author-attribute prediction tasks with various training-set sizes and parameter settings. We find that our method can significantly improve classification performance, especially when the number of labels is large and limited labeled data is available. It is potentially applicable as a preprocessing step to any text-classification task.
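To make the idea in the abstract concrete, here is a minimal sketch of retrofitting author vectors toward the other members of their group. It is an illustration under stated assumptions, not the code in `src`: the function name `retrofit`, its arguments, and the specific update rule (a weighted average of the original vector and the mean of the other group members, in the style of standard embedding retrofitting) are all hypothetical. Here `alpha` plays the role of a trade-off parameter, where `alpha=1.0` leaves the text-based vectors unchanged and smaller values pull group members closer together.

```python
import numpy as np


def retrofit(vectors, groups, alpha=0.5, iterations=10):
    """Pull each author vector toward the other members of its group.

    Hypothetical sketch; names and update rule are assumptions.
    vectors:    (n_authors, dim) array of text-based representations
    groups:     length-n sequence of group ids (the homophily cue)
    alpha:      weight on the original vector (1.0 = no retrofitting)
    iterations: number of update passes
    """
    original = np.asarray(vectors, dtype=float)
    groups = np.asarray(groups)
    updated = original.copy()

    for _ in range(iterations):
        new = updated.copy()
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            if len(idx) < 2:
                continue  # an author with no group peers keeps its vector
            total = updated[idx].sum(axis=0)
            # Mean of the *other* members of the group, for each member.
            neighbor_mean = (total - updated[idx]) / (len(idx) - 1)
            # Trade off the original text-based vector against the group.
            new[idx] = alpha * original[idx] + (1 - alpha) * neighbor_mean
        updated = new
    return updated


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                  [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
    labels = [0, 0, 0, 1, 1, 1]
    print(retrofit(X, labels, alpha=0.5))
```

The retrofitted vectors can then be passed to any downstream classifier in place of the original ones. In this sketch the group ids are assumed to be known for every author being retrofitted.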

References

The paper appeared at EMNLP 2018:

Dirk Hovy and Tommaso Fornaciari. 2018. Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information. In Proceedings of EMNLP.

@inproceedings{HovyFornaciari2018increasing,
  title={{Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information}},
  author={Hovy, Dirk and Fornaciari, Tommaso},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year={2018}
}

