
KALM Repository

This is the official repo for KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding @ ACL 2023.

Content

process_graph.py, process_text.py, and process_knowledge.py are the data preprocessing scripts for the three aspects.

graph_gnn_layer.py implements the knowledge-guided message-passing GNN. model.py defines the modular components of KALM. dataloader.py, trainer.py, and utils.py provide helper functions. main.py is the main executable.
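For orientation, below is a minimal, self-contained sketch of what a knowledge-guided message-passing layer can look like in PyTorch. It is not the code in graph_gnn_layer.py; the class name, the relation-embedding design, and the sum aggregation are all assumptions made purely for illustration.

```python
# Illustrative sketch only -- NOT the implementation in graph_gnn_layer.py.
# It shows one common way to realize "knowledge-guided" message passing:
# messages between nodes are modulated by an embedding of the knowledge-graph
# relation connecting them. All names and design choices are assumptions.
import torch
import torch.nn as nn


class KnowledgeGuidedGNNLayer(nn.Module):
    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        self.relation_emb = nn.Embedding(num_relations, hidden_dim)
        self.message_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.update_proj = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, node_states, edge_index, edge_relations):
        # node_states:    (num_nodes, hidden_dim)
        # edge_index:     (2, num_edges) -- source / target node ids
        # edge_relations: (num_edges,)   -- knowledge-graph relation ids
        src, dst = edge_index
        rel = self.relation_emb(edge_relations)                     # (E, H)
        # Build a message per edge from the source state and the relation.
        msg = torch.tanh(self.message_proj(
            torch.cat([node_states[src], rel], dim=-1)))            # (E, H)
        # Aggregate messages at each target node (sum aggregation).
        agg = torch.zeros_like(node_states)
        agg.index_add_(0, dst, msg)
        # Update node states with the aggregated, knowledge-guided messages.
        return torch.relu(self.update_proj(
            torch.cat([node_states, agg], dim=-1)))


if __name__ == "__main__":
    layer = KnowledgeGuidedGNNLayer(hidden_dim=64, num_relations=10)
    x = torch.randn(5, 64)                        # 5 nodes
    edges = torch.tensor([[0, 1, 2], [1, 2, 3]])  # 3 edges
    rels = torch.tensor([0, 4, 7])                # relation id per edge
    print(layer(x, edges, rels).shape)            # torch.Size([5, 64])
```

For the actual architecture and hyperparameters, please refer to graph_gnn_layer.py and model.py.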

Data link (raw and preprocessed)

Preprocessed data: link

Citation

If you find this repo useful, please cite our paper:

@inproceedings{feng-etal-2023-pretraining,
    title = "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair {NLP} Models",
    author = "Feng, Shangbin  and
      Park, Chan Young  and
      Liu, Yuhan  and
      Tsvetkov, Yulia",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.656",
    doi = "10.18653/v1/2023.acl-long.656",
    pages = "11737--11762",
    abstract = "Language models (LMs) are pretrained on diverse data sources{---}news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings which reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.",
}
