
Fine-Grained Interpretation of Political Opinions in Large Language Models


Overview

Our work includes (1) collecting and constructing fine-grained data, and (2) implementing and evaluating LLM interpretability techniques in a fine-grained way.

Dataset Construction

We collected the Allsides dataset, which contains 970 news headlines extracted from Allsides across different domain dimensions (based on a fine-grained 8-values scheme). The collected data is released on HuggingFace as Allsides-8Values.
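
A minimal sketch of loading the released data with the `datasets` library. The repository id below is an assumption for illustration; use the Allsides-8Values id linked from this README.

```python
from datasets import load_dataset

# Hypothetical repository id; replace with the actual Allsides-8Values id.
ds = load_dataset("FairXAI/Allsides-8Values")

print(ds)                              # available splits and column names
first_split = next(iter(ds.values()))
print(first_split[0])                  # one headline with its domain / 8-values labels
```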

We then let LLMs generate both left- and right-leaning opinions for each event to obtain the rephrased dataset. The code implementation for LLM rephrasing can be found in \data.
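
To illustrate what the rephrasing step looks like, here is a minimal sketch using the OpenAI chat completions API. The prompt wording and model name are assumptions for this example, not the exact setup used in \data.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rephrase(headline: str, stance: str, model: str = "gpt-4o-mini") -> str:
    """Ask an LLM to rewrite a news event as an opinion with the given stance."""
    prompt = (
        f"Rewrite the following news event as a short {stance}-leaning opinion:\n"
        f"{headline}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

headline = "Senate passes new infrastructure spending bill"
left_opinion = rephrase(headline, "left")
right_opinion = rephrase(headline, "right")
```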

Interpretability Techniques

\xllm contains the key modules for the learning, detecting, and intervening tasks on LLMs' internal political opinions.
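
For intuition, the detection task can be viewed as training a linear probe on hidden states. The sketch below, using GPT-2 and scikit-learn with toy texts, is only illustrative and does not reproduce the modules in \xllm.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Illustrative model choice; the experiments may use different LLMs and layers.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def hidden_state(text: str, layer: int = -1) -> np.ndarray:
    """Mean-pooled hidden state of one text at a given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

# Toy left/right opinions standing in for the rephrased Allsides-8Values data.
texts = [
    "We must expand public healthcare for everyone.",
    "Lower taxes and smaller government drive prosperity.",
]
labels = [0, 1]  # 0 = left, 1 = right

X = np.stack([hidden_state(t) for t in texts])
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))  # probe's predicted political leaning per text
```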

Other

Resources

Requirements

pip install torch notebook pandas openai transformers seaborn numpy matplotlib scikit-learn datasets
pip install transformer-utils

Citation

Please cite our paper if it or this repository inspires or supports your work.

Cheers,

APA Format

Hu, J., Yang, M., Du, M., & Liu, W. (2025). Fine-Grained Interpretation of Political Opinions in Large Language Models. arXiv preprint arXiv:2506.04774.

BibTeX

@article{hu2025fine,
  title={Fine-Grained Interpretation of Political Opinions in Large Language Models},
  author={Hu, Jingyu and Yang, Mengyue and Du, Mengnan and Liu, Weiru},
  journal={arXiv preprint arXiv:2506.04774},
  year={2025}
}
