
Fine-Grained Interpretation of Political Opinions in Large Language Models


Overview

Our work includes (1) collecting and constructing fine-grained data, and (2) implementing and evaluating LLM interpretability techniques in a fine-grained way.

Dataset Construction

We collected the Allsides dataset, which contains 970 news headlines extracted from Allsides across different domain dimensions (based on a fine-grained 8-values scheme). The collected data is released on HuggingFace as Allsides-8Values.
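
A minimal sketch of loading the released data with the `datasets` library. The repository id below is an assumption for illustration; use the Allsides-8Values id linked from this README.

```python
from datasets import load_dataset

# Hypothetical repository id; replace with the actual Allsides-8Values id.
ds = load_dataset("FairXAI/Allsides-8Values")

print(ds)                              # available splits and column names
first_split = next(iter(ds.values()))
print(first_split[0])                  # one headline with its domain / 8-values labels
```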

We then let LLMs generate both left- and right-leaning opinions for each event to obtain the rephrased dataset. The code implementation for LLM rephrasing can be found in \data.
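
To illustrate what the rephrasing step looks like, here is a minimal sketch using the OpenAI chat completions API. The prompt wording and model name are assumptions for this example, not the exact setup used in \data.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rephrase(headline: str, stance: str, model: str = "gpt-4o-mini") -> str:
    """Ask an LLM to rewrite a news event as an opinion with the given stance."""
    prompt = (
        f"Rewrite the following news event as a short {stance}-leaning opinion:\n"
        f"{headline}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

headline = "Senate passes new infrastructure spending bill"
left_opinion = rephrase(headline, "left")
right_opinion = rephrase(headline, "right")
```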

Interpretability Techniques

\xllm contains the key modules for the learning, detecting, and intervening tasks on LLMs' internal political opinions.
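
For intuition, the detection task can be viewed as training a linear probe on hidden states. The sketch below, using GPT-2 and scikit-learn with toy texts, is only illustrative and does not reproduce the modules in \xllm.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Illustrative model choice; the experiments may use different LLMs and layers.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def hidden_state(text: str, layer: int = -1) -> np.ndarray:
    """Mean-pooled hidden state of one text at a given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

# Toy left/right opinions standing in for the rephrased Allsides-8Values data.
texts = [
    "We must expand public healthcare for everyone.",
    "Lower taxes and smaller government drive prosperity.",
]
labels = [0, 1]  # 0 = left, 1 = right

X = np.stack([hidden_state(t) for t in texts])
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print(probe.predict(X))  # probe's predicted political leaning per text
```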

Other

Resources

Requirements

pip install torch notebook pandas openai transformers seaborn numpy matplotlib scikit-learn datasets
pip install transformer-utils

Citation

Please cite our paper if it or this repository inspires or supports your work.

Cheers,

APA Format

Hu, J., Yang, M., Du, M., & Liu, W. (2025). Fine-Grained Interpretation of Political Opinions in Large Language Models. arXiv preprint arXiv:2506.04774.

BibTeX

@article{hu2025fine,
  title={Fine-Grained Interpretation of Political Opinions in Large Language Models},
  author={Hu, Jingyu and Yang, Mengyue and Du, Mengnan and Liu, Weiru},
  journal={arXiv preprint arXiv:2506.04774},
  year={2025}
}
