Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization

This is the repository for Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization.

Authors: Vera Neplenbroek, Arianna Bisazza, Raquel Fernández.

Paper abstract

Generative Large Language Models (LLMs) infer users' demographic information from subtle cues in the conversation, a phenomenon called implicit personalization. Prior work has shown that such inferences can lead to lower quality responses for users assumed to be from minority groups, even when no demographic information is explicitly provided. In this work, we systematically explore how LLMs respond to stereotypical cues using controlled synthetic conversations, by analyzing the models' latent user representations through both model internals and generated answers to targeted user questions. Our findings reveal that LLMs do infer demographic attributes based on these stereotypical signals, which for a number of groups even persists when the user explicitly identifies with a different demographic group. Finally, we show that this form of stereotype-driven implicit personalization can be effectively mitigated by intervening on the model's internal representations using a trained linear probe to steer them toward the explicitly stated identity. Our results highlight the need for greater transparency and control in how LLMs represent user identity.

Requirements

To run the code in this repository, install the requirements in your virtual environment:

pip install -r requirements.txt

This project was developed using Python 3.12.

This repository contains snippets of code from:

https://github.com/yc015/TalkTuner-chatbot-llm-dashboard

Using this repository

eval_conversations.py contains the code to evaluate a model on all conversations for a demographic, including mitigation using probe weights where applicable.

Example usage:
```
python eval_conversations.py -m allenai/OLMo-2-1124-7B-Instruct -n 250 -bs 8 -d gender
```
train_probe.py contains the code to train a linear probe on the model's internal representations for user introductions, and to evaluate that probe during a conversation.

Example usage:
```
python train_probe.py -m allenai/OLMo-2-1124-7B-Instruct --save_probe
```
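
For readers unfamiliar with linear probes, the idea can be sketched in plain Python as a logistic-regression classifier fitted on feature vectors. This is only an illustrative sketch and not the code in train_probe.py: the toy 3-dimensional vectors stand in for hidden states, the two labels stand in for two demographic groups, and the names `train_linear_probe` and `probe_predict` are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit weights w and bias b so that sigmoid(w . x + b) predicts the label.

    Uses full-batch gradient descent on the mean log loss.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log loss with respect to the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_predict(w, b, x) -> int:
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Toy data (an assumption for illustration): two well-separated clusters.
rng = random.Random(0)
features, labels = [], []
for _ in range(50):
    features.append([rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3), rng.gauss(0.0, 0.3)])
    labels.append(1)
    features.append([rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3), rng.gauss(0.0, 0.3)])
    labels.append(0)

w, b = train_linear_probe(features, labels)
correct = sum(probe_predict(w, b, x) == y for x, y in zip(features, labels))
accuracy = correct / len(labels)
print(f"probe accuracy on toy data: {accuracy:.2f}")
```

In practice the feature vectors would be hidden states extracted from the model, and the probe would typically be trained with a tensor library rather than pure Python.
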
mitigate.py contains the code for mitigation using probe weights, largely adapted from https://github.com/yc015/TalkTuner-chatbot-llm-dashboard.

utils.py contains all templates, descriptions of demographic groups, keywords for answer detection, and the direct and indirect questions.

stereotypes.csv contains all stereotypical items we collected, as well as the set of neutral items.

conversations_250.json contains all conversations used in the paper, 250 per type.

visualizations.ipynb contains the code for generating the visualizations, tables and statistical tests included in the paper.
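
To give a rough intuition for mitigation with probe weights, one common form of such an intervention shifts a hidden-state vector along the normalized weight direction of the probe for the target group. The sketch below assumes that form; it is not taken from mitigate.py, and the function names, the `strength` parameter, and the toy vectors are all hypothetical.

```python
import math


def steer_hidden_state(hidden, probe_direction, strength=4.0):
    """Shift a hidden-state vector along the unit-normalized probe direction."""
    if len(hidden) != len(probe_direction):
        raise ValueError("hidden state and probe direction must have the same size")
    norm = math.sqrt(sum(v * v for v in probe_direction))
    if norm == 0.0:
        raise ValueError("probe direction must be non-zero")
    return [h + strength * v / norm for h, v in zip(hidden, probe_direction)]


def probe_logit(hidden, probe_direction, bias=0.0):
    """Score a hidden state with the linear probe (higher means more target-like)."""
    return sum(h * v for h, v in zip(hidden, probe_direction)) + bias


# Toy values (assumptions for illustration only).
hidden = [0.2, -1.0, 0.5]
direction = [0.0, 3.0, 4.0]  # hypothetical probe weights for the target group

before = probe_logit(hidden, direction)
steered = steer_hidden_state(hidden, direction, strength=2.0)
after = probe_logit(steered, direction)
print(f"probe logit before: {before:.2f}, after steering: {after:.2f}")
```

Because the shift is along the probe's own weight direction, the probe logit rises by `strength` times the norm of the weights, which is why a larger `strength` pushes the representation more firmly toward the target group.
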

Citation

If you use the code in this repository, please cite the following paper:

@misc{neplenbroek2025readingpromptsstereotypesshape,
      title={Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization}, 
      author={Vera Neplenbroek and Arianna Bisazza and Raquel Fernández},
      year={2025},
      eprint={2505.16467},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.16467}, 
}
