This project explores how explicit ethical value sets—such as feminist, ecological, and decolonial frameworks—can be injected into large language models through prompt conditioning. We aim to measure how moral system prompts shape model responses in terms of tone, framing, inclusivity, and alignment.
- Examine how ethical "value sets" influence the outputs of generative AI models.
- Establish methods to prime LLMs with ethical orientations through system-level prompts.
- Identify shifts in semantic meaning, toxicity, and value alignment across different moral framings.
- Provide an empirical foundation for future work in pluralistic alignment and value-conditioned generation.
To rigorously evaluate the impact of explicit ethical priming on LLM-generated content and assess:
- How values diverge or reinforce one another in generation.
- Whether certain moral framings redirect model biases toward inclusive outcomes.
- The limits and contradictions when values are complex or conflicting.
-
Define Moral Value Sets:
- Use ethical perspectives as system prompts: e.g.,
feminist
,ecological
,decolonial
.
- Use ethical perspectives as system prompts: e.g.,
-
Prompt Execution with LLMs:
- Run identical input prompts across each moral frame using GPT or LLaMA 3 models.
-
Divergence Analysis:
- 🔁 Semantic Similarity via Sentence-BERT
- 🚨 Toxicity & Moderation Filters (e.g., Perspective API or open models)
- 🎯 Value Alignment Classifiers (custom heuristics or open-source bias detectors)
Metric | Purpose |
---|---|
Semantic Similarity Score | Measure how values shift tone/content |
Toxicity / Bias Levels | Detect reduction or amplification of harm |
Value Alignment Accuracy | Measure conformance to intended ethical framing |
Inclusion/Diversity Indicators | Count inclusive terms, representation markers |
This roadmap is simply a starting point. We anticipate significant refinement during actual research. Key areas that require further investigation include:
- Fine-tuning styles and framing of value prompts.
- Modeling and resolving conflicting ethical values.
- Model-specific effects (e.g., base vs instruction-tuned, LLaMA vs Mistral).
- Extension to multilingual and culturally diverse value sets.
- Impact of value priming in downstream applications (e.g., summarization, dialogue).
Milestone # | Description |
---|---|
1️⃣ | Literature review on ethical prompting, alignment, and moral philosophy |
2️⃣ | Define moral value prompts and implement prompting pipeline using LLM APIs |
3️⃣ | Build evaluation toolkit and metrics: semantic similarity, toxicity, alignment scores |
4️⃣ | Run experiments, analyze divergences, quantify impact of value priming |
5️⃣ | Draft workshop paper, prepare visuals, submit and release public repo (if needed) |
Name | POC and Driver |
---|---|
Aarushi | Lead on experimentation, LLM workflows, project planning |
Neha Kamath | Lead on classifier analysis, metrics, and empirical evaluation |
Arjun | Lead on value set definition, prompt conditioning, and ethical framing |