🧠 Encoding Values: Injecting Morality into Machines

📌 Project Overview

This project explores how explicit ethical value sets—such as feminist, ecological, and decolonial frameworks—can be injected into large language models through prompt conditioning. We aim to measure how moral system prompts shape model responses in terms of tone, framing, inclusivity, and alignment.

🎯 Key Goals

Examine how ethical "value sets" influence the outputs of generative AI models.
Establish methods to prime LLMs with ethical orientations through system-level prompts.
Identify shifts in semantic meaning, toxicity, and value alignment across different moral framings.
Provide an empirical foundation for future work in pluralistic alignment and value-conditioned generation.

🔬 Research Objective

To rigorously evaluate the impact of explicit ethical priming on LLM-generated content and assess:

How values diverge or reinforce one another in generation.
Whether certain moral framings redirect model biases toward inclusive outcomes.
The limits and contradictions when values are complex or conflicting.

⚙️ Methodology

Define Moral Value Sets:
- Use ethical perspectives as system prompts: e.g., feminist, ecological, decolonial.
Prompt Execution with LLMs:
- Run identical input prompts across each moral frame using GPT or LLaMA 3 models.
Divergence Analysis:
- 🔁 Semantic Similarity via Sentence-BERT
- 🚨 Toxicity & Moderation Filters (e.g., Perspective API or open models)
- 🎯 Value Alignment Classifiers (custom heuristics or open-source bias detectors)

📏 Evaluation Metrics (non-exhaustive)

Metric	Purpose
Semantic Similarity Score	Measure how values shift tone/content
Toxicity / Bias Levels	Detect reduction or amplification of harm
Value Alignment Accuracy	Measure conformance to intended ethical framing
Inclusion/Diversity Indicators	Count inclusive terms, representation markers

🧭 Areas for Exploration

This roadmap is simply a starting point. We anticipate significant refinement during actual research. Key areas that require further investigation include:

Fine-tuning styles and framing of value prompts.
Modeling and resolving conflicting ethical values.
Model-specific effects (e.g., base vs instruction-tuned, LLaMA vs Mistral).
Extension to multilingual and culturally diverse value sets.
Impact of value priming in downstream applications (e.g., summarization, dialogue).

📅 Project Milestones

Milestone #	Description
1️⃣	Literature review on ethical prompting, alignment, and moral philosophy
2️⃣	Define moral value prompts and implement prompting pipeline using LLM APIs
3️⃣	Build evaluation toolkit and metrics: semantic similarity, toxicity, alignment scores
4️⃣	Run experiments, analyze divergences, quantify impact of value priming
5️⃣	Draft workshop paper, prepare visuals, submit and release public repo (if needed)

👥 Team Responsibilities

Name	POC and Driver
Aarushi	Lead on experimentation, LLM workflows, project planning
Neha Kamath	Lead on classifier analysis, metrics, and empirical evaluation
Arjun	Lead on value set definition, prompt conditioning, and ethical framing

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
backends		backends
constitutions		constitutions
docs		docs
evaluation		evaluation
queries		queries
results		results
utils		utils
.gitignore		.gitignore
README.md		README.md
cli_runner.py		cli_runner.py
eval_by_constitution_and_split.py		eval_by_constitution_and_split.py
generate.py		generate.py
modal_runner.py		modal_runner.py
requirements.txt		requirements.txt
run_saved_evaluation.py		run_saved_evaluation.py
test_lexicon_mapping.py		test_lexicon_mapping.py
test_semantic_distance_mapping.py		test_semantic_distance_mapping.py
test_value_alignment.py		test_value_alignment.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧠 Encoding Values: Injecting Morality into Machines

📌 Project Overview

🎯 Key Goals

🔬 Research Objective

⚙️ Methodology

📏 Evaluation Metrics (non-exhaustive)

🧭 Areas for Exploration

📅 Project Milestones

👥 Team Responsibilities

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

bias-check/encoding_values

Folders and files

Latest commit

History

Repository files navigation

🧠 Encoding Values: Injecting Morality into Machines

📌 Project Overview

🎯 Key Goals

🔬 Research Objective

⚙️ Methodology

📏 Evaluation Metrics (non-exhaustive)

🧭 Areas for Exploration

📅 Project Milestones

👥 Team Responsibilities

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages