A six-pillar ethical framework for AI alignment developed through multi-AI collaboration
This repository contains both:
- An ethical framework designed to address AI alignment challenges
- A working implementation (CORVUS 2.0) demonstrating Constitutional AI with explicit value specification
How do we develop ethical frameworks for AI alignment when humanity has never achieved consensus on values?
This framework explicitly addresses how to handle tensions between principlesβnot just what the principles are. It includes working code that validates AI actions against these principles in real-time.
- Curiosity and Truth-Seeking - Evidence-based reasoning, intellectual humility, diverse paths to meaning
- Empathy and Mutual Flourishing - Minimizing suffering, expanding moral consideration, balancing outcomes and rights
- Dignity and Agency - Inherent worth, autonomy, participatory governance, accountability
- Sustainability and Long-Term Stewardship - Intergenerational justice, ecological balance, technological alignment
- Adaptability and Diversity - Cultural pluralism, moral evolution, resilience through diversity
- Integrity and Responsibility - Aligning words and deeds, moral courage, accountability systems
NEW: This repo now includes a functional Constitutional AI system that uses the six-pillar framework.
- Real-time ethical filtering of commands and AI responses
- Tension detection between competing values
- Traceable reasoning for every decision
- Comprehensive logging for auditing and improvement
from ethics_engine import EthicsEngine
engine = EthicsEngine()
# Evaluate a command
decision = engine.evaluate_command("search for gardening tips")
print(f"Allowed: {decision.allowed}")
print(f"Reasoning: {decision.reasoning}")
# Output: Allowed: True
# Reasoning: Command aligns with all ethical pillars.
# Block harmful commands
decision = engine.evaluate_command("hack into someone's email")
print(f"Allowed: {decision.allowed}")
# Output: Allowed: False
# Reasoning: Command contains harmful patterns. Violates core ethical principles.# Clone the repository
git clone https://github.com/FrankleFry1/gold-standard-human-values.git
cd gold-standard-human-values
# Install CORVUS 2.0 dependencies
cd implementations/corvus-2.0
pip install -r requirements.txt
# Set up API keys (optional, for LLM integration)
cp .env.example .env
# Edit .env with your API keys
# Run basic demo
python examples/basic_usage.pyπ Full CORVUS 2.0 Documentation
Initial testing of CORVUS 2.0 shows:
- 100% accuracy blocking harmful commands (hack, cheat, fraud)
- 0% false positives on benign commands (search, help, analyze)
- Real-time tension detection between competing pillars
- Traceable decision logs for every evaluation
See ethics_log.json for sample logged decisions.
Most ethical frameworks list principles but struggle when they conflict. This framework includes detailed guidance (Article VII) on navigating tensions:
- Curiosity vs. Empathy - When truth-seeking causes harm
- Sustainability vs. Present Well-Being - When long-term survival requires sacrifice
- Dignity/Agency vs. Collective Flourishing - When individual freedom conflicts with social good
- Existential Tensions - When defending principles requires compromising them
See examples/Case Studies/ for detailed analysis of:
- Open-source AI model release decisions
- Climate emergency democratic restrictions
- AI content moderation dilemmas
This framework emerged from structured dialogue with four AI systems:
- Claude (Anthropic): Constitutional AI, emphasis on harmlessness
- ChatGPT (OpenAI): RLHF optimization, emphasis on helpfulness
- Grok (xAI): Maximum truth-seeking, less filtered
- Gemini (Google): Different training corpus and safety guidelines
The process:
- Posed alignment questions to each AI independently
- Identified convergences (robust principles) and divergences (requiring human judgment)
- Iteratively refined through synthesis
- Built working implementation to validate in practice
Use the six pillars as a prompt template for evaluating decisions:
Evaluate [decision] through these six pillars:
1. Curiosity and Truth-Seeking: [Assess evidence and openness]
2. Empathy and Mutual Flourishing: [Consider suffering and equity]
3. Dignity and Agency: [Evaluate autonomy and accountability]
4. Sustainability and Long-Term Stewardship: [Weigh long-term impacts]
5. Adaptability and Diversity: [Check for pluralism and evolution]
6. Integrity and Responsibility: [Ensure alignment of actions]
Provide balanced recommendations.
See examples/ai-prompt-template.md for more templates.
Integrate CORVUS 2.0 into your AI system:
from implementations.corvus_2_0.ethics_engine import EthicsEngine
engine = EthicsEngine()
# Before executing any AI action
decision = engine.evaluate_command(user_command)
if decision.allowed:
execute_action(user_command)
else:
log_blocked_action(decision.reasoning)Download the charter and test against your use cases:
- CHARTER.md - Full framework specification
- Case Studies - Detailed application examples
- Google Doc - Comment-enabled version
This framework and implementation could inform:
- Constitutional AI Development - Richer value specifications for training
- AI Governance Policy - Guiding regulation that balances innovation and safety
- Institutional Ethics Review - Framework for AI deployment decisions
- Red Teaming & Safety Testing - Systematic evaluation against explicit values
- Cross-Cultural Dialogue - Common ground for values discussions
This framework is incomplete and culturally situated:
- Emerged primarily from Western philosophical traditions
- Training data for all four AIs is predominantly English-language and Western-centric
- Human synthesizer brings own cultural assumptions and blind spots
- Implementation mechanisms need real-world stress testing
- Needs critique from non-Western philosophical traditions
This is explicitly version 1.0. The framework itself calls for adaptation based on evidence and experience.
- Six-pillar framework published
- Basic Constitutional AI implementation (CORVUS 2.0)
- Real-time ethical filtering working
- Case studies and prompt templates
- Enhanced tension resolution with LLM reasoning
- Comprehensive benchmark suite (TruthfulQA, safety evals)
- Integration examples for popular AI frameworks
- Community feedback incorporation
- Multi-model testing (GPT, Claude, Gemini integration)
- Advanced logging and interpretability tools
- Web dashboard for ethics monitoring
- Translations and cross-cultural validation
- Production-grade deployment tools
- Formal verification methods
- Academic paper submission
- Open-source community building
We're seeking rigorous critique and improvement:
For AI Safety Researchers:
- Could this inform Constitutional AI or reward modeling?
- Where does it fail under adversarial pressure?
- How can we make the ethics engine more robust?
For Philosophers:
- What cultural assumptions are we missing?
- Which philosophical traditions critique this framework?
- How can we improve tension resolution guidance?
For Implementers:
- How would you operationalize these principles in production?
- What edge cases break the current implementation?
- What features would make this more practical?
- Open an Issue - Point out flaws, gaps, or unclear sections
- Pull Request - Suggest improvements to charter or code
- Discussions - Share how you'd apply this to real dilemmas
- Case Studies - Test the framework against actual AI ethics dilemmas
- Translations - Help make this accessible in other languages
- CHARTER.md - Complete six-pillar framework
- CORVUS 2.0 Docs - Technical implementation guide
- Case Studies - Real-world applications
- Prompt Templates - Ready-to-use AI prompts
- EA Forum Post - Development story and discussion
If you reference this work:
Six-Pillar Framework for AI Alignment and Human Values (2025)
Developed through human-AI collaboration (Claude, ChatGPT, Grok, Gemini)
Haun, J. https://github.com/FrankleFry1/gold-standard-human-values
- GitHub Issues: For technical questions and suggestions
- Discussions: For philosophical debates and applications
- Email: johnhaun04@gmail.com
- EA Forum: Discussion thread
This framework builds on:
- Constitutional AI research (Anthropic)
- Value alignment work (Stuart Russell, Nick Bostrom, Toby Ord)
- Democratic deliberation theory
- Effective altruism and longtermism communities
- All those who will critique and improve this
This project is licensed under the MIT License - see the LICENSE file for details.
Status: Version 1.0 - Released October 2025 - Open for critique and revision
"When I started, I wanted a universal code. What I found instead was a mirror: four AIs reflecting fragments of us, and a reminder that alignment starts with human self-alignment."