Infrastructure for cognitive diversity in the post-LLM era
This is draft infrastructure exploring whether cognitive diversity enables LLM/non-LLM compatibility. The code is messy because we're learning. Contributions welcome, especially from non-English-dominant perspectives.
Large Language Models have amplified a subtle form of linguistic imperialism: code becomes more readable to English-speaking LLMs but potentially less accessible to programmers who think in other languages. When you write `calculate_sum` instead of `calculer_somme` or `calcular_suma`, you're optimizing for Silicon Valley's training data, not for human cognitive diversity.
What if we could have both?
Ouverture is a function pool where the same logic written in different human languages shares the same hash. A French developer can write:
```python
def calculer_moyenne(nombres):
    """Calcule la moyenne d'une liste de nombres"""
    return sum(nombres) / len(nombres)
```

While a Spanish developer writes:
```python
def calcular_promedio(numeros):
    """Calcula el promedio de una lista de números"""
    return sum(numeros) / len(numeros)
```

And an English developer writes:
```python
def calculate_average(numbers):
    """Calculate the average of a list of numbers"""
    return sum(numbers) / len(numbers)
```

These three functions produce the same hash because they implement identical logic. They're stored together in a content-addressed pool, preserving each language's perspective while recognizing their logical equivalence.
Important clarification: Ouverture is not related to Bitcoin, blockchain, or cryptocurrency in any way. Yes, we use content-addressed storage and hashing. No, this is not a blockchain project.
The vision is about cognitive diversity and multilingual programming, not distributed ledgers or tokens. Content-addressed storage existed long before blockchain (see: Git, which we use daily). The value proposition is:
- Enabling programmers to think in their native languages
- Making code reuse language-agnostic for both humans and LLMs
- Preserving linguistic perspectives while recognizing logical equivalence
This vision holds value completely independent of blockchain technology. We're building tools for human cognitive diversity, not financial speculation.
Ouverture normalizes Python functions by:
- Parsing code to an Abstract Syntax Tree (AST)
- Extracting docstrings (language-specific)
- Renaming variables to canonical forms (`_ouverture_v_0`, `_ouverture_v_1`, etc.)
- Computing a hash on the logic only (excluding docstrings)
- Storing both the normalized code and language-specific name mappings
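The steps above can be sketched with Python's standard `ast` module. This is an illustrative simplification, not the actual `ouverture.py` implementation; the helper name `normalize_and_hash` and the canonical function name `_ouverture_f_0` are assumptions for the sketch:

```python
import ast
import hashlib

def normalize_and_hash(source):
    """Sketch: strip the docstring, rename locally bound names to
    canonical placeholders, and hash the remaining logic."""
    tree = ast.parse(source)
    func = tree.body[0]  # assumes a single top-level function
    # Drop the docstring so only the logic is hashed
    if (func.body and isinstance(func.body[0], ast.Expr)
            and isinstance(func.body[0].value, ast.Constant)
            and isinstance(func.body[0].value.value, str)):
        func.body = func.body[1:]
    # Map locally bound names (parameters, assignment and loop targets)
    # to canonical placeholders; imported and builtin names stay untouched
    mapping = {}
    for arg in func.args.args:
        mapping.setdefault(arg.arg, f"_ouverture_v_{len(mapping)}")
    for node in ast.walk(func):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"_ouverture_v_{len(mapping)}")
    # Apply the mapping; the function name gets a canonical form too,
    # since sum_list and somme_liste must hash identically
    func.name = "_ouverture_f_0"  # assumed canonical name
    for arg in func.args.args:
        arg.arg = mapping[arg.arg]
    for node in ast.walk(func):
        if isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]
    canonical = ast.unparse(tree)
    return canonical, hashlib.sha256(canonical.encode()).hexdigest()
```

Running this over the French and Spanish averaging functions yields the same canonical text and therefore the same SHA256 digest, while the per-language name mappings are kept alongside for reconstruction.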
When you retrieve a function, it's reconstructed in your target language:
```bash
# Add functions in different languages
python3 ouverture.py add example_simple.py@eng
python3 ouverture.py add example_simple_french.py@fra
python3 ouverture.py add example_simple_spanish.py@spa
# All three produce the same hash!

# Retrieve in any language
python3 ouverture.py get <HASH>@fra  # Returns French version
python3 ouverture.py get <HASH>@spa  # Returns Spanish version
```

For LLMs: They can work with canonical normalized forms, making code search and reuse language-agnostic.
For Humans: Developers maintain their cognitive workspace in their native language while accessing a global function pool.
For Collaboration: A French developer can use a function originally written in Korean without translation loss, because the system preserves both perspectives.
For Diversity: We challenge the assumption that "English variable names = universal readability."
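The reconstruction step can be sketched as the reverse renaming pass: take the canonical form plus a stored per-language name map and substitute the native identifiers back in. This is an illustrative sketch; the name-map format `ouverture.py` actually stores may differ, and re-attaching the language-specific docstring is omitted:

```python
import ast

def reconstruct(canonical_source, name_map):
    """Rename canonical identifiers back to a language's native names.
    name_map: canonical name -> native name, e.g.
    {"_ouverture_v_0": "elements"} (hypothetical format)."""
    tree = ast.parse(canonical_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name in name_map:
            node.name = name_map[node.name]      # restore function name
        elif isinstance(node, ast.arg) and node.arg in name_map:
            node.arg = name_map[node.arg]        # restore parameter names
        elif isinstance(node, ast.Name) and node.id in name_map:
            node.id = name_map[node.id]          # restore local variables
    return ast.unparse(tree)
```

Because each language's map is stored alongside the shared canonical form, the same hash can be materialized as `somme_liste` for a French reader or `sum_list` for an English one.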
This is research software. The current implementation:
- ✅ Normalizes Python ASTs
- ✅ Generates deterministic hashes for equivalent logic
- ✅ Stores multilingual variants in content-addressed pool
- ✅ Reconstructs code in target language
- ⚠️ Has known bugs (e.g., `couverture` typo in imports)
- ⚠️ Limited to Python (for now)
- ⚠️ No semantic understanding (purely syntactic)
```bash
# View examples
cat example_simple.py          # English
cat example_simple_french.py   # French
cat example_simple_spanish.py  # Spanish

# Add a function to the pool
python3 ouverture.py add example_simple.py@eng

# Get the hash (stored in .ouverture/objects/)
find .ouverture/objects -name "*.json"

# Retrieve in a different language
python3 ouverture.py get <HASH>@fra
```

English (example_simple.py):
```python
def sum_list(items):
    """Sum a list of numbers"""
    total = 0
    for item in items:
        total += item
    return total
```

French (example_simple_french.py):
```python
def somme_liste(elements):
    """Somme une liste de nombres"""
    total = 0
    for element in elements:
        total += element
    return total
```

These hash to the same value.
English (example_with_import.py):
```python
from collections import Counter

def count_frequency(items):
    """Count frequency of items"""
    return Counter(items)
```

Import names (`Counter`) are preserved; variable names (`items`) are normalized.
Functions can reference other functions from the pool:
```python
from ouverture import abc123def as helper

def process_data(values):
    """Process data using helper function"""
    return helper(values)
```

The import is normalized to `from ouverture import abc123def`, making it language-agnostic.
French for "opening" or "overture" - the beginning of something larger. Also a nod to the multilingual nature of the project.
- Single file: `ouverture.py` (~600 lines)
- Storage: `.ouverture/objects/XX/YYYYYY.json` (content-addressed)
- Language codes: ISO 639-3 (eng, fra, spa, etc.)
- Hash algorithm: SHA256 on normalized AST
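The storage layout follows the familiar Git-style fan-out, inferred here from the `.ouverture/objects/XX/YYYYYY.json` pattern above; the helper name `object_path` is illustrative:

```python
import hashlib
from pathlib import Path

def object_path(root, digest):
    """Map a SHA256 hex digest to its content-addressed file: the first
    two hex characters name the directory, the rest name the JSON file."""
    return Path(root) / "objects" / digest[:2] / (digest[2:] + ".json")

# Example: where a normalized function would live in the pool
digest = hashlib.sha256(b"canonical source").hexdigest()
path = object_path(".ouverture", digest)
```

Fanning out over the first byte keeps any single directory from accumulating thousands of entries, the same reason Git shards `.git/objects` this way.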
See CLAUDE.md for detailed technical documentation.
- Cognitive compatibility: Does writing in one's native language improve code comprehension and reduce bugs?
- LLM training: Could multilingual code pools improve LLM performance on non-English code?
- Semantic equivalence: When do syntactic differences reflect semantic distinctions vs. mere translation?
- Community building: Can language-diverse function pools foster more inclusive open source communities?
We especially welcome:
- Non-English examples: Add functions in your native language
- Bug reports: The code is messy, we know
- Linguistic insights: Are there language structures Python's AST can't normalize?
- Alternative implementations: Try this in other languages (Rust? JavaScript?)
- Criticism: Is this solving a real problem or creating new ones?
- Import normalization has a typo (`couverture` instead of `ouverture`)
- Only supports Python 3.9+ (requires `ast.unparse()`)
- No semantic analysis (purely syntactic)
- Limited testing on edge cases
- No package distribution yet
This project starts from a simple premise: linguistic diversity is cognitive diversity, and cognitive diversity is valuable. In a post-LLM world where AI systems are trained predominantly on English codebases, we risk optimizing for machine readability at the expense of human diversity.
Ouverture asks: what if we built tools that worked with multilingual thinking instead of around it?
MIT (see LICENSE file)
- Non-English-based programming languages: Wikipedia overview of programming languages designed for non-English speakers
- Content-addressed storage: Git, IPFS
- AST-based code similarity: Moss, JPlag
- Multilingual programming: Racket's #lang system, Babylonian programming
- Code normalization: Abstract interpretation, program synthesis
File issues on GitHub. We're learning in public.
"The limits of my language mean the limits of my world." – Ludwig Wittgenstein
Ouverture: What if we had more languages, not fewer?