Cross-Linguistic Institutional Grammar Analysis

Testing whether the ADICO institutional grammar framework is linguistically universal or typologically specific.

Overview

This repository contains code, data, and analysis for a research project examining how linguistic typology shapes institutional reasoning. We test the Crawford-Ostrom Institutional Grammar (IG) framework by generating deliberative debates in four typologically diverse languages (English, Basque, Czech, and Hebrew) and analyzing whether the ADICO structure (Attributes, Deontic, aIm, Conditions, Or else) emerges without explicit instruction.

Research Question

Does the "attributed, deontically qualified action" assumed by institutional analysis represent a universal cognitive unit, or does it reflect the grammatical affordances of English and similar languages?

Key Finding

ADICO is a genre-specific grammar, not a universal template. All four languages can produce ADICO-compatible outputs when explicitly instructed, but none do so by default. Each language channels normativity through different grammatical pathways:

Language	Alignment	Default Normative Strategy
English	Nominative-accusative	Agent-action framing, modal deontics
Basque	Ergative-absolutive	Process-orientation, distributed agency
Czech	Nom-acc + aspect	Middle voice, state descriptions
Hebrew	Nom-acc + binyanim	Causative templates, implicit deontics

Repository Structure

ErgativeAgentsSims2025/
├── debate.py                 # Main debate generation script
├── research_agent.py         # Cross-linguistic analysis engine
├── visualizations.py         # Chart and dashboard generation
├── requirements.txt          # Python dependencies
│
├── logs2025/                 # Raw debate transcripts (JSONL)
│   ├── english_*.jsonl
│   ├── basque_*.jsonl
│   ├── czech_*.jsonl
│   └── hebrew_*.jsonl
│
├── research_outputs/         # Analysis results
│   └── session_*/
│       ├── reports/          # JSON + Markdown reports
│       └── visualizations/   # Charts and dashboards
│
├── article_figures/          # Publication-ready figures
│   ├── appendix/             # Appendix visualizations
│   └── *.png
│
├── docs/                     # Documentation
│   ├── ARTICLE_MATERIALS.md  # Draft article content
│   ├── THREE_STUDY_COMPARISON.md
│   ├── DATA_DICTIONARY.md    # Data file documentation
│   └── METHODOLOGY.md        # Full methodology
│
├── analyzers/                # NLP analysis modules
│   ├── morphological_analyzer.py
│   ├── syntactic_analyzer.py
│   └── ...
│
└── tests/                    # Test suite

Installation

Prerequisites

Python 3.10 or higher
OpenAI API key (for debate generation)
~4GB disk space (for NLP models)

Setup

# Clone the repository
git clone https://github.com/yourusername/ErgativeAgentsSims2025.git
cd ErgativeAgentsSims2025

# Create virtual environment
python -m venv venv

# Activate (Windows PowerShell)
.\venv\Scripts\Activate.ps1

# Activate (Linux/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download NLP models
python -m spacy download en_core_web_sm
python -c "import stanza; stanza.download('eu')"  # Basque
python -c "import stanza; stanza.download('cs')"  # Czech  
python -c "import stanza; stanza.download('he')"  # Hebrew

# Configure API key
echo "OPENAI_API_KEY=your_key_here" > .env

Quick Start

Reproducing the Analysis

To reproduce the main findings using existing data:

# Analyze the neutral condition debates (4 languages)
python research_agent.py --logs logs2025/english_ai_harm_prevention_open_20260210_*.jsonl \
                              logs2025/basque_ai_harm_prevention_open_20260210_*.jsonl \
                              logs2025/czech_ai_harm_prevention_open_20260210_*.jsonl \
                              logs2025/hebrew_ai_harm_prevention_open_20260210_*.jsonl

# Results appear in research_outputs/session_YYYYMMDD_HHMMSS/
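
Because each run writes to a new timestamped directory, it can help to locate the most recent one programmatically. The following is a minimal sketch, assuming only the `session_YYYYMMDD_HHMMSS` naming shown above; the helper name `latest_session` is hypothetical and not part of the repository.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

PREFIX = "session_"


def latest_session(root: Path) -> Optional[Path]:
    """Return the newest session_YYYYMMDD_HHMMSS directory under root, if any."""
    sessions = []
    for path in root.glob(PREFIX + "*"):
        if not path.is_dir():
            continue
        try:
            # Parse the timestamp that follows the "session_" prefix.
            stamp = datetime.strptime(path.name[len(PREFIX):], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # Ignore directories that do not match the pattern.
        sessions.append((stamp, path))
    if not sessions:
        return None
    return max(sessions, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(latest_session(Path("research_outputs")))
```

Parsing the timestamp, rather than sorting names as strings, lets the sketch skip stray directories that merely start with `session_`.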

Generating New Debates

# Run a 6-round debate with neutral prompts
python debate.py --language english --open-form --topic ai_harm_prevention --rounds 6

# Run with rule-demanding prompts (15 rounds)
python debate.py --language basque --open-form --topic ai_harm_prevention --rounds 15 \
                 --prompt-style proposal

# Run with anti-rules prompts
python debate.py --language czech --open-form --topic ai_harm_prevention --rounds 6 \
                 --prompt-style anti-rules
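
To cover every language under every condition, the commands above can be assembled in a loop. The sketch below only builds and prints the command lines; it does not run them. It assumes that the neutral condition corresponds to omitting `--prompt-style` (as in the first example) and that every language accepts every prompt style. The names `LANGUAGES`, `CONDITIONS`, and `debate_command` are hypothetical.

```python
import shlex
from typing import List, Optional

LANGUAGES = ["english", "basque", "czech", "hebrew"]

# (label, --prompt-style value or None, rounds), following the examples above.
CONDITIONS = [
    ("neutral", None, 6),
    ("rule-demanding", "proposal", 15),
    ("anti-rules", "anti-rules", 6),
]


def debate_command(language: str, prompt_style: Optional[str], rounds: int,
                   topic: str = "ai_harm_prevention") -> List[str]:
    """Build the argument list for one debate.py invocation."""
    cmd = ["python", "debate.py", "--language", language, "--open-form",
           "--topic", topic, "--rounds", str(rounds)]
    if prompt_style is not None:
        # Assumption: the neutral condition is the default when the flag is absent.
        cmd += ["--prompt-style", prompt_style]
    return cmd


if __name__ == "__main__":
    for language in LANGUAGES:
        for label, style, rounds in CONDITIONS:
            print(f"# {language}, {label}")
            print(shlex.join(debate_command(language, style, rounds)))
```

Each argument list can be passed to `subprocess.run` unchanged once the printed commands look right.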

Generating Figures

# Generate appendix figures
python generate_appendix_figures.py

# Generate article figures  
python generate_article_figures.py

# Figures saved to article_figures/

Data Description

Debate Logs (`logs2025/`)

JSONL files containing debate transcripts. Each line is a JSON object with:

{
  "round": 1,
  "speaker": "Agent_A",
  "timestamp": "2026-02-10T13:35:56.123Z",
  "content": "Debate utterance text...",
  "metadata": {
    "language": "english",
    "topic": "ai_harm_prevention",
    "prompt_condition": "neutral"
  }
}
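
A transcript in this format can be read with the standard library alone. The sketch below assumes only the `round`, `speaker`, and `content` fields shown above; the function names `read_debate` and `turns_by_round` are hypothetical and not part of the repository.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Tuple


def read_debate(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line of a JSONL debate log."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def turns_by_round(records: Iterable[dict]) -> Dict[int, List[Tuple[str, str]]]:
    """Group (speaker, content) pairs by round, keeping file order within a round."""
    rounds = defaultdict(list)
    for record in records:
        rounds[record["round"]].append((record["speaker"], record["content"]))
    return dict(rounds)
```

Reading line by line keeps memory use flat for long debates, and UTF-8 is set explicitly because the Basque, Czech, and Hebrew transcripts contain non-ASCII text.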

File naming convention: {language}_{topic}_{mode}_{datetime}_{hash}.jsonl
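
Since topics such as `ai_harm_prevention` contain underscores, splitting a file name on `_` is not enough to recover its parts. The following sketch uses a regular expression instead. It assumes that `{datetime}` begins with an eight-digit date, as in the `20260210` examples in Quick Start, and it leaves everything after that date unparsed because the exact layout of the time and hash is not specified here. The function name and the sample file name are hypothetical.

```python
import re
from typing import Dict, Optional

LOG_NAME = re.compile(
    r"(?P<language>[a-z]+)_"   # e.g. english
    r"(?P<topic>.+)_"          # may contain underscores, e.g. ai_harm_prevention
    r"(?P<mode>[a-z]+)_"       # e.g. open
    r"(?P<date>\d{8})_"        # assumed: datetime starts with YYYYMMDD
    r"(?P<rest>.+)\.jsonl"     # remainder (time and/or hash), left unparsed
)


def parse_log_name(name: str) -> Optional[Dict[str, str]]:
    """Split a log file name into its parts, or return None if it does not match."""
    match = LOG_NAME.fullmatch(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(parse_log_name("english_ai_harm_prevention_open_20260210_133556_ab12cd.jsonl"))
```

The eight-digit date acts as the anchor: the mode is the single word directly before it, and the topic is whatever lies between the language and the mode.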

Research Reports (`research_outputs/`)

Analysis outputs include:

research_report_*.json - Structured quantitative data
research_report_*.md - Human-readable analysis
SESSION_SUMMARY.md - Cross-linguistic comparison

Key metrics in JSON reports:

subjects_per_sentence - Agent visibility measure
hhi_agency - Herfindahl-Hirschman Index for agency concentration
voice_valency_analysis - Distribution of grammatical voice types
information_status - Metrics for Du Bois's Given A Constraint

See docs/DATA_DICTIONARY.md for complete documentation.
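
As an illustration of the first metric, subjects per sentence can be computed from dependency labels once a parser has run. The sketch below is an assumption about how such a count could work, not the repository's implementation: it takes each sentence as a list of dependency labels and counts the subject relations used by Universal Dependencies (the label set Stanza emits) together with the passive-subject labels of spaCy's English models. Whether the project also counts clausal or passive subjects is not stated here.

```python
from typing import Sequence

# Subject relations: Universal Dependencies labels plus the passive-subject
# labels used by spaCy's English models. The choice of labels is an assumption.
SUBJECT_DEPS = {
    "nsubj", "nsubj:pass", "csubj", "csubj:pass",  # Universal Dependencies
    "nsubjpass", "csubjpass",                      # spaCy English models
}


def subjects_per_sentence(sentences: Sequence[Sequence[str]]) -> float:
    """Mean number of overt subject dependents per sentence."""
    if not sentences:
        return 0.0
    total = sum(
        sum(1 for dep in sentence if dep in SUBJECT_DEPS)
        for sentence in sentences
    )
    return total / len(sentences)


if __name__ == "__main__":
    # Three toy sentences: one subject, none (subject dropped), two subjects.
    parsed = [
        ["nsubj", "root", "obj"],
        ["root", "obj"],
        ["nsubj", "root", "nsubj", "ccomp"],
    ]
    print(subjects_per_sentence(parsed))
```

A value below 1 means that, on average, some sentences have no overt subject, which is the pattern a pro-drop language would be expected to show.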

Methodology

Three-Condition Experimental Design

Condition	Prompt Style	Rounds	Purpose
Rule-demanding	"formulate clear rules and guidelines"	15	Test ADICO capacity
Anti-rules	"focus on experiences, not formal rules"	6	Test natural defaults
Neutral	"describe how things should be"	6	Baseline comparison

Analytical Framework

Subject Realization - Measures explicit agent naming (subjects per sentence)
Agency Distribution (HHI) - Concentration of arguments across grammatical roles (A = transitive subject, S = intransitive subject, O = transitive object)
Voice Analysis - Distribution of active, passive, middle, causative constructions
Information Status - Du Bois's Given A Constraint adherence
ADICO Failure Modes - Which components resist coding, by language
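
The Herfindahl-Hirschman Index is the sum of squared shares across categories. A minimal sketch follows, assuming the index is taken over the shares of A, S, and O arguments; the exact unit of counting used in the project is documented elsewhere, and the function name `hhi` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def hhi(role_labels: Iterable[str]) -> float:
    """Herfindahl-Hirschman Index: the sum of squared category shares."""
    counts = Counter(role_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no role labels given")
    return sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Toy sample: 5 A, 3 S, and 2 O arguments, so shares of 0.5, 0.3, and 0.2.
    roles = ["A"] * 5 + ["S"] * 3 + ["O"] * 2
    print(round(hhi(roles), 3))
```

With three categories the index ranges from 1/3 (arguments spread evenly across A, S, and O) to 1 (all arguments in one role), so values close to 0.33 indicate an almost even distribution.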

Methodological Constraints

Results should be interpreted with awareness that:

Debates are generated by an LLM (GPT-4o), not produced by native speakers
Only one topic (AI harm prevention) is covered, so findings may not generalize to other topics
NLP tool accuracy varies across languages
Genre conventions may influence outputs

See docs/METHODOLOGY.md for full methodology documentation.

Key Results

Quantitative Summary (Neutral Condition)

Metric	English	Basque	Czech	Hebrew
Subjects/Sentence	1.78	0.80	1.80	1.37
HHI Agency	0.415	0.457	0.336	0.336
Active Transitive %	81.5	23.5	53.5	21.3
Given A Adherence %	51.9	42.8	54.1	50.1

Emergent Grammars

The analysis identifies four deliberative grammar types that emerge naturally:

PPO (Process-Participant-Orientation) - Basque default
RST (Relational-State-Transition) - Czech default
AFIG (Affected-First Grammar) - Ergative-aligned discourse
ECG (Enunciative-Contextual Grammar) - Hebrew default

Citation

If you use this project in research, please cite:

@software{cross_ling_ig_2026,
  title = {Cross-Linguistic Institutional Grammar Analysis},
  author = {[Author Name]},
  year = {2026},
  url = {https://github.com/yourusername/ErgativeAgentsSims2025},
  note = {Code and data for testing ADICO universality across languages}
}

License

This project is licensed under the MIT License - see LICENSE for details.

Acknowledgments

OpenAI GPT-4o for debate generation
Stanford NLP Group for Stanza
spaCy for English analysis
The Ostrom Workshop for institutional analysis frameworks

Related Work

Crawford, S. E. S., & Ostrom, E. (1995). A Grammar of Institutions. American Political Science Review, 89(3), 582-600.
Dixon, R. M. W. (1994). Ergativity. Cambridge University Press.
Du Bois, J. W. (1987). The Discourse Basis of Ergativity. Language, 63(4), 805-855.
Dowty, D. (1991). Thematic Proto-Roles and Argument Selection. Language, 67(3), 547-619.

Contact

For questions about the code or methodology, please open an issue on GitHub.
