Algotheorem/sperb-benchmark

SPERB: Synthetic Persona Ethics & Realism Benchmark

The first open, replicable, multi-dimensional standard for evaluating AI influencers and synthetic digital personas.

A proposed global standard for the evaluation, classification, and governance of AI personas, for brands, platforms, regulators, creators, and researchers.

🟢 Start Here: Beginner Guide · 📖 Read the Specification · 🚀 Run Your Own Evaluation · 📊 View the Pilot Results · 🤝 Contribute

If any of these links are broken, navigate to the relevant folder directly.


What Is SPERB?

The AI influencer industry operates without a standardised framework for evaluating the ethical conduct, governance integrity, or realism quality of the synthetic personas it produces.

Engagement rates are measured to four decimal places. Photorealism is spectacular. Brand partnerships are lucrative.

Nobody evaluates whether the entity disclosed its nature. Nobody audits its governance. Nobody measures whether it's safe to engage with emotionally.

SPERB fills that gap.

SPERB (Synthetic Persona Ethics & Realism Benchmark) is an eight-dimension, evidence-anchored evaluation framework that produces:

  • A score out of 100
  • A tier classification (Platinum → Unrated)
  • A public, reproducible justification for every score

Anyone can use it. Anyone can challenge it. No paid licence or permission is required; attribution is the only condition.


The Tanvi Joshi Problem

In February 2026, an undisclosed AI persona called "Tanvi Joshi" accumulated 28 million Instagram views in a single day using audio stolen from a real person, while being presented to audiences as a "Punjabi girl" rather than a synthetic agent. The real voice owner discovered the theft when she found her own voice in the viral clip.

No benchmark detected it. No compliance tool flagged it. It was caught by the victim.

This is the cost of operating a multi-billion-dollar industry without evaluation infrastructure.

SPERB exists so the next Tanvi Joshi is visible before the 28 million views.


Framework Overview

SPERB evaluates any AI influencer or synthetic digital persona across 8 dimensions, each scored 0–10 against a defined rubric and public evidence.

Code | Dimension | What It Measures
PVS | Photorealism & Visual Consistency | Generation quality, identity stability, uncanny valley avoidance
AIDS | AI Identity Disclosure | How proactively and consistently the entity discloses its synthetic nature
GDS | Governance & Documentation | Public governance framework, creator attribution, version control
CPOS | Creative Pipeline Originality | Original content, pipeline transparency, IP integrity
ECS | Ethical Conduct | Conduct history, monitoring infrastructure, violation record
CSRS | Cultural & Social Responsibility | Cultural accuracy, community impact, representation ethics
SCES | Synthetic Companionship Ethics | Emotional boundary design, parasocial safeguards, romantic policy
CITS | Commercial Intent Transparency | Sponsorship disclosure, monetisation clarity, hidden commerce detection

Aggregate: PVS + AIDS + GDS + CPOS + ECS + CSRS + SCES + CITS = raw score out of 80
Normalised: raw score × 1.25 = final score out of 100
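The aggregation rule can be sketched in JavaScript, the language used in the repository's tools/ folder. This is an illustrative sketch and not the code in tools/score.js. The function name sperbScore and the input shape (a plain object keyed by dimension code) are assumptions. The sketch rejects any dimension that is missing or falls outside the 0–10 range before it sums the scores.

```javascript
// Illustrative sketch (not tools/score.js): eight dimension scores (0-10 each)
// are summed to a raw score out of 80, then scaled by 1.25 to a score out of 100.
// The input shape, an object keyed by dimension code, is a hypothetical format.
const DIMENSIONS = ["PVS", "AIDS", "GDS", "CPOS", "ECS", "CSRS", "SCES", "CITS"];

function sperbScore(scores) {
  let raw = 0;
  for (const code of DIMENSIONS) {
    const value = scores[code];
    // Every dimension must be present and inside the 0-10 rubric range.
    if (typeof value !== "number" || Number.isNaN(value) || value < 0 || value > 10) {
      throw new RangeError(`${code} must be a number between 0 and 10, got ${value}`);
    }
    raw += value;
  }
  // 80 raw points map onto 100, so the scaling factor is 100 / 80 = 1.25.
  return { raw, normalised: raw * 1.25 };
}

// Usage: a persona scoring 8 on every dimension.
const example = { PVS: 8, AIDS: 8, GDS: 8, CPOS: 8, ECS: 8, CSRS: 8, SCES: 8, CITS: 8 };
console.log(sperbScore(example)); // { raw: 64, normalised: 80 }
```

The sketch leaves the normalised value unrounded. The pilot table reports whole numbers, so check framework/SPERB_SPECIFICATION.md for any rounding convention before comparing results.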

Tier Classification

Tier | Score Range | Meaning
🏆 Platinum | 85–100 | Governance Leader: sets the standard
🥇 Gold | 70–84 | Ethical Practitioner: commercially partnerable with confidence
🥈 Silver | 55–69 | Partially Compliant: due diligence required
🥉 Bronze | 40–54 | Minimal Compliance: significant concerns
⛔ Unrated | Below 40 | Non-Compliant / At Risk: elevated risk, non-engagement advised

Inaugural Pilot Results (May 2026)

SPERB was validated through a pilot benchmark applied to 10 globally prominent AI influencers. Tanvi Joshi is scored separately as an unranked reference case ("ref").

Rank | Entity | Score | Tier
1 | Shayari NHE-01 | 95 | 🏆 Platinum
2 | Imma | 80 | 🥇 Gold
3 | Noonoouri | 80 | 🥇 Gold
4 | Kenza Layli | 77 | 🥇 Gold
5 | Rozy | 75 | 🥇 Gold
6 | Aitana Lopez | 69 | 🥈 Silver
7 | Shudu Gram | 68 | 🥈 Silver
8 | Lil Miquela | 66 | 🥈 Silver
9 | Kyra | 63 | 🥈 Silver
10 | Naina Avtr | 63 | 🥈 Silver
ref | Tanvi Joshi | 21 | ⛔ Unrated

Full dimensional scores, per-entity justifications, and comparative analysis: pilot/PILOT_RESULTS.md


🟢 New Here? Start with the Beginner Guide

If you have never used GitHub before and just want to test your AI persona, read QUICKSTART.md. It requires no coding and no terminal: plain English, step by step.

How to Evaluate a Persona

Option 1: Manual Evaluation (anyone)

  1. Read framework/SPERB_SPECIFICATION.md: the full methodology
  2. Use scoring/SCORING_RUBRIC.md: the reference card for all 8 dimensions
  3. Fill in scoring/EVALUATION_TEMPLATE.md: the structured scoring sheet
  4. Publish your evaluation and link it here via a PR to community/EVALUATIONS.md

Option 2: Command-Line Tool (developers)

# Clone the repository
git clone https://github.com/Algotheorem/sperb-benchmark.git
cd sperb-benchmark

# Install dependencies (package.json lives in tools/, per the repository structure)
npm install --prefix tools

# Run the interactive evaluator
node tools/evaluate.js

# Or score a specific entity from a JSON profile
node tools/evaluate.js --profile tools/examples/example_profile.json

Option 3: Use the Evaluation Template Directly

Download scoring/EVALUATION_TEMPLATE.md, fill it in, and share your results. No tools needed.


Repository Structure

sperb-benchmark/
│
├── README.md                          ← You are here
├── LICENSE.md                         ← CC BY 4.0: open for use with attribution
├── CHANGELOG.md                       ← Version history
├── CONTRIBUTING.md                    ← How to contribute
│
├── framework/
│   ├── SPERB_SPECIFICATION.md         ← Full v1.0 methodology (the canonical document)
│   ├── DIMENSION_PVS.md               ← Photorealism & Visual Consistency
│   ├── DIMENSION_AIDS.md              ← AI Identity Disclosure
│   ├── DIMENSION_GDS.md               ← Governance & Documentation
│   ├── DIMENSION_CPOS.md              ← Creative Pipeline Originality
│   ├── DIMENSION_ECS.md               ← Ethical Conduct
│   ├── DIMENSION_CSRS.md              ← Cultural & Social Responsibility
│   ├── DIMENSION_SCES.md              ← Synthetic Companionship Ethics
│   └── DIMENSION_CITS.md              ← Commercial Intent Transparency
│
├── scoring/
│   ├── SCORING_RUBRIC.md              ← Reference card: all 8 dimensions at a glance
│   ├── EVALUATION_TEMPLATE.md         ← Fill-in-the-blank scoring sheet
│   └── TIER_BOUNDARIES.md             ← Tier system, thresholds, and implications
│
├── pilot/
│   ├── PILOT_RESULTS.md               ← Full inaugural benchmark (May 2026)
│   └── entities/
│       ├── shayari_nhe01.md
│       ├── imma.md
│       ├── noonoouri.md
│       ├── kenza_layli.md
│       ├── rozy.md
│       ├── aitana_lopez.md
│       ├── shudu_gram.md
│       ├── lil_miquela.md
│       ├── kyra.md
│       ├── naina_avtr.md
│       └── tanvi_joshi_reference.md
│
├── tools/
│   ├── evaluate.js                    ← Interactive CLI evaluator
│   ├── score.js                       ← Score calculator and tier classifier
│   ├── report.js                      ← Markdown report generator
│   ├── package.json
│   └── examples/
│       └── example_profile.json       ← Example entity profile format
│
├── docs/
│   ├── ADOPTION_GUIDE.md              ← How brands, platforms & regulators adopt SPERB
│   ├── REGULATORY_ALIGNMENT.md        ← Mapping to EU AI Act, FTC, BOT Act, ASCI
│   ├── ITERATION_PROTOCOL.md          ← How SPERB v2.0 will be developed
│   ├── GLOSSARY.md                    ← Definitions of all SPERB terms
│   └── FAQ.md                         ← Common questions
│
├── community/
│   └── EVALUATIONS.md                 ← Community-submitted evaluations index
│
└── .github/
    ├── ISSUE_TEMPLATE/
    │   ├── score_challenge.md          ← Challenge an existing score
    │   ├── new_evaluation.md           ← Submit a new evaluation
    │   └── dimension_proposal.md       ← Propose a framework change
    └── workflows/
        └── validate_evaluation.yml     ← CI: validates submitted evaluation format
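This README does not reproduce the rules in validate_evaluation.yml. As a hypothetical illustration, the sketch below shows a format check in the spirit of SPERB's stated requirements: all eight dimensions are scored 0–10, and each score has a justification and public evidence. The record shape (entity, dimensions, score, justification, evidence) and the name validateEvaluation are assumptions made for illustration. They are not the repository's actual submission format.

```javascript
// Hypothetical format check for a submitted evaluation record.
// The record shape below is an assumption, not the repository's real format.
const DIMENSION_CODES = ["PVS", "AIDS", "GDS", "CPOS", "ECS", "CSRS", "SCES", "CITS"];

function validateEvaluation(record) {
  const errors = [];
  if (!record || typeof record.entity !== "string" || record.entity.trim() === "") {
    errors.push("entity name is missing");
  }
  const dims = (record && record.dimensions) || {};
  for (const code of DIMENSION_CODES) {
    const d = dims[code];
    if (!d) {
      errors.push(`${code}: dimension is missing`);
      continue;
    }
    if (typeof d.score !== "number" || Number.isNaN(d.score) || d.score < 0 || d.score > 10) {
      errors.push(`${code}: score must be a number from 0 to 10`);
    }
    // Every score needs a written justification...
    if (typeof d.justification !== "string" || d.justification.trim() === "") {
      errors.push(`${code}: justification is empty`);
    }
    // ...and at least one piece of public evidence backing it.
    if (!Array.isArray(d.evidence) || d.evidence.length === 0) {
      errors.push(`${code}: no evidence cited`);
    }
  }
  return errors; // an empty array means the record passed every check
}

// Usage: a record that scores only PVS is missing seven dimensions.
const partial = {
  entity: "Example Persona",
  dimensions: {
    PVS: { score: 7, justification: "Stable identity across posts.", evidence: ["https://example.com/post"] },
  },
};
console.log(validateEvaluation(partial).length); // 7
```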

Who Created SPERB?

SPERB was created by Algotheorem, the research wing of OpenNHE, an open research initiative focused on the governance, evaluation, and ethical design of synthetic digital identities.

The inaugural pilot benchmark and framework specification were co-authored by:

  • Pratham Prateek Mohanty: Framework architect, pilot benchmark design, governance methodology
  • Claude (Opus 4.7), Anthropic: Specification drafting, scoring rubric formalisation, pilot benchmark scoring

SPERB is published as an open specification. It is not a proprietary product. No licence is required to use, apply, or adapt it; only attribution is required.

OpenNHE is an open initiative for the governance of Non-Human Entities (NHE) in digital public spaces.


Why Open Source?

A framework that demands transparency from AI personas must itself be transparent.

SPERB is open because:

  1. Credibility requires scrutiny: scores must be reproducible by anyone, not only by a certified body
  2. Adoption requires accessibility: a framework behind a paywall helps nobody
  3. Evolution requires community: the field moves fast, and the framework must move with it
  4. A closed evaluation system is itself a governance failure: we cannot preach accountability while practising opacity

Citation

If you use SPERB in research, industry reports, or platform policy work, please cite:

Mohanty, P. P., & Claude (Opus 4.7), Anthropic. (2026). SPERB: The Synthetic Persona
Ethics & Realism Benchmark - A Proposed Global Standard for the Evaluation,
Classification, and Governance of AI Influencers (Version 1.0).
Algotheorem / OpenNHE. https://github.com/Algotheorem/sperb-benchmark

Contributing

We welcome:

  • Score challenges (with evidence)
  • New community evaluations
  • Dimension refinement proposals
  • Translations
  • Integration examples

Read CONTRIBUTING.md before opening a PR or issue.


License

SPERB v1.0 is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

You are free to:

  • Use SPERB to evaluate any entity
  • Adapt the framework for specific markets or contexts
  • Build tools on top of SPERB
  • Publish evaluations using SPERB scores

Under one condition:

  • You credit Algotheorem / OpenNHE and link to this repository

See LICENSE.md for full terms.


SPERB is not a ranking system. It is accountability infrastructure for the synthetic persona age.

⭐ Star this repo · 📋 Use the template · 💬 Open a discussion