29 changes: 29 additions & 0 deletions CITATION.cff
@@ -0,0 +1,29 @@
cff-version: 1.2.0
title: "EVA: A New Framework for Evaluating Voice Agents"
message: "If you use this software, please cite it as below."
type: software
authors:
- family-names: Bogavelli
given-names: Tara
- family-names: Gauthier Melançon
given-names: Gabrielle
- family-names: Stankiewicz
given-names: Katrina
- family-names: Bamgbose
given-names: Oluwanifemi
- family-names: Nguyen
given-names: Hoang
- family-names: Mehndiratta
given-names: Raghav
- family-names: Subramani
given-names: Hari
version: "0.1.1"
date-released: "2026-03-24"
repository-code: "https://github.com/ServiceNow/eva"
url: "https://servicenow.github.io/eva/"
license: MIT
keywords:
- voice agents
- evaluation
- speech
- conversational AI
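
The metadata above can be turned into a citation entry mechanically. The following is a minimal sketch, assuming the fields have already been loaded into a Python dict (copied by hand here so the example needs no YAML parser). The function name `cff_to_bibtex`, the citation key `eva_software`, and the choice of a BibLaTeX-style `@software` entry are illustrative assumptions, not part of the repository.

```python
# Sketch only: this dict mirrors a subset of the CITATION.cff fields by hand.
# In practice the file would be parsed with a YAML library instead.
cff = {
    "title": "EVA: A New Framework for Evaluating Voice Agents",
    "authors": [
        {"family-names": "Bogavelli", "given-names": "Tara"},
        {"family-names": "Gauthier Melançon", "given-names": "Gabrielle"},
        {"family-names": "Stankiewicz", "given-names": "Katrina"},
        {"family-names": "Bamgbose", "given-names": "Oluwanifemi"},
        {"family-names": "Nguyen", "given-names": "Hoang"},
        {"family-names": "Mehndiratta", "given-names": "Raghav"},
        {"family-names": "Subramani", "given-names": "Hari"},
    ],
    "version": "0.1.1",
    "date-released": "2026-03-24",
    "url": "https://servicenow.github.io/eva/",
}


def cff_to_bibtex(meta: dict, key: str = "eva_software") -> str:
    """Render a minimal @software entry from CFF-style metadata (hypothetical helper)."""
    # BibTeX separates authors with " and " and accepts "Family, Given" ordering.
    authors = " and ".join(
        f"{a['family-names']}, {a['given-names']}" for a in meta["authors"]
    )
    # date-released is ISO formatted (YYYY-MM-DD), so the year is the first part.
    year = meta["date-released"].split("-")[0]
    fields = {
        "author": authors,
        "title": meta["title"],
        "version": meta["version"],
        "year": year,
        "url": meta["url"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@software{{{key},\n{body}\n}}"


print(cff_to_bibtex(cff))
```

Keeping the author list in `CITATION.cff` as the single source and deriving other citation formats from it avoids the two drifting apart.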
42 changes: 42 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,42 @@
# Code of Conduct

This code of conduct provides guidelines for participation in this open-source project.

## Discussion Forum Guidelines

Communities thrive when members support each other and provide useful feedback.

- Be polite and courteous. Respect others and treat them as you would expect to be treated yourself.
- Respect your audience. Posts should not upset, annoy, threaten, harass, abuse or embarrass other members.
- User contributions must not include material that is defamatory, obscene, indecent, abusive, offensive, harassing, violent, hateful, inflammatory or otherwise objectionable.
- Lively and collegial discussions are always encouraged in a healthy community. It is okay to argue facts but not okay to argue personalities or personal beliefs.
- Do not use text formats such as all caps or bold that may be read as annoying or rude, or that send a strong message.
- Do not publish anyone's private personal information without their explicit consent.
- Avoid using abbreviations or terminology that others may not understand. An abbreviation may mean something to you, but in another context or country it may have another meaning.
- Be accountable for your actions by correcting your mistakes and indicating where you have changed a previous post of yours.
- Mark content as correct and helpful, and provide feedback. If you read a discussion post that you find helpful, we encourage you to leave a positive vote and comment in the replies. If you find a post that is unhelpful, please provide more information in the issue comments.

## Issue Board Guidelines

Many open-source projects provide an Issues board, with similar functionality to a Discussions forum. The same rules from the discussion forum guidelines apply to the Issues board.

We suggest the following technical support pathways for open-source projects:

1. Clearly identify and document the issue or question you have.
2. View the documentation.
3. Search the Discussions.
4. Search the project knowledge base or Wiki for known errors, useful solutions, and troubleshooting tips.
5. Check the project guidelines in the [`CONTRIBUTING.md`](CONTRIBUTING.md) file if you would like details on how you can submit a change. Community contributions are valued and appreciated!
6. Log an Issue if it hasn't already been logged. If the issue has already been logged by another user, vote it up, and add a comment with additional or missing information. Do your best to choose the correct category when logging a new issue. This will make it easier to differentiate bugs from new feature requests or ideas. If after logging an issue you find the solution, please close your issue and provide a comment with the solution. This will help the project owners and other users.
7. As a last resort only, contact the project team contributors to see if they can help.

## Repositories

- Read and follow the license instructions.
- Remember to include citations if you use someone else's work in your own project. Use the [`CITATION.cff`](CITATION.cff) to find the correct project citation reference.
- 'Star' project repos to save for future reference.
- 'Watch' project repos to get notifications of changes – this can get noisy for some projects, so only watch the ones you really need to track closely.

## Disclaimer

We may, but are under no obligation to, monitor or censor comments made by users or content provided by contributors and we are not responsible for the accuracy, completeness, appropriateness or legality of anything posted, depicted or otherwise provided by third-party users and we disclaim any and all liability relating thereto.
37 changes: 37 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,37 @@
# Contributing to EVA

Thank you for your interest in contributing to EVA!

This document guides contributors through the different types of contributions.

Just want to ask a question? Open a topic on our [Discussion page](https://github.com/ServiceNow/eva/discussions).

## Get Your Environment Set Up

Go to our [Quick Start](README.md) section in the README to get set up.

## How to Submit a Bug Report

[Open an issue on GitHub](https://github.com/ServiceNow/eva/issues/new/choose) and select "Bug report". If you are not sure whether it is a bug or not, submit an issue and we will be able to help you.

Issues with reproducible examples are easier to work with. Do not hesitate to provide your configuration with generated data if need be.

If you are familiar with the codebase, providing a unit test is helpful, but not mandatory.

## How to Submit Changes

First, open an issue describing your desired changes, if one does not already exist.

1. [Fork the repo to your own account](https://github.com/ServiceNow/eva/fork).
2. Clone your fork of the repo locally.
3. Make your changes (the fun part).
4. Commit and push your changes to your fork.
5. [Open a pull request](https://github.com/ServiceNow/eva/compare) with your branch.
6. Once a team member approves your changes, we will merge the pull request promptly.

### Guidelines for a Good Pull Request

When coding, pay special attention to the following:

- Comment non-trivial sections of your code so that others can easily understand and maintain them, but do not over-comment. Good variable and function names are your best friends.
- Do not expose any personal or sensitive data.
- Add unit tests when notable functionality has been added or changed.
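
As an illustration of the unit-test guideline, here is a minimal sketch in the pytest style that the README's test command suggests. The helper `normalize_transcript` is hypothetical and exists only for this example; it is not part of the EVA codebase.

```python
# Hypothetical helper, not part of EVA: collapses runs of whitespace and lowercases.
def normalize_transcript(text: str) -> str:
    return " ".join(text.split()).lower()


# pytest collects functions whose names start with "test_"; plain asserts suffice.
def test_normalize_transcript_collapses_whitespace():
    assert normalize_transcript("  Hello   World ") == "hello world"


def test_normalize_transcript_handles_empty_input():
    assert normalize_transcript("") == ""
```

Each test covers one behavior and is named after it, so a failure points directly at what broke.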
16 changes: 8 additions & 8 deletions README.md
@@ -175,15 +175,15 @@ pytest tests/integration/test_metrics.py -v

Existing benchmarks evaluate voice agent components in isolation — speech understanding, TTS quality, or conversational dynamics — but none assess the full pipeline end to end. In real deployed systems, errors compound across modules and failure modes interact in ways that component-level evaluation cannot capture. EVA addresses this by treating voice agent quality as an integrated whole, evaluating accuracy and experience jointly across complete multi-turn spoken conversations.

| **Framework** | **Interaction Mode** | **Multi-turn** | **Tool Calling** | **Goal Completion** | **Experience Metrics** | **Pass@k, Pass^k** | **Supported Systems** |
| **Framework** | **Interaction Mode** | **Multi-turn** | **Tool Calling** | **Goal Completion** | **Experience Metrics** | **Pass@k<br>Pass^k** | **Supported Systems** |
|---|---|---|---|---|---|--------------------|---|
| **EVA** | Live bot-to-bot | ✅ | ✅ | ✅ (Task Completion, Speech Fidelity, Faithfulness) | ✅ (Conciseness, Turn-taking, Latency, Progression) | ✅ | Audio-native, Cascade |
| **EVA** | Live bot-to-bot | ✅ | ✅ | ✅ <br>Task Completion, Speech Fidelity, Faithfulness | ✅ <br>Conciseness, Turn-taking, Latency, Progression | ✅ | Audio-native, Cascade |
| **VoiceAgent&shy;Bench** | Static, TTS-synthesized | ✅ | ✅ | ⚠️ | ❌ | ❌ | Audio-native, Cascade |
| **CAVA** | Partial simulation | ✅ | ✅ | ⚠️ | ⚠️ (Latency, Tone-awareness) | ❌ | Audio-native, Cascade |
| **FDB-v2** | Live, automated examiner | ✅ | ❌ | ❌ | ✅ (Turn-taking fluency, Correction handling, Safety) | ❌ | Audio-native |
| **FDB-v1** | Static, pre-recorded | ❌ | ❌ | ❌ | ✅ (Turn-taking, Backchanneling, Interruption) | ❌ | Audio-native |
| **FD-Bench** | Live, simulated | ❌ | ❌ | ❌ | ✅ (Interruption, Delay, Robustness) | ❌ | Audio-native |
| **Talking Turns** | Static, curated | ❌ | ❌ | ❌ | ✅ (Turn change, Backchannel, Interruption) | ❌ | Audio-native, Cascade |
| **CAVA** | Partial simulation | ✅ | ✅ | ⚠️ | ⚠️ <br>Latency, Tone-awareness | ❌ | Audio-native, Cascade |
| **FDB-v2** | Live, automated examiner | ✅ | ❌ | ❌ | ✅ <br>Turn-taking fluency, Correction handling, Safety | ❌ | Audio-native |
| **FDB-v1** | Static, pre-recorded | ❌ | ❌ | ❌ | ✅ <br>Turn-taking, Backchanneling, Interruption | ❌ | Audio-native |
| **FD-Bench** | Live, simulated | ❌ | ❌ | ❌ | ✅ <br>Interruption, Delay, Robustness | ❌ | Audio-native |
| **Talking Turns** | Static, curated | ❌ | ❌ | ❌ | ✅ <br>Turn change, Backchannel, Interruption | ❌ | Audio-native, Cascade |
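
For readers unfamiliar with the Pass@k and Pass^k column, the sketch below shows the combinatorial estimators commonly used for these metrics: given `n` recorded trials of a scenario, of which `c` succeeded, pass@k is the chance that at least one of `k` sampled trials succeeds, and pass^k is the chance that all `k` succeed. Whether EVA computes them in exactly this way is an assumption, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k trials, drawn without replacement
    from n trials containing c successes, is a success."""
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Chance that all k trials, drawn without replacement
    from n trials containing c successes, are successes."""
    # comb(c, k) is 0 when c < k, so the result is 0.0 in that case.
    return comb(c, k) / comb(n, k)


# Example: 5 trials, 3 successes, k = 2.
print(pass_at_k(5, 3, 2))   # 1 - C(2,2)/C(5,2) = 1 - 1/10
print(pass_hat_k(5, 3, 2))  # C(3,2)/C(5,2) = 3/10
```

The two metrics answer different questions: pass@k rewards an agent that can succeed at least once, while pass^k rewards one that succeeds consistently.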

## 🏗️ Architecture

@@ -253,6 +253,7 @@ eva/
├── compose.yaml # Docker Compose configuration
├── src/eva/
│ ├── cli.py # CLI interface
│ ├── run_benchmark.py # Benchmark runner
│ ├── models/ # Pydantic data models
│ ├── orchestrator/ # Framework execution
│ │ ├── runner.py # Main orchestrator
@@ -277,7 +278,6 @@ eva/
│ │ └── validation/ # Quality control metrics
│ └── utils/ # Utilities (LLM client, log processing)
├── scripts/ # Utility scripts
│ ├── run_benchmark.py # Benchmark runner
│ ├── run_text_only.py # Text-only evaluation runner
│ ├── docker_entrypoint.py # Docker entry point
│ ├── check_version_bump.py # Version checking