Open Telco

A collection of evals for telecommunications tasks.

Dataset Access Requirements

Before using this repository, you must request permission for the benchmark datasets on HuggingFace:

HuggingFace Configuration:

Get your access token from your HuggingFace account
Add the above repositories to "Repositories permissions"
Enable "Read access to contents of selected repos" for the token

Prerequisites

Docker or OrbStack (required for sandbox execution)

Docker: https://www.docker.com/get-started
OrbStack (Mac): https://orbstack.dev

uv package manager

curl -LsSf https://astral.sh/uv/install.sh | sh

Setup

Install dependencies:

uv sync

Configure environment variables:

Create a .env file in the root folder with your API credentials:

# Required: HuggingFace token for dataset access
HF_TOKEN=your_huggingface_token_here

# Add API keys for the models you want to use
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here

List of all available models: https://inspect.aisi.org.uk/models.html
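
Before launching an eval, it can be useful to confirm that the .env file defines the variables listed above. The following is a minimal sketch using only the Python standard library. The helper names `parse_env_text` and `missing_keys` are hypothetical and not part of this repository, and the parser only handles simple KEY=value lines with # comments.

```python
from __future__ import annotations

from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: tuple[str, ...] = ("HF_TOKEN",)) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: the repository root
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Only HF_TOKEN is treated as required by default, matching the comment in the .env template. Pass additional key names in `required` for the model providers you plan to use.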

Usage

Run evals from the command line:

# TeleQnA
uv run inspect eval src/open_telco/teleqna/teleqna.py
# TeleMath
uv run inspect eval src/open_telco/telemath/telemath.py
# TeleLogs
uv run inspect eval src/open_telco/telelogs/telelogs.py
# TeleYaml
uv run inspect eval src/open_telco/teleyaml/teleyaml.py

With options:

# Specific model
uv run inspect eval src/open_telco/telemath/telemath.py --model openai/gpt-4o

# Limit samples
uv run inspect eval src/open_telco/telemath/telemath.py --limit 10
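
When running several evals from a script, the same command lines can be assembled programmatically. The sketch below is illustrative only: `build_inspect_command` is a hypothetical helper, not something this repository provides, and it uses only the `--model` and `--limit` options shown above.

```python
from __future__ import annotations

import subprocess


def build_inspect_command(
    task_path: str, model: str | None = None, limit: int | None = None
) -> list[str]:
    """Assemble the `uv run inspect eval` argument list for one task file."""
    command = ["uv", "run", "inspect", "eval", task_path]
    if model is not None:
        command += ["--model", model]
    if limit is not None:
        command += ["--limit", str(limit)]
    return command


if __name__ == "__main__":
    cmd = build_inspect_command(
        "src/open_telco/telemath/telemath.py", model="openai/gpt-4o", limit=10
    )
    print(" ".join(cmd))
    # Uncomment to launch the eval (needs uv, Docker, and the .env credentials):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues if a task path or model name contains unusual characters.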

Alternative: Use the Inspect VS Code Extension or run the Web UI with python ui/app.py

Evals

Knowledge & QA

TeleQnA: Benchmark Dataset to Assess Large Language Models for Telecommunications

A benchmark dataset of 10,000 question-answer pairs sourced from telecommunications standards and research articles. Evaluates LLMs' knowledge across general telecom inquiries and complex standards-related questions.

Paper | Dataset
```
uv run inspect eval src/open_telco/teleqna/teleqna.py
```

Mathematical Reasoning

TeleMath: Evaluating Mathematical Reasoning in Telecom Domain

500 mathematically intensive problems covering signal processing, network optimization, and performance analysis. The eval is implemented as a ReAct agent that uses bash and python tools to carry out domain-specific mathematical computations.

Paper | Dataset
```
uv run inspect eval src/open_telco/telemath/telemath.py
```
Metrics: pass@1, const@16 (majority voting over 16 answers)
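
Majority voting can be illustrated with a short sketch: sample several answers per problem, take the most frequent one, and compare it with the target. The function names below are hypothetical, and exact string matching is an assumption; the scorer in this repository may normalize or compare numeric answers differently.

```python
from collections import Counter


def majority_answer(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("answers must not be empty")
    return Counter(answers).most_common(1)[0][0]


def majority_vote_correct(answers: list[str], target: str) -> bool:
    """True if the majority answer equals the target (exact match assumed)."""
    return majority_answer(answers) == target


if __name__ == "__main__":
    # Made-up sampled answers for one problem, for illustration only.
    sampled = ["2.5", "2.5", "3.0", "2.5"]
    print(majority_answer(sampled), majority_vote_correct(sampled, "2.5"))
```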

Network Operations & Diagnostics

TeleLogs: Root Cause Analysis in 5G Networks

A synthetic dataset for root cause analysis (RCA) in 5G networks. Given network configuration parameters and user-plane data (throughput, RSRP, SINR), models must identify which of 8 predefined root causes explains a throughput degradation below 600 Mbps. Use -T <N> to specify the number of epochs for the pass@1 and maj@4 metrics.

Paper | Dataset
```
uv run inspect eval src/open_telco/telelogs/telelogs.py -T 4
```
Metrics: pass@1 (averaged over N epochs), maj@4 (majority voting)
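
One common reading of "pass@1 averaged over N epochs" is the per-sample fraction of correct epochs, averaged over all samples. The sketch below follows that reading; it is an assumption about the metric, not a copy of the scorer used here, and `pass_at_1` is a hypothetical name.

```python
def pass_at_1(epoch_correct: list[list[bool]]) -> float:
    """Average, over samples, of the fraction of epochs answered correctly.

    epoch_correct[i][j] is True if sample i was correct in epoch j.
    """
    if not epoch_correct or any(not flags for flags in epoch_correct):
        raise ValueError("every sample needs at least one epoch result")
    per_sample = [sum(flags) / len(flags) for flags in epoch_correct]
    return sum(per_sample) / len(per_sample)


if __name__ == "__main__":
    # Two made-up samples, four epochs each.
    results = [
        [True, True, False, False],
        [True, True, True, True],
    ]
    print(pass_at_1(results))
```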

Network Configuration

TeleYaml: 5G Network Configuration Generation

Evaluates the capability of LLMs to generate standards-compliant YAML configurations for 5G core network tasks, specifically AMF Configuration, Network Slicing, and UE Provisioning.

Dataset
```
uv run inspect eval src/open_telco/teleyaml/teleyaml.py
```
Metrics: model-graded accuracy

Standardization

3GPP TSG: Technical Specification Group Classification

Classifies 3GPP technical documents according to their working group. Models are prompted to act as a distinguished expert in the telecommunication domain and must identify the correct group for a given text.

Dataset
```
uv run inspect eval src/open_telco/three_gpp/three_gpp.py
```
Metrics: accuracy, stderr
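
As a rough illustration of how these two numbers relate, the sketch below computes accuracy as the mean of 0/1 scores and stderr as the standard error of that mean (sample standard deviation divided by the square root of the sample count). This formula is an assumption about what is reported; the function name is hypothetical.

```python
import math
import statistics


def accuracy_and_stderr(correct: list[bool]) -> tuple[float, float]:
    """Return (accuracy, standard error of the mean) for per-sample results."""
    if not correct:
        raise ValueError("correct must not be empty")
    scores = [1.0 if c else 0.0 for c in correct]
    accuracy = statistics.fmean(scores)
    if len(scores) < 2:
        # The sample standard deviation is undefined for a single score.
        return accuracy, 0.0
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return accuracy, stderr


if __name__ == "__main__":
    # Made-up per-sample outcomes, for illustration only.
    acc, err = accuracy_and_stderr([True, True, False, False])
    print(f"accuracy={acc:.3f} stderr={err:.3f}")
```

A smaller stderr means the accuracy estimate is more stable, so running with a low `--limit` will typically produce a noticeably larger value.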
