TypeScript evaluation toolkit for benchmarking table extraction accuracy using the RD-TableBench dataset and scoring methodology.
This repo contains:
- A faithful TypeScript port of Reducto's Needleman-Wunsch table similarity scorer
- A dataset loader that downloads the RD-TableBench dataset from HuggingFace
- A benchmark runner that scores extraction predictions against ground truth
- A test suite that validates the scorer produces correct results on known inputs
RD-TableBench is an open benchmark created by Reducto containing 1,000 complex table images manually annotated by PhD-level human labelers. It uses a hierarchical Needleman-Wunsch alignment algorithm to compute table similarity — scoring both structural fidelity and content accuracy.
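For intuition, here is a bare-bones (non-hierarchical) Needleman-Wunsch score over two strings. The `match`/`mismatch`/`gap` values are illustrative defaults, not the scorer's actual parameters, and this sketch omits the free end gaps the real scorer uses:

```typescript
// Minimal Needleman-Wunsch global alignment score for two strings.
// dp[i][j] holds the best score for aligning a[0..i) with b[0..j).
function nwScore(a: string, b: string, match = 1, mismatch = -1, gap = -1): number {
  const m = a.length, n = b.length;
  const dp: number[][] = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++) dp[i][0] = i * gap; // align prefix of a against gaps
  for (let j = 1; j <= n; j++) dp[0][j] = j * gap; // align prefix of b against gaps
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = Math.max(
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch), // align a[i-1] with b[j-1]
        dp[i - 1][j] + gap, // gap in b
        dp[i][j - 1] + gap, // gap in a
      );
    }
  }
  return dp[m][n];
}
```

The hierarchical version runs the same recurrence twice: once over cells within a row, then once over rows within a table, with the row-level substitution score derived from the cell-level alignment.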
We used this benchmark to evaluate DocLD's agentic table extraction. This repo is the evaluation code. See our blog post for the full results and methodology.
```sh
# Install dependencies
npm install

# Download the RD-TableBench dataset from HuggingFace
npm run download

# Score predictions against ground truth
npm run score -- --predictions ./my-predictions --output results.json

# Run scorer tests
npm test
```

The scorer implements the same algorithm as Reducto's grading.py, with identical parameters:
| Parameter | Value | Description |
|---|---|---|
| `S_ROW_MATCH` | 5 | Bonus for aligning two rows |
| `G_ROW` | -3 | Penalty for a row gap (insertion/deletion) |
| `S_CELL_MATCH` | 1 | Score for identical cells |
| `P_CELL_MISMATCH` | -1 | Penalty for completely different cells |
| `G_COL` | -1 | Penalty for a column gap |
- Cell-level: Levenshtein distance normalized to [0, 1], mapped to the range `[P_CELL_MISMATCH, S_CELL_MATCH]`. Partial credit for near-matches.
- Row-level: Needleman-Wunsch alignment using cell-level scores plus the `S_ROW_MATCH` bonus. Free end gaps, so cropped sub-tables aren't penalized.
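The cell-level mapping can be sketched as follows. The `levenshtein` helper and the empty-cell edge case here are illustrative assumptions, not the scorer's actual implementation:

```typescript
const S_CELL_MATCH = 1;
const P_CELL_MISMATCH = -1;

// Hypothetical Levenshtein edit-distance helper (single-row DP).
function levenshtein(a: string, b: string): number {
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0];
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = prev[j];
      prev[j] = Math.min(
        prev[j] + 1,                              // deletion
        prev[j - 1] + 1,                          // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1),   // substitution
      );
      diag = tmp;
    }
  }
  return prev[b.length];
}

// Distance normalized to [0, 1], then mapped linearly onto
// [P_CELL_MISMATCH, S_CELL_MATCH]: identical cells score 1,
// completely different cells score -1, near-matches land in between.
function cellScore(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  if (maxLen === 0) return S_CELL_MATCH; // assumption: two empty cells count as identical
  const similarity = 1 - levenshtein(a, b) / maxLen;
  return P_CELL_MISMATCH + similarity * (S_CELL_MATCH - P_CELL_MISMATCH);
}
```

A one-character typo like `Londn` vs. `London` thus scores well above zero rather than being treated as a full mismatch.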
Before comparison, all cell content is normalized: whitespace collapsed, newlines removed, hyphens stripped — identical to grading.py.
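In sketch form, that normalization might look like the following; the regexes and their ordering are assumptions for illustration, and grading.py's exact rules may differ:

```typescript
// Normalize cell text before comparison: strip hyphens, replace
// newlines with spaces, collapse whitespace runs, and trim.
function normalizeCell(text: string): string {
  return text
    .replace(/-/g, '')    // strip hyphens
    .replace(/\n/g, ' ')  // remove newlines
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();
}
```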
```typescript
import { tableScore } from './src/table-scorer';

const groundTruth = [
  ['Name', 'Age', 'City'],
  ['Alice', '30', 'New York'],
  ['Bob', '25', 'London'],
];

const prediction = [
  ['Name', 'Age', 'City'],
  ['Alice', '30', 'New York'],
  ['Bob', '25', 'Londn'], // minor typo — gets partial credit
];

const score = tableScore(groundTruth, prediction);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
```

```typescript
import { tableScore, htmlTableToArray } from './src/table-scorer';

const gt = htmlTableToArray('<table><tr><td>A</td><td>B</td></tr></table>');
const pred = htmlTableToArray('<table><tr><td>A</td><td>B</td></tr></table>');
console.log(tableScore(gt, pred)); // 1.0
```

```sh
# Score pre-computed prediction HTML files against ground truth
npx tsx src/run-benchmark.ts \
  --data ./benchmark/data \
  --predictions ./my-predictions \
  --output results.json

# Limit to first 50 samples
npx tsx src/run-benchmark.ts --predictions ./preds --limit 50
```

The benchmark uses the RD-TableBench dataset created by Reducto:
- 1,000 complex tables from diverse, publicly available documents
- PhD-level human annotation — not programmatic labels
- Challenging scenarios: scanned tables, handwriting, merged cells, multi-level headers, 20+ languages
Download with:

```sh
npm run download
# or
npx tsx src/dataset-loader.ts --output ./benchmark/data
```

| Resource | Link |
|---|---|
| RD-TableBench announcement | Reducto blog |
| Dataset (labels + images) | HuggingFace |
| Original grading code (Python) | grading.py |
| Original conversion code | convert.py |
| Needleman-Wunsch algorithm | Wikipedia |
MIT — see LICENSE.
The RD-TableBench dataset is licensed under CC BY-NC-ND 4.0 by Reducto, Inc.