# DocLD-TableBench

TypeScript evaluation toolkit for benchmarking table extraction accuracy using the RD-TableBench dataset and scoring methodology.

This repo contains:

- A faithful TypeScript port of Reducto's Needleman-Wunsch table similarity scorer
- A dataset loader that downloads the RD-TableBench dataset from HuggingFace
- A benchmark runner that scores extraction predictions against ground truth
- A test suite that validates the scorer produces correct results on known inputs

## Background

RD-TableBench is an open benchmark created by Reducto containing 1,000 complex table images manually annotated by PhD-level human labelers. It uses a hierarchical Needleman-Wunsch alignment algorithm to compute table similarity — scoring both structural fidelity and content accuracy.

We used this benchmark to evaluate DocLD's agentic table extraction. This repo is the evaluation code. See our blog post for the full results and methodology.

## Quick Start

```bash
# Install dependencies
npm install

# Download the RD-TableBench dataset from HuggingFace
npm run download

# Score predictions against ground truth
npm run score -- --predictions ./my-predictions --output results.json

# Run scorer tests
npm test
```

## Scoring Methodology

The scorer implements the same algorithm as Reducto's grading.py, with identical parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| `S_ROW_MATCH` | 5 | Bonus for aligning two rows |
| `G_ROW` | -3 | Penalty for a row gap (insertion/deletion) |
| `S_CELL_MATCH` | 1 | Score for identical cells |
| `P_CELL_MISMATCH` | -1 | Penalty for completely different cells |
| `G_COL` | -1 | Penalty for a column gap |

### Two-level alignment

  1. Cell-level: Levenshtein distance normalized to [0, 1], mapped to the range [P_CELL_MISMATCH, S_CELL_MATCH]. Partial credit for near-matches.
  2. Row-level: Needleman-Wunsch alignment using cell-level scores plus S_ROW_MATCH bonus. Free end gaps so cropped sub-tables aren't penalized.
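The two levels above can be sketched as follows. This is an illustrative re-implementation, not the repo's actual `src/table-scorer` code: function names here are our own, and the final normalization of the raw alignment score into [0, 1] is omitted.

```typescript
// Parameter values match the table above; everything else is a sketch.
const S_ROW_MATCH = 5;
const G_ROW = -3;
const S_CELL_MATCH = 1;
const P_CELL_MISMATCH = -1;
const G_COL = -1;

// Standard Levenshtein edit distance via dynamic programming.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Cell-level score: distance normalized to [0, 1], mapped linearly onto
// [P_CELL_MISMATCH, S_CELL_MATCH] so near-matches earn partial credit.
function cellScore(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  const d = maxLen === 0 ? 0 : levenshtein(a, b) / maxLen;
  return S_CELL_MATCH - d * (S_CELL_MATCH - P_CELL_MISMATCH);
}

// Generic Needleman-Wunsch: maximize total similarity under a linear gap
// penalty. With freeEndGaps, leading/trailing gaps cost nothing, so a
// cropped sub-table can still align cleanly.
function needlemanWunsch<T>(
  xs: T[],
  ys: T[],
  sim: (x: T, y: T) => number,
  gap: number,
  freeEndGaps = false,
): number {
  const m = xs.length, n = ys.length;
  const dp: number[][] = Array.from({ length: m + 1 }, () =>
    new Array<number>(n + 1).fill(0),
  );
  for (let i = 1; i <= m; i++) dp[i][0] = freeEndGaps ? 0 : i * gap;
  for (let j = 1; j <= n; j++) dp[0][j] = freeEndGaps ? 0 : j * gap;
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = Math.max(
        dp[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]), // align pair
        dp[i - 1][j] + gap, // gap in ys
        dp[i][j - 1] + gap, // gap in xs
      );
    }
  }
  if (!freeEndGaps) return dp[m][n];
  // Free end gaps: take the best score anywhere on the last row or column.
  let best = -Infinity;
  for (let i = 0; i <= m; i++) best = Math.max(best, dp[i][n]);
  for (let j = 0; j <= n; j++) best = Math.max(best, dp[m][j]);
  return best;
}

// Row similarity: inner alignment over cells, plus the row-match bonus.
const rowSim = (r1: string[], r2: string[]): number =>
  needlemanWunsch(r1, r2, cellScore, G_COL) + S_ROW_MATCH;

// Raw table score: outer alignment over rows with free end gaps. The real
// scorer then normalizes this raw value into [0, 1]; that step is omitted here.
function rawTableScore(gt: string[][], pred: string[][]): number {
  return needlemanWunsch(gt, pred, rowSim, G_ROW, true);
}
```

Note how the partial-credit mapping works out to `1 - 2d` for normalized distance `d`: identical cells score 1, completely different cells score -1, and a one-character typo in a five-character cell scores 0.6.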

### Cell normalization

Before comparison, all cell content is normalized: whitespace collapsed, newlines removed, hyphens stripped — identical to grading.py.
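A rough sketch of that normalization is below; the exact replacement rules live in grading.py and may differ in detail (e.g. the order of the steps), so treat this as illustrative only:

```typescript
// Sketch of cell normalization: strip hyphens, then collapse all runs of
// whitespace (including newlines) into single spaces.
function normalizeCell(text: string): string {
  return text
    .replace(/-/g, "")     // strip hyphens (e.g. line-break hyphenation)
    .replace(/\s+/g, " ")  // collapse whitespace runs, including newlines
    .trim();
}
```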

## Usage

### As a library

```typescript
import { tableScore } from './src/table-scorer';

const groundTruth = [
  ['Name', 'Age', 'City'],
  ['Alice', '30', 'New York'],
  ['Bob', '25', 'London'],
];

const prediction = [
  ['Name', 'Age', 'City'],
  ['Alice', '30', 'New York'],
  ['Bob', '25', 'Londn'],  // minor typo — gets partial credit
];

const score = tableScore(groundTruth, prediction);
console.log(`Similarity: ${(score * 100).toFixed(1)}%`);
```

### Scoring HTML tables

```typescript
import { tableScore, htmlTableToArray } from './src/table-scorer';

const gt = htmlTableToArray('<table><tr><td>A</td><td>B</td></tr></table>');
const pred = htmlTableToArray('<table><tr><td>A</td><td>B</td></tr></table>');
console.log(tableScore(gt, pred)); // 1.0
```
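`htmlTableToArray` turns `<tr>`/`<td>` markup into the string matrix the scorer consumes. A minimal regex-based sketch of the idea (the repo's actual parser may be more robust about colspans, nesting, and malformed HTML; the function name here is ours):

```typescript
// Sketch: extract each <tr>, then each <td>/<th> inside it, stripping any
// remaining inline tags from the cell text.
function htmlTableToArraySketch(html: string): string[][] {
  return [...html.matchAll(/<tr[^>]*>([\s\S]*?)<\/tr>/gi)].map((row) =>
    [...row[1].matchAll(/<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/gi)].map((cell) =>
      cell[1].replace(/<[^>]+>/g, "").trim(),
    ),
  );
}
```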

### CLI benchmark runner

```bash
# Score pre-computed prediction HTML files against ground truth
npx tsx src/run-benchmark.ts \
  --data ./benchmark/data \
  --predictions ./my-predictions \
  --output results.json

# Limit to first 50 samples
npx tsx src/run-benchmark.ts --predictions ./preds --limit 50
```

## Dataset

The benchmark uses the RD-TableBench dataset created by Reducto:

- 1,000 complex tables from diverse, publicly available documents
- PhD-level human annotation — not programmatic labels
- Challenging scenarios: scanned tables, handwriting, merged cells, multi-level headers, 20+ languages

Download with:

```bash
npm run download
# or
npx tsx src/dataset-loader.ts --output ./benchmark/data
```

## Data Sources

| Resource | Link |
| --- | --- |
| RD-TableBench announcement | Reducto blog |
| Dataset (labels + images) | HuggingFace |
| Original grading code (Python) | grading.py |
| Original conversion code | convert.py |
| Needleman-Wunsch algorithm | Wikipedia |

## License

MIT — see LICENSE.

The RD-TableBench dataset is licensed under CC BY-NC-ND 4.0 by Reducto, Inc.
