TEGRA-Table-Segmentation

TEGRA: Table Extraction by Global Record Alignment

Overview

This is the benchmark data set used in our experiments described here.

The data set has 10K randomly sampled Web tables. To automatically test table segmentation, for each table, we take out column separators and concatenate all cells in the same row together. This results in 10K test cases, where the ground truth is the original tables.
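As a rough illustration of how a test case relates to its ground truth, the sketch below joins the cells of each row into a single string. The function name `flatten_table` and the `joiner` parameter are hypothetical, and joining with a single space is an assumption: the description above does not say which character, if any, replaces the removed column separators.

```python
from typing import List


def flatten_table(table: List[List[str]], joiner: str = " ") -> List[str]:
    """Drop column boundaries by joining each row's cells into one string.

    `joiner` is an assumption; the benchmark may join cells differently.
    """
    return [joiner.join(cells) for cells in table]


# Made-up example table, not taken from the benchmark.
original = [["Name", "Age"], ["Alice", "30"]]
flattened = flatten_table(original)
```

A segmentation algorithm would receive `flattened` as input and be scored on how well it recovers `original`.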

Data set description

Each row in the file corresponds to a Web table. There are three columns for each row, separated by tabs.

The first column has table ids.
The second column has all rows from the same table, concatenated together and separated by underscores. Cell delimiters are removed, and this is the testing data used by our parsing algorithms.
The third column has the ground truth tables, where true cell boundaries (as they were extracted from Web pages) are denoted by vertical bars.
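The three-column layout described above could be read along these lines. This is a minimal sketch, not the authors' loader: the names `BenchmarkRecord` and `parse_line` are hypothetical, and it assumes that rows in the third column are also separated by underscores and that neither underscores nor vertical bars occur inside cell text. The sample line is made up for illustration.

```python
from typing import List, NamedTuple


class BenchmarkRecord(NamedTuple):
    table_id: str                 # first column
    test_rows: List[str]          # second column: rows without cell delimiters
    truth_rows: List[List[str]]   # third column: rows split into true cells


def parse_line(line: str) -> BenchmarkRecord:
    """Parse one tab-separated benchmark line into its three columns."""
    # Raises ValueError if the line does not have exactly three columns.
    table_id, test_col, truth_col = line.rstrip("\n").split("\t")
    # Rows are separated by underscores (assumed for the third column too).
    test_rows = test_col.split("_")
    # True cell boundaries are marked by vertical bars.
    truth_rows = [row.split("|") for row in truth_col.split("_")]
    return BenchmarkRecord(table_id, test_rows, truth_rows)


# Made-up sample line, not taken from the benchmark file.
sample = "t1\tAlice 30_Bob 25\tAlice|30_Bob|25\n"
record = parse_line(sample)
```

If real cells turn out to contain underscores or vertical bars, the simple `split` calls here would need to be replaced with whatever escaping the file actually uses.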
