TheJaeLal/LineFormer

LineFormer - Rethinking Chart Data Extraction as Instance Segmentation

Official repository for the ICDAR 2023 Paper

[Link] to the paper.

Quantitative Results

| Method | AdobeSynth19: Visual Element Detection [1] | AdobeSynth19: Data Extraction [2] | UB-PMC22: Visual Element Detection | UB-PMC22: Data Extraction | LineEX: Visual Element Detection | LineEX: Data Extraction |
|---|---|---|---|---|---|---|
| ChartOCR | 84.67 | 55 | 83.89 | 72.9 | 86.47 | 78.25 |
| Lenovo | 99.29 | 98.81 | 84.03 | 67.01 | - | - |
| LineEX | 82.52 | 81.97 | 50.23 | 47.03 | 71.13 | 71.08 |
| LineFormer (Ours) | 97.51 | 97.02 | 93.1 | 88.25 | 99.20 | 97.57 |

Model Usage

Install Environment

This code is based on the MMDetection framework.

The code has been tested with PyTorch 1.13.1 and CUDA 11.7.

Create a Conda environment and install the dependencies:

conda create -n LineFormer python=3.8
conda activate LineFormer
bash install.sh
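
After installation, a quick import check can confirm that the environment is usable. The sketch below assumes that `install.sh` provides the packages importable as `torch`, `mmdet`, and `cv2`. That list is inferred from the dependencies mentioned in this README, not from the script itself, and `find_missing_packages` is a hypothetical helper.

```python
import importlib.util


def find_missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for PyTorch, MMDetection, and OpenCV.
REQUIRED = ["torch", "mmdet", "cv2"]

missing = find_missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is reported as missing, re-running `bash install.sh` inside the activated `LineFormer` environment is the first thing to try.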

Inference

  1. Download the trained model checkpoint here.
  2. Use the demo inference snippet shown below:
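import cv2

import infer
import line_utils

img_path = "demo/PMC5959982___3_HTML.jpg"
img = cv2.imread(img_path)  # BGR format
if img is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(f"Could not read image: {img_path}")

CKPT = "iter_3000.pth"  # path to the downloaded checkpoint
CONFIG = "lineformer_swin_t_config.py"
DEVICE = "cpu"

# Load the model, then extract the line data series from the image
infer.load_model(CONFIG, CKPT, DEVICE)
line_dataseries = infer.get_dataseries(img, to_clean=False)

# Visualize extracted line keypoints
img = line_utils.draw_lines(img, line_utils.points_to_array(line_dataseries))

cv2.imwrite("demo/sample_result.png", img)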

Example extraction result:

[Figures: input image; demo result]
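
The extracted series can also be saved for later analysis. The sketch below writes lines to CSV, assuming each line is available as a sequence of (x, y) pixel pairs. The exact structure returned by `infer.get_dataseries` is not specified in this README, so a conversion step may be needed first. `dataseries_to_csv` is a hypothetical helper, and the example points are made up.

```python
import csv
import io


def dataseries_to_csv(lines, out):
    """Write one CSV row per point as (line_id, x, y); return the number of points."""
    writer = csv.writer(out)
    writer.writerow(["line_id", "x", "y"])
    count = 0
    for line_id, line in enumerate(lines):
        for x, y in line:
            writer.writerow([line_id, x, y])
            count += 1
    return count


# Made-up example: two lines given as (x, y) pixel pairs.
example_lines = [[(10, 200), (20, 180)], [(10, 50)]]

buffer = io.StringIO()
rows_written = dataseries_to_csv(example_lines, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.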

Citation

If you find our work useful, please cite us as follows:

@InProceedings{10.1007/978-3-031-41734-4_24,
author="Lal, Jay
and Mitkari, Aditya
and Bhosale, Mahesh
and Doermann, David",
editor="Fink, Gernot A.
and Jain, Rajiv
and Kise, Koichi
and Zanibbi, Richard",
title="LineFormer: Line Chart Data Extraction Using Instance Segmentation",
booktitle="Document Analysis and Recognition - ICDAR 2023",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="387--400",
abstract="Data extraction from line-chart images is an essential component of the automated document understanding process, as line charts are a ubiquitous data visualization format. However, the amount of visual and structural variations in multi-line graphs makes them particularly challenging for automated parsing. Existing works, however, are not robust to all these variations, either taking an all-chart unified approach or relying on auxiliary information such as legends for line data extraction. In this work, we propose LineFormer, a robust approach to line data extraction using instance segmentation. We achieve state-of-the-art performance on several benchmark synthetic and real chart datasets. Our implementation is available at https://github.com/TheJaeLal/LineFormer.",
isbn="978-3-031-41734-4"
}

Note: LineFormer returns data as (x, y) points in image coordinates. To recover the actual data values, you also need to extract axis information, which can be done using this repo.
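
Once axis information is available, mapping image coordinates to data values is a per-axis interpolation. The sketch below is not part of LineFormer or the axis-extraction repo. It assumes linear (non-logarithmic) axes and that two reference ticks per axis are known, each with a pixel position and a data value. The names `make_linear_axis_map` and `pixels_to_data` and all calibration numbers are hypothetical.

```python
def make_linear_axis_map(pixel_a, value_a, pixel_b, value_b):
    """Build a pixel -> data-value mapping from two reference ticks on a linear axis."""
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must be at different pixel positions")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


def pixels_to_data(points, x_map, y_map):
    """Convert (x, y) pixel points into (x, y) data values."""
    return [(x_map(px), y_map(py)) for px, py in points]


# Hypothetical calibration: x ticks at pixels 100 and 500 mean 0 and 10;
# y ticks at pixels 400 and 50 mean 0 and 70 (image y grows downward).
x_map = make_linear_axis_map(100, 0.0, 500, 10.0)
y_map = make_linear_axis_map(400, 0.0, 50, 70.0)

line_px = [(100, 400), (300, 225), (500, 50)]  # made-up pixel points
line_data = pixels_to_data(line_px, x_map, y_map)
print(line_data)
```

Because image y coordinates increase downward, the y-axis scale comes out negative here. The two-tick formulation handles that without a separate flip. Logarithmic axes would need the same interpolation applied to the logarithm of the tick values.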

Footnotes

  1. Task 6a from the CHART-Info challenge

  2. Task 6b data score from the CHART-Info challenge