Schema-Driven Information Extraction from Heterogeneous Tables

This repo contains code and data associated with the paper "Schema-Driven Information Extraction from Heterogeneous Tables".

@misc{bai2023schemadriven,
    title={Schema-Driven Information Extraction from Heterogeneous Tables},
    author={Fan Bai and Junmo Kang and Gabriel Stanovsky and Dayne Freitag and Alan Ritter},
    year={2023},
    eprint={2305.14336},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Task: Schema-to-JSON

Installation

Create the conda environment.

git clone https://github.com/bflashcp3f/schema-to-json.git
cd schema-to-json
conda env create -f environment.yml
conda activate s2j

Set the OPENAI_API_KEY environment variable to your OpenAI API key. To use Azure instead, set the AZURE_API_KEY environment variable.
Install from source

pip install -e .
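
Because the scripts read the API key from the environment, it can help to confirm the variable is visible in the current shell before starting a run. The snippet below is a minimal sketch of such a check; `require_env` is a hypothetical helper that is not part of this repo, and the key value in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of this repo): fail early if a variable is unset or empty.
require_env() {
    var_name="$1"
    # eval performs the indirect variable lookup in a POSIX-sh compatible way.
    eval "value=\${$var_name:-}"
    if [ -z "$value" ]; then
        echo "error: $var_name is not set" >&2
        return 1
    fi
    echo "$var_name is set"
}

# Usage with a placeholder value; substitute your real key:
#   export OPENAI_API_KEY="your-key-here"
#   require_env OPENAI_API_KEY
```

Call it with AZURE_API_KEY instead when using Azure. The helper only reports whether the variable is non-empty; it does not verify that the key is valid.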

Data

The four datasets in our benchmark (MlTables, ChemTables, DiSCoMat, and SWDE) are available under the data directory.

Experiments

Below are the commands to reproduce the paper's results. Before running them, set API_SOURCE (openai or azure) and BACKEND (the model name) in each script. For open-source models, use the scripts with the _os.sh suffix.
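
A quick sanity check on these two settings can catch a typo before any API call is made. This is only a sketch: `validate_settings` is a hypothetical function, not something the repo's scripts provide, and `some-model` in the usage comment is a placeholder rather than a recommended model name.

```shell
# Hypothetical helper (not part of this repo): check API_SOURCE and BACKEND values.
validate_settings() {
    api_source="$1"
    backend="$2"
    # API_SOURCE accepts exactly the two values named above.
    case "$api_source" in
        openai|azure) ;;
        *)
            echo "error: API_SOURCE must be openai or azure, got '$api_source'" >&2
            return 1
            ;;
    esac
    # BACKEND is the model name and must not be empty.
    if [ -z "$backend" ]; then
        echo "error: BACKEND (model name) is empty" >&2
        return 1
    fi
    echo "API_SOURCE=$api_source BACKEND=$backend"
}

# Usage with a placeholder model name:
#   validate_settings openai some-model
```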

MlTables

# Prompt (w/ error recovery)
bash scripts/mltables/prompt_error_recovery.sh

# Evaluation
bash scripts/mltables/eval.sh

ChemTables

# Prompt (w/ error recovery)
bash scripts/chemtables/prompt_error_recovery.sh

# Evaluation
bash scripts/chemtables/eval.sh

DiSCoMat

# Prompt (w/ error recovery)
bash scripts/discomat/prompt_error_recovery.sh

# Evaluation
bash scripts/discomat/eval.sh

SWDE

# Prompt
bash scripts/swde/prompt.sh

# Evaluation
bash scripts/swde/eval.sh
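
The per-dataset commands above follow one pattern: a prompt script followed by an evaluation script. The sketch below prints that sequence for each dataset as a dry run, using only the script paths listed in this section. `dataset_commands` is a hypothetical helper, not part of the repo, and nothing is executed.

```shell
# Hypothetical helper (not part of this repo): print the prompt and eval commands for a dataset.
dataset_commands() {
    dataset="$1"
    case "$dataset" in
        # These three datasets use the error-recovery prompt script.
        mltables|chemtables|discomat) prompt="prompt_error_recovery.sh" ;;
        # SWDE uses the plain prompt script.
        swde) prompt="prompt.sh" ;;
        *)
            echo "error: unknown dataset '$dataset'" >&2
            return 1
            ;;
    esac
    echo "bash scripts/$dataset/$prompt"
    echo "bash scripts/$dataset/eval.sh"
}

# Dry run: list every command in order without executing anything.
for d in mltables chemtables discomat swde; do
    dataset_commands "$d"
done
```

To run a dataset instead of listing it, paste the printed lines into the shell from the repository root.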
