ding-haijie/SAN_D2T

SAN_D2T

Generate text from structured data (e.g., Wikipedia infobox tables).

Dependencies

  • PyTorch - Models and computational graphs are built with PyTorch.
  • NumPy - Provides the most frequently used array operations.
  • Matplotlib - Provides visualization toolkits for Python.
  • NLTK - Natural Language Toolkit, needed for its BLEU score calculator.
  • pyrouge - A Python wrapper for the ROUGE evaluation package.
  • tqdm - Shows a progress meter for epoch loops.
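
Before running any script, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of this repository; the import names in `REQUIRED_MODULES` are assumptions (for example, PyTorch is assumed to import as `torch`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["torch", "numpy", "matplotlib", "nltk", "pyrouge", "tqdm"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Anything reported as missing has to be installed before the training and generation scripts can run.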

Instructions

Prepare Data

Download the dataset: WikiBio (Wikipedia biography dataset), released by David Grangier. Each entry consists of the first paragraph and the infobox of a Wikipedia biography article.

Decompress the zip files and copy their contents into a folder named 'data/original/'.
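
The decompression step can also be scripted. This is a minimal sketch using only the Python standard library; `extract_archives` is a hypothetical helper, and the `downloads` folder in the usage comment is an assumed location for the downloaded zip files.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, target_dir="data/original"):
    """Extract every .zip file found in archive_dir into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted.extend(zf.namelist())
    return extracted


# Example (assumed download location):
# extract_archives("downloads")
```

The function returns the names of the extracted members, which makes it easy to check that the expected files landed in 'data/original/'.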

The cleaned data we have preprocessed can be downloaded from Google Drive.

Usage

Run python preprocess/preprocess.py to convert the original files into JSON files in the 'data/' folder.
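
To illustrate the kind of conversion this step performs, here is a minimal sketch that turns one infobox line into a JSON-serializable dictionary. It is not the code in preprocess/preprocess.py. It assumes the original infobox files store each table as a single line of tab-separated `field_position:token` entries, with `<none>` marking empty fields; `parse_box_line` is a hypothetical helper.

```python
import json
import re

# Matches keys such as "birth_date_2" and captures the field name and position.
_FIELD_POS = re.compile(r"^(.*)_(\d+)$")


def parse_box_line(line):
    """Group the tokens of one infobox line by field name, in order of appearance."""
    fields = {}
    for item in line.strip().split("\t"):
        key, sep, token = item.partition(":")
        # Skip malformed entries and empty fields.
        if not sep or not token or token == "<none>":
            continue
        match = _FIELD_POS.match(key)
        name = match.group(1) if match else key
        fields.setdefault(name, []).append(token)
    return fields


if __name__ == "__main__":
    sample = "name_1:john\tname_2:doe\tbirth_date_1:1950\timage:<none>"
    print(json.dumps(parse_box_line(sample)))
```

Grouping tokens by field name keeps the table structure explicit, which is what a table-to-text model needs as input.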

Run python train.py to continue training and save the model, and to evaluate the pre-trained models with the BLEU and ROUGE metrics.
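
For readers unfamiliar with BLEU, the sketch below shows its core building block, clipped n-gram precision, using only the standard library. It is not the evaluation code in train.py, which relies on NLTK's BLEU calculator according to the dependency list. It also omits the brevity penalty and the geometric mean over n-gram orders that the full metric applies. The helper names and sample sentences are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return overlap / total


if __name__ == "__main__":
    candidate = "john doe was born in 1950".split()
    reference = "john doe was born in london".split()
    print(clipped_precision(candidate, reference, 1))
    print(clipped_precision(candidate, reference, 2))
```

Clipping prevents a generated sentence from scoring well simply by repeating a word that appears once in the reference.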

The model files we have trained can be downloaded from Google Drive.

Run python data2text.py to generate utterances and attention maps.
