Generate text from structured data (e.g., Wikipedia infobox tables).
Requirements:
- PyTorch - models and computational graphs are built with PyTorch.
- NumPy - provides the most frequently used array operations.
- Matplotlib - provides visualization tools in Python.
- NLTK - Natural Language Toolkit, used to compute BLEU scores.
- pyrouge - a Python wrapper for the ROUGE evaluation package.
- tqdm - shows a progress meter for epoch loops.
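A quick way to check that these packages are importable is sketched below. It assumes the usual import names (torch, numpy, matplotlib, nltk, pyrouge, tqdm), and the helper missing_modules is hypothetical, not part of this repository. Note that pyrouge being importable does not confirm it has been configured with the underlying ROUGE Perl package.

```python
import importlib.util
from typing import Iterable, List

# Usual import names for the packages listed above (assumed; adjust if needed).
REQUIRED_MODULES = ["torch", "numpy", "matplotlib", "nltk", "pyrouge", "tqdm"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```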
Dataset: download WikiBio (the Wikipedia biography dataset) released by David Grangier. Each example consists of the infobox and the first paragraph of a Wikipedia biography.
Decompress the zip files and copy them into a folder named 'data/original/'.
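To illustrate what an infobox looks like in the raw files, here is a minimal parsing sketch. It assumes each line of a WikiBio .box file holds one infobox as tab-separated field_index:token items (e.g. name_1:john, name_2:smith), with <none> marking an empty value; please check this against the downloaded files. The function parse_box_line is hypothetical and is not the repository's preprocessing code.

```python
from typing import Dict, List


def parse_box_line(line: str) -> Dict[str, str]:
    """Group the 'field_i:token' items of one infobox line into field -> text."""
    fields: Dict[str, List[str]] = {}
    for item in line.rstrip("\n").split("\t"):
        if ":" not in item:
            continue  # skip malformed items
        key, _, token = item.partition(":")
        if token == "<none>":
            continue  # empty field value, drop it
        # Strip the trailing positional index, e.g. "birth_date_2" -> "birth_date".
        name, sep, idx = key.rpartition("_")
        if not sep or not idx.isdigit():
            name = key  # key has no numeric index suffix
        fields.setdefault(name, []).append(token)
    # Join the tokens of each field back into a single string (insertion order kept).
    return {name: " ".join(tokens) for name, tokens in fields.items()}
```

For example, the line "name_1:john, name_2:smith, birth_date_1:12, birth_date_2:may, birth_date_3:1950" (tab-separated) would yield {"name": "john smith", "birth_date": "12 may 1950"}.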
The cleaned data we have preprocessed can be downloaded from Google Drive.
Run python preprocess/preprocess.py to convert the original files into JSON-format files in the 'data/' folder.
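The conversion step can be pictured with the sketch below, which pairs each infobox with its text and emits one JSON object per line. It assumes the WikiBio layout of parallel .box (one infobox per line), .sent (one sentence per line) and .nb (number of sentences per biography) files. The function to_json_lines and the record schema ("box", "text") are hypothetical and may differ from what preprocess/preprocess.py actually writes.

```python
import json
from typing import List, Sequence


def to_json_lines(
    box_lines: Sequence[str],
    sent_lines: Sequence[str],
    nb_counts: Sequence[int],
) -> List[str]:
    """Pair each infobox with its sentences and serialize one JSON object per line."""
    if len(box_lines) != len(nb_counts):
        raise ValueError("expected one sentence count per infobox")
    if sum(nb_counts) != len(sent_lines):
        raise ValueError("sentence counts do not add up to the number of sentences")
    records = []
    pos = 0
    for box, n in zip(box_lines, nb_counts):
        # The next n sentence lines belong to this biography.
        sentences = [s.strip() for s in sent_lines[pos:pos + n]]
        pos += n
        record = {
            "box": box.rstrip("\n").split("\t"),  # raw "field_i:token" items
            "text": " ".join(sentences),          # this biography's sentences, joined
        }
        records.append(json.dumps(record))
    return records
```

Writing "\n".join(records) to a file under 'data/' would then give a JSON-lines file; the exact file names are up to the script.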
Run python train.py to train or continue training the model, save it, and evaluate pre-trained models with the BLEU and ROUGE metrics.
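As a rough illustration of what a ROUGE score measures, here is a simplified ROUGE-L sketch based on the longest common subsequence (LCS) of whitespace tokens, reported as a balanced F-measure (beta = 1). This is an assumption-labeled stand-in for intuition only: it skips the stemming, tokenization and other options of the official ROUGE package used through pyrouge, so its numbers will not match the repository's evaluation exactly.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)  # extend the match on the diagonal
            else:
                cur.append(max(prev[j], cur[j - 1]))  # best of skipping either token
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L: harmonic mean of LCS precision and LCS recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For "the cat sat on the mat" against "the cat is on the mat", the LCS is "the cat on the mat" (5 tokens), so precision and recall are both 5/6 and the score is 5/6.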
The model files we have trained can be downloaded from Google Drive.
Run python data2text.py to generate utterances and their attention maps.
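One way to read an attention map is to ask which infobox field each generated word attended to most. The sketch below assumes the map is a matrix with one row per output token and one column per infobox field; that layout and the function top_attended_fields are hypothetical and not taken from data2text.py.

```python
from typing import List, Sequence, Tuple


def top_attended_fields(
    output_tokens: Sequence[str],
    field_names: Sequence[str],
    attention: Sequence[Sequence[float]],
) -> List[Tuple[str, str, float]]:
    """For each output token, return (token, most-attended field, its weight)."""
    if len(attention) != len(output_tokens):
        raise ValueError("expected one attention row per output token")
    result = []
    for token, row in zip(output_tokens, attention):
        if len(row) != len(field_names):
            raise ValueError("expected one weight per infobox field in each row")
        best = max(range(len(row)), key=row.__getitem__)  # column with highest weight
        result.append((token, field_names[best], row[best]))
    return result
```

The same matrix can be passed to Matplotlib's imshow to draw it as a heat map, with output tokens on one axis and infobox fields on the other.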