Code from "Automatically Generating Wikipedia Articles: A Structure-Aware Approach"

Automatically Generating Wikipedia Articles: A Structure-Aware Approach

Christina Sauper and Regina Barzilay


In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structure of human-authored texts to automatically induce a domain-specific template for the topic structure of a new overview. The algorithmic innovation of our work is a method to learn topic-specific extractors for content selection jointly for the entire template. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview. The results of our evaluation confirm the benefits of incorporating structural information into the content selection process.

Full text: Sample articles:


This code is available for research use only.


Run options are available by running

Data format

A sample data file is provided in data/. In general, for the section /Section Name/ in the file sections.section_name, the format is as follows:

##Article## !!section title!! body text ...
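
As an illustration of the format above, here is a minimal Python sketch of a reader for such a file. It is not part of the released code. It assumes that ##...## marks an article name, that !!...!! marks the section title, and that everything up to the next ##...## marker is body text. The function name parse_sections and the sample strings are hypothetical.

```python
import re

# Assumed marker semantics, inferred from the format line in this README:
#   ##Article##        -> article name
#   !!section title!!  -> section title
# Everything after the title, up to the next ##...## marker, is treated as body text.
ARTICLE_RE = re.compile(r"##(.+?)##")
TITLE_RE = re.compile(r"!!(.+?)!!")


def parse_sections(text):
    """Return a list of (article, section_title, body) tuples (hypothetical helper)."""
    records = []
    # With one capture group, re.split yields:
    #   [prefix, article_1, chunk_1, article_2, chunk_2, ...]
    parts = ARTICLE_RE.split(text)
    for i in range(1, len(parts), 2):
        article = parts[i].strip()
        chunk = parts[i + 1]
        match = TITLE_RE.search(chunk)
        if match:
            title = match.group(1).strip()
            body = chunk[match.end():].strip()
        else:
            # No title marker found: keep the whole chunk as body text.
            title = None
            body = chunk.strip()
        records.append((article, title, body))
    return records


# Made-up sample content, for illustration only.
sample = (
    "##Article A## !!Early life!! Born in 1900.\n"
    "##Article B## !!Early life!! Grew up abroad."
)
records = parse_sections(sample)
```

Because the sketch splits on the markers rather than on line breaks, it reads the file the same way whether each record sits on one line or spans several.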