mediawiki dump parser for loading up wikipedia data

If you're like me, then you enjoy playing with lots of textual data and scour the internet for sources of it.

MediaWiki's dumps are a pretty awesome chunk of it that's fun to work with.


go get


The parser takes any io.Reader as a source, assuming it's a complete XML dump, and lets you pull wikiparse.Page objects out of it. Dumps typically arrive as bzip2 files, so my programs open the file and wrap it in a bzip2 reader. You don't need to do that if you read from stdin, though. Here's a complete example that emits page titles from a decompressed stream on stdin:

package main

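import (
    "fmt"
    "os"

    // Also import the wikiparse package here; its import path is not shown in this text.
)

func main() {
    p, err := wikiparse.NewParser(os.Stdin)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error setting up parser: %v\n", err)
        os.Exit(1)
    }

    // Next returns a non-nil error once the stream is exhausted, which ends the loop.
    for err == nil {
        var page *wikiparse.Page
        page, err = p.Next()
        if err == nil {
            fmt.Println(page.Title)
        }
    }
}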

Example invocation:

bzcat enwiki-20120211-pages-articles.xml.bz2 | ./sample
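If you'd rather skip bzcat and open the .bz2 file yourself, the standard library's compress/bzip2 package gives you an io.Reader over the decompressed stream. Below is a minimal sketch of that setup. The helper name openDump is made up for illustration and is not part of wikiparse, and the program only counts decompressed bytes so that it stays free of external dependencies.

```go
package main

import (
	"bufio"
	"compress/bzip2"
	"fmt"
	"io"
	"os"
)

// openDump is a hypothetical helper (not part of wikiparse). It opens the
// file at path and returns a reader over the decompressed bytes, plus the
// underlying file so the caller can close it.
func openDump(path string) (io.Reader, io.Closer, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, nil, err
	}
	// bzip2.NewReader only decompresses; buffering the file cuts down on
	// small reads against the disk.
	return bzip2.NewReader(bufio.NewReader(f)), f, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sample dump.xml.bz2")
		return
	}

	r, c, err := openDump(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error opening dump: %v\n", err)
		return
	}
	defer c.Close()

	// r is a plain io.Reader, so it could be handed to the parser in place
	// of os.Stdin. Here we just drain it and report the size.
	n, err := io.Copy(io.Discard, r)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error decompressing: %v\n", err)
		return
	}
	fmt.Printf("%d decompressed bytes\n", n)
}
```

Since the parser accepts any io.Reader, the reader returned here should slot in wherever the stdin example uses os.Stdin.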

Geographical Information

Because it's interesting to me, I wrote a parser for the wikiproject geographical coordinates that are found on many pages. Use this on the page's content to find out if it's a place or not. Then go there.
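To give a feel for what "finding coordinates in page content" involves, here is a simplified standalone sketch. It is not the library's parser. It assumes the coordinates appear as a `{{coord|lat|lon|...}}` template in decimal degrees and ignores the degrees/minutes/seconds forms that real pages also use. The names coordRE and findDecimalCoord are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// coordRE matches only the decimal form, e.g. {{coord|37.7749|-122.4194|...}}.
// Both numbers must contain a decimal point so that degree/minute forms such
// as {{coord|37|46|N|...}} are not misread as a latitude/longitude pair.
var coordRE = regexp.MustCompile(
	`(?i)\{\{coord\|\s*(-?\d+\.\d+)\s*\|\s*(-?\d+\.\d+)\s*[|}]`)

var errNoCoords = errors.New("no decimal coord template found")

// findDecimalCoord is a hypothetical helper that returns the first decimal
// latitude/longitude pair found in text.
func findDecimalCoord(text string) (lat, lon float64, err error) {
	m := coordRE.FindStringSubmatch(text)
	if m == nil {
		return 0, 0, errNoCoords
	}
	if lat, err = strconv.ParseFloat(m[1], 64); err != nil {
		return 0, 0, err
	}
	if lon, err = strconv.ParseFloat(m[2], 64); err != nil {
		return 0, 0, err
	}
	return lat, lon, nil
}

func main() {
	text := "Intro text {{Coord|37.7749|-122.4194|display=title}} more text"
	lat, lon, err := findDecimalCoord(text)
	if err != nil {
		fmt.Println("probably not a place:", err)
		return
	}
	fmt.Printf("%.4f, %.4f\n", lat, lon)
}
```

An error from a helper like this is the "is it a place or not" signal: pages without a coordinate template simply don't match.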
