Parses Wikipedia dumps for the Get to Philosophy game, as seen at http://en.wikipedia.org/wiki/Wikipedia:Get_to_Philosophy

Ultimately this will save data in a format suitable for loading into a Rails website, but until then, run bin/page_xml_parser_interface.rb on an XML file like simplewiki-20081029-pages-articles.xml

To get such an XML file, go to http://download.wikimedia.org/backup-index.html, choose your language edition, and download a dump that contains the latest revision (as opposed to all revisions) of all articles.
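
The dump is an XML document containing one <page> element per article. The project's own parser lives under lib/ and is driven by bin/page_xml_parser_interface.rb; the sketch below is only an illustration (using the Nokogiri gem, which this project does not necessarily depend on) of how pages can be streamed one at a time instead of being loaded all at once:

  # Illustrative sketch, not the project's actual parser: stream <page>
  # elements from a MediaWiki dump so memory use stays roughly constant.
  require 'nokogiri'

  def each_page(dump_path)
    reader = Nokogiri::XML::Reader(File.open(dump_path))
    reader.each do |node|
      next unless node.name == 'page' &&
                  node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
      # Parse just this <page> subtree; everything before it has already
      # been discarded by the reader.
      page = Nokogiri::XML(node.outer_xml).remove_namespaces!
      yield page.at_xpath('//title').text, page.at_xpath('//revision/text').text
    end
  end

  # Example: print each article's first wiki link.
  #   each_page('simplewiki-20081029-pages-articles.xml') do |title, text|
  #     puts "#{title} -> #{text[/\[\[([^\]|#]+)/, 1]}"
  #   end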

To do:

Speed up analysis, preferably using less memory. Ensure iteration over pages doesn't load all of them at once.

Handle mainspace articles that have colons in their names.

Consider how to handle footnotes.
Should Philosophy count as linking to A.C. Grayling, the author of a cited publication?
What about an explanatory footnote in the United Kingdom article?

Find the longest chain from any article to its loop (a chain-following sketch appears after this list).

Find out how long the terminating loops are.

Add the option to ignore certain links, such as dates or Greek language.

Handle templates containing text content, such as Template:Day.

Distinguish between articles and redirects?
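
For the chain and loop items above, a minimal sketch of the intended analysis, assuming a links hash that maps each article title to the target of its first link (a hypothetical structure; building it from the parsed dump is the parser's job):

  # Follow first links from a starting article until a title repeats,
  # returning the chain walked and the length of the terminating loop.
  # `links` is a hypothetical Hash of article title => first-link target.
  def chain_and_loop(links, start)
    chain = []
    seen  = {}                       # title => position in chain
    title = start
    while title && !seen.key?(title)
      seen[title] = chain.size
      chain << title
      title = links[title]
    end
    loop_length = title ? chain.size - seen[title] : 0
    [chain, loop_length]
  end

  # Example with made-up data:
  #   links = { 'Apple' => 'Fruit', 'Fruit' => 'Plant', 'Plant' => 'Fruit' }
  #   chain_and_loop(links, 'Apple')  # => [["Apple", "Fruit", "Plant"], 2]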