WikiRacer

Given two Wikipedia pages, WikiRacer will attempt to find the shortest path between the two using only links to other Wikipedia pages.

WikiRacer works on live data (via the MediaWiki API) and is fast because it uses a bi-directional, depth-first search algorithm: it searches from both the start and target pages at once and stops as soon as the two searches meet.
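
The core of the approach, as a minimal sketch in Go (illustrative only, not the repository's code: the neighbors type stands in for the real link fetcher, and the function returns only the page where the searches meet rather than the full path):

package main

import "fmt"

// neighbors stands in for the real link fetcher; in WikiRacer the forward
// search follows a page's outgoing links and the backward search follows
// links pointing into a page.
type neighbors func(title string) []string

// bidirectional expands a frontier from each end, one level at a time,
// until the two visited sets intersect.
func bidirectional(from, to string, fwd, back neighbors) string {
    if from == to {
        return from
    }
    fseen := map[string]bool{from: true}
    bseen := map[string]bool{to: true}
    ffront, bfront := []string{from}, []string{to}
    for len(ffront) > 0 && len(bfront) > 0 {
        // Always expand the smaller frontier to keep both sides balanced.
        if len(ffront) > len(bfront) {
            ffront, bfront = bfront, ffront
            fseen, bseen = bseen, fseen
            fwd, back = back, fwd
        }
        var next []string
        for _, page := range ffront {
            for _, link := range fwd(page) {
                if bseen[link] {
                    return link // the two searches met here
                }
                if !fseen[link] {
                    fseen[link] = true
                    next = append(next, link)
                }
            }
        }
        ffront = next
    }
    return "" // no path exists
}

func main() {
    fwd := map[string][]string{"A": {"B"}, "B": {"C"}, "C": {"D"}}
    back := map[string][]string{"D": {"C"}, "C": {"B"}, "B": {"A"}}
    meet := bidirectional("A", "D",
        func(t string) []string { return fwd[t] },
        func(t string) []string { return back[t] })
    fmt.Println("searches met at:", meet)
}

A real implementation also records each page's predecessor on both sides so the full path can be reconstructed once the searches meet.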

$ time ./wikiracer Pleiades OpenBSD
Pleiades
Bible
Système universitaire de documentation
Bill Joy
OpenBSD
Elapsed time:  3.120570172s
./wikiracer Pleiades OpenBSD  0.49s user 0.04s system 7% cpu 7.132 total

$ time ./wikiracer "Jim Beam" "King George"
Jim Beam
Kentucky
Letters patent
George IV of the United Kingdom
King George
Elapsed time:  2.672853825s
./wikiracer "Jim Beam" "King George"  0.32s user 0.03s system 7% cpu 4.684 total

Building

The WikiRacer HTTP service depends on gorilla/mux. Fetch and build with:

$ go get github.com/gorilla/mux && go get github.com/86me/wikiracer
$ cd $GOPATH/src/github.com/86me/wikiracer && go build && ./wikiracer

Testing

Recursively run the tests with: go test ./...
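
For illustration, a minimal table-driven test in the style that go test picks up. The boring filter here is hypothetical, included only so the file is self-contained; it mirrors the regex-based link filtering described in the Process section below.

package wikiracer

import (
    "regexp"
    "testing"
)

// boring is a hypothetical filter: it matches namespaced pages that
// should never appear in a path.
var boring = regexp.MustCompile(`^(Help|Template|Category|Wikipedia):`)

func TestBoringFilter(t *testing.T) {
    cases := []struct {
        title string
        want  bool
    }{
        {"Help:Contents", true},
        {"Template:Infobox", true},
        {"OpenBSD", false},
    }
    for _, c := range cases {
        if got := boring.MatchString(c.title); got != c.want {
            t.Errorf("boring(%q) = %v, want %v", c.title, got, c.want)
        }
    }
}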

Running

usage: ./wikiracer [-debug] "from_title" "to_title"

  -debug
        Output logs to stderr
  -help
        Additional help information
  -serve
        Run HTTP server

Examples:

$ ./wikiracer "Ada Lovelace" "Robert Frost"
Ada Lovelace
Artificial intelligence
Dartmouth College
Robert Frost
Elapsed time:  1.517820434s

$ ./wikiracer -serve 0.0.0.0:4040
[WikiRacer] service running at  0.0.0.0:4040

Limitations

  • WikiRacer adheres to the MediaWiki API etiquette guidelines as faithfully as possible. To that end, it makes at most two simultaneous API requests to Wikipedia; a sketch of one way to enforce such a cap follows.
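
In Go, a buffered channel works as a counting semaphore for this kind of throttling. A minimal sketch under that assumption (names are illustrative, not the repository's):

package main

import (
    "fmt"
    "net/http"
    "sync"
)

// sem caps the number of in-flight API requests at the channel's capacity.
var sem = make(chan struct{}, 2)

// fetch blocks until one of the two slots is free, then performs the request.
func fetch(url string) (*http.Response, error) {
    sem <- struct{}{}        // acquire a slot
    defer func() { <-sem }() // release it when the request finishes
    return http.Get(url)
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            resp, err := fetch("https://en.wikipedia.org/w/api.php")
            if err != nil {
                fmt.Println(n, err)
                return
            }
            resp.Body.Close()
            fmt.Println(n, resp.Status) // at most two requests ever run at once
        }(i)
    }
    wg.Wait()
}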

Process

  • 1 hour - Researching possible strategies for building wikiracer, including tools, libraries, algorithms, and Wikipedia's MediaWiki API. For algorithms, the bi-directional, depth-first method immediately stood out to me as the way to go: with normal breadth-first/depth-first complexity being O(b^d), executing two searches that meet in the middle reduces each search to O(b^(d/2)), bringing total complexity to O(b^(d/2) + b^(d/2)) = O(2·b^(d/2)), which is far less than the vanilla BFS/DFS method. For a branching factor b = 100 and path depth d = 4, that is roughly 2·100^2 = 20,000 expansions instead of 100^4 = 100,000,000.
  • 1 hour - Familiarizing myself with the MediaWiki API. I started with patrickmn's go-wikimedia implementation, which was far too basic for anything beyond initial test queries to see how the API responded to simple requests. I studied a few implementations of the MediaWiki API in other languages, but decided to use Go because I really enjoy writing Go code and want to get more familiar with Go internals, mutexes, and concurrency patterns. I also really love Go's testing functionality and find it extremely powerful.
  • 2 hours - Started fleshing out the base functionality. Came across Tyson Mote's implementation, which uses the BDFS algorithm, and used it as a frame of reference; I especially appreciated the RWMutex concurrency pattern. Implemented a batch request model to keep overall API requests to a minimum (see the batch-query sketch after this list); this model adheres to Wikipedia's published API limits. Added regular-expression matching against boring links to better filter out unwanted paths.
  • 1 hour - Implemented a REST API using gorilla/mux (see the handler sketch after this list). I could have used the native Go net/http package alone, but gorilla offers some helpful abstractions (mux/router). The REST API is accessible by running wikiracer with the -serve switch. It binds to port 8686 by default; the address and port can be passed as a parameter to the -serve switch, e.g. wikiracer -serve 0.0.0.0:4040.
  • 2 hours - Testing, writing tests, writing README, testing, more testing.
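
To illustrate the batch request model mentioned above: a single MediaWiki query can return the links of several titles at once. A sketch under that assumption (not the repository's code; a full version would also follow the API's continue tokens instead of stopping after the first batch):

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
    "strings"
)

// linksResponse models only the parts of the MediaWiki reply we need.
type linksResponse struct {
    Query struct {
        Pages map[string]struct {
            Title string `json:"title"`
            Links []struct {
                Title string `json:"title"`
            } `json:"links"`
        } `json:"pages"`
    } `json:"query"`
}

// batchLinks fetches the outgoing links of several pages in one API call,
// keeping the overall request count low.
func batchLinks(titles []string) (map[string][]string, error) {
    params := url.Values{
        "action":  {"query"},
        "format":  {"json"},
        "prop":    {"links"},
        "pllimit": {"max"},
        "titles":  {strings.Join(titles, "|")},
    }
    resp, err := http.Get("https://en.wikipedia.org/w/api.php?" + params.Encode())
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    var lr linksResponse
    if err := json.NewDecoder(resp.Body).Decode(&lr); err != nil {
        return nil, err
    }
    out := make(map[string][]string)
    for _, page := range lr.Query.Pages {
        for _, l := range page.Links {
            out[page.Title] = append(out[page.Title], l.Title)
        }
    }
    return out, nil
}

func main() {
    links, err := batchLinks([]string{"Pleiades", "OpenBSD"})
    if err != nil {
        panic(err)
    }
    for title, ls := range links {
        fmt.Printf("%s: %d links in the first batch\n", title, len(ls))
    }
}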
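
And a sketch of what the gorilla/mux HTTP layer might look like (the /race route, its query parameters, and the response shape are assumptions for illustration, not necessarily the repository's actual endpoint):

package main

import (
    "encoding/json"
    "log"
    "net/http"

    "github.com/gorilla/mux"
)

// raceHandler parses the endpoints from the query string and would hand
// them to the bidirectional search; here it echoes a stub path.
func raceHandler(w http.ResponseWriter, r *http.Request) {
    from := r.URL.Query().Get("from")
    to := r.URL.Query().Get("to")
    if from == "" || to == "" {
        http.Error(w, "from and to are required", http.StatusBadRequest)
        return
    }
    path := []string{from, to} // a real handler runs the search here
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string][]string{"path": path})
}

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/race", raceHandler).Methods("GET")
    log.Println("[WikiRacer] service running at 0.0.0.0:8686")
    log.Fatal(http.ListenAndServe("0.0.0.0:8686", r))
}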
