DominicBurkart/wikipedia-revisions-server

  _      ___ __    _   ___           _     _                 ____                    
 | | /| / (_) /__ (_) / _ \___ _  __(_)__ (_)__  ___  ___   / __/__ _____  _____ ____
 | |/ |/ / /  '_// / / , _/ -_) |/ / (_-</ / _ \/ _ \(_-<  _\ \/ -_) __/ |/ / -_) __/
 |__/|__/_/_/\_\/_/ /_/|_|\__/|___/_/___/_/\___/_//_/___/ /___/\__/_/  |___/\__/_/   

This project serves Wikipedia revision differences from a given time period. It takes an HTTP request with a start datetime and an end datetime, and sends the matching revisions as a brotli-compressed stream. Each line of the response stream is a JSON-encoded revision.
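
As a rough illustration of consuming such a stream, here is a minimal client-side sketch in Python. It is not taken from this project: the request URL (including how the start and end datetimes are passed) is left entirely to the caller because the endpoint is not documented here, the helper names are made up for this example, and the third-party `brotli` package is assumed to be installed for decompression.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_json_lines(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per line of an already-decompressed stream.

    A line may be split across two chunks, so incomplete data is buffered
    until its terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # final line without a trailing newline
        yield json.loads(buffer)


def decompress_brotli(compressed_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Incrementally decompress a brotli stream (assumes `pip install brotli`)."""
    import brotli  # third-party; imported lazily so the parser above has no dependency

    decompressor = brotli.Decompressor()
    for chunk in compressed_chunks:
        yield decompressor.process(chunk)


def read_chunks(url: str, size: int = 65536) -> Iterator[bytes]:
    """Read the raw (still compressed) response body in fixed-size chunks."""
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(size)
            if not chunk:
                break
            yield chunk


def fetch_revisions(url: str) -> Iterator[dict]:
    """Stream revisions from `url`, a caller-supplied (hypothetical) request URL."""
    return iter_json_lines(decompress_brotli(read_chunks(url)))
```

Decompression and line parsing are kept separate so that the response is processed incrementally and never has to fit in memory at once, which matters for large time ranges.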

documentation coming soon 🥧⏲️

Build the project:

docker build -t wikipedia-revisions-server .

Run (specifying working & storage directories, plus dump date):

docker run -it -v /local/path:/fast_dir -v /other/local/path:/big_dir wikipedia-revisions-server -d 20200601

If the data and index files have already been built, you can start the server without having to rebuild:

docker run -it -v /local/path:/fast_dir -v /other/local/path:/big_dir wikipedia-revisions-server

To find a valid date (the -d parameter), go to the wiki archives and find a date with .xml.bz2 files available to download under "All pages with complete page edit history".

See the Python wikipedia revisions repo for download targets and schemes other than those available here.

Thanks to JetBrains for providing an open source license to their IDEs for developing this project!