dlobba/parsewiki

parsewiki

Parse Wikipedia page dump to manage page entities and revisions.

Roadmap

  • Get some data (directly resulting from the parsing, not just entities) and go to the professor
  • Add a command line utility to specify the source of the dump(s)
  • Add an importer for the configuration (Mongo, Spark folder, executors conf...)
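
As a starting point for the command line utility item, here is a minimal sketch of how the dump source(s) might be passed in, using Python's standard `argparse` module. Nothing here exists in the repository yet: the `build_parser` and `parse_dump_paths` functions and the `dumps` argument are hypothetical names chosen for illustration, and the sketch assumes dumps are given as local file paths.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts one or more dump locations."""
    parser = argparse.ArgumentParser(
        prog="parsewiki",
        description="Parse Wikipedia page dump(s).",
    )
    # Hypothetical positional argument: one or more dump files.
    parser.add_argument(
        "dumps",
        nargs="+",
        type=Path,
        help="path(s) to the Wikipedia dump file(s) to parse",
    )
    return parser


def parse_dump_paths(argv=None):
    """Return the dump paths from argv (defaults to sys.argv[1:])."""
    args = build_parser().parse_args(argv)
    return args.dumps
```

Calling `parse_dump_paths(["first.xml", "second.xml"])` returns the two paths as `Path` objects, which the parsing step could then iterate over. Using `nargs="+"` keeps the "dump(s)" plural from the roadmap item: at least one source is required, and several may be given.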

How to...

For each new task:

  • create a separate branch
  • implement the task
  • create a pull request
  • TEST that nothing is broken
  • merge the branch into master
