Parse Wikipedia page dumps to manage page entities and revisions.
- Get some data (resulting directly from the parsing, not just entities) and bring it to the professor
- Add a command line utility to specify the source of the dump(s)
- Add an importer for the configuration (Mongo, Spark folder, executor conf...)
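
A minimal sketch of how the command line utility and the configuration importer could fit together, assuming Python and a JSON configuration file. Every name here (`load_config`, `build_parser`, `--config`, and the keys `mongo_uri`, `spark_folder`, `executors`) is hypothetical and only illustrates the idea; the real settings are still to be decided.

```python
import argparse
import json
from pathlib import Path

# Hypothetical defaults: the key names and values are assumptions,
# not the project's actual settings.
DEFAULT_CONFIG = {
    "mongo_uri": "mongodb://localhost:27017",
    "spark_folder": "/opt/spark",
    "executors": 2,
}


def load_config(path=None):
    """Return the defaults, overridden by the keys of a JSON file if one is given."""
    config = dict(DEFAULT_CONFIG)
    if path is not None:
        with open(path, encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


def build_parser():
    """Build a parser that takes one or more dump paths and an optional config file."""
    parser = argparse.ArgumentParser(description="Parse Wikipedia page dumps.")
    parser.add_argument("dumps", nargs="+", type=Path,
                        help="path(s) to the dump file(s)")
    parser.add_argument("--config", type=Path, default=None,
                        help="optional JSON configuration file")
    return parser


def main(argv=None):
    """Parse the arguments and return the dump paths with the merged configuration."""
    args = build_parser().parse_args(argv)
    config = load_config(args.config)
    return args.dumps, config
```

Calling `main(["dump1.xml", "dump2.xml", "--config", "conf.json"])` (file names made up) would return the two dump paths and the defaults overridden by whatever `conf.json` contains. Keeping the defaults in one dictionary means a missing configuration file still gives a usable setup.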
For each new task:
- create a separate branch
- implement the task
- create a pull request
- TEST that nothing is broken
- merge the branch into master