Metafacture-mediawiki is a plugin for Metafacture.
The modules in Metafacture-Mediawiki can be divided into three groups.
These modules provide MediaWiki XML and wikitext parsing. They create and augment WikiPage objects.
WikiXmlHandler parses a MediaWiki XML document and emits a WikiPage object for every page found.
WikiTextParser uses Sweble to parse the wikitext in a WikiPage object and attaches the abstract syntax tree (AST) to the object.
Please note: Extractors are called analyzers in the code. The code will be updated with the next major revision (see issue #2), but until then the documentation is ahead of the code.
The extractors extract information from the different representations of a wiki page in a WikiPage object and turn this information into a Metafacture event stream.
AuthorityLinkExtractor extracts authority file links (GND, LOC, IMDB, VIAF) from Wikipedia articles.
LinkExtractor extracts all internal links in a wiki page from an AST.
SimpleLinkExtractor extracts links from a wiki page using regular expressions.
TemplateExtractor extracts all templates whose name matches a pattern from a wiki page.
MultiExtractor runs a list of extractors and merges the results into a single record. Additionally, it makes sure that each extractor receives a WikiPage containing the representations of the wikitext it requires.
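As a rough illustration of regular-expression-based extraction in the spirit of SimpleLinkExtractor and TemplateExtractor, the following self-contained Java sketch pulls internal link targets and template names out of raw wikitext and merges both results into one record, loosely mirroring what MultiExtractor is described as doing. It does not use the Metafacture-Mediawiki API: the class WikitextExtractionSketch, its methods, the two patterns, and the record keys "links" and "templates" are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: none of these names come from Metafacture-Mediawiki.
final class WikitextExtractionSketch {

    // [[Target]] or [[Target|label]]; group 1 captures the link target.
    static final Pattern INTERNAL_LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // {{Name}} or {{Name|args}}; group 1 captures the template name.
    static final Pattern TEMPLATE =
            Pattern.compile("\\{\\{\\s*([^|}]+?)\\s*(?:\\||\\}\\})");

    // Collects group 1 of every match of the pattern, in document order.
    static List<String> findAll(Pattern pattern, String wikitext) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(wikitext);
        while (matcher.find()) {
            found.add(matcher.group(1).trim());
        }
        return found;
    }

    // Runs both hypothetical extractors and merges their output into one record.
    static Map<String, List<String>> extractRecord(String wikitext) {
        Map<String, List<String>> record = new LinkedHashMap<>();
        record.put("links", findAll(INTERNAL_LINK, wikitext));
        record.put("templates", findAll(TEMPLATE, wikitext));
        return record;
    }

    public static void main(String[] args) {
        String wikitext = "{{Infobox city|name=Berlin}} '''Berlin''' is the capital of "
                + "[[Germany]] and lies on the [[Spree|river Spree]].";
        System.out.println(extractRecord(wikitext));
    }
}
```

Plain patterns like these ignore nested templates, links inside comments or nowiki sections, and similar wikitext corner cases, which is presumably why an AST-based LinkExtractor exists alongside the regular-expression variant.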
These modules help working with WikiPage objects.
AstToJson adds a serialised representation of an AST to a WikiPage object.
JsonToAst adds an AST, reconstructed from a serialised representation, to a WikiPage object.