- Integrated with readthedocs
- Integrated with Travis CI
- Improved documentation
- Cleaned up sorts
- Added `script/maria2csv.py` for converting MySQL/MariaDB dumps to CSV format.
- Major rework of the `script/createlinks.sh` script: switched from column-number-based access to column-name-based access (this now covers wikis with a different DB layout, i.e. columns that are not in the standard order).
- Removed the second dictionary: `i+1` PageRank score positions are now kept, together with their iteration number, in a single dictionary.
In previous versions of this code there was no strict separation between iterations. In that case the order of the nodes starts to play a role: nodes with higher numbers are updated later and can therefore draw on mostly already-updated scores from their incoming links. Over multiple iterations this could introduce a skew that we want to avoid.
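The separation between iterations can be sketched as follows. This is a minimal illustration, not danker's actual code: scores for iteration `i` and `i+1` live in one dictionary keyed by `(node, parity)`, and each pass reads only from slot `i` while writing to slot `i+1`, so node order cannot influence the result.

```python
# Minimal sketch (not danker's actual implementation) of keeping the
# scores of iterations i and i+1 in a single dictionary while still
# separating the two iterations strictly.

DAMPING = 0.85

def pagerank(incoming, iterations=40):
    """incoming maps each node to the list of nodes linking to it."""
    nodes = list(incoming)
    out_degree = {n: 0 for n in nodes}
    for sources in incoming.values():
        for s in sources:
            out_degree[s] += 1

    # One dictionary for both iterations, keyed by (node, iteration % 2).
    scores = {(n, 0): 1 - DAMPING for n in nodes}

    for i in range(iterations):
        cur, nxt = i % 2, (i + 1) % 2
        for n in nodes:
            rank = sum(scores[(s, cur)] / out_degree[s] for s in incoming[n])
            # Write only to slot `nxt`; slot `cur` stays untouched during
            # the pass, so later nodes cannot see already-updated scores.
            scores[(n, nxt)] = (1 - DAMPING) + DAMPING * rank

    final = iterations % 2
    return {n: scores[(n, final)] for n in nodes}
```

Because slot `i` is never written during a pass, feeding the same graph in a different node order yields identical scores.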
Wikipedia categories were not considered in the previous version. Now these important pages (all of which also have Wikidata Q-IDs) are reflected in the computations as well.
Link files are now compressed after computation, which saves disk space. For the ALL option, the output also contains some statistics (e.g., the number of links per language).
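The compression step can be reproduced with Python's standard `gzip` module. The helper below is a sketch under that assumption, not danker's actual code, and the file name in the usage example is made up.

```python
import gzip
import os
import shutil

def compress(path):
    """Gzip-compress `path` to `path + '.gz'` and delete the original."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams the file, low memory use
    os.remove(path)  # keep only the compressed copy to save disk space

# Usage (illustrative file name):
#   compress("2019-05-01.allwiki.links")
```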
We release the first stable version of danker. Current features include:
- Compute PageRank on any Wikipedia language edition.
- Compute PageRank with the BIGMEM option (faster, but uses more memory).
- Compute PageRank over the union set (bag semantics) of links of ALL Wikipedia language editions.
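"Bag semantics" in the ALL option means that duplicate links across language editions are kept rather than deduplicated. A minimal sketch, assuming a hypothetical input shape (language code mapped to a list of source/target Q-ID pairs, which is illustrative and not danker's actual file format):

```python
from itertools import chain

def union_links(per_language_links):
    """Bag-semantics union: a link present in several language editions
    is kept once per edition, so it contributes to PageRank that many
    times instead of once."""
    return list(chain.from_iterable(per_language_links.values()))

# A link shared by English and German Wikipedia appears twice:
links = union_links({
    "enwiki": [("Q1", "Q2"), ("Q2", "Q3")],
    "dewiki": [("Q1", "Q2")],
})
```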