charlie-wt edited this page Sep 1, 2017 · 8 revisions

The ranker module contains the main decision-making process for doing simulated readings.

It contains a number of ranking functions. Each takes a list of visible pages (as well as information on the user and story) and outputs a dictionary of scores describing how 'desirable' each page is. This dictionary is then passed on to a decider function, which chooses which page to go to.

The scores given to pages must add up to 1, meaning they can represent the probabilities of choosing each page (if given to dc.rand).
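As a rough illustration of the sum-to-1 requirement, the sketch below scales raw scores into probabilities and then samples a page from them. The names normalise and choose_at_random are hypothetical stand-ins and not part of this module; dc.rand may well work differently.

```python
import random


def normalise(scores):
    """Scale non-negative raw scores so that they sum to 1."""
    if not scores:
        return {}
    total = sum(scores.values())
    if total == 0:
        # No page stands out, so fall back to a uniform distribution.
        return {page: 1.0 / len(scores) for page in scores}
    return {page: score / total for page, score in scores.items()}


def choose_at_random(probabilities, rng=random):
    """Hypothetical decider: pick one page, weighted by its probability."""
    pages = list(probabilities)
    weights = [probabilities[page] for page in pages]
    return rng.choices(pages, weights=weights, k=1)[0]


raw = {"page_a": 3.0, "page_b": 1.0}
probs = normalise(raw)          # page_a gets 0.75, page_b gets 0.25
chosen = choose_at_random(probs)
```

Because the scores sum to 1, they can be used directly as sampling weights without any further scaling.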

These functions are designed to be passed to traverser.traverse, but you can define your own ranker, provided it takes the same parameters.
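A minimal sketch of a custom ranker might look like the following. The (user, story, pages, cache=None) parameters come from the rankers documented on this page. Everything else is an assumption made for illustration: the Page stand-in, its name and title attributes, and keying the output dictionary by page name.

```python
from collections import namedtuple

# Hypothetical stand-in for the project's page objects.
Page = namedtuple("Page", ["name", "title"])


def short_title(user, story, pages, cache=None):
    """Hypothetical ranker: gives a higher score to pages with shorter titles."""
    if not pages:
        return {}
    raw = {page.name: 1.0 / (1 + len(page.title)) for page in pages}
    total = sum(raw.values())
    # Divide by the total so that the scores add up to 1.
    return {name: score / total for name, score in raw.items()}


pages = [Page("p1", "Pier"), Page("p2", "Lighthouse")]
scores = short_title(user=None, story=None, pages=pages)
```

The real rankers may key their output differently, so check an existing ranker's return value before relying on this shape.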


using a ranker in a traversal

When performing a simulated reading (or many) with traverser.traverse, simply set the ranker argument to your ranker of choice from this module.

using a machine learning ranker

The machine learning based rankers (linreg, logreg & nn) require an extra step: first, you must call the corresponding function in the ml module to train the model.

using manually created weights in a machine learning ranker

There is a manual_heuristics variable in the ranker module, filled with manually tuned heuristic weights for use by the linear regression ranker linreg. However, you can change these values or create your own. The format is:

manual_heuristics = {
    'w': [
        1.0,    # weight of walk dist heuristic
        2.0,    # weight of visits heuristic
        4.4,    # weight of altitude heuristic
        3.5,    # weight of points of interest heuristic
        7.0,    # weight of mention heuristic
        5.2,    # weight of walk dist ranking
        5.6,    # weight of visits ranking
        1.1,    # weight of altitude ranking
        8.2,    # weight of points of interest ranking
        3.6,    # weight of mention ranking
    ],
    'b': 0.3    # bias
}

The format above is for a linear regression. For a logistic regression, turn each value into a length-2 array: the first element is the heuristic's weight towards not choosing the page, and the second element is its weight towards choosing the page.
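To make the two formats concrete, here is a sketch of how such weights could be combined with a page's ten inputs. It assumes the textbook forms: a weighted sum plus bias for linear regression, and a two-class softmax for logistic regression. The function names, the length-2 bias, and the logistic weight values are placeholders chosen for illustration; the module's actual computation may differ.

```python
import math

linreg_weights = {
    "w": [1.0, 2.0, 4.4, 3.5, 7.0, 5.2, 5.6, 1.1, 8.2, 3.6],
    "b": 0.3,
}

# Logistic format: each entry is [weight towards not choosing, weight towards choosing].
# Negating the linear weights is just a way to produce placeholder numbers.
logreg_weights = {
    "w": [[-w, w] for w in linreg_weights["w"]],
    "b": [0.0, 0.0],  # assumption: the bias also becomes a length-2 array
}


def linear_score(inputs, model):
    """Weighted sum of the inputs plus the bias."""
    return sum(w * x for w, x in zip(model["w"], inputs)) + model["b"]


def choose_probability(inputs, model):
    """Two-class softmax: probability of the 'choose' class."""
    logits = [
        sum(w[k] * x for w, x in zip(model["w"], inputs)) + model["b"][k]
        for k in range(2)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    return exps[1] / sum(exps)


inputs = [1.0] + [0.0] * 9  # only the first heuristic is active
score = linear_score(inputs, linreg_weights)
p_choose = choose_probability(inputs, logreg_weights)
```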

To use these in a ranker:

First: call normalise_inputs(paths_per_reading, cache=None, exclude_poi=False).

Then, assign your manual_heuristics to the model variable that matches your ranker:

  • ranker.linreg_model for a linear regression.
  • ranker.logreg_model for a logistic regression.
  • ranker.net_model for a neural network.

Then just use your chosen ranker as normal.


Functions

rand
rand(user, story, pages, cache=None)

Gives an equal score to every page.

dist
dist(user, story, pages, cache=None)

Gives a higher score to pages that are closer to the user, measured in a straight line.

walk_dist
walk_dist(user, story, pages, cache=None)

Gives a higher score to pages that are closer to the user, measured along roads using the OSRM routing engine. Note: In order to use this ranker, an OSRM HTTP server must be running at localhost:5000. The osrm-py python module must also be installed.

visits
visits(user, story, pages, cache=None)

Gives a higher score to pages that have been visited fewer times so far in the current reading.

alt
alt(user, story, pages, cache=None)

Gives a higher score to pages with a lower altitude. Note: In order to use this ranker, the 'SRTM.py' module must be installed.

poi
poi(user, story, pages, cache=None)

Gives a higher score to pages surrounded by more points of interest.

mentioned
mentioned(user, story, pages, cache=None)

Gives a higher score to pages whose titles are mentioned more often in the title & text of the current page.

logreg
logreg(user, story, pages, cache=None)

Uses a logistic regression model to give scores to pages, based on almost every heuristic and ranker. Note: In order to use this ranker, you must first run ml.logreg to train a logistic regression.

linreg
linreg(user, story, pages, cache=None)

Uses a linear regression model to give scores to pages, based on almost every heuristic and ranker. Note: In order to use this ranker, you must first run ml.linreg to train a linear regression.

nn
nn(user, story, pages, cache=None)

Uses a neural network model to give scores to pages, based on almost every heuristic and ranker. Note: In order to use this ranker, you must first run ml.nn to train a neural network.

normalise_inputs
normalise_inputs(paths_per_reading, cache=None, exclude_poi=False)

Sets up input normalisation for when using manual_heuristics with a regression ranker. If you want to use manual_heuristics in your ranking function, this function must be called first.


auxiliary & helper functions

rank_by
rank_by(heuristic, inverse=False, no_loop=False)

Outputs a function that ranks pages according to the output of heuristic.

If you define a new heuristic function for pages, this can be an easy way to make a ranker out of it.

The heuristic function must have the parameters page, user, story and cache.

If inverse is True, pages with a higher output from heuristic will be given a better ranking (such as with poi; more points of interest is better). If it is False, pages with a lower output from heuristic will be given a better ranking (such as with dist; less distance to the user is better).

Setting no_loop to True applies an extra filter to the ranking function, which eliminates pages that have already been visited.
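The behaviour described above can be sketched roughly as follows. This is not the module's implementation: rank_by_sketch is a hypothetical name, the heuristic is assumed to return non-negative numbers, visited pages are assumed to be available as user.visited, and reflecting the values to flip the ordering is just one simple choice.

```python
from types import SimpleNamespace


def rank_by_sketch(heuristic, inverse=False, no_loop=False):
    """Build a ranker from a heuristic(page, user, story, cache) function."""
    def ranker(user, story, pages, cache=None):
        candidates = list(pages)
        if no_loop:
            # Assumption: user.visited holds the pages already read.
            candidates = [p for p in candidates if p not in user.visited]
        if not candidates:
            return {}
        values = {p: heuristic(p, user, story, cache) for p in candidates}
        if inverse:
            raw = values  # higher heuristic output is better
        else:
            # Lower output is better: reflect values so the ordering flips.
            hi, lo = max(values.values()), min(values.values())
            raw = {p: hi + lo - v for p, v in values.items()}
        total = sum(raw.values())
        if total == 0:
            return {p: 1.0 / len(raw) for p in raw}
        return {p: r / total for p, r in raw.items()}
    return ranker


def title_length(page, user, story, cache):
    """Hypothetical heuristic; pages are plain strings in this demo."""
    return len(page)


user = SimpleNamespace(visited={"a"})
pages = ["a", "bbb"]
lower_better = rank_by_sketch(title_length)(user, None, pages)
higher_better = rank_by_sketch(title_length, inverse=True)(user, None, pages)
unvisited_only = rank_by_sketch(title_length, no_loop=True)(user, None, pages)
```

In this demo, "a" scores 0.75 when lower is better and 0.25 when inverse is True, and it drops out entirely with no_loop because it has already been visited.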

_net
_net(x, w, b)

The actual neural network used by the nn ranker.

x is the array of inputs, as output by ml.make_input.

w is the weight array of the network (signifying the weight of each neuron).

b is the bias array of the network.
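For orientation only, the sketch below shows a generic fully connected forward pass with the same (x, w, b) parameters. The layer layout is an assumption, since this page does not describe the network's architecture: w and b are taken to be lists with one entry per layer, hidden layers use ReLU, and the final layer is linear. The real _net may be structured quite differently.

```python
def net_sketch(x, w, b):
    """Forward pass of a small fully connected network (illustrative only)."""
    activation = list(x)
    last = len(w) - 1
    for layer, (weights, biases) in enumerate(zip(w, b)):
        # weights[j] holds the incoming weights of neuron j in this layer.
        outputs = [
            sum(wi * ai for wi, ai in zip(neuron, activation)) + bias
            for neuron, bias in zip(weights, biases)
        ]
        # ReLU on hidden layers; the final layer is left linear.
        if layer == last:
            activation = outputs
        else:
            activation = [max(0.0, o) for o in outputs]
    return activation


x = [1.0, 2.0]                                  # two inputs
w = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 2.0]]]   # hidden layer (2 neurons), output layer (1 neuron)
b = [[0.0, 0.0], [0.1]]
output = net_sketch(x, w, b)
```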