Consider alternatives to Qrank #1

hugolpz · 2021-03-16T11:32:53Z

Simple suggestion. Your numbers seem to show views rather than rank, so Qviews or Qvisits would be more accurate.

brawer · 2021-03-17T08:25:11Z

Currently that’s true, but see Future work.

athalhammer · 2021-05-31T19:48:25Z

Nice work!
I'm running some experiments that compute PageRank on a Q-ID link graph combining ~321 Wikimedia language editions. If you are interested in discussing or collaborating on this topic, please let me know.

brawer · 2021-08-11T06:49:32Z

@athalhammer, that’s super interesting! Basically, PageRank is a mathematical model that predicts hypothetical popularity from the shape of the link graph, whereas QRank trivially counts beans to measure actual popularity from log analysis. Super curious, have you by chance compared the two approaches to ranking? It would be really interesting to know how well the mathematical model predicts reality in practice, and where exactly the differences are.

Also curious, have you tried seeding your PageRank computation with measured numbers from QRank, and then running the classic iterative algorithm until it converges? In the classic version of PageRank, all nodes start equal; but obviously you could also seed your initial weights by taking empirical measurements (like those from QRank) into account. In case you’ve tried that, what were your findings?

Personally, I could well imagine that the QRank pipeline eventually gets extended with some graph algorithm(s). Not really to replace the current log analysis; why use a prediction model if we can measure reality? However, the majority of entities in Wikidata will never get a corresponding page in Wikipedia (or in Wikisource, Wikitravel, Wikispecies, Wikibooks, etc.); and currently such page-less entities are not getting ranked. Finding a way to fix this problem would be quite interesting. For example, once the world’s citation graph gets stored in Wikidata, it might be interesting to enrich QRank with a simulated random walk along the citation graph. Likewise, it might be interesting to propagate some of an artist’s fame to their works, in particular for works that have no Wikipedia page.
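
To make the last idea a bit more concrete, here is a minimal sketch of one possible propagation step. It is not part of the QRank pipeline; the function name `propagate_fame`, the placeholder IDs, the made-up view counts and the `share` factor of 0.1 are all assumptions chosen purely for illustration.

```python
def propagate_fame(views, creator_of, share=0.1):
    """Give each work without measured views a fraction of its creator's views.

    views:      entity ID -> measured view count (only entities that have a page)
    creator_of: work ID -> creator ID
    share:      arbitrary fraction of the creator's views passed on (an assumption)
    """
    scores = dict(views)
    for work, creator in creator_of.items():
        # Only fill gaps: measured numbers are never overwritten.
        if work not in scores and creator in views:
            scores[work] = share * views[creator]
    return scores


# Placeholder IDs and made-up numbers, for illustration only.
views = {"artist": 1000, "work_with_page": 50}
creator_of = {"work_with_page": "artist", "work_without_page": "artist"}
scores = propagate_fame(views, creator_of)
```

Works that already have measured views keep them; only the page-less work receives a derived score. How large `share` should be, and whether a single hop is enough, would have to be determined empirically.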

Anyhow, if you’re still interested, let’s talk! Feel free to contact me at sascha@brawer.ch.

athalhammer · 2021-08-12T14:44:02Z

Thanks @brawer, for your detailed answer and also for your questions!

> Super curious, have you by chance compared the two approaches to ranking?

During my PhD I ran some experiments comparing page-view-based rankings with link-based (PageRank) ones. Not surprisingly, pages such as 'Rihanna' and 'Eminem' show up more often in the top 100 of the page-view-based ranking. As a matter of fact, when page-view statistics are used for auto-suggest or recommendation, they often have a self-amplifying effect. The current auto-suggest interface of English Wikipedia actually does that (it is page-view-based, to the best of my knowledge). I'm not sure how they deal with this self-amplification, or whether they see it as an issue and address it at all.

> Also curious, have you tried seeding your PageRank computation with measured numbers from QRank, and then run the classic iterative algorithm until it converges?

That's a good question. It would be one way of combining the two approaches. My initial thought on this is that the output of this will depend a lot on the number of PageRank iterations: With each iteration it will converge a bit more towards the equally seeded PageRank. There are other ways of combining this without this dependence on the iterations (e.g., Personalized PageRank).
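
The difference between seeding and personalization can be illustrated with a small sketch. This is a plain textbook power iteration, not danker's or QRank's actual code; the toy graph, the view counts and all function names are made up for illustration. `seed` only changes the starting vector, whereas `teleport` changes where the random surfer jumps to, which is what Personalized PageRank does.

```python
def pagerank(links, seed=None, teleport=None, damping=0.85, iterations=200):
    """Plain power iteration; `links` maps every node to its list of out-links."""
    nodes = sorted(links)
    uniform = {v: 1.0 / len(nodes) for v in nodes}
    rank = dict(seed or uniform)      # starting weights (the "seed")
    teleport = teleport or uniform    # where the random surfer jumps to
    for _ in range(iterations):
        new = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            targets = links[v] or nodes   # dangling nodes spread their rank evenly
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        rank = new
    return rank


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


# Made-up toy graph and made-up view counts, for illustration only.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
views = normalize({"A": 1, "B": 1, "C": 1, "D": 97})

classic = pagerank(links)                        # equal start, uniform teleport
classic_1 = pagerank(links, iterations=1)        # same, stopped after one step
seeded_1 = pagerank(links, seed=views, iterations=1)
seeded = pagerank(links, seed=views)             # view-based start, run to convergence
personalized = pagerank(links, teleport=views)   # view-based teleport vector
```

After a single iteration, `seeded_1` still differs clearly from `classic_1`, but after many iterations `seeded` and `classic` agree up to floating-point error, so the seed is forgotten. In `personalized`, node D keeps a higher score than in `classic` even at convergence, because the view counts enter the fixed point itself rather than just the starting point.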

> However, the majority of entities in Wikidata will never get a corresponding page in Wikipedia (or in Wikisource, Wikitravel, Wikispecies, Wikibooks, etc.); and currently such page-less entities are not getting ranked.

That's an issue danker is facing as well... I thought about ways of combining a pure Wikidata-based PageRank with the Wikipedia-based ones that danker provides. At this point the same question arises again: What would be a good way of combining these?

Overall, I believe that exploring ways of combining different rankings (e.g., seeding as you suggest) and measuring the differences as well as trying to find something like an "optimal" ranking could probably be a whole PhD thesis in itself (or multiple ones)... In any case, I will get in touch.
