Replies: 7 comments
-
Here's some info I have shared on various ldjam threads
Here is a scraped (public) CSV of results. I don't have access to any private data, but I did scrape some a couple of hours ago.
Here is some data from the whole event (also on the LD site). The vast majority of games have fewer than 50 ratings, so it's not surprising that the winning games have fewer than 50 ratings.
In general, games with more ratings score higher
This is barely true; the trend is pretty flat (scatter plot).
Quantized rating values
At the low end of ratings received, the overall rating is quite quantized. This is due to the 5-star rating system: it's a discrete system, and if you only have a few votes there just aren't many possible values. As expected, the more ratings you receive, the less of an effect this has.
The gap at 20 ratings
There is a noticeable gap in the chart where ratings received = 20. This is expected. Reasons for this include user behaviour (people stop rating when they reach 20-25 ratings given), the smart filter, and the danger filter.
Leaning?
The distribution seems to lean towards higher overall ratings when the number of ratings received is high. Looking at it on a graph, games with more ratings get a higher overall score (barely). But don't assume they're getting a better overall score just because they have been rated more.
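To illustrate the quantization point, here is a minimal Python sketch (mine, not from the thread) counting how many distinct average scores are even possible from n integer votes on a 1-5 scale:

```python
# With n integer votes on a 1-5 scale, the sum ranges from n to 5n,
# so there are only 4n + 1 possible average values.
def possible_averages(n: int) -> list[float]:
    return [total / n for total in range(n, 5 * n + 1)]

for n in (1, 3, 5, 20):
    print(f"{n:>2} votes -> {len(possible_averages(n))} possible average scores")
```

With one vote there are only 5 possible overall scores, which is why the low end of the scatter plot looks so banded.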
-
I had some suggestions for the scoring way back when too: #986
-
@zwrawr without knowing anything about the game,
-
It's been on the infinite todo list for a long time, but the "fix" I've been considering exploring is something called the Bayesian average: https://en.wikipedia.org/wiki/Bayesian_average In essence, it gives a lower weight to things with fewer samples. Switching to this has the added benefit of letting us give a reasonable score even if we don't have enough ratings. Using a different averaging algorithm is only half the battle though. Today we say "get 20 ratings", but similar to "upping the minimum", I think it would be helpful to report how much scores are being skewed by not having enough ratings. I do like the idea of changing the constant based on the number of submissions, but it's not going to be practical to ask people to play 50 games.
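For reference, a minimal sketch of the Bayesian average described on that Wikipedia page; the constant C and prior mean m below are placeholder values I picked, not anything LD has committed to:

```python
def bayesian_average(ratings: list[float], c: float = 20.0, m: float = 3.0) -> float:
    """Bayesian average: pull the raw mean towards a prior mean m,
    with strength set by the constant C relative to the sample size."""
    return (c * m + sum(ratings)) / (c + len(ratings))

# A perfect 5.0 from only 3 votes is pulled well below a 4.5 from 30 votes.
print(bayesian_average([5.0] * 3))   # ~3.26
print(bayesian_average([4.5] * 30))  # ~3.90
```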
-
@mikekasprzak that seems like a reasonable adjustment, and quite similar to my angry voter suggestion, but I think I would hesitate to give individuals an indication of how much they are affected by the algorithm, as it would partially leak individual voter information. And if your game is underperforming, wouldn't the algorithm boost your score? It would be very disheartening to see that while voting and playing. A general statement on how much games with X votes tend to be affected by the algorithm would be good, though.
-
No more than we do now. The constant would be the same for everyone, and it's not much different than saying there's a 20-vote minimum and you only have 12 of 20. We can exactly quantify how much a score is potentially being punished, and you might not care that you might be losing 0.05 points... at least until you realize some games win by 0.01 points right now.
No, it's the opposite. If you're underperforming, your scores get punished. This is what the constant does. If your number of samples falls beneath a constant of, say, 20, an average of 5.0 from 5 scores would be worth less than an average of 4.8 from 25 scores. Once you pass the constant, it works more like a normal average.
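Plugging those numbers into the Bayesian average sketch from earlier (again assuming the placeholder values C = 20 and a prior mean of 3.0) shows the effect:

```python
def bayesian_average(ratings: list[float], c: float = 20.0, m: float = 3.0) -> float:
    return (c * m + sum(ratings)) / (c + len(ratings))

print(bayesian_average([5.0] * 5))   # (20*3.0 + 25)  / 25 = 3.40 -- perfect but thin sample
print(bayesian_average([4.8] * 25))  # (20*3.0 + 120) / 45 = 4.00 -- larger sample wins
```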
-
It might not have been what you meant, but let's say the prior mean is 3.2 again, and let's say you refresh your game page right as the first vote is registered: you'd readily see whether the first person gave you a good or bad grade depending on how many points you are gaining / losing. If you then refresh the page as the second vote comes in, you could probably partially deduce how that person voted from how the effect of the algorithm shifted. As the constant C will probably be in the code, I imagine (though have not calculated) that one might even be able to say with a fair amount of certainty what the first and second votes were.
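To make the leak concrete: if C and the prior mean m are public, anyone can invert the displayed score and recover the running vote total exactly. A hypothetical sketch, reusing the same placeholder formula as above:

```python
# If score = (C*m + s) / (C + n) is displayed, then s = score * (C + n) - C*m.
# With n = 1 the recovered sum *is* the first vote.
C, M = 20.0, 3.2

def recover_vote_sum(displayed_score: float, n_votes: int) -> float:
    return displayed_score * (C + n_votes) - C * M

first_vote = 2.0
displayed = (C * M + first_vote) / (C + 1)  # what the page would show after one vote
print(recover_vote_sum(displayed, 1))       # 2.0 -- the first vote, exactly
```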
That's not the type of underperforming I meant. Say we use the median for LD48, and let's say it's 3.2, and you have 10 votes with an average of 2.2 on your LD49 game. Correctly, the algorithm would increase your game's rating somewhat. Now if you got to see that your game's score was boosted by 0.1 or something, you would know that your game is getting a below-average grade, and that might be detrimental to your continued participation in voting.
-
This issue is based on my experience with the Ludum Dare rating system, which is obviously subjective, and I might not be right in everything I write here.
To start off, I am not an expert in statistical mathematics, so I won't flex with my knowledge of formulas on how to calculate smart guys' stuff 😄
Problem
As a sample, I took the top 24 games in the Overall (Jam) category - just because that's what appears on the first page when you go to the results page.
The results seem a little bit skewed towards the less-voted games. As can be seen, 20-39 is the dominant section, with 58.3% of the top games being in this vote range. The graph I've linked is not an indicator on its own, because first we'd have to compare it to the global statistics of the Jam category. Unfortunately, this kind of data cannot be found on the Ludum Dare page, so if any site admin / developer could share some more info about that, it would be really appreciated! 🙂
For the sake of argument, we can assume that games with lower vote counts seem to do better in the rankings (if the data provided by the admins shows that the graph is similar to the global statistics of the game jam, then the majority of my point might not really be valid anymore, but I'll wait until that happens 😅).
Basically, a system that has a fixed number of votes needed to take part in the competition will usually favor the lower end of the spectrum in terms of gained votes - simply because, when only 20 votes are needed to participate, there will be luck involved. This is the reason why polls / scientific experiments are not done on small sample sizes - with a small vote count, there can be anomalies that skew the score dramatically and put a game in a surprisingly high or low spot in the rankings. Basically, the concept is: the more votes you gain, the more accurate your score is.
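A quick simulation makes the luck factor visible. This is a hypothetical sketch (the "true" score and vote noise are made up) showing how far the observed average can drift from a game's underlying quality at different vote counts:

```python
import random

random.seed(42)
TRUE_SCORE = 3.5  # hypothetical "true" quality of a game on the 1-5 scale

def simulated_average(n_votes: int) -> float:
    # Each voter gives an integer 1-5 star rating scattered around the true score.
    votes = [min(5, max(1, round(random.gauss(TRUE_SCORE, 1.0)))) for _ in range(n_votes)]
    return sum(votes) / n_votes

for n in (20, 50, 200):
    averages = [simulated_average(n) for _ in range(10_000)]
    spread = max(averages) - min(averages)
    print(f"{n:>3} votes: observed averages ranged over {spread:.2f} points")
```

The spread shrinks as the vote count grows, which is exactly the accuracy argument above.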
Solutions
There are, however, solutions to this problem, and they might not be that hard to implement (hopefully).
Solution 1: Change the number of votes needed to participate
This number could be dynamic, depending on the number of submissions in each Ludum Dare edition, but overall it feels like with 20 ratings there's a significant luck impact on the final score of the game.
As can be seen on the LDjam page, there is a graph showing the exact number of games that get fewer than 20 ratings.
Based on my own research, the vast majority of games that got 0-5 votes are games that simply do not work at all, can't be accessed due to a developer's mistake, or don't even have links to the games themselves.
My suggestions are:
Obviously, the con of increasing the number of votes is that fewer games would get into the final ranking, but the dynamic amount could be tweaked so the requirement wouldn't be so harsh.
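A minimal sketch of what a dynamic requirement could look like; the scaling fraction and bounds here are invented for illustration, not proposed values:

```python
def required_votes(n_submissions: int, fraction: float = 0.01,
                   floor: int = 20, ceiling: int = 50) -> int:
    """Scale the vote requirement with event size, clamped to a sane range."""
    return max(floor, min(ceiling, round(n_submissions * fraction)))

print(required_votes(1500))  # 20 for a small event
print(required_votes(3500))  # 35 for a larger one
```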
Solution 2: Make votes from people with high feedback karma more meaningful
This might not be a direct solution to the problem I've mentioned above, but I feel like it would generally be a good thing to encourage people not only to rate games, but also to give feedback, in order to have a bigger impact on the ranking. After all, feedback is all we want, and the comments are the most fun to read, with many of them containing really valuable input for improving our games.
This would also mean that people who just give random ratings to games without playing them first wouldn't be as impactful as they are now.
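A hypothetical sketch of a karma-weighted average, assuming each vote comes paired with the voter's feedback karma (the weighting curve is made up for illustration):

```python
import math

def karma_weighted_average(votes: list[tuple[float, float]]) -> float:
    """Each vote is (rating, voter_karma). Higher karma counts for more,
    with diminishing returns so no single voter can dominate."""
    weighted_sum = sum(rating * (1.0 + math.log1p(karma)) for rating, karma in votes)
    total_weight = sum(1.0 + math.log1p(karma) for _, karma in votes)
    return weighted_sum / total_weight

# A zero-karma drive-by 1-star barely moves two high-karma 4-star votes:
# ~3.73 here, versus 3.0 for a plain unweighted average.
print(karma_weighted_average([(4.0, 50.0), (4.0, 80.0), (1.0, 0.0)]))
```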
Solution 3: Make some balance changes for games with high vote count
Due to the fact that a low vote count makes the whole ranking more luck-based, people with a high number of votes are in a much tighter spot in terms of getting high positions in the ranking. What I mean is that right now the whole system is counterintuitive - you are encouraged to give feedback and rate others' games, but having more than 40 votes makes you less favored by the current algorithm, so you wouldn't want your game to be recommended to others when you're already fulfilling the criteria for being in the ranking.
My suggestion is to add a change to the algorithm which makes the number of votes increase the score slightly (the growth could be logarithmic, with barely any difference between, for example, 100 and 200 votes). The system should compensate for the fact that getting a very high score (like 4.9) is basically impossible for a game with 150 ratings, but is possible for a game with 20 ratings - and the 20-ratings game will win in that case.
There are a few examples of how the system could work:
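For instance, here is a minimal sketch of one possible logarithmic bonus; the coefficient and reference point are invented for illustration, not tuned values:

```python
import math

def adjusted_score(raw_average: float, n_votes: int,
                   bonus_scale: float = 0.05, reference: int = 20) -> float:
    """Add a small, logarithmically growing bonus for votes beyond the minimum.
    Doubling the vote count always adds the same fixed amount, so going from
    100 to 200 votes is worth no more than going from 50 to 100."""
    bonus = bonus_scale * math.log2(max(n_votes, reference) / reference)
    return min(5.0, raw_average + bonus)

print(adjusted_score(4.70, 20))   # 4.70  -- no bonus at the minimum
print(adjusted_score(4.62, 150))  # ~4.77 -- heavy rating load compensated a bit
```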
This solution would definitely need the most tweaking and adjustments to work properly, but I feel like it could make the biggest positive impact on the overall ranking algorithm.
Conclusion
I hope all the solutions and all my points are clear. I am open to criticism and feedback! Sorry for the long issue though 😄