Replies: 7 comments
-
Here's some info I have shared on various ldjam threads
Here is a scraped (public) CSV of results. I don't have access to any private data, but I did scrape some a couple of hours ago.
Here is some data from the whole event (also on the LD site). The vast majority of games have fewer than 50 ratings, so it's not surprising that the winning games have fewer than 50 ratings.
In general, games with more ratings score higher
This is barely true; the trend is pretty flat (scatter plot).
Quantized rating values
At the low end of ratings received, the overall rating is quite quantized. This is due to the 5-star rating system: it's a discrete system, and if you only have a few votes there just aren't many possible values. As expected, the more ratings you receive, the less of an effect this has.
The gap at 20 ratings
There is a noticeable gap in the chart where ratings received = 20. This is expected. Reasons for this include user behaviour (people stop rating when they reach 20-25 ratings given), the smart filter, and the danger filter.
Leaning?
The distribution seems to lean towards higher overall ratings when the number of ratings received is high. Looking at it on a graph, games with more ratings get a higher overall score (barely). But don't assume they're getting a better overall score just because they have been rated more.
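To illustrate the quantization point, here is a minimal Python sketch (mine, not from the thread) counting how many distinct average scores are even possible from n integer votes on a 1-5 scale:

```python
# With n integer votes on a 1-5 scale, the sum ranges from n to 5n,
# so there are only 4n + 1 possible average values.
def possible_averages(n: int) -> list[float]:
    return [total / n for total in range(n, 5 * n + 1)]

for n in (1, 3, 5, 20):
    print(f"{n:>2} votes -> {len(possible_averages(n))} possible average scores")
```

With one vote there are only 5 possible overall scores, which is why the low end of the scatter plot looks so banded.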
-
I had some suggestions for the scoring way back when too: #986
-
@zwrawr without knowing anything about the game,
-
It's been on the infinite todo list for a long time, but the "fix" I've been considering exploring is something called the Bayesian average: https://en.wikipedia.org/wiki/Bayesian_average In essence, it gives a lower weight to things with fewer samples. Switching to this has the added benefit of letting us give a reasonable score even if we don't have enough ratings. Using a different averaging algorithm is only half the battle though. Today we say "get 20 ratings", but similar to "upping the minimum", I think it would be helpful to report how much scores are being skewed by not having enough ratings. I do like the idea of changing the constant based on the number of submissions, but it's not going to be practical to ask people to play 50 games.
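For reference, a minimal sketch of the Bayesian average described on that Wikipedia page; the constant C and prior mean m below are placeholder values I picked, not anything LD has committed to:

```python
def bayesian_average(ratings: list[float], c: float = 20.0, m: float = 3.0) -> float:
    """Bayesian average: pull the raw mean towards a prior mean m,
    with strength set by the constant C relative to the sample size."""
    return (c * m + sum(ratings)) / (c + len(ratings))

# A perfect 5.0 from only 3 votes is pulled well below a 4.5 from 30 votes.
print(bayesian_average([5.0] * 3))   # ~3.26
print(bayesian_average([4.5] * 30))  # ~3.90
```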
-
@mikekasprzak that seems like a reasonable adjustment, and quite similar to my angry voter suggestion, but I think I would hesitate to give individuals an indication of how much they are affected by the algorithm, as it would partially leak individual voter information. And if your game is underperforming, wouldn't the algorithm boost your score? It would be very disheartening to see that while voting and playing. A general statement on how much games with X votes tend to be affected by the algorithm would be good, though.
-
No more than we do now. The constant would be the same for everyone, and it's not much different than saying there's a 20-vote minimum and you only have 12 of 20. We can exactly quantify how much a score is potentially being punished, and you might not care that you might be losing 0.05 points... at least until you realize some games win by 0.01 points right now.
No, it's the opposite. If you're underperforming, your scores get punished. This is what the constant does. If your number of samples falls beneath a constant of, say, 20, an average of 5.0 from 5 scores would be worth less than an average of 4.8 from 25 scores. Once you pass the constant, it works more like a normal average.
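Plugging those numbers into the Bayesian average sketch from earlier (again assuming the placeholder values C = 20 and a prior mean of 3.0) shows the effect:

```python
def bayesian_average(ratings: list[float], c: float = 20.0, m: float = 3.0) -> float:
    return (c * m + sum(ratings)) / (c + len(ratings))

print(bayesian_average([5.0] * 5))   # (20*3.0 + 25)  / 25 = 3.40 -- perfect but thin sample
print(bayesian_average([4.8] * 25))  # (20*3.0 + 120) / 45 = 4.00 -- larger sample wins
```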
-
It might not have been what you meant, but let's say the prior mean is 3.2 again, and let's say you refresh your game page right as the first vote is registered: you'd readily see whether the first person gave you a good or bad grade depending on how many points you are gaining / losing. If you then refresh the page as the second vote comes in, you could probably partially deduce how that person voted from how the effect of the algorithm shifted. As the constant C will probably be in the code, I imagine (though have not calculated) that one might even be able to say with a fair amount of certainty what the first and second votes were.
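To make the leak concrete: if C and the prior mean m are public, anyone can invert the displayed score and recover the running vote total exactly. A hypothetical sketch, reusing the same placeholder formula as above:

```python
# If score = (C*m + s) / (C + n) is displayed, then s = score * (C + n) - C*m.
# With n = 1 the recovered sum *is* the first vote.
C, M = 20.0, 3.2

def recover_vote_sum(displayed_score: float, n_votes: int) -> float:
    return displayed_score * (C + n_votes) - C * M

first_vote = 2.0
displayed = (C * M + first_vote) / (C + 1)  # what the page would show after one vote
print(recover_vote_sum(displayed, 1))       # 2.0 -- the first vote, exactly
```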
That's not the type of underperforming I meant. Say we use the median for LD48, and let's say it's 3.2, and you have 10 votes with an average of 2.2 on your LD49 game. Correctly, the algorithm would increase your game's rating somewhat. Now if you got to see that your game's score was boosted by 0.1 or something, you would know that your game is getting a below-average grade, and that might be detrimental to your continued participation in voting.
-
This issue is based on my experience with the Ludum Dare rating system, which is obviously subjective, and I might not be right in everything I write here.
To start off, I am not an expert in statistical mathematics, so I won't flex with my knowledge of formulas on how to calculate smart guys' stuff 😄
Problem
As a sample, I took the top 24 games in the Overall (Jam) category - just because that's what appears on the first page when you go to the results page.
The results seem a little bit skewed towards the less-voted games. As can be seen, 20-39 is the dominant section, with 58.3% of the top games being in this vote range. The graph I've linked is not an indicator on its own, because first we'd have to compare it to the global statistics of the Jam category. Unfortunately, this kind of data cannot be found on the Ludum Dare page, so if any site admin / developer could share some more info about that, it would be really appreciated! 🙂
For the sake of argument, we can assume that games with lower vote counts seem to do better in the rankings (if the data provided by the admins shows that the graph is similar to the global statistics of the game jam, then the majority of my point might not really be valid anymore, but I'll wait until that happens 😅).
Basically, a system that has a fixed number of votes needed to take part in the competition will usually favor the lower end of the spectrum in terms of gained votes - simply because, when only 20 votes are needed to participate, there will be luck involved. This is the reason why polls / scientific experiments are not done on small sample sizes - with a small vote count, there can be anomalies that skew the score dramatically and put a game in a surprisingly high or low spot in the rankings. Basically, the concept is: the more votes you gain, the more accurate your score is.
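A quick simulation makes the luck factor visible. This is a hypothetical sketch (the "true" score and vote noise are made up) showing how far the observed average can drift from a game's underlying quality at different vote counts:

```python
import random

random.seed(42)
TRUE_SCORE = 3.5  # hypothetical "true" quality of a game on the 1-5 scale

def simulated_average(n_votes: int) -> float:
    # Each voter gives an integer 1-5 star rating scattered around the true score.
    votes = [min(5, max(1, round(random.gauss(TRUE_SCORE, 1.0)))) for _ in range(n_votes)]
    return sum(votes) / n_votes

for n in (20, 50, 200):
    averages = [simulated_average(n) for _ in range(10_000)]
    spread = max(averages) - min(averages)
    print(f"{n:>3} votes: observed averages ranged over {spread:.2f} points")
```

The spread shrinks as the vote count grows, which is exactly the accuracy argument above.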
Solutions
There are, however, solutions to this problem, and they might not be that hard to implement (hopefully).
Solution 1: Change the number of votes needed to participate
This number could be dynamic, depending on the number of submissions in each Ludum Dare edition, but overall it feels like with 20 ratings there's a significant luck impact on the final score of the game.
As can be seen on the LDjam page, there is a graph showing the exact number of games that get fewer than 20 ratings.
Based on my own research, the vast majority of games that got 0-5 votes are games that simply do not work at all, can't be accessed due to a developer's mistake, or don't even have links to the games themselves.
My suggestions are:
Obviously, the con of increasing the number of votes is that fewer games would get into the final ranking, but the dynamic amount could be tweaked so the requirement wouldn't be so harsh.
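A minimal sketch of what a dynamic requirement could look like; the scaling fraction and bounds here are invented for illustration, not proposed values:

```python
def required_votes(n_submissions: int, fraction: float = 0.01,
                   floor: int = 20, ceiling: int = 50) -> int:
    """Scale the vote requirement with event size, clamped to a sane range."""
    return max(floor, min(ceiling, round(n_submissions * fraction)))

print(required_votes(1500))  # 20 for a small event
print(required_votes(3500))  # 35 for a larger one
```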
Solution 2: Make votes from people with high feedback karma more meaningful
This might not be a direct solution to the problem I've mentioned above, but I feel like it would generally be a good thing to encourage people not only to rate games, but also to give feedback, in order to have a bigger impact on the ranking. After all, feedback is all we want, and the comments are the most fun to read, with many of them containing really valuable input for improving our games.
This would also mean that people who just give random ratings to games without playing them first wouldn't be as impactful as they are now.
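A hypothetical sketch of a karma-weighted average, assuming each vote comes paired with the voter's feedback karma (the weighting curve is made up for illustration):

```python
import math

def karma_weighted_average(votes: list[tuple[float, float]]) -> float:
    """Each vote is (rating, voter_karma). Higher karma counts for more,
    with diminishing returns so no single voter can dominate."""
    weighted_sum = sum(rating * (1.0 + math.log1p(karma)) for rating, karma in votes)
    total_weight = sum(1.0 + math.log1p(karma) for _, karma in votes)
    return weighted_sum / total_weight

# A zero-karma drive-by 1-star barely moves two high-karma 4-star votes:
# ~3.73 here, versus 3.0 for a plain unweighted average.
print(karma_weighted_average([(4.0, 50.0), (4.0, 80.0), (1.0, 0.0)]))
```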
Solution 3: Make some balance changes for games with high vote count
Due to the fact that a low vote count makes the whole ranking more luck-based, people with a high number of votes are in a much tighter spot in terms of getting high positions in the ranking. What I mean is that right now the whole system is counterintuitive - you are encouraged to give feedback and rate others' games, but having more than 40 votes makes you less favored by the current algorithm, so you wouldn't want your game to be recommended to others when you're already fulfilling the criteria for being in the ranking.
My suggestion is to add a change to the algorithm which makes the number of votes increase the score slightly (the growth could be logarithmic, with barely any difference between, for example, 100 and 200 votes). The system should compensate for the fact that getting a very high score (like 4.9) is basically impossible for a game with 150 ratings, but is possible for a game with 20 ratings - and the 20-ratings game will win in that case.
There are a few examples of how the system could work:
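For instance, here is a minimal sketch of one possible logarithmic bonus; the coefficient and reference point are invented for illustration, not tuned values:

```python
import math

def adjusted_score(raw_average: float, n_votes: int,
                   bonus_scale: float = 0.05, reference: int = 20) -> float:
    """Add a small, logarithmically growing bonus for votes beyond the minimum.
    Doubling the vote count always adds the same fixed amount, so going from
    100 to 200 votes is worth no more than going from 50 to 100."""
    bonus = bonus_scale * math.log2(max(n_votes, reference) / reference)
    return min(5.0, raw_average + bonus)

print(adjusted_score(4.70, 20))   # 4.70  -- no bonus at the minimum
print(adjusted_score(4.62, 150))  # ~4.77 -- heavy rating load compensated a bit
```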
This solution would definitely need the most tweaking and adjustments to work properly, but I feel like it could make the biggest positive impact on the overall ranking algorithm.
Conclusion
I hope all the solutions and all my points are clear. I am open to criticism and feedback! Sorry for the long issue though 😄