Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative Scoring Method for Total Rank #112

Open
primo-ppcg opened this issue Oct 12, 2019 · 27 comments
Open

Alternative Scoring Method for Total Rank #112

primo-ppcg opened this issue Oct 12, 2019 · 27 comments
Labels

Comments

@primo-ppcg
Copy link
Contributor

@primo-ppcg primo-ppcg commented Oct 12, 2019

Closely related to #69

With the way the total score is currently calculated, a decent solution in a 'short' language will out-rank a phenomenal solution in a 'verbose' language. I think this may be causing some users to lose interest. We've seen evidence of this already; after J was introduced, interest in Perl 6 stagnated quickly (there are currently ~3 active Perl 6 golfers, where there used to be a dozen or so). It also fails to adequately reward great golfers on the site, if they prefer to use the 'wrong' language(s).

I propose an alternate ranking system (note: updated to reflect the points addressed in the discussion below):

  • Begin by computing a Bayesian estimator for the minimum solution length per language per hole. Call this Sb, details below.
  • Assign a score for each solution directly, as Score = ⌊ Sb ÷ Su × 1000 ⌉, where Su is the length of the user's solution. For single language leaderboards, the score can be computed against the shortest solution, without need for an estimator.
  • Order by score (affects rank), then by time of submission (does not affect rank).

@SirBogman has created a great tool to see the effects of the propsed scoring method.


Advantages of this system over the current system:

  • It's possible to reach a near-perfect score using any language, without needing to learn J and Perl (ugh! 😜).
  • It encourages users to improve the top solution for every language, and not just the shortest language.
  • Improving a solution will improve your score, even if you don't pass another solution in rank.
  • It properly rewards the top golfers of all languages (e.g. @romancortes)

Disadvantages of this system over the current system:

  • It would become more difficult to predict the ranking of a solution, however, because scores are assigned directly this would no longer be as important.

The Bayesian estimators can be computed as follows:

Sb = (n ÷ (n + m)) × S + (m ÷ (n + m)) × Sa

where:

  • n: the number of submissions in this hole for this language.
  • m: a small uncertainty factor (1 to 3).
  • S: the length of the shortest solution for this language.
  • Sa: the shortest solution among all languages for this hole.

For example, taking m as 1, if a language had three submissions, with the shortest being 80 in length, and the shortest solution overall were 60, the estimated shortest length for this language would be:

0.75 × 80 + 0.25 × 60 = 75

The assigned overall score would then be ⌊ 75 ÷ 80 × 1000 ⌉ = 938. If after 9 submissions the shortest solution were still 80, the estimated shortest length would be at 78, for a score of 975.

It should be noted that if all languages used the same m value (e.g. 1), more popular languages would have a more favorable Bayesian weighting due to their popularity alone, and therefore higher scores. To avoid this, the most popular language can be assigned a fixed m of 3, and the rest scaled against that on the range 1 .. 3:

m = 2 × n ÷ nmax + 1

where:

  • n: the total number of submissions for this language.
  • nmax: the maximum number of submissions for any language.

In this way, the most popular language (currently Python), would need approximately 3 times as many submissions in a given hole to reach the same Bayesian weighting as the least popular language (currently Nim).

A sample SQL query of the Bayesian estimator per language per hole would then be the following:

select
    t1.language,
    t1.hole,
    (N / (N + M)) * S + (M / (N + M)) * Sa as Sb
  from (
    select language, hole, count(*) as N, min(size) as S from solutions group by language, hole
  ) as t1
  inner join (
    select hole, min(size) as Sa from solutions group by hole
  ) as t2
    on t1.hole = t2.hole
  inner join (
    select
        t3.language,
        2.0 * N / Nmax + 1 as M
      from (
        select language, count(*) as N from solutions group by language
      ) as t3
      cross join (
        select count(*) as Nmax from solutions group by language order by count(*) desc limit 1
      ) as t4
  ) as t5
    on t1.language = t5.language

This could be stored as a view, and then joined with the solution table to calculate scores directly.

@JRaspass

This comment has been minimized.

Copy link
Owner

@JRaspass JRaspass commented Oct 12, 2019

So I like this idea, or at the very least I think we should see how it'll affect things (no idea how to structure the SQL yet, but that's tomorrow's problem :-P )

I'd like to get input from other people too, feel free to +1/-1 and/or comment this idea and we'll go from there.

Thank you for putting a lot of thought into this issue @primo-ppcg because like you it's always annoyed me how some languages are basically useless with the current design.

@JRaspass JRaspass added the idea label Oct 12, 2019
@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 12, 2019

I spent some time thinking about "the Python problem" over lunch (in my initial proposal, the bottom ranking solutions would all be Python, due to its popularity). I think I've found a good solution for this, without detracting any from the intended goal.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 15, 2019

I like that you're thinking about how to make it more interesting to participate using a wider variety of languages. I completely agree that the goal should be to come as close as possible to the top solution in any language you enjoy golfing in. I would be interested to see how it affects things.

I'm not sure how well it would work for holes that have a small number of solutions. Ten-pin bowling has 15 solutions in 7 languages. My solution is the only one in Perl 6 and it's in second place. If someone were to come up with a better Perl 6 solution, my ranking would probably drop to near the bottom of the list and this would be frustrating. If I wanted to get back towards the top of the leaderboard, the easiest way might be to pick one of the 8 unused languages for that hole and implement a solution in one of them. Does anyone have thoughts on how to deal with this issue?

Ideally there wouldn't be holes with such a tiny number of solutions, but I'm not sure that a new ranking method could help with that significantly.

@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 15, 2019

If I wanted to get back towards the top of the leaderboard, the easiest way might be to pick one of the 8 unused languages for that hole and implement a solution in one of them.

I would say that's working as intended! If you see a language that doesn't have a solution for a hole, by all means, submit one. Even if - or perhaps especially if - it's not particularly well golfed, another user will undoubtedly come along sooner or later and improve it. The system encourages users to fill out the scoreboard. And as the scoreboard becomes more full, the ranking becomes more fair.

An alternative system, if you wanted a more fair scoreboard with relatively few submissions, would be to use a bayesian estimator to estimate the 'actual' shortest length for a language, and then rank according to that. Something like:

S = (n ÷ (n + m)) × s + (m ÷ (n + m)) × sa

where:

  • n: the number of submissions in this hole for this language.
  • m: a small uncertainty factor (1 to 3).
  • s: the length of the shortest solution for this language.
  • sa: the shortest solution among all languages for this hole.

For example, taking m as 1, if a language had three submission, with the shortest being 88 in length, and the shortest solution overall were 60, the estimated shortest length for this language would be:

0.75 × 88 + 0.25 × 60 = 81

If after 9 submissions the shortest solution were still 88, the estimated shortest length would be at 85.2.

However, I think that something like this would make it more difficult for users to predict the ranking of any solution.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 16, 2019

I appreciate the thought you’ve put into this, but I’m concerned that the new system could make Perl 6 an even less attractive choice. One of the charms of this site is that it’s always been Perl 6 friendly. Because it counts codepoints instead of bytes, there are plenty of fun ways to use that to your advantage in Perl 6, even if you’re new to the language. I’ve only been using the language for about five months and I’ve only been using this site for three. For a while, the ratio of the length of my solutions to the top Perl 6 solutions was high. I’m sure it was greater than two for the longer holes. More recently, I’ve figured out some techniques to improve that and now the ratio is around 1.25 in many cases. While it’s taken quite a while for me to start getting close to the top solutions, many of my earlier solutions still ranked fairly well overall, because Perl 6 is a relatively good golfing language on this site. I was happy with my scores and I thought that eventually I would figure out some ways to improve them. If all that mattered were the ratio, I’m not sure I would have enjoyed golfing here in Perl 6 as much, since the high ratio would have been frustrating and it takes a lot of time to learn the language.

It’s difficult in general to come up with a good ranking method for a site like this. Unfortunately, only 11 people have done all of the holes. Maybe this situation used to be better, but I don’t know because I’ve been here for less than three months. It definitely wouldn’t hurt to have a clearer explanation of how the total scores are calculated. I was confused for a while about why the rankings on the front page were different from the rankings on the All Langs page for a hole, until I realized that people were only listed once on the front page, but could have multiple ranks on the All Langs page. I was also confused for a while about why my total score isn’t just the sum of the number of points for each hole. It was frustrating to see the number of points increase for a single hole, but not overall. Eventually I realized it was because the person whose score I had passed had higher scores in different languages. I ended up writing a script to pull the data and calculate the difference between the length of my solution and the length of the next solution above mine, filtering out all solutions that weren’t a user’s top solution. I used this for a while to help me focus on holes where I could improve my ranking without monumental effort and I hoped I would eventually develop techniques to help me deal with holes where I would need to dramatically improve my solution to incrementally improve my score in points. Maybe it would be nice for the site to have something like this built in, where it recommends holes to do to help with your overall score.

@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 17, 2019

@SirBogman I do understand what you mean. I think the primary issue is that scores are assigned according to rank, which is a zero sum game. If some solutions are being moved up, obviously others are being moved down. In particular, users with lots of 3rd and 4th place finishes in one of the more terse languages would be most negatively affected by the change.

Since we've come this far, let's see if we can address that too. Assuming we've already calculated a Bayesian average minimum length per language per hole, why not just assign a score directly from that? Specifically:

Score = ⌊ Sb ÷ Su × 1000

where:

  • Sb: the Bayesian average minimum, as detailed a few posts back
  • Su: the length of the user's submission.

The suggested scoring system in total would be the following:

  • Calculate the Bayesian weighted averages. This will only need to be computed once per language per hole. I don't believe this will consume much in way of database resources, nor is it very complicated (see attached SQL pseudo-code). If necessary, however, these could be pre-calculated, and adjusted as part of the new submission process.
  • Assign scores directly based on this value, as above.
  • Order by score (affects rank), then by time of submission (does not affect rank).
select
    t1.language,
    t1.hole,
    (N / (N + 1)) * S + (1 / (N + 1)) * Sa as Sb
  from (
    select language, hole, count(*) as N, min(size) as S from solutions group by language, hole
  ) as t1
  inner join (
    select hole, min(size) as Sa from solutions group by hole
  ) as t2
    on t1.hole = t2.hole

For comparison, the Ten-Pin Bowling score board would look like this after the changes suggested above:

User Lang Size Old Score New Score
primo-ppcg Ruby 75 1000 1000
romancortes JavaScript 99 875 952
prplz Python 116 813 941
SirBogman Perl 6 89 938 921
daedals Perl 121 750 873
primo-ppcg JavaScript 124 688 760
daedals Python 144 625 758
ElelockSystems C 288 250 630
JRaspass Perl 176 438 600
TFeld00 Python 183 375 597
retrohun Bash 405 125 593
ElelockSystems JavaScript 169 563 557
elisherer JavaScript 197 313 478
jaredcrystal Ruby 172 500 436
alangodfrey Python 350 188 312
estomagordo Python 541 63 202
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 18, 2019

@primo-ppcg Your Bayesian estimator is a very clever idea. I also like the idea of scores not being directly tied to rank. It feels slightly wrong for golfing to be a zero sum game.

In the Ten-Pin Bowling example, it seems somewhat reasonable that @romancortes could have a higher score than me. It would likely be easier for you to beat my Perl 6 score than to beat his JavaScript score.

With your proposed solution, scores are still somewhat tied to rank. My first thought was that the easiest way to improve my score in points is to wait for more people to do the hole in Perl 6. As long as their solutions weren’t too much shorter than mine, my score would increase. With the current system, this already happens with my total score. It tends to slowly increase over time, because my solutions tend to have a favorable length ratio to the new solutions. It’s just a little more obvious with your proposed system and perhaps a little bit more susceptible to manipulation, although I would never do that and I hope others wouldn’t either. It could be possible to mitigate that by calculating some kind of overall multiplier per language using all of the holes. I’m not sure this makes sense though, because for some holes, specific languages are better than others. If we did something like that, we would have to show people the multipliers and it may influence their choice of language. That could be a good thing though and it might be more understandable to users. They wouldn’t have to be as concerned with exactly how the score is calculated. The multipliers wouldn’t change quickly.

After talking about this hole for so long, I revisited my solution and managed to save 9 characters. Now I would still be second on the leaderboard 😜 with this scoring method.

@JRaspass

This comment has been minimized.

Copy link
Owner

@JRaspass JRaspass commented Oct 18, 2019

@SirBogman I really like the idea of applying a multiplier to each lang calculated dynamically, we could displaying it clearly on the about page, and call it the handicap

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 18, 2019

Handicap certainly sounds a lot better than multiplier 😃.

It probably makes sense to have some kind of handicap.

Many holes require reading command line arguments and outputting one line per argument. It's interesting to look at a minimal instance of this, where each argument is simply output on its own line.

Here are a few solutions that would be hard to beat.

Python, 37 chars

import sys;[*map(print,sys.argv[1:])]

Perl 6, 11 chars

@*ARGS».say

Ruby, 6 chars

puts$*

In this simple example, Ruby would have a significant advantage. It could even be interesting to have a hole like this as a test case for the handicap.

@JRaspass

This comment has been minimized.

Copy link
Owner

@JRaspass JRaspass commented Oct 18, 2019

One true perl, 12 chars: say for@ARGV

@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 18, 2019

After talking about this hole for so long, I revisited my solution and managed to save 9 characters. Now I would still be second on the leaderboard 😜 with this scoring method.

Another advantage of this system, is that you can increase your own scores by improving your solutions, even if you don't pass anyone in the ranks. It makes a lot of sense to me that this should be true.

Regarding handicap, the bayesian estimator is meant to be an equalizer; it draws the top solutions from every language 'on par' with one another. By comparing each solution with the top solution per language, there is an implicit handicap already built in. For example, if the overall shortest solution is 60, and the shortest for another language is 120 (bayesian weighted), all solutions for that language will be scored as though each char only counted 0.5, comparatively. In other words, the handicap for that language in that hole would in effect be 0.5.

In the Ten-Pin Bowling example, it seems somewhat reasonable that @romancortes could have a higher score than me. It would likely be easier for you to beat my Perl 6 score than to beat his JavaScript score.

1st out of 4 is statistically more significant than 1st out of 1, and therefore has a more favorable bayesian weighting, although, this isn't necessarily entirely fair. In the sample above, I use an m of 1 for every language. However, not all languages can be reasonably expected to have the same number of submissions per hole. This m could - and probably should - be different for every language. The most statistically sound manner to compute this would be the following:

M = n ÷ navg

where:

  • n: the total number of submission for this language.
  • navg: the average number of submissions per language.

In other words, a language with twice as many submissions as the average would need approximately twice as many submissions in any given hole to reach the same bayesian weighting as a language with a total number of submissions closer to the average. I'd be tempted to slap a square root on that though, and probably would. It would prevent popular languages from being penalized too heavily, and would also prevent rarely used languages from having an m less than 0.01, which would pretty much guarantee rank 2.

There's one problem with this, though. Adding a new language would increase the m for every language, causing scores to drop across the board. There's probably a good solution for this, which of course I'll try to find ;)

Also:

Bash, 17 chars

for s;{ echo $s;}
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 18, 2019

@primo-ppcg I agree that your method is more statistically sound than trying to compute an overall handicap per language, although both are interesting concepts.

Another advantage of this system, is that you can increase your own scores by improving your solutions, even if you don't pass anyone in the ranks. It makes a lot of sense to me that this should be true.

That would be an excellent feature.

The ultimate scoring method would be, for every hole, for every language, test all valid programs of increasing length until a solution is found. Then rank all other solutions according to that one. Of course this is intractable, but I think you're essentially trying to determine an approximate solution.

Brainfuck, 42 chars

-,+[>+<-[.[-]>-<]>[<+++[>+++<-]>.[-]]<-,+]
@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 18, 2019

I agree that your method is more statistically sound than trying to compute an overall handicap per language, although both are interesting concepts.

A global handicap would need to have at least two components: a shift, and a multiplier. Then you could do a least squares regression of the shortest solutions for that language vs. the shortest solutions overall, which each hole as a data point. However, this would be somewhat easy to manipulate. For example, if a language didn't have a solution for a hole, a user could intentionally submit a horrible solution to boost the rest of their scores. To mitigate this, instead of taking the top scores per language directly, you could use a bayesian estimator...

J, 12 chars

echo>2}.ARGV

and also

Brainfuck, 22 chars

++++++++++>,[[.,]<.>,]
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 18, 2019

Nice Brainfuck solution.

Haskell, 59 chars

import System.Environment;main=getArgs>>=mapM putStrLn.tail
@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 18, 2019

Haskell, 59 chars

You just saved me 7 bytes. And here I was using do and drop 1 like a sucker ;)

@somebody1234

This comment has been minimized.

Copy link

@somebody1234 somebody1234 commented Oct 19, 2019

TL;DR: Quoting the original post, "a decent solution in a 'short' language will out-rank a phenomenal solution in a 'verbose' language" - it may be/is easier to get close to optimal in one language vs another.

I haven't done anything for ages but... I don't like having a handicap. I don't think a linear handicap fully (or even remotely, really) expresses the spread of solution lengths, since in e.g. J, many of the more simple functions are one or two characters in length, whereas in, say, Python, it's several characters, so a change of approach may cost X in one language and Y in another, and X and Y might not be related in any way.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 19, 2019

Imagine that we had a quantum computer that could calculate the shortest possible solution for each hole in each language by trying all possible programs simultaneously. When we solve a hole, we're striving to reach this optimum solution. If we were actually able to calculate it, I think it would be fair to rank solutions according to the ratio of their length to the length of the shortest possible solution for the language. In the case where the optimal solution had been achieved in multiple languages, it could be considered a tie. Let’s call this the optimal scoring method for the sake of example.

I see the Bayesian approach as a simple approximation of the optimal scoring method. It would tend to approach the optimal scoring method as the number of solutions for a hole increased. For holes with a low number of solutions, it wouldn’t necessarily be a great approximation. It would also be difficult to explain to users.

Simpler Problem: Adjusting for Required Boilerplate

I think it’s interesting to attempt to solve a simpler problem of adjusting for required boilerplate based on language and input/output style. For languages such as Python and Haskell, it’s almost as if there were an automatic penalty for holes that take arguments. I mentioned the example minimal input/output hole, because it might be interesting to use the solutions to that hole as an offset for other holes that take arguments.

For example, we would have a non-ranking Echo hole with the top solutions public. It would look something like the following.

Echo Hole

Lang Size Code
Bash 17 for s;{ echo $s;}
Brainfuck 22 ++++++++++>,[[.,]<.>,]
C 57 #include<stdio.h>␤main(int a,int**v){while(1)puts(*++v);}
Haskell 59 import System.Environment;main=getArgs>>=mapM putStrLn.tail
J 12 echo>2}.ARGV
JavaScript 26 arguments.map(x=>print(x))
Perl 12 say for@ARGV
Perl 6 11 @*ARGS».say
Python 37 import sys;[*map(print,sys.argv[1:])]
Ruby 6 puts$*

Then we could use those lengths as offsets for programs with the same input/output style. Of course, the echo hole code might change significantly when modified to apply it to a different hole, but it should be impossible to have a shorter solution for another hole with the same input/output style.

Revisiting the Ten-Pin Bowling example, here's what the results would look like after updating the table based on the latest scoreboard.

Adjusted Ten-Pin Bowling

User Lang Offset Size Adjusted Size Old Rank New Rank
primo-ppcg Ruby 6 75 69 1 1
SirBogman Perl 6 11 80 69 2 1
romancortes JavaScript 26 99 73 3 3
prplz Python 37 115 78 4 4
primo-ppcg JavaScript 26 124 98 6 5
daedals Python 37 144 107 7 6
daedals Perl 12 121 109 5 7
ElelockSystems JavaScript 26 169 143 8 8
TFeld00 Python 37 183 146 11 9
JRaspass Perl 12 176 164 10 10
jaredcrystal Ruby 6 172 166 9 11
elisherer JavaScript 26 197 171 12 12
ElelockSystems C 57 288 231 13 13
alangodfrey Python 37 350 313 14 14
retrohun Bash 17 405 388 15 15
estomagordo Python 37 541 504 16 16
@somebody1234

This comment has been minimized.

Copy link

@somebody1234 somebody1234 commented Oct 20, 2019

Remember though... that for many languages, longer programs can be compressed, meaning they'd gain more benefit from this simple offset than shorter programs (basically, because different languages have different compression ratios, longer solutions in different languages will gain a differently sized advantage from this offset)

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 22, 2019

@somebody1234 That's true.

@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 22, 2019

I think a Bayesian estimator is a better system than a global handicap. I've found a good solution to "the m problem" which I'll detail in a bit.

But to spin the idea of a global handicap a bit further, a shift alone I think is insufficient. In the example above, Python would have a larger handicap than Brainfuck, which is obviously not ideal. The problem of outputting the arguments newline separated is also somewhat arbitrary; had the task been space separated instead, Perl 6's handicap would decrease by 1, and Ruby's would increase by 4. Any problem will surely need to output something anyway, so I don't think that output and looping would need to be counted as additional boilerplate for holes with arguments. Instead, perhaps, the shortest necessary overhead to obtain a reference to the argument array would be a better approach.

Better still, I think, would be to compute a shift and multiplier via least squares regression, treating holes with arguments as a separate series, solving for both series simultaneously with identical slope, but different y-intercept. For example, a language might have a shift of 5 and multiplier of 0.5 for fixed output challenges, and a shift of 12 also with a multiplier of 0.5 for argument challenges. Again, I don't much like this idea, but I think that this would be a good way to go about it, if it were done.

The m problem

Assuming an m of 1 for all languages is somewhat unfair. Popular languages will have a more favorable Bayesian weighting due to their popularity alone. It therefore makes sense to scale the m with the popularity of the language. However, scaling against the average number of solutions per language would cause all m's to go up if a new language were added (as the average number of solutions per language would go down), causing scores of all other languages to drop.

To avoid this, the most popular language can be assigned a fixed m (e.g. 3), and the rest scaled against that on the range 1 .. 3:

M = 2 n ÷ nmax + 1

where:

  • n: the total number of submission for this language.
  • nmax: the maximum number of submissions for any language.

The current m values for each language would then be the following:

Language m
Python 3.000
JavaScript 1.933
Ruby 1.416
Perl 6 1.356
PHP 1.335
C 1.223
Perl 1.222
Haskell 1.200
Lua 1.167
Bash 1.155
Brainfuck 1.112
Lisp 1.104
Julia 1.094
J 1.089
Nim 1.057

In other words, the most popular language (currently Python), would need approximately 3 times as many submissions in a given hole to reach the same Bayesian weighting as the least popular language (currently Nim).

A sample SQL query of the Bayesian estimator per language per hole would then be the following:

select
    t1.language,
    t1.hole,
    (N / (N + M)) * S + (M / (N + M)) * Sa as Sb
  from (
    select language, hole, count(*) as N, min(size) as S from solutions group by language, hole
  ) as t1
  inner join (
    select hole, min(size) as Sa from solutions group by hole
  ) as t2
    on t1.hole = t2.hole
  inner join (
    select
        t3.language,
        2 * N / Nmax + 1 as M
      from (
        select langauge, count(*) as N from solutions group by language
      ) as t3
      cross join (
        select count(*) as Nmax from solutions group by language order by count(*) desc limit 1
      ) as t4
  ) as t5
    on t1.language = t5.language

This could be stored as a view, and then joined with the solution table to calculate scores directly. And with that, I think my proposal is complete. I'll update the original issue, so that the implementation details aren't scattered throughout the thread.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 22, 2019

Those are some good points about the limitations of using an offset alone. I no longer like that idea. Instead of calculating and subtracting an offset, it would be better to attempt to reduce it by supporting different input methods. It would probably only be possible to use this for new holes though.

For example, if input were provided via stdin, with one line per argument, Python would require 22 characters, instead of 37. Haskell would require 38 characters, instead of 59. For some languages, we could provide input via both arguments and stdin. For Ruby, Perl, and Perl 6, that wouldn't work well and we would have to choose one or the other.

@primo-ppcg Your latest idea sounds promising. Could you please update the Ten-Pin Bowling example with your new method? I think it would be interesting to have two versions, one with the solution lengths when you created your first table and one with the latest solution lengths, because it would be an example of scores changing over time.

@primo-ppcg

This comment has been minimized.

Copy link
Contributor Author

@primo-ppcg primo-ppcg commented Oct 22, 2019

@SirBogman when I update the initial post, I intend to make a before and after comparison of a hole with many solutions, and a hole with relatively few (most likely Ten-Pin Bowling) to demonstrate the effect of the proposed system.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 24, 2019

For reference, I calculated how the latest proposal would affect the overall leaderboard. I included all users who were previously in the top 50 and all users who would be in the new top 50.

User New Score New Rank Old Score Old Rank Δ Score Δ Rank Strokes Holes
primo-ppcg 38993 1 38995 1 -2 +0 1835 39
romancortes 37802 2 35071 4 +2731 -2 3095 39
prplz 36587 3 35071 5 +1516 -2 3192 39
GuyJoKing 35612 4 36769 3 -1157 +1 2198 38
SirBogman 32091 5 37442 2 -5351 +3 2443 39
ElelockSystems 31046 6 30654 8 +392 -2 4178 39
TFeld00 30670 7 31830 6 -1160 +1 4527 39
ETHproductions 30484 8 29965 9 +519 -1 2595 34
elisherer 28416 9 31421 7 -3005 +2 4337 39
leon-w 28175 10 28744 14 -569 -4 3988 36
edre 28101 11 26810 16 +1291 -5 2777 32
daedals 28094 12 29229 13 -1135 -1 5025 39
thospel 27759 13 29358 11 -1599 +2 1452 30
retrohun 26987 14 25125 18 +1862 -4 11447 39
orthoplex 25820 15 22228 26 +3592 -11 65643 36
nwellnhof 25564 16 28086 15 -2522 +1 1466 29
HPWiz 25026 17 23070 23 +1956 -6 1350 26
dzhang314 24184 18 24554 22 -370 -4 5414 37
klapointe2 24072 19 15021 64 +9051 -45 12667 37
JRaspass 23888 20 29783 10 -5895 +10 8206 39
marcin7Cd 23687 21 22674 24 +1013 -3 2530 30
yokasuretsu 23455 22 29246 12 -5791 +10 5130 36
user189032 23334 23 20879 30 +2455 -7 3827 30
somebody1234 22747 24 24832 20 -2085 +4 1967 28
cafehaine 22738 25 12033 87 +10705 -62 12890 37
JayXon 22645 26 21612 27 +1033 -1 2539 28
GolfingSuccess 21606 27 21195 28 +411 -1 4134 30
nonplated 21056 28 15433 59 +5623 -31 21522 33
knutae 21036 29 18571 38 +2465 -9 5543 36
AlexDaniel 20722 30 24581 21 -3859 +9 1888 26
alangodfrey 20711 31 18137 44 +2574 -13 7724 39
hoodadam 20577 32 18400 42 +2177 -10 4268 30
bforte 20273 33 13024 77 +7249 -44 2315 24
jaredcrystal 20212 34 25528 17 -5316 +17 17366 35
skps2010 19860 35 19286 33 +574 +2 4371 32
alexkunin 19852 36 18992 35 +860 +1 4021 30
giacomogallina 19325 37 18261 43 +1064 -6 3237 27
jackkav 18946 38 16774 50 +2172 -12 12235 36
Starwort 18895 39 15921 55 +2974 -16 6910 35
UpwardProcess 18805 40 17870 45 +935 -5 15697 34
msrawson 18767 41 17137 47 +1630 -6 5129 32
frobinsonj 18598 42 18895 36 -297 +6 3491 27
sk- 18596 43 18479 41 +117 +2 1677 24
jslip 18352 44 18598 37 -246 +7 1784 21
n1LS 18214 45 22232 25 -4018 +20 5672 34
villfa 18154 46 18512 40 -358 +6 7615 29
mcreenan 18092 47 24891 19 -6799 +28 4757 33
janlelis 18081 48 20673 31 -2592 +17 1518 23
petar-donchev 17799 49 16404 53 +1395 -4 2608 25
michelebastione 17790 50 14722 68 +3068 -18 8019 36
tayloia 17138 54 17783 46 -645 +8 2771 24
labster 16179 62 21013 29 -4834 +33 4823 27
bluebear94 15946 64 19886 32 -3940 +32 2029 23
dloscutoff 15732 67 17066 48 -1334 +19 2115 22
hrrs01 14717 75 18539 39 -3822 +36 1400 22
adoxography 14556 77 17064 49 -2508 +28 4506 27
nephila-nacrea 13656 86 19235 34 -5579 +52 9174 33
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 24, 2019

Here's an example of how it would affect Ten-Pin Bowling for two snapshots in time.

2019-10-13

User Lang. N M Sb Chars New Score New Rank Old Score Old Rank Δ Score Δ Rank
primo-ppcg ruby 2 1.419 75 75 1000 1 1000 1 0 0
romancortes javascript 4 1.944 91 99 921 2 867 3 +54 -1
SirBogman perl6 1 1.361 81 89 909 3 933 2 -24 +1
daedals perl 2 1.224 104 121 856 4 800 4 +56 0
daedals python 4 3.000 114 144 795 5 667 6 +128 -1
primo-ppcg javascript 4 1.944 91 124 735 6 733 5 +2 +1
TFeld00 python 4 3.000 114 183 625 7 400 10 +225 -3
ElelockSystems c 1 1.216 171 288 594 8 267 12 +327 -4
JRaspass perl 2 1.224 104 176 588 9 467 9 +121 0
retrohun bash 1 1.157 228 405 563 10 133 14 +430 -4
ElelockSystems javascript 4 1.944 91 169 539 11 600 7 -61 +4
elisherer javascript 4 1.944 91 197 463 12 333 11 +130 +1
jaredcrystal ruby 2 1.419 75 172 436 13 533 8 -97 +5
alangodfrey python 4 3.000 114 350 327 14 200 13 +127 +1
estomagordo python 4 3.000 114 541 212 15 67 15 +145 0

2019-10-24

User Lang. N M Sb Chars New Score New Rank Old Score Old Rank Δ Score Δ Rank
primo-ppcg ruby 2 1.416 75 75 1000 1 1000 1 0 0
SirBogman perl6 1 1.355 77 79 971 2 947 2 +24 0
romancortes javascript 5 1.932 92 99 932 3 895 3 +37 0
prplz python 7 3.000 96 105 914 4 842 4 +72 0
primo-ppcg python 7 3.000 96 109 881 5 789 5 +92 0
daedals perl 2 1.222 104 121 856 6 737 6 +119 0
prplz javascript 5 1.932 92 122 757 7 684 7 +73 0
primo-ppcg javascript 5 1.932 92 124 744 8 632 8 +112 0
daedals python 7 3.000 96 144 667 9 579 9 +88 0
ElelockSystems c 1 1.226 171 288 593 10 263 15 +330 -5
JRaspass perl 2 1.222 104 176 588 11 421 12 +167 -1
retrohun bash 1 1.155 228 405 563 12 158 17 +405 -5
ElelockSystems javascript 5 1.932 92 169 546 13 526 10 +20 +3
TFeld00 python 7 3.000 96 183 525 14 368 13 +157 +1
elisherer javascript 5 1.932 92 197 469 15 316 14 +153 +1
jaredcrystal ruby 2 1.416 75 172 436 16 474 11 -38 +5
alangodfrey python 7 3.000 96 350 274 17 211 16 +63 +1
estomagordo python 7 3.000 96 541 177 18 105 18 +72 0
TheAristos python 7 3.000 96 829 116 19 53 19 +63 0
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 24, 2019

Even though improving your rank on a hole would no longer be the only way to improve your total score, it’s still interesting. For each row in a hole’s leaderboard, we should calculate and display the required number of chars for the score to move up one in rank. I’ll update the ten pin bowling example later to show this.

@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Oct 26, 2019

I've created a script that downloads all of the scores from code-golf.io and creates spreadsheets showing how the proposed Bayesian scoring method will affect leaderboards.

See the script and some spreadsheets here that show different snapshots in time of the leaderboards.

A couple of spreadsheets were created created with different snapshots of the scores. See bayesian-2019-10-24.xlsx and bayesian-2019-10-26.xlsx.

  • The spreadsheets contain one sheet for the overall leaderboard (all-holes) and one sheet per hole.
  • The hole sheets are written using formulas so that you can see how changes to things like m would affect the results.
  • The hole sheets include a "To Rank Up" column, which shows the number of characters required to match or exceed the score at the next highest rank. This is interesting, even though it’s no longer necessary to increase your rank to increase your score. If this scoring method were adopted, this column should be included on hole leaderboards.

Observations:

For the diamond hole, the top Brainfuck answer is ranked highly.

For the divisors hole, the 54 character Python answers are higher than the 23 character Perl 6 answers. The haskell answers are pretty high too. The "To Rank Up" column shows that, if the Perl 6 solutions were reduced to 22 characters, their score would surpass the 54 character Python scores.

The new system has some properties that the current system does not.

  • The score on the overall leaderboard is now simply the sum of a user’s best scores for each hole.
  • When filtering by language, the scores shown on a leaderboard will not change.
  • Users can improve their score for a hole without passing another solution in rank.
  • If the top solution for a hole is improved, the scores of others can be reduced. This also affects the overall leaderboard. See the Ten-Pin Bowling hole in the example spreadsheets.
@SirBogman

This comment has been minimized.

Copy link
Contributor

@SirBogman SirBogman commented Nov 18, 2019

I've updated the code-golf-scraper to work with the updates to the site. It's also now downloads data faster and more efficiently. I also added a new generated spreadsheet directly to the repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.