2017 Rating #40

Closed
kyprizel opened this Issue Feb 14, 2017 · 165 comments

@kyprizel

kyprizel commented Feb 14, 2017

Hi,
Now that people are saying the voting-based rating scheme doesn't work, let's discuss how we can fix it.

Should we use an ELO-like system? Something else?

@kyprizel

kyprizel commented Feb 14, 2017

@v0s, you had some ideas.

@v0s

v0s commented Feb 14, 2017

Well, I loved the Glicko2 experiment https://kitctf.de/glicko2/
I'm quite puzzled that it depends so heavily on the parameters, though... but perhaps that could be worked out by trial.

So, the foundation is Glicko2. On top of that:

  1. Degrade rating with time
    Say, every day every team's rating drops by 0.2%. That prevents a team from outpwning everyone a few times, taking 1st place, and then staying 1st forever without playing.
    −0.2%/day gives roughly a 50% decrease if a team doesn't play any CTF for a year straight (see the sketch after this list).

  2. Button to play an event unrated
    Sometimes there are shitty events, or events where everyone on the team is busy and only a couple of people play for fun. While it's useful to have the results listed on CTFtime, not giving full effort to these events can ruin a team's Elo rating.
    Teams should be able to opt out of counting a specific event in their rating. The [ 💩 Shitty CTF ] button should be available long enough for a team to evaluate the challenges, but not so long that a team can drop out just because today's opponents are beating them.
    The first quarter of the event's runtime (but with a minimum of the first 3 hours) looks reasonable.
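
For a rough feel for the numbers in point 1, here is a minimal sketch (assuming a flat 0.2% multiplicative decay applied once per idle day; the function and parameter names are made up for illustration):

```python
def decayed_rating(rating, days_idle, daily_decay=0.002):
    """Apply a multiplicative decay of `daily_decay` per idle day."""
    return rating * (1 - daily_decay) ** days_idle

# A team that stops playing for a year keeps roughly half its rating:
print(decayed_rating(1000.0, 365))  # ~481.6
```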

@kara71

kara71 commented Feb 14, 2017

I definitely don't like the idea of an Elo rating where we can lose ranks if we perform badly. I'm on an academic team and it's hard to determine in advance how many players will show up and how much effort they will put in. Sometimes we start very slowly, but teammates who finish their homework early join us and manage to put us in the top 50. Of course, it wouldn't be impossible to anticipate how much time every team member can spend on a CTF so we can decide whether to play or not (or 💩 it, as @v0s suggested), but that would take a huge planning effort for every single CTF. It would also stop us from entering some CTFs, which would make us miss some great ones. What I'd imagine is a rating system where, however badly a team scores in a CTF, they still gain more points by playing than by not playing.
The downside of that (which I feel is not much of a downside) is that it would favour weak teams that play often over strong teams that rarely play.

@Pharisaeus

Pharisaeus commented Feb 14, 2017

As the Glicko2 experiment showed, it's not a good idea in my opinion, mostly because it discourages playing at all, for fear of dropping in the ranking. It might also promote playing shady CTFs nobody has heard of, which appear on CTFtime only after the event has already finished.

Even the option to mark an event as not counted is not a good idea, because either it's available after the event and can be abused, or it's available only before / shortly after the start, when it's often difficult to decide whether we're going to play seriously or not. We have a small team and for most events there are ~4 people playing. We usually can't tell before the event how seriously we will play, apart from a few large events where we try to do our best.

If anything, I would consider scoring CTFs based on the number of top teams participating. In general this works: if a CTF is good, strong teams play it, and if it's some no-name CTF then a lot of teams don't bother with it. It also removes the option for top teams to abuse the scoring, because the only way to lower an event's score is not to play, and then you lose points by not playing at all. It would be nice if someone compiled statistics over last year's events to compare each event's "score" with the percentage of, say, top-20 or top-30 teams that played it.
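
A minimal sketch of how such a scheme could be computed (the top-N cutoff, the weight cap, and the data format are illustrative assumptions, not anything CTFtime actually implements):

```python
def event_weight(participants, top_teams, max_weight=100.0):
    """Weight an event by the fraction of current top-N teams that actually played it."""
    top_played = sum(1 for team in top_teams if team in participants)
    return max_weight * top_played / len(top_teams)

# Hypothetical example: 12 of the current top-20 teams played this event.
top20 = {f"team{i}" for i in range(20)}
played = {f"team{i}" for i in range(12)} | {"smallteam1", "smallteam2"}
print(event_weight(played, top20))  # 60.0
```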

@msm-code

msm-code commented Feb 14, 2017

Let me start by stating that I think the most important "mission" of CTFtime.org should be encouraging people to play CTFs, especially people new to security.

Things like points, ratings, rankings, comments, votes, etc, etc are secondary. Or rather, they are just a means to an end (encouraging people to play CTF), not a goal itself.

And in my opinion ELO and similar systems may be discouraging to some people (negative feedback when losing, fear of performing badly, pressure, tensions in team, etc).

Giving some points to every team, on the other hand, creates positive feedback and does exactly what it should: encourage everyone to play and learn as much as they can! Yes, it's not perfect, it's not 100% fair when it comes to picking the "best" teams, but it works and it's the best we have had.

@waywardsun

waywardsun commented Feb 14, 2017

It seems to me that there is no system that cannot be gamed in some way. So, instead of a completely different system, we could improve the current one.

I think it would be good to do the following:

  1. Cap the percentage by which a CTF's point value can increase from one year to the next. This would also cap the percentage by which it could decline.

  2. Base the points earned on the current point value of the CTF. All voting would affect the CTF's score for the following year. This would prevent the winners from upvoting it and the losers from downvoting it; at least that would be the hope. If there is no immediate gain for the top N teams, then I think the voting will be based on the actual quality of the event.

  3. Brand new CTFs get 0 or 5 pts max. If they stick around, they go up. If they don't, then it doesn't matter.

  4. Any CTF with fewer than 20 teams gets a max of 0-10 pts. This would help with the CTFs that come out of nowhere. I think the first year for an already-run CTF that wasn't listed should be 0, but the system shouldn't be overly complex.

  5. Maybe add the idea of negative points (to a point...maybe -5) for CTFs that are just total rubbish. As it is now, there is really no incentive for a CTF to improve from one year to the next. They will increase in points because the top teams want more points.

The ideal system to me is one that makes it clear which CTFs are great and which are not as well as one that makes it clear that if you run a bad CTF, the standing will drop.

That way, the "strong teams" that don't want to play every CTF can skip the low point CTFs.

@pedroysb

pedroysb commented Feb 14, 2017

Pharisaeus's idea makes total sense to me.

@msm-code

msm-code commented Feb 14, 2017

Yeah, after reading Pharisaeus's post, I agree that this idea makes sense (maybe with some tweaks).

@nazywam

nazywam commented Feb 14, 2017

I'd say it looks pretty good. Here's a list of last year's CTFs sorted by the number of top-10 teams that played in each event.

http://pastebin.com/raw/qW9CiWnq

Almost all events with 7-10 top teams are well known and respected, although a slight modification might be needed for offline events.

@kyprizel

kyprizel commented Feb 14, 2017

What if there is a good CTF (well organized, with good tasks and top teams participating) running for the first time? Will it still have 10 pts? Or should it have a manual "weight"? Right now it can get 15-25 pts at most via voting.

@immerse

immerse commented Feb 14, 2017

@nazywam I'd be curious to see that list with the top 30 teams instead of the top 10. Something to notice from that list: hack.lu 2016 had all top-10 teams playing, but it was of much lower quality than in previous years. The list does not capture this. What condition are you using for a team to have "played"? Is it just that they signed up? Maybe the list would be more telling if we only consider a team to have "played" if it reached a certain rank or number of points.

I agree that Pharisaeus's suggestion makes a lot of sense.

Brand new CTFs get 0 or 5 pts max. If they stick around, they go up. If they don't, then it doesn't matter.

I don't think this makes sense. Some CTFs are recent but surprisingly good (e.g. 0ctf) while others are old but have dropped in quality (hack.lu, codegate). It would also be more encouraging for organizers if the initial score is based on quality somehow. (The tricky part is how to assign the initial weight.)

@v0s

v0s commented Feb 14, 2017

@Pharisaeus interesting idea to calculate event weight based on the number of top teams. How do we prevent it from turning into an "oligarchy"? It looks like it favours top teams (they play a CTF → it becomes high-weight → they gain more points for it and become even more "top").

@ngg

ngg commented Feb 14, 2017

We could steal some ideas from ELO-like systems such as the one TopCoder uses.
It would really be demotivating if your score decreased just because you tried to play a CTF without your whole team or without a lot of time.
But the scores should increase based on which teams played the CTF. If you win a CTF against the best teams, your score should increase by a lot.

@v0s

v0s commented Feb 14, 2017

I won't say it publicly, but I personally don't like the current meaning of the CTFtime top as "who plays more individual CTFs" rather than "which team will typically win a contest". Just compare 2016 for dcua (1st place with 1625) and Dragon Sector (the runner-up with 1435 — about 10% less).

oops, i said it publicly

Still, I agree about encouraging new teams to learn (playing as many CTFs as they can). Maybe we can work out a way that satisfies both needs: objectivity and motivation?

@Pharisaeus

Pharisaeus commented Feb 14, 2017

@ngg this is exactly why I don't like ELO -> teams will consider not playing at all if they are not at full strength. ELO works on the assumption that every time you play, you do your best, and this is simply not true for CTFs, especially now that there is more than one CTF per week. If you look at the top teams you will see that each of them has multiple CTFs finished far below what you would expect from them, and I'm pretty sure that in those cases there were just very few people available to play.

The idea I proposed earlier might not be a silver bullet (it would maybe have to be modified somehow for on-site finals, considering there are generally very few teams there at all), but it does have a few strong points:

  • Good CTFs are played by strong teams, bad/easy CTFs are skipped by some of the top teams
  • There is a large bonus on CTFtime for finishing close to the top of a given CTF, and here we automatically account for the fact that winning against many top teams is harder.

There is a risk here that a top team will not play a new CTF, so we might end up with a good event that has a small score. But it's already like this anyway, since currently new CTFs can't get high scores either! Also, maybe event creators should provide something like a "target score" they aspire to? This way teams would know what they can expect at a minimum.

@v0s I doubt there is any risk of oligarchy for a few reasons:

  • Teams are rather diverse and there is little chance of them having a single agenda.
  • Teams compete against one another, and in this configuration it is always good for you to play, so I can't see how a team or even a group of teams can "game" this. Especially if we take the top 20, so that each team has a 5% impact on the final score of the event. You could decide not to play in order to lower the CTF's weight, but it would hardly be noticeable by anyone else. In most cases you would gain more by playing than by trying to lower the score of others by not playing.
  • In the end it still comes down to scoring high in a CTF, and any team can place high because there are no restrictions on who can play, so I can't see how this favours top teams in any way. In fact it makes things fairer because of how CTFtime awards bonus points to teams finishing at the top of an event. Winners, even if they have the same number of points as lower teams, get significantly more rating points, and in reality it's easier to win or finish near the top when few or no strong teams participate. So even if an event is reasonably good, it makes sense for it to have a lower score if no top teams played, simply because it was easier to finish higher.
@niklasb

niklasb commented Feb 15, 2017

I wrote https://kitctf.de/glicko2/, and I tried standard ELO and TopCoder ELO before Glicko-2. To be honest, I think none of these really works for CTFs, for the reasons stated by @Pharisaeus and also simply because there are not enough data points to get a stable rating. Just try tweaking the initial values slightly and watch how drastically the scoreboard changes. Also note that the implementation for 2016 only considers the top 20 for each event, so as not to brutally punish teams that didn't play any given CTF with full effort. This couldn't be done in 2017 because it would lead to some very weird dynamics.

I also think that an "increase-only" ELO like @ngg suggests is not really possible, but I would be happy to be proven wrong by somebody with more experience in statistics. If there is a good way to implement it that I don't know about, I think that would be a nice solution.

@bl4de

bl4de commented Feb 15, 2017

Why can't all CTFs have exactly the same number of points available?

My idea is as follows:

Let's assume there is a 1000-point limit/pool/whatever.
Teams gain points towards the global CTFtime ranking in proportion to their placing in a particular CTF:

  • if 3 teams played, the first team gets 1000, the second 500 and the last 1 point.
  • if 10 teams played, the first team gets 1000, the second 900, the third 800, the fourth 700 and so on; the tenth team gets 1 point.
  • 100 teams - the winner gets 1000, the second team 990, the third 980 and so on; the last one, the 100th team, again gets 1 point.
  • 1000 teams - the winner gets 1000, the second team 999, the third 998; the team at position 500 gets 500 points, position 501 gets 499 points, position 900 gets 100 points and so on (I probably made an off-by-one error here, forgive me, it's almost 1am in Ireland and I'm a little tired after a long working day :P).
  • 2000 teams - the 1st team gets 1000 pts, the 2nd 999.5, the 3rd 999, the 4th 998.5 and so on. The last three teams get 2 pts, 1.5 pts and 1 point respectively.
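
A minimal sketch of the linear scheme in the bullets above (assuming points fall off linearly from 1000 for 1st place to 1 for last place; the rounded decrements in the examples differ slightly from the exact values):

```python
def pool_points(rank, num_teams, pool=1000.0, floor=1.0):
    """Distribute points linearly from `pool` for 1st place down to `floor` for last place."""
    if num_teams == 1:
        return pool
    step = (pool - floor) / (num_teams - 1)
    return pool - (rank - 1) * step

print(pool_points(1, 1000))     # 1000.0
print(pool_points(500, 1000))   # 501.0
print(pool_points(1000, 1000))  # 1.0
```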

The best classic CTFs attract a lot of teams, so even if a team finishes 100th or 200th, it still gets good points. And for the winner, 1000 pts is still well deserved, exactly as it works right now. It will encourage the best teams to play even less advanced CTFs (because they will have to play to get points and finally place among the top teams of the season) - and it will work exactly the same way for less experienced teams. Every team and every CTF will be equal.

From the other point of view, a weak CTF with only a couple of teams playing won't give you good points, because fewer teams means fewer points even for a place in the top 10. The winner still gets 1000 pts, but hey, if a team wins a CTF, even a weak one, they are still the winner, right? :)

And there will be a natural measure for ANY CTF to be considered a good one - the number of teams playing.
For CTF newcomers, the first year will be the hardest, but from my own experience it only takes the first couple of challenges to know whether a CTF is hard, weak, interesting, boring, totally screwed up, or something really exciting that forces you to learn a lot.

Finally, there will be only one way to be All-Season Number One - playing and winning. Winning as many CTFs as possible - is there a better way to encourage people to play CTFs? :)

It also prevents the situation where a really good team wins 4-5 CTFs with the highest weight and ends up in the top 10 without playing any other CTFs. Although those teams are still the best, no doubt, it's a rather weird situation IMHO.

One last thing - there are still very good CTF finals played only on-site. This naturally prevents the situation where average or weak teams get a high position in the ranking only by winning mostly easy CTFs - because there's no way for such teams to get into e.g. the DefCon or Insomni'hack finals.
Maybe it's worth considering higher point limits in such cases, because if a CTF is played as an on-site final, it's likely one of the best ones. That way we still keep some CTFs as "elite".

@Pharisaeus

Pharisaeus commented Feb 15, 2017

@bl4de sorry but this makes absolutely no sense :) It would basically make things even worse, promoting not only teams who play a lot but also teams who play a lot of bad CTFs. The problems we're trying to solve are:

  • How to score teams so that ranking shows "who is good" and not "who plays a lot"
  • How to score a CTF so it's easy to see which CTF is good or hard and which is bad or easy.

What you propose solves neither of those, and in fact it makes the first point much worse than it is now. The whole point is not to score "winning as many CTFs as possible"!

You also want to prevent teams that play rarely but win hard competitions from being at the top, which is a bit weird, because I believe that is exactly what the ranking is for: not to show who plays a lot but to show who is really strong.

Your last point is not really valid, since there are only a handful of on-site CTFs, significantly fewer than shit-tier competitions, and in your scheme winning some 3-hour no-name CTF would be worth the same as winning DefCon and those teams would be tied in the ranking. How does that make any sense at all?

Coming back to my initial idea - there is one problem I can see. Often before an event we can't be sure whether it's going to be good or not, so it might happen that all top teams register and send some initial flags, and only then does it turn out that the CTF is no good at all. This would still count as "many top teams played", so I guess there would have to be some threshold, as proposed by @immerse, for considering a team to have actually played (such as finishing with more than 10% of the top 1 score, or finishing in the top 10%?).

@bl4de

bl4de commented Feb 15, 2017

@Pharisaeus
That's a lot of words for just 'WTF are you talking about?' :) You're very kind, thank you :)

@pedromigueladao

pedromigueladao commented Feb 15, 2017

Hi all,

First of all, thanks for putting this up for discussion. I run an academic team and we have been discussing this among ourselves for the past year. We were in fact giving up on the CTF ranking, as things were going crazy with 1+ CTFs per week, all rated very high. It was impossible to keep up with this.

Now about rating:

  • ELO-like: It feels wrong to me to penalize someone for not doing "as well as usual". Our team started playing CTFs about 2 years ago and it is great to see how much students learn during CTFs. In fact, it is rewarding to see students trying to solve a problem for 24h+ even when they are not finding the solution. This usually leads to a big upside afterwards, as they study the writeups in detail and learn a lot. Also, I really want them to play as much as possible, without the fear of failing. So no thanks to ELO-like ratings.

  • I noticed that some teams are high in the ranking without a single very good performance (and my bad here, as we also gamed the system for a while). I would expect top-50 teams to rank in the top 75 in the major events, e.g. DefCon Quals, Plaid, BKP, RuCTFe, etc... Honestly, in my opinion we should try to find a solution that captures this.

Proposal:

  • why not an ATP (tennis)-like ranking system:
  1. name our Grand Slam CTFs - 70 points (all DefCon Qualifiers?!?)

  2. name our CTF-20 and CTF-5 competitions

  3. your rating would be your Grand Slam points + your N best CTF-20 competitions + your M best CTF-5 competitions (see the sketch after this list)

  4. Previous year's top-50 teams would grade NEW CTFs as CTF-20 or CTF-5.

  5. After each CTF, all participants would vote a promotion to CTF-20 or a demotion to CTF-5 applicable the year after.

  6. From my experience I would consider 8 Grand Slams, N=5 and M=5 appropriate. That is about 2 per month over a school year.
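
For illustration, a minimal sketch of item 3 (assuming each result has already been converted into a per-event rating score; the tier labels and the N/M defaults are just the values proposed above):

```python
def atp_rating(results, n_ctf20=5, m_ctf5=5):
    """results: list of (tier, rating_score) pairs, tier in {'grandslam', 'ctf20', 'ctf5'}."""
    def best(tier, limit=None):
        scores = sorted((s for t, s in results if t == tier), reverse=True)
        return sum(scores if limit is None else scores[:limit])

    return best("grandslam") + best("ctf20", n_ctf20) + best("ctf5", m_ctf5)

example = [("grandslam", 55.0), ("ctf20", 18.0), ("ctf20", 12.0), ("ctf5", 4.5)]
print(atp_rating(example))  # 89.5
```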

Advantages:

  • you cannot increase your ranking a lot by doing well in a lot of average competitions

  • item 4. can even be done after the CTF as your CTF-X competitions are capped by N and M.

  • one could even consider that if you have > N CTF-20 competitions, you could use the excess for the M CTF-5 slots (of course adjusted to a max score of 5)

  • conversely, if you have fewer than N CTF-20 competitions, you could use your "excess" CTF-5 competitions in your CTF-20 pool (obviously capped at 5)

@kara71

kara71 commented Feb 15, 2017

@pedromigueladao pretty good system, but it would discourage us from playing some CTFs when we know we're not going to beat our n-th best position.

A similar system (without a hard cut at a fixed number of CTFs, but with a smoothly decreasing curve instead) is the one used by Kattis. More info about the ranking, which would need to be adapted, is at https://open.kattis.com/help/ranklist. The "combined scores" formula has two nice properties:
- "Adding a new user [CTF ranking for us] with a non-zero score to a group [CTF team] always increases the combined score"
- "About 90% of the score is contributed by the 10 highest scores" (the parameter is adjustable)

@Pharisaeus

Pharisaeus commented Feb 15, 2017

@pedromigueladao I thought about a similar idea of splitting CTFs into "tiers" and scoring them based on the tier they are in, but there are issues here as well:

  1. There might be no consensus on splitting CTFs into tiers, because you can't decide before the event (you never know if it's going to be good) and afterwards your choice might be influenced by the results. This was already present last year, when some teams (I won't point fingers...) clearly stated that they were voting for their own immediate benefit. So if team X won or finished high in a certain event, they will try to promote it to a higher tier. You can see this even now in the voting for some events from last week: there were two very poor CTFs, but they are still getting upvoted by teams that finished high. Conversely, if a team has already "capped" a certain tier, they might push to move an event to a different (maybe lower) tier just so they can accumulate points from that one as well. So again there is a lot of potential for gaming the system here.
  2. The idea of counting only N or M events of a certain tier might discourage people from playing, or at least not encourage them to play. Right now each event you participate in gives a reward, and if we count only a certain number of events of each kind, then people might decide it's not worth playing when there is no benefit.
  3. The constants of 8, 5, 5 seem a bit low considering there were >70 CTFs last year. This might result in numerous ties.

you cannot increase a lot your ranking by doing good in a lot of average competitions

This seems a bit like a disadvantage, because it does not encourage participation.

@pedromigueladao

pedromigueladao commented Feb 15, 2017

@Pharisaeus of course you can game any system we come up with.

  • I guess we somehow agree that DefCon Quals are usually good CTFs (with high probability). This gives us the Grand Slam ones.

  • I agree that you can upvote/downvote according to your performance and strategy, but notice that item 5 applies only to next year's competition. We only have this issue for new CTFs. That is why I suggested a poll of the top-50 teams from year Y-1. We could even define a threshold: say, if 60% consider it was a CTF-20, then most probably it was. It is fair unless 30+ teams are gaming the system...

  • and yes, you could juggle CTFs around among tiers according to your strategy, but wouldn't it be easier to just play a CTF-5 competition and win it (if you did well in a CTF-20 you might as well do it)?

  • discouraging people: I guess you can always improve unless you have all first places in all categories. In fact, the only way you can be sure you'll be number 1 is to win all possible competitions; otherwise others might end up scoring as many points as you. Anyway, do you really believe there will be a lot of ties given that the 8 competitions are worth much more than the others?

  • values N and M can obviously be adapted, but does it make sense to have N+M+8 > 24 (given that there are almost no CTFs from mid-December to January, and in August)?

  • well, you said in your previous comment:

How to score teams so that ranking shows "who is good" and not "who plays a lot"

if you do not put a threshold, you will end up confusing the two

How to score a CTF so it's easy to see which CTF is good or hard and which is bad or easy.

I do not see how to do that before the fact.

@pedromigueladao

pedromigueladao commented Feb 15, 2017

@kara71 you are not trying to beat your n-th best position. You are trying to beat your n-th best rating score (which would be computed as it is now: rating * your_score / winner_score).

@kara71

kara71 commented Feb 15, 2017

@pedromigueladao my bad, read that wrong.
It seems inevitable to have a weight vote, since a CTF's quality can only be judged by humans (until AIs take over the world). What I think would be a good system is weight voting similar to what's currently in place, but without people voting extreme values (I admit I often tend to give the max rating). We could give every team a score according to how close their votes are to the final rating. The rating would then be an average of everyone's votes weighted by their accuracy score. We should be careful, however, to balance the influence of a very accurate team vs. a low-accuracy team if we don't want the rating to be decided by a handful of teams (as currently happens with CS:GO's Overwatch system). I might propose some formulas for that system if you're interested; a rough sketch follows below.
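
Purely as an illustration of the kind of formula meant here (the accuracy function and the iteration count are arbitrary choices, not a finished proposal):

```python
def accuracy_weighted_rating(votes, rounds=5):
    """votes: {team: vote}. Iteratively weight each team's vote by how close it sits to the consensus."""
    rating = sum(votes.values()) / len(votes)  # start from the plain average
    for _ in range(rounds):
        # A team's weight shrinks as its vote moves away from the current consensus rating.
        weights = {t: 1.0 / (1.0 + abs(v - rating)) for t, v in votes.items()}
        rating = sum(weights[t] * v for t, v in votes.items()) / sum(weights.values())
    return rating

# The outlier vote of 100 is heavily down-weighted relative to the cluster around 25-30.
print(accuracy_weighted_rating({"a": 25, "b": 30, "c": 28, "d": 100}))
```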

@bl4de

bl4de commented Feb 15, 2017

I think @pedromigueladao's idea of "tiers" is something that should be considered as the foundation for a new scoring system. Maybe I'm wrong, but I think we can all agree that some CTFs are natural "Grand Slams" and are "the core" of every season.

Maybe this is something you should start from:

https://ctftime.org/ctfs

Consider the top 20 CTFs from this list as "Grand Slams" and leave their scores as they are right now, rounded to the nearest integer.
Other CTFs with at least 2 years of history become "Regulars", with their current score rounded as well.
Newcomers start with 5 points.

After each CTF, every participating team can vote up or down (note - a TEAM, not every player as it is now; the team is responsible for choosing a voting member, I suppose it will be the captain). Each vote is worth 5 points - up equals +5, down equals -5 pts, based on the whole team's experience during the CTF (a variant of @kara71's idea of excluding extreme values).

Pros:

  • it will be up to the organizers to make a CTF as fun and enjoyable as possible for more than one season. A well-deserved reputation will take years
  • reputation gained so far is preserved
  • if a CTF reaches the score of the last "Grand Slam" CTF, it takes that spot and the current last "Grand Slam" moves down to Regulars

Cons:

  • a really great newcomer will not have an opportunity to gain more than 5 points at once. But that can be resolved by an additional request/discussion here
@pedromigueladao

pedromigueladao commented Feb 15, 2017

@bl4de but would you consider capping the number of CTFs per tier that you could use for your ranking?

@Pharisaeus

Pharisaeus commented Feb 15, 2017

@pedromigueladao

I agree that you can upvote/downvote according to your performance and strategy, but notice that item 5 applies only to next year's competition. We only have this issue for new CTFs

Then it's broken from the start ;) The fact that a CTF was good this year does not mean it's going to be good next year. The same goes for bad ones. We've seen this already: some CTFs improve, others deteriorate, for various reasons. Often the people who organize them change (university teams change a lot over time, etc.). All "voting for next year" ideas will suffer from the same issue.

We could even define a threshold: say, if 60% consider it was a CTF-20, then most probably it was. It is fair unless 30+ teams are gaming the system...

This could work if many teams were interested in voting, which is often not the case. And I think you would be surprised how many teams try to play the system ;)

but wouldn't it be easier to just play a CTF-5 competition and win it (if you did well in a CTF-20 you might as well do it)

Not really, because the people who play are usually the same. There are some teams who skip easy CTFs, but apart from that you have a lot of teams playing most of the competitions. And between a CTF-5 and a CTF-20 there is little difference in the participant list, so it's just as hard to win one as the other.

@bl4de the downside of what you propose is that it would take many years for a CTF to climb any higher, even if its quality is high from the start.

@immerse

immerse commented Feb 15, 2017

@bl4de the downside of what you propose is that it would take many years for a CTF to climb any higher, even if its quality is high from the start.

Exactly. People have brought up that the ranking should motivate teams to play (and thus learn). But to me it seems just as important to motivate skilled organizers to host quality competitions -- we have far too few of these as it is!

@pedromigueladao

pedromigueladao commented Feb 15, 2017

@Pharisaeus I guess we have to assume that in a group of hackers everyone will try to game the system. Let's just try to make it more difficult, or simply not worth it.

And yes, a next-year policy is like an investment: no one knows what the future will bring. But notice that you would be deciding between CTF-5 and CTF-20. If you do not delay the decision to a later stage, I guess you will always have an incentive to vote according to your current result.

@bl4de and @Pharisaeus it is also always unfair for newcomers. E.g., we played Google CTF and HackTheVote last year, which in my opinion were very good competitions. These past two weeks we played 2 so-so CTFs, and most probably they'll be worth the same as the previous two...

@kara71

kara71 commented Feb 17, 2017

Glad to see that the exponential decay has its fans :)
I can run some analytics on 2016 data to tune the parameters if we agree on them.

@dcua

dcua commented Feb 17, 2017

LOL folks, you were all against voting/rating weight accumulation, thinking that dcua supported it because they wanted to cheat. Have fun with the 8 best results per year -- now do you see what the problem of team l33t-ization is?

@kyprizel I support your suggestion only as a lesson in why democracy matters :)

@kara71

kara71 commented Feb 17, 2017

@dcua Not everyone was against dcua in that case, maybe only the loudest voices. Data analytics showed no strong indicators of cheating from any top team.

@akrasuski1

akrasuski1 commented Feb 17, 2017

My previous comment was about scoring teams given event ranking. But scoring events is another matter.

Right now, we have voting. It doesn't seem to work well. It creates a lot of unnecessary conflict, with many people voting either 1 or the maximum, which doesn't really make sense in most cases. People feel obliged to score ~max when they see no obvious problems with the CTF, even if it is not extraordinary and even the voters would agree it's not that good when asked to compare it with a better one. Then other teams try to even out the ranking by voting 1... In the end, the score is mostly determined by the maximum allowed rating, which I believe is set by kyprizel based on the previous year's score and his own judgment. I don't think the overwhelming majority of teams vote tactically, but still, there are some obvious problems.

My proposition to fix it:

  • create an official guideline saying what a particular score means (e.g. 25 points - a nice CTF, but not significantly better than most; 0 points - not a CTF; 5 points - mostly non-CTF, recon, the site does not work, exceptionally bad translation or other significant issues; and so on)
  • normalize each team's votes. When a team constantly votes 50, it doesn't convey any message, it just disrupts the system. I'd say in this case we should take the average of the team's votes, maybe also the standard deviation, and scale them so they match general expectations (probably the pre-2016 distribution of roughly 25 avg +/- 10 stdev is fine? Someone would have to calculate the actual values based on the mostly accurate 2015 scores). A sketch of this normalization follows after this list.
  • if normalization is accepted, do not show raw votes in the votes view, but the normalized ones. Or alternatively, change them to descriptive labels (i.e. "above average" instead of 70 points, which after this particular team's normalization would mean 35 points).
  • when calculating the event score, consider dropping outliers - i.e. take maybe the middle 70% (or so) of the scores. This prevents one team voting off the charts from making the others' votes meaningless.
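
A minimal sketch of the normalization and outlier-trimming steps from the second and fourth bullets (the target mean of 25, stdev of 10, and the 70% trim are just the values suggested above):

```python
import statistics

def normalize_team_votes(votes, target_mean=25.0, target_stdev=10.0):
    """Rescale one team's vote history so its mean and stdev match the expected distribution."""
    mean = statistics.mean(votes)
    stdev = statistics.pstdev(votes) or 1.0  # avoid dividing by zero for constant voters
    return [target_mean + (v - mean) / stdev * target_stdev for v in votes]

def trimmed_event_score(votes, keep=0.7):
    """Average only the middle `keep` fraction of votes, ignoring off-the-charts outliers."""
    ranked = sorted(votes)
    drop = int(len(ranked) * (1 - keep) / 2)
    kept = ranked[drop:len(ranked) - drop] or ranked
    return statistics.mean(kept)

# A team that always votes around 50 gets pulled back towards the expected 25 +/- 10 range.
print(normalize_team_votes([50, 50, 48, 52]))
print(trimmed_event_score([1, 24, 25, 26, 27, 28, 29, 30, 31, 100]))  # 27.5, outliers 1 and 100 dropped
```
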
@kara71

kara71 commented Feb 17, 2017

What if teams create fake accounts to make their vote count more?

@akrasuski1

akrasuski1 commented Feb 17, 2017

I don't think fake accounts are a big issue currently, and even if they become one, additional steps to validate a team's authenticity could be introduced. In the end, I'd say most CTFers are reasonable and won't become so cynical as to cheat outright (as opposed to soft cheating - tactical voting).

@TheZ3ro

TheZ3ro commented Feb 17, 2017

It's not soft cheating or tactical voting. Voting will never be unbiased due to psychology.

If you win a CTF it's more likely that you vote high because for you winning it was nice.
On the other hand, if you lose a CTF it's more likely that you vote low, because the loss makes you feel bad about that specific CTF.
Either way, you vote without thinking about the CTF itself.

@xtrm0

xtrm0 commented Feb 20, 2017

@kyprizel @kara71 I think N best scores per tier + voting for tiers would solve both the participation and the rating problems. You could make the N for the 5-point tier large enough that weak teams still have a chance to move up a bit through small and easier CTFs, and teams like @dcua can still be rewarded for participating in every CTF (allowing professional CTF teams to get a small boost for always participating, while not preventing really good teams from winning by doing much better in the main events).
Each event's tier should be set beforehand, and voting would only influence next year's tier, in order to prevent score tangling. You could limit the number of CTFs in each tier, and only decide at the end of the year which CTFs should go up or down in tiers (based on the scores of each CTF, but also on the good judgement of the CTFtime admins).

@HugoDelval

HugoDelval commented Feb 27, 2017

Hi,
as a member of a small team, I can say that being rewarded for every performance you make (in small CTFs, for example) really helps build the team. Every member wants to make progress and even to organize new CTFs. Having a good ranking on CTFtime also helps with finding good sponsors. I think that if people can't see progress when playing regularly, they will be less motivated.

I feel that people who play CTFs regularly should be rewarded, as they contribute to the CTF community. I also feel that writeups should be rewarded in some way (based on the ranking of the writeup and the number of writeups for the challenge); writeups are one of the best ways to learn security and it would be awesome to reward them.

@ZetaTwo

ZetaTwo commented Feb 27, 2017

ZetaTwo from HackingForSoju here.

  1. I think Gynvael's idea of majors is really good. Here's one elaboration on it: decide beforehand each year which (top 10) CTFs are majors, rated at a fixed 100. Have a set (10-20) of mid-tier CTFs rated at 50-70. All the rest are rated by vote as before. At the end of the year, there's a vote: the 2 worst majors get downgraded to mid-tier, the two best mid-tier get promoted to major, the two worst mid-tier are thrown out and the two best from the "other" category are promoted to mid-tier (see the sketch after this list). All new CTFs are introduced as "other". This way a CTF needs at least 2 years of solid organizing to become a major.

  2. Any system which encourages not participating in a CTF is bad. First of all, we want as many people to play as much as possible to help grow the scene. It also creates incentives to impersonate teams to lower their rankings.
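
For illustration, a minimal sketch of the end-of-year promotion/relegation step from point 1 (assuming each list is already ordered from best-voted to worst-voted; the names and list handling are hypothetical):

```python
def end_of_year_shuffle(majors, mid_tier, other):
    """Each argument is a list of CTF names, best-voted first."""
    demoted_majors = majors[-2:]    # 2 worst majors drop to mid-tier
    promoted_mid   = mid_tier[:2]   # 2 best mid-tier become majors
    promoted_other = other[:2]      # 2 best "other" CTFs join mid-tier
    # mid_tier[2:-2] also drops the 2 worst mid-tier CTFs out of the rated pool entirely.
    new_majors = majors[:-2] + promoted_mid
    new_mid    = mid_tier[2:-2] + demoted_majors + promoted_other
    new_other  = [c for c in other if c not in promoted_other]  # new CTFs start here next year
    return new_majors, new_mid, new_other
```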

@kara71

kara71 commented Feb 27, 2017

@ZetaTwo I totally agree with your #2, and the system proposed in #1 totally makes sense, especially voting for small CTFs only.

@kyprizel

kyprizel commented Feb 27, 2017

Team impersonation can easily be solved with per-team auth tokens.
You say "mid-tier CTFs rated at 50-70" - so 50 or 70? :) Who decides which gets 50 and which gets 70?

@kara71

kara71 commented Feb 27, 2017

I guess the value in the 50-70 range is set in advance and is the same for all CTFs.

@ZetaTwo

ZetaTwo commented Mar 2, 2017

@kyprizel Yes, but that would require some coupling between the competitions and CTFtime, which would place more burden on the organizers. The nice thing today is that you can just take the scoreboard and more or less not care about the authenticity of its contents (as long as there isn't blatant cheating, of course).

@rofl0r

rofl0r commented Mar 5, 2017

I'd like the rating to be based on the relation between the total score made in a CTF (by all teams) and the score made by a specific team.
So if a CTF was worth, for example, 1M CTF-satoshis, and all teams together made 42650 points, of which the top team made 2000, the rating for that team would be (1M/42650)*2000 ≈ 46893 (see the sketch below the list of advantages).

[attached image: ctfrating]

advantages:

  • democratic: score 4x more than another team, get exactly 4x more rating points.
  • rewards skill: teams with the same score get the same number of satoshis (the fact that one team was faster doesn't mean it was more skilled, maybe just bigger or started earlier).
  • the more teams score, the smaller the differences between the distributed points, so on an easy CTF the points will be more evenly distributed.
    ...
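
A minimal sketch of the scheme above, reproducing the numbers from the example (the 1M "CTF-satoshi" pool is the one proposed; the score list is hypothetical):

```python
def satoshi_rating(team_score, all_scores, pool=1_000_000):
    """Split a fixed pool among teams in proportion to their share of the total score."""
    return pool * team_score / sum(all_scores)

# Example from above: all teams together scored 42650 points, the top team 2000 of them.
scores = [2000, 1500, 1000] + [50] * 763  # hypothetical long tail, summing to 42650
print(round(satoshi_rating(2000, scores)))  # ~46893
```
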
@ZetaTwo

ZetaTwo commented Mar 6, 2017

@rofl0r While the core idea is good, the problem is that this creates incentives for teams to create fake teams to affect the worth of the competition. This may or may not be something which can be mitigated but some teams will definitely try this.

Also, I really think it's time to decide on something for this season and go with it. Some kind of system is better than no scoreboard at all.

@kara71

kara71 commented Mar 6, 2017

Totally agree with @ZetaTwo: the few problems raised with the previous version of the rating do not really matter; the actual problem is not having a scoreboard at all. If discussing and implementing the new scoring takes long, we should consider bringing back the old one temporarily while the new version is implemented.
@gynvael @kyprizel

@kyprizel

kyprizel commented Mar 8, 2017

@rofl0r the old formula already used event place and points.

OK, it looks like there are no specific formulas, so I'm doing it on my own (as usual, sorry).
Well, here are the changes in the 2017 rating:

  1. Only the 10 best results are counted (play better CTFs to get more points; see the sketch after this list).
  2. Last year's event weight is counted if available and >0.
  3. If an event runs for the first year (or the last event's weight == 0), its weight will be determined by the voting process (max weight limited to 25 pts). In some extraordinary cases we can discuss it, but it should be something really cool.
  4. An event's weight can be increased up to x1.5 for Jeopardy and x2 for Attack-Defence vs. the previous year's weight, but this will take effect only for next year's rating (and may also be normalized).
  5. Event organizers still get 2*weight points for the event they organize (counted as one of the 10 best results if applicable).
  6. The vote of the team that took 1st place at a new event is not counted for the current year.
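
For illustration, a minimal sketch of rule 1 (summing only a team's 10 best per-event rating points; the per-event points themselves still come from the existing CTFtime formula, which is not reproduced here):

```python
def yearly_rating(event_points, best_n=10):
    """Sum only the team's `best_n` highest per-event rating points."""
    return sum(sorted(event_points, reverse=True)[:best_n])

# A team that played 14 events is scored on its 10 strongest results only.
results = [42.0, 35.5, 30.1, 28.0, 25.0, 24.4, 20.0, 18.3, 15.0, 12.2, 9.9, 5.0, 3.1, 1.0]
print(yearly_rating(results))  # 250.5
```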

Happy hacking!

@kyprizel kyprizel closed this Mar 8, 2017

@rofl0r

rofl0r commented Mar 9, 2017

@kyprizel the BSides CTF https://ctftime.org/event/414 is rated at 24.20, but all teams got a score of 0.000. Bug?

@kyprizel

kyprizel commented Mar 11, 2017

That was a cache issue. Please report it if you see more.

@adrienstoffel

adrienstoffel commented Mar 28, 2017

Hey @kyprizel, I don't know if we have the same cache issue as the BSides CTF, but Insomni'hack 2017 (https://ctftime.org/event/383) got 0 points with a future weight of 0, although it was rated in previous years (https://ctftime.org/ctf/56) and is open to all.

@kyprizel

kyprizel commented Mar 28, 2017

@adrienstoffel nope - I just forgot to mark it as "public votable". Fixed.

@Pharisaeus

Pharisaeus commented Mar 28, 2017

@kyprizel but why is the max score for Insomni'hack 25 if last year it got 45 and even the teaser got 30? :) Also, I thought that current voting was supposed to apply to the next year, not the current year, in the case of recurring events?

@kyprizel

kyprizel commented Mar 28, 2017

Fixed.

@gynvael

gynvael commented Mar 29, 2017

@kyprizel
QQ: at what time is the ranking recalculated? I.e., is it recalculated after next-year-weight voting closes, or immediately when the points are available?
It seems to be the former (i.e. points for the aforementioned Insomni'hack are not yet taken into account in the ranking)?

Thanks!

@gynvael

gynvael commented Apr 3, 2017

@kyprizel
So there is still something weird about https://ctftime.org/event/383:

  1. I don't think it's included in the ranking score (i.e. DS's best 10 scores sum to ~260, but it's at 176 pts now).
  2. The future weight is 0, even though the voting has ended and it should be much more than 0.
    Could you take a look?
@kyprizel

kyprizel commented Apr 14, 2017

It's calculated when the points are available.
That was a bug - it should be fixed now.
Please open a new issue if not.
