Penalty / Award Standardization - Duplicate Report PoC Thoroughness #90

Closed · alex-ppg opened this issue Feb 23, 2023 · 26 comments
alex-ppg commented Feb 23, 2023

Hello everyone, today I would like to discuss an issue with regard to penalty / award standardization for duplicate reports.

Description

Currently, there is no strict guideline in the C4 documentation concerning how a duplicate finding is meant to be penalized or how a finding is selected for the report. The following excerpt is the only guideline:

> However, any submissions which do not identify or effectively rationalize the top identified severity case may be judged as “partial credit” and may have their shares in that finding’s pie divided by 2 or 4 at a judge’s sole discretion (e.g. 50% or 25% of the shares of a satisfactory submission in the duplicate set).
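For concreteness, here is a rough Python sketch of how that division of shares plays out (the pool value and share weights are hypothetical; C4's actual awarding algorithm has additional inputs):

```python
# Hypothetical illustration of the partial-credit rule quoted above.
# The pool value and one-share baseline are made-up numbers; C4's real
# awarding algorithm has additional inputs (risk level, duplicate-count
# discounts, etc.).

finding_pool = 1000.0  # hypothetical reward allocated to this duplicate set

# Each satisfactory duplicate holds 1 share; partial credit divides a
# submission's shares by 2 or 4 at the judge's discretion.
shares = {
    "warden_a": 1.0,      # satisfactory submission
    "warden_b": 1.0 / 2,  # partial-50: missed the top severity case
    "warden_c": 1.0 / 4,  # partial-25
}

total = sum(shares.values())  # 1.75
for warden, share in shares.items():
    print(f"{warden}: {finding_pool * share / total:.2f}")
# warden_a: 571.43, warden_b: 285.71, warden_c: 142.86
```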

Argument

The point I would like to discuss is the necessity of a PoC (Proof-of-Concept) containing a code example. I believe that a Proof-of-Concept can be textual, contain no example exploit code, and still be considered a valid finding. In the auditing industry, it is fairly standard for reports to contain purely textual descriptions of how a vulnerability can be exploited (based on personal experience as well as publicly available reports).

A thorough textual description of how someone can exploit a vulnerability should be considered equivalent to a Proof-of-Concept that demonstrates the vulnerability in practice with code. The outputs of a contest report are, in most cases, compiled into a browsable web page. This is done to contribute to the ecosystem and allow users to look up issues that have historically occurred and avoid repeating them.

By incentivizing code-related Proof-of-Concept submissions rather than textual descriptions, we end up with PoCs that are specific to the project rather than to the vulnerability. Ensuring both types of findings are treated on equal grounds is, in my opinion, the ideal approach for the benefit of both the projects and the ecosystem.

Examples Requested

Wardens and judges alike are welcome to provide a tangible example of a judgement discrepancy arising from PoCs.

As things stand, the most observable pattern is that submissions with a PoC are selected for the report over ones without one, which does bear on the topic discussed in this issue.

Parting Remarks

I would be glad to hear some judges' feedback on this, and it would be good to clarify in the Code4rena documentation how penalization, as well as selection-for-report, works.

GalloDaSballo commented Feb 28, 2023

Regarding POCs

My personal recommendation is to award POCs via the "Selected for Report" label, especially if the POC is a test the Sponsor can use.

But I'm also against punishing a report because there's no coded POC, and I believe my judging above was congruent with those beliefs.

Looking forward to other people's feedback

Edit: removed an old comment about examples that have since been removed from the issue.

alex-ppg commented Feb 28, 2023

Thanks for the feedback @GalloDaSballo!

The core idea here is to discuss whether a code-related PoC should affect any form of judgement, either punitive or positive.

As a code-related PoC is not called for in the submission form, favouring findings that include one effectively downgrades all the remaining ones.

I believe this needs to be strictly clarified in either the documentation of C4 or the submission form itself, as presently a hidden grading rubric exists whereby findings with a code-related PoC can be awarded a higher amount.

@alex-ppg

To reiterate and highlight: the focus of the issue is not the exemplary cases but the way finding submissions are meant to be treated.

Creating a code-related PoC when it is known to affect the grade of a finding is a guideline most wardens will follow; however, it is presently unclear whether it has an impact on the final outcome of a submission.

@GalloDaSballo

I would reiterate that the best report is the one that explains the problem in the most thorough yet concise way, and naturally a coded demonstration does that better than most other options.

However, on average, I believe that a lack of a POC should not be used to downgrade a report.

I would instead encourage Judges to reward the presence of a coded POC via the "Selected For Report" label.

@alex-ppg

A selected-for-report finding is awarded a higher reward than its duplicates for the same finding due to the use of a shared pool system (see the rough sketch at the end of this comment).

As such, a guideline on selecting code PoCs must be incorporated into the docs of C4 to ensure an even playing field for all wardens.

As an individual, I agree with @GalloDaSballo's assessment and believe that code PoCs should be rewarded for the effort required to produce them in the first place.

It would be good to get some more feedback from other judges to come to a grading agreement for duplicate findings on this topic. Once that's done, the relevant sections of C4 (i.e. Judging Criteria, Submission Policy) need to be updated to match the discussion held here.
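To illustrate the shared-pool effect, a rough Python sketch (the pool value and the +30% share bonus for the selected report are assumptions for illustration, not confirmed parameters):

```python
# Rough sketch of the shared-pool effect: every duplicate draws from the
# same pie, and the selected-for-report submission holds extra shares,
# diluting the rest. The pool value and the +30% share bonus are
# illustrative assumptions, not confirmed C4 parameters.

finding_pool = 1000.0

shares = {
    "warden_a": 1.3,  # selected for report (+30% shares, assumed)
    "warden_b": 1.0,
    "warden_c": 1.0,
}

total = sum(shares.values())  # 3.3
for warden, share in shares.items():
    print(f"{warden}: {finding_pool * share / total:.2f}")
# warden_a: 393.94 vs. 303.03 for each duplicate
```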

alex-ppg changed the title from "Penalty Standardization - Duplicate Report PoC Thoroughness" to "Penalty / Award Standardization - Duplicate Report PoC Thoroughness" on Feb 28, 2023
@kirk-baird

I think coded PoCs are important and bring benefit to the sponsor and to judging, especially when there is a complex issue that is not immediately clear from text. A coded PoC shows exactly how the issue can be exploited. Oftentimes one or two written paragraphs do not make clear how the attack works; supplementing the text with a coded PoC makes the explanation more thorough.

I agree with @GalloDaSballo that a coded PoC should not be necessary to receive 100%. Some issues are so simple that one is unnecessary. However, wardens who do not provide a PoC risk a 25/50% rating if they have not explained the attack in enough detail or thoroughness.

I'm not sure it is necessary to say a PoC is required or encouraged for an issue to be promoted to "best". That is because "best" is given to the issue that provides the most benefit to the sponsor. Oftentimes there will be another issue that explains the impact better or has a better recommendation, and it could be the "best" without a coded PoC.

I would be for adding some criteria required for Medium, and separate criteria for High, for awarding / downgrading. For example, criteria could include the following (a rough sketch in code follows the list):

  • Quality of writing for each of the following (e.g. deduct for one-line answers)
    • Impact
    • PoC
    • Recommendation
  • Sufficient formatting + code snippets
  • Coded PoC
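As a rough structured sketch of what such a rubric might look like (entirely hypothetical; the grade values and thresholds are mine, and the final call would remain with the judge):

```python
# Hypothetical sketch only: one possible structured form of the rubric
# above. The grade values and thresholds are illustrative, not C4 policy.

def suggested_outcome(grades: dict[str, str]) -> str:
    """grades maps each criterion ('impact', 'poc', 'recommendation',
    'formatting', 'coded_poc') to 'good', 'weak', or 'missing'."""
    weak = [c for c, g in grades.items() if g != "good"]
    if not weak:
        return "full reward, candidate for best report"
    if len(weak) == 1:
        return "full reward at the judge's discretion"
    return "partial credit (50% or 25%) at the judge's discretion"

# Example: a well-written issue without a coded PoC.
print(suggested_outcome({
    "impact": "good", "poc": "good", "recommendation": "good",
    "formatting": "good", "coded_poc": "missing",
}))  # -> full reward at the judge's discretion
```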

I think these criteria should be documented for wardens to see; however, grading will still require a subjective decision by the judge. Due to the possible variance between issues, a set of grading criteria will still need to be supplemented by a judge's opinion.

alex-ppg commented Mar 1, 2023

To continue the discussion, I'd like to contribute a couple of thoughts.

Given that there are a few innate criteria by which submissions are judged, I believe they should be listed in the documentation to aid wardens in increasing the quality of their work, thereby aiding in the whole lifecycle of a C4 contest's submission (judging, disputing, penalizing, etc.).

A grading system similar to the one in place for QA reports may be wise to implement for medium- / high-severity findings, unless that is considered too much of a burden on the judges' time. I would like to detail an example adjustment to the "Judging Criteria" documentation page of C4, in particular to "Duplicate Submissions":

Duplicate Submissions

Should multiple submissions describing the same vulnerability be submitted, Judges have the discretion to place these bugs into the same bucket, in which case the award will be shared among those who submitted. However, multiple submissions from the same warden (or warden team) are treated as one by the awarding algorithm and do not split the pie into smaller pieces.

To ensure an even playing field for all wardens and mandate a certain level of quality in all submissions, a set of criteria needs to be evaluated when assessing whether a duplicate finding is meant to receive the full reward, a penalization, or the "selected-for-report" bonus.

General Guidelines

Beyond the guidelines meant to eliminate low-effort submissions (i.e. formatting, correct language, etc.), each submission should be graded on a simple A-to-C scale in the following categories:

  • Impact
  • PoC
  • Recommendation

Impact

The impact chapter should be graded based on the accuracy of the statements made within it as well as how thorough and concise its description is. One of the main principles of C4 is that a warden fully understands the vulnerability they submit; as such, an impact chapter containing an inaccurate description of how the system is affected may become eligible for penalization at the Judge's discretion.

PoC

To adequately illustrate the impact of a vulnerability, a short yet easy-to-digest description must be provided for both the Judge and the sponsors to review. When it comes to code, a tangible example of how the vulnerability can be exploited is generally better received and understood than a textual description of it.

To this end, we advise wardens to provide a code-based PoC if they find that their textual description of the PoC spans multiple lines/paragraphs and is generally hard to understand. Simple vulnerabilities may be possible to explain in text; however, a code-based PoC that executes successfully for the described vulnerability guarantees that the vulnerability's impact is conveyed properly.

Recommendation

The recommendation chapter is meant to provide instructions to the sponsors of a contest as to how they should remediate the vulnerability described and demonstrated in the previous chapters. As multiple avenues to remediate a vulnerability may exist, a single properly described one is sufficient to render the finding eligible for a full reward.

For this chapter, code snippets that visualize how the advised changes would be applied to the codebase greatly increase the usefulness of the submission for the sponsor and thus its overall quality. As a final note, if the recommended course of action in this chapter does not adequately remediate the vulnerability or introduces another one, the finding may become eligible for penalization at the Judge's discretion.

Medium Risk Findings

While medium-risk findings should be graded using the same rubric as high-risk findings, judges can be a little more tolerant when only one of the chapters of the submission is of inadequate quality. This is not meant to make medium-severity findings rarely eligible for penalization; rather, it is meant to lower the effort required to grade them.

Medium-risk findings are generally submitted in higher numbers than high-risk findings, and applying the same level of scrutiny as to high-risk findings would significantly increase the time spent grading each contest and thus the burden on each Judge.

High Risk Findings

As high-risk findings tend to be more complex than other categories, it is imperative that the submission details how the described vulnerability can be exploited in a clear way. The highest level of scrutiny should be applied to findings of this category, ensuring awarded submissions are of exceptional quality and increasing the competitiveness of each high-severity finding's reward pool.

Should any of the chapters in the submission fail to articulate the finding in a professional and correct way, the finding may become eligible for penalization at the Judge's discretion.

DISCLAIMER

The above is merely an example; I understand that the text may be incorrect or targeted to the wrong audience (i.e. wardens instead of judges and vice-versa). Perhaps a simpler approach is needed or the grading guidelines are too harsh. Feedback is greatly appreciated and expected from judges to keep this discussion alive and make some actual changes to the C4 documentation!

To note, these changes are not meant to change how findings are presently awarded/graded. Instead, they are meant to describe the existing methodology employed as it is currently not described in the documentation of C4 and is instead in the mind of each individual Judge. Submission grading has been and will remain at each Judge's sole discretion due to the nature of C4.

As a final remark, the grading system above is not expected to be provided as a reply on each given exhibit. Rather, it is meant to serve as a tool Judges can employ, in a less subjective and commonly agreed-upon way, whenever their verdict on a particular finding is questioned either by wardens or a potential appeal committee.

alcueca commented Sep 25, 2023

I think that much of the controversy is due to framing the grading as penalties, such as in "if you don't provide a PoC, you'll be penalized by 50% of the reward".

Consider this phrasing now: "If you make the effort to code a PoC, you'll get double points".

I like PoCs in C4. We get inexperienced wardens who would benefit from writing more of them, and we would get many fewer invalid submissions if more PoCs were coded. Not only that, but it is easier for sponsors and judges to see the assumptions taken by the warden in a PoC.

While I understand that very experienced wardens could write text PoCs that are as good as coded PoCs, I think those wardens are in the minority.

In general, I like rewarding wardens, of all skill levels, for going the extra mile and coding a PoC, which keeps invalid submissions down and lightens the work of lookouts, sponsors, and judges.

In Maia/Ulysses I've announced to the wardens that the judging policy will work as follows, and I'll report on the results:

  • High/Mediums without a PoC in a group of duplicates get 50%. Inadequate Impact or Recommendation brings the reward down to 25%. Only submissions with 100% can be selected for report.
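In code form, the policy reads roughly like this (a sketch only; the names are mine, and anything not covered stays at the judge's discretion):

```python
# Sketch of the policy above; function and flag names are mine, and
# anything not covered stays at the judge's discretion.

def reward_multiplier(has_coded_poc: bool, impact_ok: bool, rec_ok: bool) -> float:
    """Share multiplier for a High/Medium inside a group of duplicates."""
    multiplier = 1.0
    if not has_coded_poc:
        multiplier = 0.50  # no coded PoC in the duplicate group -> 50%
    if not (impact_ok and rec_ok):
        multiplier = 0.25  # inadequate Impact or Recommendation -> 25%
    return multiplier

def eligible_for_report(multiplier: float) -> bool:
    # Only submissions at 100% can be selected for report.
    return multiplier == 1.0
```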

@GalloDaSballo

> I think that much of the controversy is due to framing the grading as penalties, such as in "if you don't provide a PoC, you'll be penalized by 50% of the reward".
>
> Consider this phrasing now: "If you make the effort to code a PoC, you'll get double points".
>
> I like PoCs in C4. We get inexperienced wardens who would benefit from writing more of them, and we would get many fewer invalid submissions if more PoCs were coded. Not only that, but it is easier for sponsors and judges to see the assumptions taken by the warden in a PoC.
>
> While I understand that very experienced wardens could write text PoCs that are as good as coded PoCs, I think those wardens are in the minority.
>
> In general, I like rewarding wardens, of all skill levels, for going the extra mile and coding a PoC, which keeps invalid submissions down and lightens the work of lookouts, sponsors, and judges.
>
> In Maia/Ulysses I've announced to the wardens that the judging policy will work as follows, and I'll report on the results:
>
> • High/Mediums without a PoC in a group of duplicates get 50%. Inadequate Impact or Recommendation brings the reward down to 25%. Only submissions with 100% can be selected for report.

I would recommend not enforcing this approach literally.

There are many types of findings that don't necessarily require a POC.

Saying that, in general, findings with a POC will be preferred makes sense, and I personally have consistently given the "Chosen for Report" label to findings with a coded POC over those without one.

However, there are many scenarios where a coded POC doesn't make sense, such as incorrect variables, reverts due to non-existent functions, incorrect string hashing, logical flaws, or a lack of access control.

I think there are a lot of exceptions to the rule of needing a POC, and I don't recommend enforcing this idea literally.

alcueca commented Sep 25, 2023

While I feel that there are probably cases where a PoC doesn't make sense, I don't think they would be that common.

I've applied this rule already in the past three contests I've judged, and for all valid findings with duplicates, someone has always coded a PoC.

If a PoC is indeed not possible for a vulnerability, then no warden will code a PoC for it, and all wardens will receive the same reward (with respect to the PoC criterion).

trust1995 commented Sep 25, 2023

PoCs serve as the ultimate proof that the accompanying hand-waving actually holds. They're also highly useful to the sponsor for triaging, as well as for adding as regression tests.
Having said that, I don't think a lack of a PoC by itself is grounds for penalizing submissions. Typically I would require either (a) a thorough, hand-in-hand explanation of the impact supported by code snippets, or (b) a general explanation plus a PoC. Lacking both is grounds for 50/25 or, in the case of a solo find, possibly closing due to insufficient proof with a comment: "come back with a PoC during judging QA".

We can have an in-depth discussion around whether defaulting to partial scoring without PoCs is superior. The fundamental underlying issue is the fine balance of time overhead per submission for wardens vs. judges. This approach would tilt things heavily in the judges' favor.

PoCing is great but takes a long time; sometimes just the setup will take hours if there are config issues. For newer wardens, time spent PoCing is time we don't give them to explore deeper areas of the code. Penalizing may also make them withhold important findings. For top wardens, PoCing very clear issues increases fatigue and isn't necessary for them to confirm the issue. For judges, if the report is not convincing without a PoC, they are entitled to close the issue (even if it ends up being legit, if it wasn't explained well enough it is not eligible for rewards). I believe balance can be maintained this way, without tilting the burden of proof over the edge.

IllIllI000 commented Sep 25, 2023

Most of the coded POC complaints come from senior wardens who know how long it will take to write a POC for every finding. If you require them to write coded POCs, then they'll have less time to dig into every area of the code, and you'll either get fewer findings from them (bad for the sponsor), or they'll be forced to take the lower payout (bad for them and for C4 in the long term). @alcueca what would be your plan for cases where the warden didn't provide a coded POC, but is the only one that found the finding? I'd hope that they'd get the maximum points for those cases. I see that you said in a group of duplicates, but I'd just like to confirm, since you've also suggested changing the semantics from a penalty to a bonus.

alcueca commented Sep 25, 2023

> @alcueca what would be your plan for cases where the warden didn't provide a coded POC, but is the only one that found the finding? I'd hope that they'd get the maximum points for those cases. I see that you said in a group of duplicates, but I'd just like to confirm, since you've also suggested changing the semantics from a penalty to a bonus.

If it's a solo finding, or if no one submits a PoC in a group of duplicates, then by logic no reward modifiers would be applied.

alcueca commented Sep 25, 2023

> Most of the coded POC complaints come from senior wardens who know how long it will take to write a POC for every finding. If you require them to write coded POCs, then they'll have less time to dig into every area of the code, and you'll either get fewer findings from them (bad for the sponsor), or they'll be forced to take the lower payout (bad for them and for C4 in the long term).

I think this nails it on the head.

I would suggest that the senior wardens team up with more junior wardens (not necessarily in an equal split). The senior wardens can do as they did before, and leave the PoC coding to the junior wardens on their team.

Another thing I could do is apply this rule only to Highs and use a more subjective appraisal for Mediums. There are not that many distinct valid Highs per contest, so it is not unreasonable to ask for those to have a coded PoC.

@IllIllI000

Setting up and maintaining a team sounds like a lot of work in and of itself. Some may not already have that expertise, or be interested in spending time developing it. At one point there was a C4 initiative to create mentorship, but that seems not to have gotten off the ground. A big draw of these contests is that you can do them when you have time, and not have to worry about soft skills.

@DadeKuma

I believe that a coded PoC shouldn't be mandatory to receive a full reward. As others have already mentioned, this requirement has too many drawbacks:

  • Most seniors don't submit a PoC, which might lead them to migrate to another platform if it becomes mandatory
  • Creating a PoC is time-consuming, which could result in fewer findings
  • Some issues may not have a feasible coded PoC

It's not necessary for the judge to have a coded PoC for all the duplicates; one is sufficient. Therefore, the 30% bonus for the best report could be used to reward it, as @GalloDaSballo is suggesting.

xuwinnie commented Sep 26, 2023

I also believe that a coded PoC shouldn't be required to receive a full reward. I understand that enforcing PoCs would reduce invalid issues; however, I believe it is the duty of lookouts to filter out those invalid issues rather than increasing the burden on wardens. As long as the sponsor can understand the issue clearly, I don't think it makes sense to apply a penalty (a reward for a PoC = a penalty for no PoC).

alcueca commented Sep 26, 2023

> Most of the coded POC complaints come from senior wardens who know how long it will take to write a POC for every finding. If you require them to write coded POCs, then they'll have less time to dig into every area of the code, and you'll either get fewer findings from them (bad for the sponsor), or they'll be forced to take the lower payout (bad for them and for C4 in the long term).

As a senior researcher you can also be smart about which PoCs you code, and which ones you omit. Obvious vulnerabilities will be found by many wardens, subtle ones by few wardens. Code the PoCs for the subtle ones first, as they will give you a bigger payout.

Which vulnerabilities are the subtle ones? Those that required you to fully understand the protocol, not the ones that were a matter of pattern-matching.

So maybe dig into every area first, and then code the PoCs optimizing for those that you think have better chances of having few duplicates.

Let the more junior wardens (competing against you) code the lower value PoCs for the more obvious vulnerabilities.

Note that the pool of money to be distributed is the same. If everyone submits fewer reports in the same measure because everyone spends more time coding PoCs, then everyone still gets the same payouts as before.

@xuwinnie

> If everyone submits fewer reports in the same measure because everyone spends more time coding PoCs, then everyone still gets the same payouts as before.

IMO, the only benefit of this is reduced workload for the judge and lookout; the sponsor gets fewer reports, and wardens spend time writing unnecessary code (when the issue can be explained clearly in words) instead of finding bugs.

@hansfriese

I personally do not agree with making it mandatory to submit a coded PoC.

A non-exhaustive list of reasons:

  • Protocols are often in a pre-deployment phase, and a test suite might not be ready for wardens.
    Spending hours resolving test suite setup issues (probably by numerous wardens) is very inefficient.
  • Some findings are difficult to write a PoC for (for example, a case that requires multiple actors and external protocol integrations).
  • Obvious logic errors are not always easy to spot. (Example 1, Example 2)
    A scenario-based explanation is more than enough for this kind of submission, and I don't think it's desirable to force wardens to spend time on a PoC.

By the way, I am down to increasing the bonus for "Selected For Report" to encourage wardens further.

IllIllI000 commented Sep 26, 2023

> As a senior researcher you can also be smart about which PoCs you code, and which ones you omit.

As was stated by somebody on Twitter, people would also actively avoid contests where they knew PoCs would be required, in favor of other concurrent ones. This may include a lot of senior wardens.

0xA5DF commented Sep 26, 2023

I'm also not in favor of making PoCs mandatory.
There are more cost-effective ways to fight spam, like the penalization for invalid submissions that was suggested in the past. Making a PoC mandatory would be much more costly for C4 for the reasons mentioned in the comments above.

Also, there have been a few cases of invalid submissions that had a PoC; making PoCs mandatory might bring a surge of spammy submissions with a PoC, which might be much more difficult to handle (good luck trying to understand why the PoC runs perfectly fine despite the issue being clearly invalid).

alcueca commented Sep 26, 2023

Fine, whatever. I will grade submissions based on quality, and lack of PoC won't automatically mean 50%.

alcueca commented Sep 26, 2023

Enjoy your subjectivity y'all.

@kartoonjoy

Per the Autumn 2023 C4 Supreme Court session, the Supreme Court's verdict on this issue is:

We will address this as part of a clarification on the core criteria of submission and how lacking certain components would affect scoring.

The requisites of a full mark report are:

  • Identification and demonstration of the root cause
  • Identification and demonstration of the maximum achievable impact of the root cause

For a demonstration to be satisfactory, it can take the form of:

  1. A step-by-step explanation from root cause to impact, with either attached code snippets or a coded POC.
  2. At the judge’s discretion, a highly standard issue can be accepted with less detail provided.

A lack of identification of the root cause is grounds for partial scoring, downgrading, or invalidating the issue.
A lack of identification of maximal impact is grounds for partial scoring or downgrading of the issue.

Additional factors that can be taken into account for partial scoring include:

  • Quality of the submission, in the form of writing or presentation
  • Lack or incorrectness of the remediation steps

Link to verdict: https://docs.google.com/document/d/1Y2wJVt0d2URv8Pptmo7JqNd0DuPk_qF9EPJAj3iSQiE/edit#heading=h.70jqb921kz0g
