Penalty / Award Standardization - Duplicate Report PoC Thoroughness #90
Comments
Regarding PoCs: My personal recommendation is to award PoCs via the "Selected for Report" label, especially if the PoC is a test the Sponsor can use. But I'm also against punishing a report because there's no coded PoC, and believe I was congruent with my own beliefs in judging above. Looking forward to other people's feedback.
Thanks for the feedback @GalloDaSballo! The core idea here is to discuss whether a code-related PoC should affect any form of judgement, either punitive or positive. As a code-related PoC is not advised in the submission form, favouring findings with a code-related PoC works as if we are downgrading all the remaining ones. I believe this needs to be strictly clarified in either the documentation of C4 or the submission form itself, as presently a hidden grading rubric exists whereby findings with a code-related PoC can be awarded a higher amount.
To reiterate and highlight: the focus of the issue is not the exemplary cases but the way finding submissions are meant to be treated. Creating a code-related PoC when it is known to impact the grade of a finding is a guideline that most wardens will follow; however, it is presently unclear whether it has an impact on the final outcome of a submission.
I would reiterate that the best report is the one that explains the problem in the most thorough yet concise way, and naturally a coded demonstration does that better than most other options. However, on average, I believe that a lack of a PoC should not be used to downgrade a report. I would instead encourage Judges to reward the presence of a coded PoC via the "Selected for Report" label.
A selected-for-report finding is awarded a higher reward than its duplicates for the same finding, due to the usage of a shared pool system. As such, a guideline to select code PoCs must be incorporated into the docs of C4 to ensure an even playing field for all wardens. As an individual I agree with @GalloDaSballo's assessment and believe that code PoCs should be awarded for the effort required to produce them in the first place. It would be good to get some more feedback from other judges to come to a grading agreement for duplicate findings on this topic. Once that's done, the relevant sections of C4 (i.e. Judging Criteria, Submission Policy) need to be updated to match the discussions engaged here.
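For readers unfamiliar with the shared pool system referenced above, the mechanics can be sketched roughly as follows. This is a minimal illustrative sketch, not the authoritative C4 algorithm: the base slice, the 0.9 decay per additional duplicate, and the 30% selected-for-report bonus are assumptions modeled loosely on the publicly documented awarding process, and the function name `award_shares` is hypothetical.

```python
# Rough sketch of a shared-pool duplicate award split. The constants
# (base slice of 10, 0.9 decay per extra duplicate, 1.3x bonus for the
# report selected for inclusion) are illustrative assumptions, not the
# exact C4 formula.

def award_shares(num_dupes: int, base_slice: float = 10.0) -> dict:
    """Return per-warden point shares for a finding with num_dupes duplicates."""
    # The pie shrinks as more wardens submit the same finding...
    pie = base_slice * (0.9 ** (num_dupes - 1))
    per_warden = pie / num_dupes
    return {
        "regular": per_warden,
        # ...and the duplicate selected for the report earns a 30% bonus,
        # which is the lever a judge can use to reward a coded PoC.
        "selected_for_report": per_warden * 1.3,
    }

print(award_shares(num_dupes=4))
print(award_shares(num_dupes=1))  # solo finding: full slice, no split
```

Under this shape, the selected-for-report bonus is the only per-duplicate differentiator, which is why the discussion centers on whether a coded PoC should decide who receives it.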
I think coded PoCs are important and bring benefit to the sponsor and judging, especially when there is a complex issue that is not immediately clear from text. Coded PoCs show exactly how the issue can be exploited. Oftentimes writing out one or two paragraphs does not make clear how the attack works; supplementing the text with a coded PoC makes the explanation more thorough. I agree with @GalloDaSballo that coded PoCs should not be necessary to receive 100%. Some issues are so simple that it is unnecessary. However, users who do not have a PoC risk a 25/50% rating if they have not explained the attack in enough detail or thoroughness. I'm not sure if it is necessary to say a PoC is required or encouraged to promote an issue to be "best". That is because "best" is given to the issue that provides the most benefit to the sponsor. Oftentimes there will be another issue which has explained the impact better or has a better recommendation and could be the "best" without a coded PoC. I would be for adding some criteria required for Medium and separate criteria for High for awarding / downgrading; e.g. a couple of examples of criteria could be:
I think these criteria should be documented for wardens to see; however, it's still going to require a subjective decision by the judge. Due to the possible variance of each issue, having a set grading criteria will still need to be supplemented by a judge's opinion.
To continue the discussion, I'd like to contribute a couple of thoughts. Given that there are a few innate criteria by which submissions are judged, I believe they should be listed in the documentation to aid wardens in increasing the quality of their work, thereby aiding the whole lifecycle of a C4 contest's submission (judging, disputing, penalizing, etc.). A grading system similar to the one in place for Q&As may be wise to implement for medium / high severity findings, unless that is considered too much of a burden on the judges' side time-wise. I would like to detail an example adjustment to the "Judging Criteria" documentation page of C4 and in particular "Duplicate Submissions":

Duplicate Submissions

Should multiple submissions describing the same vulnerability be submitted, Judges have the discretion to place these bugs into the same bucket, in which case the award will be shared among those who submitted. However, multiple submissions from the same warden (or warden team) are treated as one by the awarding algorithm and do not split the pie into smaller pieces. To ensure an even playing field for all wardens and mandate a certain level of quality in all submissions, there is a set of criteria that needs to be evaluated when assessing whether a duplicate finding is meant to receive the full reward, a penalization, or the "selected-for-report" bonus.

General Guidelines

Beyond the guidelines meant to eliminate low-effort submissions (i.e. formatting, correct language, etc.), each submission should be graded on a simplistic system ranging from
Impact

The impact chapter should be graded based on the accuracy of the statements made within it as well as how thorough and concise its description is. One of the main principles of C4 is that a warden comprehends the vulnerability they submit in full; as such, an impact chapter containing an inaccurate description of how the system is affected may become eligible for penalization at the Judge's discretion.

PoC

To adequately illustrate the impact of a vulnerability, a short yet easy-to-digest description must be provided for both the Judge and the sponsors to review. When it comes to code, a tangible example of how the vulnerability can be exploited is generally better received and understood than a textual description of it. To this end, we advise wardens to provide a code-based PoC if they detect that their textual description of the PoC spans multiple lines/paragraphs and is generally hard to understand. Simple vulnerabilities may be possible to explain in text; however, a code-based PoC that executes successfully for the described vulnerability guarantees that the vulnerability's impact is conveyed properly.

Recommendation

The recommendation chapter is meant to provide instructions to the sponsors of a contest as to how they should remediate the vulnerability described and demonstrated in the previous chapters. As multiple avenues to remediate a vulnerability may exist, a single properly described one is sufficient to render the finding eligible for a full reward. For this chapter, code snippets that visualize how the advised changes would be applied to the codebase greatly increase the usefulness of the submission for the sponsor and thus its overall quality. As a final note, if the recommended course of action in this chapter does not adequately remediate the vulnerability or introduces another, the finding may become eligible for penalization at the Judge's discretion.
Medium Risk Findings

While medium-risk findings should be graded using the same rubric as high-risk findings, grading can be a little more tolerant if only one of the chapters of the submission is of inadequate quality. This does not necessarily mean that findings of medium severity will rarely be eligible for penalization; it is instead meant to lower the effort required to grade medium-risk findings. Medium-risk findings are generally submitted in higher numbers than high-risk findings, and applying the same level of scrutiny would significantly affect how much time is spent grading each contest and thus the burden on each Judge.

High Risk Findings

As high-risk findings tend to be more complex than other categories, it is imperative that the submission details how the described vulnerability can be exploited in a clear way. The highest level of scrutiny should be applied to findings of this category, ensuring awarded submissions are of exceptional quality and increasing the competitiveness of each high-severity finding's reward pool. Should any of the chapters in the submission fail to articulate the finding in a professional and correct way, the finding may become eligible for penalization at the Judge's discretion.

DISCLAIMER

The above is merely an example; I understand that the text may be incorrect or targeted at the wrong audience (i.e. wardens instead of judges and vice versa). Perhaps a simpler approach is needed, or the grading guidelines are too harsh. Feedback is greatly appreciated and expected from judges to keep this discussion alive and make some actual changes to the C4 documentation! To note, these changes are not meant to change how findings are presently awarded/graded. Instead, they are meant to describe the existing methodology employed, as it is currently not described in the documentation of C4 and is instead in the mind of each individual Judge.
Submission grading has been and will remain at each Judge's sole discretion due to the nature of C4. As a final remark, the grading system above is not expected to be provided as a reply on each given exhibit. Rather, it is meant to serve as a tool for Judges to employ whenever their verdict on a particular finding is questioned, either by wardens or a potential appeal committee, in a less subjective and commonly agreed-on way.
I think that much of the controversy is due to framing the grading as penalties, such as in "if you don't provide a PoC, you'll be penalized by 50% of the reward". Consider this phrasing instead: "If you make the effort to code a PoC, you'll get double points". I like PoCs in C4. We get inexperienced wardens who would benefit from writing more of them, and we would get many fewer invalid submissions if more PoCs were coded. Not only that, but it is easier for sponsors and judges to see the assumptions taken by the warden in a PoC. While I understand that very experienced wardens could write text PoCs as good as code PoCs, I think those are in the minority. In general, I like rewarding wardens of all skill levels for going the extra mile and coding a PoC that keeps the invalid submissions down and lightens the work of lookouts, sponsors and judges. In Maia/Ulysses I've announced to the wardens that the judging policy will work like that, and I'll report on results.
I would recommend not enforcing this approach literally. There are many types of findings that don't necessarily require a PoC. Saying that, in general, findings with a PoC will be preferred makes sense, and I personally have consistently given the "Chosen for Report" label to findings with a coded PoC over those that don't have one. However, there are many scenarios where a coded PoC doesn't make sense, such as incorrect variables, reverts due to non-existent functions, incorrect string hashing, logical flaws, or lack of access control. I think there are a lot of exceptions to the rule of needing a PoC, and I don't recommend enforcing this idea literally.
While I feel that there are probably cases where a PoC doesn't make sense, I don't think they would be that common. I've applied this rule already in the past three contests I've judged, and for all valid findings with duplicates, someone has always coded a PoC. If a PoC is indeed not possible for a vulnerability, then no warden will code a PoC for it, and all wardens will receive the same reward (with respect to the PoC).
PoCs serve as the ultimate proof that the accompanying hand-waves actually hold. They're also highly useful to the sponsor for triaging, as well as for adding as regression tests. We can have an in-depth discussion around whether defaulting to partial scoring without PoCs is superior. The fundamental underlying issue is the fine balance of time overhead per submission for wardens vs. judges. This approach will tilt things heavily in the judge's favor. PoCing is great but takes a long time; sometimes just the setup will take hours if there are config issues. For newer wardens, time spent PoCing is time we don't give them to explore deeper areas of the code. Penalizing may also make them withhold important findings. For top wardens, PoCing very clear issues increases fatigue and isn't necessary for them to confirm the issue. For judges, if the report is not convincing without a PoC, they are entitled to close the issue (even if it ends up being legit, if it wasn't explained well enough it is not eligible for rewards). I believe balance can be maintained this way, without tilting the burden of proof over the edge.
Most of the coded PoC complaints come from senior wardens who know how long it will take to write a PoC for every finding. If you require them to write coded PoCs, then they'll have less time to dig into every area of the code, and you'll either get fewer findings from them (bad for the sponsor), or they'll be forced to take the lower payout (bad for them and for C4 in the long term). @alcueca, what would be your plan for cases where the warden didn't provide a coded PoC but is the only one that found the finding? I'd hope that they'd get the maximum points for those cases. I see that you said:
If it's a solo finding, or if no one submits a PoC in a group of duplicates, then by logic no reward modifiers would be applied.
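The modifier rule described above can be sketched concretely. This is a hypothetical illustration of the proposed judging policy, not an implemented C4 feature: the 50% weight comes from the "penalized by 50%" phrasing earlier in the thread, and the function name `poc_weights` is an assumption.

```python
# Hypothetical sketch of the proposed per-duplicate reward modifier:
# within one group of duplicate findings, submissions without a coded PoC
# are weighted at 50%, but only if at least one duplicate in the group
# does include a PoC. The 0.5 factor is an assumption taken from the
# discussion, not a documented C4 constant.

def poc_weights(has_poc: list[bool], penalty: float = 0.5) -> list[float]:
    """Return a weight multiplier for each duplicate in one finding group."""
    if not any(has_poc):
        # Solo finding, or no PoC anywhere in the group:
        # no modifier is applied to anyone.
        return [1.0] * len(has_poc)
    return [1.0 if poc else penalty for poc in has_poc]

print(poc_weights([True, False, False]))  # one PoC present: others halved
print(poc_weights([False, False]))        # no PoC in the group: no modifier
```

Note how the rule is relative to the group: a warden is never worse off for a vulnerability that no one could PoC, which is the point being made in the reply above.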
I think this nails it on the head. I would suggest that the senior wardens team up with more junior wardens (not necessarily in an equal split). The senior wardens can do as they did before, and leave the PoC coding to the junior wardens on their team. Another thing I could do is to apply this rule only to Highs, and use a more subjective appraisal for Mediums. There are not so many valid distinct Highs per contest, so it is not unreasonable to ask for those to have a coded PoC.
Setting up and maintaining a team sounds like a lot of work in and of itself. Some may not already have that expertise, or be interested in spending time developing it. At one point there was a C4 initiative to create mentorship, but that seems to not have gotten off the ground. A big draw for these contests is that you can do them when you have time, and not have to worry about soft skills.
I believe that a coded PoC shouldn't be mandatory to receive a full reward. As others have already mentioned, this requirement has too many drawbacks:
It's not necessary for the judge to have a coded PoC for all the duplicates; one is sufficient. Therefore, the 30% bonus for the best report could be allocated for awarding it, as @GalloDaSballo is suggesting.
I also believe that a coded PoC shouldn't be enforced to receive a full reward. I understand that enforcing a PoC will reduce invalid issues; however, I believe it is the duty of lookouts to filter out those invalid issues instead of increasing the burden on wardens. As long as the sponsor can understand the issue clearly, I don't think it makes sense to penalize them (rewards for a PoC = penalty for no PoC).
As a senior researcher you can also be smart about which PoCs you code and which ones you omit. Obvious vulnerabilities will be found by many wardens, subtle ones by few. Code the PoCs for the subtle ones first, as they will give you a bigger payout. Which vulnerabilities are the subtle ones? Those that required you to fully understand the protocol, not the ones that were a matter of pattern-matching. So maybe dig into every area first, and then code the PoCs, optimizing for those that you think have better chances of having few duplicates. Let the more junior wardens (competing against you) code the lower-value PoCs for the more obvious vulnerabilities. Note that the pool of money to be distributed is the same. If everyone submits fewer reports in the same measure because everyone spends more time coding PoCs, then everyone still gets the same payouts as before.
IMO, the only benefit of this is to reduce the workload for the judge & lookout; the sponsor gets fewer reports, and wardens spend time writing meaningless code (if the issue can be explained clearly in words) instead of finding bugs.
I personally do not agree with making it mandatory to submit a coded PoC. A non-exhaustive list of reasons:
By the way, I am down to increasing the bonus for "Selected For Report" to encourage wardens further. |
As was stated by somebody on twitter, people would also actively avoid contests where they knew POCs would be required, in favor of other concurrent ones. This may include a lot of senior wardens. |
I'm also not in favor of making PoCs mandatory. Also, there have been a few cases of invalid submissions that had a PoC; making it mandatory might bring a surge of spammy submissions with a PoC, which might be much more difficult to handle (good luck trying to understand why the PoC runs perfectly fine despite the issue being clearly invalid).
Fine, whatever. I will grade submissions based on quality, and lack of PoC won't automatically mean 50%. |
Enjoy your subjectivity, y'all.
Per the Autumn 2023 C4 Supreme Court verdicts, the Supreme Court's verdict on this issue is:
Link to verdict: https://docs.google.com/document/d/1Y2wJVt0d2URv8Pptmo7JqNd0DuPk_qF9EPJAj3iSQiE/edit#heading=h.70jqb921kz0g
Hello everyone, today I would like to discuss an issue with regard to penalty / award standardization for duplicate reports.
Description
Currently, there is no strict guideline in the C4 documentation concerning how a duplicate finding is meant to be penalized or how a finding is selected for the report. The following excerpt is the only guideline:
Argument
The point I would like to discuss is the necessity of a PoC (Proof-of-Concept) containing a code example. I believe that a Proof-of-Concept can be textual, not contain any exemplary code of how to exploit, and still be considered a valid finding. In the auditing industry, it is pretty standard to contain purely textual descriptions of how a vulnerability can be exploited (based on personal experience as well as publicly available reports).
A thorough textual description of how someone can exploit a vulnerability should be considered equivalent to a code-based Proof-of-Concept that demonstrates the vulnerability in practice. The outputs of a report are, in most cases, compiled into a browsable web page. This is done to contribute to the ecosystem and allow users to look up issues that have historically occurred and not repeat them.
By incentivizing code-related Proofs-of-Concept submissions rather than textual descriptions, we end up with PoCs that are specific to the project rather than specific to the vulnerability. Ensuring both types of findings are treated on equal grounds is, in my opinion, the ideal approach for the benefit of both the projects and the ecosystem.
Examples Requested
We welcome any warden and judge alike to provide us with a tangible example of a judgement discrepancy arising from PoCs.
As things stand, the most observable pattern is the selection-for-report preference for submissions with a PoC over ones without one, which does affect the topic discussed by this issue.
Parting Remarks
I would be glad to hear some judges' feedback on this and it would be good to clarify how penalization as well as selection-for-report works in the Code4rena documentation.