Deduplication does not work for Whitesource Scan #1654
Hello @bilalk88,

Again, we could do a little better if the library name were put into "file_path" by the parser: we could then deduplicate on cve + file_path + severity, which would allow cross-parser deduplication and would be more resilient to title and description changes as Whitesource evolves.
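For illustration, a minimal sketch of what such a configuration might look like in DefectDojo's settings (the `HASHCODE_FIELDS_PER_SCANNER` name appears later in this thread; the scanner key and exact field names here are assumptions, not the shipped configuration):

```python
# Hypothetical settings snippet -- the 'Whitesource Scan' key and the
# field list are assumptions; adjust them to the actual parser name and
# to the fields the parser really populates.
HASHCODE_FIELDS_PER_SCANNER = {
    # Deduplicate on cve + file_path + severity, which stays stable
    # across title/description changes as Whitesource evolves.
    'Whitesource Scan': ['cve', 'file_path', 'severity'],
}
```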
…ow the default one) Tune the configuration to allow cross-parser dedupe sonar vs checkmarx. Configure dedupe for: dependency check, npm audit scan (DefectDojo#1640), whitesource scan (DefectDojo#1654). Fix full database hash_code recompute and deduplication.
@ptrovatelli just adding in here, as we are in the process of uploading multiple scan results (Fortify, Burp, Sonar, ...) into Dojo, from which we'll be integrating with JIRA as part of a full incident-management pipeline. Deduplication of issues across multiple tools is a bit of a challenge. A question I have regarding deduplication: with the new mechanism that allows specifying the fields used in the hash calculation, why do parsers (e.g. Burp) still run some deduplication of their own (dupe_key = str(item.url) + item.severity + item.title) inside their logic, which will conflict when different fields are specified in HASHCODE_FIELDS_PER_SCANNER? Or do I have that wrong?
#1854 could now help. The Whitesource parser just needs to take advantage of that change. FYI @alles-klar
@dmeeetreee that "dupe_key" thing in parsers should be understood as an aggregation of results. It is done by the parser before the deduplication is called, and the aggregation is not configurable. The deduplication configuration should be done with the aggregation in mind: if you aggregate on url + severity + title, you cannot deduplicate on fields other than those, because all the other fields will be either absent or incorrect (some parsers just pick one arbitrary value amongst the aggregated records). I've fixed this behavior in the Checkmarx parser by building a concatenation of all the found values rather than picking an arbitrary one (it was the last value amongst the aggregated records, if I recall, which was quite misleading; that same logic is probably still present in some parsers). I think we need to add this to the doc, as it's really not obvious.
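A minimal sketch of the two-stage behaviour described above; the function and field names are illustrative, not the actual parser code:

```python
def aggregate(items):
    """Stage 1 (inside the parser): group raw results on a fixed key."""
    grouped = {}
    for item in items:
        # The aggregation key is hard-coded by each parser and is not
        # configurable from the settings.
        dupe_key = str(item['url']) + item['severity'] + item['title']
        if dupe_key in grouped:
            existing = grouped[dupe_key]
            # Concatenate values from all aggregated records (the fix
            # applied to the Checkmarx parser) instead of silently
            # keeping the last value seen.
            if item['file_path'] not in existing['file_path']:
                existing['file_path'] += '; ' + item['file_path']
        else:
            grouped[dupe_key] = dict(item)
    return list(grouped.values())

# Stage 2 (deduplication) then hashes the configured fields of these
# aggregated findings -- which is why the configured fields must be part
# of, or derivable from, the aggregation key.
```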
Although, in some cases we could still use other fields in dedupe if they are present in one of the aggregate keys. Still, it would seem safer to add cve to the aggregate keys to make sure we don't lose data based on incorrect assumptions.
Thanks @ptrovatelli for explaining. As I understand it, it's a two-stage process: aggregation in the parser followed by deduplication. We will have to talk to our business about integrating with JIRA as part of a full incident-management pipeline: the idea was to have one JIRA ticket / Dojo Finding for each issue/location/line, so if a file had two occurrences of the same issue (e.g. CWE-798) we'd have separate incidents. (The Finding object/class inside the Dojo model has a line-number field, so this could be supported in theory.) In practice that would require making the line number part of the deduplication key, and keeping it outside the aggregation. (Flawed in itself, as active development on a file would then see the line number of an incident change place, and we'd create duplicates ourselves.) Are there any best practices that can be suggested regarding aggregation/deduplication for our requirement? That is, we are not only interested in reporting the fact that a specific security risk was found; we also want to create a developer workflow/process around it.
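Purely as an illustration of that trade-off (the scanner key is hypothetical, and 'line' assumes the parser actually populates the Finding's line-number field):

```python
# Hypothetical configuration: adding 'line' would yield one finding per
# issue/location/line, at the cost of fresh duplicates whenever active
# development shifts the line number of an existing issue.
HASHCODE_FIELDS_PER_SCANNER = {
    'Some SAST Scan': ['cwe', 'file_path', 'line', 'severity'],  # assumed key
}
```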
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bug description
Deduplication does not work for Whitesource Scan
Steps to reproduce
Steps to reproduce the behavior:
Go to System Settings
Switch on Deduplicate findings
Click Submit
Go to Add Product
Fill the required fields.
Click Submit
Import Scan Results for the Whitesource tool multiple times (a scripted equivalent is sketched below)
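For reference, a hedged scripted equivalent of these steps, assuming a DefectDojo instance exposing the /api/v2/import-scan/ endpoint; the host, API key, engagement id, and report path are placeholders:

```python
import requests

DOJO_URL = "http://localhost:8080"   # placeholder host
API_KEY = "your-api-key"             # placeholder token
ENGAGEMENT_ID = 1                    # placeholder engagement

def import_whitesource(report_path):
    """Upload one Whitesource report into the engagement."""
    with open(report_path, "rb") as report:
        resp = requests.post(
            f"{DOJO_URL}/api/v2/import-scan/",
            headers={"Authorization": f"Token {API_KEY}"},
            data={
                "engagement": ENGAGEMENT_ID,
                "scan_type": "Whitesource Scan",
            },
            files={"file": report},
        )
    resp.raise_for_status()

# Importing the same report twice should mark the second set of
# findings as "Duplicate" once deduplication works.
import_whitesource("production-vulnerability-report.json")
import_whitesource("production-vulnerability-report.json")
```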
Actual behavior
All findings just have the status "Active", even after repeated imports.
Expected behavior
Findings from repeated imports have the status "Duplicate".
Deployment method (select with an X)
Environment information
Sample scan files (optional)
production-vulnerability-report.json.txt