Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SARIF export support #3045

Closed
dector opened this issue Sep 1, 2020 · 53 comments
Closed

SARIF export support #3045

dector opened this issue Sep 1, 2020 · 53 comments
Assignees
Milestone

Comments

@dector
Copy link
Contributor

dector commented Sep 1, 2020

GitHub code scanning allows uploading SARIF (Static Analysis Results Interchange Format) files.

Kotlin language is not yet supported in current (beta) version. But it would be nice to be prepared in advance. :)

Useful resources:

@BraisGabin
Copy link
Member

I didn't know about Sarif it seems like a report format that we should support.

@schalkms
Copy link
Member

schalkms commented Sep 1, 2020

Yeah, I agree. That's another useful report format.
@dector do you have resources to help us here?
The docs would be a good starting point.
https://detekt.github.io/detekt/extensions.html#custom-reports

@dector
Copy link
Contributor Author

dector commented Sep 1, 2020

@BraisGabin That's Microsoft-developed format and they used it in GitHub code scan. It's JSON-based and seems quite simple.

@schalkms Added useful links.

@dector
Copy link
Contributor Author

dector commented Sep 1, 2020

I might prepare PoC as a topic-started but not in nearest week or two. But I guess we have time still.

@dector
Copy link
Contributor Author

dector commented Sep 8, 2020

Here some tryouts: https://github.com/dector/detekt/blob/sarif/detekt-report-sarif/src/test/kotlin/io/github/detekt/report/xml/SarifOutputFormatSpec.kt#L68

I guess I can continue with this task piece-by-piece.

Are there some preferences for JSON serialization? Should we use kotlinx.serialization for the sake of multi-platform support?

@arturbosch
Copy link
Member

arturbosch commented Sep 8, 2020

Here some tryouts: https://github.com/dector/detekt/blob/sarif/detekt-report-sarif/src/test/kotlin/io/github/detekt/report/xml/SarifOutputFormatSpec.kt#L68

I guess I can continue with this task piece-by-piece.

Are there some preferences for JSON serialization? Should we use kotlinx.serialization for the sake of multi-platform support?

Ironically one week before you opened this PR, we had a small discussion about supporting sarif format in the company I work at :).

Looks good.
detekt does not yet support relative paths which will be needed to upload such an format to GitHub if I see this correctlycorrectly (#2492).

If kotlinx.serialization is available in mavenCentral lets go with it.

@arturbosch
Copy link
Member

@BraisGabin I think you edited my answer somehow xD

The sarif format allows to add custom meta data about rules. That is good, I would like to remove the txt format and make the sarif format the default for 2.0 with the issue signature as meta data.

@BraisGabin
Copy link
Member

BraisGabin commented Sep 8, 2020

🤦🤦🤦🤦🤦 So sorry! I think that I clicked "Edit" instead of "Quote" and I end up writing my comment as an edit of yours >_<.

I just recovered your previous message. I didn't know that I could edit the messages of other people!

My previous message was:

If kotlinx.serialization is available in mavenCentral lets go with it.

If it's not I recommend moshi. It's really nice (I really don't care).

Just "dreaming": once we have this done we can even split the SARIF mappers from detekt. The mapper could be useful for ktlint too (for example).

@dector
Copy link
Contributor Author

dector commented Sep 8, 2020

Awesome! Thanks @arturbosch and @BraisGabin !

I thought about kotlinx.serialization because of it's multiplatform nature (e.g. will not create additional friction on building detekt as a native binary tool in the future). But I'm happy to use any good tool (including Moshi) for v1.

Just "dreaming": once we have this done we can even split the SARIF mappers from detekt. The mapper could be useful for ktlint too (for example).

I can implement SARIF library as a separate project - not a problem at all. So it'll be easier to reuse it in other tools too.

@drewhannay
Copy link

@dector it would be great if we had a way to generate Kotlin SARIF model bindings directly from the JSON schema

I've been looking into this as well, but it doesn't seem like there are any great bindings yet. SpotBugs added a SARIF report with some Java bindings, so that might be worth looking at. Ideally the SARIF team would add an "official" Java or Kotlin set of bindings.

@dector
Copy link
Contributor Author

dector commented Sep 8, 2020

it would be great if we had a way to generate Kotlin SARIF model bindings directly from the JSON schema

I agree. However (and most probably) Kotlinizing generated binding will require additional effort as there are no ready-to-use generators for this.

Ideally the SARIF team would add an "official" Java or Kotlin set of bindings.

Unfortunately Microsoft is here at it's best. 😞 There are few different tools for C# but no implementations for other languages. There are numerous mistakes in specification examples. Existing online tools are rough, buggy and have very limited functionality. :)

@schalkms
Copy link
Member

schalkms commented Sep 8, 2020

Thanks for the productive discussion, guys! 👍

Are there some preferences for JSON serialization? Should we use kotlinx.serialization for the sake of multi-platform support?

Definitely, we should choose a mature and maintained package. kotlinx.serialization would in my opinion fulfill this need.

I can implement SARIF library as a separate project - not a problem at all. So it'll be easier to reuse it in other tools too.

I would prefer using the detekt report API for the first prototype. I don't think we should make it more complex than necessary and dependent on other packages.

@greysteil
Copy link

greysteil commented Sep 22, 2020

👋 Weighing in from the GitHub code scanning team (I'm one of the PMs responsible for what we're doing in the space).

We'd love to support displaying results from detekt within code scanning, and to include detekt as a card in the code scanning setup experience (that link will only work for folks with write access to this repo). If it's helpful I can ask the team on our side to answer any questions you have on what's required - the simple answer is support for outputting SARIF and a version of the detekt-all Action that uploads results to code scanning.

Thanks for all your work on detekt!

@BraisGabin
Copy link
Member

Write access or administration access? I have write access but I get a 404

@greysteil
Copy link

Updated to actually be the right link! 🤦

@arturbosch
Copy link
Member

Hm, I'm still seeing a 404 for https://github.com/detekt/detekt/security/code-scanning/setup

@greysteil
Copy link

Ah, I know what this is - we’re still in beta so need to flip the feature flag for this repo. One moment

@greysteil
Copy link

You should have access now.

@arturbosch
Copy link
Member

You should have access now.

Thanks it works for me now.

2020-09-24-212417_590x291_scrot

I've found https://help.semmle.com/codeql/codeql-cli/reference/sarif-overview.html which seems to describe the tiny part of sarif which is required to get this to work :).

I try to take a look at https://github.com/dector/sarif-kotlin from @dector in the next days.

@jhutchings1
Copy link

@arturbosch Excited to see Detekt adding SARIF support! A couple of resources for you as you plan your integration:

Ping us if you encounter any issues!

@arturbosch
Copy link
Member

I've extended @dector repository with model generation based on the json schema - https://github.com/arturbosch/sarif-kotlin.

However I'm unsure about sarif's licensing - https://github.com/oasis-tcs/sarif-spec/blob/master/LICENSE.md.

Therefore I have a script to download the json schema on demand to generate the models.
Also I'm not sure if it's licencing allows to generate models by Non-OASIS/Microsoft members.

@greysteil
Copy link

Hmmm, that license isn't as clear as it could be. The readme says the intent is that everything in that repo should be free for anyone to use, but it looks like OASIS could be clearer on the legals.

@lcartey - you were a co-author on the SARIF spec, anything we can do to make this more straightforward, and to give @arturbosch and the team certainty they're not violating any licenses? Also cc @tom-corbett in case he can help.

@jhutchings1
Copy link

cc: @michaelcfanning as well

@arturbosch
Copy link
Member

Hey @dector,
are you interested in maintaining the sarif library in the long term? I could create a PR with my additions.

My current changes include generating Java models based on the json scheme and use moshi as the json writer.
To meet the requirements of plain Java developers, I moved Kotlin to the test dependencies. No extra Kotlin deps needed ;).

If you are okay with it, I would move my fork to the detekt organization and release an initial version under the detekt umbrella.

@dector
Copy link
Contributor Author

dector commented Oct 3, 2020

@arturbosch
I guess SARIF support should be part of the detekt family! 😃 However, I though that it should be Kotlin-first library. SARIF team is planning Java support (their website says "Coming soon"). So first-class Kotlin support (as I see it) includes:

  • no guarantee for Java compatibility;
  • nice Kotlin DSL;
  • KMP targeting (in the future) - this implies kotlinx.serialization usage.

I clearly see the point of generating POJOs on the first stages but it would be nice to move to Kotlin-first library with nice DSL later. Very interested to hear what do you think about that.

@michaelcfanning
Copy link

Hello, very interested to see this conversation in flight. The SARIF OASIS JSON schema is absolutely available for arbitrary use. I will ask @lgolding, SARIF project editor, to help make this more explicit in that repo.

We definitely appreciate the feedback on spec bugs. Separately, we have a new effort to provide SARIF assistance in a more practical, consumable form, this content is at https://github.com/microsoft/sarif-tutorials. It is very much a work in progress, but @lgolding is very responsive to specific requests for samples/guidance if you have them.

In terms of Java OM, historically, I have heard engineers have used jsonschema2pojo successfully against the SARIF schema (which again is free to use in this way or any other application). Sounds like @arturbosch has done something similar as a step in his process. We have not shipped the output of this tool, thinking that an actual Java SDK for SARIF probably would need more substance to it.

@lgolding and I are available at any time to review/comment on prospective SARIF, or answer any questions at all. If anyone would like to make email contact, my ms alias is mikefan.

@ghost
Copy link

ghost commented Dec 10, 2020

@michaelcfanning @eddynaka (handling this going forwards, AFAIK) -- FYI

@josepalafox
Copy link

josepalafox commented Dec 11, 2020 via email

@swinton
Copy link

swinton commented Dec 11, 2020

1.15.0-RC1 supports sarif. It will be great if you can test it in some use cases and give feedback.

Trying to test this over here.

Could someone provide some directions on how to generate the SARIF report? I tried the following after building the .jar from https://github.com/detekt/sarif4j, but no luck, I'm only getting the plain text report 🤔

detekt --plugins detekt-report-sarif.jar --report sarif:reports/detekt.sarif

@arturbosch
Copy link
Member

arturbosch commented Dec 11, 2020

Hi @swinton,

./detekt-cli/build/run/detekt -i ~/git/repos/kutils/ -p detekt-report-sarif/build/libs/detekt-report-sarif-1.15.0-RC1.jar -r sarif:sarif.json works fine for me for relative paths and -r sarif:/home/artur/sarif.json for absolute ones.

You mentioned the jar from https://github.com/detekt/sarif4j . sarif4j is bundled inside detekt-report-sarif so you have to referene this jar with the --plugins flag.

Hope it helps. I will test it asap with the published jar and edit my answer.

Edit:

Please try

  1. Download https://github.com/detekt/detekt/releases/tag/v1.15.0-RC1 (detekt-all.jar)
  2. Download https://mvnrepository.com/artifact/io.gitlab.arturbosch.detekt/detekt-report-sarif/1.15.0-RC1
  3. Edit paths and run: java -jar Downloads/detekt-cli-1.15.0-RC1-all.jar --plugins Downloads/detekt-report-sarif-1.15.0-RC1.jar --input ~/git/repos/kutils/src/ --report sarif:/home/artur/test.sarif.json

@swinton
Copy link

swinton commented Dec 15, 2020

Thanks so much for the pointers, @arturbosch, I was able to generate a SARIF report:

https://gist.github.com/swinton/43abb47ab719e7c3d67f789115defcff

JFYI here's how the report looks once loaded into GitHub:

We're not able to include the preview since (I believe) the results use absolute paths instead of relative paths, but looks like relative path support may be on the way?

@chao2zhang
Copy link
Member

@swinton Thank you for testing and reporting this. @arturbosch I can continue your work if you do not have free cycles.

@arturbosch
Copy link
Member

Sure, if you have some spare time in the next days you can take a look at #2492.
I think the PR is a good starting point. Feel free to @mention me in a draft.

arturbosch added a commit that referenced this issue Dec 21, 2020
* Support sarif as a report type - #3045

* Integrate sarif feedback from @lgolding

* Test the whole sarif report instead of some json paths

* Provide only the short description; we do not have access to the long one programmatically

* Use the plain rule id as sarif rule name; the rule set id is encoded in the sarif rule id

* Remove need for casting by using a when expression
@chao2zhang
Copy link
Member

chao2zhang commented Dec 22, 2020

@swinton According to Github Docs, I did not find any official documentation of uploading multiple .sarif files. Is it officially recommended that we merge all .sarif files into one?

Update: Never mind. I found github/codeql-action#220 where it currently supports a single file or the files under a directory

arturbosch added a commit that referenced this issue Jan 16, 2021
* Support sarif as a report type - #3045

* Integrate sarif feedback from @lgolding

* Test the whole sarif report instead of some json paths

* Provide only the short description; we do not have access to the long one programmatically

* Use the plain rule id as sarif rule name; the rule set id is encoded in the sarif rule id

* Remove need for casting by using a when expression
@TWiStErRob
Copy link
Member

TWiStErRob commented Oct 3, 2021

Silly question, and probably answered from one of the links in the long conversation above:
As an end user of Detekt, how do I use/benefit from this Sarif format support?

@josepalafox
Copy link

Hi @TWiStErRob at least one use is that GitHub accepts sarif reports and will produce inline pull request comments for errors identified by detekt.

If you navigate to a public repo you own, click security in the top ribbon, then click code scanning/set up code scanning you'll see detekt listed. From there you can use the provided workflow to set up detekt to run during on pull requests for the project and you'll get alerts directly in the GH UI.

@chao2zhang
Copy link
Member

You may refer to https://detekt.github.io/detekt/reporting.html#integration-with-github-code-scanning to learn more about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests