Semgrep: severity categories & database storage#93

Merged: pessi-v merged 10 commits into main from semgrep-database-input-and-severity-grading on Nov 5, 2025

Conversation

@pessi-v
Collaborator

@pessi-v pessi-v commented Oct 29, 2025

Adds the following functionality:

  • 3 categories of severity for semgrep rules
  • highlighting type in vscode depends on severity
  • semgrep results are stored in the database
  • database entries are shown in the side panel
  • it's possible to delete individual results

@pessi-v pessi-v changed the title update semgrep rules for their severity Semgrep: severity categories & database storage Oct 30, 2025
import * as vscode from "vscode";
import * as path from "path";
import { getSemgrepDataService } from "./semgrep-integration";

Collaborator

Do we need a separate tree provider? How is the UX?

@grrrau ? do you have a screenshot?

Collaborator

I have the feeling this might partly duplicate existing functionality?

Also is it missing test coverage?

Collaborator

[Screenshots attached: 2025-10-30 at 16:27:15, 16:27:28, and 16:28:40]

@suung @pessi-v sketched that, but not finished yet

Collaborator

@suung suung left a comment

Let's check whether we really need a new tree provider, how the UX is affected, how we can keep things a bit DRY, and what the test coverage looks like.

@grrrau
Collaborator

grrrau commented Oct 30, 2025

@pessi-v we had noted down that there were only 2 categories of severity for the semgrep rules, so the sketches only show two variants (those coloured o). just noting.

i'd be happy to guide us through the prototypes. (would help me!)

@grrrau
Collaborator

grrrau commented Oct 30, 2025

thinking: shall we consider the deleting of db entries and add it to the functionalities too?

as semgrep results are stored in the database

as a user i

  • get a linting notification on the side panel (sidebar), connected to a file, and specific lines of code

  • see information on severity, as a visual item (color code)

  • can navigate to the lines of code within the specific file

  • see the lines of code highlighted according to severity (color code)

  • i see the feedback on the highlighted lines, and

a - i change the code // linting runs again
b - i ignore the feedback

What then?

a

  • do we create a new storage entry, to keep some sort of history? (that's the data we could use for improvement graphs, right?)
  • do we delete whatever was saved to the storage and add no new entry?
  • what does the user see? nothing, solved, all good!
  • what does the user see? a simple tiny dialog, not intrusive (yay, you went greener / or just some temporary thumbs up or similar next to the carbonara icon on the status bar)

b

  • do we allow users to delete linting notifications from the side bar, when they decide to ignore them?
  • do we show them more silently if nothing is done after [x] days?

@grrrau
Collaborator

grrrau commented Oct 30, 2025

@suung what do you mean with 'new' tree provider?

despite my thinking which can be moved elsewhere, and the answer to the tree question, what else is needed here to make this pr move forward?

@pessi-v
Collaborator Author

pessi-v commented Oct 30, 2025

> @pessi-v we had noted down that there were only 2 categories of severity for the semgrep rules, so the sketches only show two variants (those coloured o). just noting.
>
> i'd be happy to guide us through the prototypes. (would help me!)

True! I discovered afterwards there was a third category too

@pessi-v
Collaborator Author

pessi-v commented Oct 30, 2025

> @suung what do you mean with 'new' tree provider?
>
> despite my thinking which can be moved elsewhere, and the answer to the tree question, what else is needed here to make this pr move forward?

For the side panel/tree view there is a separate file for the code scan (the "provider"). I think it makes sense to keep it in a separate file, since it's 370 lines and quite specific to showing the semgrep results.

@pessi-v
Collaborator Author

pessi-v commented Oct 30, 2025

> thinking: shall we consider the deleting of db entries and add it to the functionalities too?
>
> as semgrep results are stored in the database
>
> as a user i
>
> * get a linting notification on the side panel (sidebar), connected to a file, and specific lines of code
> * see information on severity, as a visual item (color code)
> * can navigate to the lines of code within the specific file
> * see the lines of code highlighted according to severity (color code)
> * i see the feedback on the highlighted lines, and
>
> a - i change the code // linting runs again
> b - i ignore the feedback
>
> What then?
>
> a
>
> * do we create a new storage entry, to keep some sort of history? (that's the data we could use for improvement graphs, right?)
> * do we delete whatever was saved to the storage and add no new entry?
> * what does the user see? nothing, solved, all good!
> * what does the user see? a simple tiny dialog, not intrusive (yay, you went greener / or just some temporary thumbs up or similar next to the carbonara icon on the status bar)
>
> b
>
> * do we allow users to delete linting notifications from the side bar, when they decide to ignore them?
> * do we show them more silently if nothing is done after [x] days?

Results can already be deleted one by one, or all at once. I forgot to add it in the intro!

For the improvement tracking, maybe we can store the state of the semgrep results when the user's branch is merged, or something similar.

@suung
Collaborator

suung commented Oct 30, 2025

> @suung what do you mean with 'new' tree provider?
>
> despite my thinking which can be moved elsewhere, and the answer to the tree question, what else is needed here to make this pr move forward?

We have the data tree provider, which lists all data from the database in the side panel.

Could it be reused?

Re deleting: I would almost say deletion could be its own ticket (possibly later), and we might want to consider soft delete?
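
To make the soft-delete idea concrete, here is a minimal in-memory sketch. It assumes a hypothetical `deletedAt` marker on each stored result; the `StoredResult` shape and the `ResultStore` class are illustrative names, not the project's actual schema or data service.

```typescript
// Hypothetical row shape; the real table columns may differ.
interface StoredResult {
  id: number;
  filePath: string;
  ruleId: string;
  deletedAt: string | null; // ISO timestamp once soft-deleted, null while active
}

class ResultStore {
  private rows = new Map<number, StoredResult>();

  add(row: Omit<StoredResult, "deletedAt">): void {
    this.rows.set(row.id, { ...row, deletedAt: null });
  }

  // Soft delete: mark the row instead of removing it.
  // Returns false when the id is unknown or the row is already deleted.
  softDelete(id: number, now: Date = new Date()): boolean {
    const row = this.rows.get(id);
    if (!row || row.deletedAt !== null) {
      return false;
    }
    row.deletedAt = now.toISOString();
    return true;
  }

  // What a side panel would list: only rows that are still active.
  listActive(): StoredResult[] {
    return Array.from(this.rows.values()).filter((r) => r.deletedAt === null);
  }

  // Full history, including soft-deleted rows (useful for later tracking).
  listAll(): StoredResult[] {
    return Array.from(this.rows.values());
  }
}
```

With this approach a dismissed result disappears from the panel but stays available for any history or improvement tracking added later.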

@suung
Collaborator

suung commented Oct 30, 2025

> > @suung what do you mean with 'new' tree provider?
> > despite my thinking which can be moved elsewhere, and the answer to the tree question, what else is needed here to make this pr move forward?
>
> For the side panel/tree view there is a separate file for the code scan (the "provider"). I think it makes sense to keep it in a separate file, since it's 370 lines and quite specific to showing the semgrep results.

@pessi-v ok, if that's so, then I think it would be best practice to generalize it to an extent. I had the feeling that quite some code is redundant.

If they are both subtypes then it could be good to generalize it; consider a base class or something.

The other question is: do we really want a separate panel? I am not against it, but do we want to add more and more to the sidebar? @pessi-v @grrrau
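
One way the base-class suggestion could look is sketched below. It runs without the `vscode` module by using a plain listener list; in the extension the base would presumably implement `vscode.TreeDataProvider` instead. `BaseTreeProvider`, `ScanRecord`, and `CodeScanProvider` are hypothetical names for illustration, not the existing classes.

```typescript
// Plain callback standing in for a change event, so the sketch needs no vscode import.
type Listener = () => void;

abstract class BaseTreeProvider<TRecord, TItem> {
  private listeners: Listener[] = [];
  private items: TItem[] = [];

  // Subclasses only say where records come from and how one record becomes a tree item.
  protected abstract loadRecords(): TRecord[];
  protected abstract toItem(record: TRecord): TItem;

  onDidChange(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Shared refresh logic: reload, convert, notify.
  refresh(): void {
    this.items = this.loadRecords().map((record) => this.toItem(record));
    this.listeners.forEach((listener) => listener());
  }

  getChildren(): TItem[] {
    return this.items;
  }
}

// Hypothetical record for a code scan finding.
interface ScanRecord {
  filePath: string;
  line: number;
  severity: string;
}

class CodeScanProvider extends BaseTreeProvider<ScanRecord, string> {
  constructor(private readonly source: () => ScanRecord[]) {
    super();
  }

  protected loadRecords(): ScanRecord[] {
    return this.source();
  }

  protected toItem(record: ScanRecord): string {
    return `${record.severity}: ${record.filePath}:${record.line}`;
  }
}
```

The refresh and event plumbing then lives in one place, and each panel only contributes its data source and its item formatting.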

Comment thread packages/core/src/data-service.ts Outdated
const filePath = actualFilePath || match.path;

this.db.run(
`INSERT INTO semgrep_results (
Collaborator

Can we make this more generic?

Reasoning: other tools will also create code hotspots, not just semgrep.

The existing schema should be used as much as possible.

https://github.com/climateandtech/carbonara/pull/57/files#r2479231157

Here we store entries with a data type (based on the tool) in our existing schema, and it does the job.

If we don't do it like this, we would need a relationship. The reasoning behind this is that we want to tie different analyses to before-and-after changes.

Maybe it can be done like here, and we just extend the existing schema:

https://github.com/climateandtech/carbonara/pull/57/files#r2479231157

Collaborator

Maybe you have another idea for how to do before-and-after checks, but to me it feels most reasonable to keep it all in one table: add CPU data, Datadog data, whatever it is, to the same table, give the entries different types, and keep them in order.

Then, when we want to do an experiment (perform a change and see if it improves emissions), this is just another type of event going to the same table.

It would be great if you could think it over and see if that would work as well.
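
As a rough illustration of the single-table idea, the sketch below stores every tool's output as a typed, timestamped row and splits rows around an event for a before/after comparison. The `AssessmentRow` shape and both function names are assumptions made for this example, not the existing schema; `SemgrepMatch` is a simplified subset of semgrep's JSON result fields.

```typescript
// Hypothetical generic row: one table for every tool, ordered by timestamp.
interface AssessmentRow {
  dataType: string;  // e.g. "semgrep", "cpu", "experiment"
  timestamp: string; // ISO 8601 in UTC, so string comparison follows time order
  payload: string;   // tool-specific data, serialized as JSON
}

// Simplified semgrep match; only the fields used in this sketch.
interface SemgrepMatch {
  check_id: string;
  path: string;
  start: { line: number };
  end: { line: number };
  extra: { severity: string; message: string };
}

function semgrepToRow(
  match: SemgrepMatch,
  actualFilePath: string | undefined,
  now: Date
): AssessmentRow {
  // Same fallback as in the reviewed diff: prefer the real path when known.
  const filePath = actualFilePath || match.path;
  return {
    dataType: "semgrep",
    timestamp: now.toISOString(),
    payload: JSON.stringify({
      ruleId: match.check_id,
      filePath,
      startLine: match.start.line,
      endLine: match.end.line,
      severity: match.extra.severity,
      message: match.extra.message,
    }),
  };
}

// Before/after check: rows of one type on either side of an event time.
function splitAround(rows: AssessmentRow[], dataType: string, eventTime: string) {
  const ofType = rows.filter((row) => row.dataType === dataType);
  return {
    before: ofType.filter((row) => row.timestamp < eventTime),
    after: ofType.filter((row) => row.timestamp >= eventTime),
  };
}
```

An experiment would then be just another row type whose timestamp serves as the `eventTime` for the comparison.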

Collaborator

Here is some code showing how to get the highlighting data from the database: https://github.com/climateandtech/carbonara/pull/57/files#r2479265712

co2_variables: any;
}

export interface SemgrepResultRow {
Collaborator

For severities, I introduced a mapping here:

https://github.com/climateandtech/carbonara/pull/57/files#r2479263089
https://github.com/climateandtech/carbonara/pull/57/files#r2479274263

The reasoning is that different tools could introduce different severity labels that we want to map to our display.

So we move the mapping to the display. You have now changed it in the semgrep rules, which of course works for this case. I would do this with a mapping: it's arguably easier and more flexible and, more importantly, it lets us integrate other tools that have their own labels.
https://github.com/climateandtech/carbonara/pull/57/files#r2479274263

I suggest you check whether you can adopt an approach like this.
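
A display-side mapping along these lines might look like the sketch below. It is not the code from the linked PR: the table names and the `other-tool` entry are hypothetical, the semgrep entries assume its `ERROR`/`WARNING`/`INFO` labels, and the badges are the ones listed later in this thread.

```typescript
type DisplaySeverity = "error" | "warning" | "info";

// Badges per display severity, as listed later in this thread.
const BADGES: Record<DisplaySeverity, string> = {
  error: "🚨",
  warning: "⚠️",
  info: "ℹ️",
};

// One label table per tool. "other-tool" is a made-up example of a tool
// with its own vocabulary; only the semgrep labels are assumed to be real.
const SEVERITY_MAP: Record<string, Record<string, DisplaySeverity>> = {
  semgrep: { ERROR: "error", WARNING: "warning", INFO: "info" },
  "other-tool": { critical: "error", major: "warning", minor: "info" },
};

function toDisplaySeverity(tool: string, label: string): DisplaySeverity {
  // Unknown tools or labels fall back to "info" rather than failing.
  return SEVERITY_MAP[tool]?.[label] ?? "info";
}

function badgeFor(tool: string, label: string): string {
  return BADGES[toDisplaySeverity(tool, label)];
}
```

Adding a new tool then means adding one entry to the table, without touching that tool's rules or the display code.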

CodeScanItem | undefined | null | void
> = new vscode.EventEmitter<CodeScanItem | undefined | null | void>();
readonly onDidChangeTreeData: vscode.Event<
CodeScanItem | undefined | null | void
Collaborator

Regarding test coverage, there are two things I would look at

@pessi-v pessi-v force-pushed the semgrep-database-input-and-severity-grading branch from 21e50c6 to 46884a5 Compare November 5, 2025 12:35
@grrrau
Collaborator

grrrau commented Nov 5, 2025

> > @pessi-v we had noted down that there were only 2 categories of severity for the semgrep rules, so the sketches only show two variants (those coloured o). just noting.
>
> True! I discovered afterwards there was a third category too

@pessi-v back to the categories. which are they at the moment? do they have names? how are they being shown and differentiated? i still did not move forward with the screens, since the last drafts in which i show only 2, differentiated with colours. i would like to look into revising that.

@grrrau
Collaborator

grrrau commented Nov 5, 2025

@pessi-v i think i found it:

type - badge:
error - 🚨
warning - ⚠️
info - ℹ️

is that up to date?

question:

  • what is an error? and does it always provide a solution = actionable?
  • what is a warning? does it provide feedback? can it too have an actionable?
  • what is info? what makes it different from a warning?

@pessi-v
Collaborator Author

pessi-v commented Nov 5, 2025

@grrrau going to merge this and open a new PR for further design changes

@pessi-v pessi-v merged commit 4d4e485 into main Nov 5, 2025
2 checks passed
@suung suung linked an issue Nov 10, 2025 that may be closed by this pull request
6 tasks
@suung suung mentioned this pull request Nov 10, 2025
6 tasks
Development

Successfully merging this pull request may close these issues.

Unified code highlighting api

3 participants