Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for vulnerability data to entities model #6875

Merged
merged 16 commits into from Feb 8, 2019
Merged

Conversation

xavierdecoster
Copy link
Member

No description provided.

@joelverhagen
Copy link
Member

Does this really have to be part of the gallery database? A smell to me is that the relationship between a package deprecation and a CVE/CWE is not modeled in the standard way, i.e. a many-to-many relationship with a PackageDeprecationCve table. Instead it is a JSON array.

Copy link
Contributor

@scottbommarito scottbommarito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments. Looks great though!

Please keep this as a separate migration from the initial one--we're deploying the other migration this week (although you can't create any deprecation entities) so we'll need to be able to apply this migration on top of it.

src/NuGet.Services.Entities/CVE.cs Outdated Show resolved Hide resolved

modelBuilder.Entity<PackageDeprecation>()
.HasMany(p => p.CVEs)
.WithMany(c => c.PackageDeprecations);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we verified that when

  • a PackageDeprecation is deleted, it deletes the corresponding rows in the PackageDeprecationCVE/PackageDeprecationCWEs tables?
  • a CVE/CWE is deleted, it deletes the corresponding rows in the PackageDeprecationCVE/PackageDeprecationCWEs tables?

Or is this not the behavior we want to have?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a key insight.

This is quite important because we need to understand what to do when a CVE/CWE is not longer present in the data source or if some "delete" signal is presented to us. I think the easiest approach is to mandate that CVE/CWEs are never deleted. If this is already the case in the data source, then we're fine. Otherwise, we need to verify that it's okay for us to keep a deleted CVE/CWE in our database forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can assume these ID's are never hard-deleted, although I couldn't find an official statement on that.

In general, MITRE will maintain CWE as long as it serves the community to do so.

As for CVE IDs, these will be marked with a "reject" state and the "description" field will be labeled as such in the data source, according to this source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified delete behavior:

  • when deleting a PackageDeprecation, it properly deletes the corresponding rows in the PackageDeprecationCwe and PackageDeprecationCve tables, but doesn't delete the CWE or CVE record.
  • when deleting a CVE or CWE, it properly deletes the corresponding rows in the PackageDeprecationCwe and PackageDeprecationCve tables, but doesn't delete the PackageDeprecation record.

However, we should never delete CVE or CWE records (adding a Status column to those entities for when a CVE/CWE gets rejected after publication, so we can filter them out).

@joelverhagen
Copy link
Member

To be clear, I am not against this stuff being in the gallery DB, but I wanted to make sure we evaluated other options like an independent database+service or another storage mechanism like Azure Search or in-memory JSON.

@scottbommarito
Copy link
Contributor

@joelverhagen - with this PR we are now modeling "deprecation to CVE/CWE" in a proper many-to-many relationship table and no longer a JSON array. I agree that we should make sure we do our due diligence in investigating all options, but I think it's a huge benefit to this approach that it allows our gallery database to store proper relationships instead of JSON.

@joelverhagen
Copy link
Member

@scottbommarito, oops, sorry I missed that part. I think think we should understand whether it is necessary to couple CVE/CWE data with gallery data. A JSON array would be appropriate if you want to refer to data in another system (potentially backed by another SQL database).

Remember we put validation data in its own database. The idea there was that there is some state about validation that gallery will never care about. In the CVE/CWE case, another database would allow us to scale independently and keep details like ingestion state/etags/timestamps/throttles in tables that have nothing to do with gallery DB.

Again, I don't feel strongly but we should give all of these a approaches an honest consideration properly weighting maintenance costs of another DB with the benefit of decoupling unrelated data.

Copy link
Contributor

@skofman1 skofman1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@xavierdecoster
Copy link
Member Author

For CVEs, I added a Boolean column Rejected to the entity model, to indicate whether a CVE has been rejected after it was first published (a scenario we may encounter during data refresh, in which case we flag the CVE but don’t delete it - aka "soft-delete").
CVEs available in the NVD database may have another status, but except for REJECT, I don't think we care, so this is all we need to cover for. (this is in fact a “soft-delete” in the NVD source data feed)

I noticed CWEs may have a varying status in the source data as well.
The ‘Development View’ has three types of CWEs in the latest data set:
• Incomplete
• Draft
• Usable

Looking at the StatusEnumeration spec, it’s a bit unclear whether we should reduce the dataset or not…
As the external data source seems to simply list them all, no matter their status, some of these may be used by the public even though they have a status of incomplete or obsolete…
The status enumeration may be subject to change, but it appears that CWE is also taking the same approach of “soft-deleting” CWEs by marking them as Deprecated.
Which leads me to think we don’t care about CWE status at all, except for “Deprecated”, which could be an additional column on the CWE entity (similar to the “Rejected” column on CVE).

From usability point of view, we simply need to be able to hide the CVE/CWE items that are unpublished, without deleting potentially existing references to PackageDeprecation(s). Having a simple Boolean instead of an actual status column that is subject to change seems like the simplest approach here.

Any thoughts? @scottbommarito @skofman1 @anangaur @joelverhagen @loic-sharma

@joelverhagen
Copy link
Member

Any thoughts?

Perhaps we can just have a Listed boolean on both tables which controls visibility and this value gets populated by any column or combinations of columns on Cve and Cwe? The reason I think this is beneficial is that it moved the filtering-out logic to the actor that populates the table instead of the temptation of having some complex or domain-specific filtering in the gallery. Gallery could just query for Listed = true .

@xavierdecoster
Copy link
Member Author

@joelverhagen that's exactly what I'm aiming for! I could name them both Listed.

@scottbommarito
Copy link
Contributor

@joelverhagen - I think whether or not Listed is a good idea depends on the design of the job that inserts into the table. If we ever change the definition of what should show in the table or not, the actor that inserts into the table will have to be updated to reprocess existing entries to change the listed property or a manual query will need to be run. I would steer away from any approach that requires manual queries.

I think regardless of this, we should store all information that would help us determine whether or not the CVE/CWE is relevant, e.g. StatusEnumeration for CWE, to prevent needing to fetch the entire set of CVEs/CWEs again just to update the listed status.

@xavierdecoster
Copy link
Member Author

xavierdecoster commented Feb 7, 2019

@scottbommarito the data update process for CWE requires us to fetch the entire data set every time anyway... (a CSV file download), so if Status changed, it will be processed (and if visibility logic needs changing, the job can be updated, and it will then be processed on the next iteration).

For CVE, there's a Modified endpoint we can use to query for changes and update only those records.
The CVE data feeds do NOT expose a separate Status column. Instead, they use some prefix in the Description field (e.g. ** REJECT ** Description or ** DISPUTE ** Description). Which means the Description will change and be processed accordingly as well. We already store the descriptions for CWE and CVE in the database, so the status is part of the data.

No matter what changes to status or visibility, if the initial import considered the data as relevant at that time, it's going to remain relevant forever (only soft-deletes allowed - aka unlisting).
Once a CWE or CVE ID has been published in the external data sources, it's out there forever too, no matter the status. They only recommend us to hide Rejected/Deprecated status.

@scottbommarito
Copy link
Contributor

@xavierdecoster - perfect. Listed sounds great to me!

@xavierdecoster
Copy link
Member Author

Almost there. May need to track CVSS scores provided with CVE data. Once confirmed, I'll add the property and update the migration, and merge this PR.

@xavierdecoster xavierdecoster merged commit 1d1f3e6 into dev Feb 8, 2019
@xavierdecoster xavierdecoster deleted the dev-issue6874 branch February 8, 2019 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants