computable open-source package information #86

Closed
rsc opened this issue Jul 20, 2021 · 14 comments

@rsc
Contributor

rsc commented Jul 20, 2021

Background

The OSV schema has been adopted by Go, OSV, Python, Rust, and UVI to describe vulnerabilities in open-source software. The OSV schema’s key advantage over the CVE format is that it identifies the specific affected packages and versions in a precise, computable way.

For example, suppose we wanted to check whether a particular software package, as described by an SBOM, made use of any open-source components with known vulnerabilities. An SBOM for a given package ecosystem would be a list of its packages and versions. A tool can test whether each SBOM entry is affected by a database entry written to the OSV schema, without any additional information (such as a version or commit graph, or access to the repository containing the source code for the open-source software). This is what we mean when we say the package and version identification is computable.
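As a rough illustration of what "computable" means here, the following Go sketch checks SBOM entries against vulnerability records using nothing but the data in the two records. The types and names (`SBOMEntry`, `Vuln`, `Check`) are hypothetical, and the sketch assumes every record enumerates its affected versions explicitly; real OSV records can also express version ranges, which this sketch does not handle.

```go
package main

import "fmt"

// SBOMEntry is one component listed in an SBOM (hypothetical shape).
type SBOMEntry struct {
	Ecosystem string
	Name      string
	Version   string
}

// Vuln is a hypothetical, simplified OSV-style record: it names the package
// precisely and enumerates the affected versions (no ranges in this sketch).
type Vuln struct {
	ID        string
	Ecosystem string
	Name      string
	Versions  []string
}

// Affects decides applicability from the two records alone: no commit
// graph and no access to the source repository is needed.
func (v Vuln) Affects(e SBOMEntry) bool {
	if v.Ecosystem != e.Ecosystem || v.Name != e.Name {
		return false
	}
	for _, ver := range v.Versions {
		if ver == e.Version {
			return true
		}
	}
	return false
}

// Check lists every (vulnerability, component) match as "ID name@version".
func Check(sbom []SBOMEntry, db []Vuln) []string {
	var hits []string
	for _, e := range sbom {
		for _, v := range db {
			if v.Affects(e) {
				hits = append(hits, v.ID+" "+e.Name+"@"+e.Version)
			}
		}
	}
	return hits
}

func main() {
	db := []Vuln{{ID: "EXAMPLE-0001", Ecosystem: "Go", Name: "example.com/mod", Versions: []string{"1.0.0", "1.0.1"}}}
	sbom := []SBOMEntry{
		{Ecosystem: "Go", Name: "example.com/mod", Version: "1.0.1"},
		{Ecosystem: "Go", Name: "example.com/other", Version: "1.0.1"},
	}
	fmt.Println(Check(sbom, db)) // [EXAMPLE-0001 example.com/mod@1.0.1]
}
```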

We propose that the new CVE JSON schema be changed to make its package and version identification computable too. This would make it possible for vulnerability-checking tools to check SBOMs against the CVE database as easily as they can currently check SBOMs against OSV-schema databases. Adjusting the CVE JSON schema would also allow OSV-schema databases to embed their information into CVE format, allowing all their vulnerability information to be pushed upstream to the CVE database and then propagated to any CVE-aware software, a net benefit for the entire software ecosystem.

This issue focuses on computable package identification. See issue #87 for computable version identification.

Computable package identification

The lack of computable package identification was also raised in issue #79. In that discussion, it was suggested to use the combination of collectionUrl and packageName as a precise identifier. This could be sufficient, provided each ecosystem publishes the exact spelling of its standard “collectionUrl” and the syntax of its “packageName”. To avoid misspellings and other problems, ideally there should be a canonical list of “collectionUrl” values, or a canonical list of links to the pages where ecosystems have defined their own “collectionUrl” and “packageName” syntaxes. (Presumably there is an equivalent list of canonical “vendorName” values.)

It is unclear, however, why the collectionUrl and packageName are nested under the “vendor” and “product” keys. What would it mean for the same collectionUrl/packageName to appear with different “vendorName” or “productName” values? The packager’s URL should be sufficient to identify a collection of packages. “Vendor” and “product” are attributes that make sense for commercial software identified by a plain-language name, but not for URL-scoped open-source software.

If CVE is to support robust open-source vulnerability tooling, it should name the packages clearly and simply. To correct the scoping problem, the top-level structure of the “affected” object needs to be changed to not be so vendor-centric.

One possible solution would be to simplify affected > vendors > products and affected > cpes nesting down to just “affected”. That is, replace:

{
  "affected": {
    "vendors": [{
      "vendorName": string,
      "products": [{
        "productName": string,
        "modules": [string],
        "programFiles": [string],
        "programRoutines": [string],
        "packageName": string,
        "collectionURL": string,
        ...
      }]
    }],
    "cpes": [...],
  }
}

with

{
  "affected": [{
    "vendorName": string,
    "productName": string,
    "packagerUrl": string,
    "packageName": string,
    "cpe": string,
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    ...
  }]
}

Each affected object would be required to have at least one of (1) vendorName and productName, (2) packagerUrl and packageName, or (3) cpe. It would be fine to list more than one of these if there are multiple clear ways to identify the package, but open-source vulnerability scanners would use (2).
We have renamed collectionUrl to packagerUrl to make the connection to packageName clearer.
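A minimal Go sketch of the proposed "at least one of" rule, assuming the field names of the flattened form above (`vendorName`, `productName`, `packagerUrl`, `packageName`, `cpe`). The struct and its `Validate` method are illustrative, not part of any existing schema tooling, and the `packagerUrl` value in the example is made up, since canonical values have not been defined yet.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Affected holds the identifying fields of one element of the flattened
// "affected" array proposed above; modules, programFiles, etc. are omitted.
type Affected struct {
	VendorName  string `json:"vendorName,omitempty"`
	ProductName string `json:"productName,omitempty"`
	PackagerURL string `json:"packagerUrl,omitempty"`
	PackageName string `json:"packageName,omitempty"`
	CPE         string `json:"cpe,omitempty"`
}

var errNoID = errors.New("affected entry needs vendorName+productName, packagerUrl+packageName, or cpe")

// Validate enforces "at least one of" the three identification forms.
// Listing more than one form is allowed.
func (a Affected) Validate() error {
	byProduct := a.VendorName != "" && a.ProductName != ""
	byPackage := a.PackagerURL != "" && a.PackageName != ""
	byCPE := a.CPE != ""
	if byProduct || byPackage || byCPE {
		return nil
	}
	return errNoID
}

func main() {
	// Illustrative values only; no canonical packagerUrl list exists yet.
	data := `[
		{"packagerUrl": "https://pypi.org", "packageName": "requests"},
		{"vendorName": "Acme"}
	]`
	var entries []Affected
	if err := json.Unmarshal([]byte(data), &entries); err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Validate())
	}
}
```

An entry that lists several forms still passes, matching the note that more than one identification may be given.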

Of course, there may be other ways to present this data. For example, perhaps it would make the requirements clearer to group the vendor and open-source package info as in:

{
  "affected": [{
    "product": {
      "vendorName": string,
      "productName": string,
    },
    "package": {
      "packagerUrl": string,
      "packageName": string,
    },
    "cpe": string,
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    ...
  }]
}

and then the requirement would be more simply stated as “at least one of product, package, or cpe must be present.”

Another possibility would be to say that each packager is itself a vendor, but that still leaves open the question of ensuring that open-source packagers have a canonical identification, as well as what “productName” means versus “packageName”.

This general topic is also raised by #70, #78, and #79. What is important is that the CVE schema make clear how to write and access a record that treats the combination of packager URL and package name as the unique identifier for an open-source package.

I would be happy to prepare a PR if there is consensus here on the general direction of the path forward.

@chandanbn
Collaborator

Yes, we need a flat listing of affected "things", like a SQL table, which is easier to join with another flat list of inventory from SBOMs.

The current vendor-product hierarchy is meant to be a concise/compact "pivot table" representation of the same flat list (but becomes tabular at version level).

Is there any information loss between the two representations?
Aren't they interchangeable (pivot -> unwind -> tabular) or (tabular -> group-by -> pivot) ?

I feel we need both (even at the cost of being redundant duplicating the information):

  • a flat tabular data - easier for SQL-leaning tools to generate and consume.
  • pivoted compact format for human data entry (eg., in a web form) and human comprehension (displaying on CVE entry view page), easier for NoSQL-leaning tooling.

What would it mean for the same collectionUrl/packageName to appear with different “vendorName” or “productName” values?

That would be a mistake, but the problem exists with both tabular and pivoted representations. Is there a way to enforce this consistency in the schema definition?

How flat does this tabular representation need to be? Should all the arrays (files, modules, platforms) be unwound too?

vendor | product  | version | platform | programFile | collectionURL                   | packageName   | module
Acme   | Explorer | 1.0     | Android  | url.c       | https://play.google.com         | acme-explorer | URL parser
Acme   | Explorer | 1.0     | Android  | idn.c       | https://play.google.com         | acme-explorer | URL parser
Acme   | Explorer | 1.0     | iOS      | url.c       | https://www.apple.com/app-store | acme-explorer | URL parser
Acme   | Explorer | 1.0     | iOS      | ios-idn.c   | https://www.apple.com/app-store | acme-explorer | URL parser

@rsc
Contributor Author

rsc commented Jul 22, 2021

That would be a mistake, but the problem exists with both tabular and pivoted representations. Is there a way to enforce this consistency in the schema definition?

A variant of the second form I suggested might solve this, assuming we agree that there are three possible ID forms - vendorName/productName, packagerUrl/packageName, cpe. Then there can be an id list and as many of those forms can be listed as are necessary:

{
  "affected": [{
    "id": [{    // each entry in the list of ids is a oneOf(product, package, cpe)
      "product": {
        "vendorName": string,
        "productName": string,
      },
      "package": {
        "packagerUrl": string,
        "packageName": string,
      },
      "cpe": string,
    }],
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    ...
  }]
}

But maybe there's a bug here or I misunderstand the goal. I'm not a JSON schema expert. :-)
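One reading of the `id` list, in which each element must be exactly one of `product`, `package`, or `cpe`, could be checked as in the Go sketch below. Pointer fields let absent keys decode as `nil`; the type names and the `CheckOneOf` method are hypothetical, and the URL and CPE values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ProductID struct {
	VendorName  string `json:"vendorName"`
	ProductName string `json:"productName"`
}

type PackageID struct {
	PackagerURL string `json:"packagerUrl"`
	PackageName string `json:"packageName"`
}

// ID is one element of the proposed "id" list. Pointers distinguish
// "key absent" from "key present".
type ID struct {
	Product *ProductID `json:"product,omitempty"`
	Package *PackageID `json:"package,omitempty"`
	CPE     *string    `json:"cpe,omitempty"`
}

// CheckOneOf enforces the oneOf(product, package, cpe) comment:
// exactly one of the three keys must be set in each list element.
func (id ID) CheckOneOf() error {
	n := 0
	if id.Product != nil {
		n++
	}
	if id.Package != nil {
		n++
	}
	if id.CPE != nil {
		n++
	}
	if n != 1 {
		return fmt.Errorf("id entry must have exactly one of product, package, cpe; found %d", n)
	}
	return nil
}

func main() {
	data := `[
		{"package": {"packagerUrl": "https://example.org/pkgs", "packageName": "banana"}},
		{"product": {"vendorName": "Acme", "productName": "Explorer"}, "cpe": "cpe:2.3:a:acme:explorer:*:*:*:*:*:*:*:*"}
	]`
	var ids []ID
	if err := json.Unmarshal([]byte(data), &ids); err != nil {
		panic(err)
	}
	for _, id := range ids {
		fmt.Println(id.CheckOneOf())
	}
}
```

Under this reading, the second element, which sets both `product` and `cpe`, would have to be split into two list elements.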

@chandanbn
Collaborator

AI: refactor the required fields.
AI: add more guidance/examples for field elements.
AI: reorganization of fields.

@mprpic
Contributor

mprpic commented Jul 26, 2021

+1 for a flatter structure of the affected information (I particularly like the first suggested variant). It would solve the disassociation of CPE information from the product/package attributes noted in #41 and #43.

Also, the purpose of this schema is to represent content in an unambiguous machine-readable format, not to make data entry easier. That's a job for the specific tools that create records using this schema.

@chandanbn
Collaborator

Assuming that most CVEs are identified in a single vendor's single product, I agree the grouped vendor-product representation has no advantage in most cases.
As long as the ask is not to have one object per version or versionGroup (aka branch), I feel we are fine dropping the vendor-product grouping - remember it is much worse in JSON version 4 :-).

How about?

{
  "affected": [{
    "vendor": string,
    "product": string,
    "collectionUrl": string,
    "packageName": string,
    "cpes": [string],
    "modules": [string],
    "programFiles": [string],
    "programRoutines": [string],
    "platforms": [string],
    "versions": [{ version object }]
  }]
}

The CPE entries can be a flattened array with entries per affected version per platform (== CPE's edition).

For arbitrary open-source projects/packages in a packaging ecosystem, vendor should typically capture the account name of the user or project that owns or manages the package, and product should capture whatever human-readable name or title they give the work.

If names of the affected modules, files or program routines differ between versions of the same product, then it should be expressed with different objects.
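For concreteness, a Go struct mirroring this flat shape might look like the following sketch. The inner structure of a version object is not specified in this proposal, so `versions` is kept as raw JSON (the `{}` values and the `idn2.c` file name are placeholders), and `EntriesFor` is a hypothetical helper showing that one product can span several objects when its files or modules differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AffectedEntry mirrors the flat "affected" element proposed above.
// Version objects are not defined here, so they stay raw JSON.
type AffectedEntry struct {
	Vendor          string            `json:"vendor,omitempty"`
	Product         string            `json:"product,omitempty"`
	CollectionURL   string            `json:"collectionUrl,omitempty"`
	PackageName     string            `json:"packageName,omitempty"`
	CPEs            []string          `json:"cpes,omitempty"`
	Modules         []string          `json:"modules,omitempty"`
	ProgramFiles    []string          `json:"programFiles,omitempty"`
	ProgramRoutines []string          `json:"programRoutines,omitempty"`
	Platforms       []string          `json:"platforms,omitempty"`
	Versions        []json.RawMessage `json:"versions,omitempty"`
}

// EntriesFor returns every entry for one vendor/product pair; a product
// whose files or modules differ between versions spans several entries.
func EntriesFor(all []AffectedEntry, vendor, product string) []AffectedEntry {
	var out []AffectedEntry
	for _, e := range all {
		if e.Vendor == vendor && e.Product == product {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// The {} version objects are placeholders.
	data := `{"affected": [
		{"vendor": "Acme", "product": "Explorer", "programFiles": ["url.c", "idn.c"], "versions": [{}]},
		{"vendor": "Acme", "product": "Explorer", "programFiles": ["url.c", "idn2.c"], "versions": [{}]}
	]}`
	var record struct {
		Affected []AffectedEntry `json:"affected"`
	}
	if err := json.Unmarshal([]byte(data), &record); err != nil {
		panic(err)
	}
	fmt.Println(len(EntriesFor(record.Affected, "Acme", "Explorer"))) // 2
}
```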

@ElectricNroff

This reorganization of vendor information is problematic because it may disrupt expected use cases for vendor ADPs.

Suppose that the OSS-Assess organization is a CNA. It has some type of automated vulnerability identification technology (might be static or dynamic analysis). It produces CVE Records that assert the presence of a vulnerability in both specific upstream versions of an OSS product, and in specific packages from Linux distributions. The technology is good but imperfect. It has frequent bug fixes and runs continuously, and thus it sometimes updates its CVE Records to have different assertions (e.g., to add new affected items). In some cases, it covers OSS products or Linux distributions whose maintainers don't actively cooperate with the OSS-Assess effort (e.g., they don't have time or don't feel it's important enough).

In a future world (2022 or later), many Linux distributions are ADPs, and their respective scopes are limited to their own packages. A typical distribution does not create ADP container data for every applicable CVE Record (i.e., ones from the OSS-Assess CNA that assert that that distribution has a vulnerable package), but does sometimes create ADP container data (e.g., when they plan to publish a security advisory and/or when they believe that the OSS-Assess data is not fully accurate). Any information that the distribution produces will go into their own ADP container - they have a legitimate reason for not contacting the CNA to ask for corrections of the CNA container (e.g., this is too inefficient, or the CNA simply is not interested in manually curated data).

Here, PotatoLinux is a Linux distribution (and an ADP), and has packages for the products named OpenBanana, BananaPeel, and BananaSplit. PotatoLinux does not own the OpenBanana, BananaPeel, and BananaSplit upstream products, and PotatoLinux does not have the resources to become a CNA for those products.

In JSON 5.0 today, the OSS-Assess CNA produces this for CVE-2023-0001:

"affected": {
  "vendors": [
    {
      "vendorName": "PotatoLinux",
      "products": [
        {
          "productName": "OpenBanana",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "8.4-3"
            },
            {
              "versionAffected": "=",
              "versionValue": "8.6+c4_f"
            }
          ]
        },
        {
          "productName": "BananaPeel",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "1.1-4"
            },
            {
              "versionAffected": "=",
              "versionValue": "1.1-5"
            }
          ]
        }
      ]
    },
    {
      "vendorName": "OpenBanana Foundation",
      "products": [
        {
          "productName": "OpenBanana",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "8.4"
            }
          ]
        }
      ]
    }
  ]
}

The PotatoLinux ADP wants to use its ADP container to make a statement with the following semantics: "We understand CVE-2023-0001 and are offering authoritative and final information about the exact set of affected PotatoLinux packages. Anyone who trusts us as an ADP should consider all CNA assertions about CVE-2023-0001 for PotatoLinux to be superseded by our own assertions in our own ADP container. For this CVE Record, our authoritative assertions for our own ADP scope supersede all current CNA assertions, and supersede any CNA assertions that might be added in the future."

With the traditional organization of "vendors" information in JSON 5.0, this can be implemented with a slight change in which a CNA is required to associate a UUID with any assertion about the vulnerability status of another vendor. For example, JSON 6.0 might allow a CNA to publish:

"affected": {
  "vendors": [
    {
      "vendorName": "PotatoLinux",
      "otherVendorAssertionId": "10000000-0000-0000-0000-0000000000000",
      "products": [
        {
          "productName": "OpenBanana",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "8.4-3"
            },
            {
              "versionAffected": "=",
              "versionValue": "8.6+c4_f"
            }
          ]
        },
        {
          "productName": "BananaPeel",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "1.1-4"
            },
            {
              "versionAffected": "=",
              "versionValue": "1.1-5"
            }
          ]
        }
      ]
    },
    {
      "vendorName": "OpenBanana Foundation",
      "otherVendorAssertionId": "20000000-0000-0000-0000-0000000000000",      
      "products": [
        {
          "productName": "OpenBanana",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "8.4"
            }
          ]
        }
      ]
    }
  ]
}

Then, the ADP container might have the following (i.e., the PotatoLinux security team has determined that only OpenBanana packages were affected; no BananaPeel package was affected. Also, 8.6+c4_f was actually not affected, and there are two other affected packages that OSS-Assess missed):

"affected": {
  "vendors": [
    {
      "vendorName": "PotatoLinux",
      "intendedToSupersedeCnaAssertion": "10000000-0000-0000-0000-0000000000000",
      "products": [
        {
          "productName": "OpenBanana",
          "versions": [
            {
              "versionAffected": "=",
              "versionValue": "8.3-8"
            },
            {
              "versionAffected": "=",
              "versionValue": "8.4-2"
            },
            {
              "versionAffected": "=",
              "versionValue": "8.4-3"
            }
          ]
        }
      ]
    }
  ]
}

When an end user wishes to use a CVE Record for automated vulnerability tracking, they might have a configuration file that states that they fully trust the PotatoLinux ADP. If so, then all CVE-2023-0001 vulnerability tracking would proceed exactly as if the CNA container had not mentioned PotatoLinux at all. If the end user's configuration file states that the PotatoLinux ADP is untrusted, or partially trusted, then the CNA container data for PotatoLinux would factor into the vulnerability tracking report.
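The consumer-side logic described above could be sketched in Go as follows. `AssertionID` and `Supersedes` stand in for the hypothetical `otherVendorAssertionId` and `intendedToSupersedeCnaAssertion` keys, the `uuid-1`/`uuid-2` strings are placeholders, and the trust configuration is reduced to a set of fully trusted ADP names; partial trust is not modeled.

```go
package main

import "fmt"

// VendorSection is a simplified "vendors" element. AssertionID and
// Supersedes stand in for the hypothetical otherVendorAssertionId and
// intendedToSupersedeCnaAssertion keys; neither exists in JSON 5.0.
type VendorSection struct {
	VendorName  string
	AssertionID string              // set by the CNA
	Supersedes  string              // set by an ADP
	Products    map[string][]string // product name -> affected versions
}

// Container is a CNA or ADP container, reduced to its vendor sections.
type Container struct {
	Provider string
	Vendors  []VendorSection
}

// Effective returns the vendor sections a consumer should act on. Only ADPs
// listed in trusted can supersede CNA sections; untrusted ADPs are ignored.
func Effective(cna Container, adps []Container, trusted map[string]bool) []VendorSection {
	superseded := map[string]bool{}
	for _, adp := range adps {
		if !trusted[adp.Provider] {
			continue
		}
		for _, v := range adp.Vendors {
			if v.Supersedes != "" {
				superseded[v.Supersedes] = true
			}
		}
	}
	var out []VendorSection
	for _, v := range cna.Vendors {
		if v.AssertionID == "" || !superseded[v.AssertionID] {
			out = append(out, v)
		}
	}
	for _, adp := range adps {
		if trusted[adp.Provider] {
			out = append(out, adp.Vendors...)
		}
	}
	return out
}

func main() {
	cna := Container{Provider: "OSS-Assess", Vendors: []VendorSection{
		{VendorName: "PotatoLinux", AssertionID: "uuid-1",
			Products: map[string][]string{"OpenBanana": {"8.4-3", "8.6+c4_f"}, "BananaPeel": {"1.1-4", "1.1-5"}}},
		{VendorName: "OpenBanana Foundation", AssertionID: "uuid-2",
			Products: map[string][]string{"OpenBanana": {"8.4"}}},
	}}
	adp := Container{Provider: "PotatoLinux", Vendors: []VendorSection{
		{VendorName: "PotatoLinux", Supersedes: "uuid-1",
			Products: map[string][]string{"OpenBanana": {"8.3-8", "8.4-2", "8.4-3"}}},
	}}
	for _, v := range Effective(cna, []Container{adp}, map[string]bool{"PotatoLinux": true}) {
		fmt.Println(v.VendorName, v.Products)
	}
}
```

With an empty trust set, the same call returns the two CNA sections unchanged.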

This is perhaps much harder to implement with the new strategy of:

{
  "affected": [{
      "vendor": string,
      "product": string,
      ...

The CNA could have a UUID for each element of the affected array, e.g.,

{
  "affected": [{
      "vendor": "PotatoLinux",
      "product": "BananaPeel",
      "otherVendorsProductAssertionId": "30000000-0000-0000-0000-0000000000000",
      ...

but this doesn't support the desired semantics of "For this CVE Record, our authoritative assertions for our own ADP scope supersede all current CNA assertions, and supersede any CNA assertions that might be added in the future." In other words, although the PotatoLinux ADP certainly could have their own one-to-one mappings of superseding assertions for each package (e.g., BananaPeel above) that is currently mentioned in the CNA container, there is no way to proactively make one-to-one mappings of superseding assertions for all PotatoLinux packages that might be added to the CNA container in the future.

For example, because the PotatoLinux BananaSplit package does not yet appear in the CNA container, there is no precise way for the PotatoLinux ADP to refer to it when superseding. Also, the PotatoLinux ADP may prefer not to assert anything about BananaSplit, because no version was ever affected. Furthermore, additional PotatoLinux packages might come into existence after the CVE Record was first published. It is not expected that an ADP has the resources to react continuously, or even promptly, to record changes made by the OSS-Assess CNA. Also, it is probably not convenient to have superseding assertions apply to a set of strings, e.g., by trying to express that every array element with '"vendor": "PotatoLinux"' has been superseded (e.g., what if the CNA sometimes inserts a space: "Potato Linux"?). The PotatoLinux ADP had time one day to fully research CVE-2023-0001 across its package set; that work is done, and they don't want any ongoing operational costs.

In other words, the product/version data in a vendor's ADP container is typically intended to be a response to the entirety of the CNA's product/version data about that vendor. Thus, there should be a single point within the CNA container data structure that represents that entirety.

More generally, the future evolution of the CVE Program is expected to grow the set of ADPs at the per-vendor level of abstraction. Accordingly, an ADP may have a strong need to refer to a "vendors" element in a CNA container so that their ADP contributions have sensible semantics and are easily used within automation. Consumers often rely on vendors to review all packages and determine whether any are actually not vulnerable for some reason (e.g., on PotatoLinux systems, there is backported code that prevents exploitation, the kernel prevents exploitation, hardened system libraries prevent exploitation, or there was simply an implementation error in the OSS-Assess technology). Also, a large number of end users primarily use Linux distribution packages and only rarely use upstream code directly, and thus accurate information about affected distribution packages is critical to risk management.

If JSON 5.0 is reorganized without a "vendors" key, it is likely to impede the development of future JSON versions that further enhance the important role of vendor ADPs.

@chandanbn
Collaborator

I feel a container should allow representation/assertion of facts independently of other containers without hard linking, since containers can change independently. Concepts of extending or superseding (as in object oriented programming) are tough to implement here. If we have to cross-link objects, it can be done at the product level too. I do not see how this limits future improvements to the format.

@ElectricNroff

When an ADP places a vendor/product/version data structure in an ADP container, any of these might be meant:

  1. It stands alone as the complete data structure. It must copy all the corresponding data from the CNA container, except for what the ADP disagrees with. (Very easy for consumers; high maintenance cost for the ADP.)

  2. It's a set of additions and deletions, relative to the corresponding data structure within the CNA container at some instant of time. (Usability may require new server-side support; moderate maintenance cost for the ADP.)

  3. It's a set of only additions, relative to the corresponding data structure within the CNA container at some instant of time. (Usability may require new server-side support; lower maintenance cost for the ADP, but may be insufficiently expressive.)

  4. It's a statement that the ADP is claiming control of a data structure subset -- presumably the ADP's own scope -- and that the ADP wants consumers to ignore the corresponding data structure subset within the CNA container. (Fairly easy for consumers; very low cost for the ADP.)

  5. It's ambiguous. The ADP is apparently communicating something, but a consumer -- who believes that the CNA and ADP are each credible -- has no algorithm for combining the corresponding data structures of the ADP and CNA containers. (Very hard for consumers; extremely low cost for the ADP.)
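Options 1 to 3 differ only in how an ADP's list is combined with the CNA's list, which the following Go sketch makes concrete for the affected versions of a single product. `Combine` and the `Mode` constants are hypothetical names; options 4 and 5 are not modeled, and `sort.Strings` orders lexically, not by version.

```go
package main

import (
	"fmt"
	"sort"
)

// Mode names one of the interpretations listed above.
type Mode int

const (
	StandAlone   Mode = iota + 1 // option 1: the ADP list replaces the CNA list
	AddAndDelete                 // option 2: CNA list minus deletions plus additions
	AddOnly                      // option 3: CNA list plus additions; deletions ignored
)

// Combine merges affected-version lists for a single product. In StandAlone
// mode, add is the ADP's complete list and the CNA list is ignored.
func Combine(mode Mode, cna, add, del []string) []string {
	set := map[string]bool{}
	if mode != StandAlone {
		for _, v := range cna {
			set[v] = true
		}
	}
	if mode == AddAndDelete {
		for _, v := range del {
			delete(set, v)
		}
	}
	for _, v := range add {
		set[v] = true
	}
	out := make([]string, 0, len(set))
	for v := range set {
		out = append(out, v)
	}
	sort.Strings(out) // lexical order, not version order
	return out
}

func main() {
	cna := []string{"8.4-3", "8.6+c4_f"}
	fmt.Println(Combine(StandAlone, cna, []string{"8.3-8", "8.4-2", "8.4-3"}, nil))
	fmt.Println(Combine(AddAndDelete, cna, []string{"8.3-8", "8.4-2"}, []string{"8.6+c4_f"}))
	fmt.Println(Combine(AddOnly, cna, []string{"8.3-8", "8.4-2"}, []string{"8.6+c4_f"}))
}
```

With these inputs, options 1 and 2 both yield 8.3-8, 8.4-2, and 8.4-3, while option 3 keeps 8.6+c4_f because it has no way to express a deletion.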

#86 (comment) advocated option 4. This implies a new constraint on CNAs: if a CNA container has associated a UUID with a vendor name, and that UUID is used within an ADP container (in the same CVE Record, of course), then the meaning of that UUID within the CNA container must not change.

@rsc
Contributor Author

rsc commented Aug 7, 2021

It is necessary to define what fields uniquely identify a particular product. In the flattening, the idea is that any of these choices serves to identify a product:

  • the combination of vendor and product
  • the combination of collectionUrl and packageName
  • a CPE (we can ignore this if you like, this is more speculative as I understand it)

I would expect that an ADP publishing a record for {"vendor": V, "product": P, ...} would be considered to have completely replaced any other record for V+P in configurations where that ADP is considered authoritative. I would expect PotatoLinux to issue an ADP section with two records:

{"vendor": "PotatoLinux", "product": "OpenBanana", "versions": [ ... ]}
{"vendor": "PotatoLinux", "product": "BananaSplit", "versions": []}  // no versions listed, so none vulnerable

There would be no need to use UUIDs here.
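The replacement rule described above could look like the following Go sketch, which assumes the consumer treats the ADP as authoritative and compares vendor and product as exact strings. `Entry`, `Merge`, and `Affected` are hypothetical names; the versions come from the PotatoLinux example, and here the ADP lists BananaPeel (the package the CNA container actually mentions) with no versions.

```go
package main

import "fmt"

// Entry is a simplified flat "affected" item.
type Entry struct {
	Vendor, Product string
	Versions        []string // empty: no version is affected
}

type vpKey struct{ vendor, product string }

// Merge applies the rule above for an authoritative ADP: any vendor+product
// pair the ADP lists replaces every CNA entry with the same pair.
// Keys are compared as exact strings.
func Merge(cna, adp []Entry) []Entry {
	covered := map[vpKey]bool{}
	for _, a := range adp {
		covered[vpKey{a.Vendor, a.Product}] = true
	}
	var out []Entry
	for _, c := range cna {
		if !covered[vpKey{c.Vendor, c.Product}] {
			out = append(out, c)
		}
	}
	return append(out, adp...)
}

// Affected reports whether any entry lists vendor/product at version.
func Affected(entries []Entry, vendor, product, version string) bool {
	for _, e := range entries {
		if e.Vendor != vendor || e.Product != product {
			continue
		}
		for _, v := range e.Versions {
			if v == version {
				return true
			}
		}
	}
	return false
}

func main() {
	cna := []Entry{
		{"PotatoLinux", "OpenBanana", []string{"8.4-3", "8.6+c4_f"}},
		{"PotatoLinux", "BananaPeel", []string{"1.1-4", "1.1-5"}},
		{"OpenBanana Foundation", "OpenBanana", []string{"8.4"}},
	}
	adp := []Entry{
		{"PotatoLinux", "OpenBanana", []string{"8.3-8", "8.4-2", "8.4-3"}},
		{"PotatoLinux", "BananaPeel", nil},
	}
	merged := Merge(cna, adp)
	fmt.Println(Affected(merged, "PotatoLinux", "BananaPeel", "1.1-4"))          // false
	fmt.Println(Affected(merged, "PotatoLinux", "OpenBanana", "8.4-2"))          // true
	fmt.Println(Affected(merged, "OpenBanana Foundation", "OpenBanana", "8.4")) // true
}
```

Because matching is by exact string, an ADP entry for "PotatoLinux" would not replace a CNA entry for "Potato Linux".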

In the hypothetical, the ADP issuer (PotatoLinux) wants to further issue a blanket assertion about all other possible later records added for PotatoLinux packages, such as BananaCreamPie. I am skeptical about this as a requirement, for two reasons:

  1. I don't understand how PotatoLinux can make assertions about the future. If OSS-Assess finds a different input that does tickle the same vulnerability in BananaCreamPie, in a way that the PotatoLinux response team had not thought to examine, then now the earlier assertion is covering up a real vulnerability. So the blanket assertion seems unsound.
  2. Nothing stops OSS-Assess from creating a new CVE, which the ADP issuer cannot preempt. It seems like only a partial solution to be able to preempt additions of other packages to existing CVEs without being able to preempt new CVEs. So the blanket assertion seems incomplete as well.

If the blanket assertion still needed to be made, it seems that we could still define a product wildcard as in:

{"vendor": "PotatoLinux", "product": "OpenBanana", "versions": [ ... ]}
{"vendor": "PotatoLinux", "product": "*", "versions": []}  // no versions listed, so none vulnerable

and since the ADP would have a match for any PotatoLinux product, the main entries would be ignored, even future ones.
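Adding the `"*"` product wildcard to the same kind of hypothetical Go sketch only changes the matching test: an ADP entry whose product is `"*"` hides every CNA entry for that vendor, including ones added later. BananaCreamPie and its version below are invented for illustration.

```go
package main

import "fmt"

// Entry is a simplified flat "affected" item.
type Entry struct {
	Vendor, Product string
	Versions        []string // empty: no version is affected
}

// covers reports whether ADP entry a hides CNA entry c. A product of "*"
// (the hypothetical wildcard) covers every product of that vendor.
func covers(a, c Entry) bool {
	return a.Vendor == c.Vendor && (a.Product == "*" || a.Product == c.Product)
}

// Merge drops every CNA entry covered by some ADP entry, then appends
// the ADP entries.
func Merge(cna, adp []Entry) []Entry {
	var out []Entry
	for _, c := range cna {
		hidden := false
		for _, a := range adp {
			if covers(a, c) {
				hidden = true
				break
			}
		}
		if !hidden {
			out = append(out, c)
		}
	}
	return append(out, adp...)
}

// Affected reports whether any entry lists vendor/product at version.
// The wildcard entry has no versions, so it never reports a match.
func Affected(entries []Entry, vendor, product, version string) bool {
	for _, e := range entries {
		if e.Vendor != vendor || e.Product != product {
			continue
		}
		for _, v := range e.Versions {
			if v == version {
				return true
			}
		}
	}
	return false
}

func main() {
	adp := []Entry{
		{"PotatoLinux", "OpenBanana", []string{"8.3-8", "8.4-2", "8.4-3"}},
		{"PotatoLinux", "*", nil},
	}
	// BananaCreamPie stands for a product the CNA adds after the ADP published.
	cna := []Entry{
		{"PotatoLinux", "OpenBanana", []string{"8.4-3", "8.6+c4_f"}},
		{"PotatoLinux", "BananaCreamPie", []string{"2.0-1"}},
		{"OpenBanana Foundation", "OpenBanana", []string{"8.4"}},
	}
	merged := Merge(cna, adp)
	fmt.Println(Affected(merged, "PotatoLinux", "BananaCreamPie", "2.0-1"))     // false
	fmt.Println(Affected(merged, "OpenBanana Foundation", "OpenBanana", "8.4")) // true
}
```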

@ElectricNroff

Your position seems to be that you somewhat agree with option 4 in #86 (comment) but "claiming control of a data structure subset" would be done separately for each product. If an ADP is hoping to do this "claiming control of a data structure subset" for a vendor's complete set of products, then the answer is either (a) this is not desirable and would not be implemented or (b) it would be implemented through a new wildcarding syntax.

If this is true, then apparently the algorithm (for combining information between one ADP container and the CNA container) is: if V+P appears in at least one ADP item, then the ADP is replacing all CNA items in which V+P appears. For example, this one ADP item:

{"vendor": "PotatoLinux", "product": "OpenBanana", "platforms": "x86", "versions": [ ... ]}

replaces these two CNA items:

{"vendor": "PotatoLinux", "product": "OpenBanana", "platforms": "x86", "versions": [ ... ]}
{"vendor": "PotatoLinux", "product": "OpenBanana", "platforms": "ARM", "versions": [ ... ]}

and thus asserts that the vulnerability doesn't exist on the ARM platform.

This algorithm could be implemented easily, and the design does have sensible semantics.

The most important drawback of this approach is that, because UUIDs aren't used, all participants must use identical strings for the V+P data (or else the algorithm must state that fuzzy matching is expected). In practice, the CVE Program doesn't specify how vendors and products are named. A CNA might have used any number of names with UUID 10000000-0000-0000-0000-0000000000000 above: "PotatoLinux" "Potato Linux" "potato" "Potato Software Foundation, Inc." etc. In general, a CNA is unable to anticipate the vendor name that an ADP may prefer to use, especially when an organization becomes an ADP after the CVE record is published. (This is a completely realistic situation. Historically, one of the common motivating factors for a vendor to become a CVE Program participant is that they noticed another organization publishing inaccurate vulnerability information about their products.)

Admittedly, this could be completely solved at the V+P level, e.g., the CNA publishes the following (as before, a UUID is meaningful only within the context of one CVE Record):

{"vendorProductId": "40000000-0000-0000-0000-000000000000", "vendor": "PotatoLinux", "product": "OpenBanana", "platforms": "x86", "versions": [ ... ]}
{"vendorProductId": "40000000-0000-0000-0000-000000000000", "vendor": "PotatoLinux", "product": "OpenBanana", "platforms": "ARM", "versions": [ ... ]}
{"vendorProductId": "50000000-0000-0000-0000-000000000000", "vendor": "OpenBanana Foundation", "product": "OpenBanana", "versions": [ ... ]}
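Matching on such an identifier instead of on strings could be sketched in Go as below. `VendorProductID` corresponds to the `vendorProductId` key above; `Replaces` is an invented ADP-side key, since the comment does not say how an ADP would refer to the UUID, and the version values are illustrative because the originals are elided.

```go
package main

import "fmt"

// Entry is a flat "affected" item. VendorProductID is the per-record UUID
// from the example above; Replaces is a hypothetical ADP-side key pointing
// at a CNA VendorProductID. Neither is part of JSON 5.0.
type Entry struct {
	VendorProductID string
	Replaces        string
	Vendor, Product string
	Platform        string
	Versions        []string
}

// Merge hides every CNA entry whose VendorProductID an ADP entry replaces.
// Vendor and product strings are never compared, so the ADP may spell
// them differently from the CNA.
func Merge(cna, adp []Entry) []Entry {
	replaced := map[string]bool{}
	for _, a := range adp {
		if a.Replaces != "" {
			replaced[a.Replaces] = true
		}
	}
	var out []Entry
	for _, c := range cna {
		if c.VendorProductID == "" || !replaced[c.VendorProductID] {
			out = append(out, c)
		}
	}
	return append(out, adp...)
}

func main() {
	const potato = "40000000-0000-0000-0000-000000000000"
	const foundation = "50000000-0000-0000-0000-000000000000"
	cna := []Entry{
		{VendorProductID: potato, Vendor: "PotatoLinux", Product: "OpenBanana", Platform: "x86", Versions: []string{"8.4-3"}},
		{VendorProductID: potato, Vendor: "PotatoLinux", Product: "OpenBanana", Platform: "ARM", Versions: []string{"8.4-3"}},
		{VendorProductID: foundation, Vendor: "OpenBanana Foundation", Product: "OpenBanana", Versions: []string{"8.4"}},
	}
	adp := []Entry{
		{Replaces: potato, Vendor: "Potato Linux", Product: "openbanana", Platform: "x86", Versions: []string{"8.4-3"}},
	}
	for _, e := range Merge(cna, adp) {
		fmt.Println(e.Vendor, e.Product, e.Platform, e.Versions)
	}
}
```

The ADP spells the vendor and product differently, yet both PotatoLinux entries (x86 and ARM) are replaced, because only the UUID is compared.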

However, UUIDs at the vendor level of abstraction may be more attractive for several reasons:

  1. It is simpler for ADPs to compose their data, and thus probably more likely that vendors will choose to be ADPs. If UUIDs are for V+P pairs, the ADP may need to correctly understand and copy many UUIDs, and may need to reconcile its own product naming with the naming chosen by the CNA. This can be expensive: in the commercial closed-source world, product suites are often reorganized and rebranded for marketing reasons.

  2. As a policy goal, the CVE Program probably prefers that a vendor ADP take responsibility for publishing information applicable to its entire scope. (It's less useful to take responsibility for some V+P pairs but not others.) This does not mean that the CVE Program believes that any vendor ADP is infallible. The "tickle the same vulnerability in BananaCreamPie" situation can happen. If an ADP makes this BananaCreamPie error too often, the hope is that fewer CVE consumers would trust that ADP. However, the CVE Program probably still wants to offer an easy syntax that is aligned with the policy goal.

  3. Adding wildcard support for product names makes the algorithm harder to implement, and probably makes the CVE Record format harder to understand and use. (For example, the user needs to remember that wildcarding is only for products, and "*" elsewhere in the JSON document is interpreted literally.)

Finally, the naming options of "the combination of vendor and product" and "the combination of collectionUrl and packageName" don't, by themselves, require flattening. There could be a change from:

"vendorName": {
   "type": "string",
   "description": "name of the organization, project, community, or individual that created or maintains this product or hosted service.",
   "minLength": 1,
   "maxLength": 512
}

to

"vendorName": {
   "type": [ "string", "null" ],
   "description": "name of the organization, project, community, or individual that created or maintains this product or hosted service - use null if a name does not exist, is not needed for identifying the product/service, or is intentionally omitted for other reasons",
   "minLength": 1,
   "maxLength": 512
}

For example, upstream SQLite is a popular product that (at least recently) doesn't want to be considered part of a collection and yet doesn't want to be associated with a vendor name. There are many products that are uniquely identified by collectionUrl/packageName (e.g., for the https://wordpress.org/plugins collectionUrl) and yet do have commercial vendors that may become ADPs.

It's probably very unlikely that these UUIDs (e.g., otherVendorAssertionId or vendorProductId) will be an official part of JSON 5.0. However, keeping the vendorName key (with the unflattened approach) seems to offer the best migration path to future schema versions, in which ADPs are expected to precisely associate their vendor/product/version data contribution with the entirety of a vendorName section of the CNA container.

@rsc
Contributor Author

rsc commented Aug 16, 2021

@ElectricNroff, assuming for sake of argument that UUIDs are added at some future date, then no matter what you'd be able to override prior statements about the vulnerability of a given product.

This restructuring removes one possible way to override future statements about the vulnerability of a given vendor's product, but it leaves other possibilities open. So there's really very little actual downside here. The upside is the ability to define clear references to vendor-less products, so that CVE can represent open source projects more clearly and not overfit as much to commercial products.

rsc added a commit to rsc/cve-schema that referenced this issue Aug 16, 2021
- Changed affected from object with array of vendor objects
  with array of products to just plain array of products.
- Added vendor string to product object.
- Renamed productName to product in product object.
- Added cpes array of string to product object,
  replacing affectsCpes inside old affected object.
- Reordered property list in product object
  to put all identifying fields first.
- Changed programRoutines to be array of objects, not array of strings.
- Defined that product object:
  - Requires a product identification, at least one of:
    - vendor and product
    - collectionURL and packageName
    - cpes
  - Also requires versions.

Based on discussion on issue CVEProject#86.
@rsc
Contributor Author

rsc commented Aug 16, 2021

Filed #99 implementing @chandanbn's #86 (comment).

@ElectricNroff

The downside is that it becomes impossible to construct an exact algorithm for combining the information from the "affected" key in the CNA container and the "affected" key in an ADP container, unless the ADP is willing to express product names in exactly the same way as the CNA expressed them. Historically, nearly all vendor CNAs have wanted CVE Records to use product names that match their current product marketing. It seems very likely the vendor ADPs will want this too.

For example, a CNA includes these two items in its CNA container:

{"vendor": "Open Banana Foundation", "product": "Banana SecureClient", "versions": [ {"number": "12.0.1", "comparison": "=="} ]}
{"vendor": "Open Banana Foundation", "product": "Banana SecureServer", "versions": [ {"number": "12.0.1", "comparison": "=="} ]}

One problem, from the ADP's perspective, is that no product named Banana SecureClient or Banana SecureServer ever existed. Customers could only buy BananaSecure as a whole (with dozens of subcomponents: client, server, proxy, agent, etc.). For BananaSecure (and every subcomponent), version 1.0 was affected and version 1.1 is fixed. Furthermore, 12.0.1 has nothing to do with BananaSecure product versioning: it's the version of the LLVM Compiler Infrastructure that was used to build the product. All of these are realistic types of anomalies that exist in, for example, CNA-LR records. The ADP is not primarily focused on defending against future changes that the CNA may make to this one CVE Record. Also, the ADP is not concerned about the slight possibility that the CNA will, in the future, assign more CVE IDs for exactly the same vulnerability. Instead, the ADP is primarily focused on cleaning up the mess that exists right now.

(As an aside, the existence of ADP containers is not the best possible architecture for all situations. The CVE Program might have chosen a different architecture that offered an effective way to propose and accept Merge Requests for the CNA container. But, as it stands today, the expectation is that an ADP will normally use its own container to offer its own perspective about the correct data.)

What does the ADP do in the flattened approach? Somehow, it wants to express that "Banana SecureClient" and "Banana SecureServer" aren't valid product names, but doesn't want to risk any confusion about the BananaSecure 1.0 client and the BananaSecure 1.0 server actually being vulnerable. Quite possibly, there needs to be a new syntax to negate the erroneous CNA information; also, the ADP may want to split things out at the component level to eliminate the ambiguity, e.g.,

{"vendor": "Open Banana Foundation", "product": "Banana SecureClient", "productStatus": "imaginary", "versions": [] }
{"vendor": "Open Banana Foundation", "product": "Banana SecureServer", "productStatus": "imaginary", "versions": [] }
{"vendor": "Open Banana Foundation", "product": "BananaSecure", "productStatus": "real", "component": "client", "versions": [ {"number": "1.0", "comparison": "=="} ]}
{"vendor": "Open Banana Foundation", "product": "BananaSecure", "productStatus": "real", "component": "server", "versions": [ {"number": "1.0", "comparison": "=="} ]}
{"vendor": "Open Banana Foundation", "product": "BananaSecure", "productStatus": "real", "component": "proxy", "versions": [ {"number": "1.0", "comparison": "=="} ]}
{"vendor": "Open Banana Foundation", "product": "BananaSecure", "productStatus": "real", "component": "agent", "versions": [ {"number": "1.0", "comparison": "=="} ]}
{"vendor": "Open Banana Foundation", "product": "BananaSecure", "productStatus": "real", "component": "library", "versions": [ {"number": "1.0", "comparison": "=="} ]}
etc.
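A minimal Python sketch of the matching problem described above, assuming a consumer can only match flattened entries by their literal (vendor, product, component) strings, and assuming the hypothetical `"productStatus": "imaginary"` negation from this comment. `merge_flattened` and its exact-name keying are illustrative only, not part of any schema draft.

```python
from typing import Any

Key = tuple[str, str, str]


def entry_key(entry: dict[str, Any]) -> Key:
    # Flattened entries carry no shared identifier, so only literal names can be compared.
    return (entry.get("vendor", ""), entry.get("product", ""), entry.get("component", ""))


def merge_flattened(cna: list[dict[str, Any]], adp: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Overlay ADP entries on CNA entries by exact (vendor, product, component) match."""
    merged = {entry_key(e): e for e in cna}
    for entry in adp:
        if entry.get("productStatus") == "imaginary":
            # Hypothetical negation: drop the CNA entry with exactly this name.
            merged.pop(entry_key(entry), None)
        else:
            # Otherwise the ADP entry adds to, or replaces, an entry with the same name.
            merged[entry_key(entry)] = entry
    return list(merged.values())
```

Without the two negation entries, the CNA's nonexistent "Banana SecureClient" and "Banana SecureServer" entries survive the merge alongside the ADP's BananaSecure entries, which is the ambiguity this comment describes.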

The old approach was much simpler to produce and simpler to understand. The vendor ADP just offers all of the correct data for its own scope, which is (as expected) at the vendor level of abstraction. The algorithm can find the superseded CNA data by matching "vendorName == Open Banana Foundation" or (in future versions of the schema) matching one UUID in one place.

"affected": {
  "vendors": [
    {
      "vendorName": "Open Banana Foundation",
      "products": [
        {
          "productName": "BananaSecure",
          "versions": [
            {
              "number": "1.0",
              "comparison": "=="
            }
          ]
        }
      ]
    }
  ]
}
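For contrast, a minimal sketch of the per-vendor supersession described above, assuming the old nested `vendors`/`products` layout and assuming that an ADP naming a vendor replaces all of the CNA's data for that vendor. `merge_by_vendor` is a hypothetical name used for illustration.

```python
from typing import Any


def merge_by_vendor(cna_affected: dict[str, Any], adp_affected: dict[str, Any]) -> dict[str, Any]:
    """Replace every CNA vendor block whose vendorName also appears in the ADP data."""
    adp_vendors = {v["vendorName"]: v for v in adp_affected.get("vendors", [])}
    # Keep CNA vendors the ADP did not speak about, in their original order.
    merged = [v for v in cna_affected.get("vendors", []) if v["vendorName"] not in adp_vendors]
    # The ADP's view is authoritative for the vendors it names.
    merged.extend(adp_vendors.values())
    return {"vendors": merged}
```

The only matching step is one string comparison on `vendorName`, so the ADP never has to reproduce the CNA's (possibly wrong) product names.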

If it's vendor-less (as in the SQLite discussion above), then use:

"affected": {
  "vendors": [
    {
      "vendorName": null,
      "products": [

or (if null is unattractive) have an enum to express one of N common reasons why a vendor name was omitted (e.g., the data provider is declining to enter a vendor name because collectionURL/packageName seemed sufficient, versus there actually isn't a vendor name).
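A sketch of the enum alternative, assuming the reason is carried in a separate property next to a null `vendorName`. The enum values and the `vendorNameOmitted` property are invented for illustration; the comment only says there would be "one of N common reasons".

```python
from enum import Enum
from typing import Optional


class VendorOmission(Enum):
    """Hypothetical reason codes for a missing vendor name."""
    PACKAGE_ID_SUFFICIENT = "packageIdSufficient"  # collectionURL/packageName seemed sufficient
    NO_VENDOR = "noVendor"                         # there actually is no vendor name


def vendor_fields(name: Optional[str], reason: Optional[VendorOmission] = None) -> dict:
    """Build the vendor-identifying fields, requiring a reason whenever the name is omitted."""
    if name is not None:
        if reason is not None:
            raise ValueError("a reason is only meaningful when the vendor name is omitted")
        return {"vendorName": name}
    if reason is None:
        raise ValueError("an omitted vendor name needs a reason")
    # "vendorNameOmitted" is a made-up property name used only for illustration.
    return {"vendorName": None, "vendorNameOmitted": reason.value}
```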

chandanbn added this to the CVE Record JSON Format v5.0 milestone Aug 18, 2021
rsc added a commit to rsc/cve-schema that referenced this issue Aug 18, 2021
- Changed affected from object with array of vendor objects
  with array of products to just plain array of products.
- Added vendor string to product object.
- Renamed productName to product in product object.
- Added cpes array of string to product object,
  replacing affectsCpes inside old affected object.
- Reordered property list in product object
  to put all identifying fields first.
- Changed programRoutines to be array of objects, not array of strings.
- Defined that product object:
  - Requires a product identification, at least one of:
    - vendor and product
    - collectionURL and packageName
    - cpes
  - Also requires versions.
  - Expands CPE definition (previously unspecified).

Based on discussion on issue CVEProject#86.
Fixes CVEProject#41.
Fixes CVEProject#86.
rsc added a commit to rsc/cve-schema that referenced this issue Aug 19, 2021
- Changed affected from object with array of vendor objects
  with array of products to just plain array of products.
- Added vendor string to product object.
- Renamed productName to product in product object.
- Added cpes array of string to product object,
  replacing affectsCpes inside old affected object.
- Reordered property list in product object
  to put all identifying fields first.
- Changed programRoutines to be array of objects, not array of strings.
- Defined that product object:
  - Requires a product identification, at least one of:
    - vendor and product
    - collectionURL and packageName
    - cpes
  - Also requires versions.
  - Expands CPE definition (previously unspecified).

Based on discussion on issue CVEProject#86.
Fixes CVEProject#41.
Fixes CVEProject#86.
@ElectricNroff

(relevant to the flattening discussion; see below) It would be very useful for the schema to capture the primary types of information that vendors make public during large-scale vulnerability coordination efforts. There are many examples under https://www.kb.cert.org/vuls/ such as the https://www.kb.cert.org/vuls/id/357312 and https://www.kb.cert.org/vuls/id/257161 pages. Today, a vendor is allowed to submit a statement that it has the "Unaffected" status (i.e., it has zero vulnerable products); however,

  • the statement must be manually approved by a human at CERT/CC
  • the data is not published in a machine-readable format
  • the data is on a CERT/CC website, but consumers may want it to be delivered directly by the CVE Program

A simple solution is to keep this part of the schema structure:

 "vendors": {
     "type": "array",
     ...
             "products": {
                 "description": "This is the container for affected technologies, products, hardware, etc.",
                 "type": "array",
                 "minItems": 1,
                 "uniqueItems": true,
                 "items": {"$ref": "#/definitions/product"}
                 }

but delete the above "minItems": 1 instance. In other words, if a vendor is named in the "vendors" array (of any container) but the number of products for that array element is zero, then this means that the vendor has reviewed that one CVE Record and is asserting that none of its products has any vulnerable version. Presumably, a vendor choosing to do this is one whose customers may plausibly think that an affected component is part of that vendor's supply chain. In practice, CERT/CC receives many "Unaffected" statements. Possible use cases include:

  • When a cve.org website visitor is reviewing a CVE Record, they can choose to display this list of unaffected vendors. If the visitor is a large customer of the vendor, they may have otherwise called the vendor to ask about the affected/unaffected status.
  • A vulnerability-scanning product can automatically use the CVE Record information to annotate results. For example, when a scan target is flagged as vulnerable, and this target has a known vendor, but that vendor is (according to the CVE Record) unaffected, there can be an annotation suggesting a possible false positive.
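The two use cases above could be implemented along these lines. This is a minimal Python sketch, assuming the proposed semantics that a vendor listed with an explicitly empty `products` array is unaffected, and assuming a scanner finding carries a vendor name comparable to `vendorName`. The function names and vendor names are hypothetical.

```python
from typing import Any


def unaffected_vendors(affected: dict[str, Any]) -> set[str]:
    """Vendors listed with an explicitly empty products array (the proposed 'Unaffected' signal)."""
    # A missing "products" key is not treated as a statement; only an empty list is.
    return {v["vendorName"] for v in affected.get("vendors", []) if v.get("products") == []}


def annotate_finding(finding_vendor: str, affected: dict[str, Any]) -> str:
    """Attach a hint to a scanner finding based on the vendor's statement in the record."""
    if finding_vendor in unaffected_vendors(affected):
        return "possible false positive: vendor states it is unaffected"
    return "no vendor statement"
```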

This is a logically separate schema change that can be considered for 5.0 or a later version. It is, of course, suggested here because it is incompatible with the flattened structure. (Supporting "Unaffected" vendors in the flattened structure would require a new wildcarding feature or other complications.) This is another example of how information flow into the CVE Program benefits from schema organization at the per-vendor level of abstraction.
