cveIdGetFiltered documentation doesn't mention time_ implications of statelessness #961

@ElectricNroff


This is different from the #920 issue because pagination can be used correctly, but is underdocumented and therefore many CNAs may end up using it incorrectly.

The https://cveawg-test.mitre.org/api-docs page does not state that the CVE Services API is a REST API or otherwise stateless. This affects how the time_ fields of https://cveawg-test.mitre.org/api-docs/#/CVE%20ID/cveIdGetFiltered should be used. Specifically, the user needs to know that paginated output only makes sense if one (or both) of these is true:

  • every request is accompanied by a time "lt" field (time_reserved.lt or time_modified.lt) whose value is no later than the time at which the first page was requested
  • the user can guarantee that no additional CVE IDs are reserved (or modified) during iteration across the pages

For example, suppose a request has no time fields, or only time_reserved.gt, and is made by a large CNA where multiple people or applications may be requesting CVE IDs asynchronously. The page layout can then change during iteration (e.g., a request for non-sequential CVE IDs can obtain IDs with lower numbers than some of the CVE IDs that had already been reserved before pagination began). As a result, the final collection of retrieved CVE IDs (combined across all pages) can be missing some CVE IDs and can contain duplicates.
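The effect can be reproduced with a toy model. The sketch below is not the CVE Services implementation: it assumes a stateless paginator that recomputes the result set on every request and sorts by CVE ID, and every name in it (`Reservation`, `get_page`, `iterate`, the integer timestamps, the page size of 2) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    cve_id: str
    reserved_at: int  # toy timestamp, not a real API field


def get_page(store, page, per_page, reserved_lt=None):
    """Stateless paginator: the result set is recomputed on every call."""
    rows = [r for r in store if reserved_lt is None or r.reserved_at < reserved_lt]
    rows.sort(key=lambda r: r.cve_id)  # assumption: sorted by CVE ID
    start = (page - 1) * per_page
    return [r.cve_id for r in rows[start:start + per_page]]


def iterate(store, per_page, reserved_lt=None, after_first_page=None):
    """Walk all pages; optionally run a concurrent action after page 1."""
    collected = []
    page = 1
    while True:
        ids = get_page(store, page, per_page, reserved_lt)
        if not ids:
            break
        collected.extend(ids)
        if after_first_page is not None and page == 1:
            after_first_page(store)
        page += 1
    return collected


def make_store():
    return [
        Reservation("CVE-2024-0001", 10),
        Reservation("CVE-2024-0003", 11),
        Reservation("CVE-2024-0004", 12),
        Reservation("CVE-2024-0005", 13),
    ]


def reserve_lower_id(store):
    # Another requester obtains a lower-numbered ID while we are paginating.
    store.append(Reservation("CVE-2024-0002", 101))


# No upper time bound: the page layout shifts under the iterator.
unbounded = iterate(make_store(), 2, after_first_page=reserve_lower_id)

# Upper bound fixed before iteration (toy time 100): the layout stays put.
bounded = iterate(make_store(), 2, reserved_lt=100, after_first_page=reserve_lower_id)

print(unbounded)
print(bounded)
```

In this model the unbounded run returns CVE-2024-0003 twice and never returns CVE-2024-0002, while the bounded run returns each of the four original IDs exactly once.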

A CNA may run into this problem if:

  • they suspect that CVE Services has a REST API but do not bother to think about the implications of page-layout changes during iteration
  • they guess that the sort order is reservation date, e.g., if a few CVE IDs were reserved after iteration started, they would always be retrieved on the last page, and the page layout would not change. (This is an incorrect guess.)
  • they guess that the API is not stateless, e.g., the server hypothetically generates all of the page content after receiving the request for the first page, and places this into a cache perhaps associated with the client's source IP address, and thus the page layout is guaranteed to remain the same for a long time (until the cache is flushed or the last page is retrieved). This is, of course, also an incorrect guess.

In practice, at least one large CNA has been relying on the page layout to stay the same, even though they didn't use time_reserved.lt and did have multiple asynchronous requesters. This is not a theoretical problem. Also, publicly available client code such as https://github.com/RedHatProductSecurity/cvelib/blob/254fc36dfce518df3e7e423f5bd770ef039535e4/cvelib/cve_api.py#L125 does not insert a time_reserved.lt field for when the iteration began.

To resolve this problem, the api-docs content could be modified to state, for example:

time_reserved.lt   Most recent CVE ID reserved timestamp to retrieve. Intended to be used for any
                   request that can potentially return more than 500 CVE IDs; otherwise, results can be
                   erroneous because the paginator's page layout may change.
time_modified.lt   Most recent CVE ID modified timestamp to retrieve. Intended to be used for any
                   time_modified.gt request that can potentially return more than 500 CVE IDs;
                   otherwise, results can be erroneous because the paginator's page layout may change.

Of course, some readers will not understand what this means, and may have trouble deciding what value of time_reserved.lt or time_modified.lt should be chosen to avoid erroneous output. They can ask for help from the community or the Secretariat. Alternatively, more complete information could be included elsewhere in the CVE Services software documentation. It would probably not be useful to include a multi-paragraph explanation within the api-docs page itself.

As far as I know, if time_reserved.gt is being used (or neither time_reserved.gt nor time_modified.gt is being used) then the client should always be sending a time_reserved.lt field marking the start of the iteration. If time_modified.gt is being used, then the client should always be sending a time_modified.lt field marking the start of the iteration.
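A client-side version of that rule might look like the following minimal sketch. It is not taken from cvelib or from the CVE Services code. The `fetch_page` callable, its `(items, has_next)` return shape, the `page` parameter name, and the timestamp format are all assumptions made for illustration; only the `time_reserved.*` and `time_modified.*` field names come from the discussion above. Sending both "lt" fields when both "gt" fields are present is also a guess.

```python
from datetime import datetime, timezone


def pinned_params(base_params, now=None):
    """Copy base_params and add an upper time bound fixed at iteration start."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # assumed timestamp format
    params = dict(base_params)
    if "time_modified.gt" in params:
        params.setdefault("time_modified.lt", stamp)
    # time_reserved.gt in use, or no "gt" field at all: bound by reservation time.
    if "time_reserved.gt" in params or "time_modified.gt" not in params:
        params.setdefault("time_reserved.lt", stamp)
    return params


def iterate_pages(fetch_page, base_params, now=None):
    """Yield items from every page, sending the same "lt" bound each time.

    fetch_page is a hypothetical callable: it takes a dict of query
    parameters and returns (items, has_next).
    """
    params = pinned_params(base_params, now)  # computed once, before page 1
    page = 1
    while True:
        items, has_next = fetch_page({**params, "page": page})
        yield from items
        if not has_next:
            break
        page += 1
```

The key design point is that the bound is computed once, before the first request, and then reused unchanged for every page, so later reservations or modifications cannot move entries between pages.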

All of this has similar implications for the GET /cve endpoint, but that documentation is only really needed by the Secretariat.
