Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collecting metadata for impacted functions in advisories #68

Closed
tarcieri opened this Issue Nov 28, 2018 · 13 comments

Comments

Projects
None yet
3 participants
@tarcieri
Copy link
Member

tarcieri commented Nov 28, 2018

RustPräzi is a crater-like tool which builds a call graph across all of the crates published to crates.io:

https://internals.rust-lang.org/t/prototype-dev-tool-rustprazi-a-tool-to-build-an-entire-call-graph-of-crates-io/8912

It would be interesting to use this tool, perhaps in conjunction with crates-audit, to generate a list of impacted crates based on this call graph.

To do that we'd need to collect structured information about vulnerable functions in advisories. Looking at our existing advisories, that information is often already there, but buried in an unstructured prose description, e.g.:

https://rustsec.org/advisories/RUSTSEC-2018-0003

If an iterator passed to SmallVec::insert_many panicked in Iterator::next, destructors were run during unwinding while the vector was in an inconsistent state, possibly causing a double free (a destructor running on two copies of the same value).

I think it'd be good to come up with a structured format for this sort of information that can be fed into tools like RustPräzi to determine impacted users of vulnerable crates.

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Dec 13, 2018

I think we should just collect a canonical path starting with the crate name, and sans any parameters as an optional attribute of the advisory.

Using the advisory from the above description as an example:

affected_functions = ["smallvec::SmallVec::insert_many"]

tarcieri added a commit to RustSec/rustsec-crate that referenced this issue Dec 14, 2018

Implement "affected_functions" advisory attribute
Closes RustSec/advisory-db#68

This adds an optional `affected_functions` attribute to the advisory
TOML, containing a list of canonical paths to functions within a
crate which contain vulnerable code associated with the advisory.:

```
affected_functions = ["smallvec::SmallVec::insert_many"]
```

By tracking this information in a structured manner, it can be consumed
by static analysis tools which e.g. trace call graphs.

tarcieri added a commit to RustSec/rustsec-crate that referenced this issue Dec 15, 2018

Implement "affected_functions" advisory attribute
Closes RustSec/advisory-db#68

This adds an optional `affected_functions` attribute to the advisory
TOML, containing a list of canonical paths to functions within a
crate which contain vulnerable code associated with the advisory:

    affected_functions = ["smallvec::SmallVec::insert_many"]

By tracking this information in a structured manner, it can be consumed
by static analysis tools which e.g. trace call graphs.

Inventitech added a commit to praezi/advisory-db that referenced this issue Dec 20, 2018

@Inventitech

This comment has been minimized.

Copy link
Contributor

Inventitech commented Dec 21, 2018

Sorry to chime in on this after it's been merged.

One problem with the current design is that it does not allow for the specification of a version.

Why could this be important?
Some security bugs affect quite a broad change of versions of a crate, and as such, the vulnerability might itself be subject to refactoring, renames, etc. So just specifying a name in affected_functions is perhaps overly simplistic.

Example

v1 introduce vulnerable function f1
v2 rename f1 to f2
v3 introduce new function f1, which is not vulnerable
v4 fix vulnerability in f2

Just naming all affected_functions f1 and f2 is not a great solution, because f1 would be considered vulnerable from v2 to v3, even though we know it isn't.

What would the change proposal be?
For each function, add a range of affected versions.

affected_functions = [ ["f1","<2"], ["f2",">2 <4"]]

What are the consequences of the proposed change?

  • It makes the declaration of this array more complicated.
  • It increases precision and resolves possible ambiguities. I don't know how often this would occur in practice.

Another question would be whether we would want to restrict this to functions, only. Couldn't a data structure have a vulnerability, too?

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Dec 21, 2018

Haha, no worries.

Modifying the existing affected_functions might be tricky, but we can just abandon it, especially as this is a good reason:

Another question would be whether we would want to restrict this to functions, only. Couldn't a data structure have a vulnerability, too?

I definitely agree with this. There have been a few advisories filed which don't pertain to a function, but e.g. C FFI-related memory corruption (e.g. RUSTSEC-2018-0011).

How about this:

  • Deprecate and remove affected_functions. We can delete the existing ones.
  • Add a new attribute called affected_paths. How about using a TOML table where the keys are VersionReqs? e.g.
[affected_paths]
">= 1.0.0, < 1.2.3" = ["mycrate::foo::bar"]
"1.2.3" = ["mycrate::foo::baz"]

Or (these are two different TOML syntaxes for the same thing):

affected_paths."1.2.3" = [
  "mycrate::foo::bar",
  "mycrate::foo::baz",
  "mycrate::foo::quux",
]
@Inventitech

This comment has been minimized.

Copy link
Contributor

Inventitech commented Dec 21, 2018

That sounds good!

Personally, I would prefer the first affected_paths syntax you have given.

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Dec 21, 2018

Both syntaxes will work, although we can document the first one.

@Shnatsel

This comment has been minimized.

Copy link

Shnatsel commented Jan 7, 2019

We need to decide and document whether affected_paths should list public functions so that humans could cross-reference them with their codebase, or private functions that are the root cause of the issue because that's what enables the most precise machine analysis.

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Jan 8, 2019

It could potentially list both. Curious what @Inventitech thinks

@Shnatsel

This comment has been minimized.

Copy link

Shnatsel commented Jan 8, 2019

Listing both would probably result in conflicting information, too much room for human error, and increased burden for filing advisories.

@Inventitech

This comment has been minimized.

Copy link
Contributor

Inventitech commented Jan 9, 2019

Hmm ... hard question.

Pros for listing public affected paths are that these are easy to consume for most users.
Problems are that whoever writes the advisory has to build a call graph or data flow graph to see where they are being used (as you said), so there is more potential for wrong hard-coded information in the advisory.

Pros for listing just the offending (maybe private, maybe public) functionality is that it is easy to do for the advisory creator, it is easiest to check for correctness and it keeps the advisories short (this might be an important practical argument). In theory, once we have the directly affected paths, we can just auto-expand to all affected paths in the module. Of course, the danger is that people take a quick manual look at the advisory, see they're not using anything of what is listed under affected_paths directly and wrongly assume they're safe. Maybe a simple fix of this would be to rename the attribute to root_cause_paths. Also, another argument for listing what is directly affected might be Rust's re-export feature, or if reflection will be further developed, one can no longer assume it is impossible to get access to private paths. Thus it seems that the root_cause_paths solution is also more future proof.

I also discussed this with @jhejderup, and like you, we see pros and cons for both approaches, but in the end, would only mention the root cause paths. Main arguments are that it is easy for the advisory creator, it keeps the advisories short, it does not include parts that can be auto-generated (transitive uses), and it seems to be more future-proof.

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Jan 13, 2019

An alternative to listing both under [affected_paths] is having advisory reporters continue to document the most relevant user-facing functions in the advisory description, which is what they're doing already organically.

tarcieri added a commit to RustSec/rustsec-crate that referenced this issue Jan 13, 2019

Redo 'affected_functions' as 'affected_paths'
As discussed here:

RustSec/advisory-db#68 (comment)

This change collects paths (either functions or types) to code affected
by an advisory, now parameterized by a `semver::VersionReq` as types
may have been changed between versions, even if all versions are
vulnerable.

tarcieri added a commit to RustSec/rustsec-crate that referenced this issue Jan 13, 2019

Redo 'affected_functions' as 'affected_paths'
As discussed here:

RustSec/advisory-db#68 (comment)

This change collects paths (either functions or types) to code affected
by an advisory, now parameterized by a `semver::VersionReq` as types
may have been changed between versions, even if all versions are
vulnerable.
@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Mar 3, 2019

We tried using the new syntax in #89, which provides a pretty good real-world example:

[affected_paths]
">= 0.4.0, <= 0.10.0"  = ["safe_transmute::guarded_transmute_vec_permissive"]
"= 0.10.0"             = ["safe_transmute::guarded_transmute_to_bytes_vec"]

This now has me wondering if instead of a map from paths to VersionReqs, it should be the other way around:

[affected_paths]
"safe_transmute::guarded_transmute_vec_permissive" = [">= 0.4.0, <= 0.10.0"]
"safe_transmute::guarded_transmute_to_bytes_vec" = ["= 0.10.0"]

(although ironically this seems to be breaking GitHub's TOML parser?)

Perhaps we should use an array of tables instead?

[[affected_path]]
name = "safe_transmute::guarded_transmute_vec_permissive"
versions = [">= 0.4.0, <= 0.10.0"]

[[affected_path]]
name = "safe_transmute::guarded_transmute_to_bytes_vec"
versions = ["= 0.10.0"]
@Inventitech

This comment has been minimized.

Copy link
Contributor

Inventitech commented Mar 4, 2019

This now has me wondering if instead of a map from paths to VersionReqs, it should be the other way around:

IMHO, it should (see my post #68 (comment)). Anyway, this is only a minor change.

Your other suggestion seems a bit more verbose than needs-be, by repeating [[affected_path]].

@tarcieri

This comment has been minimized.

Copy link
Member Author

tarcieri commented Mar 4, 2019

@Inventitech another way of writing the same thing is:

affected_paths = [
    { name = "safe_transmute::guarded_transmute_vec_permissive", versions = [">= 0.4.0, <= 0.10.0"] }
]

Your version is pretty close to that, aside from using nested arrays instead of an array of tables.

The reason for the more verbose form is the less verbose form seems to freak out TOML parsers (see the other version in my GitHub comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.