Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stack size analysis to CI #459

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Add stack size analysis to CI #459

wants to merge 1 commit into from

Conversation

robin-nitrokey
Copy link
Member

This patch adds a CI job that prints the stack sizes for the functions in the stable NK3 firmware as well as the necessary tooling.

Fixes: #313


Currently, this just prints the stack sizes. We should try to transform this into some metrics, but I’m not sure what would be useful:

  • max/total/average/median stack size
  • n-th percentile stack size
  • sum of top-k or top-n-% functions

Ideally we would have the max stack usage per call path but AFAIR we did not get useful results with the existing tooling.

This patch adds a CI job that prints the stack sizes for the functions
in the stable NK3 firmware as well as the necessary tooling.

Fixes: #313
Copy link
Contributor

@sosthene-nitrokey sosthene-nitrokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not include the functions that perform dynamic stack allocations. I don't think rust code would have any but it might be interesting thing to add the list of functions where the stack size is not available. Not sure if this can be obtained just by listing the function in the binary and finding the ones not mentioned in the stack-sizes section.

max/total/average/median stack size
n-th percentile stack size
sum of top-k or top-n-% functions

Maybe having a high limit, reporting all function with a stack size higher than 2kB? Or maybe report any "new" function higher than 2kB.

I'm not sure I understand the interest of the sum. For me it only makes sense for a specific stacktrace.

@robin-nitrokey
Copy link
Member Author

The sum is not useful for functional changes, but I think it could be used to analyze refactoring. We saw that minor changes that look like a no-op can have drastic effects on stack usage (e. g. trussed-dev/iso7816#20). And sometimes, inlining decisions change during refactoring. Both should be observable via the sum. Of course a change would then require manual investigation, but it could potentially be a first indicator.

Tracking individual functions would also be an option, but it requires us to keep much more state to perform the comparison against, and non-trivial changes can introduce a lot of noise.

@sosthene-nitrokey
Copy link
Contributor

Ok I see the appeal of the sum, but the issue I have with it is that if the sum of code-paths that cannot happen at the same time increases it's not really relevant. I also prefer the idea of tracking functions individually, but tracking it across commits would be much harder (and without even taking into accounts generics/name-mangling).

I think on the other hand that tracking the count of function that have stack usage higher than some threshold is great, and wouldn't be that complex. We could manually set the limit, ensuring that any new stack-intensive function is detected and intentional. No need to keep state in the CI itself, but we can detect the introduction of any outlier.

@robin-nitrokey
Copy link
Member Author

For me this would just be informational. It is then up to the author to decide if that change is expected for the current PR or if it is something to investigate. Tracking the count above a threshold sounds interesting. But I would still like to combine this with some quantitative metric – maybe average of functions above the threshold?

Comment on lines +16 to +38
fn main() {
let mut stdin = io::stdin().lock();
let mut buffer = Vec::new();
stdin.read_to_end(&mut buffer).unwrap();
let functions = analyze_executable(&buffer).unwrap();
let mut sorted: Vec<_> = functions.defined.into_values().collect();
sorted.sort_by_key(|f| f.stack().unwrap_or(0));
for f in sorted.into_iter().rev() {
if let Some(stack) = f.stack() {
print!("{:6}", stack);
} else {
print!("None");
}
print!(" {:6} ", f.size());
for (i, name) in f.names().into_iter().enumerate() {
if i > 0 {
print!(", ");
}
print!("{}", demangle::demangle(name));
}
println!();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After demangling there is a lot of duplication. Maybe adding some deduplication would be nice.

Deduplication would keep the max of the stack usage, sum the binary size, and show how many time the given function is duplicated.

  2424     64 trussed::client::FutureResult<T,C>::poll
  2424     64 trussed::client::FutureResult<T,C>::poll
  2424     96 trussed::client::FutureResult<T,C>::poll
  2424     64 trussed::client::FutureResult<T,C>::poll
  2424     64 trussed::client::FutureResult<T,C>::poll
  2424     64 trussed::client::FutureResult<T,C>::poll
  2424     64 trussed::client::FutureResult<T,C>::poll
  2416     68 trussed::client::FutureResult<T,C>::poll
  2416     58 trussed::client::FutureResult<T,C>::poll
  2416     58 trussed::client::FutureResult<T,C>::poll
  2416     58 trussed::client::FutureResult<T,C>::poll
  2416     58 trussed::client::FutureResult<T,C>::poll
  2416     80 trussed::client::FutureResult<T,C>::poll
  2416     44 heapless_bytes::Bytes<_>::from_slice
  2408     50 trussed::client::FutureResult<T,C>::poll
  2408     54 trussed::client::FutureResult<T,C>::poll
  2408     54 trussed::client::FutureResult<T,C>::poll
  2408     50 trussed::client::FutureResult<T,C>::poll
  2408     50 trussed::client::FutureResult<T,C>::poll

@sosthene-nitrokey
Copy link
Contributor

I tend to prefer to see the top 10 percentiles and then deciles over averages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add stack analysis tool
2 participants