Skip to content

Commit

Permalink
Add initial support for inlined C functions
Browse files Browse the repository at this point in the history
  • Loading branch information
Tails8521 committed Sep 13, 2021
1 parent bf9f3d2 commit d60bc20
Show file tree
Hide file tree
Showing 5 changed files with 88 additions and 25 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "md-profiler"
version = "1.0.0"
version = "1.1.0"
edition = "2018"

[dependencies]
Expand Down
43 changes: 40 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,10 @@ You have several options:
You can use https://ui.perfetto.dev/ in any browser, with the Open trace button in the top left, select your json file
Or can use Google Chrome's chrome://tracing/ interface, press the Load button, on the top left and select your json file

# Limitations
# Limitations and working around them

- By default, the profiler only follows explicit subroutine calls with JSR or BSR instructions, if you jump to, or fall trough subroutine code, it won't show that subroutine as being currently called. This is fixable however, even without changing your code, but it will require a bit of manual input on your part, see the Advanced usage section for more details.
- C code with optimizations turned on tends to aggressively inline a vast amount of functions, and thus they don't appear in the graph. You can change the compiler options to make it inline less but keep in mind that builds with less inlining will not perform as well, and you may not get measurements that represent accurately how your optimized (with inlining) builds perform. I am working on a way to support manual intervals for C code, it is not as simple as with asm due to how a compiler can re-organize and duplicate code for optimization purposes, but I have a proof of concept that shows that it is feasible, I will add support for it once I come up with a good implementation.
- C code with optimizations turned on tends to aggressively inline a vast amount of functions, and thus they don't appear in the graph. You can change the compiler options to make it inline less but keep in mind that builds with less inlining will not perform as well, and you may not get measurements that represent accurately how your optimized (with inlining) builds perform. The Advanced usage section contains a workaround that lets you profile inlined functions without affecting the generated code, at the cost of having to insert annotations manually in your source files.

# Advanced usage: Manual intervals

Expand Down Expand Up @@ -88,4 +88,41 @@ mdp <output.mdp>
And then you can use the mdp command to record the trace as usual, except you also specify the interval file:
```
md-profiler -m <INTERVALS> -s <SYMBOLS> -i <INPUT> -o <OUTPUT>
```
```

## Profiling inlined C functions

In your C code, you can use these macros
```c
#define LABEL(name) asm volatile("mdp_label_" name "_%=: .global mdp_label_" name "_%=":);
#define FUNCTION_START(name) LABEL(name "_start");
#define FUNCTION_END(name) LABEL(name "_end");
```
to define global labels and function start/end markers that the profiler can use
For instance, if myfunction gets inlined when compiling but you still want to profile it, you can do this:
```c
s16 myfunction(s16 a) {
FUNCTION_START("myfunction");
if (a > 0) {
FUNCTION_END("myfunction");
return a+1;
}
if (a < 0) {
FUNCTION_END("myfunction");
return a-1;
}
FUNCTION_END("myfunction");
return 0;
}
```

You need to add FUNCTION_START at the start of the function and the FUNCTION_END before each return statement (including the implicit one at the end of the function for void functions)
In your interval file, you just specify the function name alone on a line, like so:
```
myfunction
```
I know that ideally gcc would be able to automate this work automatically, I am aware of -finstrument-functions, but this isn't really what I want, I would need something like -finstrument-macros
Anyway, if you are aware of a better way of doing this, please let me know.
58 changes: 42 additions & 16 deletions src/intervals.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
use std::{collections::{HashMap, HashSet}, fs::File, io::{BufWriter, Write}};
use std::{collections::{BTreeMap, HashMap, HashSet}, fs::File, io::{BufWriter, Write}};

use crate::profiling::{TraceEvent, cycle_to_us};

Expand Down Expand Up @@ -52,17 +52,27 @@ impl Intervals {
}
}

fn read_interval_elm(input: &str, symbols: &HashMap<String, u32>) -> u32 {
fn read_interval_elm(input: &str, symbols: &BTreeMap<String, u32>) -> Vec<u32> {
if let Some(&address) = symbols.get(input) {
return address;
return vec![address];
}
if let Ok(address) = u32::from_str_radix(input, 16) {
return address;
return vec![address];
}
panic!("{} not found in the symbol files", input);
let mut ret = Vec::new();
let mut prefix = String::from("mdp_label_");
prefix.push_str(input);
// add all symbols that start with prefix
for (_symbol, &address) in symbols.range(prefix.clone()..).take_while(|(symbol, _)| symbol.starts_with(&prefix)) {
ret.push(address);
}
if ret.is_empty() {
panic!("{} not found in the symbol file", input);
}
ret
}

pub fn read_intervals(input: &[u8], symbols: &HashMap<String, u32>) -> (Intervals, HashMap<String, u32>) {
pub fn read_intervals(input: &[u8], symbols: &BTreeMap<String, u32>) -> (Intervals, HashMap<String, u32>) {
let mut intervals_info = Vec::new();
let mut starts: HashMap<u32, Vec<usize>> = HashMap::new();
let mut ends: HashMap<u32, Vec<usize>> = HashMap::new();
Expand All @@ -72,17 +82,33 @@ pub fn read_intervals(input: &[u8], symbols: &HashMap<String, u32>) -> (Interval
for line in input.split('\n') {
let line_elms: Vec<_> = line.split(',').collect();
let interval_index = intervals_info.len();
if line_elms.len() < 2 {
if line_elms.is_empty() || line.trim_start().starts_with("//") ||line_elms[0].trim().is_empty() {
continue;
}
line_elms[0].trim().split(';').for_each(|elm| {
let interval_start = read_interval_elm(elm, symbols);
starts.entry(interval_start).or_default().push(interval_index);
});
line_elms[1].trim().split(';').for_each(|elm| {
let interval_end = read_interval_elm(elm, symbols);
ends.entry(interval_end).or_default().push(interval_index);
});
if line_elms.len() == 1 {
let elm = line_elms[0].trim();
let mut elm_start = String::from(elm);
let mut elm_end = elm_start.clone();
elm_start.push_str("_start");
elm_end.push_str("_end");
for interval_start in read_interval_elm(&elm_start, symbols) {
starts.entry(interval_start).or_default().push(interval_index);
}
for interval_end in read_interval_elm(&elm_end, symbols) {
ends.entry(interval_end).or_default().push(interval_index);
}
} else {
line_elms[0].trim().split(';').for_each(|elm| {
for interval_start in read_interval_elm(elm, symbols) {
starts.entry(interval_start).or_default().push(interval_index);
}
});
line_elms[1].trim().split(';').for_each(|elm| {
for interval_end in read_interval_elm(elm, symbols) {
ends.entry(interval_end).or_default().push(interval_index);
}
});
}
let tid = if line_elms.len() >= 4 {
let custom_thread_name = line_elms[3].trim();
custom_threads.get(custom_thread_name).copied().unwrap_or_else(|| {
Expand Down Expand Up @@ -113,4 +139,4 @@ pub fn read_intervals(input: &[u8], symbols: &HashMap<String, u32>) -> (Interval
},
custom_threads
)
}
}
8 changes: 4 additions & 4 deletions src/symbols.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ use std::{collections::{BTreeMap, HashMap}, convert::TryInto};
#[derive(Debug, Default)]
pub struct Symbols {
pub address_to_label: HashMap<u32, Vec<String>>,
pub label_to_address: HashMap<String, u32>,
pub label_to_address: BTreeMap<String, u32>,
}

pub fn read_symbols(input: &[u8]) -> Symbols {
Expand All @@ -18,7 +18,7 @@ pub fn read_symbols(input: &[u8]) -> Symbols {

fn read_asm68k_symbols(input: &[u8]) -> Symbols {
let mut address_to_label: BTreeMap<u32, Vec<String>> = BTreeMap::new();
let mut label_to_address: HashMap<String, u32> = HashMap::new();
let mut label_to_address: BTreeMap<String, u32> = BTreeMap::new();
let mut i = 8; // skip header
while i < input.len() {
let address = u32::from_le_bytes(input[i..i+4].try_into().unwrap());
Expand Down Expand Up @@ -51,7 +51,7 @@ fn read_asm68k_symbols(input: &[u8]) -> Symbols {

fn read_as_symbols(input: &[u8]) -> Symbols {
let mut address_to_symbols: HashMap<u32, Vec<String>> = HashMap::new();
let mut symbol_to_address: HashMap<String, u32> = HashMap::new();
let mut symbol_to_address: BTreeMap<String, u32> = BTreeMap::new();
let input = String::from_utf8_lossy(input);
let (_, input_symbols) = input.split_once("Symbols in Segment").expect("Error parsing as symbols");
for line in input_symbols.lines().skip(1) {
Expand Down Expand Up @@ -80,7 +80,7 @@ fn read_as_symbols(input: &[u8]) -> Symbols {

fn read_nm_symbols(input: &[u8]) -> Symbols {
let mut address_to_symbols: HashMap<u32, Vec<String>> = HashMap::new();
let mut symbol_to_address: HashMap<String, u32> = HashMap::new();
let mut symbol_to_address: BTreeMap<String, u32> = BTreeMap::new();
let input = String::from_utf8_lossy(input);
for line in input.split('\n') {
let elms: Vec<_> = line.split_ascii_whitespace().collect();
Expand Down

0 comments on commit d60bc20

Please sign in to comment.