List missing benchmarks #28

sunjay · 2017-04-19T16:43:39Z

Fixes #27

I added this behind a command line flag (--include-missing), but if you think it would be appropriate, I could enable it by default and have the flag disable it instead. I left the warning in, but it is only printed if the user did not pass the flag.

Below is how it looks. I fed in some of my own benchmark results. I couldn't get the integration tests to pass on my machine, so I'm going to wait for the Travis output and see what adjustments I need to make.

Let me know if you have any feedback!

$ cat old
running 8 tests
test b01_compile_trivial   ... bench:       2,132 ns/iter (+/- 673)
test b02_compile_large     ... bench:   1,594,916 ns/iter (+/- 78,139)
test b03_compile_huge      ... bench:  51,480,362 ns/iter (+/- 1,144,024)
test b04_compile_simple    ... bench:      31,399 ns/iter (+/- 868)
test b05_compile_slow      ... bench:     225,000 ns/iter (+/- 5,625)
test b06_interpret_trivial ... bench:      10,325 ns/iter (+/- 221)
test b07_interpret_simple  ... bench:   6,740,466 ns/iter (+/- 81,711)
test b08_interpret_slow    ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 7 measured

     Running target/release/deps/brainfuck-ca6ac36e58b54315

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target/release/deps/brainfuck-184a47abc46cc402

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

$ cat new
running 16 tests
test b01_compile_trivial       ... bench:       2,829 ns/iter (+/- 279)
test b01_compile_trivial_opt   ... bench:       1,746 ns/iter (+/- 55)
test b02_compile_large         ... bench:   2,516,662 ns/iter (+/- 392,982)
test b02_compile_large_opt     ... bench:   1,039,463 ns/iter (+/- 71,707)
test b03_compile_huge          ... bench:  86,575,722 ns/iter (+/- 3,605,019)
test b03_compile_huge_opt      ... bench:  15,571,740 ns/iter (+/- 1,062,383)
test b04_compile_simple        ... bench:      44,247 ns/iter (+/- 490)
test b04_compile_simple_opt    ... bench:      24,441 ns/iter (+/- 255)
test b05_compile_slow          ... bench:     289,571 ns/iter (+/- 3,150)
test b05_compile_slow_opt      ... bench:     158,586 ns/iter (+/- 2,301)
test b06_interpret_trivial     ... bench:      10,442 ns/iter (+/- 46)
test b06_interpret_trivial_opt ... bench:       5,634 ns/iter (+/- 19)
test b07_interpret_simple      ... bench:   7,388,517 ns/iter (+/- 34,084)
test b07_interpret_simple_opt  ... bench:   4,114,281 ns/iter (+/- 18,730)
test b08_interpret_slow        ... ignored
test b08_interpret_slow_opt    ... ignored

test result: ok. 0 passed; 0 failed; 2 ignored; 14 measured

     Running target/release/deps/brainfuck-ca6ac36e58b54315

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

     Running target/release/deps/brainfuck-184a47abc46cc402

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

$ cargo run -- benchcmp old new --include-missing
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/cargo-benchcmp benchcmp old new --include-missing`
 name                       old ns/iter  new ns/iter  diff ns/iter  diff % 
 b01_compile_trivial        2,132        2,829                 697  32.69% 
 b02_compile_large          1,594,916    2,516,662         921,746  57.79% 
 b03_compile_huge           51,480,362   86,575,722     35,095,360  68.17% 
 b04_compile_simple         31,399       44,247             12,848  40.92% 
 b05_compile_slow           225,000      289,571            64,571  28.70% 
 b06_interpret_trivial      10,325       10,442                117   1.13% 
 b07_interpret_simple       6,740,466    7,388,517         648,051   9.61% 
 b01_compile_trivial_opt    n/a          1,746                 n/a     n/a 
 b02_compile_large_opt      n/a          1,039,463             n/a     n/a 
 b03_compile_huge_opt       n/a          15,571,740            n/a     n/a 
 b04_compile_simple_opt     n/a          24,441                n/a     n/a 
 b05_compile_slow_opt       n/a          158,586               n/a     n/a 
 b06_interpret_trivial_opt  n/a          5,634                 n/a     n/a 
 b07_interpret_simple_opt   n/a          4,114,281             n/a     n/a

$ cargo run -- benchcmp new old --include-missing
   Compiling cargo-benchcmp v0.1.5 (file:///home/sunjay/Documents/projects/cargo-benchcmp)
    Finished dev [unoptimized + debuginfo] target(s) in 3.9 secs
     Running `target/debug/cargo-benchcmp benchcmp new old --include-missing`
 name                       new ns/iter  old ns/iter  diff ns/iter   diff % 
 b01_compile_trivial        2,829        2,132                -697  -24.64% 
 b02_compile_large          2,516,662    1,594,916        -921,746  -36.63% 
 b03_compile_huge           86,575,722   51,480,362    -35,095,360  -40.54% 
 b04_compile_simple         44,247       31,399            -12,848  -29.04% 
 b05_compile_slow           289,571      225,000           -64,571  -22.30% 
 b06_interpret_trivial      10,442       10,325               -117   -1.12% 
 b07_interpret_simple       7,388,517    6,740,466        -648,051   -8.77% 
 b01_compile_trivial_opt    1,746        n/a                   n/a      n/a 
 b02_compile_large_opt      1,039,463    n/a                   n/a      n/a 
 b03_compile_huge_opt       15,571,740   n/a                   n/a      n/a 
 b04_compile_simple_opt     24,441       n/a                   n/a      n/a 
 b05_compile_slow_opt       158,586      n/a                   n/a      n/a 
 b06_interpret_trivial_opt  5,634        n/a                   n/a      n/a 
 b07_interpret_simple_opt   4,114,281    n/a                   n/a      n/a
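For readers following the two tables above: a minimal sketch of how rows like these could be produced, assuming results are paired by benchmark name and that the percentage is taken relative to the first column. The `Row` type and `compare` function are hypothetical names for illustration, not the crate's actual API, and the row ordering here (shared benchmarks first, then missing ones) is an assumption based on the output shown.

```rust
use std::collections::BTreeMap;

/// One row of the comparison table. `None` marks a benchmark that is
/// missing from that result set (printed as "n/a" in the tables above).
#[derive(Debug, PartialEq)]
struct Row {
    name: String,
    old: Option<u64>,
    new: Option<u64>,
}

impl Row {
    /// Difference in ns/iter; only defined when both sides exist.
    fn diff_ns(&self) -> Option<i64> {
        match (self.old, self.new) {
            (Some(o), Some(n)) => Some(n as i64 - o as i64),
            _ => None,
        }
    }

    /// Percentage change relative to the first (old) column.
    fn diff_pct(&self) -> Option<f64> {
        match (self.old, self.new) {
            (Some(o), Some(n)) if o != 0 => {
                Some((n as f64 - o as f64) / o as f64 * 100.0)
            }
            _ => None,
        }
    }
}

/// Pair results by name (hypothetical helper): benchmarks present in both
/// sets come first, followed by those missing from either side.
fn compare(old: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>) -> Vec<Row> {
    let mut both = Vec::new();
    let mut missing = Vec::new();
    for (name, &o) in old {
        match new.get(name) {
            Some(&n) => both.push(Row { name: name.clone(), old: Some(o), new: Some(n) }),
            None => missing.push(Row { name: name.clone(), old: Some(o), new: None }),
        }
    }
    for (name, &n) in new {
        if !old.contains_key(name) {
            missing.push(Row { name: name.clone(), old: None, new: Some(n) });
        }
    }
    both.extend(missing);
    both
}

fn main() {
    // Values copied from the b01 rows in the output above.
    let mut old = BTreeMap::new();
    old.insert("b01_compile_trivial".to_string(), 2_132u64);
    let mut new = BTreeMap::new();
    new.insert("b01_compile_trivial".to_string(), 2_829u64);
    new.insert("b01_compile_trivial_opt".to_string(), 1_746u64);

    for row in compare(&old, &new) {
        match (row.diff_ns(), row.diff_pct()) {
            (Some(d), Some(p)) => println!("{} {} {:.2}%", row.name, d, p),
            _ => println!("{} n/a n/a", row.name),
        }
    }
}
```

Because the percentage is relative to the first column, swapping the inputs does not simply flip the sign: 697 / 2,132 is 32.69%, while -697 / 2,829 is -24.64%, which matches the two tables.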

BurntSushi

This looks great! Thanks so much. :-)

I haven't touched the tests in a while, but I know some exist. Would it be possible for you to add a test for this? (If it turns out to take longer than a few minutes, then I'm fine skipping it.)

sunjay · 2017-04-19T16:48:23Z

The tests exist! I just don't know how to successfully run them locally.

It would be great if you had some instructions in the repo about how to run the tests. At first I was getting lots of errors about having no cargo out dir environment variable. I eventually figured out that the variable I needed was OUT_DIR, but then ran into even more errors.

This is the output I get locally when I run OUT_DIR=target cargo test:

running 17 tests
test benchmark::tests::commafy::comma_every_three ... ok
test benchmark::tests::commafy::number_matches ... ok
test benchmark::tests::overlap::overlap_correct ... ok
test benchmark::tests::overlap::result_from_vecs ... ok
test benchmark::tests::overlap::missing_correct ... ok
test tests::names::empty_gives_old ... ok
test tests::names::empty_gives_new ... ok
test benchmark::tests::benchmark::reparse ... ok
test tests::names::non_path_gives_originals ... ok
test tests::names::same_path_gives_originals ... ok
test tests::names::difference_preserving ... ok
test tests::names::shortest_difference ... ok
test tests::names::gives_suffixes ... ok
test tests::names::symmetric_operation ... ok
test tests::split_benchmarks::dropped_non_prefix ... ok
test tests::split_benchmarks::non_overlapping ... ok
test tests::split_benchmarks::from_original ... ok

test result: ok. 17 passed; 0 failed; 0 ignored; 0 measured


running 9 tests
test different_input_selections ... FAILED
test different_input_colored ... FAILED
test different_input ... FAILED
test empty_results ... FAILED
test invalid_arguments ... FAILED
test non_overlapping_input ... FAILED
test same_input ... FAILED
test stdin ... FAILED
test version ... FAILED

failures:

---- different_input_selections stdout ----
	current_directory_resolved: 
thread 'different_input_selections' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- different_input_colored stdout ----
	current_directory_resolved: 
thread 'different_input_colored' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- different_input stdout ----
	current_directory_resolved: 
thread 'different_input' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868
note: Run with `RUST_BACKTRACE=1` for a backtrace.

---- empty_results stdout ----
	current_directory_resolved: 
thread 'empty_results' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- invalid_arguments stdout ----
	current_directory_resolved: 
thread 'invalid_arguments' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- non_overlapping_input stdout ----
	current_directory_resolved: 
thread 'non_overlapping_input' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- same_input stdout ----
	current_directory_resolved: 
thread 'same_input' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- stdin stdout ----
	current_directory_resolved: 
thread 'stdin' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868

---- version stdout ----
	current_directory_resolved: 
thread 'version' panicked at 'asked to canonicalize "cargo-benchcmp" but failed: Error { repr: Os { code: 2, message: "No such file or directory" } }', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/result.rs:868


failures:
    different_input
    different_input_colored
    different_input_selections
    empty_results
    invalid_arguments
    non_overlapping_input
    same_input
    stdin
    version

test result: FAILED. 0 passed; 9 failed; 0 ignored; 0 measured

Could you tell me how I can run the tests locally?
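For context on the panic message above: a minimal sketch, using only the standard library's std::fs::canonicalize, of how canonicalizing a relative path fails with a "not found" error (OS error code 2 on Linux) when no such file exists relative to the current directory. The path name below is made up for illustration, and this does not reproduce whatever logic the test framework actually uses to locate the binary.

```rust
use std::fs;
use std::io;

/// Returns the kind of error produced by canonicalizing `path`, if any.
fn canonicalize_error_kind(path: &str) -> Option<io::ErrorKind> {
    fs::canonicalize(path).err().map(|e| e.kind())
}

fn main() {
    // A relative path is resolved against the current working directory,
    // so a bare binary name only resolves if such a file exists there.
    // "no-such-binary-for-illustration" is a hypothetical name.
    let kind = canonicalize_error_kind("no-such-binary-for-illustration");
    println!("{:?}", kind);

    // The current directory itself exists, so this yields no error.
    println!("{:?}", canonicalize_error_kind("."));
}
```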

sunjay · 2017-04-19T16:49:23Z

Looks like Travis CI is failing because of the same issues.

BurntSushi · 2017-04-19T16:54:55Z

Ug. Drats. This looks like a bug in second_law. This is why I hate using frameworks for tests.

Either second_law needs to get patched or it needs to be ripped out (strong preference towards the latter). Neither thing is something you should have to worry about.

When I get a chance, my plan will be to rip out second_law and just use the homegrown testing infrastructure that I use in xsv/ripgrep.

sunjay · 2017-04-19T16:57:45Z

@BurntSushi Sounds good!

It seems to work fine from my preliminary tests. Do you have any thoughts about the command line flag or the warning stuff that I brought up in my original post? I wasn't sure about whether I should leave the warning in or not and whether this should be enabled by default.

If that's all good, I think you're good to merge and release a new version on crates.io. 😄

BurntSushi · 2017-04-19T17:08:35Z

@sunjay I think the behavior you implemented sounds reasonable. It's possible a flag isn't even necessary and that it should just do this by default, but I don't have any strong opinions.

I would like to get some tests written for this and CI green before merging.

sunjay · 2017-04-19T17:18:02Z

Would you like me to write a test? I'm not sure how to fix the issues in the current tests. If you can sort those out I'll happily add some for this feature. :)

BurntSushi · 2017-04-19T17:29:40Z

@sunjay Right. This PR is blocked on me fixing things. I'm not sure when I'll get to it. Hopefully soon.

sunjay · 2017-04-19T18:29:25Z

@BurntSushi I investigated this a bit to see if it was a quick fix. Unfortunately, it looks like second_law was depending on some implementation details of cargo that changed. I opened an issue in second_law to deal with this.

sunjay · 2017-04-19T18:42:31Z

I did a bit of binary search and found that the breakage happened somewhere between nightly-2016-11-29 and nightly-2016-12-16. There were no nightlies published between those dates. The tests work on 2016-11-29 and begin to fail on 2016-12-16.

I pinned the Travis nightly version to that specific date so it should all pass now. Once second_law fixes the bug or once you migrate away from it, you can easily switch it back.
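The bisection described above can be sketched roughly as follows, assuming the toolchains are ordered oldest to newest and that once the tests start failing they keep failing. The `first_failing` function is a hypothetical helper, the `passes` closure stands in for actually running the test suite on a given toolchain, and the first and last nightly names in the list are invented for illustration; only the two middle dates come from this thread.

```rust
/// Given toolchains ordered oldest to newest, return the index of the first
/// one for which `passes` returns false, or `None` if every toolchain passes.
/// Assumes that once a toolchain fails, all later ones fail too.
fn first_failing<F>(toolchains: &[&str], mut passes: F) -> Option<usize>
where
    F: FnMut(&str) -> bool,
{
    let (mut lo, mut hi) = (0, toolchains.len());
    // Invariant: everything before `lo` passes, everything from `hi` on fails.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if passes(toolchains[mid]) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if lo < toolchains.len() { Some(lo) } else { None }
}

fn main() {
    // The first and last entries are hypothetical; the middle two are the
    // dates reported in this thread.
    let nightlies = [
        "nightly-2016-11-01",
        "nightly-2016-11-29",
        "nightly-2016-12-16",
        "nightly-2017-01-01",
    ];
    // Stand-in for "run the test suite on this toolchain": here a toolchain
    // passes if its name sorts at or before the last known-good nightly.
    let passes = |t: &str| t <= "nightly-2016-11-29";

    match first_failing(&nightlies, passes) {
        Some(i) => println!("first failing toolchain: {}", nightlies[i]),
        None => println!("no failing toolchain found"),
    }
}
```

Each step halves the remaining range, so even a long list of nightlies needs only a handful of test runs.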

sunjay · 2017-04-19T18:46:16Z

@BurntSushi Looks like there was an SSL issue. The build just needs to be re-run.

BurntSushi · 2017-05-17T12:41:03Z

@sunjay It looks like #30 fixed the testing issue with a workaround. Could you remove the nightly pin and rebase? It should work then. Thanks!

sunjay · 2017-05-18T20:08:37Z

Done 😄

BurntSushi · 2017-05-18T20:27:34Z

yay thanks!

sunjay · 2017-05-18T20:38:51Z

Could you release a new version so I can start to use this right away? 😄

BurntSushi · 2017-05-18T23:07:48Z

All set! 0.2.0 should be on crates.io. :-)

sunjay mentioned this pull request Apr 19, 2017

Include results that are not in both result sets #27

Closed

BurntSushi reviewed Apr 19, 2017

Added an option to list missing benchmarks as part of the results

ae05a5b

BurntSushi merged commit df427b1 into BurntSushi:master May 18, 2017
