-
Notifications
You must be signed in to change notification settings - Fork 44
Detecting benchmark regressions #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e8fbccf
to
415236b
Compare
Hmm have you seen my post in http://community.rubybench.org/t/gsoc-project-improving-rubybench/99/53?u=tgxworld? 😄 |
commit_objects.each do |object| | ||
commit_results << object.result[category] | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this should be done in a background job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The background job can be called after the current benchmark_run
is saved to the database successfully.
415236b
to
fc4f350
Compare
app/jobs/benchmark_regression_job.rb
Outdated
|
||
def perform(initiator, benchmark_type, benchmark_result_type, category) | ||
commit_objects = BenchmarkRun.where( initiator: initiator, benchmark_type: benchmark_type, | ||
benchmark_result_type: benchmark_result_type).limit(10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be the best value here instead of 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What values have you considered and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can hardcode it to a specific number but this largely depends on what baseline we are considering to detect performance regression in this new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What method have you decided to implement to detect regressions? Based on the method, you can them derive a reasonable value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a post in http://community.rubybench.org/t/gsoc-project-improving-rubybench/99/55 to discuss the method.
fc4f350
to
ae15b3c
Compare
Have a look at this. |
app/jobs/benchmark_regression_job.rb
Outdated
results_standard_deviation = Math.sqrt(commit_results.inject(0){|accum, i| accum +(i-results_avg)**2 }/(commit_results.length - 1).to_f) | ||
results_param = results_avg + 2*results_standard_deviation | ||
|
||
if commit_objects[0].result[category] > results_param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the new result should be included in the calculation of the standard deviation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not including the new result in the calculation of standard deviation by dropping the first element in the loop (see commit_objects.drop(1)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh ic. Anyway, the default scope is to order by created_at
, what we want here is to order by Commit#created_at
instead.
4e551f3
to
13dd3cd
Compare
app/jobs/benchmark_regression_job.rb
Outdated
|
||
def perform(initiator, benchmark_type, benchmark_result_type, category) | ||
commit_objects = BenchmarkRun.where(benchmark_type: benchmark_type, | ||
benchmark_result_type: benchmark_result_type).offset(10).limit(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we are fetching 1000 results it doesn't make much of a difference if we leave a first few results. So we can set a safety offset(here 10) so that we can be sure that we are fetching the correct results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we are fetching 1000 results it doesn't make much of a difference if we leave a first few results.
It does make a difference if the last 10 results are way off. What you want here is the benchmark results of the last 1000 commits which you can easily fetch.
13dd3cd
to
1f6ad98
Compare
app/jobs/benchmark_regression_job.rb
Outdated
benchmark_result_type: benchmark_result_type).order(:created_at).first.initiator_id | ||
|
||
benchmark_runs = BenchmarkRun.where(initiator_type: Commit, benchmark_type: benchmark_type, | ||
benchmark_result_type: benchmark_result_type).offset(last_initiator - current_initiator).limit(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets the offset so that the benchmarks which came after our benchmark are not considered.
1f6ad98
to
68817ff
Compare
Done 👍 |
Is this PR done? |
You can have a look at it once. According to me, it's done 😄 |
hmm I see WIP in the title and we don't seem to be creating an issue in Github yet when a regression is detected? You need to add tests as well otherwise, we won't know that your code is working as intended. |
68817ff
to
2394fb3
Compare
Done 👍 |
category = "sometime" | ||
BenchmarkRegressionJob.new.perform(benchmark_run, initiator, benchmark_type, benchmark_result_type, category) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
erm.... what exactly is this test testing for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing for the new benchmark regression job which I added!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please read through http://guides.rubyonrails.org/testing.html.
I am not following how your test is actually testing that the job you added works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I add tests for this job differently? Currently this test just runs the perform method with appropriate parameters to test the method which is similar to what implemented in remote_server_job tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't similar to remote_server_job
tests. Note how those tests sets expectations about certain method being called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're doing here is just calling the method.
- How do you know if it is correctly detecting a regression?
- When it correctly detects a regression, what is the code expected to do?
These have to be included in the tests
2394fb3
to
2097328
Compare
b2e862c
to
f9095dd
Compare
Done 👍 |
What is done ❓ |
I have made the changes. You can have a look. |
app/jobs/benchmark_regression_job.rb
Outdated
uri = URI.parse(Rails.application.secrets.github_api+"?state=open&since=#{get_time}") | ||
response = JSON.parse(Net::HTTP.get(uri)) | ||
response.each do |response| | ||
puts response["body"].nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shahsaurabh0605 Please check your PR... I've mentioned this before.
f9095dd
to
2d8689d
Compare
Made the changes. |
2d8689d
to
a404814
Compare
test "check for similar issues" do | ||
stub_request(:get, Rails.application.secrets.github_api+"?state=open&since=#{BenchmarkRegressionJob.new.get_time}"). | ||
with(:headers => {'Accept'=>'*/*', 'Accept-Encoding'=>'gzip;q=1.0,deflate;q=0.6,identity;q=0.3', 'User-Agent'=>'Ruby'}). | ||
to_return(:status => 200, :body => %Q[["body"]], :headers => {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we are not using a VCR cassette here ❓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are using stub_request then we are not using an actual api request. So do we need vcr cassettes over here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using VCR for the other request to GitHub so I'm curious why we end up stubbing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are using current time in the api request which continuously changes. So i think stubbing the request must be the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is https://github.com/travisjeffery/timecop for that. I just realized you're only unit testing each method individually. How are you sure that when we glue everything together, it'll work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure all the pieces glue together, i think i need to add test for the perform method which makes use of all the other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do :) I actually think a single test will cover everything. We don't really need to unit test those methods individually. Be sure to cover failure cases as well. Example: When an issue is not supposed to be created
The PR is looking good. One last thing. I need you to fix up your code style because it is all over the place right now.
Please look through each and every line carefully to make sure the code style is consistent. |
2177622
to
02e0b3b
Compare
travel_to "2016-06-03 13:50:41 +0530" do | ||
VCR.use_cassette('benchmark_regression') do | ||
assert_equal ["sometime"], BenchmarkRegressionJob.new.perform(@benchmark_run.id) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit stuck here. I am able to travel to a particular time and use VCR cassettes. I need to add two tests, one for the regression detected and issue created and another for issue not created. But how can I test this as the method does not return response code now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well then you got to think about what you are testing for. In this case, you care that certain HTTP calls will be made under the right scenario. Whether the response code is returned by the method doesn't really matter because the VCR cassette will end up returning the same response code every when is it using the cassette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually just restore the previous unit tests. I realized the trouble of writing integration test isn't worth it.
02e0b3b
to
d3800b1
Compare
Any errors in indentation? |
Anything left in this? |
Looks good to me. I'll probably have to merge this over the weekend so that I can fix the tests after merging |
@shahsaurabh0605 Thanks for the work :) |
Have you seen my reply http://community.rubybench.org/t/gsoc-project-improving-rubybench/99/70 ? |
Closing as stale. |
Added a method to detect benchmark regressions. This fetches the result of last 10 commits from the database and compares with the newly created result.