New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Background Coverage Data Reporting #124

Closed
Kallin opened this Issue Sep 28, 2018 · 31 comments

Comments

Projects
None yet
3 participants
@Kallin
Contributor

Kallin commented Sep 28, 2018

We're trying out coverband on production, and finding that the redis calls are taking a majority of the time of some of our web requests when looking at new relic.

Digging into the redis_store, it seems that, for every file, we first get all of the existing values using existing = redis.hgetall(key), then after incrementing the line counts, we use redis.mapped_hmset(key, values) to update redis.

In our application this results in hundreds of gets, and hundreds of sets to redis in the middleware. I have a couple ideas and I'm curious what the devs here think:

  1. Fork this call, https://github.com/danmayer/coverband/blob/master/lib/coverband/middleware.rb#L14 . The web request doesn't need to wait for anything from this report generation, so why not let it run in another process?

  2. Pipeline the calls instead of doing things one file at a time. In our case, that would mean 2 redis commands instead of ~300 on some requests.

ie, here: https://github.com/danmayer/coverband/blob/master/lib/coverband/adapters/redis_store.rb#L26 , we iterate through every file in the report, and for each we get the existing values then increment and update here: https://github.com/danmayer/coverband/blob/master/lib/coverband/adapters/redis_store.rb#L53 . It looks like redis supports pipelining, https://redis.io/topics/pipelining, which would allow all these requests to be grouped together. I've tried this and it seems to work:

def save_report(report)
  store_array(base_key, report.keys)

  keys = report.keys.map {|file| "#{base_key}.#{file}"}

  existing_records = redis.pipelined do
    keys.each do |key|
      redis.hgetall(key)
    end
  end

  combined_report = {}
  keys.each_with_index do | key, i|
    combined_report[key] = {
      new: report.values[i],
      existing: existing_records[i]
    }
  end

  redis.pipelined do
    combined_report.each do |file, values|
      existing = values[:existing]
      new = values[:new]
      unless values.empty?
        # in redis all keys are strings
        new_string_values = Hash[new.map { |k, val| [k.to_s, val] }]
        new_string_values.merge!(existing) { |_k, old_v, new_v| old_v.to_i + new_v.to_i }
        redis.mapped_hmset(file, new_string_values)
        redis.expire(file, @ttl) if @ttl
      end
    end
  end

  # report.each do |file, lines|
  #   store_map("#{base_key}.#{file}", lines)
  # end
end

(extra credit); not sure if redis support this, but perhaps the update could be done with one command? Could you tell redis 'here are all the keys (files) and their values (arrays of line# counts), and if you find existing values for those keys, increment by the count given.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Sep 28, 2018

I have run into the same problem. For now I am running in production using a branch that supports caching using the MemoryCacheStore.

#119

Note that you need to enable the config.memory_caching attribute.

I also want to note that as of yesterday, we started writing to redis in a background thread that sits completely outside of the web request. After calling configure on coverband, we kick off the thread:

Thread.new do
  loop do
    ::Coverband::Collectors::Base.instance.save
    sleep 300
  end
end

Note this works only with the default configuration of the ruby coverage collector and not trace. With this enabled ruby is actually collecting coverage information all of the time and the reporting to redis can be done in a completely separate thread periodically. If you go with this approach, make sure to either set the config.percentage to 0 or remove the Coverband::Middleware.

Hope that makes sense.

@Kallin

This comment has been minimized.

Contributor

Kallin commented Oct 1, 2018

thx @kbaum .
Opened this PR to support forking reporting: #125

@Kallin

This comment has been minimized.

Contributor

Kallin commented Oct 1, 2018

also added this PR for batching redis operations: #126

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 6, 2018

What percentage are you reporting @Kallin I would check that you have a low report rate as one of the first things...

After that I think moving away from a get per file and post per file to a single get and post the full hash might be a better way to make this change, we can avoid a good amount of complexity. I believe this issue popped up because coverage lists all files in every report and previously the tracepoint collector only listed any files with changes which is what @kbaum memory cache adds back.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 7, 2018

See the comments here @Kallin as I think just enabling the memory cache might be good enough to significantly improve your performance.

#102 (comment)

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Oct 7, 2018

Memory cache would definitely improve performance. One problem is it takes away the ability to count how many times a line was hit. Another problem is that it assumed that full coverage was being sent as opposed to doubles. The current implementation filters out file coverage that matches the previous result for that file. This mostly works (i think) but takes away line count reliability and does not optimally cache. Once i realized the faults, i created this PR:

#119

This PR knowingly removes the line counting ability but optimally caches and is the one my company is currently using. As @danmayer points out though, some folks are using the line counting so this is probably not good for a merge.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 9, 2018

yeah I guess for that kind of change would either need to make optional or do it as part of a major version release... I think I will just keep pushing on the 3.0 branch opposed to trying to figure out a good caching or improved redis structure.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 10, 2018

benchmarks now allow for local or remote redis...

local:

Warming up --------------------------------------
 store_redis_reports     1.000  i/100ms
 store_cache_reports     1.000  i/100ms
Calculating -------------------------------------
 store_redis_reports      0.676  (± 0.0%) i/s -     11.000  in  16.447342s
 store_cache_reports      5.519  (±18.1%) i/s -     77.000  in  15.045604s

Comparison:
 store_cache_reports:        5.5 i/s
 store_redis_reports:        0.7 i/s - 8.16x  slower

remote (against heroku redis)

REDIS_TEST_URL=redis://xyz.com:35439 rake benchmarks:cache_reporting
runs benchmarks on reporting large sets of files to redis using memory cache
Warming up --------------------------------------
 store_redis_reports     1.000  i/100ms
 store_cache_reports     1.000  i/100ms
Calculating -------------------------------------
 store_redis_reports      0.348  (± 0.0%) i/s -      6.000  in  17.228485s
 store_cache_reports      1.908  (± 0.0%) i/s -     27.000  in  15.155553s

Comparison:
 store_cache_reports:        1.9 i/s
 store_redis_reports:        0.3 i/s - 5.48x  slower

This shows the redis trip as a significantly larger factor if you aren't on local redis... Now heroku app to heroku redis should be much faster than my local machine to heroku redis... but I think it still can help give a better idea of the perf wins from various redis / memory cache changes.

Note without the cache local redis is 0.676 iterations per second and with remote it slows to 0.348

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 10, 2018

OK, I released Coverband version 2.0.3.alpha... I added an AB_test benchmark against coverband demo, and ran that benchmark against it deployed on heroku... This shows significant improvement on perf. @Kallin / @kbaum feel free to give the alpha a shot. I am going to push it to our staging and then production servers and long as everything looks good I will make the non alpha release tomorrow.

Against https://coverband-demo.herokuapp.com/

2.0.2

Time per request:       109.566 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%    504
  66%    550
  75%    613
  80%    629
  90%    716
  95%    856
  98%   1004
  99%   1047
 100%   1141 (longest request)

2.0.3

Time per request:       63.451 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%    306
  66%    321
  75%    338
  80%    344
  90%    368
  95%    388
  98%    426
  99%    448
 100%    473 (longest request)

2.0.3 with memory cache on, note I think my benchmark doesn't really exercise this feature

Time per request:       71.091 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%    328
  66%    349
  75%    373
  80%    393
  90%    460
  95%    515
  98%    710
  99%    725
 100%    825 (longest request)
@kbaum

This comment has been minimized.

Collaborator

kbaum commented Oct 10, 2018

Interesting benchmarks. I think we would see a much bigger delta for large rails apps with thousands of files. One thought is that we should probably just remove memory cache since it might cause inaccurate data and at this point gives limited to no performance win since we have reduced the number of calls to redis.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 10, 2018

OK I will remove memory cache in my 3.0 branch @kbaum. Yeah this Rails app is very small. I will update the benchmark script and move it from the demo repo to the coverband repo, and try to make it easy for anyone to run the benchmark against any coverband app. Trying to think if I could simplify the collection of data, like grabbing coverband version, the number of files in Coverage.peek_results, number in Redis... or something like that... At the very least, I can get the basics down and it would require some manual work by an operator to note specifics related to their configuration.

@Kallin

This comment has been minimized.

Contributor

Kallin commented Oct 10, 2018

In my local benchmarking, using a fairly large legacy rails app, I found that the memory cache was still a big improvement, even with the redis pipelining. Not sure that we would be able to run comfortably in production without it, unless one of the background threading/forking solutions was in as a replacement.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Oct 10, 2018

@Kallin - try master without memory caching as i think this change will be a big improvement:

#127

@Kallin

This comment has been minimized.

Contributor

Kallin commented Oct 10, 2018

@kbaum , thanks I'll give it a test

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 10, 2018

Yeah I think while memory caching breaks the exact line counts (which perhaps should be optional) on the 2.x branch it is a significant improvement for larger apps. That is showed by microbenchmarks, I just don't have a full benchmark against a large app that shows it. I think along with other changes in 3.0 it will not be needed at all, but I will ensure we have some solid benchmarks ensuring that 3.0 significantly reduces the overhead for large file sets.

@Kallin / @kbaum the change in #127 is on the 2.0.3.alpha release, so no need for master if you want to test that pre-release gem. I think that memory cache, in current code case, could still help apps with large file sets

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 11, 2018

OK the 2.0.3 release is live so I am going to focus back in on 3.0 and see where we end up on thread/forking reporting

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 11, 2018

OK wow @Kallin @kbaum

I just got the very first version of 3.0 to the point I could run it in coverband_demo and get some of the test / benchmarks working... I guess we will see how much background thread or forked process matters, just by redoing the storage mechanism for Redis to avoid the n+1 issue, the store large Redis benchmarks appear to be nearly 60X faster, that is without some of the other improvements that should be able to be added. Current estimate means it would add less than 25ms per reporting request (still controlled with a reporting frequency) even on the old middleware style reporting.

2.0.3

rake benchmarks:redis_reporting
runs benchmarks on reporting large sets of files to redis
Warming up --------------------------------------
       store_reports     1.000  i/100ms
Calculating -------------------------------------
       store_reports      0.814  (± 0.0%) i/s -     13.000  in  15.982911s

3.0.0 branch

rake benchmarks:redis_reporting
runs benchmarks on reporting large sets of files to redis
Warming up --------------------------------------
       store_reports     4.000  i/100ms
Calculating -------------------------------------
       store_reports     47.111  (± 4.2%) i/s -    708.000  in  15.066259s

Lots of cleanup and various things are still broken, but I believe the change will be very significant.

@Kallin

This comment has been minimized.

Contributor

Kallin commented Oct 11, 2018

That's amazing and looks really promising! Could you link to the change to the redis storage that you think is responsible for such a boost? I'm curious to see it.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 12, 2018

sure @Kallin the branch is here, it is still very messy and a work in progress, I wanted to prove some things out so this isn't what I would call standard or clean refactoring ;)

https://github.com/danmayer/coverband/tree/feature/coverband_3

The specific Redis change is here, but it doesn't or wouldn't work without some of the related changes to the collector and the reporting code. The big change is that there is no longer good reason to work file by file which made more sense with tracepoint "sparse array" format and having a few files. Now with the way Coverage is formatted never actually being a sparse array but having a hash that represents every line... and Coverage having all Ruby file coverage including Rails / Gems, and such...It has much more overhead. One option would be to do a lot more filtering, but I figured it is only 200-400k, let's just do this as 3 steps

  • get stored coverage
  • merge current and stored coverage
  • store the merged coverage

new Redis adapter: https://github.com/danmayer/coverband/blob/feature/coverband_3/lib/coverband/adapters/redis_store.rb

This results in a much simpler design and has allowed me to remove a bunch of conversions back in forth from line hit arrays [0,nil,1,2] and hashes {'0' => 0, '2' => 1, '3' => 2} which was the format things came from tracepoint due to it being a "sparse array"...

Let me know how it is looking @Kallin / @kbaum I will need some more time to sort out some known issues like full path vs relative path, etc.

@danmayer

This comment has been minimized.

Owner

danmayer commented Oct 15, 2018

OK, all tests passing and full functional support on the 3.0 branch @Kallin @kbaum

#129

A bit more cleanup and work to do, but could be getting close enough for an alpha release soon.

@danmayer danmayer changed the title from redis performance to Background Coverage Data Reporting Nov 3, 2018

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 3, 2018

changed the title from "redis performance"

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 3, 2018

As we are about to get Coverband 3.0.0 out, we think we can possibly add this as an option in the 3.0.x line. I will look at implementing the threaded option sometime soon and review the related suggested PRs.

@danmayer danmayer added the enhancement label Nov 9, 2018

@danmayer danmayer assigned danmayer and kbaum and unassigned danmayer Nov 9, 2018

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 9, 2018

cool assigned over to you @kbaum thanks for taking a look on this...

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 11, 2018

Realizing this can be pretty tricky to automatically do within Puma or Unicorn as the thread has to run within the forked workers.

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 11, 2018

hmmm good point... Especially breaks the idea of just configuring and starting a different integration...

What if the integration always required the middleware, but based on config settings all the middleware does is check if the currently accessable thread has a coverband background thread started... if it doesn't it starts it... after the the middleware basically no-ops.

That or we would need to figure out a fairly generic way to hook into all the various servers after_fork methods.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 11, 2018

Figured out how to hook into puma:
470b643#diff-a637af9972be5b34783d41387d25e0bfL45
Unicorn seems more difficult.

I like your idea about the middleware starting the thread but I'm blanking on how I would check to see if there is a coverband background thread started.

I also started looking into the newrelic/rpm library to see how they start their agent in the worker. There's some comments and logic referencing delaying the start of the agent until the workers fork, but ultimately couldn't figure out how they do it.

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 11, 2018

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 11, 2018

I really like that idea. I think we would use some kind of global process state as opposed to a thread local as there should only be one background thread per process.

@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 11, 2018

Looks much better and should work with unicorn and puma. So far tested with puma.

https://github.com/danmayer/coverband/compare/background_thread_reporting

I think we can follow this approach if we detect puma, unicorn, or any other forking framework within the current stacktrace. Otherwise we can just start the background thread within Coverband.start. Newrelic and sqreen seemed to be following this approach. I like it because we would still work out of box with other libraries like sidekiq, delayed_job, thin, etc. Basically, we either start the thread immediately or wait until the first request.

@danmayer

This comment has been minimized.

Owner

danmayer commented Nov 11, 2018

Nice looks good so far @kbaum thanks

Some thoughts:

  • really like the first thread running kicks off at the process level
  • good call on process level var vs thread local
  • especially while first rolling out before we have railties integration I would make it a configurable option
    * I do think with railties we can detect forking servers, but for now this would let folks configure and try it out
  • would still like to see it in /integrations/ vs root Coveband file directly
  • we should figure out a way to test, but it does seem a bit tricky to test, any ideas?
@kbaum

This comment has been minimized.

Collaborator

kbaum commented Nov 11, 2018

Agreed on all points. Will move to separate class under integrations module. RE: testing, was going to give that a shot now first starting with unit testing. Might be tough to do an integration test with threads and infinite loops involved, but we might be able to come up with something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment