rrrspec-master cpu 100% #63

Open
jrluis opened this issue Aug 22, 2016 · 4 comments

jrluis commented Aug 22, 2016

The setup:

  • rrrspec-master v0.4.3
  • 2 kinds of workers
    • 10 rspec workers on google cloud (runs 436 specs with 5958 examples)
    • 30 cucumber workers on google cloud (runs 136 features with 10815 steps)
  • Four jobs to invoke the tests
    • Each job creates a taskset
    • Each job has its own rsync folder
    • Pull request check jobs -> runs rspec and cucumber each as a taskset
    • Master to staging check jobs -> runs rspec and cucumber each as a taskset
  • The workers are booted up on demand when the test is submitted
  • rrrspec-master, redis and mysql run in the office as docker containers
  • The Docker host is an iMac running Docker for Mac 1.12 stable

The system starts out running OK, but after about 5 mixed invocations of the pull request jobs and the master to staging jobs, the rrrspec-master ruby process locks at 100% CPU, making the system unusable.

After canceling all the running tasksets and waiting about an hour, the CPU goes idle.

I suspect it's rrrspec-master saving worker data from Redis. As you can see in the attached screenshot, one worker is repeated in the UI. This is wrong: it should show one line for each worker.

[Screenshot: screen shot 2016-08-22 at 09 23 28]

@draftcode
Contributor

Since you say that you're running Cucumber workers, I suppose that you're running RRRSpec with your own patch. That makes it difficult for me to guess the internal state. I also cannot comment on what's going on in your system, as I do not know the actual setup precisely. All I can do is give you some advice.

  • It's better to separate the issues. The high CPU load and the worker duplication are probably different issues. Even if the CPU spins at 100%, it won't calculate 1+1=3. It just becomes slow.
  • You need to look into Redis directly to see the state of the workers and figure out why there are so many workers with the same name. IIRC, one worker corresponds to one Redis key, so duplicated workers are weird.
  • rrrspec-master saves the result data (not the interim worker data) to MySQL. I admit that this is one of the design faults. This skew is coming from Redis. RRRSpec should not have used Redis for its RPC+Temporary datastore. I wanted to change the datastore, but Redis is tightly coupled to RRRSpec's RPC, and it's not easy to move off.

@jrluis
Author

jrluis commented Aug 29, 2016

Problem cause:

Because the workers are on Google Cloud and the master is in the office, the master can assign tasksets much faster than the workers can consume them.

Because a worker always notifies the master before consuming a taskset, this creates a loop in which the master assigns the same taskset N times. After several tasksets have run, the system "deadlocks".

Having more workers causes the "deadlock" to appear faster. This is because they all notify the master to assign them more tasks.

This causes the cpu to rise to 100% in the master.
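
One way to picture the loop described above is a toy, single-worker model. This is only a sketch under assumptions: it is not RRRSpec code, the names (`simulate`, `pending`, `inbox`) are made up for illustration, and the real interleaving inside rrrspec-master may differ. It assumes the master answers every notification immediately with the first taskset it still considers pending, and that the remote worker only consumes after several notifications have already been answered.

```ruby
# Toy model only: none of these names come from RRRSpec.
# notifications_before_consume: how many times the worker notifies the
# master before it manages to consume its first assignment.
def simulate(notifications_before_consume:)
  pending = ["taskset-1"] # tasksets the master still considers unconsumed
  inbox = []              # assignments in flight to the remote worker
  log = []                # the interleaving, in order

  notifications_before_consume.times do |i|
    log << "worker: notify master (#{i + 1})"
    taskset = pending.first # still pending, because nothing was consumed yet
    inbox << taskset
    log << "master: assign #{taskset}"
  end

  # Only now does the slow worker consume one assignment.
  consumed = inbox.shift
  pending.delete(consumed)
  log << "worker: consume #{consumed}"

  # Whatever is left in the inbox is a duplicate of an already consumed taskset.
  { log: log, stale_assignments: inbox }
end

result = simulate(notifications_before_consume: 3)
puts result[:log]
puts "stale assignments: #{result[:stale_assignments].inspect}"
```

In this model a single notification leaves nothing stale, while three notifications leave two duplicate assignments of `taskset-1` behind. With more workers notifying, the number of duplicates the master has to deal with would grow accordingly.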

@jrluis
Author

jrluis commented Aug 29, 2016

I've made a fix at #64

@draftcode
Contributor

I still don't understand the situation. Can you write out an interleaved scenario?
