Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AllRowsReader wants too many threads #411

Closed
ash2k opened this issue Sep 27, 2013 · 9 comments
Closed

AllRowsReader wants too many threads #411

ash2k opened this issue Sep 27, 2013 · 9 comments
Assignees

Comments

@ash2k
Copy link
Contributor

ash2k commented Sep 27, 2013

AllRowsReader submits tasks to read token ranges. Each task waits for ALL such tasks to start. So the number of threads to do that will be equals to the number of token ranges.

  1. If I understand it correctly, this will be A LOT for clusters with vnodes;
  2. Why do you want to do that at all?

If a bounded executor is provided then it would deadlock if it cannot create enough threads (as it happened in my case). This behaviour should be at least configurable.

@ghost ghost assigned opuneet Oct 9, 2013
@opuneet
Copy link
Contributor

opuneet commented Oct 9, 2013

@ash2k

That is a great catch, I didn't notice that. I'm not sure why all tasks are waiting on the cyclic barrier. They should just run as and when they are submitted to the threadpool and if it is a bounded threadpool, then they will run as threads / workers become available.

IMHO that we should just remove the CyclicBarrier altogether, but it's better to check with author just to make sure.
@elandau was there a specific reason for ensuring that all the tasks start at the same time?

@ash2k
Copy link
Contributor Author

ash2k commented Oct 22, 2013

Any updates on this issue? Now we are seeing one more problem, caused by this issue:

com.netflix.astyanax.recipes.reader.AllRowsReader :525 - Error process token/key range
com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=10.0.1.148(10.0.1.148):9160, latency=5156(5156), attempts=3]Timed out waiting for connection
        at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
        at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
        at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
        at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
        at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$2.execute(ThriftColumnFamilyQueryImpl.java:386)
        at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:448)
        at com.netflix.astyanax.recipes.reader.AllRowsReader$1.call(AllRowsReader.java:420)
        at com.netflix.astyanax.util.BarrierCallableDecorator.call(BarrierCallableDecorator.java:34)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Looks like all those threads exhausted the connection pool.

@opuneet
Copy link
Contributor

opuneet commented Oct 22, 2013

Hi @ash2k
Have you tried setting the concurrency level? I looked at the code again, and if that is set, then it will divide your entire range by the number of threads / concurrency level.

Hence if you have 256 token ranges, you should still be able to use this with 10 threads.
Can you try this out?

@ash2k
Copy link
Contributor Author

ash2k commented Oct 30, 2013

Hi @opuneet.

I haven't tried setting concurrency level but I will.
There could be situation when the code that uses AllRowsReader do not know how many threads are available in the provided executor. In such case setting concurrency level will not prevent deadlocks - other instances of AllRowsReader may have already occupied some threads and hence no more threads are available and no progress happens! What is needed is an option to switch off waiting on barrier. Or better remove that waiting completely.

@opuneet
Copy link
Contributor

opuneet commented Oct 31, 2013

I'm for removing the barrier completely. It makes no sense to me. I can remove the check quickly, but I'd feel more comfortable checking it in if there was a good unit test testing it which is where the time consuming effort is.

I'll try to get to this soon. In any case, you are more than welcome to submit a pull request :)

@opuneet
Copy link
Contributor

opuneet commented Nov 7, 2013

I've made the commit. Also did the release, but the artifact may take a while to show up in Maven.
In any case you could just build Astyanax directly for yourself (in the meantime)

@opuneet opuneet closed this as completed Nov 7, 2013
@ash2k
Copy link
Contributor Author

ash2k commented Nov 7, 2013

@opuneet Thank you very much. Sorry, I'm too busy right now and was unable to make a contribution.

@opuneet
Copy link
Contributor

opuneet commented Nov 7, 2013

No worries.

@tunkl
Copy link

tunkl commented Jan 27, 2014

Hi, I am using version 1.56.44 from maven, which should include your fix, but I still get error with TimeoutException. But If I set concurrency level it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants