ElasticSearch overload when handle many connections #10291
After the ES service starts, I run: curl 'localhost:9200/_cluster/stats?network=true&transport=true&http=true&thread_pool=true&indices=false&pretty=true' and get the result. Why is open_file_descriptors so large? :(
When I switch back to version 0.9.x, here is the output of "mpstat -P ALL 1": 09:27:39 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle. Does anyone else think the problem is that version 1.5 takes a lot of CPU?
Hiya

That network error occurs when the client network connection disconnects, perhaps because you set a request timeout?

It looks like your Elasticsearch is under severe memory pressure. You've given it less than one GB of heap, and it needs more to cope with your load.

Also, you say you had to upgrade ES because of security, but you have then re-enabled dynamic scripting, so you are allowing anybody with access to your box to run whatever scripts they want, defeating the point of the upgrade.

The number of open file descriptors you have is not large at all. Elasticsearch can easily use many more file handles than you are currently using.

Also check (look at GET /_nodes) to ensure that mlockall is being applied correctly.
You can also check what is using all the CPU with the hot threads API, but I think you'll find that it is mostly garbage collection.
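The two checks suggested above can be sketched as shell one-liners. The host and port are assumptions (the default HTTP endpoint, localhost:9200); adjust ES for your cluster:

```shell
#!/bin/sh
# Sketch of the diagnostics suggested above. localhost:9200 is an
# assumption (the default HTTP port), not something confirmed in the thread.
ES="${ES:-localhost:9200}"

# Hot threads API: shows which threads are burning CPU.
# Under memory pressure this is usually garbage collection.
echo "GET http://$ES/_nodes/hot_threads"
curl -s --max-time 5 "http://$ES/_nodes/hot_threads" || echo "(cluster not reachable)"

# Nodes info API: confirm that mlockall actually took effect.
echo "GET http://$ES/_nodes?pretty"
curl -s --max-time 5 "http://$ES/_nodes?pretty" | grep '"mlockall"' || echo "(mlockall flag not found)"
```

If mlockall shows false despite bootstrap.mlockall: true, the heap is not locked in RAM and can be swapped out, which makes GC pauses far worse.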
Hi @clintongormley, what is the best heap size to assign? My server has 4 GB of RAM (maybe 8 GB). Currently the mlockall value is true; is that OK? And what is the hot threads API? Thank you very much.
Hi guys, here is one of the variables in the init script: es_heap_size: 2G. I tried setting ES_HEAP_SIZE to 2G, and the Elasticsearch server ran normally for 3 hours; after that, ES stopped responding to requests (like the issue I created). Here is the latest log: [root@ip-10-0-0-230 quydo]# tailf /var/log/elasticsearch/elasticsearch.log
What should I do to solve this problem?
It looks like you need more memory. Your heap is full. |
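A common sizing guideline (my assumption; the thread itself does not prescribe a number) is to give Elasticsearch roughly half of physical RAM via ES_HEAP_SIZE and leave the rest for the OS filesystem cache. A sketch for the 7.5 GB instance described in this report:

```shell
#!/bin/sh
# Heap-sizing sketch. The "half of RAM" rule is a common guideline,
# not something prescribed in this thread.
ram_mb=7680                 # the 7.5 GB EC2 instance from the report
heap_mb=$((ram_mb / 2))     # half of RAM for the JVM heap
echo "ES_HEAP_SIZE=${heap_mb}m"   # set this in the init script / environment
```

This prints ES_HEAP_SIZE=3840m; export that (or set it in the service's environment file) before starting Elasticsearch.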
@clintongormley if the heap is full, shouldn't it use disk for storage? I thought it would, as long as index.store.type is not set to memory.
@splashx that would simply kill performance |
Hi all
I have installed Elasticsearch (ES) version 1.5 on our server (only one server).
I set ulimit to ~64K (max open files).
Our server: 4 virtual CPUs, 7.5 GB RAM (Amazon EC2).
After I start ES, many users can search and get responses normally. But after 30 seconds to 1 minute, our ES no longer responds to connections from clients (a PHP library that just wraps the curl command).
I got the PID of ES and ran: ls /proc/ES_PID/fd | wc -l
The result increases every second (~2 new connections), the ES process takes ~100% CPU, and about 60% of RAM is free.
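The fd-counting step above can be sketched as a small script. Here the current shell's PID ($$) stands in for the Elasticsearch PID; /proc is Linux-specific:

```shell
#!/bin/sh
# Count open file descriptors for a process via /proc (Linux only).
# $$ (this shell) is a stand-in; substitute the Elasticsearch PID in practice.
pid=$$
count=$(ls "/proc/$pid/fd" | wc -l)
echo "open fds for pid $pid: $count"
```

To watch the count grow over time: watch -n1 'ls /proc/<ES_PID>/fd | wc -l'. A steadily climbing count usually means clients are opening new connections faster than stalled requests complete.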
I checked the logs; here are some entries:
[2015-03-27 07:57:33,512][DEBUG][http.netty ] [Impossible Man] Caught exception while handling client http traffic, closing connection [id: 0xbd4fa29e, /10.0.0.166:43963 :> /10.0.0.230:9200]
java.nio.channels.ClosedChannelException
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
at org.elasticsearch.http.netty.pipelining.HttpPipeliningHandler.handleDownstream(HttpPipeliningHandler.java:87)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:199)
at org.elasticsearch.rest.action.support.RestResponseListener.processResponse(RestResponseListener.java:43)
at org.elasticsearch.rest.action.support.RestActionListener.onResponse(RestActionListener.java:49)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2.doRun(TransportSearchQueryThenFetchAction.java:149)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-03-27 07:57:38,286][DEBUG][monitor.jvm ] [Impossible Man] [gc][old][207][165] duration [2.7s], collections [1]/[2.8s], total [2.7s]/[5.1m], memory [811.9mb]->[836mb]/[990.7mb], all_pools {[young] [120.7mb]->[144.9mb]/[266.2mb]}{[survivor] [0b]->[0b]/[33.2mb]}{[old] [691.2mb]->[691.2mb]/[691.2mb]}
Here is some configurations I added:
script.groovy.sandbox.enabled: true
bootstrap.mlockall: true
threadpool.index.type: fixed
threadpool.index.size: 4
threadpool.index.queue_size: 400
threadpool.search.queue_size: 1000
threadpool.search.type: cached
threadpool.bulk.type: fixed
threadpool.bulk.size: 4 # availableProcessors
threadpool.bulk.queue_size: 1000
And the remaining configurations are default.
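Once the node is up, the effective thread pool settings above can be checked through the nodes info API. A sketch, again assuming the default localhost:9200 endpoint:

```shell
#!/bin/sh
# Verify that the threadpool.* settings from elasticsearch.yml were picked
# up. The host/port is an assumption (default HTTP endpoint).
ES="${ES:-localhost:9200}"
echo "GET http://$ES/_nodes/thread_pool?pretty"
curl -s --max-time 5 "http://$ES/_nodes/thread_pool?pretty" || echo "(cluster not reachable)"
```

Queued-up search requests (queue_size: 1000) also hold their connections open, which contributes to the growing fd count observed above.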
With this many connections, version 0.9.x still worked fine under the same load. I had to upgrade because of a critical security issue in ES :(
Could you help me solve this problem? :(