
request limit breaker does not calculate estimated_size_in_bytes correctly, which causes all the aggregations failed #26943

Closed
xzer opened this issue Oct 10, 2017 · 11 comments


xzer commented Oct 10, 2017

Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 5.4.3

Plugins installed: []

JVM version (java -version): 1.8.0_131

OS version (uname -a if on a Unix-like system): Debian 3.16.43-2

Description of the problem including expected versus actual behavior:

We have a 6-node cluster and noticed that the estimated_size_in_bytes of the request breaker keeps increasing, which causes all aggregations to fail once estimated_size_in_bytes reaches the configured limit.

We have a 31g heap with a 24g old generation configured, and the request limit set to 2g. After the breaker tripped, we increased the limit to 4g dynamically and, of course, our queries recovered. However, after raising the limit to 4g we also noticed that estimated_size_in_bytes grew extremely slowly, staying almost at the original 2g level.
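
For reference, we make the dynamic change through the cluster settings API, roughly along these lines (the exact host and value differ in our environment):

curl -XPUT 'http://127.0.0.1:9200/_cluster/settings' -d '{"transient": {"indices.breaker.request.limit": "4gb"}}'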

Another piece of related information: we also configured our request cache to 2g. At first we suspected that cached results were not being subtracted from the request breaker's estimate because the resources were never released, but after clearing the request cache the breaker's estimated size did not decrease at all. After clearing the cache we increased the limit dynamically and then, as described above, estimated_size_in_bytes grew only slowly.

(Additional info: after a 3-hour test, estimated_size_in_bytes is increasing again and has now passed 2.5g.)

The following graph shows how the breaker counter and the caches grow over time; the query cache and request cache were not collected initially.

image

We found an old issue with almost the same symptoms; according to the description there, the cause may have been an OOM, but we did not find any OOM errors when checking our log files.

#14065

We also did some digging in the source. By stepping through with a debugger, we confirmed that at the following location:

https://github.com/elastic/elasticsearch/blob/v5.4.3/core/src/main/java/org/elasticsearch/common/breaker/ChildMemoryCircuitBreaker.java#L155

currentUsed is zero the first time we hit the breakpoint after starting the first query, but after the first query finishes and we run a second one, currentUsed is no longer zero the first time the breakpoint is hit for that second query.

Also, even though we doubt our own findings and believe we must be missing something in the source, we noticed that the request limit breaker is retrieved from a global registry, so it does not appear to act per-request as described in the documentation.
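
To make our suspicion concrete, here is a deliberately simplified sketch (our own illustration, not the actual ChildMemoryCircuitBreaker code) of a breaker that keeps one counter shared by all requests; if any code path adds an estimate without a matching release, the estimate survives into the next request, which matches the currentUsed behaviour we observed:

import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of a request breaker with a single shared counter.
// This is NOT the Elasticsearch implementation, only a model of the accounting.
public class SimpleRequestBreaker {
    private final AtomicLong used = new AtomicLong(); // shared across all requests
    private final long limitBytes;

    public SimpleRequestBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserve an estimate for the current request, tripping if the global limit is exceeded.
    public void addEstimateAndMaybeBreak(long bytes) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limitBytes) {
            used.addAndGet(-bytes); // roll back the reservation before failing
            throw new IllegalStateException("breaker tripped: " + newUsed + " > " + limitBytes);
        }
    }

    // Must be called when the request finishes; if a caller forgets this,
    // the shared counter (estimated_size_in_bytes) only ever grows.
    public void release(long bytes) {
        used.addAndGet(-bytes);
    }

    public long estimatedSizeInBytes() {
        return used.get();
    }

    public static void main(String[] args) {
        SimpleRequestBreaker breaker = new SimpleRequestBreaker(2L * 1024 * 1024 * 1024); // 2gb limit
        breaker.addEstimateAndMaybeBreak(500_000_000L); // first query reserves ~500mb
        // first query finishes but release(...) is never called: the 500mb leaks
        breaker.addEstimateAndMaybeBreak(500_000_000L); // second query starts from 500mb, not 0
        System.out.println(breaker.estimatedSizeInBytes()); // prints 1000000000
    }
}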

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce it, the more likely it is that somebody will take the time to look at it.

  1. Keep running queries with aggregations.
  2. Keep checking [curl -XGET 'http://127.0.0.1:9200/_nodes/stats/breaker?pretty'] (see the sample output excerpt after this list).
  3. A continuously increasing estimated_size_in_bytes can be observed.
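
For clarity, the part of the _nodes/stats/breaker response we watch looks roughly like the excerpt below (the field names are what the API returns; the numbers are illustrative placeholders, not our real values):

  "breakers" : {
    "request" : {
      "limit_size_in_bytes" : 2147483648,
      "limit_size" : "2gb",
      "estimated_size_in_bytes" : 2147000000,
      "estimated_size" : "1.9gb",
      "overhead" : 1.0,
      "tripped" : 3
    },
    ...
  }
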
xzer changed the title from "request limit breaker does not calculate estimated_size_in_bytes correctly" to "request limit breaker does not calculate estimated_size_in_bytes correctly, which causes all the aggregations failed" on Oct 10, 2017
Member

dakrone commented Oct 14, 2017

@xzer I tried to reproduce this, but without any luck. Can you elaborate on perhaps the type of data (an example document and mapping would be great) and query/aggregations that you are running?

dakrone added the :Core/Infra/Circuit Breakers (Track estimates of memory consumption to prevent overload) and feedback_needed labels Oct 17, 2017
dakrone self-assigned this Oct 17, 2017
@davinliuda

I am hitting the same issue, please help.

Member

dakrone commented Oct 30, 2017

@liud I'm still hoping for a better reproduction of this, do you have a working reproduction?


davinliuda commented Oct 31, 2017

@dakrone

Describe the feature:
elasticsearch-5.6.1
elasticsearch plugins : readonlyrest
kibana-5.6.1
linux x86_64 2.6.32.43
jvm 1.8.0_144

There were no other query requests when I started Kibana, yet estimated_size only kept increasing.
I captured the HTTP packets sent from Kibana:

HEAD / HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 362

GET /_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 121

{"nodes":{"zfJglW84TbmHwld2BeJ0dw":{"ip":"10.133.8.72","version":"5.6.1","http":{"publish_address":"10.133.8.72:8200"}}}}GET /_nodes/_local?filter_path=nodes.*.settings.tribe HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 2

{}POST /_mget HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
content-type: application/json
Host: 10.133.8.72:8200
Content-Length: 62
Connection: keep-alive

{"docs":[{"_index":".kibana","_type":"config","_id":"5.6.1"}]}HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 155

{"docs":[{"_index":".kibana","_type":"config","_id":"5.6.1","_version":2,"found":true,"_source":{"buildNum":15533,"defaultIndex":"AV9SX9txRDfVRTQGun0h"}}]}GET /_cluster/health/.kibana?timeout=5s HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 397

{"cluster_name":"es5-qdcc-test-cluster","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":1,"active_shards":1,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":1,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}GET /.kibana/_mappings HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 2394

{".kibana":{"mappings":{"url":{"dynamic":"strict","properties":{"accessCount":{"type":"long"},"accessDate":{"type":"date"},"createDate":{"type":"date"},"url":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":2048}}}}},"timelion-sheet":{"dynamic":"strict","properties":{"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"timelion_chart_height":{"type":"integer"},"timelion_columns":{"type":"integer"},"timelion_interval":{"type":"keyword"},"timelion_other_interval":{"type":"keyword"},"timelion_rows":{"type":"integer"},"timelion_sheet":{"type":"text"},"title":{"type":"text"},"version":{"type":"integer"}}},"default":{"dynamic":"strict"},"config":{"dynamic":"true","properties":{"buildNum":{"type":"keyword"},"defaultIndex":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"search":{"dynamic":"strict","properties":{"columns":{"type":"keyword"},"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"sort":{"type":"keyword"},"title":{"type":"text"},"version":{"type":"integer"}}},"visualization":{"dynamic":"strict","properties":{"description":{"type":"text"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"savedSearchId":{"type":"keyword"},"title":{"type":"text"},"uiStateJSON":{"type":"text"},"version":{"type":"integer"},"visState":{"type":"text"}}},"dashboard":{"dynamic":"strict","properties":{"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"optionsJSON":{"type":"text"},"panelsJSON":{"type":"text"},"refreshInterval":{"properties":{"display":{"type":"keyword"},"pause":{"type":"boolean"},"section":{"type":"integer"},"value":{"type":"integer"}}},"timeFrom":{"type":"keyword"},"timeRestore":{"type":"boolean"},"timeTo":{"type":"keyword"},"title":{"type":"text"},"uiStateJSON":{"type":"text"},"version":{"type":"integer"}}},"index-pattern":{"dynamic":"strict","properties":{"fieldFormatMap":{"type":"text"},"fields":{"type":"text"},"intervalName":{"type":"keyword"},"notExpandable":{"type":"boolean"},"sourceFilters":{"type":"text"},"timeFieldName":{"type":"keyword"},"title":{"type":"text"}}},"server":{"dynamic":"strict","properties":{"uuid":{"type":"keyword"}}}}}}POST /.kibana/_search?size=1000&from=0 HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
content-type: application/json
Host: 10.133.8.72:8200
Content-Length: 277
Connection: keep-alive

{"version":true,"query":{"bool":{"must":[{"match_all":{}}],"filter":[{"bool":{"should":[{"term":{"_type":"config"}},{"term":{"type":"config"}}]}}]}},"sort":[{"buildNum":{"order":"desc","unmapped_type":"keyword"}},{"config.buildNum":{"order":"desc","unmapped_type":"keyword"}}]}HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 301

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":1,"max_score":null,"hits":[{"_index":".kibana","_type":"config","_id":"5.6.1","_version":2,"_score":null,"_source":{"buildNum":15533,"defaultIndex":"AV9SX9txRDfVRTQGun0h"},"sort":["15533",null]}]}}HEAD / HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 362

GET /_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 121

{"nodes":{"zfJglW84TbmHwld2BeJ0dw":{"ip":"10.133.8.72","version":"5.6.1","http":{"publish_address":"10.133.8.72:8200"}}}}GET /_nodes/_local?filter_path=nodes.*.settings.tribe HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 2

{}POST /_mget HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
content-type: application/json
Host: 10.133.8.72:8200
Content-Length: 62
Connection: keep-alive

{"docs":[{"_index":".kibana","_type":"config","_id":"5.6.1"}]}HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 155

{"docs":[{"_index":".kibana","_type":"config","_id":"5.6.1","_version":2,"found":true,"_source":{"buildNum":15533,"defaultIndex":"AV9SX9txRDfVRTQGun0h"}}]}GET /_cluster/health/.kibana?timeout=5s HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 397

{"cluster_name":"es5-qdcc-test-cluster","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":1,"active_shards":1,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":1,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}GET /.kibana/_mappings HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
Host: 10.133.8.72:8200
Content-Length: 0
Connection: keep-alive

HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 2394

{".kibana":{"mappings":{"url":{"dynamic":"strict","properties":{"accessCount":{"type":"long"},"accessDate":{"type":"date"},"createDate":{"type":"date"},"url":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":2048}}}}},"timelion-sheet":{"dynamic":"strict","properties":{"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"timelion_chart_height":{"type":"integer"},"timelion_columns":{"type":"integer"},"timelion_interval":{"type":"keyword"},"timelion_other_interval":{"type":"keyword"},"timelion_rows":{"type":"integer"},"timelion_sheet":{"type":"text"},"title":{"type":"text"},"version":{"type":"integer"}}},"default":{"dynamic":"strict"},"config":{"dynamic":"true","properties":{"buildNum":{"type":"keyword"},"defaultIndex":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"search":{"dynamic":"strict","properties":{"columns":{"type":"keyword"},"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"sort":{"type":"keyword"},"title":{"type":"text"},"version":{"type":"integer"}}},"visualization":{"dynamic":"strict","properties":{"description":{"type":"text"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"savedSearchId":{"type":"keyword"},"title":{"type":"text"},"uiStateJSON":{"type":"text"},"version":{"type":"integer"},"visState":{"type":"text"}}},"dashboard":{"dynamic":"strict","properties":{"description":{"type":"text"},"hits":{"type":"integer"},"kibanaSavedObjectMeta":{"properties":{"searchSourceJSON":{"type":"text"}}},"optionsJSON":{"type":"text"},"panelsJSON":{"type":"text"},"refreshInterval":{"properties":{"display":{"type":"keyword"},"pause":{"type":"boolean"},"section":{"type":"integer"},"value":{"type":"integer"}}},"timeFrom":{"type":"keyword"},"timeRestore":{"type":"boolean"},"timeTo":{"type":"keyword"},"title":{"type":"text"},"uiStateJSON":{"type":"text"},"version":{"type":"integer"}}},"index-pattern":{"dynamic":"strict","properties":{"fieldFormatMap":{"type":"text"},"fields":{"type":"text"},"intervalName":{"type":"keyword"},"notExpandable":{"type":"boolean"},"sourceFilters":{"type":"text"},"timeFieldName":{"type":"keyword"},"title":{"type":"text"}}},"server":{"dynamic":"strict","properties":{"uuid":{"type":"keyword"}}}}}}POST /.kibana/_search?size=1000&from=0 HTTP/1.1
Authorization: Basic a2liYW5hOnRlc3Rfa2liYW5h
content-type: application/json
Host: 10.133.8.72:8200
Content-Length: 277
Connection: keep-alive

{"version":true,"query":{"bool":{"must":[{"match_all":{}}],"filter":[{"bool":{"should":[{"term":{"_type":"config"}},{"term":{"type":"config"}}]}}]}},"sort":[{"buildNum":{"order":"desc","unmapped_type":"keyword"}},{"config.buildNum":{"order":"desc","unmapped_type":"keyword"}}]}HTTP/1.1 200 OK
x-ror-kibana_access: rw
x-ror-available-groups: kibana_base
X-RR-User: kibana
content-type: application/json; charset=UTF-8
content-length: 301

@davinliuda

ES log

[2017-10-30T19:54:47,167][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:2016606753-1505744276#2982213, TYP:MainRequest, CGR:N/A, USR:kibana, BRS:false, ACT:cluster:monitor/
main, OA:10.133.8.72, IDX:<N/A>, MET:HEAD, PTH:/, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[admin->[groups->false]], [spark_base->[groups->false]], [read_svr->[groups->false]], [write_svr->[groups->
false]], [kibana_base->[kibana_access->true, indices->true, auth_key_sha1->true]] }
[2017-10-30T19:54:47,168][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:986856368-1194762014#2982214, TYP:NodesInfoRequest, CGR:N/A, USR:kibana, BRS:false, ACT:cluster:moni
tor/nodes/info, OA:10.133.8.72, IDX:<N/A>, MET:GET, PTH:/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[admin->[groups->fal
se]], [read_svr->[groups->false]], [spark_base->[groups->false]], [kibana_base->[kibana_access->true, indices->true, auth_key_sha1->true]], [write_svr->[groups->false]] }
[2017-10-30T19:54:47,170][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:766038084-1060559657#2982216, TYP:NodesInfoRequest, CGR:N/A, USR:kibana, BRS:false, ACT:cluster:moni
tor/nodes/info, OA:10.133.8.72, IDX:<N/A>, MET:GET, PTH:/_nodes/_local?filter_path=nodes.*.settings.tribe, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[kibana_base->[kibana_access->true, indices->true,
auth_key_sha1->true]], [write_svr->[groups->false]], [admin->[groups->false]], [spark_base->[groups->false]], [read_svr->[groups->false]] }
[2017-10-30T19:54:47,172][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:1252873296-58776014#2982218, TYP:MultiGetRequest, CGR:N/A, USR:kibana, BRS:false, ACT:indices:data/r
ead/mget, OA:10.133.8.72, IDX:.kibana, MET:POST, PTH:/_mget, CNT:<OMITTED, LENGTH=62>, HDR:Authorization,Connection,Content-Length,content-type,Host, HIS:[admin->[groups->false]], [spark_base->[groups->false]], [kibana_bas
e->[kibana_access->true, indices->true, auth_key_sha1->true]], [read_svr->[groups->false]], [write_svr->[groups->false]] }
[2017-10-30T19:54:47,172][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:1252873296-1685101926#2982219, TYP:MultiGetShardRequest, CGR:N/A, USR:kibana, BRS:false, ACT:indices
:data/read/mget[shard], OA:10.133.8.72, IDX:.kibana, MET:POST, PTH:/_mget, CNT:<OMITTED, LENGTH=62>, HDR:Authorization,Connection,Content-Length,content-type,Host, HIS:[write_svr->[groups->false]], [admin->[groups->false]]
, [read_svr->[groups->false]], [kibana_base->[kibana_access->true, indices->true, auth_key_sha1->true]], [spark_base->[groups->false]] }
[2017-10-30T19:54:47,174][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:162489314-75288848#2982221, TYP:ClusterHealthRequest, CGR:N/A, USR:kibana, BRS:false, ACT:cluster:mo
nitor/health, OA:10.133.8.72, IDX:.kibana, MET:GET, PTH:/_cluster/health/.kibana?timeout=5s, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[spark_base->[groups->false]], [kibana_base->[kibana_access->tru
e, indices->true, auth_key_sha1->true]], [admin->[groups->false]], [write_svr->[groups->false]], [read_svr->[groups->false]] }
[2017-10-30T19:54:47,175][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:1280699424-1859389884#2982222, TYP:GetIndexRequest, CGR:N/A, USR:kibana, BRS:false, ACT:indices:admi
n/get, OA:10.133.8.72, IDX:.kibana, MET:GET, PTH:/.kibana/_mappings, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[write_svr->[groups->false]], [admin->[groups->false]], [read_svr->[groups->false]], [ki
bana_base->[kibana_access->true, indices->true, auth_key_sha1->true]], [spark_base->[groups->false]] }
[2017-10-30T19:54:47,177][INFO ][o.e.p.r.a.ACL ] ALLOWED by '{ block=kibana_base, match=true }' req={ ID:425378103--189071578#2982223, TYP:SearchRequest, CGR:N/A, USR:kibana, BRS:false, ACT:indices:data/re
ad/search, OA:10.133.8.72, IDX:.kibana, MET:POST, PTH:/.kibana/_search?size=1000&from=0, CNT:<OMITTED, LENGTH=277>, HDR:Authorization,Connection,Content-Length,content-type,Host, HIS:[admin->[groups->false]], [kibana_base->[kibana_access->true, indices->true, auth_key_sha1->true]], [read_svr->[groups->false]], [spark_base->[groups->false]], [write_svr->[groups->false]] }
[2017-10-30T19:54:47,179][INFO ][o.e.p.r.a.ACL ] FORBIDDEN by default req={ ID:1919140401-1104816419#2982225, TYP:ClusterStateRequest, CGR:N/A, USR:kibana(?), BRS:false, ACT:cluster:monitor/state, OA:10.13
3.8.72, IDX:, MET:GET, PTH:/_cluster/settings?include_defaults=true&filter_path=**.script.engine.*.inline, CNT:<N/A>, HDR:Authorization,Connection,Content-Length,Host, HIS:[kibana_base->[kibana_access->false, auth_key_sha1
->true]], [head->[auth_key_sha1->false]], [admin->[groups->false]], [write_svr->[groups->false]], [spark_base->[groups->false]], [read_svr->[groups->false]] }

Author

xzer commented Nov 20, 2017

@dakrone I am sorry for replying so late; I was concentrating on our production release last month.

First, I have to apologize: we finally found that the breaker count leak was caused by a plugin created by another team for some special aggregation processing.

Still, I have a question about the request limit breaker. As I said in my initial description, according to the documentation the request limit breaker should be counted per request, but we found it is calculated against a global counter, which is obviously not per request. So is the documentation wrong, or the implementation?

Member

dakrone commented Nov 27, 2017

@xzer the request breaker is a global breaker for all requests in the system, so the bytes are counted for a particular request, but the limit is global.

Going to close this now since it was caused by a plugin.

dakrone closed this as completed Nov 27, 2017
Author

xzer commented Nov 28, 2017

@dakrone according to the official documentation quoted below, we understood the breaker to be per-request, not global. So I believe the documentation needs to be modified to clarify how the breaker actually works.

Request circuit breaker
The request circuit breaker allows Elasticsearch to prevent per-request data structures (for example, memory used for calculating aggregations during a request) from exceeding a certain amount of memory.

indices.breaker.request.limit
  Limit for request breaker, defaults to 60% of JVM heap
indices.breaker.request.overhead
  A constant that all request estimations are multiplied with to determine a final estimation. Defaults to 1

In flight requests circuit breaker
The in flight requests circuit breaker allows Elasticsearch to limit the memory usage of all currently active incoming requests on transport or HTTP level from exceeding a certain amount of memory on a node. The memory usage is based on the content length of the request itself.

network.breaker.inflight_requests.limit
  Limit for in flight requests breaker, defaults to 100% of JVM heap. This means that it is bound by the limit configured for the parent circuit breaker.
network.breaker.inflight_requests.overhead
  A constant that all in flight requests estimations are multiplied with to determine a final estimation. Defaults to 1

Also, what is the difference between "indices.breaker.request.limit" and "network.breaker.inflight_requests.limit"? According to the source, both are counted globally in the same way.

Member

dakrone commented Nov 28, 2017

@xzer for the difference between the request and inflight_request breakers, I'm adding a bit more info about them here: https://github.com/elastic/elasticsearch/pull/27116/files#diff-c35a409b177f40a4be365a147eefa2f9R45

As for what I meant by "per-request" versus "global": the 60% limit is global, and each request increments the amount. So if you have 3 concurrent requests using 10%, 8%, and 5%, globally that is 23% usage, even though the actual "use" is per-request (meaning it is released when the request is finished).
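
To put that into code terms, here is a hedged sketch of the intended pattern (hypothetical helper, reusing the simplified SimpleRequestBreaker model sketched earlier in this thread, not actual Elasticsearch classes): each request reserves bytes against the one shared breaker and releases them when it finishes, so only requests that are in flight at the same time add up against the global limit.

// Hypothetical usage pattern, not real Elasticsearch code.
void runRequest(SimpleRequestBreaker breaker, long estimatedBytes) {
    breaker.addEstimateAndMaybeBreak(estimatedBytes); // counts against the shared/global limit
    try {
        // ... per-request work, e.g. building aggregation buckets ...
    } finally {
        breaker.release(estimatedBytes); // usage drops again once this request completes
    }
}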

Author

xzer commented Nov 28, 2017

@dakrone thanks for the information; now I understand the difference between them.

But I still wish the documentation would be updated to explain what is per-request and what is global, as you explained here. The current description is really misleading to readers.

Member

dakrone commented Nov 28, 2017 via email
