Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One question & one problem #423

Closed
Agileaq opened this issue Apr 26, 2018 · 11 comments
Closed

One question & one problem #423

Agileaq opened this issue Apr 26, 2018 · 11 comments
Labels

Comments

@Agileaq
Copy link

Agileaq commented Apr 26, 2018

Dear artgon & zuul team,

Question :

I saw the code snippet in SampleService.java,
public Observable makeSlowRequest() {
return Observable.just("test").delay(500, TimeUnit.MILLISECONDS);
}
It runs on :
filter.applyAsync(inMesg)
.observeOn(Schedulers.from(getChannelHandlerContext(inMesg).executor()))
.doOnUnsubscribe(resumer::decrementConcurrency)
.subscribe(resumer);
In my local machine, I only have 8 worker threads and RxJava will use one of them & blocks the thread. if many connections come in, e.g. 10 http requests sent within 500milliseconds, it must have 2 requests waiting there until two of 8/10 finish.
Should I use another executor to handle the slowRequest?
such as
public Observable makeSlowRequest() {
return Observable.just("test").map(sync_service_code).subscribeOn(Schedulers.io());
}

Problem :

I configured in application.properties like below :

// Load balancing backends with Eureka

//eureka.shouldUseDns=true
//eureka.eurekaServer.context=discovery/v2
//eureka.eurekaServer.domainName=discovery${environment}.netflix.net
//eureka.eurekaServer.gzipContent=true

//eureka.serviceUrl.default=http://${region}.${eureka.eurekaServer.domainName}:7001/${eureka.eurekaServer.context}

//api.ribbon.NIWSServerListClassName=com.netflix.niws.loadbalancer.DiscoveryEnabledNIWSServerList
//api.ribbon.DeploymentContextBasedVipAddresses=api-test.netflix.net:7001

////// Load balancing backends without Eureka

eureka.shouldFetchRegistry=false

api.ribbon.listOfServers=127.0.0.1:8889
api.ribbon.client.NIWSServerListClassName=com.netflix.loadbalancer.ConfigurationBasedServerList
api.ribbon.DeploymentContextBasedVipAddresses=api-test.netflix.net:7001

And start up the zuul2.1 server,
then I run command on my mac 'ab -c 100 -n 100 -k 'http://localhost:7001/api/hello''
After I get the tests results, I go to my web browser and type in http://localhost:7001/api/hello,
the http request will always get error code 503.
I look at my zuul2.1 server logs, it says :

2018-04-26 13:22:14,711 WARN com.netflix.zuul.netty.connectionpool.PerServerConnectionPool [Salamander-ClientToZuulWorker-5] Unable to create new connection because at MaxConnectionsPerHost! maxConnectionsPerHost=50, connectionsPerHost=50, host=aaaaa.dianrong.com:8889origin=api
2018-04-26 13:22:14,712 WARN com.netflix.zuul.filters.endpoint.ProxyEndpoint [Salamander-ClientToZuulWorker-5] FAILURE_LOCAL_THROTTLED_ORIGIN_SERVER_MAXCONN, origin = api, origin channel info =
com.netflix.zuul.netty.connectionpool.OriginConnectException: maxConnectionsPerHost=50, connectionsPerHost=50
2018-04-26 13:22:14,712 INFO com.netflix.zuul.sample.filters.outbound.ZuulResponseFilter [Salamander-ClientToZuulWorker-5] Passport: CurrentPassport {start_ms=1524720134710, [+0=SERVER_CH_ACTIVE, +310536=IN_REQ_HEADERS_RECEIVED, +663597=FILTERS_INBOUND_START, +1057463=FILTERS_INBOUND_END, +1624557=ORIGIN_CONN_ACQUIRE_START, +1682176=ORIGIN_CONN_ACQUIRE_FAILED, +1967045=FILTERS_OUTBOUND_START, +2326851=NOW]}
2018-04-26 13:22:14,712 WARN com.netflix.zuul.netty.server.ClientResponseWriter [Salamander-ClientToZuulWorker-5] Writing response to client channel before have received the LastContent of request! uri=http://localhost:7001/api/hello, method=get, clientip=0:0:0:0:0:0:0:1, Channel: [id: 0x0612891b, L:/0:0:0:0:0:0:0:1:7001 - R:/0:0:0:0:0:0:0:1:63491], active=true, open=true, registered=true, writable=true, id=0612891b, Passport: CurrentPassport {start_ms=1524720134710, [+0=SERVER_CH_ACTIVE, +310536=IN_REQ_HEADERS_RECEIVED, +663597=FILTERS_INBOUND_START, +1057463=FILTERS_INBOUND_END, +1624557=ORIGIN_CONN_ACQUIRE_START, +1682176=ORIGIN_CONN_ACQUIRE_FAILED, +1967045=FILTERS_OUTBOUND_START, +2428057=FILTERS_OUTBOUND_END, +2522462=NOW]}
2018-04-26 13:22:14,713 INFO ACCESS [Salamander-ClientToZuulWorker-5] 2018-04-26T13:22:14.71 0:0:0:0:0:0:0:1 7001 GET /api/hello 503 3027 - 7b1e09cb-7417-4e74-8a2c-bd4028cb123f "localhost:7001" "-" "-" "-" "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36" "-" "-" "-"

I debug into it and I found that it will always tryMakingNewConnection in PerServerConnectionPool.java and never auto-recover util I restart server again..

Could you help me to find out why I encounter the problem ?

@wade42
Copy link

wade42 commented Apr 27, 2018

Problem:
add api.ribbon.MaxConnectionsPerHost=-1

@Agileaq
Copy link
Author

Agileaq commented Apr 27, 2018

@wade42 -1 may resolve the problem,
but I can find methods like : release, decrementXXXX , getPoolForEventLoop..
I don't think it is the best practise..

@artgon
Copy link
Contributor

artgon commented Apr 27, 2018

In my local machine, I only have 8 worker threads and RxJava will use one of them & blocks the thread.

This is false. RxJava will not block a thread for delay() it will just schedule the operation for a later point in time. If you have to do something else that does block, you should schedule it on a separate thread. You should not do anything on a Netty thread that blocks.

Also, observeOn is a function to return back to the specified scheduler. This is done so that subsequent filters run on the Netty thread. You should read up on the difference between subscribeOn and observeOn (e.g. https://proandroiddev.com/understanding-rxjava-subscribeon-and-observeon-744b0c6a41ea).

FAILURE_LOCAL_THROTTLED_ORIGIN_SERVER_MAXCONN

This status means that you've reached the configured limit of connections to a single instance. This is just a safety mechanism, you can change it to a larger number or disable it (particularly if you want to run a benchmark). More detail on statuses here: https://github.com/Netflix/zuul/wiki/Core-Features#status-categories

@artgon artgon added the 2.x label Apr 27, 2018
@Agileaq
Copy link
Author

Agileaq commented Apr 28, 2018

Thank you !

@Agileaq Agileaq closed this as completed Apr 28, 2018
@jingege
Copy link

jingege commented May 14, 2018

Problem:
add api.ribbon.MaxConnectionsPerHost=-1

It causes the pool does not cache any connection. It's not a good practise. Something must be done to complete the PerServerConnectionPool.

@artgon
Copy link
Contributor

artgon commented May 14, 2018

@jingege what gives you that idea?

@pcabot
Copy link

pcabot commented May 14, 2018

I am experiencing the same issue described by @Agileaq when simply running Zuul as a proxy (without eureka) where I quickly reach the max amount of connection per host (50), and I get the same error ("FAILURE_LOCAL_THROTTLED_ORIGIN_SERVER_MAXCONN").

After stepping into the code, it looks like the connections are never released to the pool, which may cause this leak. It looks like a similar problem that was reported a few days ago: #424 (connection never be released to pool?).

I think the current throttling error and bug#424 are related, is that correct ?

@artgon
Copy link
Contributor

artgon commented May 14, 2018

@pcabot Yes, that issue caused connection not being returned and thus the counter not recovering after a burst of traffic. Are you seeing this behavior on the latest code or on the released version?

@pcabot
Copy link

pcabot commented May 15, 2018

Yes, I am seing the connection leak issue on the released version (2.1.1).

I just tried using the latest code this morning, and the problem seems to be gone (i.e. I can now use my app without getting "FAILURE_LOCAL_THROTTLED_ORIGIN_SERVER_MAXCONN" exceptions all the time through the proxy.)

What is the plan for the next official release with the fix ?

@artgon
Copy link
Contributor

artgon commented May 15, 2018

Thanks for confirming. I'll release a new version this week.

@SMakhrov
Copy link

Was it fixed?

I have the same issue in version 2.2.0.

DEBUG com.netflix.zuul.netty.insights.PassportLoggingHandler [Salamander-ClientToZuulWorker-3] State after complete. , context-error = com.netflix.zuul.netty.connectionpool.OriginConnectException: maxConnectionsPerHost=50, connectionsPerHost=50, current-http-reqs = 46, toplevelid = 7464ce0b-575e-446e-936d-e0177d31bbcd, req = uri=https://test:80/index.html, method=get, clientip=192.168.147.37, attempts = [{"status":-1,"duration":0,"attempt":1,"error":"ORIGIN_ORIGIN_SERVER_MAX_CONNS","exceptionType":"OriginConnectException","region":"UNKNOW","instanceId":"test123.s3-us-west-2.amazonaws.com:80","readTimeout":5000,"connectTimeout":2000}], passport = CurrentPassport {start_ms=1609206360921, [+0=IN_REQ_HEADERS_RECEIVED, +114391=FILTERS_INBOUND_START, +1387513=IN_REQ_LAST_CONTENT_RECEIVED, +34975764=FILTERS_INBOUND_END, +35016527=ORIGIN_CONN_ACQUIRE_START, +35018242=ORIGIN_CONN_ACQUIRE_FAILED, +35075970=FILTERS_OUTBOUND_START, +35089510=FILTERS_OUTBOUND_END, +35110908=OUT_RESP_HEADERS_SENDING, +35127608=OUT_RESP_LAST_CONTENT_SENDING, +35155862=OUT_RESP_HEADERS_SENT, +35156733=OUT_RESP_LAST_CONTENT_SENT, +35191918=SERVER_CH_CLOSE, +35194983=SERVER_CH_CLOSE, +35242562=NOW]}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants