Skip to content
This repository was archived by the owner on Apr 30, 2025. It is now read-only.

Conversation

@kecirlotfi
Copy link
Contributor

@kecirlotfi kecirlotfi commented Mar 15, 2022

There is an open issue in cf/routing-release with id cloudfoundry/routing-release#250.
This proposal will change the way GoRouter manages routes for crashed apps.

We have three cases:
Below, for each case we describe, we will provide logs of tests before and after our change.
Before change is named Current behaviour and After change is named New Behaviour.

The app is UP but not reachable by gorouter for some reason (ie if the endpoint uses TLS and the error is one of PrunableClassifiers),

The gorouter will remove the endpoint and once the pool is empty, it will be removed.
Then future calls on the app will get 404 response code.
Our change will prevent removing empty pool for X period of time. As result, all calls will respond with 503 code.
After that X period is passed, the empty pool will be removed. Gorouter will repond with 404 later on for this app.

This case was tested locally, after building gorouter

Current behaviour

# register the route
nats-pub 'router.register' '{"host":"127.0.0.1","tls_port":1235,"uris":["tls.localhost.routing.cf-app.com"],"tags":{"another_key":"another_value","some_key":"some_value"}}'
#GoRouter logs
{"log_level":1,"timestamp":1648567660.59884,"message":"route-registered","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com"}}
{"log_level":0,"timestamp":1648567660.598932,"message":"uri-added","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com"}}
{"log_level":1,"timestamp":1648567660.5990021,"message":"endpoint-registered","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com","backend":"127.0.0.1:1235","application_id":"","instance_id":"","server_cert_domain_san":"","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":0,"timestamp":1648567664.4837272,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"efb3b01d-aacf-4165-7de7-d2e53b66011b"}}
{"log_level":0,"timestamp":1648567664.4841878,"message":"backend","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"attempt":1}}
{"log_level":3,"timestamp":1648567664.484983,"message":"prune-failed-endpoint","source":"gorouter.stdout.registry","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""}}}
{"log_level":3,"timestamp":1648567664.485009,"message":"backend-endpoint-failed","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"error":"dial tcp 127.0.0.1:1235: connect: connection refused","attempt":1,"vcap_request_id":"efb3b01d-aacf-4165-7de7-d2e53b66011b","retriable":true,"num-endpoints":1}}
{"log_level":0,"timestamp":1648567664.48502,"message":"retriable-error","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"error":{"Op":"dial","Net":"tcp","Source":null,"Addr":{"IP":"127.0.0.1","Port":1235,"Zone":""},"Err":{"Syscall":"connect","Err":61}}}}
{"log_level":3,"timestamp":1648567664.485076,"message":"select-endpoint-failed","source":"gorouter.stdout","data":{"host":"tls.localhost.routing.cf-app.com","error":"No endpoints available"}}
2022/03/29 17:27:44 http: proxy error: dial tcp 127.0.0.1:1235: connect: connection refused
{"log_level":0,"timestamp":1648567665.7921522,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"c0de9950-2232-4720-4324-fe1b7d218fc1"}}

Querying the app

# Query the app
➜  ~ curl tls.localhost.routing.cf-app.com:8081 ; date
502 Bad Gateway: Registered endpoint failed to handle the request.
Tue Mar 29 17:27:44 CEST 2022
➜  ~ curl tls.localhost.routing.cf-app.com:8081 ; date
404 Not Found: Requested route ('tls.localhost.routing.cf-app.com:8081') does not exist.
Tue Mar 29 17:27:45 CEST 2022

New bahviour

nats-pub 'router.register' '{"host":"127.0.0.1","tls_port":1235,"uris":["tls.localhost.routing.cf-app.com"],"tags":{"another_key":"another_value","some_key":"some_value"}}'
#GoRouter Logs
{"log_level":1,"timestamp":1648568063.7172391,"message":"route-registered","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com"}}
{"log_level":0,"timestamp":1648568063.7173162,"message":"uri-added","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com"}}
{"log_level":1,"timestamp":1648568063.717377,"message":"endpoint-registered","source":"gorouter.stdout.registry","data":{"uri":"tls.localhost.routing.cf-app.com","backend":"127.0.0.1:1235","application_id":"","instance_id":"","server_cert_domain_san":"","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":0,"timestamp":1648568072.828989,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"ec44d38c-21c8-459f-536d-1033d7879a0e"}}
{"log_level":0,"timestamp":1648568072.829365,"message":"backend","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"attempt":1}}
{"log_level":3,"timestamp":1648568072.830132,"message":"prune-failed-endpoint","source":"gorouter.stdout.registry","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""}}}
{"log_level":3,"timestamp":1648568072.83016,"message":"backend-endpoint-failed","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"error":"dial tcp 127.0.0.1:1235: connect: connection refused","attempt":1,"vcap_request_id":"ec44d38c-21c8-459f-536d-1033d7879a0e","retriable":true,"num-endpoints":1}}
{"log_level":0,"timestamp":1648568072.8301692,"message":"retriable-error","source":"gorouter.stdout","data":{"route-endpoint":{"ApplicationId":"","Addr":"127.0.0.1:1235","Tags":{"another_key":"another_value","some_key":"some_value"},"RouteServiceUrl":""},"error":{"Op":"dial","Net":"tcp","Source":null,"Addr":{"IP":"127.0.0.1","Port":1235,"Zone":""},"Err":{"Syscall":"connect","Err":61}}}}
{"log_level":3,"timestamp":1648568072.830206,"message":"select-endpoint-failed","source":"gorouter.stdout","data":{"host":"tls.localhost.routing.cf-app.com","error":"No endpoints available"}}
2022/03/29 17:34:32 http: proxy error: dial tcp 127.0.0.1:1235: connect: connection refused
{"log_level":0,"timestamp":1648568075.1977432,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"fbbd9b4d-9c1c-4fc5-4ff8-0425dd498996"}}
{"log_level":1,"timestamp":1648568075.1978729,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568077.469234,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"addf4673-a650-4f53-6933-d7fc359a16a0"}}
{"log_level":1,"timestamp":1648568077.469388,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568084.436451,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"b9a48576-4c37-4b76-7513-01841fd9cec2"}}
{"log_level":1,"timestamp":1648568084.436553,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568086.2084942,"message":"start-pruning-routes","source":"gorouter.stdout.registry","data":{}}
{"log_level":0,"timestamp":1648568086.208517,"message":"finished-pruning-routes","source":"gorouter.stdout.registry","data":{}}
{"log_level":0,"timestamp":1648568090.889292,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"a5f08c95-caa4-4da4-5859-e2498dd4c544"}}
{"log_level":1,"timestamp":1648568090.889442,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568100.633421,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"4504c828-f77c-44ff-7dc7-a51e8cdecf12"}}
{"log_level":1,"timestamp":1648568100.633523,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568106.937655,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"81ae0508-e14d-4c42-4ded-38fa247e86c1"}}
{"log_level":1,"timestamp":1648568106.937793,"message":"status","source":"gorouter.stdout","data":{"body":"503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints."}}
{"log_level":0,"timestamp":1648568116.2087529,"message":"start-pruning-routes","source":"gorouter.stdout.registry","data":{}}
{"log_level":0,"timestamp":1648568116.208767,"message":"finished-pruning-routes","source":"gorouter.stdout.registry","data":{}}
{"log_level":0,"timestamp":1648568118.027283,"message":"vcap-request-id-header-set","source":"gorouter.stdout","data":{"VcapRequestIdHeader":"1aaff5a0-b1e8-4aed-782b-913dacf82d2d"}}

Querying the app

➜  ~ for i in `seq 1 30`; do ; curl tls.localhost.routing.cf-app.com:8081 ; sleep 2; done 
502 Bad Gateway: Registered endpoint failed to handle the request.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
503 Service Unavailable: Requested route ('tls.localhost.routing.cf-app.com:8081') has no available endpoints.
404 Not Found: Requested route ('tls.localhost.routing.cf-app.com:8081') does not exist.

The app crashes, so the route-emitter will unregister the route: all incoming calls will get 404 respose code.

Our change will prevent from unregistering the route of the crashed app for X period of time. This will give the app time to become live again.
During this X period, Enduser will get 503 response code for each call.
After this period is reached and if the app is not alive again, the route will be pruned during the next pruning cycle.
This will result in 404 response code.

current behviour

#GoRouter logs
{"log_level":1,"timestamp":"2022-03-29T14:58:07.588298046Z","message":"route-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN"}}
{"log_level":1,"timestamp":"2022-03-29T14:58:07.588419206Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN","backend":"10.36.64.5:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"6dc000c9-3d6a-47fa-717f-6101","server_cert_domain_san":"6dc000c9-3d6a-47fa-717f-6101","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":3,"timestamp":"2022-03-29T14:58:08.746949120Z","message":"prune-failed-endpoint","source":"vcap.gorouter.registry","data":{"route-endpoint":{"ApplicationId":"f26411c9-25d6-448b-98be-146807c39ccc","Addr":"10.36.64.5:61003","Tags":{"app_id":"f26411c9-25d6-448b-98be-146807c39ccc","app_name":"go-fail","component":"route-emitter","instance_id":"0","organization_id":"68190e66-80ae-4d98-a0ef-26a14fd4b73b","organization_name":"SAP","process_id":"f26411c9-25d6-448b-98be-146807c39ccc","process_instance_id":"6dc000c9-3d6a-47fa-717f-6101","process_type":"web","source_id":"f26411c9-25d6-448b-98be-146807c39ccc","space_id":"4846433d-63cb-4f3b-866c-835c52976933","space_name":"haproxy"},"RouteServiceUrl":""}}}
{"log_level":3,"timestamp":"2022-03-29T14:58:08.747239281Z","message":"backend-endpoint-failed","source":"vcap.gorouter","data":{"route-endpoint":{"ApplicationId":"f26411c9-25d6-448b-98be-146807c39ccc","Addr":"10.36.64.5:61003","Tags":{"app_id":"f26411c9-25d6-448b-98be-146807c39ccc","app_name":"go-fail","component":"route-emitter","instance_id":"0","organization_id":"68190e66-80ae-4d98-a0ef-26a14fd4b73b","organization_name":"SAP","process_id":"f26411c9-25d6-448b-98be-146807c39ccc","process_instance_id":"6dc000c9-3d6a-47fa-717f-6101","process_type":"web","source_id":"f26411c9-25d6-448b-98be-146807c39ccc","space_id":"4846433d-63cb-4f3b-866c-835c52976933","space_name":"haproxy"},"RouteServiceUrl":""},"error":"EOF (via idempotent request)","attempt":1,"vcap_request_id":"c4708435-0542-4988-5dbd-6fa8c48577d3","retriable":true,"num-endpoints":1}}
{"log_level":3,"timestamp":"2022-03-29T14:58:08.747287686Z","message":"select-endpoint-failed","source":"vcap.gorouter","data":{"host":"go-fail.cfapps.SOME_CF_DOMAIN","error":"No endpoints available"}}
{"log_level":1,"timestamp":"2022-03-29T14:58:10.274841920Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-29T14:58:11.794864021Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-29T14:58:13.547995540Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-29T14:58:14.239141158Z","message":"route-unregistered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN"}}
{"log_level":1,"timestamp":"2022-03-29T14:58:20.746515120Z","message":"route-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN"}}
{"log_level":1,"timestamp":"2022-03-29T14:58:20.746633170Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN","backend":"10.36.65.4:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"f230797d-774a-4757-5424-8a8d","server_cert_domain_san":"f230797d-774a-4757-5424-8a8d","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-29T14:59:04.245102535Z","message":"route-registered","source":"vcap.gorouter.registry","data":{"uri":"cf-push-monitoring-testapp-11c97d1b-f261-4ea5-6e9f-7d31.cf.SOME_CF_DOMAIN"}}
{"log_level":1,"timestamp":"2022-03-29T14:59:04.245365816Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"cf-push-monitoring-testapp-11c97d1b-f261-4ea5-6e9f-7d31.cf.SOME_CF_DOMAIN","backend":"10.36.65.4:61018","application_id":"8ee1afb2-74f2-414a-91d0-a024ccf939b2","instance_id":"51c5978d-7df0-4373-424f-f2a3","server_cert_domain_san":"51c5978d-7df0-4373-424f-f2a3","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-29T14:59:07.710947662Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"cf-push-monitoring-testapp-11c97d1b-f261-4ea5-6e9f-7d31.cf.SOME_CF_DOMAIN","backend":"10.36.65.4:61018","application_id":"8ee1afb2-74f2-414a-91d0-a024ccf939b2","instance_id":"51c5978d-7df0-4373-424f-f2a3","server_cert_domain_san":"51c5978d-7df0-4373-424f-f2a3","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-29T14:59:07.711018688Z","message":"route-unregistered","source":"vcap.gorouter.registry","data":{"uri":"cf-push-monitoring-testapp-11c97d1b-f261-4ea5-6e9f-7d31.cf.SOME_CF_DOMAIN"}}

Querying the app

➜  ~ for i in `seq 1 30`; do ; curl https://go-fail.cfapps.SOME_CF_DOMAIN/fail\?p\=50 ; sleep 1; done
(instance: 0) 56 >= 50 - OK
(instance: 0) 77 >= 50 - OK
(instance: 0) 57 >= 50 - OK
502 Bad Gateway: Registered endpoint failed to handle the request.
503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
502 Bad Gateway: Registered endpoint failed to handle the request.
503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') has no available endpoints.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.

New bahviour

# GoRouter logs
{"log_level":3,"timestamp":"2022-03-31T12:37:37.586968620Z","message":"prune-failed-endpoint","source":"vcap.gorouter.registry","data":{"route-endpoint":{"ApplicationId":"f26411c9-25d6-448b-98be-146807c39ccc","Addr":"10.36.64.5:61003","Tags":{"app_id":"f26411c9-25d6-448b-98be-146807c39ccc","app_name":"go-fail","component":"route-emitter","instance_id":"0","organization_id":"68190e66-80ae-4d98-a0ef-26a14fd4b73b","organization_name":"SAP","process_id":"f26411c9-25d6-448b-98be-146807c39ccc","process_instance_id":"6716a544-5f15-4095-7d2e-1718","process_type":"web","source_id":"f26411c9-25d6-448b-98be-146807c39ccc","space_id":"4846433d-63cb-4f3b-866c-835c52976933","space_name":"haproxy"},"RouteServiceUrl":""}}}
{"log_level":3,"timestamp":"2022-03-31T12:37:37.587187399Z","message":"backend-endpoint-failed","source":"vcap.gorouter","data":{"route-endpoint":{"ApplicationId":"f26411c9-25d6-448b-98be-146807c39ccc","Addr":"10.36.64.5:61003","Tags":{"app_id":"f26411c9-25d6-448b-98be-146807c39ccc","app_name":"go-fail","component":"route-emitter","instance_id":"0","organization_id":"68190e66-80ae-4d98-a0ef-26a14fd4b73b","organization_name":"SAP","process_id":"f26411c9-25d6-448b-98be-146807c39ccc","process_instance_id":"6716a544-5f15-4095-7d2e-1718","process_type":"web","source_id":"f26411c9-25d6-448b-98be-146807c39ccc","space_id":"4846433d-63cb-4f3b-866c-835c52976933","space_name":"haproxy"},"RouteServiceUrl":""},"error":"EOF (via idempotent request)","attempt":1,"vcap_request_id":"86ce7138-10fa-4419-6341-a5f734ae1e95","retriable":true,"num-endpoints":1}}
{"log_level":3,"timestamp":"2022-03-31T12:37:37.587222281Z","message":"select-endpoint-failed","source":"vcap.gorouter","data":{"host":"go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN","error":"No endpoints available"}}
{"log_level":1,"timestamp":"2022-03-31T12:37:38.106110243Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:37:40.630264960Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:37:43.150265595Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:37:45.678383355Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:37:48.202326449Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:37:49.948872577Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN","backend":"10.36.65.4:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"2320c055-6d64-499c-6dce-9a1a","server_cert_domain_san":"2320c055-6d64-499c-6dce-9a1a","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}

Querying the app

➜  ~  curl https://go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN/fail\?p\=99  ;for i in `seq 1 10`; do ; curl https://go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN/fail\?p\=0 ; sleep 2; done
502 Bad Gateway: Registered endpoint failed to handle the request.
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
(instance: 0) 15 >= 0 - OK

The app is stopped for any reason.

The app will show stopped but any request will receive 503 response code until X timeout is reached and pruning cycle runs.

Current behaviour

# GoRouter logs
{"log_level":2,"timestamp":"2022-03-29T14:20:20.473694009Z","message":"deprecated-semicolon-params","source":"vcap.gorouter","data":{"vcap_request_id":"e540123c-03ec-486c-4723-06b1d08992ea"}}
{"log_level":2,"timestamp":"2022-03-29T14:20:20.993599040Z","message":"deprecated-semicolon-params","source":"vcap.gorouter","data":{"vcap_request_id":"d4c13737-2d22-488c-713a-cfca80b960e8"}}
{"log_level":1,"timestamp":"2022-03-29T14:21:05.204192669Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN","backend":"10.36.65.4:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"435f3b75-6df1-448f-41f9-ab67","server_cert_domain_san":"435f3b75-6df1-448f-41f9-ab67","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-29T14:21:05.204273795Z","message":"route-unregistered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.SOME_CF_DOMAIN"}}

Querying the app

for  i in `seq 1 30`; do ; curl https://go-fail.cfapps.SOME_CF_DOMAIN/fail\?p\=0 ; sleep 1; done
(instance: 0) 98 >= 0 - OK
(instance: 0) 0 >= 0 - OK
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.SOME_CF_DOMAIN') does not exist.

New behaviour

# GoRouter logs
{"log_level":1,"timestamp":"2022-03-31T12:46:38.506370736Z","message":"route-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN"}}
{"log_level":1,"timestamp":"2022-03-31T12:46:38.506552168Z","message":"endpoint-registered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN","backend":"10.36.65.4:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"56c806c4-ce90-447d-720b-5963","server_cert_domain_san":"56c806c4-ce90-447d-720b-5963","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-31T12:46:51.890662638Z","message":"endpoint-unregistered","source":"vcap.gorouter.registry","data":{"uri":"go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN","backend":"10.36.65.4:61003","application_id":"f26411c9-25d6-448b-98be-146807c39ccc","instance_id":"56c806c4-ce90-447d-720b-5963","server_cert_domain_san":"56c806c4-ce90-447d-720b-5963","protocol":"http1","modification_tag":{"guid":"","index":0},"isolation_segment":"-","isTLS":true}}
{"log_level":1,"timestamp":"2022-03-31T12:47:19.781613831Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:22.306076941Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:24.833871833Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:27.354047268Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:29.874047945Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:32.397915519Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:34.921711488Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:37.441778017Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}
{"log_level":1,"timestamp":"2022-03-31T12:47:39.961964382Z","message":"status","source":"vcap.gorouter","data":{"body":"503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints."}}

Querying the app

➜  ~ for i in `seq 1 30`; do ; curl https://go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN/fail\?p\=0 ; sleep 2; done
(instance: 0) 80 >= 0 - OK
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
...
503 Service Unavailable: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') has no available endpoints.
404 Not Found: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') does not exist.
404 Not Found: Requested route ('go-fail.cfapps.aws-cfn02.SOME_CF_DOMAIN') does not exist.
  • I have viewed signed and have submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using scripts/run-unit-tests-in-docker from routing-release.

  • (Optional) I have run Routing Acceptance Tests and Routing Smoke Tests on bosh lite

  • (Optional) I have run CF Acceptance Tests on bosh lite

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 15, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

Comment on lines 46 to 54
} else if runtime.GOOS == "darwin" {
// no /proc on MacOS, falling back to lsof
out, err := exec.Command("/bin/sh", "-c", fmt.Sprintf("lsof -p %v", os.Getpid())).Output()
if err != nil {
f.logger.Error("error-running-lsof", zap.Error(err))
break
}
lines := strings.Split(string(out), "\n")
numFds = len(lines) - 1 //cut the table header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is gorouter on macOS a supported configuration / maintained?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit is actually part of #312 - not meant to be in this PR.
I don't think gorouter on macOS is supported for production use, but it should work at least for development.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to handle both issues in the same PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarified on the phone, no. We'll remove it from the PR.

@kecirlotfi kecirlotfi force-pushed the debug_404_errors_when_app_crashed branch 3 times, most recently from 03e8265 to c62af1d Compare March 16, 2022 14:37
@kecirlotfi kecirlotfi marked this pull request as draft March 18, 2022 09:05
@kecirlotfi kecirlotfi force-pushed the debug_404_errors_when_app_crashed branch from f4da896 to fbf60c3 Compare March 18, 2022 13:23
Comment on lines 327 to 334
if r.EmptyPoolResponseCode503 {
if t.Pool.EmptyPoolTimeout == 0 {
t.Pool.EmptyPoolTimeout = 30 * time.Second
}
if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout {
t.Snip()
}
} else {
t.Snip()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if r.EmptyPoolResponseCode503 {
if t.Pool.EmptyPoolTimeout == 0 {
t.Pool.EmptyPoolTimeout = 30 * time.Second
}
if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout {
t.Snip()
}
} else {
t.Snip()
}
if r.EmptyPoolResponseCode503 && t.Pool.EmptyPoolTimeout > 0 {
if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout {
t.Snip()
}
} else {
t.Snip()
}

There is an open issue in cf/routing-release with id cloudfoundry/routing-release#250.
It introduces a Pool timeout and Pool Updated notions used to prevent pruning empty pools before they timeout.
After the timeout is reached, the pool will be deleted and all coming calls on the related app will result in 404 respose code.
There is also a new gorouter parameter in the config (EmptyPoolTimeout) to set the timeout for pools.

Fixe yaml syntaxe typo in defining PerAppPrometheusHttpMetricsReporting in config.go
@kecirlotfi kecirlotfi force-pushed the debug_404_errors_when_app_crashed branch from 4f48a82 to 9b6e00c Compare March 18, 2022 14:13
@kecirlotfi kecirlotfi marked this pull request as ready for review March 18, 2022 14:16
kecirlotfi added a commit to kecirlotfi/routing-release that referenced this pull request Mar 22, 2022
This PR fixes cloudfoundry#250
It depends on the PR cloudfoundry/gorouter#314
A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
kecirlotfi added a commit to kecirlotfi/routing-release that referenced this pull request Mar 22, 2022
This PR fixes cloudfoundry#250
It depends on the PR cloudfoundry/gorouter#314
A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
@kecirlotfi kecirlotfi changed the title Debug 404 errors when app crashed Return 503 instead of 404 when an app is crashes Mar 30, 2022
@kecirlotfi kecirlotfi changed the title Return 503 instead of 404 when an app is crashes Return 503 instead of 404 for longer using pool timeout Mar 30, 2022
@domdom82
Copy link
Contributor

LGTM

@ameowlia ameowlia merged commit a3cc508 into cloudfoundry:main Apr 8, 2022
ameowlia pushed a commit to cloudfoundry/routing-release that referenced this pull request Apr 8, 2022
This PR fixes #250
It depends on the PR cloudfoundry/gorouter#314
A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
ameowlia pushed a commit to cloudfoundry/routing-release that referenced this pull request Apr 8, 2022
This PR fixes #250
It depends on the PR cloudfoundry/gorouter#314
A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
@ameowlia
Copy link
Member

ameowlia commented Apr 8, 2022

🎉 Thank you everyone for your hard work on this!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants