
Kong throws 404 not found on route paths that exist at times. #4055

Closed
jeremyjpj0916 opened this issue Dec 5, 2018 · 65 comments

@jeremyjpj0916
Contributor

jeremyjpj0916 commented Dec 5, 2018

Summary

We have noticed times on our Kong Gateway nodes when common endpoints the gateway exposes throw 404 route not found on a percentage of API calls. The specific proxy we focus on in this post ("/F5/status") does not route to an upstream, has no auth, and serves as a ping/uptime endpoint that returns a static 200 success. We notice this on other endpoints that have auth and plugins as well, but the frequency with which our ping endpoint gets consumed is consistent and provides the best insight.
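For context, a minimal sketch of how such a ping endpoint can be wired up through the Admin API (the service name, dummy upstream, and python-requests usage are illustrative, and the `route_id`/`config.*` form fields follow the 0.14.x Admin API; this is not our exact provisioning code):

```python
import requests

ADMIN = "http://localhost:8001"  # assumed Admin API address

# Dummy upstream; request-termination answers before anything is proxied.
requests.post(f"{ADMIN}/services", data={
    "name": "f5-status",
    "url": "http://localhost:65535",
})

route = requests.post(f"{ADMIN}/services/f5-status/routes", data={
    "paths[]": "/F5/status",
}).json()

# Terminate every request with a static 200 so the route never needs auth or an upstream.
requests.post(f"{ADMIN}/plugins", data={
    "name": "request-termination",
    "route_id": route["id"],
    "config.status_code": 200,
    "config.message": "OK",
})
```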

Steps To Reproduce

Reproducing this consistently seems impossible from our perspective at this time, but we will elaborate with as much detail and as many screenshots as we can.

  1. Create a combination of services and routes, 1 to 1. We have about 130 service/route pairs and 300 plugins; some global, some applied directly to routes (acl/auth).

  2. Add additional service + route pairs with standard plugins over an extended period of time.

  3. Our suspicion is that eventually the 404's will reveal themselves, and we have a high degree of confidence it does not happen globally across all worker processes.

A new arbitrary Service+Route was made on the gateway; the important detail to note is the created_at timestamp:
[screenshot: service created_at timestamp]

Converted to UTC:
[screenshot: timestamp converted to UTC]

The above time matches exactly with when the existing "/F5/status" route began throwing 404 route not found errors:
[screenshot: /F5/status error rate]

You can see a direct correlation between when that new service+route pair was created and when the gateway began throwing 404 not found errors on a route that exists and previously had no problems. Note the "/F5/status" endpoint takes consistent traffic at all times from health check monitors.

The interesting bit is the % of errors on this individual Kong node: we run 6 worker processes and the error rate is almost exactly what you would expect if a single worker process were impacted:
[screenshot: /F5/status 404 rate]

To describe our architecture: we run Kong with Cassandra 3.x in 2 datacenters, 1 Kong node per datacenter, and a 6-node Cassandra cluster, 3 C* nodes per datacenter. The errors only occurred in a single datacenter, on the Kong node in our examples above, but both datacenters share identical settings. We redeploy Kong on a weekly basis every Monday early AM, but the error presented above started on a Wednesday, so we can't correlate the problem to any sort of Kong startup issue.

To us the behavior points to cache rebuilding during new resource creation, based on what we can correlate. Sadly Kong's logging catches nothing of interest when we notice the issue presenting itself. Also note it obviously does not happen every time, so it's a very hard issue to nail down.

We also notice the issue correcting itself at times. We have not traced the correction to anything specific just yet, but I assume it is very likely when further services and routes are created after the errors are occurring, and what seems to be a problematic worker process has its router rebuilt again.

Other points I can make are that production has not seen this issue with identical Kong configurations and architecture. But production has fewer proxies and has new services or routes added at a much lower frequency (1-2 per week vs 20+ in non-prod).

I wonder at this time if it may also be safer for us to switch the cache TTL back from the 0 (infinity) value to some arbitrary number of hours to force a cycle on the resources. I suppose if it is indeed the cache, as we suspect, that would actually make this issue even more frequent though.

I may also write a Kong health script that grabs all routes on the gateway and calls each one by one to ensure they don't return a 404, as a daily sanity check. My biggest fear is that as production grows in size and/or services/routes get created more frequently, we may begin to see the issue present itself there as well, and that would cause dramatic impact to existing priority services if they start responding 404 because Kong does not recognize that the proxy route exists in the db and cache it appropriately.
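The sanity-check script mentioned above could look roughly like this (a minimal sketch, assuming the Admin API on :8001, the proxy on :8000, and path-based routes; only 404s are flagged since auth-protected routes would return 401/403 instead):

```python
import requests

ADMIN = "http://localhost:8001"  # assumed Admin API address
PROXY = "http://localhost:8000"  # assumed proxy listener

def all_routes():
    url = f"{ADMIN}/routes"
    while url:
        page = requests.get(url).json()
        for route in page.get("data", []):
            yield route
        nxt = page.get("next")               # Kong paginates with a "next" path
        url = f"{ADMIN}{nxt}" if nxt else None

failures = []
for route in all_routes():
    for path in route.get("paths") or []:
        if requests.get(f"{PROXY}{path}").status_code == 404:
            failures.append((route["id"], path))

for route_id, path in failures:
    print(f"404 on existing route {route_id}: {path}")
```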

Sorry I could not provide a 100% reproducible scenario for this situation; I can only go off the analytics we have. But if it uncovers some underlying bug in how Kong currently manages services and routes, fixing it would bring huge stability to the core product.

Additional Details & Logs

  • Kong version 0.14.1

  • Kong error logs -
    Kong error logs reveal nothing about the 404 not founds from Kong's perspective. Nothing gets logged during these events in terms of normal or debug execution.

  • Kong configuration (the output of a GET request to Kong's Admin port - see
    https://docs.konghq.com/latest/admin-api/#retrieve-node-information)

{
  "plugins": {
    "enabled_in_cluster": [
      "correlation-id",
      "acl",
      "kong-oidc-multi-idp",
      "oauth2",
      "rate-limiting",
      "cors",
      "jwt",
      "kong-spec-expose",
      "request-size-limiting",
      "kong-response-size-limiting",
      "request-transformer",
      "request-termination",
      "kong-error-log",
      "kong-oidc-implicit-token",
      "kong-splunk-log",
      "kong-upstream-jwt",
      "kong-cluster-drain",
      "statsd"
    ],
    "available_on_server": {
      "kong-path-based-routing": true,
      "kong-spec-expose": true,
      "kong-cluster-drain": true,
      "correlation-id": true,
      "statsd": true,
      "jwt": true,
      "cors": true,
      "kong-oidc-multi-idp": true,
      "kong-response-size-limiting": true,
      "kong-oidc-auth": true,
      "kong-upstream-jwt": true,
      "kong-error-log": true,
      "request-termination": true,
      "request-size-limiting": true,
      "acl": true,
      "rate-limiting": true,
      "kong-service-virtualization": true,
      "request-transformer": true,
      "kong-oidc-implicit-token": true,
      "kong-splunk-log": true,
      "oauth2": true
    }
  },
  "tagline": "Welcome to kong",
  "configuration": {
    "plugins": [
      "kong-error-log",
      "kong-oidc-implicit-token",
      "kong-response-size-limiting",
      "cors",
      "request-transformer",
      "kong-service-virtualization",
      "kong-cluster-drain",
      "kong-upstream-jwt",
      "kong-splunk-log",
      "kong-spec-expose",
      "kong-oidc-auth",
      "kong-path-based-routing",
      "kong-oidc-multi-idp",
      "correlation-id",
      "oauth2",
      "statsd",
      "jwt",
      "rate-limiting",
      "acl",
      "request-size-limiting",
      "request-termination"
    ],
    "admin_ssl_enabled": false,
    "lua_ssl_verify_depth": 3,
    "trusted_ips": [
      "0.0.0.0/0",
      "::/0"
    ],
    "lua_ssl_trusted_certificate": "/usr/local/kong/ssl/kongcert.pem",
    "loaded_plugins": {
      "kong-path-based-routing": true,
      "kong-spec-expose": true,
      "kong-cluster-drain": true,
      "correlation-id": true,
      "statsd": true,
      "jwt": true,
      "cors": true,
      "rate-limiting": true,
      "kong-response-size-limiting": true,
      "kong-oidc-auth": true,
      "kong-upstream-jwt": true,
      "acl": true,
      "oauth2": true,
      "kong-splunk-log": true,
      "kong-oidc-implicit-token": true,
      "kong-error-log": true,
      "kong-service-virtualization": true,
      "request-transformer": true,
      "kong-oidc-multi-idp": true,
      "request-size-limiting": true,
      "request-termination": true
    },
    "cassandra_username": "****",
    "admin_ssl_cert_csr_default": "/usr/local/kong/ssl/admin-kong-default.csr",
    "ssl_cert_key": "/usr/local/kong/ssl/kongprivatekey.key",
    "dns_resolver": {},
    "pg_user": "kong",
    "mem_cache_size": "1024m",
    "cassandra_data_centers": [
      "dc1:2",
      "dc2:3"
    ],
    "nginx_admin_directives": {},
    "cassandra_password": "******",
    "custom_plugins": {},
    "pg_host": "127.0.0.1",
    "nginx_acc_logs": "/usr/local/kong/logs/access.log",
    "proxy_listen": [
      "0.0.0.0:8000",
      "0.0.0.0:8443 ssl http2"
    ],
    "client_ssl_cert_default": "/usr/local/kong/ssl/kong-default.crt",
    "ssl_cert_key_default": "/usr/local/kong/ssl/kong-default.key",
    "dns_no_sync": false,
    "db_update_propagation": 5,
    "nginx_err_logs": "/usr/local/kong/logs/error.log",
    "cassandra_port": 9042,
    "dns_order": [
      "LAST",
      "SRV",
      "A",
      "CNAME"
    ],
    "dns_error_ttl": 1,
    "headers": [
      "latency_tokens"
    ],
    "dns_stale_ttl": 4,
    "nginx_optimizations": true,
    "database": "cassandra",
    "pg_database": "kong",
    "nginx_worker_processes": "auto",
    "lua_package_cpath": "",
    "admin_acc_logs": "/usr/local/kong/logs/admin_access.log",
    "lua_package_path": "./?.lua;./?/init.lua;",
    "nginx_pid": "/usr/local/kong/pids/nginx.pid",
    "upstream_keepalive": 120,
    "cassandra_contact_points": [
      "server8429",
      "server8431",
      "server8432",
      "server8433",
      "server8445",
      "server8428"
    ],
    "admin_access_log": "off",
    "client_ssl_cert_csr_default": "/usr/local/kong/ssl/kong-default.csr",
    "proxy_listeners": [
      {
        "ssl": false,
        "ip": "0.0.0.0",
        "proxy_protocol": false,
        "port": 8000,
        "http2": false,
        "listener": "0.0.0.0:8000"
      },
      {
        "ssl": true,
        "ip": "0.0.0.0",
        "proxy_protocol": false,
        "port": 8443,
        "http2": true,
        "listener": "0.0.0.0:8443 ssl http2"
      }
    ],
    "proxy_ssl_enabled": true,
    "proxy_access_log": "off",
    "ssl_cert_csr_default": "/usr/local/kong/ssl/kong-default.csr",
    "enabled_headers": {
      "latency_tokens": true,
      "X-Upstream-Status": false,
      "X-Proxy-Latency": true,
      "server_tokens": false,
      "Server": false,
      "Via": false,
      "X-Upstream-Latency": true
    },
    "cassandra_ssl": true,
    "cassandra_local_datacenter": "DC2",
    "db_resurrect_ttl": 30,
    "db_update_frequency": 5,
    "cassandra_consistency": "LOCAL_QUORUM",
    "client_max_body_size": "100m",
    "admin_error_log": "/dev/stderr",
    "pg_ssl_verify": false,
    "dns_not_found_ttl": 30,
    "pg_ssl": false,
    "cassandra_repl_factor": 1,
    "cassandra_lb_policy": "RequestDCAwareRoundRobin",
    "cassandra_repl_strategy": "SimpleStrategy",
    "nginx_kong_conf": "/usr/local/kong/nginx-kong.conf",
    "error_default_type": "text/plain",
    "nginx_http_directives": {},
    "real_ip_header": "X-Forwarded-For",
    "kong_env": "/usr/local/kong/.kong_env",
    "cassandra_schema_consensus_timeout": 10000,
    "dns_hostsfile": "/etc/hosts",
    "admin_listeners": [
      {
        "ssl": false,
        "ip": "0.0.0.0",
        "proxy_protocol": false,
        "port": 8001,
        "http2": false,
        "listener": "0.0.0.0:8001"
      }
    ],
    "ssl_ciphers": "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256",
    "ssl_cert": "/usr/local/kong/ssl/kongcert.crt",
    "prefix": "/usr/local/kong",
    "admin_ssl_cert_key_default": "/usr/local/kong/ssl/admin-kong-default.key",
    "cassandra_ssl_verify": true,
    "db_cache_ttl": 0,
    "ssl_cipher_suite": "modern",
    "real_ip_recursive": "on",
    "proxy_error_log": "/dev/stderr",
    "client_ssl_cert_key_default": "/usr/local/kong/ssl/kong-default.key",
    "nginx_daemon": "off",
    "anonymous_reports": false,
    "cassandra_timeout": 5000,
    "nginx_proxy_directives": {},
    "pg_port": 5432,
    "log_level": "debug",
    "client_body_buffer_size": "50m",
    "client_ssl": false,
    "lua_socket_pool_size": 30,
    "admin_ssl_cert_default": "/usr/local/kong/ssl/admin-kong-default.crt",
    "cassandra_keyspace": "kong_stage",
    "ssl_cert_default": "/usr/local/kong/ssl/kong-default.crt",
    "nginx_conf": "/usr/local/kong/nginx.conf",
    "admin_listen": [
      "0.0.0.0:8001"
    ]
  },
  "version": "0.14.1",
  "node_id": "cf0c92a9-724d-4972-baed-599785cff5ed",
  "lua_version": "LuaJIT 2.1.0-beta3",
  "prng_seeds": {
    "pid: 73": 183184219419,
    "pid: 71": 114224634222,
    "pid: 72": 213242339120,
    "pid: 70": 218221808514,
    "pid: 69": 233240145991,
    "pid: 68": 231238177547
  },
  "timers": {
    "pending": 4,
    "running": 0
  },
  "hostname": "kong-62-hxqnq"
}
  • Operating system: Kong's Docker Alpine (3.6) image from the GitHub repo
@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 7, 2018

We were able to reproduce again with predictability.

Do note: 6 worker processes.

  1. Start a heavy 100+ TPS load test on a given route path on your gateway and monitor the status codes coming off of it (use the request-termination plugin to make it return a 200 success or similar, so you get high throughput to the Kong node).

  2. We already have a decent number of services and routes in place on our gateway, so understand we are making a new service and route where 130+ already exist. Create a new service and route pair during that load test (see the sketch after this list).

  3. Watch as the route taking the load at that instant throws 404's from a certain number of worker processes. Use the correlation-id plugin globally to tag requests per worker as uuid#counter, and you will generally see that some percentage of worker processes managed to rebuild the router under that load just fine, while the other worker processes throw the Kong 404 "no route found" error during that window on the route under load.
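A hedged sketch of step 2, creating throwaway Service+Route pairs at an interval while the load test runs elsewhere (the names, dummy upstream, and 4-second cadence are placeholders, and the form fields follow the 0.14.x Admin API):

```python
import time
import requests

ADMIN = "http://localhost:8001"  # assumed Admin API address

# While an external 100+ TPS load test hammers an existing route (e.g. /F5/status),
# create a fresh Service+Route pair every few seconds to force router rebuilds.
for i in range(20):
    name = f"repro-svc-{i}"                    # hypothetical naming scheme
    requests.post(f"{ADMIN}/services", data={
        "name": name,
        "url": "http://localhost:65535",        # dummy upstream, never actually proxied to
    })
    requests.post(f"{ADMIN}/services/{name}/routes", data={
        "paths[]": f"/repro/{i}",
    })
    time.sleep(4)                               # roughly the cadence we used
```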

What does this mean? I believe Kong needs to implement a router rebuild mutex, so that as Kong figures out changes to routes it has a temporary cache of all the "existing" routes to leverage during the rebuild window (I think @p0pr0ck5 implemented a mutex like this around Service resource create/update/delete for Kong 1.0).

Some interesting trailing thoughts on this as well. If it was just the instant of rebuild that had the issue, why did we see existing routes like our /F5/status endpoint persist the 404 no route found error on certain worker processes? Maybe a timing thing, or Kong invalidly caches that bad "miss" as truth and continues to serve that miss even though the route is present in the db? I am not sure, but I believe the mutex lock would probably stop that from happening anyway.

Let us know if we can provide any further help, but I think Kong will be able to reproduce at this point!

UPDATE: Was told in gitter chat that 1.0RC3 has the router mutex in place. Will see if that resolves the described issues here.

@p0pr0ck5
Contributor

p0pr0ck5 commented Dec 7, 2018

There's some interesting behavior in this that I don't know has been discussed before. The design goal of router rebuilds is to maintain a strongly consistent environment, to avoid some worker processes sending different responses. Each worker maintains its own router, so rebuild events are broadcast through the worker_events mechanism:

https://github.com/Kong/kong/blob/next/kong/runloop/handler.lua#L418-L421

These events are handled based on a polling mechanism, since there's no direct inter-worker IPC mechanism available at this time (well, there is via a third-party module, but nothing that's provided as a first-class OpenResty primitive):

https://github.com/Kong/kong/blob/next/kong/init.lua#L432

At a given time, some worker processes may 'see' the event broadcast and act by forcing a router rebuild, and other worker processes will not yet have polled the shared zone. It's an incorrect assumption that all workers will handle an event at the exact same time. There will be some drift between workers handling events, so the behavior that you're seeing here is expected; it is in the nature of relying on a polling mechanism inside a shim to broadcast events among all workers. It has long been known that the current solution is sub-optimal for cases like this, but again, a true IPC solution requires some major under-the-hood work.

A node-wide router definition might solve some problems, but as discussed before raises other issues and concerns (memory limitations, deserialization, etc).

BTW, the router rebuild mutex would not solve this at all. The goal of that work is to avoid stampeding the database in high-concurrency traffic situations when a router rebuild is required, but it still relies on the same polling mechanism, so the race-y behavior seen here would likely still manifest.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 7, 2018

Uh oh. Well that is unfortunate; I guess the best way to solve this would be an inter-worker IPC mechanism then. Maybe in the meantime, until Kong/OpenResty can get there with the IPC, there could be a patch that in the background compares what's in the router cache vs the db and invalidates the router if mismatches are detected, maybe on a configurable timer that reconciles once a minute or so (I hate this approach so don't go with this lol)? It does not sound like a bug that can be fixed by the 1.0 release then, but do keep us posted if anything comes out of it or an elegant alternative approach takes shape in the meantime.

Edit - Another idea could be worker processes reconciling services + routes against each other at a given interval on a per-node basis, rather than evaluating against the DB. Anything to help prevent the 404's from persisting would be my biggest concern until a better solution can be found.

I think for now we will limit changes to our Kong environments in prod to once nightly and use a script we whipped up to ensure no 404's are present post change.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 10, 2018

Updates: I was talking on the Kong public gitter and mentioned that I had been unable to reproduce in dev with very few Kong Service+Route+Consumer+Plugin pairs. This held true: with fewer than, say, 10 total services and consumers I could never reproduce the error. I decided to ramp things up a tad by adding 500 Service+Route pairs, where each Route had the ACL plugin enabled with a whitelist entry of the route id, plus the OAuth2 plugin enabled. I also created a consumer for each route with an acl group tied to the route id, and gave the consumer a pair of oauth2 and jwt creds for kicks (mimics how I handle all created consumers internally). After getting those resources in place to put a little meat on Kong's bones, I then ran a 100 TPS load test against our health check endpoint. It then became easily reproducible to see the 404's once more while adding service+route pairs during the load test, so it certainly seems Kong in its current form must have a breaking limit on the number of resources the current internals can handle before things go a bit haywire. Granted, in this short time of testing in my dev environment I was not able to "persist" the 404's like I have also occasionally seen in Stage (where they keep throwing 404's until a service+route rebuild gets manually enforced), so a special sort of timing/race condition must have to occur for the 404 to persist. Each green bar spike of 404's is when service/route pairs were created manually in dev.

[screenshot: dev load test 404 spikes]
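Since a seeding script keeps coming up, here is a rough sketch of the setup described above (not the Java logic I actually use; the `route_id`, `redirect_uri`, and dotted `config.*` field names follow the 0.14.x Admin API and move to nested objects in 1.0, and the names, dummy upstream, redirect URI, and client-credentials grant are placeholders):

```python
import requests

ADMIN = "http://localhost:8001"  # assumed Admin API address

for i in range(500):
    name = f"seed-{i}"           # hypothetical naming scheme

    requests.post(f"{ADMIN}/services",
                  data={"name": name, "url": "http://localhost:65535"})
    route = requests.post(f"{ADMIN}/services/{name}/routes",
                          data={"paths[]": f"/seed/{i}"}).json()
    route_id = route["id"]

    # ACL whitelist keyed on the route id, plus OAuth2 on the route.
    requests.post(f"{ADMIN}/plugins", data={
        "name": "acl", "route_id": route_id, "config.whitelist": route_id,
    })
    requests.post(f"{ADMIN}/plugins", data={
        "name": "oauth2", "route_id": route_id,
        "config.enable_client_credentials": "true",  # an assumed grant type; any flow works for seeding
    })

    # One consumer per route, in the matching ACL group, with oauth2 + jwt creds.
    requests.post(f"{ADMIN}/consumers", data={"username": name})
    requests.post(f"{ADMIN}/consumers/{name}/acls", data={"group": route_id})
    requests.post(f"{ADMIN}/consumers/{name}/oauth2", data={
        "name": name, "redirect_uri": "https://example.com/cb",
    })
    requests.post(f"{ADMIN}/consumers/{name}/jwt", data={})
```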

@p0pr0ck5
Contributor

This almost sounds like a race condition when a new DAO iterator runs in multiple concurrent request handlers, because each page iteration results in thread-yielding network I/O. The end result would be that the request handler that writes the final version of the router cache value incorrectly assumed itself to have been completed. I haven't been able to track down exactly where that might occur, but it doesn't sound out of the realm of possibility at this point. In a wonderful showcase of hat eating, I suspect that the router rebuilding mutex in place in the release candidates actually would solve this, because only one instance of the iterator would be executing at a given time inside a worker.

@jeremyjpj0916 have you had a crack at testing this against 1.0 RC2/RC3 yet?

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 10, 2018

😄 @p0pr0ck5 don't be giving me false hope now hah. I was holding off on testing RC3 after your first assessment stated you did not believe the mutex to be a fix for this behavior, but since it sounds like you think it may help, I am pushing that testing forward to today, as I consider understanding this issue critical to our Kong deployments' stability (our production will grow in resource/proxy count and will start seeing all the problems stage presents pretty regularly at the moment). Will post back here by late afternoon/tonight with my results of that testing!

Edit - With rc3 throwing errors, I will await rc4's docker-kong git release and resume analysis then!

@p0pr0ck5
Contributor

@jeremyjpj0916 FWIW the router mutex logic made its way into RC2, so if you want to test on RC2 that would suffice for context here.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 11, 2018

@p0pr0ck5 good point, I will give that a go in an hour or so actually.

Edit - nvrmind, I just remembered rc2 is where the migrations issue occurs, where acl groups set to a route's uuid (how we do things internally) break the migrations commands (Thibaultcha patched that into the rc3 release). So guess I'll wait on rc4 still 🎅 (if y'all want to fix up RC2 or 3 with newer patches I can test that as well, but it might be better to focus on rc4 unless you consider me testing this a priority before 1.0).

Edit Edit - well look at that, 1.0rc4 is here :D, Christmas came early. If y'all get that updated here I will test it asap: https://github.com/Kong/docker-kong

Docker Kong info below:

ENV KONG_VERSION 1.0.0rc4
ENV KONG_SHA256 b65c7c01a4d523b1687898c05793aae71645e8ba0b89d59343c157f4f88db7dd

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 12, 2018

@thibaultcha @p0pr0ck5
Bad news folks, 1.0.0rc4 actually causes worse impact around this.

A 100 TPS load test against our healthcheck endpoint yielded hundreds of these in the logs on the node taking the load (and even after the load test finished, the errors continually print on every subsequent call too). Note every 4 seconds or so we added a route to an arbitrary service not currently under the load test, on the same node being load tested. We then checked the other datacenter that was not taking any traffic, and somehow it also ended up with these in its logs (albeit fewer of them; EDIT - but growing too now post-loadtest):

2018/12/12 00:35:11 [error] 145#0: *48178 [lua] responses.lua:128: before(): no router to route request (reason: error attempting to acquire build_router lock: timeout), client: 10.***.**.*, server: kong, request: "GET /F5/status HTTP/1.1", host: "gateway-dev-core-***.*****.***"

Rather than 404's the errors manifest themselves in logs as 500 now:
[screenshot: load test while modifying routes]

Gave a quick try in the terminal on the route path in the other datacenter on a separate node as well, since we noticed it went down, meaning it started failing its health checks too with 500's (we were not even load testing this specific node, nor do we have cluster-wide rate limiting as a global or anything):
[screenshot: localhost curl in other DC]

I find the individual tx log interesting: why is TotalLatency reported as 5 seconds on the dot? Any relation to KONG_CASSANDRA_TIMEOUT being set to 5000 in our environments? Note all my original config info from the admin API dump in my original post still holds true for the 1.0.0rc4 testing.

Findings are as follows: we start seeing 500's under load, and it didn't take long until those errors in stdout started printing by the hundreds and overwhelmed not only the node taking the load test traffic in one datacenter, but somehow also the node in a totally different datacenter that was not taking any traffic besides 1 ping call every 3-5 seconds (remember, the /F5/status endpoint just uses the request-termination plugin set to 200 success), which started failing as well.

Then, still 10-20 minutes after the load test, both of my nodes in either dc are unable to serve traffic on any routes; all of them throw:

{"message":"An unexpected error occurred"}

And every call to any route like the one above seems to take 5 seconds to respond with that error.

And the STDOUT logs reveal the same
no router to route request (reason: error attempting to acquire build_router lock: timeout) on each of those calls.

20 minutes later I decided to see if there was anything in the db blowing up; the locks table was empty at the time and seemed like a new table. Nothing major stood out to me at a glance.

If you need any further dumps or analysis let me know. I think this should be easily reproducible if you run a load test of 100 TPS against an endpoint and make some changes to routes on your test node. What really stumps me is how the independent node in the other dc, not even taking traffic, gets similarly rekt lol, some sort of introduced inter-node dependency??

Other things I just noticed, comparing the node taking the load test vs the node in the other dc just chilling:

2018/12/12 00:35:11 [error] 141#0
through
2018/12/12 00:35:11 [error] 146#0

vs

2018/12/12 01:33:46 [error] 119#0:
2018/12/12 01:33:46 [error] 122#0:

All 6 worker processes in the node under load are still throwing the error in stdout; in the other DC node that was not load tested, only 2 of the 6 are throwing the error.

@bungle
Member

bungle commented Dec 12, 2018

I have seen this happening when running our test suite. I have a feeling that it might be paging related.

@bungle
Member

bungle commented Dec 12, 2018

I will try something tomorrow and ping you back.

@thibaultcha
Member

When I was trying to reproduce this on the plane earlier, I hit a couple of other issues with the DAO that I am in the process of fixing. Unfortunately, traveling and other things are getting in my way these days.

Those issues that I found should be mitigated by the router mutex though.

That said, I was not able to replicate your observations Jeremy. But as I am about to board another 12h flight, who knows, maybe I will have another chance at it, or @bungle will!

Thanks for giving rc4 a spin.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 12, 2018

Looking forward to any improvements that can be made. Would love for Kong 1.0 to have the ability to be configured on the fly with no availability impact to active traffic (or persisted impact), and to do so with a few thousand proxies and consumers ideally. If I have any more findings I will post them here, or if you have any lua file changes or config changes you want me to give a go, let me know!

I have a high degree of confidence yall will get it right with some tinkering 😸 .

On a positive note, I did notice that when router changes were being made during load, the C* client connections did not jump nearly as dramatically, nor did the C* read latency, on 1.0.0rc4 compared to 0.14.1, so some improvements seem to be showing at a high level.

@p0pr0ck5
Contributor

Leaving a very quick note here as it’s late and today I spent fire-fighting elsewhere. @jeremyjpj0916 thanks immensely for all your efforts in testing this. I see a bug in the router rebuild mutex handling that was introduced after the original change was committed. I will be digging into this tomorrow.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 12, 2018

No thanks necessary, I am overjoyed with the level of engagement you all have with the community, and it's really awesome seeing the team iterate and improve on the product! Cheers to 1.0 nearing release as well as Kong's growth into 2019! Let me know when it's ready to test again and I will give it a go.

@bungle
Member

bungle commented Dec 12, 2018

I am not sure if this is another issue, but I will leave a comment here too:

Imagine you are rebuilding routes and the rebuilder is on page 3. Someone adds or deletes a route on page 1. Now the rebuilder should stop immediately and start again (I guess it does not do that?). Otherwise the rebuilder itself produces a stale router, at least for the requests it has queued, or at the very least for the one request that is currently rebuilding the router (possibly per worker).
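A toy illustration of that window (plain Python, nothing Kong-specific, all names made up): a builder pages through a route list, yielding between pages, while a writer mutates an earlier page; the resulting snapshot silently misses the change unless the build restarts or the version is invalidated again afterwards.

```python
import threading
import time

routes = [f"route-{i}" for i in range(30)]  # pretend datastore, three pages of ten
lock = threading.Lock()

def build_router(page_size=10):
    snapshot, offset = [], 0
    while True:
        with lock:
            page = routes[offset:offset + page_size]
        if not page:
            break
        snapshot.extend(page)
        offset += page_size
        time.sleep(0.05)            # stands in for the thread-yielding network I/O per page
    return snapshot

def writer():
    time.sleep(0.07)                # the builder is already past page 1...
    with lock:
        routes.insert(0, "route-new")   # ...when a new route lands on page 1

t = threading.Thread(target=writer)
t.start()
snapshot = build_router()
t.join()

print("route-new" in snapshot)      # False: the freshly built router is already stale
```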

@bungle
Member

bungle commented Dec 13, 2018

@jeremyjpj0916 there is now PR #4084; we would greatly appreciate it if you could check what effect it has on this issue.

@thibaultcha
Member

#4084 looks good. We should also fall back to rebuilding the router upon semaphore timeout (which the PR will definitely help prevent, but it is still a good safety measure to have).

That said, the original issue was in the latest stable (0.14.1), so unless the router rebuild semaphore circumvents the issue, it is likely still there... I disabled the router semaphore on 1.0rc3 and ran a number of tests, but was unable to reproduce the issue you are seeing @jeremyjpj0916 (producing a 404 - router miss - upon router rebuild when a correct Route should be matched). Can you share a script or something creating the Routes to test this against? I get that the number of Routes should be in the hundreds, but that did not seem to trigger any issue on my end, so some specific steps must be missing, like the order in which they are created, the paths to match, etc...

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 13, 2018

@bungle Just woke up; I'm on PTO today and tomorrow, but I intend to spend some time this afternoon/evening hacking those changes into the rc4 release and giving it a go, as I am super interested in it. Thanks for putting this together!

EDIT: So I am about to test here in a bit after finishing my lunch, but essentially after I pull down rc4 I am doing this in my build path (that handler.lua is the one from your PR):

COPY patches/handler.lua /usr/local/share/lua/5.1/kong/runloop/handler.lua
RUN chmod 777 /usr/local/share/lua/5.1/kong/runloop/handler.lua

Will give it a go!

@thibaultcha

I disabled the router semaphore on 1.0rc3 and ran a number of tests, but was unable to reproduce the issue you are seeing @jeremyjpj0916 (producing a 404 - router miss - upon router rebuild when a correct Route should be matched).

In the newer release candidates it presented itself with 500 level errors like my splunk log showed above.

I think I can whip up something in python that will create them all for ya, as well as something to run live during the load test. My current logic is Java but it's a mess and not easy to break out.

Could the reason you are having trouble reproducing be that your Kong node and db are all just localhost with no replication? We have to use a replication factor of 3 so all nodes in a dc hold all the data, because without it OAuth token generation has issues under load (we went through those discussions awhile back). I am curious if there is a way to add simulated latency in the 20-30ms range for reads/writes from C* (probably close to realistic for what I see under load; reads sometimes actually spike into the 100-800ms range on 0.14.1) and whether the issues in my real environments would then become apparent in your test env.

@thibaultcha
Member

In the newer release candidates it presented itself with 500 level errors like my splunk log showed above.

That is the other issue related to the router semaphore fixed by #4084. My goal was to reproduce the behaviour you were originally describing here, but on the latest 1.0 branch, on which I needed to disable the router semaphore to mimic 0.14.

I am running a 3-node local cluster, but will increase the RF to 3 and give it another shot. With QUORUM consistency I suppose.

@jeremyjpj0916
Contributor Author

jeremyjpj0916 commented Dec 13, 2018

@thibaultcha ah I see what you mean now.

@bungle just tried your patch on this version

ENV KONG_VERSION 1.0.0rc4
ENV KONG_SHA256 b65c7c01a4d523b1687898c05793aae71645e8ba0b89d59343c157f4f88db7dd

I am seeing this error in runtime:

2018/12/13 20:56:48 [error] 22#0: init_by_lua error: /usr/local/share/lua/5.1/kong/runloop/handler.lua:14: module 'kong.api_router' not found:No LuaRocks module found for kong.api_router
	no field package.preload['kong.api_router']
	no file './kong/api_router.lua'
	no file './kong/api_router/init.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router.ljbc'
	no file '/usr/local/openresty/site/lualib/kong/api_router/init.ljbc'
	no file '/usr/local/openresty/lualib/kong/api_router.ljbc'
	no file '/usr/local/openresty/lualib/kong/api_router/init.ljbc'
	no file '/usr/local/openresty/site/lualib/kong/api_router.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router/init.lua'
	no file '/usr/local/openresty/lualib/kong/api_router.lua'
	no file '/usr/local/openresty/lualib/kong/api_router/init.lua'
	no file '/usr/local/openresty/luajit/share/luajit-2.1.0-beta3/kong/api_router.lua'
	no file '/usr/local/share/lua/5.1/kong/api_router.lua'
	no file '/usr/local/share/lua/5.1/kong/api_router/init.lua'
	no file '/usr/local/openresty/luajit/share/lua/5.1/kong/api_router.lua'
	no file '/usr/local/openresty/luajit/share/lua/5.1/kong/api_router/init.lua'
	no file '/.luarocks/share/lua/5.1/kong/api_router.lua'
	no file '/.luarocks/share/lua/5.1/kong/api_router/init.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router.so'
	no file '/usr/local/openresty/lualib/kong/api_router.so'
	no file './kong/api_router.so'
	no file '/usr/local/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/openresty/luajit/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
	no file '/.luarocks/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/openresty/site/lualib/kong.so'
	no file '/usr/local/openresty/lualib/kong.so'
	no file './kong.so'
	no file '/usr/local/lib/lua/5.1/kong.so'
	no file '/usr/local/openresty/luajit/lib/lua/5.1/kong.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
	no file '/.luarocks/lib/lua/5.1/kong.so'
stack traceback:
	[C]: in function 'require'
	/usr/local/share/lua/5.1/kong/runloop/handler.lua:14: in main chunk
	[C]: in function 'require'
	/usr/local/share/lua/5.1/kong/init.lua:71: in main chunk
	[C]: in function 'require'
	init_by_lua:2: in main chunk
nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/runloop/handler.lua:14: module 'kong.api_router' not found:No LuaRocks module found for kong.api_router
	no field package.preload['kong.api_router']
	no file './kong/api_router.lua'
	no file './kong/api_router/init.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router.ljbc'
	no file '/usr/local/openresty/site/lualib/kong/api_router/init.ljbc'
	no file '/usr/local/openresty/lualib/kong/api_router.ljbc'
	no file '/usr/local/openresty/lualib/kong/api_router/init.ljbc'
	no file '/usr/local/openresty/site/lualib/kong/api_router.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router/init.lua'
	no file '/usr/local/openresty/lualib/kong/api_router.lua'
	no file '/usr/local/openresty/lualib/kong/api_router/init.lua'
	no file '/usr/local/openresty/luajit/share/luajit-2.1.0-beta3/kong/api_router.lua'
	no file '/usr/local/share/lua/5.1/kong/api_router.lua'
	no file '/usr/local/share/lua/5.1/kong/api_router/init.lua'
	no file '/usr/local/openresty/luajit/share/lua/5.1/kong/api_router.lua'
	no file '/usr/local/openresty/luajit/share/lua/5.1/kong/api_router/init.lua'
	no file '/.luarocks/share/lua/5.1/kong/api_router.lua'
	no file '/.luarocks/share/lua/5.1/kong/api_router/init.lua'
	no file '/usr/local/openresty/site/lualib/kong/api_router.so'
	no file '/usr/local/openresty/lualib/kong/api_router.so'
	no file './kong/api_router.so'
	no file '/usr/local/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/openresty/luajit/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
	no file '/.luarocks/lib/lua/5.1/kong/api_router.so'
	no file '/usr/local/openresty/site/lualib/kong.so'
	no file '/usr/local/openresty/lualib/kong.so'
	no file './kong.so'
	no file '/usr/local/lib/lua/5.1/kong.so'
	no file '/usr/local/openresty/luajit/lib/lua/5.1/kong.so'
	no file '/usr/local/lib/lua/5.1/loadall.so'
	no file '/.luarocks/lib/lua/5.1/kong.so'
stack traceback:
	[C]: in function 'require'
	/usr/local/share/lua/5.1/kong/runloop/handler.lua:14: in main chunk
	[C]: in function 'require'
	/usr/local/share/lua/5.1/kong/init.lua:71: in main chunk
	[C]: in function 'require'
init_by_lua:2: in main chunk

Does the handler you patched still include deprecated API resources? I have to test against rc4 for my dev environment.

EDIT-
Ahh yep, seems it does as I glance at the code. I will just rid the patched handler of all deprecated references in my local git and test.

@thibaultcha
Member

The patch was created on top of the next branch, which still has APIs and such, while rc4 is cut from the release branch, which has additional commits deleting APIs. So the patch must be appropriately tweaked for it to work on rc4; one would not be able to copy-paste the file as-is.

@jeremyjpj0916
Contributor Author

I ran through most of the code quickly and gutted what I believe needs to go; this should be fine as long as the underlying method calls that dropped the `, api)` passed var still take parameters in the same way:

Jeremy hacked handler:

-- Kong runloop
--
-- This consists of local_events that need to
-- be ran at the very beginning and very end of the lua-nginx-module contexts.
-- It mainly carries information related to a request from one context to the next one,
-- through the `ngx.ctx` table.
--
-- In the `access_by_lua` phase, it is responsible for retrieving the route being proxied by
-- a consumer. Then it is responsible for loading the plugins to execute on this request.
local ck          = require "resty.cookie"
local meta        = require "kong.meta"
local utils       = require "kong.tools.utils"
local Router      = require "kong.router"
local reports     = require "kong.reports"
local balancer    = require "kong.runloop.balancer"
local mesh        = require "kong.runloop.mesh"
local constants   = require "kong.constants"
local semaphore   = require "ngx.semaphore"
local responses   = require "kong.tools.responses"
local singletons  = require "kong.singletons"
local certificate = require "kong.runloop.certificate"


local kong        = kong
local tostring    = tostring
local tonumber    = tonumber
local sub         = string.sub
local lower       = string.lower
local fmt         = string.format
local sort        = table.sort
local ngx         = ngx
local log         = ngx.log
local ngx_now     = ngx.now
local update_time = ngx.update_time
local subsystem   = ngx.config.subsystem
local unpack      = unpack


local ERR         = ngx.ERR
local DEBUG       = ngx.DEBUG


local CACHE_ROUTER_OPTS = { ttl = 0 }
local EMPTY_T = {}


local get_router, build_router
local server_header = meta._SERVER_TOKENS


local build_router_semaphore


local function get_now()
  update_time()
  return ngx_now() * 1000 -- time is kept in seconds with millisecond resolution.
end

do
  -- Given a protocol, return the subsystem that handles it
  local protocol_subsystem = {
    http = "http",
    https = "http",
    tcp = "stream",
    tls = "stream",
  }

  local router
  local router_version


  build_router = function(db, version)
    local routes, i = {}, 0

    for route, err in db.routes:each(1000) do
      if err then
        return nil, "could not load routes: " .. err
      end

      local service_pk = route.service

      if not service_pk then
        return nil, "route (" .. route.id .. ") is not associated with service"
      end

      local service, err = db.services:select(service_pk)
      if not service then
        return nil, "could not find service for route (" .. route.id .. "): " ..
                    err
      end

      local stype = protocol_subsystem[service.protocol]
      if subsystem == stype then
        local r = {
          route   = route,
          service = service,
        }

        if stype == "http" and route.hosts then
          -- TODO: headers should probably be moved to route
          r.headers = {
            host = route.hosts,
          }
        end

        i = i + 1
        routes[i] = r
      end
    end

    sort(routes, function(r1, r2)
      r1, r2 = r1.route, r2.route
      if r1.regex_priority == r2.regex_priority then
        return r1.created_at < r2.created_at
      end
      return r1.regex_priority > r2.regex_priority
    end)

    local err
    router, err = Router.new(routes)
    if not router then
      return nil, "could not create router: " .. err
    end

    if version then
      router_version = version
    end

    singletons.router = router

    return true
  end


  local function check_router_rebuild()
    -- we might not need to rebuild the router (if we were not
    -- the first request in this process to enter this code path)
    -- check again and rebuild only if necessary
    local version, err = singletons.cache:get("router:version",
                                              CACHE_ROUTER_OPTS,
                                              utils.uuid)
    if err then
      log(ngx.CRIT, "could not ensure router is up to date: ", err)
      return nil, err
    end

    if version == router_version then
      return true
    end

    -- router needs to be rebuilt in this worker
    log(DEBUG, "rebuilding router")

    local ok, err = build_router(singletons.db, version)
    if not ok then
      log(ngx.CRIT, "could not rebuild router: ", err)
      return nil, err
    end

    return true
  end


  get_router = function()
    local version, err = singletons.cache:get("router:version",
                                              CACHE_ROUTER_OPTS,
                                              utils.uuid)
    if err then
      log(ngx.CRIT, "could not ensure router is up to date: ", err)
      return nil, err
    end

    if version == router_version then
      return router
    end

    -- wrap router rebuilds in a per-worker mutex (via ngx.semaphore)
    -- this prevents dogpiling the database during rebuilds in
    -- high-concurrency traffic patterns
    -- requests that arrive on this process during a router rebuild will be
    -- queued. once the semaphore resource is acquired we re-check the
    -- router version again to prevent unnecessary subsequent rebuilds

    local timeout = 60
    if singletons.configuration.database == "cassandra" then
      -- cassandra_timeout is defined in ms
      timeout = singletons.configuration.cassandra_timeout / 1000

    elseif singletons.configuration.database == "postgres" then
      -- pg_timeout is defined in ms
      timeout = singletons.configuration.pg_timeout / 1000
    end

    -- acquire lock
    local ok, err = build_router_semaphore:wait(timeout)
    if not ok then
      return nil, "error attempting to acquire build_router lock: " .. err
    end

    local pok
    pok, ok, err = pcall(check_router_rebuild)

    -- release lock
    build_router_semaphore:post(1)

    if not pok then
      return nil, ok
    end

    if not ok then
      return nil, err
    end

    return router
  end
end


local function balancer_setup_stage1(ctx, scheme, host_type, host, port,
                                     service, route)
  local balancer_data = {
    scheme         = scheme,    -- scheme for balancer: http, https
    type           = host_type, -- type of 'host': ipv4, ipv6, name
    host           = host,      -- target host per `upstream_url`
    port           = port,      -- final target port
    try_count      = 0,         -- retry counter
    tries          = {},        -- stores info per try
    ssl_ctx        = kong.default_client_ssl_ctx, -- SSL_CTX* to use
    -- ip          = nil,       -- final target IP address
    -- balancer    = nil,       -- the balancer object, if any
    -- hostname    = nil,       -- hostname of the final target IP
    -- hash_cookie = nil,       -- if Upstream sets hash_on_cookie
  }

  -- TODO: this is probably not optimal
  do
    local retries = service.retries
    if retries then
      balancer_data.retries = retries

    else
      balancer_data.retries = 5
    end

    local connect_timeout = service.connect_timeout
    
    if connect_timeout then
      balancer_data.connect_timeout = connect_timeout

    else
      balancer_data.connect_timeout = 60000
    end

    local send_timeout = service.write_timeout 
    
    if send_timeout then
      balancer_data.send_timeout = send_timeout

    else
      balancer_data.send_timeout = 60000
    end

    local read_timeout = service.read_timeout 
    
    if read_timeout then
      balancer_data.read_timeout = read_timeout

    else
      balancer_data.read_timeout = 60000
    end
  end

  -- TODO: this needs to be removed when references to ctx.api are removed
  ctx.service          = service
  ctx.route            = route
  ctx.balancer_data    = balancer_data
  ctx.balancer_address = balancer_data -- for plugin backward compatibility
end


local function balancer_setup_stage2(ctx)
  local balancer_data = ctx.balancer_data

  do -- Check for KONG_ORIGINS override
    local origin_key = balancer_data.scheme .. "://" ..
                       utils.format_host(balancer_data)
    local origin = singletons.origins[origin_key]
    if origin then
      balancer_data.scheme = origin.scheme
      balancer_data.type = origin.type
      balancer_data.host = origin.host
      balancer_data.port = origin.port
    end
  end

  local ok, err, errcode = balancer.execute(balancer_data, ctx)
  if not ok and errcode == 500 then
    err = "failed the initial dns/balancer resolve for '" ..
          balancer_data.host .. "' with: " .. tostring(err)
  end

  return ok, err, errcode
end


-- in the table below the `before` and `after` is to indicate when they run:
-- before or after the plugins
return {
  build_router     = build_router,

  init_worker = {
    before = function()
      reports.init_worker()

      -- initialize local local_events hooks
      local db             = singletons.db
      local dao            = singletons.dao
      local cache          = singletons.cache
      local worker_events  = singletons.worker_events
      local cluster_events = singletons.cluster_events


      -- get a cache key using either the old or new DAO
      local function get_cache_key(data, entity)
        if data.schema.table then
          return dao[data.schema.table]:entity_cache_key(entity)
        else
          return db[data.schema.name]:cache_key(entity)
        end
      end


      -- events dispatcher


      worker_events.register(function(data)
        if not data.schema then
          log(ngx.ERR, "[events] missing schema in crud subscriber")
          return
        end

        if not data.entity then
          log(ngx.ERR, "[events] missing entity in crud subscriber")
          return
        end

        -- invalidate this entity anywhere it is cached if it has a
        -- caching key

        local cache_key = get_cache_key(data, data.entity)

        if cache_key then
          cache:invalidate(cache_key)
        end

        -- if we had an update, but the cache key was part of what was updated,
        -- we need to invalidate the previous entity as well

        if data.old_entity then
          cache_key = get_cache_key(data, data.old_entity)
          if cache_key then
            cache:invalidate(cache_key)
          end
        end

        if not data.operation then
          log(ngx.ERR, "[events] missing operation in crud subscriber")
          return
        end

        -- public worker events propagation

        local entity_channel           = data.schema.table or data.schema.name
        local entity_operation_channel = fmt("%s:%s", entity_channel,
                                             data.operation)

        -- crud:routes
        local _, err = worker_events.post_local("crud", entity_channel, data)
        if err then
          log(ngx.ERR, "[events] could not broadcast crud event: ", err)
          return
        end

        -- crud:routes:create
        _, err = worker_events.post_local("crud", entity_operation_channel, data)
        if err then
          log(ngx.ERR, "[events] could not broadcast crud event: ", err)
          return
        end
      end, "dao:crud")


      -- local events (same worker)

      worker_events.register(function()
        log(DEBUG, "[events] Route updated, invalidating router")
        cache:invalidate("router:version")
      end, "crud", "routes")


      worker_events.register(function(data)
        if data.operation ~= "create" and
           data.operation ~= "delete"
        then
          -- no need to rebuild the router if we just added a Service
          -- since no Route is pointing to that Service yet.
          -- ditto for deletion: if a Service if being deleted, it is
          -- only allowed because no Route is pointing to it anymore.
          log(DEBUG, "[events] Service updated, invalidating router")
          cache:invalidate("router:version")
        end
      end, "crud", "services")


      worker_events.register(function(data)
        -- assume an update doesnt also change the whole entity!
        if data.operation ~= "update" then
          log(DEBUG, "[events] Plugin updated, invalidating plugins map")
          cache:invalidate("plugins_map:version")
        end
      end, "crud", "plugins")


      -- SSL certs / SNIs invalidations


      worker_events.register(function(data)
        log(DEBUG, "[events] SNI updated, invalidating cached certificates")
        local sn = data.entity

        cache:invalidate("certificates:" .. sn.name)
      end, "crud", "snis")


      worker_events.register(function(data)
        log(DEBUG, "[events] SSL cert updated, invalidating cached certificates")
        local certificate = data.entity

        for sn, err in db.snis:each_for_certificate({ id = certificate.id }) do
          if err then
            log(ERR, "[events] could not find associated snis for certificate: ",
                     err)
            break
          end

          cache:invalidate("certificates:" .. sn.name)
        end
      end, "crud", "certificates")


      -- target updates


      -- worker_events local handler: event received from DAO
      worker_events.register(function(data)
        local operation = data.operation
        local target = data.entity
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "targets", {
          operation = data.operation,
          entity = data.entity,
        })
        if not ok then
          log(ERR, "failed broadcasting target ",
              operation, " to workers: ", err)
        end
        -- => to cluster_events handler
        local key = fmt("%s:%s", operation, target.upstream.id)
        ok, err = cluster_events:broadcast("balancer:targets", key)
        if not ok then
          log(ERR, "failed broadcasting target ", operation, " to cluster: ", err)
        end
      end, "crud", "targets")


      -- worker_events node handler
      worker_events.register(function(data)
        local operation = data.operation
        local target = data.entity

        -- => to balancer update
        balancer.on_target_event(operation, target)
      end, "balancer", "targets")


      -- cluster_events handler
      cluster_events:subscribe("balancer:targets", function(data)
        local operation, key = unpack(utils.split(data, ":"))
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "targets", {
          operation = operation,
          entity = {
            upstream = { id = key },
          }
        })
        if not ok then
          log(ERR, "failed broadcasting target ", operation, " to workers: ", err)
        end
      end)


      -- manual health updates
      cluster_events:subscribe("balancer:post_health", function(data)
        local pattern = "([^|]+)|([^|]+)|([^|]+)|([^|]+)|(.*)"
        local ip, port, health, id, name = data:match(pattern)
        port = tonumber(port)
        local upstream = { id = id, name = name }
        local ok, err = balancer.post_health(upstream, ip, port, health == "1")
        if not ok then
          log(ERR, "failed posting health of ", name, " to workers: ", err)
        end
      end)


      -- upstream updates


      -- worker_events local handler: event received from DAO
      worker_events.register(function(data)
        local operation = data.operation
        local upstream = data.entity
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "upstreams", {
          operation = data.operation,
          entity = data.entity,
        })
        if not ok then
          log(ERR, "failed broadcasting upstream ",
              operation, " to workers: ", err)
        end
        -- => to cluster_events handler
        local key = fmt("%s:%s:%s", operation, upstream.id, upstream.name)
        ok, err = cluster_events:broadcast("balancer:upstreams", key)
        if not ok then
          log(ERR, "failed broadcasting upstream ", operation, " to cluster: ", err)
        end
      end, "crud", "upstreams")


      -- worker_events node handler
      worker_events.register(function(data)
        local operation = data.operation
        local upstream = data.entity

        -- => to balancer update
        balancer.on_upstream_event(operation, upstream)
      end, "balancer", "upstreams")


      cluster_events:subscribe("balancer:upstreams", function(data)
        local operation, id, name = unpack(utils.split(data, ":"))
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "upstreams", {
          operation = operation,
          entity = {
            id = id,
            name = name,
          }
        })
        if not ok then
          log(ERR, "failed broadcasting upstream ", operation, " to workers: ", err)
        end
      end)


      -- initialize balancers for active healthchecks
      ngx.timer.at(0, function()
        balancer.init()
      end)

      do
        local err

        build_router_semaphore, err = semaphore.new()
        if err then
          log(ngx.CRIT, "failed to create build_router_semaphore: ", err)
        end

        build_router_semaphore:post(1)
      end
    end
  },
  certificate = {
    before = function(_)
      certificate.execute()
    end
  },
  rewrite = {
    before = function(ctx)
      ctx.KONG_REWRITE_START = get_now()
      mesh.rewrite()
    end,
    after = function(ctx)
      ctx.KONG_REWRITE_TIME = get_now() - ctx.KONG_REWRITE_START -- time spent in Kong's rewrite_by_lua
    end
  },
  preread = {
    before = function(ctx)
      local router, err = get_router()
      if not router then
        log(ERR, "no router to route connection (reason: " .. err .. ")")
        return ngx.exit(500)
      end

      local match_t = router.exec(ngx)
      if not match_t then
        log(ERR, "no Route found with those values")
        return ngx.exit(500)
      end

      local var = ngx.var

      local ssl_termination_ctx -- OpenSSL SSL_CTX to use for termination

      local ssl_preread_alpn_protocols = var.ssl_preread_alpn_protocols
      -- ssl_preread_alpn_protocols is a comma separated list
      -- see https://trac.nginx.org/nginx/ticket/1616
      if ssl_preread_alpn_protocols and
         ssl_preread_alpn_protocols:find(mesh.get_mesh_alpn(), 1, true) then
        -- Is probably an incoming service mesh connection
        -- terminate service-mesh Mutual TLS
        ssl_termination_ctx = mesh.mesh_server_ssl_ctx
        ctx.is_service_mesh_request = true
      else
        -- TODO: stream router should decide if TLS is terminated or not
        -- XXX: for now, use presence of SNI to terminate.
        local sni = var.ssl_preread_server_name
        if sni then
          ngx.log(ngx.DEBUG, "SNI: ", sni)

          local err
          ssl_termination_ctx, err = certificate.find_certificate(sni)
          if not ssl_termination_ctx then
            ngx.log(ngx.ERR, err)
            return ngx.exit(ngx.ERROR)
          end

          -- TODO Fake certificate phase?

          ngx.log(ngx.INFO, "attempting to terminate TLS")
        end
      end

      -- Terminate TLS
      if ssl_termination_ctx and not ngx.req.starttls(ssl_termination_ctx) then -- luacheck: ignore
        -- errors are logged by nginx core
        return ngx.exit(ngx.ERROR)
      end

      ctx.KONG_PREREAD_START = get_now()

      local route = match_t.route or EMPTY_T
      local service = match_t.service or EMPTY_T
      local upstream_url_t = match_t.upstream_url_t

      balancer_setup_stage1(ctx, match_t.upstream_scheme,
                            upstream_url_t.type,
                            upstream_url_t.host,
                            upstream_url_t.port,
                            service, route)
    end,
    after = function(ctx)
      local ok, err, errcode = balancer_setup_stage2(ctx)
      if not ok then
        return responses.send(errcode, err)
      end

      local now = get_now()

      -- time spent in Kong's preread_by_lua
      ctx.KONG_PREREAD_TIME     = now - ctx.KONG_PREREAD_START
      ctx.KONG_PREREAD_ENDED_AT = now
      -- time spent in Kong before sending the request to upstream
      -- ngx.req.start_time() is kept in seconds with millisecond resolution.
      ctx.KONG_PROXY_LATENCY   = now - ngx.req.start_time() * 1000
      ctx.KONG_PROXIED         = true
    end
  },
  access = {
    before = function(ctx)
      -- ensure routers are up-to-date
      local cache = singletons.cache

      -- router for Routes/Services

      local router, err = get_router()
      if not router then
        return responses.send_HTTP_INTERNAL_SERVER_ERROR(
          "no router to route request (reason: " .. err ..  ")")
      end

      -- routing request

      local var = ngx.var

      ctx.KONG_ACCESS_START = get_now()

      local match_t = router.exec(ngx)
      if not match_t then
        return responses.send_HTTP_NOT_FOUND("no route found with those values")
      end

      local route              = match_t.route or EMPTY_T
      local service            = match_t.service or EMPTY_T
      local upstream_url_t     = match_t.upstream_url_t

      local realip_remote_addr = var.realip_remote_addr
      local forwarded_proto
      local forwarded_host
      local forwarded_port

      -- X-Forwarded-* Headers Parsing
      --
      -- We could use $proxy_add_x_forwarded_for, but it does not work properly
      -- with the realip module. The realip module overrides $remote_addr and it
      -- is okay for us to use it in case no X-Forwarded-For header was present.
      -- But in case it was given, we will append the $realip_remote_addr that
      -- contains the IP that was originally in $remote_addr before realip
      -- module overrode that (aka the client that connected us).

      local trusted_ip = kong.ip.is_trusted(realip_remote_addr)
      if trusted_ip then
        forwarded_proto = var.http_x_forwarded_proto or var.scheme
        forwarded_host  = var.http_x_forwarded_host  or var.host
        forwarded_port  = var.http_x_forwarded_port  or var.server_port

      else
        forwarded_proto = var.scheme
        forwarded_host  = var.host
        forwarded_port  = var.server_port
      end

      local protocols = route.protocols
      if (protocols and protocols.https and not protocols.http and forwarded_proto ~= "https")
      then
        ngx.header["connection"] = "Upgrade"
        ngx.header["upgrade"]    = "TLS/1.2, HTTP/1.1"
        return responses.send(426, "Please use HTTPS protocol")
      end

      balancer_setup_stage1(ctx, match_t.upstream_scheme,
                            upstream_url_t.type,
                            upstream_url_t.host,
                            upstream_url_t.port,
                            service, route)

      ctx.router_matches = match_t.matches

      -- `uri` is the URI with which to call upstream, as returned by the
      --       router, which might have truncated it (`strip_uri`).
      -- `host` is the original header to be preserved if set.
      var.upstream_scheme = match_t.upstream_scheme -- COMPAT: pdk
      var.upstream_uri    = match_t.upstream_uri
      var.upstream_host   = match_t.upstream_host

      -- Keep-Alive and WebSocket Protocol Upgrade Headers
      if var.http_upgrade and lower(var.http_upgrade) == "websocket" then
        var.upstream_connection = "upgrade"
        var.upstream_upgrade    = "websocket"

      else
        var.upstream_connection = "keep-alive"
      end

      -- X-Forwarded-* Headers
      local http_x_forwarded_for = var.http_x_forwarded_for
      if http_x_forwarded_for then
        var.upstream_x_forwarded_for = http_x_forwarded_for .. ", " ..
                                       realip_remote_addr

      else
        var.upstream_x_forwarded_for = var.remote_addr
      end

      var.upstream_x_forwarded_proto = forwarded_proto
      var.upstream_x_forwarded_host  = forwarded_host
      var.upstream_x_forwarded_port  = forwarded_port
    end,
    -- Only executed if the `router` module found a route and allows nginx to proxy it.
    after = function(ctx)
      local var = ngx.var

      do
        -- Nginx's behavior when proxying a request with an empty querystring
        -- `/foo?` is to keep `$is_args` an empty string, hence effectively
        -- stripping the empty querystring.
        -- We overcome this behavior with our own logic, to preserve user
        -- desired semantics.
        local upstream_uri = var.upstream_uri

        if var.is_args == "?" or sub(var.request_uri, -1) == "?" then
          var.upstream_uri = upstream_uri .. "?" .. (var.args or "")
        end
      end

      local balancer_data = ctx.balancer_data
      balancer_data.scheme = var.upstream_scheme -- COMPAT: pdk

      local ok, err, errcode = balancer_setup_stage2(ctx)
      if not ok then
        return responses.send(errcode, err)
      end

      var.upstream_scheme = balancer_data.scheme

      do
        -- set the upstream host header if not `preserve_host`
        local upstream_host = var.upstream_host

        if not upstream_host or upstream_host == "" then
          upstream_host = balancer_data.hostname

          local upstream_scheme = var.upstream_scheme
          if upstream_scheme == "http"  and balancer_data.port ~= 80 or
             upstream_scheme == "https" and balancer_data.port ~= 443
          then
            upstream_host = upstream_host .. ":" .. balancer_data.port
          end

          var.upstream_host = upstream_host
        end
      end

      local now = get_now()

      -- time spent in Kong's access_by_lua
      ctx.KONG_ACCESS_TIME     = now - ctx.KONG_ACCESS_START
      ctx.KONG_ACCESS_ENDED_AT = now
      -- time spent in Kong before sending the request to upstream
      -- ngx.req.start_time() is kept in seconds with millisecond resolution.
      ctx.KONG_PROXY_LATENCY   = now - ngx.req.start_time() * 1000
      ctx.KONG_PROXIED         = true
    end
  },
  balancer = {
    before = function(ctx)
      local balancer_data = ctx.balancer_data
      local current_try = balancer_data.tries[balancer_data.try_count]
      current_try.balancer_start = get_now()
    end,
    after = function(ctx)
      local balancer_data = ctx.balancer_data
      local current_try = balancer_data.tries[balancer_data.try_count]

      -- record try-latency
      local try_latency = get_now() - current_try.balancer_start
      current_try.balancer_latency = try_latency

      -- record overall latency
      ctx.KONG_BALANCER_TIME = (ctx.KONG_BALANCER_TIME or 0) + try_latency
    end
  },
  header_filter = {
    before = function(ctx)
      local header = ngx.header

      if not ctx.KONG_PROXIED then
        return
      end

      local now = get_now()
      -- time spent waiting for a response from upstream
      ctx.KONG_WAITING_TIME             = now - ctx.KONG_ACCESS_ENDED_AT
      ctx.KONG_HEADER_FILTER_STARTED_AT = now

      local upstream_status_header = constants.HEADERS.UPSTREAM_STATUS
      if singletons.configuration.enabled_headers[upstream_status_header] then
        header[upstream_status_header] = tonumber(sub(ngx.var.upstream_status or "", -3))
        if not header[upstream_status_header] then
          log(ERR, "failed to set ", upstream_status_header, " header")
        end
      end

      local hash_cookie = ctx.balancer_data.hash_cookie
      if not hash_cookie then
        return
      end

      local cookie = ck:new()
      local ok, err = cookie:set(hash_cookie)

      if not ok then
        log(ngx.WARN, "failed to set the cookie for hash-based load balancing: ", err,
                      " (key=", hash_cookie.key,
                      ", path=", hash_cookie.path, ")")
      end
    end,
    after = function(ctx)
      local header = ngx.header

      if ctx.KONG_PROXIED then
        if singletons.configuration.enabled_headers[constants.HEADERS.UPSTREAM_LATENCY] then
          header[constants.HEADERS.UPSTREAM_LATENCY] = ctx.KONG_WAITING_TIME
        end

        if singletons.configuration.enabled_headers[constants.HEADERS.PROXY_LATENCY] then
          header[constants.HEADERS.PROXY_LATENCY] = ctx.KONG_PROXY_LATENCY
        end

        if singletons.configuration.enabled_headers[constants.HEADERS.VIA] then
          header[constants.HEADERS.VIA] = server_header
        end

      else
        if singletons.configuration.enabled_headers[constants.HEADERS.SERVER] then
          header[constants.HEADERS.SERVER] = server_header

        else
          header[constants.HEADERS.SERVER] = nil
        end
      end
    end
  },
  body_filter = {
    after = function(ctx)
      if not ngx.arg[2] then
        return
      end

      local now = get_now()
      ctx.KONG_BODY_FILTER_ENDED_AT = now

      if ctx.KONG_PROXIED then
        -- time spent receiving the response (header_filter + body_filter)
        -- we could use $upstream_response_time but we need to distinguish the waiting time
        -- from the receiving time in our logging plugins (especially ALF serializer).
        ctx.KONG_RECEIVE_TIME = now - ctx.KONG_HEADER_FILTER_STARTED_AT
      end
    end
  },
  log = {
    after = function(ctx)
      reports.log()

      if not ctx.KONG_PROXIED then
        return
      end

      -- If response was produced by an upstream (ie, not by a Kong plugin)
      -- Report HTTP status for health checks
      local balancer_data = ctx.balancer_data
      if balancer_data and balancer_data.balancer and balancer_data.ip then
        local ip, port = balancer_data.ip, balancer_data.port

        local status = ngx.status
        if status == 504 then
          balancer_data.balancer.report_timeout(ip, port)
        else
          balancer_data.balancer.report_http_status(ip, port, status)
        end
      end
    end
  }
}

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 13, 2018

Aight, yep, with the above ^ dropped in, Kong can start up; we are in business folks. Time to test.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 13, 2018

@thibaultcha @bungle @p0pr0ck5

Initial results are looking quality here folks: no mutex errors thrown in STDOUT, and my load test @ 100 TPS shows no availability impact to the endpoint under load while routes are added to the router on the fly!!!! Will continue testing edge cases, as well as what it's like calling other existing routes during this load test, but this is likely the best possible outcome I could have hoped for at the moment. You guys rock.

Screenshot results @ the 100 TPS during router changes:
newfirsttestsuccess

Edit - not sure on the cpu statement, revoking it.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 13, 2018

Okay, we actually ran a test with ~100 TPS to 2 route endpoints (~200 TPS total at this point), and I started modifying routes again on the fly, adding say 20-30 to an arbitrary service not related to the ones under load (at a rate of about 1 every 2 seconds). I was still able to reproduce some 500 errors, which showed up in about 3 of the 6 worker processes, but they only persisted during the window in which I was making router changes while load-testing, whereas with the first rc4, before the @bungle patch, the errors continued long after the load test was done:

2018/12/13 22:45:01 [error] 153#0: 78094 [lua] responses.lua:128: before(): no router to route request (reason: error attempting to acquire build_router lock: timeout), client: 10..., server: kong, request: "GET /F5/status HTTP/1.1", host: "gateway-dev--.*****.com"

2018/12/13 22:45:01 [error] 153#0: 77722 [lua] responses.lua:128: before(): no router to route request (reason: error attempting to acquire build_router lock: timeout), client: 10..., server: kong, request: "GET /api/demo/test HTTP/1.1", host: "gateway-dev--.*****.com"

We will try to whip up Python scripts for building out the Kong env as well as simulating active changes under load, and attach them here.

Edit visual:
newresults500errors

Notice where the TPS of 200s dips and throughput seems impacted (it didn't seem to be in my prior 100 TPS test)? That marks when the router changes started taking place; where the throughput stabilizes is when I stopped making changes but continued the load testing. Then you can see the blip of 500s, which are all related to the build_router lock error. So it seems errors can still occur and throughput is impacted while routes are being modified. The throughput drop is not the most worrisome part for now, but the 500-level errors are still a concern from an SLA-impact standpoint.

Edit - All the 500-level errors, much like those discussed earlier from rc testing, took a TotalLatency of exactly 5 seconds.

@rsbrisci
Copy link
Contributor

rsbrisci commented Dec 14, 2018

@thibaultcha you had asked for a couple of scripts to generate routes/services and to update them, to reproduce the failures.

PythonCreateKongProxies.zip

There are two scripts in there; they are both pretty similar.

For example, to create 500 test proxies, I'd use this command:

python create_kong_proxies.py localhost:8001 500

To add 50 routes to a service, one every 2 seconds, to cause failures during an active load test, I'd use this (assuming in both cases that localhost:8001 is the root of my Admin API):

python update_proxy_routes.py localhost:8001  12345-my-service-id-54321 50
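
Roughly, the update script just POSTs new routes to the target service in a loop, one every couple of seconds. A simplified Python sketch of that idea (not the exact contents of the attached script; it assumes the requests library and the standard Admin API endpoint POST /services/{id}/routes):

# add_routes_sketch.py -- simplified illustration, not the attached script
import sys
import time
import requests

admin_api, service_id, count = sys.argv[1], sys.argv[2], int(sys.argv[3])

for i in range(count):
    # create one new route pointing at the given service
    resp = requests.post(
        "http://%s/services/%s/routes" % (admin_api, service_id),
        json={"paths": ["/loadtest/churn-%d" % i]},
    )
    print(i, resp.status_code)
    time.sleep(2)  # roughly one new route every 2 seconds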

@bungle
Copy link
Member

bungle commented Dec 14, 2018

Yes 5 seconds:

pg_timeout = 5000
cassandra_timeout = 5000

It means that requests were stuck for 5 seconds waiting for the semaphore to be released while another light thread (another request) was trying to build a new router. The router build did not finish within 5 seconds, so no router was returned.

The above scenario happens on all workers on a node.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 15, 2018

Right, so it sounds like the first patch will be the only one going into 1.0, because it still maintains worker consistency, which is a primary Kong objective (noting that in our environment we were able to reproduce 500-level errors in that state under duress above 100 TPS, which was clarified to likely be because the time it takes to iterate over our 500+ service/route pairs exceeds the 5-second C* timeout limit).

It sounds like there may be a shot at getting the best of both worlds (consistency across worker processes AND no potential for failures under any circumstance) with @thibaultcha's suggestion of an shm to track and leverage the new vs. stale router during a rebuild. It sounds like this would come later, in post-1.0 releases, maybe as a minor update.

If I were to play devil's advocate on "eventually consistent workers seems to utterly confuse our users", could the same not be said for what seems to be deemed the proper implementation? "I have 10,000 service and route resources, Kong team; I added a route and tried to call it programmatically within 2 seconds, why did it throw a 404?" The appropriate Kong response would then be, "well, the router rebuild is global and takes some time to query all those services and routes, and the worker processes had not yet agreed on the next stable router (by consensus through the shm, I suppose)." There is always going to be something to clarify to people huh 😄 .

I do think that, whatever the case going forward, an implementation that does not impact pending requests or pre-existing service/route resources has to be achieved, regardless of how poorly the db (Postgres/Cassandra) behaves. I like the idea of optimizing how services are fetched and doing local processing on the Kong node, rather than bombarding C* with individual queries for each service tied to a route (which feels slightly weird anyway, because one service can have many routes, not one route many services, BUT routes are what actually expose the service resource, so at the same time I get it).

I may personally hold onto Bungle's full patchwork to run with the 1.0 release, because I will take consistency differences of a few seconds between worker processes any day over possible failures of any kind. This originating 404 issue was probably the second scare I have had with Kong, and I'm hoping to keep my sanity given how fast-paced the project is and how much we are leveraging Kong internally. Essentially, 2019 is the year Kong goes GA internally for us and shows those around us it can actually work; we have been in a beta of sorts with a few production services. I expect 5,000 non-prod proxies total by EOY 2019 (the environment where I currently see the 404s on 0.14.1 has about 700+ proxies now), and close to 1,000 production proxies, with traffic in the 2,500-3,000 TPS range. Hopefully smooth sailing, eh (never the case in tech).

Hope yall are having a nice weekend,
-Jeremy

@thibaultcha
Copy link
Member

> I don’t think the fallback mechanism introduced in #4098 will actually help alleviate the initial symptoms reported here.

That is not what I claimed.

@thibaultcha
Copy link
Member

thibaultcha commented Dec 15, 2018

I actually said the exact same thing in my comment above, namely that the underlying issue was not resolved, and laid out clearly what each contribution solves and what further steps we could take.

@p0pr0ck5
Copy link
Contributor

@jeremyjpj0916 I've tossed up #4101 as a POC/review PR. Would you be willing to give this patch a try in your stress testing environment? It's built against the next branch and won't apply cleanly to RC4, but it should be fairly straightforward to manually transpose (let me know if you need a hand with this).

@jeremyjpj0916
Copy link
Contributor Author

@p0pr0ck5 I will give it a go on Sunday if it drops into rc4 with minimal changes needed. By the looks of it, this perf work should make the runloop more efficient in its db interactions, and maybe even keep the 500 errors away at various TPS rates while the router is being changed 😄. Fingers crossed!

@bungle
Copy link
Member

bungle commented Dec 16, 2018

@jeremyjpj0916 I consider my work now more or less done with #4102 (though I am waiting for feedback). I know @thibaultcha argued against adding a configuration parameter that is hard to describe, but I think it is really hard to have a one-size-fits-all strategy for this without going very much back to the drawing board and implementing more granular router updates.

So, all in all, I think PR #4102 covers a lot of what we have discussed:

  1. timers build the router recursively and ensure that the db state has not changed during the build
  2. all workers on a node run their own timer whenever routes or services are changed
  3. other members of the cluster are notified of changes and also update all their workers (though cluster events are a bit slower)
  4. services are cached at the node level, so every worker can access the same services, and the service cache is pre-warmed on node startup

This provides 3 modes: 1. event timer only, 2. request timer with synchronous fallback, 3. request wait with synchronous fallback.

mode 1 (the most performant):

router_timeout = -1

mode 2 (tries harder to update the router by creating a timer on request, or falling back to a synchronous rebuild on request; still very fast, no waiting unless the fallback kicks in):

router_timeout = 0

mode 3a (waits just 1 ms and then builds synchronously if the new router is not yet available):

router_timeout = 1

mode 3b (old behavior: waits at most 5 seconds and then builds synchronously if the new router is not yet available):

router_timeout = 5000

All the modes have the event timer enabled, but modes 2 and 3 also do request-time rebuilding; it is a bit of luck which one gets to run it first (request or event).

I would run either 0 or -1 in production. It never gives a 500, as it always returns some router. If you have time, you can try that branch with the different settings described above; it applies directly on top of 1.0.0rc4. I am not sure whether we can include it in the final 1.0.0 (probably not), but I hope we can include it shortly after, and that is where your feedback becomes really valuable!

Thanks.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 16, 2018

@bungle It's all great to me because of the flexibility, and as someone who somewhat understands the issue and your approach to solving it I like it, but it does sound like you may have a hard time convincing others internally because:

  1. It does introduce a new configuration option, and users lacking background may choose the wrong setting for their needs (the right default would be important here).

  2. It enables eventually consistent worker processes with mode 1 (probably the mode I would choose to run), which seems to be considered a regression because of the confusion users reported around that behavior (but maybe with @p0pr0ck5's enhancement to how Kong pulls in service resources, the time it takes for workers to reach consistency would be pretty fast; it would be interesting to quantify how long it realistically takes for, say, 100/1000/10000 proxy resources to reach consistency across the workers in an eventually consistent Kong environment; see the sketch after this list).

  3. Because maybe the best path forward is granular service/route CRUD handling. Obviously that would be a pretty big overhaul, but is it the magic bullet Kong needs to achieve the best scale/perf/stability when CRUD occurs on proxy resources (and, I suppose, it removes the need for new configs and the possibility of eventual consistency between worker processes)? I don't know exactly; maybe it would be a decent chunk of work for little gain. Although, for how quickly you whip things up, that's like 2-3 nights of pizza+beer+coding eh 😄 ?
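
Something like this rough Python sketch is what I have in mind for quantifying that convergence (hypothetical helper, assuming the requests library, a local Admin API on :8001, a proxy listener on :8000, and a placeholder service id): create a route, then poll its new path until the proxy stops answering 404 for a run of consecutive requests.

# measure_router_convergence.py -- rough sketch for timing how long a new route keeps 404ing
import time
import requests

ADMIN = "http://localhost:8001"            # assumption: local Admin API
PROXY = "http://localhost:8000"            # assumption: local proxy listener
SERVICE_ID = "12345-my-service-id-54321"   # placeholder service id

path = "/convergence-test-%d" % int(time.time())
requests.post("%s/services/%s/routes" % (ADMIN, SERVICE_ID), json={"paths": [path]})
created = time.time()

consecutive_ok = 0
while consecutive_ok < 50:  # crude stand-in for "every worker has rebuilt its router"
    status = requests.get(PROXY + path).status_code
    consecutive_ok = consecutive_ok + 1 if status != 404 else 0
    time.sleep(0.05)

print("route became consistently routable after %.2f seconds" % (time.time() - created))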

Whatever becomes of your patch work, it's been awesome for us and solves the issues we see. I still struggle to understand how no one else has come across this issue, and why it's not easy to reproduce under realistic conditions while leveraging services and routes on 0.14.1. Maybe most of the community isn't on it yet (I still see posts from people on Kong versions 3-4 major iterations behind 0.14.1). Or maybe it's because not many run a multi-DC C* setup with strict consistency, which adds a little DB latency to the mix under load. Also, with how slowly many enterprises move, I would think most enterprise customers are still on the deprecated API resources, running large Kong nodes with many proxies 🐌.

I will give your new stuff a go though; do note I will be on PTO next work week without laptop access, but I can vet it more once I am back in the saddle.

Thanks!
-Jeremy

@bungle
Copy link
Member

bungle commented Dec 17, 2018

Yes, we will see. While Robert's prefetch does speed things up, I would guess my proposal's caching is one step further, as the cache is shared with all the workers, invalidated automatically, and warmed on node start. Prefetch can also introduce small bugs, as the prefetched service may already be stale by the time the routes are looped over. And the approach I took is to update every worker in the cluster when a new router version event arrives, regardless of whether those workers receive requests, so that when they do, they already have an up-to-date router. But well...

@jeremyjpj0916
Copy link
Contributor Author

@p0pr0ck5 This is what I came up with for the cleaned-up handler, based on your next branch, which should work with rc4:

-- Kong runloop
--
-- This consists of local_events that need to
-- be ran at the very beginning and very end of the lua-nginx-module contexts.
-- It mainly carries information related to a request from one context to the next one,
-- through the `ngx.ctx` table.
--
-- In the `access_by_lua` phase, it is responsible for retrieving the route being proxied by
-- a consumer. Then it is responsible for loading the plugins to execute on this request.
local ck          = require "resty.cookie"
local meta        = require "kong.meta"
local utils       = require "kong.tools.utils"
local Router      = require "kong.router"
local reports     = require "kong.reports"
local balancer    = require "kong.runloop.balancer"
local mesh        = require "kong.runloop.mesh"
local constants   = require "kong.constants"
local semaphore   = require "ngx.semaphore"
local responses   = require "kong.tools.responses"
local singletons  = require "kong.singletons"
local certificate = require "kong.runloop.certificate"


local kong        = kong
local tostring    = tostring
local tonumber    = tonumber
local sub         = string.sub
local lower       = string.lower
local fmt         = string.format
local sort        = table.sort
local ngx         = ngx
local log         = ngx.log
local ngx_now     = ngx.now
local update_time = ngx.update_time
local subsystem   = ngx.config.subsystem
local unpack      = unpack


local ERR         = ngx.ERR
local WARN        = ngx.WARN
local DEBUG       = ngx.DEBUG


local CACHE_ROUTER_OPTS = { ttl = 0 }
local EMPTY_T = {}


local get_router, build_router
local server_header = meta._SERVER_TOKENS
local _set_check_router_rebuild

local build_router_semaphore


local function get_now()
  update_time()
  return ngx_now() * 1000 -- time is kept in seconds with millisecond resolution.
end

do
  -- Given a protocol, return the subsystem that handles it
  local protocol_subsystem = {
    http = "http",
    https = "http",
    tcp = "stream",
    tls = "stream",
  }

  local router
  local router_version


  local function gather_services(db)
    local services_t = {}

    for service, err in db.services:each(1000) do
      if err then
        return nil, "could not load services: " .. err
      end

      if not services_t[service.id] then
        services_t[service.id] = service
      end
    end

    return services_t
  end


  build_router = function(db, version)
    local routes, i = {}, 0

    -- trade space for time by pre-fetching services in an efficient
    -- manner (paging through them, using the same page size as
    -- as implemented by the Route pager below). this lets us build
    -- the Route <> Service relationship needed by the router with
    -- minimal database traffic
    local services, err = gather_services(db)
    if err then
      return nil, err
    end

    for route, err in db.routes:each(1000) do
      if err then
        return nil, "could not load routes: " .. err
      end

      local service_pk = route.service

      if not service_pk then
        return nil, "route (" .. route.id .. ") is not associated with service"
      end

      -- service_pk is a table of { id = <uuid> }
      local service = services[service_pk.id]
      if not service then
        return nil, "could not find service for route (" .. route.id .. "): " ..
                    err
      end

      local stype = protocol_subsystem[service.protocol]
      if subsystem == stype then
        local r = {
          route   = route,
          service = service,
        }

        if stype == "http" and route.hosts then
          -- TODO: headers should probably be moved to route
          r.headers = {
            host = route.hosts,
          }
        end

        i = i + 1
        routes[i] = r
      end
    end

    sort(routes, function(r1, r2)
      r1, r2 = r1.route, r2.route
      if r1.regex_priority == r2.regex_priority then
        return r1.created_at < r2.created_at
      end
      return r1.regex_priority > r2.regex_priority
    end)

    local err
    router, err = Router.new(routes)
    if not router then
      return nil, "could not create router: " .. err
    end

    if version then
      router_version = version
    end

    singletons.router = router

    return true
  end


  local function check_router_rebuild()
    -- we might not need to rebuild the router (if we were not
    -- the first request in this process to enter this code path)
    -- check again and rebuild only if necessary
    local version, err = singletons.cache:get("router:version",
                                              CACHE_ROUTER_OPTS,
                                              utils.uuid)
    if err then
      log(ngx.CRIT, "could not ensure router is up to date: ", err)
      return nil, err
    end

    if version == router_version then
      return true
    end

    -- router needs to be rebuilt in this worker
    log(DEBUG, "rebuilding router")

    local ok, err = build_router(singletons.db, version)
    if not ok then
      log(ngx.CRIT, "could not rebuild router: ", err)
      return nil, err
    end

    return true
  end


  -- for unit-testing purposes only
  _set_check_router_rebuild = function(f)
    check_router_rebuild = f
  end


  get_router = function()
    local version, err = singletons.cache:get("router:version",
                                              CACHE_ROUTER_OPTS,
                                              utils.uuid)
    if err then
      log(ngx.CRIT, "could not ensure router is up to date: ", err)
      return nil, err
    end

    if version == router_version then
      return router
    end

    -- wrap router rebuilds in a per-worker mutex (via ngx.semaphore)
    -- this prevents dogpiling the database during rebuilds in
    -- high-concurrency traffic patterns
    -- requests that arrive on this process during a router rebuild will be
    -- queued. once the semaphore resource is acquired we re-check the
    -- router version again to prevent unnecessary subsequent rebuilds

    local timeout = 60
    if singletons.configuration.database == "cassandra" then
      -- cassandra_timeout is defined in ms
      timeout = singletons.configuration.cassandra_timeout / 1000

    elseif singletons.configuration.database == "postgres" then
      -- pg_timeout is defined in ms
      timeout = singletons.configuration.pg_timeout / 1000
    end

    -- acquire lock
    local lok, err = build_router_semaphore:wait(timeout)
    if not lok then
      if err ~= "timeout" then
        return nil, "error attempting to acquire build_router lock: " .. err
      end

      log(WARN, "bypassing build_router lock: timeout")
    end

    local pok, ok, err = pcall(check_router_rebuild)

    if lok then
      -- release lock
      build_router_semaphore:post(1)
    end

    if not pok then
      return nil, ok
    end

    if not ok then
      return nil, err
    end

    return router
  end
end


local function balancer_setup_stage1(ctx, scheme, host_type, host, port,
                                     service, route)
  local balancer_data = {
    scheme         = scheme,    -- scheme for balancer: http, https
    type           = host_type, -- type of 'host': ipv4, ipv6, name
    host           = host,      -- target host per `upstream_url`
    port           = port,      -- final target port
    try_count      = 0,         -- retry counter
    tries          = {},        -- stores info per try
    ssl_ctx        = kong.default_client_ssl_ctx, -- SSL_CTX* to use
    -- ip          = nil,       -- final target IP address
    -- balancer    = nil,       -- the balancer object, if any
    -- hostname    = nil,       -- hostname of the final target IP
    -- hash_cookie = nil,       -- if Upstream sets hash_on_cookie
  }

  -- TODO: this is probably not optimal
  do
    local retries = service.retries
    if retries then
      balancer_data.retries = retries

    else
      balancer_data.retries = 5
    end

    local connect_timeout = service.connect_timeout
    if connect_timeout then
      balancer_data.connect_timeout = connect_timeout

    else
      balancer_data.connect_timeout = 60000
    end

    local send_timeout = service.write_timeout
    if send_timeout then
      balancer_data.send_timeout = send_timeout

    else
      balancer_data.send_timeout = 60000
    end

    local read_timeout = service.read_timeout
    if read_timeout then
      balancer_data.read_timeout = read_timeout

    else
      balancer_data.read_timeout = 60000
    end
  end

  ctx.service          = service
  ctx.route            = route
  ctx.balancer_data    = balancer_data
  ctx.balancer_address = balancer_data -- for plugin backward compatibility
end


local function balancer_setup_stage2(ctx)
  local balancer_data = ctx.balancer_data

  do -- Check for KONG_ORIGINS override
    local origin_key = balancer_data.scheme .. "://" ..
                       utils.format_host(balancer_data)
    local origin = singletons.origins[origin_key]
    if origin then
      balancer_data.scheme = origin.scheme
      balancer_data.type = origin.type
      balancer_data.host = origin.host
      balancer_data.port = origin.port
    end
  end

  local ok, err, errcode = balancer.execute(balancer_data, ctx)
  if not ok and errcode == 500 then
    err = "failed the initial dns/balancer resolve for '" ..
          balancer_data.host .. "' with: " .. tostring(err)
  end

  return ok, err, errcode
end


-- in the table below the `before` and `after` is to indicate when they run:
-- before or after the plugins
return {
  build_router     = build_router,

  -- exported for unit-testing purposes only
  _set_check_router_rebuild = _set_check_router_rebuild,

  init_worker = {
    before = function()
      reports.init_worker()

      -- initialize local local_events hooks
      local db             = singletons.db
      local dao            = singletons.dao
      local cache          = singletons.cache
      local worker_events  = singletons.worker_events
      local cluster_events = singletons.cluster_events


      -- get a cache key using either the old or new DAO
      local function get_cache_key(data, entity)
        if data.schema.table then
          return dao[data.schema.table]:entity_cache_key(entity)
        else
          return db[data.schema.name]:cache_key(entity)
        end
      end


      -- events dispatcher


      worker_events.register(function(data)
        if not data.schema then
          log(ngx.ERR, "[events] missing schema in crud subscriber")
          return
        end

        if not data.entity then
          log(ngx.ERR, "[events] missing entity in crud subscriber")
          return
        end

        -- invalidate this entity anywhere it is cached if it has a
        -- caching key

        local cache_key = get_cache_key(data, data.entity)

        if cache_key then
          cache:invalidate(cache_key)
        end

        -- if we had an update, but the cache key was part of what was updated,
        -- we need to invalidate the previous entity as well

        if data.old_entity then
          cache_key = get_cache_key(data, data.old_entity)
          if cache_key then
            cache:invalidate(cache_key)
          end
        end

        if not data.operation then
          log(ngx.ERR, "[events] missing operation in crud subscriber")
          return
        end

        -- public worker events propagation

        local entity_channel           = data.schema.table or data.schema.name
        local entity_operation_channel = fmt("%s:%s", entity_channel,
                                             data.operation)

        -- crud:routes
        local _, err = worker_events.post_local("crud", entity_channel, data)
        if err then
          log(ngx.ERR, "[events] could not broadcast crud event: ", err)
          return
        end

        -- crud:routes:create
        _, err = worker_events.post_local("crud", entity_operation_channel, data)
        if err then
          log(ngx.ERR, "[events] could not broadcast crud event: ", err)
          return
        end
      end, "dao:crud")


      -- local events (same worker)


      worker_events.register(function()
        log(DEBUG, "[events] Route updated, invalidating router")
        cache:invalidate("router:version")
      end, "crud", "routes")


      worker_events.register(function(data)
        if data.operation ~= "create" and
           data.operation ~= "delete"
        then
          -- no need to rebuild the router if we just added a Service
          -- since no Route is pointing to that Service yet.
          -- ditto for deletion: if a Service if being deleted, it is
          -- only allowed because no Route is pointing to it anymore.
          log(DEBUG, "[events] Service updated, invalidating router")
          cache:invalidate("router:version")
        end
      end, "crud", "services")


      worker_events.register(function(data)
        -- assume an update doesn't also change the whole entity!
        if data.operation ~= "update" then
          log(DEBUG, "[events] Plugin updated, invalidating plugins map")
          cache:invalidate("plugins_map:version")
        end
      end, "crud", "plugins")


      -- SSL certs / SNIs invalidations


      worker_events.register(function(data)
        log(DEBUG, "[events] SNI updated, invalidating cached certificates")
        local sn = data.entity

        cache:invalidate("certificates:" .. sn.name)
      end, "crud", "snis")


      worker_events.register(function(data)
        log(DEBUG, "[events] SSL cert updated, invalidating cached certificates")
        local certificate = data.entity

        for sn, err in db.snis:each_for_certificate({ id = certificate.id }) do
          if err then
            log(ERR, "[events] could not find associated snis for certificate: ",
                     err)
            break
          end

          cache:invalidate("certificates:" .. sn.name)
        end
      end, "crud", "certificates")


      -- target updates


      -- worker_events local handler: event received from DAO
      worker_events.register(function(data)
        local operation = data.operation
        local target = data.entity
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "targets", {
          operation = data.operation,
          entity = data.entity,
        })
        if not ok then
          log(ERR, "failed broadcasting target ",
              operation, " to workers: ", err)
        end
        -- => to cluster_events handler
        local key = fmt("%s:%s", operation, target.upstream.id)
        ok, err = cluster_events:broadcast("balancer:targets", key)
        if not ok then
          log(ERR, "failed broadcasting target ", operation, " to cluster: ", err)
        end
      end, "crud", "targets")


      -- worker_events node handler
      worker_events.register(function(data)
        local operation = data.operation
        local target = data.entity

        -- => to balancer update
        balancer.on_target_event(operation, target)
      end, "balancer", "targets")


      -- cluster_events handler
      cluster_events:subscribe("balancer:targets", function(data)
        local operation, key = unpack(utils.split(data, ":"))
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "targets", {
          operation = operation,
          entity = {
            upstream = { id = key },
          }
        })
        if not ok then
          log(ERR, "failed broadcasting target ", operation, " to workers: ", err)
        end
      end)


      -- manual health updates
      cluster_events:subscribe("balancer:post_health", function(data)
        local pattern = "([^|]+)|([^|]+)|([^|]+)|([^|]+)|(.*)"
        local ip, port, health, id, name = data:match(pattern)
        port = tonumber(port)
        local upstream = { id = id, name = name }
        local ok, err = balancer.post_health(upstream, ip, port, health == "1")
        if not ok then
          log(ERR, "failed posting health of ", name, " to workers: ", err)
        end
      end)


      -- upstream updates


      -- worker_events local handler: event received from DAO
      worker_events.register(function(data)
        local operation = data.operation
        local upstream = data.entity
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "upstreams", {
          operation = data.operation,
          entity = data.entity,
        })
        if not ok then
          log(ERR, "failed broadcasting upstream ",
              operation, " to workers: ", err)
        end
        -- => to cluster_events handler
        local key = fmt("%s:%s:%s", operation, upstream.id, upstream.name)
        ok, err = cluster_events:broadcast("balancer:upstreams", key)
        if not ok then
          log(ERR, "failed broadcasting upstream ", operation, " to cluster: ", err)
        end
      end, "crud", "upstreams")


      -- worker_events node handler
      worker_events.register(function(data)
        local operation = data.operation
        local upstream = data.entity

        -- => to balancer update
        balancer.on_upstream_event(operation, upstream)
      end, "balancer", "upstreams")


      cluster_events:subscribe("balancer:upstreams", function(data)
        local operation, id, name = unpack(utils.split(data, ":"))
        -- => to worker_events node handler
        local ok, err = worker_events.post("balancer", "upstreams", {
          operation = operation,
          entity = {
            id = id,
            name = name,
          }
        })
        if not ok then
          log(ERR, "failed broadcasting upstream ", operation, " to workers: ", err)
        end
      end)


      -- initialize balancers for active healthchecks
      ngx.timer.at(0, function()
        balancer.init()
      end)

      do
        local err

        build_router_semaphore, err = semaphore.new()
        if err then
          log(ngx.CRIT, "failed to create build_router_semaphore: ", err)
        end

        build_router_semaphore:post(1)
      end
    end
  },
  certificate = {
    before = function(_)
      certificate.execute()
    end
  },
  rewrite = {
    before = function(ctx)
      ctx.KONG_REWRITE_START = get_now()
      mesh.rewrite()
    end,
    after = function(ctx)
      ctx.KONG_REWRITE_TIME = get_now() - ctx.KONG_REWRITE_START -- time spent in Kong's rewrite_by_lua
    end
  },
  preread = {
    before = function(ctx)
      local router, err = get_router()
      if not router then
        log(ERR, "no router to route connection (reason: " .. err .. ")")
        return ngx.exit(500)
      end

      local match_t = router.exec(ngx)
      if not match_t then
        log(ERR, "no Route found with those values")
        return ngx.exit(500)
      end

      local var = ngx.var

      local ssl_termination_ctx -- OpenSSL SSL_CTX to use for termination

      local ssl_preread_alpn_protocols = var.ssl_preread_alpn_protocols
      -- ssl_preread_alpn_protocols is a comma separated list
      -- see https://trac.nginx.org/nginx/ticket/1616
      if ssl_preread_alpn_protocols and
         ssl_preread_alpn_protocols:find(mesh.get_mesh_alpn(), 1, true) then
        -- Is probably an incoming service mesh connection
        -- terminate service-mesh Mutual TLS
        ssl_termination_ctx = mesh.mesh_server_ssl_ctx
        ctx.is_service_mesh_request = true
      else
        -- TODO: stream router should decide if TLS is terminated or not
        -- XXX: for now, use presence of SNI to terminate.
        local sni = var.ssl_preread_server_name
        if sni then
          ngx.log(ngx.DEBUG, "SNI: ", sni)

          local err
          ssl_termination_ctx, err = certificate.find_certificate(sni)
          if not ssl_termination_ctx then
            ngx.log(ngx.ERR, err)
            return ngx.exit(ngx.ERROR)
          end

          -- TODO Fake certificate phase?

          ngx.log(ngx.INFO, "attempting to terminate TLS")
        end
      end

      -- Terminate TLS
      if ssl_termination_ctx and not ngx.req.starttls(ssl_termination_ctx) then -- luacheck: ignore
        -- errors are logged by nginx core
        return ngx.exit(ngx.ERROR)
      end

      ctx.KONG_PREREAD_START = get_now()

      local route = match_t.route or EMPTY_T
      local service = match_t.service or EMPTY_T
      local upstream_url_t = match_t.upstream_url_t

      balancer_setup_stage1(ctx, match_t.upstream_scheme,
                            upstream_url_t.type,
                            upstream_url_t.host,
                            upstream_url_t.port,
                            service, route)
    end,
    after = function(ctx)
      local ok, err, errcode = balancer_setup_stage2(ctx)
      if not ok then
        return responses.send(errcode, err)
      end

      local now = get_now()

      -- time spent in Kong's preread_by_lua
      ctx.KONG_PREREAD_TIME     = now - ctx.KONG_PREREAD_START
      ctx.KONG_PREREAD_ENDED_AT = now
      -- time spent in Kong before sending the request to upstream
      -- ngx.req.start_time() is kept in seconds with millisecond resolution.
      ctx.KONG_PROXY_LATENCY   = now - ngx.req.start_time() * 1000
      ctx.KONG_PROXIED         = true
    end
  },
  access = {
    before = function(ctx)
      -- ensure routers are up-to-date
      local cache = singletons.cache

      -- router for Routes/Services

      local router, err = get_router()
      if not router then
        return responses.send_HTTP_INTERNAL_SERVER_ERROR(
          "no router to route request (reason: " .. err ..  ")")
      end

      -- routing request

      local var = ngx.var

      ctx.KONG_ACCESS_START = get_now()

      local match_t = router.exec(ngx)
      if not match_t then
        return responses.send_HTTP_NOT_FOUND("no route found with those values")
      end

      local route              = match_t.route or EMPTY_T
      local service            = match_t.service or EMPTY_T
      local upstream_url_t     = match_t.upstream_url_t

      local realip_remote_addr = var.realip_remote_addr
      local forwarded_proto
      local forwarded_host
      local forwarded_port

      -- X-Forwarded-* Headers Parsing
      --
      -- We could use $proxy_add_x_forwarded_for, but it does not work properly
      -- with the realip module. The realip module overrides $remote_addr and it
      -- is okay for us to use it in case no X-Forwarded-For header was present.
      -- But in case it was given, we will append the $realip_remote_addr that
      -- contains the IP that was originally in $remote_addr before realip
      -- module overrode that (aka the client that connected us).

      local trusted_ip = kong.ip.is_trusted(realip_remote_addr)
      if trusted_ip then
        forwarded_proto = var.http_x_forwarded_proto or var.scheme
        forwarded_host  = var.http_x_forwarded_host  or var.host
        forwarded_port  = var.http_x_forwarded_port  or var.server_port

      else
        forwarded_proto = var.scheme
        forwarded_host  = var.host
        forwarded_port  = var.server_port
      end

      local protocols = route.protocols
      if (protocols and protocols.https and not protocols.http and forwarded_proto ~= "https") then
        ngx.header["connection"] = "Upgrade"
        ngx.header["upgrade"]    = "TLS/1.2, HTTP/1.1"
        return responses.send(426, "Please use HTTPS protocol")
      end

      balancer_setup_stage1(ctx, match_t.upstream_scheme,
                            upstream_url_t.type,
                            upstream_url_t.host,
                            upstream_url_t.port,
                            service, route)

      ctx.router_matches = match_t.matches

      -- `uri` is the URI with which to call upstream, as returned by the
      --       router, which might have truncated it (`strip_uri`).
      -- `host` is the original header to be preserved if set.
      var.upstream_scheme = match_t.upstream_scheme -- COMPAT: pdk
      var.upstream_uri    = match_t.upstream_uri
      var.upstream_host   = match_t.upstream_host

      -- Keep-Alive and WebSocket Protocol Upgrade Headers
      if var.http_upgrade and lower(var.http_upgrade) == "websocket" then
        var.upstream_connection = "upgrade"
        var.upstream_upgrade    = "websocket"

      else
        var.upstream_connection = "keep-alive"
      end

      -- X-Forwarded-* Headers
      local http_x_forwarded_for = var.http_x_forwarded_for
      if http_x_forwarded_for then
        var.upstream_x_forwarded_for = http_x_forwarded_for .. ", " ..
                                       realip_remote_addr

      else
        var.upstream_x_forwarded_for = var.remote_addr
      end

      var.upstream_x_forwarded_proto = forwarded_proto
      var.upstream_x_forwarded_host  = forwarded_host
      var.upstream_x_forwarded_port  = forwarded_port
    end,
    -- Only executed if the `router` module found a route and allows nginx to proxy it.
    after = function(ctx)
      local var = ngx.var

      do
        -- Nginx's behavior when proxying a request with an empty querystring
        -- `/foo?` is to keep `$is_args` an empty string, hence effectively
        -- stripping the empty querystring.
        -- We overcome this behavior with our own logic, to preserve user
        -- desired semantics.
        local upstream_uri = var.upstream_uri

        if var.is_args == "?" or sub(var.request_uri, -1) == "?" then
          var.upstream_uri = upstream_uri .. "?" .. (var.args or "")
        end
      end

      local balancer_data = ctx.balancer_data
      balancer_data.scheme = var.upstream_scheme -- COMPAT: pdk

      local ok, err, errcode = balancer_setup_stage2(ctx)
      if not ok then
        return responses.send(errcode, err)
      end

      var.upstream_scheme = balancer_data.scheme

      do
        -- set the upstream host header if not `preserve_host`
        local upstream_host = var.upstream_host

        if not upstream_host or upstream_host == "" then
          upstream_host = balancer_data.hostname

          local upstream_scheme = var.upstream_scheme
          if upstream_scheme == "http"  and balancer_data.port ~= 80 or
             upstream_scheme == "https" and balancer_data.port ~= 443
          then
            upstream_host = upstream_host .. ":" .. balancer_data.port
          end

          var.upstream_host = upstream_host
        end
      end

      local now = get_now()

      -- time spent in Kong's access_by_lua
      ctx.KONG_ACCESS_TIME     = now - ctx.KONG_ACCESS_START
      ctx.KONG_ACCESS_ENDED_AT = now
      -- time spent in Kong before sending the request to upstream
      -- ngx.req.start_time() is kept in seconds with millisecond resolution.
      ctx.KONG_PROXY_LATENCY   = now - ngx.req.start_time() * 1000
      ctx.KONG_PROXIED         = true
    end
  },
  balancer = {
    before = function(ctx)
      local balancer_data = ctx.balancer_data
      local current_try = balancer_data.tries[balancer_data.try_count]
      current_try.balancer_start = get_now()
    end,
    after = function(ctx)
      local balancer_data = ctx.balancer_data
      local current_try = balancer_data.tries[balancer_data.try_count]

      -- record try-latency
      local try_latency = get_now() - current_try.balancer_start
      current_try.balancer_latency = try_latency

      -- record overall latency
      ctx.KONG_BALANCER_TIME = (ctx.KONG_BALANCER_TIME or 0) + try_latency
    end
  },
  header_filter = {
    before = function(ctx)
      local header = ngx.header

      if not ctx.KONG_PROXIED then
        return
      end

      local now = get_now()
      -- time spent waiting for a response from upstream
      ctx.KONG_WAITING_TIME             = now - ctx.KONG_ACCESS_ENDED_AT
      ctx.KONG_HEADER_FILTER_STARTED_AT = now

      local upstream_status_header = constants.HEADERS.UPSTREAM_STATUS
      if singletons.configuration.enabled_headers[upstream_status_header] then
        header[upstream_status_header] = tonumber(sub(ngx.var.upstream_status or "", -3))
        if not header[upstream_status_header] then
          log(ERR, "failed to set ", upstream_status_header, " header")
        end
      end

      local hash_cookie = ctx.balancer_data.hash_cookie
      if not hash_cookie then
        return
      end

      local cookie = ck:new()
      local ok, err = cookie:set(hash_cookie)

      if not ok then
        log(ngx.WARN, "failed to set the cookie for hash-based load balancing: ", err,
                      " (key=", hash_cookie.key,
                      ", path=", hash_cookie.path, ")")
      end
    end,
    after = function(ctx)
      local header = ngx.header

      if ctx.KONG_PROXIED then
        if singletons.configuration.enabled_headers[constants.HEADERS.UPSTREAM_LATENCY] then
          header[constants.HEADERS.UPSTREAM_LATENCY] = ctx.KONG_WAITING_TIME
        end

        if singletons.configuration.enabled_headers[constants.HEADERS.PROXY_LATENCY] then
          header[constants.HEADERS.PROXY_LATENCY] = ctx.KONG_PROXY_LATENCY
        end

        if singletons.configuration.enabled_headers[constants.HEADERS.VIA] then
          header[constants.HEADERS.VIA] = server_header
        end

      else
        if singletons.configuration.enabled_headers[constants.HEADERS.SERVER] then
          header[constants.HEADERS.SERVER] = server_header

        else
          header[constants.HEADERS.SERVER] = nil
        end
      end
    end
  },
  body_filter = {
    after = function(ctx)
      if not ngx.arg[2] then
        return
      end

      local now = get_now()
      ctx.KONG_BODY_FILTER_ENDED_AT = now

      if ctx.KONG_PROXIED then
        -- time spent receiving the response (header_filter + body_filter)
        -- we could use $upstream_response_time but we need to distinguish the waiting time
        -- from the receiving time in our logging plugins (especially ALF serializer).
        ctx.KONG_RECEIVE_TIME = now - ctx.KONG_HEADER_FILTER_STARTED_AT
      end
    end
  },
  log = {
    after = function(ctx)
      reports.log()

      if not ctx.KONG_PROXIED then
        return
      end

      -- If response was produced by an upstream (ie, not by a Kong plugin)
      -- Report HTTP status for health checks
      local balancer_data = ctx.balancer_data
      if balancer_data and balancer_data.balancer and balancer_data.ip then
        local ip, port = balancer_data.ip, balancer_data.port

        local status = ngx.status
        if status == 504 then
          balancer_data.balancer.report_timeout(ip, port)
        else
          balancer_data.balancer.report_http_status(ip, port, status)
        end
      end
    end
  }
}

Will try dropping in just this changed handler, give it a quick stress test, and edit with results.

@bungle
Copy link
Member

bungle commented Dec 17, 2018

One thing about CPU is that my approach can cause a spike, as all workers are updated around the same time, and other nodes as soon as they get the cluster event (already configurable), or earlier if a request starts a timer (timeout 0) or builds the router synchronously (timeout > 0). But it also makes the spike much shorter (e.g. after an update to a route or service, your whole cluster is fully cached and at full speed across all workers in a matter of seconds), while with the old approach you couldn't tell when the whole cluster and all the workers had been updated.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Dec 17, 2018

@p0pr0ck5 I gave your logic a go; I was only able to do a quick test tonight before my trip. With just the handler change above, including your new service-fetching change, I could not produce 500-level errors either, nor did it seem to cause any throughput degradation besides maybe one slight dip at the tail end of the calls. I ran this at 240 TPS and created 30 routes on the fly, one every 2 seconds, with the Python script offered above. So no extensive testing, but initial results seem positive.

I like the idea of bombarding the db with fewer queries around services, so I wonder if this could somehow get mixed into the fray with all of @bungle's changes (granted they reach approval). Anyways, I gotta head out; that's all the testing I can do for now!

Hope all at Kong enjoy the holidays!
-Jeremy

@bungle
Copy link
Member

bungle commented Dec 17, 2018

Jeremy, my PR already does "the same", and because of caching it does not even do the prefetch. It might occasionally make one or two single-row queries to the db by primary key. Even that could be pre-warmed, but I don't think it is necessary.

@jeremyjpj0916
Copy link
Contributor Author

@bungle Oh, even better! Querying services 1000 at a time instead of one at a time was what I thought could be beneficial, and based on what you said it sounds like that is indeed the case, aside from special circumstances that require singular lookups, which are few in number. Thanks for all the good work!

Other ideas floating in my head: are there any failure scenarios where the new router build could not reach completion, in which case Kong would need to err-log to inform users that a stale router has remained in use for an excessive amount of time? I suppose if it's db-connectivity-related, that would already be visible through the existing safeguards; I'm just wondering whether anything is needed on top of that. You probably already considered all that though 😄.
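
Something like the following is the kind of safeguard I have in mind (purely illustrative; the version tracking, threshold and names are all made up):

local STALE_WARN_SECONDS = 60 -- made-up threshold
local last_successful_rebuild = ngx.now()

local function warn_if_router_stale(current_version, built_version)
  if current_version ~= built_version
     and ngx.now() - last_successful_rebuild > STALE_WARN_SECONDS
  then
    ngx.log(ngx.WARN, "router has been stale for over ", STALE_WARN_SECONDS,
            " seconds; requests are still being served with the old router")
  end
end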

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 10, 2019

Edit - no need anymore.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 11, 2019

Edit - No need anymore.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 13, 2019

@bungle QQ: I have been testing the pending PR #4102 on 1.0 tonight with high throughput on the -1 setting and it's been great so far; service/route additions and deletions are not causing any impact to the gateway. In what situation does "(if timer creation fails, new router will not be build, but you can trigger a new try with modifying routes or services)" occur? Are there steps that could be taken in Nginx/OpenResty/Kong to keep the likelihood of timer creation success very high? Also, when timer creation does fail, will Kong/Nginx log something to stderr so the problem isn't hidden?
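
For context on the timer questions, OpenResty surfaces timer-creation failures through the return values of ngx.timer.at, and the limits that usually cause them are the lua_max_pending_timers / lua_max_running_timers nginx directives. A sketch of the kind of handling I'd hope for (not necessarily what PR #4102 actually does; rebuild_router is a placeholder):

local ok, err = ngx.timer.at(0, rebuild_router)
if not ok then
  -- typical errors here are "too many pending timers" (tunable via the
  -- lua_max_pending_timers directive) or "process exiting" during shutdown
  ngx.log(ngx.CRIT, "failed to create router rebuild timer: ", err,
          "; the old router stays in use until the next change event")
end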

@jeremyjpj0916
Copy link
Contributor Author

Closing as a dup of #4194 at this point.

@UnixBoy1
Copy link

UnixBoy1 commented Sep 5, 2019

@here was this fix implemented in the older versions, 0.12.X or 0.15.X?

@jeremyjpj0916
Copy link
Contributor Author

@UnixBoy1 I think a more minimal form of it made it into 1.2 as the first iteration of eventual consistency. It was not added to earlier versions.
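
For anyone hunting for it, the setting is the router consistency option (as far as I recall), e.g. in kong.conf or via the matching environment variable:

router_consistency = eventual        # kong.conf
KONG_ROUTER_CONSISTENCY=eventual     # or as an environment variable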

@jmgpeeters
Copy link

jmgpeeters commented Sep 18, 2019

We just saw something quite similar on our Kong 1.2, i.e. intermittent 404's whilst it was under heavy load. We run in DB-less mode.

The discussion threads are a bit hard to follow - is this potentially similar to what was going on here (I see various references to C*, which we no longer use), and is it meant to be fixed in a more recent Kong version?

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Sep 18, 2019

@jmgpeeters have you tried the eventual consistency option? Although I think the 404's would be less impactful in DB-less mode (rebuilds should be faster). In what scenario do you experience the 404's?

@jmgpeeters
Copy link

I haven't tried the eventual consistency option. Is there any documentation on it? I'm a bit confused as to how DB-less (i.e. no communication between Kong nodes required) could lead to consistency issues. Isn't everything just immediately consistent?

The scenario was that we were experiencing very heavy load on Kong for a brief period, which led to a small number of intermittent 404's (not able to match a route) being thrown back to users. After the load was reduced, these 404's went away - so it does sound quite similar to what is being described here.

We're actually quite OK with Kong shedding load through some error code if it fails to keep up, but perhaps 404 isn't optimal. I suppose that is just a side-effect of how Kong is implemented, but if the condition can be detected, then maybe a 503 or similar would be more appropriate.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Sep 19, 2019

@jmgpeeters my issue was around 404's that occurred when modifications to routes were triggering router rebuilds (which gets into the complexities of in-memory changes at runtime). Your issue is slightly different: you are seeing 404's just under heavy load, not when new proxies are being added to or deleted from Kong. I would consider it a totally separate issue since it only appears at high traffic volume.

@jmgpeeters
Copy link

Yes, that makes sense.
