
bug: Passive healthcheck can't disable #3304

Closed
mengskysama opened this issue Mar 16, 2018 · 8 comments · Fixed by #3319

Comments

@mengskysama
Contributor

mengskysama commented Mar 16, 2018

Summary

Upstream passive healthcheck can't be disabled.

Steps To Reproduce

  1. Configure the upstream to disable all health checks, as described in https://getkong.org/docs/0.12.x/health-checks-circuit-breakers/:

{
    "created_at": 1521134937862,
    "hash_on": "none",
    "id": "35adaf0b-3cd3-4271-bec0-617f1b948146",
    "healthchecks": {
        "active": {
            "unhealthy": {
                "http_statuses": [
                    429,
                    404,
                    500,
                    501,
                    502,
                    503,
                    504,
                    505
                ],
                "tcp_failures": 0,
                "timeouts": 0,
                "http_failures": 0,
                "interval": 0
            },
            "http_path": "/",
            "healthy": {
                "http_statuses": [
                    200,
                    302
                ],
                "interval": 0,
                "successes": 0
            },
            "timeout": 1,
            "concurrency": 10
        },
        "passive": {
            "unhealthy": {
                "http_failures": 0,
                "http_statuses": [
                    429,
                    500,
                    503
                ],
                "tcp_failures": 0,
                "timeouts": 0
            },
            "healthy": {
                "successes": 0,
                "http_statuses": [
                    200,
                    201,
                    202,
                    203,
                    204,
                    205,
                    206,
                    207,
                    208,
                    226,
                    300,
                    301,
                    302,
                    303,
                    304,
                    305,
                    306,
                    307,
                    308
                ]
            }
        }
    },
    "name": "ip_qingting_fm",
    "hash_fallback": "none",
    "slots": 100
}
  2. Stop the upstream service.
  3. Send requests:
curl  x -H "Host: x.fm"
The upstream server is timing out
curl  x -H "Host: x.fm"
The upstream server is timing out
curl  x -H "Host: x.fm"
{"message":"failure to get a peer from the ring-balancer"}
  4. Resume the upstream service... but:
curl  x -H "Host: x.fm"
{"message":"failure to get a peer from the ring-balancer"}
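The configuration in step 1 is meant to disable every check by zeroing all intervals and success/failure counters. As a reference, a small helper that builds such a healthchecks table (my own sketch mirroring the JSON above, not Kong code) could look like:

```python
def disabled_healthchecks():
    """Build a Kong 0.12.x-style healthchecks table with every check
    disabled: all intervals and success/failure counters are 0,
    mirroring the JSON configuration shown above."""
    return {
        "active": {
            "healthy": {"interval": 0, "successes": 0},
            "unhealthy": {"interval": 0, "tcp_failures": 0,
                          "timeouts": 0, "http_failures": 0},
        },
        "passive": {
            "healthy": {"successes": 0},
            "unhealthy": {"tcp_failures": 0, "timeouts": 0,
                          "http_failures": 0},
        },
    }
```

With every counter at 0, no amount of failed requests should ever eject a target from the ring-balancer.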

Additional Details & Logs

Kong version (0.12.3)

2018/03/16 06:16:40 [warn] 922#0: *4158563 [lua] healthcheck.lua:957: log(): [healthcheck] (ip_qingting_fm) unhealthy TCP increment (4/2) for 192.168.50.132:80 while connecting to upstream, client: 60.25.21.33, server: kong, request: "GET /ip HTTP/1.1", upstream: "http://192.168.50.132:80/ip", host: "x.fm", referrer: "httpsies/3617?appversion=6.2.6"
2018/03/16 06:16:40 [error] 922#0: *4158563 [lua] init.lua:319: balancer(): failed to retry the dns/balancer resolver for ip_qingting_fm' with: failure to get a peer from the ring-balancer while connecting to upstream, client: 60.25.21.33, server: kong, request: "GET /ip HTTP/1.1", upstream: "http://192.168.50.132:80/ip", host: "x.fm", referrer: "httpsies/3617?appversion=6.2.6"
@mengskysama
Contributor Author

btw, in https://getkong.org/docs/0.12.x/health-checks-circuit-breakers/ the target-enable URL http://localhost:8001/upstream/my_upstream/targets/10.1.2.3:1234/healthy should be http://localhost:8001/upstreams/my_upstream/targets/10.1.2.3:1234/healthy

@thibaultcha
Member

@mengskysama Thanks for the report.

Pushed Kong/docs.konghq.com#627 for now - we will look at your PR soon!

thibaultcha added a commit to Kong/docs.konghq.com that referenced this issue Mar 17, 2018
@hishamhm
Contributor

@mengskysama I left a comment in your PR!

@thibaultcha thank you for the doc fix!

@mengskysama
Contributor Author

mengskysama commented Mar 20, 2018

@hishamhm Thanks for your reply!

Passive health check can be disabled on a single instance, but not in a cluster.

Here is some code that helps reproduce the problem.

Modify these to match your setup:

cluster_admin_api = 'http://192.168.50.135:8001'
# same instance as cluster_admin_api
cluster0_api = 'http://192.168.50.135'
# different instance
cluster1_api = 'http://192.168.50.136'

Delete all of upstream and api before run

import requests
import json
import time


s = requests.session()
s.headers['Content-Type'] = 'application/json'

cluster_admin_api = 'http://192.168.50.135:8001'
# same instance as cluster_admin_api
cluster0_api = 'http://192.168.50.135'
# different instance
cluster1_api = 'http://192.168.50.136'


def init_kong():
    # create api
    data = {
        "strip_uri": True,
        "hosts": [
            "api.test.com"
        ],
        "name": "test",
        "methods": [
            "GET",
            "HEAD"
        ],
        "http_if_terminated": True,
        "https_only": False,
        "retries": 1,
        "uris": [
            "/"
        ],
        "preserve_host": False,
        "upstream_connect_timeout": 1000,
        "upstream_read_timeout": 3000,
        "upstream_send_timeout": 3000,
        "upstream_url": "http://192.168.50.132"
    }
    r = s.post('%s/apis' % cluster_admin_api, data=json.dumps(data)).json()
    return r['id']


def update_upstream(api_id):
    # upstream
    data = {
        "name": "test"
    }
    ret = s.post('%s/upstreams' % cluster_admin_api, data=json.dumps(data)).json()
    upstream_id = ret['id']

    # add target
    data = {
        "target": "192.168.50.132:80"
    }
    s.post('%s/upstreams/%s/targets' % (cluster_admin_api, upstream_id), data=json.dumps(data))

    # update upstream
    data = {
        "id": api_id,
        "created_at": int(time.time()),
        "strip_uri": True,
        "hosts": [
            "api.test.com"
        ],
        "name": "test",
        "methods": [
            "GET",
            "HEAD"
        ],
        "http_if_terminated": True,
        "https_only": False,
        "retries": 1,
        "uris": [
            "/"
        ],
        "preserve_host": False,
        "upstream_connect_timeout": 1000,
        "upstream_read_timeout": 3000,
        "upstream_send_timeout": 3000,
        "upstream_url": "http://test",
    }

    s.put('%s/apis' % cluster_admin_api, data=json.dumps(data))


api_id = init_kong()
print('created api %s' % api_id)
update_upstream(api_id)

# wait for Kong to pull the configuration from the DB
time.sleep(10)

for i in range(10):
    r = s.get(cluster0_api, headers={'Host': 'api.test.com'})
    print("cluster0_api ret = %s" % r.text)
    r = s.get(cluster1_api, headers={'Host': 'api.test.com'})
    print("cluster1_api ret = %s" % r.text)

output

cluster0_api ret = The upstream server is timing out
cluster1_api ret = The upstream server is timing out
cluster0_api ret = The upstream server is timing out
cluster1_api ret = The upstream server is currently unavailable
cluster0_api ret = The upstream server is timing out
cluster1_api ret = {"message":"failure to get a peer from the ring-balancer"}
cluster0_api ret = The upstream server is timing out
cluster1_api ret = {"message":"failure to get a peer from the ring-balancer"}
cluster0_api ret = The upstream server is timing out
cluster1_api ret = {"message":"failure to get a peer from the ring-balancer"}
cluster0_api ret = The upstream server is timing out
cluster1_api ret = {"message":"failure to get a peer from the ring-balancer"}
Please let me know if you can reproduce this problem.

@hishamhm
Contributor

@mengskysama Thank you for the test case! I will try to reproduce it.

hishamhm added a commit that referenced this issue Mar 20, 2018
In the upstream event handler, `create_balancer` was being called
with the object received via the event, which contains "id" and
"name" only, and not the entire entity table containing the rest
of the upstream fields. This caused it to create a healthchecker with
an empty configuration (ignoring the user's configuration),
which then fell back into the lua-resty-healthcheck defaults.

This fix obtains the proper entity object from the id and passes
it to `create_balancer`.

A regression test is included, which spawns two Kong instances
and reproduces the error scenario described by @mengskysama.

Fixes #3304.
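In other words, the event handler received only a partial table, so the user's healthchecker configuration was silently dropped in favor of library defaults. A minimal Python illustration of this failure mode (names and values are mine, not Kong's):

```python
# Stand-in for the lua-resty-healthcheck defaults.
LIBRARY_DEFAULTS = {"passive": {"unhealthy": {"tcp_failures": 2}}}

# Full upstream entity as stored in the database.
DB = {
    "35adaf0b": {
        "id": "35adaf0b",
        "name": "test",
        "healthchecks": {"passive": {"unhealthy": {"tcp_failures": 0}}},
    }
}

def create_balancer(upstream):
    # Falls back to library defaults when "healthchecks" is absent.
    return upstream.get("healthchecks", LIBRARY_DEFAULTS)

# The event payload carries only "id" and "name" -- no "healthchecks".
event = {"id": "35adaf0b", "name": "test"}

buggy = create_balancer(event)            # defaults win: limit 2
fixed = create_balancer(DB[event["id"]])  # full entity: configured 0 wins
```

The fix corresponds to the second call: look up the complete entity by id before building the balancer, so the configured values survive.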
@hishamhm
Contributor

@mengskysama I reproduced your test case and submitted a PR with the fix! Thank you once again! If possible, please download the PR branch (fix/upstream-event) and let me know your results.

thibaultcha pushed a commit that referenced this issue Mar 20, 2018
@mengskysama
Contributor Author

mengskysama commented Mar 21, 2018

@hishamhm Great! I double-checked that fix/upstream-event fixes it perfectly 👍
I will spend some time understanding the balancer module logic.

@hishamhm
Contributor

hishamhm commented Mar 21, 2018 via email

hishamhm added a commit that referenced this issue May 11, 2018
The regression test for issue #3304 was flaky because it launched two
Kong nodes and waited for the second one to be ready by reading the
logs. This is not a reliable way of determining whether a node is
immediately ready to proxy a configured route.

Reversing the order of proxy calls in the test made it fail more
consistently, which helped debug the issue.

This changes the check to verify if the router has been rebuilt,
using a dummy route for triggering the routing rebuild before the
proper test starts. (Thanks @thibaultcha for the idea!)

The changes are also backported to `spec-old-api/`.
thibaultcha pushed a commit that referenced this issue May 11, 2018
From #3454
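The general pattern behind the fix is to poll for an observable effect (the dummy route responding) instead of reading logs. A generic sketch of such a poll loop (an assumed helper, not the actual spec code):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns a truthy value or the
    timeout elapses; returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

In the test, this would wrap a request to the dummy route, treating a successful response as proof that the router has been rebuilt before the real assertions run.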