Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

passive healthcheck not working after configuration changing #4077

Closed
Ben0625 opened this issue Apr 19, 2021 · 13 comments · Fixed by #5589
Closed

passive healthcheck not working after configuration changing #4077

Ben0625 opened this issue Apr 19, 2021 · 13 comments · Fixed by #5589
Assignees
Labels
bug Something isn't working

Comments

@Ben0625
Copy link
Contributor

Ben0625 commented Apr 19, 2021

Issue description

When both active and passive configuration are updated from only active configuration, the http statuses of passive become an empty table which causes passive healthcheck not working.

Environment

  • apisix version: 2.0
  • OS: 3.10.0-693.el7.x86_64
  • OpenResty version: openresty/1.17.8.2
  • etcd version: 3.4.13
  • apisix-dashboard version: 2.0

Minimal test code / Steps to reproduce the issue

  1. Set a route with two upstreams(/test.do) and healthcheck active configuration(/check.do).
  2. Send a request to trigger creating new checker.
  3. Update this route only by changing healthcheck configuration to both active and passive. Let one of the upstream return 500 code to act like a unnormal node. Send some requests, but the unnormal node cannot be set to be unhealthy even reaching the passive.unhealthy.http_failures.

What's the actual result?

I prints some debug info inside this function: https://github.com/Kong/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1372
I prints out the "defaults" table at line 1443.

After doing Step 2, the type of defaults.checks.passive.unhealthy.http_statuses changes from array to dict!

defaults.checks.passive.unhealthy.http_statuses before to_set() function:
{
1 = 429,
2 = 500,
3 = 503,
}

defaults.checks.passive.unhealthy.http_statuses after to_set() function:
{
429 = true,
500 = true,
503 = true,
}

It seems that to_set() changes the "self" object, result in changing "defaults" as well.

What's the expected result?

In my opinion, the "defaults" object should not be changed. Maybe there's some problem with fill_in_settings() function at line 1377. Or am I doing something wrong during these operations?

@Firstsawyou
Copy link
Contributor

You should provide the corresponding health check configuration information so that we can check it.

@Ben0625
Copy link
Contributor Author

Ben0625 commented Apr 19, 2021

For Step 1:

"checks": {
			"active": {
				"healthy": {
					"interval": 100,
					"successes": 4
				},
				"host": "aaa.com",
				"http_path": "/check.do",
				"timeout": 3,
				"unhealthy": {
					"http_failures": 5,
					"interval": 3
				}
			}
		}

For Step 3:

"checks": {
		"active": {
			"healthy": {
				"interval": 100,
				"successes": 4
			},
			"host": "aaa.com",
			"http_path": "/check.do",
			"timeout": 3,
			"unhealthy": {
				"http_failures": 5,
				"interval": 3
			}
		},
		"passive": {
			"healthy": {
				"http_statuses": [200],
				"successes": 1
			},
			"unhealthy": {
				"http_failures": 1,
				"http_statuses": [500],
				"tcp_failures": 1
			}
		}
	}

@Firstsawyou
Copy link
Contributor

I tested it, by looking at the log output, and it worked.

@tokers
Copy link
Contributor

tokers commented Apr 19, 2021

@Ben0625 Could you try the 2.5 version? And test it whether it can be reproduced in 2.5?

@Ben0625
Copy link
Contributor Author

Ben0625 commented Apr 20, 2021

@Ben0625 Could you try the 2.5 version? And test it whether it can be reproduced in 2.5?

I tried the Apisix 2.5 version but not use the dashboard 2.5 version. I just connect to the same etcd and it can be reproduced. Let me provide more details.
Step 1. Set a route with two upstreams(/test.do) and healthcheck active configuration(/check.do) through dashboard.
Step 2. Send a request to trigger creating new checker. By looking at the log output before and after to_set() function:

defaults.checks.passive.unhealthy.http_statuses before to_set() function:
{
1 = 429,
2 = 500,
3 = 503,
}
defaults.checks.passive.unhealthy.http_statuses after to_set() function:
{
429 = true,
500 = true,
503 = true,
}

Step 3. Edit the same route on dashboard and do not change any configuration and commit.
Step 4. Send a request to trigger creating new checker. By looking at the log output before and after to_set() function:

defaults.checks.passive.unhealthy.http_statuses before to_set() function:
{
429 = true,
500 = true,
503 = true,
}
defaults.checks.passive.unhealthy.http_statuses after to_set() function:
{
}

The configuration in etcd(/apisix/routes/350659784246034728):
{"id":"350659784246034728","create_time":1618539118,"update_time":1618902032,"uris":["/test.do"],"name":"test","methods":["GET","HEAD","POST","PUT","DELETE","OPTIONS","PATCH"],"vars":[],"upstream":{"nodes":[{"host":"172.25.69.182","port":8080,"weight":1},{"host":"172.25.70.116","port":8080,"weight":1}],"timeout":{"connect":6000,"read":6000,"send":6000},"type":"roundrobin","checks":{"active":{"healthy":{"interval":100,"successes":4},"host":"aaa.com","http_path":"/check.do","timeout":3,"unhealthy":{"http_failures":5,"interval":3}}}}}

@tzssangglass
Copy link
Member

@Ben0625 any news? Sure this problem still exists in apisix 2.5?

@chzhuo
Copy link
Contributor

chzhuo commented Nov 10, 2021

@Ben0625 Could you try the 2.5 version? And test it whether it can be reproduced in 2.5?

We get the same problems on the apisix 2.8

I found a bug in https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua
The content of var defaults will be modified when not assigned the passive or active config item

Reproduce the bug:
nginx.conf

daemon off;
worker_processes  1;
error_log stderr;
events {
    worker_connections 1024;
}
http {
    access_log off;
    server {
        listen 19000;
        location / {
            default_type text/html;
            content_by_lua '
                ngx.say("<p>Hello, World!</p>")
            ';
            log_by_lua '
                local check = require("check")
            ';
        }
    }
}

check.lua

local cjson = require("cjson.safe")

-- copy from: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1162
--============================================================================
-- Create health-checkers
--============================================================================


local NO_DEFAULT = {}
local MAXNUM = 2^31 - 1


local function fail(ctx, k, msg)
  ctx[#ctx + 1] = k
  error(table.concat(ctx, ".") .. ": " .. msg, #ctx + 1)
end


local function fill_in_settings(opts, defaults, ctx)
  ctx = ctx or {}
  local obj = {}
  for k, default in pairs(defaults) do
    local v = opts[k]

    -- basic type-check of configuration
    if default ~= NO_DEFAULT
    and v ~= nil
    and type(v) ~= type(default) then
      fail(ctx, k, "invalid value")
    end

    if v ~= nil then
      if type(v) == "table" then
        if default[1] then -- do not recurse on arrays
          obj[k] = v
        else
          ctx[#ctx + 1] = k
          obj[k] = fill_in_settings(v, default, ctx)
          ctx[#ctx + 1] = nil
        end
      else
        if type(v) == "number" and (v < 0 or v > MAXNUM) then
          fail(ctx, k, "must be between 0 and " .. MAXNUM)
        end
        obj[k] = v
      end
    elseif default ~= NO_DEFAULT then
      obj[k] = default
    end

  end
  return obj
end


local defaults = {
  name = NO_DEFAULT,
  shm_name = NO_DEFAULT,
  type = NO_DEFAULT,
  status_ver = 0,
  checks = {
    active = {
      type = "http",
      timeout = 1,
      concurrency = 10,
      http_path = "/",
      https_verify_certificate = true,
      healthy = {
        interval = 0, -- 0 = disabled by default
        http_statuses = { 200, 302 },
        successes = 2,
      },
      unhealthy = {
        interval = 0, -- 0 = disabled by default
        http_statuses = { 429, 404,
                          500, 501, 502, 503, 504, 505 },
        tcp_failures = 2,
        timeouts = 3,
        http_failures = 5,
      },
      req_headers = {""},
    },
    passive = {
      type = "http",
      healthy = {
        http_statuses = { 200, 201, 202, 203, 204, 205, 206, 207, 208, 226,
                          300, 301, 302, 303, 304, 305, 306, 307, 308 },
        successes = 5,
      },
      unhealthy = {
        http_statuses = { 429, 500, 503 },
        tcp_failures = 2,
        timeouts = 7,
        http_failures = 5,
      },
    },
  },
}


local function to_set(tbl, key)
  local set = {}
  for _, item in ipairs(tbl[key]) do
    set[item] = true
  end
  tbl[key] = set
end


  -- In the below is my code
local option_no_passive  =  cjson.decode([[
    {
        "checks":
        {
            "active":{"unhealthy":{"tcp_Failures":3,"http_statuses":[500,501,502,503,504,505],"timeouts":3,"interval":100,"tcp_failures":2,"http_failures":3},"concurrency":10,"http_path":"\/healthz","https_verify_certificate":true,"port":9080,"type":"http","healthy":{"interval":5,"successes":1,"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308,400,401,403,404,405,415,429]},"host":"healthcheck","timeout":5}
        },
        "name":"upstream#\/internal-apisix\/upstreams\/380542682478412600","shm_name":"upstream-healthcheck"
    }]])
-- The default can be encoded to json
ngx.log(ngx.ERR, "===defaults: ", cjson.encode(defaults))

-- The default `passive` config will be filled into `filled_option_no_passive` by the method `fill_in_settings`
local filled_option_no_passive = fill_in_settings(option_no_passive ,defaults)
ngx.log(ngx.ERR, "===filled_option_no_passive: ", cjson.encode(filled_option_no_passive))

-- copy from: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1364
to_set(filled_option_no_passive.checks.active.unhealthy, "http_statuses")
to_set(filled_option_no_passive.checks.active.healthy, "http_statuses")
to_set(filled_option_no_passive.checks.passive.unhealthy, "http_statuses")
to_set(filled_option_no_passive.checks.passive.healthy, "http_statuses")

-- print: nil Cannot serialise table: excessively sparse array while logging request
-- The defaults has been modified by `to_set`
ngx.log(ngx.ERR, "===defaults: ", cjson.encode(defaults))

@shuaijinchao
Copy link
Member

@Ben0625 Could you try the 2.5 version? And test it whether it can be reproduced in 2.5?

I tried the Apisix 2.5 version but not use the dashboard 2.5 version. I just connect to the same etcd and it can be reproduced. Let me provide more details. Step 1. Set a route with two upstreams(/test.do) and healthcheck active configuration(/check.do) through dashboard. Step 2. Send a request to trigger creating new checker. By looking at the log output before and after to_set() function:

defaults.checks.passive.unhealthy.http_statuses before to_set() function:
{
1 = 429,
2 = 500,
3 = 503,
}
defaults.checks.passive.unhealthy.http_statuses after to_set() function:
{
429 = true,
500 = true,
503 = true,
}

Step 3. Edit the same route on dashboard and do not change any configuration and commit. Step 4. Send a request to trigger creating new checker. By looking at the log output before and after to_set() function:

defaults.checks.passive.unhealthy.http_statuses before to_set() function:
{
429 = true,
500 = true,
503 = true,
}
defaults.checks.passive.unhealthy.http_statuses after to_set() function:
{
}

The configuration in etcd(/apisix/routes/350659784246034728): {"id":"350659784246034728","create_time":1618539118,"update_time":1618902032,"uris":["/test.do"],"name":"test","methods":["GET","HEAD","POST","PUT","DELETE","OPTIONS","PATCH"],"vars":[],"upstream":{"nodes":[{"host":"172.25.69.182","port":8080,"weight":1},{"host":"172.25.70.116","port":8080,"weight":1}],"timeout":{"connect":6000,"read":6000,"send":6000},"type":"roundrobin","checks":{"active":{"healthy":{"interval":100,"successes":4},"host":"aaa.com","http_path":"/check.do","timeout":3,"unhealthy":{"http_failures":5,"interval":3}}}}}


Following the steps you provided, I tested it through the CLI and Dashboard in the APISIX2.8 version, and this problem was not reproduced.

image

@shuaijinchao
Copy link
Member

shuaijinchao commented Nov 11, 2021

@Ben0625 Could you try the 2.5 version? And test it whether it can be reproduced in 2.5?

We get the same problems on the apisix 2.8

I found a bug in api7/lua-resty-healthcheck@master/lib/resty/healthcheck.lua The content of var defaults will be modified when not assigned the passive or active config item

Reproduce the bug: nginx.conf

daemon off;
worker_processes  1;
error_log stderr;
events {
    worker_connections 1024;
}
http {
    access_log off;
    server {
        listen 19000;
        location / {
            default_type text/html;
            content_by_lua '
                ngx.say("<p>Hello, World!</p>")
            ';
            log_by_lua '
                local check = require("check")
            ';
        }
    }
}

check.lua

local cjson = require("cjson.safe")

-- copy from: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1162
--============================================================================
-- Create health-checkers
--============================================================================


local NO_DEFAULT = {}
local MAXNUM = 2^31 - 1


local function fail(ctx, k, msg)
  ctx[#ctx + 1] = k
  error(table.concat(ctx, ".") .. ": " .. msg, #ctx + 1)
end


local function fill_in_settings(opts, defaults, ctx)
  ctx = ctx or {}
  local obj = {}
  for k, default in pairs(defaults) do
    local v = opts[k]

    -- basic type-check of configuration
    if default ~= NO_DEFAULT
    and v ~= nil
    and type(v) ~= type(default) then
      fail(ctx, k, "invalid value")
    end

    if v ~= nil then
      if type(v) == "table" then
        if default[1] then -- do not recurse on arrays
          obj[k] = v
        else
          ctx[#ctx + 1] = k
          obj[k] = fill_in_settings(v, default, ctx)
          ctx[#ctx + 1] = nil
        end
      else
        if type(v) == "number" and (v < 0 or v > MAXNUM) then
          fail(ctx, k, "must be between 0 and " .. MAXNUM)
        end
        obj[k] = v
      end
    elseif default ~= NO_DEFAULT then
      obj[k] = default
    end

  end
  return obj
end


local defaults = {
  name = NO_DEFAULT,
  shm_name = NO_DEFAULT,
  type = NO_DEFAULT,
  status_ver = 0,
  checks = {
    active = {
      type = "http",
      timeout = 1,
      concurrency = 10,
      http_path = "/",
      https_verify_certificate = true,
      healthy = {
        interval = 0, -- 0 = disabled by default
        http_statuses = { 200, 302 },
        successes = 2,
      },
      unhealthy = {
        interval = 0, -- 0 = disabled by default
        http_statuses = { 429, 404,
                          500, 501, 502, 503, 504, 505 },
        tcp_failures = 2,
        timeouts = 3,
        http_failures = 5,
      },
      req_headers = {""},
    },
    passive = {
      type = "http",
      healthy = {
        http_statuses = { 200, 201, 202, 203, 204, 205, 206, 207, 208, 226,
                          300, 301, 302, 303, 304, 305, 306, 307, 308 },
        successes = 5,
      },
      unhealthy = {
        http_statuses = { 429, 500, 503 },
        tcp_failures = 2,
        timeouts = 7,
        http_failures = 5,
      },
    },
  },
}


local function to_set(tbl, key)
  local set = {}
  for _, item in ipairs(tbl[key]) do
    set[item] = true
  end
  tbl[key] = set
end


  -- In the below is my code
local option_no_passive  =  cjson.decode([[
    {
        "checks":
        {
            "active":{"unhealthy":{"tcp_Failures":3,"http_statuses":[500,501,502,503,504,505],"timeouts":3,"interval":100,"tcp_failures":2,"http_failures":3},"concurrency":10,"http_path":"\/healthz","https_verify_certificate":true,"port":9080,"type":"http","healthy":{"interval":5,"successes":1,"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308,400,401,403,404,405,415,429]},"host":"healthcheck","timeout":5}
        },
        "name":"upstream#\/internal-apisix\/upstreams\/380542682478412600","shm_name":"upstream-healthcheck"
    }]])
-- The default can be encoded to json
ngx.log(ngx.ERR, "===defaults: ", cjson.encode(defaults))

-- The default `passive` config will be filled into `filled_option_no_passive` by the method `fill_in_settings`
local filled_option_no_passive = fill_in_settings(option_no_passive ,defaults)
ngx.log(ngx.ERR, "===filled_option_no_passive: ", cjson.encode(filled_option_no_passive))

-- copy from: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1364
to_set(filled_option_no_passive.checks.active.unhealthy, "http_statuses")
to_set(filled_option_no_passive.checks.active.healthy, "http_statuses")
to_set(filled_option_no_passive.checks.passive.unhealthy, "http_statuses")
to_set(filled_option_no_passive.checks.passive.healthy, "http_statuses")

-- print: nil Cannot serialise table: excessively sparse array while logging request
-- The defaults has been modified by `to_set`
ngx.log(ngx.ERR, "===defaults: ", cjson.encode(defaults))

@chzhuo your question does not seem to be related to @Ben0625's question. Regarding the issue of "Unable to serialize table: too sparse array", please refer to the lua-cjson document: kyne.com.au/~mark/software/lua-cjson -manual.html#encode_sparse_array

image

@chzhuo
Copy link
Contributor

chzhuo commented Nov 12, 2021

@shuaijinchao In you last screenshot, do you found out that the content of the global variable defaults has been modifed?

image

This change will effect the other upstream to call healthcheck.new()

@chzhuo
Copy link
Contributor

chzhuo commented Nov 12, 2021

@shuaijinchao You can reproduce in apisix by following step:

  1. add log(self) in https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L1326
  2. add the first upstream with only active config
  3. add the second upstream with active and passive config
  4. assign route to the two upstream
  5. curl the route that assign to first upstream
  6. curl the route that assign to second upstream
  7. checking the passive http_statues from log(self) is correct

notice: apisix won't start the health check when the upstream only has one node

@shuaijinchao
Copy link
Member

@chzhuo Thanks, I reproduced the problem. The reason for this problem is that when healthcheck fills the default value, the table object in the defaults variable is directly assigned to the opts variable, while the shallow copy used in the table type variable assignment in Lua. So when using to_set to update the opts variable, the defaults variable will also be affected, because the memory address of the operation is the same.

@shuaijinchao
Copy link
Member

At present, this problem still exists in the latest version of APISIX, and we will fix this problem in the recent version.

@shuaijinchao shuaijinchao added the bug Something isn't working label Nov 14, 2021
spacewander added a commit to spacewander/incubator-apisix that referenced this issue Sep 2, 2022
This issue is introduced in the bugfix:
apache#5589

This PR fixes the bug added by that bugfix, and also
introduce a new test (TEST 6) to cover issue
apache#4077.

The fix of that issue is submitted in
api7/lua-resty-healthcheck#5
which fixes the bug in an appropriate way.

Signed-off-by: spacewander <spacewanderlzx@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants