Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(http-log) remove res nil pointer possibility #3947

Merged
merged 2 commits into from Nov 28, 2018

Conversation

Projects
None yet
6 participants
@jeremyjpj0916
Copy link
Contributor

commented Nov 3, 2018

Summary

res can be nil during bad tx, added additional retry logic for failed requests.

Side note, not sure if Kong wants it, there may be value in logging when a non 200 response comes back too, my snippet is here. Can probably do without the tostring() for perf if you want to bring it into the PR:

    if not res then
      log(ERR, "failed request to ", host, ":", tostring(port), ": ", err)
      return false    
    end

    res:read_body()
    if res.status ~= 200 then
      log(ERR, "Http log host returned status code: ",  tostring(res.status))
    end  

Another consideration may be not doing res:read_body() in the code at all as it serves no purpose at the moment that I can see.

Full changelog

  • Modify /kong/plugins/http-log/sender.lua
@subnetmarco

This comment has been minimized.

Copy link
Member

commented Nov 3, 2018

Note: We should check the other logging plugins to find out if they should also be updated.

Show resolved Hide resolved kong/plugins/http-log/sender.lua Outdated
@thibaultcha

This comment has been minimized.

Copy link
Member

commented Nov 3, 2018

Another consideration may be not doing res:read_body() in the code at all as it serves no purpose at the moment that I can see.

We need it since we keep the connection alive and we need to consume the data from the buffer before it becomes reused by another request.

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 4, 2018

@thibaultcha So here:

local httpc = http.new()
does not count for a new instantiation of the http client every time? What you describe sounds similar to something of a singleton persistent object but, when I see something declared as "new" in a loop I usually think of it as being newly declared. Anything special that makes it behave like that, would that mean any place I ever use this http client in code and set keepalive I need to do such a call, guessing so?

@thibaultcha

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

@jeremyjpj0916

So here:

local httpc = http.new()
does not count for a new instantiation of the http client every time?

On the Lua-land, it is a new object, but since previous requests put the socket into the connection pool via set_keepalive(), we cannot assume that the underlying socket is new, it may come from the connection pool. See https://github.com/openresty/lua-nginx-module#tcpsocksetkeepalive and note:

When the system receive buffer for the current connection has unread data, then this method will return the "connection in dubious state" error message (as the second return value) because the previous session has unread data left behind for the next session and the connection is not safe to be reused.

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

Thanks for the insights, committed your latest suggested change as I think it makes the most sense and is clean. Looks like a check has failed though :|, will take a look.

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

Committed my latest version. This moves the keepalive setting above the request, which feels fine to me to get http "settings" out of the way prior to request(and fulfills setting the keepalive before returning that Thibaultcha wanted). Lemme know if any issues seen, breaking any laws by setting a keepalive prior to the request?

Show resolved Hide resolved kong/plugins/http-log/sender.lua Outdated
Show resolved Hide resolved kong/plugins/http-log/sender.lua Outdated
@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

I think the code is in a good state right now. On tests not sure how to best capture these edge situations, any tests already out there around this new http-log plugin now, and what you are thinking would be good to see?

@thibaultcha

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

The best way would probably be to use a faulty HTTP server that would accept a connection, but then produce an invalid HTTP response or cause a timeout. Maybe our mock HTTP server can do the job here (see this endpoint). See how we make use of it in some integration tests: https://github.com/Kong/kong/blob/master/spec/03-plugins/01-tcp-log/01-tcp-log_spec.lua#L87 and how you can retrieve its URL from the spec.helpers module: https://github.com/Kong/kong/blob/master/spec/helpers.lua#L1151-L1152

In this case, we would configure an http-log plugin to point to this faulty HTTP endpoint, and configure a very low timeout, so that the test runs in a timely fashion (slow tests asserting 10s timeout are in effect are no fun...).

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

I guess the additional desired assertions and what is being validated are what I am unsure of. I see some of the existing tests of the http log plugin I found here:
https://github.com/Kong/kong/blob/next/spec/03-plugins/03-http-log/01-log_spec.lua

These seem more general settings validations (as opposed to behavior seen in bad network conditions) so I have nothing easy to mooch off for new tests.

Assert:

  1. That http-log successfully handles a faulty timed out http logger call (as opposed to throwing a nil pointer)? Maybe using the ["/delay/:duration"] mock upstream call (assert on return values of the lower level methods themselves though?)

  2. Some way to validate http client keepalive behavior(no clue how to validation here) ?

  3. Others ??

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

Also followup here after re-reading some of the thread got me thinking.

What was suggested

local res, err = httpc:request(...)
if not res then
  log(ERR, ...)
  httpc:set_keepalive(...) -- ignore errors
  return false
end

res:read_body()

local ok, err = httpc:set_keepalive(...)
if not ok then
  log(ERR, ...)
end

You mentioned:
if you call set_keepalive before calling receive (what read_body does under the hood), you'll get an error because the Lua-land socket object will be flipped to the "closed" state.
I have updated my previous suggestion to reflect how we usually handle this scenario (keepalive on a failed request)


Isn't calling the set_keepalive going to fail then in the not res block always, because we are doing so before a proper read_body could be executed? Would it not just be dead code, and returning false before even attempting setting the keepalive be more appropriate?

@thibaultcha

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

No, the connection is still opened but the socket received no data yet since the request was not sent.

@CLAassistant

This comment has been minimized.

Copy link

commented Nov 14, 2018

CLA assistant check
All committers have signed the CLA.

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 14, 2018

@thibaultcha I tried adding a new test to capture the original bug... It feels incomplete but I am not sure what do to here:

I added a new route and a http-log config that would be faulty(calling the delay 1 second mock path, while setting a timeout on the http-log plugin itself to 1 millisecond):

      local route6 = bp.routes:insert {
        hosts   = { "https_logging_faulty.test" },
        service = service2
      }

      bp.plugins:insert {
        route = { id = route6.id },
        name     = "http-log",
        config   = {
          http_endpoint = "https://" .. helpers.mock_upstream_ssl_host
                                     .. ":"
                                     .. helpers.mock_upstream_ssl_port
                                     .. "/delay/1",
          timeout = 1
        }
      }

Then well for the test, I call the route(which under the hood would invoke the faulty http-log plugin config with the delay endpoint which should run into the timeout. Which old code would throw a nil pointer exception trying to read res:body(), BUT does a unit test like this see a nil pointer exception and fail the test? I imagine it has to be achieved with asserts but I don't know what could go in this test(how do I assert a nil pointer never occurred in an underlying plugin function? Maybe I am not thinking about this correctly).

    it("gracefully handles layer 4 failures", function()
      local res = assert(proxy_client:send({
        method  = "GET",
        path    = "/status/200",
        headers = {
          ["Host"] = "https_logging_faulty.test"
        }
      }))
      assert.res_status(200, res)
    end)
@thibaultcha

This comment has been minimized.

Copy link
Member

commented Nov 14, 2018

I’m on a flight now, so this will be a short reply, sorry.

Parsing the log files and making sure that no error were written to it is an option, we do this in a few test cases (unfortunately we do not have a help function for that, but there is previous set to draw inspiration from).

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 14, 2018

@thibaultcha short reply was all that was needed, found the code snippet you were describing in a test case and added it to mine.

@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 15, 2018

Also adding one more thing you will want from another issue: #3970, I fixed it in my splunk code but didn't think to add it here. More the merrier to knock out getting yalls codebase ship shape.

Was going to do something like this as a test case for it too, but is it possible to declare a global version of the plugin in this unit test that would not impact the other unit tests? Is there a different unit test that gets used if I want to dabble with a global http-log, wish I could hardline it in straight to the specific unit test I am working with rather than at the top which would impact all unit tests.

    it("successfully executes globally when routes don't exist", function()
       --This assumes http-log is already enabled on a global level for the kong node!

    	-- setup: cleanup logs
      local test_error_log_path = helpers.test_conf.nginx_err_logs
      os.execute(":> " .. test_error_log_path)

      local res = assert(proxy_client:send({
        method  = "GET",
        path    = "/nonexistant/proxy/path",
        headers = {
          ["Host"] = "https_no_exist.test"
        }
      }))
      assert.res_status(404, res)

      -- Assertion: there should be no [error], including no error
      -- resulting from attempting to reference the id on
      -- a route or api when no such value exists

      local pl_file = require "pl.file"
      local logs = pl_file.read(test_error_log_path)

      for line in logs:gmatch("[^\r\n]+") do
        assert.not_match("[error]", line, nil, true)
      end
    end)

Its true you can eyeball the change and automatically know why that extra or "somevalue" is necessary if Kong is invoked on a route that does not exist and the plugin is global.

fix(http-log) remove res nil pointer possibility
res can be nil during bad tx, added additional retry logic for failed requests

it("successfully executes globally when route does not exist", function()
-- setup: cleanup logs
local test_error_log_path = helpers.test_conf.nginx_err_logs

This comment has been minimized.

Copy link
@thibaultcha

thibaultcha Nov 22, 2018

Member

We should not do this ourselves, a previous call to stop_kong (from the sibling describe block for example) should take care of it for us.

This comment has been minimized.

Copy link
@jeremyjpj0916

jeremyjpj0916 Nov 22, 2018

Author Contributor

So i just committed what I think you wanted, basically don't do the cleanup part at the top anymore but just reference the helpers nginx_err_logs down below. That's what you wanted right?

@hishamhm
Copy link
Member

left a comment

Change and test look good to me!

suggested changes were made

@hishamhm hishamhm merged commit 98b3ebd into Kong:next Nov 28, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@jeremyjpj0916

This comment has been minimized.

Copy link
Contributor Author

commented Nov 29, 2018

Good stuff. Thanks to all involved in progressing this one 👍 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.