Extend internal clients to allow for configurable timeouts #210

nurmi · 2019-06-05T02:53:40Z

The internal anchore HTTP client handler supports configuring timeouts on HTTP connections, but this is currently used only in select, targeted locations in the logic (for example, in the policy engine -> catalog upcall, added as part of issue #154).

Under certain network conditions where an internal host/port starts holding connections indefinitely, other internal clients can experience blocking which only clears if the services are restarted (and the network condition is cleared).

As an operator of anchore engine, it would be useful to be able to configure internal clients to time out, in order to avoid indefinite blocking, even if the timeout value is set very high (as some internal anchore connections can be long lived).

armstrongli · 2019-06-11T09:01:47Z

We have encountered a problem where the analyzer stops scanning images after running for around 2 hours.

All workers stop working. It seemed weird at first, so we took some time to dig deeper and found that it is an infrastructure-level problem with network connections.

We checked the connections on all analyzers and found that:
• all the workers are stuck loading analysis results into the policy engine
• there is the same number of connections to the policy engine, all in the established state

We checked all the policy engines and found that:
• the policy engines have finished the image load work
• there are no connections from any client

This means the connections have been closed on the policy engine side, but the analyzers never received the FIN signal for the closing TCP connection.
So the workers are stuck waiting for the connection to finish.

Then I checked the anchore source code in http.py and noticed that the connection timeout is None. This means a connection never times out if there is packet loss (or some other failure) at the infrastructure level, and it stays stuck.
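For illustration, the stall can be reproduced without anchore at all. The sketch below uses only the Python standard library; the helper name `read_from_silent_peer` is hypothetical, and the local server merely stands in for a peer that has gone silent. It is not the anchore client code.

```python
import socket
import threading


def read_from_silent_peer(timeout):
    """Connect to a local server that accepts the connection but never replies.

    Returns "timed out" if the read gives up after `timeout` seconds.
    With timeout=None the recv() call would block indefinitely.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    held = []

    def accept_and_hold():
        conn, _ = server.accept()
        held.append(conn)  # keep the connection open and never send anything

    worker = threading.Thread(target=accept_and_hold, daemon=True)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
    try:
        client.recv(1)  # waits for data that never arrives
        return "got data"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        worker.join(timeout=1)
        for conn in held:
            conn.close()
        server.close()
```

Calling `read_from_silent_peer(0.5)` returns after roughly half a second, whereas passing `None` would leave the caller blocked, which is the behaviour described above.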

So I changed the http client to add a default timeout to all anchore requests (any post, update, or get).

Here is the change:

diff --git a/anchore_engine/clients/services/http.py b/anchore_engine/clients/services/http.py
index d91e42e..c1d0c20 100644
--- a/anchore_engine/clients/services/http.py
+++ b/anchore_engine/clients/services/http.py
@@ -55,7 +55,7 @@ def fpost_req(url, **kwargs):
     rawdata = b''
     jsondata = {}
     try:
-        r = requests.post(url, stream=True, **kwargs)
+        r = requests.post(url, stream=True, **dict(kwargs, timeout=1800))
         httpcode = r.status_code
         rawdata = b''
         for rchunk in r.iter_content(8192*100):
@@ -106,7 +106,7 @@ def fput_req(url, **kwargs):
     rawdata = b''
     jsondata = {}
     try:
-        r = requests.put(url, stream=True, **kwargs)
+        r = requests.put(url, stream=True, **dict(kwargs, timeout=1800))
         httpcode = r.status_code
         rawdata = b''
         for rchunk in r.iter_content(8192*100):
@@ -158,7 +158,7 @@ def fget_req(url, **kwargs):
     rawdata = b''
     jsondata = {}
     try:
-        r = requests.get(url, stream=True, **kwargs)
+        r = requests.get(url, stream=True, **dict(kwargs, timeout=1800))
         httpcode = r.status_code
         rawdata = b''
         for rchunk in r.iter_content(8192*100):
@@ -211,7 +211,7 @@ def fdelete_req(url, **kwargs):
     rawdata = b''
     jsondata = {}
     try:
-        r = requests.delete(url, stream=True, **kwargs)
+        r = requests.delete(url, stream=True, **dict(kwargs, timeout=1800))
         httpcode = r.status_code
         rawdata = b''
         for rchunk in r.iter_content(8192*100):
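
One thing worth noting about the patch above: `dict(kwargs, timeout=1800)` sets `timeout` to 1800 even when the caller already passed its own value, because keyword arguments to `dict()` take precedence over the mapping. A minimal sketch of a variant that only fills in a default is below; `with_default_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, not part of anchore.

```python
DEFAULT_TIMEOUT = 1800  # seconds; same value as in the patch above


def with_default_timeout(kwargs, default=DEFAULT_TIMEOUT):
    """Return a copy of kwargs with 'timeout' set only if the caller did not set it."""
    merged = dict(kwargs)  # copy so the caller's dict is not mutated
    merged.setdefault("timeout", default)
    return merged


# A caller-supplied timeout survives, while a missing one gets the default.
explicit = with_default_timeout({"timeout": 30})
implicit = with_default_timeout({"verify": False})
```

With this helper, a call such as `requests.post(url, stream=True, **with_default_timeout(kwargs))` would keep any timeout chosen by code that already passes one.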

zhill · 2019-06-13T18:22:15Z

The fix adds two new global-level config options in config.yaml:

global_client_connect_timeout:
global_client_read_timeout:

The internal client code uses these as defaults if set. Both default to 0.0, which disables the timeout; any value greater than 0 enables it.
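
As a rough sketch of how two such values could be turned into the `timeout` argument that `requests` accepts (a `(connect, read)` tuple, where `None` means wait forever): the function names and the plain-dict config below are assumptions for illustration, not the code from the fix.

```python
def _to_timeout(value):
    """Map a config value to a requests-style timeout.

    Missing, zero, or negative values mean "disabled" and become None.
    """
    if value is None:
        return None
    value = float(value)
    return value if value > 0 else None


def timeout_from_config(config):
    """Build a (connect, read) timeout tuple from the two global options."""
    connect = _to_timeout(config.get("global_client_connect_timeout", 0.0))
    read = _to_timeout(config.get("global_client_read_timeout", 0.0))
    return (connect, read)


# Hypothetical parsed config: 5 s to connect, 30 min between reads.
example = timeout_from_config(
    {"global_client_connect_timeout": 5, "global_client_read_timeout": 1800}
)
```

The resulting tuple could be passed as `requests.get(url, timeout=example)`. Note that in `requests` the read timeout bounds the wait for bytes on the socket, not the total duration of the request, so a long-lived connection that keeps delivering data is not cut off by it.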

nurmi added the enhancement label Jun 5, 2019

zhill self-assigned this Jun 12, 2019

zhill closed this as completed in 9ab3a2a Jun 13, 2019
