Problem: ES client timeout is not configurable #556

scollazo · 2017-01-13T18:33:14Z

With big METs files, 10 seconds might not be enough. This creates a configuration parameter in order to configure it.

I went for the conservative approach of only changing the ES aip index call, but this can also be handled at connection level, with something like:

es_client = Elasticsearch(**{
    'hosts': _es_hosts,
    'timeout': request_timeout,
    'dead_timeout': 2,
})

Refs: #10734

Hwesta · 2017-03-16T22:13:35Z

This was against qa/1.6.x, which has been replaced by stable/1.6.x Changing what branch this is merging to and reopening.

sevein · 2017-05-09T20:47:23Z

@jraddaoui it would be awesome if you can give this a try. I would start looking at how the elasticsearchFunctions module is used - the dashboard initializes the module once from here: src/dashboard/src/wsgi.py - meaning that when a web worker is initialized to process web requests the first thing that happens is what it's in wsgi.py. Once you run elasticSearchFunctions.setup_reading_from_client_conf() the module holds a copy of the client which you can obtain with elasticSearchFunctions.get_client(). You can see the Dashboard does that quite a lot from different places. It's like a singleton really.

So Santi's PR is about setting a timeout when we hit ES. Once you have AM working locally (qa/1.x) you could verify that the change works by hitting any page in the dashboard that makes use of ES, e.g. the archival storage tab. Without ES running, the page return after 10 seconds with some error instead of blocking forever. If you add elasticsearchTimeout = 5 to your /etc/archivematica/MCPClient/etc/clientConfig.conf then it should only wait for 5 seconds. You would need to restart the Dashboard for the changes to apply I believe.

sevein · 2017-05-09T20:49:00Z

Once we confirm this is working properly we should include this in qa/1.x and stable/1.6.x.

jraddaoui · 2017-05-11T12:16:23Z

Hi @sevein and @scollazo , I was tryting first to reproduce the issue in qa/1.x (without the fix) and, with the ES service down, I got an error page right away with the following message:

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f62e40c93d0>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f62e40c93d0>: Failed to establish a new connection: [Errno 111] Connection refused)

Am I missing something to reproduce the issue? Maybe it's not about ES not being running but about big transfers/METS files being indexed?

jraddaoui · 2017-05-11T12:25:22Z

Nevermind, I just saw the Redmine ticket, I'll try with a big ingest.

sevein · 2017-05-11T16:34:33Z

I was expecting the ES client to hang until the timeout is reached but maybe that's not the case. You may find other ways to reproduce but creating an ingest big enough may not be as easy. I wouldn't go too crazy, this change is pretty safe.

jraddaoui · 2017-05-13T19:23:33Z

Hi @scollazo,

It took me a while to understand how this works and after I set the config setting in the right place I got the following exception:

  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 128, in <module>
    sys.exit(index_aip())
  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 63, in index_aip
    client = elasticSearchFunctions.get_client()
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 171, in get_client
    create_indexes_if_needed(_es_client)  # TODO: find a better place!
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 216, in create_indexes_if_needed
    create_index(client, 'aips')
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 279, in create_index
    response = client.indices.create(index, ignore=400)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 89, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(a float is required) caused by: TypeError(a float is required)

Changing config.get() to config.getfloat() fixed the error but I'm still not able to create a timeout by changing that config setting to 0.1 or 1, when the index AIP task is taking 6 seconds. Moreover, the fix you mention in the Redmine ticket sets the "request_timeout" in the index and search calls, while this fix sets the "timeout" while setting up the client. Maybe the default request timeout is prominent than the index timeout or maybe I'm missing something else, but I think we should double check this is working as expected before merging.

One thing Sevein and I noticed is that those settings are case insensitive, so I think it would be a good practice to use snake_case for them in the future. Also, do we have them documented somewhere? It would be nice to add a note about the new setting if we do.

Finally, we try to follow this guidelines for the commit message, specially the part about the message and content lines length.

I've updated this branch to merge the fix in qa/1.x and then cherry-pick it to stable/1.6.x and I've amended your commit fixing the float error and the commit message, but please, could you double check this is working as expected.

Thanks!

sevein · 2017-05-13T19:34:23Z

Changing config.get() to config.getfloat() fixed the error but I'm still not able to create a timeout by changing that config setting to 0.1 or 1, when the index AIP task is taking 6 seconds.

So... is it not working?

jraddaoui · 2017-05-13T19:43:06Z

After parsing the value to a float the exception is not thrown, but it looks like the timeout value set in the client has no effect over the index request, at least not trying to set it down to force a timeout ...

jraddaoui · 2017-05-13T20:38:32Z

Not even setting request_timeout=0.1 in https://github.com/artefactual/archivematica/blob/qa/1.x/src/archivematicaCommon/lib/elasticSearchFunctions.py#L442 creates a timeout for me while the index AIP task is executed ...

sevein · 2017-05-14T09:47:47Z

Radda, I would just wait to get some feedback from Santi and move on to the next PR. Check out the GPG stuff Joel pushed yesterday, there are four PRs that depend on each other. Give it a try!

…

On May 13, 2017 12:43, "José Raddaoui Marín" ***@***.***> wrote: After parsing the value to a float the exception is not thrown, but it looks like the timeout value set in the client has no effect over the index request, at least not trying to set it down to force a timeout ... — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#556 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAlA--OxAtjlahTQlkrfC6Duo36uwaaUks5r5gfKgaJpZM4LjK8w> .

hakamine · 2017-05-15T23:21:52Z

This should be high priority, currently affecting a client production deployment (in stable/1.6.x). Adding myself as a reviewer.

scollazo · 2017-05-16T15:01:34Z

After setting elasticsearchTimeout = 0.1 in clientConfig, and restart mcp-client, I got this:

ERROR: error trying to index. ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=0.1)) ERROR: error trying to index.

The aip is finally indexed, after 6 timeouts like this, and the task duration goes to 62 seconds, while the default configuration does it in 3 . The default retry time is 10 seconds, so the behavior seems ok to me.

jraddaoui · 2017-05-16T17:15:50Z

After chatting with Santiago and reducing the timeout setting to 0.01 and the ES memory to 256m I've been able to recreate the timeout.

Sorry for the troubles and thanks for you help @scollazo and @sevein

With big METs files, 10 seconds might not be enough. This creates a configuration parameter in order to configure elasticsearch timeout.

hakamine · 2017-05-16T18:46:54Z

I tested this PR on top of stable/1.6.x and it seems to be working fine. Will cherry pick to stable/1.6.x

jraddaoui · 2017-05-16T18:48:31Z

Hi Hector, I've already done that ;)

hakamine · 2017-05-16T18:49:31Z

Just noticed it's already cherry picked to stable/1.6.x. Thanks!

nickwilkinson · 2017-06-06T22:06:41Z

Problem: ES client timeout is not configurable

qubot closed this Mar 10, 2017

Hwesta changed the base branch from qa/1.6.x to stable/1.6.x March 16, 2017 22:13

Hwesta reopened this Mar 16, 2017

Hwesta added this to the 1.7.0 milestone Mar 16, 2017

qubot force-pushed the dev/elasticsearch-timeout-10734 branch 2 times, most recently from 4ed0398 to 7b70482 Compare April 26, 2017 22:09

scollazo changed the title ~~Common: set request_timeout configurable when indexing AIPs~~ Common: set es timeout configurable when indexing AIPs Apr 26, 2017

scollazo changed the title ~~Common: set es timeout configurable when indexing AIPs~~ Problem: ES client timeout is not configurable Apr 26, 2017

qubot force-pushed the dev/elasticsearch-timeout-10734 branch from 7b70482 to e323ed5 Compare April 26, 2017 22:15

sevein requested a review from jraddaoui May 9, 2017 20:31

qubot force-pushed the dev/elasticsearch-timeout-10734 branch from e323ed5 to 42128d5 Compare May 12, 2017 20:33

sevein changed the base branch from stable/1.6.x to qa/1.x May 12, 2017 23:50

qubot force-pushed the dev/elasticsearch-timeout-10734 branch from 42128d5 to a4a62b4 Compare May 13, 2017 19:18

hakamine self-requested a review May 15, 2017 23:22

jraddaoui approved these changes May 16, 2017

View reviewed changes

common: configure es timeout per connection

e4b9df6

With big METs files, 10 seconds might not be enough. This creates a configuration parameter in order to configure elasticsearch timeout.

qubot force-pushed the dev/elasticsearch-timeout-10734 branch from a4a62b4 to e4b9df6 Compare May 16, 2017 17:18

qubot merged commit e4b9df6 into qa/1.x May 16, 2017

qubot deleted the dev/elasticsearch-timeout-10734 branch May 16, 2017 17:20

hakamine mentioned this pull request May 16, 2017

Add elasticsearch timeout config to /etc/archivematica/MCPClient/clientConfig.conf artefactual-labs/ansible-archivematica-src#112

Closed

jhsimpson modified the milestones: 1.6.1, 1.7.0 Jun 2, 2017

lwo mentioned this pull request Dec 8, 2017

ElasticSearch timeout when indexing a many-files AIP #831

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem: ES client timeout is not configurable #556

Problem: ES client timeout is not configurable #556

scollazo commented Jan 13, 2017

Hwesta commented Mar 16, 2017

sevein commented May 9, 2017 •

edited

sevein commented May 9, 2017

jraddaoui commented May 11, 2017

jraddaoui commented May 11, 2017

sevein commented May 11, 2017

jraddaoui commented May 13, 2017 •

edited by sevein

sevein commented May 13, 2017

jraddaoui commented May 13, 2017

jraddaoui commented May 13, 2017 •

edited

sevein commented May 14, 2017 via email

hakamine commented May 15, 2017 •

edited

scollazo commented May 16, 2017

jraddaoui commented May 16, 2017 •

edited

hakamine commented May 16, 2017

jraddaoui commented May 16, 2017

hakamine commented May 16, 2017

nickwilkinson commented Jun 6, 2017

Problem: ES client timeout is not configurable #556

Problem: ES client timeout is not configurable #556

Conversation

scollazo commented Jan 13, 2017

Hwesta commented Mar 16, 2017

sevein commented May 9, 2017 • edited

sevein commented May 9, 2017

jraddaoui commented May 11, 2017

jraddaoui commented May 11, 2017

sevein commented May 11, 2017

jraddaoui commented May 13, 2017 • edited by sevein

sevein commented May 13, 2017

jraddaoui commented May 13, 2017

jraddaoui commented May 13, 2017 • edited

sevein commented May 14, 2017 via email

hakamine commented May 15, 2017 • edited

scollazo commented May 16, 2017

jraddaoui commented May 16, 2017 • edited

hakamine commented May 16, 2017

jraddaoui commented May 16, 2017

hakamine commented May 16, 2017

nickwilkinson commented Jun 6, 2017

sevein commented May 9, 2017 •

edited

jraddaoui commented May 13, 2017 •

edited by sevein

jraddaoui commented May 13, 2017 •

edited

hakamine commented May 15, 2017 •

edited

jraddaoui commented May 16, 2017 •

edited