Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: ES client timeout is not configurable #556

Merged
merged 1 commit into from May 16, 2017

Conversation

scollazo
Copy link
Member

With big METs files, 10 seconds might not be enough. This creates a configuration parameter in order to configure it.

I went for the conservative approach of only changing the ES aip index call, but this can also be handled at connection level, with something like:

es_client = Elasticsearch(**{
    'hosts': _es_hosts,
    'timeout': request_timeout,
    'dead_timeout': 2,
})

Refs: #10734

@qubot qubot closed this Mar 10, 2017
@Hwesta
Copy link
Contributor

Hwesta commented Mar 16, 2017

This was against qa/1.6.x, which has been replaced by stable/1.6.x Changing what branch this is merging to and reopening.

@Hwesta Hwesta changed the base branch from qa/1.6.x to stable/1.6.x March 16, 2017 22:13
@Hwesta Hwesta reopened this Mar 16, 2017
@Hwesta Hwesta added this to the 1.7.0 milestone Mar 16, 2017
@qubot qubot force-pushed the dev/elasticsearch-timeout-10734 branch 2 times, most recently from 4ed0398 to 7b70482 Compare April 26, 2017 22:09
@scollazo scollazo changed the title Common: set request_timeout configurable when indexing AIPs Common: set es timeout configurable when indexing AIPs Apr 26, 2017
@scollazo scollazo changed the title Common: set es timeout configurable when indexing AIPs Problem: ES client timeout is not configurable Apr 26, 2017
@qubot qubot force-pushed the dev/elasticsearch-timeout-10734 branch from 7b70482 to e323ed5 Compare April 26, 2017 22:15
@sevein sevein requested a review from jraddaoui May 9, 2017 20:31
@sevein
Copy link
Member

sevein commented May 9, 2017

@jraddaoui it would be awesome if you can give this a try. I would start looking at how the elasticsearchFunctions module is used - the dashboard initializes the module once from here: src/dashboard/src/wsgi.py - meaning that when a web worker is initialized to process web requests the first thing that happens is what it's in wsgi.py. Once you run elasticSearchFunctions.setup_reading_from_client_conf() the module holds a copy of the client which you can obtain with elasticSearchFunctions.get_client(). You can see the Dashboard does that quite a lot from different places. It's like a singleton really.

So Santi's PR is about setting a timeout when we hit ES. Once you have AM working locally (qa/1.x) you could verify that the change works by hitting any page in the dashboard that makes use of ES, e.g. the archival storage tab. Without ES running, the page return after 10 seconds with some error instead of blocking forever. If you add elasticsearchTimeout = 5 to your /etc/archivematica/MCPClient/etc/clientConfig.conf then it should only wait for 5 seconds. You would need to restart the Dashboard for the changes to apply I believe.

@sevein
Copy link
Member

sevein commented May 9, 2017

Once we confirm this is working properly we should include this in qa/1.x and stable/1.6.x.

@jraddaoui
Copy link
Contributor

Hi @sevein and @scollazo , I was tryting first to reproduce the issue in qa/1.x (without the fix) and, with the ES service down, I got an error page right away with the following message:

ConnectionError(<urllib3.connection.HTTPConnection object at 0x7f62e40c93d0>: Failed to establish a new connection: [Errno 111] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f62e40c93d0>: Failed to establish a new connection: [Errno 111] Connection refused)

Am I missing something to reproduce the issue? Maybe it's not about ES not being running but about big transfers/METS files being indexed?

@jraddaoui
Copy link
Contributor

Nevermind, I just saw the Redmine ticket, I'll try with a big ingest.

@sevein
Copy link
Member

sevein commented May 11, 2017

I was expecting the ES client to hang until the timeout is reached but maybe that's not the case. You may find other ways to reproduce but creating an ingest big enough may not be as easy. I wouldn't go too crazy, this change is pretty safe.

@qubot qubot force-pushed the dev/elasticsearch-timeout-10734 branch from e323ed5 to 42128d5 Compare May 12, 2017 20:33
@sevein sevein changed the base branch from stable/1.6.x to qa/1.x May 12, 2017 23:50
@qubot qubot force-pushed the dev/elasticsearch-timeout-10734 branch from 42128d5 to a4a62b4 Compare May 13, 2017 19:18
@jraddaoui
Copy link
Contributor

jraddaoui commented May 13, 2017

Hi @scollazo,

It took me a while to understand how this works and after I set the config setting in the right place I got the following exception:

  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 128, in <module>
    sys.exit(index_aip())
  File "/usr/lib/archivematica/MCPClient/clientScripts/indexAIP.py", line 63, in index_aip
    client = elasticSearchFunctions.get_client()
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 171, in get_client
    create_indexes_if_needed(_es_client)  # TODO: find a better place!
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 216, in create_indexes_if_needed
    create_index(client, 'aips')
  File "/usr/lib/archivematica/archivematicaCommon/elasticSearchFunctions.py", line 279, in create_index
    response = client.indices.create(index, ignore=400)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/client/indices.py", line 103, in create
    params=params, body=body)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/share/python/archivematica-mcp-client/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 89, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError(a float is required) caused by: TypeError(a float is required)

Changing config.get() to config.getfloat() fixed the error but I'm still not able to create a timeout by changing that config setting to 0.1 or 1, when the index AIP task is taking 6 seconds. Moreover, the fix you mention in the Redmine ticket sets the "request_timeout" in the index and search calls, while this fix sets the "timeout" while setting up the client. Maybe the default request timeout is prominent than the index timeout or maybe I'm missing something else, but I think we should double check this is working as expected before merging.

One thing Sevein and I noticed is that those settings are case insensitive, so I think it would be a good practice to use snake_case for them in the future. Also, do we have them documented somewhere? It would be nice to add a note about the new setting if we do.

Finally, we try to follow this guidelines for the commit message, specially the part about the message and content lines length.

I've updated this branch to merge the fix in qa/1.x and then cherry-pick it to stable/1.6.x and I've amended your commit fixing the float error and the commit message, but please, could you double check this is working as expected.

Thanks!

@sevein
Copy link
Member

sevein commented May 13, 2017

Changing config.get() to config.getfloat() fixed the error but I'm still not able to create a timeout by changing that config setting to 0.1 or 1, when the index AIP task is taking 6 seconds.

So... is it not working?

@jraddaoui
Copy link
Contributor

After parsing the value to a float the exception is not thrown, but it looks like the timeout value set in the client has no effect over the index request, at least not trying to set it down to force a timeout ...

@jraddaoui
Copy link
Contributor

jraddaoui commented May 13, 2017

Not even setting request_timeout=0.1 in https://github.com/artefactual/archivematica/blob/qa/1.x/src/archivematicaCommon/lib/elasticSearchFunctions.py#L442 creates a timeout for me while the index AIP task is executed ...

@sevein
Copy link
Member

sevein commented May 14, 2017 via email

@hakamine
Copy link
Member

hakamine commented May 15, 2017

This should be high priority, currently affecting a client production deployment (in stable/1.6.x). Adding myself as a reviewer.

@hakamine hakamine self-requested a review May 15, 2017 23:22
@scollazo
Copy link
Member Author

After setting elasticsearchTimeout = 0.1 in clientConfig, and restart mcp-client, I got this:

ERROR: error trying to index. ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host=u'localhost', port=9200): Read timed out. (read timeout=0.1)) ERROR: error trying to index.

The aip is finally indexed, after 6 timeouts like this, and the task duration goes to 62 seconds, while the default configuration does it in 3 . The default retry time is 10 seconds, so the behavior seems ok to me.

@jraddaoui
Copy link
Contributor

jraddaoui commented May 16, 2017

After chatting with Santiago and reducing the timeout setting to 0.01 and the ES memory to 256m I've been able to recreate the timeout.

Sorry for the troubles and thanks for you help @scollazo and @sevein

With big METs files, 10 seconds might not be enough. This creates
a configuration parameter in order to configure elasticsearch timeout.
@qubot qubot force-pushed the dev/elasticsearch-timeout-10734 branch from a4a62b4 to e4b9df6 Compare May 16, 2017 17:18
@qubot qubot merged commit e4b9df6 into qa/1.x May 16, 2017
@qubot qubot deleted the dev/elasticsearch-timeout-10734 branch May 16, 2017 17:20
@hakamine
Copy link
Member

I tested this PR on top of stable/1.6.x and it seems to be working fine. Will cherry pick to stable/1.6.x

@jraddaoui
Copy link
Contributor

Hi Hector, I've already done that ;)

@hakamine
Copy link
Member

Just noticed it's already cherry picked to stable/1.6.x. Thanks!

@nickwilkinson
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants