Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load test suite #819

Merged
merged 26 commits into from Aug 1, 2017
Merged

Load test suite #819

merged 26 commits into from Aug 1, 2017

Conversation

mohierf
Copy link
Contributor

@mohierf mohierf commented May 20, 2017

  1. Add a test suite for load tests

  2. Cleanings in the inter-daemon communication

    • catch connection errors and connection timeout as separate exceptions in the HTTP client
    • cleanings in the SatelliteLink with those exceptions
    • add an instance id in the receiver daemon (missing to get broks from this daemon)
  3. Cleanings in the scheduler loop

    • replace the handle_requests/sockets waiting function with a sleep
    • raise WARNING logs when the scheduler loop is over 1 second
    • add some counters and statistics
    • log counters when TEST_LOG_LOOP environment variable is set
  4. Some small fixes


Initially this PR was intended to create a load test suite that runs many checks for a lot of hosts but this raised some problems in the inter-daemon communication that were needed to get fixed!

@mohierf
Copy link
Contributor Author

mohierf commented Jun 4, 2017

I just rebased on develop to fix the conflicts

:param timeout: timeout to wait for activity
:type timeout: float
:return:Returns a 2-tuple:
* first value is the time spent for the time change chekc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: check instead chekc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

after = time.time()

if after - before > timeout:
return after - before, time_changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer store the after - before in a variable to not have to calculate the same thing many times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler does it for you 😉

if after - before > timeout:
return after - before, time_changed
# Time to sleep
time.sleep(timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not better to do a sleep of timeout - (after - before) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a very slight difference, but yes...

con = link['con'] = HTTPClient(uri=link['uri'],
strong_ssl=link['hard_ssl_name_check'],
timeout=timeout, data_timeout=data_timeout)
except HTTPClientConnectionException as exp: # pragma: no cover, simple protection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have exceptions in create connection, so we need define link['con'] = None in all exceptions here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need add it to all except HTTP.... in all modifications, because we have same thing in many files in your PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... not sure about this. I am looking... breaking the connection when a bad connetion or time out happens is not necessarily a good idea!


is_active = sched['active']
if not is_active:
logger.warning("The scheduler '%s' is not active, it is not possible to get broks "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message must be it's not possible to push external commands

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@@ -149,6 +151,120 @@ def get_external_commands(self):
self.external_commands = []
return res

@staticmethod
def is_connection_try_too_close(link, delay=5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name not very explicite :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not me that defined this ... I kept using a former existing function :/


def do_pynag_con_init(self, s_id, s_type='scheduler'):
"""Initialize a connection with scheduler having 'uuid'
Return the new connection to the scheduler if it succeeded,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return is always true, so make description like set the connection if it succeeded, otherwise set it to None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok


:param s_id: scheduler s_id to connect to
:type s_id: int
:return: scheduler connection object or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return is always true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

self.pynag_con_init(sched_id)
sched['con'] = con

if con:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify by use else:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... you found out a bug 😉
It should be: con = sched['con'] ... I did not catched this one!


:param check: check to add
:type check: alignak.check.Check
:return: None
"""
if check is None:
return
if check.uuid in self.checks:
logger.debug("Already existing check: %s", check)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have it, it's not normal, so prefer use error or warning instead debug.
Same for actions, event_handler...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not a bug... if you set this at WARNING level, you will be spammed on each scheduler start/restart/new configuration ! This is why I left the DEBUG log level...

@mohierf
Copy link
Contributor Author

mohierf commented Jun 9, 2017

I am preparing a small refactoring of the Satellite/BaseSatellite classes ...

@mohierf
Copy link
Contributor Author

mohierf commented Jun 9, 2017

@ddurieux: I will merge this PR by myslef after having launched the tests on a more powerful server and after some more checks in the volume of logs produced by each daemon

@Alignak-monitoring Alignak-monitoring deleted a comment from coveralls Jun 9, 2017
@Alignak-monitoring Alignak-monitoring deleted a comment from coveralls Jun 9, 2017
@Alignak-monitoring Alignak-monitoring deleted a comment from coveralls Jun 9, 2017
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.9%) to 84.739% when pulling 9d96008 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.8%) to 84.779% when pulling 9a14a86 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.9%) to 84.739% when pulling 9a14a86 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.9%) to 84.732% when pulling 40d0084 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.7%) to 84.91% when pulling a74eaea on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.8%) to 84.769% when pulling cf1a341 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.9%) to 84.723% when pulling 82b479a on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.9%) to 84.731% when pulling 13c7311 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.7%) to 84.886% when pulling 3c28635 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.7%) to 84.894% when pulling 59ab106 on load-test-suite into 38b183e on develop.

@coveralls
Copy link

coveralls commented Jul 26, 2017

Coverage Status

Coverage increased (+8.2%) to 93.825% when pulling 59ab106 on load-test-suite into 38b183e on develop.

@mohierf
Copy link
Contributor Author

mohierf commented Aug 1, 2017

As discussed with @ddurieux I merge this PR. Many uninterrupted days of tests with this source code on several servers make it considered as stable enough

@mohierf mohierf merged commit c05884e into develop Aug 1, 2017
@mohierf mohierf deleted the load-test-suite branch August 1, 2017 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erroneous what_i_managed daemon interface Actions never come back from poller/reactionner
3 participants