Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference tests and related improvements #3286

Merged
merged 1 commit into from Aug 25, 2019
Merged

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Aug 12, 2019

These changes are attempts to partially address #2894 and #3046.

User visible changes:

  • Removed settings:
    • [cylc]log resolved dependencies
    • [cylc][[reference test]]* except expected task failures.
  • Moved [cylc]abort if any task fails to [cylc][[events]]abort if any task fails so it lives with the other abort if/on ... settings.
  • Removed the cylc check-triggering command.
  • Log task trigger regardless, at level INFO.
  • Fixed cylc submit command - job unable to load job.sh.
  • Fixed cylc run --stop-cycle-point=POINT logic.
  • Ignore job poll message, when task is already in a retrying state. This fixes a flaky test in a busy environment when multiple messages and polls come in at quite a large interval - confusing the event manager.
    Fixed: retrying held tasks should no longer be released for submission (broken by hold_swap => is_held #3230).

Internal changes:

  • Cleaner reference test logic:
    • Detect reference test option automatically on shutdown.
    • Less logic required to deal with reference test configuration.
    • Remove the need to run an external command.
    • Generate a filtered test log in reference test - less loading/parsing.
    • Print only messages for reference/test log. (No more unnecessary date/time, level, etc in future reference logs.)
    • Parse reference/test logs on opened file handles instead of loading the full logs into memory.
  • Simplify abort on task failure logic.
  • Ensure that all test suite directories are cleaned up on success.
  • Taking out various suite timeout settings in the tests that were causing instability in busy environments. Auto inject 3 minutes inactivity settings in reference tests. (The timeout setting relies on the suite to stall before timing out. The inactivity setting is better in this respect.)

To follow on after this PR:

  • Look at any battery test files that are taking too long to run, and find ways to reduce.
  • Look at any battery test files that can be converted to unit tests.
  • Modify remaining test files (shell scripts) to pass shellcheck in normal mode.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests.
  • Appropriate change log entry included.

@matthewrmshin
Copy link
Contributor Author

Discovered several flaws in various tests on forward port of #3288. All fixed now.

@cylc cylc deleted a comment Aug 13, 2019
@cylc cylc deleted a comment Aug 13, 2019
@cylc cylc deleted a comment Aug 14, 2019
@cylc cylc deleted a comment Aug 14, 2019
@cylc cylc deleted a comment Aug 14, 2019
@cylc cylc deleted a comment Aug 14, 2019
@matthewrmshin matthewrmshin marked this pull request as ready for review August 14, 2019 21:55
@cylc cylc deleted a comment Aug 14, 2019
@matthewrmshin
Copy link
Contributor Author

Sorry for the large number of files changed. I'll squash the branch later, but I want to get the test to pass first. Hopefully, it will now pass on a single attempt on Travis CI. (It has been doing that in my environment in my latest attempts.) 🤞

@hjoliver
Copy link
Member

One test failed, damn it 😠

@matthewrmshin
Copy link
Contributor Author

Should have moved those tests to flaky, but I haven't seen them fail for quite a while now. Unfortunately, I've just got 2 other failures in the test running in my environment. Usual suspects. I'll take a look at them tomorrow as well.

@cylc cylc deleted a comment Aug 14, 2019
@matthewrmshin
Copy link
Contributor Author

I have done enough for the PR. Would appreciate a quick review so I can move on to something else.

Travis CI passes, but there is no sign of Codacy and CodeCov checks for some reason. (Perhaps due to the earlier Github outage?)

@matthewrmshin
Copy link
Contributor Author

Re-based.

@matthewrmshin
Copy link
Contributor Author

A happy one-try-pass in my environment this morning 🌞 running:

./etc/bin/run-functional-tests.sh -j 12 ./tests/ && ./etc/bin/run-functional-tests.sh -j 1 ./flakytests/

Let's hope Travis CI is as happy.

@hjoliver
Copy link
Member

I have done enough for the PR. Would appreciate a quick review so I can move on to something else.

Travis CI passes, but there is no sign of Codacy and CodeCov checks for some reason. (Perhaps due to the earlier Github outage?)

I'll get on to this ASAP - thought the reload bug fix higher priority today.

User visible changes:
* Removed settings:
  * `[cylc]log resolved dependencies`
  * `[cylc][[reference test]]*` except `expected task failures`.
* Moved `[cylc]abort if any task fails` to `[cylc][[events]]abort if any task fails` so it lives with the other `abort if/on ...` settings.
* Removed the `cylc check-triggering` command.
* Log task trigger regardless, at level INFO.
* Fixed `cylc submit` command - job unable to load `job.sh`.
* Fixed `cylc run --stop-cycle-point=POINT` logic.
* Ignore job poll message, when task is already in a *retrying* state. This fixes a flaky test in a busy environment when multiple messages and polls come in at quite a large interval - confusing the event manager.
* Fixed: retrying held tasks should no longer be released for submission.

Internal changes:
* Cleaner reference test logic:
  * Detect reference test option automatically on shutdown.
  * Less logic required to deal with reference test configuration.
  * Remove the need to run an external command.
  * Generate a filtered test log in reference test - less loading/parsing.
  * Print only messages for reference/test log. (No more unnecessary date/time, level, etc in future reference logs.)
  * Parse reference/test logs on opened file handles instead of loading the full logs into memory.
* Simplify abort on task failure logic.
* Taking out various suite *timeout* settings in the tests that were causing instability in busy environments. Auto inject 3 minutes *inactivity* settings in reference tests. (The *timeout* setting relies on the suite to stall before timing out. The *inactivity* setting is better in this respect.)
@@ -99,7 +99,7 @@ def log_task_job_activity(ctx, suite, point, name, submit_num=None):
LOG.debug(ctx_str)


class TaskEventsManager(object):
class TaskEventsManager():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3 class definition syntax: the official docs actually omit the parentheses here (if no explicit inheritance). Doesn't really matter though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still interesting to know, had no idea it worked (actually quickly tested in a terminal here 😬 )

@cylc cylc deleted a comment Aug 25, 2019
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@hjoliver
Copy link
Member

hjoliver commented Aug 25, 2019

(@kinow - as 2nd reviewer, no need to spend too much time going over this; it's mostly "just tests" and pretty clearly a nice improvement).

@kinow kinow merged commit 283bda1 into cylc:master Aug 25, 2019
@kinow
Copy link
Member

kinow commented Aug 25, 2019

Oh, looking forward to some more stability in the functional tests now. Thanks @matthewrmshin !!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants