Implemented retry for task messaging commands: #114 #188

Merged Dec 6, 2012 (4 commits)

Conversation

hjoliver
Member

To test this:

  1. In the site/user config, set a short timeout and retry interval under [task messaging].
  2. Run a test suite with dummy tasks until at least one task has been submitted.
  3. Stop the suite and copy the generated job script.
  4. With the suite stopped, run the job script manually in a terminal, then start the suite while the task is in the middle of its messaging retries.

Note that the connection timeout does not apply if "connection failed" occurs (no suite); it applies only if a connection is made but does not complete for some reason (suspend a suite with Ctrl-Z to see this).
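
For readers following the test steps, the retry behaviour can be sketched roughly as below. This is a minimal illustration, not the code in this pull request: `send_with_retry`, `max_tries`, `retry_interval` and the `send` callable are hypothetical names, and the two built-in exception types stand in for the "connection failed" (no suite) and "connection timed out" (suite suspended) cases described above.

```python
import time


def send_with_retry(send, max_tries=3, retry_interval=5.0, sleep=time.sleep):
    """Call send() until it succeeds or max_tries attempts have been made.

    Returns the number of attempts used; re-raises the last error if every
    attempt fails. All names here are hypothetical, for illustration only.
    """
    for attempt in range(1, max_tries + 1):
        try:
            send()
            return attempt
        except (ConnectionRefusedError, TimeoutError):
            # ConnectionRefusedError: nothing is listening (suite not running),
            # so the failure is immediate and no timeout is involved.
            # TimeoutError: a connection was made but did not complete in time
            # (for example, the suite process is suspended).
            if attempt == max_tries:
                raise
            sleep(retry_interval)
```

In the manual test, starting the suite between two attempts corresponds to `send` failing a few times and then succeeding, at which point the loop returns.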

@hjoliver
Member Author

This addresses Issue #114

@hjoliver
Member Author

Should we put task messaging calls in the background so the retry process does not actually hold up the task?

@matthewrmshin
Contributor

It would be a good idea to background the messaging calls.

@dpmatthews
Contributor

You'd need to ensure that any previous message had completed before trying to send a new message?

@hjoliver
Member Author

hjoliver commented Dec 2, 2012

Not necessarily, so long as message receive is atomic and we stop treating out-of-order messages as an error condition (#115). But I guess I shouldn't background the started message until that is the case. New issue ticket: #192
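
One way to picture the trade-off discussed here is a sender that does not block the task but still delivers messages in the order they were posted, which sidesteps the out-of-order concern. The sketch below is only an illustration under that assumption: `BackgroundMessenger` and its methods are hypothetical, and `send` stands in for whatever actually delivers (and retries) a message.

```python
import queue
import threading


class BackgroundMessenger:
    """Send messages from a single worker thread, in the order queued.

    Hypothetical helper: the caller never waits on send(), and because only
    one worker drains the queue, a message is not sent until the previous
    one has completed.
    """

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def post(self, message):
        """Queue a message and return immediately."""
        self._queue.put(message)

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:  # sentinel posted by close()
                return
            self._send(message)

    def close(self):
        """Wait for all queued messages to be sent, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

A task would call `post()` for each message as it runs and `close()` once before exiting, so that the final message is not lost when the process ends. Launching each call as an independent background process would not give this ordering guarantee, which is why the receiver would then need to tolerate out-of-order messages.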

The first cut did not keep the validation-expanded user file, so type
coercion of non-string user values was not retained (this is somewhat
tricky because we are using a single configspec to set default values
for two config files, user and site, that combine to get the result).
@hjoliver
Member Author

hjoliver commented Dec 3, 2012

(I think this is done now).

@arjclark
Contributor

arjclark commented Dec 4, 2012

Appears to be working as expected. However, conflicts arise when merging with current master, and once the merge is complete parts of cylc seem to break with:

KeyError: 'run directory'

It may be that I'm failing to correctly resolve the conflicts by hand. @hjoliver could you take a look at this?

Conflicts:
	bin/cylc-submit
	conf/siterc/cfgspec
	lib/cylc/scheduler.py
@hjoliver
Member Author

hjoliver commented Dec 4, 2012

@arjclark I've merged from master and addressed the conflicts - try again now. Apologies for the hassle - in future I'll do this as a matter of course whenever github says the merge can't be done automatically.

@arjclark
Contributor

arjclark commented Dec 6, 2012

Tested as working our end. Thanks @hjoliver for taking a look at that.

matthewrmshin added a commit that referenced this pull request Dec 6, 2012
Implemented retry for task messaging commands: #114
@matthewrmshin merged commit 9da07ae into cylc:master Dec 6, 2012