Scale in unregistered htex blocks #3232
Conversation
…caled in do this by submitting an infinite sleep worker
…at least one manager registered. How to test this? Make an htex with init_blocks=min_blocks=1 and a worker command that does something like sleep forever (it needs to not fail, because failed blocks won't get scaled in), and then shut down without running any tasks. In the old behaviour that unregistered block will not be scaled in, I think, but in the new behaviour it should be.
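A reproduction config along those lines might look like this (a sketch only: the parsl import paths are real, but the label and the choice of LocalProvider are illustrative, and launch_cmd is used here to replace the worker pool with a command that never registers but also never fails):

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_unregistered",  # illustrative label
            # a worker command that sleeps forever: the block stays alive
            # but no manager ever registers with the interchange
            launch_cmd="sleep infinity",
            provider=LocalProvider(
                init_blocks=1,
                min_blocks=1,
                max_blocks=1,
            ),
        )
    ]
)
```

Shutting down a DataFlowKernel loaded with a config like this, without running any tasks, should exercise the scale-in path for a block that never registered.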
Since I wrote this code a few weeks ago, there's been a change to the high throughput executor status method to introduce a "MISSING" state, and I've also gone over the general status code with @yadudoc a bit. Most importantly, I'd like to check that this code isn't actually calling the provider status calls (eg. invoking …). There might also be opportunities for aligning this code better with the code that handles …
This draft PR fixes the issue described here: #3235.
@yadudoc - I don't think it does? This PR doesn't fix any cache-based status update delays. I thought we tested a different (no-PR-number) patch?
managers = self.connected_managers()
Makes the earlier assignment redundant, yeah?
for (b_id, job_status) in provider_blocks.items():

    # skip blocks that have already finished...
    if job_status.state in TERMINAL_STATES:
        continue

    if b_id not in block_info:
        block_info[b_id] = BlockInfo(tasks=0, idle=float('inf'))
provider_blocks is a dictionary, yeah? If so, then b_id is guaranteed to be unique through the loop. Coupled with the fact that block_info is empty when the loop starts, the if-conditional is a waste. Could merge it all into the instantiation:
block_info = {
    block_id: BlockInfo(tasks=0, idle=float('inf'))
    for block_id, job_status in provider_blocks.items()
    if job_status.state not in TERMINAL_STATES
}
Later on, block_info keys are set to zero on first use as well, so could make that structure a defaultdict:

block_info: Dict[str, BlockInfo] = defaultdict(lambda: BlockInfo(0, float('inf')))
for b_id, job_status in provider_blocks.items():
    # skip blocks that have already finished...
    if job_status.state in TERMINAL_STATES:
        continue

    # the defaultdict logic is now the authority on "empty BlockInfo" objects
    block_info[b_id]
And could remove the if-branch in the following for-loop as well.
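As a runnable illustration of the defaultdict suggestion (BlockInfo here is a stand-in dataclass for illustration, not parsl's actual class):

```python
from collections import defaultdict
from dataclasses import dataclass


# Stand-in for parsl's BlockInfo, only to illustrate the defaultdict idea.
@dataclass
class BlockInfo:
    tasks: int
    idle: float


block_info = defaultdict(lambda: BlockInfo(0, float('inf')))

# First access of an unknown block id yields an "infinitely idle,
# completely unloaded" BlockInfo instead of raising a KeyError:
info = block_info["some-unseen-block"]
print(info.tasks, info.idle)  # 0 inf
```

The default_factory fires on first access, so the explicit "if b_id not in block_info" guard becomes unnecessary.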
parsl.provider.* status() calls can be expensive, in the sense that they do things like execute batch system commands which system administrators dislike being run automatically and periodically, and in some situations even ban that behaviour. So Parsl tries not to make those calls too often: providers can define a status_polling_interval (which returns an interval in seconds) and the strategy code will call (through a chain of calls) the provider status method at most every status_polling_interval. Prior to this PR, this was implemented in the job status poller code, which makes the following chain of calls at most every status polling interval:

poller/strategy iterator
=> poll_item.poll (runs "often" - eg every 5 seconds, driven by the job status poller?)
=> determine if the rest of this stack should run, or otherwise use data cached as the poll_item._status attribute
=> executor.status()
=> provider.status()
=> (in the case of cluster providers, provider._status and an LRM command line callout)

There are a couple of issues related to this call stack:

* #3235 - slow update of jobs that failed to submit. When a job fails to submit, at present it is recorded in a separate table, the _simulated_status dictionary, in BlockProviderExecutor. Entries added to that are not exposed to the scaling code until executor.status() is called, which may be many strategy iterations later. In that time, the scaling strategy may make the wrong decisions, based on incomplete information. That is the core of issue #3235.

* #2627 - htex scale in needs to be aware of job status. Scale-in code is not always driven by the poller/strategy code; for example, it executes at shutdown, driven by the DataFlowKernel. That scale-in code is then not in a position to make use of the cached data inside poll_item, and it is more natural to call executor.status() to determine status - see PR #3232. That results in calls to provider.status() without any status_polling_interval rate limiting/caching.
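The rate limiting described above can be sketched as a small wrapper. RateLimitedStatus and its constructor arguments are hypothetical names for illustration, not parsl's actual internals:

```python
import time


class RateLimitedStatus:
    """Sketch of a status_polling_interval-style rate limit: call the
    (expensive) provider status at most once per interval, otherwise
    serve the cached result. Illustrative only, not parsl's code."""

    def __init__(self, provider_status, interval):
        self._provider_status = provider_status  # expensive callable, e.g. an LRM callout
        self._interval = interval                # seconds between real polls
        self._last_poll = -float('inf')          # force a real poll on first use
        self._cached = {}

    def status(self):
        now = time.monotonic()
        if now - self._last_poll >= self._interval:
            self._cached = self._provider_status()
            self._last_poll = now
        return self._cached


calls = []
rl = RateLimitedStatus(lambda: calls.append(1) or {"block-0": "RUNNING"},
                       interval=3600)
rl.status(); rl.status(); rl.status()
print(len(calls))  # three polls, but only one provider callout
```

Where this wrapper lives in the stack is exactly what the PR changes: above the cache, callers can poll as often as they like.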
This current PR moves that rate limit/cache deeper into the stack, into the BlockProviderExecutor. The job status poller code now polls the executor on every iteration, relying on the executor to perform rate limiting and caching. That has the effect of exposing simulated statuses to the scaling strategy code on each iteration, so that the delays in issue #3235 will not be experienced. This change also means that PR #3232 can safely call executor.status() without exceeding the rate limit.

This PR changes when provider status gets called, because it is now a cache that refreshes whenever status is called past the expiry time, rather than being more explicitly driven in a loop. That might change scaling behaviour, but I think not in any significant way.

This PR changes the decision of when an executor is to be polled for status: previously, non-positive values of status_polling_interval were used as magic values to inhibit provider status polling. Now, any executor with a provider will be polled.

This PR makes the call stack above provider.status a little more data driven and less effectful (the effect being load on the LRM), which might make refactoring this stack a bit easier. However, it will also make it a bit more computation heavy - I think not by much, because usually there are not many blocks in existence.

This PR adds a test that repeatedly calling BlockProviderExecutor.status does not result in an excessive number of calls to the provider status method, but still updates eventually.
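A test along those lines can be sketched with a counting mock and a fake clock. This is a sketch, not the PR's actual test; poll here is a hypothetical stand-in for BlockProviderExecutor.status with the cache described above:

```python
from unittest.mock import Mock

# Counting stub for the provider's status method.
provider_status = Mock(return_value={"b0": "RUNNING"})


def make_poll(interval):
    """Build a rate-limited poll function over provider_status, with an
    explicit fake clock (state["now"]) so the test needs no real sleeps."""
    state = {"last": None, "cached": None, "now": 0.0}

    def poll():
        if state["last"] is None or state["now"] - state["last"] >= interval:
            state["cached"] = provider_status()
            state["last"] = state["now"]
        return state["cached"]

    return poll, state


poll, state = make_poll(interval=30.0)

for _ in range(20):          # many polls within one interval...
    poll()
assert provider_status.call_count == 1   # ...cost only one provider call

state["now"] = 60.0          # advance the fake clock past the interval
poll()
assert provider_status.call_count == 2   # the cache eventually refreshes
```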
this is to accommodate future population of blockinfo from multiple sources. How is this change tested? It *shouldn't* change anything, but scale_in is not well tested...
…htex can now make use of that information source... so I can try to make a better implementation of PR #3232
…htex can now make use of that information source... so I can try to make a better implementation of PR #3232. I think this fixes #3232.
This is to accommodate upcoming expansion of the block_info dictionary to include data from an additional source (the list of blocks known to the scaling code). This change arises from a review of PR #3232, an earlier prototype of that expansion work. Behaviourally, this shouldn't change anything: it changes invalid block accesses from key errors into blocks that are infinitely idle and completely unloaded. However, there are no indexes into block_info except in code that is adding block information, so this situation should not arise.
Prior to this PR, the scale_in code for the HighThroughputExecutor will not scale in any block that has not had at least one manager register with the interchange, because it retrieves the list of blocks from the interchange. This is documented in issue #2627. This PR makes the htex scale-in code also pay attention to blocks in the status_facade list, which includes blocks that have been submitted and blocks which have been reported by the provider mechanism.
This PR will make the htex scale-in code consider blocks which have been launched through the provider mechanism but have not registered with the interchange. This is especially relevant at shutdown time, when blocks still waiting in the queue (and so not registered) would previously be abandoned by Parsl shutdown.
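A minimal sketch of combining the two block sources described above. The signatures are invented for illustration: connected_managers as a list of dicts and status_facade as a block-id mapping are assumptions, not parsl's actual API:

```python
def scale_in_candidates(connected_managers, status_facade):
    """Union of blocks that have a registered manager and blocks known
    to the scaling code (submitted, and/or reported by the provider).
    Sketch only; names and shapes are illustrative."""
    registered = {m["block_id"] for m in connected_managers}
    known = set(status_facade)        # block_id -> job status mapping
    return registered | known


managers = [{"block_id": "b0"}]                # registered with the interchange
facade = {"b0": "RUNNING", "b1": "PENDING"}    # b1 never registered (e.g. queued)
print(sorted(scale_in_candidates(managers, facade)))  # ['b0', 'b1']
```

With only the interchange source, b1 would be invisible to scale-in; taking the union is what lets shutdown cancel blocks still waiting in the queue.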
[The first commit on this PR is a test that should break; when that happens, I'll push the rest. This is to get a demonstrated test failure in CI.]
Changed Behaviour
More blocks scaling in. Tidier shutdown.
Fixes
Fixes #2627
Type of change