On worker error, wait for worker initialization if one or more are initializing before sending request #5629

Closed
mhoeger opened this issue Feb 11, 2020 · 3 comments · Fixed by #5668

@mhoeger
Contributor

mhoeger commented Feb 11, 2020

We should not send requests while no worker channels are available AND we're currently trying to start one up. This is kind of similar to #5624, but in that case no worker channels are available and we're NOT currently trying to start one (because things are shutting down).

In one case, I see a worker process executing fine and then running into an uncaught exception (which publishes a WorkerErrorEvent). We dispose the worker channel and go through "Initiating worker Process start up".

The Function Invocation Buffer and everything else is set up. However, the replacement worker hasn't finished starting yet and so isn't found, likely due to this code:

IEnumerable<IRpcWorkerChannel> initializedWorkers = workerChannels.Where(ch => ch.State == RpcWorkerChannelState.Initialized);
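
A rough sketch of why the restarting worker "isn't found", using simplified stand-in types (ChannelState, FakeWorkerChannel and ChannelFilterDemo are hypothetical names for illustration, not the host's real IRpcWorkerChannel or RpcWorkerChannelState): a channel that is still initializing is dropped by the filter, so the caller sees zero usable workers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for illustration only; the real host types carry far more state.
public enum ChannelState { Default, Initializing, Initialized }

public sealed class FakeWorkerChannel
{
    public FakeWorkerChannel(string id, ChannelState state)
    {
        Id = id;
        State = state;
    }

    public string Id { get; }

    public ChannelState State { get; set; }
}

public static class ChannelFilterDemo
{
    // Same shape as the filter quoted above: only fully initialized channels are eligible.
    public static List<FakeWorkerChannel> GetInitialized(IEnumerable<FakeWorkerChannel> channels)
    {
        return channels.Where(ch => ch.State == ChannelState.Initialized).ToList();
    }
}
```

With a single channel in the Initializing state (the situation right after a worker restart), GetInitialized returns an empty list; the same channel is only returned once its state becomes Initialized.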

@ghost ghost assigned ankitkumarr Feb 11, 2020
@mhoeger mhoeger assigned mhoeger and unassigned ankitkumarr Feb 11, 2020
@mhoeger mhoeger added this to the Functions Sprint 69 milestone Feb 11, 2020
@mhoeger mhoeger changed the title Wait for worker initialization if one or more are initializing before sending request On worker error, wait for worker initialization if one or more are initializing before sending request Feb 12, 2020
@pragnagopa
Member

cc @yojagad
Ideally the host should never send requests if there are no language workers. The following code:

private async Task DelayUntilFunctionDispatcherInitializedOrShutdown()
{
    if (_functionDispatcher != null && _functionDispatcher.State == FunctionInvocationDispatcherState.Initializing)
    {
        _logger.LogDebug($"functionDispatcher state: {_functionDispatcher.State}");
        bool result = await Utility.DelayAsync((_functionDispatcher.ErrorEventsThreshold + 1) * WorkerConstants.ProcessStartTimeoutSeconds, WorkerConstants.WorkerReadyCheckPollingIntervalMilliseconds, () =>
        {
            return _functionDispatcher.State != FunctionInvocationDispatcherState.Initialized;
        });
        if (result)
        {
            _logger.LogError($"Final functionDispatcher state: {_functionDispatcher.State}. Initialization timed out and host is shutting down");
            _applicationLifetime.StopApplication();
        }
    }
}

waits until at least one language worker starts. If starting a worker continues to fail, the Functions host is shut down.
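
For readers unfamiliar with the pattern, here is a minimal sketch of the "poll while a condition holds, give up after a timeout" behavior that the method above relies on. WaitWhileAsync and PollingSketch are hypothetical names, and this is an assumption about the general shape only, not a copy of the host's Utility.DelayAsync. As in the code above, a result of true means the timeout elapsed while the condition was still true.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollingSketch
{
    // Hypothetical helper: re-checks keepWaiting() every pollInterval until it
    // returns false or the timeout elapses.
    // Returns true if the condition still held on the last check (timed out),
    // false if the condition cleared.
    public static async Task<bool> WaitWhileAsync(TimeSpan timeout, TimeSpan pollInterval, Func<bool> keepWaiting)
    {
        var elapsed = Stopwatch.StartNew();
        bool stillWaiting = keepWaiting();
        while (stillWaiting && elapsed.Elapsed < timeout)
        {
            await Task.Delay(pollInterval);
            stillWaiting = keepWaiting();
        }

        return stillWaiting;
    }
}
```

In the dispatcher case, the condition would be "state is not yet Initialized", so a true result corresponds to the branch that logs the error and stops the application.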

Needs more investigation.

@mhoeger
Contributor Author

mhoeger commented Feb 12, 2020

The code above only takes effect when the dispatcher state is FunctionInvocationDispatcherState.Initializing. Maybe if we are down to 0 workers, we should set the dispatcher state back to "Initializing".
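
Roughly what I have in mind, as a sketch with made-up types (DispatcherState and DispatcherStateTracker are hypothetical stand-ins, not the real dispatcher): track how many workers are initialized, and when an error takes the count to zero, move the state from Initialized back to Initializing so the existing delay logic applies again.

```csharp
using System;

// Stand-in for the dispatcher state enum; values are illustrative.
public enum DispatcherState { Default, Initializing, Initialized, Disposing }

public sealed class DispatcherStateTracker
{
    private readonly object _gate = new object();
    private int _initializedWorkers;

    public DispatcherState State { get; private set; } = DispatcherState.Default;

    public void OnWorkerInitialized()
    {
        lock (_gate)
        {
            _initializedWorkers++;
            State = DispatcherState.Initialized;
        }
    }

    public void OnWorkerErrored()
    {
        lock (_gate)
        {
            if (_initializedWorkers > 0)
            {
                _initializedWorkers--;
            }

            // Only fall back from Initialized; a Disposing dispatcher stays as it is.
            if (_initializedWorkers == 0 && State == DispatcherState.Initialized)
            {
                State = DispatcherState.Initializing;
            }
        }
    }

    public void BeginShutdown()
    {
        lock (_gate)
        {
            State = DispatcherState.Disposing;
        }
    }
}
```

The guard on the current state is there so that a worker error during shutdown does not flip the dispatcher back to Initializing, which is the situation #5624 is about.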

Each language worker channel also has an action buffer that delays requests until the channel is ready, though:

var disposableLink = _functionInputBuffers[loadResponse.FunctionId].LinkTo(invokeBlock);

If we remove this check,

IEnumerable<IRpcWorkerChannel> initializedWorkers = workerChannels.Where(ch => ch.State == RpcWorkerChannelState.Initialized);

requests will go into that queue. One potential error case is if the initializing worker ends up failing to start and we don't modify code to keep those queued requests from being lost or ending in error.
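
A small sketch of that queueing behavior using TPL Dataflow (System.Threading.Tasks.Dataflow). BufferLinkDemo and the string payloads are placeholders I made up; the real buffers carry invocation contexts. An item posted before any target is linked simply sits in the BufferBlock, and it is only processed once an ActionBlock is linked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferLinkDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var processed = new List<string>();
        var inputBuffer = new BufferBlock<string>();

        // Posted before any worker is ready: the item waits in the buffer.
        inputBuffer.Post("invocation-1");

        // Later the worker becomes ready and links its invoke block.
        // ActionBlock processes one item at a time by default, so List.Add is safe here.
        var invokeBlock = new ActionBlock<string>(item => processed.Add(item));
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        using (inputBuffer.LinkTo(invokeBlock, linkOptions))
        {
            // Completing the buffer lets completion flow to the invoke block after it drains.
            inputBuffer.Complete();
            await invokeBlock.Completion;
        }

        return processed;
    }
}
```

If the worker never reaches the point where it calls LinkTo (for example, because it fails to start), the posted item stays in the buffer indefinitely, which is the lost-request case described above.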

I think either approach would resolve these issues, though.

@pragnagopa
Member

Thanks for the additional details. The check that verifies the status of the worker channel:

IEnumerable<IRpcWorkerChannel> initializedWorkers = workerChannels.Where(ch => ch.State == RpcWorkerChannelState.Initialized);

is needed to avoid sending requests before the buffers are set up.

Resetting FunctionDispatcherState is the right approach!

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2020