Skip to content

Conversation

B4nan
Copy link
Member

@B4nan B4nan commented Nov 19, 2021

Runs given handler and rejects with the given errorMessage (or Error instance) after given timeoutMillis, unless the original promise resolves or rejects earlier. Use tryCancel() function inside the handler after each await to finish its execution early when the timeout appears.

import { addTimeoutToPromise, tryCancel } from '@apify/timeout';

async function doSomething() {
    await doSomethingTimeConsuming();
    tryCancel();
    await doSomethingElse();
    tryCancel();
}

const res = await addTimeoutToPromise(
  () => handler(),
  200,
  'Handler timed out after 200ms!',
);

This uses AsyncLocalStorage to hold the AbortController instances created inside the addTimeoutToPromise function. The context is shared for this package, and respects nesting of addTimeoutToPromise calls. This means it can be used in different packages to cancel the same nested handler, e.g. we can create the timeout in SDK and use tryCancel from inside browser pool (we can still handle the timeouts explicitly in browser pool, and it will automatically used the upper context if there is some0.

@szmarczak
Copy link

Do we need AsyncLocalStorage at all? I've tried

export async function addTimeoutToPromise(handler, timee, errorMessage) {
    const cancelTimeout = new AbortController();

    return Promise.race([
        (async () => {
            try {
                await setTimeoutAsync(timee, undefined, { signal: cancelTimeout.signal });
            } catch {
                return;
            }

            throw new Error(errorMessage);
        })(),
        handler.finally(() => {
            cancelTimeout.abort();
        }),
    ]);
}

and it works as well.

@B4nan
Copy link
Member Author

B4nan commented Nov 19, 2021

And how are you going to implement the early escape via tryCancel()? That is why we need ALS, so we don't have to maintain the AbortController instances (and we automatically handle parent contexts, even cross package ones, which is needed for our use case).

edit: I believe your version is basically just a half of the problem, you handle just cancelling of the timeout promise, our issue is with cancelling the handler itself. we need to escape the handler as early as possible to avoid sideeffects (like those double processed requests).

@szmarczak
Copy link

We could pass tryCancel as an argument to handler. But yeah, I can see AsyncLocalStorage being more feasible 👍

@szmarczak
Copy link

szmarczak commented Nov 19, 2021

we need to escape the handler as early as possible to avoid sideeffects (like those double processed requests).

There is still a chance that it will time out when sending the actual HTTP request (when updating the storage). I think that it would be better to pass signal instead.


However it's still possible that the server will acknowledge the payload. So we'd have to make an additional request to check whether the data has been set or not.

@B4nan
Copy link
Member Author

B4nan commented Nov 19, 2021

There is still a chance that it will time out when sending the actual HTTP request (when updating the storage). I think that it would be better to pass signal instead.

We still can (once got supports this, it's not there currently, right? not sure about pup/pw). The signal is exposed via storage.getStore()?.cancelTask.signal (we can add a shortcut for that if we need it). But what I am trying to solve now is not about http timeouts.

@szmarczak
Copy link

Ah, right. In this case it would look like

if (signal) {
	gotPromise.cancel();
	gotPromise.catch(() => {});
}

LGTM.


await new Promise((resolve, reject) => {
storage.run(context, () => {
Promise.race([timeout(), wrap()]).then(resolve, reject);
Copy link
Member

@mnmkng mnmkng Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do timeouts with Promise.race() and ended up with memory leaks, because even though the timeout fired and the race resolved, the other promise kept existing and it often held a lot of data, e.g. full page HTML. Please make sure this will not happen here. It was the reason to create the addTimeoutToPromise function in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it will happen as we are explicitly aborting them (also I can see issues with the eager promise execution which is not present here). I did test this locally directly with SDK and puppeteer crawler. Will do more extensive testing once its handled in SDK master.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do timeouts with Promise.race() and ended up with memory leaks, because even though the timeout fired and the race resolved, the other promise kept existing and it often held a lot of data

This promise has no right to exist and should be collected by garbage collector. AFAIK, hanging promises are generally safe in JS. If it leaks then probably it's a bug in V8.

Copy link
Member

@mnmkng mnmkng Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this issue apify/crawlee#263 Not sure if it's still relevant, but at that time, it was an absolutely critical fix.

Edit: The SO link does not seem to work anymore 😢 But I found the related Node.js issue nodejs/help#1544 It seems to be resolved, but who knows.

Comment on lines +44 to +50
const res = await addTimeoutToPromise(
() => handler(),
200,
'timed out',
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth using fake timers to ensure the tests aren't flaky under load. But it might not be possible in this complex scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets deal with that when it starts happening, it feels like it will be safer to test it works with real timeouts properly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @B4nan. We're using fake timers in tests in Got and some internal functions need to be patched. I spent hours trying to get them to work and in the end gave up. We can always run hundreds of small, real timeouts instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we were not using fake timers in the SDK and we had A LOT of flaky tests 😄 But, ok, let's wait what happens 👍

Runs given handler and rejects with the given errorMessage (or Error instance)
after given timeoutMillis, unless the original promise resolves or rejects earlier.
Use `tryCancel()` function inside the handler after each await to finish its execution
early when the timeout appears.

```ts
async function doSomething() {
    await doSomethingTimeConsuming();
    tryCancel();
    await doSomethingElse();
    tryCancel();
}

const res = await addTimeoutToPromise(
  () => handler(),
  200,
  'Handler timed out after 200ms!',
);
```
@B4nan B4nan merged commit 9a54214 into master Nov 22, 2021
@B4nan B4nan deleted the timeout branch November 22, 2021 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants