Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ package | version
`@apify/markdown` | [![NPM version](https://img.shields.io/npm/v/@apify/markdown.svg)](https://www.npmjs.com/package/@apify/markdown)
`@apify/salesforce_client` | [![NPM version](https://img.shields.io/npm/v/@apify/salesforce_client.svg)](https://www.npmjs.com/package/@apify/salesforce_client)
`@apify/utilities` | [![NPM version](https://img.shields.io/npm/v/@apify/utilities.svg)](https://www.npmjs.com/package/@apify/utilities)
`@apify/timeout` | [![NPM version](https://img.shields.io/npm/v/@apify/timeout.svg)](https://www.npmjs.com/package/@apify/timeout)

Internal utilities and constants shared across <a href="https://www.apify.com">Apify</a> projects.
Unless you work at Apify ([interested?](https://apify.com/jobs)), you shouldn't use this package directly,
Expand Down
36 changes: 36 additions & 0 deletions packages/timeout/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
{
"name": "@apify/timeout",
"version": "0.1.0",
"description": "Helper for adding timeouts to promises that support easy cancellation.",
"main": "dist/index.js",
"typings": "dist/index.d.ts",
"keywords": [
"apify"
],
"author": {
"name": "Apify",
"email": "support@apify.com",
"url": "https://apify.com"
},
"contributors": [
"Martin Adamek <martin@apify.com>"
],
"license": "Apache-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/apify/apify-shared-js"
},
"bugs": {
"url": "https://github.com/apify/apify-shared-js/issues"
},
"homepage": "https://apify.com",
"scripts": {
"build": "npm run clean && npm run compile && npm run copy",
"clean": "rimraf ./dist",
"compile": "tsc -p tsconfig.build.json",
"copy": "ts-node -T ../../scripts/copy.ts"
},
"publishConfig": {
"access": "public"
}
}
111 changes: 111 additions & 0 deletions packages/timeout/src/index.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
// eslint-disable-next-line max-classes-per-file
import { setTimeout } from 'timers/promises';
import { AsyncLocalStorage } from 'async_hooks';

export interface AbortContext {
cancelTimeout: AbortController;
cancelTask: AbortController;
}

/**
* `AsyncLocalStorage` instance that is used for baring the AbortContext inside user provided handler.
* We can use it to access the `AbortContext` instance via `storage.getStore()`, and there we can access
* both `cancelTimeout` and `cancelTask` instances of `AbortController`.
*/
export const storage = new AsyncLocalStorage<AbortContext>();

/**
* Custom error class that will be used for timeout error.
*/
export class TimeoutError extends Error {
}

/**
* Custom error class to handle `tryCancel()` checks.
* This should not be exposed to user land, as it will be caught in.
*/
class InternalTimeoutError extends TimeoutError {
}

/**
* Checks whether we are inside timeout handler created by this package, and cancels current
* task execution by throwing `TimeoutError`. This error will be ignored if the promise timed
* out already, or explicitly skipped in `addTimeoutToPromise`.
*
* Use this function after every async call that runs inside the timeout handler:
*
* ```ts
* async function doSomething() {
* await doSomethingTimeConsuming();
* tryCancel();
* await doSomethingElse();
* tryCancel();
* }
* ```
*/
export function tryCancel(): void {
const signal = storage.getStore()?.cancelTask.signal;

if (signal?.aborted) {
throw new InternalTimeoutError('Promise handler has been canceled due to a timeout');
}
}

/**
* Runs given handler and rejects with the given `errorMessage` (or `Error` instance)
* after given `timeoutMillis`, unless the original promise resolves or rejects earlier.
* Use `tryCancel()` function inside the handler after each await to finish its execution
* early when the timeout appears.
*
* ```ts
* const res = await addTimeoutToPromise(
* () => handler(),
* 200,
* 'Handler timed out after 200ms!',
* );
* ```
*/
export async function addTimeoutToPromise<T>(handler: () => Promise<T>, timeoutMillis: number, errorMessage: string | Error): Promise<T> {
// respect existing context to support nesting
const context = storage.getStore() ?? {
cancelTimeout: new AbortController(),
cancelTask: new AbortController(),
};
let returnValue: T;

// calls handler, skips internal `TimeoutError`s that might have been thrown
// via `tryCancel()` and aborts the timeout promise after the handler finishes
const wrap = async () => {
try {
returnValue = await handler();
} catch (e) {
if (!(e instanceof InternalTimeoutError)) {
throw e;
}
} finally {
context.cancelTimeout.abort();
}
};

const timeout = async () => {
try {
await setTimeout(timeoutMillis, undefined, { signal: context.cancelTimeout.signal });
context.cancelTask.abort();
} catch (e) {
// ignore rejections (task finished and timeout promise has been cancelled)
return;
}

throw errorMessage instanceof Error
? errorMessage
: new TimeoutError(errorMessage);
};

await new Promise((resolve, reject) => {
storage.run(context, () => {
Promise.race([timeout(), wrap()]).then(resolve, reject);
Copy link
Member

@mnmkng mnmkng Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do timeouts with Promise.race() and ended up with memory leaks, because even though the timeout fired and the race resolved, the other promise kept existing and it often held a lot of data, e.g. full page HTML. Please make sure this will not happen here. It was the reason to create the addTimeoutToPromise function in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it will happen as we are explicitly aborting them (also I can see issues with the eager promise execution which is not present here). I did test this locally directly with SDK and puppeteer crawler. Will do more extensive testing once its handled in SDK master.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to do timeouts with Promise.race() and ended up with memory leaks, because even though the timeout fired and the race resolved, the other promise kept existing and it often held a lot of data

This promise has no right to exist and should be collected by garbage collector. AFAIK, hanging promises are generally safe in JS. If it leaks then probably it's a bug in V8.

Copy link
Member

@mnmkng mnmkng Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this issue apify/crawlee#263 Not sure if it's still relevant, but at that time, it was an absolutely critical fix.

Edit: The SO link does not seem to work anymore 😢 But I found the related Node.js issue nodejs/help#1544 It seems to be resolved, but who knows.

});
});

return returnValue!;
}
7 changes: 7 additions & 0 deletions packages/timeout/tsconfig.build.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
{
"extends": "../../tsconfig.build.json",
"compilerOptions": {
"outDir": "./dist"
},
"include": ["src/**/*"]
}
4 changes: 4 additions & 0 deletions packages/timeout/tsconfig.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"extends": "../../tsconfig.json",
"include": ["src/**/*"]
}
91 changes: 91 additions & 0 deletions test/timeout.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
describe('timeout with abort controller', () => {
it('empty', async () => {
// empty test so jest won't fail in node < 15 due to no tests being executed
});

const [nodeVersion] = process.versions.node.split('.', 1);

if (+nodeVersion < 15) {
// skip tests as this package requires node 15+
return;
}

// we need to import via require after the above check to not fail on node < 15
// eslint-disable-next-line @typescript-eslint/no-var-requires,global-require
const { addTimeoutToPromise, tryCancel } = require('@apify/timeout');
// eslint-disable-next-line @typescript-eslint/no-var-requires,global-require
const { setTimeout } = require('timers/promises');

let position = 0;

async function handler() {
await setTimeout(30);
tryCancel();
position++;
await setTimeout(30);
tryCancel();
position++;
await setTimeout(30);
tryCancel();
position++;
await setTimeout(30);
tryCancel();
position++;
await setTimeout(30);
tryCancel();
position++;

return 123;
}

beforeEach(() => {
position = 0;
});

it('passes without timeouting', async () => {
const res = await addTimeoutToPromise(
() => handler(),
200,
'timed out',
);
Comment on lines +46 to +50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth using fake timers to ensure the tests aren't flaky under load. But it might not be possible in this complex scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets deal with that when it starts happening, it feels like it will be safer to test it works with real timeouts properly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @B4nan. We're using fake timers in tests in Got and some internal functions need to be patched. I spent hours trying to get them to work and in the end gave up. We can always run hundreds of small, real timeouts instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we were not using fake timers in the SDK and we had A LOT of flaky tests 😄 But, ok, let's wait what happens 👍

expect(res).toBe(123);
expect(position).toBe(5);
});

it('timeouts in the middle', async () => {
await expect(addTimeoutToPromise(
() => handler(),
100,
'timed out',
)).rejects.toThrowError();
expect(position).toBe(3);
});

it('timeouts with given error instance', async () => {
const err = new Error('123');
await expect(addTimeoutToPromise(
() => handler(),
100,
err,
)).rejects.toThrowError(err);
expect(position).toBe(3);
});

it('timeouts with nesting', async () => {
// this will timeout and cause failure, but it will happen sooner than in 200ms, so err 1 will be thrown
async function handler2() {
await addTimeoutToPromise(
() => handler(),
100,
'err 1',
);
}

await expect(addTimeoutToPromise(
() => handler2(),
200,
'err 2',
)).rejects.toThrowError('err 1');
expect(position).toBe(3);
});
});