fix: change sequential approach to parallel for Iterator first page #3402
25 sounds all right to me. But if you wanna be safe, you can ask someone from other teams on Slack, I guess.
Now I see you used this only in the memory storage. In that case 25 sounds all right, and it makes no sense to raise this on Slack since we are not firing any API requests at all, right?
true 👍
barjin left a comment
Looks pretty straightforward, thank you @l2ysho!
One note: this doesn't (fully) solve #3395, as the iterator still waits for the full page to be loaded before yielding the first item. This is what I'm describing in #3395 (comment).
Nonetheless, this is a step forward. 👍
const limiter = pLimit(GET_RECORD_CONCURRENCY);
const results = await Promise.allSettled(
    keysToFetch.map((item) =>
        limiter(() => getRecord(item.key).then((record) => ({ key: item.key, record }))),
nit: record is already { key: string, value: any }, so maybe you can just pass record here?
crawlee/packages/types/src/storages.ts
Lines 124 to 127 in 136d368
yop, I should get rid of the IIFE and figure out some lazy loading for the first page 🤔 But it started to be complex to read even without this. I will merge this as is and we can iterate on it later.
Agreed, please make/reopen an issue for that (the original one got closed by this PR).
Batch concurrent record fetching in KVS values() and entries() Promise path

Summary

The values() and entries() methods on KeyValueStoreClient previously fetched records sequentially in their eager Promise path (used when callers await the result directly). For stores with many keys, this meant each getRecord() call blocked before the next could start. This change replaces the sequential loop with Promise.allSettled gated by p-limit (concurrency of 25), fetching records in parallel while preserving order and gracefully skipping missing keys.

closes #3395
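
For readers who want to see the pattern in isolation, here is a minimal sketch of concurrency-limited fetching with Promise.allSettled. It is not the code from this PR: createLimiter is a hypothetical hand-rolled stand-in for p-limit so the example has no dependencies, and fetchRecords, KvRecord, and the getRecord callback signature are assumed names for illustration. The sketch also assumes that rejected fetches are skipped the same way as missing keys, which the PR may or may not do.

```typescript
// Assumed record shape, mirroring the { key, value } shape mentioned in the review.
interface KvRecord {
    key: string;
    value: unknown;
}

// Hypothetical stand-in for p-limit: runs at most `concurrency` tasks at a time.
function createLimiter(concurrency: number) {
    let active = 0;
    const queue: Array<() => void> = [];

    // Start the next queued task if a slot is free.
    const next = (): void => {
        if (active >= concurrency) return;
        const run = queue.shift();
        if (run) {
            active++;
            run();
        }
    };

    return <T>(task: () => Promise<T>): Promise<T> =>
        new Promise<T>((resolve, reject) => {
            queue.push(() => {
                // Free the slot before settling, so queued tasks can start.
                const done = (): void => {
                    active--;
                    next();
                };
                Promise.resolve()
                    .then(task)
                    .then(
                        (value) => { done(); resolve(value); },
                        (error) => { done(); reject(error); },
                    );
            });
            next();
        });
}

// Fetch all keys with bounded parallelism. Promise.allSettled returns results
// in input order, so the output order matches `keys` regardless of timing.
async function fetchRecords(
    keys: string[],
    getRecord: (key: string) => Promise<KvRecord | undefined>,
    concurrency = 25,
): Promise<KvRecord[]> {
    const limit = createLimiter(concurrency);
    const settled = await Promise.allSettled(
        keys.map((key) => limit(() => getRecord(key))),
    );

    const records: KvRecord[] = [];
    for (const result of settled) {
        // Skip missing keys (undefined) and, by assumption, failed fetches.
        if (result.status === 'fulfilled' && result.value !== undefined) {
            records.push(result.value);
        }
    }
    return records;
}
```

Note that this still waits for every fetch to settle before returning anything, which is the limitation raised in the review above: the first item is not available until the whole page has loaded.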