Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(KeyValueStore): big buffers should not crash #1734

Merged
merged 2 commits into from
Jan 12, 2023

Conversation

vladfrangu
Copy link
Member

Closes #1732
Closes #1710

Simple fix really, tell shapeshift inputs can be buffers too (I'll also see if we can add a generic "just check that its an object" in shapeshift for times like these)

@corford
Copy link
Contributor

corford commented Jan 11, 2023

Simple fix really, tell shapeshift inputs can be buffers too (I'll also see if we can add a generic "just check that its an object" in shapeshift for times like these)

@vladfrangu Have you confirmed this also fixes the performance issues? In our issue (#1710) we experienced horrendous perf problems from shapeshift handling Buffers (to the extent it started blocking Node's event loop). Asking as I see the buffer test in this PR is also showing perf issues.

@vladfrangu
Copy link
Member Author

Its most likely that you're having those performance issues due to shapeshift interpreting the buffer as an object, and trying to parse its fields (given that a buffer is basically an array of bytes, you can guess what happens when size is big)

This PR should shortcircuit that check since any buffer is an instance of a buffer, and thus speed up tremendously... That said, if you can test it yourself that would be awesome! I'll try it out locally too)

We'll test locally
@B4nan B4nan merged commit 2f682f7 into master Jan 12, 2023
@B4nan B4nan deleted the fix/buffers-causing-crashes-when-saving-to-kvs branch January 12, 2023 15:08
@Umisyus
Copy link

Umisyus commented Jan 14, 2023

Thanks, glad you worked out the solution quickly. Have a gr8 year everyone!

@corford
Copy link
Contributor

corford commented Mar 22, 2023

@vladfrangu @B4nan Some bad news: we've just tested this (via Crawle v3.3.0) and the issue persists. It's slightly less pervasive than prior to this PR but still present (affects about 40% of our spiders, with same symptom as before: pegged event loop, causing degraded request throughput and in many cases spider crash).

Workaround for us has been to apply the following patch to node_modules/@crawlee/memory-storage/resource-clients/key-value-store.js:

@@ -192,11 +192,13 @@
         return record;
     }
     async setRecord(record) {
-        shapeshift_1.s.object({
-            key: shapeshift_1.s.string.lengthGreaterThan(0),
-            value: shapeshift_1.s.union(shapeshift_1.s.null, shapeshift_1.s.string, shapeshift_1.s.number, shapeshift_1.s.instance(Buffer), shapeshift_1.s.object({}).passthrough),
-            contentType: shapeshift_1.s.string.lengthGreaterThan(0).optional,
-        }).parse(record);
+        if (!Buffer.isBuffer(record?.value)) {
+            shapeshift_1.s.object({
+                key: shapeshift_1.s.string.lengthGreaterThan(0),
+                value: shapeshift_1.s.union(shapeshift_1.s.null, shapeshift_1.s.string, shapeshift_1.s.number, shapeshift_1.s.instance(Buffer), shapeshift_1.s.object({}).passthrough),
+                contentType: shapeshift_1.s.string.lengthGreaterThan(0).optional,
+            }).parse(record);
+        }
         // Check by id
         const existingStoreById = await (0, cache_helpers_1.findOrCacheKeyValueStoreByPossibleId)(this.client, this.name ?? this.id);
         if (!existingStoreById) {
@@ -204,8 +206,8 @@
         }
         const { key } = record;
         let { value, contentType } = record;
-        const valueIsStream = (0, utils_1.isStream)(value);
-        const isValueStreamOrBuffer = valueIsStream || (0, utils_1.isBuffer)(value);
+        const valueIsStream = Buffer.isBuffer(value) ? false : (0, utils_1.isStream)(value);
+        const isValueStreamOrBuffer = valueIsStream || Buffer.isBuffer(value);
         // To allow saving Objects to JSON without providing content type
         if (!contentType) {
             if (isValueStreamOrBuffer)

Applying the patch immediately resolved the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants