Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect when user calls global SDK methods from "side-effect free" context #2364

Closed
janbuchar opened this issue Feb 26, 2024 · 1 comment · Fixed by #2371
Closed

Detect when user calls global SDK methods from "side-effect free" context #2364

janbuchar opened this issue Feb 26, 2024 · 1 comment · Fixed by #2371
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@janbuchar
Copy link
Contributor

This is mainly motivated by the new AdaptivePlaywrightCrawler. If the user calls e.g. Actor.pushData instead of the respective context helper, the functionality falls flat (and the dataset will become corrupted on top of that).

We should find a way to emit a warning when the user does something like this from a request handler when an adaptive crawler is running (for example).

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 26, 2024
@janbuchar
Copy link
Contributor Author

janbuchar commented Feb 27, 2024

Proposed solution:

  • Add a checkStorageWrite helper to @crawlee/utils, plus an AsyncLocalStorage instance (e.g. storageWriteChecker) that contains a callback invoked by checkStorageWrite
  • When a request handler is called from AdaptivePlaywrightCrawler, wrap it in storageWriteChecker.run with a callback that just raises an exception - this prevents request handlers from writing into storage directly
    • commitResult is called outside of the storageWriteChecker
  • a checkStorageWrite call needs to be added to each method of MemoryStorage and ApifyClient
    • or this could be done in a more dynamic fashion (monkey patching) somewhere in StorageManager
    • just checking in storages.Dataset and the like should be enough, too, and it's probably the easiest option

janbuchar added a commit that referenced this issue Feb 29, 2024
…unwanted side effects in adaptive crawler (#2371)

- closes #2364

---------

Co-authored-by: Martin Adámek <banan23@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.
Projects
None yet
1 participant