Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Request Queue v2 #1975

Merged
merged 28 commits into from Sep 21, 2023
Merged

feat: Request Queue v2 #1975

merged 28 commits into from Sep 21, 2023

Conversation

vladfrangu
Copy link
Member

@vladfrangu vladfrangu commented Jul 12, 2023

Closes #1365

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few initial comments. it would be good to use a common predecessor, as now it's hard to see what actually changed in a 1000 LoC file that is mostly a copy of the v1 approach.

packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
@B4nan
Copy link
Member

B4nan commented Aug 8, 2023

@drobnikj can you please take a look as well

@vladfrangu I also see the puppeteer e2e tests failed on platform due to a chrome version mismatch, any idea why is that? maybe its just flakyness (since the scheduled run passed just fine), but it feels pretty weird

@B4nan B4nan requested review from drobnikj and barjin August 8, 2023 14:54
@B4nan
Copy link
Member

B4nan commented Aug 8, 2023

We will also surely need some unit tests for this, I don't want you to rewrite all existing tests to support v2, but with e2e tests we won't have such a confidence (as they won't run on each PR).

Copy link
Member

@drobnikj drobnikj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that is a lot of code, looks good as a good start, on top of the comments I miss these parts from spec.:

  • If the actor is migrating(??aborting/exiting/failing), it should unlock all requests which locked
  • Ensuring that the queue is empty should wait for the lockSecs after it is empty to potential request lock can be expired.

packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
@vladfrangu
Copy link
Member Author

Ready for another pair of reviews! Still need to add some tests for this, but I'm more confident it should just work

Copy link
Member

@drobnikj drobnikj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of the code is much easier to read 👍

I'm still not sure if the local cache would not mess queue in parallel execution, can we do a simple test where we run multiple scrapers on one large site with shared request queue?

packages/core/src/request.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_provider.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_provider.ts Outdated Show resolved Hide resolved
packages/core/src/storages/request_queue_v2.ts Outdated Show resolved Hide resolved
@vladfrangu vladfrangu marked this pull request as ready for review September 11, 2023 08:41
@github-actions github-actions bot added this to the 72nd sprint - Tooling team milestone Sep 14, 2023
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Sep 14, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@vladfrangu
Copy link
Member Author

vladfrangu commented Sep 18, 2023

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@B4nan B4nan added the tech roadmap Issues contributing to tech roadmap. label Sep 19, 2023
@B4nan
Copy link
Member

B4nan commented Sep 19, 2023

FYI for the PR toolkit to pass, every PR needs to have an estimate and a correct label (or be connected to an issue that has those - the link needs to be set via zenhub, its unfortunately not enough to provide the closes #123 part, at least not after the PR is created).

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Pull Request Tookit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

Copy link
Member

@drobnikj drobnikj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits, I would like to test it once the testing version will be publish npm

Copy link
Member

@drobnikj drobnikj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits, I would like to test it once the testing version will be publish npm

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is good to go?

Copy link
Contributor

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End of an era, really 😄

Looks good to me, let's see how it runs

@Strajk
Copy link
Contributor

Strajk commented Sep 21, 2023

tenor-166287321

@B4nan B4nan merged commit 70a77ee into master Sep 21, 2023
10 of 11 checks passed
@B4nan B4nan deleted the feat/request-queue-v2 branch September 21, 2023 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team. tech roadmap Issues contributing to tech roadmap. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request queue v2 - locking
5 participants