New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adaptive playwright crawler #2316
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/rendering-type-prediction.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/rendering-type-prediction.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
packages/playwright-crawler/src/internals/rendering-type-prediction.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, I haven't checked this in a while... good job!
Given we mark this as experimental, I have no problem merging + releasing it and maybe having the marketing team make some promo for this so we get some real-life feedback.
I'm just still a bit unsure about the portadom
- IMO it's good for the experiment right now, but 6 month stale package with no other dependents than us... just scares me a bit 😄
Same |
(I'm sure there was some reason, but it was too long ago, sorry) - why don't we use |
|
A-ha, I see now... clever! It's a bit shame that there is a new API now (I thought we could switch between crawlers just by changing the constructor name), but given the circumstances, this is imo a good compromise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i got just few nits and documentation notes, plus:
- lets remove
portadom
now if it's not tightly coupled to the implementation anyhow and we have a simple but good enough alternative - we might want to move the
querySelector
to all the crawlers just like we have theparseWithCheerio
(can happen later once we see this is the right direction) - as you mentioned yourself, the jsdoc for the adaptive crawler needs some work before we merge
- ran the e2e tests and they are passing, would be nice to have one for the adaptive crawling too
packages/playwright-crawler/src/internals/adaptive-playwright-crawler.ts
Outdated
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/utils/rendering-type-prediction.ts
Show resolved
Hide resolved
packages/playwright-crawler/src/internals/utils/rendering-type-prediction.ts
Outdated
Show resolved
Hide resolved
btw can you fix the lock file so the tests can run here? |
d24d78b
to
0e5eacd
Compare
so did you modify the lock file by hand or what was it about? :D |
rebased on master, dropped the original yarn-updating commit, ran |
This uses the newly added restricted crawling contexts to execute request handlers. This allows us to compare browser and http-only request handler runs for a request and switch to http-only crawling on sites that we predict to be static. More information will be added here.
The intended usage is as follows:
When handling a request from the queue, the crawler