enqueueLinks only checks the strategy on the input URLs but they can redirect outside of the domain #2173

Closed
1 task
B4nan opened this issue Nov 8, 2023 · 0 comments · Fixed by #2238
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

B4nan (Member) commented Nov 8, 2023

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

When you call enqueueLinks on a page whose links point to the same domain but redirect outside of it, the redirected request still ends up added to the queue and processed.

Example URL with redirect: https://www.menicka.cz/redirect.php?w=akce&id=f1ab8ae200bddaa17fd50150943d1e06

We should probably store the strategy that was used in the request's userData (inside the internal __crawlee object) and check it again after the navigation, this time against request.loadedUrl.
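A minimal sketch of what that post-navigation check could look like. This is not Crawlee's actual implementation: the `Strategy` type, the `QueuedRequest` shape, the `enqueueStrategy` key and both helper functions are hypothetical names chosen for illustration, and the `same-domain` strategy is left out because it needs a public-suffix lookup. The sketch compares the request's original URL with its loaded URL, on the assumption that the original URL already passed the strategy check at enqueue time.

```typescript
// Hypothetical subset of strategies; 'same-domain' is omitted on purpose.
type Strategy = 'all' | 'same-hostname' | 'same-origin';

// Hypothetical request shape: only the fields this sketch needs.
interface QueuedRequest {
    url: string;
    loadedUrl?: string;
    userData: { __crawlee?: { enqueueStrategy?: Strategy } };
}

// Returns true when loadedUrl still satisfies the strategy relative to originalUrl.
function matchesStrategy(strategy: Strategy, originalUrl: string, loadedUrl: string): boolean {
    const original = new URL(originalUrl);
    const loaded = new URL(loadedUrl);

    if (strategy === 'same-hostname') {
        return original.hostname === loaded.hostname;
    }
    if (strategy === 'same-origin') {
        return original.origin === loaded.origin;
    }
    // 'all' places no restriction on where the request ends up.
    return true;
}

// Returns true when the request was redirected somewhere the stored strategy forbids.
function shouldSkipAfterNavigation(request: QueuedRequest): boolean {
    const strategy = request.userData.__crawlee?.enqueueStrategy;
    // Without a stored strategy or a loaded URL there is nothing to re-check.
    if (!strategy || !request.loadedUrl) {
        return false;
    }
    return !matchesStrategy(strategy, request.url, request.loadedUrl);
}
```

With a check like this, a request whose url is on www.menicka.cz but whose loadedUrl lands on another host (for example the placeholder https://example.com/landing) would be flagged under same-hostname, and the crawler could skip it before the request handler runs.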

Code sample

No response

Package version

3.5.8

Node.js version

20

Operating system

No response

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@B4nan B4nan added the bug Something isn't working. label Nov 8, 2023
@gippy gippy added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 20, 2023