
FIX Web Crawler doesn't include search params in URLs #2300

Merged

Conversation

ahmosman
Contributor

@ahmosman ahmosman commented May 1, 2024

There is an issue with the Puppeteer, Playwright and Cheerio Web Scrapers: they don't include search params from URLs.

E.g. I'd like to scrape the URL https://wiki.minthcm.org/index.php?title=Process:Running_on_Docker, but in the result the scraped URL is truncated to https://wiki.minthcm.org/index.php. The current implementation drops all the parameters after "?".

The provided fix covers this issue.
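
A minimal sketch of the behavior described above, for illustration only. The helper names `normalizeWithoutSearch` and `normalizeWithSearch` are hypothetical and are not taken from the Flowise code base; the sketch only assumes the URL is rebuilt from parts of a standard `URL` object.

```javascript
// Hypothetical helpers; the names are illustrative, not the project's actual code.

// Rebuilds the URL from origin and pathname only, so everything after "?" is lost.
function normalizeWithoutSearch(rawUrl) {
    const urlObj = new URL(rawUrl)
    return urlObj.origin + urlObj.pathname
}

// Appends urlObj.search so the query string survives.
function normalizeWithSearch(rawUrl) {
    const urlObj = new URL(rawUrl)
    return urlObj.origin + urlObj.pathname + urlObj.search
}

const pageUrl = 'https://wiki.minthcm.org/index.php?title=Process:Running_on_Docker'

console.log(normalizeWithoutSearch(pageUrl)) // https://wiki.minthcm.org/index.php
console.log(normalizeWithSearch(pageUrl)) // https://wiki.minthcm.org/index.php?title=Process:Running_on_Docker
```

On wiki-style sites where every page shares the same pathname and differs only by a query parameter, the first variant collapses all pages into a single URL.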

Here is the flow I was using
web-crawler-flow

@HenryHengZJ
Contributor

awesome thanks @ahmosman !

for future reference:
const urlObj = new URL('https://example.org/abc?123')

urlObj.hostname // 'example.org'

urlObj.pathname // '/abc'

urlObj.search // '?123'
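
As a quick check, the same properties can be read from the URL in the original report. This is only a sketch using the standard `URL` API; the `wikiUrl` and `parts` names are made up for the example.

```javascript
// Inspect the components of the URL from the report with the standard URL API.
const wikiUrl = new URL('https://wiki.minthcm.org/index.php?title=Process:Running_on_Docker')

const parts = {
    hostname: wikiUrl.hostname, // 'wiki.minthcm.org'
    pathname: wikiUrl.pathname, // '/index.php'
    search: wikiUrl.search, // '?title=Process:Running_on_Docker'
    title: wikiUrl.searchParams.get('title') // 'Process:Running_on_Docker'
}

console.log(parts)
```

Here `pathname` alone is the same for every wiki page, so `search` is the part that identifies the page.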

@HenryHengZJ HenryHengZJ merged commit 2254d16 into FlowiseAI:main May 2, 2024
2 checks passed