Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unify parseWithCheerio, add parseWithLinkeDOM [WIP] #1890

Closed
wants to merge 1 commit into from

Conversation

Strajk
Copy link
Contributor

@Strajk Strajk commented May 3, 2023

Just draft 👷‍♂️

Some issues with this approach:

  • Don't know how to type parseWith* in HttpCrawler without importing cheerio & linkedom
  • Don't know how to handle enqueueLinks, which is very convenient to have :(

@B4nan
Copy link
Member

B4nan commented May 3, 2023

Don't know how to type parseWith* in HttpCrawler without importing cheerio & linkedom

For cheerio it's fine to add it to the http crawler package dependencies, it is technically already there as a transitive one. Not so much for linkedom 👇

Don't know how to handle enqueueLinks, which is very convenient to have :(

Hmm, that's actually a tricky one to do in the base layer, as it requires the parsed response for finding the links. So maybe the idea with the helper won't fly - but I don't mind having the linkedom crawler as a class with its enqueue implementation, like the other crawlers do. So basically what you were trying in the first place (but with a lot of copypasta I guess, haven't really checked the code tbh).

In the end, the helper for parsing with linkedom would be quite a weird thing outside of the http crawler.

Let's make this PR just about adding the cheerio helper to http crawler, that's what we can merge first easily.

@Strajk
Copy link
Contributor Author

Strajk commented May 3, 2023

having the linkedom crawler as a class

TBH, I did not want to do that initially (as I mentioned, cheerio was "fine" for me), but after using it I came about to liking it :)

Agree! 🙏

B4nan added a commit that referenced this pull request May 9, 2023
)

As discussed in #1890

---------

Co-authored-by: Martin Adámek <banan23@gmail.com>
B4nan added a commit that referenced this pull request Jun 7, 2023
The clean approach of #1879
as discussed
#1890 (comment)

2023-05-09 16:05 Rebased

---------

Co-authored-by: Martin Adámek <banan23@gmail.com>
@Strajk
Copy link
Contributor Author

Strajk commented Jun 7, 2023

Closing #1906

@Strajk Strajk closed this Jun 7, 2023
@Strajk Strajk deleted the feat/parseWithLinkeDOM branch June 7, 2023 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants