Selector doesn't work with newline after #76
Comments
I can't get that HTML to even parse. Are you sure that's what you used to trigger the issue?
That's a minimal example. I don't know that that's the issue, but it appears to be what separates the tags it finds from the ones it ignores.
Example link it finds:
<a href="https://github.com">
Example link it doesn't find:
<a
href="https://github.com">
That seems to work.
fn main() {
    let html = r#"<a
href="https://github.com">"#;
    println!("Raw HTML: {:?}", html);
    let document = scraper::Html::parse_document(html);
    let a_sel = scraper::Selector::parse("a").unwrap();
    for el in document.select(&a_sel) {
        println!("{}", el.html());
    }
}
Output:
Hmm. I'll dig deeper and report back; that's equivalent to the code I'm having trouble with.
Hi, sorry about the late reply. I have tried several troubleshooting approaches and have not been able to narrow this down. I can provide this case to reproduce it:
It correctly pulls the links in the header and footer of the page, but none of the articles linked in the middle show up using the 'a' selector.
I can't reproduce that.
fn main() {
    let url = "https://www.anyleaf.org/blog";
    let html = ureq::get(url).call().unwrap().into_string().unwrap();
    println!("Raw HTML: {:?}", html);
    let document = scraper::Html::parse_document(&html);
    let a_sel = scraper::Selector::parse("a").unwrap();
    for el in document.select(&a_sel) {
        println!("{}", el.html());
    }
}
[package]
name = "scraper-issue-76"
version = "0.0.0"
edition = "2021"
[dependencies]
scraper = "0.13.0"
ureq = "2.4.0"
Output:
Thanks for looking! Not sure what's up. I'll work between your code and mine and see where the disconnect is.
I've also added a test for this (#82) so I'm reasonably confident it's not a bug. Please do let us know if this remains a problem.
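If the mismatch does persist, one way to look for the disconnect is to count the opening <a tags in the raw HTML independently and compare that number with how many elements document.select yields for the "a" selector. The following is only a rough sketch using the standard library: count_anchor_open_tags is a hypothetical helper, not part of scraper and not the test added in #82. It is a naive byte scan rather than a real HTML parser, so it will also count tags that appear inside comments or scripts.

```rust
/// Hypothetical debugging helper (an assumption, not a scraper API):
/// counts occurrences of `<a` followed by whitespace or `>` in raw HTML.
/// ASCII whitespace includes `\n`, so a tag split across lines is counted too.
fn count_anchor_open_tags(html: &str) -> usize {
    html.as_bytes()
        .windows(3)
        .filter(|w| {
            w[0] == b'<'
                && w[1].eq_ignore_ascii_case(&b'a')
                && (w[2].is_ascii_whitespace() || w[2] == b'>')
        })
        .count()
}

fn main() {
    // Both forms discussed in this thread: attribute on the same line,
    // and attribute after a newline.
    let same_line = "<a href=\"https://github.com\">";
    let split_line = "<a\nhref=\"https://github.com\">";

    println!("{}", count_anchor_open_tags(same_line)); // prints 1
    println!("{}", count_anchor_open_tags(split_line)); // prints 1
}
```

If this count is much larger than the number of elements the selector returns for the same string, the HTML being parsed probably differs from what was expected (for example, a different response body), which is worth ruling out before suspecting the selector.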
document.select with Selector::parse is not working when there's a newline directly after the tag.
Code:
HTML example that triggers this:
<a href="...")"
When printing these affected elements:
Other elements in the query that are of the form
Element(<a href="\\\"/...
don't trigger this problem. Happy for a workaround in the meanwhile.