last_in_text_node returns true at multiple positions in an element #93

Closed
jongiddy opened this issue Jun 27, 2021 · 3 comments

Comments

@jongiddy
Collaborator

This code attempts to replace the last text node in a span:

fn main() {
    use lol_html::html_content::ContentType;
    use lol_html::{rewrite_str, text, RewriteStrSettings};

    let html = rewrite_str(
        r#"<span>Hello<b>hi</b></span>"#,
        RewriteStrSettings {
            element_content_handlers: vec![text!("span", |t| {
                if t.last_in_text_node() {
                    t.replace(" world", ContentType::Text);
                }
                Ok(())
            })],
            ..RewriteStrSettings::default()
        },
    )
    .unwrap();

    assert_eq!(html, r#"<span>Hello world<b>hi world</b></span>"#);
}

Since there's only one span in the input, I'd expected this code to replace text with " world" once, probably just before the </span> tag, but either of the two actual replacement points would also seem reasonable.

Replacing at two positions seems strange. Is this expected behavior? If yes, what determines if a node is last_in_text_node()? And is there a way to distinguish these two positions (maybe using a span > * rule)?

@jongiddy
Collaborator Author

I managed to answer my last question. It is possible to use a span > * rule to fix this specific case:

fn main() {
    use lol_html::html_content::{ContentType, UserData};
    use lol_html::{rewrite_str, text, RewriteStrSettings};
    let html = rewrite_str(
        r#"<span>Hello<b>hi</b></span>"#,
        RewriteStrSettings {
            element_content_handlers: vec![
                text!("span > *", |t| {
                    t.set_user_data(true);
                    Ok(())
                }),
                text!("span", |t| {
                    if t.user_data().downcast_ref::<bool>().is_some() {
                        // ignore sub-elements
                        return Ok(());
                    }
                    if t.last_in_text_node() {
                        t.replace(" world", ContentType::Text);
                    }
                    Ok(())
                }),
            ],
            ..RewriteStrSettings::default()
        },
    )
    .unwrap();

    assert_eq!(html, r#"<span>Hello world<b>hi</b></span>"#);
}

But this still fires twice for "<span>Hello<b>hi</b>howdy</span>", giving "<span>Hello world<b>hi</b>howdy world</span>". So in general there doesn't seem to be a way to know when all the text directly inside an element has been received.
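One way to picture why the handler still fires twice: the outputs above suggest that every text node ends with its own "last chunk", and a span with text on both sides of a child element contains two direct text nodes. The following is only a toy model in plain Rust, not lol_html itself. The `Event` enum and `append_at_text_node_ends` are hypothetical names made up for illustration, and the per-node behavior is an assumption inferred from the results shown in this thread.

```rust
/// Toy stand-in for the stream of events inside `<span>...</span>`.
/// Hypothetical type for illustration; not part of lol_html.
enum Event {
    /// A text node that is a direct child of the span.
    DirectText(&'static str),
    /// A child element, passed through unchanged.
    Child(&'static str),
}

/// Appends `suffix` at the end of every direct text node, mimicking a
/// handler that acts whenever the "last chunk of a text node" flag is set.
fn append_at_text_node_ends(events: &[Event], suffix: &str) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::DirectText(text) => {
                out.push_str(text);
                // Each text node ends separately, so this runs once per node.
                out.push_str(suffix);
            }
            Event::Child(markup) => out.push_str(markup),
        }
    }
    out
}

fn main() {
    let events = [
        Event::DirectText("Hello"),
        Event::Child("<b>hi</b>"),
        Event::DirectText("howdy"),
    ];
    let inner = append_at_text_node_ends(&events, " world");
    assert_eq!(inner, "Hello world<b>hi</b>howdy world");
    println!("<span>{}</span>", inner);
}
```

In this model there is no single "end of all text" signal: the only reliable boundary is the end of the enclosing element.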

@jongiddy
Collaborator Author

My current solution uses the start of the next tag (or the document end) to identify when all the text inside the tag (but not in sub-tags) has been read. This works because, for my actual use case, I don't need to modify the tag with the text. I just need to collect the text from the tag.

fn main() {
    use lol_html::html_content::UserData;
    use lol_html::{element, end, rewrite_str, text, RewriteStrSettings};
    use std::cell::RefCell;
    let buffer = RefCell::<Option<String>>::new(None);
    // The rewritten HTML is not used here; only the collected text matters.
    let _html = rewrite_str(
        r#"<span>Hello<b>hi</b>howdy</span>"#,
        RewriteStrSettings {
            element_content_handlers: vec![
                element!("span *", |t| {
                    t.set_user_data(true);
                    Ok(())
                }),
                element!("*", |t| {
                    // The start of any element that is not a sub-element of
                    // a span indicates we reached the end of the previous span.
                    if t.user_data().downcast_ref::<bool>().is_some() {
                        // ignore sub-elements
                        return Ok(());
                    }
                    if let Some(s) = buffer.replace(None) {
                        dbg!(s);
                    }
                    Ok(())
                }),
                text!("span *", |t| {
                    t.set_user_data(true);
                    Ok(())
                }),
                text!("span", |t| {
                    if t.user_data().downcast_ref::<bool>().is_some() {
                        // ignore sub-elements
                        return Ok(());
                    }
                    buffer.replace(match buffer.replace(None) {
                        None => Some(t.as_str().to_owned()),
                        Some(mut s) => {
                            s.push_str(t.as_str());
                            Some(s)
                        }
                    });
                    Ok(())
                }),
            ],
            document_content_handlers: vec![end!(|_| {
                if let Some(s) = buffer.replace(None) {
                    dbg!(s);
                }
                Ok(())
            })],
            ..RewriteStrSettings::default()
        },
    )
    .unwrap();
}

Output:

[src/main.rs:22] s = "Hellohowdy"

This is pretty messy and repetitive. It could be nicer with:

For now, I'll close this issue as a duplicate but supporting case for #85.

@jongiddy
Collaborator Author

jongiddy commented Jul 3, 2021

Using PR #97 I can simplify to:

fn main() {
    use lol_html::html_content::UserData;
    use lol_html::{element, rewrite_str, text, RewriteStrSettings};
    let buffer = std::rc::Rc::new(std::cell::RefCell::<Option<String>>::new(None));
    // The rewritten HTML is not used here; only the collected text matters.
    let _html = rewrite_str(
        r#"<span>Hello<b>hi</b>howdy</span>"#,
        RewriteStrSettings {
            element_content_handlers: vec![
                element!("span", |el| {
                    let buffer = buffer.clone();
                    el.on_end_tag(move |_| {
                        if let Some(s) = buffer.replace(None) {
                            dbg!(s);
                        }
                        Ok(())
                    })?;
                    Ok(())
                }),
                text!("span *", |t| {
                    t.set_user_data(true);
                    Ok(())
                }),
                text!("span", |t| {
                    if t.user_data().is::<bool>() {
                        // ignore sub-elements
                        return Ok(());
                    }
                    buffer.replace(match buffer.replace(None) {
                        None => Some(t.as_str().to_owned()),
                        Some(mut s) => {
                            s.push_str(t.as_str());
                            Some(s)
                        }
                    });
                    Ok(())
                }),
            ],
            ..RewriteStrSettings::default()
        },
    )
    .unwrap();
}
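
The `buffer.replace(match buffer.replace(None) { ... })` pattern in the text handlers above can probably be shortened. Here is a minimal sketch using only the standard library, assuming the buffer keeps the same `RefCell<Option<String>>` type. `append` and `flush` are hypothetical helper names, not lol_html APIs.

```rust
use std::cell::RefCell;

/// Appends `chunk` to the buffer, creating the string on first use.
fn append(buffer: &RefCell<Option<String>>, chunk: &str) {
    buffer
        .borrow_mut()
        .get_or_insert_with(String::new)
        .push_str(chunk);
}

/// Takes the collected text out of the buffer, leaving `None` behind.
fn flush(buffer: &RefCell<Option<String>>) -> Option<String> {
    buffer.borrow_mut().take()
}

fn main() {
    let buffer = RefCell::new(None);
    append(&buffer, "Hello");
    append(&buffer, "howdy");
    assert_eq!(flush(&buffer).as_deref(), Some("Hellohowdy"));
    // A second flush finds the buffer empty.
    assert_eq!(flush(&buffer), None);
}
```

With helpers like these, the text handler body would reduce to something like `append(&buffer, t.as_str())` and the end-tag handler to `if let Some(s) = flush(&buffer) { dbg!(s); }`.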
