Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add handlers for when the rewriting is finished. #27

Merged
merged 7 commits into from Nov 26, 2019
Merged

Conversation

gabi-250
Copy link
Contributor

@gabi-250 gabi-250 commented Nov 11, 2019

This should enable users to add a handler that is called after the rewriter finishes.

@gabi-250 gabi-250 requested a review from a team as a code owner November 11, 2019 18:27
@gabi-250 gabi-250 changed the title Allow handlers for the end of parsing. Add handlers for when the rewriting is finished. Nov 11, 2019
@@ -37,6 +37,10 @@ fn main() {
}
),
],
end_handlers: vec![end!(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably a part of document_content_handlers as it's basically a handler for the end of the document content.

@@ -8,6 +8,7 @@ pub type CommentHandler<'h> = Box<dyn FnMut(&mut Comment) -> HandlerResult + 'h>
pub type TextHandler<'h> = Box<dyn FnMut(&mut TextChunk) -> HandlerResult + 'h>;
pub type ElementHandler<'h> = Box<dyn FnMut(&mut Element) -> HandlerResult + 'h>;
pub type EndTagHandler<'h> = Box<dyn FnOnce(&mut EndTag) -> HandlerResult + 'h>;
pub type EndHandler<'h> = Box<dyn FnMut() -> HandlerResult + 'h>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be FnOnce as it will be called only once in the end. Also, note that we'd like to have an ability to emit content in the end of the document as well. We'd like to follow the same API as for other rewritable units: it should be possible to emit content by multiple function calls with different types of content. So, we need to provide DocumentEnd rewritable unit which exposes just a single method append.

@sejoker sejoker self-requested a review November 13, 2019 16:42
@gabi-250 gabi-250 force-pushed the add-end-handlers branch 3 times, most recently from a63116a to cb5f31a Compare November 15, 2019 15:06
@gabi-250
Copy link
Contributor Author

@inikulin I think I've addressed your comments. I added an example that uses the handler (examples/on_end_append). I've had to make content_to_bytes pub(crate), because I used it in the DocumentEnd rewritable unit to implement append:

https://github.com/cloudflare/cool-thing/blob/cb5f31a1b7eb98e4e34c1540df614e5d9e5fef82/src/rewritable_units/document_end.rs#L21-L24

@gabi-250
Copy link
Contributor Author

Once this is reviewed, I am happy to update the tests and documentation where applicable. Also, @xtuc suggested implementing an error handler (which would be needed in order for cool-thing to fully support my usecase). If you want to support error handlers, I can implement that as part of this PR as well.

@inikulin
Copy link
Contributor

@gabi-250 We have everything in place for the error handler in cool-thing already. We need to implement in workers code only.

Copy link
Contributor

@inikulin inikulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need C API implementation and tests

};

// Create the rewriter
let mut rewriter = HtmlRewriter::try_new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the example. However, we try to use examples directory to show how to implement some real world scenarios. So, it's worth changing it to do something more useful, or just remove it.

@@ -13,7 +13,7 @@ pub enum ContentType {
}

#[inline]
fn content_to_bytes(
pub(crate) fn content_to_bytes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be pub(super)


self.transform_controller
.handle_end(&mut document_end)
.expect("finish failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't panic. We need to propagate error to the rewriter's end() return result


#[inline]
pub fn append(&mut self, content: &str, content_type: ContentType) {
content_to_bytes(content, content_type, self.encoding, self.output_handler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inconsistent with other append functions - here it appends content to the previously inserted content, whereas, in all other cases, consequent calls to append insert content right after the rewritable unit.

Therefore, I suggest to not leak output sink in the rewritable unit and just collect appended content, implement Serialize on the unit and use it in the dispatcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think subsequent calls are meant to append to the previously inserted content (like the append of Element), so I think it should be fine to append the content on the fly using content_to_bytes.

I could implement Serialize for DocumentEnd, but I would be duplicating the implementation of content_to_bytes.

encoding: &'static Encoding,
}

impl<'a> DocumentEnd<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is required

@@ -299,6 +308,13 @@ macro_rules! doc_comments {
};
}

#[macro_export(local_inner_macros)]
macro_rules! end {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the macro used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supposed to be used by the users of the API (like in the example I removed in eedcc36).

self.transform_controller
.handle_end(&mut document_end)
.expect("finish failed");
let result = self.transform_controller.handle_end(&mut document_end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should propagate the error immediately and don't perform any further actions

self.flush_remaining_input(input, input.len());

let output_sink = &mut self.output_sink;
let mut output_sink_handle_chunk = |c: &[u8]| output_sink.handle_chunk(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just store mutable reference for the output_sink in DocumentEnd, so we won't need intermediate closure

@gabi-250 gabi-250 force-pushed the add-end-handlers branch 3 times, most recently from 919febd to 80494cb Compare November 19, 2019 16:37
@inikulin
Copy link
Contributor

@gabi-250 Rust part lgtm, thanks. Now we need to test it properly and add C API for it.

@gabi-250 gabi-250 force-pushed the add-end-handlers branch 3 times, most recently from 57aab30 to 8626f65 Compare November 20, 2019 13:46
@gabi-250
Copy link
Contributor Author

Ready for re-review.

I added tests for the document end handler in src/rewritable_units/document_end.rs and in src/rewriter/mod.rs

I added the cool_thing_doc_end_append function to the C API, as well as a couple of tests for it.

I've had to make a slight change to the impl_replace_byte macro, because on some inputs, the output sink would receive an empty chunk before the end of the input stream:
https://github.com/cloudflare/cool-thing/blob/857da3a4f021d0f2f53cf6e8f5761c1e9bf0ce87/src/base/bytes.rs#L80-L84

$output_handler(&tail[..pos]);
// If pos is 0, tail[..pos] will be empty, thus erroneously interpreted
// as a finalizing chunk.
if pos != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! We can refactor it further:

let chunk = &tail[..pos];

if !chunk.is_empty() {
    $output_handler(chunk);
}

This way the code will be self descriptive as we obviously don't need to invoke the handler for empty chunks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you won't need the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be nice to have a regression test for this as this is obviously a bug that can cause abrupt responses in workers.

Copy link
Contributor Author

@gabi-250 gabi-250 Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing an empty chunk this way doesn't actually cause an error (except in the tests), since end is not called twice. The only reason I spotted this is because the test output sink checks if an empty chunk is received twice (it caused one of my tests to fail):
https://github.com/cloudflare/cool-thing/blob/ab462cbad4e6ac05d0c00a74f5a562eb6beae20c/src/lib.rs#L118-L128

Regardless, I agree it's a bug. I'll write a regression test.

@@ -529,6 +529,67 @@ mod tests {
}
}

#[test]
fn append_at_the_end() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved to document_end.rs, as tests here are for general cases, not specific to a particular rewritable unit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed these altogether, since the tests in document_end.rs are almost exactly the same.

@@ -150,7 +155,8 @@ void cool_thing_rewriter_builder_add_document_content_handlers(
cool_thing_comment_handler_t comment_handler,
void *comment_handler_user_data,
cool_thing_text_handler_handler_t text_handler,
void *text_handler_user_data
void *text_handler_user_data,
cool_thing_doc_end_handler_t doc_end_handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need user data for this handler as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -651,6 +657,13 @@ void cool_thing_element_user_data_set(
// Returns user data attached to the text chunk.
void *cool_thing_element_user_data_get(const cool_thing_element_t *element);

int cool_thing_doc_end_append(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@inikulin inikulin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. Thanks!

@inikulin
Copy link
Contributor

@gabi-250 this requires lol-rebase now

@gabi-250 gabi-250 force-pushed the add-end-handlers branch 4 times, most recently from 86a2b70 to 428db19 Compare November 26, 2019 16:36
* Make the end handler a document content handler.
* Allow appending at the end of the document.
* Append chunks on the fly.
* Remove unneeded example.
* Propagate error to `end()`.
* Propagate the error immediately.
* Store a reference to the OutputSink in DocumentEnd.
@gabi-250
Copy link
Contributor Author

@inikulin done

@inikulin inikulin merged commit 1aedf36 into master Nov 26, 2019
@inikulin inikulin deleted the add-end-handlers branch November 26, 2019 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants