New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocations #364

Open
Keats opened this Issue Nov 26, 2018 · 18 comments

Comments

Projects
None yet
2 participants
@Keats
Copy link
Owner

Keats commented Nov 26, 2018

Right now, the Context::insert method converts a value to JSON value and it seems to be cloned when passing the context to the render method.

It would be ideal to store in Context the data before converting (ie as dyn Serialize) but this is not possible. There might be a way to do that but I don't know any right now.

Another solution could be for Tera::render to take a Context by value instead of a T: Serialize so we can grab the data. It would disallow rendering with something other than a Context but we could add functions to still provide that if needed. If I'm not reading this wrong, it would be a win in terms of allocations too big to ignore.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Nov 26, 2018

@andy128k you're better at traits than me, do you see any way we can store values pre-conversion in the context?

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Nov 27, 2018

@Keats This sounds similar to what I was talking about here. Currently I'm experimenting in this branch.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Nov 27, 2018

It's a bit different as the change I'm talking about is purely for the current context.
If you manage to have your branch working I would be super impressed/happy!

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 5, 2018

@andy128k how is it going? Do you think it's going to work?

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Dec 6, 2018

@Keats I had no chance to work on it. I hope next week I'll have time.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 6, 2018

No worries, I'm just super curious to see if ti's going to work :p

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 21, 2018

@andy128k I see you have made more changes since the last time I checked, do you think it's going to work overall?

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Dec 21, 2018

@Keats It definitely will work. But, It looks like it affects almost all parts of the crate.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 21, 2018

How would that look from a user perspective? Compared to the current Serialize automatic derive

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Dec 21, 2018

serde_json::Value will still be possible to use, but built-in types like i32, String, Vec, HashMap are also usable.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 27, 2018

Can I help in some ways?

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Dec 27, 2018

Looking at amount of changes and amount if changes to do, I would say, this experiment failed. :(
A lot of efforts and risks to break something with questionable outcome.

This approach allows to clone less but has also drawbacks. Pass String by reference is good thing but doing the same for i32, is a total waste of resources (fat pointer is 4 times larger). Implementation of filters also becomes much trickier (Do we expect customers of Tera to write their own filters?).

Json also adds some simplifications. Having single number type is a big deal. Should nth filter accept u32, or only usize? What about i32?

IMHO, serde_json::Value could stay (unless you really want to have DateTime as first-class citizen or distinguish integers and floats).

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 27, 2018

The main issue is that serde_json serialization is the bottleneck of https://github.com/getzola/zola both in speed and memory usage. That's because there are many pages with a big string (the content) dwarfing everything else. I expect the perf loss from fat pointers to integers to not matter in comparison.

I do agree that JSON simplify some things, my only issue with it is the cloning needed to render a template that cannot be avoided

@andy128k

This comment has been minimized.

Copy link
Contributor

andy128k commented Dec 27, 2018

I see two directions to move here:

  1. focus on performance and try to clone less, by using references as much as possible. Here benchmarks should help.
  2. introduce own Value type. Serialization/deserialization is not needed if generic From/Into implementations will be provided.
@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 28, 2018

The main issue with serde-json is that it will clone when serialising. It's fine for small data but if you have a section with thousands of pages in Zola, that's going to be painful. Part of the issue is some Tera function that might be solved with the new traits, I still need to test it.

Keats pushed a commit that referenced this issue Dec 29, 2018

Vincent Prouillet
@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Dec 29, 2018

The commit above is what I had in mind when creating the issue. This will not reflect in benchmarks however as we need to clone it there but in practice it will mean lower memory usage.

Tracking memory/allocations is done using heaptrack on the bench example:

$ cargo build --release --example bench
$ heaptrack target/release/examples/bench               

Still shows 10k temp allocations in Processor::render_body but at least the memory consumed is down by 1/3.
This still needs to clone the full object though. In this case a 100x100 integer table is not too bad on the memory but there will be issues in Zola when you fetch an object with 100+ blog post length string content since it WILL need to be serialised to JSON, which means allocating lots of memory.

I'm not sure whether you've seen #340 (comment)

@Keats Keats changed the title Avoid allocations in Context Avoid allocations Jan 4, 2019

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Jan 4, 2019

@andy128k are you still working on that? I see some changes from the 28

I will add another bench example with a string-heavy template to represent a more common Zola usecase.

Regarding allocations, I believe we could use write! to bypass some allocations.
For example, the examples/bench will do current number -> string -> push to string result 10k times (the number of integer variables in the context`. This seems a bit wasteful.

@Keats

This comment has been minimized.

Copy link
Owner Author

Keats commented Jan 26, 2019

Another potential improvement is to try to make https://github.com/Keats/serde-tera work with Cow for the strings if it's possible. It would also allows us to have our own Value types, which means we could have SafeString as well that do not require escaping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment