Decrease allocations of objects during (de)serialization #1436

Closed
GeertvanHorrik opened this issue Oct 9, 2019 · 13 comments · Fixed by #1443
@GeertvanHorrik GeertvanHorrik commented Oct 9, 2019

For example, the XmlSerializer serializes to an XmlWriter, converts the result to a string, and then parses that string back into an XElement. There should be a better way to do this.
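
For illustration, a minimal sketch of that pattern (not the actual Catel code; `WriteObjectGraph`, `model`, and `outputStream` are hypothetical stand-ins):

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class AllocationSketch
{
    // Current path: object graph -> XmlWriter over a StringBuilder -> string -> XElement.
    public static XElement SerializeViaString(object model)
    {
        var stringBuilder = new StringBuilder();
        using (var xmlWriter = XmlWriter.Create(stringBuilder))
        {
            WriteObjectGraph(xmlWriter, model);   // hypothetical serialization helper
        }
        var xml = stringBuilder.ToString();       // allocates the whole document as a string
        return XElement.Parse(xml);               // allocates a second in-memory tree
    }

    // Leaner alternative: write straight to the destination, no string, no XElement.
    public static void SerializeDirect(object model, Stream outputStream)
    {
        using (var xmlWriter = XmlWriter.Create(outputStream))
        {
            WriteObjectGraph(xmlWriter, model);
        }
    }

    private static void WriteObjectGraph(XmlWriter writer, object model)
    {
        // Placeholder for the real member-by-member serialization logic.
        writer.WriteElementString("Model", model?.ToString());
    }
}
```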

@GeertvanHorrik GeertvanHorrik added this to the 5.12.0 milestone Oct 9, 2019
@GeertvanHorrik GeertvanHorrik self-assigned this Oct 9, 2019
@GeertvanHorrik GeertvanHorrik removed this from the 5.12.0 milestone Oct 9, 2019
@ghost ghost commented Oct 11, 2019

Just saw this issue; I may be able to provide some guidance on this one (at least with regard to XML serialization). For instance, the class wrapper we have for serializing to XML uses a StreamWriter with an XmlWriter to write contents asynchronously, and we have a reader that only uses a FileStream and an XmlReader to read the contents lazily (we want to refactor this to use async streams, but that's non-trivial right now).
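
Something along these lines (a hedged sketch with illustrative element names and file handling, not our actual wrapper):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class StreamingXmlExample
{
    // Write: XmlWriter layered over a StreamWriter, flushing asynchronously,
    // so the serialized content never has to exist as one big string.
    public static async Task WriteAsync(string path)
    {
        var settings = new XmlWriterSettings { Async = true, Indent = true };
        using (var streamWriter = new StreamWriter(path))
        using (var writer = XmlWriter.Create(streamWriter, settings))
        {
            await writer.WriteStartElementAsync(null, "Items", null);
            await writer.WriteElementStringAsync(null, "Item", null, "value");
            await writer.WriteEndElementAsync();
            await writer.FlushAsync();
        }
    }

    // Read: forward-only XmlReader over a FileStream, so elements are
    // materialized one at a time instead of loading the whole document.
    public static void Read(string path)
    {
        using (var fileStream = File.OpenRead(path))
        using (var reader = XmlReader.Create(fileStream))
        {
            while (reader.ReadToFollowing("Item"))
            {
                string value = reader.ReadElementContentAsString();
                Console.WriteLine(value);
            }
        }
    }
}
```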

@GeertvanHorrik GeertvanHorrik commented Oct 11, 2019

Funny, I just started working on this very thing this evening :-)

@ghost ghost commented Oct 11, 2019

I'd be happy to throw up a gist of the reader; see if it's of any use to you.

@GeertvanHorrik GeertvanHorrik commented Oct 11, 2019

@vaecors it would be great if we could work together on this. I've been working on decreasing the memory footprint during serialization and in ModelBase itself (caching boxed value-type values), with some good results. I also want to work on performance. Not that we need the fastest engine in the world, but a bit better would be nice: I now have a use case with models of 10+ MB that extensively use graph ids for circular references.
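
For reference, the boxed-value caching comes down to something like this (a simplified sketch, not the actual ModelBase code; a real implementation would need thread safety and a bounded cache):

```csharp
using System.Collections.Generic;

// Reuse one boxed instance per distinct value instead of boxing on every
// property get/set. Most models only touch a handful of distinct values
// (true/false, 0, 1, enum members), so the cache stays tiny.
public static class BoxingCache<T> where T : struct
{
    private static readonly Dictionary<T, object> _boxedValues = new Dictionary<T, object>();

    public static object GetBoxedValue(T value)
    {
        if (!_boxedValues.TryGetValue(value, out var boxed))
        {
            boxed = value;              // the only boxing allocation for this value
            _boxedValues[value] = boxed;
        }
        return boxed;
    }
}

// Usage: object boxed = BoxingCache<bool>.GetBoxedValue(true);
```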

@GeertvanHorrik GeertvanHorrik added this to the 5.12.0 milestone Oct 11, 2019
@GeertvanHorrik GeertvanHorrik commented Oct 11, 2019

Also see #1437

@GeertvanHorrik GeertvanHorrik commented Oct 11, 2019

> I'd be happy to throw up a gist of the reader; see if it's of any use to you.

Would be nice, it could never hurt.

@ghost ghost commented Oct 11, 2019

@GeertvanHorrik I'll definitely help where I can when my bandwidth allows it. Obviously, aside from work-related priorities, I'm trying to get the wheels in motion on another open source project that I'd like to actually see through to feature completion.

I've gone ahead and posted both the writer and the reader. The reader is not much, and I'd really like to take advantage of async streams down the road, but it's not in the timetable. The writer does its best to keep objects in memory as briefly as possible and flushes the serialized contents to the file as it goes. The downside is that if something interrupts the process, the file may not be a valid, well-formed document (which may or may not be acceptable).

#1437 also looks interesting and I may take a peek into that to see if I notice anything that could be tweaked. I know that when we materialize a lot of objects (thousands), we use expression trees for reading/writing properties rather than cached member and property info.
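
That expression-tree approach is roughly the following (an illustrative sketch, `MyModel` being a hypothetical type): compile a reusable delegate once per property instead of calling PropertyInfo.GetValue on every access.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class PropertyAccessor
{
    // Builds (object instance) => (object)((TDeclaring)instance).Property
    public static Func<object, object> CreateGetter(PropertyInfo property)
    {
        var instance = Expression.Parameter(typeof(object), "instance");
        var typedInstance = Expression.Convert(instance, property.DeclaringType);
        var propertyAccess = Expression.Property(typedInstance, property);
        var boxedResult = Expression.Convert(propertyAccess, typeof(object));
        return Expression.Lambda<Func<object, object>>(boxedResult, instance).Compile();
    }
}

// Compile once, cache the delegate, then reuse it for every instance
// (compilation itself is expensive, so it only pays off on hot paths):
// var getter = PropertyAccessor.CreateGetter(typeof(MyModel).GetProperty("Name"));
// var value = getter(someModel);
```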

@GeertvanHorrik GeertvanHorrik commented Oct 12, 2019

> I'm trying to get the wheels in motion on another open source project that I'd like to actually see through to feature completion.

What kind of project? Maybe I can help out (help writing code, put it under the WG umbrella, something else)?

Async doesn't necessarily mean it's faster, just better for overall system performance. That's one of the reasons we decided not to implement async support for most serializers: the task-handling overhead was slower than just getting through it synchronously.

We are now trying to serialize models of about 10 MB (including lots and lots of circular references), so performance starts to become a bit more important :-)
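
For context, the graph-id tracking boils down to something like this sketch (illustrative, not the actual Catel implementation): remember every visited object by reference and, on a repeat visit, emit only its id instead of recursing, which is what terminates the cycles.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class GraphIdTracker
{
    private sealed class RefComparer : IEqualityComparer<object>
    {
        public static readonly RefComparer Instance = new RefComparer();
        bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
        int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }

    private readonly Dictionary<object, int> _graphIds =
        new Dictionary<object, int>(RefComparer.Instance);
    private int _nextId;

    // Returns true if the instance was already serialized; the serializer
    // then writes a reference node (e.g. <ref graphid="42"/>) instead of
    // recursing into the object again.
    public bool TryGetExistingId(object instance, out int graphId)
    {
        if (_graphIds.TryGetValue(instance, out graphId))
        {
            return true;
        }
        graphId = ++_nextId;
        _graphIds[instance] = graphId;
        return false;
    }
}
```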

Will check out your writer / reader later this weekend, thanks for posting.

@ghost ghost commented Oct 12, 2019

> Async doesn't necessarily mean it's faster, just better for overall system performance.

That's one of the reasons I want to rewrite it to use async streams. I'm not concerned about speed as much as utilizing async where it makes sense.

> What kind of project? Maybe I can help out (help writing code, put it under the WG umbrella, something else)?

The project in question is a .NET Global Tool called EfCoreGen. I built it because I was not happy with how Entity Framework Core's DbContext scaffold tool works. It supports a lot more out of the box (including other DB types) and uses Roslyn to generate C# or VB.NET (depending on configuration). I'm rewriting it from the ground up because the version I originally built is a bit of a mess.

It just saves time, because I do all my DB development in SQL Server projects, not in Code First. You're more than welcome to contribute once more of the code base has materialized; I just barely got the repository, code analysis, and compilation pipelines set up.

@GeertvanHorrik GeertvanHorrik commented Oct 12, 2019

> Async doesn't necessarily mean it's faster, just better for overall system performance. ...

@vaecors ever looked into llblgen?

@ghost ghost commented Oct 12, 2019

Never actually heard of it, no. I may take a peek at it, but it's probably not something we'd use, since our architecture is heavily geared towards distributing databases as dacpacs (each schema has its own dacpac, DAL, and entities NuGet distribution), deployed with custom tooling.

This tool is more of a pet project and something to maintain publicly as a code portfolio piece. It can also generate some extra concerns, such as soft-delete query filtering, automatic audit fields, and attaching custom code to the generation.

@GeertvanHorrik GeertvanHorrik commented Oct 16, 2019

Unfortunately we have to introduce some (minor) breaking changes:

  1. ICustomXmlSerializable will use XmlReader and XmlWriter instead of XElement (we are not aware of any users relying on this member); see the sketch after this list
  2. XmlSerializationContextInfo will accept XmlReader and XmlWriter instead of XElement in the ctor (but this type should only be used internally anyway)
  3. Some protected XDocument-related methods have been removed from XmlSerializer, all replaced by XmlReader and XmlWriter equivalents for performance and memory-allocation improvements
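
Roughly, change 1 amounts to the following (a hypothetical before/after sketch; the actual member names in Catel may differ):

```csharp
using System.Xml;
using System.Xml.Linq;

// Before: the custom hook received a fully materialized element tree.
public interface ICustomXmlSerializableBefore
{
    void Serialize(XElement element);
    void Deserialize(XElement element);
}

// After: forward-only reader/writer, so no intermediate XElement is allocated.
public interface ICustomXmlSerializableAfter
{
    void Serialize(XmlWriter xmlWriter);
    void Deserialize(XmlReader xmlReader);
}
```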
@lock lock bot commented Oct 20, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 20, 2019