Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CsvWriter performance optimization #2248

Closed

Conversation

snaumenko-st
Copy link
Contributor

I noticed that CsvWriter allocated huge amount of memory during serialization, much more than the size of the data, up to 3 orders of magnitude. It doesn't have memory leaks in my scenario, however such memory traffic creates GC pressure and has noticeable performance impact on my application.

There are some breaking changes to the public types though. But they can be easily fixed if needed. I'd also consider to make them internal in further versions to lower the probability of someone uses them directly.

So that's what is implemented here:

  • caching of writer action for collections. We can optimize for the case when most of objects in the collection have the same type.
  • removed duplicate type resolving. GetType() is called too often.
  • optimized Type HashCodes. No need to get AssemblyQualifiedName to get it's HashCode, we can get it directly from the Type. However both approaches don't guarantee no collisions.
  • reduced memory allocation during writes.

Before

Method Typed Mean Error StdDev Gen0 Allocated
WriteObject False 561.6 ms 10.92 ms 14.58 ms 88000.0000 1059.73 MB
WriteStruct False 545.8 ms 10.69 ms 15.67 ms 91000.0000 1090.25 MB
WritePrimitive False 441.7 ms 8.80 ms 10.47 ms 77000.0000 922.42 MB
WriteExpando False 992.4 ms 19.65 ms 27.54 ms 100000.0000 1204.7 MB
WriteDynamic False 853.7 ms 16.69 ms 19.87 ms 100000.0000 1204.7 MB
WriteObject True 360.8 ms 7.13 ms 7.33 ms 31000.0000 380.7 MB
WriteStruct True 354.3 ms 7.08 ms 10.15 ms 31000.0000 380.7 MB
WritePrimitive True 249.8 ms 4.75 ms 4.67 ms 23333.3333 281.52 MB
WriteExpando True 984.7 ms 18.81 ms 15.71 ms 100000.0000 1204.7 MB
WriteDynamic True 858.9 ms 16.42 ms 16.86 ms 100000.0000 1204.7 MB

After

Method Typed Mean Error StdDev Gen0 Allocated
WriteObject False 126.46 ms 0.599 ms 0.500 ms 10000.0000 121.3 MB
WriteStruct False 135.19 ms 1.728 ms 1.532 ms 12500.0000 151.82 MB
WritePrimitive False 64.79 ms 1.078 ms 0.955 ms 5000.0000 60.26 MB
WriteExpando False 531.63 ms 4.189 ms 3.918 ms 23000.0000 281.52 MB
WriteDynamic False 399.09 ms 6.453 ms 5.720 ms 22000.0000 266.26 MB
WriteObject True 129.50 ms 1.278 ms 1.067 ms 10000.0000 121.3 MB
WriteStruct True 122.63 ms 1.732 ms 1.620 ms 10000.0000 121.3 MB
WritePrimitive True 62.77 ms 0.978 ms 0.867 ms 5000.0000 60.26 MB
WriteExpando True 589.30 ms 6.714 ms 6.280 ms 23000.0000 281.52 MB
WriteDynamic True 464.30 ms 3.689 ms 2.880 ms 22000.0000 266.26 MB

- removed duplicate type resolving
- optimized Type HashCodes
- reduced memory allocation during writes
@snaumenko-st
Copy link
Contributor Author

snaumenko-st commented Apr 19, 2024

Eliminating all remaining string allocations seems to me much more difficult. As almost half of the code needs to be rewritten from scratch. And of course we don't have convenient support for that in earlier versions of .net.
So this PR is mostly a compromise.

@JoshClose
Copy link
Owner

I'm working on a new parser and serializer that uses spans and has 0 allocations, outside the initial buffer. I'm not sure how I'm going to fit it into the existing codebase yet.

@botinko
Copy link

botinko commented Apr 23, 2024

@JoshClose Are there any chances that this PR will be concidered / reviewd / merged and then published in near future? We would really like to fix a performance issue in our project that is fixed in this PR.
If not, then we would consider creating a fork.

@JoshClose
Copy link
Owner

I'll try and get to it soon.

@JoshClose
Copy link
Owner

Does this same issue happen when reading?

@snaumenko-st
Copy link
Contributor Author

Does this same issue happen when reading?

No, when reading it's much better, and It allocates only a few bytes per operation.

@JoshClose
Copy link
Owner

This has been merged. I made other changes before you put in your second commit, that aren't in there, but I had done the same thing myself. Please look over everything and try it out. I will create a release once you do.

@JoshClose JoshClose closed this Apr 25, 2024
@snaumenko-st
Copy link
Contributor Author

@JoshClose I've tried the same benchmarks, and the results are almost the same, except with ExpandoObject. However it makes no sense to us.
Thanks a lot!

P.S. My CsvReader benchmarks before and after your latest fix show almost the same results. So, I'm not sure whether it improved anything.

@JoshClose
Copy link
Owner

ExpandoObject uses IDynamicMetaObjectProvider so I just removed the ExpandoObject specific code. Apparently it was faster. I need to do some work on the IDynamicMetaObjectProvider code to make it faster.

My CsvReader benchmarks before and after your latest fix show almost the same results. So, I'm not sure whether it improved anything.

Maybe not, but there is definitely less computations. Plus, it's consistent with the writer now.

I'll push a new build soon. Just need to update the change log.

@JoshClose
Copy link
Owner

I changed expando to use any IDictionary<string, object> now, so that should cover most dynamics. It'll fall back to IDynamicMetaObjectProvider, which is slower.

I added a fast dynamic object that is created instead of ExpandoObject when reading.

@JoshClose
Copy link
Owner

This will be in the next release 32.0.0 on NuGet.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants