Add support for reading and writing the .NET Half type #418

Merged: 6 commits, Feb 5, 2024

adamreeve
Contributor

Fixes #413

This adds support for the new Float16 logical type added in Arrow 15. I've added a new .NET 6 target to the ParquetSharp project to allow using the new Half type, which required fixing a few errors related to nullable reference type checking when building with the newer target.

One thing to be aware of is that writing Half values can be a bit slower than floats because of the extra overhead of writing these as fixed-length byte arrays rather than having a dedicated physical type. We could possibly improve this in future if it turns out to be a problem. I did some quick benchmarking of reading and writing 1 million random floats, doubles and half values, with dictionary encoding disabled:

| Method | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|
| ReadHalf | 1.267 ms | 0.0253 ms | 0.0329 ms | 1.02 | 0.02 |
| ReadFloat | 1.240 ms | 0.0236 ms | 0.0299 ms | 1.00 | 0.00 |
| ReadDouble | 2.292 ms | 0.0411 ms | 0.0563 ms | 1.85 | 0.07 |

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|
| WriteHalf | 16.771 ms | 0.3318 ms | 0.5452 ms | 16.466 ms | 3.13 | 0.25 |
| WriteFloat | 5.336 ms | 0.1280 ms | 0.3775 ms | 5.324 ms | 1.00 | 0.00 |
| WriteDouble | 8.845 ms | 0.1759 ms | 0.2408 ms | 8.849 ms | 1.64 | 0.11 |
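
To illustrate where the extra write cost may come from, here is a minimal sketch of packing Half values into 2-byte little-endian slots, which is the shape a fixed-length byte array representation requires. The `HalfPacking` class and its method names are hypothetical and are not ParquetSharp's actual implementation; only the `BinaryPrimitives` calls are real .NET 6 APIs.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not taken from ParquetSharp.
public static class HalfPacking
{
    // A Float16 value occupies two bytes.
    private const int HalfSize = 2;

    // Packs each Half into its own 2-byte little-endian slot.
    public static byte[] Pack(ReadOnlySpan<Half> values)
    {
        var buffer = new byte[values.Length * HalfSize];
        for (var i = 0; i < values.Length; ++i)
        {
            BinaryPrimitives.WriteHalfLittleEndian(
                buffer.AsSpan(i * HalfSize, HalfSize), values[i]);
        }
        return buffer;
    }

    // Reverses Pack: reads one Half from every 2-byte slot.
    public static Half[] Unpack(ReadOnlySpan<byte> bytes)
    {
        var values = new Half[bytes.Length / HalfSize];
        for (var i = 0; i < values.Length; ++i)
        {
            values[i] = BinaryPrimitives.ReadHalfLittleEndian(
                bytes.Slice(i * HalfSize, HalfSize));
        }
        return values;
    }
}
```

A float column can hand its buffer to the writer as-is, whereas a sketch like this touches every value individually, which is one plausible source of the overhead measured above.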

}
else
{
// Float-16 values are always stored in little-endian order
adamreeve (Contributor Author), Feb 4, 2024

For completeness I thought we should handle big-endian machines, as we have a similar check for the Guid type. However, we currently only build x64 and arm64 native binaries for the NuGet package, and there doesn't seem to be an easy way to test this. Maybe we should just throw a NotImplementedException on big-endian machines instead? The check didn't seem to affect performance, at least.

Contributor

I think it is fine as it is now.
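
For readers following the thread, the kind of endianness branch being discussed can be sketched roughly as below. This is an assumption about the general shape of such a check, not the code from this PR: `HalfEndianness` and `CopyAsLittleEndian` are hypothetical names, while `BitConverter.IsLittleEndian`, `MemoryMarshal.AsBytes` and `BinaryPrimitives.WriteHalfLittleEndian` are standard .NET APIs.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not the code from this PR.
public static class HalfEndianness
{
    private const int HalfSize = 2;

    // Copies Half values into destination as little-endian byte pairs.
    public static void CopyAsLittleEndian(ReadOnlySpan<Half> source, Span<byte> destination)
    {
        if (destination.Length < source.Length * HalfSize)
        {
            throw new ArgumentException("Destination is too small.", nameof(destination));
        }

        if (BitConverter.IsLittleEndian)
        {
            // The in-memory layout already matches, so one bulk copy suffices.
            MemoryMarshal.AsBytes(source).CopyTo(destination);
        }
        else
        {
            // Big-endian host: write each value explicitly in little-endian order.
            for (var i = 0; i < source.Length; ++i)
            {
                BinaryPrimitives.WriteHalfLittleEndian(
                    destination.Slice(i * HalfSize, HalfSize), source[i]);
            }
        }
    }
}
```

Both branches produce the same bytes, so the big-endian path only costs anything on hosts that actually need it.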

marcin-krystianc (Contributor) left a comment

Looks good, well done.

adamreeve merged commit 3b562b0 into G-Research:master on Feb 5, 2024
33 checks passed
adamreeve deleted the arrow-15-half branch on February 5, 2024 19:25
Linked issue: Support round-tripping Half values using the new Float16 logical type