Add support for reading and writing the .NET Half type #418

adamreeve · 2024-02-04T23:14:37Z

Fixes #413

This adds support for the new Float16 logical type added in Arrow 15. I've added a new .NET 6 target to the ParquetSharp project to allow using the new Half type, which required fixing a few errors related to nullable reference type checking when building with the newer target.

One thing to be aware of is that writing Half values is noticeably slower than writing floats (about 3x in the benchmark below), because of the extra overhead of writing these as fixed-length byte arrays rather than having a dedicated physical type. Read performance is on par with floats. We could possibly improve this in future if it turns out to be a problem. I did some quick benchmarking of reading and writing 1 million random float, double and Half values, with dictionary encoding disabled:

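| Method | Mean | Error | StdDev | Ratio | RatioSD |
|------------|---------:|----------:|----------:|------:|--------:|
| ReadHalf | 1.267 ms | 0.0253 ms | 0.0329 ms | 1.02 | 0.02 |
| ReadFloat | 1.240 ms | 0.0236 ms | 0.0299 ms | 1.00 | 0.00 |
| ReadDouble | 2.292 ms | 0.0411 ms | 0.0563 ms | 1.85 | 0.07 |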

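| Method | Mean | Error | StdDev | Median | Ratio | RatioSD |
|-------------|----------:|----------:|----------:|----------:|------:|--------:|
| WriteHalf | 16.771 ms | 0.3318 ms | 0.5452 ms | 16.466 ms | 3.13 | 0.25 |
| WriteFloat | 5.336 ms | 0.1280 ms | 0.3775 ms | 5.324 ms | 1.00 | 0.00 |
| WriteDouble | 8.845 ms | 0.1759 ms | 0.2408 ms | 8.849 ms | 1.64 | 0.11 |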
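
For context on the fixed-length byte array representation: the Parquet Float16 logical type annotates a 2-byte fixed-length byte array, so each Half has to be laid out as two little-endian bytes rather than handed over as a native primitive. Below is a rough sketch of that packing step, not the code from this PR. `HalfPacking` and `Pack` are hypothetical names used only for illustration, and whether this step accounts for the measured write overhead is an assumption; `BinaryPrimitives.WriteHalfLittleEndian` is the standard .NET API available on the net6 target.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of ParquetSharp.
public static class HalfPacking
{
    // Packs each Half into 2 consecutive little-endian bytes,
    // the layout expected for a fixed-length byte array of length 2.
    public static byte[] Pack(ReadOnlySpan<Half> values)
    {
        var buffer = new byte[values.Length * 2];
        for (var i = 0; i < values.Length; i++)
        {
            // Writes little-endian bytes regardless of host byte order.
            BinaryPrimitives.WriteHalfLittleEndian(buffer.AsSpan(i * 2, 2), values[i]);
        }
        return buffer;
    }
}
```

For example, `HalfPacking.Pack(new[] { (Half) 1.0f })` yields the two bytes `0x00, 0x3C`, since 1.0 is `0x3C00` in IEEE 754 binary16.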

adamreeve · 2024-02-04T23:22:29Z

csharp/LogicalRead.cs

+            }
+            else
+            {
+                // Float-16 values are always stored in little-endian order


For completeness I thought we should handle big-endian machines, as we have a similar check for the guid type, but we currently only build x64 and arm64 native binaries for the nuget package, and there doesn't seem to be an easy way to test this. Maybe we should just throw a not implemented exception for big endian instead? This didn't seem to affect performance at least.
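
As a point of comparison for the big-endian question, here is a minimal sketch of a read that is independent of host byte order. It is an alternative shape, not what the PR implements: `HalfEndianSketch` and `ReadHalf` are hypothetical names, and the sketch assumes the net6 target, where `BinaryPrimitives.ReadHalfLittleEndian` is available and performs any byte swap internally, so no explicit `BitConverter.IsLittleEndian` branch is needed.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of ParquetSharp.
public static class HalfEndianSketch
{
    // Interprets the first 2 bytes as a little-endian Float16 value.
    // The result is the same on little-endian and big-endian hosts.
    public static Half ReadHalf(ReadOnlySpan<byte> bytes)
    {
        return BinaryPrimitives.ReadHalfLittleEndian(bytes);
    }
}
```

Because the result does not depend on the host, this path can be unit tested on x64 or arm64 with fixed byte inputs, although that does not exercise the byte-swapping branch itself on real big-endian hardware.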

marcin-krystianc

Looks good, well done.

marcin-krystianc · 2024-02-05T14:12:01Z

csharp/LogicalRead.cs

+            }
+            else
+            {
+                // Float-16 values are always stored in little-endian order


I think it is fine as it is now.

adamreeve added 5 commits February 5, 2024 09:56

Add Float16 logical type

6cdc5d6

Add net6 target and fix associated build errors related to null checks

07328f7

Support round-tripping Half values with Float16 logical type

7e5906a

Handle big-endian systems

6ba1dde

Small tidy

b583d32

adamreeve requested a review from marcin-krystianc February 4, 2024 23:14

Fix building tests for older targets

1a35668

marcin-krystianc approved these changes Feb 5, 2024

adamreeve merged commit 3b562b0 into G-Research:master Feb 5, 2024
33 checks passed

adamreeve deleted the arrow-15-half branch February 5, 2024 19:25
