Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer fields in gzip_index are serialized independently to endianness #80

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

vkuzniet
Copy link
Contributor

Signed-off-by: Viktor Kuznietsov vkuzniet@amazon.com

Issue #, if available:

Description of changes:
gzip_index's have, size and span_size as well as gzip_index_point's in, out fields are converted to Big Endian representations for serialization and are converted back to the hosts architecture upon consumption. This ensures that IndexByteData byte sequence is consistent across multiple architectures.

Testing performed:

  • make test && make check && make integration pass
  • deployed soci and soci-snapshotter-grpc to EC2 instance and ran container workload for rabbitmq in lazy loading mode and got successful execution.

All the layers on rabbitmq were lazily loaded.

soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/1/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/2/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/3/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/4/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/5/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/6/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/7/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/8/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)
soci on /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/9/fs type fuse.rawBridge (rw,nodev,relatime,user_id=0,group_id=0,allow_other)

The workload completed successfully.

2022-09-27 18:23:09.431972+00:00 [info] <0.779.0> started TCP listener on [::]:5672
 completed with 3 plugins.
2022-09-27 18:23:09.647003+00:00 [info] <0.708.0> Server startup complete; 3 plugins started.
2022-09-27 18:23:09.647003+00:00 [info] <0.708.0>  * rabbitmq_prometheus
2022-09-27 18:23:09.647003+00:00 [info] <0.708.0>  * rabbitmq_web_dispatch
2022-09-27 18:23:09.647003+00:00 [info] <0.708.0>  * rabbitmq_management_agent

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@vkuzniet vkuzniet requested a review from a team as a code owner September 27, 2022 18:26
Copy link
Contributor

@wmesard wmesard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate to say this, but the endian wars are over. The bad guys won. Quoting the FlatBuffers documentation:

Each scalar is also always represented in little-endian format, as this corresponds to all commonly used CPUs today. FlatBuffers will also work on big-endian machines, but will be slightly slower because of additional byte-swap intrinsics.

We should canonicalize to little-endian, not big-endian. That way these will all be nops on (Linux, Windows) x (x86, ARM).

But speaking of FlatBuffers, this problem would go away if we were modeling the gzip_index data in the fbs file instead of treating it as a binary blob. Should we do that?

c/indexer.c Outdated Show resolved Hide resolved
Signed-off-by: Viktor Kuznietsov <vkuzniet@amazon.com>
@vkuzniet
Copy link
Contributor Author

I hate to say this, but the endian wars are over. The bad guys won. Quoting the FlatBuffers documentation:

Each scalar is also always represented in little-endian format, as this corresponds to all commonly used CPUs today. FlatBuffers will also work on big-endian machines, but will be slightly slower because of additional byte-swap intrinsics.

We should canonicalize to little-endian, not big-endian. That way these will all be nops on (Linux, Windows) x (x86, ARM).

But speaking of FlatBuffers, this problem would go away if we were modeling the gzip_index data in the fbs file instead of treating it as a binary blob. Should we do that?

I think, long term this is the right approach. We would anyway needed to do that at the time we re-write the indexer to Go, so this work seems to be inevitable.

@vkuzniet
Copy link
Contributor Author

I hate to say this, but the endian wars are over. The bad guys won. Quoting the FlatBuffers documentation:

Each scalar is also always represented in little-endian format, as this corresponds to all commonly used CPUs today. FlatBuffers will also work on big-endian machines, but will be slightly slower because of additional byte-swap intrinsics.

We should canonicalize to little-endian, not big-endian. That way these will all be nops on (Linux, Windows) x (x86, ARM).
But speaking of FlatBuffers, this problem would go away if we were modeling the gzip_index data in the fbs file instead of treating it as a binary blob. Should we do that?

I think, long term this is the right approach. We would anyway needed to do that at the time we re-write the indexer to Go, so this work seems to be inevitable.

Created an issue #81 to address that. Once it's implemented, encode/decode methods from this PR will be obsolete.

c/indexer.c Show resolved Hide resolved
@vkuzniet vkuzniet merged commit 5fbbba7 into awslabs:main Sep 28, 2022
@vkuzniet vkuzniet deleted the index-endianness branch September 28, 2022 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants