Add C serialization support #115

Closed
wants to merge 59 commits into from

Conversation

davidlenfesty
Contributor

Notes about code:

  • Generated code requires libcanard, as its base serialization functions are utilized.
  • Code is implemented as discussed in C serialization support #112
  • Implemented inline functions return bit length. This is so they can nest easily in the case of not being byte-aligned.

Notes about PR:

As of creating this, I believe I am mostly done with serialization support. There are a few fixes and cleanups to do, but the bulk of it is there (I probably missed some edge cases too).

Deserialization support is next, and I will likely finish that tomorrow.

I will take this PR out of Draft mode once it is in a state that I believe is ready for in-depth reviews (I will also clean up the commits a bit so you aren't looking through 20 random commits).

@pavel-kirienko
Member

  1. You can prove at code generation time that a given field is always byte-aligned. If that is the case, do not check the offset alignment and just memcpy directly. Use assert() to enforce the correctness in debug builds.

  2. Please follow the coding conventions: https://kb.zubax.com/pages/viewpage.action?pageId=2195699

    • Braces go on separate lines.
    • If you don't use the result of an invocation, cast it to void explicitly.
    • Condition/loop bodies are always block statements.
  3. We can't depend on libcanard here: what if you want to use the generated code with a different transport? We discussed it somewhere here but I can't find the reference at the moment. I suggest we re-implement the required serialization primitives in a separate single-file (header-only?) C library.

  4. You seem to be too concerned about performance and resource utilization. They are important but per the core project goals, they come after ensuring functional correctness and ease of use (the last two are often correlated). Assuming that the user will take care of never passing in bad inputs into the serialization function is impractical; the high-integrity coding standards like MISRA explicitly require that inputs shall be validated. Please update the serialization API to support error handling.
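Points 1 and 4 above can be combined in a small sketch: the alignment is asserted (because the generator is assumed to have proven it), while the inputs are still validated and reported through the return value. The function name, the error code, and the signature are hypothetical and are not taken from this PR; the only outside fact relied on is that DSDL is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical error code for illustration; not part of the PR.
#define EXAMPLE_ERROR_BAD_ARGUMENT (-1)

// Writes a uint16 field at an offset that the code generator is assumed to
// have proven byte-aligned. Returns the new offset in bits, or a negative
// error code if the inputs are invalid.
static inline int32_t exampleWriteAlignedU16(uint8_t* const buf,
                                             const size_t   buf_size_bytes,
                                             const uint32_t offset_bits,
                                             const uint16_t value)
{
    // Alignment is a generation-time guarantee, so it is only asserted.
    assert((offset_bits % 8U) == 0U);
    // Caller-supplied inputs are validated at run time.
    if ((buf == NULL) || ((((size_t) offset_bits / 8U) + 2U) > buf_size_bytes))
    {
        return EXAMPLE_ERROR_BAD_ARGUMENT;
    }
    // Little-endian byte order, written explicitly so the host endianness does not matter.
    const uint8_t bytes[2] = {(uint8_t) (value & 0xFFU), (uint8_t) (value >> 8U)};
    (void) memcpy(&buf[offset_bits / 8U], bytes, sizeof(bytes));
    return (int32_t) (offset_bits + 16U);
}
```

Returning the bit offset on success keeps the nesting property mentioned in the PR description, while negative values leave room for error reporting.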

Member

@thirtytwobits thirtytwobits left a comment


Thanks SO much for taking this on. This is a huge help. We need to clone, rename, and fixup the canard serialization code before I can fully review and approve.

src/nunavut/lang/c/templates/Header.j2 (outdated, resolved)
src/nunavut/lang/c/templates/_composite_type.j2 (outdated, resolved)
src/nunavut/lang/c/templates/_composite_type.j2 (outdated, resolved)
src/nunavut/lang/c/templates/_canard_set.j2 (outdated, resolved)
@davidlenfesty
Contributor Author

@thirtytwobits

As for the canard serialization code, do you think that should be in this PR? Or is that a separate issue that should be merged in before this gets fully reviewed?

@thirtytwobits
Member

@thirtytwobits

As for the canard serialization code, do you think that should be in this PR? Or is that a separate issue that should be merged in before this gets fully reviewed?

It'll have to be in the same PR to go into mainline (non-breaking changes and all). You can work on a feature branch if you want to iterate on smaller PRs.

@davidlenfesty
Contributor Author

I like leaving it in this PR. When it comes closer to full-review time I'll be rebasing and isolating things into very rigid commits.

@davidlenfesty
Contributor Author

Going through your comments in more detail @pavel-kirienko

  1. In the case that the user doesn't use @assert _offset_ % 8 == {0}, we can't, given that types can be included in other types, so I believe the check always needs to happen. But in the case they do, I'm not sure how that is surfaced into the template implementation. Or I'm missing something and there's a way to check the alignment universally?

  2. The coding style doesn't seem to explicitly mention casting function calls to (void) when not using the return value. I'm assuming it follows from casting unused variables to void. I'm also assuming it doesn't apply to void functions, given that the point of the cast is to indicate we are explicitly ignoring the return value.

@davidlenfesty
Contributor Author

I've literally just copied over all of the libcanard serialization functions, changing only naming and their storage to static inline.

I'm not sure if there are any tweaks to make here. Also I'm not over the moon with this method of doing things, because we have to manually copy over any changes. (I don't have any better solutions though ¯_(ツ)_/¯)

@pavel-kirienko
Member

size_t isn't appropriate since this is sized based on the maximum addressable location for a system in bytes. By using uint32_t we avoid weird limitations on, say, AVR8.

I suggest we define an alias instead of using raw (u)int32_t for clarity and tunability, like NunavutBitLength/NunavutBitLengthSigned or something.
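A minimal sketch of the suggested aliases follows. The two type names come from the suggestion above; the helper function is hypothetical and only illustrates why a fixed 32-bit type helps on targets where size_t is assumed to be 16 bits wide (as is commonly the case on AVR8), since a bit offset of 65536 would not fit there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Aliases as proposed in the comment above; the width can be tuned in one place.
typedef uint32_t NunavutBitLength;
typedef int32_t  NunavutBitLengthSigned;  // Leaves room for negative error codes.

// Hypothetical helper: converts a byte index into a bit offset.
// The cast happens before the multiplication so the arithmetic is done in
// 32 bits even if size_t is narrower.
static inline NunavutBitLength exampleBitOffsetOfByte(const size_t byte_index)
{
    return ((NunavutBitLength) byte_index) * 8U;
}
```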

I'm not sure if there are any tweaks to make here. Also I'm not over the moon with this method of doing things, because we have to manually copy over any changes.

I don't think copying over changes will be necessary because the point here is to be compliant with the Specification rather than another implementation, so the implementations are independent from each other.

But I agree that it just seems odd having to copy-paste a bunch of code from one place to another. Yeah. Maybe in the future, we should just drop canard_dsdl.[ch] from libcanard completely? To make that possible, we should keep the serialization support header reusable outside of Nunavut, which I think is easy.

Regarding naming: we care about naming a lot here. Basically, if the name cannot be found on the map of Canada, it won't do. So instead of serialization.h I suggest a less generic, more specific name, like nunavut.h, and also updating the prefixes on all contained definitions to be nunavut/Nunavut/NUNAVUT instead of what we have now. The rationale is that many applications out there deal with serialization but very few of them deal with Canadian territories, which avoids confusion.

Since Nunavut specializes in DSDL code generation, we can simplify the names like nunavutCopyBits(), NUNAVUT_PLATFORM_IEEE754_DOUBLE, etc.

The current implementation of nunavut.h (formerly serialization.h) spills out its implementation details like WIDTH32, Float32Bits, and so on. This is of course unacceptable and every such item should be explicitly prefixed. The namespacing business is perhaps one of the worst things about C, right after the type system.

In the case that the user doesn't use @assert _offset_ % 8 == {0}, we can't, given that types can be included in other types, so I believe the check always needs to happen. But in the case they do, I'm not sure how that is surfaced into the template implementation. Or I'm missing something and there's a way to check the alignment universally?

Assertion checks in the source definition have zero effect on the output. Assertion checks exist to let the data type author verify that the field layout matches the expectations (actually, they can be used for more than that but it doesn't matter atm).

Nested types can be handled by checking the alignment before the nested type field; if the type is aligned, then its (de)serialization can be delegated, otherwise, the (de)serialization logic can be emitted in-place as if the function was inlined. See OpenCyphal/pydsdl#24. For example, imagine we have some type N.1.0 which is nested in T.1.0 as follows:

N.1.0 foo
bool bar
N.1.0 baz

In this example, the serialization function of T.1.0 would serialize foo by invoking the serialization function of N.1.0 directly, but the serialization of baz would be done in-place due to the byte alignment being broken by bar. This logic is implemented in PyUAVCAN and it's not that hard to do, you can look here for inspiration:

https://github.com/UAVCAN/pyuavcan/blob/a38ce614162be948f61f421eebe1c903389dd578/pyuavcan/dsdl/_templates/serialization.j2#L143
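The T.1.0 example above can be sketched in C as follows. Everything here is an assumption made for illustration: N.1.0 is taken to contain a single uint8 field, bits are packed least-significant-bit first, all names are hypothetical, and bounds checks are omitted for brevity. It is not the code this PR generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical N.1.0 with a single uint8 field, nested twice in T.1.0.
typedef struct { uint8_t value; } ExampleN;
typedef struct { ExampleN foo; bool bar; ExampleN baz; } ExampleT;

// Aligned path: the caller guarantees that the offset is a multiple of 8.
static uint32_t exampleSerializeN(const ExampleN* const obj, uint8_t* const buf, const uint32_t offset_bits)
{
    assert((offset_bits % 8U) == 0U);
    buf[offset_bits / 8U] = obj->value;
    return offset_bits + 8U;
}

// Generic path: writes the low len_bits (at most 8) of value at any bit offset, LSB first.
static uint32_t exampleWriteBits(uint8_t* const buf, uint32_t offset_bits, const uint8_t value, const uint8_t len_bits)
{
    for (uint8_t i = 0U; i < len_bits; i++)
    {
        const uint8_t mask = (uint8_t) (1U << (offset_bits % 8U));
        if (((value >> i) & 1U) != 0U)
        {
            buf[offset_bits / 8U] = (uint8_t) (buf[offset_bits / 8U] | mask);
        }
        else
        {
            buf[offset_bits / 8U] = (uint8_t) (buf[offset_bits / 8U] & (uint8_t) ~mask);
        }
        offset_bits++;
    }
    return offset_bits;
}

// What a generator could emit for T.1.0; returns the final bit offset.
static uint32_t exampleSerializeT(const ExampleT* const obj, uint8_t* const buf)
{
    uint32_t offset = 0U;
    offset = exampleSerializeN(&obj->foo, buf, offset);                        // Offset 0: aligned, delegate.
    offset = exampleWriteBits(buf, offset, obj->bar ? 1U : 0U, 1U);            // bar breaks the alignment.
    offset = exampleWriteBits(buf, offset, obj->baz.value, 8U);                // Offset 9: emitted in place.
    return offset;
}
```

The decision between the two paths is made when the code is generated, so the emitted function contains no runtime alignment branch.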

Eventually it would be great to support full zero-cost serialization as described in https://forum.uavcan.org/t/future-zero-cost-serialization-constraint/469. It doesn't require any changes from the language specification because the implementation can just deduce if the layout is compatible with the native type layout automatically at code generation time. We'll get to that eventually.

The coding style doesn't seem to explicitly mention casting function calls to (void) when not using the return value. I'm assuming it follows from casting unused variables to void. I'm also assuming it doesn't apply to void functions, given that the point of the cast is to indicate we are explicitly ignoring the return value.

Yes, you got it right. The rules follow from MISRA.
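As a small illustration of the rule (the function name is hypothetical): memset() returns a pointer, so discarding it is made explicit with a cast, whereas a call to a function that returns void needs no cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Zeroes a buffer. memset() returns its first argument; the result is
// deliberately ignored, which the cast to void makes explicit.
static void exampleReset(uint8_t* const buf, const size_t size)
{
    assert(buf != NULL);
    (void) memset(buf, 0, size);
}
```

A caller would write `exampleReset(buf, sizeof(buf));` without any cast, because there is no return value to discard.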

@TSC21

TSC21 commented Jul 10, 2020

@dagar FYI

@TSC21

TSC21 commented Jul 15, 2020

@davidlenfesty anything on deserialization at this point?

@davidlenfesty
Contributor Author

@TSC21 Nothing yet, but most of the architectural things I'm solving with the serialization should pretty much map 1:1 to deserialization so it'll be quick when I get to it.

@TSC21

TSC21 commented Jul 15, 2020

@TSC21 Nothing yet, but most of the architectural things I'm solving with the serialization should pretty much map 1:1 to deserialization so it'll be quick when I get to it.

Perfect! Thanks!

@TSC21

TSC21 commented Jul 16, 2020

@davidlenfesty @pavel-kirienko what are we missing here at this stage?

@davidlenfesty
Contributor Author

@TSC21 time :)

More specifically I need to:

  • do a very thorough debugging. I'm sure there's quite a few corner cases that fail.
  • fix some formatting issues.
  • split nunavut.h into separate files based on byte order.
  • do deserialization

@TSC21

TSC21 commented Jul 17, 2020

@davidlenfesty thank you! Really great job so far!

@davidlenfesty
Contributor Author

Nice little update: I've confirmed serialization works against pyuavcan (using uavcan.register.node.Access_1_0, aligned properly and also with a void6 stuffed in there). This traverses a surprising amount of the code paths. Obv. more testing is required. Ideally automated but I don't really have the time to set up that test fixturing.

TODO before I open for full review:

  • split nunavut.h out (most of the effort here is propagating the CLI argument down to templates)
  • deserialization (should be quite quick, it's just modifications to serialization essentially)

@pavel-kirienko @thirtytwobits I'm sure I can figure it out, but if someone more familiar could point me to the code path from a CLI argument to a Jinja2 variable, that would probably save me a significant amount of time.

@pavel-kirienko
Member

@davidlenfesty I would like to add a few early comments:

  • When (de-)serializing an array of standard-size primitives, you can skip the runtime alignment check if PyDSDL tells you that the array is always aligned. This is the point of the static alignment proof.

  • Representing bool[...] as arrays of C bools might be impractical due to memory overheads. uavcan.register.Value uses uavcan.primitive.array.Bit which would take 2 KiB of memory with the current approach; it can be limited to 256 bytes if you use bit-level representations. That would require some shifting and masking though, but you can simplify it using your nunavutSetBit()/nunavutGetBit().
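The bit-level representation mentioned in the second point could look roughly like this. The capacity of 2048 elements is inferred from the 2 KiB and 256 byte figures above; the struct and the two accessors are hypothetical stand-ins, since the actual signatures of nunavutSetBit()/nunavutGetBit() are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BIT_ARRAY_CAPACITY 2048U

// One bit per element: 2048 / 8 = 256 bytes instead of 2048 C bools.
typedef struct
{
    uint8_t bitpacked[EXAMPLE_BIT_ARRAY_CAPACITY / 8U];
    size_t  count;  // Number of valid elements.
} ExampleBitArray;

// Sets or clears the bit at the given element index (LSB-first within each byte).
static inline void exampleSetBit(uint8_t* const buf, const size_t index, const bool value)
{
    assert(buf != NULL);
    const uint8_t mask = (uint8_t) (1U << (index % 8U));
    if (value)
    {
        buf[index / 8U] = (uint8_t) (buf[index / 8U] | mask);
    }
    else
    {
        buf[index / 8U] = (uint8_t) (buf[index / 8U] & (uint8_t) ~mask);
    }
}

// Reads the bit at the given element index.
static inline bool exampleGetBit(const uint8_t* const buf, const size_t index)
{
    assert(buf != NULL);
    return ((buf[index / 8U] >> (index % 8U)) & 1U) != 0U;
}
```

The cost is the shifting and masking on every access; the benefit is an eightfold reduction in the in-memory size of the array.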

@davidlenfesty
Contributor Author

Also noticed a bug in existing code: type_MAX_SERIALIZED_REPRESENTATION_SIZE_BYTES is defined as bit length.

Commenting so I don't forget.
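For reference, one way to derive a byte size from a bit length is to round up to the next whole byte. The helper below is a hypothetical sketch, not the fix that was applied; it avoids the usual `(bits + 7) / 8` form so that it cannot overflow near the top of the 32-bit range.

```c
#include <assert.h>
#include <stdint.h>

// Rounds a bit length up to the number of bytes needed to hold it.
static inline uint32_t exampleBitsToBytes(const uint32_t bits)
{
    return (bits / 8U) + (((bits % 8U) != 0U) ? 1U : 0U);
}
```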

@davidlenfesty davidlenfesty marked this pull request as ready for review July 25, 2020 23:21
Contributor Author

@davidlenfesty davidlenfesty left a comment


Deserialization is finished! 🎉 Well, it compiles anyways.

A few more notes:

  • I don't want to support full zero-cost serialization yet (I am very low on time and validating that on every C architecture could get nasty fast)
  • I have yet to optimize arrays of bools.
  • I have not tested this against pyuavcan at all. If there is any test harnessing anyone has set up between the different UAVCAN libraries I would like to know so I could test.
  • the support files (nunavut*.h) do not get included, some bug needs to get fixed there methinks.

I personally would like to skip bool array optimization and zero-cost serialization in this PR, as I would like to get a minimum viable product into master (I'm not being selfless making this PR, I need this for some of my projects ;)), and then work on another PR with those features. But if optimizing bools is important I can get it in in this PR too.

src/nunavut/cli.py (outdated, resolved)
src/nunavut/lang/c/__init__.py (outdated, resolved)
src/nunavut/lang/c/__init__.py (outdated, resolved)
src/nunavut/lang/c/support/nunavut_common.h (outdated, resolved)
src/nunavut/lang/c/support/nunavut_common.h (outdated, resolved)
@thirtytwobits
Member

thirtytwobits commented Oct 5, 2020

I ran the libcanard tests on my implementation and the only thing that was failing was:

ASSERT_FLOAT_EQ(0b0111111111111111, nunavutFloat16Pack(std::nanf("")));  // nan

This is because we do discard the diagnostic information (from section 6.2 of IEEE754):

To facilitate propagation of diagnostic information contained in NaNs, as much of that information as possible should be preserved in NaN results of operations.

Notice the "should". As such, your test is not guaranteed to succeed for compliant implementations. I'm willing to open an issue to try to restore this information, but it seems like something that will take more research, especially as we are losing bits for a value whose structure we don't understand (i.e. do we discard high bits or low bits? Is there some assumption about the diagnostic data we can make that should be accounted for? How does the receiver tell the difference between a signaling NaN with 0x00 as its diagnostic when we have to enforce a non-zero bit to distinguish from INFINITY? etc.)

This does pass:

EXPECT_TRUE(std::isnan(nunavutFloat16Unpack(nunavutFloat16Pack(std::nanf("")))));  // nan

@pavel-kirienko
Member

pavel-kirienko commented Oct 5, 2020 via email

@thirtytwobits
Member

can you please try the new ones...

Yeah, this is what I did. See thirtytwobits@0932091

Again, you can't assert that the bit pattern in the significand will be a known value after calling pack.
See thirtytwobits@2203ae9 for proof that unpack is working correctly. Let me know if I'm missing something?

@davidlenfesty
Contributor Author

@thirtytwobits Yes I should be able to add in the buffer size params sometime in the next couple days.

@pavel-kirienko
Member

@thirtytwobits With this change in 2203ae9#diff-5b6bfcb09795f7e8909ef7aeaf1a0bbcR460, float16Pack appears to be working as expected in AMD64 mode.

In x86 mode (-m32), with float16Pack(+std::numeric_limits<CanardDSDLFloat32>::signaling_NaN()) it takes the quiet branch where it is supposed to be taking the signaling branch. The problem with float16Unpack I mentioned earlier where it demotes a signaling NaN to a quiet one is also related to this. However, now I see that I was wrong in my understanding of the context.

As I said earlier, I never actually endeavored to read the IEEE 754 standard, relying instead on the existing conversion functions contributed by someone else. I just noticed that on Wikipedia it says this:

Encodings of qNaN and sNaN are not specified in IEEE 754 and implemented differently on different processors. The x86 family and the ARM family processors use the most significant bit of the significand field to indicate a quiet NaN.

The standard apparently makes it impossible for us to distinguish between signaling and quiet NaNs in a platform-agnostic way. The existing logic happens to be working well on AMD64 but per the standard that is a mere coincidence; it breaks in the x86 mode but nobody says it wouldn't. So what I suggest is that we replace this:

if ((0x400000UL & in.bits) != 0)  // NOLINT NOSONAR
{
    out = 0x7E00U;  // NOLINT NOSONAR
}
else
{
    out = 0x7D00U;  // NOLINT NOSONAR
}

with this:

out = 0x7E00U;  // NOLINT NOSONAR
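To make the proposal concrete, here is a standalone sketch of the NaN case only. It operates on the raw float32 bit pattern, the names are hypothetical, and sign handling is left out; it is not the actual float16Pack code. Any NaN input, quiet or signaling, maps to the same float16 pattern 0x7E00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// A float32 is NaN when all exponent bits are set and the significand is non-zero.
static inline bool exampleIsNaN32(const uint32_t float32_bits)
{
    return ((float32_bits & UINT32_C(0x7F800000)) == UINT32_C(0x7F800000)) &&
           ((float32_bits & UINT32_C(0x007FFFFF)) != 0U);
}

// Packs any float32 NaN into one fixed float16 NaN pattern, without trying
// to tell quiet from signaling. The sign bit is ignored in this sketch.
static inline uint16_t examplePackNaN16(const uint32_t float32_bits)
{
    assert(exampleIsNaN32(float32_bits));
    (void) float32_bits;  // Only used by the assertion above.
    return UINT16_C(0x7E00);
}
```

Because the result no longer depends on the top significand bit, the behavior is the same regardless of how a given platform encodes quiet and signaling NaNs.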

I would like to remind here about this issue, as it seems serious:

0x3FFFFFU and 0x400000U may break on 8/16-bit platforms where sizeof(int) = 2 (a self-respecting compiler will notice it though).
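One way to make the operand widths explicit regardless of sizeof(int) is the UINT32_C macro from <stdint.h>, sketched below with hypothetical helper names. The main hazard it avoids is an expression such as `1U << 22`, which is evaluated in unsigned int and is therefore not valid where int is only 16 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Tests bit 22 of a float32 bit pattern (the 0x400000 mask from the snippet above).
// UINT32_C(1) has a type at least 32 bits wide, so the shift is well defined
// even where int is 16 bits.
static inline bool exampleBit22Set(const uint32_t float32_bits)
{
    return (float32_bits & (UINT32_C(1) << 22U)) != 0U;
}

// Extracts the low 22 bits (the 0x3FFFFF mask from the snippet above).
static inline uint32_t exampleLow22Bits(const uint32_t float32_bits)
{
    return float32_bits & UINT32_C(0x3FFFFF);
}
```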

@thirtytwobits
Member

Encodings of qNaN and sNaN are not specified in IEEE 754

Huh. Perhaps this was added in the latest revision of 754?

(from IEEE Std 754-2019)

6.2.1 NaN encodings in binary interchange formats

All binary NaN bit strings have the sign bit S set to 0 or 1 and all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit string should be encoded with the first bit (d1) of the trailing significand field T being 1. A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0. If the first bit of the trailing significand field is 0, some other bit of the trailing significand field must be non-zero to distinguish the NaN from infinity. In the preferred encoding just described, a signaling NaN shall be quieted by setting d1 to 1, leaving the remaining bits of T unchanged. Bits d2 d3 ... dp−1 of the trailing significand field contain the encoding of the payload, which might be diagnostic information (see 6.2).

That said, the specification does say "should" and not "shall". I missed that nuance when I wrote my tests.

@pavel-kirienko
Member

Okay. Do we accept the change I suggested in the previous message? I already implemented it in libcanard (and updated the tests, too).

@thirtytwobits
Member

Okay. Do we accept the change I suggested in the previous message? I already implemented it in libcanard (and updated the tests, too).

Yeah. Obviously reality trumps specifications. If this encoding isn't consistent in a significant population of hardware then we have to account for that. I can try setting -m32 to confirm but I trust you have seen the different quiet/signaling encoding in the wild.

@pavel-kirienko
Member

I can try setting -m32

We build libcanard in different modes (four to be exact: c99/c11 * amd64/x86), I think Nunavut's verification suite would benefit from that, too.

@davidlenfesty
Contributor Author

I guess just to confirm, how do we want to handle the buffer overflow case? Are we propagating the error up and out of the serialization functions? With the current return types and the way everything's written on the deserialization side we just silently fail.

@thirtytwobits
Member

I guess just to confirm, how do we want to handle the buffer overflow case? Are we propagating the error up...

Yea. I think that's what I'd do.
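On the deserialization side, propagating the error instead of failing silently might look like the sketch below. The error code, the names, and the two-field type are hypothetical, and treating a short buffer as an error is an assumption made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical error code for illustration.
#define EXAMPLE_ERROR_OUT_OF_BOUNDS (-2)

typedef struct { uint8_t a; uint8_t b; } ExamplePair;

// Reads one byte-aligned uint8. Returns 0 on success or a negative error code.
static int8_t exampleReadU8(const uint8_t* const buf,
                            const size_t         buf_size_bytes,
                            const uint32_t       offset_bits,
                            uint8_t* const       out)
{
    assert((offset_bits % 8U) == 0U);
    if ((buf == NULL) || (out == NULL) || (((size_t) offset_bits / 8U) >= buf_size_bytes))
    {
        return EXAMPLE_ERROR_OUT_OF_BOUNDS;
    }
    *out = buf[offset_bits / 8U];
    return 0;
}

// The composite function forwards the first failure to its own caller.
static int8_t exampleDeserializePair(const uint8_t* const buf, const size_t buf_size_bytes, ExamplePair* const out)
{
    const int8_t result_a = exampleReadU8(buf, buf_size_bytes, 0U, &out->a);
    if (result_a < 0)
    {
        return result_a;
    }
    const int8_t result_b = exampleReadU8(buf, buf_size_bytes, 8U, &out->b);
    if (result_b < 0)
    {
        return result_b;
    }
    return 0;
}
```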

@thirtytwobits
Member

Looking good @davidlenfesty but I'm getting "rc not used" errors again. Can we get rid of the whole "rc" thing and just use this pattern instead:

    {
        const {{typename_error_type}} nunavut_result = nunavut_some_method_that_returns_result();
        if (nunavut_result < 0)
        {
            return nunavut_result;
        }
    }

The block scope will keep this from conflicting with other declarations of "result", and you still get a simpler experience when stepping through with a debugger than you would if we put the function call into the if clause. The exact same instructions for -O1 are generated either way (https://godbolt.org/z/zd9e66).
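Rendered as plain C, the template pattern above could come out like this. The result type, the step function, and its error convention (negative inputs are echoed back as error codes) are hypothetical and exist only to make the sketch self-contained.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t ExampleResult;

// Hypothetical step: echoes a negative input back as an error code, otherwise succeeds.
static ExampleResult exampleStep(const int32_t value)
{
    return (value < 0) ? value : 0;
}

// Each call gets its own block, so the same variable name can be reused
// without a function-level "rc" that might end up unused.
static ExampleResult exampleSerializeAll(const int32_t first, const int32_t second)
{
    {
        const ExampleResult nunavut_result = exampleStep(first);
        if (nunavut_result < 0)
        {
            return nunavut_result;
        }
    }
    {
        const ExampleResult nunavut_result = exampleStep(second);
        if (nunavut_result < 0)
        {
            return nunavut_result;
        }
    }
    return 0;
}
```

Since each result is const and scoped to its block, a template can emit the same fragment any number of times in one function without tracking whether a variable was already declared.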

Previously these were declared at the top of the functions when needed,
which added unnecessary logic and extra surface area for errors (i.e. rc
is missing or rc is unused)
@davidlenfesty
Contributor Author

davidlenfesty commented Oct 15, 2020

I took the liberty of doing the same with size_bits. Less code is more better.

Unfortunately I can't compile test_support.c for an odd reason - no previous declaration for testNunavutFloat16Unpack_INFINITY . Given specifically on the function definition for some reason. I don't really have the brain space to track it down so I'm going to leave any support testing there - it looks pretty complete :p. (Using GCC 10.2.0 if anyone wants to dive down that particular rabbit hole).

@thirtytwobits
Member

This PR is getting so old and long that I'm inclined to commit it to a feature branch and continue development from there. @pavel-kirienko what do you think?

@pavel-kirienko
Member

pavel-kirienko commented Oct 21, 2020 via email

@PetervdPerk-NXP
Contributor

@davidlenfesty I want to look at the compilation error on test_support.c but I can't get a grip on the mechanism to generate the C test code. Could you provide me some basic steps/info to reproduce?

@davidlenfesty
Contributor Author

@PetervdPerk-NXP See this thread

Note you'll want to target test_support instead of test_compiles

make test_support # Checks that test_support.c compiles
# or
make run_test_support # compiles and runs the tests

@thirtytwobits
Member

Okay, this PR has gotten too old and big. I've created a feature branch to work off of here:

https://github.com/UAVCAN/nunavut/tree/issue/115

@davidlenfesty and @PetervdPerk-NXP , I'm going to close this PR. Please submit all subsequent progress on #115 as PRs against the issue/115 branch.

Thank you all for your contributions.

@thirtytwobits
Member

@davidlenfesty I want to look at the compilation error on test_support.c but I can't get a grip on the mechanism to generate the C test code. Could you provide me some basic steps/info to reproduce?

You can get a ready-made toolchain using docker if you like:

cd path/to/nunavut
docker pull uavcan/c_cpp:ubuntu-18.04
docker run --rm -it -v $PWD:/repo uavcan/c_cpp:ubuntu-18.04

If you want to use your local tools then just get any modern GCC or Clang compiler toolchain and python3.7 or newer.

You can see the verification build instructions in this file:

https://github.com/UAVCAN/nunavut/blob/issue/115/.buildkite/verify.sh

For the C verification build, this does:

mkdir build_c
pushd build_c
cmake -DNUNAVUT_FLAG_SET=linux -DNUNAVUT_VERIFICATION_LANG=c ..
cmake --build . --target all -- -j4
cmake --build . --target cov_all_archive

you can also do

make help

to see a list of targets.

Finally, we provide some help with using vscode for native, visual debugging here

Let me know if you need more help @PetervdPerk-NXP .

@PetervdPerk-NXP
Contributor

Thank you both for the explanation @davidlenfesty @thirtytwobits.

I've got the docker running and can run the buildkite commands listed above.

However, cmake fails on an assertion in public_regulated_data_types:

pydsdl._data_type_builder.AssertionCheckFailureError: /repo/submodules/public_regulated_data_types/uavcan/diagnostic/32760.Record.1.0.uavcan:15: Assertion check has failed
CMake Error at cmake/modules/Findnnvg.cmake:59 (message):
  Failed to retrieve a list of headers nnvg would generate for the
  dsdl-regulated target (1) (/repo/.tox/local/bin/nnvg)
Call Stack (most recent call first):
  CMakeLists.txt:103 (create_dsdl_target)

@davidlenfesty could it be you're using a different commit of the public_regulated_data_types?
Mine is c1191d36b19a082f85c74be24131e6b205e46e8f

@thirtytwobits
Member

thirtytwobits commented Oct 23, 2020

No, this is the broken state of this change after the pydsdl update. Sorry, I'm on this one...

7 participants