Skip to content
This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Conversation

lmoureaux
Copy link

@lmoureaux lmoureaux commented Jul 20, 2019

Description

Handle exceptions thrown by RPC methods and the Wisconsin RPC library. Turn them into two exceptions types at the calling site. For exceptions thrown on the remote side, a stack trace can be recorded and sent back (optional).

This PR adds a .cpp file that should be compiled into its a library. It is required on all architectures. The build system has not been modified to include it.

Example exception info

I triggered an exception by doing .at(-1) on a std::vector. The message is:

remote error: std::out_of_range: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 10)

[feature was dropped]
The backtrace as retrieved from the thrown exception is (relevant frames are 5-7):

Stack trace (most recent call last):
#17   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in 
#16   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f22249, in _start
#15   Object "/lib/x86_64-linux-gnu/libc.so.6", at 0x7f01aa1fdb96, in __libc_start_main
#14   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f228c9, in main
#13   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f22b25, in handle_client_connection(int, sockaddr_in&, unsigned int, int&)
#12   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f23779, in run_client(int)
#11   Object "wiscrpcsvc-server.pc/src/wiscrpcsvc-server.pc-build/rpcsvc", at 0x55b4f5f25fb5, in ModuleManager::invoke_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, wisc::RPCMsg*, wisc::RPCMsg*)
#10   Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a5212, in void xhal::rpc::invoke<Memory::Read, 0>(wisc::RPCMsg const*, wisc::RPCMsg*)
#9    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a5aad, in xhal::rpc::compat::void_holder<std::vector<unsigned int, std::allocator<unsigned int> > > xhal::rpc::compat::tuple_apply<std::vector<unsigned int, std::allocator<unsigned int> >, Memory::Read, unsigned int, unsigned int, 0>(Memory::Read&&, std::tuple<unsigned int, unsigned int> const&)
#8    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a674f, in decltype (((forward<Memory::Read>)({parm#1}))((get<0ul>)({parm#2}), (get<1ul>)({parm#2}))) xhal::rpc::compat::tuple_apply_impl<Memory::Read, unsigned int, unsigned int, 0ul, 1ul>(Memory::Read&&, std::tuple<unsigned int, unsigned int> const&, xhal::rpc::compat::index_sequence<0ul, 1ul>)
#7    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a34ed, in Memory::Read::operator()(unsigned int, unsigned int) const
#6    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a4a22, in std::vector<unsigned int, std::allocator<unsigned int> >::at(unsigned long)
#5    Object "/home/louis/Work/GEM/superbuild/gemsuperbuild/install/pc/usr/lib/rpcmodules/memory.so", at 0x7f01a93a50d3, in std::vector<unsigned int, std::allocator<unsigned int> >::_M_range_check(unsigned long) const
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", at 0x7f01aa873854, in 
#3    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b7a3e, in __cxa_throw
#2    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b789d, in xhal::rpc::helper::recordStackTrace()
#1    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b834d, in backward::StackTraceImpl<backward::system_tag::linux_tag>::load_here(unsigned long)
#0    Object "../install/pc/usr/lib/libxhal_rpc.so", at 0x7f01ab4b9d15, in unsigned long backward::details::unwind<backward::StackTraceImpl<backward::system_tag::linux_tag>::callback>(backward::StackTraceImpl<backward::system_tag::linux_tag>::callback, unsigned long)

Line numbers and even code snippets could be obtained by linking against libdwarf

How it works

Exceptions are caught on the server side (CTP7, in invoke) and an error message is generated there. The exact type of the exception is determined using a low-level C++ API (defined by the Itanium ABI). This information is sent back to the caller using the error and type keys. The call function checks for the keys and throws an exception with the relevant information.

In addition, code in the call function is now protected against exceptions. Any error coming from the Wisconsin RPC library results in an exception being thrown (a single type is used).

[feature was dropped]
The optional stack trace functionality works by inserting a hook in the low-level API used by throw statements. More specifically, it overrides __cxa_throw (also from the Itanium ABI) to record a stack trace every time an exception is thrown. When an exception is caught in invoke, the stack trace is queried and sent back to the caller.

Since recording a stack trace for every exception adds some overhead, it is made optional using a preprocessor macro. If NDEBUG is set when register.h is included, no trace will be recorded. (I'm not sure whether this applies at the level of a module, a compilation unit or even the part of it after the #include. This would need to be tested.)

CTP7 warning

[feature was dropped, irrelevant]
The version of libstdcxx on the CTP7 has a bug that prevents getting a stack trace. It also fools gdb. The best option seems to ask Wisconsin for an image based on a more recent Petalinux. We could also provide an updated version of libstdcxx to LD_PRELOAD when starting wiscrpcsvc, or a tiny library that would only patch the buggy function.

The code was tested primarily on x86, with some testing on a more recent ARM Linux.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context

We want to catch exceptions thrown by RPC modules, and provide devs with as much debugging information as possible.

How Has This Been Tested?

All prototyping was done on x86. Testing on ARM (CTP7 and another chip) was performed to check for architecture-dependent issues. This polished version was tested on x86.

The following client was used with an RPC-ified memory:

#include "memory.h"

#include <iomanip>
#include <iostream>

#include "xhal/rpc/call.h"

int main(int argc, char **argv)
{
    try
    {
        wisc::RPCSvc conn;
        conn.connect("localhost");

        conn.load_module("memory", "memory v1.0.1");

        auto mem = xhal::rpc::call<Memory::Read>(conn, 0, 10);
        std::cout << std::hex;
        for (std::uint32_t word : mem) {
            std::cout << " " << word;
        }
        std::cout << std::endl;
    }
    catch(const xhal::rpc::RemoteException &e)
    {
        std::cerr << "Remote call failed: " << e.what() << std::endl;
        if (e.hasTrace()) {
            std::cerr << e.trace() << std::endl;
        }
        return 1;
    }
    catch(const xhal::rpc::MessageException &e)
    {
        std::cerr << "Remote call failed: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}

Compilation was tested on gem904daq01 with the Xilinx SDK 2016.2.

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

xhalcore/include/xhal/rpc/register.h Outdated Show resolved Hide resolved
@mexanick
Copy link
Contributor

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...
Also, shall we extend the exceptions propagation to include warnings alongside the errors?

@jsturdy
Copy link
Contributor

jsturdy commented Jul 25, 2019

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

Also, shall we extend the exceptions propagation to include warnings alongside the errors?

I am really not in favour of using exceptions as anything other than exceptions...
Can you give an example of something you would consider a "warning" that you'd like to expose in this way?

@jsturdy jsturdy changed the title Exception handling in template RPC calls WIP: Exception handling in template RPC calls Jul 26, 2019
@jsturdy
Copy link
Contributor

jsturdy commented Aug 1, 2019

One concern/question I have is why do we need something that seemingly has a lot of added code and overhead, when at a minimum, I think what we need is just something that can:

  1. catch the exception in the invoke method running on the CTP7
  2. translate this message into an error key
  3. throw the exception (in the event of the presence of the error key) on the receiving end in the call method

At this point, the code that executed the call to the remote function simply handles the exception as it normally would.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

What is the use case for the complicated design pattern?

@jsturdy
Copy link
Contributor

jsturdy commented Aug 1, 2019

I'll also rebase this on top of the recently merged #123, unless @lmoureaux is back from holiday, as I'd have to force push to your github

@lmoureaux lmoureaux force-pushed the feature/templated-rpc-methods-exceptions branch 2 times, most recently from 4f2bbcf to 02836fd Compare August 5, 2019 21:33
@lmoureaux
Copy link
Author

Rebased on top of feature/templated-rpc-methods; implemented Jared's comment; fixed an issue with exceptions from the standard library.

Will look into build system integration next.

@lmoureaux
Copy link
Author

Also added an example backtrace to the summary above.

Copy link
Contributor

@lpetre-ulb lpetre-ulb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite good, mostly questions about details of the implementation and about the style.

xhalcore/include/xhal/rpc/call.h Outdated Show resolved Hide resolved
xhalcore/include/xhal/rpc/call.h Outdated Show resolved Hide resolved
xhalcore/include/xhal/rpc/call.h Outdated Show resolved Hide resolved
xhalcore/include/xhal/rpc/call.h Show resolved Hide resolved
xhalcore/include/xhal/rpc/call.h Show resolved Hide resolved
xhalcore/include/xhal/rpc/exceptions.h Outdated Show resolved Hide resolved
xhalcore/include/xhal/rpc/exceptions.h Outdated Show resolved Hide resolved
@lpetre-ulb
Copy link
Contributor

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

As told in the first post, the issue lies in libstdcxx and even prevents gdb (via gdbserver) to get full backtraces. The issue is documented and fixed upstream. Our minimal requirement would then be to upgrade GCC from version 4.9.2 to version 4.9.3, even though a more recent version would definitively be better.

About CC7, do you know if an ARM release is provided? If yes, is it fully compatible with the kernel provided by Xilinx. If no, a more recent PetaLinux releases with more mainstream technology (such as overlay filesystems) would help.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

I fear defining more exception types would defeat the abstraction implemented in the templated RPC framework. If they were defined here, it would increase the difficulty to create standalone tools and if they were defined in the ctp7_modules, the templated RPC would have to follow all changes in the ctp7_modules. The only abstracted way I currently see is to register the exception classes via a template mechanism.

@jsturdy
Copy link
Contributor

jsturdy commented Aug 6, 2019

Considering asking UW to bake a new PetaLinux distributive: we can ask, but I'm quite sceptical whether it will ever be done...

Indeed, but if we outline our strict requirements, we might get some support (or we can push to basically be allowed to control our own destiny viz the linux core, and blow a cc7 image if we so desire (as other groups are contemplating), and we should sell our GE1/1 experience as feedback for our GE2/1 system, where we need a more fluid and symbiotic relationship with this side of things

As told in the first post, the issue lies in libstdcxx and even prevents gdb (via gdbserver) to get full backtraces. The issue is documented and fixed upstream. Our minimal requirement would then be to upgrade GCC from version 4.9.2 to version 4.9.3, even though a more recent version would definitively be better.

This is probably possible, but the primary issue might be that UW are not really supporting CTP7s, and have moved resources to future boards, so they really only provide some limited support. Again, it can't hurt to ask, and for something like this, I'd be optimistic (unlike say for the inclusion of a version with overlayfs...)

About CC7, do you know if an ARM release is provided? If yes, is it fully compatible with the kernel provided by Xilinx. If no, a more recent PetaLinux releases with more mainstream technology (such as overlay filesystems) would help.

I only know that there are other groups working on LHC projects who are putting centos variants on things like the Zynq. I believe that there is also some work being done to determine if an officially supported CERN release would be feasible.

I'd even be in favour of adding some exception classe(s) here (more than the two you have added) since this will be included by both ctp7_modules and cmsgemos), which can then easily allow the transferring of error cause and intent between contexts.

I fear defining more exception types would defeat the abstraction implemented in the templated RPC framework. If they were defined here, it would increase the difficulty to create standalone tools and if they were defined in the ctp7_modules, the templated RPC would have to follow all changes in the ctp7_modules. The only abstracted way I currently see is to register the exception classes via a template mechanism.

To be clear, I'm not saying we should have arbitrary exceptions, rather, specific named exceptions inheriting from these two basic types (even on-the-fly exception creation macros, a la xcept::Exception from xdaq), which is potentially interesting from the handling point of view; however, if any remote exception is going to trigger some error condition, this may be a moot point...

@jsturdy
Copy link
Contributor

jsturdy commented Aug 6, 2019

Rebased on top of feature/templated-rpc-methods; implemented Jared's comment; fixed an issue with exceptions from the standard library.

Will look into build system integration next.

@lmoureaux, you've rebased on top of the wrong branch:

* b006432        (louis/feature/templated-rpc-methods-exceptions) Drop boolean fields in RemoteException
* 02836fd        Add code to record the stack trace on throw
* a60eccb        Update \author info
* 82b6b08        Add support for sending back remote stack traces
* 69ce392        Handle RPCMsg and RPCSvc exceptions in call()
* eb5ea95        Handle the error key in call()
* 198386d        Catch exceptions thrown by RPC methods
| * 1c95eb5  (HEAD -> feature/templated-rpc-methods-exceptions) Add code to record the stack trace on throw
| * 4407b02  Update \author info
| * f25ec24  Add support for sending back remote stack traces
| * c690806  Handle RPCMsg and RPCSvc exceptions in call()
| * b190de8  Handle the error key in call()
| * dc54e5d  Catch exceptions thrown by RPC methods
| *   a4d188c        (gemdaq/feature/templated-rpc-methods, feature/templated-rpc-methods) Merge pull request #123 from lpetre-ulb/feature/templated-rpc-methods
| |\
| |/
|/|
* | dba680f  (louis/feature/templated-rpc-methods, laurent/feature/templated-rpc-methods) Relax namespace constraints for custom type serializers
* | 946f5e5  Add support for custom types
* | e259268  Add support for `std::map<std:string, T>`
* | 66fbbbf  Add support of `std::map<std::uint32_t, T>`
* | 2bd6734  Add support for `std::array` of integral types
| |
|/
*   32b6094  Merge pull request #121 from lpetre-ulb/feature/templated-rpc-methods

All exceptions thrown by RPC methods are caught, and an error message is sent
via the "error" key. In order to help debugging, the type of the exception is
also sent back to the client.

Build system support is left for later.
The error key is set on the message when an exception is thrown by the RPC
method. Throw an exception on the calling side when it is present.
@lmoureaux lmoureaux force-pushed the feature/templated-rpc-methods-exceptions branch from b006432 to a720e22 Compare August 6, 2019 10:23
The Wisconsin RPC messaging layer can throw a family of exceptions that need to
be enumerated in catch() statements. Replace them with a common type deriving
from std::runtime_error.
@lmoureaux lmoureaux force-pushed the feature/templated-rpc-methods-exceptions branch from a720e22 to 27a8049 Compare August 6, 2019 10:25
@lmoureaux
Copy link
Author

  • Rebased on top of the correct branch
  • Addressed all comments. Since I was rebasing anyway, edited the history
  • Dropped the backtrace feature as discussed this morning; it will go into a separate PR

Still no build system integration.

@jsturdy
Copy link
Contributor

jsturdy commented Aug 8, 2019

Still no build system integration.

Blocker

@lmoureaux lmoureaux force-pushed the feature/templated-rpc-methods-exceptions branch from 49dbb54 to 98d3e87 Compare August 8, 2019 11:36
@lmoureaux lmoureaux mentioned this pull request Aug 8, 2019
2 tasks
@lmoureaux lmoureaux force-pushed the feature/templated-rpc-methods-exceptions branch from 98d3e87 to f577bf2 Compare August 10, 2019 21:54
@lmoureaux
Copy link
Author

Added build system integration. Symbols are added to libxhal.so. Proof:

[lmoureau@gem904daq01 xhal]$ nm -a xhalcore/lib/libxhal.so | c++filt | grep ::getExc
00000000000343fb T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::TypeException const&)
00000000000343ce T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BadKeyException const&)
0000000000034462 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BufferTooSmallException const&)
00000000000344c9 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::CorruptMessageException const&)
00000000000344f6 T xhal::rpc::helper::getExceptionMessage(wisc::RPCSvc::RPCException const&)
0000000000034351 T xhal::rpc::helper::getExceptionMessage(std::exception const&)
[lmoureau@gem904daq01 xhal]$ nm -a xhalarm/lib/libxhal.so | c++filt | grep ::getExc
00024580 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::TypeException const&)
00024548 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BadKeyException const&)
000245e8 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::BufferTooSmallException const&)
00024650 T xhal::rpc::helper::getExceptionMessage(wisc::RPCMsg::CorruptMessageException const&)
00024688 T xhal::rpc::helper::getExceptionMessage(wisc::RPCSvc::RPCException const&)
000244d0 T xhal::rpc::helper::getExceptionMessage(std::exception const&)

@lmoureaux lmoureaux changed the title WIP: Exception handling in template RPC calls Exception handling in template RPC calls Aug 10, 2019
SRCS_XHAL = $(shell echo $(SrcLocation)/*.cpp)
SRCS_XHALPY = $(shell echo $(SrcLocation)/python_wrappers/*.cpp)
SRCS_RPC_MAN = $(shell echo $(SrcLocation)/rpc_manager/*.cpp)
SRCS_XHAL = $(wildcard $(PackageSourceDir)/*.cpp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for @mexanick, what is/was the rationale for separating *_XHAL and *_UTILS?
In this Makefile they all simply feed into the XHAL_LIB shared object.

Makefile Outdated
xhalcore: xhalarm
xhalcore:

xhalcore.RPM: xhalarm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be xhalcore.rpm, typo was in my stuff.
I also made a few further optimizations to these as well as the templates in config.
I don't think you need to include them in this PR, but I'll work them in during the release prep (they also impact other packages)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can edit your commit if you want stuff to be fixed. Just push fixes to #129 and ping me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config submodule was updated to cms-gem-daq-project/gembuild#4, and I'll fix the xhal commit with the merged hash of this, this afternoon (currently it's pushed with the hash of the pending PR if you want to test everything except the final submodule hash)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hotfix/simplify-makefiles is now updated to the merged hash of the config submodule

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You rebased it on top of a WIP commit that I removed yesterday.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Not really (the same commit message appears twice in history):

7eb3190 Clean up Makefiles -- Fixes #128
055b8aa Fix typo in rpc/register.h header guard
f577bf2 Compile rpc/exceptions.cpp into libxhal.so
9d9df61 Clean up Makefiles -- Fixes #128

I sorted it out. I took the freedom to drop the changes that added the rpc directory, since I prefer having them in a separate commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, thanks.
I missed that my rebase, rebased onto yours which had already included the commit

jsturdy and others added 3 commits August 13, 2019 00:25
* `config` submodule updated to pick up `XHAL_ROOT` and default `INSTALL_PATH`
* `xhalarm` and `xhalcore` `Makefile`s are simplified for reliable, reproducible builds
* `xhalarm` as dependency of `xhalcore` is moved to dependency of `xhalcore.RPM`
Copy link
Contributor

@jsturdy jsturdy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK from me, just the final decision on #123 (review), which can be made in a separate cleanup PR

@jsturdy jsturdy merged commit e2a3df0 into cms-gem-daq-project:feature/templated-rpc-methods Aug 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants