New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardized Error Handling and Error Codes #6983

Merged
merged 13 commits into from May 24, 2018

Conversation

@SenRamakri
Contributor

SenRamakri commented May 22, 2018

This is the implementation for Standardized error coding and handling for mbed-os.
The changes with this pull request implements standardized error code handling and error code definitions/encoding. With this, error handling is implemented as a Core-os service which captures and
logs the errors with context information. Each error code is captured with error location, error message,
error file(this is configurable as this can cause increase in binary size), calling thread info, module generating the error etc. Relevant APIs are provided to log and retrieve errors from error logging system.
A build time flag has been provided to switch off error logging system on memory constrained platforms.
Error reporting itself is logically separated and implemented in mbed_error_reporting.c/h. By doing this both exception handling and error handling can use the same reporting infrastructure. A sample error report is as below. Note that error status value captures error type, entity and error code. Error reporting prints these values separately as well to enable developers to search the source tree with error code.
Error handling service also provides an API to save the error log in file system if required.

++ MbedOS Error Info ++
Error Status: 0x80040103
Error Code: 259
Error Entity: 04
Error Message: HAL Entity error
Error Location: 0x000067C7
Error Value: 0x00005566
Current Thread: Id: 0x200024A8 EntryFn: 0x0000FB0D StackSize: 0x00001000 StackMem: 0x200014A8 SP: 0x2002FFD8
-- MbedOS Error Info --

The implementation is tested using the Greentea tests under TESTS/mbed_platform/error_handling.
Note that exception handling and reporting testing cannot be done using Greentea and was manually tested.

Pull request type

[ ] Fix
[ ] Refactor
[ ] New target
[X] Feature
[ ] Breaking change

The APIs/data structures/macros are all documented using Doxygen. In addition handbook documentation will also be generated for release.

Overall impact of this feature is ~1K(ROM) including logging. Logging can be disabled and if disabled
the overall impact would be around 0.75K.

//Error logger threads
void err_thread_func(mbed_error_status_t *error_status)
{
//printf("\nError Status = 0x%08X\n",*error_status);

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Please remove the dead code and printf's from all tests

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Yep, we can remove it

}

i = mbed_get_error_log_count()-1;
//printf("\nError log count = %d\n", i+1);

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Printf should be removed

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Can be removed


error = MBED_TEST_FILESYSTEM::format(&fd);
if(error < 0) {
printf("Failed formatting");

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Printf's - should be removed

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

@deepikabhavnani - These are printf in the case of errors, its helpful for debugging. Do you think it needs to be taken out? I don't think they are normally hit and shouldnt cause any adverse effects.

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Im changing them to TEST_FAIL_MESSAGE, hope that works


utest::v1::status_t test_setup(const size_t number_of_cases)
{
GREENTEA_SETUP(300, "default_auto");

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Does the test take this long (300 sec) to finish?

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

I noticed that it takes quite a long for some slow targets(I saw one of them took 80secs it could be some test delays as well) as I see. But I can cut that down to 2 mins or less. Ill do that.

@@ -85,6 +85,11 @@ void error(const char* format, ...)
{
(void) format;
}

mbed_error_status_t mbed_error(mbed_error_status_t error_status, const char *error_msg, unsigned int error_value, const char *filename, int line_number)

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

From the comments above, it does not look like we need mbed_error here till we actually replace error with mbed_error in EvrRtxTimerError.

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

The intention is when the error happens we have more info to debug the issue. Thats the reason why we are replacing them. Hope that works since its a good direction.

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Code will not be called, till we actual update EvrRtxTimerError. We can add this later when we are making changes to CMSIS/RTX code. As it helps in keeping track of all CMSIS/RTX changes in single PR and dependencies as well when updating to higher versions.

@@ -55,6 +55,11 @@ struct Sync {
void error(const char* format, ...) {
(void) format;
}

mbed_error_status_t mbed_error(mbed_error_status_t error_status, const char *error_msg, unsigned int error_value, const char *filename, int line_number)

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Do we need this code here?

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Can you please explain it? Do you mean we can have it under macro?

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

My comment was inline to previous one, till we replace error call from CMSIS/RTX lib, we should not add this code. Its should be part of same commit, to help in keeping track of CMSIS changes for updates.

@@ -123,15 +123,15 @@ u32_t sys_now(void) {
*---------------------------------------------------------------------------*/
err_t sys_mbox_new(sys_mbox_t *mbox, int queue_sz) {
if (queue_sz > MB_SIZE)
error("sys_mbox_new size error\n");
MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_NETWORK_STACK, MBED_ERROR_CODE_INVALID_SIZE), "sys_mbox_new size error\n", queue_sz);

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

I would recommend not to replace API in library/feature code (even when deprecating), let the feature owners do it after thorough testing.

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

You can share the samples and examples, but actual API usage should be done by feature owners.

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

@deepikabhavnani - Lot of these changes are already talked about during design review. In addition we are going to write separate requirements for each team once this is in. So, my understanding it is better to capture more info as long as they are not changing the behavior in any way. Do you still think we should remove them?

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Yes Senthil, I believe it should be done and verified by feature owners

@@ -57,14 +57,14 @@ int ReadOnlyBlockDevice::read(void *buffer, bd_addr_t addr, bd_size_t size)

int ReadOnlyBlockDevice::program(const void *buffer, bd_addr_t addr, bd_size_t size)
{
error("ReadOnlyBlockDevice::program() not allowed");
return 0;
MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Same comment, please dont replace the API everywhere in Mbed OS code, it should not be part of API addition. API addition and API adoption should be separate PR's

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Same reason as above, we are only changing places where it doesnt cause any behavior change. We will be writing requirements for other teams as well to adopt this more broadly.

This comment has been minimized.

@geky

geky May 23, 2018

Member

Is it possible to pass addr and size to the MBED_ERROR macro?

Should we just not take any arguments in MBED_ERROR?


//Prevent corruption by holding out other callers
//and we also need this until we remove the error call completely
while (error_in_progress == 1);

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Spinning is fine in case of critical error, but in case of warnings we are unnecessary holding the CPU here

This comment has been minimized.

@SenRamakri

SenRamakri May 22, 2018

Contributor

Good point, I thought about that. But how do we sync-up with the current "error" API without this. Until we remove that we have to do something like this. I was thinking of returning but thats not desirable to return from warning API when some code is already in distress and doesn't know how to handle the situation. Do you think we should call some sort of wait to free up the cpu?What would you suggest

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Wait will have its own side effects, but do we support these warning/error from interrupts? if no then wait is fine


/* Prints the error information */
void mbed_report_error(mbed_error_ctx *error_ctx, char *error_msg)
{

This comment has been minimized.

@deepikabhavnani

deepikabhavnani May 22, 2018

Contributor

Query: Will this function be called in case of warning/non-fatal errors as well? If yes then I would say it is lacking synchronization with existing mbed_trace library.
I know the prints in the call are under critical section, but prints from application and mbed_trace are not in critical section. Consider a scenario where thread1 is in between printing mbed_trace log and context switches to thread_2 which has non-fatal error. It will start printing on UART console without previous log message dumped completely. Logging in this module does not guarantee line interleaving with existing libraries

@deepikabhavnani

I have added my comments at specific code section, major concern is line interleave (sync with other libraries) in case of warning messages and updating existing 'error` API's in feature and other libraries.

@SenRamakri

This comment has been minimized.

Contributor

SenRamakri commented May 23, 2018

@deepikabhavnani - See my new changes. The serial print functions are removed and now we print the error report only in the case of fatal errors, so it shouldn't cause issues when mbed_trace is active.
I have removed the MBED_ERROR calls from lwip. For all other files which are changed to use MBED_ERROR we do have component experts already in this review. "error" function is no longer deprecated. It will internally call mbed_error to capture the error info. In future, we may deprecate "error" once we have other components using the new mbed error apis.

@SenRamakri SenRamakri requested a review from sg- May 23, 2018

@kjbracey-arm

Do still need to remove all the hard-coded console device stuff from here (later), but in that regard this version is no worse than the existing error, and a teeny bit better in having ITM hard-coded. So I'm okay to approve this - no other concerns.

Open issue with my thoughts on that here: #6521

@c1728p9

@SenRamakri you have addressed the concerns I had. I would have liked more visibility on this during the concept phase, but what you have here looks good to me and matches the proposed design.

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented May 23, 2018

@SenRamakri - Changes look good. I will request just one more change in API to have it more aligned with other logging/tracing systems, const char string message as first arg.

--#define MBED_WARNING( error_status, error_msg, error_value ) 
++define MBED_WARNING( error_msg, error_status, error_value )
#ifdef MBED_CONF_ERROR_FILENAME_CAPTURE_ENABLED
#define MBED_ERROR( error_status, error_msg, error_value ) mbed_error( error_status, (const char *)error_msg, (uint32_t)error_value, (const char *)MBED_FILENAME, __LINE__ )
#else
#define MBED_ERROR( error_status, error_msg, error_value ) mbed_error( error_status, (const char *)error_msg, (uint32_t)error_value, NULL, 0 )

This comment has been minimized.

@geky

geky May 23, 2018

Member

Existing error does not take arguments, so most calls will not know what to pass as an error_value. I do see it as useful, but a single error_value is both additional work and limiting.

Since we're stuck once this lands, can we change MBED_ERROR to take two parameters and add MBED_ERROR1 or similar to take the error_value argument?

This would allow use to increase the arg count in the future if we decide that we want to do that.

MBED_ERROR(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed");
MBED_ERROR1(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr);
MBED_ERROR2(MBED_MAKE_ERROR(MBED_MODULE_BLOCK_DEVICE, MBED_ERROR_CODE_WRITE_PROTECTED), "ReadOnlyBlockDevice::program() not allowed", addr, size);
@geky

This comment has been minimized.

Member

geky commented May 23, 2018

@deepikabhavnani, interesting, I disagree, in this case error_status is not an argument, but the object as a target of the MBED_WARNING call:

Consider the order of arguments for sprintf:

#define MBED_WARNING(error_status, error_msg, error_value)
// vs
sprintf(buffer, msg, args);
@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented May 23, 2018

@SenRamakri @geky - As discussed internally will keep error_status as first arg

@SenRamakri SenRamakri dismissed stale reviews from deepikabhavnani, c1728p9, 0xc0170, and kjbracey-arm via 5ef6728 May 23, 2018

@SenRamakri SenRamakri force-pushed the SenRamakri:sen_ErrorHandling_Push2 branch from a9f2a0d to 5ef6728 May 23, 2018

@SenRamakri

This comment has been minimized.

Contributor

SenRamakri commented May 23, 2018

Pushed a minor change to split the MBED_ERROR macros into MBED_ERROR and MBED_ERROR1.

@geky

geky approved these changes May 23, 2018

From an external discussion:

I'm happy with this as long as we plan to unify POSIX and Mbed error codes in the future. Having them separate creates a challenge for users.

Thanks for responding to all my feedback 👍

@cmonr

This comment has been minimized.

Contributor

cmonr commented May 24, 2018

Starting CI since comments indicate multiple reviewers are good with the changes.
Note: It's possible that export and test builds may get shuffled around tomorrow.

/morph build

@mbed-ci

This comment has been minimized.

mbed-ci commented May 24, 2018

Build : SUCCESS

Build number : 2134
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6983/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci

This comment has been minimized.

@adbridge

This comment has been minimized.

Contributor

adbridge commented May 24, 2018

Test failed trying to gather gcov data on Nordic device.
Will try re-running the tests.
/morph test

@mbed-ci

This comment has been minimized.

@0xc0170

This comment has been minimized.

Member

0xc0170 commented May 24, 2018

Test aborted, this needs rebuild to have a fix in that was merged today

/morph build

@SenRamakri

This comment has been minimized.

Contributor

SenRamakri commented May 24, 2018

@0xc0170 - Do we need to rebase to get the fix that merged today?

@mbed-ci

This comment has been minimized.

mbed-ci commented May 24, 2018

Build : SUCCESS

Build number : 2136
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6983/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@0xc0170

This comment has been minimized.

Member

0xc0170 commented May 24, 2018

@0xc0170 - Do we need to rebase to get the fix that merged today?

not needed. CI does merge. Thus I restarted it

@mbed-ci

This comment has been minimized.

@mbed-ci

This comment has been minimized.

@cmonr

This comment has been minimized.

Contributor

cmonr commented May 24, 2018

Bringing it in!

@cmonr cmonr merged commit 527f9a1 into ARMmbed:master May 24, 2018

13 checks passed

AWS-CI uVisor Build & Test Success
Details
ci-morph-build build completed
Details
ci-morph-exporter build completed
Details
ci-morph-mbed2-build build completed
Details
ci-morph-test test completed
Details
continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
travis-ci/astyle Passed, 850 files
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/events Passed, runtime is 10127 cycles (+1108 cycles)
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/littlefs Passed, code size is 9964B (+0.00%)
Details
travis-ci/tools Local tools testing has passed
Details

@cmonr cmonr removed the needs: review label May 24, 2018

@0xc0170

I overlooked one commit change (suffixed 1 macros).

Where is docs PR ?

@@ -78,13 +78,13 @@ void Thread::constructor(Callback<void()> task,

switch (start(task)) {
case osErrorResource:
error("OS ran out of threads!\n");
MBED_ERROR1(MBED_MAKE_ERROR(MBED_MODULE_PLATFORM, MBED_ERROR_CODE_OUT_OF_RESOURCES), "OS ran out of threads!\n", task);

This comment has been minimized.

@0xc0170

0xc0170 May 25, 2018

Member

MBED_WARNING1 or this ERROR1 are confusing to me - what does 1 stands for, which one should I use. From the commit message and the changes I see there is error value, so we either have it or not, but having these 2 ?

@SenRamakri

This comment has been minimized.

Contributor

SenRamakri commented May 25, 2018

@0xc0170 - I'm working on Examples and Docs, example has been completed and available. Docs are in progress, will land the PR in Handbook early next week. The code changes will be reflected in docs changes as well.


core_util_critical_section_enter();
error_log_count++;
memcpy(&mbed_error_ctx_log[error_log_count % MBED_CONF_ERROR_HIST_SIZE], error_ctx, sizeof(mbed_error_ctx) );

This comment has been minimized.

@canhkha

canhkha Jun 8, 2018

Contributor

IAR warning without including string.h
Warning[Pe223]: function "memcpy" declared implicitly ...mbed\platform\mbed_error_hist.c 39

This comment has been minimized.

@0xc0170

0xc0170 Jun 12, 2018

Member

+1, I sent a fix here #7136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment