New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thread safety to CRC class #7781

Merged
merged 1 commit into from Aug 29, 2018

Conversation

Projects
None yet
8 participants
@deepikabhavnani
Contributor

deepikabhavnani commented Aug 13, 2018

Description

Add thread safety to CRC class. Only commit for this PR is 839f46a

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Feature
[ ] Breaking change

Dependent on : #7735

CC @geky

}
return 0;
unlock();

This comment has been minimized.

@geky

geky Aug 13, 2018

Member

nit: The series of if statements was cleaner IMO. Not a block, but in the future this may be better written with a series of unlocks in each return statement, or a goto to a cleanup block if the cleanup is too expensive.

Generally gotos are considered harmful, but this is one of the places where a goto makes the intention clearer than a series of nested if statements.

This comment has been minimized.

@deepikabhavnani

deepikabhavnani Aug 13, 2018

Contributor

I disagree, had a look again. I would prefer this way as it looks neat, instead of repeated cleanup code / goto. I would have agreed to your point if if statements were 4/5 level deeper.

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

I tend to agree with @geky, I personally prefer when the expected/successful path of the algorithm is not buried inside nested if statements.

    int32_t error = compute_partial_start(crc);
    if (error) { goto cleanup; }
    
    error = compute_partial(buffer, size, crc);
    if (error) { goto cleanup; }

    error = compute_partial_stop(crc);
    if (error) { goto cleanup; }

    return 0;

cleanup:
    *crc = 0;
    return error;
@geky

geky approved these changes Aug 13, 2018

Looks good to me, nice test 👍

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 13, 2018

@deepikabhavnani Please take a look at some of the docs. Looks like Travis is back to normal.

@cmonr cmonr added needs: work and removed needs: review labels Aug 13, 2018

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:crc_safety branch from 839f46a to 85a0350 Aug 13, 2018

@pan-

I'm not sure to understand the design or the use case of the change.

  • Global/Singleton lock will induce unnecessary contention if multiples different CRCs are computed in parallel.
  • CRCs are usually not calculated from different sources (do you get some bytes from a thread and other bytes from others threads ?); it is usually not a shared resource either. Why would it be thread safe by default ?

I would have split this PR into two prs: one that handle optimized CRC and the other that handles the locking part.

@@ -22,6 +22,8 @@ namespace mbed {
/** \addtogroup drivers */
/** @{*/
SingletonPtr<PlatformMutex> mbed_crc_mutex;

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

Why is this a singleton ? It is not possible to have a mutex per MbedCRC instance ?
If the mutex is global then all crc calculations are serialized across threads; I'm not sure this is what we want.

This comment has been minimized.

@deepikabhavnani

deepikabhavnani Aug 14, 2018

Contributor

Locking is needed for HW, since we will have one HW module performing CRC computation for all the instances. I understand this is an overhead for software only computations or combination of software and hardware.

We can have lock/unlock only in case of hardware, does that look like correct solution or will cause confusion?

*crc = 0;
return status;
int32_t status = 0;
lock();

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

Could you use RAII lock objects ? It would reduce the code size and ensure no unlock are forgotten.

This comment has been minimized.

@geky

geky Aug 14, 2018

Member

What makes RAII reduce the code size?

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

I was mostly referring to the line of code required but in the final binary; we can observe code size reduction too as the compiler is very good at optimizing destruction of objects on the stack.

Here you go: https://godbolt.org/g/nXarPA

This comment has been minimized.

@geky

geky Aug 21, 2018

Member

It looks like it will behave the same if you have a cleanup label to jump to: https://godbolt.org/z/rgagh-

But good to know gcc can do that automatically for RAII objects. It's a shame it can't figure out that the unlock() calls can be shared.

}
return 0;
unlock();

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

I tend to agree with @geky, I personally prefer when the expected/successful path of the algorithm is not buried inside nested if statements.

    int32_t error = compute_partial_start(crc);
    if (error) { goto cleanup; }
    
    error = compute_partial(buffer, size, crc);
    if (error) { goto cleanup; }

    error = compute_partial_stop(crc);
    if (error) { goto cleanup; }

    return 0;

cleanup:
    *crc = 0;
    return error;
@@ -264,6 +288,20 @@ class MbedCRC {
return width;
}
/** Acquire exclusive access to CRC hardware/software
*/
virtual void lock()

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

Why is this virtual ?
Note: same question for the destructor.

This comment has been minimized.

@geky

geky Aug 14, 2018

Member

Hmm, I'm guessing to match the other driver classes.

Unrelated, but what are your thoughts on a "Lockable" abstract class to standardize this interface?

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

What would be the polymorphic use case for such thing ?

This comment has been minimized.

@geky

geky Aug 14, 2018

Member

We could make the RAII mutex generic for these classes for one.

I was more thinking of this as extra documentation. And that way we make the API is standard.

This comment has been minimized.

@deepikabhavnani

deepikabhavnani Aug 14, 2018

Contributor

Use case in my mind was: application if sure of using software only CRC can overwrite these locks to be empty functions.

This comment has been minimized.

@pan-

pan- Aug 14, 2018

Member

We could make the RAII mutex generic for these classes for one.

I'm slow today; I still don't see the use case; these class lock and unlock functions aren't supposed to be internal and self contained ?

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 14, 2018

Global/Singleton lock will induce unnecessary contention if multiples different CRCs are computed in parallel.

Global/Singleton lock is needed when using HW for CRC computation, we cannot do different algorithms in parallel with HW

CRCs are usually not calculated from different sources (do you get some bytes from a thread and other bytes from others threads ?); it is usually not a shared resource either. Why would it be thread safe by default ?

Agree for software point of view, but it is a share resource in case of HW. Please see https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_Freescale/TARGET_MCUXpresso_MCUS/TARGET_MCU_K64F/mbed_crc_api.c and newly added test case https://github.com/ARMmbed/mbed-os/pull/7781/files#diff-add91f4d5e17cbab3ddf004660cf1d4bR137

CRC computation is splitted in 3 parts start compute_partial and stop, if another thread configures hw with different polynomial in between than results won't be correct

@pan-

This comment has been minimized.

Member

pan- commented Aug 14, 2018

I do understand that we need to protect the hardware resource that compute CRC from concurrent access with a single mutex as access to that resource must be serialized. However, what I do not understand is why this requirement impacts software computation of the CRC. lock and unlock should do nothing if the mode is not hardware.

CRC computation is splitted in 3 parts start compute_partial and stop, if another thread configures hw with different polynomial in between than results won't be correct

It ain't pretty but it make sense .

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 14, 2018

lock and unlock should do nothing if the mode is not hardware.

👍 Will correct the behavior for software in that case and change lock/unlock functions to be non-virtual.

@geky

This comment has been minimized.

Member

geky commented Aug 14, 2018

It's not entirely clear, but the HAL CRC API is designed to keep the current state of the CRC inside hardware registers. I'm guessing this is because fetching the CRC value from the hardware adds a cost that isn't always needed. @scartmell-arm

Here's an example from the user's perspective:

void super_duper_fast_barcode_scanner() {
    Scanner scanner;
    MbedCRC<0xabcdabcd, 32> crc;

    crc.lock(); // lock so no one else messes with the crc hardware while we're using it
    crc.compute_partial_start();

    for (int i = 0; i < 100000000; i++) { // barcode is really long
        uint8_t c = scanner.scan(); // scans one character at a time
        crc.compute_partial(&c, 1);
    }

    uint32_t crc;
    crc.compute_partial_stop(&crc);
    crc.unlock();

    MBED_ASSERT(crc == 0);
}

If this example is correct, the lock/unlock functions are very much a part of the public API. This matches the current API for Serial/SPI and other streaming classes were function calls can end up with interspersed data.

Actually looking at this example raises a few other questions

  1. Why have crc be an argument for start and compute partial? Isn't it only available at the stop stage?
  2. Can start/stop lock/unlock the hardware for the user?
@pan-

This comment has been minimized.

Member

pan- commented Aug 15, 2018

I thought again about it and I came to the conclusion that lock/unlock functions should not be exposed to the application side as they are redundant and error prone:

When a CRC is computed by the hardware, the hardware lock must be held for the entirety of the computation. The following sequence would be a programming error if hardware crc is used:

crc.lock(); 
crc.compute_partial_start(...);
crc.unlock()

// get the data

crc.lock(); 
crc.compute_partial(...);
crc.compute_partial_stop(...);
crc.unlock()

The lock should not be released until the computation ends.

Therefore when a CRC is computed with hardware, the function compute_partial_start should acquire the lock and the function compute_partial_stop should release it. If an error happen in compute_partial; the lock can be released too.

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 15, 2018

I don't agree with this approach. Scenario above will be a programming error if hardware CRC is used and user should be aware of that. Hence user should have control of lock/unlock.

Also in all the drivers, we give that advantage to users. I would like to be consistent with all the drivers.
https://github.com/ARMmbed/mbed-os/blob/master/drivers/SPI.h#L140
https://github.com/ARMmbed/mbed-os/blob/master/drivers/I2C.h#L148

@pan-

This comment has been minimized.

Member

pan- commented Aug 15, 2018

I don't think the comparison with classes such as SPI is right: the SPI class doesn't needs to be locked to be used as locking is internal and the class itself keeps enough data to reconfigure the hardware whenever it acquires the bus. The CRC class doesn't offer that.

In CRC - unlike the SPI API - the flow of a transaction is well defined with prologue and epilogue functions that must be called by the user. Why not use these functions to lock and unlock the mutex associated with the hardware ? They have to be called anyway.

That would make the class thread safe like the other drivers we propose; it isn't at the moment.

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 15, 2018

the SPI class doesn't needs to be locked to be used as locking is internal

I believe this is not correct, though you can reconfigure hardware during acquire bus, external locks are still needed. https://github.com/ARMmbed/mbed-os/blob/master/drivers/SPI.h#L65

In case of SD driver as well https://github.com/ARMmbed/sd-driver/blob/master/SDBlockDevice.h#L215,
we have locks as few operations on SPI bus should be done sequentially before device can free the bus.

Consider a application where I am sure only single type of CRC (say CRC-16) for two different modules. In this case start will be called just once during initialization and combination of compute_partial and stop multiple times by both modules and guarded.

If user needs complete serialization then compute should be used. Else I don;t see purpose of other 3 API's.

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:crc_safety branch from 85a0350 to c7f6174 Aug 15, 2018

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 15, 2018

Consider a application where I am sure only single type of CRC (say CRC-16) for two different modules. In this case start will be called just once during initialization and combination of compute_partial and stop multiple times by both modules and guarded.

I did testing for this use case and it didn't worked from HAL side. HAL API;s does not pass the previous CRC (which could be used to configure the initial seed).

@geky - Please note compute_partial for LFS will not work for below sequence.

uint32_t crc = 0xffffffff;
compute_partial(&test, sizeof(test), &crc);
crc = 0xffffffff;
compute_partial(&test, sizeof(test), &crc);

It has to be

uint32_t crc = 0xffffffff;
compute(&test, sizeof(test), &crc);
crc = 0xffffffff;
compute(&test, sizeof(test), &crc);

Since we can't do much with compute_partial in hardware without compute_start. It make sense to have 'lock' and 'unlock' as private and set lock in start and only unlock in stop/error. Unless we are planning to update hal to allow setting seed value in compute_partial.

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:crc_safety branch from c7f6174 to b00facd Aug 15, 2018

@pan-

pan- approved these changes Aug 15, 2018 edited

Ideally the hal would change to accept an opaque state in the function that handles partial computation.

That would allow chunked parallel or interleaved hardware computation of different type of CRCs. As it is right now, the following use case is still broken without changes in the HAL (it has nothing to do with locking ...):

MbedCRC crc_a;
MbedCRC crc_b;

crc_a.compute_partial_start(...);
crc_b.compute_partial_start(...);

crc_a.compute_partial(...);
crc_b.compute_partial(...);

crc_a.compute_partial_stop(...);
crc_b.compute_partial_stop(...);

@scartmell-arm @bulislaw

@cmonr cmonr added the needs: review label Aug 16, 2018

@cmonr cmonr requested review from scartmell-arm and bulislaw Aug 16, 2018

@cmonr cmonr removed the needs: work label Aug 16, 2018

@bulislaw

This comment has been minimized.

Member

bulislaw commented Aug 21, 2018

Ideally the hal would change to accept an opaque state in the function that handles partial computation.
That would allow chunked parallel or interleaved hardware computation of different type of CRCs. As it is right now, the following use case is still broken without changes in the HAL (it has nothing to do with locking ...):

We considered it, but it wasn't an usecase. The driver should be able to protect the access to HW without bothering user to know when it's necessary to lock the object.

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 21, 2018

/morph build

@cmonr cmonr added the needs: CI label Aug 21, 2018

Add thread safety to CRC class
Thread safety is added to serialize the hardware CRC and will not
impact the software CRC.

@deepikabhavnani deepikabhavnani force-pushed the deepikabhavnani:crc_safety branch from b00facd to 986411c Aug 21, 2018

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 21, 2018

@deepikabhavnani Please take a a look at the CI failures.

Forgot to update test application (lock/unlock API's are private in MbedCRC class)

/morph build

@geky

This comment has been minimized.

Member

geky commented Aug 21, 2018

We considered it, but it wasn't an usecase. The driver should be able to protect the access to HW without bothering user to know when it's necessary to lock the object.

How in the world could this work in a multithreaded system? Is only one thread allowed to perform CRC at a time?

@0xc0170 0xc0170 added needs: review and removed needs: work labels Aug 22, 2018

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 22, 2018

@bulislaw Would you mind answering @geky's question?

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 23, 2018

@cmonr @geky - With current limitation in hal, yes only one thread allowed to perform CRC at a time.
However, this should be considered in future CRC hal updates.

@deepikabhavnani

This comment has been minimized.

Contributor

deepikabhavnani commented Aug 27, 2018

@cmonr - Can we start build on this?

@0xc0170

This comment has been minimized.

Member

0xc0170 commented Aug 27, 2018

/morph build

@mbed-ci

This comment has been minimized.

mbed-ci commented Aug 27, 2018

Build : SUCCESS

Build number : 2932
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7781/

Triggering tests

/morph test
/morph export-build
/morph mbed2-build

@mbed-ci

This comment has been minimized.

@mbed-ci

This comment has been minimized.

@cmonr cmonr added needs: CI and removed needs: review labels Aug 28, 2018

@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 28, 2018

Will need to retest when CI load is lower.

@NirSonnenschein

This comment has been minimized.

Contributor

NirSonnenschein commented Aug 29, 2018

/morph test

1 similar comment
@cmonr

This comment has been minimized.

Contributor

cmonr commented Aug 29, 2018

/morph test

@mbed-ci

This comment has been minimized.

@cmonr cmonr merged commit 1e3e694 into ARMmbed:master Aug 29, 2018

14 checks passed

ci-morph-build build completed
Details
ci-morph-exporter build completed
Details
ci-morph-mbed2-build build completed
Details
ci-morph-test test completed , RTOS ROM(+0.0%) RAM(+0.0%)
Details
continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/cloud_client_smoke_test Test job was successful
Details
travis-ci/astyle Passed, 585 files
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/events Passed, runtime is 10224 cycles (-263 cycles)
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/licence_check Local licence_check testing has passed
Details
travis-ci/littlefs Passed, code size is 8372B (+0.00%)
Details
travis-ci/tools-py2.7 Local tools-py2.7 testing has passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment