Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NUCLEO_F756ZG/mbedtls: Add hw acceleration for SHA256 #4162

Merged
merged 1 commit into from
Aug 7, 2017

Conversation

adustm
Copy link
Member

@adustm adustm commented Apr 11, 2017

Description

Enable HW acceleration for SHA256 algorithm on STM32F756ZG
is #3961

Status

READY

Related PRs

This PR needs #4159 + #4161 (formerly #3954) to be merged first

Steps to test or reproduce

To test this feature, you have to modify TESTS/mbedtls/selfttest/main.cpp in order to call sha256 self test:
add:
#include "mbedtls/sha256.h"
then

#if defined(MBEDTLS_SHA256_C)
MBEDTLS_SELF_TEST_TEST_CASE(mbedtls_sha256_self_test)
#endif

then

#if defined(MBEDTLS_SHA256_C)
    Case("mbedtls_sha256_self_test", mbedtls_sha256_self_test_test_case),
#endif

@adustm
Copy link
Member Author

adustm commented Apr 11, 2017

@sg-

@0xc0170 0xc0170 requested review from yanesca and RonEld April 12, 2017 09:52
Copy link
Contributor

@RonEld RonEld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done my review, and added some comments

void mbedtls_sha256_update( mbedtls_sha256_context *ctx, const unsigned char *input, size_t ilen )
{
if (ctx->is224 == 0)
HAL_HASHEx_SHA256_Accumulate(&ctx->hhash_sha256, (uint8_t *)input, ilen);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if iLen < 64? Do you need an accumulation buffer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean 'what if ilen < 4' ?
I did not check as I did not find a use case for that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, what if the input is smaller than the SHA256 block length. Will HAL_HASHEx_SHA256_Accumulate handle every size correct?
SHA256 on an input of size 3 is legal. Note that the sha256 selftest has an input buffer of size 3, so I don't think there is a problem here, but I would check if there is a limitation of input length in HAL_HASHEx_SHA256_Accumulate , like other HAL_HASH functions have

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for MD5 and SHA1, the issue could occur if the following sequence happens:

SHA256_update(size=3);
SHA256_update(size=x);
SHA256_finish; // the HW cannot handle size = 3 if it is not the last byte before the finish

Do you think this can exist ? We have met this use case with SHA1 or MD5, not with SHA256

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm I don't know your HW. But if you look at sha356 selftest there is a test vector of length 3. I suggest you test the scenario that fails on MD5 and SHA1, to be sure


#if defined (MBEDTLS_SHA256_ALT)
#include "mbedtls/platform.h"
#include "mbedtls/config.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this include needed? Please check what header file includes are needed in mbedtls_alt.h file and what can be moved to mbedtls_alt.c file

#endif

#ifdef __cplusplus
extern "C" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

* \param output SHA-224/256 checksum result
* \param is224 0 = use SHA256, 1 = use SHA224
*/
void mbedtls_sha256( const unsigned char *input, size_t ilen,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to define. Already defined in sha256.h outside #if defined (MBEDTLS_SHA256_ALT) check

*
* \return 0 if successful, or 1 if the test failed
*/
int mbedtls_sha256_self_test( int verbose );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to define. Already defined in sha256.h outside #if defined (MBEDTLS_SHA256_ALT) check

int mbedtls_sha256_self_test( int verbose );

#ifdef __cplusplus
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed

@0xc0170
Copy link
Contributor

0xc0170 commented May 16, 2017

retest uvisor

@adustm
Copy link
Member Author

adustm commented May 23, 2017

Hello @RonEld
I just cherry-picked the commit from the F439ZI_SHA256 branch in this one to align every branches.
Could you handle the review please ?
Cheers
Armelle

@gilles-peskine-arm
Copy link

Almost identical to PR #4159 which I have reviewed.

Copy link

@andresag01 andresag01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is looking good, but I have a few comments:

  • The error codes from the hardware accelerator are not always checked. This is probably because the mbed TLS API itself doesnt actually have return code, meaning that is quite difficult to signal errors to the user. However, I think that for now we can add comments in the relevant places so it is easier to modify the code once return codes are available.
  • What are the requirements of this code in terms of thread safety? At the moment it looks like there isnt any and the clone function probably creates more problems regarding this.
  • I am slightly confused about checks for multiples of 4 in some places.

void mbedtls_sha256_clone( mbedtls_sha256_context *dst,
const mbedtls_sha256_context *src )
{
*dst = *src;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment: This is not a real clone as it only makes a shallow copy of the context structure. The context contains a HASH_HandleTypeDef field that actually has a few points:

typedef struct
{
      HASH_InitTypeDef           Init;              /*!< HASH required parameters       */

      uint8_t                    *pHashInBuffPtr;   /*!< Pointer to input buffer        */

      uint8_t                    *pHashOutBuffPtr;  /*!< Pointer to input buffer        */

     __IO uint32_t               HashBuffSize;      /*!< Size of buffer to be processed */

     __IO uint32_t               HashInCount;       /*!< Counter of inputed data        */

     __IO uint32_t               HashITCounter;     /*!< Counter of issued interrupts   */

      HAL_StatusTypeDef          Status;            /*!< HASH peripheral status         */

      HAL_HASH_PhaseTypeDef       Phase;             /*!< HASH peripheral phase          */

      DMA_HandleTypeDef          *hdmain;           /*!< HASH In DMA handle parameters  */

      HAL_LockTypeDef            Lock;              /*!< HASH locking object            */

     __IO HAL_HASH_StateTypeDef  State;             /*!< HASH peripheral state          */
} HASH_HandleTypeDef;

I am not sure that is the expected behaviour though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @andresag01
Would you have a test that uses 2 or more SHA256 or MD5 objects and this clone feature ? I'd be happy to try that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adustm In the mbed TLS test suite, there are tests of mbedtls_md_clone. The mbed TLS test framework isn't ready yet for on-target testing, but you could take the sequence of calls from the function md_text_multi in tests/suites/test_suite_md.function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @adustm

This clone feature is used in the majority of TLS handshakes: MD5 and SHA1 are used simultaneously in both TLS 1.0 and TLS 1.1, and TLS 1.2 uses SHA256 or SHA384. So if this feature does not work, hardware accelerated targets won't be able to complete the handshake and make a TLS connection at all.

The method @gilles-peskine-arm suggested should be an easy and direct way to test it, I only would like to mention, that MD5 and SHA1 are disabled in the mbed TLS configuration file in mbed OS by default, so the corresponding compile time macros should be enabled when testing them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, clone has been tested ok with the following test:

    //Init both contexts
    mbedtls_sha256_init( &ctx1);
    mbedtls_sha256_init( &ctx2);
    mbedtls_sha256_init( &ctx3);
    //Start both contexts
    mbedtls_sha256_starts( &ctx1, 0);
    mbedtls_sha256_starts( &ctx2, 0);

    printf("upd ctx1\n");
    mbedtls_sha256_update( &ctx1, test_buf, 56 );
    printf("upd ctx2\n");
    mbedtls_sha256_update( &ctx2, test_buf, 66 );
    printf("finish ctx1\n");
    mbedtls_sha256_finish( &ctx1, outsum1 );
    printf("upd ctx2\n");
    mbedtls_sha256_update( &ctx2, test_buf+66, 46 );
    printf("clone ctx2 in ctx3\n");
    mbedtls_sha256_clone(&ctx3, (const mbedtls_sha256_context *)&ctx2);
    printf("free ctx1\n");
    mbedtls_sha256_free( &ctx1 );
    printf("upd ctx2\n");
    mbedtls_sha256_update( &ctx2, test_buf+112, 56 );
    printf("upd ctx3 with different values than ctx2\n");
    mbedtls_sha256_update( &ctx3, test_buf2, 56 );
    printf("finish ctx2\n");
    mbedtls_sha256_finish( &ctx2, outsum2 );
    printf("finish ctx3\n");
    mbedtls_sha256_finish( &ctx3, outsum3 );
    printf("free ctx2\n");
    mbedtls_sha256_free( &ctx2 );
    printf("free ctx3\n");
    mbedtls_sha256_free( &ctx3 );

Cheers

void mbedtls_sha256_process( mbedtls_sha256_context *ctx, const unsigned char data[MBEDTLS_SHA256_BLOCK_SIZE] )
{
if (ctx->is224 == 0) {
HAL_HASHEx_SHA256_Accumulate(&ctx->hhash_sha256, (uint8_t *) data, MBEDTLS_SHA256_BLOCK_SIZE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check this function's return values.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresag01 Yes, we should. But we don't have a way to return an error with the current API. So this is to be improved when we (mbed TLS team) provide a better API.

There is one possible improvement: instead of returning an error, write random data to the hash result buffer. This ensures that if an error happens, then the resulting hash value is distinct from any valid hash. This trick was suggested by Marc Stevens in a somewhat similar scenario (when a hash calculation fails but has no way to return a failure status — in Marc Stevens's case, the failure was that the calculation detected a probable collision attempt).

I'm not sure if we should currently recommend this as a workaround.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gilles-peskine-arm: I agree that we do not currently have a workaround for that as I wrote in the main discussion message. However, It would be nice to at least put comments where these codes should be checked so that when the API is enhanced, we can easily fix the problem.

if (ctx->is224 == 0) {
HAL_HASHEx_SHA256_Accumulate(&ctx->hhash_sha256, (uint8_t *) data, MBEDTLS_SHA256_BLOCK_SIZE);
} else {
HAL_HASHEx_SHA224_Accumulate(&ctx->hhash_sha256, (uint8_t *) data, MBEDTLS_SHA256_BLOCK_SIZE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check this function's return values.

currentlen -= (MBEDTLS_SHA256_BLOCK_SIZE - ctx->sbuf_len);
mbedtls_sha256_process(ctx, ctx->sbuf);
// now process every input as long as it is %4 bytes
size_t iter = currentlen / 4;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly confused why this is using the value 4 instead of 64 (or MBEDTLS_SHA256_BLOCK_SIZE). As I understand from the original mbed TLS implementation, blocks are processed in 64-byte blocks. Similarly, the documentation for HAL_HASHEx_SHA256_Accumulate() states:

If the Size is not multiple of 64 bytes, the padding is managed by hardware.

I believe that the problem is that if at the start of the function sbuf_len is, say 0, and ilen is 69 then the first block will be as expected, but then the second block of 5 bytes will be padded by the hardware to reach 64-bytes. Then if the user calls mbedtls_sha256_update() this will yield something like SHA256(64 || 5 || padding || other bytes) instead of SHA256(64 || 5 || otherbytes). Perhaps I am missing something though...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresag01
I think the comment

If the Size is not multiple of 64 bytes, the padding is managed by hardware

is not very clear.
Here is the explanation of this HW automatic digest computation and automatic padding.
The HW is handling the digest function every 512 bits (ie 64 bytes).
If the message is longer than 512 bits, there will be a partial digest computation every 512 bits. This is done automatically by the HW when a new value is entered in the HW.
If the message is shorter than 512 bits (or not modulo 512 bits) and we want to have the output (ie mbedtls_xxx_finish) we generate the digest computation by writing __HAL_HASH_START_DIGEST() (as done at https://github.com/adustm/mbed/blob/STM_sha256_F439ZI/features/mbedtls/targets/TARGET_STM/sha256_alt.c#L133).

In mbed-os/mbedtls, if the message size is not modulo 64bytes, the automatic hw padding will only occur in mbedtls_xxx_finish

Now back to my code, the HW is not able to handle less that 32bits input (4 bytes). I then need to buffer 1, 2 or 3 bytes of any input whose length is not modulo 4.
I hope this is clear enough and answers your question.

// now process every input as long as it is %4 bytes
size_t iter = currentlen / 4;
if (ctx->is224 == 0) {
HAL_HASHEx_SHA256_Accumulate(&ctx->hhash_sha256, (uint8_t *)(input + MBEDTLS_SHA256_BLOCK_SIZE - ctx->sbuf_len), (iter * 4));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in some cases, this will call HAL_HASHEx_SHA256_Accumulate() HAL_HASHEx_SHA224_Accumulate() with input size 0. I think this is ok, but perhaps not as efficient. Just a comment though, no change request.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to check the return code.

Copy link
Member Author

@adustm adustm May 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok, but perhaps not as efficient. Just a comment though, no change request.

It is well handled by the Accumulate function and not as efficient, but it avoids another check on the size of (iter*4)

if (ctx->is224 == 0) {
HAL_HASHEx_SHA256_Accumulate(&ctx->hhash_sha256, ctx->sbuf, ctx->sbuf_len);
} else {
HAL_HASHEx_SHA224_Accumulate(&ctx->hhash_sha256, ctx->sbuf, ctx->sbuf_len);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to check these return code.

mbedtls_zeroize(ctx->sbuf, MBEDTLS_SHA256_BLOCK_SIZE);
ctx->sbuf_len = 0;
__HAL_HASH_START_DIGEST();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Could you please remove these trailing whitespaces?

if (ctx->is224 == 0)
HAL_HASHEx_SHA256_Finish(&ctx->hhash_sha256, output, 10);
else
HAL_HASHEx_SHA224_Finish(&ctx->hhash_sha256, output, 10);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to check these return codes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether this timeout value should be somehow configurable, perhaps through some #define or something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would #define TLS_ST_SHA256_TIMEOUT 10 in sha256_alt.h fit your request ?

extern "C" {
#endif

#define MBEDTLS_SHA256_BLOCK_SIZE ((size_t)(64)) // must be a multiple of 4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, the mbed TLS library macros follow the syntax MBEDTLS_*. However, this one is slightly misleading because it is not really a library macro.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresag01
ST code for HW acceleration is located in mbedtls feature directory of mbed-os. From mbed-os point of view, it is part of the mbedtls library.

HASH->CR |= HASH_ALGOSELECTION_SHA256 | HASH_CR_INIT;
} else {
HASH->CR |= HASH_ALGOSELECTION_SHA224 | HASH_CR_INIT;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why is it necessary to fiddle with the state of the accelerator when nothing is really happening?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andresag01,
There is a SHA1 test that calls sha1 update for a buffer of size 0, then directly calls sha1_finish. In this specific use case, the HASH hw block was not aware that it was supposed to handle SHA256 of 224 or SHA1 or MD5 algorithm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation.

* limitations under the License.
*
*/
#include "mbedtls/sha256.h"
Copy link

@andresag01 andresag01 May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be necessary to put the whole code inside a:

#if defined(MBEDTLS_SHA256_C) 
...
#endif /* MBEDTLS_SHA256_C */

Any thoughts @gilles-peskine-arm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought and after discussing it with @yanesca, this additional guard should NOT be included in the code right now. However, I was trying to think ahead as it will be required when this mbed TLS issue is fixed.

@adustm
Copy link
Member Author

adustm commented Jun 6, 2017

hello,
This PR for STM32F756 is now aligned with the one for STM32F439. If @andresag01 and @gilles-peskine-arm are happy with it, it should be ready to be merged.

Kind regards
Armelle

TEST_ASSERT_EQUAL_UINT8_ARRAY(outsum2, test_sum2,32);
}
#endif //MBEDTLS_SHA256_C

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests. I suggest testing {{mbedtls_sha256_clone}} as well. You can follow the same steps as in md_text_multi in the mbed TLS library unit tests. Or you can clone {{ctx2}} to a new {{ctx3}} after the first {{update}} on {{ctx2}}, then update {{ctx3}} with a different last part from {{ctx2}}, and check that the result is correct for both 2 and 3.

By the way, you can use the interface in {{mbedtls/md.h}} with the {{mbedtls_md_XXX}} functions to write the function once and then be able to execute it for MD5, SHA-1 and SHA-256.

mbedtls_zeroize( ctx, sizeof( mbedtls_sha256_context ) );

/* Enable HASH clock */
__HAL_RCC_HASH_CLK_ENABLE();
Copy link

@gilles-peskine-arm gilles-peskine-arm Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wonder whether this is the right time to switch on the accelerator. What could switch it off? The safe place would be in restore_hw_context but that might be too costly if calling this is expensive even if the accelerator is already on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share your concerns, but I think that with the current interface/framework it can't really be handled correctly. I agree that this shouldn't hold up the merging and I am going to add a section to our mbed OS documentation to warn application developers that they might need to shut down the hardware accelerator after mbed TLS stopped running.

@adustm
Copy link
Member Author

adustm commented Jun 8, 2017

Hello, @adbridge or @0xc0170
I have added a new test (TESTS/mbedtls/multi/main.cpp), but I am not sure this can work for every platforms.
Could you please let me know if I have to remove it or anything else ?
Kind regards
Armelle

@adustm
Copy link
Member Author

adustm commented Jun 12, 2017

bump ?

@sg-
Copy link
Contributor

sg- commented Jun 15, 2017

Given all the approvals and that mbed TLS 2.5 is part of master can this PR be rebased on master?

@sg- sg- changed the base branch from mbed-os-workshop-17q2 to master June 15, 2017 15:00
@adustm adustm changed the base branch from master to mbed-os-workshop-17q2 June 16, 2017 07:20
@adustm adustm changed the base branch from mbed-os-workshop-17q2 to master June 16, 2017 08:39
@mbed-bot
Copy link

mbed-bot commented Jul 7, 2017

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 743

All builds and test passed!

@theotherjimmy
Copy link
Contributor

@adustm #3954 was closed, but is also need for this PR.

This PR needs #4159 + #3954 to be merged first

What do we need to do here?

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 13, 2017

@adustm #3954 was closed, but is also need for this PR.
This PR needs #4159 + #3954 to be merged first
What do we need to do here?

any update?

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 24, 2017

bump @adustm to advice?

@adustm
Copy link
Member Author

adustm commented Jul 25, 2017

Hello @0xc0170

I just rebased this PR on the ARM master
Basically, it will only activate the sha256 on the NUCLEO_F756ZG, using the same sha256_alt.c and .h files than the NUCLEO_F429ZI

@adustm
Copy link
Member Author

adustm commented Jul 25, 2017

I updated the comment at the top of the PR from

This PR needs #4159 + #3954 to be merged first

to

This PR needs #4159 + #4161 (formerly #3954) to be merged first

@theotherjimmy
Copy link
Contributor

/morph test

@studavekar
Copy link
Contributor

/morph test

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 912

Example Build failed!

@studavekar
Copy link
Contributor

re-triggering

/morph test

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 915

Example Build failed!

@theotherjimmy
Copy link
Contributor

@studavekar Could you retrigger this after the release is done?

@adustm
Copy link
Member Author

adustm commented Aug 1, 2017

Hello, rebase done.
Every preceeding PRs have been merged in the master.
Cheers

@studavekar
Copy link
Contributor

/morph test

@mbed-bot
Copy link

mbed-bot commented Aug 2, 2017

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 922

All builds and test passed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet