Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance worse than sjcl lib #18

Closed
rubensayshi opened this issue Aug 3, 2015 · 37 comments
Closed

Performance worse than sjcl lib #18

rubensayshi opened this issue Aug 3, 2015 · 37 comments

Comments

@rubensayshi
Copy link

@dcousens asked for a ticket ;)

performance on various engines is a lot worse compared to the 'sjcl' package.
see benchmarking test repo: https://github.com/rubensayshi/pbkdf2-benchmark

the most significant one is when using the old UIWebView on IOS (currently still the default for cordova apps without using a 3rd party plugin to upgrade to WKWebView) where it's close to a 10x difference.
but for the future WKWebView will not become the default to use in cordova 4.0.0+.

for chrome (on all platforms) there's also a significant difference between the 2 libs.

for w/e reason firefox sjcl is actually slower ... but (my) firefox is scrap xD

@dcousens dcousens self-assigned this Aug 3, 2015
@dcousens
Copy link
Member

dcousens commented Aug 3, 2015

ping @feross, thoughts on whether this might be a Buffer issue before I dig into it deeper?

@feross
Copy link
Member

feross commented Aug 4, 2015

You can check if the buffer instances are based on Uint8Array with buf instanceof Uint8Array or if we detect typed array support Buffer.TYPED_ARRAY_SUPPORT as defined here. If those are false, then you're getting the fallback Object implementation.

Safari 7 and below has a bug in Object.prototype.constructor, so we're just using the Object implementation instead of adding a workaround, as described here: feross/buffer#63 Perhaps this is the issue?

@dcousens dcousens added the bug label Aug 13, 2015
@dcousens dcousens changed the title performance vs sjcl lib Performance worse than sjcl lib Aug 13, 2015
@dcousens
Copy link
Member

I'm willing to look into this soon.

@feross
Copy link
Member

feross commented Jan 12, 2016

Safari 5-7 get the Uint8Array implementation as of the latest buffer version! So you might not even need to fix this anymore.

@dcousens
Copy link
Member

@feross it actually seems related to the fact that createHmac is creating a new Buffer every time compared to SJCL which re-uses the underlying array.

I'm not sure if this is avoidable without breaking compliance in the create* modules?

@dcousens
Copy link
Member

@feross
Copy link
Member

feross commented Jan 13, 2016

Not sure what you're trying to do, but if you want to create a buffer without allocating a whole new Uint8Array, you can now do this in node.js and the browser:

var buf1 = new Buffer(4)
var buf2 = new Buffer(buf1.buffer)

buf1 and buf2 now share the same underlying ArrayBuffer instance.

@dcousens
Copy link
Member

@feross no no, my point is, look at sha1.js, we are creating a new Buffer each time we digest. There is no way to inject a .buffer in there without breaking compliance with node? No?

@feross
Copy link
Member

feross commented Jan 13, 2016

Since iojs v3 and node v4, Buffer is a subclass of Uint8Array. So .buffer should work just fine!

@dcousens
Copy link
Member

@feross I don't understand.
How is that relevant?

@dcousens
Copy link
Member

I'm talking about .digest()

@dcousens
Copy link
Member

In https://github.com/crypto-browserify/sha.js/blob/master/hash.js#L41,

Hash.prototype.digest = function (enc) {

Would have to become

Hash.prototype.digest = function (enc, target) {

Where target is an optional target Buffer.
This would not be compliant with node/io, AFAIK.

@feross
Copy link
Member

feross commented Jan 13, 2016

I'm probably misunderstanding something here, just ignore me. ¯\_(ツ)_/¯

@feross
Copy link
Member

feross commented Jan 13, 2016

Ah, I see your point. Yeah that sounds like it's an API modification.

@dcousens
Copy link
Member

@rubensayshi our only options at this point would be to replace the entire underlying implementation with a different HMAC stack. I'm not really keen on that.

How are the benchmarks holding up?

@fanatid
Copy link
Contributor

fanatid commented Feb 26, 2016

Results with improved sha.js (2.4.5), pbkdf2 (3.0.4)


Google Chrome Version 49.0.2623.63 beta (64-bit)

sjcl.misc.pbkdf2 462 = 92.4
build.js:24059 pbkdf2.pbkdf2Sync 2599 = 519.8

Firefox 44.0.2 (64-bit)

sjcl.misc.pbkdf2 383 = 76.6
pbkdf2.pbkdf2Sync 2988 = 597.6

node.js v5.6.0 (using browser version)

sjcl.misc.pbkdf2 415 = 83
pbkdf2.pbkdf2Sync 1893 = 378.6

@dcousens
Copy link
Member

Awesome.

@dcousens
Copy link
Member

ping @fanatid we could also improve speed by somehow caching the ipad/opad values and their resultant base hash.
They are currently re-calculated every time.

@dcousens
Copy link
Member

@fanatid thoughts?
Here is some mock up example code I did in C++ where I take advantage of this trick to basically double the speed (twice as fast as OpenSSL):

#include <cassert>
#include <cstdint>
#include <cstring>
#include <openssl/sha.h>
#include <openssl/evp.h>

#define SHA256_DIGEST_LENGTH 32

void pbkdf2_hmac_sha256 (const uint8_t* password, const size_t passwordLength, const uint8_t* salt, const size_t saltLength, const size_t iterations, uint8_t* key, size_t keyLength) {
    SHA256_CTX ipadCtx, opadCtx;

    // pre-calculate the HMAC blocks to avoid constant re-calculation
    {
        uint8_t ipad[SHA256_CBLOCK], opad[SHA256_CBLOCK];
        memset(ipad, 0x36, SHA256_CBLOCK);
        memset(opad, 0x5C, SHA256_CBLOCK);

        if (passwordLength > SHA256_CBLOCK) {
            uint8_t tmp[SHA256_DIGEST_LENGTH];

            SHA256_CTX ctx;
            SHA256_Init(&ctx);
            SHA256_Update(&ctx, password, passwordLength);
            SHA256_Final(tmp, &ctx);

            for (size_t i = 0; i < SHA256_DIGEST_LENGTH; i++) {
                ipad[i] ^= tmp[i];
                opad[i] ^= tmp[i];
            }
        } else {
            for (size_t i = 0; i < passwordLength; i++) {
                ipad[i] ^= password[i];
                opad[i] ^= password[i];
            }
        }

        SHA256_Init(&ipadCtx);
        SHA256_Init(&opadCtx);
        SHA256_Update(&ipadCtx, ipad, SHA256_CBLOCK);
        SHA256_Update(&opadCtx, opad, SHA256_CBLOCK);
    }

    // divide and round upwards
    auto l = keyLength / SHA256_DIGEST_LENGTH;
    if (keyLength % SHA256_DIGEST_LENGTH != 0) l += 1;

    const auto r = keyLength - (l - 1) * SHA256_DIGEST_LENGTH;

    uint8_t T[SHA256_DIGEST_LENGTH], U[SHA256_DIGEST_LENGTH];

    // TODO: REMOVE write32BE 8-bit shortcut
    uint8_t uint32BE[4] = {};

    for (size_t i = 1; i <= l; i++) {
        // TODO: REMOVE write32BE 8-bit shortcut
        assert(i < 256);
        uint32BE[3] = (uint8_t) i;

        SHA256_CTX ctx;

        memcpy(&ctx, &ipadCtx, sizeof(SHA256_CTX));
        SHA256_Update(&ctx, salt, saltLength);
        SHA256_Update(&ctx, &uint32BE, sizeof(uint32BE));
        SHA256_Final(U, &ctx);

        memcpy(&ctx, &opadCtx, sizeof(SHA256_CTX));
        SHA256_Update(&ctx, U, SHA256_DIGEST_LENGTH);
        SHA256_Final(U, &ctx);

        memcpy(T, U, SHA256_DIGEST_LENGTH);

        for (size_t j = 1; j < iterations; j++) {
            memcpy(&ctx, &ipadCtx, sizeof(SHA256_CTX));
            SHA256_Update(&ctx, U, SHA256_DIGEST_LENGTH);
            SHA256_Final(U, &ctx);

            memcpy(&ctx, &opadCtx, sizeof(SHA256_CTX));
            SHA256_Update(&ctx, U, SHA256_DIGEST_LENGTH);
            SHA256_Final(U, &ctx);

            for (size_t k = 0; k < SHA256_DIGEST_LENGTH; k++) T[k] ^= U[k];
        }

        const auto destPos = (i - 1) * SHA256_DIGEST_LENGTH;
        const auto len = (i == l ? r : SHA256_DIGEST_LENGTH);
        memcpy(key + destPos, T, len);
    }
}

#include <iomanip>
#include <iostream>

int main (int argc, char** argv) {
    const uint8_t k[] = "password_________________________________________________________________________________________________________________________";
    const uint8_t s[] = "salt bbbbbbbbbbbbbb";
    const int kl = sizeof(k) - 1;
    const int sl = sizeof(s) - 1;

    // ours
    {
        uint8_t buffer[10];
        pbkdf2_hmac_sha256(k, kl, s, sl, 2000, buffer, sizeof(buffer));

        for (size_t i = 0; i < sizeof(buffer); ++i) {
            std::cout << std::hex << std::setfill('0') << std::setw(2) << (unsigned) buffer[i] << ' ';
        }
        std::cout << std::endl;
    }

    // openssl
    {
        uint8_t buffer[10];
        PKCS5_PBKDF2_HMAC(
            (const char*) k, kl,
            s, sl,
            2000,
            EVP_sha256(),
            sizeof(buffer),
            buffer
        );

        for (size_t i = 0; i < sizeof(buffer); ++i) {
            std::cout << std::hex << std::setfill('0') << std::setw(2) << (unsigned) buffer[i] << ' ';
        }
        std::cout << std::endl;
    }

    return 0;
}

@fanatid
Copy link
Contributor

fanatid commented Mar 24, 2016

@dcousens yep, we can cache it and I sure that it gives significant performance improvement, but it will be not easy and could be easily broken by createHash

@dcousens
Copy link
Member

@fanatid granted, but this doesn't have to rely on create-hash if the API isn't suitable.

@calvinmetcalf
Copy link
Contributor

we could also look back into using subtle crypto in the places that support it

@dcousens
Copy link
Member

@calvinmetcalf is that faster?

@calvinmetcalf
Copy link
Contributor

it's native so yes it would be, you could test with native-crypto which (in the browser) uses that if it's available and then this if not

@dcousens dcousens removed their assignment Mar 19, 2017
@dcousens dcousens reopened this Mar 19, 2017
@dcousens
Copy link
Member

Is this still a concern?

@rubensayshi
Copy link
Author

not for me, using asmcrypto.js ;)

@dcousens
Copy link
Member

Can you post a bench? It might be worth dropping it in.

@calvinmetcalf
Copy link
Contributor

with #59 we should get a speedup

@dcousens
Copy link
Member

Thanks @calvinmetcalf , forgot about it.
Merged :)

Still be worth comparing though :)

@calvinmetcalf
Copy link
Contributor

oh yes I agree, I think I'll also publish all 3 of the updates we have in the wings

@dcousens
Copy link
Member

dcousens commented May 19, 2017

Tested this today.

sjcl is 1.6x faster.
asmcrypto.js is 4x faster.

If we omit the new changes, sjcl is about 5x faster...

Win!
But haven't won the war yet.
asmcrypto.js is going to be hard to beat...

@dcousens
Copy link
Member

dcousens commented May 19, 2017

I don't think we'd ever beat asmcrypto.js without breaking module isolation.

It inlines the ipad/opad directly... https://github.com/vibornoff/asmcrypto.js/blob/master/src/hash/sha256/sha256.asm.js#L745

@calvinmetcalf
Copy link
Contributor

are you testing async in a browser because I bet we're faster there

@dcousens
Copy link
Member

@calvinmetcalf because of webcrypto?

@calvinmetcalf
Copy link
Contributor

calvinmetcalf commented May 19, 2017 via email

@rubensayshi
Copy link
Author

oh wow seems I've missed the webcrypto support addition, nice!

@dcousens
Copy link
Member

Closing in favour of #75

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants