
auth/Crypto: refactor to use OpenSSL's EVP API #23260

Closed
wants to merge 1 commit into from

Conversation

branch-predictor
Contributor

OpenSSL's EVP API encapsulates different encryption mechanisms and engines, including AES-NI, ARM NEON, VIA Padlock and possibly other hardware crypto accelerators. Considering that AES encryption optimized with AES-NI alone is around 6x faster than the directly-callable implementation, there's no reason not to use it.

Even with the need to create (malloc) and free an EVP context for each encryption, everything points to a performance improvement, for example the unittests:
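For reference, here is a minimal sketch of the per-call EVP flow described above (assuming OpenSSL 1.1+, where contexts are heap-allocated; the helper name and error handling are illustrative, not the actual PR code):

#include <openssl/evp.h>

// Encrypt in_len bytes with AES-128-CBC; key and iv are 16 bytes each.
// Padding is disabled, so in_len must be a multiple of the 16-byte block size.
int aes128_cbc_encrypt_evp(const unsigned char key[16],
                           const unsigned char iv[16],
                           const unsigned char *in, int in_len,
                           unsigned char *out)
{
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();        // allocated per call
  if (!ctx)
    return -1;
  int outl = 0, finl = 0, ret = -1;
  if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), nullptr, key, iv) == 1 &&
      EVP_CIPHER_CTX_set_padding(ctx, 0) == 1 &&     // caller handles padding
      EVP_EncryptUpdate(ctx, out, &outl, in, in_len) == 1 &&
      EVP_EncryptFinal_ex(ctx, out + outl, &finl) == 1)
    ret = outl + finl;
  EVP_CIPHER_CTX_free(ctx);                          // freed per call
  return ret;
}

EVP_EncryptInit_ex dispatches to whatever implementation the library considers fastest on the host (AES-NI, VPAES, plain C), which is where the speedup comes from.
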
Before:

[==========] Running 8 tests from 1 test case.           
[----------] Global test environment set-up.             
[----------] 8 tests from AES                            
[ RUN      ] AES.ValidateLegacy                          
[       OK ] AES.ValidateLegacy (0 ms)                   
[ RUN      ] AES.ValidateSecret                          
[       OK ] AES.ValidateSecret (0 ms)                   
[ RUN      ] AES.Encrypt                                 
[       OK ] AES.Encrypt (0 ms)                          
[ RUN      ] AES.EncryptNoBl                             
[       OK ] AES.EncryptNoBl (0 ms)                      
[ RUN      ] AES.Decrypt                                 
[       OK ] AES.Decrypt (0 ms)                          
[ RUN      ] AES.DecryptNoBl                             
[       OK ] AES.DecryptNoBl (0 ms)                      
[ RUN      ] AES.Loop                                    
[       OK ] AES.Loop (94 ms)                            
[ RUN      ] AES.LoopKey                                 
100000 encoded in 0.156125                               
[       OK ] AES.LoopKey (156 ms)                        
[----------] 8 tests from AES (250 ms total)             
                                                         
[----------] Global test environment tear-down           
[==========] 8 tests from 1 test case ran. (250 ms total)
[  PASSED  ] 8 tests.

After:

[==========] Running 8 tests from 1 test case.           
[----------] Global test environment set-up.             
[----------] 8 tests from AES                            
[ RUN      ] AES.ValidateLegacy                          
[       OK ] AES.ValidateLegacy (0 ms)                   
[ RUN      ] AES.ValidateSecret                          
[       OK ] AES.ValidateSecret (0 ms)                   
[ RUN      ] AES.Encrypt                                 
[       OK ] AES.Encrypt (0 ms)                          
[ RUN      ] AES.EncryptNoBl                             
[       OK ] AES.EncryptNoBl (0 ms)                      
[ RUN      ] AES.Decrypt                                 
[       OK ] AES.Decrypt (1 ms)                          
[ RUN      ] AES.DecryptNoBl                             
[       OK ] AES.DecryptNoBl (0 ms)                      
[ RUN      ] AES.Loop                                    
[       OK ] AES.Loop (32 ms)                            
[ RUN      ] AES.LoopKey                                 
100000 encoded in 0.072338                               
[       OK ] AES.LoopKey (72 ms)                         
[----------] 8 tests from AES (105 ms total)             
                                                         
[----------] Global test environment tear-down           
[==========] 8 tests from 1 test case ran. (105 ms total)
[  PASSED  ] 8 tests.                                    

After, with AES-NI disabled (OPENSSL_ia32cap="~0x200000200000000" bin/unittest_crypto):

[==========] Running 8 tests from 1 test case.           
[----------] Global test environment set-up.             
[----------] 8 tests from AES                            
[ RUN      ] AES.ValidateLegacy                          
[       OK ] AES.ValidateLegacy (1 ms)                   
[ RUN      ] AES.ValidateSecret                          
[       OK ] AES.ValidateSecret (0 ms)                   
[ RUN      ] AES.Encrypt                                 
[       OK ] AES.Encrypt (0 ms)                          
[ RUN      ] AES.EncryptNoBl                             
[       OK ] AES.EncryptNoBl (0 ms)                      
[ RUN      ] AES.Decrypt                                 
[       OK ] AES.Decrypt (0 ms)                          
[ RUN      ] AES.DecryptNoBl                             
[       OK ] AES.DecryptNoBl (0 ms)                      
[ RUN      ] AES.Loop                                    
[       OK ] AES.Loop (50 ms)                            
[ RUN      ] AES.LoopKey                                 
100000 encoded in 0.099907                               
[       OK ] AES.LoopKey (100 ms)                        
[----------] 8 tests from AES (151 ms total)             
                                                         
[----------] Global test environment tear-down           
[==========] 8 tests from 1 test case ran. (151 ms total)
[  PASSED  ] 8 tests.      

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

@rzarzynski
Contributor

Thanks for bringing this up. Cephx still has potential for improvement, and moving it forward is definitely something many people can benefit from. From the reviewer's POV it's also a fascinating puzzle, considering the number of contributing factors. Let's start. :-)

Considering that AES encryption optimized with AES-NI alone is around 6x faster than the directly-callable implementation, there's no reason not to use it.

It's entirely valid that AES-NI can make a huge difference. The question is about the relationship between the benefit and the price we need to pay for it. Not surprisingly, "it depends" applies to both factors.

Data size. As Cephx is critical and the most prominent client of CryptoKey I'm aware of, I focus entirely on it. If anybody knows of another one, please don't hesitate to speak up and start optimizing.

At the time of the NSS-to-OpenSSL transition it was just 32 bytes (including PKCS#7 padding). To bring yet another piece to the puzzle, CVE-2018-1129 & co. bumped it up to 48. Unfortunately, our unittests are using 256+16 (AES.Loop) and 128+16 (AES.LoopKey). I tried to address that in PR #23291 by introducing more test cases.
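Just to spell out where those numbers come from: PKCS#7 always adds between 1 and 16 padding bytes, so e.g. a 24-byte payload pads to 32 bytes and a 40-byte one to 48. A tiny illustrative sketch (the payload sizes in the asserts are examples only, not cephx's exact figures):

constexpr size_t AES_BLOCK = 16;

// PKCS#7 adds at least one byte, so block-aligned input grows by a full block.
constexpr size_t pkcs7_padded_size(size_t n) {
  return (n / AES_BLOCK + 1) * AES_BLOCK;
}

static_assert(pkcs7_padded_size(24) == 32, "example payload size only");
static_assert(pkcs7_padded_size(40) == 48, "example payload size only");
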

Although I have a strong bias towards in-vivo testing, let me start with those new in-vitro tools.

Before

# bin/unittest_crypto --gtest_filter=AES.LoopCephx\*
...
Note: Google Test filter = AES.LoopCephx*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from AES
[ RUN      ] AES.LoopCephx
[       OK ] AES.LoopCephx (426 ms)
[ RUN      ] AES.LoopCephxV2
[       OK ] AES.LoopCephxV2 (603 ms)
[----------] 2 tests from AES (1029 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (1029 ms total)
[  PASSED  ] 2 tests.

After

with AES-NI

# bin/unittest_crypto --gtest_filter=AES.LoopCephx\*
...
Note: Google Test filter = AES.LoopCephx*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from AES
[ RUN      ] AES.LoopCephx
[       OK ] AES.LoopCephx (472 ms)
[ RUN      ] AES.LoopCephxV2
[       OK ] AES.LoopCephxV2 (508 ms)
[----------] 2 tests from AES (980 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (980 ms total)
[  PASSED  ] 2 tests.

without AES-NI

# OPENSSL_ia32cap="~0x200000200000000" bin/unittest_crypto --gtest_filter=AES.LoopCephx\*
...
Note: Google Test filter = AES.LoopCephx*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from AES
[ RUN      ] AES.LoopCephx
[       OK ] AES.LoopCephx (632 ms)
[ RUN      ] AES.LoopCephxV2
[       OK ] AES.LoopCephxV2 (704 ms)
[----------] 2 tests from AES (1336 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (1336 ms total)
[  PASSED  ] 2 tests.

The results appear, well, inconclusive. What worries me most is the number of degrees of freedom we're omitting this way. This includes machine-, system- (I wouldn't be surprised if the version of tcmalloc and its env. variables are involved ;-) and workload-specific items. For instance, the loop { malloc(); free(); } pattern we have in the tests presents nice opportunities for optimization on the memory allocator's side. In-vivo profiling is not easy as AES_cbc_encrypt is highly optimized, Perl-crafted assembly. I will keep digging.

I very much like the unification between the slice- and bl-taking variants.

memcpy(iv, CEPH_AES_IV, AES_BLOCK_LEN);

// we aren't using EVP because of performance concerns. Profiling
// shows the cost is quite high. Endianness might be an issue.

This might not be valid anymore as Cephx's signature size has grown. More investigation and testing needed. WIP.

BTW: I recall that ISAL Crypto was also considered during the transition. It doesn't have the required certifications, but maybe offering it as an optional implementation (with configurables and abstractions) would be possible. The technical pros: it's already in the tree, offers very good performance and doesn't need malloc. Just thinking out loud.

@branch-predictor
Contributor Author

@rzarzynski As stated in the commit message and PR description, the EVP API provides encapsulated access to specialised hardware (again - be it AES-NI, Padlock, or whatever the world came up with). You can verify it with openssl speed:

[branch@predictor ~]$ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 14378122 aes-128 cbc's in 2.97s
Doing aes-128 cbc for 3s on 64 size blocks: 3973446 aes-128 cbc's in 2.97s
Doing aes-128 cbc for 3s on 256 size blocks: 1025526 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 1024 size blocks: 261927 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 8192 size blocks: 32869 aes-128 cbc's in 2.97s
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Wed Mar 22 21:37:40 UTC 2017
options:bn(64,32) md2(int) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wa,--noexecstack -DPURIFY -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      77457.90k    85623.08k    88098.88k    90004.45k    90660.89k
[branch@predictor ~]$ openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 89705436 aes-128-cbc's in 2.96s
Doing aes-128-cbc for 3s on 64 size blocks: 21338855 aes-128-cbc's in 2.97s
Doing aes-128-cbc for 3s on 256 size blocks: 5967082 aes-128-cbc's in 2.97s
Doing aes-128-cbc for 3s on 1024 size blocks: 1540793 aes-128-cbc's in 2.97s
Doing aes-128-cbc for 3s on 8192 size blocks: 193836 aes-128-cbc's in 2.96s
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Wed Mar 22 21:37:40 UTC 2017
options:bn(64,32) md2(int) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wa,--noexecstack -DPURIFY -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     484894.25k   459827.18k   514334.34k   531236.37k   536454.23k

The EVP (AES-NI) implementation was over 6x faster on 16-byte blocks -- and this is on my low-end VPS that happens to have AES-NI enabled. I've seen similar results on bare-metal Xeon boxes.

By using AES_cbc_encrypt directly, you're forcing the use of the "highly optimized, Perl-crafted assembly" version that can't make use of any available crypto accelerators, including AES-NI. Depending on the package maintainer, AES_cbc_encrypt may not even support SSE2, degrading performance even further.
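For comparison, this is roughly what the direct, non-EVP path looks like (a sketch of the legacy OpenSSL API, not Ceph's exact code); no engine or accelerator dispatch happens here:

#include <openssl/aes.h>

// Legacy directly-callable API: always the generic table/assembly implementation.
void legacy_aes128_cbc_encrypt(const unsigned char key16[16],
                               unsigned char iv[16],        // updated in place
                               const unsigned char *in, size_t len,
                               unsigned char *out)
{
  AES_KEY aes_key;
  AES_set_encrypt_key(key16, 128, &aes_key);                // key schedule
  AES_cbc_encrypt(in, out, len, &aes_key, iv, AES_ENCRYPT); // software-only path
}
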

@branch-predictor
Contributor Author

I ran your tests on a more reasonable machine, and here are the results:

EVP:

$ bin/unittest_crypto --gtest_filter=AES.LoopCephx\*                          
2018-07-30 08:08:30.309 7f0987d83900 -1 WARNING: all dangerous and experimental features are enabled.
2018-07-30 08:08:30.309 7f0987d83900 -1 WARNING: all dangerous and experimental features are enabled.
2018-07-30 08:08:30.352 7f0987d83900 -1 WARNING: all dangerous and experimental features are enabled.
Note: Google Test filter = AES.LoopCephx*                                                            
[==========] Running 2 tests from 1 test case.                                                       
[----------] Global test environment set-up.                                                         
[----------] 2 tests from AES                                                                        
[ RUN      ] AES.LoopCephx                                                                           
[       OK ] AES.LoopCephx (238 ms)                                                                  
[ RUN      ] AES.LoopCephxV2                                                                         
[       OK ] AES.LoopCephxV2 (231 ms)                                                                
[----------] 2 tests from AES (469 ms total)                                                         
                                                                                                     
[----------] Global test environment tear-down                                                       
[==========] 2 tests from 1 test case ran. (469 ms total)                                            
[  PASSED  ] 2 tests.  

EVP (AES-NI disabled):

$ OPENSSL_ia32cap="~0x200000200000000" bin/unittest_crypto --gtest_filter=AES.LoopCephx\* 
2018-07-30 08:10:30.573 7f4f23452900 -1 WARNING: all dangerous and experimental features are enabled.            
2018-07-30 08:10:30.574 7f4f23452900 -1 WARNING: all dangerous and experimental features are enabled.            
2018-07-30 08:10:30.615 7f4f23452900 -1 WARNING: all dangerous and experimental features are enabled.            
Note: Google Test filter = AES.LoopCephx*                                                                        
[==========] Running 2 tests from 1 test case.                                                                   
[----------] Global test environment set-up.                                                                     
[----------] 2 tests from AES                                                                                    
[ RUN      ] AES.LoopCephx                                                                                       
[       OK ] AES.LoopCephx (352 ms)                                                                              
[ RUN      ] AES.LoopCephxV2                                                                                     
[       OK ] AES.LoopCephxV2 (376 ms)                                                                            
[----------] 2 tests from AES (728 ms total)                                                                     
                                                                                                                 
[----------] Global test environment tear-down                                                                   
[==========] 2 tests from 1 test case ran. (728 ms total)                                                        
[  PASSED  ] 2 tests.  

OLD CODE

$ bin/unittest_crypto --gtest_filter=AES.LoopCephx\*                          
2018-07-30 08:19:00.916 7f6a0c839900 -1 WARNING: all dangerous and experimental features are enabled.
2018-07-30 08:19:00.916 7f6a0c839900 -1 WARNING: all dangerous and experimental features are enabled.
2018-07-30 08:19:00.959 7f6a0c839900 -1 WARNING: all dangerous and experimental features are enabled.
Note: Google Test filter = AES.LoopCephx*                                                            
[==========] Running 2 tests from 1 test case.                                                       
[----------] Global test environment set-up.                                                         
[----------] 2 tests from AES                                                                        
[ RUN      ] AES.LoopCephx                                                                           
[       OK ] AES.LoopCephx (314 ms)                                                                  
[ RUN      ] AES.LoopCephxV2                                                                         
[       OK ] AES.LoopCephxV2 (398 ms)                                                                
[----------] 2 tests from AES (712 ms total)                                                         
                                                                                                     
[----------] Global test environment tear-down                                                       
[==========] 2 tests from 1 test case ran. (712 ms total)                                            
[  PASSED  ] 2 tests.

That's more than conclusive for me.

@rzarzynski
Contributor

rzarzynski commented Jul 31, 2018

EVP (AES-NI) implementation was over 6x faster on 16-byte blocks -- and this is on my low-end VPS that happens to have AES-NI enabled. I've seen similar results on the bare metal Xeon boxes.

Piotr, I have no doubt that employing an AES-NI-augmented implementation can provide a huge benefit. Whether it actually does depends on several factors. To ensure we're on the same page, let me use a real-world analogy: if you need to go somewhere close, it can be faster to go on foot than to wait for a cab. I worry that the numbers you provided from the openssl speed benchmark only tell us how fast the cab travels once you're inside. They say nothing about the waiting time.

                                EVP_CIPHER_CTX_init(&ctx);
                                if(decrypt)
                                        EVP_DecryptInit_ex(&ctx,evp_cipher,NULL,key16,iv);
                                else
                                        EVP_EncryptInit_ex(&ctx,evp_cipher,NULL,key16,iv);
                                EVP_CIPHER_CTX_set_padding(&ctx, 0);

                                Time_F(START);
                                if(decrypt)
                                        for (count=0,run=1; COND(save_count*4*lengths[0]/lengths[j]); count++)
                                                EVP_DecryptUpdate(&ctx,buf,&outl,buf,lengths[j]);
                                else
                                        for (count=0,run=1; COND(save_count*4*lengths[0]/lengths[j]); count++)
                                                EVP_EncryptUpdate(&ctx,buf,&outl,buf,lengths[j]);
                                if(decrypt)
                                        EVP_DecryptFinal_ex(&ctx,buf,&outl);
                                else
                                        EVP_EncryptFinal_ex(&ctx,buf,&outl);
                                d=Time_F(STOP);
                                EVP_CIPHER_CTX_cleanup(&ctx);
                                }

A more detailed gist is available as well. As you can see, both creation and deletion of the EVP context are performed once, outside the time measurement. When I see such an API, I expect a lot of the cost to sit in life-cycle management, as it's often assumed to be infrequent. This stands in opposition to the use case we have in Ceph.

In the version of OpenSSL I have, EVP internally uses yet another Perl-crafted ;-) assembly piece named aesni_cbc_encrypt. With some terrible hackery, it's possible to call it directly and thus derive the cost of context management.

# bin/unittest_crypto --gtest_filter=AES.LoopCephx\*
...
Note: Google Test filter = AES.LoopCephx*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from AES
[ RUN      ] AES.LoopCephx
[       OK ] AES.LoopCephx (79 ms)
[ RUN      ] AES.LoopCephxV2
[       OK ] AES.LoopCephxV2 (117 ms)
[----------] 2 tests from AES (196 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (196 ms total)
[  PASSED  ] 2 tests.

In my environment the cost is huge and definitely dominates the workload. At the moment I'm working on figuring out the reason. There are a lot of possibilities: memory allocator performance, OpenSSL version, configured engines, etc.

However, what bugs me is that the cab waiting time may depend on street traffic and on the number of passengers calling the taxi company at the same time (i.e. malloc usage patterns). In that respect, unittest_crypto looks like a city with a single passenger and a single cab. Would it be possible to provide measurements from a live cluster running e.g. an RBD workload?

Completely BTW: AFAIK there is an effort towards a SeaStar-based messenger. Hopefully it'll resolve the issue with the context's thread safety. I bet your commit, enriched with context reuse, would undoubtedly be a big win. @tchaikov: do you know what the current status is?
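To illustrate what "enriched with context reuse" could mean in practice, here is a rough sketch assuming one long-lived context per thread or connection (illustrative names, not this PR's code):

#include <openssl/evp.h>

struct ReusableAESContext {
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();        // allocated once, reused
  ~ReusableAESContext() { EVP_CIPHER_CTX_free(ctx); }

  int encrypt(const unsigned char key[16], const unsigned char iv[16],
              const unsigned char *in, int in_len, unsigned char *out) {
    int outl = 0, finl = 0;
    // Reusing the context avoids the per-call allocation and free;
    // re-initializing with the key/IV is all that remains.
    if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), nullptr, key, iv) != 1 ||
        EVP_CIPHER_CTX_set_padding(ctx, 0) != 1 ||
        EVP_EncryptUpdate(ctx, out, &outl, in, in_len) != 1 ||
        EVP_EncryptFinal_ex(ctx, out + outl, &finl) != 1)
      return -1;
    return outl + finl;
  }
};
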

UPDATE: in the discussion I focus on AES-NI. Other acceleration techniques with similar setup costs would fit as well. Bulky crypto accelerators/far-placed coprocessors are IMHO out of the question for 48-byte encryption.

@branch-predictor
Contributor Author

branch-predictor commented Jul 31, 2018 via email

@tchaikov
Contributor

tchaikov commented Aug 8, 2018

@tchaikov: do you know what's the current status?

@rzarzynski sorry, I missed your comments. See https://github.com/ceph/ceph/tree/master/src/crimson/net. And I think it'd be a good chance for reusing the contexts without worrying about the racing issue we'd otherwise be suffering from.

@rzarzynski
Contributor

That was my initial intention, but it turns out it's impossible to build master on Xenial in a way that doesn't require upgrading a bunch of packages on the installation machine to ppa/testing versions, and I can't do that on my testing, bare-metal cluster. The packages provided on http://download.ceph.com don't have this limitation because the package builders have the right mix of dependencies - @tchaikov was working on that, I don't remember the PR number.

@branch-predictor: is there any progress on the in-OSD performance testing? Are the blockers gone now?

The results would be really helpful. Our benchmarks - in contrast to the OSD - don't link with libtcmalloc. As the cost of EVP context setup is high, the difference in memory allocator might be an important factor.

@branch-predictor
Contributor Author

Right now I'm working on a new test/benchmark hardware setup based on a bunch of current-gen CPUs, NVMes and plenty of sufficiently powerful client hosts to flood the cluster. Hang on.

@stale

stale bot commented Dec 17, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow-up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

OpenSSL's EVP API encapsulates different encryption mechanisms and
engines, including AES-NI, ARM NEON, VIA Padlock and possibly other
hardware crypto accelerators. Considering that AES optimized with
AES-NI alone is around 6x faster than the directly-callable
implementation, there's no reason not to use it.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
@branch-predictor
Contributor Author

Closing, as with Messenger2 and the new signing scheme this has lost its relevance.

@branch-predictor branch-predictor deleted the bp-openssl-evp branch May 27, 2019 09:05
@kvanals

kvanals commented Dec 7, 2020

All, I'd like to bring this one back up -- by rebasing this PR onto the latest stable release in conjunction with PR#32675, I was able to get Ceph working reliably in a FIPS-validated environment without any special workarounds. As far as I can tell, both PR#23260 and PR#32675 are being treated solely as performance enhancements, but they also resolve an issue with low-level crypto functions being forbidden in FIPS mode.

Thoughts?

@edevil

edevil commented Jun 22, 2021

I would also be interested in this PR for FIPS purposes. Is there a reason for not including it?
