Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-43286][SQL] Updates aes_encrypt CBC mode to generate random IVs #40969

Closed
wants to merge 2 commits into from

Conversation

sweisdb
Copy link
Contributor

@sweisdb sweisdb commented Apr 26, 2023

What changes were proposed in this pull request?

The current implementation of AES-CBC mode called via aes_encrypt and aes_decrypt uses a key derivation function (KDF) based on OpenSSL's EVP_BytesToKey. This is intended for generating keys based on passwords and OpenSSL's documents discourage its use: "Newer applications should use a more modern algorithm".

aes_encrypt and aes_decrypt should use the key directly in CBC mode, as it does for both GCM and ECB mode. The output should then be the initialization vector (IV) prepended to the ciphertext – as is done with GCM mode:
[16-byte randomly generated IV | AES-CBC encrypted ciphertext]

Why are the changes needed?

We want to have the ciphertext output similar across different modes. OpenSSL's EVP_BytesToKey is effectively deprecated and their own documentation says not to use it. Instead, CBC mode will generate a random vector.

Does this PR introduce any user-facing change?

AES-CBC output generated by the previous format will be incompatible with this change. That change was recently landed and we want to land this before CBC mode is used in practice.

How was this patch tested?

A new unit test in DataFrameFunctionsSuite was added to test both GCM and CBC modes. Also, a new standalone unit test suite was added in ExpressionImplUtilsSuite to test all the modes and various key lengths.

build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
build/sbt "sql/test:testOnly org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite"

CBC values can be verified with openssl enc using the following command:

echo -n "[INPUT]" | openssl enc -a -e -aes-256-cbc -iv [HEX IV] -K [HEX KEY]
echo -n "Spark" | openssl enc -a -e -aes-256-cbc -iv f8c832cc9c61bac6151960a58e4edf86 -K 6162636465666768696a6b6c6d6e6f7031323334353637384142434445464748

@github-actions github-actions bot added the SQL label Apr 26, 2023
@sweisdb sweisdb force-pushed the SPARK-43286 branch 2 times, most recently from d224b9b to 9075e57 Compare April 26, 2023 23:07
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix the examples:

> SELECT base64(_FUNC_('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT'));
U2FsdGVkX18JQ84pfRUwonUrFzpWQ46vKu4+MkJVFGM=

> SELECT _FUNC_(unbase64('U2FsdGVkX18JQ84pfRUwonUrFzpWQ46vKu4+MkJVFGM='), '1234567890abcdef', 'CBC');
Apache Spark

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sweisdb Could you remove the error class AES_SALTED_MAGIC from error-classes.json, and related function aesInvalidSalt() + the test "INVALID_PARAMETER_VALUE.AES_SALTED_MAGIC: AES decrypt failure - invalid salt"

@github-actions github-actions bot added the CORE label May 5, 2023
@sweisdb
Copy link
Contributor Author

sweisdb commented May 5, 2023

@sweisdb Could you remove the error class AES_SALTED_MAGIC from error-classes.json, and related function aesInvalidSalt() + the test "INVALID_PARAMETER_VALUE.AES_SALTED_MAGIC: AES decrypt failure - invalid salt"

Updated to address this.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for CI.

@MaxGekk
Copy link
Member

MaxGekk commented May 6, 2023

@sweisdb Could you re-trigger GitHub actions, please. Highly likely the failure of the test org.apache.spark.storage.BlockManagerBasicStrategyReplicationSuite is not related to your changes but just in case let's re-run all tests.

@sweisdb
Copy link
Contributor Author

sweisdb commented May 8, 2023

@MaxGekk Sure, that worked and it's all green now.

@MaxGekk
Copy link
Member

MaxGekk commented May 9, 2023

+1, LGTM. Merging to master.
Thank you, @sweisdb.

@MaxGekk MaxGekk closed this in 2643223 May 9, 2023
@MaxGekk
Copy link
Member

MaxGekk commented May 9, 2023

@sweisdb Congratulations with your first contribution to Apache Spark!

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
### What changes were proposed in this pull request?

The current implementation of AES-CBC mode called via `aes_encrypt` and `aes_decrypt` uses a key derivation function (KDF) based on OpenSSL's [EVP_BytesToKey](https://www.openssl.org/docs/man3.0/man3/EVP_BytesToKey.html). This is intended for generating keys based on passwords and OpenSSL's documents discourage its use: "Newer applications should use a more modern algorithm".

`aes_encrypt` and `aes_decrypt` should use the key directly in CBC mode, as it does for both GCM and ECB mode. The output should then be the initialization vector (IV) prepended to the ciphertext – as is done with GCM mode:
`[16-byte randomly generated IV | AES-CBC encrypted ciphertext]`

### Why are the changes needed?

We want to have the ciphertext output similar across different modes. OpenSSL's EVP_BytesToKey is effectively deprecated and their own documentation says not to use it. Instead, CBC mode will generate a random vector.

### Does this PR introduce _any_ user-facing change?

AES-CBC output generated by the previous format will be incompatible with this change. That change was recently landed and we want to land this before CBC mode is used in practice.

### How was this patch tested?

A new unit test in `DataFrameFunctionsSuite` was added to test both GCM and CBC modes. Also, a new standalone unit test suite was added in `ExpressionImplUtilsSuite` to test all the modes and various key lengths.
```
build/sbt "sql/test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
build/sbt "sql/test:testOnly org.apache.spark.sql.catalyst.expressions.ExpressionImplUtilsSuite"
```

CBC values can be verified with `openssl enc` using the following command:
```
echo -n "[INPUT]" | openssl enc -a -e -aes-256-cbc -iv [HEX IV] -K [HEX KEY]
echo -n "Spark" | openssl enc -a -e -aes-256-cbc -iv f8c832cc9c61bac6151960a58e4edf86 -K 6162636465666768696a6b6c6d6e6f7031323334353637384142434445464748
```

Closes apache#40969 from sweisdb/SPARK-43286.

Authored-by: Steve Weis <steve.weis@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants