Data at rest encryption #1575

Closed
acelyc111 opened this issue Aug 1, 2023 · 6 comments
Labels
type/enhancement Indicates new feature requests

Comments

@acelyc111
Member

acelyc111 commented Aug 1, 2023

Motivation

Some Pegasus users store private data in Pegasus, so it is important to protect that data against unauthorized access by anyone who gains access to the storage media used by Pegasus.

Transparent data at rest encryption protects users’ data in a way that is invisible to users and straightforward for operators to set up.

Data at rest encryption refers to encrypting data for storage and decrypting it when reading the stored data. It uses symmetric encryption where the same key is used to encrypt and to decrypt the data. Keys need to be stored and handled securely as anyone with access to a key will be able to decrypt any data encrypted with it.

Cloud disk encryption

If your Pegasus clusters are deployed on public cloud storage, you can use the cloud provider's own disk encryption solution. See:

In that case there is no need to enable Pegasus data at rest encryption as well: encrypting and decrypting the data twice may lead to poor performance.

Goals

  • Data at rest encryption of all user data (key-values) on a fresh Pegasus cluster.
  • Pluggable key management to enable interfacing with existing key management systems, such as Hadoop KMS.
  • Cluster key architecture (see Key management).
  • User data in logs will be redacted.

Non-Goals

  • Enabling (or disabling) data at rest encryption on an existing cluster.
    TODO: This could be implemented later. Once all the data has been fully compacted, it will have been rewritten as plaintext or ciphertext.
  • Multiple tenants, or table granularity encryption.
    TODO: This could be implemented after cluster granularity encryption has been implemented.
  • Selective encryption (certain tables are encrypted, others are not).
    TODO: Same as above.
  • Encrypting the output of shell tools.
  • Transport layer encryption.
    TODO: Use TLS libs.
  • Core dump encryption.
  • pegasus-spark
    pegasus-spark only supports reading plaintext data from the source, and the data it generates is plaintext as well, so it does not weaken security. When the generated plaintext data is loaded into Pegasus, it is encrypted if the encrypt_data_at_rest feature is enabled.

Cryptography overview

Symmetric-key algorithm

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of cipher-text. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to public-key encryption (also known as asymmetric-key encryption).

AES

The Advanced Encryption Standard (AES) is a block cipher with a block size of 128 bits and three key lengths: 128, 192 and 256 bits. AES supersedes the Data Encryption Standard (DES). It is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.

Block cipher

A block cipher is a deterministic algorithm that operates on fixed-length groups of bits, called blocks. Block ciphers are the elementary building blocks of many cryptographic protocols. They are ubiquitous in the storage and exchange of data, where such data is secured and authenticated via encryption.

A block cipher uses blocks as an unvarying transformation. Even a secure block cipher is suitable for the encryption of only a single block of data at a time, using a fixed key. A multitude of modes of operation have been designed to allow their repeated use in a secure way to achieve the security goals of confidentiality and authenticity. However, block ciphers may also feature as building blocks in other cryptographic protocols, such as universal hash functions and pseudorandom number generators.

ROT13

ROT13 ("rotate by 13 places") is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the Latin alphabet.

Because there are 26 letters (2×13) in the basic Latin alphabet, ROT13 is its own inverse; that is, to undo ROT13, the same algorithm is applied, so the same action can be used for encoding and decoding. The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption.

facebook/rocksdb uses ROT13 as an encryption sample.
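
To make the self-inverse property concrete, here is a minimal C++ sketch. The function name `rot13` is only illustrative and this is not the code from facebook/rocksdb; it simply rotates ASCII letters and leaves every other byte untouched.

```cpp
#include <string>

// Rotate each ASCII letter by 13 places; every other byte is left unchanged.
std::string rot13(const std::string& input) {
    std::string out = input;
    for (char& c : out) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>('a' + (c - 'a' + 13) % 26);
        } else if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>('A' + (c - 'A' + 13) % 26);
        }
    }
    return out;
}
```

Applying `rot13` twice returns the original string, which is why the same routine serves for both "encryption" and "decryption" and also why it offers no real protection.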

Block cipher mode of operation

In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transformation (encryption or decryption) of one fixed-length group of bits called a block. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.

Most modes require a unique binary sequence, often called an initialization vector (IV), for each encryption operation. The IV has to be non-repeating and, for some modes, random as well. The initialization vector is used to ensure distinct ciphertexts are produced even when the same plaintext is encrypted multiple times independently with the same key. Block ciphers may be capable of operating on more than one block size, but during transformation the block size is always fixed. Block cipher modes operate on whole blocks and require that the last part of the data be padded to a full block if it is smaller than the current block size. There are, however, modes that do not require padding because they effectively use a block cipher as a stream cipher.

IV, Initialization Vector

In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation.

CTR, Counter mode

Counter mode turns a block cipher into a stream cipher. It generates the next keystream block by encrypting successive values of a "counter". The counter can be any function which produces a sequence which is guaranteed not to repeat for a long time, although an actual increment-by-one counter is the simplest and most popular. The usage of a simple deterministic input function used to be controversial; critics argued that "deliberately exposing a cryptosystem to a known systematic input represents an unnecessary risk". However, today CTR mode is widely accepted, and any problems are considered a weakness of the underlying block cipher, which is expected to be secure regardless of systemic bias in its input. Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and Bruce Schneier.
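
The structure of counter mode can be sketched as below. This is only an illustration under loud assumptions: `toy_block_encrypt` is a made-up stand-in for a real block cipher such as AES and is NOT secure, and the names `ctr_xor`, `key`, `iv` and `offset` are hypothetical. What the sketch shows is that the keystream for any byte depends only on the key, the IV and the byte's position, so encryption and decryption are the same XOR and a file can be read at an arbitrary offset.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kBlockSize = 16;  // same block size as AES
using Block = std::array<std::uint8_t, kBlockSize>;

// Stand-in for a real block cipher (e.g. AES). NOT secure, illustration only.
Block toy_block_encrypt(std::uint64_t key, std::uint64_t counter) {
    Block out{};
    std::uint64_t x = key ^ (counter * 0x9E3779B97F4A7C15ULL);
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        x ^= x >> 33;                 // scramble the state a little
        x *= 0xFF51AFD7ED558CCDULL;
        x ^= x >> 29;
        out[i] = static_cast<std::uint8_t>(x);
    }
    return out;
}

// XOR `data` in place with the keystream. data[0] is assumed to sit at byte
// `offset` of the whole stream. The keystream block is recomputed for every
// byte to keep the sketch short; real code would cache it per block.
void ctr_xor(std::uint64_t key, std::uint64_t iv, std::uint64_t offset,
             std::string& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::uint64_t pos = offset + i;
        const Block ks = toy_block_encrypt(key, iv + pos / kBlockSize);
        data[i] = static_cast<char>(static_cast<std::uint8_t>(data[i]) ^
                                    ks[pos % kBlockSize]);
    }
}
```

Calling `ctr_xor` twice with the same key, IV and offset restores the plaintext, and a slice taken from the middle of the ciphertext can be decrypted on its own by passing its starting offset.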

OpenSSL

OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions. Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available.

OpenSSL supports a number of different cryptographic algorithms, including AES mentioned above.

Design

Key management

Most of the design and implementation is inspired by Apache Kudu and TiKV, see Kudu data at rest encryption and TiKV encryption. Thanks to the two projects!

An overview of the design for Pegasus:

  • Each disk file uses an independent File Key (FK) to encrypt its data.
  • FK is generated locally.
  • FK is encrypted (as Encrypted FK, EFK) and stored in the newly added file header of the file it is used to encrypt/decrypt.
  • Each disk file has a fixed-length file header to store encryption information (including EFK).
  • FK is encrypted into EFK by using the independent Server Key (SK) of each server.
  • SK is encrypted into Encrypted SK (ESK) by using the Cluster Key (CK), which is shared among the servers in a Pegasus cluster.
  • A new instance file is added on each server to store ESK.
  • The plaintext SK is generated/obtained from the remote KMS through its RESTful API:
    • GET <kms_url>/v1/key/<cluster_key_name>/_eek?eek_op=generate&num_keys=1
  • When the server bootstraps, the local ESK is decrypted by the remote KMS, and the plaintext SK is then kept in memory. ESK is decrypted through the KMS RESTful API:
    • POST <kms_url>/v1/keyversion/<key_version>/_eek?eek_op=decrypt
      with payload:
      • cluster_key_name
      • iv
      • ESK
  • SK is used in rocksdb::EncryptedEnv to encrypt and decrypt FK.
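
The two KMS endpoints listed above could be assembled roughly as follows. This is a sketch only: the helper names `generate_key_url` and `decrypt_key_url` are hypothetical, the path layout is copied from the list above, and URL-encoding, the request payload and authentication are deliberately left out.

```cpp
#include <string>

// Build the URL that asks the KMS to generate a new server key under the
// given cluster key. Assumes the arguments need no URL-encoding.
std::string generate_key_url(const std::string& kms_url,
                             const std::string& cluster_key_name) {
    return kms_url + "/v1/key/" + cluster_key_name +
           "/_eek?eek_op=generate&num_keys=1";
}

// Build the URL that asks the KMS to decrypt an ESK. The payload
// (cluster_key_name, iv, ESK) would be sent in the POST body.
std::string decrypt_key_url(const std::string& kms_url,
                            const std::string& key_version) {
    return kms_url + "/v1/keyversion/" + key_version + "/_eek?eek_op=decrypt";
}
```

Keeping URL construction in small pure functions like these makes it easy to unit test the key provider without a running KMS.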

New Configurations

  • encrypt_data_at_rest

    bool(false), Whether sensitive files should be encrypted on the file system.

  • encryption_key_length

    int(128), Encryption key length. Can be 128, 192 or 256.

  • encryption_key_provider

    string("default"), Key provider implementation to generate and decrypt server keys. Valid values are: 'default' (not for production usage), and 'hadoop-kms'.

  • hadoop_kms_url

    string(""), Comma-separated list of Hadoop KMS server URLs. Must be set when 'encryption_key_provider' is set to 'hadoop-kms'.

  • encryption_cluster_key_name

    string("kudu_cluster_key"), Name of the cluster key that is used to encrypt server encryption keys as stored in Hadoop KMS.

  • redact_logs

    bool(false), Whether sensitive data (e.g. keys, values, table names) in logs should be redacted.
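
The constraints stated for these options can be checked at startup. The sketch below is an assumption about how that might look, not Pegasus code: `EncryptionConfig` and `validate` are hypothetical names, and the defaults are copied from the list above.

```cpp
#include <string>

// Hypothetical aggregate of the options listed above, with their defaults.
struct EncryptionConfig {
    bool encrypt_data_at_rest = false;
    int encryption_key_length = 128;
    std::string encryption_key_provider = "default";
    std::string hadoop_kms_url;
    std::string encryption_cluster_key_name = "kudu_cluster_key";
    bool redact_logs = false;
};

// Returns an empty string when the configuration is valid, otherwise a
// description of the first problem found.
std::string validate(const EncryptionConfig& c) {
    if (c.encryption_key_length != 128 && c.encryption_key_length != 192 &&
        c.encryption_key_length != 256) {
        return "encryption_key_length must be 128, 192 or 256";
    }
    if (c.encryption_key_provider != "default" &&
        c.encryption_key_provider != "hadoop-kms") {
        return "encryption_key_provider must be 'default' or 'hadoop-kms'";
    }
    if (c.encryption_key_provider == "hadoop-kms" && c.hadoop_kms_url.empty()) {
        return "hadoop_kms_url must be set when encryption_key_provider is 'hadoop-kms'";
    }
    return "";
}
```

Failing fast on an invalid combination is preferable to discovering a missing KMS URL only when the first server key has to be decrypted.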

Implementation overview

RocksDB

Encryption file header

EncryptedEnv files have a fixed-length header, which we can define as 4096 bytes (one page).
The first 64 bytes are used to store encryption information, including:

char magic[7];         // "encrypt"
uint8_t algorithm[1];  // Encryption algorithm, e.g. AES128/192/256CTR
char file_key[32];     // EFK, 32 bytes long
char reserved[24];     // reserved
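
As a sketch of how this layout could be written and read back, assuming the 24 reserved bytes are a real (zero-filled) field so that the four members add up to 64 bytes. The names `EncryptionInfo`, `make_header` and `parse_header` are hypothetical, and the numeric algorithm codes are placeholders because the design does not fix them.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kHeaderSize = 4096;  // fixed header length (one page)

struct EncryptionInfo {
    char magic[7];              // "encrypt", without a trailing NUL
    std::uint8_t algorithm[1];  // placeholder code for e.g. AES128CTR
    char file_key[32];          // EFK
    char reserved[24];          // reserved, zero-filled
};
static_assert(sizeof(EncryptionInfo) == 64, "encryption info must be 64 bytes");

// Build a zero-filled header whose first 64 bytes hold the encryption info.
// `efk` must point to at least 32 bytes.
std::array<char, kHeaderSize> make_header(std::uint8_t algorithm, const char* efk) {
    EncryptionInfo info{};
    std::memcpy(info.magic, "encrypt", sizeof(info.magic));
    info.algorithm[0] = algorithm;
    std::memcpy(info.file_key, efk, sizeof(info.file_key));
    std::array<char, kHeaderSize> header{};
    std::memcpy(header.data(), &info, sizeof(info));
    return header;
}

// Copy the first 64 bytes into `info`; returns false if the magic is wrong.
bool parse_header(const std::array<char, kHeaderSize>& header, EncryptionInfo* info) {
    std::memcpy(info, header.data(), sizeof(*info));
    return std::memcmp(info->magic, "encrypt", sizeof(info->magic)) == 0;
}
```

Because every member is a byte array, the struct has no padding and can be copied to and from the header buffer directly; the `static_assert` guards that assumption.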

Encryption data

facebook/rocksdb uses ROT13 to encrypt data, which is just a sample and cannot be used in a production environment, so we will use AES encryption algorithms.

tikv/rocksdb and Kudu have implemented AES encryption algorithms by using OpenSSL, and we will use the OpenSSL library as well.

Git repository

Because we are planning to add AES encryption to RocksDB, I guess it would be a long journey to merge the modified code into the upstream facebook/rocksdb repository. So I suggest maintaining a Pegasus-owned git repository (i.e. https://github.com/pegasus-kv/rocksdb); we can submit the patches upstream once the feature is fully tested and stable.

Pegasus currently uses the official RocksDB 6.6.4, so this is also a chance to upgrade the third-party library to the latest stable version (8.3.2 at the time of writing).

Pegasus

Git repository

I'm planning to develop the functionality on the master branch of apache/incubator-pegasus after the 2.5 branch has been created.

Modules updates

native_linux_aio_provider

In fact, the native_linux_aio_provider module has not used AIO since Pegasus 2.2.0; it uses pwrite and pread instead.

RocksDB uses pwrite and pread too, so it's possible to replace the underlying filesystem implementation of Pegasus with rocksdb::Env.

rocksdb::Env offers plenty of file operation features, including mmap, direct I/O, prefetch, preallocate, encryption at rest, and so on. They are public APIs of the RocksDB library, and we trust the stability of RocksDB.

So we will introduce rocksdb::Env to Pegasus as the underlying implementation of filesystem layer.

plog

plog uses native_linux_aio_provider, so once native_linux_aio_provider implements data at rest encryption, plog gets the feature as well.

nfs

The nfs module is used to transfer files (e.g. RocksDB SST files) between replica servers. The files are encrypted if data at rest encryption is enabled, and different replica servers have different SKs, so the nfs server side should decrypt data when uploading (by using the source SK), and the nfs client side should encrypt data when downloading (by using the target SK).

The nfs module uses native_linux_aio_provider too, so it's convenient to support encryption for the nfs module.

block service

The block service module is used to back up and restore data. It supports 3 types of targets: local filesystem, Xiaomi FDS and Apache HDFS. We should also provide encryption for the block service to ensure data security. However, the corresponding SK needs to be backed up and restored along with the data: in the restore stage, the backed-up SK is used to decrypt data when downloading, and the data is encrypted again with the replica server's own SK when writing.

logs

User key-values printed in logs should be redacted.
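
One simple way to achieve this is to route every piece of user data through a helper before it reaches a log statement. The sketch below is an assumption, not the Pegasus implementation: `g_redact_logs` is a hypothetical global mirroring the `redact_logs` option, and `redact` and the `<redacted>` placeholder are illustrative names.

```cpp
#include <string>

// Hypothetical process-wide switch mirroring the `redact_logs` option.
bool g_redact_logs = false;

// Return a placeholder instead of the user data when redaction is enabled.
std::string redact(const std::string& user_data) {
    return g_redact_logs ? std::string("<redacted>") : user_data;
}
```

A log call would then print `redact(hash_key)` rather than `hash_key`, so turning the option on changes the output without touching each call site again.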

others

Some other modules that read or write files could be refactored to use rocksdb::Env as well, e.g. the replica_app_info module.

Roadmap

Prepare the rocksdb repository

Commits are merged to https://github.com/pegasus-kv/rocksdb/tree/v8.3.2-pegasus-encrypt first.

Cherry-pick encryption related commits from TiKV

Commits are cherry-picked from branch https://github.com/tikv/rocksdb/commits/6.29.tikv

Remove the key manager

Implement the self-served file key management

Update rocksdb to 8.5.3

Other fixes of pegasus-kv/rocksdb

Pegasus uses rocksdb::EncryptedEnv when data at rest encryption is enabled

Refactor Pegasus to use rocksdb::Env to access other disk files

@acelyc111 acelyc111 added the type/enhancement Indicates new feature requests label Aug 1, 2023
@GiantKing
Contributor

  1. Is the configuration unit for encrypt_data_at_rest the cluster? Why not the table?
  2. In the bulk load process, the underlying data files are generated by Spark, so we need to ensure that the RocksDB data parser/generator keeps working well.

@acelyc111
Member Author

Hi @GiantKing, thanks for your reply!

  1. As the first step, I just want to implement cluster granularity encryption. It would be easy to extend it to table granularity encryption later.
  2. Sorry, I didn't get your point. Does the design break some rules?

@kirbyzhou

kirbyzhou commented Aug 1, 2023

The authentication credentials required to connect to the KMS are missing.

@kirbyzhou

kirbyzhou commented Aug 1, 2023

Good question.
It depends on how the RocksDB files are distributed among multiple replica servers.

One possible solution:
Spark encrypts the FK of each RocksDB file with a unified BK, then imports the files into Pegasus together with the BK.
Pegasus uses the BK to decrypt the FK, then re-encrypts the FK with its own SK and writes it into the RocksDB file header.

  • In bulkload process, we generate the underlying data file by spark. So we need to ensure the data parser/generator of RocksDB is running well.

@acelyc111
Member Author

  • pegasus-spark only supports reading plaintext data from the source, and the data it generates is plaintext as well, so it does not weaken security. When the generated plaintext data is loaded into Pegasus, it is encrypted if the encrypt_data_at_rest feature is enabled.

I added this as a non-goal.

empiredan pushed a commit to pegasus-kv/rocksdb that referenced this issue Aug 7, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@113b363

Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but
provides a `KeyManager` API to enable key management per file. Also
implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: tabokie <xy.tao@outlook.com>
empiredan pushed a commit to pegasus-kv/rocksdb that referenced this issue Aug 7, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@3d44a33

Summary:
Instead of using openssl's raw `AES_encrypt` and `AES_decrypt` API,
which is a low level call to encrypt or decrypt exactly one block (16
bytes), we change to use the `EVP_*` API. The former is deprecated, and
will use the default C implementation without AES-NI support. Also the
EVP API is capable of handling CTR mode on its own.

Test Plan:
will add tests

Signed-off-by: Yi Wu <yiwu@pingcap.com>

---------

Signed-off-by: Yi Wu <yiwu@pingcap.com>
Co-authored-by: yiwu-arbug <yiwu@pingcap.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 7, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@2360562

Summary:
Fix NewRandomRWFile and ReuseWritableFile misuse of `GetFile()` and
`NewFile()`. See inline comments.

Test Plan:
manual test with tikv

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Co-authored-by: yiwu-arbug <yiwu@pingcap.com>
empiredan pushed a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@93e89a5

fix bug: tikv/tikv#9115

Summary: we need to update encryption metadata via
encryption::DataKeyManager, which cannot be combined with the actual file
operation into one atomic operation. In RenameFile, if the power goes off
after the src_file has been removed, we may lose the file info of
src_file on the next restart.

Signed-off-by: Xintao <hunterlxt@live.com>
Co-authored-by: Xintao <hunterlxt@live.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@bbd27cf

used LinkFile instead of RenameFile api of key manager. But LinkFile
needs to check the dst file information, and in RenameFile logic we don't
care about that. So just skip encryption for the current file.

Signed-off-by: Xintao <hunterlxt@live.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@1868d12

Signed-off-by: Xintao <hunterlxt@live.com>
Signed-off-by: tabokie <xy.tao@outlook.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@4cebfc1

* Add SM4-CTR encryption algorithm
* Adjust block size for sm4 encryption
* Add UT for SM4 encryption
* Adjust macros indentation for sm4
* Fix format for adding sm4

Signed-off-by: Jarvis Zheng <jiayang@hust.edu.cn>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@9464766

In some environments, users install openssl via yum install, and that
openssl may be compiled with the OPENSSL_NO_SM4 flag, so even though the
version is >= 1.1.1, sm4 still cannot be used in that situation.

Signed-off-by: Jarvis Zheng <jiayang@hust.edu.cn>
empiredan pushed a commit to pegasus-kv/rocksdb that referenced this issue Aug 8, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@acc624f

* hook delete dir in encrypted env
* add a comment

Signed-off-by: tabokie <xy.tao@outlook.com>
Co-authored-by: Xinye Tao <xy.tao@outlook.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 9, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@14f36f8
(without compaction related code)

* fix renaming encrypted directory

Signed-off-by: tabokie <xy.tao@outlook.com>
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 16, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@113b363

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 16, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@3d44a33

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 16, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@2360562

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Aug 16, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@93e89a5

@acelyc111
Member Author

There is another pull request to facebook/rocksdb, facebook/rocksdb#7020, but it seems not to have been updated for nearly 3 years.

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

After all encryption related patches have been cherry-picked from
[tikv](https://github.com/tikv/rocksdb/commits/6.29.tikv) and
merged, we now improve the encryption, including:
- Fix action job `build-linux-encrypted_env-no_compression-no_openssl`
  to build binaries without openssl and compression libs correctly.
- Fix action job `build-linux-encrypted_env-openssl` to export the
  `ENCRYPTED_ENV` environment variable correctly.
- Do not skip tests which are skipped by TiKV.
- Refactor `AESCTRCipherStream` and `AESEncryptionProvider` to support
  managing the file key by the file itself, according to the design doc in
  [Data at rest encryption](apache/incubator-pegasus#1575).
- Remove all KeyManager related code.
- Replace KeyManager tests with AES encryption tests.
- Refactor encryption/encryption_test.cc and add more tests.
- Make it possible to construct an AESEncryptionProvider object via
  `EncryptionProvider::CreateFromString()` by registering a
  factory in the "encryption" library.
  It's possible to construct an object by URI: `AES`, `AES://test` or
  `AES:<instance_key>,<EncryptionMethod>`.
- The `ldb` tool supports parsing the `--fs_uri` flag as the URI mentioned
  above.
- Add tests to create an AESEncryptionProvider object in
  `CreateEncryptedEnvTest.CreateEncryptedFileSystem`.
- `db_bench` supports running benchmarks with encryption enabled, via two
  new flags: `encryption_method` and `encryption_instance_key`.
- Move code from the exported header directory (i.e.
  include/rocksdb/encryption.h) to rocksdb internal (i.e.
  encryption/encryption.h), so it is not exposed to users.
- Code format.

Review hint: #17 shows all the code changes from the base branch (i.e.
`pegasus-kv:v8.3.2-pegasus`), you can review it together to make sure the
request branch `acelyc111:pk_enc_new` doesn't have side effects on the base.

Manual test:
```
// Generate some data.
./db_bench --encryption_method=AES128CTR --encryption_instance_key=test_instance_key  --num=10000

// Dump WAL OK
./tools/ldb --fs_uri="provider=AES; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES://test; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// Dump WAL failed. Pass bad provider parameters to --fs_uri, e.g.
./tools/ldb --fs_uri="provider=AES1:test_instance_key,1AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:bad_test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES192CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// The same applies to other ldb tools.

```
acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@113b363

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@3d44a33

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@2360562

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@93e89a5

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@bbd27cf

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@1868d12

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@4cebfc1

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@9464766

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@acc624f

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

Cherry-pick from
tikv@14f36f8
(without compaction related code)

acelyc111 added a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

After all encryption related patches been cherry-picked from
[tikv](https://github.com/tikv/rocksdb/commits/6.29.tikv) and
merged, now we will improve the encrytion, including:
- Fix action job `build-linux-encrypted_env-no_compression-no_openssl`
  to build binaries without openssl and compression libs correctly.
- Fix action job `build-linux-encrypted_env-openssl` to export the
  `ENCRYPTED_ENV` enviroment variable correctly.
- Don not skip tests which are skipped by TiKV.
- Refactor `AESCTRCipherStream` and `AESEncryptionProvider` to support
  manage file key by the file itself, according to the design docs in
[Data at rest
encryption](apache/incubator-pegasus#1575).
- Remove all KeyManager related codes.
- Replace KeyManager tests by AES encryption tests.
- Refactor encryption/encryption_test.cc and add more tests.
- Make it possible to construct AESEncryptionProvider object via
  `EncryptionProvider::CreateFromString()` by registering a
  factory in "encryption" library.
  It's possible to construct an object by URI: `AES`, `AES://test` or
  `AES:<instance_key>,<EncryptionMethod>`.
- `ldb` tool support to parse `--fs_uri` flags as the URI mentioned
above.
- Add tests to create AESEncryptionProvider object in
  `CreateEncryptedEnvTest.CreateEncryptedFileSystem`
- `db_bench` support to run benchmark with encryption enabled, by adding
new flags for `db_bench`, they are `encryption_method` and
`encryption_instance_key`.
- Move code from the exported header directory (i.e.
include/rocksdb/encryption.h)
to rocksdb internal (i.e. encryption/encryption.h), do not expose them
to users.
- Code format.

Review hint: #17 shows all the
code changes
from the base branch (i.e. `pegasus-kv:v8.3.2-pegasus`), you can review
it together to
make sure the request branch `acelyc111:pk_enc_new` doesn't have vice
effect on the base.

Manual test:
```
// Generate some data.
./db_bench --encryption_method=AES128CTR --encryption_instance_key=test_instance_key  --num=10000

// Dumping the WAL succeeds.
./tools/ldb --fs_uri="provider=AES; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES://test; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// Dumping the WAL fails when bad provider parameters are passed to --fs_uri, e.g.
./tools/ldb --fs_uri="provider=AES1:test_instance_key,1AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:bad_test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES192CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// The same applies to other ldb tools.

```
empiredan pushed a commit to pegasus-kv/rocksdb that referenced this issue Sep 15, 2023
apache/incubator-pegasus#1575

1. Update the status badge to pegasus-kv/rocksdb's own site.
2. Also check whether all tests pass after cherry-picking the
encryption-related patches to the 8.5.3 branch.
acelyc111 added a commit that referenced this issue Sep 18, 2023
…cksdb (#1610)

#1575

This patch changes Pegasus to use the `v8.5.3-pegasus-encrypt` branch of
https://github.com/pegasus-kv/rocksdb.git repository.
The `v8.5.3-pegasus-encrypt` branch is based on the official `v8.5.3` tag of
facebook/rocksdb repository but adds the encryption feature which is implemented
by the Pegasus team.

Nothing changes if the encryption feature is not enabled.
acelyc111 added a commit that referenced this issue Sep 19, 2023
#1575

Set the option `-DWITH_OPENSSL=ON` to build RocksDB with the encryption feature enabled.
empiredan pushed a commit that referenced this issue Sep 19, 2023
…rsion (#1614)

#1575

Fix a build error with older OpenSSL versions. The error looks like:
```
2023-09-19T02:53:45.4093185Z #11 924.7 /root/incubator-pegasus/thirdparty/build/Source/rocksdb/encryption/encryption.cc: In function 'const EVP_CIPHER* rocksdb::encryption::GetEVPCipher(rocksdb::encryption::EncryptionMethod)':
2023-09-19T02:53:45.4094191Z #11 924.7 /root/incubator-pegasus/thirdparty/build/Source/rocksdb/encryption/encryption.cc:112:44: error: cannot convert 'rocksdb::Status' to 'const EVP_CIPHER* {aka const evp_cipher_st*}' in return
2023-09-19T02:53:45.4094713Z #11 924.7            std::string(OPENSSL_VERSION_TEXT));
2023-09-19T02:53:45.4094991Z #11 924.7                                             ^
2023-09-19T02:53:45.5599505Z #11 924.7 gmake[5]: *** [CMakeFiles/rocksdb.dir/encryption/encryption.cc.o] Error 1
2023-09-19T02:53:45.5599938Z #11 924.7 gmake[4]: *** [CMakeFiles/rocksdb.dir/all] Error 2
2023-09-19T02:53:45.5600266Z #11 924.7 gmake[4]: *** Waiting for unfinished jobs....
```
acelyc111 added a commit that referenced this issue Sep 20, 2023
#1575

This patch introduces `PegasusEnv()` to obtain the `Env` instance used by RocksDB.
An encrypted `Env` instance can then be obtained via `PegasusEnv(FileDataType::kSensitive)`.
The encrypted `Env` is used for operating on sensitive files: data written to such a file
is encrypted and data read from it is decrypted.

Some file operation functions and related unit tests are added as well.
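
The selection described above, where the caller states the file's data type and gets back either a plain or an encrypted `Env`, might look roughly like the sketch below. Everything here is a stand-in for illustration: `SketchEnv` replaces `rocksdb::Env`, `PegasusEnvSketch` and `EncryptDataAtRest` are hypothetical names, `FileDataType::kNonSensitive` is an assumed counterpart to `kSensitive`, and gating on an enable flag is an assumption about how the real `PegasusEnv()` behaves.

```cpp
#include <cassert>

// kSensitive comes from the patch description; kNonSensitive is assumed.
enum class FileDataType { kSensitive, kNonSensitive };

// Stand-in for rocksdb::Env: it only records whether I/O through it
// would be encrypted, and performs no real file operations.
class SketchEnv {
 public:
  explicit SketchEnv(bool encrypted) : encrypted_(encrypted) {}
  bool encrypted() const { return encrypted_; }

 private:
  bool encrypted_;
};

// Hypothetical process-wide switch, standing in for a configuration flag.
inline bool& EncryptDataAtRest() {
  static bool enabled = false;
  return enabled;
}

// Returns the encrypted Env only for sensitive files and only while the
// switch is on; every other combination gets the plain Env.
inline SketchEnv* PegasusEnvSketch(FileDataType type) {
  static SketchEnv plain_env(/*encrypted=*/false);
  static SketchEnv encrypted_env(/*encrypted=*/true);
  if (type == FileDataType::kSensitive && EncryptDataAtRest()) {
    return &encrypted_env;
  }
  return &plain_env;
}
```

Keeping the decision in one accessor means call sites only declare what kind of file they touch, and never check the encryption switch themselves.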
empiredan pushed a commit that referenced this issue Oct 16, 2023
#1575

This is a prerequisite for implementing encryption at rest; after this patch we can use
the encryption capability of RocksDB.

- Use RocksDB APIs to implement class `native_linux_aio_provider`. Both the old and the new
  implementations use the `pread()` and `pwrite()` system calls, so there is no
  significant performance change; see the newly added simple benchmark performance
  comparison below.
- Separate the file read and write operations for class `aio_provider`.
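
For readers unfamiliar with the positional I/O mentioned above, here is a minimal sketch of offset-based reads and writes using the POSIX `pread()` and `pwrite()` calls directly. It does not show the RocksDB file APIs the patch actually uses, nor the real `native_linux_aio_provider`; the helpers `WriteAt`, `ReadAt` and `RoundTripAtOffset` are hypothetical names chosen for this example.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Writes all of `data` at `offset`, retrying on short writes and EINTR.
inline bool WriteAt(int fd, const std::string& data, off_t offset) {
  size_t done = 0;
  while (done < data.size()) {
    ssize_t n = ::pwrite(fd, data.data() + done, data.size() - done,
                         offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    done += static_cast<size_t>(n);
  }
  return true;
}

// Reads up to `len` bytes starting at `offset`; stops early at end of file.
inline bool ReadAt(int fd, size_t len, off_t offset, std::string* out) {
  out->assign(len, '\0');
  size_t done = 0;
  while (done < len) {
    ssize_t n = ::pread(fd, &(*out)[done], len - done,
                        offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    if (n == 0) break;  // end of file
    done += static_cast<size_t>(n);
  }
  out->resize(done);
  return true;
}

// Writes `payload` at `offset` in a temporary file and reads it back.
// Returns an empty string if any step fails.
inline std::string RoundTripAtOffset(const std::string& payload, off_t offset) {
  char path[] = "/tmp/aio_sketch_XXXXXX";
  int fd = ::mkstemp(path);
  if (fd < 0) return "";
  std::string result;
  if (!WriteAt(fd, payload, offset) ||
      !ReadAt(fd, payload.size(), offset, &result)) {
    result.clear();
  }
  ::close(fd);
  ::unlink(path);
  return result;
}
```

Because both calls take an explicit offset and leave the file position untouched, reads and writes on the same descriptor do not interfere with each other, which fits separating the read path from the write path.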
acelyc111 pushed a commit that referenced this issue Oct 18, 2023
#1575

User key-values will be redacted if encryption is enabled.
acelyc111 added a commit that referenced this issue Oct 23, 2023
#1575

- Mark all files as sensitive, so all files will be encrypted when `encrypt_data_at_rest`
   is enabled.
- Enable both true and false for config `encrypt_data_at_rest` in related tests.
- The FDS module has not implemented the encryption feature yet; do not enable
   `encrypt_data_at_rest` if you are using FDS.
- Some small refactors.