NIFI-3594 Encrypted provenance repository implementation #1686
Conversation
…eration. Added BC dependency to nifi-persistent-provenance-repository module.
…der w/ 2 impls, Encryptor skeleton, and exceptions/utilities). Reorganized tests to proper path.
… Pausing to re-evaluate because work may need to be done at lower level (EventWriter/EventReader -- byte/Object serialization).
…t intercepting SchemaRecordReader/Writer serialization (no updates to schema necessary).
…ity-utils to include in nifi-data-provenance-utils.
Resolved RAT and checkstyle issues. All tests pass.
…nhanced error checking and guard controls).
…ption metadata serialization.
…ved cached ciphers to allow non-repeating IVs). Added unit tests.
Added validity checks for algorithm and version in AESProvenanceEventEncryptor. Added unit tests.
Refactored encryptor composition. Added unit tests.
…nce repository. Added utility methods for validation. Added unit tests.
…cryption. Added nifi.provenance.repository.encryption.key to default sensitive keys and updated unit tests and test resources. Added method to correctly calculate protected percentage of sensitive keys (unpopulated keys are no longer counted against protection %).
Moved getBestEventIdentifier() from StandardProvenanceEventRecord to ProvenanceEventRecord interface and added delegate in all implementations to avoid ClassCastException from multiple classloaders. Initialized IV before cipher to suppress unnecessary warnings. Added utility method to read encrypted provenance keys from key provider file. Suppressed logging of event record details in LuceneEventIndex. Added logic to create EncryptedSchemaRecordReader (if supported) in RecordReaders. Cleaned up EncryptedSchemaRecordReader and EncryptedSchemaRecordWriter. Added keyProvider, recordReaderFactory, and recordWriterFactory initialization to EncryptedWriteAheadProvenanceRepository to provide complete interceptor implementation. Added logic to RepositoryConfiguration to load encryption-related properties if necessary. Refactored WriteAheadProvenanceRepository to allow subclass implementation. Registered EncryptedWAPR in ProvenanceRepository implementations. Added unit tests for EWAPR. Added new nifi.properties keys for encrypted provenance repository.
```java
// Skip the first byte (SENTINEL) and don't need to copy all the serialized record
byte[] metadataBytes = Arrays.copyOfRange(encryptedRecord, 1, encryptedRecord.length);
```
Rather than copying the byte[] here and then wrapping in a ByteArrayInputStream, would recommend we instead just wrap the original byte[] in a ByteArrayInputStream, then call in.read() to discard the first byte -- avoids a good bit of garbage creation/collection.
Good catch.
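Mark's suggestion might look something like this (a hedged sketch; the class and method names are illustrative, not the PR's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class SentinelSkipSketch {
    // Wrap the original byte[] directly and discard the sentinel byte with
    // read(), avoiding the Arrays.copyOfRange() allocation entirely.
    public static ByteArrayInputStream skipSentinel(byte[] encryptedRecord) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(encryptedRecord);
        int sentinel = in.read(); // discard the first (SENTINEL) byte
        if (sentinel == -1) {
            throw new IOException("Empty record");
        }
        return in; // stream now positioned at the serialized metadata
    }
}
```

The stream shares the backing array, so no intermediate copy is created or garbage-collected.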
```java
 * @throws IOException this should never be thrown
 */
public static byte[] concatByteArrays(byte[]... arrays) throws IOException {
    ByteArrayOutputStream boas = new ByteArrayOutputStream();
```
We're doing quite a lot of byte copying here, as the BAOS is potentially resizing a lot... then we call toByteArray() to copy it again. Would recommend calculating size of the arrays and then just creating an array of that size and copying the bytes directly. Alternatively, you could calculate the size of the arrays and pass that as an argument to BAOS so that it doesn't generate so much garbage - but this still has to copy the byte[] when baos.toByteArray() is called.
Ok.
I was curious about this, so I added an implementation as you described above and compared them. The BAOS is definitely faster for small byte[], but slower for large ones. Interesting.
```java
public static byte[] concatByteArrays(byte[]... arrays) throws IOException {
    int totalByteLength = 0;
    for (byte[] bytes : arrays) {
        totalByteLength += bytes.length;
    }
    byte[] totalBytes = new byte[totalByteLength];
    int currentLength = 0;
    for (byte[] bytes : arrays) {
        System.arraycopy(bytes, 0, totalBytes, currentLength, bytes.length);
        currentLength += bytes.length;
    }
    return totalBytes;
}

public static byte[] concatByteArraysWithBAOS(byte[]... arrays) throws IOException {
    ByteArrayOutputStream boas = new ByteArrayOutputStream();
    for (byte[] arr : arrays) {
        boas.write(arr);
    }
    return boas.toByteArray();
}
```
```
Calculating small/small -- 3 arrays with avg length 11
Ran 100 of small/small (traditional) with a total wall time of 28720012 ns and average run of 195354 ns
Ran 100 of small/small (BAOS) with a total wall time of 6745072 ns and average run of 12300 ns
Calculating small/large -- 2 arrays with avg length 567
Ran 100 of small/large (traditional) with a total wall time of 2712642 ns and average run of 5807 ns
Ran 100 of small/large (BAOS) with a total wall time of 3774826 ns and average run of 11078 ns
Calculating large/small -- 145 arrays with avg length 8
Ran 100 of large/small (traditional) with a total wall time of 5173622 ns and average run of 27214 ns
Ran 100 of large/small (BAOS) with a total wall time of 5120322 ns and average run of 27663 ns
Calculating large/large -- 182 arrays with avg length 534
Ran 100 of large/large (traditional) with a total wall time of 11537912 ns and average run of 83769 ns
Ran 100 of large/large (BAOS) with a total wall time of 65845017 ns and average run of 612915 ns
```
So with 100 iterations, you're likely getting a lot of 'JVM warmup time' into your calculations... I'd go for at least 10,000 iterations as a 'warm up' that aren't even counted, then probably 100,000 - 1MM iterations to test actual performance.
Yep, looks like with the higher iteration count the array copy is faster.
```
Ran 1000 iterations in 34702232 ns to warm up the JVM
Calculating small/small -- 2 arrays with avg length 5
Ran 1000000 of small/small (traditional) with a total wall time of 1339514066 ns and average run of 131 ns
Ran 1000000 of small/small (BAOS) with a total wall time of 1008624897 ns and average run of 83 ns
Calculating small/large -- 5 arrays with avg length 484
Ran 1000000 of small/large (traditional) with a total wall time of 1699096744 ns and average run of 805 ns
Ran 1000000 of small/large (BAOS) with a total wall time of 2978547636 ns and average run of 1975 ns
Calculating large/small -- 173 arrays with avg length 8
Ran 1000000 of large/small (traditional) with a total wall time of 2692231144 ns and average run of 1521 ns
Ran 1000000 of large/small (BAOS) with a total wall time of 3514575357 ns and average run of 2582 ns
Calculating large/large -- 147 arrays with avg length 579
Ran 1000000 of large/large (traditional) with a total wall time of 16443696576 ns and average run of 15418 ns
Ran 1000000 of large/large (BAOS) with a total wall time of 61356984740 ns and average run of 59737 ns
```
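A minimal harness along the lines Mark describes might look like this (hypothetical; not the actual benchmark code from the PR):

```java
public class ConcatBenchmarkSketch {
    // Functional interface standing in for whichever concat implementation is
    // under test (traditional array copy vs. BAOS).
    interface Concat {
        byte[] apply(byte[][] arrays);
    }

    // Run uncounted warm-up iterations so the JIT compiles the hot path,
    // then time the real iterations and return the average ns per run.
    public static long timeNs(Concat concat, byte[][] arrays, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            concat.apply(arrays); // not counted; lets the JVM warm up
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            concat.apply(arrays);
        }
        return (System.nanoTime() - start) / iterations;
    }
}
```

For publishable numbers a tool like JMH is the safer option, since it also guards against dead-code elimination.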
```java
public static boolean keyIsValid(String encryptionKeyHex) {
    return isHexString(encryptionKeyHex)
            && (isUnlimitedStrengthCryptoAvailable()
            ? Arrays.asList(32, 48, 64).contains(encryptionKeyHex.length())
```
This is called a lot - would recommend removing the Arrays.asList(32, 48, 64) and just creating a private static final List or Set, or since there are only 3 possible values, checking them individually
Will do.
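The suggested change might be sketched like this (illustrative names; not the actual PR code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeyLengthSketch {
    // Hoist the allowed hex-string lengths into a static final Set so no List
    // is allocated on every call; with only three values, direct comparisons
    // work just as well.
    private static final Set<Integer> VALID_KEY_LENGTHS = new HashSet<>(Arrays.asList(32, 48, 64));

    public static boolean isValidKeyLength(int hexLength) {
        return VALID_KEY_LENGTHS.contains(hexLength);
        // Alternatively: return hexLength == 32 || hexLength == 48 || hexLength == 64;
    }
}
```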
```java
// TODO: Copied from EventIdFirstSchemaRecordReader to force local/overridden readRecord()
```
Was this TODO intended to remain here? Not sure what is actually to be done...
This was a note to myself/any reviewer about why I copied the entire method body from the other reader implementation. If I just refer to the other instance (i.e. comment out this entire method declaration), the operation fails (see EncryptedSchemaRecordReaderWriterTest#testSkipToEvent()) even though the code is identical. If there is a better suggestion for how to fix this, I am all ears.
```groovy
        keyId == KEY_ID
    }] as KeyProvider
provenanceEventEncryptor.initialize(mockKeyProvider)
//
```
Should this be removed?
Yes. Thanks.
```groovy
private static final AtomicLong recordId = new AtomicLong()

// @Rule
```
There are a handful of places here where lines of code are commented out. Should probably remove.
I cleaned up the unnecessary ones. I think the ones that have the note about NIFI-3605 can stay because they have the correct code commented out to be switched back when that bug is patched.
@alopresto one thought regarding the StaticKeyProvider... rather than having a `.key=` property, perhaps it would make sense to instead use a `.key.<key_id>=` property scheme. This would allow multiple Key IDs to be used, which would resolve the 'Potential Issue' listed above by allowing the key to rotate, with rather minimal changes to the code?
@markap14 That's not a bad suggestion. I had envisioned the
…ation of arrays. Added unit test demonstrating performance improvement.
…ency with Mark Payne's feedback. Cleaned up commented code.
Refactored StaticKeyProvider and FileBasedKeyProvider to reduce duplicate code. Added helper methods in NiFiProperties to read multiple key definitions for StaticKeyProvider. Fixed undetected NPE in tests (storing null value into properties). Added unit tests.
I had to make some substantial changes to enable multiple keys for
This should generate a key map of
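Assembling the key map from the multi-key property scheme might look roughly like this (the property prefix and helper shape are assumptions for illustration, not the actual `NiFiProperties` API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class MultiKeySketch {
    // Hypothetical prefix following Mark's `.key.<key_id>=` suggestion; each
    // matching property contributes one keyId -> key material entry.
    private static final String PREFIX = "nifi.provenance.repository.encryption.key.";

    public static Map<String, String> readKeys(Properties properties) {
        Map<String, String> keys = new HashMap<>();
        for (String name : properties.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                String keyId = name.substring(PREFIX.length());
                keys.put(keyId, properties.getProperty(name));
            }
        }
        return keys;
    }
}
```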
…rayCopy comparison.
…to trigger Maven running Groovy unit tests.
@alopresto happy to do some functional testing of this as well.
@alopresto @YolandaMDavis just a quick update on my review... I think the code looks good and am a +1 on that front. I had no problem getting this up & running and things seem to work well. I just wanted to verify some corner cases like running out of disk space, kill -9 against NiFi, etc. and ensure that NiFi is still able to restart, as this is an issue that some have hit in the past. Otherwise all looks good to me. Thanks!
@alopresto Ran a test where I had an existing unencrypted provenance repo and switched to the encrypted provider. The issues you've documented did occur (where switching between configuration would lead to errors in the log and when querying). However I'm wondering two things:
If clearing the provenance repo when changing configurations would resolve the issue, can that step be included in the documentation (if needed I would recommend including that step in the issues section)?
@YolandaMDavis Thanks Yolanda. During most of my testing, I did clear the repositories between application starts when switching the implementation (
I will re-evaluate those scenarios as soon as I finish reviewing PR 1712 for Bryan. I do think suppressing the stacktrace and providing a more descriptive error is a good idea and will tackle that as well. If it is determined there is a simple reason your queries were not working, great. If not, adding documentation instructing repository removal may be necessary.
Yolanda, I can reproduce your described issue. I believe the difference is that when I tested and did not clear the existing provenance repository, I was switching between
…en switching to encrypted implementation (see PR 1686 @ apache#1686 (comment)).
…o Admin Guide and User Guide. Added screenshot of encrypted provenance repository contents on disk. Added note about clearing existing provenance repository when switching to encrypted implementation (see PR 1686 @ #1686 (comment)). This closes #1713. Signed-off-by: Andy LoPresto <alopresto@apache.org>
The "improve switching UX" ticket is NIFI-3766. I propose merging this as is, given the new documentation providing a workaround.
@alopresto moving to a separate ticket makes sense. I have one more test to run (run out of disk space per @markap14's suggestion) and will provide any feedback shortly.
@alopresto ran tests with the following corner cases: Also validated that switching between EWAPR and PPR (and vice versa) along with removal of content, flowfile and provenance repos would prevent the ClassCastException seen previously. +1 will merge shortly
@alopresto actually Andy just noticed there's a conflict. If you could resolve, squash and push I'll be happy to continue with merging into master.
I rebased and resolved the conflict. There were test errors in Merged.
This is a big PR, so here is some helpful information before delving into the code.
What is it?

The `EncryptedWriteAheadProvenanceRepository` is a new implementation of the provenance repository which encrypts all event record information before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data, while still allowing querying and access to the data through the NiFi UI/API.

How does it work?

The code will provide more details, and I plan to write extensive documentation for the Admin Guide and User Guide (NIFI-3721), but this will suffice for an overview.
The `WriteAheadProvenanceRepository` was introduced by @markap14 in NIFI-3356 and provided a refactored and much faster provenance repository implementation than the previous `PersistentProvenanceRepository`. The encrypted version wraps that implementation with a record writer and reader which encrypt and decrypt the serialized bytes, respectively.

The fully qualified class `org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository` is specified as the provenance repository implementation in `nifi.properties` as the value of `nifi.provenance.repository.implementation`. In addition, new properties must be populated to allow successful initialization. The simplest configuration is below:
- `nifi.provenance.repository.debug.frequency` is a new configuration option to control the rate at which debug messages regarding performance statistics are printed to the logs (in DEBUG mode)
- `nifi.provenance.repository.encryption.key.provider.implementation` is the Key Provider implementation. A key provider is the datastore interface for accessing the encryption key to protect the provenance events. There are currently two implementations: `StaticKeyProvider`, which reads a key directly from `nifi.properties`, and `FileBasedKeyProvider`, which reads n many keys from an encrypted file. The interface is extensible, and HSM-backed or other providers are expected in the future.
- `nifi.provenance.repository.encryption.key.provider.location` is the location of the key provider data. For `StaticKeyProvider`, this is left blank. For `FileBasedKeyProvider`, this is a file path to the key provider definition file (e.g. `./keys.nkp`). For an HSM or other provider, this could be a URL, etc.
- `nifi.provenance.repository.encryption.key.id` is the key ID which is used to encrypt the events.
- `nifi.provenance.repository.encryption.key` is the hexadecimal encoding of the key for the `StaticKeyProvider`. For `FileBasedKeyProvider`, this value is left blank. This value can also be encrypted by using the `encrypt-config.sh` tool in the NiFi Toolkit, and is marked as sensitive by default.
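Based on the property descriptions above, a minimal `StaticKeyProvider` configuration might look like the following. The key provider class name, debug frequency, key ID, and hex key below are placeholder/assumed values, not taken from the PR:

```properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository
# Placeholder frequency value
nifi.provenance.repository.debug.frequency=1000000
# Key provider class name assumed for illustration
nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
# Left blank for StaticKeyProvider
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=Key1
# Placeholder 32-hex-character (AES-128) key
nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA9876543210
```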
The `FileBasedKeyProvider` implementation reads from an encrypted definition file. Each line defines a key ID and then the Base64-encoded cipher text of a 16 byte IV and wrapped AES-128, AES-192, or AES-256 key, depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the master key defined by `nifi.bootstrap.sensitive.key` in `conf/bootstrap.conf`.

Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (`EventIdFirstSchemaRecordWriter` by default for `WriteAheadProvenanceRepository`) to a `byte[]`. Those bytes are then encrypted using an implementation of `ProvenanceEventEncryptor` (the only current implementation is `AES/GCM/NoPadding`) and the encryption metadata (`keyId`, `algorithm`, `version`, `IV`) is serialized and prepended. The complete `byte[]` is then written to the repository on disk as normal.

On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a `ProvenanceEventRecord` object. The delegation to the normal schema record writer/reader allows for "random access" (i.e. immediate seek without decryption of unnecessary records).

Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process.
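The encrypt-and-prepend scheme described above can be sketched roughly as follows. This is not the actual `ProvenanceEventEncryptor` implementation; the class name, metadata layout, and single-byte length prefix are simplifications for illustration only:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EncryptionSketch {
    // Encrypt serialized record bytes with AES/GCM and prepend a metadata
    // header (keyId, algorithm, version, IV) so a reader can reverse the
    // process. The header format here is hypothetical.
    public static byte[] encryptRecord(byte[] serializedRecord, SecretKeySpec key, String keyId) throws Exception {
        byte[] iv = new byte[12]; // GCM-recommended 96-bit IV, fresh per record
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal(serializedRecord);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = (keyId + "|AES/GCM/NoPadding|v1|").getBytes(StandardCharsets.UTF_8);
        out.write(header.length); // naive single-byte length prefix (sketch only)
        out.write(header);
        out.write(iv);
        out.write(cipherText);
        return out.toByteArray(); // written to the repository as normal
    }
}
```

Decryption parses the header, initializes the cipher with the recovered IV and the key looked up by `keyId`, and hands the plaintext bytes back to the normal schema record reader.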
Performance
While there is an obvious performance cost to cryptographic operations, I tried to minimize the impact and to provide an estimate of the metrics of this implementation in comparison to existing behavior.
In general, with low flow event volume, the performance impact is not noticeable -- it is perfectly in line with `WriteAheadProvenanceRepository` and more than twice as fast as the existing `PersistentProvenanceRepository`.

With a much higher volume of events, the impact is felt in two ways. First, the throughput of the flow is slower, as more resources are dedicated to encrypting and serializing the events (note the total events processed/events per second). In addition, the provenance queries are slightly slower than the original implementation (1% - 17%), and significantly slower than the new `WriteAheadProvenanceRepository` operating in plaintext (~110%). This is a known trade-off that will need to be evaluated by the deployment administrator given their threat model and risk assessment.

Remaining Efforts
The `LuceneEventIndex` printed substantial information from the event record to the log. If the repository is encrypted, an administrator would reasonably expect this potentially-sensitive information not to be printed to the logs. In this specific instance, I changed the log statements to elide this information, but an audit needs to occur for the complete project to detect other instances where this may occur. Ideally, this could be variable depending on the encryption status of the repository, but this would require changing the method signature, and I didn't want to tackle that now. This is captured in NIFI-3388.

Potential Issues
- Key rotation: `StaticKeyProvider` does not provide a mechanism to support this. With `FileBasedKeyProvider`, they can simply specify a new key in the key provider file with `nifi.provenance.repository.encryption.key.id` in `nifi.properties`, and future events will be encrypted with that key. Previously-encrypted events can still be decrypted as long as that key is still available in the key definition file.
- Switching from `PersistentProvenanceRepository` to `EncryptedWriteAheadProvenanceRepository` led to slower NiFi app shutdowns (NIFI-3712). This was repeatable with `WriteAheadProvenanceRepository`, so I don't believe it is dependent on the encryption changes.

Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.