NIFI-3594 Encrypted provenance repository implementation #1686

Status: Closed. Wants to merge 28 commits into base: master.
Conversation

3 participants
alopresto (Contributor) commented Apr 21, 2017

This is a big PR, so here is some helpful information before delving into the code.

What is it?

The EncryptedWriteAheadProvenanceRepository is a new implementation of the provenance repository which encrypts all event record information before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data while still allowing querying and access to the data through the NiFi UI/API.

How does it work?

The code will provide more details, and I plan to write extensive documentation for the Admin Guide and User Guide (NIFI-3721), but this will suffice for an overview.

The WriteAheadProvenanceRepository was introduced by @markap14 in NIFI-3356 and provided a refactored and much faster provenance repository implementation than the previous PersistentProvenanceRepository. The encrypted version wraps that implementation with a record writer and reader which encrypt and decrypt the serialized bytes respectively.

The fully qualified class org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository is specified as the provenance repository implementation in nifi.properties as the value of nifi.provenance.repository.implementation. In addition, new properties must be populated to allow successful initialization.

The simplest configuration is below:

nifi.provenance.repository.debug.frequency=100
nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=Key1
nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
  • nifi.provenance.repository.debug.frequency is a new configuration option to control the rate at which debug messages regarding performance statistics are printed to the logs (in DEBUG mode)
  • nifi.provenance.repository.encryption.key.provider.implementation is the Key Provider implementation. A key provider is the datastore interface for accessing the encryption key to protect the provenance events. There are currently two implementations -- StaticKeyProvider which reads a key directly from nifi.properties, and FileBasedKeyProvider which reads any number of keys from an encrypted file. The interface is extensible, and HSM-backed or other providers are expected in the future.
  • nifi.provenance.repository.encryption.key.provider.location is the location of the key provider data. For StaticKeyProvider, this is left blank. For FileBasedKeyProvider, this is a file path to the key provider definition file (e.g. ./keys.nkp). For an HSM or other provider, this could be a URL, etc.
  • nifi.provenance.repository.encryption.key.id is the key ID which is used to encrypt the events.
  • nifi.provenance.repository.encryption.key is the hexadecimal encoding of the key for the StaticKeyProvider. For FileBasedKeyProvider, this value is left blank. This value can also be encrypted by using the encrypt-config.sh tool in the NiFi Toolkit, and is marked as sensitive by default.

The FileBasedKeyProvider implementation reads from an encrypted definition file of the format:

key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==

Each line defines a key ID and then the Base64-encoded cipher text of a 16-byte IV and the wrapped AES-128, AES-192, or AES-256 key, depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the master key defined by nifi.bootstrap.sensitive.key in conf/bootstrap.conf.
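The unwrap step for one line of the definition file can be sketched as below. This is a hypothetical illustration, not the actual FileBasedKeyProvider code: the method names, the 128-bit GCM tag length, and the inclusion of a wrap helper for round-tripping are all assumptions; only the `Base64(iv || wrappedKey)` layout and AES/GCM wrapping come from the description above.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;
import java.util.Base64;

public class KeyUnwrapSketch {

    // Hypothetical sketch: each line of the .nkp file is keyId=Base64(iv || wrappedKey).
    // The 16-byte IV prefix and AES/GCM wrapping follow the description above; the
    // 128-bit authentication tag length is an assumption.
    public static byte[] unwrapKey(String base64Value, byte[] masterKey) throws Exception {
        byte[] blob = Base64.getDecoder().decode(base64Value);
        byte[] iv = Arrays.copyOfRange(blob, 0, 16);                 // 16-byte IV prefix
        byte[] wrapped = Arrays.copyOfRange(blob, 16, blob.length);  // ciphertext + GCM tag
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                new GCMParameterSpec(128, iv));
        return cipher.doFinal(wrapped);                              // the plaintext event key
    }

    // Inverse operation, included only so the sketch round-trips.
    public static String wrapKey(byte[] keyToWrap, byte[] masterKey, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                new GCMParameterSpec(128, iv));
        byte[] wrapped = cipher.doFinal(keyToWrap);
        byte[] blob = new byte[16 + wrapped.length];
        System.arraycopy(iv, 0, blob, 0, 16);
        System.arraycopy(wrapped, 0, blob, 16, wrapped.length);
        return Base64.getEncoder().encodeToString(blob);
    }
}
```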

Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (EventIdFirstSchemaRecordWriter by default for WriteAheadProvenanceRepository) to a byte[]. Those bytes are then encrypted using an implementation of ProvenanceEventEncryptor (the only current implementation is AES/GCM/NoPadding) and the encryption metadata (keyId, algorithm, version, IV) is serialized and prepended. The complete byte[] is then written to the repository on disk as normal.

[Image: encrypted provenance repository file on disk]

On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a ProvenanceEventRecord object. The delegation to the normal schema record writer/reader allows for "random-access" (i.e. immediate seek without decryption of unnecessary records).
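The write/read flow above can be sketched as a round trip. This is a simplified stand-in, not the repository's actual serialization: the real implementation prepends a serialized EncryptionMetadata object (keyId, algorithm, version, IV), while this sketch uses an ad-hoc DataOutputStream layout purely for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.security.SecureRandom;

public class EncryptedRecordSketch {

    // Write path: encrypt the already-serialized record bytes, then prepend the
    // encryption metadata so the reader can reverse the process. The metadata
    // layout here is an assumption for illustration only.
    public static byte[] encrypt(byte[] serializedRecord, String keyId, byte[] key) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);          // never reuse an IV with AES/GCM
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal(serializedRecord);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeUTF(keyId);                       // metadata: key ID
        out.writeUTF("AES/GCM/NoPadding");         // metadata: algorithm
        out.write(iv);                             // metadata: IV
        out.write(cipherText);                     // the encrypted serialized record
        return baos.toByteArray();
    }

    // Read path: parse the metadata, decrypt, and hand the plaintext bytes back
    // to the normal schema record reader for deserialization.
    public static byte[] decrypt(byte[] encryptedRecord, byte[] key) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encryptedRecord));
        String keyId = in.readUTF();               // parsed but unused in this sketch
        String algorithm = in.readUTF();
        byte[] iv = new byte[16];
        in.readFully(iv);
        byte[] cipherText = new byte[in.available()];
        in.readFully(cipherText);
        Cipher cipher = Cipher.getInstance(algorithm);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        return cipher.doFinal(cipherText);         // the original serialized bytes
    }
}
```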

Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process.

Performance

While there is an obvious performance cost to cryptographic operations, I tried to minimize the impact and to provide an estimate of the metrics of this implementation in comparison to existing behavior.

In general, with low flow event volume, the performance impact is not noticeable -- it is perfectly in line with WriteAheadProvenanceRepository and more than twice as fast as the existing PersistentProvenanceRepository.

[Image: benchmark results -- small event size, low volume]

With a much higher volume of events, the impact is felt in two ways. First, the throughput of the flow is slower, as more resources are dedicated to encrypting and serializing the events (note the total events processed/events per second). Second, the provenance queries are slightly slower than the original implementation (1% - 17%), and significantly slower than the new WriteAheadProvenanceRepository operating in plaintext (~110%). This is a known trade-off that will need to be evaluated by the deployment administrator given their threat model and risk assessment.

[Image: benchmark results -- small event size, high volume]

Remaining Efforts

  • Documentation -- as noted above, this effort is captured in NIFI-3721
  • Logging data leakage -- in various places, I noted that with logs set to DEBUG, the LuceneEventIndex printed substantial information from the event record to the log. If the repository is encrypted, an administrator would reasonably expect this potentially-sensitive information not to be printed to the logs. In this specific instance, I changed the log statements to elide this information, but an audit needs to occur for the complete project to detect other instances where this may occur. Ideally, this could be variable depending on the encryption status of the repository, but this would require changing the method signature, and I didn't want to tackle that now. This is captured in NIFI-3388
  • Other implementations -- While AES/GCM is (in my opinion) the best option for event encryption (it is AEAD which provides confidentiality and integrity, very fast, and does not need to be compatible with any external system), users may have requirements/requests for other algorithms
  • Other key providers -- as noted above, HSM is probably the biggest, but other software-based secure data stores like Vault or KeyWhiz, or JCEKS-backed to be compatible with Hadoop systems may be necessary
  • Refactoring shared code -- as part of the effort to provide encrypted repositories for content and flowfiles, some of this code will likely be moved to other modules

Potential Issues

  • Key rotation -- If a user wants to rotate the keys used, StaticKeyProvider does not provide a mechanism to support this. With FileBasedKeyProvider, they can simply specify a new key in the key provider file with nifi.provenance.repository.encryption.key.id in nifi.properties and future events will be encrypted with that key. Previously-encrypted events can still be decrypted as long as that key is still available in the key definition file
  • Switching between unencrypted and encrypted repositories
    • If a user has an existing repository that is not encrypted and switches their configuration to use an encrypted repository, the application writes an error to the log but starts up. However, previous events are not accessible through the provenance query interface and new events will overwrite the existing events. The same behavior occurs if a user switches from an encrypted repository to an unencrypted repository NIFI-3722
    • We should provide logic to handle encrypted -> unencrypted seamlessly as long as the key provider available still has the keys used to encrypt the events (see Key Rotation)
    • We should provide logic to handle unencrypted -> encrypted seamlessly as the previously recorded events simply need to be read with a plaintext schema record reader and then written back with the encrypted record writer
    • We should also provide a standalone tool in NiFi Toolkit to encrypt/decrypt an existing provenance repository to make the transition easier. The translation process could take a long time depending on the size of the existing repository, and being able to perform this task outside of application startup would be valuable NIFI-3723
  • Multiple repositories -- No additional effort or testing has been applied to multiple repositories at this time. It is possible, even likely, that issues will occur with repositories on different physical devices. There is no option to provide a heterogeneous environment (i.e. one encrypted, one plaintext repository).
  • Corruption -- when a disk fills or is corrupted, there have been reported issues with the repository becoming corrupted, requiring recovery steps. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual records (i.e. an entire repository file won't be irrecoverable due to the encryption)
  • Shutdown -- I noticed that switching from PersistentProvenanceRepository to EncryptedWriteAheadProvenanceRepository led to slower NiFi app shutdowns NIFI-3712. This was repeatable with WriteAheadProvenanceRepository, so I don't believe it is dependent on the encryption changes
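The key-rotation scenario described above can be illustrated with a hypothetical nifi.properties fragment (the key IDs and file path are illustrative). After adding a new entry, e.g. key2, to the key definition file while keeping key1 in place, new events are encrypted with key2 and old events remain readable:

```properties
# keys.nkp retains key1 (for decrypting old events) and gains key2.
# nifi.properties then points new writes at key2:
nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.FileBasedKeyProvider
nifi.provenance.repository.encryption.key.provider.location=./keys.nkp
nifi.provenance.repository.encryption.key.id=key2
nifi.provenance.repository.encryption.key=
```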

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

alopresto added some commits Mar 14, 2017

NIFI-3594 Added first unit test for PersistentProvenanceRepository operation.

Added BC dependency to nifi-persistent-provenance-repository module.
NIFI-3594 Added skeleton of encrypted provenance repository (KeyProvider w/ 2 impls, Encryptor skeleton, and exceptions/utilities).

Reorganized tests to proper path.
NIFI-3594 Added encryption methods and reflective property accessors. Pausing to re-evaluate because work may need to be done at lower level (EventWriter/EventReader -- byte/Object serialization).
NIFI-3594 Intermediate changes before discussion with Mark Payne about intercepting SchemaRecordReader/Writer serialization (no updates to schema necessary).
NIFI-3594 Moved (Keyed)CipherProvider classes & tests into nifi-security-utils to include in nifi-data-provenance-utils.
NIFI-3594 Implemented encrypted read, write, and seek operations.
Resolved RAT and checkstyle issues.
All tests pass.
NIFI-3594 Delegated reader and writer to use AESKeyedCipherProvider (enhanced error checking and guard controls).
NIFI-3594 Refactored AESProvenanceEventEncryptor implementation (removed cached ciphers to allow non-repeating IVs).

Added unit tests.
NIFI-3594 Added forAlgorithm static constructor for EncryptionMethod.
Added validity checks for algorithm and version in AESProvenanceEventEncryptor.
Added unit tests.
NIFI-3594 Refactored key availability interface contract.
Refactored encryptor composition.
Added unit tests.
NIFI-3594 Began adding configuration properties for encrypted provenance repository.

Added utility methods for validation.
Added unit tests.
NIFI-3594 Added new NiFi properties keys for provenance repository encryption.

Added nifi.provenance.repository.encryption.key to default sensitive keys and updated unit tests and test resources.
Added method to correctly calculate protected percentage of sensitive keys (unpopulated keys are no longer counted against protection %).
NIFI-3594 Implemented StaticKeyProvider and FileBasedKeyProvider.
Moved getBestEventIdentifier() from StandardProvenanceEventRecord to ProvenanceEventRecord interface and added delegate in all implementations to avoid ClassCastException from multiple classloaders.
Initialized IV before cipher to suppress unnecessary warnings.
Added utility method to read encrypted provenance keys from key provider file.
Suppressed logging of event record details in LuceneEventIndex.
Added logic to create EncryptedSchemaRecordReader (if supported) in RecordReaders.
Cleaned up EncryptedSchemaRecordReader and EncryptedSchemaRecordWriter.
Added keyProvider, recordReaderFactory, and recordWriterFactory initialization to EncryptedWriteAheadProvenanceRepository to provide complete interceptor implementation.
Added logic to RepositoryConfiguration to load encryption-related properties if necessary.
Refactored WriteAheadProvenanceRepository to allow subclass implementation.
Registered EncryptedWAPR in ProvenanceRepository implementations.
Added unit tests for EWAPR.
Added new nifi.properties keys for encrypted provenance repository.
...rc/main/java/org/apache/nifi/provenance/AESProvenanceEventEncryptor.java
}
// Skip the first byte (SENTINEL) and don't need to copy all the serialized record
byte[] metadataBytes = Arrays.copyOfRange(encryptedRecord, 1, encryptedRecord.length);

markap14 (Contributor) commented Apr 24, 2017

Rather than copying the byte[] here and then wrapping in a ByteArrayInputStream, would recommend we instead just wrap the original byte[] in a ByteArrayInputStream, then call in.read() to discard the first byte -- avoids a good bit of garbage creation/collection.
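The suggested approach can be sketched as follows. The class and method names are illustrative, not from the PR; the point is that read() consumes the sentinel without allocating a copy of the remaining bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class SentinelSkipSketch {

    // Sketch of the suggestion: wrap the original array and discard the leading
    // SENTINEL byte via read(), instead of allocating a second byte[] with
    // Arrays.copyOfRange(encryptedRecord, 1, encryptedRecord.length).
    public static DataInputStream metadataStream(byte[] encryptedRecord) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(encryptedRecord);
        int sentinel = in.read();            // consume the first byte; no new byte[] created
        if (sentinel == -1) {
            throw new IOException("Empty record");
        }
        return new DataInputStream(in);      // positioned at the serialized metadata
    }
}
```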

alopresto (Contributor) replied Apr 24, 2017

Good catch.

...ovenance-utils/src/main/java/org/apache/nifi/provenance/CryptoUtils.java
* @throws IOException this should never be thrown
*/
public static byte[] concatByteArrays(byte[]... arrays) throws IOException {
ByteArrayOutputStream boas = new ByteArrayOutputStream();

markap14 (Contributor) commented Apr 24, 2017

We're doing quite a lot of byte copying here, as the BAOS is potentially resizing a lot... then we call toByteArray() to copy it again. Would recommend calculating size of the arrays and then just creating an array of that size and copying the bytes directly. Alternatively, you could calculate the size of the arrays and pass that as an argument to BAOS so that it doesn't generate so much garbage - but this still has to copy the byte[] when baos.toByteArray() is called.


alopresto (Contributor) replied Apr 24, 2017

Ok.


alopresto (Contributor) replied Apr 24, 2017

I was curious about this, so I added an implementation as you described above and compared them. The BAOS is definitely faster for small byte[], but slower for large ones. Interesting.

  public static byte[] concatByteArrays(byte[]... arrays) throws IOException {
        int totalByteLength = 0;
        for (byte[] bytes : arrays) {
            totalByteLength += bytes.length;
        }
        byte[] totalBytes = new byte[totalByteLength];
        int currentLength = 0;
        for (byte[] bytes : arrays) {
            System.arraycopy(bytes, 0, totalBytes, currentLength, bytes.length);
            currentLength += bytes.length;
        }
        return totalBytes;
    }

    public static byte[] concatByteArraysWithBAOS(byte[]... arrays) throws IOException {
        ByteArrayOutputStream boas = new ByteArrayOutputStream();
        for (byte[] arr : arrays) {
            boas.write(arr);
        }
        return boas.toByteArray();
    }
Calculating small/small -- 3 arrays with avg length 11
Ran 100 of small/small (traditional) with a total wall time of 28720012 ns and average run of 195354 ns
Ran 100 of small/small (BAOS) with a total wall time of 6745072 ns and average run of 12300 ns
Calculating small/large -- 2 arrays with avg length 567
Ran 100 of small/large (traditional) with a total wall time of 2712642 ns and average run of 5807 ns
Ran 100 of small/large (BAOS) with a total wall time of 3774826 ns and average run of 11078 ns
Calculating large/small -- 145 arrays with avg length 8
Ran 100 of large/small (traditional) with a total wall time of 5173622 ns and average run of 27214 ns
Ran 100 of large/small (BAOS) with a total wall time of 5120322 ns and average run of 27663 ns
Calculating large/large -- 182 arrays with avg length 534
Ran 100 of large/large (traditional) with a total wall time of 11537912 ns and average run of 83769 ns
Ran 100 of large/large (BAOS) with a total wall time of 65845017 ns and average run of 612915 ns

markap14 (Contributor) replied Apr 24, 2017

So with 100 iterations, you're likely getting a lot of 'JVM warmup time' into your calculations... I'd go for at least 10,000 iterations as a 'warm up' that aren't even counted, then probably 100,000 - 1 MM iterations to test actual performance...


alopresto (Contributor) replied Apr 24, 2017

Yep, looks like with the higher iteration count the array copy is faster.

Ran 1000 iterations in 34702232 ns to warm up the JVM
Calculating small/small -- 2 arrays with avg length 5
Ran 1000000 of small/small (traditional) with a total wall time of 1339514066 ns and average run of 131 ns
Ran 1000000 of small/small (BAOS) with a total wall time of 1008624897 ns and average run of 83 ns
Calculating small/large -- 5 arrays with avg length 484
Ran 1000000 of small/large (traditional) with a total wall time of 1699096744 ns and average run of 805 ns
Ran 1000000 of small/large (BAOS) with a total wall time of 2978547636 ns and average run of 1975 ns
Calculating large/small -- 173 arrays with avg length 8
Ran 1000000 of large/small (traditional) with a total wall time of 2692231144 ns and average run of 1521 ns
Ran 1000000 of large/small (BAOS) with a total wall time of 3514575357 ns and average run of 2582 ns
Calculating large/large -- 147 arrays with avg length 579
Ran 1000000 of large/large (traditional) with a total wall time of 16443696576 ns and average run of 15418 ns
Ran 1000000 of large/large (BAOS) with a total wall time of 61356984740 ns and average run of 59737 ns
...ovenance-utils/src/main/java/org/apache/nifi/provenance/CryptoUtils.java
public static boolean keyIsValid(String encryptionKeyHex) {
return isHexString(encryptionKeyHex)
&& (isUnlimitedStrengthCryptoAvailable()
? Arrays.asList(32, 48, 64).contains(encryptionKeyHex.length())

markap14 (Contributor) commented Apr 24, 2017

This is called a lot - would recommend removing the Arrays.asList(32, 48, 64) and just creating a private static final List or Set, or since there are only 3 possible values, checking them individually
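The suggested change can be sketched as below. The class and field names are illustrative; the valid lengths (32/48/64 hex characters for AES-128/192/256 keys) come from the quoted code above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeyLengthSketch {

    // Sketch of the suggestion: hoist the valid hex-encoded key lengths into a
    // constant instead of building a new List via Arrays.asList(32, 48, 64) on
    // every call to keyIsValid().
    private static final Set<Integer> VALID_KEY_HEX_LENGTHS =
            new HashSet<>(Arrays.asList(32, 48, 64));

    public static boolean isValidKeyLength(String encryptionKeyHex) {
        return encryptionKeyHex != null
                && VALID_KEY_HEX_LENGTHS.contains(encryptionKeyHex.length());
    }
}
```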


alopresto (Contributor) replied Apr 24, 2017

Will do.


}
}
// TODO: Copied from EventIdFirstSchemaRecordReader to force local/overridden readRecord()

markap14 (Contributor) commented Apr 24, 2017

Was this TODO intended to remain here? Not sure what is actually to be done...


alopresto (Contributor) replied Apr 24, 2017

This was a note to myself/any reviewer about why I copied the entire method body from the other reader implementation. If I just referred to the other instance (i.e. comment out this entire method declaration), the operation will fail (see EncryptedSchemaRecordReaderWriterTest#testSkipToEvent()) even though the code is identical. If there is a better suggestion for how to fix this, I am all ears.


.../org/apache/nifi/provenance/EncryptedSchemaRecordReaderWriterTest.groovy
keyId == KEY_ID
}] as KeyProvider
provenanceEventEncryptor.initialize(mockKeyProvider)
//

markap14 (Contributor) commented Apr 24, 2017

Should this be removed?


alopresto (Contributor) replied Apr 24, 2017

Yes. Thanks.


...pache/nifi/provenance/EncryptedWriteAheadProvenanceRepositoryTest.groovy
private static final AtomicLong recordId = new AtomicLong()
// @Rule

markap14 (Contributor) commented Apr 24, 2017

There are a handful of places here where lines of code are commented out. Should probably remove.


alopresto (Contributor) replied Apr 24, 2017

I cleaned up the unnecessary ones. I think the ones that have the note about NIFI-3605 can stay because they have the correct code commented out to be switched back when that bug is patched.


markap14 (Contributor) commented Apr 24, 2017

@alopresto one thought regarding the StaticKeyProvider... rather than having a .key= perhaps it would make sense to instead use a .key.<key_id>= property scheme. This would allow multiple Key IDs to be used, which would resolve the 'Potential Issue' listed above of not allowing the key to rotate, with rather minimal changes to the code??


alopresto (Contributor) commented Apr 24, 2017

@markap14 That's not a bad suggestion. I had envisioned the StaticKeyProvider as a simple default that only had the one key and didn't revisit it after creating the FileBasedKeyProvider. I'll make the change and add a test.


alopresto added some commits Apr 24, 2017

NIFI-3594 Switched concatByteArrays implementation to manual concatenation of arrays.

Added unit test demonstrating performance improvement.
NIFI-3594 Improved byte[] handling code for performance/memory efficiency with Mark Payne's feedback.

Cleaned up commented code.
NIFI-3594 Added multiple key feature to StaticKeyProvider.
Refactored StaticKeyProvider and FileBasedKeyProvider to reduce duplicate code.
Added helper methods in NiFiProperties to read multiple key definitions for StaticKeyProvider.
Fixed undetected NPE in tests (storing null value into properties).
Added unit tests.
alopresto (Contributor) commented Apr 25, 2017

I had to make some substantial changes to enable multiple keys for StaticKeyProvider. Please exercise defining multiple keys using the following format in nifi.properties (note the format is nifi.provenance.repository.encryption.key.id.keyId):

nifi.provenance.repository.debug.frequency=100
nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
nifi.provenance.repository.encryption.key.provider.location=
nifi.provenance.repository.encryption.key.id=Key1
nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
nifi.provenance.repository.encryption.key.id.Key2=0000111122223333444455556666777788889999AAAABBBBCCCCDDDDEEEEFFFF

This should generate a key map of [Key1: 012...210, Key2: 000...FFF]. Anything defined in the default ...encryption.key property is assumed to be the value of the key ID in ...key.id, but it can also be empty, and in that case only the ...key.id.KeyN values will be used.
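To make the described property-to-key-map behavior concrete, here is a minimal sketch of how these definitions could be parsed. The class and method names are hypothetical (not the actual NiFiProperties helpers from the PR), and it assumes the naming scheme shown above: a default `...encryption.key` paired with `...encryption.key.id`, plus any number of `...encryption.key.id.KeyN` entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of reading the multiple-key definitions described above
// from nifi.properties into a key map. Names are illustrative, not the PR's API.
public class StaticKeyProviderSketch {
    private static final String KEY_ID = "nifi.provenance.repository.encryption.key.id";
    private static final String KEY = "nifi.provenance.repository.encryption.key";
    private static final String KEY_ID_PREFIX = KEY_ID + ".";

    static Map<String, String> readKeys(Properties props) {
        Map<String, String> keys = new HashMap<>();
        // The default ...encryption.key value is assumed to belong to ...key.id;
        // if it is empty, only the ...key.id.KeyN values are used
        String defaultKeyId = props.getProperty(KEY_ID);
        String defaultKey = props.getProperty(KEY);
        if (defaultKeyId != null && defaultKey != null && !defaultKey.isEmpty()) {
            keys.put(defaultKeyId, defaultKey);
        }
        // Every ...encryption.key.id.KeyN property defines an additional key
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(KEY_ID_PREFIX)) {
                keys.put(name.substring(KEY_ID_PREFIX.length()), props.getProperty(name));
            }
        }
        return keys;
    }
}
```

With the example properties above, this would yield a two-entry map keyed by `Key1` and `Key2`.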


alopresto added some commits Apr 25, 2017

NIFI-3594 Added Java JUnit test in nifi-data-provenance-utils module to trigger Maven running Groovy unit tests.
YolandaMDavis (Contributor) commented Apr 28, 2017

@alopresto happy to do some functional testing of this as well.


markap14 (Contributor) commented Apr 28, 2017

@alopresto @YolandaMDavis just a quick update on my review... I think the code looks good and am a +1 on that front. I had no problem getting this up & running and things seem to work well. I just wanted to verify some corner cases like running out of disk space, kill -9 against NiFi, etc., and ensure that NiFi is still able to restart, as this is an issue that some have hit in the past. Otherwise all looks good to me. Thanks!


YolandaMDavis (Contributor) commented Apr 28, 2017

@alopresto Ran a test where I had an existing unencrypted provenance repo and switched to the encrypted provider. The issues you've documented did occur (switching between configurations led to errors in the log and when querying). However, I'm wondering two things:

  1. Can there be a friendlier error message in the logs for those errors? I saw lots of ClassCastException errors between the old and new record readers. Could NiFi simply say it detected a mismatch and that it could be due to a change in repo providers?

  2. I also saw the UI error (screenshots omitted); however, I expected to eventually be able to retrieve the events that were subject to encryption, yet I am unable to. I have a super short and small flow that runs every 10s. Perhaps provenance hasn't been completely overwritten with the newer encrypted files in my case, but each time I attempt to query provenance I get the ClassCastException. I also tried changing my search query to a smaller window after I made the change, but I still encountered the error.

If clearing the provenance repo when changing configurations would resolve the issue, can that step be included in the documentation (if needed, I would recommend including that step in the issues section)?


alopresto (Contributor) commented May 1, 2017

@YolandaMDavis Thanks Yolanda.

During most of my testing, I did clear the repositories between application starts when switching the implementation (rm -rf provenance_repository/ content_repository/ flowfile_repository/ from $NIFI_HOME/conf). I did run tests (at the time) where I left the repository intact (in each direction) and my notes are:

- Verified switching from encrypted prov repo to plain will not allow access to previously-written events
  - Error in app log
    - Suppress stacktrace?
  - App still starts
  - UI error dialog on prov query
  - New provenance events overwrite previous
  - New prov queries work
- Verified switching from plain prov repo to encrypted will not allow access to previously-written events
  - Error in app log
    - Suppress stacktrace?
  - App still starts
  - UI error dialog on prov query
  - New provenance events overwrite previous
  - New prov queries work

I will re-evaluate those scenarios as soon as I finish reviewing PR 1712 for Bryan.

I do think suppressing the stacktrace and providing a more descriptive error is a good idea and will tackle that as well. If it is determined there is a simple reason your queries were not working, great. If not, adding documentation instructing repository removal may be necessary.


alopresto (Contributor) commented May 1, 2017

Yolanda,

I can reproduce your described issue. I believe the difference is that when I tested and did not clear the existing provenance repository, I was switching between EncryptedWriteAheadProvenanceRepository and WriteAheadProvenanceRepository, which both use a SchemaRecordReader that can be cross-cast (EncryptedRecordSchemaReader extends EventIdFirstRecordSchemaReader), so the class cast exception wasn't occurring. However, if you test by first using PersistentProvenanceRepository, this uses ByteArraySchemaRecordReader, which implements the same RecordReader interface but cannot be cast. As PersistentProvenanceRepository is the existing option that most people will be using, the production switch-over will encounter this issue. I will try to improve this user experience (opening a separate sub-task to do that). For now, I've included documentation in NIFI-3721 (see commit 2361ef0) stating that the existing repository should be erased when switching (as this is a new feature and backward compatibility is not yet provided).
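As an illustration of the friendlier error message discussed above, one option is a defensive wrapper around the read path that translates the raw ClassCastException into a descriptive exception. This is a sketch only; the class, interface, and method names are hypothetical and not from the PR:

```java
import java.io.IOException;

// Hypothetical guard that converts a reader-incompatibility ClassCastException
// into a descriptive IOException pointing at the likely cause.
public class RecordReaderGuard {

    @FunctionalInterface
    interface ReaderCall<T> {
        T read() throws IOException;
    }

    static <T> T readWithGuard(ReaderCall<T> call) throws IOException {
        try {
            return call.read();
        } catch (ClassCastException e) {
            // Surface a mismatch hint instead of the raw stacktrace
            throw new IOException(
                "Detected a provenance record reader mismatch; this can occur when "
                + "nifi.provenance.repository.implementation was changed without "
                + "clearing the existing provenance repository", e);
        }
    }
}
```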


alopresto added a commit to alopresto/nifi that referenced this pull request May 1, 2017

NIFI-3721 Added note about clearing existing provenance repository when switching to encrypted implementation (see PR 1686 @ apache#1686 (comment)).

asfgit pushed a commit that referenced this pull request May 1, 2017

NIFI-3721 Added documentation for Encrypted Provenance Repositories to Admin Guide and User Guide.

Added screenshot of encrypted provenance repository contents on disk.
Added note about clearing existing provenance repository when switching to encrypted implementation (see PR 1686 @ #1686 (comment)).

This closes #1713.

Signed-off-by: Andy LoPresto <alopresto@apache.org>
alopresto (Contributor) commented May 1, 2017

The "improve switching UX" ticket is NIFI-3766. I propose merging this as is given the new documentation providing a work-around.


YolandaMDavis (Contributor) commented May 2, 2017

@alopresto moving to a separate ticket makes sense. I have one more test to run (run out of disk space per @markap14 suggestion) and will provide any feedback shortly.


YolandaMDavis (Contributor) commented May 2, 2017

@alopresto ran tests with the following corner cases:
- Flow running with normal shutdown invoked: NiFi shut down and started without issue.
- Flow running with kill -9 on the process: the NiFi process was terminated and I was able to start it without issue.
- Flow running until disk space was exceeded: I was able to shut down NiFi, clear out the disk problem, and start NiFi without issue.

Also validated that switching between EWAPR and PPR (and vice versa), along with removal of the content, flowfile, and provenance repos, prevents the ClassCastException seen previously.

+1

will merge shortly


YolandaMDavis (Contributor) commented May 2, 2017

@alopresto actually Andy just noticed there's a conflict. If you could resolve, squash and push I'll be happy to continue with merging into master.


alopresto (Contributor) commented May 2, 2017

I rebased and resolved the conflict. There were test errors in nifi-gcp-processors and nifi-poi-processors where MockComponentLog#getErrorMessages() was returning twice the number of expected errors. With @bbende , we determined this was because src/test/resources/logback-test.xml in nifi-data-provenance-utils set the log level to DEBUG, and MockComponentLog double records the error message if DEBUG is enabled. I provided custom src/test/resources/logback-test.xml files in the two offending modules to ensure that the log level is set as expected when those tests run.
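For reference, a minimal `logback-test.xml` that pins the test log level looks roughly like the following. This is a sketch of the standard logback configuration shape, not a copy of the files added in the PR, which may differ:

```xml
<configuration>
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%-4r [%t] %-5p %c - %m%n</pattern>
        </encoder>
    </appender>

    <!-- Pin the level above DEBUG so MockComponentLog does not double-record errors -->
    <root level="WARN">
        <appender-ref ref="CONSOLE" />
    </root>
</configuration>
```

Placing this file in a module's `src/test/resources` overrides any `logback-test.xml` leaking in from another module's test classpath.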

Merged.


@asfgit asfgit closed this in 7d24207 May 2, 2017

peter-gergely-horvath added a commit to peter-gergely-horvath/nifi that referenced this pull request May 3, 2017

NIFI-3594 Implemented encrypted provenance repository.
Added src/test/resources/logback-test.xml files resetting log level from DEBUG (in nifi-data-provenance-utils) to WARN because later tests depend on MockComponentLog recording a certain number of messages and this number is different than expected if the log level is DEBUG.

This closes #1686.

Signed-off-by: Bryan Bende, Yolanda M. Davis, and Mark Payne