[Feature] Improve key-pair auth performance #1082
Useful docs I used for testing: https://docs.snowflake.com/en/user-guide/key-pair-auth

```shell
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```

```sql
ALTER USER <you> SET RSA_PUBLIC_KEY=''; -- yes this will require admin help; don't include header/footer strings
```
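Since the `ALTER USER` statement wants the base64 body only, a small helper (hypothetical, not part of any library) can strip the header/footer strings and newlines from the generated `rsa_key.pub`:

```python
# Hypothetical helper: format a PEM public key for
# ALTER USER <you> SET RSA_PUBLIC_KEY='<value>';
# Snowflake expects the base64 body with no BEGIN/END lines and no newlines.
def public_key_for_snowflake(pem: str) -> str:
    return "".join(
        line.strip()
        for line in pem.splitlines()
        if line and "-----" not in line  # drop the BEGIN/END header and footer
    )
```

For example, `public_key_for_snowflake(open("rsa_key.pub").read())` returns a single-line string you can paste into the statement.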
Hey team! I found this issue after switching from a 2048-bit key-pair to a 4096-bit one, which increased my dbt run times from ~5 minutes to ~15 minutes. I had a bit of a dig into this and figured I'd share some findings (and I can put together some suggestions for changes as well if there's a preference on how this is handled in the project). The core issue seems to be that the private key is being read (and validated - I'll get to that) on every dbt node, which eats up the time.

First - this took me too long to find, but the easiest solution seems to be using the `reuse_connections` profile config. Maybe it could be suggested in the key-pair authentication section of the docs?

Anyway - to test what was going on, I ran a few experiments using the Snowflake Connector for Python and found that (with a fixed private key) execution times were virtually the same across the password, 2048-bit key, and 4096-bit key options. I was a bit surprised by this, so I decided to check how long loading keys actually takes. My test script looked like this:

```python
import time

from cryptography.hazmat.primitives import serialization


def benchmark_key_import(key: str, unsafe_skip_rsa_key_validation: bool = False, n_tries: int = 1000):
    start = time.time()
    for _ in range(n_tries):
        private_key = serialization.load_pem_private_key(
            data=bytes(key, 'utf-8'),
            password=None,
            unsafe_skip_rsa_key_validation=unsafe_skip_rsa_key_validation,
        )
        key_bytes = private_key.private_bytes(
            encoding=serialization.Encoding.DER,
            format=serialization.PrivateFormat.PKCS8,
            encryption_algorithm=serialization.NoEncryption(),
        )
    end = time.time()
    print(end - start)
```

Some results from that:
That validation cost is pretty substantial, which is why I didn't want to immediately put a PR together. The cryptography docs don't provide much detail on exactly what's unsafe about skipping validation, and I don't know nearly enough about the security side to say for sure. However, the Snowflake Connector for Python also uses cryptography, and it accepts either bytes (which it reads with validation) or an instance of `RSAPrivateKey` (which would already be validated). Essentially, this means dbt-snowflake can (and perhaps should) skip validation, since it's already done later by the Snowflake Connector and there's no value in doing it twice.

Caching would of course help as well; I can't think of a case where the key changes during an execution such that a cached result becomes invalid (and if one exists, the key could be stored in a dictionary keyed by its contents instead), but my sense based on the above is that skipping validation would be the more sensible solution and would effectively remove the need for caching.

Beyond that, and as mentioned at the top, I think enabling the `reuse_connections` profile config is worth recommending in the key-pair auth docs.
Describe the feature
@VersusFacit's analysis:
400 vs 1000 is quite a dramatic difference!
Avenues for investigation: