Use Key Pairs to connect to Snowflake using PySpark.
Snowflake data can be accessed from different Snowflake clients (e.g. SnowSQL CLI, JDBC Driver, Snowflake Connector for Spark, etc.). For more details on Snowflake connectors and drivers, see the Snowflake documentation.
While Snowflake allows basic (username and password) authentication, it also supports key pair (RSA) authentication for enhanced security.
This guide shows how quickly we can build a PySpark application that connects to Snowflake with a key pair.
- A Snowflake account (you can use the free trial for 30 days)
- Spark installed
- The Snowflake Connector for Spark installed
- Any IDE/text editor to write the PySpark code
1. Generate Private Key
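Snowflake allows both encrypted and unencrypted keys, but some clients (e.g. SnowSQL CLI) accept encrypted keys only. Encrypted keys are also the recommended option. Here we will use OpenSSL to create the keys. Run only one of the two commands below, since both write to the same file, sf_rsa_key.p8. When creating the encrypted key, OpenSSL prompts for a passphrase.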
- Create Unencrypted key
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out sf_rsa_key.p8 -nocrypt
- Create Encrypted key
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out sf_rsa_key.p8
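Since both commands write to the same file name, it is easy to lose track of which kind of key we ended up with. A minimal sketch for checking this, assuming the file is PKCS#8 PEM as produced by the commands above (the function name `private_key_kind` is just an illustration, not part of any Snowflake tool):

```python
def private_key_kind(pem_text: str) -> str:
    """Classify a PKCS#8 PEM private key by its first non-empty line."""
    for line in pem_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line == "-----BEGIN ENCRYPTED PRIVATE KEY-----":
            return "encrypted"
        if line == "-----BEGIN PRIVATE KEY-----":
            return "unencrypted"
        # Any other header (or non-PEM content) is not what step 1 produces.
        return "unknown"
    return "unknown"
```

For example, `private_key_kind(open("sf_rsa_key.p8").read())` should return "encrypted" for a key created without the -nocrypt flag.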
2. Generate Public Key
This step creates the public key from the private key. If the private key is encrypted, OpenSSL prompts for its passphrase.
$ openssl rsa -in sf_rsa_key.p8 -pubout -out sf_rsa_key.pub
3. Assign public key to Snowflake user
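Open the public key file in a text editor (I have used VSCode) and copy the key, leaving out the "-----BEGIN PUBLIC KEY-----" and "-----END PUBLIC KEY-----" delimiter lines. Then execute the statement below from the Snowflake UI or CLI.
alter user <username> set RSA_PUBLIC_KEY = '<key-value>';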
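Copying the key by hand is error-prone, so here is a small sketch that builds the statement from the contents of the public key file. It assumes the key value can be passed as one line without the PEM delimiters; the function names and the `demo_user` name in the usage note are made up for illustration, and the username is inserted as-is without any validation.

```python
def public_key_body(pem_text: str) -> str:
    """Return the base64 body of a PEM block, without delimiters or newlines."""
    lines = [ln.strip() for ln in pem_text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith("-----"))


def alter_user_statement(username: str, pem_text: str) -> str:
    """Build the ALTER USER statement from step 3 for the given public key."""
    # Assumption: username is a trusted, plain identifier.
    return (
        f"alter user {username} set RSA_PUBLIC_KEY = "
        f"'{public_key_body(pem_text)}';"
    )
```

Calling `alter_user_statement("demo_user", open("sf_rsa_key.pub").read())` gives a statement that can be pasted into the Snowflake UI or CLI.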
4. Verify
Use the command below to verify that the public key was added. The RSA_PUBLIC_KEY_FP property in the output shows the fingerprint of the assigned key.
desc user <username>;
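To compare the fingerprint reported by Snowflake with the local key, we can compute it ourselves. The sketch below assumes the fingerprint is "SHA256:" followed by the base64-encoded SHA-256 digest of the DER-encoded public key, which is how it appears in the RSA_PUBLIC_KEY_FP property as far as I know; the function name is hypothetical.

```python
import base64
import hashlib


def public_key_fingerprint(pem_text: str) -> str:
    """Compute a SHA256:<base64> fingerprint from a PEM public key."""
    lines = [ln.strip() for ln in pem_text.splitlines()]
    body = "".join(ln for ln in lines if ln and not ln.startswith("-----"))
    der = base64.b64decode(body)            # PEM body is base64-encoded DER
    digest = hashlib.sha256(der).digest()   # raw 32-byte digest
    return "SHA256:" + base64.b64encode(digest).decode("ascii")
```

If the value returned for sf_rsa_key.pub matches RSA_PUBLIC_KEY_FP, the user has the right public key assigned.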
5. Configure Snowflake Client (in this case a PySpark script) to use RSA authentication
The PySpark code is added here. The code reads the private key, creates a Spark session, builds the Snowflake context (the connection options) and then finally connects to Snowflake to read data.
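For reference, a script along these lines could look like the sketch below. It is not necessarily identical to the original TestSnowflakeRSA.py. The `SF_*` environment variable names and the helper function names are assumptions made for this example, and the third-party `cryptography` package is only needed when the key is encrypted. The option names (`sfURL`, `sfUser`, `pem_private_key`, and so on) are those of the Snowflake Connector for Spark.

```python
import os


def pem_body(pem_text):
    """Drop the BEGIN/END delimiter lines and all whitespace from a PEM block."""
    lines = [ln.strip() for ln in pem_text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith("-----"))


def load_private_key_body(path, passphrase=None):
    """Read the key file and return an unencrypted PKCS#8 base64 body."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if passphrase is None:
        return pem_body(raw.decode("utf-8"))
    # Encrypted key: decrypt it in memory with the 'cryptography' package.
    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import serialization
    key = serialization.load_pem_private_key(
        raw, password=passphrase.encode(), backend=default_backend()
    )
    pem = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    return pem_body(pem.decode("utf-8"))


def snowflake_options(env, private_key_body):
    """Build the connector options; the SF_* names are hypothetical."""
    return {
        "sfURL": env["SF_URL"],
        "sfUser": env["SF_USER"],
        "sfDatabase": env["SF_DATABASE"],
        "sfSchema": env["SF_SCHEMA"],
        "sfWarehouse": env["SF_WAREHOUSE"],
        "pem_private_key": private_key_body,  # used instead of a password
    }


def main():
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("TestSnowflakeRSA").getOrCreate()
    key_body = load_private_key_body(
        os.environ["SF_PRIVATE_KEY_PATH"],
        os.environ.get("SF_PRIVATE_KEY_PASSPHRASE"),
    )
    df = (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**snowflake_options(os.environ, key_body))
        .option("query", "select current_user(), current_timestamp()")
        .load()
    )
    df.show()
    spark.stop()


# Only connect when the (assumed) settings are present in the environment.
if __name__ == "__main__" and "SF_URL" in os.environ:
    main()
```

Keeping the passphrase in an environment variable rather than in the script avoids committing it to source control.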
To execute the code, use the command below.
$ spark-submit .../TestSnowflakeRSA.py
Tip: Add these jars to the Spark classpath: snowflake-jdbc-3.13.3.jar and spark-snowflake_2.12-2.8.5-spark_3.0.jar.
- This authentication method requires, at a minimum, a 2048-bit RSA key pair.
- Snowflake supports uninterrupted rotation of public keys by using two RSA public key properties on the user (RSA_PUBLIC_KEY and RSA_PUBLIC_KEY_2).
- Creating an encrypted private key requires a passphrase. Snowflake recommends a passphrase that complies with PCI DSS standards.
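The key rotation mentioned in the notes above can be sketched as two statements run at different times. This assumes the current key is stored in RSA_PUBLIC_KEY and the new one goes into RSA_PUBLIC_KEY_2; the function name and the sample values in the usage note are hypothetical.

```python
def rotation_statements(username, new_public_key_body):
    """Return the statements for rotating to a new public key, in order."""
    return [
        # 1) Register the new key alongside the old one; both work for now.
        f"alter user {username} set RSA_PUBLIC_KEY_2 = '{new_public_key_body}';",
        # 2) After every client uses the new private key, remove the old key.
        f"alter user {username} unset RSA_PUBLIC_KEY;",
    ]
```

For example, `rotation_statements("demo_user", "MIIBIjAN")` returns the two statements to run before and after switching the PySpark script to the new private key.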