
Create SQL Alchemy connection string using Okta IdP #67

Closed
namtonthat opened this issue Nov 11, 2021 · 13 comments

Comments

@namtonthat

Driver version

redshift_connector 2.0.889

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.31651

Client Operating System

macOS Monterey

Python version

3.9.7

Problem description

import redshift_connector

# An Okta auth page will pop up when initiating this connector
conn = redshift_connector.connect(
    host='<cluster_identifier>.ap-southeast-2.redshift.amazonaws.com',
    port=5439,
    region='ap-southeast-2',
    database='<db_name>',
    credentials_provider='BrowserSamlCredentialsProvider',
    login_url='<saml_url>',
    cluster_identifier='<cluster_identifier>',
    preferred_role='arn:aws:iam::<iam-id>:role/<role-name>',
    user='',        # left empty string
    password='',    # left empty string
    iam=True,
    ssl=True
)

cursor: redshift_connector.Cursor = conn.cursor()

Currently, I can connect to Redshift using the above credentials, but when trying to use the SQLAlchemy connection string it causes an identification error (using sqlalchemy-redshift). Is there a way to generate an appropriate URL for SQLAlchemy to parse for IdP authentication?

I've noticed that redshift_connector was recently updated to work with sqlalchemy-redshift, but I've had no luck generating an appropriate login URL.

@Brooke-white
Contributor

Hey @nam-tonthat-afterpay ,

Below is an example showing how to establish a connection to Redshift using Okta Browser authentication. Please note that the user and password fields are no longer required (i.e., since you're using the browser authentication plugin, they can be excluded).

import sqlalchemy as sa
from sqlalchemy.engine.url import URL

url = URL.create(
    drivername='redshift+redshift_connector',  # indicate redshift_connector driver and dialect will be used
)

conn_params = {
    "ssl": True,
    "iam": True,  # must be enabled when authenticating via IAM
    "credentials_provider": "BrowserSamlCredentialsProvider",
    "database": '<db_name>',  # Amazon Redshift database
    "region": "ap-southeast-2",
    "cluster_identifier": "<cluster_identifier>",
    "login_url": "<saml_url>"
}

engine = sa.create_engine(url, connect_args=conn_params)

with engine.connect() as connection:
    result = connection.execute("select 1")
    for row in result:
        print(row)

engine.dispose()

Please let me know if you see any issue or have any questions.

@namtonthat
Author

namtonthat commented Nov 12, 2021

Hey @Brooke-white,

I appreciate that! It works when run by itself, but once it's integrated into Amundsen, where botocore/client.py makes the API call, it returns this error.

File "/Users/nam.tonthat/miniconda/envs/amundsen/lib/python3.9/site-packages/redshift_connector/iam_helper.py", line 371, in set_cluster_credentials
    response = client.describe_clusters(ClusterIdentifier=info.cluster_identifier)
  File "/Users/nam.tonthat/miniconda/envs/amundsen/lib/python3.9/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Users/nam.tonthat/miniconda/envs/amundsen/lib/python3.9/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ClusterNotFoundFault: An error occurred (ClusterNotFound) when calling the DescribeClusters operation: 

This uses the sql_alchemy_extractor.py found here.

More context, I've used the following versions:

  • amundsen-common = 0.21.0
  • amundsen-databuilder = 6.3.1
  • amundsen-rds = 0.0.6
  • sqlalchemy = 1.4.22
  • sqlalchemy-redshift = 0.8.8
  • botocore = 1.23.2

Any idea on how I should start debugging this issue?

@Brooke-white
Contributor

Hi @nam-tonthat-afterpay ,

Thanks for providing this additional information. I'd recommend trying the following:

  • Enable debug logging for redshift_connector and verify nothing looks suspicious in the logs (e.g. region, cluster_identifier)
  • Are the cluster_identifier and region used for establishing the connection in line with what is seen in the debug logs? As a sanity check, does the cluster exist in the given region?

The error provided is being thrown after the authentication process with Okta has completed (i.e., we have temporary IAM credentials), when redshift_connector tries to retrieve the host and port of the cluster with the given cluster_identifier. It's a good sign that we get far enough to check whether the cluster named by cluster_identifier exists.

In my experience, this exception occurs when authentication has been successful but there is some mismatch between cluster_identifier and region.

If you still see issues after trying the above, please let me know and I will work to reproduce on my end.

@namtonthat
Author

Hey @Brooke-white, do you have any tutorials on how to enable debug logging for redshift_connector? I'm still newish to logging.

I've tried using the Python tutorial but haven't had much luck.

@Brooke-white
Contributor

Hey @nam-tonthat-afterpay ,

Apologies for the late response. Here's an example of configuring debug logging to output to stdout.

This is likely easier than outputting the logs to a file, given your Amundsen instance is probably running in a Docker container: a log file would end up there rather than on your local file system :)

import redshift_connector
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)  # will write all logs with a level >= DEBUG
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)

with redshift_connector.connect(
...
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("select 1")
        print(cursor.fetchall())

For me, this outputs:

2021-11-19 09:31:02,869 - redshift_connector - DEBUG - ===================================
2021-11-19 09:31:02,869 - redshift_connector - DEBUG - User provided connection arguments
2021-11-19 09:31:02,869 - redshift_connector - DEBUG - ===================================
2021-11-19 09:31:02,869 - redshift_connector - DEBUG - {'access_key_id': None, 'allow_db_user_override': False, 'app_id': None, 'app_name': 'amazon_aws_redshift',....}

ref: https://stackoverflow.com/a/14058475
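As a side note (not from the thread, just a sketch): if the root-logger approach above pulls in too much DEBUG noise from botocore and urllib3, the handler can instead be attached to the `redshift_connector` logger alone. A minimal stdlib example, using an in-memory stream as a stand-in for stdout:

```python
import io
import logging

# Scope the handler to the "redshift_connector" logger only, so DEBUG
# output from other libraries on the root logger is not captured.
logger = logging.getLogger("redshift_connector")
logger.setLevel(logging.DEBUG)

stream = io.StringIO()  # stand-in for sys.stdout
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)
handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

logger.debug("User provided connection arguments")
print("redshift_connector - DEBUG - User provided connection arguments" in stream.getvalue())  # True
```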

@namtonthat
Author

namtonthat commented Nov 26, 2021

Hey @Brooke-white, thanks for the heads up; I've done some more testing and the cluster_identifier and region look correct, but when the query starts it returns the ClusterNotFound error.

2021-11-26 15:16:40,716 - botocore.endpoint - DEBUG - Making request for OperationModel(name=DescribeClusters) with params: {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'Boto3/1.18.21 Python/3.9.7 Darwin/21.1.0 Botocore/1.21.41'}, 'body': {'Action': 'DescribeClusters', 'Version': '2012-12-01', 'ClusterIdentifier': 'dwh'}, 'url': 'https://redshift.ap-southeast-2.amazonaws.com/', 'context': {'client_region': 'ap-southeast-2', 'client_config': <botocore.config.Config object at 0x7fa1685b5040>, 'has_streaming_input': False, 'auth_type': None}}
2021-11-26 15:16:40,716 - botocore.hooks - DEBUG - Event request-created.redshift.DescribeClusters: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7fa1685aff70>>
2021-11-26 15:16:40,716 - botocore.hooks - DEBUG - Event choose-signer.redshift.DescribeClusters: calling handler <function set_operation_specific_signer at 0x7fa168342b80>
2021-11-26 15:16:40,717 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2021-11-26 15:16:40,717 - botocore.auth - DEBUG - CanonicalRequest:
POST
/
2021-11-26 15:16:41,091 - urllib3.connectionpool - DEBUG - https://redshift.ap-southeast-2.amazonaws.com:443 "POST / HTTP/1.1" 404 287
2021-11-26 15:16:41,092 - botocore.parsers - DEBUG - Response headers: {'x-amzn-RequestId': '934b5778-7693-4be6-ae70-7efdbdcf975b', 'Content-Type': 'text/xml', 'Content-Length': '287', 'Date': 'Fri, 26 Nov 2021 04:16:40 GMT'}
2021-11-26 15:16:41,092 - botocore.parsers - DEBUG - Response body:
b'<ErrorResponse xmlns="http://redshift.amazonaws.com/doc/2012-12-01/">\n  <Error>\n    <Type>Sender</Type>\n    <Code>ClusterNotFound</Code>\n    <Message>Cluster dwh not found.</Message>\n  </Error>\n  <RequestId>934b5778-7693-4be6-ae70-7efdbdcf975b</RequestId>\n</ErrorResponse>\n'
2021-11-26 15:16:41,094 - botocore.parsers - DEBUG - Response headers: {'x-amzn-RequestId': '934b5778-7693-4be6-ae70-7efdbdcf975b', 'Content-Type': 'text/xml', 'Content-Length': '287', 'Date': 'Fri, 26 Nov 2021 04:16:40 GMT'}
2021-11-26 15:16:41,095 - botocore.parsers - DEBUG - Response body:
b'<ErrorResponse xmlns="http://redshift.amazonaws.com/doc/2012-12-01/">\n  <Error>\n    <Type>Sender</Type>\n    <Code>ClusterNotFound</Code>\n    <Message>Cluster dwh not found.</Message>\n  </Error>\n  <RequestId>934b5778-7693-4be6-ae70-7efdbdcf975b</RequestId>\n</ErrorResponse>\n'
2021-11-26 15:16:41,095 - botocore.hooks - DEBUG - Event needs-retry.redshift.DescribeClusters: calling handler <botocore.retryhandler.RetryHandler object at 0x7fa1685b5970>
2021-11-26 15:16:41,095 - botocore.retryhandler - DEBUG - No retry needed.
2021-11-26 15:16:41,096 - redshift_connector.iam_helper - ERROR - ClientError: An error occurred (ClusterNotFound) when calling the DescribeClusters operation: Cluster dwh not found.

Any ideas for this error?

@Brooke-white
Contributor

Hey @nam-tonthat-afterpay ,

Could you verify that the IAM SAML federation role was created in line with these instructions:

To set up the service provider, complete steps 5–7 from Federate Amazon Redshift access with Okta as an identity provider.

For step 7, the Amazon Redshift connector needs the following additional permission:

{
    "Sid": "DescribeClusters",
    "Effect": "Allow",
    "Action": "redshift:DescribeClusters",
    "Resource": "*"
}

If the role does not provide permissions to DescribeClusters, this could prevent us from being able to see the dwh cluster.
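To sanity-check a policy document locally (illustrative only; the `allows` helper below is hypothetical, not part of any AWS SDK), one could scan its statements for the required action:

```python
# Check whether a policy document (e.g. as retrieved from IAM) contains
# an Allow statement for a given action. Hypothetical helper for
# illustration; exact-match only, no wildcard expansion.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DescribeClusters",
            "Effect": "Allow",
            "Action": "redshift:DescribeClusters",
            "Resource": "*",
        }
    ],
}

def allows(policy: dict, action: str) -> bool:
    for stmt in policy["Statement"]:
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if stmt["Effect"] == "Allow" and action in actions:
            return True
    return False

print(allows(policy, "redshift:DescribeClusters"))  # True
```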

@namtonthat
Author

namtonthat commented Dec 1, 2021

Hi @Brooke-white ,
I have checked and confirmed that the role has the permission allowing Action: redshift:DescribeClusters. I've done some debugging and made some progress, but have run into another error.

On a simple query like this, it returns the ClusterNotFound error on the first run, but on subsequent runs it always returns (1,) as the output, which is fantastic.

import redshift_connector
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)  # will write all logs with a level >= DEBUG
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)



import sqlalchemy as sa
from sqlalchemy.engine.url import URL

url = URL.create(
    drivername='redshift+redshift_connector',  # indicate redshift_connector driver and dialect will be used
)

conn_params = {
    "ssl": True,
    "iam": True,  # must be enabled when authenticating via IAM
    "credentials_provider": "BrowserSamlCredentialsProvider",
    "database": 'vega',  # Amazon Redshift database
    "region": "ap-southeast-2",
    "cluster_identifier": "dwh",
    "login_url": "<okta_login>"
}

engine = sa.create_engine(url, connect_args=conn_params)

with engine.connect() as connection:
    result = connection.execute("select 1")
    for row in result:
        print(row)

engine.dispose()
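A transient first-run failure like the one described can be smoothed over with a small retry wrapper. A hedged sketch (not from the thread; `connect_once` here is a stand-in that fails once and then succeeds, mimicking the observed behaviour):

```python
import time

# Stand-in for the engine.connect()/execute call above: fails on the
# first attempt with a ClusterNotFound-style error, succeeds afterwards.
attempts = {"n": 0}

def connect_once():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("ClusterNotFound")
    return (1,)

def with_retry(fn, tries=3, delay=0.0):
    """Retry fn up to `tries` times, re-raising on the final failure."""
    for i in range(tries):
        try:
            return fn()
        except RuntimeError:
            if i == tries - 1:
                raise
            time.sleep(delay)

print(with_retry(connect_once))  # (1,)
```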

However, repeating it for the Amundsen query (adjusting the sample_postgres_loader.py to suit) will always cause a ClusterNotFound connection error.

Any ideas on what could be the issue?

@Brooke-white
Contributor

Hi @nam-tonthat-afterpay,

Can you confirm if the Python script which only uses redshift_connector is now consistently connecting?

On a simple query like this, it returns the ClusterNotFound error on the first run but on subsequent runs, it always returns (1,) as the output which is fantastic.

When this behavior occurs, what environment(s) are used to run the Python script (e.g. Jupyter)?

However, repeating it for the amundsen query (adjusting the sample_postgres_loader.py to suit), will always cause a ClusterNotFound connection error.

Could you share the modified sample_postgres_loader.py so I can work on reproducing this behavior? Also, how is sample_postgres_loader.py being run? Is the idea that it is manually invoked or done so as a scheduled task?

@Brooke-white
Contributor

Hi @nam-tonthat-afterpay ,

Following up to see if you're still experiencing issues here.

@namtonthat
Author

Hi @Brooke-white , yes I am - I'm just a little tied up with some tickets that I have to close off - will update this thread by tomorrow! Apologies for the wait :'(.

@namtonthat
Author

Hi @Brooke-white, I did a fresh install this morning and tried again. Looks like everything is working perfectly, so I'm closing the thread. The debug logging helped me iron out the issues. Thank you again for your help and patience on this!

@Brooke-white
Contributor

yay! I'm happy to hear this, @nam-tonthat-afterpay. Best wishes on your project :)
