redshift-connector giving connection time-outs on Codebuild. #212

@gecaro

Driver version

2.0.918

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.63282

Client Operating System

Docker container: Debian GNU/Linux 11 (bullseye) on python3.9 docker image
Codebuild: Using aws/codebuild/standard:7.0

Python version

python3.9

Table schema

Problem description

  1. Expected behaviour:
    Codebuild machine connecting correctly to redshift.
  2. Actual behaviour:
    Codebuild machine not connecting to redshift.
  3. Error message/stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
    self._usock.connect(hostport)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
    with redshift_connector.connect(
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
    return Connection(
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
    raise InterfaceError("communication error", e)
redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))
  4. Any other details that can be helpful:
  • We use dbt, a framework for data modeling. We are currently on dbt 1.4.5, which uses psycopg2.
  • We use CodeBuild as our CI. So far, dbt commands such as dbt compile or dbt run have connected to Redshift from our CodeBuild environment without problems.
  • We have tried several times to upgrade dbt to versions 1.5 through 1.7.8 (the latest), which use redshift-connector, without success: the connection times out repeatedly.
  • IMPORTANT: Sometimes we are able to connect, but most of the time we aren't.

Things we've tried:

  • Using a custom redshift-connector script to run a simple query (to rule out dbt itself as the cause) -> Same result: connection time-out
  • Using a custom psycopg2 script to run a simple query (to rule out dbt itself as the cause) -> THIS WORKS!
  • Running psql -h <host> -p 5439 -U <user> -d <db> -> Same result: connection time-out

It is also important to note that none of the above approaches has any problem when run on our local machines.
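
The traceback fails inside `self._usock.connect(hostport)` with `[Errno 110]`, which on Linux is `ETIMEDOUT`, so the TCP connection itself appears not to complete before any driver-level exchange happens. One way to check this independently of redshift-connector, psycopg2 and psql is to probe the port with a plain socket from the same CodeBuild container. The following is only a diagnostic sketch: `probe_tcp` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```python
import socket
import time


def probe_tcp(host, port=5439, timeout=10.0):
    """Try a plain TCP connect to every address `host` resolves to.

    Returns a list of (ip, ok, detail) tuples, where detail is the
    elapsed time on success or the error text on failure.
    (Hypothetical helper, not part of any driver.)
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [(host, False, f"DNS resolution failed: {exc}")]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            elapsed = time.monotonic() - start
            results.append((sockaddr[0], True, f"{elapsed:.3f}s"))
        except OSError as exc:
            # Covers timeouts, refused connections and unreachable networks.
            results.append((sockaddr[0], False, str(exc)))
    return results
```

Calling `probe_tcp("<host>")` a few times from the build container and printing the tuples would show whether the host resolves to several addresses and whether only some of them accept connections, which might help explain why the failure is intermittent.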

Python Driver trace logs

Reproduction code

import redshift_connector
import os
import time

schema = os.environ.get("DBT_TARGET_SCHEMA")
# Establish a connection to the database
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""
with redshift_connector.connect(
    host="<host>",
    database="dbt_ci",
    # user=os.environ.get("DBT_PROFILE_USER"),
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
    timeout=999999,
) as conn:
    # Create a new cursor
    with conn.cursor() as cursor:
        start_time = time.time()
        # Execute the SQL query
        cursor.execute(query)
        rows = cursor.fetchall()
        for row in rows:
            print(row)

This does not work, giving the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
    self._usock.connect(hostport)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
    with redshift_connector.connect(
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
    return Connection(
  File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
    raise InterfaceError("communication error", e)
redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))

Whereas the following snippet works:

import psycopg2
import os

schema = os.environ.get("DBT_TARGET_SCHEMA")
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""

# Establish a connection to the database
conn = psycopg2.connect(
    dbname="dbt_ci",
    host="<host>",
    port="5439",
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
)

# Create a cursor object
cur = conn.cursor()

print(query)
# Execute a query
cur.execute(query)

# Fetch all the rows
rows = cur.fetchall()

for row in rows:
    print(row)

# Close the cursor and connection
cur.close()
conn.close()
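
Because the failure is intermittent (sometimes the connection succeeds), a retry wrapper could serve as a stopgap in CI while the root cause is investigated. This is a minimal sketch and not a fix: `connect_with_retry` and its parameters are hypothetical names, not part of redshift-connector, and the backoff values are arbitrary.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Call `connect()` until it succeeds or `attempts` tries are used up.

    Sleeps base_delay * 2**n seconds between tries (exponential backoff)
    and re-raises the last error if every attempt fails.
    (Hypothetical helper for illustration only.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_exc = None
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on as exc:
            last_exc = exc
            # Do not sleep after the final failed attempt.
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise last_exc
```

In the reproduction script, `connect` could be a lambda wrapping the `redshift_connector.connect(...)` call, with `retry_on` narrowed to the `redshift_connector.error.InterfaceError` seen in the traceback so that unrelated errors still fail immediately.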
