
Wrong error message when using insert_data_bulk #161

Open
diegoxfx opened this issue May 26, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@diegoxfx

Driver version

2.0.910

Redshift version

I don't know

Client Operating System

Amazon Linux 2

Python version

3.10.8

Table schema

Problem description

I pass the column names of the temporary table I want to insert into. Let's say one of the column names is misspelled.

  1. Expected behaviour:
    What I expect is this error message: "Invalid column name: ..."
  2. Actual behaviour:
    What I get is: "TypeError: 'NoneType' object is not subscriptable"
  3. Error message/stack trace:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[32], line 2
      1 with get_redshift_connector() as conn:
----> 2     insert_data(conn, filename, columns_table, columns_type, file_index_columns, temp_table)

File ~/SageMaker/utils.py:34, in insert_data(conn, filename, columns_table, columns_type, columns_index_file, table_name, delimiter, batch_size)
     32 temp_table_query += '(' + ', '.join(cols) + ')'
     33 cursor.execute(temp_table_query)
---> 34 cursor.insert_data_bulk(
     35     filename=filename,
     36     table_name=table_name,
     37     parameter_indices=columns_index_file,
     38     column_names=columns_table,
     39     delimiter=',',
     40     batch_size=1024
     41 )

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/redshift_connector/cursor.py:306, in Cursor.insert_data_bulk(self, filename, table_name, parameter_indices, column_names, delimiter, batch_size)
    304 if not self.__is_valid_table(table_name):
    305     raise InterfaceError("Invalid table name passed to insert_data_bulk: {}".format(table_name))
--> 306 if not self.__has_valid_columns(table_name, column_names):
    307     raise InterfaceError("Invalid column names passed to insert_data_bulk: {}".format(table_name))
    308 orig_paramstyle = self.paramstyle

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/redshift_connector/cursor.py:363, in Cursor.__has_valid_columns(self, table, columns)
    361         self.execute(q, params)
    362         res = self.fetchone()
--> 363         if typing.cast(typing.List[int], res)[0] != 1:
    364             raise InterfaceError("Invalid column name: {} specified for table: {}".format(params[1], table))
    365 except:

TypeError: 'NoneType' object is not subscriptable

Reproduction code

temp_table = 'some_temp_table'
filename = '/path/to/file'
columns_table_good = ['a']
columns_table_wrong = ['A']
columns_type = ['varchar']
file_index_columns = [0]

def insert_data(conn, filename, columns_table, columns_type, columns_index_file,
                table_name, delimiter=',', batch_size=1024):
    with conn.cursor() as cursor:
        cols = list(map(lambda x: ' '.join(x), (zip(columns_table, columns_type))))
        temp_table_query = f'CREATE TEMPORARY TABLE {table_name} ' 
        temp_table_query += '(' + ', '.join(cols) + ')'
        cursor.execute(temp_table_query)
        cursor.insert_data_bulk(
            filename=filename,
            table_name=table_name,
            parameter_indices=columns_index_file,
            column_names=columns_table,
            delimiter=',',
            batch_size=1024
        )

# This will work
insert_data(conn, filename, columns_table_good, columns_type, file_index_columns, temp_table)

# This won't work
insert_data(conn, filename, columns_table_wrong, columns_type, file_index_columns, temp_table)

Apparently, the error is in these lines, because "res" is None when the column check fails, so indexing it raises the TypeError.

for params in param_list:
    self.execute(q, params)
    res = self.fetchone()
    if typing.cast(typing.List[int], res)[0] != 1:
        raise InterfaceError("Invalid column name: {} specified for table: {}".format(params[1], table))
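A minimal sketch of the missing guard (hypothetical helper names, not the driver's actual code): checking whether fetchone() returned a row before indexing it turns the confusing TypeError into a meaningful InterfaceError:

```python
class InterfaceError(Exception):
    """Stand-in for redshift_connector.InterfaceError."""


def validate_column_result(res, column, table):
    # fetchone() returns None when the validity query matches no rows,
    # so guard against that before indexing into the row.
    if res is None or res[0] != 1:
        raise InterfaceError(
            "Invalid column name: {} specified for table: {}".format(column, table)
        )


validate_column_result([1], "a", "some_temp_table")  # valid column: no error
try:
    validate_column_result(None, "A", "some_temp_table")
except InterfaceError as exc:
    print(exc)  # → Invalid column name: A specified for table: some_temp_table
```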

@Brooke-white
Contributor

Hi @diegoxfx ,

Thank you for reporting this issue. I will take a look into this behavior and update with my findings.

@Brooke-white
Contributor

Hi @diegoxfx , I've implemented a fix for this issue. The check performed on LOC363 of cursor.py is invalid and causes this confusing error. We are incorrectly checking the first element of the return value of fetchone() without ensuring something is actually returned. I anticipate a fix for this in our next release.

@Brooke-white Brooke-white added the bug Something isn't working label Jun 21, 2023
@Brooke-white
Contributor

Hi @diegoxfx , this was just fixed in 2.0.912. Please open a new issue if you continue to experience this problem.

@diegoxfx
Author

Hi @Brooke-white, I'm facing this error again, but this time with every column name:

with get_redshift_connector() as conn:
    with conn.cursor() as cursor:
        cursor.execute("create temporary table temp_table (col1 varchar)")
        cursor.insert_data_bulk(
                    filename=filename,
                    table_name="temp_table",
                    parameter_indices=[0],
                    column_names=["col1"],
                    delimiter=",",
                    batch_size=1024
                )
---------------------------------------------------------------------------
InterfaceError                            Traceback (most recent call last)
Cell In[34], line 4
      2 with conn.cursor() as cursor:
      3     cursor.execute("create temporary table temp_table (col1 varchar)")
----> 4     cursor.insert_data_bulk(
      5                 filename=filename,
      6                 table_name="temp_table",
      7                 parameter_indices=[0],
      8                 column_names=['col1'],
      9                 delimiter=",",
     10                 batch_size=1024
     11             )

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/redshift_connector/cursor.py:311, in Cursor.insert_data_bulk(self, filename, table_name, parameter_indices, column_names, delimiter, batch_size)
    309 if not self.__is_valid_table(table_name):
    310     raise InterfaceError("Invalid table name passed to insert_data_bulk: {}".format(table_name))
--> 311 if not self.__has_valid_columns(table_name, column_names):
    312     raise InterfaceError("Invalid column names passed to insert_data_bulk: {}".format(table_name))
    313 orig_paramstyle = self.paramstyle

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/redshift_connector/cursor.py:369, in Cursor.__has_valid_columns(self, table, columns)
    367 res = self.fetchone()
    368 if res is None:
--> 369     raise InterfaceError(
    370         "Invalid column name. No results were returned when performing column name validity check. Query: {} Parameters: {}".format(
    371             q, params
    372         )
    373     )
    374 if typing.cast(typing.List[int], res)[0] != 1:
    375     raise InterfaceError("Invalid column name: {} specified for table: {}".format(params[1], table))

InterfaceError: Invalid column name. No results were returned when performing column name validity check. Query: select 1 from pg_catalog.svv_all_columns where table_name = ? and column_name = ? Parameters: ['temp_table', 'col1']

Current version: 2.0.918

@Brooke-white
Contributor

Hi @diegoxfx , thank you for letting me know and including the repro. I see you're using a temporary table. By any chance does this reproduce for a non-temporary table? Also, could you let me know which redshift server version you are using?

@Brooke-white Brooke-white reopened this Jan 12, 2024
@diegoxfx
Author

@Brooke-white I don't have permissions to create non-temporary tables, so I can't check that. The Redshift server version is: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.61626

@Brooke-white
Contributor

Brooke-white commented Jan 12, 2024

@diegoxfx I see you're on Redshift server "Patch 179". Our latest server version is "Patch 180". I'd recommend upgrading to the latest server version, Patch 180 as of writing, or awaiting the cluster maintenance window. Please let me know if the issue still presents after the upgrade. https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html#cluster-version-180

@diegoxfx
Author

@Brooke-white Once the server is updated to Patch 180 I'll let you know. For the moment, downgrading redshift-connector worked for me.

@Brooke-white
Contributor

Glad to hear it's working again for you @diegoxfx. Could you confirm the version of redshift-connector where you saw issue (I believe 2.0.910) and the version where you saw resolution? I'd like to double check there is no issue in our driver.

@diegoxfx
Author

diegoxfx commented Feb 2, 2024

@Brooke-white I think the problem is due to this validation:

if not self.__has_valid_columns(table_name, column_names):
    raise InterfaceError("Invalid column names passed to insert_data_bulk: {}".format(table_name))

This check verifies that the table/column pair exists and raises an error when it doesn't.

So, why is this problem happening in the latest version and not before? From what I can see, this line was changed in this commit:

b199bfa#diff-7a712c2f5672b7e37cafedba351f2cbdd8a07f9478317a8ab7ffd1c60f9dcee6L356

The original line was:
q: str = "select 1 from information_schema.columns where table_name = ? and column_name = ?"
and the new one:
q: str = "select 1 from pg_catalog.svv_all_columns where table_name = ? and column_name = ?"

So I made some experiments on Redshift and got this:

CREATE TEMPORARY TABLE test_temp_table(test_temp_column varchar);

select * from pg_catalog.svv_all_columns 
where 
    table_name LIKE '%test_temp_table%';

select * from information_schema.columns 
where 
    table_name LIKE '%test_temp_table%';


I basically created a temporary table using redshift editor, and searched for the table info using both pg_catalog.svv_all_columns and information_schema.columns.

For pg_catalog.svv_all_columns the result was empty.

However, for information_schema.columns the table/column was found.

Maybe the problem could be resolved by adding a parameter that indicates whether the table is temporary, and based on that using pg_catalog.svv_all_columns or information_schema.columns.
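A hedged sketch of that idea (hypothetical helper, not the driver's actual API): select the validation query based on a caller-supplied flag, since temporary tables showed up in information_schema.columns but not in pg_catalog.svv_all_columns in the experiments above:

```python
def column_check_query(is_temporary: bool) -> str:
    # Temporary tables are session-scoped and, per the experiments above,
    # not visible in pg_catalog.svv_all_columns, but they do appear in
    # information_schema.columns.
    view = "information_schema.columns" if is_temporary else "pg_catalog.svv_all_columns"
    return "select 1 from {} where table_name = ? and column_name = ?".format(view)


print(column_check_query(True))   # query against information_schema.columns
print(column_check_query(False))  # query against pg_catalog.svv_all_columns
```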

@diegoxfx
Copy link
Author

diegoxfx commented Feb 5, 2024

I found another bug, but I think it should go in a separate issue. To be brief: if the CSV contains null values (for example, ''), the insert fails because the value inserted is "''" instead of NULL.
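For illustration only (a workaround sketch, not the driver's behavior): pre-processing rows so empty CSV fields become None before binding parameters would insert SQL NULL rather than a literal empty string:

```python
import csv
import io


def rows_with_nulls(csv_text, delimiter=","):
    # csv.reader yields '' for empty fields; map those to None so a
    # parameterized insert binds SQL NULL instead of an empty string.
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return [[field if field != "" else None for field in row] for row in reader]


print(rows_with_nulls("a,,c\n,,"))  # → [['a', None, 'c'], [None, None, None]]
```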
