
Fix comparing columns in tables to the current stream keys. #94

Conversation

mirelagrigoras

A "key" value, as it comes in the stream_buffer object, may or may not contain uppercase letters, but in Redshift, which is based on Postgres, columns are always created with lowercase letters.
As a consequence, in order to check that in the existing table there is a column that corresponds to a certain key and has the same type as the one provided for the key in the current schema, we must check that the type of the key is the same as the column that is mapped to this key.
E.g.: For a key called 'orderID', the new column that will be created in a Redshift table will be "orderid".Thus, in the metadata object the column 'orderid' will be mapped to the 'orderID' key.
To check that both the column in the table and the schema type for that key have the same sql type, we call the json_schema_to_sql_type method as it follows:
-current_table_schema['schema']['properties']['orderid']
-stream_buffer.schema['properties']['orderID'].
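A minimal sketch of that check, using the names from this description; the wrapper function and the stand-in type mapping below are hypothetical illustrations, not the project's actual code:

```python
# Stand-in for the project's json_schema_to_sql_type; the real mapping is
# richer, this is only enough to make the example run.
def json_schema_to_sql_type(schema):
    json_type = schema['type'][0] if isinstance(schema['type'], list) else schema['type']
    return {'integer': 'BIGINT', 'number': 'DOUBLE PRECISION'}.get(json_type, 'VARCHAR')

# Hypothetical helper illustrating the comparison described above.
def key_has_matching_column(current_table_schema, stream_schema, key):
    # Redshift folds unquoted identifiers to lowercase, so the column
    # backing 'orderID' is named 'orderid'.
    column = key.lower()
    table_properties = current_table_schema['schema']['properties']
    if column not in table_properties:
        return False
    # Compare the SQL type of the existing (lowercased) column against the
    # SQL type derived from the incoming schema for the original key.
    return (json_schema_to_sql_type(table_properties[column])
            == json_schema_to_sql_type(stream_schema['properties'][key]))
```

Called, e.g., as key_has_matching_column(current_table_schema, stream_buffer.schema, 'orderID').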

@AlexanderMann (Collaborator) left a comment


@mirelagrigoras I took a stab at this as well before seeing your work in here. This is great! Also, good catch with the key_properties being passed into persist_csv.

I don't see any failing tests in here, though, and we'll need those for future-proofing. If you want to check out my PR and steal the tests, feel free:

#95

I'll leave it up to you whether you want to merge the PRs, or review mine and we can try to get that one through!
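A rough sketch of the kind of regression test in question, assuming a pytest-style suite; the schemas and the json_schema_to_sql_type stand-in shown earlier are illustrative, not the project's real fixtures:

```python
def test_key_type_check_survives_redshift_lowercasing():
    # Column as it exists in Redshift (lowercased) vs. the key as the tap sends it.
    table_properties = {'orderid': {'type': ['integer']}}
    stream_properties = {'orderID': {'type': ['integer']}}

    key = 'orderID'
    assert (json_schema_to_sql_type(table_properties[key.lower()])
            == json_schema_to_sql_type(stream_properties[key]))
```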

```diff
@@ -459,7 +463,7 @@ def persist_csv_rows(self,

         update_sql = self._get_update_sql(remote_schema['name'],
                                           temp_table_name,
-                                          remote_schema['key_properties'],
+                                          [key.lower() for key in remote_schema['key_properties']],
```

I think you also have an issue with other values being canonicalized to _ by the canonicalize_identifier code. For your persist_csv code you'll need to do the same thing you did on line 142.
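As a rough illustration of that concern (the real canonicalize_identifier may differ; this behaviour is an assumption):

```python
import re

# Assumed behaviour of identifier canonicalization: fold to lowercase and
# replace anything outside [a-z0-9_] with an underscore, so lowercasing a
# key alone may not reproduce the stored column name.
def canonicalize_identifier(name):
    return re.sub(r'[^a-z0-9_]', '_', name.lower())

canonicalize_identifier('order-ID')  # -> 'order_id'
canonicalize_identifier('orderID')   # -> 'orderid'
```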

```diff
@@ -125,6 +124,8 @@ def write_batch(self, stream_buffer):
                 (stream_buffer.stream,),
                 stream_buffer.stream)

+        table_metadata = self._get_table_metadata(cur,
```

This should probably be moved down to around line 140, since it's only used there. Doing that better scopes the variable so it can't accidentally be used in more places.

@mirelagrigoras (Author) commented Feb 8, 2019 via email

@AlexanderMann (Collaborator)

@mirelagrigoras sounds good. It looks like @awm33 has already approved the other PR. I'll go ahead and merge and get releases cut so that we can get Redshift updated with the same changes.
