Fix comparing columns in tables to the current stream keys. #94
Conversation
@mirelagrigoras I took a stab at this as well before seeing your work in here. This is great! Also, good catch with the `key_properties` being passed into `persist_csv`.
I don't see any failing tests herein, and we'll need those for future proofing etc. If you wanna check out my pr and steal the tests feel free.
I'll leave it up to you as to whether you want to merge the prs, or if you want to review mine and we can try to get that one through!
```diff
@@ -459,7 +463,7 @@ def persist_csv_rows(self,

             update_sql = self._get_update_sql(remote_schema['name'],
                                               temp_table_name,
-                                              remote_schema['key_properties'],
+                                              [key.lower() for key in remote_schema['key_properties']],
```
I think you also have an issue with other values being canonicalized to `_` by the `canonicalize_identifier` code. I think for your `persist_csv` code you'll need to do the same thing as what you did on line 142.
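For context, here is a minimal sketch of what an identifier canonicalizer along these lines typically does. The regex and lowercasing below are assumptions for illustration, not the actual `canonicalize_identifier` implementation in target-postgres:

```python
import re

def canonicalize_identifier(name):
    """Sketch: lowercase the name and replace any character that is not a
    lowercase letter, digit, or underscore with '_', mirroring how unquoted
    Postgres/Redshift identifiers end up after canonicalization."""
    return re.sub(r'[^a-z0-9_]', '_', name.lower())

# Keys from the stream may not match the canonicalized column names,
# so comparisons must canonicalize both sides before looking columns up.
print(canonicalize_identifier('orderID'))     # orderid
print(canonicalize_identifier('order-date'))  # order_date
```

This is why comparing raw stream keys against table column names fails: both sides have to go through the same transformation first.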
```diff
@@ -125,6 +124,8 @@ def write_batch(self, stream_buffer):
                 (stream_buffer.stream,),
                 stream_buffer.stream)

+            table_metadata = self._get_table_metadata(cur,
```
This should probably be moved to line ~140 as it's only used there. By doing that it better scopes the variable so that it's not being used accidentally in more places.
Thank you for your feedback. I will make the changes you suggested, run the tests that you wrote (if you don't mind), and get back to you tomorrow.
@mirelagrigoras sounds good. It looks like @awm33 already approved the other pr. I'll go ahead and merge and get releases cut so that we can get Redshift updated with the same changes.
A "key" value, as it arrives in the `stream_buffer` object, may or may not contain uppercase letters, but in Redshift (which is based on Postgres) columns are always created with lowercase names.

As a consequence, in order to verify that the existing table has a column corresponding to a given key, with the same type as that key has in the current schema, we must compare the key's type against the column that the key is mapped to.

E.g., for a key called 'orderID', the column created in the Redshift table will be 'orderid'. Thus, in the metadata object the column 'orderid' will be mapped to the 'orderID' key.

To check that the column in the table and the schema type for that key have the same SQL type, we call the `json_schema_to_sql_type` method on both of:
- `current_table_schema['schema']['properties']['orderid']`
- `stream_buffer.schema['properties']['orderID']`
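The comparison described above can be sketched as follows. The `json_schema_to_sql_type` stand-in and the `key_types_match` helper here are illustrative assumptions, not the library's actual code; only the lowercase key-to-column mapping follows the description:

```python
def json_schema_to_sql_type(json_schema):
    """Illustrative stand-in: map a JSON-schema property to a SQL type."""
    mapping = {'integer': 'BIGINT', 'number': 'DOUBLE PRECISION',
               'string': 'VARCHAR', 'boolean': 'BOOLEAN'}
    return mapping[json_schema['type']]

def key_types_match(current_table_schema, stream_buffer_schema, key):
    # Redshift/Postgres create columns lowercase, so the stream's key
    # (e.g. 'orderID') maps to a lowercased column name ('orderid').
    column = key.lower()
    table_type = json_schema_to_sql_type(
        current_table_schema['schema']['properties'][column])
    stream_type = json_schema_to_sql_type(
        stream_buffer_schema['properties'][key])
    return table_type == stream_type

# Example: key 'orderID' is compared against column 'orderid'.
current_table_schema = {'schema': {'properties': {'orderid': {'type': 'integer'}}}}
stream_schema = {'properties': {'orderID': {'type': 'integer'}}}
print(key_types_match(current_table_schema, stream_schema, 'orderID'))  # True
```

Without the `key.lower()` step, the lookup into the table's properties would raise a `KeyError` for any key containing uppercase letters, which is the bug this PR addresses.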