-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Convert columns in get_pandas_df to lowercase SnowflakeHook #13152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
|
it's not true that everything in snowflake is uppercase. snowflake renders unquoted identifiers as uppercase. but quoted identifiers it leaves as is. you can do so this would be no problem if you are never quoting identifiers in your code. however, if that's not true it could cause problems. perhaps a better example is this: with your change you might run into trouble |
|
ok but i see now you made it optional :) so maybe that's ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so i have a different concern with this change.
you are adding a global config option lowercase_columns (global in the context of the hook)
but this only applies to the narrow case of the get_pandas_df method. e.g. it does not apply to get_records. or fetchall on a cursor.
so i would vote not to approve this change
since this isn't a snowflake parameter, and can't be applied uniformly, i think it's the kind of change best left to be implemented in your own subclass --- not something in the airflow version.
people will not easily discover the functionality, and will find that it has only limited use, so not worth it IMO.
I agree that it was not ideal to have it as a global config option for just one method. However, I do think that this can be useful for more people, since it's pretty common to migrate from other databases to Snowflake and this will make things easier. Thus, I have changed the argument from connection to get_pandas_df method and I have added a little bit of documentation. I also agree that this could be potentially hidden from users, so I would be pleased to write more documentation somewhere else (I don't know where though). WDYT? |
|
Let me caveat that I am not a committer and can't merge this anyway, so my opinion is not decisive. I just took a look cus I have experience with snowflake and can help review. And let me also say that I totally get the impulse to add this -- as someone who has used snowflake for a couple years I also was annoyed by the weird handling of case. And also, let me share that I know what it's like to try to contribute or propose something and get a "meh". That all having been said, I just don't think this should be added. It's just too niche, and it's not worth the complexity and confusion it adds. True it's a tiny amount of complexity. But it's another configurable thing, and it is nonstandard -- i.e. it is not part of snowflake api, not part of pandas api, and not supported in other hook methods. Will people even discover the feature? I think it is best to simply accept snowflake as it is. Snowflake currently has a parameter you can set to make quoted identifiers case insensitive. And when enabled, even quoted identifiers are rendered as upper case. I think the best solution would be if snowflake extended this to make it possible that quoted identifiers are case insensitive and always rendered as lower. But that's an issue for another team :) Anyway, just my 2 cents. Others very well may feel differently. In apache language my vote would be -0: "I won't get in the way, but I'd rather we didn't do this." |
|
There is a new version of snowflake python connector coming, that we will incorporate very quickly as it solves some really bad behaviour of python connector. I think we should take a look again at the issue when the new version is released. I also have high hopes we can get much better cooperation between snowflake and open source community using their connectors. They seem to release that this is important for them. |
|
I tend to agree with @dstandish When you use hooks either you wrap it with python function or you inherit from and can overwrite the If this PR proceed forward my only comment is that at least don't call it Like @dstandish I'm - 0 for this PR but that is just me. |
Hi Jarek. Any news about this? |
|
Yeah. The new version of python connector is out and we switched to it. But after looking closely - I also think we should not merge this one - similarly to Daniel, this is -0 for me. Since everything we do is python, overriding the get_pandas_df in your own operator is almost as easy as passing a parameter and it gives you much better flexibility. Also, I think if Snowflake has a captitalised names, the best you can do is to keep it this way throughout the whole "data" journey - to avoid confusion, even for the cases where you have to do any kind of matching/lineage kind of analysis, it simply makes sense to keep it consistent. Adding an "optional parameter" seems like it is not dangerous, but I think we should avoid doing so if the option is "niche" and when you can do it almost as easily by other means. I think we are all for closing the PR. Are you convinced @JavierLopezT ?1 |
Sure. I'll close it myself |
In snowflake, everything is uppercase. In get_pandas_df this could be inconvenient. Sometimes you have to access certain columns, and you don't want to type the name of the columns in uppercase. This is especially inconvenient if you are migrating from Redshift or another DW to Snowflake, because you have to change all your pandas code.
I aim to solve this with this PR. If you include a boolean in the extra field of your snowflake connection as True, all the columns will get lowercase automatically.
I guess this needs test, but I have no idea how to begin. Could anyone help me, please? Thanks