Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OdbcHook string values in connect_kwargs dict converts to None #15016

Closed
krnhotwings opened this issue Mar 25, 2021 · 8 comments · Fixed by #15510
Closed

OdbcHook string values in connect_kwargs dict converts to None #15016

krnhotwings opened this issue Mar 25, 2021 · 8 comments · Fixed by #15510

Comments

@krnhotwings
Copy link

Apache Airflow version: 2.0.1

What happened:

OdbcHook returns None for non-boolean-like string values in connect_kwargs dict arg

What you expected to happen:

connect_kwarg values should remain as is.

How to reproduce it:

>>> from airflow.providers.odbc.hooks.odbc import OdbcHook
>>> OdbcHook('my_conn', connect_kwargs={'CurrentSchema': 'SCHEMA'}).connect_kwargs
{'CurrentSchema': None}

Anything else we need to know:

The issue lies in:

if val.lower() == 'true':
return True
elif val.lower() == 'false':
return False

There's no else block that returns val. As it is, any string value will instead return None.

Pardon my ignorance on the subject, but this raises a question: Why is clean_bool being called in the first place for a user-provided dictionary? I'm not sure how this is necessary because the user can provide a literal boolean value in the dictionary if needed, no? If in the event that a driver needs to take a case-sensitive boolean string for some parameter, then clean_bool would make it impossible to provide such a value.

@krnhotwings krnhotwings added the kind:bug This is a clearly a bug label Mar 25, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 25, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@Goodkat
Copy link
Contributor

Goodkat commented Apr 23, 2021

I can take it, if you don't mind

@marcosmarxm
Copy link
Contributor

@krnhotwings this function is useful because the connect_kwargs also is able to receive a JSON object and clean_bool maps the JSON bool type to Python boolean type. The function will convert "true" to True and doesn't change other strings.

The connect_kwargs you use can be also configured in the UI when adding the ODBC connection.
You can see more here:
https://airflow.apache.org/docs/apache-airflow-providers-odbc/stable/connections/odbc.html

@krnhotwings
Copy link
Author

@marcosmarxm Sorry, but I'm still not entirely sure why clean_bool is necessary. In Python's eyes, JSON objects would just be raw strings that need to be parsed by the json lib and converted into a dictionary with Python data types, correct?

It looks like clean_bool is being applied to the merged dicts of connection_extra_lower and _connect_kwargs. connection_extra_lower is just applying .lower() to the keys of connection.extra_dejson. extra_dejson is using Python's json lib to parse the extras string into a dictionary, which should handle unquoted JSON boolean values, if I'm not mistaken:

obj = json.loads(self.extra)

Looking at the documentation you linked, I noticed the quoted JSON boolean values in the very last example code block, but I would think that it should all work correctly with unquoted raw JSON booleans, which should make clean_bool unnecessary.

For example, if you created a connection testconn in the UI with Extra:

{
  "connect_kwargs": {
    "DummySetting": true
  }
}

it should yield:

>>> airflow.providers.odbc.hooks.odbc.OdbcHook('myconn').connect_kwargs
{'DummySetting': True}

Am I missing something else?

(Again, pardon any ignorance on my part.)

@marcosmarxm
Copy link
Contributor

@krnhotwings you are correct and nice work diving into this. Maybe this is just legacy structure that keep until nowadays.
Just for sanity: do you really add the connect_kwargs without double-quotes in the UI and works? OR can you just give a try?

@krnhotwings
Copy link
Author

@marcosmarxm Yes, I have tested in 2.0.2 and have my extras as follows:

{
  "connect_kwargs": {
    "CurrentSchema": "TEST",
    "Blah": true,
    "Blah2": false
  }
}
>>> from airflow.providers.odbc.hooks.odbc import OdbcHook                                         
>>> OdbcHook('myconn').connect_kwargs
{'CurrentSchema': None, 'Blah': True, 'Blah2': False}  

(Had to double-check today. It was getting late last night.)

@Goodkat
Copy link
Contributor

Goodkat commented Apr 26, 2021

@marcosmarxm I have removed clean_bool on my local copy and can confirm that it returns the schema_name instead of None now.
I used the folowing UI extras for my tests and it works for unquoted boolean values as well:

{
  "connect_kwargs": {
    "CurrentSchema": "SCHEMA",
    "Blah": true,
    "Blah2": false
  }
}
>>> from airflow.providers.odbc.hooks.odbc import OdbcHook                                         
>>> OdbcHook('myconn').connect_kwargs
{'CurrentSchema': 'SCHEMA', 'Blah': True, 'Blah2': False}

As the next step, I will make and commit the changes within the PR #15510

@dstandish
Copy link
Contributor

Why is clean_bool being called in the first place for a user-provided dictionary? I'm not sure how this is necessary because the user can provide a literal boolean value in the dictionary if needed, no? If in the event that a driver needs to take a case-sensitive boolean string for some parameter, then clean_bool would make it impossible to provide such a value.

TLDR I agree we should remove clean_bool

As to why it was put in, in the first place, here's what I think happened.

Observe what happens with airflow's connection URI format when we try to pass a boolean value:

>>> c = Connection(conn_id='hello', uri='hello://?my_val=True')
>>> c.extra_dejson
{'my_val': 'True'}

It's impossible to produce a json object with boolean values.

So when you are using top level key-value pairs in conn extra then in some cases it makes sense to cast to bool.

I suspect maybe initially in the development of this hook the connect kwargs were top level within extra, where doing this cast would make sense.

But when dealing with nested json, the boolean vs. string issue becomes irrelevant and you have new problems to solve. Namely, at the time this hook was merged, airflow's conn URI did not support nested json. So this hook did not actually allow for storage of connect_kwargs in extra when using the URI format. For that, you'd have to add a json.loads-if-str conversion here. But now that we have support for arbitrary json in conn extra, there's no need for such a conversion.

So since connect_kwargs is nested json, there's no valid reason for converting to bool, and I suspect this was just an oversight, and accordingly, even though it could be fixed, it is best to remove.

@eladkal eladkal linked a pull request May 11, 2021 that will close this issue
potiuk pushed a commit that referenced this issue Jun 13, 2021
* OdbcHook returns None. Related to #15016 issue.

This PR is related to #15016 issue.
OdbcHook returns None for non-boolean-like string values in connect_kwargs dict arg, however connect_kwarg values should remain as is in this case.
According to the discussion on #15016, we agreed to remove the clean_bool function, as it is not needed actually for processing boolean values in JSON .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants