Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hooks extended from http hook #16048

Closed
wants to merge 5 commits into from
Closed

Conversation

msumit
Copy link
Contributor

@msumit msumit commented May 25, 2021

Issue

When we call get_hook method on an HTTP connection, it returns an Airbyte hook & errors out.

>>> h.conn_type
'http'
>>> h.get_hook()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sumit/work/airflow/airflow/models/connection.py", line 304, in get_hook
    return hook_class(**{conn_id_param: self.conn_id})
TypeError: __init__() got an unexpected keyword argument 'http_conn_id'

>>> from airflow.providers_manager import ProvidersManager
>>> a = ProvidersManager().hooks.get(h.conn_type, (None, None, None, None))
>>> a
HookInfo(connection_class='airflow.providers.airbyte.hooks.airbyte.AirbyteHook', connection_id_attribute_name='http_conn_id', package_name='apache-airflow-providers-airbyte', hook_name='HTTP')

Fix

The fix looks rather simple, but not sure what should be the values of conn_type or hook_name of these provider hooks.

After change
>>> h.get_hook()
<airflow.providers.http.hooks.http.HttpHook object at 0x11ab6ef90>

If the approach looks good, will add relevant unit tests.

@potiuk
Copy link
Member

potiuk commented May 25, 2021

I think it looks good. What we need is updates in provider's tests (they are checking for available connections mainly).

@msumit
Copy link
Contributor Author

msumit commented May 25, 2021

@potiuk it creates one problem though, new connection types are getting created for them, something we might not want?

Screenshot 2021-05-25 at 10 50 39 PM

@potiuk
Copy link
Member

potiuk commented May 25, 2021

Actually It's expected, and I see no problems with having them in the list.

I do not think it introduces any problem. The connection type is really UI-only change (that drives the selection of fields automatically mapped from extras to fields in the UI). There is no other effect of the connection type selected - whether it's HTTP or Airbyte, makes no difference whatsoever - it's the connection id that matters.

So IMHO having the specific Airbyte/Dingding etc. connections is harmless (and actually it's good for consistency). Yes - it makes connection list longer, but only if you have the providers installed, which I think is a good reason on its own to show those connection types in the list.

@potiuk
Copy link
Member

potiuk commented May 25, 2021

BTW. Not sure if you are aware but you can at any point change connection type for any of the connections and it will continue to work regardless what type it is.

@msumit
Copy link
Contributor Author

msumit commented May 25, 2021

BTW. Not sure if you are aware but you can at any point change connection type for any of the connections and it will continue to work regardless what type it is.

yup, afaik we just fetch a connection with their conn_id & not really worry abt the type. If seeing some extra conn types is not an issue, then I'll get this PR finish.

@msumit msumit requested review from ashb and potiuk as code owners May 26, 2021 10:55
@msumit msumit changed the title WIP: Fix hooks extented from http hook Fix hooks extended from http hook May 26, 2021
@msumit
Copy link
Contributor Author

msumit commented May 26, 2021

@potiuk any idea on the error I'm getting in one of the checks

ERROR: Cannot install apache-airflow[devel-ci]==2.2.0.dev0 because these package versions have conflicting dependencies.
  
  The conflict is caused by:
      apache-airflow[devel-ci] 2.2.0.dev0 depends on arrow<1.0.0 and >=0.16.0
      The user requested (constraint) arrow==1.1.0

@potiuk
Copy link
Member

potiuk commented May 26, 2021

Yep. Needs rebase.

@potiuk
Copy link
Member

potiuk commented May 26, 2021

Actually @msumit - see #16094 the change brings a feature EXPECTED by the users :). Could you rebase please? I am about to prepare providers release, so I would love to merge that one :)

@msumit msumit closed this May 27, 2021
@msumit msumit reopened this May 27, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 27, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@msumit
Copy link
Contributor Author

msumit commented May 27, 2021

Not sure why the CI is not kicking in. Trying to close and reopen this PR, otherwise will raise a new PR.

@msumit msumit closed this May 27, 2021
@msumit msumit reopened this May 27, 2021
@msumit msumit closed this May 27, 2021
@ashb ashb deleted the fix_http_conn branch May 27, 2021 10:43
@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

Hmm. Actually I was wrong @msumit -> I had a discussion today and it turned out that there is a problem if you are using get_uri() method from the connection, the conn_type is used as URI schema :(

I will think how to solve it (should be possible without breaking compatibility and leaving the custom connections)

@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

This is the offending piece of code.

    def get_uri(self) -> str:
        """Return connection in URI format"""
        uri = f"{str(self.conn_type).lower().replace('_', '-')}://"

The get_uri() method is not used in many places (mostly in DBApi which is irrelevant). The get_uri method It's not used in HTTPHook to actually make any call (the HTTPHook uses "schema" field for the protocol), but it would be nice to get it correct in get_uri as well.

@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

Though. maybe we should leave it as a 🐛 turning into a feature ;)

@msumit
Copy link
Contributor Author

msumit commented Jun 1, 2021

Ooops... not sure if it has caused any failures to these providers. Have to figure out a fix quickly or can change the conn_type to http for impacted providers.

@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

No worries. This is my fault :D I overlooked it and I have not yet released the providers (I will wait until this is fixed). And It has really very little impact (if at all). The get_uri method for the HTTP hook is really not used. The hook uses schema field to build the URI. The `get_uri" method is used only to display connection details in "aiflow connections list/get" commands as far as I know. However yeah - someone could use it potentially in their DAGS/custom operators.

I think the easiest way to fix is to return a new HTTPConnection(Connection) object in 'get_connection()" method of the HTTPHook. This new HTTPConnection hook could simply replace the schema in their get_uri() method to 'http' or whatever the schema is.

I think that would be fully backwards compatible and behave "as expected".

@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

There is a little difficulty in this because Connection is the SQLAlchemy model so this is not as easy as deriving from it. I am tempted to monkeypatch the get_uri() method of the Connection object in this case rather than derive from it.

This is a bit of a hack and results from lack of consistency how HTTP Hook treats schema/how other connections are building URIs (using conn_type for URI is terrible idea that's why I have not even checked it). So we should likely address it in 2.2 (I think we can) but for now maybe just monkeypatching is the best option? @uranusjr @ashb WDYT?

@msumit
Copy link
Contributor Author

msumit commented Jun 1, 2021

@potiuk I'm thinking something like this

    def get_uri_initials(self) -> str:
        try:
            hook = self.get_hook()
            return hook.get_uri_initials()
        except:
            return f"{str(self.conn_type).lower().replace('_', '-')}://"

    def get_uri(self) -> str:
        """Return connection in URI format"""
        uri = self.get_uri_initials()

@potiuk
Copy link
Member

potiuk commented Jun 1, 2021

Yeah. All good but only for the new version of Airflow. Connection is part of Airflow, Hooks are part of providers. So when we release new providers, they might still be used on Airflow 2.1.0 (which has unmodified Connection object).

I think then that monkey-patching the connection in the Hooks (temporarily until we fix it in upcoming Airflow release) is the simplest approach.

@msumit
Copy link
Contributor Author

msumit commented Jun 2, 2021

@potiuk sorry but I'm not able to understand what you are proposing. Can you raise a PR or a draft PR of what you are thinking.

@potiuk
Copy link
Member

potiuk commented Jun 2, 2021 via email

@potiuk
Copy link
Member

potiuk commented Jun 2, 2021

Will do - but not immediately - I am away this (very) long weekend we have in Poland

@potiuk
Copy link
Member

potiuk commented Jun 6, 2021

OK. I looked at it, and I think it's not worth changing. the get_uri() method of the Connection is really not going to be used in HttpHook. It uses "requests" under the hood, and the "get_conn()" method returns "requests.Session" - I do not think there is any use case where you would like to use get_connection().get_uri() and run http request for it.

I think we can easily fix it in the next release of Airflow and not worry about providers now.

@msumit
Copy link
Contributor Author

msumit commented Jun 7, 2021

Sounds good to me.

@potiuk
Copy link
Member

potiuk commented Jun 7, 2021

Documentation /changelogs generated for all the providers :P in #16294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants