Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpHook. Use request factory and respect defaults #14701

Merged
merged 4 commits into from May 4, 2021

Conversation

ngaranko
Copy link
Contributor

Use Request's session.request factory for HTTP request initiation, this will use environment variables and sensible defaults for requests.
Also use verify option only if it is provided to run method, as requests library already defaults to True.


Backstory for this PR:
Our organization uses firewalls and custom SSL certificates to communicate between systems, this can be achieved via CURL_CA_BUNDLE and REQUESTS_CA_BUNDLE environment variables.
Requests library takes both into account and uses them as default value for verify option when sending request to remote system.

Current implementation is setting verify to True, which overwrites defaults and as results requests can not be made due to SSL verification issues. This PR is fixing the problem.

@gkoller
Copy link

gkoller commented Mar 10, 2021

Your "backstory to this PR" perfectly documents the "why" hence should definitely be included in the commit message.

Comment on lines 184 to 197
request_options = {}
if "verify" in extra_options:
# Overwrite verify only if it is needed
request_options["verify"] = extra_options["verify"]

try:
response = session.send(
prepped_request,
stream=extra_options.get("stream", False),
verify=extra_options.get("verify", True),
proxies=extra_options.get("proxies", {}),
cert=extra_options.get("cert"),
timeout=extra_options.get("timeout"),
allow_redirects=extra_options.get("allow_redirects", True),
**request_options,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value in the requests Session library to verify is True and you can change the value in your hook.run execution to False using the extra_options={"verify": False}. If you dont sent this parameter from Airflow in the send function request will assume True anyway... right?

https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.send

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @marcosmarxm thanks for looking into this!

Default verify=True is correct only for convenience methods of requests library (e.g. requests.get, requests.post etc), as they are using session factory and building correct session.verify value inside Session.request:
https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.merge_environment_settings

I've updated hook and replaced requests.Request with session.request as it does more and behaves more like expected from requests.post.

Issue of session.send is that if we say session.send(..., verify=True) it will overwrite whatever is set by merge_environment_settings and will forcefully set verify=True (thanks to kwargs.setdefault('verify', self.verify) in Session.send code) and this breaks SSL certificate configuration provided by OS, same goes for verify=False.

you can change the value in your hook.run

I can not, as we're not using hook.run directly, rather via other hooks, such as slack hook etc.

Hope my comment clears out reasoning behind this change.

@marcosmarxm
Copy link
Contributor

@ngaranko can you take a look in the failed tests? All providers using HttpHook are failing.

@ngaranko ngaranko force-pushed the http_hook_verify_certificates branch from b386367 to 759c307 Compare March 15, 2021 19:03
@ngaranko
Copy link
Contributor Author

@marcosmarxm fixed tests and updated code. Also added extra tests to illustrate issue.

@ngaranko ngaranko force-pushed the http_hook_verify_certificates branch from 759c307 to 33c1ab5 Compare March 23, 2021 11:05
@kaxil kaxil requested a review from ephraimbuddy April 22, 2021 14:28
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See individual comments

@ngaranko
Copy link
Contributor Author

@ashb Corrected code and added explanation to why new empty method is needed inside FakeSession.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Apr 27, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@ashb
Copy link
Member

ashb commented May 4, 2021

Re-triggering CI.

@ashb ashb closed this May 4, 2021
@ashb ashb reopened this May 4, 2021
@ashb ashb merged commit ca432ee into apache:master May 4, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented May 4, 2021

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants