HttpHook. Use request factory and respect defaults #14701
Your "backstory to this PR" documents the "why" perfectly, so it should definitely be included in the commit message.
airflow/providers/http/hooks/http.py
Outdated
request_options = {}
if "verify" in extra_options:
    # Overwrite verify only if it is needed
    request_options["verify"] = extra_options["verify"]

try:
    response = session.send(
        prepped_request,
        stream=extra_options.get("stream", False),
        verify=extra_options.get("verify", True),
        proxies=extra_options.get("proxies", {}),
        cert=extra_options.get("cert"),
        timeout=extra_options.get("timeout"),
        allow_redirects=extra_options.get("allow_redirects", True),
        **request_options,
The default value of verify in the requests Session is True, and you can change it to False in your hook.run call by passing extra_options={"verify": False}. If Airflow doesn't pass this parameter to send, requests will assume True anyway... right?
https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.send
Hey @marcosmarxm, thanks for looking into this!
The verify=True default only holds for the convenience methods of the requests library (e.g. requests.get, requests.post, etc.), because they create a session and compute the correct verify value inside Session.request:
https://requests.readthedocs.io/en/master/_modules/requests/sessions/#Session.merge_environment_settings
I've updated the hook and replaced requests.Request with session.request, as it does more and behaves more like what you would expect from requests.post.
The issue with session.send is that calling session.send(..., verify=True) overrides whatever merge_environment_settings would have set and forces verify=True (Session.send only applies kwargs.setdefault('verify', self.verify)). This breaks the SSL certificate configuration provided by the OS. The same goes for verify=False.
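To make the difference concrete, here is a minimal sketch, assuming the requests package is installed. The CA bundle path is a made-up placeholder and nothing is read from it; no network request is made. It calls Session.merge_environment_settings, the step Session.request goes through, and then mimics the setdefault line from Session.send on a plain dict.

```python
import os

import requests

# Hypothetical bundle path, for illustration only; the file is never opened here.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/corp-ca.pem"

session = requests.Session()

# Path taken by Session.request: with verify left unset (None),
# the environment bundle becomes the effective verify value.
from_env = session.merge_environment_settings(
    url="https://example.com", proxies={}, stream=None, verify=None, cert=None
)
print(from_env["verify"])

# An explicit verify=False passed by the caller is kept as is.
explicit = session.merge_environment_settings(
    url="https://example.com", proxies={}, stream=None, verify=False, cert=None
)
print(explicit["verify"])

# Path taken when calling Session.send directly: the kwargs only get a
# setdefault, so a hard-coded verify=True stays True and the bundle is ignored.
send_kwargs = {"verify": True}
send_kwargs.setdefault("verify", session.verify)
print(send_kwargs["verify"])
```

The first call picks up the bundle path from the environment, while the hard-coded value in the last step never sees it, which is the behaviour this PR works around.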
"you can change the value in your hook.run"
I cannot, as we don't call hook.run directly but go through other hooks, such as the Slack hook.
I hope this clears up the reasoning behind this change.
@ngaranko can you take a look at the failed tests? All providers using HttpHook are failing.
b386367 to 759c307
@marcosmarxm I fixed the tests and updated the code. I also added extra tests to illustrate the issue.
759c307 to 33c1ab5
See individual comments
@ashb I corrected the code and added an explanation of why the new empty method is needed inside
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
Re-triggering CI.
Awesome work, congrats on your first merged pull request!
Use the requests session.request factory for HTTP request initiation; this picks up environment variables and sensible defaults for requests.
Also use the verify option only if it is provided to the run method, as the requests library already defaults to True.
Backstory for this PR:
Our organization uses firewalls and custom SSL certificates to communicate between systems; this can be achieved via the CURL_CA_BUNDLE and REQUESTS_CA_BUNDLE environment variables.
The requests library takes both into account and uses them as the default value for the verify option when sending a request to a remote system.
The current implementation sets verify to True, which overwrites these defaults, and as a result requests cannot be made due to SSL verification issues. This PR fixes the problem.
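The "only if it is provided" rule for verify can be sketched as follows. This is an illustrative stand-alone helper, not the hook's actual code: the name build_send_options is hypothetical, and the other option defaults are copied from the diff hunk in the review thread.

```python
from typing import Any, Dict


def build_send_options(extra_options: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: collect request options from extra_options.

    verify is forwarded only when the caller set it explicitly, so that
    requests can otherwise fall back to its own defaults and environment.
    """
    options: Dict[str, Any] = {
        "stream": extra_options.get("stream", False),
        "proxies": extra_options.get("proxies", {}),
        "cert": extra_options.get("cert"),
        "timeout": extra_options.get("timeout"),
        "allow_redirects": extra_options.get("allow_redirects", True),
    }
    if "verify" in extra_options:
        # Overwrite verify only if the caller asked for it.
        options["verify"] = extra_options["verify"]
    return options


# No verify key at all when the caller did not provide one.
print(build_send_options({}))
# An explicit value is passed through unchanged.
print(build_send_options({"verify": False}))
```

Leaving the key out, rather than defaulting it to True, is what lets a configured CA bundle take effect.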