
DAGs log view of EmptyOperator shows confusing error Request URL missing protocol. #34228

Closed
2 tasks done
VergeDX opened this issue Sep 9, 2023 · 10 comments · Fixed by #35536
Labels
affected_version:2.7 Issues Reported for 2.7 area:core area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues good first issue kind:bug This is a clearly a bug

Comments

@VergeDX
Contributor

VergeDX commented Sep 9, 2023

Apache Airflow version

2.7.1

What happened

In the Airflow log view (DAG -> task -> Log) of an EmptyOperator task, the following error is shown:

*** Could not read served logs: Request URL is missing an 'http://' or 'https://' protocol.

Full traceback from the console:

 webserver | [2023-09-09T09:26:47.632+0800] {file_task_handler.py:524} ERROR - Could not read served logs
 webserver | Traceback (most recent call last):
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
 webserver | yield
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 218, in handle_request
 webserver | resp = self._pool.handle_request(req)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 214, in handle_request
 webserver | raise UnsupportedProtocol(
 webserver | httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
 webserver |
 webserver | The above exception was the direct cause of the following exception:
 webserver |
 webserver | Traceback (most recent call last):
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/airflow/utils/log/file_task_handler.py", line 507, in _read_from_logs_server
 webserver | response = _fetch_logs_from_service(url, rel_path)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/airflow/utils/log/file_task_handler.py", line 90, in _fetch_logs_from_service
 webserver | response = httpx.get(
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_api.py", line 189, in get
 webserver | return request(
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_api.py", line 100, in request
 webserver | return client.request(
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_client.py", line 814, in request
 webserver | return self.send(request, auth=auth, follow_redirects=follow_redirects)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_client.py", line 901, in send
 webserver | response = self._send_handling_auth(
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_client.py", line 929, in _send_handling_auth
 webserver | response = self._send_handling_redirects(
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_client.py", line 966, in _send_handling_redirects
 webserver | response = self._send_single_request(request)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_client.py", line 1002, in _send_single_request
 webserver | response = transport.handle_request(request)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 217, in handle_request
 webserver | with map_httpcore_exceptions():
 webserver | File "/nix/store/bc45k1n0pkrdkr3xa6w84w1xhkl1kkyp-python3-3.10.12/lib/python3.10/contextlib.py", line 153, in __exit__
 webserver | self.gen.throw(typ, value, traceback)
 webserver | File "/persistent/Works/apache-airflow/venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
 webserver | raise mapped_exc(message) from exc
 webserver | httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.

What you think should happen instead

In older versions of Airflow (I tested 2.6.0), the log view of an EmptyOperator task shows nothing.
I'm not sure whether this behavior is documented, but the new error confused me at least.

How to reproduce

  1. I used a Python venv with the wheel (whl) from GitHub Releases for installation.
  2. Run airflow standalone for a development deployment; I also disabled load_examples.
  3. Use an EmptyOperator DAG like the one below; I also disabled catchup for the DAG:
import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="demo",
    start_date=datetime.datetime(2021, 1, 1),
    schedule="@daily",
    catchup=False,
):
    EmptyOperator(task_id="demo")
  4. Enable the DAG in the Airflow webserver and wait for it to run.
  5. Navigate to the demo DAG and the demo task, then open the Log view from the toolbar.

Operating System

NixOS 23.05 (Stoat)

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.7.1
apache-airflow-providers-ftp==3.5.1
apache-airflow-providers-http==4.5.1
apache-airflow-providers-imap==3.3.1
apache-airflow-providers-sqlite==3.4.3

Deployment

Virtualenv installation

Deployment details

Our internal enterprise deployment, which uses a modified docker-compose setup, shows the same issue on version 2.7.0.
Our older 2.5.2 venv deployment works fine: the EmptyOperator log view shows nothing, as expected.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@VergeDX VergeDX added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Sep 9, 2023
@VergeDX
Contributor Author

VergeDX commented Sep 9, 2023

I will keep testing with venv deployments. So far I have tested:

Version   Installed from       Result
2.6.0     nixpkgs              Pass
2.7.0     nixpkgs              Error
2.7.1     GitHub Release whl   Error
2.6.3     GitHub Release whl   Error
2.6.1     GitHub Release whl   Error

So this "bug" was introduced between 2.6.0 and 2.6.1.
I think I should check the release notes and run git blame...

@VergeDX VergeDX changed the title DAGs log view of EmptyOperator shows confusing error in airflow version [2.6.0, 2.7.0). DAGs log view of EmptyOperator shows confusing error Request URL missing protocol. Sep 9, 2023
@jscheffl jscheffl added area:webserver Webserver related Issues good first issue area:UI Related to UI/UX. For Frontend Developers. and removed needs-triage label for new issues that we didn't triage yet labels Sep 10, 2023
@jscheffl
Contributor

You are right: EmptyOperator is never executed, so there are no logs. I agree that an error message might be confusing to users.

@VergeDX
Contributor Author

VergeDX commented Sep 11, 2023

EmptyOperator behaves the same in 2.6.0 and 2.6.1 and will not create the corresponding logs folder.

@WatchTower001110 WatchTower001110 mentioned this issue Sep 11, 2023
@Jyoticharan

I am very interested in working on this issue; please assign it to me.

@eladkal eladkal added the affected_version:2.7 Issues Reported for 2.7 label Sep 13, 2023
@VergeDX
Contributor Author

VergeDX commented Sep 14, 2023

I hooked the function _fetch_logs_from_service to capture the URL it requests, and the following demo reproduces the same error:

url = 'http://:8793/log/dag_id=my_dag_name/run_id=manual__2023-09-14T02:35:48.490841+00:00/task_id=task/attempt=1.log'

if __name__ == '__main__':
    import httpx
    httpx.get(url)

error:

Traceback (most recent call last):
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_transports/default.py", line 218, in handle_request
    resp = self._pool.handle_request(req)
  File "/nix/store/yam9n86x37blnhz1rcpv8aq0zgxn64f3-python3.10-httpcore-0.15.0/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 208, in handle_request
    raise UnsupportedProtocol(
httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Repos/airflow/demo.py", line 5, in <module>
    httpx.get(url)
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_api.py", line 189, in get
    return request(
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_api.py", line 100, in request
    return client.request(
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_client.py", line 815, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_client.py", line 902, in send
    response = self._send_handling_auth(
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_client.py", line 930, in _send_handling_auth
    response = self._send_handling_redirects(
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_client.py", line 967, in _send_handling_redirects
    response = self._send_single_request(request)
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_client.py", line 1003, in _send_single_request
    response = transport.handle_request(request)
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_transports/default.py", line 217, in handle_request
    with map_httpcore_exceptions():
  File "/nix/store/zn2g96d0hhk5h8x7982m2gbbawgwsrvz-python3-3.10.11/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/nix/store/sn5yz2ac4qzq8gia029883sjjp8060g9-python3.10-httpx-0.23.0/lib/python3.10/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.
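A minimal sketch of how a URL like the one captured above can arise. It assumes, hypothetically, that the served-logs URL is formatted from the task instance's hostname and the log server port; `build_served_logs_url` is an illustrative name, not Airflow's code. With an empty hostname the host part of the URL vanishes, and the standard library's `urllib.parse.urlsplit` confirms there is no host left to connect to:

```python
from urllib.parse import urlsplit


def build_served_logs_url(hostname: str, port: int, rel_path: str) -> str:
    """Hypothetical helper: format a served-logs URL from host, port and path.

    Assumption: mirrors the shape of the captured URL,
    http://<hostname>:<port>/log/<relative log path>.
    """
    return f"http://{hostname}:{port}/log/{rel_path}"


rel_path = (
    "dag_id=my_dag_name/run_id=manual__2023-09-14T02:35:48.490841+00:00/"
    "task_id=task/attempt=1.log"
)

# An EmptyOperator task never runs on a worker, so its hostname stays "".
url = build_served_logs_url("", 8793, rel_path)
parts = urlsplit(url)

print(url)
print(repr(parts.netloc))  # ':8793' (only the port survives)
print(parts.hostname)      # None: there is no host to connect to
print(parts.port)          # 8793
```

The resulting string is exactly the URL captured from _fetch_logs_from_service, which is why httpx rejects it before any network call is made.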

@VergeDX
Contributor Author

VergeDX commented Sep 14, 2023

Also, RFC 1738 states that the host part of an HTTP URL cannot be omitted:
https://datatracker.ietf.org/doc/html/rfc1738
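A sketch of a pre-flight check in that spirit, using only the standard library: before requesting served logs, verify that the URL has an http/https scheme and a non-empty host, and skip the request otherwise. `is_fetchable_http_url` is a hypothetical name, not part of Airflow or httpx.

```python
from urllib.parse import urlsplit


def is_fetchable_http_url(url: str) -> bool:
    """Return True only if url has an http(s) scheme and a non-empty host.

    Hypothetical guard: RFC 1738 requires a host in an HTTP URL, so a URL
    such as "http://:8793/..." is rejected before any request is made.
    """
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in (
        "http://:8793/log/attempt=1.log",          # empty host (EmptyOperator case)
        "http://worker-1:8793/log/attempt=1.log",  # host present
        "ftp://worker-1/log/attempt=1.log",        # not an HTTP URL
    ):
        print(candidate, is_fetchable_http_url(candidate))
```

Checking `hostname` rather than `netloc` matters here: the netloc ":8793" is non-empty even though the host is missing.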

@vchiapaikeo
Contributor

Happy to pick this up @Jyoticharan if you're no longer working on it / are stuck.

@vchiapaikeo
Contributor

vchiapaikeo commented Nov 8, 2023

This looks like the result of the block at 672ee7f#diff-e7f34f73940eb52d92bb991abedc1c963431c5373c12dff739c8fb7d03e93d3aR324-R333, which was introduced to look for logs elsewhere when no local or remote logs are found. That block is what makes the webserver raise the error the OP reported.


@dstandish / @jedcunningham , wdyt if we introduced a condition to handle EmptyOperator? Or we can even catch a wide exception in the second elif block and pass. Alternatively, we can give EmptyOperator a hostname - it defaults to empty string. This will probably result in 403 though and be even more confusing. Do you see a cleaner solution?
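One way to express such a condition is to skip the served-logs fallback when the task instance has no hostname. The sketch below is a simplified, hypothetical model, not the code in file_task_handler.py and not the fix in #35536: `TaskInstanceStub`, `read_task_logs` and the reader callables are illustrative names. Local and remote logs are tried first, as in the block referenced above; the served-logs fallback runs only when there is a host to ask, so an EmptyOperator task yields an empty log instead of an error.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in for a task instance; only the fields used here."""
    task_id: str
    hostname: str = ""  # empty for tasks that never ran on a worker


LogReader = Callable[[TaskInstanceStub], List[str]]


def read_task_logs(
    ti: TaskInstanceStub,
    read_local: LogReader,
    read_remote: LogReader,
    read_served: LogReader,
) -> List[str]:
    """Try local logs, then remote logs, then the worker's log server."""
    for reader in (read_local, read_remote):
        messages = reader(ti)
        if messages:
            return messages
    # Guard: without a hostname there is no log server to ask, and building
    # "http://:<port>/..." would only produce a confusing protocol error.
    if not ti.hostname:
        return []
    return read_served(ti)


def no_logs(ti: TaskInstanceStub) -> List[str]:
    return []


def failing_served_reader(ti: TaskInstanceStub) -> List[str]:
    raise RuntimeError("served logs should not be requested")


if __name__ == "__main__":
    empty_task = TaskInstanceStub(task_id="demo")
    print(read_task_logs(empty_task, no_logs, no_logs, failing_served_reader))  # []
```

Compared with catching a wide exception, a guard like this keeps genuine log-server failures visible while silencing only the case where no request could ever succeed.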


@eladkal
Contributor

eladkal commented Nov 8, 2023

> @dstandish / @jedcunningham , wdyt if we introduced a condition to handle EmptyOperator? Or we can even catch a wide exception in the second elif block and pass. Alternatively, we can give EmptyOperator a hostname - it defaults to empty string. This will probably result in 403 though and be even more confusing. Do you see a cleaner solution?

Can we simply exclude any operator that has

inherits_from_empty_operator = True

Any operator with this set to True is never sent to the executor.
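The suggestion above can be sketched as a check on the task before any served-logs lookup. This is a hypothetical model: `BaseOperatorStub`, `EmptyOperatorStub`, `MyNoopOperator`, `BashLikeOperator` and `should_fetch_served_logs` are illustrative stand-ins, and only the attribute name `inherits_from_empty_operator` comes from this thread. Because the flag is a class attribute in this sketch, subclasses of the empty operator pick it up through ordinary inheritance.

```python
class BaseOperatorStub:
    """Hypothetical stand-in for an operator base class."""
    # Attribute name taken from the discussion; False for ordinary operators.
    inherits_from_empty_operator = False


class EmptyOperatorStub(BaseOperatorStub):
    """Stand-in for EmptyOperator: never sent to an executor, so no logs."""
    inherits_from_empty_operator = True


class MyNoopOperator(EmptyOperatorStub):
    """A user subclass inherits the flag through normal class inheritance."""


class BashLikeOperator(BaseOperatorStub):
    """An ordinary operator that does run on a worker."""


def should_fetch_served_logs(task: BaseOperatorStub) -> bool:
    """Skip the served-logs request for tasks that never reach an executor."""
    return not task.inherits_from_empty_operator


if __name__ == "__main__":
    for task in (EmptyOperatorStub(), MyNoopOperator(), BashLikeOperator()):
        print(type(task).__name__, should_fetch_served_logs(task))
```

Keying on the flag rather than on an isinstance check against one class means any operator that opts into the empty-operator behavior is covered.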

@vchiapaikeo
Contributor

Sure - here's a draft: #35536
