@ShelRoman
Contributor

@ShelRoman ShelRoman commented Mar 13, 2024

I hit this error when using initContainers with a Spark job after upgrading apache-airflow-providers-cncf-kubernetes to 8.0.1 (also reproducible on 8.0.0).

[2024-03-13, 11:36:49 UTC] {custom_object_launcher.py:312} ERROR - Exception when attempting to create spark job
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 305, in start_spark_job
    self.check_pod_start_failure()
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 348, in check_pod_start_failure
    raise AirflowException(f"Spark Job Failed. Status: {waiting_reason}, Error: {waiting_message}")
airflow.exceptions.AirflowException: Spark Job Failed. Status: PodInitializing, Error: None
[2024-03-13, 11:36:49 UTC] {taskinstance.py:2728} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 439, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 265, in execute
    self.pod = self.get_or_create_spark_crd(self.launcher, context)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 223, in get_or_create_spark_crd
    driver_pod, spark_obj_spec = launcher.start_spark_job(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 313, in start_spark_job
    raise e
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 305, in start_spark_job
    self.check_pod_start_failure()
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/custom_object_launcher.py", line 348, in check_pod_start_failure
    raise AirflowException(f"Spark Job Failed. Status: {waiting_reason}, Error: {waiting_message}")
airflow.exceptions.AirflowException: Spark Job Failed. Status: PodInitializing, Error: None

Using a patched operator with these changes resolved the issue.
There may be other statuses that should be included in this check.
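
For readers without the diff at hand, here is a minimal sketch of the idea, not the provider's actual code. Kubernetes reports the waiting reason PodInitializing on a pod's main containers while its init containers are still running, so a start-failure check that raises on that reason fails every job that uses initContainers. The names `TRANSIENT_WAITING_REASONS`, `SparkJobStartError`, and `check_waiting_state` are hypothetical, and the set of tolerated reasons is an assumption based on the traceback above.

```python
from typing import Optional

# Assumption: waiting reasons that mean "still starting up" rather than "failed".
# "PodInitializing" is the reason seen in the traceback; "ContainerCreating" is
# included here as another normal startup state.
TRANSIENT_WAITING_REASONS = frozenset({"ContainerCreating", "PodInitializing"})


class SparkJobStartError(Exception):
    """Hypothetical stand-in for AirflowException, so the sketch runs without Airflow."""


def check_waiting_state(waiting_reason: Optional[str], waiting_message: Optional[str]) -> None:
    """Raise only when the driver container is waiting for a non-transient reason."""
    if waiting_reason is None:
        # The container is not in a waiting state, so there is nothing to check.
        return
    if waiting_reason in TRANSIENT_WAITING_REASONS:
        # Init containers are still running or the image is being set up: not a failure.
        return
    raise SparkJobStartError(
        f"Spark Job Failed. Status: {waiting_reason}, Error: {waiting_message}"
    )
```

With this shape, `check_waiting_state("PodInitializing", None)` returns quietly, while a reason such as `ImagePullBackOff` still raises. Keeping the tolerated reasons in one set makes it easy to extend if other statuses turn out to need the same treatment.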

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Mar 13, 2024
@eladkal
Contributor

eladkal commented Mar 13, 2024

Can you add a unit test to avoid regression?

@ShelRoman
Contributor Author

Can you add a unit test to avoid regression?

Sure, will do

Contributor

@amoghrajesh amoghrajesh left a comment

The change looks good; please add a few tests.

@eladkal
Contributor

eladkal commented Apr 3, 2024

@ShelRoman kind reminder, can you please add tests to cover this change?

@ShelRoman
Contributor Author

@ShelRoman kind reminder, can you please add tests to cover this change?

done

@ShelRoman ShelRoman requested a review from amoghrajesh April 16, 2024 13:20
@eladkal eladkal requested a review from romsharon98 May 1, 2024 03:13
Contributor

@amoghrajesh amoghrajesh left a comment

Looks good to me +1
@eladkal @hussein-awala WDYT?
