add container_name option for SparkKubernetesSensor #26560

Merged
11 commits merged on Oct 10, 2022

Contributor

@hanna-liashchuk hanna-liashchuk commented Sep 21, 2022

closes: #18468
closes: #23114

If a SparkApplication has a sidecar container, SparkKubernetesSensor fails to retrieve logs for the application because the container name is not specified. According to these constants, the Spark driver container is always named "spark-kubernetes-driver".

The error message looks like this:

[2022-09-20, 17:06:41 EEST] {{spark_kubernetes.py:89}} WARNING - Could not read logs for pod XXXX. It may have been disposed.
Make sure timeToLiveSeconds is set on your SparkApplication spec.
underlying exception: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '86d533f0-a61e-4126-a4fa-a11d03c40ec0', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 20 Sep 2022 14:06:41 GMT', 'Content-Length': '268'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"a container name must be specified for pod XXXX, choose one of: [spark-kubernetes-driver minio-sidekick]","reason":"BadRequest","code":400}\n'

[2022-09-20, 17:06:41 EEST] {{spark_kubernetes.py:115}} INFO - Spark application ended successfully
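
To illustrate the failure and the fix, here is a minimal sketch rather than the actual change in this PR. `FakePodLogApi` is a hypothetical stand-in for a Kubernetes client: it only imitates the 400 response above for pods with more than one container. The helper `read_driver_log` and the pod names are also assumptions; the default container name is the one quoted in the description. The real Kubernetes Python client's `read_namespaced_pod_log` accepts a `container` argument in the same way.

```python
from typing import Dict, Optional

# Driver container name quoted in the PR description.
DEFAULT_DRIVER_CONTAINER = "spark-kubernetes-driver"


class FakePodLogApi:
    """Hypothetical stand-in for a Kubernetes client (not the real API)."""

    def __init__(self, pods: Dict[str, Dict[str, str]]) -> None:
        # Maps pod name -> {container name -> log text}.
        self._pods = pods

    def read_namespaced_pod_log(
        self, name: str, namespace: str, container: Optional[str] = None
    ) -> str:
        containers = self._pods[name]
        if container is None:
            if len(containers) > 1:
                # Imitates the "Bad Request" message from the log above.
                raise ValueError(
                    f"a container name must be specified for pod {name}, "
                    f"choose one of: [{' '.join(containers)}]"
                )
            container = next(iter(containers))
        return containers[container]


def read_driver_log(
    api: FakePodLogApi,
    pod_name: str,
    namespace: str,
    container_name: str = DEFAULT_DRIVER_CONTAINER,
) -> str:
    """Always name the container so that sidecars do not make the request ambiguous."""
    return api.read_namespaced_pod_log(
        name=pod_name, namespace=namespace, container=container_name
    )
```

With a pod that runs both `spark-kubernetes-driver` and `minio-sidekick`, calling `read_driver_log(api, pod_name, namespace)` returns the driver log, while a request without a container name raises the ambiguity error.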


@boring-cyborg

boring-cyborg bot commented Sep 21, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@hanna-liashchuk
Contributor Author

hi @jedcunningham, could you please take a look at my PR?

@eladkal eladkal changed the title from "[ISSUE-18468] add container_name option for SparkKubernetesSensor" to "add container_name option for SparkKubernetesSensor" on Sep 23, 2022
@bbenzikry
Contributor

bbenzikry commented Oct 5, 2022

Hi @hanna-liashchuk, thanks for taking the time to work on this.

One thing in this regard is that the container name should probably be overridable.
This will be good for custom Spark distros, while providing a bit of future-proofing in case the default does change upstream.

@hanna-liashchuk
Contributor Author

@bbenzikry, I was aiming to do it as a parameter for SparkKubernetesSensor, by analogy with the other input parameters, so the user can specify it if it's different from the default. Am I missing some part of this implementation? Please advise.
Thanks

@bbenzikry
Contributor

@hanna-liashchuk, what I meant by the comment isn't necessarily a code issue or a blocker for the PR (sorry if that was the implication), but just something that should be taken into consideration.
For example, one thing I thought of was that, given a different container name, if there's an error, some output should be added to explain the reason, as a helpful indicator to the sensor user.
In addition, a differing container name should probably be reflected in a test case.
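
As a rough sketch of both suggestions, and not the sensor's actual code or test suite, the snippet below uses a hypothetical `fetch_logs` helper that names the requested container in its warning, plus two test functions built on `unittest.mock`. The `read_log` callable, the logger name, and the `custom-driver` container name are all assumptions made for illustration.

```python
import logging
from unittest import mock

log = logging.getLogger("spark_sensor_sketch")  # hypothetical logger name


def fetch_logs(read_log, pod_name, container_name="spark-kubernetes-driver"):
    """Return the log text, or None after warning about the requested container."""
    try:
        return read_log(pod_name, container=container_name)
    except Exception as exc:
        # Naming the container tells the user which name failed to resolve.
        log.warning(
            "Could not read logs for pod %s, container %s: %s",
            pod_name,
            container_name,
            exc,
        )
        return None


def test_custom_container_name_is_forwarded():
    read_log = mock.Mock(return_value="driver output")
    assert fetch_logs(read_log, "pod-1", container_name="custom-driver") == "driver output"
    read_log.assert_called_once_with("pod-1", container="custom-driver")


def test_failure_names_the_container():
    read_log = mock.Mock(side_effect=RuntimeError("container not found"))
    with mock.patch.object(log, "warning") as warning:
        assert fetch_logs(read_log, "pod-1", container_name="custom-driver") is None
    # The container name is passed as a formatting argument of the warning.
    assert "custom-driver" in warning.call_args.args
```

The first test covers a non-default container name being passed through unchanged; the second checks that a failure produces a warning mentioning that name.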

Member

@potiuk potiuk left a comment


Random error - merging.

@potiuk potiuk merged commit 5c97e5b into apache:main Oct 10, 2022
@boring-cyborg

boring-cyborg bot commented Oct 10, 2022

Awesome work, congrats on your first merged pull request!

Labels
area:providers provider:cncf-kubernetes Kubernetes provider related issues
Projects
None yet
3 participants