
[SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition #36940

Closed (3 commits)

Conversation

@HyukjinKwon (Member)

@HyukjinKwon HyukjinKwon commented Jun 21, 2022

What changes were proposed in this pull request?

This PR borrows the idea from #36928, but adds some more changes so that scheduled jobs share the precondition job and all conditional logic is consolidated there.

This PR also adds a new option to is-changed.py so dependent modules can be checked together. That way, we don't have to change build_and_test.yml as often when we add a new module.

In addition, this PR removes `type` because the `precondition` job now replaces it.

Lastly, this PR enables the PySpark, SparkR, TPC-DS, and Docker integration tests for scheduled jobs when applicable.

Closes #36928
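For illustration only, the consolidated pattern looks roughly like the following minimal sketch. This is not the actual workflow; the job names, step ids, and output names here are assumed:

```yaml
jobs:
  precondition:
    runs-on: ubuntu-latest
    outputs:
      pyspark: ${{ steps.decide.outputs.pyspark }}
    steps:
      - uses: actions/checkout@v3
      - id: decide
        # dev/is-changed.py prints "true"/"false" depending on changed files.
        run: echo "pyspark=$(./dev/is-changed.py -m pyspark)" >> "$GITHUB_OUTPUT"

  pyspark-tests:
    needs: precondition
    # The downstream job runs only when the precondition says so.
    if: needs.precondition.outputs.pyspark == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "run PySpark tests here"
```

Each downstream job then needs only a `needs: precondition` plus an `if:` on its outputs, instead of duplicating the selection logic.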

Why are the changes needed?

To make it easier to read.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

Tested locally and in my fork (https://github.com/HyukjinKwon/spark/actions)

@HyukjinKwon HyukjinKwon marked this pull request as draft June 21, 2022 11:27

@EnricoMi (Contributor) left a comment


Nice how this removes the need for that extra layer of logic.

Review comments (resolved) on:
- .github/workflows/build_and_test.yml
- .github/workflows/build_java11.yml
- .github/workflows/build_hadoop2.yml
@dongjoon-hyun (Member) left a comment


Thank you for doing this. In general, the direction looks correct and good for the future.
BTW, how do we validate this PR?

@EnricoMi (Contributor)

I found it best to test by removing `if "root" in test_modules` from dev/is-changed.py, so that changes from this PR are not considered by the CI in this PR. Then, making some changes in spark or pyspark should trigger only those modules to be built.

The scheduled jobs are best tested by declaring this branch the default branch, so that GitHub runs them as defined here.

Besides that, it can only be tested from master, sadly.

@HyukjinKwon (Member, Author)

> BTW, how do we validate this PR?

I will test locally, and try to run the scheduled jobs in my fork with some manual changes (to allow running in my fork). That's not perfect but would still test reasonably enough before getting this in.

@HyukjinKwon HyukjinKwon force-pushed the SPARK-39529 branch 4 times, most recently from bd3b11a to 4f1107b Compare June 22, 2022 10:00
@HyukjinKwon HyukjinKwon marked this pull request as ready for review June 22, 2022 10:15
```shell
# Collect the names of all pyspark* modules from the test-support metadata.
pyspark_modules=$(cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))")
# Ask dev/is-changed.py whether each suite is affected by the changed files.
pyspark=$(./dev/is-changed.py -m "$pyspark_modules")
sparkr=$(./dev/is-changed.py -m sparkr)
tpcds=$(./dev/is-changed.py -m sql)
```
@HyukjinKwon (Member, Author)

In fact, we don't need any change in the script. The script already checks the dependent modules.
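The dependent-module check mentioned above can be sketched like this. This is a simplified stand-in for what dev/is-changed.py and sparktestsupport do, and the module graph below is invented purely for illustration:

```python
def dependents_of(module, graph):
    """Return module plus everything that (transitively) depends on it.

    graph maps each module name to the list of modules it depends on, so a
    change in `module` means every returned module must be re-tested.
    """
    affected = {module}
    frontier = [module]
    while frontier:
        current = frontier.pop()
        for name, deps in graph.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected

# Invented example graph: value = modules the key depends on.
graph = {
    "core": [],
    "sql": ["core"],
    "pyspark-sql": ["sql"],
    "sparkr": ["sql"],
}

# A change in "core" affects every module that (transitively) reaches it.
print(sorted(dependents_of("core", graph)))
```

With this shape, asking "is module X changed?" automatically pulls in everything downstream of X, which is why no extra script change was needed.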

@HyukjinKwon (Member, Author)

HyukjinKwon commented Jun 22, 2022

I tested all the possible scenarios (as far as I can tell). Should be good to go.

@EnricoMi (Contributor) left a comment


LGTM!

@HyukjinKwon (Member, Author)

Will merge this tomorrow morning in my time (KST) .... so I can monitor and fix it if something goes wrong ...

@dongjoon-hyun (Member)

Merged to master.

@HyukjinKwon (Member, Author)

Thanks!

HyukjinKwon added a commit that referenced this pull request Jul 5, 2022
…once

### What changes were proposed in this pull request?

This PR proposes to skip the builds that have never passed in history. #36940 added some more combinations to check if they passed, and this PR disables some of them that do not pass.

See also:
- SPARK-39681 (https://github.com/apache/spark/runs/7189972241?check_suite_focus=true)
- SPARK-39685 (https://github.com/apache/spark/runs/7189971643?check_suite_focus=true)
- SPARK-39682 (https://github.com/apache/spark/runs/7189971505?check_suite_focus=true)
- SPARK-39684 (https://github.com/apache/spark/runs/7054055518?check_suite_focus=true)

### Why are the changes needed?

In order to check the test status easily with each combination of profile and branch.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

These are virtually configuration-only changes. Will be checked after it's merged.

Closes #37091 from HyukjinKwon/SPARK-39686.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Dec 17, 2022
### What changes were proposed in this pull request?

This PR is a followup of #36940; it fixes a mistake made when replacing `inputs.type`.

This PR changes it to use `PYSPARK_CODECOV`, which is defined at https://github.com/apache/spark/blob/353df517f4d37aabbf807638385b0ab92dd95397/.github/workflows/build_coverage.yml#L39.

A similar approach is used in https://github.com/apache/spark/blob/353df517f4d37aabbf807638385b0ab92dd95397/.github/workflows/build_ansi.yml#L39 together with https://github.com/apache/spark/blob/919e556d182f86b3e1e22cd05a546e0f2fc3e307/.github/workflows/build_and_test.yml#L797
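The environment-variable gating can be sketched as follows. This is an assumption-laden illustration, not the real Spark test-runner code; the helper name and the `runner_mode` variable are invented:

```python
import os

def coverage_enabled() -> bool:
    # Assumption: the scheduled coverage workflow exports PYSPARK_CODECOV=true,
    # while regular PR builds leave it unset, so coverage defaults to off.
    return os.environ.get("PYSPARK_CODECOV", "false").lower() == "true"

# Illustrative use: pick a runner mode based on the flag (name invented).
runner_mode = "coverage" if coverage_enabled() else "normal"
```

Gating on a workflow-defined environment variable like this avoids threading an `inputs.type`-style parameter through every job.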

### Why are the changes needed?

To recover the Codecov site. It stopped working for a few months (https://app.codecov.io/gh/apache/spark).

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Will monitor the test cases. I am sure it works :-)

Closes #39107 from HyukjinKwon/SPARK-41559.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the SPARK-39529 branch January 15, 2024 00:53