
[SPARK-17200][PROJECT INFRA][BUILD][SPARKR] Automate building and testing on Windows (currently SparkR only) #14859

Closed · wants to merge 32 commits

Conversation

@HyukjinKwon (Member) commented Aug 29, 2016

What changes were proposed in this pull request?

This PR adds the build automation on Windows with AppVeyor CI tool.

Currently, this only runs the tests for SparkR, as we have been having some issues with testing Windows-specific PRs (e.g. #14743 and #13165) and a hard time verifying them.

One concern is that this build depends on steveloughran/winutils (maintained by a Hadoop PMC member) for the pre-built Hadoop bin package.

How was this patch tested?

Manually, https://ci.appveyor.com/project/HyukjinKwon/spark/build/88-SPARK-17200-build-profile
This takes roughly 40 mins.

Some tests were already failing; this was found in #14743 (comment) and fixed in SPARK-17339.

@HyukjinKwon (Member, Author) commented Aug 29, 2016

cc @rxin
@srowen (for build)
@JoshRosen (for project infra)
@dongjoon-hyun (who suggested AppVeyor CI)
@steveloughran (who is the author of winutils)
@felixcheung and @shivaram (for SparkR)

@srowen (Member) commented Aug 29, 2016

Hm, we also had Travis config that isn't used now, to try to add Java style checking. I can see the value in adding Windows testing, but here we have a third CI tool involved. I'm concerned that I for example wouldn't know how to maintain it.

I suppose we also need to decide if Windows is even supported? I don't think it is supported for development, certainly. For deployment -- best effort is my understanding, but may not work.

If this relies on a bunch of setup to run (including needing a sorta unofficial copy of Hadoop's winutils) then does testing it this way say much about how it works on Windows?

@HyukjinKwon (Member, Author) commented Aug 29, 2016

Ah, I thought Windows was already officially supported, judging from this documentation: https://github.com/apache/spark/blob/master/docs/index.md#downloading

BTW, I do understand your concerns, but I believe this will make it easier to review Windows-specific changes. At the least we can identify Windows-specific problems easily, and as you already know, it is currently hard to review PRs for Windows-specific problems.

I wouldn't mind if this is closed because, at least, I have proposed an automated build on Windows here, so reviewers can use it after manually (locally) merging it anyway.

My personal opinion, though, is to try to use this, as it does not affect the code base or other builds.

@srowen (Member) commented Aug 29, 2016

Good point, I suppose there is a weak promise there that it runs on Windows.
Could anyone else who knows Windows weigh in? I assume @dongjoon-hyun is on board.

@shivaram (Contributor)

Thanks @HyukjinKwon - I for one would at least like the SparkR client to work on Windows as I think there are quite a few R users who are on Windows.

@HyukjinKwon Could you add some documentation on how this build works, where we can check its status, and what we can do to restart a build, etc.?

@JoshRosen (Contributor)

It looks like a few other Apache projects use AppVeyor, and there's a designated process for asking INFRA to enable it for a GitHub mirror. For example, Thrift uses it: https://ci.appveyor.com/project/ApacheSoftwareFoundation/thrift

One concern of mine is whether the large number of Spark pull requests and commits would overwhelm Apache's AppVeyor account limits if every one of those builds were to be tested with AppVeyor.

Another minor concern of mine relates to GitHub build statuses: if AppVeyor reports its statuses via the GitHub commit status API but AMPLab Jenkins continues to report only via comments then the build status indicators on the pull request page would look confusing.

It would be cool if there was a way to have this enabled by default for the master branch but have it be opt-in for pull requests.

@shivaram (Contributor)

Good point on only running this on the master branch. We could even run it periodically (say nightly) instead of on every commit.

Pop-Location

# ========================== R
$rVer = "3.3.0"
Review comment (Member):

try 3.3.1?

@SparkQA commented Aug 29, 2016

Test build #64556 has finished for PR 14859 at commit 1f23b05.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

Thanks all. Then, let me try to write some documentation covering:

  • How this build works
  • Where we can check its status
  • What we can do to restart a build
  • Options to enable/disable the build for branch, PR and commits.

@HyukjinKwon (Member, Author) commented Aug 30, 2016

How to set up

  1. Sign up at https://ci.appveyor.com (screenshots omitted)
  2. Go to your profile
  3. Enable the link with GitHub
  4. Add a project (go to the PROJECTS menu and then add a new project)
  5. Click the GitHub project list and then select the repository
  6. Push any commit and check that the build is running (click the PROJECTS menu, then the Spark project)

How to re-build/stop/check the builds

  1. Click PROJECTS and then the Spark project
  2. Check the build status
  3. Re-build
  4. Stop a build

How this build works

Identically to Travis CI: a build is triggered per commit.
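For reference, a minimal appveyor.yml sketch of this per-commit setup; the script path and Maven profiles below are the ones discussed in this thread, but treat the exact values as illustrative rather than the final merged config:

```yaml
# Illustrative sketch of the AppVeyor config discussed in this PR.
# AppVeyor builds every pushed commit by default.
version: "{build}-{branch}"

install:
  # Install Maven, the Hadoop winutils package and other dependencies
  - ps: .\dev\appveyor-install-dependencies.ps1
  # R package required for the SparkR unit tests
  - cmd: R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"

build_script:
  # Build with the SparkR-related profiles, skipping the JVM tests
  - cmd: mvn -DskipTests -Psparkr -Phive -Phive-thriftserver package
```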

Options to enable/disable the build for branches, PRs and commits

  1. Master branch only: click Settings and set the branch to build (screenshots omitted).

  2. Run the build periodically.

  3. Select commits to build (some PRs): it seems we can filter the commits that trigger a build, as described here. If we enable Rolling builds and filter the commits, I think it'd be okay.

  4. Disable builds for pull requests (no PRs); this is on the GitHub side: open Spark's GitHub project settings, click Webhooks & services, click https://ci.appveyor.com/api/github/webhook (pull_request and push), and disable the webhook for pull requests.

BTW, there is a getting-started guide here: https://www.appveyor.com/docs
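For option 3 above, the commit filtering could be expressed in appveyor.yml with only_commits; a sketch, where the opt-in token [appveyor] is just an example rather than something this PR settled on:

```yaml
# Build only commits that opt in via the commit message,
# e.g. "Fix SparkR test [appveyor]".
only_commits:
  message: /\[appveyor\]/
```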

@HyukjinKwon (Member, Author) commented Aug 30, 2016

To cut it short, my suggestion is,

If it sounds good, I will go ahead and test, but before proceeding it'd be great to hear other opinions. I think it'd be okay just to filter commits via message.

WDYT? @srowen @JoshRosen @shivaram @felixcheung (I am on holiday until tomorrow, so I would like to test this environment quickly.)

@SparkQA commented Sep 4, 2016

Test build #64922 has finished for PR 14859 at commit ccf176d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor) commented Sep 7, 2016

@HyukjinKwon Given that the build is green after #14960, is this good to go? Any other comments, @dongjoon-hyun @srowen?

@dongjoon-hyun (Member)

Hi, @shivaram and @HyukjinKwon. LGTM.

@HyukjinKwon (Member, Author)

Thank you! I also think it's good to go.

- cmd: R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"

build_script:
- cmd: mvn -DskipTests -Psparkr -Phive -Phive-thriftserver package
Review comment (Contributor):

Given that we are fixing the hadoop version that we download in install-dependencies.ps1 does it make sense to also pass in the same profile here ? i.e. -Phadoop-2.6 ?
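If we pin the Hadoop version in install-dependencies.ps1, the suggested change would make the build line look roughly like this; a sketch of the reviewer's suggestion, not the merged config:

```yaml
build_script:
  # Match the Hadoop version fixed in the dependency-install script
  - cmd: mvn -DskipTests -Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver package
```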

@shivaram (Contributor) commented Sep 8, 2016

LGTM pending the inline comment about hadoop-2.6

@HyukjinKwon (Member, Author)

Please excuse me for uploading an image here (so that I can use it in the MD). (screenshot omitted)

@shivaram (Contributor) commented Sep 8, 2016

Thanks. I'll merge this once Jenkins passes.

@felixcheung (Member) commented Sep 8, 2016

+@steveloughran, since this is running his tool.
How do we get alerted when this fails?
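One way to get alerted, assuming we rely on AppVeyor's built-in notifications; a sketch, where the recipient address is a placeholder rather than something decided in this thread:

```yaml
notifications:
  - provider: Email
    to:
      - dev@spark.apache.org   # placeholder recipient
    on_build_success: false
    on_build_failure: true
    on_build_status_changed: true
```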

# Install maven and dependencies
- ps: .\dev\appveyor-install-dependencies.ps1
# Required package for R unit tests
- cmd: R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"
Review comment (Member):

Might want to include e1071 and survival for a few more compatibility tests (see the DESCRIPTION file).

Review comment (Member):

We have been broken by newer versions of testthat before; not sure if we should pin the version we run with here to match Jenkins?

Review comment (Member, Author):

Ah, thanks! I will.

Review comment (Contributor):

That's a good point, but it's actually tricky to specify a version number using install.packages. FWIW, on Jenkins I see:

> packageVersion("testthat")
[1] ‘1.0.2’

Review comment (Member):

True; maybe just print packageVersion into the log, in case it breaks.
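Logging the installed versions could be done right after the install step; a sketch of the appveyor.yml lines, with the package list taken from the discussion above:

```yaml
install:
  - cmd: R -e "install.packages(c('testthat', 'e1071', 'survival'), repos='http://cran.us.r-project.org')"
  # Print the versions actually installed, in case a new release breaks the tests
  - cmd: R -e "packageVersion('testthat'); packageVersion('e1071'); packageVersion('survival')"
```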

@SparkQA commented Sep 8, 2016

Test build #65072 has finished for PR 14859 at commit 8e1954b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 8, 2016

Test build #65073 has finished for PR 14859 at commit fe9419d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

I just added both survival and e1071. It also now prints the versions of both.

I re-ran the test here - https://ci.appveyor.com/project/HyukjinKwon/spark/build/88-SPARK-17200-build-profile

@HyukjinKwon (Member, Author)

BTW, I already cc'ed the author of winutils in #14859 (comment) :).

@SparkQA commented Sep 8, 2016

Test build #65083 has finished for PR 14859 at commit 9c06b92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member, Author)

retest this please

@SparkQA commented Sep 8, 2016

Test build #65095 has finished for PR 14859 at commit 9c06b92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram (Contributor) commented Sep 8, 2016

Alright, I'm merging this into master. Let's see how this goes over the next few days; we can fine-tune it with some follow-up PRs if necessary.

@asfgit asfgit closed this in 78d5d4d Sep 8, 2016
@HyukjinKwon HyukjinKwon deleted the SPARK-17200-build branch January 2, 2018 03:39
7 participants