Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30601][BUILD] Add a Google Maven Central as a primary repository #27307

Closed
wants to merge 4 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jan 21, 2020

What changes were proposed in this pull request?

This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list.

  1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help development more stable but also in order to make Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under .m2.

    • Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing)
    • Maven Central (default) seems less stable but synced quickly.

    We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo.

    val ADDITIONAL_REMOTE_REPOSITORIES =
    buildConf("spark.sql.additionalRemoteRepositories")
    .doc("A comma-delimited string config of the optional additional remote Maven mirror " +
    "repositories. This is only used for downloading Hive jars in IsolatedClientLoader " +
    "if the default Maven Central repo is unreachable.")
    .stringConf
    .createWithDefault(
    "https://maven-central.storage-download.googleapis.com/repos/central/data/")

  2. Currently, we have the hard-corded repository in sbt-pom-reader and this seems overwriting Maven's existing resolver by the same ID central with http:// when initially the pom file is ported into SBT instance. This uses http:// which latently Maven Central disallowed (see [SPARK-30534][INFRA] Use mvn in dev/scalastyle #27242)

    My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it seems using central with https properly. See also [SPARK-30601][BUILD] Add a Google Maven Central as a primary repository #27307 (comment).

    I double checked that we use https properly from the SBT build as well:

    [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ...
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1
    

    This was fixed by adding the same repo ([SPARK-30572][BUILD] Add a fallback Maven repository #27281), central_without_mirror, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback.

  3. While I am here, I fix another issue. Github Action at [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature #27279 is being failed. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS.

    org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12:jar:1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/)
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve     (DefaultPluginDependenciesResolver.java:131)
    

    mvn-scalafmt exists in Maven central:

    $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        ...

    whereas not in GCS mirror:

    $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>%

    In this PR, simply make both repos accessible by adding to pluginRepositories.

  4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second)

Why are the changes needed?

To make the build and Github Action more stable.

Does this PR introduce any user-facing change?

No, dev only change.

How was this patch tested?

I roughly checked local and PR against my fork (HyukjinKwon#2 and HyukjinKwon#3).

@HyukjinKwon
Copy link
Member Author

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jan 21, 2020

Just for clarification, SPARK-30572 and SPARK-30534 are already fixed in the master. This PR addresses Sean's comment, and 3. in the PR description.

pom.xml Outdated
@@ -258,14 +258,12 @@
</snapshots>
</repository>
<repository>
<id>central_without_mirror</id>
<id>google-maven-central</id>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is more like what I was thinking. But does it resolve the hard-coded SBT issue? If so, OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just for clarification, c992716 fixed both hard-coded SBT issue and Github Action issue (SPARK-30572, SPARK-30534) given my tests. This is just a different way to fix both.

@tgravescs
Copy link
Contributor

for the sbt issue, you are saying if it fails from the first maven repo then it falls back to the default version hardcoded in sbt-pom-reader that has http:? So if you have another repo in there it first falls back to try that one. I assume if that one fails then it falls back to the http: one again?

It seems like we should also update the sbt-pom-reader.

@SparkQA
Copy link

SparkQA commented Jan 21, 2020

Test build #117180 has finished for PR 27307 at commit 91bf2c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jan 21, 2020

@tgravescs, just for clarification, this change is needed to address all of four. So, even if sbt-pom-reader fixes the issue, this change here will still be needed to address other bullet points in the PR description.

For the sbt issue, I am a bit hesitant to explain because I am not super confident of my answer so just take it as a speculation or hypotheses. I think:

  1. sbt-pom-reader is loaded as a SBT plugin.
  2. It tries to convert POM file to SBT project structure
  3. During this process, it overwrites the default central maven resolver at here, which is http.
  4. The conversion is done.
  5. Now, we use SBT to resolve the dependencies by the resolvers defined here, which is https://... .

Before the fix, at 3., central is http and there's no other repos to fall back. So it fails.
After the fix, at 3., google-manve-central becomes a fallback so it can handle during 3.

And yes, currently, I think it's an issue in sbt-pom-reader too.

@srowen
Copy link
Member

srowen commented Jan 21, 2020

Oh yeah good point. Even if it fails to access central over http:// , it should try the Google repo then, this way. That would explain the fix I think.

@tgravescs
Copy link
Contributor

Ok, thanks for the explanation, I was just trying to understand how it fixed the SBT issue and what you say makes sense.

can we file a followup jira to try to fix the sbt-pom-reader as well, that way we don't have to wait for it to fail and fall back.

@SparkQA
Copy link

SparkQA commented Jan 21, 2020

Test build #117181 has finished for PR 27307 at commit 81bc576.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practical, this is a less robust approach than the existing one in the following reasons.

  1. The failure rate on Maven Central is much higher than Google Maven Central as we experienced in GitHub Action environment until now. So, in terms of the number of fallback call, Google Maven Central will be used more frequently as a fallback than the number of fallback call on Maven Central in the existing approach.

  2. More importantly, if Maven Central fails at new dependency (like Chill / Lz4) due to its flakiness, Google Maven Central can not provide a fallback because it doesn't have. It will end up a failure. So, the subset (Google Maven Central) cannot be a complete fallback of the superset (Maven Central) in theory.

    This is used as a fallback when central fails.

I'm -1 for this approach because this is a regression in GitHub Action environment.

@srowen
Copy link
Member

srowen commented Jan 21, 2020

Hm, but what's the problem with the fallback -- does it take a long time to time out? If it's rare, does it matter? I haven't seen Maven Central being particularly problematic (of course the http:// thing happens 100% of the time in sbt-pom-reader)

If Maven Central doesn't have an artifact, then no neither will the mirror. But, then the artifact effectively doesn't exist. This isn't a problem we can solve, but also isn't the problem here. That is, for example. chill 0.9.5 has always been on Maven Central since release.

The status quo doesn't have any fallback to Google, so I'm not seeing how that can be more robust?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 21, 2020

@srowen Maven Central fails during transportation due to heavy connections. I know it well because I've introduced GitHub Action here and have been monitoring it. It's not a problem of the absence of the artifact at all. Maven Central is the ground truth.

You are not considering my second case (2). Maven Central can fail on any artifacts like Chill 0.9.5. and Google Maven Central is not a valid fallback for that as we observed in your PR. Why subset can be a fallback of superset?

@srowen
Copy link
Member

srowen commented Jan 21, 2020

I'm confused, are you arguing for 'no fallback'? that can't possibly be better. As you say here: what do you do if you can't reach Maven Central?

The current implementation is basically: try central but one that is overridden to point to Google by <mirror>, and then try Maven Central under an alias second.

This PR does the same thing, just explicitly. I don't see the difference other than it being simpler?

If chill 0.9.5 were not on Central, it would effectively not exist. Having a mirror doesn't change that, yes. But it is also not the situation here. It is on Central. The Google repo is not here to help this situation. It's here to help your first point.

Conversely, accessing Maven Central is also here to address your second point. You need both at some level to solve both, right?

@dongjoon-hyun
Copy link
Member

The following is wrong, isn't it?

This PR does the same thing, just explicitly. I don't see the difference other than it being simpler?

In computer architecture, we try cache before memory. This PR try memory before cache.

@srowen
Copy link
Member

srowen commented Jan 21, 2020

Ah OK right, this PR tries Central, then Google. Right now, the status quo is trying Google then Central. But we can flip the order here. It would be simpler. Would that be OK?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 21, 2020

Yes. I'm completely okay for that approach Google Maven Central, then Maven Central.
So, that was my question to you previously. Did you remember that I was worrying that the community is okay with Google Maven Central as a main repo. :)

@srowen
Copy link
Member

srowen commented Jan 21, 2020

Yes I don't see a problem with it. It's just a mirror, and already being used in the build in one instance.
What is the potential problem you see?

@dongjoon-hyun
Copy link
Member

It would be great if we can ask it officially in the dev mailing list if @HyukjinKwon and @srowen want to set it a primary repo. I'll give +1 there.

@srowen
Copy link
Member

srowen commented Jan 21, 2020

I sent an email, but you might want to follow up with what the potential issue is. Others may also not be thinking of what you are.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jan 22, 2020

Okay, so the main argument to switch the order is:

  • Maven Central first Google Maven Central second is slow because the fallback would happen too many times in Github Actions.
  • Google Maven Central first Maven Central second is fast

I don't mind switching the order either; however, I think I have another option:

Why don't we just give a shot to switch the mirrors in .github/workflows/master.yml? Not sure if it works but I can test.

<mirrors>
  <mirror>
   <id>another-central</id>
   <name>Maven Central</name>
   <url>https://repo.maven.apache.org/maven2</url>
   <mirrorOf>google-maven-central</mirrorOf>
  </mirror> 
  <mirror>
   <id>another-google-maven-central</id>
   <name>Google Maven Central</name>
   <url>https://maven-central.storage.googleapis.com/repos/central/data/</url>
   <mirrorOf>central</mirrorOf>
  </mirror> 
</mirrors> 

I don't like to list up the same repos twice which look a bit odd. For switching order, I think it's fine but in the way above, we can do it a bit more conservatively.

This fallback issue seems specific to Github Actions and seems best to fix it for Github Actions specifically.

@HyukjinKwon
Copy link
Member Author

Alright, seems we're fine to switch. Let me just switch for now. We can consider #27307 (comment) as an option later if it becomes problematic for any reason.

@HyukjinKwon HyukjinKwon changed the title [SPARK-30572][SPARK-30534][BUILD][FOLLOW-UP] Add a fallback repository as GCS [SPARK-30601][BUILD] Add a Google Maven Central as a primary repository Jan 22, 2020
@HyukjinKwon
Copy link
Member Author

can we file a followup jira to try to fix the sbt-pom-reader as well, that way we don't have to wait for it to fail and fall back.

@tgravescs, it's actually tricky too because we use our own fork for sbt-pom-reader. Actually, we should find a way to don't rely on the fork first. I don't know if the upstream has the issue or not. cc @JoshRosen.

@JoshRosen
Copy link
Contributor

@HyukjinKwon,

Here's a code comment explaining the original rationale for the custom POM reader fork:

// Spark uses a custom fork of the sbt-pom-reader plugin which contains a patch to fix issues

There's an existing JIRA at SPARK-14401 for removing this custom fork: I attempted this in #12310 (April 2016) but was blocked by the issue described in sbt/sbt-pom-reader#14.

@SparkQA
Copy link

SparkQA commented Jan 22, 2020

Test build #117206 has finished for PR 27307 at commit 556b88d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -275,6 +273,17 @@
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. +1 for piggybacking this. I also agree that this is required to unblock #27279 .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.
Since @srowen sent out the email today, let's wait for two or three days for feedbacks.

@HyukjinKwon
Copy link
Member Author

Thank you all. I will wait a couple of days and merge it.

@@ -224,6 +224,9 @@ object SparkBuild extends PomBuild {

// Override SBT's default resolvers:
resolvers := Seq(
// Google Mirror of Maven Central, placed first so that it's used instead of flaky Maven Central.
// See https://storage-download.googleapis.com/maven-central/index.html for more info.
"gcs-maven-central-mirror" at "https://maven-central.storage-download.googleapis.com/repos/central/data/",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the repo to the SBT side as well to match (and fixed some comments).

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117273 has finished for PR 27307 at commit 0b9d198.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Let me just merge it in. I think 5 +1s from PMCs are good enough.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Jan 23, 2020

Merged to master.

@dongjoon-hyun
Copy link
Member

Thank you, @HyukjinKwon and all!

@HyukjinKwon
Copy link
Member Author

It has a conflict against branch-2.4. I will open a backport.

HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Jan 23, 2020
This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list.

1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help development more stable but also in order to make Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under `.m2`.

    - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing)
    - Maven Central (default) seems less stable but synced quickly.

    We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo.
    https://github.com/apache/spark/blob/abf759a91e01497586b8bb6b7a314dd28fd6cff1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2111-L2118

2. Currently, we have the hard-corded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems overwriting Maven's existing resolver by the same ID `central` with `http://` when initially the pom file is ported into SBT instance. This uses `http://` which latently Maven Central disallowed (see apache#27242)

    My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it _seems_ using `central` with `https` properly. See also apache#27307 (comment).

    I double checked that we use `https` properly from the SBT build as well:

    ```
    [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ...
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1
    ```

    This was fixed by adding the same repo (apache#27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback.

3. While I am here, I fix another issue. Github Action at apache#27279 is being failed. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS.

    ```
    org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12:jar:1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/)
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve     (DefaultPluginDependenciesResolver.java:131)
    ```

   `mvn-scalafmt` exists in Maven central:

    ```bash
    $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```

    ```xml
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        ...
    ```

    whereas not in GCS mirror:

    ```bash
    $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```
    ```xml
    <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>%
    ```

    In this PR, simply make both repos accessible by adding to `pluginRepositories`.

4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second)

To make the build and Github Action more stable.

No, dev only change.

I roughly checked local and PR against my fork (#2 and #3).

Closes apache#27307 from HyukjinKwon/SPARK-30572.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Jan 23, 2020
…ository

### What changes were proposed in this pull request?

This PR backports #27307

This PR proposes to address four things. Three issues and fixes were a bit mixed so this PR sorts it out. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Adding-Maven-Central-mirror-from-Google-to-the-build-td28728.html for the discussion in the mailing list.

1. Add the Google Maven Central mirror (GCS) as a primary repository. This will not only help development more stable but also in order to make Github Actions build (where it is always required to download jars) stable. In case of Jenkins PR builder, it wouldn't be affected too much as it uses the pre-downloaded jars under `.m2`.

    - Google Maven Central seems stable for heavy workload but not synced very quickly (e.g., new release is missing)
    - Maven Central (default) seems less stable but synced quickly.

    We already added this GCS mirror as a default additional remote repository at SPARK-29175. So I don't see an issue to add it as a repo.
    https://github.com/apache/spark/blob/abf759a91e01497586b8bb6b7a314dd28fd6cff1/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2111-L2118

2. Currently, we have the hard-corded repository in [`sbt-pom-reader`](https://github.com/JoshRosen/sbt-pom-reader/blob/v1.0.0-spark/src/main/scala/com/typesafe/sbt/pom/MavenPomResolver.scala#L32) and this seems overwriting Maven's existing resolver by the same ID `central` with `http://` when initially the pom file is ported into SBT instance. This uses `http://` which latently Maven Central disallowed (see #27242)

    My speculation is that we just need to be able to load plugin and let it convert POM to SBT instance with another fallback repo. After that, it _seems_ using `central` with `https` properly. See also #27307 (comment).

    I double checked that we use `https` properly from the SBT build as well:

    ```
    [debug] downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom ...
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom
    [debug] 	public: downloading https://repo1.maven.org/maven2/com/etsy/sbt-checkstyle-plugin_2.10_0.13/3.1.1/sbt-checkstyle-plugin-3.1.1.pom.sha1
    ```

    This was fixed by adding the same repo (#27281), `central_without_mirror`, which is a bit awkward. Instead, this PR adds GCS as a main repo, and community Maven central as a fallback repo. So, presumably the community Maven central repo is used when the plugin is loaded as a fallback.

3. While I am here, I fix another issue. Github Action at #27279 is being failed. The reason seems to be scalafmt 1.0.3 is in Maven central but not in GCS.

    ```
    org.apache.maven.plugin.PluginResolutionException: Plugin org.antipathy:mvn-scalafmt_2.12:1.0.3 or one of its dependencies could not be resolved: Could not find artifact org.antipathy:mvn-scalafmt_2.12:jar:1.0.3 in google-maven-central (https://maven-central.storage-download.googleapis.com/repos/central/data/)
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve     (DefaultPluginDependenciesResolver.java:131)
    ```

   `mvn-scalafmt` exists in Maven central:

    ```bash
    $ curl https://repo.maven.apache.org/maven2/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```

    ```xml
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        ...
    ```

    whereas not in GCS mirror:

    ```bash
    $ curl https://maven-central.storage-download.googleapis.com/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom
    ```
    ```xml
    <?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Details>No such object: maven-central/repos/central/data/org/antipathy/mvn-scalafmt_2.12/1.0.3/mvn-scalafmt_2.12-1.0.3.pom</Details></Error>%
    ```

    In this PR, simply make both repos accessible by adding to `pluginRepositories`.

4. Remove the workarounds in Github Actions to switch mirrors because now we have same repos in the same order (Google Maven Central first, and Maven Central second)

### Why are the changes needed?

To make the build and Github Action more stable.

### Does this PR introduce any user-facing change?

No, dev only change.

### How was this patch tested?

I roughly checked local and PR against my fork (HyukjinKwon#2 and HyukjinKwon#3).

Closes #27335 from HyukjinKwon/tmp.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
HyukjinKwon pushed a commit that referenced this pull request Feb 25, 2020
…ven Central

### What changes were proposed in this pull request?

This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 25, 2020
…ven Central

### What changes were proposed in this pull request?

This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit f152d2a)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Feb 25, 2020
…ven Central

This PR is a followup to #27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

No.

Existing build + tests.

Closes #27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the SPARK-30572 branch March 3, 2020 01:16
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ven Central

### What changes were proposed in this pull request?

This PR is a followup to apache#27307: per https://travis-ci.community/t/maven-builds-that-use-the-gcs-maven-central-mirror-should-update-their-paths/5926, the Google Cloud Storage mirror of Maven Central has updated its URLs: the new paths are updated more frequently. The new paths are listed on https://storage-download.googleapis.com/maven-central/index.html

This patch updates our build files to use these new URLs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing build + tests.

Closes apache#27688 from JoshRosen/update-gcs-mirror-url.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants