Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URI #5424

Closed
wants to merge 3 commits into from

Conversation

Projects
None yet
5 participants
@nyaapa
Copy link
Contributor

commented Apr 8, 2015

No description provided.

if ( !addresses.isEmpty ) {
val addr = addresses.find( _.isInstanceOf[Inet4Address] ).getOrElse(addresses.head)
// because of Inet6Address.toHostName may add interface at the end if it knows about it
val strippedAddress = InetAddress.getByAddress(addr.getAddress)

This comment has been minimized.

Copy link
@srowen

srowen Apr 8, 2015

Member

Tell me more about what this fixes?

This comment has been minimized.

Copy link
@nyaapa

nyaapa Apr 8, 2015

Author Contributor

It's all about finding our external IP address: old logic was to use external ipv4 || 127.0.0.1.
That transformes it to ipv4 || ipv6 || 127.0.0.1

Whenewer we don't have any ipv4 it's better to use ipv6 than 127.0.0.1

This comment has been minimized.

Copy link
@srowen

srowen Apr 9, 2015

Member

OK, though I don't think that was the thrust of this JIRA, which was simply to format IPv6 addresses correctly. I think the reasoning makes some sense though.

Nits: can you filterNot(addr => addr.isLinkLocalAddress || addr.isLoopbackAddress)
and if (addresses.notEmpty) {? and find(_.isInstanceOf[Inet4Address])? Just questions of style in this block.

It might be worth a comment to the effect of your reply above here.

This comment has been minimized.

Copy link
@nyaapa

nyaapa Apr 9, 2015

Author Contributor

Ok, i agree with that. Should i make additional commit or change existing one?
*nonEmpty

This comment has been minimized.

Copy link
@srowen

srowen Apr 9, 2015

Member

Just push more commits to this branch

/**
* Get the local machine's URI.
*/
def localHostURI(): String = {

This comment has been minimized.

Copy link
@srowen

srowen Apr 8, 2015

Member

Is this intended to be an IP address or host name? looks like IP address. But then some of the changes above are changing a host name to an IP address. That might be OK but it's not obvious? what's the motivation there.

This comment has been minimized.

Copy link
@nyaapa

nyaapa Apr 8, 2015

Author Contributor

It is named localHostURI in compliance with localHostName function name.
The main idea is to use ip if and only if we don't have customHostname ( like in localHostName ).

This comment has been minimized.

Copy link
@srowen

srowen Apr 9, 2015

Member

Got it. I'm wondering when you wouldn't want to use the "URI safe" name? they're identical in almost every case. Hm. I suppose the safer thing is to only format for URIs where, well, creating URIs. I'll have to dig into this more but generally want to minimize the change here and narrowly address the formatting issue

This comment has been minimized.

Copy link
@nyaapa

nyaapa Apr 9, 2015

Author Contributor

I don't understand how to use "URI safe" name, something like localHostNameURISafe or what?
It's good to format URIs only in places where we need it, but currently we have more than one bug because of it.
However we can use something like InetAddresses.toUriString(InetAddress.getByName(Utils.localHostName())) to get URI in place without fixing Utils module, but i think that special URI method may prevent URI-based bugs in future due to its existence.

@srowen

This comment has been minimized.

Copy link
Member

commented Apr 9, 2015

ok to test

@SparkQA

This comment has been minimized.

Copy link

commented Apr 9, 2015

Test build #29936 has finished for PR 5424 at commit 84763d7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
@nyaapa

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2015

Use getHostAddress instead of getHostName ( as it was before ), because of getHostName may return additional information like 95.108.174.236-red.dhcp.yndx.net.

@SparkQA

This comment has been minimized.

Copy link

commented Apr 9, 2015

Test build #29940 has started for PR 5424 at commit 2098081.

@shaneknapp

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2015

jenkins, test this please

@SparkQA

This comment has been minimized.

Copy link

commented Apr 9, 2015

Test build #29943 has finished for PR 5424 at commit 2098081.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
@nyaapa

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2015

Seems like jenkins was killed again.

@shaneknapp

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2015

whoops, i missed retriggering this build. :)

jenkins, test this please

@srowen

This comment has been minimized.

Copy link
Member

commented Apr 9, 2015

Jenkins, retest this please.

@SparkQA

This comment has been minimized.

Copy link

commented Apr 9, 2015

Test build #29962 has finished for PR 5424 at commit 2098081.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
@JoshRosen

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2015

Jenkins, retest this please.

@SparkQA

This comment has been minimized.

Copy link

commented Apr 9, 2015

Test build #29974 has finished for PR 5424 at commit 2098081.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
@srowen

This comment has been minimized.

Copy link
Member

commented Apr 10, 2015

I think I've looked at 10 different versions of 2 PRs for this JIRA now, and all seem to be changing or adding something slightly different than what's essential here. Let me describe what I think needs to be done and why and we can settle that, then implement.

The code needs a local IP address in many cases. That's localIpAddress. (Separately, your update to find an IPv6 address sounds OK.)

localIpAddressHostname and the getAddressHostName() method actually look pointless. They make a String into an InetAddress and then ask for the exact same text representation back as a String. Personally I'd make localIpAddress an InetAddress, as you've done. I'd remove localIpAddressHostname and replace with localHostName() where a String is needed.

... because then customHostname is consistently applied, right? I think that must be the intent?

But then, localIpAddress itself doesn't have much value. It is always used for its string rep anyway. So, localHostName() as well? it's the only thing that needs to be non-private.

Back to the original issue: where localHostName() would then be used in a URI, it can be run through the utility method in InetAddresses. I don't think that we should instead have more methods in Utils just for adding a call to that method.

Finally, there are instances in SparkSQLEnv, KinesisReceiver and TestUtils that use InetAddress.getLocalHost directly. These should also use localHostName().

Maybe there's a reason that doesn't 100% work but it seems a lot less messy than the 3-5 ways code is reasoning about the local IP address string now.

What say, or, should I try it in my own PR?

@nyaapa

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2015

Ok, i'll try to:

  • replace localIpAddressHostname in Utils.localHostName with localIpAddress.getHostAddress
  • replace localIpAddressHostname usage with localHostName
  • remove localIpAddressHostname and getAddressHostName
  • make localIpAddress private
  • fix SparkSQLEnv, KinesisReceiver and TestUtils

today/tomorrow.

Back to the original issue: where localHostName() would then be used in a URI, it can be run through the utility method in InetAddresses.
I don't think that we should instead have more methods in Utils just for adding a call to that method.

I didn't find any method in InetAddresses like escapeHostname(String) => String. InetAddresses.toUriString(InetAddress.getByName(Utils.localHostName())), but this looks pointless like localIpAddressHostname = InetAddress.getByName(localIpAddress).getHostName.
And there we have two alternatives:

  1. remove localIpAddressURI variable and change def localHostURI() = customHostname.getOrElse(InetAddresses.toUriString(localIpAddress))
  2. ask everyone to use InetAddresses.toUriString(InetAddress.getByName(Utils.localHostName()))

imho additional method with is better. Also i don't like conversion from String to InetAddress and even worse is that we got String from InetAddress.

@srowen

This comment has been minimized.

Copy link
Member

commented Apr 10, 2015

InetAddresses.toUriString accepts a String and returns an InetAddress, so it does not need its input changed. Yes you have to call getHostName on the output.

@nyaapa

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2015

InetAddresses.toUriString is (InetAddress) => String, maybe you meant forUriString(String) => InetAddress or forString(String) => InetAddress ?
for forUriString

scala> val ipv6 = InetAddress.getByAddress(NetworkInterface.getNetworkInterfaces.toList.apply(0).getInetAddresses.toList.apply(0).getAddress).getHostAddress
ipv6: String = fe80:0:0:0:3623:87ff:fe6a:423c

scala> InetAddresses.forUriString(ipv6)
java.lang.IllegalArgumentException: Not a valid URI IP literal: 'fe80:0:0:0:3623:87ff:fe6a:423c'
  at com.google.common.net.InetAddresses.forUriString(InetAddresses.java:501)
  ... 33 elided

And for forString i still need InetAddresses.toUriString

scala> val ipv6 = InetAddress.getByAddress(NetworkInterface.getNetworkInterfaces.toList.apply(0).getInetAddresses.toList.apply(0).getAddress).getHostAddress
ipv6: String = fe80:0:0:0:3623:87ff:fe6a:423c

scala> InetAddresses.forString(ipv6).getHostName
res35: String = fe80:0:0:0:3623:87ff:fe6a:423c

scala> InetAddresses.forString(ipv6).getHostAddress
res36: String = fe80:0:0:0:3623:87ff:fe6a:423c

scala> InetAddresses.toUriString(InetAddresses.forString(ipv6))
res37: String = [fe80::3623:87ff:fe6a:423c]
@srowen

This comment has been minimized.

Copy link
Member

commented Apr 10, 2015

Oh gosh yeah looking at this the wrong way around of course. Aren't there just two places where the URI logic is needed? I think calling InetAddresses.toUriString(InetAddress.getByName(Utils.localHostName())) in two places isn't so bad. A new method is OK too if you rprefer; I suppose ... localHostNameForURI() or something to make the relation clear? Then we have just two methods for accessing 'the local host name' with a clear role.

@nyaapa

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2015

Ok, localHostNameForURI sounds reasonable.
About network/common/src/test/java/org/apache/spark/network/TestUtils.java: there is just only one method -- getLocalHost() => String, i can delete the whole class and use Utils.localHostname instead, but maybe using scala from java it's not a good practice?

@srowen

This comment has been minimized.

Copy link
Member

commented Apr 10, 2015

I'd just leave TestUtils for purposes here, but can make it call through to Utils.

[SPARK-6440][CORE] Remove Utils.localIpAddressHostname, Utils.localIp…
…AddressURI and Utils.getAddressHostName;

make Utils.localIpAddress private;
rename Utils.localHostURI into Utils.localHostNameForURI;
use Utils.localHostName in org.apache.spark.streaming.kinesis.KinesisReceiver and org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
@SparkQA

This comment has been minimized.

Copy link

commented Apr 12, 2015

Test build #30091 has finished for PR 5424 at commit 6b717aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
@srowen

This comment has been minimized.

Copy link
Member

commented Apr 12, 2015

LGTM. This simplifies and standardizes the local host name logic across the code, fixes the original problem of IPv6 addresses in URIs, chooses a better local address in some IPv6 cases. Of course, making things consistent means slight behavior changes in some cases -- for example, customHostName now applies in cases it didn't before, but this actually seems more correct. I've eyeballed every change and it looks logical, and the tests pass.

@asfgit asfgit closed this in 9d117ce Apr 13, 2015

asfgit pushed a commit that referenced this pull request Apr 15, 2015

[HOTFIX] [SPARK-6896] [SQL] fix compile error in hive-thriftserver
SPARK-6440 #5424 import guava but did not promote guava dependency to compile level.

[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 8 Scala sources to /root/projects/spark/sql/hive-thriftserver/target/scala-2.10/classes...
[error] bad symbolic reference. A signature in Utils.class refers to term util
[error] in package com.google.common which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling Utils.class.
[error]
[error] while compiling: /root/projects/spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
[error] during phase: erasure
[error] library version: version 2.10.4
[error] compiler version: version 2.10.4
[error] reconstructed args: -deprecation -classpath

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #5507 from adrian-wang/guava and squashes the following commits:

c337dad [Daoyuan Wang] fix compile error

preaudc pushed a commit to preaudc/spark that referenced this pull request Apr 17, 2015

[SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URI
Author: nyaapa <nyaapa@gmail.com>

Closes apache#5424 from nyaapa/master and squashes the following commits:

6b717aa [nyaapa] [SPARK-6440][CORE] Remove Utils.localIpAddressHostname, Utils.localIpAddressURI and Utils.getAddressHostName; make Utils.localIpAddress private; rename Utils.localHostURI into Utils.localHostNameForURI; use Utils.localHostName in org.apache.spark.streaming.kinesis.KinesisReceiver and org.apache.spark.sql.hive.thriftserver.SparkSQLEnv
2098081 [nyaapa] [SPARK-6440][CORE] style fixes and use getHostAddress instead of getHostName
84763d7 [nyaapa] [SPARK-6440][CORE]Handle IPv6 addresses properly when constructing URI

preaudc pushed a commit to preaudc/spark that referenced this pull request Apr 17, 2015

[HOTFIX] [SPARK-6896] [SQL] fix compile error in hive-thriftserver
SPARK-6440 apache#5424 import guava but did not promote guava dependency to compile level.

[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null)
[info] Compiling 8 Scala sources to /root/projects/spark/sql/hive-thriftserver/target/scala-2.10/classes...
[error] bad symbolic reference. A signature in Utils.class refers to term util
[error] in package com.google.common which is not available.
[error] It may be completely missing from the current classpath, or the version on
[error] the classpath might be incompatible with the version used when compiling Utils.class.
[error]
[error] while compiling: /root/projects/spark/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
[error] during phase: erasure
[error] library version: version 2.10.4
[error] compiler version: version 2.10.4
[error] reconstructed args: -deprecation -classpath

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes apache#5507 from adrian-wang/guava and squashes the following commits:

c337dad [Daoyuan Wang] fix compile error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.