[SPARK-11353][IO] Update jets3t version to 0.9.4 #9306

Closed

wants to merge 1 commit into base: master

Conversation

7 participants
@lpiepiora (Contributor) commented Oct 27, 2015

This PR updates the jets3t dependency to 0.9.4, because of an error that is thrown when code tries to write to an S3 bucket located in Frankfurt.
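For context, a minimal sketch of the kind of Maven change the PR describes, using JetS3t's standard coordinates; in Spark the version is actually managed in the build poms, so the exact location and form differ:

```xml
<!-- Hypothetical sketch of the version bump; Spark's real pom sets this elsewhere -->
<dependency>
  <groupId>net.java.dev.jets3t</groupId>
  <artifactId>jets3t</artifactId>
  <version>0.9.4</version>
</dependency>
```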

@srowen (Member) commented Oct 27, 2015

I think this is OK since it's a maintenance release, but are there any other changes across the two releases that might be an issue?

@markgrover (Member) commented Oct 27, 2015

LGTM too. HADOOP-9623 updated the jets3t version to 0.9.0 and went into Hadoop 2.3.0, so Hadoop 2.3.0 or later should be fine. The only noteworthy thing from the jets3t release notes was:
NOTE: Anyone who has implemented their own JetS3t service or implemented the JetS3tRequestAuthorizer will need to adjust their code due to API changes.
A quick grep through the Spark code revealed no reference to JetS3tRequestAuthorizer, so I think we should be OK.

@SparkQA commented Oct 27, 2015

Test build #1958 has finished for PR 9306 at commit 34a28e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Lukasz Piepiora added a commit:
[SPARK-11353][IO] Update jets3t version to 0.9.4
Exclude the transitive HttpClient dependencies pulled in by the version upgrade, because they conflict with HtmlUnit.
@lpiepiora (Contributor) commented Oct 27, 2015

It seems that JetS3t bumped its HttpClient dependency to 4.5, which conflicts with HtmlUnit, because HtmlUnit reads private fields of HttpClientBuilder. Since the field name changed (from sslcontext to sslContext), the patch fails.

I've tested it by building Spark locally, and we should be fine sticking to Spark's own version of HttpClient, so I'll exclude this (and HttpCore) transitive dependency from the jets3t library.

Besides, I've looked through the Hadoop sources and compared them against the diff between 0.9.3 and 0.9.4. The files affected by that change don't seem to touch any public members used directly by Hadoop code.
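A minimal sketch of the exclusion described above, assuming the standard Maven coordinates for JetS3t and Apache HttpComponents; the actual change to Spark's pom may be shaped differently:

```xml
<!-- Sketch: keep Spark's own HttpClient/HttpCore by excluding the copies JetS3t 0.9.4 would pull in -->
<dependency>
  <groupId>net.java.dev.jets3t</groupId>
  <artifactId>jets3t</artifactId>
  <version>0.9.4</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpcore</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```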

@lpiepiora (Contributor) commented Oct 28, 2015

Jenkins, retest this please

@JoshRosen (Contributor) commented Oct 29, 2015

Jenkins, retest this please.

@SparkQA commented Oct 29, 2015

Test build #44634 has finished for PR 9306 at commit e1c9c09.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 30, 2015

Test build #1961 has finished for PR 9306 at commit e1c9c09.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Oct 30, 2015

@lpiepiora OK, I believe Jenkins that this is a real failure. It is the UISeleniumSuite; something about the UI output has changed, possibly an error. Are you able to reproduce that?

@lpiepiora (Contributor) commented Nov 2, 2015

Yes, thanks - I'm able to reproduce it. Now, I'm trying to figure out what's wrong.

@steveloughran (Contributor) commented Dec 9, 2015

The move to JetS3t 0.9.0 in HADOOP-9623 was what could be described as "an accidental disaster": the patch swallowed exceptions "which should never happen", resulting in HADOOP-10589, a seek(0) on a 0-byte file NPE-ing. (Trivia: it was fixed by probably the only piece of co-recursive code in core Hadoop.)

One issue with 0.9.0 is that the close() call on an input stream reads all remaining bytes of the resource (HADOOP-12376). This hurts; moving up to 0.9.4 may fix it. From the Hadoop core perspective, the move to 0.9.0 broke enough things that we are scared to go near the s3n code again; all future work is in s3a.

To summarise then: this may break s3n if not shaded, but you should be encouraging people to use s3a on Hadoop 2.7+ anyway.

@lpiepiora (Contributor) commented Dec 9, 2015

@steveloughran yes, that's exactly what happened to me in this PR. I wanted to fix it, but, as you've said, in general this just yields more problems on multiple levels.

However, s3a is not a breeze either (even in newer Hadoop 2.7+ versions), especially with Frankfurt buckets, which support only AWS Signature V4.

I'll close this PR anyway, because I think this is not the right way forward either (even though this jets3t update was a minor one, it upgraded transitive dependencies, which caused multiple issues).
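For readers who land here with the same Frankfurt problem: a minimal core-site.xml sketch for s3a, assuming Hadoop 2.7+ with hadoop-aws (and its AWS SDK) on the classpath. Pointing fs.s3a.endpoint at the regional endpoint is usually what is needed for Signature-V4-only regions, though behaviour depends on the Hadoop and AWS SDK versions in use; the credentials below are placeholders.

```xml
<!-- Sketch: s3a against a Signature-V4-only region such as eu-central-1 (Frankfurt) -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```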

@lpiepiora closed this Dec 9, 2015

@steveloughran (Contributor) commented Dec 9, 2015

> However, s3a is not a breeze either (even in newer Hadoop 2.7+ versions), especially with Frankfurt buckets, which support only AWS Signature V4.

Really? I thought that worked. I know HADOOP-12537 mentioned it, but I didn't think STS credentials were mandatory. As usual: file a JIRA.

@Kinghack commented Apr 12, 2016

Since the latest version of jets3t does not build into the latest Spark, is it possible for now to access S3 files from regions that support AWS4-HMAC-SHA256 only?
