Skip to content

Conversation

JingsongLi
Copy link
Contributor

What is the purpose of the change

flink-sql-connector-hive depends on hive-exec which is a uber jar and doesn't come with a proper NOTICE file. So we need to find out the artifacts included in hive-execand list them in our NOTICE file.

Brief change log

Update NOTICE file and add licenses for some dependencies.

@JingsongLi JingsongLi requested review from rmetzger and zentol June 17, 2020 04:50
@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit fcb7907 (Wed Jun 17 04:52:20 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 17, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@rmetzger
Copy link
Contributor

I propose to review this PR only after #12682 has been resolved. It seems we are not entirely sure how to deal with the shaded hive-exec dependencies.

@JingsongLi
Copy link
Contributor Author

I propose to review this PR only after #12682 has been resolved. It seems we are not entirely sure how to deal with the shaded hive-exec dependencies.

Agree~

Copy link
Contributor

@rmetzger rmetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've compiled the respective hive versions to compare the output with hive-exec shading.

For Hive 3.1.2 my versions are weirdly different. I guess we used different methods to obtain the dependency list.


- com.google.guava:guava:14.0.1
- com.googlecode.javaewah:JavaEWAH:0.3.2
- com.tdunning:json:1.14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My maven-shade-plugin run said

[INFO] Including com.tdunning:json:jar:1.8 in the shaded jar.

- org.apache.hive:hive-metastore:2.3.6
- org.apache.hive:hive-serde:2.3.6
- org.apache.hive:hive-service-rpc:2.3.6
- org.apache.hive:hive-storage-api:2.3.6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some weird reason, my maven-shade plugin logged this

[INFO] Including org.apache.hive:hive-storage-api:jar:2.4.0 in the shaded jar.


- com.esotericsoftware:kryo-shaded:3.0.3
- com.esotericsoftware:minlog:1.3.0
- com.esotericsoftware:reflectasm:1.10.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.esotericsoftware:reflectasm is listed twice, and not ASL2.0 licensed.


- com.esotericsoftware:kryo-shaded:3.0.3
- com.esotericsoftware:minlog:1.3.0
- com.esotericsoftware:reflectasm:1.10.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you find this dependency listed? When I compiled hive 3.1.2, it wasn't listed by the shade plugin. However. there are indeed reflectasm classes in the final binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it by unzip -l flink-sql-connector-hive-3.1.2_2.11-1.12-SNAPSHOT.jar | grep pom...

- javax.jdo:jdo-api:3.0.1
- joda-time:joda-time:2.9.9
- net.sf.opencsv:opencsv:2.3
- org.apache.avro:avro-mapred:1.8.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[INFO] Including org.apache.avro:avro-mapred:jar:hadoop2:1.7.7 in the shaded jar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, I am confused by the version passed from flink-parquet.(Which is from flink pom avro version)

Copy link
Contributor Author

@JingsongLi JingsongLi Jun 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.avro:avro-mapred:1.7.7 is enough? Do you we need hadoop2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same. I think it is fine to leave the "hadoop2" part out

- org.apache.hive:hive-service-rpc:3.1.2
- org.apache.hive:hive-spark-client:3.1.2
- org.apache.hive:hive-standalone-metastore:3.1.2
- org.apache.hive:hive-storage-api:3.1.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what's going on with my maven

[INFO] Including org.apache.hive:hive-storage-api:jar:2.7.0 in the shaded jar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I made mistake.

- org.apache.orc:orc-shims:1.5.6
- org.apache.orc:orc-tools:1.5.6
- org.apache.parquet:parquet-hadoop-bundle:1.10.0
- org.apache.parquet:parquet-jackson:1.10.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parquet-jackson was not listed in my hive build

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should come from parquet-hadoop-bundle, there are classes in shaded/parquet/org/codehaus/jackson.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

- org.apache.parquet:parquet-jackson:1.10.0
- org.apache.thrift:libthrift:0.9.3
- org.codehaus.jackson:jackson-core-asl:2.9.5
- org.codehaus.jackson:jackson-mapper-asl:2.9.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.13 in the shaded jar.
[INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 in the shaded jar.

@JingsongLi
Copy link
Contributor Author

I've compiled the respective hive versions to compare the output with hive-exec shading.

For Hive 3.1.2 my versions are weirdly different. I guess we used different methods to obtain the dependency list.

For Hive 3.1.2, maybe I was misunderstood by META-INF/DEPENDENCIES file.

Copy link
Contributor

@rmetzger rmetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is good to merge.

I guess we could spend many more hours and would find a tiny difference in the documented dependencies, but I believe this NOTICE file is a lot better than what Hive ships already.

@JingsongLi JingsongLi merged commit 3611fd7 into apache:master Jun 20, 2020
JingsongLi added a commit that referenced this pull request Jun 20, 2020
zhangjun0x01 pushed a commit to zhangjun0x01/flink that referenced this pull request Jul 8, 2020
@JingsongLi JingsongLi deleted the notice_hive branch November 5, 2020 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants