[HUDI-91][HUDI-12] Migrate to Spark 2.4.4, migrate to spark-avro library instead of databricks-avro, add support for Decimal/Date types #1005
Conversation
@n3nash: Can you and Modi review this PR and see if it is aligned with what you have done internally?
Hi Udit! Thanks for making this PR! I've been working on upgrading HUDI to Spark 2.4 internally at Uber! So I'll list out a few things that I had to do, so that you're not trying to re-discover these things yourself :)
Btw - I also went to UIUC! Great to meet new Illini!
Hi, any plans to migrate Hudi to Scala 2.12? I don't see any JIRA issue regarding this. I would volunteer after this PR is merged. I already looked around; it seems that the biggest problem will be migrating to
@ezhux I have migrated to spark-streaming-kafka-0-10_2.11 internally at my organisation. Let me know if I can help you in any way. :)
@ezhux https://issues.apache.org/jira/browse/HUDI-238 tracks this, I think. Yes, we want to put up a 2.12 bundle as well. Please engage on the ticket for timelines.
I understand we can't bundle this since it's tied to a Spark version now. spark-avro is still a package and the user must explicitly include it using --packages? So, if I upgrade Hudi in the next release, then as a user I need to change something? Should/how do we document this?
Yeah, this will be an additional jar the user would have to pass while starting the spark-shell. We would have to document it. I don't see any documentation for spark.serializer=org.apache.spark.serializer.KryoSerializer either, which is also a prerequisite, right? Shall we update it in the README?
The KryoSerializer command line is provided on the quickstart page. Adding it there could be a good thing. Do a follow-on update?
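For context, the quickstart prerequisite being discussed amounts to a single config flag; a minimal sketch of how a user would set it when launching a shell (the same setting can also go into spark-defaults.conf):

```shell
# Hudi requires Kryo serialization; pass it when starting the shell.
spark-shell --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```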
@modi95 good to hear from a fellow Illini! About the points you raised:
Yeah, any guidance on this front would be appreciated on how we want to go about getting the tests working. I will look into the docker setup.
Mistakenly added a comment for another PR here. Deleted it.
@umehrot2: If you want to give it a shot, it's better to open a new PR. You would need to update docker.spark.version in the pom files below docker/hadoop/... and also update the Spark version in the Dockerfile in these directories: spark_base, sparkadhoc, sparkworker and sparkmaster. You can then run docker/build_local_docker_images.sh to build new docker images locally and then run integration tests. We would have to push these docker images so that the Travis integration can pick them up (we will help on this). @umehrot2: If you need help, let us know. Either me or @bhasudha can get the docker images built and pushed.
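Condensing the steps above into a rough local sequence (the final integration-test command is an assumption; use whatever the repo's docs prescribe):

```shell
# Rough sketch of the local docker rebuild flow described above.
# 1. Bump docker.spark.version in the pom files under docker/hadoop/...
# 2. Bump the Spark version in the Dockerfile of each of:
#    spark_base, sparkadhoc, sparkworker, sparkmaster
# 3. Rebuild images locally, then run the integration tests:
./docker/build_local_docker_images.sh
mvn verify   # hypothetical; substitute the project's actual integ-test command
```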
@bvaradar thanks for the suggestions. Will give it a shot tomorrow, and reach out in case of any doubts. |
Force-pushed 6f68984 to d195559.
Re-kicked integration tests.
Looks like the integration tests are failing with dependency version mismatches.
@bvaradar yeah, I have been looking into them. Like you mentioned, there are multiple dependency-related issues going on. Working on a solution.
I was able to fix the integration test dependency issues, on my local at least. Hoping that things run fine on Travis too. To give an overview, there were 3 major failures happening:
This is happening because in Hudi, even for bits running through Spark, we are using What I propose here is that we should use a version of Hive that is compatible with Spark, at least for the bits running inside Spark, so that compatible versions of Hive end up in class paths.
By making the above changes, the integration tests work now. Let me know your thoughts on these changes, if there are concerns.
@bvaradar can we re-trigger the tests? I think this time it failed due to flaky timeouts.
@umehrot2: Comparing this run's logs with that of #1009, I can see that somehow the Spark logging level became INFO with this PR (hence so much logging). You can look at the logs corresponding to org.apache.hudi.integ.ITTestHoodieSanity. Both runs use the same docker image (spark-2.4.4). Can you check how the Spark log level became INFO? I didn't dig further than that, but let me know if you need help.
@umehrot2: Regarding your comment about moving to the spark.hive version for hudi-spark-bundle and hudi-utilities-bundle, I am ok as long as hive sync works with hiveserver/HMS 2.x. Regarding realtime queries, if we shade avro in hudi-hadoop-mr-bundle, we would need to follow the same shading policy in the user's jar when they use a custom HoodieRecordPayload. AFAIK, there is no good solution here to resolve this dependency hell. If we go this approach, we would need to document this caveat and make it easy for users to perform this shading (with a boiler-plate pom). @vinothchandar: Any suggestions? @modi95 (cc @n3nash): Please note that there will be similar issues with realtime queries and Spark 2.4.4 when we eventually migrate RT tables to Spark 2.4.4.
On RecordPayload and avro bundling, I expect most people (especially if you have record level indexes, as we plan to) would be happy with the OverwriteWithLatest payload, which is internal to the project. I think it may be fair to simplify things to supporting these default payloads well, while providing a guide for authoring payloads (need a JIRA for documenting such a guide). Longer term, I am wondering if we should move away from Avro as the standard object (I did not want to invent our own object, for the obvious reasons of incurring additional conversion cost and code maintenance) or allow Payload to be written using all or some of Row/Avro/ArrayWritable. I know it's not a clear answer, but it's as concrete as I could get. @umehrot2 is shading needed to avoid conflicts with Hive's avro? How does this fit into this PR?
@umehrot2: Sorry for the back-and-forth on this. Issue 1 (as mentioned in #1005 (comment)) is due to the fat jar hive-exec. @n3nash proposed a solution at Uber which won't require moving to spark-hive. Instead of the test dependency hive-exec, can you try depending on the non-fat version of the jar, called hive-exec-core? Hopefully, we can control the parquet/avro versions getting loaded for the tests.
@bvaradar That's fine, we should take the time and solve it the right way. In Also I don't see any artifact like
@vinothchandar Yes, updating the quickstart page makes sense. Will you guys be doing that?
@umehrot2 Thanks for enumerating your thoughts. Let me add some more context here. Firstly, hive-exec has a classifier. Secondly, there is no support for Spark's fork of Hive (1.2.1.spark.2). This was forked by the Spark community to solve the exact issue of hive jars not bundling the correct dependencies that I described above; read more here: https://issues.apache.org/jira/browse/HIVE-16391?focusedCommentId=16032497&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16032497. Some more changes were then added to the fork which are NOT necessary, according to the comments in the same JIRA. In fact, there is a strong push in the Spark community to move away from this forked version to the regular Hive version; see here: http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Upgrade-built-in-Hive-to-2-3-4-td26153.html. But I see your point on having the spark modules depend on the spark-hive version; this way it's clear and we don't have to solve this issue ourselves. I have a few hesitations about introducing Spark's forked hive version: a) this means we have 2 hive versions across the project; b) Spark's forked version of hive doesn't have anything more apart from solving the hive-exec jar mess.
Thanks @n3nash for your thoughts. @umehrot2: If it is possible to achieve the Spark 2.4 upgrade cleanly without moving to the spark-hive version, it makes sense to me to retain the native hive version. I think it is better not to get locked down on the spark version of hive. As we are using custom code (non-spark) to do hive syncing, theoretically speaking we may run into some hive issues which would need an upgrade, but as the issue is not seen in Spark, they may be unwilling to patch their hive jars. We can use spark-hive as a last resort if we cannot upgrade to Spark 2.4 any other way :) In that spirit, to your concern related to transitive dependencies in the hudi-spark module: as maven honors dependency ordering, can we list hive-exec (with classifier as "core") in the dependency section before hive-service, and add exclusions in the dependency section for hive-service to exclude hive-exec? I am not sure if this would work but don't have time to try this out myself. Something along the lines of: @umehrot2: If we cannot make it work any other way, I am ok with using spark-hive.
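The ordering-plus-exclusion idea above can be sketched in pom terms as follows; this is a hypothetical fragment (version property and surrounding pom structure assumed), not the exact snippet from the PR:

```xml
<!-- Illustrative sketch: list hive-exec with the "core" classifier first,
     then exclude the fat hive-exec that hive-service pulls in transitively. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
  <classifier>core</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Because Maven resolves version conflicts by nearest-first and honors declaration order for equal depth, the core classifier jar should win on the classpath if this works at all.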
What Balaji suggested makes sense to me. The spark.hive version has its own issues, but we can live with it if there is no other way.
What is the progress of this PR? I think not every user uses Spark 2.4. Can we combine the two approaches (databricks-avro and spark-avro) into our own internal implementation, which can be compatible with most versions of Spark 2?
My suggestion is to freeze code by the 15th, test the RC for a week, and cut one by late Jan/early Feb. @leesf is the release manager though, so he can share plans.
My thought is to get the code ready by the 15th to get some buffer, and release at the end of this month.
Ack, working with that deadline in mind.
Force-pushed 768f03d to 7b3d943.
@vinothchandar @bvaradar @n3nash I have updated the PR to now use the
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer").master("local[1]").getOrCreate();
JavaSparkContext jssc = new JavaSparkContext(spark.sparkContext());
spark.sparkContext().setLogLevel("WARN");
jssc.setLogLevel("WARN");
I guess just the first one is fine. Will remove the second line.
<include>org.apache.hive:hive-metastore</include>
<include>org.apache.hive:hive-jdbc</include>
<include>com.databricks:spark-avro_2.11</include>
If we bundled org.apache.spark:spark-avro, wouldn't that make life simpler for everyone?
I can give it a shot, but we need to carefully understand the consequences of shading a Spark library inside a jar which is being run on Spark. I remember earlier we had some issue on EMR, but don't have the exact details. Nevertheless, let me try and see if tests pass.
I assume the tests will pass, but I realize what you are saying: the user could be running on a higher Spark version, say, and we would be bundling 2.4.4. Let's just open a JIRA to tackle this usability issue and keep it as-is for now. We can clearly document the need for --packages ... when using spark-submit or spark-shell, and move on.
Yeah that can be one of the problems. Created a JIRA for this issue: https://issues.apache.org/jira/browse/HUDI-516
About documentation of --packages, are you guys going to take care of that?
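The documentation ask above boils down to one launch-time flag; a sketch, with the coordinate shown for Scala 2.11 / Spark 2.4.4 (adjust to the Spark version actually in use):

```shell
# spark-avro is no longer bundled, so it must be supplied explicitly at launch:
spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.4
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4 ...
```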
<profiles>
<profile>
<id>mr-bundle-shade-avro</id>
@umehrot2 (cc @vinothchandar ) I will get back on this by today EOD.
(cc @n3nash ) Yeah, this would mean that we need to employ the same package relocation in the jar carrying custom record payloads. As discussed in the earlier threads, there is no way around it. @umehrot2 : We would need to document this caveat in Release Notes and add documentation on how to shade it. Can you create a ticket to track this ?
@vinothchandar @bvaradar Yes, this will affect the custom payload implementation on the reader side. But we are anyway going to make some changes in how the payload packages are loaded, so we should be able to absorb this change as part of those considerations.
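For reference, the relocation at issue would look something like the fragment below inside the bundle's maven-shade-plugin configuration; a jar carrying a custom HoodieRecordPayload would need to apply the same pattern. The shaded prefix shown is an assumption for illustration, not necessarily the exact one in the PR:

```xml
<!-- Illustrative maven-shade relocation; the shadedPattern prefix is assumed. -->
<relocations>
  <relocation>
    <pattern>org.apache.avro.</pattern>
    <shadedPattern>org.apache.hudi.org.apache.avro.</shadedPattern>
  </relocation>
</relocations>
```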
Force-pushed 07d9f3f to a3fb7d4.
@bvaradar Created a JIRA to track documentation of the Avro shading caveat: https://issues.apache.org/jira/browse/HUDI-519
@umehrot2 typically the author of the PR also does the doc changes, to keep things in sync. Are you able to make the changes in the docs? Mainly, it should be the quickstart, demo, and writing_data pages.
@vinothchandar Glad to see this PR can be merged. Does it mean we finally need to use Spark 2.4 and Avro 1.8 in Hudi 0.5.1?
Yes, it is merged and should be available in 0.5.1.
Sending this PR out early to get feedback. Have not yet looked into what changes are required for tests, but in general these changes have been working for us on AWS EMR without any issues so far. This PR implements the following: