Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Build and code sample updates for Kafka Streams DSL for Scala #4949

Merged
merged 4 commits into from May 7, 2018

Conversation

Projects
None yet
4 participants
@seglo
Copy link
Contributor

commented Apr 30, 2018

Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.

Encode Scala major version into streams-scala artifacts.

To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.

After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:

$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install

Which results in the following files generated:

/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/maven-metadata-local.xml
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT-sources.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/maven-metadata-local.xml
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT-test-sources.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT.pom
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT-test.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT-javadoc.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT-scaladoc.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.12/2.0.0-SNAPSHOT/kafka-streams-scala_2.12-2.0.0-SNAPSHOT.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/maven-metadata-local.xml
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT-test.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/maven-metadata-local.xml
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT-sources.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT-scaladoc.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT-test-sources.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT-javadoc.jar
/home/seglo/.m2/repository/org/apache/kafka/kafka-streams-scala_2.11/2.0.0-SNAPSHOT/kafka-streams-scala_2.11-2.0.0-SNAPSHOT.pom

Review Code Samples

I reviewed all the code samples introduced by the main PR. I created a sample project which validates the main WordCount example. The other examples are code snippets instead of whole programs, but are equivalent to existing tests found in the apache/kafka streams-scala project, or the original lightbend/kafka-streams-scala project.

Code sample / documentation references

WordCountApplication example

Docs location:

  • /documentation/streams/ (Scala Example)
  • /documentation/streams/developer-guide/dsl-api.html#scala-dsl-sample-usage

See WordCountApplication in sample project.

NOTE: The WordCountApplication usage of count reveals how using the Materialized does not take advantage of implicit SerDes as other Kafka Streams operators do (operators that accept a Serialized, Consumed, Produced etc). This is not an issue when you do not want to provide a user-defined and named state store, but to make this example consistent with the Java examples we use the overload of the count API which takes the Materialized parameter. It's required to pass the key SerDes in this example because a global serializer is not defined in the Kafka Streams config. @debasishg is going to follow up with a proposal to amend this API in a separate thread.

Implicit SerDes Example

Docs location:

  • /documentation/streams/developer-guide/dsl-api.html#scala-dsl-implicit-serdes

See StreamToTableJoinScalaIntegrationTestImplicitSerdes test in apache/kafka

https://github.com/apache/kafka/blob/trunk/streams/streams-scala/src/test/scala/org/apache/kafka/streams/scala/StreamToTableJoinScalaIntegrationTestImplicitSerdes.scala#L77..L102

User-defined SerDes Example

Docs location:

  • /documentation/streams/developer-guide/dsl-api.html#scala-dsl-user-defined-serdes

See StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro integration test in lightbend/kafka-streams-scala. This test doesn't exist in apache/kafka because we didn't want to add the Avro dep.

https://github.com/lightbend/kafka-streams-scala/blob/v0.2.1/src/test/scala/com/lightbend/kafka/scala/streams/StreamToTableJoinScalaIntegrationTestImplicitSerdesWithAvro.scala#L61..L142

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)
@seglo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2018

@@ -980,7 +980,7 @@ project(':streams') {
project(':streams:streams-scala') {
println "Building project 'streams-scala' with Scala version ${versions.scala}"
apply plugin: 'scala'
archivesBaseName = "kafka-streams-scala"
archivesBaseName = "kafka-streams-scala_${versions.baseScala}"

This comment has been minimized.

Copy link
@ijuma

ijuma Apr 30, 2018

Contributor

Unrelated question, why is scalaLogging just a test compile dependency?

This comment has been minimized.

Copy link
@seglo

seglo Apr 30, 2018

Author Contributor

It's required by the LazyLogging trait referenced by several tests, but it doesn't look like any logging is actually done. It's probably safe to remove this. @debasishg can you provide some insight here?

This comment has been minimized.

Copy link
@debasishg

debasishg Apr 30, 2018

Contributor

+1 .. it's there, along with log4j.properties as part of logging infrastructure from the tests. Currently not used though. We can remove the imports and do away with the dependency. Do we keep log4j.properties ?

This comment has been minimized.

Copy link
@seglo

seglo Apr 30, 2018

Author Contributor

Let's remove the dependency. I think it makes sense to keep log4j.properties so users can see what's happening with the embedded Kafka instance and Kafka streams when the tests are run.

@guozhangwang

This comment has been minimized.

Copy link
Contributor

commented May 2, 2018

@seglo Thanks a lot for the PR. It lgtm overall.

Just a side note: could you double check if the release script https://github.com/apache/kafka/blob/trunk/release.py for publishing the built artifacts to maven central as part of the release process (https://cwiki.apache.org/confluence/display/KAFKA/Release+Process) needs to be updated; I made a quick look around https://github.com/apache/kafka/blob/trunk/release.py#L451-L456 and looks like we are well covered, but better have another pair of eyes.

@seglo

This comment has been minimized.

Copy link
Contributor Author

commented May 4, 2018

Hey @guozhangwang,

I took a peek at release.py. The following 2 lines appear to actually trigger the publish/install of artifacts.

cmd("Building and uploading archives", "./gradlew uploadArchivesAll", cwd=kafka_dir, env=jdk7_env)
cmd("Building and uploading archives", "./gradlew uploadCoreArchives_2_12 -PscalaVersion=2.12", cwd=kafka_dir, env=jdk8_env)

The first line builds and publishes all artifacts compiled with the 1.7 JDK, whereas the second line I think just publishes Kafka core for Scala 2.12 compiled with 1.8 JDK. I tried executing ./gradlew uploadArchivesAll -PscalaVersion=2.12 and this publishes the correct version of kafka-streams-scala in my local test repository.

 seglo@slice  /tmp/kafkaRepo/org/apache/kafka  ls -la
total 72
drwxrwxr-x 18 seglo seglo 4096 May  4 17:02 .
drwxrwxr-x  3 seglo seglo 4096 May  4 16:48 ..
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 connect-api
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 connect-file
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 connect-json
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 connect-runtime
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 connect-transforms
drwxrwxr-x  3 seglo seglo 4096 May  4 16:48 kafka_2.11
drwxrwxr-x  3 seglo seglo 4096 May  4 16:53 kafka_2.12
drwxrwxr-x  3 seglo seglo 4096 May  4 16:48 kafka-clients
drwxrwxr-x  3 seglo seglo 4096 May  4 16:48 kafka-examples
drwxrwxr-x  3 seglo seglo 4096 May  4 16:48 kafka-log4j-appender
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 kafka-streams
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 kafka-streams-examples
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 kafka-streams-scala_2.11
drwxrwxr-x  3 seglo seglo 4096 May  4 17:02 kafka-streams-scala_2.12
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 kafka-streams-test-utils
drwxrwxr-x  3 seglo seglo 4096 May  4 16:49 kafka-tools

I think we could add that command to release.py, but it may do a lot more work than is needed and I'm not sure if it overwrites any of the 1.7 JDK artifacts. Based on dir timestamps it doesn't look like it overwrote the 1.7 artifacts. @ijuma Do you think adding something like the following is good enough or do we need to be more specific?

cmd("Building and uploading archives", "./gradlew uploadArchivesAll -PscalaVersion=2.12", cwd=kafka_dir, env=jdk8_env)
@ijuma

This comment has been minimized.

Copy link
Contributor

commented May 4, 2018

I think you'd want the equivalent of uploadCoreArchives_2_12 for Streams, but once we drop support for Java 7, then uploadArchivesAll is enough. So you could just skip the update to release.py under the expectation that things will just work once we drop support for Java 7.

@ijuma

ijuma approved these changes May 4, 2018

Copy link
Contributor

left a comment

LGTM for the build and test changes. Someone from Streams needs to review the documentation changes.

val textLines: KStream[String, String] = builder.stream[String, String]("TextLinesTopic")
val wordCounts: KTable[String, Long] = textLines
.flatMapValues(textLine => textLine.toLowerCase.split("\\W+"))
.groupBy((_, word) => word)
.count(Materialized.as("counts-store"))
.count(Materialized.as("counts-store").withKeySerde(DefaultSerdes.stringSerde))

This comment has been minimized.

Copy link
@guozhangwang
@guozhangwang
Copy link
Contributor

left a comment

Doc changes LGTM except a minor question.

@seglo

This comment has been minimized.

Copy link
Contributor Author

commented May 6, 2018

@guozhangwang I've confirmed that with #4919 the Key serdes is no longer required. Good job! I updated the documentation code samples accordingly.

@guozhangwang guozhangwang merged commit 893e044 into apache:trunk May 7, 2018

2 of 3 checks passed

JDK 7 and Scala 2.11 FAILURE 7948 tests run, 1 skipped, 0 failed.
Details
JDK 10 and Scala 2.12 SUCCESS 8879 tests run, 7 skipped, 0 failed.
Details
JDK 8 and Scala 2.12 SUCCESS 8879 tests run, 7 skipped, 0 failed.
Details
@guozhangwang

This comment has been minimized.

Copy link
Contributor

commented May 7, 2018

@seglo Thanks! I've just merged it to trunk.

jcustenborder added a commit to jcustenborder/kafka that referenced this pull request May 16, 2018

MINOR: Build and code sample updates for Kafka Streams DSL for Scala (a…
…pache#4949)

Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.

Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.

After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:

$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

umesh9794 added a commit to umesh9794/kafka that referenced this pull request Jun 5, 2018

MINOR: Build and code sample updates for Kafka Streams DSL for Scala (a…
…pache#4949)

Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.

Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.

After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:

$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018

MINOR: Build and code sample updates for Kafka Streams DSL for Scala (a…
…pache#4949)

Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.

Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.

After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:

$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

nimosunbit added a commit to sunbit-dev/kafka that referenced this pull request Nov 6, 2018

MINOR: Build and code sample updates for Kafka Streams DSL for Scala (a…
…pache#4949)

Several build and documentation updates were required after the merge of KAFKA-6670: Implement a Scala wrapper library for Kafka Streams.

Encode Scala major version into streams-scala artifacts.
To differentiate versions of the kafka-streams-scala artifact across Scala major versions it's required to encode the version into the artifact name before its published to a maven repository. This is accomplished by following a similar release process as kafka core, which encodes the Scala major version and then runs the build for each major version of Scala supported. This is considered standard practice when releasing Scala libraries, but is not handled for us automatically with the basic Scala for Gradle support.

After this change you can generate and install the kafka-streams-scala artifact into the local maven repository:

$ ./gradlew -PscalaVersion=2.11 install
$ ./gradlew -PscalaVersion=2.12 install

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.