Adding StringFormat to available formatters #283

alexnu · 2018-02-01T15:35:57Z

Creating StringFormat to be used with format.class.
Adding tests.

ghost · 2018-02-01T15:35:59Z

It looks like @alexnu hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

alexnu · 2018-02-01T15:40:18Z

[clabot:check]

ghost · 2018-02-01T15:40:19Z

@confluentinc It looks like @alexnu just signed our Contributor License Agreement. 👍

Always at your service,

clabot

OneCricketeer · 2018-02-01T16:02:20Z

Could this be used for CSV data, or just single lines?

alexnu · 2018-02-01T16:30:29Z

Just single lines and we've tested it together with StringConverter. But maybe it could work for CSV too, I'm not sure. All it does is write the whole value of the message as a string.
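For readers wondering how this pairing would be configured, here is a minimal sketch of the two relevant connector properties. The fully qualified format name is inferred from the package of the files in this PR, and the helper class is hypothetical; other required connector settings are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only the two properties discussed in this thread are shown.
final class StringFormatConfigSketch {
  static Map<String, String> stringFormatProps() {
    Map<String, String> props = new LinkedHashMap<>();
    // Assumption: class name derived from src/main/java/io/confluent/connect/hdfs/string/.
    props.put("format.class", "io.confluent.connect.hdfs.string.StringFormat");
    // StringConverter hands the record value to the sink as a plain String.
    props.put("value.converter", "org.apache.kafka.connect.storage.StringConverter");
    return props;
  }

  public static void main(String[] args) {
    stringFormatProps().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```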

OneCricketeer · 2018-02-01T16:43:59Z

I was mostly curious whether CSV could be done with respect to Hive tables stored as text.

alexnu · 2018-02-01T20:36:49Z

Unfortunately, Hive is not supported by this PR. I followed the example of JSON formatter which throws an UnsupportedOperationException when called.

OneCricketeer · 2018-02-02T00:31:34Z

Yes, I noticed that, and noticed that your Unsupported Exception string also says "JSON" instead of "Text".

Also, wanted to ask if com.fasterxml.jackson.core.JsonGenerator was really necessary for just writing out a plain String.

alexnu · 2018-02-02T09:37:45Z

You are right about the Unsupported Exception string, I already changed it.
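A rough sketch of the pattern being discussed: the format refuses Hive integration by throwing, and the message names the right format. The class, the method name, and the exact message wording are all placeholders, not the code in this PR.

```java
// Hypothetical stand-in for the format class; only the Hive hook is sketched.
final class TextFormatHiveSketch {
  // Assumed wording: the thread only says the message should not mention JSON.
  static final String HIVE_UNSUPPORTED =
      "Hive integration is not currently supported with String format";

  // Placeholder for whatever hook the connector calls to obtain Hive support.
  Object getHiveFactory() {
    throw new UnsupportedOperationException(HIVE_UNSUPPORTED);
  }

  public static void main(String[] args) {
    try {
      new TextFormatHiveSketch().getHiveFactory();
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```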

As for JsonGenerator, I used it because its performance has already been tested, but it is definitely not necessary. Maybe a plain BufferedWriter would be better?

alexnu · 2018-02-05T17:14:16Z

So I replaced JsonGenerator with a plain OutputStreamWriter. Plus I tested it and it works great.

alexnu · 2018-02-06T15:54:48Z

Not really sure why the tests are failing. I tried running them locally and they all pass.

kkonstantine

Thanks @alexnu!

The addition of this format seems straightforward, and thus I believe we can add it soon to target 4.1. It's also low risk. I've left just a few comments that should be easy to address.

kkonstantine · 2018-02-08T00:02:01Z

src/main/java/io/confluent/connect/hdfs/string/StringFileReader.java

@@ -0,0 +1,55 @@
+/**
+ * Copyright 2017 Confluent Inc.


kkonstantine · 2018-02-08T00:03:08Z

src/main/java/io/confluent/connect/hdfs/string/StringFormat.java

@@ -0,0 +1,59 @@
+/**
+ * Copyright 2017 Confluent Inc.


kkonstantine · 2018-02-08T00:04:10Z

src/main/java/io/confluent/connect/hdfs/string/StringFormat.java

+
+
+/**
+ * A storage format implementation that exports JSON records to text files with a '.json'


Javadoc is off. Remnant from copying JsonFormat.

kkonstantine · 2018-02-08T00:04:35Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

@@ -0,0 +1,93 @@
+/**
+ * Copyright 2017 Confluent Inc.


kkonstantine · 2018-02-08T00:27:41Z

src/test/java/io/confluent/connect/hdfs/string/DataWriterStringTest.java

@@ -0,0 +1,96 @@
+/**
+ * Copyright 2017 Confluent Inc.


kkonstantine · 2018-02-09T22:11:21Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

+          log.trace("Sink record: {}", record.toString());
+          try {
+            String value = (String) record.value();
+


Let's keep things tighter here. Extra blank line

kkonstantine · 2018-02-09T22:11:29Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

+
+            writer.write(value);
+            writer.write(LINE_SEPARATOR);
+


Extra blank line here too

kkonstantine · 2018-02-09T22:14:03Z

src/test/java/io/confluent/connect/hdfs/string/DataWriterStringTest.java

+    List<SinkRecord> sinkRecords = new ArrayList<>();
+    for (long offset = 0, total = 0; total < size; ++offset) {
+      for (TopicPartition tp : partitions) {
+        String record = "Some random text...";


Let's include the offset in the text message to produce different content in every record.
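One way this suggestion might look, as a sketch only: partitions are reduced to a plain count instead of TopicPartition objects, and the helper name is hypothetical. The loop shape follows the snippet under review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the test loop above, without Kafka types.
final class OffsetRecordsSketch {
  // Builds `size` text values, visiting each partition once per offset, and
  // embeds the offset so that values at different offsets differ.
  static List<String> buildValues(int partitions, long size) {
    if (partitions < 1) {
      throw new IllegalArgumentException("partitions must be >= 1");
    }
    List<String> values = new ArrayList<>();
    for (long offset = 0, total = 0; total < size; ++offset) {
      for (int p = 0; p < partitions && total < size; ++p, ++total) {
        values.add("Some random text... offset=" + offset);
      }
    }
    return values;
  }

  public static void main(String[] args) {
    buildValues(2, 5).forEach(System.out::println);
  }
}
```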

kkonstantine · 2018-02-09T22:48:16Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

+      return new RecordWriter() {
+        final Path path = new Path(filename);
+        final OutputStream out = path.getFileSystem(conf.getHadoopConfiguration()).create(path);
+        final OutputStreamWriter writer = new OutputStreamWriter(out);


Pretty sure we need to wrap this with a BufferedWriter. Let's use one with 128*1024 bytes as size.

Also, similar to what we do elsewhere, any constant should be a private static final variable. Same for the WRITER_BUFFER_SIZE here
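A minimal sketch of the wrapping suggested here, with an in-memory stream standing in for the HDFS output stream. The class and method names are hypothetical, and UTF-8 is chosen only to keep the example deterministic. Note that the size argument of BufferedWriter counts chars rather than bytes.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; a ByteArrayOutputStream replaces the real file stream.
final class BufferedWriterSketch {
  // Constant named as in the review comment; the unit is chars.
  private static final int WRITER_BUFFER_SIZE = 128 * 1024;

  static Writer open(OutputStream out) {
    // Assumption: UTF-8 is used here purely for a reproducible example.
    return new BufferedWriter(
        new OutputStreamWriter(out, StandardCharsets.UTF_8), WRITER_BUFFER_SIZE);
  }

  // Returns the sink size before and after flushing a small write.
  static int[] sizesAroundFlush(String value) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (Writer writer = open(sink)) {
      writer.write(value);
      int before = sink.size(); // small writes stay in the buffer
      writer.flush();
      return new int[] {before, sink.size()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] sizes = sizesAroundFlush("hello");
    System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
  }
}
```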

kkonstantine · 2018-02-09T22:49:06Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

+            String value = (String) record.value();
+
+            writer.write(value);
+            writer.write(LINE_SEPARATOR);


By using a BufferedWriter you can use writer.newLine() here instead and skip declaration of LINE_SEPARATOR altogether in this class.
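To illustrate the point, a small sketch showing that BufferedWriter.newLine() emits the platform line separator, so a separate LINE_SEPARATOR constant becomes unnecessary. The class name is hypothetical and a StringWriter stands in for the file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical sketch; writes each value followed by newLine().
final class NewLineSketch {
  static String writeLines(String... values) {
    StringWriter sink = new StringWriter();
    try (BufferedWriter writer = new BufferedWriter(sink)) {
      for (String value : values) {
        writer.write(value);
        writer.newLine(); // uses the line.separator system property
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toString();
  }

  public static void main(String[] args) {
    System.out.print(writeLines("first record", "second record"));
  }
}
```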

kkonstantine · 2018-02-09T23:14:48Z

To unblock the build, you'll need to rebase on top of the recent changes in the master branch.

alexnu · 2018-02-10T14:14:02Z

Thanks for the review @kkonstantine. I believe all of your comments are now resolved.

Also, I rebased on master, but I'm getting errors in the following tests:

HiveIntegrationAvroTest
TopicPartitionWriterTest
HiveIntegrationParquetTest

kkonstantine

Build fails due to a findbugs error re: the charset used by the writer.

Pushing this change myself as well as a javadoc improvement so we can merge this promptly. Thanks for addressing the comments @alexnu!

kkonstantine · 2018-02-12T18:04:18Z

src/main/java/io/confluent/connect/hdfs/string/StringFormat.java

+
+/**
+ * A storage format implementation that exports records to text files with a '.txt'
+ * extension. In these files, records are separated by the BufferedWriter's new line


This is a bit too specific w.r.t. implementation details for this javadoc. I'll change it to "by the system's line separator".

kkonstantine · 2018-02-12T18:57:04Z

src/main/java/io/confluent/connect/hdfs/string/StringRecordWriterProvider.java

+      return new RecordWriter() {
+        final Path path = new Path(filename);
+        final OutputStream out = path.getFileSystem(conf.getHadoopConfiguration()).create(path);
+        final OutputStreamWriter streamWriter = new OutputStreamWriter(out);


Here's where findbugs is complaining about not specifying a charset. Still, instead of introducing yet another connector config property, I'll use the default charset, which can be overridden by the file.encoding property and, if not specified, defaults to UTF-8.
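A sketch of what passing the default charset explicitly might look like. The one-argument OutputStreamWriter constructor already uses the JVM default charset, so this only makes the choice visible in the code. The class name is hypothetical and an in-memory stream stands in for the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical sketch; names the charset instead of relying on the implicit default.
final class DefaultCharsetSketch {
  static Writer open(OutputStream out) {
    // Charset.defaultCharset() is whatever the running JVM reports as its default.
    return new OutputStreamWriter(out, Charset.defaultCharset());
  }

  // Encodes a value through the writer and returns the resulting bytes.
  static byte[] encode(String value) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (Writer writer = open(sink)) {
      writer.write(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(Charset.defaultCharset() + ": " + encode("plain text").length + " bytes");
  }
}
```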

alexnu mentioned this pull request Feb 1, 2018

How to set the 'format.class' to the text format? #267

Closed

kkonstantine reviewed Feb 9, 2018


Alexandros Nafas and others added 3 commits February 10, 2018 12:16

Adding StringFormat to available formatters

900e6b4

Correct exception message for Hive

01f20bd

Replacing JsonGenerator with OutputStreamWriter

baa7238

dmtrs force-pushed the string-format branch from 123f01d to baa7238 February 10, 2018 10:27

Alexandros Nafas added 3 commits February 10, 2018 12:54

Fixing Javadoc and codestyle issues

75cd7e1

Adding offsets to DataWriterStringTest

db583f1

Adding offsets to test records too

ec512b5

kkonstantine added 2 commits February 12, 2018 10:49

Make class javadoc implementation independent.

4704851

Fix findbugs error by explicitly using the default charset.

1223d4b

kkonstantine reviewed Feb 12, 2018


kkonstantine merged commit f1b37ee into confluentinc:master Feb 12, 2018

OneCricketeer mentioned this pull request May 17, 2018

How can I sink some raw data (such as string) in a topic to hdfs? I'm totally a rookie confluentinc/schema-registry#374

Closed

OneCricketeer mentioned this pull request Sep 11, 2018

The problems is kafka to hdfs Use time range #186

Closed

kudojp mentioned this pull request Sep 11, 2019

format.class is not supporting StringFormat. confluentinc/kafka-connect-storage-cloud#218

Open
