-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added HDFS Sink #2409
Added HDFS Sink #2409
Conversation
public static final String DEFLATE = "Deflate"; | ||
public static final String GZIP = "Gzip"; | ||
public static final String LZ4 = "Lz4"; | ||
public static final String SNAPPY = "Snappy"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could be set as enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
public static final String LZ4 = "Lz4"; | ||
public static final String SNAPPY = "Snappy"; | ||
|
||
private static final long serialVersionUID = 1L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lombok should be adding this automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
* A file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop | ||
* will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration. | ||
*/ | ||
protected String hdfsConfigResources; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields should be marked as private (lombok generated getter/setters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
return null; | ||
} | ||
|
||
public String getHdfsConfigResources() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to write getters and setters. This is already marked as @Data
so they will be provided automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
<version>3.1.1</version> | ||
</dependency> | ||
|
||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use TestNG for all the tests (and it's already added as a test dependency)
|
||
@Test | ||
public final void loadFromYamlFileTest() throws IOException { | ||
File yamlFile = getFile("sinkConfig.yaml"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting in all files should be 4 spaces (no tabs). There is a formatting profile for Eclipse at https://github.com/apache/incubator-pulsar/blob/master/src/formatter.xml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and added "contrib-check" profile to main pom.xml file to validate check style compliance
@@ -0,0 +1,35 @@ | |||
package org.apache.pulsar.tests.integration.containers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASF headers are missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@david-streamlio One question is also if it's possible to abstract out some parts that might be common with a future "S3" or "GCS" IO sink. |
@srkukarni @jerrypeng can you guys also review this PR? |
private String keytab; | ||
|
||
public void validate() { | ||
if (StringUtils.isEmpty(hdfsConfigResources) || StringUtils.isEmpty(directory)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments for hdfsConfigResources indicate that if this is unset we look into certain places. Thus it looks like this check shouldnt be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change the comment, since we cannot assume that this Sink will be executing on a node that has the Hadoop client configuration files available.
@merlimat @david-streamlio @srkukarni what is the current status of this PR? are we able to wrap this up by end of this week? |
I have corrected all of the issues found during the code review, and have conducted extensive testing locally. So, I believe the code is ready to go. |
run integration tests |
run java8 tests |
1 similar comment
run java8 tests |
run integration tests |
@srkukarni @merlimat can you review this PR and make sure we can land this in 2.2 release? |
…-pulsar into hdfs-connector
run integration tests |
run integration test |
run integration tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @sijie could you please review the pom changes
Motivation
Added the HDFS Sink connector.
Modifications
Added the hfs module to the pulsar-io sub-module
Result
After this change, there will be an HDFS sink available for writing Pulsar data to HDFS