Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added HDFS Sink #2409

Merged
merged 12 commits into from
Sep 6, 2018
Merged

Added HDFS Sink #2409

merged 12 commits into from
Sep 6, 2018

Conversation

david-streamlio
Copy link
Contributor

Motivation

Added the HDFS Sink connector.

Modifications

Added the hfs module to the pulsar-io sub-module

Result

After this change, there will be an HDFS sink available for writing Pulsar data to HDFS

@sijie sijie added type/feature The PR added a new feature or issue requested a new feature area/connector labels Aug 20, 2018
@sijie sijie added this to the 2.2.0-incubating milestone Aug 20, 2018
public static final String DEFLATE = "Deflate";
public static final String GZIP = "Gzip";
public static final String LZ4 = "Lz4";
public static final String SNAPPY = "Snappy";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be set as enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public static final String LZ4 = "Lz4";
public static final String SNAPPY = "Snappy";

private static final long serialVersionUID = 1L;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lombok should be adding this automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* A file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop
* will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration.
*/
protected String hdfsConfigResources;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields should be marked as private (lombok generated getter/setters)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return null;
}

public String getHdfsConfigResources() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to write getters and setters. This is already marked as @Data so they will be provided automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

<version>3.1.1</version>
</dependency>

<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use TestNG for all the tests (and it's already added as a test dependency)


@Test
public final void loadFromYamlFileTest() throws IOException {
File yamlFile = getFile("sinkConfig.yaml");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting in all files should be 4 spaces (no tabs). There is a formatting profile for Eclipse at https://github.com/apache/incubator-pulsar/blob/master/src/formatter.xml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added "contrib-check" profile to main pom.xml file to validate check style compliance

@@ -0,0 +1,35 @@
package org.apache.pulsar.tests.integration.containers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASF headers are missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@merlimat
Copy link
Contributor

@david-streamlio One question is also if it's possible to abstract out some parts that might be common with a future "S3" or "GCS" IO sink.

@sijie
Copy link
Member

sijie commented Aug 27, 2018

@srkukarni @jerrypeng can you guys also review this PR?

private String keytab;

public void validate() {
if (StringUtils.isEmpty(hdfsConfigResources) || StringUtils.isEmpty(directory)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for hdfsConfigResources indicate that if this is unset we look into certain places. Thus it looks like this check shouldnt be there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change the comment, since we cannot assume that this Sink will be executing on a node that has the Hadoop client configuration files available.

@sijie
Copy link
Member

sijie commented Aug 29, 2018

@merlimat @david-streamlio @srkukarni what is the current status of this PR? are we able to wrap this up by end of this week?

@david-streamlio
Copy link
Contributor Author

I have corrected all of the issues found during the code review, and have conducted extensive testing locally. So, I believe the code is ready to go.

@srkukarni
Copy link
Contributor

run integration tests

@srkukarni
Copy link
Contributor

run java8 tests

1 similar comment
@srkukarni
Copy link
Contributor

run java8 tests

@srkukarni
Copy link
Contributor

run integration tests

@sijie
Copy link
Member

sijie commented Sep 4, 2018

@srkukarni @merlimat can you review this PR and make sure we can land this in 2.2 release?

@srkukarni
Copy link
Contributor

run integration tests

@srkukarni
Copy link
Contributor

run integration test

@srkukarni
Copy link
Contributor

run integration tests

Copy link
Contributor

@srkukarni srkukarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @sijie could you please review the pom changes

@srkukarni srkukarni merged commit 261eab1 into apache:master Sep 6, 2018
@david-streamlio david-streamlio deleted the hdfs-connector branch September 11, 2018 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connector type/feature The PR added a new feature or issue requested a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants