Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

APEXMALHAR-2463 FTP input operator sample app, and documentation #596

Merged
merged 1 commit into from May 17, 2017

Conversation

francisf
Copy link
Contributor

@amberarrow @chaithu14 please review.

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch 2 times, most recently from fd21300 to 48f0cf0 Compare April 11, 2017 08:23
@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch 2 times, most recently from 291a94b to 577ba2f Compare April 21, 2017 06:29
@amberarrow
Copy link
Contributor

Ok, will do -- give me a couple of days.

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch from 577ba2f to 80537e3 Compare April 26, 2017 07:36
Copy link
Contributor

@chaithu14 chaithu14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment in application, rest looks good to me.

import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.lib.io.AbstractFTPInputOperator.FTPStringInputOperator;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some description about the application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaithu14 Added description and comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add README.md for the example app similar to other example apps

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch from 80537e3 to 795707b Compare April 27, 2017 06:41
(org.apache.hadoop.fs.ftp.FTPFileSystem)

## Class Diagram
![FTPInputOperator class diagram](images/ftpInputOperator/classdiagram.png)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please show the complete class diagram including all the interfaces implemented in the base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amberarrow this operator extends the AbstractFileInputOperator in which the interfaces are implemented. The FTP input operator over-rides the getFSInstance() and partitioned() method. I feel adding the AbstractFileInputOperator to the class diagram will take too much space with no addition of value which is specific to the FTP input operator. Thoughts?

=============

## Operator Objective
This operator is designed to scan a directory from an FTP server for files, read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the class name here, e.g. This operator (AbstractFTPInputOperator) ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This operator is designed to scan a directory from an FTP server for files, read
and split file content into tuples such as lines or blocks of bytes, and finally
emit them on the output port for further processing by downstream operators.
The operator extends the AbstractFileInputOperator. It overrides the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backquotes around literal class name (several places)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

### Ports
Because this is an input operator, there are no input ports.

- output: output port on which data is emitted. A downstream operator receives tuples from the FTPInputOpertor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the first "output" bold since it is a sort of mini-title.
The second sentence is not especially useful and can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

2. **source**: the directory path from where to scan and read files.
3. **username**: the username for authenticating against the FTP server. This is
an optional property and can be skipped when anonymous FTP is enabled
4. **password**: the password to be used in conjunction with the above username.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show examples of setting these properties both in code as well as in properties files (but please test your examples with an actual running application before adding them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch 2 times, most recently from b2f6ab7 to 53ae2c6 Compare May 8, 2017 09:09
@francisf
Copy link
Contributor Author

francisf commented May 8, 2017

@amberarrow please go through the review comments.

Copy link
Contributor

@shubham-pathak22 shubham-pathak22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For operator documentation, I recommend formatting the page to have it consistent with other docs. E.g adding port and properties information in a table, code block for static and dynamic partitioning, clearly specifying operator related info like version number, package names etc. You may refer to https://github.com/apache/apex-malhar/blob/master/docs/operators/csvParserOperator.md

import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.lib.io.AbstractFTPInputOperator.FTPStringInputOperator;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add README.md for the example app similar to other example apps

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch from 53ae2c6 to 1e916e5 Compare May 10, 2017 09:32
Copy link
Contributor

@shubham-pathak22 shubham-pathak22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments. Rest all looks good to me. Make sure commits are sqashed and jenkins checks are passing


#### Dynamic Partitioning
Dynamic partitioning -- changing the number of partitions of one or more operators
in a running application -- can be achieved in multiple ways:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a code block of how to use ThroughputBased / LatencyBased Partitioner as an example in addition to the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done via the setPartitionCount method which is a part of the super class AbstractFileInputOperator.

count, set the `repartitionRequired` flag in the response.

### Configuration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may move this section above. Have it just after Ports section

FTPStringInputOperator reader = dag.addOperator("Reader", new FTPStringInputOperator());
//Set properties for the FTP input operator
reader.setHost("localhost");
reader.setUserName("ftp");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to set these through property files instead of in Application.java ( they way it has been mentioned in the README.md )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done in response to earlier comment on this PR for setting properties in the example application itself. Currently properties are being set in the code as well as properties.xml

// writer that writes strings to a file on hdfs
StringFileOutputOperator writer = dag.addOperator("Writer", new StringFileOutputOperator());
//Set properties for the output operator
writer.setFilePath("malhar_examples/ftp");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set these through property files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done in response to earlier comment on this PR for setting properties in the example application itself. Currently properties are being set in the code as well as properties.xml

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch from 1e916e5 to fab1422 Compare May 11, 2017 10:45
@shubham-pathak22
Copy link
Contributor

@francisf Looks good to me. Please make sure Travis build checks are passed. Will merge it in a day if there are no other comments from the community

@francisf francisf force-pushed the APEXMALHAR-2463_FTP_input_example branch from fab1422 to c3f86f2 Compare May 16, 2017 06:52
@asfgit asfgit merged commit c3f86f2 into apache:master May 17, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants