APEXMALHAR-2463 FTP input operator sample app, and documentation #596
APEXMALHAR-2463 FTP input operator sample app, and documentation #596
Conversation
fd21300
to
48f0cf0
Compare
291a94b
to
577ba2f
Compare
Ok, will do -- give me a couple of days. |
577ba2f
to
80537e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment in application, rest looks good to me.
import com.datatorrent.api.StreamingApplication; | ||
import com.datatorrent.api.annotation.ApplicationAnnotation; | ||
import com.datatorrent.lib.io.AbstractFTPInputOperator.FTPStringInputOperator; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some description about the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chaithu14 Added description and comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add README.md for the example app similar to other example apps
80537e3
to
795707b
Compare
docs/operators/ftpInputOperator.md
Outdated
(org.apache.hadoop.fs.ftp.FTPFileSystem) | ||
|
||
## Class Diagram | ||
![FTPInputOperator class diagram](images/ftpInputOperator/classdiagram.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please show the complete class diagram including all the interfaces implemented in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amberarrow this operator extends the AbstractFileInputOperator in which the interfaces are implemented. The FTP input operator over-rides the getFSInstance() and partitioned() method. I feel adding the AbstractFileInputOperator to the class diagram will take too much space with no addition of value which is specific to the FTP input operator. Thoughts?
docs/operators/ftpInputOperator.md
Outdated
============= | ||
|
||
## Operator Objective | ||
This operator is designed to scan a directory from an FTP server for files, read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention the class name here, e.g. This operator (AbstractFTPInputOperator
) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/operators/ftpInputOperator.md
Outdated
This operator is designed to scan a directory from an FTP server for files, read | ||
and split file content into tuples such as lines or blocks of bytes, and finally | ||
emit them on the output port for further processing by downstream operators. | ||
The operator extends the AbstractFileInputOperator. It overrides the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backquotes around literal class name (several places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
docs/operators/ftpInputOperator.md
Outdated
### Ports | ||
Because this is an input operator, there are no input ports. | ||
|
||
- output: output port on which data is emitted. A downstream operator receives tuples from the FTPInputOpertor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the first "output" bold since it is a sort of mini-title.
The second sentence is not especially useful and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
docs/operators/ftpInputOperator.md
Outdated
2. **source**: the directory path from where to scan and read files. | ||
3. **username**: the username for authenticating against the FTP server. This is | ||
an optional property and can be skipped when anonymous FTP is enabled | ||
4. **password**: the password to be used in conjunction with the above username. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show examples of setting these properties both in code as well as in properties files (but please test your examples with an actual running application before adding them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
b2f6ab7
to
53ae2c6
Compare
@amberarrow please go through the review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For operator documentation, I recommend formatting the page to have it consistent with other docs. E.g adding port and properties information in a table, code block for static and dynamic partitioning, clearly specifying operator related info like version number, package names etc. You may refer to https://github.com/apache/apex-malhar/blob/master/docs/operators/csvParserOperator.md
import com.datatorrent.api.StreamingApplication; | ||
import com.datatorrent.api.annotation.ApplicationAnnotation; | ||
import com.datatorrent.lib.io.AbstractFTPInputOperator.FTPStringInputOperator; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add README.md for the example app similar to other example apps
53ae2c6
to
1e916e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few comments. Rest all looks good to me. Make sure commits are sqashed and jenkins checks are passing
|
||
#### Dynamic Partitioning | ||
Dynamic partitioning -- changing the number of partitions of one or more operators | ||
in a running application -- can be achieved in multiple ways: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a code block of how to use ThroughputBased / LatencyBased Partitioner as an example in addition to the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done via the setPartitionCount method which is a part of the super class AbstractFileInputOperator.
docs/operators/ftpInputOperator.md
Outdated
count, set the `repartitionRequired` flag in the response. | ||
|
||
### Configuration | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may move this section above. Have it just after Ports section
FTPStringInputOperator reader = dag.addOperator("Reader", new FTPStringInputOperator()); | ||
//Set properties for the FTP input operator | ||
reader.setHost("localhost"); | ||
reader.setUserName("ftp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to set these through property files instead of in Application.java ( they way it has been mentioned in the README.md )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done in response to earlier comment on this PR for setting properties in the example application itself. Currently properties are being set in the code as well as properties.xml
// writer that writes strings to a file on hdfs | ||
StringFileOutputOperator writer = dag.addOperator("Writer", new StringFileOutputOperator()); | ||
//Set properties for the output operator | ||
writer.setFilePath("malhar_examples/ftp"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set these through property files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done in response to earlier comment on this PR for setting properties in the example application itself. Currently properties are being set in the code as well as properties.xml
1e916e5
to
fab1422
Compare
@francisf Looks good to me. Please make sure Travis build checks are passed. Will merge it in a day if there are no other comments from the community |
fab1422
to
c3f86f2
Compare
@amberarrow @chaithu14 please review.