-
Notifications
You must be signed in to change notification settings - Fork 152
APEXMALHAR-2116 Added FS record reader operator, module, test #326
APEXMALHAR-2116 Added FS record reader operator, module, test #326
Conversation
@amberarrow Could you please review this? |
Ok, I'll take a look |
import com.datatorrent.lib.io.block.ReaderContext; | ||
|
||
/** | ||
* This operator can be used for reading records/tuples from Filesystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and several others have trailing whitespace; this is generally considered not a good thing -- see
for instance: http://programmers.stackexchange.com/questions/121555/why-is-trailing-whitespace-a-big-deal
Suggest removing all trailing white space.
@yogidevendra Is this being created for any specific use case? Can you please share that? |
@ChandniSingh Usecases, in this context do not intend to retain the original sequence of records at sources. They can consider each tuple to be independent and just focus on how to process them at scale. In this case, this module will be connected to "byte[] to POJO converter" (Say delimited parser) and then further to dedup or enrichment processing operator. Finally, output of dedup will be connected to some output operator. |
aaeb7fe
to
dd31bbe
Compare
Incorporated review comments. |
* 4. recursive: if scan recursively input directories<br/> | ||
* 5. blockSize: block size used to read input blocks of file<br/> | ||
* 6. readersCount: count of readers to read input file<br/> | ||
* 7. sequencialFileRead: If emit file blocks in sequence?<br/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sequencial => sequential
Still looking at this -- probably take a couple of days more. |
dd31bbe
to
75ef1d5
Compare
* Code for enabling BeanUtils to accept comma separated string to | ||
* initialize FIELD_TYPE[] | ||
*/ | ||
class RecordReaderModeConverter extends AbstractConverter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide more detail on why this is necessary and how it is used ?
75ef1d5
to
c4a6d55
Compare
c4a6d55
to
e5dba39
Compare
@amberarrow Removed the unnecessary code for BeanUtils convertor. But, facing some issues with travis build. Build failures seems to be non-deterministic. (Different unit tests failed at multiple attempts). Please let me know if you have any other suggestion regarding above code changes. I will update the PR for the same. We should keep this PR on hold till travis build is stable. |
@yogidevendra OK, rest looks good, though I'd like to test it with a small sample application on the cluster; do you have such a sample ? It would also be a good addition to the examples collection. |
Let me know the location to add sample app. I will open separate PR for sample app in the respective repo. |
https://github.com/DataTorrent/examples under tutorials |
* The module reads data in parallel, following parameters can be configured | ||
* <br/> | ||
* 1. files: list of file(s)/directories to read<br/> | ||
* 2. filePatternRegularExp: Files names matching given regex will be read<br/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files names => Files with names
*/ | ||
public void setFiles(String files) | ||
{ | ||
this.files = files; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the field be named "directories" if we expect it to be a list of directories ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to keep it consistent with FSInputModule.java. If you feel that readability/intuitive names are more important than having consistency across operators then I am fine with changing this field name.
e5dba39
to
506ab58
Compare
/** | ||
* Length for fixed width record | ||
*/ | ||
private int recordLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either have default value or mark Min constraint to make sure it is set by user.
506ab58
to
5e955e4
Compare
@DT-Priyanka Incorporated changes based on your review feedback. |
@amberarrow Could you please merge this if there are no more comments. |
@DT-Priyanka Could you please merge this if there are no more comments? |
* Length for fixed width record | ||
*/ | ||
@Min(1) | ||
private int recordLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Record length will be optional in case of DELIMITED_RECORD and is compulsory in case of FIXED_LENGHT records. so keeping it either optional or compulsory field both have pros and cons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Good point.
Should we separate this into two different classes. FixedWidthRecordReader, DelimitedRecordReader? That will make configuration clean.
@amberarrow Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding another class seems like overkill; how about removing the annotation and adding a check in the code to ensure that the value is positive if mode is FIXED_WIDTH_RECORD ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amberarrow As per your suggestion, removed @ min annotation for record length.
Added validation in the code to ensure that the value is positive if FIXED_WIDTH_RECORD.
Also, added test case for validating this.
Could you have a look and merge this if it looks OK.
5e955e4
to
ba8bbbe
Compare
There are a couple of minor issues:
|
2. javadoc improvements. 3. Adding default values 4. Incorporating review comments.
ba8bbbe
to
02d657c
Compare
@amberarrow Updated as per the feedback. |
No description provided.