NIFI-6670: Add TextLineReader to read lines of text as single-field records #3735
Still working through it, but I have some questions/ideas for your consideration.
@Override
protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
final List<PropertyDescriptor> properties = new ArrayList<>(super.getSupportedPropertyDescriptors());
It would be a bit more efficient to build this list once, in a static initializer, such as
public static final List<PropertyDescriptor> PROPS = Collections.unmodifiableList(Arrays.asList(.....));
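To make the suggestion concrete, here is a minimal, self-contained sketch of the "build once, return the same unmodifiable list" pattern. Descriptor is a hypothetical stand-in for NiFi's PropertyDescriptor so that the snippet compiles on its own, and the "skip-line-count" name is invented for illustration; only "linereader-ignore-empty-lines" appears in this PR. Note that a static list cannot call super.getSupportedPropertyDescriptors(), so this only works as-is if the parent's properties are also available statically.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for NiFi's PropertyDescriptor (assumption, not the real class).
final class Descriptor {
    final String name;

    Descriptor(String name) {
        this.name = name;
    }
}

class StaticPropsSketch {
    // "skip-line-count" is an invented name for illustration only.
    static final Descriptor SKIP_LINE_COUNT = new Descriptor("skip-line-count");
    static final Descriptor IGNORE_EMPTY_LINES = new Descriptor("linereader-ignore-empty-lines");

    // Built once at class-load time; callers cannot modify it.
    static final List<Descriptor> PROPS =
            Collections.unmodifiableList(Arrays.asList(SKIP_LINE_COUNT, IGNORE_EMPTY_LINES));

    // Returns the shared list instead of allocating a new ArrayList per call.
    static List<Descriptor> getSupportedPropertyDescriptors() {
        return PROPS;
    }
}
```

The trade-off is small either way: the per-call ArrayList in the PR is cheap, but the static list avoids repeated allocation and makes accidental mutation fail fast with UnsupportedOperationException.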
@mattyb149 can you address this, or reply?
This is done differently by different processors; I copied this part of the code from another RecordReader.
this.linesLeftToSkip = skipLineCount;
this.linesPerGroup = linesPerGroup;
this.ignoreEmptyLines = ignoreEmptyLines;
final List<RecordField> fieldList = Collections.singletonList(new RecordField(fieldName, RecordFieldType.STRING.getDataType()));
As a default, that's a good one, but there should be some connection to a schema registry so users can do things like specify numeric inputs.
The benefit here is that you don't need to provide a schema. Otherwise I would think a GrokReader would suffice?
This seems like the kind of thing that could serve as the base class for other "line by line readers". If this had existed before the Grok reader, for example, wouldn't the natural approach have been to extend this to implement that? How easy do you think it would be to inherit from this reader?
This one isn't currently designed to be subclassed (none of the actual component implementations are), but if there is some reusable code (a base class, for example) we could consider putting it in the API.
@ottobackwards @MikeThomsen I've updated the PR, rebased against main, and added the additional feature of allowing the entire content to be treated as a record (by setting Lines Per Record to zero). Any chance you could review again? Thanks in advance!
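For readers following the thread, a rough sketch of how the skip, grouping, and ignore-empty options might interact. This is not the code in this PR: readRecords, its parameter names, the choice to count empty lines toward the skip count, and joining grouped lines with "\n" are all assumptions made for illustration. Only the ideas themselves (skipping leading lines, grouping lines per record, ignoring empty lines, and zero meaning "the whole content is one record") come from the discussion above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineGroupingSketch {

    // Hypothetical helper: returns one string per record.
    // linesPerRecord == 0 means the whole remaining content becomes a single record.
    static List<String> readRecords(String content, int skipLineCount,
                                    int linesPerRecord, boolean ignoreEmptyLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int linesInCurrent = 0;
        int linesLeftToSkip = skipLineCount;

        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: skipped lines are counted before the empty-line check.
                if (linesLeftToSkip > 0) {
                    linesLeftToSkip--;
                    continue;
                }
                if (ignoreEmptyLines && line.isEmpty()) {
                    continue;
                }
                if (linesInCurrent > 0) {
                    current.append('\n'); // assumed separator between grouped lines
                }
                current.append(line);
                linesInCurrent++;

                // Emit a record once the group is full (never when linesPerRecord is 0).
                if (linesPerRecord > 0 && linesInCurrent == linesPerRecord) {
                    records.add(current.toString());
                    current.setLength(0);
                    linesInCurrent = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Flush a trailing partial group, or the single whole-content record.
        if (linesInCurrent > 0) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Under these assumptions, the input "Header line 1\nHeader line 2\na\n\n\nb\nc\n" with two skipped lines and empty lines ignored yields three records ("a", "b", "c") at one line per record, and a single record "a\nb\nc" when lines per record is zero. The input deliberately contains two consecutive empty lines.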
final List<RecordField> fieldList = Collections.singletonList(new RecordField(fieldName, RecordFieldType.STRING.getDataType()));
schema = new SimpleRecordSchema(fieldList);
}
I know Javadoc isn't required, but this function is pretty important and is the place most likely to need maintenance.
Could you add an overview/summary of the goals, approach, logic, and concerns?
It will keep folks like me from messing it up.
protected static final PropertyDescriptor IGNORE_EMPTY_LINES = new PropertyDescriptor.Builder()
.name("linereader-ignore-empty-lines")
.displayName("Ignore Empty Lines")
I'm not sure I understand this. Can you reword and clarify how this relates to other properties?
@@ -0,0 +1,7 @@
Header line 1
Would it make sense to test multiple sequential empty lines instead of just a single one?
Closing my PR for now due to inactivity and the fact that you can use a GrokReader with Schema Access Strategy set to "Use String Fields from Grok Expression" (the default) and a Grok Expression of
|
Thank you for submitting a contribution to Apache NiFi.
Please provide a short description of the PR here:
Description of PR
Enables X functionality; fixes bug NIFI-YYYY.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.
For code changes:
mvn -Pcontrib-check clean install at the root nifi folder?
LICENSE file, including the main LICENSE file under nifi-assembly?
NOTICE file, including the main NOTICE file found under nifi-assembly?
.displayName in addition to .name (programmatic access) for each of the new properties?
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.