Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#64855 Add filterbeforeconcat option to Concat #141

Merged
merged 1 commit into from
Dec 7, 2020

Conversation

TheConstructor
Copy link
Contributor

I propose this to solve https://bz.apache.org/bugzilla/show_bug.cgi?id=64855

To implement optional filtering before concatenation I moved the end-of-file-linebreak-fix into it's own LastLineFixingReader so that MultiReader.nextReader() can apply line-break-fixing and/or filtering on demand.

@TheConstructor
Copy link
Contributor Author

Is there anything I can/need to do?

@bodewig
Copy link
Member

bodewig commented Nov 30, 2020

@TheConstructor no, you've done all there is to do, many thanks. The silence is totally our fault, we just seem all to be busy right now. I'll find some time looking into this and #140 this week.

@TheConstructor
Copy link
Contributor Author

@bodewig thank you for the heads-up and for looking into this!

@bodewig
Copy link
Member

bodewig commented Dec 6, 2020

I like the approach of moving the logic for fixing the lat lines to a separate filter. Kudos for adding tests ;-)

A different approach could be to allow an alternative set of filters to be applied before concatenating, i.e. allowing users to define two sets of filters one to apply on each input stream individually and one to be applied to the merged stream. Naming could become an issue (and I've got a long track of picking bad names myself).

I'm not asking you to change your PR, I'd rather like to discuss the different approaches to see which would be best. So what do you think?

@TheConstructor
Copy link
Contributor Author

TheConstructor commented Dec 6, 2020

I briefly thought about that, but thought it would add another layer of complexity. In the end you could wrap a <concat filterbeforeconcat="true"> into another <concat> without filterbeforeconcat, if you want to filter before and after concatenation.

@bodewig
Copy link
Member

bodewig commented Dec 6, 2020

Yes, you are right.

@bodewig bodewig merged commit a29c130 into apache:master Dec 7, 2020
@TheConstructor TheConstructor deleted the input-file-tokenizer branch December 7, 2020 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants