-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV: Use placeholders when header and data length differs #2555
CSV: Use placeholders when header and data length differs #2555
Conversation
… headers using place holders when the length of each other differs - Adds test - Fixes the process function - Undoes some changes that shouldn't have been done - Renames combiner function - Fixes javadoc Fixes java doc
Thank you for this suggestion. What should happen if there is more than one extra data column? |
It should just add another header. If the user set a custom one it will add that one otherwise it will use the one configured by default:
maps to Map("eins" -> "11", "zwei" -> "12", "Missing header" -> "13" , "Missing Header" -> "14")
maps to Map("eins" -> "11", "zwei" -> "12", "custom" -> "13" , "custom" -> "14") I think that this way is easier to understand what happened to the data (this is more helpful from the developer's point of view). I also think that the more valuable use case for this change is when there are more headers than data as it is possible that the user wants to keep the data even when he might have forgotten to add a couple of commas on its input csv. |
The Map won't be able to hold multiple values with the same key. |
Your right. So, It think that it can be bypassed by adding some character to the placeholder, like a number. Something like
maps to Map("eins" -> "11", "zwei" -> "12", "Missing header" -> "13" , "Missing Header_1" -> "14") What do you think? Any idea is also welcome, thanks! |
Sounds reasonable to me. A 0-based index appended to the default missing key. Using your example:
would return
I would also suggest supporting a default value for missing values too.
would return
You'll need to support a javadsl as well. |
@yuferpegom If you can follow up on this PR soon it can be included in the Alpakka |
Oh I will, thanks |
Hi @yupegom, Thank you for your contribution! We really value the time you've taken to put this together. Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement: |
… the input - Adds support to javadsl
- Updates javadoc
02abc24
to
dbd9495
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is almost there. Just a few small things.
|
||
// #header-line | ||
val future = | ||
// format: off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason why exceptions need to be made for all this formatting. Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows keeping the special indentation to represent the columns and rows being passed as params.
BTW, I'm just following what was already done before in the spec.
* A flow translating incoming [[scala.List]] of [[akka.util.ByteString]] to a map of String and | ||
* ByteString using the stream's first element's values as keys. If the header values are shorter | ||
* than the data (or vice-versa) placeholder elements are used to extend the shorter collection to | ||
* the length of the longer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a copy/paste error? The types don't match. For this API they should use Java types and Javadoc conventions to link to those types (see other docs in this class).
* A flow translating incoming [[scala.List]] of [[akka.util.ByteString]] to a map of String keys | ||
* and values using the stream's first element's values as keys. If the header values are shorter | ||
* than the data (or vice-versa) placeholder elements are used to extend the shorter collection to | ||
* the length of the longer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadocs
* than the data (or vice-versa) placeholder elements are used to extend the shorter collection to | ||
* the length of the longer. | ||
* | ||
* @param charset the charset to decode [[akka.util.ByteString]] to [[scala.Predef.String]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadocs
- Fixes javadocs
@seglo I have addressed your last comments, please take a look when you have a chance and than you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR adds a couple of function that basically allows the user to include the empty fields (using some placeholders) in cases where there are more headers than data (or vice-versa) in the csv being transformed to a Map.
Examples:
maps to
The "Missing header" value is the default one, the user has the option to pass a custom value for this placeholder.
maps to
This would be helpful, especially in the second case, when I want to keep the headers even when I don't have values associated with them.