CSV: Use placeholders when header and data length differs #2555

yuferpegom · 2021-01-14T19:14:33Z

This PR adds a couple of functions that let the user keep empty fields (filled with placeholders) when the CSV being transformed to a Map has more headers than data, or vice versa.

Examples:

When there are more data columns than headers

eins,zwei
11,12,13

maps to

Map("eins" -> "11", "zwei" -> "12", "Missing header" -> "13")

"Missing header" is the default value; the user can pass a custom value for this placeholder.

When there are more headers than data

eins,zwei,drei,vier,fünt
11,12,13

maps to

Map("eins" -> "11", "zwei" -> "12", "drei" -> "13", "vier" -> "", "fünt" -> "")

This would be helpful, especially in the second case, when I want to keep the headers even when I don't have values associated with them.
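The two cases above can be sketched with plain Scala collections. This is a minimal illustration, not the PR's actual code: `PlaceholderSketch`, `combine`, and its parameter names are hypothetical, and the use of `zipAll` is an assumption suggested only by the branch name. The result is kept as a list of pairs so that every padded entry stays visible.

```scala
object PlaceholderSketch {
  // Hypothetical helper: pairs the header line with one data row and pads
  // whichever side is shorter with a placeholder.
  def combine(
      headers: List[String],
      row: List[String],
      headerPlaceholder: String = "Missing header",
      valuePlaceholder: String = ""
  ): List[(String, String)] =
    // zipAll uses headerPlaceholder when the headers run out
    // and valuePlaceholder when the row runs out.
    headers.zipAll(row, headerPlaceholder, valuePlaceholder)
}
```

Calling `PlaceholderSketch.combine(List("eins", "zwei"), List("11", "12", "13"))` pairs the third value with `"Missing header"`, and a row shorter than the header line gets `""` for each trailing header.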

… headers using place holders when the length of each other differs - Adds test - Fixes the process function - Undoes some changes that shouldn't have been done - Renames combiner function - Fixes javadoc Fixes java doc

ennru · 2021-01-15T13:58:02Z

Thank you for this suggestion.

What should happen if there is more than one extra data column?

yuferpegom · 2021-01-20T15:52:48Z

It should just add another header. If the user has set a custom placeholder, that one is used; otherwise the default one is used:

Default placeholder:

eins,zwei
11,12,13,14

maps to

Map("eins" -> "11", "zwei" -> "12", "Missing header" -> "13", "Missing header" -> "14")

Custom placeholder

eins,zwei
11,12,13,14

maps to

Map("eins" -> "11", "zwei" -> "12", "custom" -> "13", "custom" -> "14")

I think this makes it easier to understand what happened to the data (which is more helpful from the developer's point of view).

I also think the more valuable use case for this change is when there are more headers than data, since the user may want to keep the data even if they forgot to add a couple of commas in the input CSV.

ennru · 2021-01-20T16:16:33Z

The Map won't be able to hold multiple values with the same key.
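A short sketch of the problem, using only the Scala standard library (the object and value names are illustrative): when the same placeholder key appears twice, converting the pairs to a `Map` keeps a single entry for it, so one of the extra data columns is silently lost.

```scala
object DuplicateKeySketch {
  // The pairs produced for "eins,zwei" / "11,12,13,14" with a fixed placeholder.
  val pairs: List[(String, String)] =
    List("eins" -> "11", "zwei" -> "12", "Missing header" -> "13", "Missing header" -> "14")

  // toMap keeps only the last value seen for a repeated key.
  val asMap: Map[String, String] = pairs.toMap
}
```

Here `DuplicateKeySketch.asMap` has three entries rather than four, and the value `"13"` is no longer reachable.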

yuferpegom · 2021-01-21T20:41:11Z

You're right. So, I think this can be worked around by appending some character, like a number, to the placeholder. Something like:

Default placeholder:

eins,zwei
11,12,13,14

maps to

Map("eins" -> "11", "zwei" -> "12", "Missing header" -> "13", "Missing header_1" -> "14")

What do you think? Any idea is also welcome, thanks!

seglo · 2021-01-26T21:42:37Z

You're right. So, I think that it can be bypassed by adding some character to the placeholder, like a number.

Sounds reasonable to me. A 0-based index appended to the default missing key. Using your example:

CsvToMap.toMapCombineAll(
  headerDefault = "MissingHeader"
)

would return

Map("eins" -> "11", "zwei" -> "12", "MissingHeader0" -> "13", "MissingHeader1" -> "14")

I would also suggest supporting a default value for missing values.

CsvToMap.toMapCombineAll(
  valueDefault = "(missing)"
)

would return

Map("eins" -> "11", "zwei" -> "12", "drei" -> "13", "vier" -> "(missing)", "fünt" -> "(missing)")

You'll need to support a javadsl as well.
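Both suggestions in this comment can be sketched together on plain Scala collections. This is a hedged illustration only: `IndexedPlaceholderSketch` and `combineAll` are hypothetical, and the parameter names merely mirror the `headerDefault` and `valueDefault` used above rather than a confirmed API.

```scala
object IndexedPlaceholderSketch {
  // Hypothetical helper illustrating the suggestion, not the merged implementation.
  def combineAll(
      headers: List[String],
      row: List[String],
      headerDefault: String = "MissingHeader",
      valueDefault: String = ""
  ): Map[String, String] = {
    // One unique key per data column that has no header:
    // MissingHeader0, MissingHeader1, ...
    val extraHeaders =
      List.tabulate(math.max(0, row.length - headers.length))(i => s"$headerDefault$i")
    // Pad a short row with valueDefault; a longer row is left unchanged.
    val paddedRow = row.padTo(headers.length, valueDefault)
    (headers ++ extraHeaders).zip(paddedRow).toMap
  }
}
```

Because every generated key carries its own index, the extra columns no longer collide in the `Map`. Real headers that repeat in the input, or that happen to equal a generated key, would still collide; this sketch does not address that.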

seglo · 2021-03-10T21:32:43Z

@yuferpegom If you can follow up on this PR soon, it can be included in the Alpakka 3.0.0-M1 release.

yuferpegom · 2021-03-10T21:45:03Z

Oh I will, thanks

lightbend-cla-validator · 2021-03-11T16:26:20Z

Hi @yupegom,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

https://www.lightbend.com/contribute/cla

seglo

I think this is almost there. Just a few small things.

seglo · 2021-04-28T17:46:10Z

csv/src/test/scala/docs/scaladsl/CsvToMapSpec.scala

+
+      // #header-line
+      val future =
+        // format: off


I don't see a reason why exceptions need to be made for all this formatting. Can you elaborate?

yuferpegom · 2021-11-22

This keeps the special indentation that lays out the columns and rows being passed as params.

BTW, I'm just following what was already done before in the spec.

seglo · 2021-04-28T17:51:23Z

csv/src/main/java/akka/stream/alpakka/csv/javadsl/CsvToMap.java

+   * A flow translating incoming [[scala.List]] of [[akka.util.ByteString]] to a map of String and
+   * ByteString using the stream's first element's values as keys. If the header values are shorter
+   * than the data (or vice-versa) placeholder elements are used to extend the shorter collection to
+   * the length of the longer.


Maybe a copy/paste error? The types don't match. For this API they should use Java types and Javadoc conventions to link to those types (see other docs in this class).

seglo · 2021-04-28T17:51:38Z

csv/src/main/java/akka/stream/alpakka/csv/javadsl/CsvToMap.java

+   * A flow translating incoming [[scala.List]] of [[akka.util.ByteString]] to a map of String keys
+   * and values using the stream's first element's values as keys. If the header values are shorter
+   * than the data (or vice-versa) placeholder elements are used to extend the shorter collection to
+   * the length of the longer.


seglo · 2021-04-28T17:51:56Z

csv/src/main/java/akka/stream/alpakka/csv/javadsl/CsvToMap.java

+   * than the data (or vice-versa) placeholder elements are used to extend the shorter collection to
+   * the length of the longer.
+   *
+   * @param charset the charset to decode [[akka.util.ByteString]] to [[scala.Predef.String]],


yuferpegom · 2021-11-22T20:51:15Z

@seglo I have addressed your last comments. Please take a look when you have a chance, and thank you!

ennru

LGTM.

probot-autolabeler bot added the p:csv label Jan 14, 2021

yuferpegom changed the title ~~CSV: Use placeholders when header and data length is differet~~ CSV: Use placeholders when header and data length differs Jan 14, 2021

markarasev pushed a commit to markarasev/alpakka that referenced this pull request Jan 17, 2021

BigQuery: fix compilation

976ab98

Fixes akka#2555

yuferpegom added 3 commits March 11, 2021 13:50

- Adds index to the headers when there are more data than headers in…

e68b43d

… the input - Adds support to javadsl

- Allows the user to add a default value when there are more data th…

5bea49e

…an headers

- Renames variable

dbd9495

- Updates javadoc

yuferpegom force-pushed the zip-all-when-header-data-length-differs branch from 02abc24 to dbd9495 Compare March 11, 2021 18:52

seglo reviewed Apr 28, 2021

seglo added this to the 3.0.1 milestone May 12, 2021

ennru removed this from the 3.0.1 milestone May 29, 2021

- Fixes some identation issues

f3bb22e

- Fixes javadocs

ennru approved these changes Nov 23, 2021

ennru merged commit 2147c88 into akka:master Nov 23, 2021

ennru added this to the 3.0.4 milestone Nov 23, 2021

yuferpegom deleted the zip-all-when-header-data-length-differs branch April 19, 2022 17:51
