Add support for joining array contents into a string #5028

Closed
rmoff opened this issue Apr 8, 2020 · 6 comments
Labels
enhancement user-defined-functions Tickets about UDF, UDAF, UDTF

Comments

@rmoff
Contributor

rmoff commented Apr 8, 2020

I'd like to see a function that will take an array and just flatten it.

e.g. ['a','b','c'] becomes a,b,c

My use case is simplifying JDBC sink which doesn't support writing arrays to target DBs (even if they support it e.g. postgres)

@MichaelDrogalis MichaelDrogalis added the user-defined-functions Tickets about UDF, UDAF, UDTF label Apr 8, 2020
@hpgrahsl
Contributor

@rmoff based on your simple, single example above I would assume you expect the following for such a UDF - let's call it ARRAY_JOINER for now:

  1. provides overloaded variants for all the primitive data types, i.e. BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR
  2. always returns a single value of type VARCHAR composed of the string representation of each value contained in the array
  3. offers a function parameter that lets you specify a custom separator character, e.g. , or |
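
As a rough illustration of the three points above, here is a minimal plain-Java sketch. It is not the ksqlDB implementation: the class name `ArrayJoinSketch`, the method name `join`, the default comma separator, and the null handling are all assumptions made for this example. A single generic method stands in for the per-type overloads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch only: names and null handling are assumptions,
// not the ksqlDB implementation.
final class ArrayJoinSketch {

    // Assumed default separator, matching the a,b,c example in this issue.
    static final String DEFAULT_SEPARATOR = ",";

    // Joins with the assumed default separator.
    static String join(List<?> elements) {
        return join(elements, DEFAULT_SEPARATOR);
    }

    // Joins the string form of every element using the given separator.
    static String join(List<?> elements, String separator) {
        if (elements == null) {
            return null; // assumption: a null array yields null
        }
        return elements.stream()
            // assumption: a null element is rendered as the text "null"
            .map(element -> String.valueOf(element))
            .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c")));       // a,b,c
        System.out.println(join(Arrays.asList(1, 2, 3), "|"));        // 1|2|3
        System.out.println(join(Arrays.asList(true, 2.5, 7L), ", ")); // true, 2.5, 7
    }
}
```

Because the sketch accepts `List<?>`, it sidesteps the question of maps, structs, and nested arrays by falling back to whatever `toString()` produces for them, which may or may not be the desired output.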

If so, I assume this to be rather straightforward. Now the fun part:

  • What output would you expect for array elements of non-primitive types i.e. maps and structs?
  • Should this function be applicable to nested data structures, e.g. arrays of arrays of...? Likewise, what about deeply nested structs?

Your input as reporter is highly appreciated here @rmoff ;-)

@rmoff
Contributor Author

rmoff commented May 11, 2020

TBH my motivation for this was just to get "something" that worked based on the original use case.

It'd probably be a good idea to look at existing solutions for this elsewhere (e.g. postgres) to understand common patterns.

On your three assumptions, I would agree with those. On the two fun questions, I have no opinion/answer :)

/cc @blueedgenick @derekjn who may have some good references / pointers on the direction for this to take

@derekjn
Contributor

derekjn commented May 11, 2020

@hpgrahsl these are excellent questions. I see two approaches to such an array join builtin:

  1. We make it strict, requiring all input expressions to be strings. Users would be required to explicitly cast/stringify each array element.
  2. We make it non-strict, in which case the equivalent of CAST(element AS STRING) would be implicitly applied to each array element.

I believe both of these approaches should work on both primitive and structured types.
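
To make the difference between the two approaches concrete, here is a small plain-Java sketch. The class and method names (`StrictVsNonStrictJoin`, `joinStrict`, `joinNonStrict`) are hypothetical, and `String.valueOf` merely stands in for the implicit CAST(element AS STRING); none of this is taken from the ksqlDB code base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch contrasting the two proposals; not ksqlDB code.
final class StrictVsNonStrictJoin {

    // Strict: only strings are accepted, so callers must stringify first.
    static String joinStrict(List<String> elements, String delimiter) {
        return String.join(delimiter, elements);
    }

    // Non-strict: every element is converted implicitly before joining.
    // String.valueOf is an assumed stand-in for CAST(element AS STRING).
    static String joinNonStrict(List<?> elements, String delimiter) {
        return elements.stream()
            .map(element -> String.valueOf(element))
            .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3);

        // Strict usage needs an explicit conversion step per element.
        List<String> asStrings = ids.stream()
            .map(id -> Integer.toString(id))
            .collect(Collectors.toList());
        System.out.println(joinStrict(asStrings, "-"));  // 1-2-3

        // Non-strict usage takes the array as it is.
        System.out.println(joinNonStrict(ids, "-"));     // 1-2-3
    }
}
```

The strict variant keeps the function's contract narrow at the cost of an extra conversion step for every non-string array; the non-strict variant moves that step inside the function.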

@hpgrahsl
Contributor

hpgrahsl commented May 16, 2020

@derekjn thx for sharing your thoughts.

for what you call "strict": if I understood you correctly it would mean that users have to find a way to cast each of the elements first which is probably not so nice since you can have any number of elements inside the array. what would work is to provide an array helper function that does the CAST/STRINGIFY on all elements so that its output can be passed into the JOINER function. but then again it is an additional step for users so the "non-strict" approach where this happens implicitly is a much better choice IMHO. WDYT?

also if that's fine for you I would start working on this UDF to JOIN array elements to a string representation.

@derekjn
Contributor

derekjn commented May 18, 2020

> for what you call "strict": if I understood you correctly it would mean that users have to find a way to cast each of the elements first which is probably not so nice since you can have any number of elements inside the array

That understanding is correct. There are also array literals to consider:

SELECT ARRAY_JOIN('<delimiter>', ARRAY['one', 'two', 'three']) ...

But I agree that the non-strict approach would be a preferable UX here.

stevenpyzhang added a commit that referenced this issue Jun 17, 2020
Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>
JimGalasyn added a commit that referenced this issue Jun 25, 2020
* feat: implements ARRAY_JOIN as requested in (#5028) (#5474) (#5638)

Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>

* feat: new split_to_map udf (#5563)

New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string.

Useful for taking messages from upstream systems and converting them into a more structured and usable format.

* feat: add CHR UDF (#5559)

A new UDF, CHR, to turn a number representing a unicode codepoint into a single-character string. Very useful for dealing with non-printable characters (tab, CR, LF, ...) in strings or those characters not easily represented in your local codepage.

Co-authored-by: Steven Zhang <35498506+stevenpyzhang@users.noreply.github.com>
Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>
Co-authored-by: Nick Dearden <blueedgenick@users.noreply.github.com>
@vcrfxia
Contributor

vcrfxia commented May 27, 2021

This functionality is now supported by the ARRAY_JOIN function, added by @hpgrahsl. Closing this ticket as done. Thanks!

@vcrfxia vcrfxia closed this as completed May 27, 2021
5 participants