feat: new split_to_map udf #5563

Merged: 5 commits, merged on Jun 11, 2020

blueedgenick (Contributor)

Description

New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string.

Useful for taking messages from upstream systems and converting them into a more structured and usable format.
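
To make the described semantics concrete, here is a minimal Java sketch of what split_to_map is described as doing. It is not the code merged in this PR: the class `SplitToMapSketch` is hypothetical, and it assumes that delimiters are matched literally rather than as regular expressions, that each entry is split only on the first `kvDelimiter`, that entries with no `kvDelimiter` are skipped, and that empty delimiters return NULL. The NULL handling for the input and delimiters and the "latest value wins" rule come from the UDF description under review.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of split_to_map semantics; not the merged ksqlDB source.
final class SplitToMapSketch {

  static Map<String, String> splitToMap(
      final String input, final String entryDelimiter, final String kvDelimiter) {
    // NULL input or NULL delimiter -> NULL result, per the UDF description.
    if (input == null || entryDelimiter == null || kvDelimiter == null) {
      return null;
    }
    // Assumption: empty delimiters are treated as invalid.
    if (entryDelimiter.isEmpty() || kvDelimiter.isEmpty()) {
      return null;
    }
    final Map<String, String> result = new HashMap<>();
    // Pattern.quote makes the delimiters literal, so "|" or "." are not regex syntax.
    for (final String entry : input.split(Pattern.quote(entryDelimiter))) {
      // Limit 2: split on the first kvDelimiter only; later ones stay in the value.
      final String[] kv = entry.split(Pattern.quote(kvDelimiter), 2);
      if (kv.length == 2) {
        // put() overwrites, so the latest occurrence of a key wins.
        result.put(kv[0], kv[1]);
      }
      // Assumption: entries without a kvDelimiter are skipped.
    }
    return result;
  }

  public static void main(final String[] args) {
    System.out.println(splitToMap("apple:=green/cherry:=red/apple:=red", "/", ":="));
  }
}
```

Because the sketch uses a plain `HashMap`, the printed entry order is unspecified; only the contents (`apple` mapped to `red`, `cherry` mapped to `red`) are meaningful.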

Testing done

New Unit & QTT tests.

Reviewer checklist

  • Ensure docs are updated if necessary (e.g., if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

blueedgenick requested review from JimGalasyn and a team as code owners on June 6, 2020 22:11

agavra (Contributor) left a comment

just two quick comments, i'll take a full look later

```java
+ "'kvDelimiter'. If the same key is present multiple times in the input, the latest "
+ "value for that key is returned. Returns NULL if the input text or either of the "
+ "delimiters is NULL.")
public class SplitToMap {
```
Contributor

We may also want to have EncodeMap(map, entry_delim, kv_delim) which encodes a map into a string
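
An `EncodeMap` function would be the inverse of split_to_map. A minimal sketch under stated assumptions: `EncodeMapSketch.encodeMap` is a hypothetical name (nothing like it is added by this PR), entries with a NULL key or value are skipped, and no escaping is done, so a key or value containing either delimiter will not survive a round trip through split_to_map.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch of the EncodeMap idea from the review; not part of PR #5563.
final class EncodeMapSketch {

  static String encodeMap(
      final Map<String, String> map, final String entryDelimiter, final String kvDelimiter) {
    // Mirror split_to_map: NULL in, NULL out.
    if (map == null || entryDelimiter == null || kvDelimiter == null) {
      return null;
    }
    final StringJoiner joiner = new StringJoiner(entryDelimiter);
    for (final Map.Entry<String, String> e : map.entrySet()) {
      // Assumption: entries with a NULL key or value are skipped.
      if (e.getKey() != null && e.getValue() != null) {
        // No escaping: delimiters inside keys/values will break a round trip.
        joiner.add(e.getKey() + kvDelimiter + e.getValue());
      }
    }
    return joiner.toString();
  }

  public static void main(final String[] args) {
    final Map<String, String> m = new TreeMap<>();
    m.put("apple", "green");
    m.put("cherry", "red");
    // TreeMap iterates in key order, so this prints apple:=green/cherry:=red
    System.out.println(encodeMap(m, "/", ":="));
  }
}
```

The output order follows the input map's iteration order; a sorted map such as `TreeMap` gives a deterministic string.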

blueedgenick (Contributor, Author)

I'll add it to my UDF backlog ;) Although I'm less concerned about that from a use-case perspective: you can always construct almost-arbitrary JSON output using structs/maps/arrays if a downstream system needs it. The primary motivator for this one is when you get, for example, an encoded message from a mainframe MQ system that needs to be parsed out this way.

Splits a string into key-value pairs and creates a map from them. The
'entryDelimiter' splits the string into key-value pairs which are then split by 'kvDelimiter'. If the same key is present multiple times in the input, the latest value for that key is returned.

JimGalasyn (Member)

Suggested change
```diff
- 'entryDelimiter' splits the string into key-value pairs which are then split by 'kvDelimiter'. If the same key is present multiple times in the input, the latest value for that key is returned.
+ `entryDelimiter` splits the string into key-value pairs which are then split by `kvDelimiter`. If the same key is present multiple times in the input, the latest value for that key is returned.
```

Returns NULL f the input text is NULL.

JimGalasyn (Member)

Suggested change
```diff
- Returns NULL f the input text is NULL.
+ Returns NULL if the input text is NULL.
```

blueedgenick (Contributor, Author)

arghhh! thanks Jim :)

JimGalasyn (Member) left a comment

LGTM, with a couple of suggestions.

agavra (Contributor) left a comment

LGTM! Thanks @blueedgenick

```java
import java.util.Map;
import org.junit.Test;

public class SplitToMapTest {
```

Contributor

can you add a test with whitespace? what do we want the behavior to be when there is whitespace (e.g. foo := bar)

blueedgenick (Contributor, Author)

good idea, test added!
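
The whitespace question comes down to whether keys and values are trimmed after splitting. The behaviour the PR finally settled on is not visible in this thread, so the sketch below only contrasts the two options for `foo := bar`; `WhitespaceSketch.splitEntry` and its `trim` flag are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration of the whitespace question; not the test added in the PR.
final class WhitespaceSketch {

  // Splits one entry on the first kvDelimiter (matched literally); optionally trims both parts.
  static String[] splitEntry(final String entry, final String kvDelimiter, final boolean trim) {
    final String[] kv = entry.split(Pattern.quote(kvDelimiter), 2);
    if (trim) {
      for (int i = 0; i < kv.length; i++) {
        kv[i] = kv[i].trim();
      }
    }
    return kv;
  }

  public static void main(final String[] args) {
    // A literal split keeps the surrounding spaces: [foo ,  bar]
    System.out.println(Arrays.toString(splitEntry("foo := bar", ":=", false)));
    // Trimming removes them: [foo, bar]
    System.out.println(Arrays.toString(splitEntry("foo := bar", ":=", true)));
  }
}
```

Retaining whitespace is the more conservative choice, since callers can still trim afterwards, while trimmed-away characters cannot be recovered.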

blueedgenick requested a review from agavra on June 11, 2020 16:59
blueedgenick merged commit 7e9e4d1 into confluentinc:master on Jun 11, 2020
agavra pushed a commit that referenced this pull request on Jun 11, 2020
blueedgenick deleted the split_to_map_udf branch on June 11, 2020 22:46
JimGalasyn added a commit that referenced this pull request on Jun 25, 2020
* feat: implements ARRAY_JOIN as requested in (#5028) (#5474) (#5638)

Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>

* feat: new split_to_map udf (#5563)

New UDF split_to_map(input, entryDelimiter, kvDelimiter) to build a map from a string.

Useful for taking messages from upstream systems and converting them into a more structured and usable format.

* feat: add CHR UDF (#5559)

A new UDF, CHR, to turn a number representing a unicode codepoint into a single-character string. Very useful for dealing with non-printable characters (tab, CR, LF, ...) in strings or those characters not easily represented in your local codepage.

Co-authored-by: Steven Zhang <35498506+stevenpyzhang@users.noreply.github.com>
Co-authored-by: Hans-Peter Grahsl <hpgrahsl@users.noreply.github.com>
Co-authored-by: Nick Dearden <blueedgenick@users.noreply.github.com>
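
For the CHR UDF mentioned in this squashed commit, the core conversion maps onto `Character.toChars` in Java. A minimal sketch, assuming that NULL or invalid code points return NULL instead of raising an error; `ChrSketch.chr` is a hypothetical name, not the ksqlDB implementation.

```java
// Hypothetical sketch of the CHR idea described in the commit message; not the ksqlDB source.
final class ChrSketch {

  static String chr(final Integer codePoint) {
    // Assumption: NULL or out-of-range code points yield null rather than an error.
    if (codePoint == null || !Character.isValidCodePoint(codePoint)) {
      return null;
    }
    // toChars returns one char for BMP code points and a surrogate pair otherwise.
    return new String(Character.toChars(codePoint));
  }

  public static void main(final String[] args) {
    // 9 is TAB, one of the non-printable characters the commit message mentions.
    System.out.println("a" + chr(9) + "b");
    // A supplementary code point becomes a surrogate pair (two Java chars).
    System.out.println(chr(0x1F600).length());
  }
}
```

Code points above U+FFFF therefore produce a Java string of length 2, even though it holds a single Unicode character.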