
Commit

Rework
rschu1ze committed Sep 19, 2023
1 parent b583b80 commit 774c4b5
Showing 13 changed files with 444 additions and 657 deletions.
11 changes: 5 additions & 6 deletions docs/en/operations/settings/settings.md
@@ -4067,17 +4067,16 @@ Result:
└─────┴─────┴───────┘
```

-## splitby_max_substring_behavior {#splitby-max-substring-behavior}
+## splitby_max_substrings_includes_remaining_string {#splitby_max_substrings_includes_remaining_string}

-Controls how functions [splitBy*()](../../sql-reference/functions/splitting-merging-functions.md) with given `max_substring` argument behave.
+Controls whether function [splitBy*()](../../sql-reference/functions/splitting-merging-functions.md) with argument `max_substrings` > 0 will include the remaining string in the last element of the result array.

Possible values:

-- `''` - If `max_substring` >=1, return the first `max_substring`-many splits.
-- `'python'` - If `max_substring` >= 0, split `max_substring`-many times, and return `max_substring + 1` elements where the last element contains the remaining string.
-- `'spark'` - If `max_substring` >= 1, split `max_substring`-many times, and return `max_substring + 1` elements where the last element contains the remaining string.
+- `0` - The remaining string will not be included in the last element of the result array.
+- `1` - The remaining string will be included in the last element of the result array. This is the behavior of Spark's [`split()`](https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.split.html) function and Python's ['string.split()'](https://docs.python.org/3/library/stdtypes.html#str.split) method.

-Default value: ``.
+Default value: `0`

## enable_extended_results_for_datetime_functions {#enable-extended-results-for-datetime-functions}

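The semantics documented for the new setting can be sketched in C++, the language of the changed sources. This is a minimal model under stated assumptions, not the code in `FunctionsStringArray.cpp`: `split_by_char` and its `include_remaining` flag are hypothetical names, the flag stands in for `splitby_max_substrings_includes_remaining_string`, and `max_substrings` <= 0 is treated as "no limit", as the function documentation states.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of splitByChar(separator, s, max_substrings).
// `include_remaining` stands in for the setting
// splitby_max_substrings_includes_remaining_string (false = 0, true = 1).
std::vector<std::string> split_by_char(char separator, const std::string & s,
                                       std::int64_t max_substrings, bool include_remaining)
{
    std::vector<std::string> result;
    const bool limited = max_substrings > 0; // <= 0 means "as many substrings as possible"
    std::size_t start = 0;
    while (true)
    {
        // Setting on: the last allowed element takes the unsplit rest of the string.
        if (limited && include_remaining
            && static_cast<std::int64_t>(result.size()) == max_substrings - 1)
        {
            result.push_back(s.substr(start));
            break;
        }
        const std::size_t pos = s.find(separator, start);
        if (pos == std::string::npos)
        {
            result.push_back(s.substr(start)); // final piece after the last separator
            break;
        }
        result.push_back(s.substr(start, pos - start));
        start = pos + 1;
        // Setting off: stop once the limit is reached and drop the rest.
        if (limited && static_cast<std::int64_t>(result.size()) == max_substrings)
            break;
    }
    return result;
}
```

With `'a=b=c=d'` and `max_substrings = 2`, the sketch returns `['a', 'b']` with the flag off and `['a', 'b=c=d']` with it on, matching the examples in the function documentation.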
16 changes: 9 additions & 7 deletions docs/en/sql-reference/functions/splitting-merging-functions.md
@@ -21,7 +21,7 @@ splitByChar(separator, s[, max_substrings]))

- `separator` — The separator which should contain exactly one character. [String](../../sql-reference/data-types/string.md).
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
-- `max_substrings` — An optional `Int64` defaulting to 0. When `max_substrings` > 0, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible.
+- `max_substrings` — An optional `Int64` defaulting to 0. If `max_substrings` > 0, the returned array will contain at most `max_substrings` substrings, otherwise the function will return as many substrings as possible.

**Returned value(s)**

@@ -39,7 +39,9 @@ For example,
- in v22.10: `SELECT splitByChar('=', 'a=b=c=d', 2); -- ['a','b','c=d']`
- in v22.11: `SELECT splitByChar('=', 'a=b=c=d', 2); -- ['a','b']`

-The previous behavior can be restored by setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) = 'python'.
+A behavior similar to ClickHouse pre-v22.11 can be achieved by setting
+[splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string)
+`SELECT splitByChar('=', 'a=b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1 -- ['a', 'b=c=d']`
:::

**Example**
@@ -82,7 +84,7 @@ Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-refere
- There are multiple consecutive non-empty separators;
- The original string `s` is empty while the separator is not empty.

-Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0.
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.

**Example**

@@ -137,7 +139,7 @@ Returns an array of selected substrings. Empty substrings may be selected when:

Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).

-Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0.
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.

**Example**

@@ -188,7 +190,7 @@ Returns an array of selected substrings.

Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).

-Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0.
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.

**Example**

@@ -227,7 +229,7 @@ Returns an array of selected substrings.

Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).

-Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0.
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.

**Example**

@@ -289,7 +291,7 @@ Returns an array of selected substrings.

Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).

-Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0.
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.

**Example**

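The pre-v22.11 result `['a','b','c=d']` and the removed `'python'` mode count *splits* rather than *elements*. The sketch below models that convention for comparison. `split_python_style` is a hypothetical name, and the code assumes Python's `str.split(sep, maxsplit)` semantics with a single-character separator; it is not ClickHouse code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the split-count convention of Python's
// str.split(sep, maxsplit) and of the removed 'python' mode: split at most
// `max_splits` times, so up to max_splits + 1 elements are returned and the
// last one keeps the rest of the string. max_splits < 0 means "no limit".
std::vector<std::string> split_python_style(char separator, const std::string & s,
                                            std::int64_t max_splits)
{
    std::vector<std::string> result;
    std::size_t start = 0;
    std::int64_t splits = 0;
    while (max_splits < 0 || splits < max_splits)
    {
        const std::size_t pos = s.find(separator, start);
        if (pos == std::string::npos)
            break;
        result.push_back(s.substr(start, pos - start));
        start = pos + 1;
        ++splits;
    }
    // Whatever was not split off becomes the final element.
    result.push_back(s.substr(start));
    return result;
}
```

For the same argument `2`, this convention yields three elements, while the new setting (value `1`) caps the result at two elements, as the `limit` parameter of Spark's `split()` does. The two conventions coincide only when `maxsplit` equals `max_substrings - 1`.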
2 changes: 1 addition & 1 deletion src/Core/Settings.h
@@ -502,7 +502,7 @@ class IColumn;
M(Bool, reject_expensive_hyperscan_regexps, true, "Reject patterns which will likely be expensive to evaluate with hyperscan (due to NFA state explosion)", 0) \
M(Bool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.", 0) \
M(Bool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.", 0) \
-M(String, splitby_max_substring_behavior, "", "Control the behavior of the 'max_substring' argument in functions splitBy*(): '' (default), 'python' or 'spark'", 0) \
+M(Bool, splitby_max_substrings_includes_remaining_string, false, "Functions 'splitBy*()' with 'max_substrings' argument > 0 include the remaining string as last element in the result", 0) \
\
M(Bool, allow_execute_multiif_columnar, true, "Allow execute multiIf function columnar", 0) \
M(Bool, formatdatetime_f_prints_single_zero, false, "Formatter '%f' in function 'formatDateTime()' produces a single zero instead of six zeros if the formatted value has no fractional seconds.", 0) \
22 changes: 3 additions & 19 deletions src/Functions/FunctionsStringArray.cpp
@@ -19,7 +19,7 @@ std::optional<Int64> extractMaxSplitsImpl(const ColumnWithTypeAndName & argument
return static_cast<Int64>(value);
}

-std::optional<size_t> extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position, MaxSubstringBehavior max_substring_behavior)
+std::optional<size_t> extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position)
{
if (max_substrings_argument_position >= arguments.size())
return std::nullopt;
@@ -35,24 +35,8 @@ std::optional<size_t> extractMaxSplits(const ColumnsWithTypeAndName & arguments,
arguments[max_substrings_argument_position].column->getName(),
max_substrings_argument_position + 1);

-if (max_splits)
-switch (max_substring_behavior)
-{
-case MaxSubstringBehavior::LikeClickHouse:
-case MaxSubstringBehavior::LikeSpark:
-{
-if (*max_splits <= 0)
-return std::nullopt;
-break;
-}
-case MaxSubstringBehavior::LikePython:
-{
-if (*max_splits < 0)
-return std::nullopt;
-break;
-}
-}
-
+if (*max_splits <= 0)
+return std::nullopt;

return max_splits;
}
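After this commit, the argument handling left in `extractMaxSplits()` reduces to one rule: a non-positive `max_substrings` means "no limit". Previously the `LikePython` branch let `0` through and only rejected negative values. A minimal sketch of the new rule, assuming the argument has already been read as a 64-bit integer; `normalize_max_substrings` is a hypothetical name, not a ClickHouse function.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical model of the rule left in extractMaxSplits() after this commit.
std::optional<std::size_t> normalize_max_substrings(std::int64_t max_substrings)
{
    // 0 or negative: no limit, return as many substrings as possible.
    if (max_substrings <= 0)
        return std::nullopt;
    return static_cast<std::size_t>(max_substrings);
}
```

Returning `std::nullopt` keeps "unlimited" distinct from any concrete count, so callers can branch on `has_value()` instead of relying on a sentinel value.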