#31363 format template configure in settings #59088

Merged

Changes from 15 commits

27 commits
f174921
added format_schema_rows_template setting
Blargian Jan 18, 2024
eae39ff
#31363 - modified TemplateBlockOutputFormat to work with added format…
Blargian Jan 21, 2024
7b235fe
#31363 - remove schema delimiter setting and add test 00937_format_sc…
Blargian Jan 22, 2024
3832a82
#31363 - update documentation for En and Ru
Blargian Jan 22, 2024
276ccd3
empty commit to restart CI checks
Blargian Jan 23, 2024
e988f8a
fix typo in formats.md
Blargian Jan 24, 2024
6a9e7ab
Update 00937_format_schema_rows_template.sh
Blargian Jan 24, 2024
3889857
Merge branch 'master' into #31363_format_template_configure_in_settings
Felixoid Jan 24, 2024
ad196dd
Update 00937_format_schema_rows_template.sh
Blargian Jan 24, 2024
288d288
fix failing 00937_template_output_format
Blargian Jan 25, 2024
a74c78c
fix failing test 00937_format_schema_rows_template.sh
Blargian Jan 25, 2024
e6844a5
Merge branch 'ClickHouse:master' into #31363_format_template_configur…
Blargian Jan 25, 2024
64d18ad
CI trigger
Blargian Jan 25, 2024
4a8a720
rename of settings, add setting for resultset, extend test, fix docum…
Blargian Jan 29, 2024
e081a4f
Merge branch 'master' into #31363_format_template_configure_in_settings
Blargian Jan 29, 2024
8183074
Update src/Core/SettingsChangesHistory.h
Blargian Jan 29, 2024
c891ed0
update test to use CLICKHOUSE_TEST_UNIQUE_NAME so parallel tests don'…
Blargian Jan 30, 2024
a4ea7c4
2 tests fail, doesnt seem change related - try again
Blargian Jan 31, 2024
58ae4cf
Merge branch 'master' into #31363_format_template_configure_in_settings
Blargian Feb 1, 2024
e1121ea
Merge remote-tracking branch 'upstream/master' into #31363_format_tem…
Blargian Feb 1, 2024
790a2f3
trigger CI again
Blargian Feb 4, 2024
3d161ca
fix failing upgrade check due to identical previous_value and new_value
Blargian Feb 4, 2024
d7d299b
change back SettingsChangesHistory to empty strings - not the reason …
Blargian Feb 4, 2024
8e9344e
Merge branch 'master' into #31363_format_template_configure_in_settings
Blargian Feb 4, 2024
03460ff
Update SettingsChangesHistory.h
Blargian Feb 4, 2024
913c23f
Merge branch 'ClickHouse:master' into #31363_format_template_configur…
Blargian Feb 5, 2024
42628fd
Merge branch 'master' into #31363_format_template_configure_in_settings
Avogar Feb 13, 2024

6 changes: 4 additions & 2 deletions docs/en/interfaces/formats.md
@@ -253,7 +253,7 @@ This format is also available under the name `TSVRawWithNamesAndNames`.

This format allows specifying a custom format string with placeholders for values with a specified escaping rule.

It uses settings `format_template_resultset`, `format_template_row`, `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further)
It uses settings `format_template_resultset`, `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further)

Setting `format_template_row` specifies the path to the file containing format strings for rows with the following syntax:

@@ -279,9 +279,11 @@ the values of `SearchPhrase`, `c` and `price` columns, which are escaped as `Quo

`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`

In cases where it is challenging or not possible to deploy the format output configuration for the Template format to a directory on all nodes in a cluster, or if the format is trivial, `format_template_row_format` can be used to set the template string directly in the query, rather than a path to the file which contains it.

The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default).

Setting `format_template_resultset` specifies the path to the file, which contains a format string for resultset. Format string for resultset has the same syntax as a format string for row and allows to specify a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:
Setting `format_template_resultset` specifies the path to the file which contains a format string for the result set. Setting `format_template_resultset_format` can be used to set the template string for the result set directly in the query itself. The format string for the result set has the same syntax as a format string for rows and allows specifying a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:

- `data` is the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS)
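The row and result-set templates described above can be illustrated with a small standalone sketch. It is not ClickHouse's implementation (parsing is done by `ParsedTemplateFormatString` and output by `TemplateBlockOutputFormat`); it assumes values are already serialized to strings, supports only `${name}`, `${name:Raw}` and `${name:Quoted}` placeholders, uses a simplified `Quoted` rule (single quotes with `'` and `\` backslash-escaped, applied even to numbers), does not handle escaped `$` characters in delimiters, and accepts only `${data}` in the result-set template. The names `Row`, `quote`, `substitute` and `render` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified model: a row maps column names to already-serialized values.
using Row = std::map<std::string, std::string>;

// Simplified "Quoted" rule: wrap in single quotes, backslash-escape ' and \.
std::string quote(const std::string & value)
{
    std::string out = "'";
    for (char c : value)
    {
        if (c == '\'' || c == '\\')
            out += '\\';
        out += c;
    }
    out += '\'';
    return out;
}

// Replace each ${name} / ${name:Rule} placeholder in `tmpl` with lookup(name),
// escaped by Rule ("Raw" when omitted). Text between placeholders is copied as-is.
template <typename Lookup>
std::string substitute(const std::string & tmpl, Lookup lookup)
{
    std::string out;
    std::size_t pos = 0;
    while (true)
    {
        std::size_t start = tmpl.find("${", pos);
        if (start == std::string::npos)
            break;
        std::size_t end = tmpl.find('}', start);
        if (end == std::string::npos)
            throw std::invalid_argument("unterminated placeholder");
        out.append(tmpl, pos, start - pos); // literal delimiter before the placeholder
        std::string spec = tmpl.substr(start + 2, end - start - 2);
        std::size_t colon = spec.find(':');
        std::string name = spec.substr(0, colon);
        std::string rule = colon == std::string::npos ? "Raw" : spec.substr(colon + 1);
        std::string value = lookup(name);
        if (rule == "Quoted")
            out += quote(value);
        else if (rule == "Raw")
            out += value;
        else
            throw std::invalid_argument("unsupported escaping rule in this sketch: " + rule);
        pos = end + 1;
    }
    out.append(tmpl, pos, std::string::npos); // trailing literal text
    return out;
}

// Render every row with `row_tmpl`, put `between` only between rows (not after
// the last one), then substitute the joined rows into ${data} of `resultset`.
std::string render(const std::vector<Row> & rows, const std::string & row_tmpl,
                   const std::string & between, const std::string & resultset = "${data}")
{
    std::string data;
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        if (i > 0)
            data += between;
        // Row::at throws std::out_of_range for a column the row does not have.
        data += substitute(row_tmpl, [&](const std::string & col) { return rows[i].at(col); });
    }
    return substitute(resultset, [&](const std::string & part) -> std::string
    {
        if (part != "data")
            throw std::invalid_argument("only ${data} is supported in this sketch");
        return data;
    });
}
```

As in the setting description, the between-rows delimiter is written only between rows, never after the last one, and the default result-set template `${data}` simply emits the joined rows.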
8 changes: 8 additions & 0 deletions docs/en/operations/settings/settings-formats.md
@@ -1660,6 +1660,10 @@ Result:

Path to file which contains format string for result set (for Template format).

### format_template_resultset_format {#format_template_resultset_format}

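Format string for result set (for Template format). Can be used instead of `format_template_resultset` to set the template directly in the query; setting both is an error.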

### format_template_row {#format_template_row}

Path to file which contains format string for rows (for Template format).
@@ -1668,6 +1672,10 @@ Path to file which contains format string for rows (for Template format).

Delimiter between rows (for Template format).

### format_template_row_format {#format_template_row_format}

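Format string for rows (for Template format). Can be used instead of `format_template_row` to set the template directly in the query; setting both is an error.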

## CustomSeparated format settings {#custom-separated-format-settings}

### format_custom_escaping_rule {#format_custom_escaping_rule}
6 changes: 4 additions & 2 deletions docs/ru/interfaces/formats.md
@@ -201,7 +201,7 @@ SELECT * FROM nestedt FORMAT TSV

This format allows specifying a custom format string into which values, serialized in the selected way, are substituted.

For this, the settings `format_template_resultset`, `format_template_row`, `format_template_rows_between_delimiter` and the escaping settings of other formats are used (for example, `output_format_json_quote_64bit_integers` when escaping as in `JSON`, see below)
For this, the settings `format_template_resultset`, `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and the escaping settings of other formats are used (for example, `output_format_json_quote_64bit_integers` when escaping as in `JSON`, see below)

The `format_template_row` setting specifies the path to the file containing the format string for table rows, which must have the following form:

@@ -227,9 +227,11 @@ SELECT * FROM nestedt FORMAT TSV

`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`

In cases where it is inconvenient or impossible to specify the format string in a file, `format_template_row_format` can be used to specify the format string directly in the query.

The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected on input) after every row except the last one. Default: `\n`.

The `format_template_resultset` setting specifies the path to the file containing the format string for the result. The format string for the result has the same syntax as the format string for table rows and allows specifying a prefix, a suffix and a way to print additional information. Instead of column names, it contains the following placeholder names:
The `format_template_resultset` setting specifies the path to the file containing the format string for the result. The `format_template_resultset_format` setting is used to set the format string for the result directly in the query. The format string for the result has the same syntax as the format string for table rows and allows specifying a prefix, a suffix and a way to print additional information. Instead of column names, it contains the following placeholder names:

- `data` - the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
- `totals` - the row with total values in `format_template_row` format (when using WITH TOTALS)
2 changes: 2 additions & 0 deletions src/Core/Settings.h
@@ -1085,6 +1085,8 @@ class IColumn;
M(String, format_schema, "", "Schema identifier (used by schema-based formats)", 0) \
M(String, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)", 0) \
M(String, format_template_row, "", "Path to file which contains format string for rows (for Template format)", 0) \
M(String, format_template_row_format, "", "Format string for rows (for Template format)", 0) \
M(String, format_template_resultset_format, "", "Format string for result set (for Template format)", 0) \
M(String, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)", 0) \
\
M(EscapingRule, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)", 0) \
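Each entry above has the shape `M(type, name, default, description, flags)`: the settings list is written once and expanded several times with different definitions of `M` (the "X-macro" pattern). The following is a minimal, hypothetical sketch of that pattern restricted to three Template settings; `TEMPLATE_SETTINGS`, `MiniSettings`, `DECLARE_FIELD`, `DECLARE_NAME` and `miniSettingNames` are illustrative names, not ClickHouse's actual macros.

```cpp
#include <cassert>
#include <string>
#include <vector>

using String = std::string;

// One list of settings; arguments are: type, name, default, description, flags.
#define TEMPLATE_SETTINGS(M) \
    M(String, format_template_row, "", "Path to file which contains format string for rows (for Template format)", 0) \
    M(String, format_template_row_format, "", "Format string for rows (for Template format)", 0) \
    M(String, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)", 0)

struct MiniSettings
{
    // Expansion 1: one member per setting, initialized to its default value.
#define DECLARE_FIELD(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) TYPE NAME = DEFAULT;
    TEMPLATE_SETTINGS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Expansion 2: the setting names as strings, e.g. for validation or documentation.
inline std::vector<std::string> miniSettingNames()
{
#define DECLARE_NAME(TYPE, NAME, DEFAULT, DESCRIPTION, FLAGS) #NAME,
    return {TEMPLATE_SETTINGS(DECLARE_NAME)};
#undef DECLARE_NAME
}
```

Adding a setting to the list, as this diff does for `format_template_row_format` and `format_template_resultset_format`, updates every expansion at once.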
2 changes: 2 additions & 0 deletions src/Core/SettingsChangesHistory.h
@@ -102,6 +102,8 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"function_visible_width_behavior", 0, 1, "We changed the default behavior of `visibleWidth` to be more precise"},
{"max_estimated_execution_time", 0, 0, "Separate max_execution_time and max_estimated_execution_time"},
{"iceberg_engine_ignore_schema_evolution", false, false, "Allow to ignore schema evolution in Iceberg table engine"},
{"format_template_row_format", "none", "", "Template row format string can be set directly in query"},
{"format_template_resultset_format", "none", "", "Template result set format string can be set in query"},
{"optimize_injective_functions_in_group_by", false, true, "Replace injective functions by it's arguments in GROUP BY section in analyzer"},
{"update_insert_deduplication_token_in_dependent_materialized_views", false, false, "Allow to update insert deduplication token with table identifier during insert in dependent materialized views"}}},
{"23.12", {{"allow_suspicious_ttl_expressions", true, false, "It is a new setting, and in previous versions the behavior was equivalent to allowing."},
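Each history entry above records a setting name, its previous value, its new value and a reason. A hypothetical sketch of that shape follows, with a helper that lists the settings whose recorded value actually differs in a given version; `SettingChange`, `ChangesHistory`, `changedSettings` and the version key used in the usage example are assumptions (all values are stored as strings for simplicity), not ClickHouse's types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified history entry (values kept as strings).
struct SettingChange
{
    std::string name;
    std::string previous_value;
    std::string new_value;
    std::string reason;
};

// Version -> changes introduced in that version.
using ChangesHistory = std::map<std::string, std::vector<SettingChange>>;

// Names of settings in `version` whose previous and new values differ.
inline std::vector<std::string> changedSettings(const ChangesHistory & history, const std::string & version)
{
    std::vector<std::string> result;
    auto it = history.find(version);
    if (it == history.end())
        return result;
    for (const auto & change : it->second)
        if (change.previous_value != change.new_value)
            result.push_back(change.name);
    return result;
}
```

Under this model, an entry such as `{"format_template_row_format", "none", "", ...}` counts as a value change, while `{"iceberg_engine_ignore_schema_evolution", false, false, ...}` records a new setting whose value is unchanged.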
2 changes: 2 additions & 0 deletions src/Formats/FormatFactory.cpp
@@ -166,6 +166,8 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.template_settings.resultset_format = settings.format_template_resultset;
format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
format_settings.template_settings.row_format = settings.format_template_row;
format_settings.template_settings.row_format_template = settings.format_template_row_format;
format_settings.template_settings.resultset_format_template = settings.format_template_resultset_format;
format_settings.tsv.crlf_end_of_line = settings.output_format_tsv_crlf_end_of_line;
format_settings.tsv.empty_as_default = settings.input_format_tsv_empty_as_default;
format_settings.tsv.enum_as_number = settings.input_format_tsv_enum_as_number;
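`getFormatSettings` copies the flat query-level settings into the nested `template_settings` struct of `FormatSettings`, so the output format only reads `FormatSettings`. The sketch below mirrors that mapping with stand-in structs; `QuerySettings`, `MiniFormatSettings` and `toFormatSettings` are hypothetical, and only the field names come from the diff. Note the naming: `row_format` and `resultset_format` hold file paths, while the new `*_template` fields hold the inline template strings.

```cpp
#include <cassert>
#include <string>

using String = std::string;

// Stand-in for the flat query settings (Template subset only).
struct QuerySettings
{
    String format_template_resultset;                     // path to result-set template file
    String format_template_row;                           // path to row template file
    String format_template_rows_between_delimiter = "\n";
    String format_template_row_format;                    // inline row template
    String format_template_resultset_format;              // inline result-set template
};

// Stand-in for FormatSettings with its nested, unnamed-type template_settings member.
struct MiniFormatSettings
{
    struct
    {
        String resultset_format;          // file path, despite the name
        String row_format;                // file path, despite the name
        String row_between_delimiter;
        String row_format_template;       // inline template string
        String resultset_format_template; // inline template string
    } template_settings;
};

inline MiniFormatSettings toFormatSettings(const QuerySettings & settings)
{
    MiniFormatSettings format_settings;
    format_settings.template_settings.resultset_format = settings.format_template_resultset;
    format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
    format_settings.template_settings.row_format = settings.format_template_row;
    format_settings.template_settings.row_format_template = settings.format_template_row_format;
    format_settings.template_settings.resultset_format_template = settings.format_template_resultset_format;
    return format_settings;
}
```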
2 changes: 2 additions & 0 deletions src/Formats/FormatSettings.h
@@ -338,6 +338,8 @@ struct FormatSettings
String resultset_format;
String row_format;
String row_between_delimiter;
String row_format_template;
String resultset_format_template;
} template_settings;

struct
55 changes: 41 additions & 14 deletions src/Processors/Formats/Impl/TemplateBlockOutputFormat.cpp
@@ -11,6 +11,7 @@ namespace DB
namespace ErrorCodes
{
extern const int SYNTAX_ERROR;
extern const int INVALID_TEMPLATE_FORMAT;
}

TemplateBlockOutputFormat::TemplateBlockOutputFormat(const Block & header_, WriteBuffer & out_, const FormatSettings & settings_,
@@ -193,34 +194,60 @@ void registerOutputFormatTemplate(FormatFactory & factory)
const FormatSettings & settings)
{
ParsedTemplateFormatString resultset_format;
auto idx_resultset_by_name = [&](const String & partName)
{
return static_cast<size_t>(TemplateBlockOutputFormat::stringToResultsetPart(partName));
};
if (settings.template_settings.resultset_format.empty())
{
/// Default format string: "${data}"
resultset_format.delimiters.resize(2);
resultset_format.escaping_rules.emplace_back(ParsedTemplateFormatString::EscapingRule::None);
resultset_format.format_idx_to_column_idx.emplace_back(0);
resultset_format.column_names.emplace_back("data");
if (settings.template_settings.resultset_format_template.empty())
{
resultset_format.delimiters.resize(2);
resultset_format.escaping_rules.emplace_back(ParsedTemplateFormatString::EscapingRule::None);
resultset_format.format_idx_to_column_idx.emplace_back(0);
resultset_format.column_names.emplace_back("data");
}
else
{
resultset_format = ParsedTemplateFormatString();
resultset_format.parse(settings.template_settings.resultset_format_template, idx_resultset_by_name);
}
}
else
{
/// Read format string from file
resultset_format = ParsedTemplateFormatString(
FormatSchemaInfo(settings.template_settings.resultset_format, "Template", false,
settings.schema.is_server, settings.schema.format_schema_path),
[&](const String & partName)
{
return static_cast<size_t>(TemplateBlockOutputFormat::stringToResultsetPart(partName));
});
idx_resultset_by_name);
if (!settings.template_settings.resultset_format_template.empty())
{
throw Exception(DB::ErrorCodes::INVALID_TEMPLATE_FORMAT, "Expected either format_template_resultset or format_template_resultset_format, but not both");
}
}

ParsedTemplateFormatString row_format = ParsedTemplateFormatString(
ParsedTemplateFormatString row_format;
auto idx_row_by_name = [&](const String & colName)
{
return sample.getPositionByName(colName);
};
if (settings.template_settings.row_format.empty())
{
row_format = ParsedTemplateFormatString();
row_format.parse(settings.template_settings.row_format_template, idx_row_by_name);
}
else
{
row_format = ParsedTemplateFormatString(
FormatSchemaInfo(settings.template_settings.row_format, "Template", false,
settings.schema.is_server, settings.schema.format_schema_path),
[&](const String & colName)
{
return sample.getPositionByName(colName);
});

idx_row_by_name);
if (!settings.template_settings.row_format_template.empty())
{
throw Exception(DB::ErrorCodes::INVALID_TEMPLATE_FORMAT, "Expected either format_template_row or format_template_row_format, but not both");
}
}
return std::make_shared<TemplateBlockOutputFormat>(sample, buf, settings, resultset_format, row_format, settings.template_settings.row_between_delimiter);
});
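The decision added to `registerOutputFormatTemplate` can be summarized per template part: a non-empty file path is used unless an inline template is also set (then it is an error); otherwise the inline template is used; for the result set only, an empty inline template falls back to the default `${data}`. A standalone sketch of that decision follows, with hypothetical names (`Origin`, `TemplateSource`, `chooseTemplate`) and `std::invalid_argument` standing in for ClickHouse's `Exception` with `INVALID_TEMPLATE_FORMAT`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

enum class Origin { File, Inline, Default };

struct TemplateSource
{
    Origin origin;
    std::string text; // a file path for Origin::File, otherwise the template itself
};

// Pick the source of one template part. `default_template` is set for the
// result set ("${data}") and left empty for rows, which have no default.
TemplateSource chooseTemplate(const std::string & path_setting, const std::string & path,
                              const std::string & inline_setting, const std::string & inline_template,
                              const std::optional<std::string> & default_template)
{
    if (!path.empty())
    {
        if (!inline_template.empty())
            throw std::invalid_argument(
                "Expected either " + path_setting + " or " + inline_setting + ", but not both");
        return {Origin::File, path};
    }
    if (inline_template.empty() && default_template)
        return {Origin::Default, *default_template};
    // For rows this passes an empty inline template through, as the diff does.
    return {Origin::Inline, inline_template};
}
```

One design difference: this sketch rejects the conflicting combination before the path is used, whereas the code in the diff first constructs `ParsedTemplateFormatString` from the file and only then throws, so a missing or invalid file would be reported before the conflict.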

@@ -0,0 +1,9 @@
Question: 'How awesome is clickhouse?', Answer: 'unbelievably awesome!', Number of Likes: 456, Date: 2016-01-02;
Question: 'How fast is clickhouse?', Answer: 'Lightning fast!', Number of Likes: 9876543210, Date: 2016-01-03;
Question: 'Is it opensource?', Answer: 'of course it is!', Number of Likes: 789, Date: 2016-01-04

===== Results =====
Question: 'How awesome is clickhouse?', Answer: 'unbelievably awesome!', Number of Likes: 456, Date: 2016-01-02;
Question: 'How fast is clickhouse?', Answer: 'Lightning fast!', Number of Likes: 9876543210, Date: 2016-01-03;
Question: 'Is it opensource?', Answer: 'of course it is!', Number of Likes: 789, Date: 2016-01-04
===================
47 changes: 47 additions & 0 deletions tests/queries/0_stateless/00937_format_schema_rows_template.sh
@@ -0,0 +1,47 @@
#!/usr/bin/env bash
# shellcheck disable=SC2016

CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh

# Test format_template_row_format setting

$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS template";
$CLICKHOUSE_CLIENT --query="CREATE TABLE template (question String, answer String, likes UInt64, date Date) ENGINE = Memory";
$CLICKHOUSE_CLIENT --query="INSERT INTO template VALUES
('How awesome is clickhouse?', 'unbelievably awesome!', 456, '2016-01-02'),\
('How fast is clickhouse?', 'Lightning fast!', 9876543210, '2016-01-03'),\
('Is it opensource?', 'of course it is!', 789, '2016-01-04')";

$CLICKHOUSE_CLIENT --query="SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'";

echo -e "\n"

# Test that if both format_template_row_format and format_template_row are provided, an error is thrown
echo -ne 'Question: ${question:Quoted}, Answer: ${answer:Quoted}, Number of Likes: ${likes:Raw}, Date: ${date:Raw}' > "$CURDIR"/00937_template_output_format_row.tmp
$CLICKHOUSE_CLIENT --multiline --multiquery --query "SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row = '$CURDIR/00937_template_output_format_row.tmp', \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'; --{clientError 474}"

# Test format_template_resultset_format setting

$CLICKHOUSE_CLIENT --query="SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_resultset_format = '===== Results ===== \n\${data}\n===================\n', \
format_template_rows_between_delimiter = ';\n'";

# Test that if both format_template_resultset_format and format_template_resultset are provided, an error is thrown
echo -ne '===== Resultset ===== \n ${data} \n ===============' > "$CURDIR"/00937_template_output_format_resultset.tmp
$CLICKHOUSE_CLIENT --multiline --multiquery --query "SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_resultset = '$CURDIR/00937_template_output_format_resultset.tmp', \
format_template_resultset_format = '===== Resultset ===== \n \${data} \n ===============', \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'; --{clientError 474}"

$CLICKHOUSE_CLIENT --query="DROP TABLE template";
rm "$CURDIR"/00937_template_output_format_row.tmp
rm "$CURDIR"/00937_template_output_format_resultset.tmp