Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging Values with expressions #6781

Merged
merged 46 commits into from Oct 4, 2019
Merged

Conversation

tavplubix
Copy link
Member

@tavplubix tavplubix commented Sep 2, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • Performance Improvement

Short description (up to few sentences):
Merging the second part of #4354 (optimization of parsing SQL expressions in Values)

Detailed description:
If some field in Values contains an SQL expression, try to deduce template of the expression and parse the following rows using this template, supposing expressions in the following rows have the same structure, e.g.

(lower('Hello'), 1 + 2),
(lower('World'), 3 + 4),
...

It allows to parse Values with expressions much faster (up to 100x faster than parsing expressions separately).

@tavplubix tavplubix added can be tested pr-performance Pull request with some performance improvements labels Sep 2, 2019
@tavplubix
Copy link
Member Author

tavplubix commented Sep 20, 2019

Failed tests and known issues:

  • 00915_simple_aggregate_function and 00725_ipv4_ipv6_domains - CAST('string literal', 'IPv4') doesn't work because it parses string literal as UInt32, not as IPv4
  • 00748_insert_array_with_null - different error code (seems to be ok)
  • 00700_decimal_arithm, 00700_decimal_bounds, 00700_decimal_compare - there is no decimal literals, so decimals in expressions are parsed as Float64 (workaround: CAST('3.141592654' AS Decimal(P, S)))
  • 00645_date_time_input_format - setting date_time_input_format to best_effort doesn't affect CAST('string literal', 'DateTime')
  • 00567_parse_datetime_as_unix_timestamp - differences in behaviour of CAST('string literal', 'DateTime') and DataTypeDateTime::deserializeTextQuoted(...) (time zone)
  • 00520_tuple_values_interpreter - ConstantExpressionTemplate doesn't support comments in expressions
  • 00432_aggregate_function_scalars_and_constants - integer types promotion in ConstantExpressionTemplate causes exception Conversion from AggregateFunction(avg, UInt64) to AggregateFunction(avg, UInt8) is not supported
  • 00339_parsing_bad_arrays - implicit conversion of integer to string causes error (seems to be ok)
  • 00117_parsing_arrays - conversion of empty 1d-Array to 2d-Array causes error (seems to be ok)
  • There are some constant expressions (like arraySort([])), which should return ColumnConst, but they don't, so evaluateConstantExpression(...) fails
  • Expressions like arrayMap(x -> round(x), []) cause error

@tavplubix tavplubix marked this pull request as ready for review September 23, 2019 10:49
@tavplubix tavplubix added the pr-documentation Documentation PRs for the specific code PR label Sep 25, 2019
@abyss7
Copy link
Contributor

abyss7 commented Sep 26, 2019

Please provide description in some way about the feature in this PR

Copy link
Contributor

@abyss7 abyss7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest to have tests covering this situations:

  1. Reading values with changed template format in different ways, like assuming default values, changing expressions (lowerupper).
  2. Reading values with same template format, but incastable types, like enums.
  3. Reading broken values, that will lead to failure with standard parser.

docs/en/operations/settings/settings.md Outdated Show resolved Hide resolved
dbms/src/Functions/IFunction.h Outdated Show resolved Hide resolved
@abyss7
Copy link
Contributor

abyss7 commented Oct 2, 2019

Reviewed all changes and discussed all the questions in person. Awaiting for some of fixes:

  • Add tests to cover all gaps in the code, when template-parsing was enabled by default.
  • Fix behavior for case like VALUES (+1) (+1) (+1) ….
  • Move couple of methods to the base classes IFunction, IFunctionBuilder, etc.
  • Leave a comment about copy-pasted part of number parser about its efficiency, for rationale.

@filimonov
Copy link
Contributor

Maybe that can also be fixed by the way: #7174

@abyss7
Copy link
Contributor

abyss7 commented Oct 3, 2019

Can be merged after performance tests are OK.

@tavplubix tavplubix merged commit d73db7f into master Oct 4, 2019
@tavplubix tavplubix deleted the merging_values_with_expressions branch November 22, 2019 00:03
azat added a commit to azat/ClickHouse that referenced this pull request Jan 15, 2022
In case you pass array in VALUES section (ValuesBlockInputFormat), it
will be copied after it had been created.
This is not significant if array is small, however if you have huge
enough array, then this can become significant (especially for array of
bool, since for each element will be used Field anyway, and it's size is
56 byte).

Here is a simple reproducer:

    $ curl -sS 'http://127.1:8123/?input_format_values_deduce_templates_of_expressions=0' -d@- <<<"insert into function null('_ Int') values (length([0,$(yes 1, | head -n2000000 | tr -d '\n')1]))"

But note, that there is lots of work (evalute constant expressions from ClickHouse#6781)
after parsing collection, and so total memory usage for query does not
changed a lot (hence - no test).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-documentation Documentation PRs for the specific code PR pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants