Switching from VARCHAR to UUID causes a 10x slowdown when using ANY_VALUE()
To Reproduce
The performance change was first observed with production data; the synthetic data below reproduces it.
import duckdb
import os
import tempfile
import time


def uuid_perf() -> None:
    QUERY = """
    WITH event_names AS (
        SELECT UNNEST(['event1', 'event2', 'event3', 'event5']) AS eventName
    ),
    event_timestamps AS (
        SELECT CAST('2024-01-01 00:00:00+00:00' AS TIMESTAMPTZ) + INTERVAL (i) HOUR AS eventTime
        FROM GENERATE_SERIES(0, 1000000) AS s(i)
    ),
    log_records AS (
        SELECT eventName, eventTime, gen_random_uuid() AS eventID
        FROM event_names CROSS JOIN event_timestamps
        ORDER BY eventId
    ),
    sample AS (
        SELECT eventName,
               DATE_TRUNC('hour', eventTime) AS eventHour,
               ANY_VALUE(eventID) AS sampleEvent
        FROM log_records
        GROUP BY 1, 2
        LIMIT 10
    )
    SELECT * FROM sample
    ;
    """

    def time_query(q: str) -> None:
        # Use a fresh on-disk database for every run.
        with tempfile.TemporaryDirectory() as tmpdir:
            with duckdb.connect(os.path.join(tmpdir, "test.db")) as con:
                print(con.execute("SELECT * FROM pragma_version();").fetchall())
                start: int = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
                con.execute(q).fetchall()
                stop: int = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
                elapsed = stop - start
                print(f"elapsed {elapsed/1.0e9}s")

    print("Original Query")
    time_query(QUERY)
    print("Query with TEXT CAST")
    time_query(QUERY.replace("ANY_VALUE(eventID) AS sampleEvent",
                             "ANY_VALUE(eventID::TEXT) AS sampleEvent"))


uuid_perf()

# [('v0.10.1', '4a89d97db8')]
# Original Query
# elapsed 17.178034s
# Query with TEXT CAST
# elapsed 1.861703s
# [('v0.10.3-dev388', '86fee9ed38')]
# Original Query
# elapsed 15.525825s
# Query with TEXT CAST
# elapsed 1.678268s
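For reference, here is a small sketch that turns the timings reported above into slowdown ratios. The numbers are copied from the output in this report; the names `timings` and `slowdown` are only illustrative and are not part of DuckDB or the reproduction script.

```python
# Timings in seconds, copied from the output reported in this issue.
# The dictionary layout and key names are illustrative assumptions.
timings = {
    "v0.10.1": {"uuid": 17.178034, "text_cast": 1.861703},
    "v0.10.3-dev388": {"uuid": 15.525825, "text_cast": 1.678268},
}


def slowdown(uuid_seconds: float, text_seconds: float) -> float:
    """Return how many times slower the UUID query is than the TEXT-cast query."""
    return uuid_seconds / text_seconds


# One ratio per tested DuckDB version.
ratios = {
    version: slowdown(t["uuid"], t["text_cast"])
    for version, t in timings.items()
}

for version, ratio in ratios.items():
    print(f"{version}: UUID query is {ratio:.1f}x slower than the TEXT cast")
```

On both tested builds the ratio comes out a little above 9x, which is where the "10x" in the title comes from.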
OS:
macOS 14.4.1 Apple M3 Max
DuckDB Version:
0.10.1
DuckDB Client:
Python
Full Name:
Rob Jackson
Affiliation:
exaforce.com
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a source build
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have