Handle 0-list maps (erroring out) and add test-case #54

Closed
wants to merge 194 commits into from
194 commits
9f82c21
Issue #5920: Ordered Aggregate Sorting
Apr 6, 2023
e2289ee
Merge branch 'master' into sorted-agg
Apr 7, 2023
6f58d85
Merge branch 'master' into sorted-agg
Apr 19, 2023
dbf1b8a
Issue #5920: Ordered Aggregate Sorting
Apr 19, 2023
e15b9ab
removed overflow check from operators
nickgerrets Aug 23, 2023
13430ca
multiply without overflow check
nickgerrets Aug 25, 2023
c0c9c5f
overflow wrapping
nickgerrets Aug 28, 2023
1910082
Fixed large decimal operations not throwing
nickgerrets Aug 28, 2023
e9f401b
Internal #425: TIMETZ Comparisons
hawkfish Nov 12, 2023
51039f7
Internal #425: TIMETZ Part Extract
hawkfish Nov 12, 2023
15a9458
Merge branch 'interval-time' into timetz-cmp
hawkfish Nov 13, 2023
d0f1b7e
Merge branch 'interval-time' into timetz-cmp
hawkfish Nov 13, 2023
b48015b
Merge branch 'interval-time' into timetz-cmp
hawkfish Nov 13, 2023
b14109b
Merge branch 'interval-time' into timetz-cmp
hawkfish Nov 14, 2023
05d5e35
Internal #425: TIMETZ Min Max
hawkfish Nov 15, 2023
e6309c2
Internal #425: TIMETZ Min Max
hawkfish Nov 15, 2023
f8870ab
initial implementation of FILE_SIZE_BYTES
lnkuiper Nov 30, 2023
5c53d4d
Merge branch 'uhugeint' into hugeint_faster_math
nickgerrets Dec 4, 2023
37244ee
void return for clarity
nickgerrets Dec 4, 2023
2c1ad67
Merge branch 'uhugeint' into hugeint_faster_math
nickgerrets Dec 6, 2023
6f41158
debug and bunch of tests for file_size_bytes
lnkuiper Dec 7, 2023
2d57efb
Merge branch 'main' into file_size_bytes
lnkuiper Dec 7, 2023
92949c3
refactor file_extension and make tests more lenient
lnkuiper Dec 8, 2023
70055e6
fix issue #9456
lnkuiper Dec 14, 2023
88cff66
implement switching SINGLE to LEFT
lnkuiper Dec 14, 2023
e6b9672
fix includes and add missing test
lnkuiper Dec 14, 2023
c1c19c8
include algorithm for find
lnkuiper Dec 15, 2023
ae9fd58
Merge branch 'main' into sorted-agg
Dec 15, 2023
0136371
combine deliminator tests and add depth to DelimCandidates
lnkuiper Dec 18, 2023
4ff7f1b
init right delim join
lnkuiper Dec 18, 2023
7f75f55
debug addition of right delim join and add ordering to tests
lnkuiper Dec 18, 2023
9f273e9
Merge branch 'main' into deliminator_stuff
lnkuiper Dec 18, 2023
07e0a60
Squashed commit of the following:
Tmonster Dec 18, 2023
1000f5c
Issue #9950: Array List Segments
Dec 18, 2023
38d6b26
Issue #9950: Ordered Aggregate Performance
Dec 18, 2023
04d0f6a
Merge branch 'main' into sorted-agg
hawkfish Dec 19, 2023
509a1fd
Issue #9950: Ordered Aggregate Performance
hawkfish Dec 19, 2023
ab38980
Issue #9950: Ordered Aggregate Performance
hawkfish Dec 19, 2023
3f75a8d
fix includes and add more ordering to test
lnkuiper Dec 19, 2023
19c3564
Merge branch 'main' into deliminator_stuff
lnkuiper Dec 19, 2023
65879fc
add missing include
lnkuiper Dec 19, 2023
59b3417
Merge branch 'main' into sorted-agg
hawkfish Dec 19, 2023
bb5e3ea
add regexp_escape function
chrisiou Dec 11, 2023
5173e1e
use RE2::QuoteMeta to escape special chars
chrisiou Dec 18, 2023
b5e76ac
Issue #9950: Ordered Aggregate Performance
hawkfish Dec 19, 2023
57d3add
Issue #9950: Ordered Aggregate Performance
hawkfish Dec 20, 2023
687791a
add ordering
lnkuiper Dec 20, 2023
72476aa
Merge branch 'uhugeint' into hugeint_faster_math
nickgerrets Dec 20, 2023
f263d04
Merge branch 'hugeint_faster_math' of github.com:nickgerrets/duckdb i…
nickgerrets Dec 20, 2023
a0f7652
Merge branch 'main' into hugeint_faster_math
nickgerrets Dec 20, 2023
05fb908
add ordering to underspecified test
lnkuiper Dec 20, 2023
7d050d3
Merge branch 'main' into deliminator_stuff
lnkuiper Dec 20, 2023
33bc462
implement PR feedback
lnkuiper Dec 20, 2023
a7ae159
add testcase
chrisiou Dec 20, 2023
dece225
Merge branch 'duckdb:main' into regexp-escape-func
chrisiou Dec 20, 2023
732900a
init ConcurrentOperatorMemoryManager
lnkuiper Dec 20, 2023
7452b86
pushdown filters into semi and anti joins
Tmonster Dec 20, 2023
ee681ba
make format-fix
Tmonster Dec 20, 2023
34427ee
better pushdown support
Tmonster Dec 20, 2023
7871470
maybe this works now?
Tmonster Dec 20, 2023
1f3b24b
more fixes for pushing down filters on semi and anti joins
Tmonster Dec 20, 2023
5814738
ignore logical any joins, since the bindings of any joins are harder …
Tmonster Dec 21, 2023
f85e718
rename to TemporaryMemoryManager, starting to take shape
lnkuiper Dec 21, 2023
9a7b238
initial version of TemporaryMemoryManager
lnkuiper Dec 21, 2023
d89a48d
fix bug with anti joins
Tmonster Dec 21, 2023
f66150d
make format-fix
Tmonster Dec 21, 2023
191bace
only pushdown left side
Tmonster Dec 21, 2023
0aff3af
make format-fix
Tmonster Dec 21, 2023
767300d
refactoring
nickgerrets Dec 22, 2023
711f395
Merge branch 'main' into hugeint_faster_math
nickgerrets Dec 22, 2023
44252ef
Merge branch 'main' into timetz-cmp
hawkfish Dec 22, 2023
01776e9
Internal #425: TIME_TZ + DATE
hawkfish Dec 23, 2023
33f1508
Internal #425: TIME_TZ Range
hawkfish Dec 24, 2023
9d0a61d
Merge branch 'main' into hugeint_faster_math
nickgerrets Dec 29, 2023
c65de7e
fixed negation
nickgerrets Dec 29, 2023
9621aee
interface change
nickgerrets Dec 29, 2023
c55d416
implementations
nickgerrets Dec 29, 2023
37deebc
change function name
nickgerrets Dec 29, 2023
bd03b42
updated hugeint scalar division wrapper
nickgerrets Dec 29, 2023
bc873e0
updated error message
nickgerrets Dec 29, 2023
cf4d231
format
nickgerrets Dec 29, 2023
38f6e22
Merge branch 'main' into timetz-cmp
hawkfish Jan 1, 2024
9e9dfd4
Internal #425: TIME_TZ +/- INTERVAL
hawkfish Jan 1, 2024
c58dab4
Internal #425: TIMEZONE(INTERVAL, TIMETZ)
hawkfish Jan 2, 2024
e718d45
finish initial version of TemporaryMemoryManager::UpdateState
lnkuiper Jan 2, 2024
f929d53
merge with main
lnkuiper Jan 2, 2024
14d356d
integrate TemporaryMemoryManager into PhysicalHashJoin
lnkuiper Jan 2, 2024
b5496e6
integrate TemporaryMemoryManager into RadixPartitionedHashTable
lnkuiper Jan 2, 2024
95a6cab
some tidy check stuff
lnkuiper Jan 2, 2024
7be2c0f
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 2, 2024
f309999
Internal #425: TIMEZONE(VARCHAR, TIMETZ)
hawkfish Jan 2, 2024
735ed48
Merge branch 'main' into timetz-cmp
hawkfish Jan 2, 2024
ef94672
Internal #425: Fix static template
hawkfish Jan 2, 2024
13388ed
need to add ability to push filters in semi anti joins in the same wa…
Tmonster Jan 2, 2024
025033a
Merge branch 'main' into sorted-agg
hawkfish Jan 2, 2024
5b1a248
Internal #425: Java TIMETZ
hawkfish Jan 2, 2024
ea7b2ca
add todo remark and enable disabled tests
Tmonster Jan 2, 2024
5495cd4
Merge branch 'main' into deliminator_stuff
lnkuiper Jan 3, 2024
07fa1ca
clarify and fix tests
lnkuiper Jan 3, 2024
d8614b8
Merge branch 'main' into hugeint_faster_math
nickgerrets Jan 3, 2024
d9ffb72
no overflow check (impossible)
nickgerrets Jan 3, 2024
84d2775
fix warning
lnkuiper Jan 3, 2024
c334a90
removed overflow check from operator
nickgerrets Jan 3, 2024
3a1c830
return 0 on division by zero
nickgerrets Jan 3, 2024
e66290b
division by zero now sets remainder to lhs
nickgerrets Jan 3, 2024
0e78be9
memory safety for MetaPipeline
lnkuiper Jan 3, 2024
8307756
add dependencies in reverse
lnkuiper Jan 3, 2024
a7d43b2
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 3, 2024
69413f5
add proper progress to out-of-core hash join
lnkuiper Jan 3, 2024
c1ee517
some executor memory safety
lnkuiper Jan 3, 2024
5a7005d
flip comparison
lnkuiper Jan 3, 2024
a4e74aa
remove unpartitioned variant (TemporaryMemoryManager may require part…
lnkuiper Jan 3, 2024
048afb5
fix constructor
lnkuiper Jan 3, 2024
642a66e
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 3, 2024
31c6fcd
remove num_added_samples. Add comment explaining why we return chunks…
Tmonster Jan 3, 2024
0be2725
Issue #9950: Correct Benchmark Result
Jan 3, 2024
4f0ee4a
fixes to remove num_added_samples
Tmonster Jan 3, 2024
f701939
rename reservoir sample test to slow test
Tmonster Jan 3, 2024
d1cd788
remove chunk collection entirely
Tmonster Jan 3, 2024
34b9da0
add missing files for removing chunk collection
Tmonster Jan 3, 2024
24a36d6
Merge remote-tracking branch 'upstream/main' into remove_chunk_collec…
Tmonster Jan 3, 2024
53ea627
remove redundant move to fix amalgmation build (hopefully)
Tmonster Jan 3, 2024
3f381b9
add datachunk.hpp to reservoir_sample.cpp
Tmonster Jan 3, 2024
96506e6
fix osx build and tweak radix ht limit
lnkuiper Jan 4, 2024
3c839f5
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 4, 2024
c35ec07
remove more unused variables
lnkuiper Jan 4, 2024
8db51d9
add more headers to get CI to pass
Tmonster Jan 4, 2024
08bdfbb
remove other CI
Tmonster Jan 4, 2024
989d7f9
Merge remote-tracking branch 'upstream/main' into pushdown_filters_in…
Tmonster Jan 4, 2024
938e5c1
Revert "remove other CI"
Tmonster Jan 4, 2024
2ad4e3f
Merge remote-tracking branch 'upstream/main' into remove_chunk_collec…
Tmonster Jan 5, 2024
481e870
limit memory usage
lnkuiper Jan 5, 2024
d0b401c
more tweaking
lnkuiper Jan 5, 2024
4ac040d
windows compilation
nickgerrets Jan 5, 2024
5caa285
reverted unnecessary skip of overflow check
nickgerrets Jan 5, 2024
1344bc4
Merge branch 'main' into file_size_bytes
lnkuiper Jan 5, 2024
4061d77
Merge branch 'main' into deliminator_stuff
lnkuiper Jan 5, 2024
f694813
disable tpch in substrait due to #9993
lnkuiper Jan 5, 2024
9693ec5
inlining
nickgerrets Jan 5, 2024
2c6a1a1
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 5, 2024
31a1d1e
stress test for TemporaryMemoryManager
lnkuiper Jan 5, 2024
b507c0f
small json httpfs optimization
samansmink Jan 5, 2024
5aa8163
added test to verify same seed, same sample. Regardless of number of …
Tmonster Jan 5, 2024
1404719
Issue #10138: Finite Temporal Helpers
hawkfish Jan 5, 2024
9acc64c
add test for json s3 optimization edge case
samansmink Jan 8, 2024
6c198ed
Merge branch 'main' into regexp-escape-func
hannes Jan 8, 2024
ba04700
Infrastructure: truncate not always available in CI, use dd
carlopi Jan 8, 2024
3d80549
Merge pull request #10110 from Tmonster/pushdown_filters_into_semi_an…
Mytherin Jan 8, 2024
08c1bb4
Merge branch 'main' into timetz-cmp
Mytherin Jan 8, 2024
c7351c8
Fix location
Mytherin Jan 8, 2024
79e5db7
Merge pull request #10045 from hawkfish/sorted-agg
Mytherin Jan 8, 2024
3699cf3
Minor fixes
Mytherin Jan 8, 2024
bce1e35
Merge pull request #10117 from nickgerrets/hugeint_faster_math
Mytherin Jan 8, 2024
ec8d094
implement PR feedback
lnkuiper Jan 8, 2024
8c09160
Merge branch 'main' into concurrent_operator_memory_manager
lnkuiper Jan 8, 2024
70f44c2
Fix #10074 - for materialized CTEs the final result names are not inf…
Mytherin Jan 8, 2024
6a1a2d6
Merge pull request #10044 from chrisiou/regexp-escape-func
Mytherin Jan 8, 2024
345ea12
Merge branch 'main' into deliminator_stuff
lnkuiper Jan 8, 2024
89018c0
eliminate duplicate code
lnkuiper Jan 8, 2024
b39a35b
Merge pull request #10162 from carlopi/fixsignaturemissigntruncate
Mytherin Jan 8, 2024
10bd11d
Merge pull request #10151 from samansmink/speed-up-json-httpfs-reads
Mytherin Jan 8, 2024
185f637
Merge branch 'main' into infinite-c
hawkfish Jan 8, 2024
3cc808e
Correctly handle recursive and nested types that refer to other types…
Mytherin Jan 8, 2024
66edcb4
Add tests
Mytherin Jan 8, 2024
a1260e8
flip comparison
lnkuiper Jan 8, 2024
8304175
Merge branch 'main' into file_size_bytes
lnkuiper Jan 8, 2024
c12a40d
implement pr feedback
lnkuiper Jan 8, 2024
8fb66c0
extract the names from the dataframe, they might have been deduplicat…
Tishj Jan 8, 2024
dfc1d1b
Merge pull request #10038 from Tmonster/remove_chunk_collection_from_…
Mytherin Jan 8, 2024
e6d0eda
Issue #10138: Finite Temporal Helpers
hawkfish Jan 8, 2024
4c36219
Merge branch 'main' into timetz-cmp
hawkfish Jan 8, 2024
fc0ff70
Internal #425: TIMETZ Functions
hawkfish Jan 8, 2024
7139116
Suspend duckdb shell on Ctrl+Z
gsauthof Jan 8, 2024
387c8e0
Issue #10138: Finite Temporal Helpers
hawkfish Jan 9, 2024
4af1505
check file size before writing
lnkuiper Jan 9, 2024
e261741
Merge pull request #10163 from Mytherin/issue10074
Mytherin Jan 9, 2024
4ddf4c4
Merge pull request #10165 from Tishj/python_fix_timestamptz_issue
Mytherin Jan 9, 2024
86da600
Merge pull request #10164 from Mytherin/issue10141
Mytherin Jan 9, 2024
76dd7c5
Merge pull request #10172 from gsauthof/ctrl-z
Mytherin Jan 9, 2024
d1f4056
Merge pull request #9993 from lnkuiper/deliminator_stuff
Mytherin Jan 9, 2024
8a578a4
Support unreserved and column name keywords in DETACH
Mytherin Jan 9, 2024
4377cf5
Detach can allow even more keywords
Mytherin Jan 9, 2024
2db3b3c
Fix #10057 - correctly propagate errors when binding aliases instead …
Mytherin Jan 9, 2024
2759c49
Add test case
Mytherin Jan 9, 2024
c75c9da
Format
Mytherin Jan 9, 2024
daac5aa
Merge branch 'main' into file_size_bytes
lnkuiper Jan 9, 2024
8051420
Merge pull request #10147 from lnkuiper/concurrent_operator_memory_ma…
Mytherin Jan 9, 2024
4d2ae47
Merge pull request #10157 from hawkfish/infinite-c
Mytherin Jan 9, 2024
30085a5
Merge pull request #10107 from hawkfish/timetz-cmp
Mytherin Jan 9, 2024
5b4b93a
require skip_reload
Mytherin Jan 9, 2024
8793ff3
Merge pull request #10176 from Mytherin/issue10057
Mytherin Jan 9, 2024
0f41fe7
Merge pull request #10175 from Mytherin/detachkeyword
Mytherin Jan 10, 2024
db09b50
Merge pull request #9920 from lnkuiper/file_size_bytes
Mytherin Jan 10, 2024
9f242f8
Handle 0-list maps (erroring out) and add test-case
carlopi Jan 10, 2024
2 changes: 0 additions & 2 deletions .github/config/uncovered_files.csv
@@ -72,7 +72,6 @@ common/types/batched_data_collection.cpp 11
common/types/bit.cpp 9
common/types/blob.cpp 3
common/types/cast_helpers.cpp 2
-common/types/chunk_collection.cpp 94
common/types/column/column_data_allocator.cpp 13
common/types/column/column_data_collection.cpp 55
common/types/column/partitioned_column_data.cpp 8
@@ -385,7 +384,6 @@ include/duckdb/common/sort/sorted_block.hpp 1
include/duckdb/common/string_util.hpp 9
include/duckdb/common/type_util.hpp 2
include/duckdb/common/types.hpp 5
-include/duckdb/common/types/chunk_collection.hpp 6
include/duckdb/common/types/column/column_data_allocator.hpp 3
include/duckdb/common/types/column/partitioned_column_data.hpp 2
include/duckdb/common/types/datetime.hpp 13
19 changes: 19 additions & 0 deletions .github/patches/extensions/substrait/disable_tpch.patch
@@ -0,0 +1,19 @@
diff --git a/test/sql/test_substrait_tpch.test b/test/sql/test_substrait_tpch.test
index ffa2666..10f815b 100644
--- a/test/sql/test_substrait_tpch.test
+++ b/test/sql/test_substrait_tpch.test
@@ -2,6 +2,14 @@
# description: Test get_substrait with TPC-H queries
# group: [sql]

+# test skipped since PR https://github.com/duckdb/duckdb/pull/9993
+# the PR re-introduces DelimJoins in TPC-H again for performance reasons
+# if there is a selection in the duplicate-eliminated side, we keep the DelimJoin
+# this is checked in Deliminator::HasSelection
+# if this function returns false, all DelimJoins are removed from TPC-H
+
+mode skip
+
require substrait

require tpch
6 changes: 3 additions & 3 deletions benchmark/micro/aggregate/ordered_first.benchmark
@@ -5,13 +5,13 @@
name Ordered First (Grouped)
group aggregate

-#load
-#PRAGMA ordered_aggregate_threshold=262144
+load
+CREATE TABLE t AS FROM range(10000000) tbl(i);

run
SELECT SUM(agg) FROM (
SELECT i // 2048 AS grp, FIRST(i ORDER BY i DESC) AS agg
-FROM range(10000000) tbl(i)
+FROM t
GROUP BY ALL
)

58 changes: 42 additions & 16 deletions extension/icu/icu-timezone.cpp
@@ -10,6 +10,16 @@

namespace duckdb {

+template <typename T>
+static bool ICUIsFinite(const T &t) {
+return true;
+}
+
+template <>
+bool ICUIsFinite(const timestamp_t &t) {
+return Timestamp::IsFinite(t);
+}
+
struct ICUTimeZoneData : public GlobalTableFunctionState {
ICUTimeZoneData() : tzs(icu::TimeZone::createEnumeration()) {
UErrorCode status = U_ZERO_ERROR;
@@ -93,7 +103,7 @@ static void ICUTimeZoneFunction(ClientContext &context, TableFunctionInput &data

struct ICUFromNaiveTimestamp : public ICUDateFunc {
static inline timestamp_t Operation(icu::Calendar *calendar, timestamp_t naive) {
-if (!Timestamp::IsFinite(naive)) {
+if (!ICUIsFinite(naive)) {
return naive;
}

@@ -157,7 +167,7 @@ struct ICUFromNaiveTimestamp : public ICUDateFunc {

struct ICUToNaiveTimestamp : public ICUDateFunc {
static inline timestamp_t Operation(icu::Calendar *calendar, timestamp_t instant) {
-if (!Timestamp::IsFinite(instant)) {
+if (!ICUIsFinite(instant)) {
return instant;
}

@@ -289,8 +299,23 @@ struct ICULocalTimeFunc : public ICUDateFunc {
}
};

+struct ICUToTimeTZ : public ICUDateFunc {
+static inline dtime_tz_t Operation(icu::Calendar *calendar, dtime_tz_t timetz) {
+// Normalise to +00:00, add TZ offset, then set offset to TZ
+auto time = Time::NormalizeTimeTZ(timetz);
+
+auto offset = ExtractField(calendar, UCAL_ZONE_OFFSET);
+offset += ExtractField(calendar, UCAL_DST_OFFSET);
+offset /= Interval::MSECS_PER_SEC;
+
+date_t date(0);
+time = Interval::Add(time, {0, 0, offset * Interval::MICROS_PER_SEC}, date);
+return dtime_tz_t(time, offset);
+}
+};
+
struct ICUTimeZoneFunc : public ICUDateFunc {
-template <typename OP>
+template <typename OP, typename T>
static void Execute(DataChunk &input, ExpressionState &state, Vector &result) {
auto &func_expr = state.expr.Cast<BoundFunctionExpression>();
auto &info = func_expr.bind_info->Cast<BindData>();
@@ -307,28 +332,29 @@ struct ICUTimeZoneFunc : public ICUDateFunc {
ConstantVector::SetNull(result, true);
} else {
SetTimeZone(calendar, *ConstantVector::GetData<string_t>(tz_vec));
-UnaryExecutor::Execute<timestamp_t, timestamp_t>(
-ts_vec, result, input.size(), [&](timestamp_t ts) { return OP::Operation(calendar, ts); });
+UnaryExecutor::Execute<T, T>(ts_vec, result, input.size(),
+[&](T ts) { return OP::Operation(calendar, ts); });
}
} else {
-BinaryExecutor::Execute<string_t, timestamp_t, timestamp_t>(tz_vec, ts_vec, result, input.size(),
-[&](string_t tz_id, timestamp_t ts) {
-if (Timestamp::IsFinite(ts)) {
-SetTimeZone(calendar, tz_id);
-return OP::Operation(calendar, ts);
-} else {
-return ts;
-}
-});
+BinaryExecutor::Execute<string_t, T, T>(tz_vec, ts_vec, result, input.size(), [&](string_t tz_id, T ts) {
+if (ICUIsFinite(ts)) {
+SetTimeZone(calendar, tz_id);
+return OP::Operation(calendar, ts);
+} else {
+return ts;
+}
+});
}
}

static void AddFunction(const string &name, DatabaseInstance &db) {
ScalarFunctionSet set(name);
set.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIMESTAMP}, LogicalType::TIMESTAMP_TZ,
-Execute<ICUFromNaiveTimestamp>, Bind));
+Execute<ICUFromNaiveTimestamp, timestamp_t>, Bind));
set.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIMESTAMP_TZ}, LogicalType::TIMESTAMP,
-Execute<ICUToNaiveTimestamp>, Bind));
+Execute<ICUToNaiveTimestamp, timestamp_t>, Bind));
+set.AddFunction(ScalarFunction({LogicalType::VARCHAR, LogicalType::TIME_TZ}, LogicalType::TIME_TZ,
+Execute<ICUToTimeTZ, dtime_tz_t>, Bind));
ExtensionUtil::AddFunctionOverload(db, set);
}
};
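Review note: the ICUIsFinite helper in this file appears to exist so that the templated Execute<OP, T> compiles for both timestamp_t and dtime_tz_t, with only timestamps having infinite values to pass through. A minimal standalone sketch of that dispatch pattern follows. SketchTimestamp, SketchTimeTZ, SketchIsFinite, PassThroughIfInfinite and the sentinel constants are hypothetical stand-ins for illustration, not DuckDB definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for timestamp_t and dtime_tz_t (not the DuckDB types).
struct SketchTimestamp {
	std::int64_t micros;
};
struct SketchTimeTZ {
	std::uint64_t bits;
};

// Assumed sentinel values for +/- infinity, chosen for illustration only.
constexpr std::int64_t SKETCH_POS_INF = std::numeric_limits<std::int64_t>::max();
constexpr std::int64_t SKETCH_NEG_INF = -std::numeric_limits<std::int64_t>::max();

// Primary template: types without infinities are always finite.
template <typename T>
inline bool SketchIsFinite(const T &) {
	return true;
}

// Full specialization: only the timestamp stand-in can be infinite.
template <>
inline bool SketchIsFinite<SketchTimestamp>(const SketchTimestamp &t) {
	return t.micros != SKETCH_POS_INF && t.micros != SKETCH_NEG_INF;
}

// Generic caller: infinite inputs are returned untouched, finite ones are transformed.
template <typename T, typename OP>
inline T PassThroughIfInfinite(const T &value, OP op) {
	if (!SketchIsFinite(value)) {
		return value;
	}
	return op(value);
}

// Usage: a finite value is shifted, an infinite one is returned as-is.
inline void SketchDemo() {
	auto add_five = [](SketchTimestamp t) { return SketchTimestamp {t.micros + 5}; };
	assert(PassThroughIfInfinite(SketchTimestamp {10}, add_five).micros == 15);
	assert(PassThroughIfInfinite(SketchTimestamp {SKETCH_POS_INF}, add_five).micros == SKETCH_POS_INF);
}
```

The generic fallback keeps the call site free of per-type branches; adding another type with infinities would only need one more specialization.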
22 changes: 14 additions & 8 deletions extension/json/buffered_json_reader.cpp
@@ -14,8 +14,7 @@ JSONBufferHandle::JSONBufferHandle(idx_t buffer_index_p, idx_t readers_p, Alloca

JSONFileHandle::JSONFileHandle(unique_ptr<FileHandle> file_handle_p, Allocator &allocator_p)
: file_handle(std::move(file_handle_p)), allocator(allocator_p), can_seek(file_handle->CanSeek()),
-plain_file_source(file_handle->OnDiskFile() && can_seek), file_size(file_handle->GetFileSize()), read_position(0),
-requested_reads(0), actual_reads(0), cached_size(0) {
+file_size(file_handle->GetFileSize()), read_position(0), requested_reads(0), actual_reads(0), cached_size(0) {
}

bool JSONFileHandle::IsOpen() const {
@@ -55,6 +54,10 @@ bool JSONFileHandle::CanSeek() const {
return can_seek;
}

+FileHandle &JSONFileHandle::GetHandle() {
+return *file_handle;
+}
+
idx_t JSONFileHandle::GetPositionAndSize(idx_t &position, idx_t requested_size) {
D_ASSERT(requested_size != 0);

@@ -68,12 +71,15 @@ idx_t JSONFileHandle::GetPositionAndSize(idx_t &position, idx_t requested_size)
return actual_size;
}

-void JSONFileHandle::ReadAtPosition(char *pointer, idx_t size, idx_t position, bool sample_run) {
+void JSONFileHandle::ReadAtPosition(char *pointer, idx_t size, idx_t position, bool sample_run,
+optional_ptr<FileHandle> override_handle) {
D_ASSERT(size != 0);
-if (plain_file_source) {
-file_handle->Read(pointer, size, position);
+auto &handle = override_handle ? *override_handle.get() : *file_handle.get();
+
+if (can_seek) {
+handle.Read(pointer, size, position);
} else if (sample_run) { // Cache the buffer
-file_handle->Read(pointer, size, position);
+handle.Read(pointer, size, position);

cached_buffers.emplace_back(allocator.Allocate(size));
memcpy(cached_buffers.back().get(), pointer, size);
@@ -84,7 +90,7 @@ void JSONFileHandle::ReadAtPosition(char *pointer, idx_t size, idx_t position, b
}

if (size != 0) {
-file_handle->Read(pointer, size, position);
+handle.Read(pointer, size, position);
}
}
if (++actual_reads > requested_reads) {
@@ -94,7 +100,7 @@ void JSONFileHandle::ReadAtPosition(char *pointer, idx_t size, idx_t position, b

idx_t JSONFileHandle::Read(char *pointer, idx_t requested_size, bool sample_run) {
D_ASSERT(requested_size != 0);
-if (plain_file_source) {
+if (can_seek) {
auto actual_size = ReadInternal(pointer, requested_size);
read_position += actual_size;
return actual_size;
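Review note: ReadAtPosition now takes an optional override handle and falls back to the reader's own handle when none is given. A minimal sketch of that fallback is below, assuming an in-memory stand-in for the file. SketchHandle and SketchReader are hypothetical names, and a raw pointer replaces optional_ptr<FileHandle> to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory "file" standing in for a FileHandle; not the DuckDB class.
struct SketchHandle {
	std::string contents;

	// Positional read: copy `size` bytes starting at `position` into `out`.
	void Read(char *out, std::size_t size, std::size_t position) const {
		assert(position + size <= contents.size());
		std::memcpy(out, contents.data() + position, size);
	}
};

// Owns a default handle; callers may pass an override (for example a per-thread handle).
struct SketchReader {
	SketchHandle default_handle;

	void ReadAtPosition(char *out, std::size_t size, std::size_t position,
	                    const SketchHandle *override_handle = nullptr) const {
		assert(size != 0);
		// Fall back to the shared handle when no override is supplied.
		const SketchHandle &handle = override_handle ? *override_handle : default_handle;
		handle.Read(out, size, position);
	}
};
```

Defaulting the override to nullptr keeps existing call sites unchanged, which matches how the header declares the new parameter.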
7 changes: 5 additions & 2 deletions extension/json/include/buffered_json_reader.hpp
@@ -66,9 +66,13 @@ struct JSONFileHandle {

bool CanSeek() const;

+FileHandle &GetHandle();
+
idx_t GetPositionAndSize(idx_t &position, idx_t requested_size);
-void ReadAtPosition(char *pointer, idx_t size, idx_t position, bool sample_run);
idx_t Read(char *pointer, idx_t requested_size, bool sample_run);
+//! Read at position optionally allows passing a custom handle to read from, otherwise the default one is used
+void ReadAtPosition(char *pointer, idx_t size, idx_t position, bool sample_run,
+optional_ptr<FileHandle> override_handle = nullptr);

private:
idx_t ReadInternal(char *pointer, const idx_t requested_size);
@@ -81,7 +85,6 @@ struct JSONFileHandle {

//! File properties
const bool can_seek;
-const bool plain_file_source;
const idx_t file_size;

//! Read properties
6 changes: 6 additions & 0 deletions extension/json/include/json_scan.hpp
@@ -250,6 +250,12 @@ struct JSONScanLocalState {
//! Whether this is the last batch of the file
bool is_last;

+//! The current main filesystem
+FileSystem &fs;
+
+//! For some filesystems (e.g. S3), using a filehandle per thread increases performance
+unique_ptr<FileHandle> thread_local_filehandle;
+
//! Current buffer read info
char *buffer_ptr;
idx_t buffer_size;
12 changes: 8 additions & 4 deletions extension/json/json_functions/copy_json.cpp
@@ -12,7 +12,7 @@
namespace duckdb {

static void ThrowJSONCopyParameterException(const string &loption) {
-throw BinderException("COPY (FORMAT JSON) parameter %s expects a single argument.");
+throw BinderException("COPY (FORMAT JSON) parameter %s expects a single argument.", loption);
}

static BoundStatement CopyToJSONPlan(Binder &binder, CopyStatement &stmt) {
@@ -23,7 +23,8 @@ static BoundStatement CopyToJSONPlan(Binder &binder, CopyStatement &stmt) {
// Parse the options, creating options for the CSV writer while doing so
string date_format;
string timestamp_format;
-case_insensitive_map_t<vector<Value>> csv_copy_options;
+// We insert the JSON file extension here so it works properly with PER_THREAD_OUTPUT/FILE_SIZE_BYTES etc.
+case_insensitive_map_t<vector<Value>> csv_copy_options {{"file_extension", {"json"}}};
for (const auto &kv : info.options) {
const auto &loption = StringUtil::Lower(kv.first);
if (loption == "dateformat" || loption == "date_format") {
@@ -36,8 +37,6 @@ static BoundStatement CopyToJSONPlan(Binder &binder, CopyStatement &stmt) {
ThrowJSONCopyParameterException(loption);
}
timestamp_format = StringValue::Get(kv.second.back());
-} else if (loption == "compression") {
-csv_copy_options.insert(kv);
} else if (loption == "array") {
if (kv.second.size() > 1) {
ThrowJSONCopyParameterException(loption);
@@ -47,6 +46,11 @@ static BoundStatement CopyToJSONPlan(Binder &binder, CopyStatement &stmt) {
csv_copy_options["suffix"] = {"\n]\n"};
csv_copy_options["new_line"] = {",\n\t"};
}
+} else if (loption == "compression" || loption == "encoding" || loption == "per_thread_output" ||
+loption == "file_size_bytes" || loption == "use_tmp_file" || loption == "overwrite_or_ignore" ||
+loption == "filename_pattern" || loption == "file_extension") {
+// We support these base options
+csv_copy_options.insert(kv);
} else {
throw BinderException("Unknown option for COPY ... TO ... (FORMAT JSON): \"%s\".", loption);
}
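Review note: two things change in this file. The error message now receives the option name it formats, and a fixed set of base options is forwarded to the underlying writer unchanged. The sketch below illustrates both under simplifying assumptions: SketchLower, SketchIsForwardedOption and SketchSingleArgumentError are hypothetical helpers, plain string concatenation stands in for BinderException's formatting, and the option names are copied from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Lowercase an option name so matching is case-insensitive (ASCII only).
inline std::string SketchLower(std::string s) {
	std::transform(s.begin(), s.end(), s.begin(),
	               [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
	return s;
}

// Base options that are passed through to the underlying writer unchanged.
inline bool SketchIsForwardedOption(const std::string &name) {
	assert(!name.empty()); // option names are assumed non-empty in this sketch
	static const std::set<std::string> forwarded = {
	    "compression",  "encoding",            "per_thread_output", "file_size_bytes",
	    "use_tmp_file", "overwrite_or_ignore", "filename_pattern",  "file_extension"};
	return forwarded.count(SketchLower(name)) > 0;
}

// Build the error text with the option name substituted in. The old call passed
// a "%s" format string without the matching argument, so the name was never filled in.
inline std::string SketchSingleArgumentError(const std::string &option) {
	return "COPY (FORMAT JSON) parameter " + option + " expects a single argument.";
}
```

Keeping the forwarded names in one set makes it easier to see which options the JSON writer handles itself (date and timestamp formats, array) and which it delegates.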
17 changes: 15 additions & 2 deletions extension/json/json_scan.cpp
@@ -147,7 +147,8 @@ JSONScanGlobalState::JSONScanGlobalState(ClientContext &context, const JSONScanD
JSONScanLocalState::JSONScanLocalState(ClientContext &context, JSONScanGlobalState &gstate)
: scan_count(0), batch_index(DConstants::INVALID_INDEX), total_read_size(0), total_tuple_count(0),
bind_data(gstate.bind_data), allocator(BufferAllocator::Get(context)), current_reader(nullptr),
-current_buffer_handle(nullptr), is_last(false), buffer_size(0), buffer_offset(0), prev_buffer_remainder(0) {
+current_buffer_handle(nullptr), is_last(false), fs(FileSystem::GetFileSystem(context)),
+thread_local_filehandle(nullptr), buffer_size(0), buffer_offset(0), prev_buffer_remainder(0) {

// Buffer to reconstruct JSON values when they cross a buffer boundary
reconstruct_buffer = gstate.allocator.Allocate(gstate.buffer_capacity);
@@ -718,9 +719,21 @@ void JSONScanLocalState::ReadNextBufferSeek(JSONScanGlobalState &gstate, optiona
return;
}

+auto &raw_handle = file_handle.GetHandle();
+// For non-on-disk files, we create a handle per thread: this is faster for e.g. S3Filesystem where throttling
+// per tcp connection can occur meaning that using multiple connections is faster.
+if (!raw_handle.OnDiskFile() && raw_handle.CanSeek()) {
+if (!thread_local_filehandle || thread_local_filehandle->GetPath() != raw_handle.GetPath()) {
+thread_local_filehandle =
+fs.OpenFile(raw_handle.GetPath(), FileFlags::FILE_FLAGS_READ | FileFlags::FILE_FLAGS_DIRECT_IO);
+}
+} else if (thread_local_filehandle) {
+thread_local_filehandle = nullptr;
+}
+
// Now read the file lock-free!
file_handle.ReadAtPosition(buffer_ptr + prev_buffer_remainder, read_size, read_position,
-gstate.bind_data.type == JSONScanType::SAMPLE);
+gstate.bind_data.type == JSONScanType::SAMPLE, thread_local_filehandle);
}

void JSONScanLocalState::ReadNextBufferNoSeek(JSONScanGlobalState &gstate, optional_idx &buffer_index) {
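Review note: the new block in ReadNextBufferSeek decides when a thread opens (or drops) its own handle. The decision can be sketched in isolation as below. SketchFile and SketchThreadState are hypothetical, copying the struct stands in for fs.OpenFile, and open_count exists only so the reuse behaviour can be observed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a FileHandle; only the properties the decision needs.
struct SketchFile {
	std::string path;
	bool on_disk;
	bool can_seek;
};

// Per-thread state: at most one private handle, reused while the path is unchanged.
struct SketchThreadState {
	std::unique_ptr<SketchFile> local_handle;
	int open_count = 0; // counts (re)opens, for illustration only

	// Returns the handle to read from: a private one for seekable non-disk files,
	// otherwise nullptr, meaning "use the shared handle".
	const SketchFile *Prepare(const SketchFile &shared) {
		assert(!shared.path.empty());
		if (!shared.on_disk && shared.can_seek) {
			if (!local_handle || local_handle->path != shared.path) {
				// Stand-in for opening a fresh private handle on the same path.
				local_handle.reset(new SketchFile(shared));
				open_count++;
			}
		} else if (local_handle) {
			// Local or non-seekable source: drop any stale private handle.
			local_handle.reset();
		}
		return local_handle.get();
	}
};
```

Comparing paths before reopening means a thread that keeps scanning the same remote file pays the open cost once, while moving on to a different file replaces the handle.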
1 change: 0 additions & 1 deletion extension/parquet/column_reader.cpp
@@ -20,7 +20,6 @@
#ifndef DUCKDB_AMALGAMATION
#include "duckdb/common/types/bit.hpp"
#include "duckdb/common/types/blob.hpp"
-#include "duckdb/common/types/chunk_collection.hpp"
#endif

namespace duckdb {
1 change: 0 additions & 1 deletion extension/parquet/column_writer.cpp
@@ -13,7 +13,6 @@
#include "duckdb/common/serializer/memory_stream.hpp"
#include "duckdb/common/serializer/write_stream.hpp"
#include "duckdb/common/string_map_set.hpp"
-#include "duckdb/common/types/chunk_collection.hpp"
#include "duckdb/common/types/date.hpp"
#include "duckdb/common/types/hugeint.hpp"
#include "duckdb/common/types/uhugeint.hpp"
1 change: 0 additions & 1 deletion extension/parquet/include/column_reader.hpp
@@ -19,7 +19,6 @@
#ifndef DUCKDB_AMALGAMATION

#include "duckdb/common/operator/cast_operators.hpp"
-#include "duckdb/common/types/chunk_collection.hpp"
#include "duckdb/common/types/string_type.hpp"
#include "duckdb/common/types/vector.hpp"
#include "duckdb/common/types/vector_cache.hpp"
4 changes: 4 additions & 0 deletions extension/parquet/include/parquet_writer.hpp
@@ -87,6 +87,10 @@ class ParquetWriter {
BufferedFileWriter &GetWriter() {
return *writer;
}
+idx_t FileSize() {
+lock_guard<mutex> glock(lock);
+return writer->total_written;
+}

static CopyTypeSupport TypeIsSupported(const LogicalType &type);
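Review note: FileSize() reads the writer's byte counter while holding the writer lock, presumably so a size check (such as one driven by FILE_SIZE_BYTES) does not race with a concurrent flush. A minimal sketch of that getter shape is below. SketchWriter is hypothetical, and it assumes writes take the same mutex, which this hunk alone does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical writer: the byte counter is updated and read under one mutex.
class SketchWriter {
public:
	// Assumption: every write path takes `lock` before touching the counter.
	void Write(std::size_t bytes) {
		std::lock_guard<std::mutex> guard(lock);
		assert(total_written + bytes >= total_written); // no wrap-around
		total_written += bytes;
	}

	// Same shape as FileSize() in the diff: take the lock, then read the counter.
	std::size_t FileSize() {
		std::lock_guard<std::mutex> guard(lock);
		return total_written;
	}

private:
	std::mutex lock;
	std::size_t total_written = 0;
};
```

Returning the value by copy while the guard is alive means callers never observe a partially updated counter, at the cost of briefly contending with writers.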
