Add UUIDv7-generating functions #62852

Merged
merged 37 commits into from
May 2, 2024
Commits
d3a58ff
Added generateUUIDv7* functions
pet74alex Apr 22, 2024
142ce60
Added UUIDToNum and UUDv7ToDateTime functions
pet74alex Apr 22, 2024
4e4e72e
Update English version of uuid-functions.md
pet74alex Apr 22, 2024
c053d5e
Small fix in generateUUIDv7WithFastCounter documentation
pet74alex Apr 22, 2024
447aa5b
Mistypes fixes in generateUUIDv7.cpp
pet74alex Apr 23, 2024
e9f80b8
Update aspell-dict.txt
pet74alex Apr 23, 2024
2ba6be6
Small style fix in generateUUIDv7.cpp
pet74alex Apr 23, 2024
9c744e5
Update generateUUIDv7.cpp for style check test
pet74alex Apr 23, 2024
35d700a
Update generateUUIDv7.cpp small fixes for clang-tidy checks
pet74alex Apr 23, 2024
7c24d4f
Update Russian version of uuid-functions.md
pet74alex Apr 24, 2024
3344f43
UUIDv7 tests & fix for MS UUID representation in UUIDToNum (#2)
pet74alex Apr 24, 2024
4fb26f0
Update 00396_uuid.sql
pet74alex Apr 24, 2024
de99a4b
UUIDv7 tests in separate files (#3)
pet74alex Apr 24, 2024
218591e
Cosmetics
rschu1ze Apr 26, 2024
c3be672
Cosmetics, pt. II
rschu1ze Apr 26, 2024
c68a96e
Cosmetics, pt. III
rschu1ze Apr 26, 2024
9235845
Performance optimizations + docs and tests changes (#4)
pet74alex Apr 27, 2024
37a9a2c
Update aspell-dict.txt
pet74alex Apr 27, 2024
113ad9b
Update generateUUIDv7.cpp
pet74alex Apr 27, 2024
e351b51
Update generateUUIDv7.cpp
pet74alex Apr 27, 2024
1c5c97f
Update generateUUIDv7.cpp
pet74alex Apr 27, 2024
6ef5df9
Update generateUUIDv7.cpp
pet74alex Apr 27, 2024
74a9b71
Update generateUUIDv7.cpp
pet74alex Apr 27, 2024
fcc7737
Cosmetics, pt. IV
rschu1ze Apr 28, 2024
29e70f5
In-source and Russian .md UUIDv7 docs synced with English .md docs
pet74alex Apr 29, 2024
4ac190f
Added negative tests for UUIDToNum and UUIDv7ToDateTime functions
pet74alex Apr 29, 2024
c8847d8
Fixes for spell-checker
pet74alex Apr 29, 2024
6985784
Merge branch 'master' into UUIDv7
pet74alex Apr 29, 2024
918abe8
Fixed test in 00396_uuid_v7.sql
pet74alex Apr 29, 2024
93df064
Fixes for the binary tidy build
pet74alex Apr 30, 2024
5660769
Cosmetics, pt. V
rschu1ze Apr 30, 2024
3bf2982
Add a warning about the UUID sort order
rschu1ze Apr 30, 2024
c820bc3
Fix memory sanitizer report
rschu1ze Apr 30, 2024
0e8575f
Remove UUIDv7ToDateTime due to memory sanitizer issues
rschu1ze Apr 30, 2024
dde3462
Revert "Remove UUIDv7ToDateTime due to memory sanitizer issues"
pet74alex Apr 30, 2024
668c83b
Fix for memory sanitizer
pet74alex Apr 30, 2024
d2070e5
Update FunctionsCodingUUID.cpp
pet74alex Apr 30, 2024
43 changes: 42 additions & 1 deletion docs/en/sql-reference/data-types/uuid.md
@@ -8,7 +8,8 @@ sidebar_label: UUID

A Universally Unique Identifier (UUID) is a 16-byte value used to identify records. For detailed information about UUIDs, see [Wikipedia](https://en.wikipedia.org/wiki/Universally_unique_identifier).

While different UUID variants exist (see [here](https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis)), ClickHouse does not validate that inserted UUIDs conform to a particular variant. UUIDs are internally treated as a sequence of 16 random bytes with [8-4-4-4-12 representation](https://en.wikipedia.org/wiki/Universally_unique_identifier#Textual_representation) at SQL level.
While different UUID variants exist (see [here](https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis)), ClickHouse does not validate that inserted UUIDs conform to a particular variant.
UUIDs are internally treated as a sequence of 16 random bytes with [8-4-4-4-12 representation](https://en.wikipedia.org/wiki/Universally_unique_identifier#Textual_representation) at SQL level.

Example UUID value:

@@ -22,6 +23,46 @@ The default UUID is all-zero. It is used, for example, when a new record is inse
00000000-0000-0000-0000-000000000000
```

Due to historical reasons, UUIDs are sorted by their second half (which is unintuitive).
UUIDs should therefore not be used in a primary key (or sorting key) of a table, or as a partition key.

Example:

``` sql
CREATE TABLE tab (uuid UUID) ENGINE = Memory;
INSERT INTO tab SELECT generateUUIDv4() FROM numbers(50);
SELECT * FROM tab ORDER BY uuid;
```

Result:

``` text
┌─uuid─────────────────────────────────┐
│ 36a0b67c-b74a-4640-803b-e44bb4547e3c │
│ 3a00aeb8-2605-4eec-8215-08c0ecb51112 │
│ 3fda7c49-282e-421a-85ab-c5684ef1d350 │
│ 16ab55a7-45f6-44a8-873c-7a0b44346b3e │
│ e3776711-6359-4f22-878d-bf290d052c85 │
│ 1be30226-57b2-4739-88ec-5e3d490090f2 │
│ f65853a9-4375-4f0e-8b96-906ff622ed3c │
│ d5a0c7a6-79c6-4107-8bb8-df85915edcb7 │
│ 258e6068-17d1-4a1a-8be3-ed2ceb21815c │
│ 04b0f6a9-1f7b-4a42-8bfc-62f37b8a32b8 │
│ 9924f0d9-9c16-43a9-8f08-0944ab495aed │
│ 6720dc14-4eab-4e3e-8f0c-10c4ae8d2673 │
│ 5ddadb52-0452-4f5d-9030-c3f969af93a4 │
│ [...] │
│ 2dde30e6-59a1-48f8-b260-eb37921185b6 │
│ d5402a1b-77b3-4897-b288-29edf5c3ed12 │
│ 01843939-3ba7-4fea-b2aa-45f9a6f1e057 │
│ 9eceda2f-6946-40e3-b725-16f2709ca41a │
│ 03644f74-47ba-4020-b865-be5fd4c8c7ff │
│ ce3bc93d-ab19-4c74-b8cc-737cb9212099 │
│ b7ad6c91-23d6-4b5e-b8e4-a52297490b56 │
│ 06892f64-cc2d-45f3-bf86-f5c5af5768a9 │
└──────────────────────────────────────┘
```
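
The ordering in the result above can be reproduced with a small C++ sketch. It is only an illustration of "sorted by the second half": the `Uuid` struct and both helper functions are hypothetical names, not ClickHouse code, and the comparison rule (low 64 bits first, then high 64 bits, both unsigned) is an assumption inferred from the sample output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a UUID: the first 16 hex digits of the textual
// form go into `high`, the last 16 hex digits into `low`.
struct Uuid
{
    std::uint64_t high;
    std::uint64_t low;
};

// Assumed ordering: compare the second (low) half first and fall back to the
// first (high) half only when the second halves are equal.
inline bool lessBySecondHalf(const Uuid & a, const Uuid & b)
{
    if (a.low != b.low)
        return a.low < b.low;
    return a.high < b.high;
}

// Returns a copy of the input sorted with the comparison above.
inline std::vector<Uuid> sortedBySecondHalf(std::vector<Uuid> uuids)
{
    std::sort(uuids.begin(), uuids.end(), lessBySecondHalf);
    return uuids;
}
```

Applied to three values from the result table, this puts `36a0b67c-...-803b-...` before `3a00aeb8-...-8215-...` before `16ab55a7-...-873c-...`, whereas an ordering on the first half would start with `16ab55a7-...`.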

## Generating UUIDs

ClickHouse provides the [generateUUIDv4](../../sql-reference/functions/uuid-functions.md) function to generate random UUID version 4 values.
397 changes: 365 additions & 32 deletions docs/en/sql-reference/functions/uuid-functions.md

Large diffs are not rendered by default.

246 changes: 246 additions & 0 deletions docs/ru/sql-reference/functions/uuid-functions.md

Large diffs are not rendered by default.

201 changes: 193 additions & 8 deletions src/Functions/FunctionsCodingUUID.cpp
@@ -1,14 +1,18 @@
#include <Columns/ColumnDecimal.h>
#include <Columns/ColumnsDateTime.h>
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnVector.h>
#include <Common/BitHelpers.h>
#include <base/hex.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeFixedString.h>
#include <DataTypes/DataTypeUUID.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunction.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/extractTimeZoneFromFunctionArguments.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/Context_fwd.h>
#include <Interpreters/castColumn.h>
@@ -17,11 +21,11 @@

namespace DB::ErrorCodes
{
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int ILLEGAL_COLUMN;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int LOGICAL_ERROR;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ARGUMENT_OUT_OF_BOUND;
extern const int ILLEGAL_COLUMN;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int LOGICAL_ERROR;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}

namespace
@@ -32,7 +36,7 @@ enum class Representation
LittleEndian
};

std::pair<int, int> determineBinaryStartIndexWithIncrement(const ptrdiff_t num_bytes, const Representation representation)
std::pair<int, int> determineBinaryStartIndexWithIncrement(ptrdiff_t num_bytes, Representation representation)
{
if (representation == Representation::BigEndian)
return {0, 1};
@@ -42,15 +46,15 @@ std::pair<int, int> determineBinaryStartIndexWithIncrement(const ptrdiff_t num_b
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "{} is not handled yet", magic_enum::enum_name(representation));
}

void formatHex(const std::span<const UInt8> src, UInt8 * dst, const Representation representation)
void formatHex(const std::span<const UInt8> src, UInt8 * dst, Representation representation)
{
const auto src_size = std::ssize(src);
const auto [src_start_index, src_increment] = determineBinaryStartIndexWithIncrement(src_size, representation);
for (int src_pos = src_start_index, dst_pos = 0; src_pos >= 0 && src_pos < src_size; src_pos += src_increment, dst_pos += 2)
writeHexByteLowercase(src[src_pos], dst + dst_pos);
}

void parseHex(const UInt8 * __restrict src, const std::span<UInt8> dst, const Representation representation)
void parseHex(const UInt8 * __restrict src, const std::span<UInt8> dst, Representation representation)
{
const auto dst_size = std::ssize(dst);
const auto [dst_start_index, dst_increment] = determineBinaryStartIndexWithIncrement(dst_size, representation);
@@ -322,10 +326,191 @@ class FunctionUUIDStringToNum : public IFunction
}
};


class FunctionUUIDToNum : public IFunction
{
public:
static constexpr auto name = "UUIDToNum";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionUUIDToNum>(); }

String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
bool isInjective(const ColumnsWithTypeAndName &) const override { return true; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool isVariadic() const override { return true; }

DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
checkArgumentCount(arguments, name);

if (!isUUID(arguments[0]))
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of first argument of function {}, expected UUID",
arguments[0]->getName(),
getName());
}

checkFormatArgument(arguments, name);

return std::make_shared<DataTypeFixedString>(uuid_bytes_length);
}

ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnWithTypeAndName & col_type_name = arguments[0];
const ColumnPtr & column = col_type_name.column;

const bool defaultFormat = (parseVariant(arguments) == UUIDSerializer::Variant::Default);

if (const auto * col_in = checkAndGetColumn<ColumnUUID>(column.get()))
{
const auto & vec_in = col_in->getData();
const UUID * uuids = vec_in.data();
const size_t size = vec_in.size();

auto col_res = ColumnFixedString::create(uuid_bytes_length);

ColumnString::Chars & vec_res = col_res->getChars();
vec_res.resize(size * uuid_bytes_length);

size_t dst_offset = 0;

for (size_t i = 0; i < size; ++i)
{
uint64_t hiBytes = DB::UUIDHelpers::getHighBytes(uuids[i]);
uint64_t loBytes = DB::UUIDHelpers::getLowBytes(uuids[i]);
unalignedStoreBigEndian<uint64_t>(&vec_res[dst_offset], hiBytes);
unalignedStoreBigEndian<uint64_t>(&vec_res[dst_offset + sizeof(hiBytes)], loBytes);
if (!defaultFormat)
{
std::swap(vec_res[dst_offset], vec_res[dst_offset + 3]);
std::swap(vec_res[dst_offset + 1], vec_res[dst_offset + 2]);
std::swap(vec_res[dst_offset + 4], vec_res[dst_offset + 5]);
std::swap(vec_res[dst_offset + 6], vec_res[dst_offset + 7]);
}
dst_offset += uuid_bytes_length;
}

return col_res;
}
else
throw Exception(
ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of argument of function {}", arguments[0].column->getName(), getName());
}
};
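
The byte layout produced by the loop in `FunctionUUIDToNum::executeImpl` can be checked in isolation. The following is a minimal sketch in plain C++, assuming the two halves of the UUID are available as 64-bit integers; `storeBigEndian` is a stand-in for `unalignedStoreBigEndian`, and `uuidToBytes` is a hypothetical helper that is not part of the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Writes a 64-bit value most-significant byte first
// (plain-C++ stand-in for unalignedStoreBigEndian).
inline void storeBigEndian(std::uint64_t value, unsigned char * dst)
{
    for (int i = 7; i >= 0; --i)
    {
        dst[i] = static_cast<unsigned char>(value & 0xff);
        value >>= 8;
    }
}

// Hypothetical helper mirroring the loop body of UUIDToNum: big-endian bytes
// by default; for the Microsoft variant the first three groups (4 + 2 + 2
// bytes) are byte-reversed, exactly as the four std::swap calls do.
inline std::string uuidToBytes(std::uint64_t high, std::uint64_t low, bool microsoft_variant)
{
    std::array<unsigned char, 16> bytes{};
    storeBigEndian(high, bytes.data());
    storeBigEndian(low, bytes.data() + 8);
    if (microsoft_variant)
    {
        std::swap(bytes[0], bytes[3]);
        std::swap(bytes[1], bytes[2]);
        std::swap(bytes[4], bytes[5]);
        std::swap(bytes[6], bytes[7]);
    }
    return std::string(bytes.begin(), bytes.end());
}
```

For `612f3c40-5d3b-217e-707b-6a546a3d7b29` the default layout starts with the bytes `61 2f 3c 40 5d 3b 21 7e`, and the Microsoft layout starts with `40 3c 2f 61 3b 5d 7e 21`; the last eight bytes are identical in both.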

class FunctionUUIDv7ToDateTime : public IFunction
{
public:
static constexpr auto name = "UUIDv7ToDateTime";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionUUIDv7ToDateTime>(); }

static constexpr UInt32 datetime_scale = 3;

String getName() const override { return name; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool isVariadic() const override { return true; }

DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.empty() || arguments.size() > 2)
throw Exception(
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Wrong number of arguments for function {}: should be 1 or 2", getName());

if (!checkAndGetDataType<DataTypeUUID>(arguments[0].type.get()))
{
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal type {} of first argument of function {}, expected UUID",
arguments[0].type->getName(),
getName());
}

String timezone;
if (arguments.size() == 2)
{
timezone = extractTimeZoneNameFromColumn(arguments[1].column.get(), arguments[1].name);

if (timezone.empty())
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Function {} supports a 2nd argument (optional) that must be a valid time zone",
getName());
}

return std::make_shared<DataTypeDateTime64>(datetime_scale, timezone);
}

ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override
{
const ColumnWithTypeAndName & col_type_name = arguments[0];
const ColumnPtr & column = col_type_name.column;

if (const auto * col_in = checkAndGetColumn<ColumnUUID>(column.get()))
{
const auto & vec_in = col_in->getData();
const UUID * uuids = vec_in.data();
const size_t size = vec_in.size();

auto col_res = ColumnDateTime64::create(size, datetime_scale);
auto & vec_res = col_res->getData();

for (size_t i = 0; i < size; ++i)
{
const uint64_t hiBytes = DB::UUIDHelpers::getHighBytes(uuids[i]);
const uint64_t ms = ((hiBytes & 0xf000) == 0x7000) ? (hiBytes >> 16) : 0;

vec_res[i] = DecimalUtils::decimalFromComponents<DateTime64>(ms / intExp10(datetime_scale), ms % intExp10(datetime_scale), datetime_scale);
}

return col_res;
}
else
throw Exception(
ErrorCodes::ILLEGAL_COLUMN, "Illegal column {} of argument of function {}", arguments[0].column->getName(), getName());
}
};
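
The timestamp arithmetic in `FunctionUUIDv7ToDateTime::executeImpl` reduces to one mask, one shift, and one division. A minimal sketch, assuming the high 64 bits of the UUID are already available as an integer; `UnixTimeMs`, `uuidv7Milliseconds`, and `splitMilliseconds` are hypothetical names used only for illustration, and the constant 1000 corresponds to `intExp10(datetime_scale)` with a scale of 3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: whole seconds plus the millisecond remainder,
// i.e. the two components passed to decimalFromComponents in the PR.
struct UnixTimeMs
{
    std::uint64_t seconds;
    std::uint64_t millis;
};

// Returns the 48-bit Unix timestamp in milliseconds stored in the top bits of
// the high half, or 0 when the version nibble (bits 12..15) is not 7.
constexpr std::uint64_t uuidv7Milliseconds(std::uint64_t high)
{
    return ((high & 0xf000) == 0x7000) ? (high >> 16) : 0;
}

// Splits milliseconds into seconds and a millisecond fraction (scale 3).
constexpr UnixTimeMs splitMilliseconds(std::uint64_t ms)
{
    return {ms / 1000, ms % 1000};
}
```

As an illustrative value, a high half of `0x017F22E279B07CC3` carries version 7 and yields 1645557742000 ms, that is 1645557742 seconds with a fraction of 0. A UUID of any other version yields 0, which the function maps to the Unix epoch rather than raising an error.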

REGISTER_FUNCTION(CodingUUID)
{
factory.registerFunction<FunctionUUIDNumToString>();
factory.registerFunction<FunctionUUIDStringToNum>();
factory.registerFunction<FunctionUUIDToNum>(
FunctionDocumentation{
.description = R"(
This function accepts a UUID and returns a FixedString(16) as its binary representation, with its format optionally specified by variant (Big-endian by default).
)",
.examples{
{"uuid",
"select toUUID(UUIDNumToString(toFixedString('a/<@];!~p{jTj={)', 16))) as uuid, UUIDToNum(uuid) as uuidNum, "
"UUIDToNum(uuid, 2) as uuidMsNum",
R"(
┌─uuid─────────────────────────────────┬─uuidNum──────────┬─uuidMsNum────────┐
│ 612f3c40-5d3b-217e-707b-6a546a3d7b29 │ a/<@];!~p{jTj={) │ @</a;]~!p{jTj={) │
└──────────────────────────────────────┴──────────────────┴──────────────────┘
)"}},
.categories{"UUID"}},
FunctionFactory::CaseSensitive);

factory.registerFunction<FunctionUUIDv7ToDateTime>(
FunctionDocumentation{
.description = R"(
This function extracts the timestamp from a UUID and returns it as a DateTime64(3) typed value.
The function expects a version 7 UUID as its first argument.
An optional second argument can be passed to specify a timezone for the timestamp.
)",
.examples{
{"uuid","select UUIDv7ToDateTime(generateUUIDv7())", ""},
{"uuid","select generateUUIDv7() as uuid, UUIDv7ToDateTime(uuid), UUIDv7ToDateTime(uuid, 'America/New_York')", ""}},
.categories{"UUID"}},
FunctionFactory::CaseSensitive);
}

}
32 changes: 12 additions & 20 deletions src/Functions/generateUUIDv4.cpp
@@ -1,15 +1,11 @@
#include <DataTypes/DataTypeUUID.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionsRandom.h>
#include <DataTypes/DataTypeUUID.h>

namespace DB
{

namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}

#define DECLARE_SEVERAL_IMPLEMENTATIONS(...) \
DECLARE_DEFAULT_CODE (__VA_ARGS__) \
DECLARE_AVX2_SPECIFIC_CODE(__VA_ARGS__)
@@ -21,30 +17,26 @@ class FunctionGenerateUUIDv4 : public IFunction
public:
static constexpr auto name = "generateUUIDv4";

String getName() const override
{
return name;
}
String getName() const override { return name; }

size_t getNumberOfArguments() const override { return 0; }

bool isDeterministic() const override { return false; }
bool isDeterministicInScopeOfQuery() const override { return false; }
bool useDefaultImplementationForNulls() const override { return false; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool isVariadic() const override { return true; }

DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.size() > 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Number of arguments for function {} doesn't match: passed {}, should be 0 or 1.",
getName(), arguments.size());
FunctionArgumentDescriptors mandatory_args;
FunctionArgumentDescriptors optional_args{
{"expr", nullptr, nullptr, "Arbitrary Expression"}
};
validateFunctionArgumentTypes(*this, arguments, mandatory_args, optional_args);

return std::make_shared<DataTypeUUID>();
}

bool isDeterministic() const override { return false; }

ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override
{
auto col_res = ColumnVector<UUID>::create();
@@ -79,10 +71,10 @@ class FunctionGenerateUUIDv4 : public TargetSpecific::Default::FunctionGenerateU
selector.registerImplementation<TargetArch::Default,
TargetSpecific::Default::FunctionGenerateUUIDv4>();

#if USE_MULTITARGET_CODE
#if USE_MULTITARGET_CODE
selector.registerImplementation<TargetArch::AVX2,
TargetSpecific::AVX2::FunctionGenerateUUIDv4>();
#endif
#endif
}

ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override