A new system table, system.unicode #80055

@alexey-milovidov

Company or project name

ClickHouse

Use case

Search for particular code points, such as emojis. This will be especially useful from clickhouse-local.

Describe the solution you'd like

The table should contain one record for every existing Unicode code point. It is generated on the fly.

Each record will contain the numeric value of the code point, a string with its UTF-8 representation, and a few more convenience fields, such as string representations in U+XXXX notation and in JavaScript notation. It should also contain every property from this list: https://unicode-org.github.io/icu/userguide/strings/properties.html
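To make the proposed row shape concrete, here is a minimal Python sketch of how such records could be generated lazily. It is not the ClickHouse implementation. The column names (`code_point`, `value`, `notation`, `js_notation`, `name`, `general_category`) are hypothetical, and it uses Python's `unicodedata` module instead of ICU, so it covers only two of the many properties. It also assumes that "every existing code point" means every assigned, non-surrogate code point, and that "JavaScript notation" means the `\uXXXX` / `\u{X...}` escape forms.

```python
import unicodedata
from typing import Iterator, NamedTuple


class UnicodeRow(NamedTuple):
    """One record of the sketched table; all field names are hypothetical."""
    code_point: int        # numeric value of the code point
    value: str             # the character itself (UTF-8 bytes once encoded)
    notation: str          # U+XXXX notation
    js_notation: str       # JavaScript escape sequence
    name: str              # Unicode character name, empty if it has none
    general_category: str  # two-letter General_Category, e.g. "Lu" or "So"


def js_escape(cp: int) -> str:
    """Assumed JavaScript notation: \\uXXXX in the BMP, \\u{X...} above it."""
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"


def make_row(cp: int) -> UnicodeRow:
    """Build the record for a single code point."""
    ch = chr(cp)
    return UnicodeRow(
        code_point=cp,
        value=ch,
        notation=f"U+{cp:04X}",
        js_notation=js_escape(cp),
        name=unicodedata.name(ch, ""),
        general_category=unicodedata.category(ch),
    )


def unicode_rows() -> Iterator[UnicodeRow]:
    """Yield rows on the fly, one per assigned, non-surrogate code point."""
    for cp in range(0x110000):
        # Assumption: skip surrogates (Cs), which cannot be encoded as UTF-8,
        # and unassigned code points (Cn).
        if unicodedata.category(chr(cp)) in ("Cs", "Cn"):
            continue
        yield make_row(cp)
```

A search such as `[r for r in unicode_rows() if "ROCKET" in r.name]` then corresponds to the emoji lookup described in the use case. Which code points are assigned depends on the Unicode version bundled with the Python build.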

Describe alternatives you've considered

The idea is from https://github.com/arp242/uni

Additional context

No response

Labels

feature, warmup task (The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.)
