[WIP] Add documentation for dictionaries DDL #7720

Merged (9 commits) on Nov 13, 2019
24 changes: 24 additions & 0 deletions docs/en/query_language/create.md
@@ -271,3 +271,27 @@ Views look the same as normal tables. For example, they are listed in the result
There isn't a separate query for deleting views. To delete a view, use `DROP TABLE`.

[Original article](https://clickhouse.yandex/docs/en/query_language/create/) <!--hide-->

## CREATE DICTIONARY {#create-dictionary-query}

```sql
CREATE DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
    key1 type1 [DEFAULT|EXPRESSION expr1] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
    key2 type2 [DEFAULT|EXPRESSION expr2] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
    attr1 type3 [DEFAULT|EXPRESSION expr3],
    attr2 type4 [DEFAULT|EXPRESSION expr4]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME([MIN val1] MAX val2)
```

Creates an [external dictionary](dicts/external_dicts.md) with the given [structure](dicts/external_dicts_dict_structure.md), [source](dicts/external_dicts_dict_sources.md), [layout](dicts/external_dicts_dict_layout.md), and [lifetime](dicts/external_dicts_dict_lifetime.md).

The structure of an external dictionary consists of attributes, which are specified similarly to table columns. The only required attribute property is its type; all other properties may have default values.

Depending on the dictionary [layout](dicts/external_dicts_dict_layout.md), one or more attributes can be specified as dictionary keys.

For more information, see the [External Dictionaries](dicts/external_dicts.md) section.
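
As a minimal sketch, assuming a hypothetical source table `default.products` on the local server, reachable as user `default` on port 9000, a complete definition and a lookup might look like this. The table, dictionary, and column names are illustrative only.

```sql
-- Hypothetical source table used only for this sketch.
CREATE TABLE default.products
(
    id UInt64,
    name String
)
ENGINE = MergeTree()
ORDER BY id;

-- Single UInt64 key, hashed layout, reloaded at a random moment
-- between 300 and 360 seconds after the previous load.
CREATE DICTIONARY IF NOT EXISTS default.products_dict
(
    id UInt64,
    name String DEFAULT ''
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' password '' db 'default' table 'products'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);

-- Keys missing from the source return the attribute's DEFAULT value ('').
SELECT dictGetString('default.products_dict', 'name', toUInt64(1));
```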
7 changes: 5 additions & 2 deletions docs/en/query_language/dicts/external_dicts.md
@@ -4,10 +4,11 @@ You can add your own dictionaries from various data sources. The data source for

ClickHouse:

- Fully or partially stores dictionaries in RAM.
- Periodically updates dictionaries and dynamically loads missing values. In other words, dictionaries can be loaded dynamically.
- Allows creating external dictionaries with XML files or [DDL queries](../create.md#create-dictionary-query).

The configuration of external dictionaries can be located in one or more XML files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.

Dictionaries can be loaded at server startup or at first use, depending on the [dictionaries_lazy_load](../../operations/server_settings/settings.md#server_settings-dictionaries_lazy_load) setting.

@@ -31,6 +32,8 @@ The dictionary configuration file has the following format:

You can [configure](external_dicts_dict.md) any number of dictionaries in the same file.

[DDL queries for dictionaries](../create.md#create-dictionary-query) don't require any additional records in the server configuration. They allow you to work with dictionaries as first-class entities, like tables or views.
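
A short sketch of that table-like lifecycle, assuming a hypothetical dictionary `default.products_dict` created with `CREATE DICTIONARY`, and a server version that ships the companion `SHOW CREATE DICTIONARY` and `DROP DICTIONARY` statements:

```sql
-- Print the statement that defines the (hypothetical) dictionary.
SHOW CREATE DICTIONARY default.products_dict;

-- Remove the dictionary, much as DROP TABLE removes a table.
DROP DICTIONARY IF EXISTS default.products_dict;
```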

!!! attention
    You can convert values for a small dictionary by describing it in a `SELECT` query (see the [transform](../functions/other_functions.md) function). This functionality is not related to external dictionaries.

23 changes: 18 additions & 5 deletions docs/en/query_language/dicts/external_dicts_dict.md
@@ -1,11 +1,15 @@
# Configuring an External Dictionary {#dicts-external_dicts_dict}

If the dictionary is configured using an XML file, the dictionary configuration has the following structure:

```xml
<dictionary>
<name>dict_name</name>

<structure>
<!-- Complex key configuration -->
</structure>

<source>
<!-- Source configuration -->
</source>

<layout>
<!-- Memory layout configuration -->
</layout>

<lifetime>
<!-- Lifetime of dictionary in memory -->
</lifetime>
</dictionary>
```

The corresponding [DDL query](../create.md#create-dictionary-query) has the following structure:

```sql
CREATE DICTIONARY dict_name
(
... -- attributes
)
PRIMARY KEY ... -- complex or single key configuration
SOURCE(...) -- Source configuration
LAYOUT(...) -- Memory layout configuration
LIFETIME(...) -- Lifetime of dictionary in memory
```

- name – The identifier that can be used to access the dictionary. Use the characters `[a-zA-Z0-9_\-]`.
- [source](external_dicts_dict_sources.md) — Source of the dictionary.
- [layout](external_dicts_dict_layout.md) — Dictionary layout in memory.
87 changes: 78 additions & 9 deletions docs/en/query_language/dicts/external_dicts_dict_layout.md
@@ -34,6 +34,15 @@ The configuration looks like this:
</yandex>
```

In the case of a [DDL query](../create.md#create-dictionary-query), the equivalent configuration looks like this:

```sql
CREATE DICTIONARY (...)
...
LAYOUT(LAYOUT_TYPE(param value)) -- layout settings
...
```


## Ways to Store Dictionaries in Memory

@@ -64,6 +73,12 @@ Configuration example:
</layout>
```

or

```sql
LAYOUT(FLAT())
```

### hashed {#dicts-external_dicts_dict_layout-hashed}

The dictionary is completely stored in memory in the form of a hash table. The dictionary can contain any number of elements with any identifiers. In practice, the number of keys can reach tens of millions of items.
@@ -78,6 +93,12 @@ Configuration example:
</layout>
```

or

```sql
LAYOUT(HASHED())
```

### sparse_hashed {#dicts-external_dicts_dict_layout-sparse_hashed}

Similar to `hashed`, but uses less memory at the cost of higher CPU usage.
@@ -90,6 +111,9 @@ Configuration example:
</layout>
```

or

```sql
LAYOUT(SPARSE_HASHED())
```

### complex_key_hashed

@@ -103,6 +127,9 @@ Configuration example:
</layout>
```

or

```sql
LAYOUT(COMPLEX_KEY_HASHED())
```
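
A minimal sketch of a dictionary with a composite key, assuming a hypothetical source table `default.regions`. Every attribute listed in `PRIMARY KEY` becomes part of the composite key, and lookups pass the key as a tuple:

```sql
-- Hypothetical dictionary keyed by a (String, UInt32) pair.
CREATE DICTIONARY default.regions_dict
(
    country String,
    region_id UInt32,
    region_name String DEFAULT ''
)
PRIMARY KEY country, region_id
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' db 'default' table 'regions'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 300 MAX 360);

-- The tuple element types must match the key types exactly.
SELECT dictGetString('default.regions_dict', 'region_name', tuple('US', toUInt32(5)));
```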

### range_hashed

@@ -113,15 +140,15 @@ This storage method works the same way as hashed and allows using date/time (arb
Example: The table contains discounts for each advertiser in the format:

```text
+---------------+---------------------+-------------------+--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123           | 2015-01-01          | 2015-01-15        | 0.15   |
+---------------+---------------------+-------------------+--------+
| 123           | 2015-01-16          | 2015-01-31        | 0.25   |
+---------------+---------------------+-------------------+--------+
| 456           | 2015-01-01          | 2015-01-15        | 0.05   |
+---------------+---------------------+-------------------+--------+
```

To use date ranges, define the `range_min` and `range_max` elements in the [structure](external_dicts_dict_structure.md). These elements must contain the `name` and `type` elements (if `type` is not specified, the default type, Date, is used). `type` can be any numeric type (Date, DateTime, UInt64, Int32, or others).
@@ -144,6 +171,19 @@ Example:
...
```

or

```sql
CREATE DICTIONARY somedict (
id UInt64,
first Date,
last Date
)
PRIMARY KEY id
LAYOUT(RANGE_HASHED())
RANGE(MIN first MAX last)
```
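
A fuller sketch for the discounts table above, assuming a hypothetical source table `default.discounts` with the same four columns. The extra `Date` argument selects the range that contains it; if the source holds the three sample rows, the lookup should return 0.25 (the 2015-01-16 to 2015-01-31 range for advertiser 123).

```sql
-- Hypothetical dictionary over the discounts table shown above.
CREATE DICTIONARY default.discounts_dict
(
    advertiser_id UInt64,
    discount_start_date Date,
    discount_end_date Date,
    amount Float64 DEFAULT 0
)
PRIMARY KEY advertiser_id
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' db 'default' table 'discounts'))
LAYOUT(RANGE_HASHED())
RANGE(MIN discount_start_date MAX discount_end_date)
LIFETIME(MIN 300 MAX 360);

-- The fourth argument is the date whose range is looked up.
SELECT dictGetFloat64('default.discounts_dict', 'amount', toUInt64(123), toDate('2015-01-20'));
```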

To work with these dictionaries, you need to pass an additional argument to the `dictGetT` function, for which a range is selected:

```sql
@@ -193,6 +233,18 @@ Configuration example:
</yandex>
```

or

```sql
CREATE DICTIONARY somedict(
Abcdef UInt64,
StartTimeStamp UInt64,
EndTimeStamp UInt64,
XXXType String DEFAULT ''
)
PRIMARY KEY Abcdef
LAYOUT(RANGE_HASHED())
RANGE(MIN StartTimeStamp MAX EndTimeStamp)
```

### cache

@@ -218,6 +270,12 @@ Example of settings:
</layout>
```

or

```sql
LAYOUT(CACHE(SIZE_IN_CELLS 1000000000))
```
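
A minimal sketch of a cache dictionary, assuming a hypothetical MySQL table `app.users` on `localhost` readable by a user `reader`. Only keys that are actually requested are fetched from the source and kept in the cache cells; the `SIZE_IN_CELLS` value is just a starting value to tune.

```sql
-- Hypothetical cache dictionary backed by a MySQL table.
CREATE DICTIONARY default.users_cache_dict
(
    user_id UInt64,
    user_name String DEFAULT ''
)
PRIMARY KEY user_id
SOURCE(MYSQL(host 'localhost' port 3306 user 'reader' password '' db 'app' table 'users'))
LAYOUT(CACHE(SIZE_IN_CELLS 1000000))
LIFETIME(MIN 60 MAX 120);

-- Keys not yet in the cache (or expired) are requested from MySQL on demand.
SELECT dictGetString('default.users_cache_dict', 'user_name', toUInt64(42));
```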

Set a large enough cache size. You need to experiment to select the number of cells:

1. Set some value.
@@ -241,17 +299,17 @@ This type of storage is for mapping network prefixes (IP addresses) to metadata
Example: The table contains network prefixes and their corresponding AS number and country code:

```text
+-----------------+-------+--------+
| prefix          | asn   | cca2   |
+=================+=======+========+
| 202.79.32.0/20  | 17501 | NP     |
+-----------------+-------+--------+
| 2620:0:870::/48 | 3856  | US     |
+-----------------+-------+--------+
| 2a02:6b8:1::/48 | 13238 | RU     |
+-----------------+-------+--------+
| 2001:db8::/32   | 65536 | ZZ     |
+-----------------+-------+--------+
```

When using this type of layout, the structure must have a composite key.
@@ -279,6 +337,17 @@ Example:
...
```

or

```sql
CREATE DICTIONARY somedict (
prefix String,
asn UInt32,
cca2 String DEFAULT '??'
)
PRIMARY KEY prefix
LAYOUT(IP_TRIE())
```

The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.
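
A fuller sketch, assuming a hypothetical source table `default.prefixes` holding the rows shown above. The address is passed inside a tuple: as `UInt32` for IPv4 (from `IPv4StringToNum`) or as `FixedString(16)` for IPv6 (from `IPv6StringToNum`). With those rows, the two lookups should return 17501 and `'ZZ'`.

```sql
-- Hypothetical ip_trie dictionary over network prefixes.
CREATE DICTIONARY default.prefix_dict
(
    prefix String,
    asn UInt32,
    cca2 String DEFAULT '??'
)
PRIMARY KEY prefix
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' db 'default' table 'prefixes'))
LAYOUT(IP_TRIE())
LIFETIME(3600);

-- IPv4: the key is a UInt32 wrapped in a tuple.
SELECT dictGetUInt32('default.prefix_dict', 'asn', tuple(IPv4StringToNum('202.79.32.10')));

-- IPv6: the key is a FixedString(16) wrapped in a tuple.
SELECT dictGetString('default.prefix_dict', 'cca2', tuple(IPv6StringToNum('2001:db8::1')));
```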

For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:
23 changes: 22 additions & 1 deletion docs/en/query_language/dicts/external_dicts_dict_lifetime.md
@@ -15,7 +15,14 @@ Example of settings:
</dictionary>
```

or

```sql
CREATE DICTIONARY (...)
...
LIFETIME(300)
...
```

Setting `<lifetime>0</lifetime>` (`LIFETIME(0)`) prevents dictionaries from updating.

You can set a time interval for updates, and ClickHouse will choose a uniformly random time within this range. This is necessary to distribute the load on the dictionary source when updating on a large number of servers.

@@ -32,6 +39,12 @@ Example of settings:
</dictionary>
```

or

```sql
LIFETIME(MIN 300 MAX 360)
```
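
A sketch combining a full definition with the `MIN`/`MAX` form, assuming a hypothetical source table `default.rates`. Because the key is a `String`, the sketch uses the `complex_key_hashed` layout.

```sql
-- Hypothetical dictionary keyed by currency code.
CREATE DICTIONARY default.rates_dict
(
    currency String,
    rate Float64 DEFAULT 0
)
PRIMARY KEY currency
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' db 'default' table 'rates'))
-- A String key needs a complex-key layout.
LAYOUT(COMPLEX_KEY_HASHED())
-- Each update happens at a uniformly random moment 300 to 360 s after the previous load;
-- LIFETIME(0) would disable updates instead.
LIFETIME(MIN 300 MAX 360);
```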

When updating the dictionaries, the ClickHouse server applies different logic depending on the type of [source](external_dicts_dict_sources.md):

- For a text file, it checks the time of modification. If the time differs from the previously recorded time, the dictionary is updated.
@@ -56,5 +69,13 @@ Example of settings:
</dictionary>
```

or

```sql
...
SOURCE(ODBC(... invalidate_query 'SELECT update_time FROM dictionary_source where id = 1'))
...
```
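
A fuller sketch, assuming an ODBC data source registered under the hypothetical DSN `pg_dsn` that exposes a table `dictionary_source` with `id`, `value`, and `update_time` columns. On each update check the server runs `invalidate_query` and reloads the dictionary only if the result has changed.

```sql
-- Hypothetical ODBC-backed dictionary; the source is re-read
-- only when the invalidate_query result changes.
CREATE DICTIONARY default.odbc_dict
(
    id UInt64,
    value String DEFAULT ''
)
PRIMARY KEY id
SOURCE(ODBC(
    connection_string 'DSN=pg_dsn'
    table 'dictionary_source'
    invalidate_query 'SELECT update_time FROM dictionary_source where id = 1'
))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);
```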


[Original article](https://clickhouse.yandex/docs/en/query_language/dicts/external_dicts_dict_lifetime/) <!--hide-->