Replaced HTML code artifacts.
johnhummelAltinity committed Sep 7, 2021
1 parent 57e51ea commit 8aa5812
Showing 14 changed files with 21 additions and 21 deletions.
@@ -9,7 +9,7 @@ GCS with the table function - seems to work correctly!
Essentially you can follow the steps from the [Migrating from Amazon S3 to Cloud Storage](https://cloud.google.com/storage/docs/migrating#migration-simple) guide.

1. Set up a GCS bucket.
2. This bucket must be set as part of the default project for the account. This configuration can be found in settings -> interoperability.
3. Generate an HMAC key for the account. This can be done in settings -> interoperability, in the section for user account access keys.
4. In ClickHouse, replace the S3 bucket endpoint with the GCS bucket endpoint. This must be done with the path-style GCS endpoint: `https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME`.
5. Replace the AWS access key id and AWS secret access key with the corresponding parts of the HMAC key (see the sketch below).
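A minimal sketch of the result, assuming a hypothetical bucket, object path, and HMAC key pair (replace all of them with your own values):

```sql
-- Reading from GCS through the s3 table function, using the path-style
-- endpoint and the HMAC access key / secret generated above.
SELECT count()
FROM s3(
    'https://storage.googleapis.com/my-gcs-bucket/events/*.csv.gz',
    'GOOG1EXAMPLEACCESSKEY',      -- HMAC access key (placeholder)
    'ExampleSecretKey0123456789', -- HMAC secret (placeholder)
    'CSVWithNames'
);
```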
@@ -19,7 +19,7 @@ There are two ways to get access to the data:
3. Unclean restart which produced broken files, and/or the on-disk state differs too much from the state in ZooKeeper for replicated tables. Fix: create the `force_restore_data` flag.
4. Wrong file permissions for ClickHouse files in the pod. Fix: use chown to set the right ownership for files and directories.
5. Errors in the ClickHouse table schema prevent ClickHouse from starting. Fix: rename problematic `table.sql` scripts to `table.sql.bak`.
6. Occasional failures of distributed queries because of a wrong user/password. Due to the nature of k8s with dynamic IP allocation, ClickHouse may cache a wrong ip -> hostname combination and reject connections because of the mismatched hostname. Fix: run `SYSTEM DROP DNS CACHE;` or set `<disable_internal_dns_cache>1</disable_internal_dns_cache>` in config.xml (see the sketch below).
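A quick sketch of the DNS part of fix 6 (the config alternative is the `disable_internal_dns_cache` setting mentioned above):

```sql
-- Flush the internal DNS cache after pod IPs have changed.
SYSTEM DROP DNS CACHE;
```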

Caveats:

@@ -10,7 +10,7 @@ So it works this way:

1. ClickHouse does partition pruning based on `WHERE` conditions.
2. For every partition, it picks column ranges (aka 'marks' / 'granules') based on primary key conditions.
3. Here the sampling logic is applied: a) in the case of `SAMPLE k` (`k` in the `0..1` range) it adds the condition `WHERE sample_key < k * max_int_of_sample_key_type`; b) in the case of `SAMPLE k OFFSET m` it adds the condition `WHERE sample_key BETWEEN m * max_int_of_sample_key_type AND (m + k) * max_int_of_sample_key_type`; c) in the case of `SAMPLE N` (N > 1) it first estimates how many rows are inside the ranges to be read and, based on that, converts it to case 3a (calculating `k` from the number of rows in the ranges and the desired number of rows).
4. Other conditions are then applied to the data returned by that, so the number of rows can decrease further (see the sampling sketch after this list).
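A few illustrative queries for the three cases above (the `hits` table name is hypothetical; the table must declare a `SAMPLE BY` key):

```sql
-- a) read roughly 10% of the sample-key range
SELECT count() FROM hits SAMPLE 1/10;

-- b) read roughly 10%, skipping the first half of the sample-key range
SELECT count() FROM hits SAMPLE 1/10 OFFSET 1/2;

-- c) ask for an approximate number of rows; ClickHouse converts this to case a)
SELECT count() FROM hits SAMPLE 1000000;
```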

[Source Code](https://github.com/ClickHouse/ClickHouse/blob/92c937db8b50844c7216d93c5c398d376e82f6c3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L355)
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/atomic-insert.md
@@ -8,7 +8,7 @@ An insert is atomic only if these conditions are met:

* Data is inserted into a single partition only.
* The number of rows is less than `max_insert_block_size`.
* The table doesn't have Materialized Views (there is no atomicity Table <> MV).
* For TSV, TSKV, CSV, and JSONEachRow formats, `input_format_parallel_parsing=0` is set (see the check below).
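A quick way to check the two limits mentioned above for the current session (a plain sketch, nothing table-specific):

```sql
SELECT name, value, changed
FROM system.settings
WHERE name IN ('max_insert_block_size', 'input_format_parallel_parsing');
```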

[https://github.com/ClickHouse/ClickHouse/issues/9195#issuecomment-587500824](https://github.com/ClickHouse/ClickHouse/issues/9195#issuecomment-587500824)
2 changes: 1 addition & 1 deletion content/en/altinity-kb-queries-and-syntax/time-zones.md
@@ -11,7 +11,7 @@ Important things to know:
3. Depending on where that conversion happens, the rules of different timezones may be applied.
4. You can check the server timezone using `SELECT timezone()`
5. By default, clickhouse-client also tries to use the server timezone (see also the `--use_client_time_zone` flag)
6. If you want, you can store the timezone name inside the data type; in that case, the timestamp <-> human-readable time rules of that timezone will be applied (see the sketch after this list).
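A small illustration of item 6 (table and column names are made up):

```sql
-- The timezone becomes part of the column type; conversions to and from
-- the human-readable form use that timezone's rules.
CREATE TABLE tz_example
(
    t DateTime('Europe/Amsterdam')
)
ENGINE = Memory;

INSERT INTO tz_example VALUES (now());
SELECT t, toTimeZone(t, 'UTC') FROM tz_example;
```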

```sql
SELECT
```
@@ -23,4 +23,4 @@ Default compression is LZ4 [https://clickhouse.tech/docs/en/operations/server-con

These TTL rules recompress data after 1 and 6 months.

CODEC(Delta, Default) -- **Default** means to use the default compression (LZ4 -> ZSTD1 -> ZSTD6) in this case; see the sketch below.
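A hedged sketch of what such a table could look like, assuming hypothetical table and column names and the 1- and 6-month recompression rules mentioned above:

```sql
CREATE TABLE recompress_example
(
    ts    DateTime CODEC(Delta, Default),  -- Default = the server's default compression codec
    value Float64
)
ENGINE = MergeTree
ORDER BY ts
TTL ts + INTERVAL 1 MONTH RECOMPRESS CODEC(ZSTD(1)),
    ts + INTERVAL 6 MONTH RECOMPRESS CODEC(ZSTD(6));
```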
@@ -57,7 +57,7 @@ Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 192.0
```

1. Using the remote(...) table function with the secure TCP port (the default value is 9440). There is a remoteSecure() function for that.
2. High (>50ms) ping between servers; the values of `connect_timeout_with_failover_ms` and `connect_timeout_with_failover_secure_ms` need to be adjusted accordingly (see the sketch below).
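A sketch combining both points (host, credentials, and timeout values are placeholders):

```sql
-- Query a remote secure endpoint (port 9440) and raise the failover connect
-- timeouts for a high-latency link.
SELECT count()
FROM remoteSecure('server-01.example.com:9440', default.some_table, 'some_user', 'some_password')
SETTINGS
    connect_timeout_with_failover_ms = 1000,
    connect_timeout_with_failover_secure_ms = 1000;
```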

Default values:

@@ -114,9 +114,9 @@ All user settings don't need a server restart but are applied on connect. User need

Most server settings are applied only on server start, except for these sections (see the reload sketch after the list):

1. <remote_servers> (cluster config)
2. <dictionaries> (external dictionaries)
3. <max_table_size_to_drop> & <max_partition_size_to_drop>
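For the dictionaries case there is also an explicit reload statement; a small sketch (`SYSTEM RELOAD CONFIG` is available in recent releases):

```sql
-- Re-read the dictionary configuration and reload all external dictionaries.
SYSTEM RELOAD DICTIONARIES;

-- Re-read configuration changes that do not require a restart.
SYSTEM RELOAD CONFIG;
```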

## Dictionaries

@@ -26,7 +26,7 @@ JAVA_OPTS="-Xms7G -Xmx7G -XX:+AlwaysPreTouch -Djute.maxbuffer=8388608 -XX:MaxGCP
On JVM 13-14, using the `ZGC` or `Shenandoah` garbage collector may reduce pauses.
On older JVM versions (before 10) you may want to do some tuning to decrease pauses; ParNew + CMS garbage collectors (like in the Yandex config) are one of the best options.

1. One of the most important settings for a JVM application is the heap size. A heap size of >1 GB is recommended for most use cases; monitor heap usage to ensure no delays are caused by garbage collection. We recommend at least 4 GB of RAM for ZooKeeper nodes (8 GB is better, but that will make a difference only when ZooKeeper is heavily loaded).

Set the Java heap size smaller than the available RAM on the node. This is very important to avoid swapping, which will seriously degrade ZooKeeper performance. Be conservative: use a maximum heap size of 3 GB for a 4 GB machine.

@@ -35,4 +35,4 @@ See [altinity-kb-clickhouse-client]({{<ref "altinity-kb-clickhouse-client" >}})

## I Can’t Connect From Other Hosts. What do I do?

Check the <listen> settings in config.xml. Verify that connections can be made over both IPv4 and IPv6.
@@ -198,7 +198,7 @@ The following are recommended Best Practices when it comes to setting up a Click
</yandex>
```
1. Some parts of the configuration will contain repeated elements (like allowed IPs for all the users). To avoid repetition, use a substitutions file. By default it is /etc/metrika.xml, but you can change it, for example to /etc/clickhouse-server/substitutions.xml, with the <include_from> section of the main config. Put the repeated parts into the substitutions file, like this:
```markup
<?xml version="1.0"?>
```

@@ -221,10 +221,10 @@ This way you have full flexibility; you’re not limited to the settings described
Other configurations that should be evaluated (a quick check follows this list):
* <listen> in config.xml: determines which IP addresses and ports the ClickHouse servers listen on for incoming communications.
* <max_memory_..> and <max_bytes_before_external_...> in users.xml. These are part of the <default> profile.
* <max_execution_time>
* <log_queries>
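Most of these are user-level settings, so their effective values can be checked per session; a small sketch (the external-aggregation setting names below are the usual ones, adjust as needed):

```sql
SELECT name, value, changed
FROM system.settings
WHERE name IN ('max_memory_usage',
               'max_bytes_before_external_group_by',
               'max_bytes_before_external_sort',
               'max_execution_time',
               'log_queries');
```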
The following extra debug logs should be considered:
@@ -11,7 +11,7 @@ ClickHouse will use all available hardware to maximize performance. So the more
* Minimum Hardware: a 4-core CPU with SSE4.2 support, 16 GB RAM, 1 TB HDD.
* Recommended for development and staging environments.
* SSE4.2 is required, and going below 4 GB of RAM is not recommended.
* Recommended Hardware: >=16 cores, >=64 GB RAM, HDD RAID or SSD.
* For processing up to hundreds of millions / billions of rows.

For clouds: disk throughput is a more important factor than IOPS. Be aware of the difference between burst and baseline disk speeds.
@@ -103,7 +103,7 @@ The following health checks should be monitored:
</tr>
<tr>
<td style="text-align:left">Some replication tasks are stuck</td>
<td style="text-align:left">select count() from system.replication_queue where num_tries > 100</td>
<td
style="text-align:left">High</td>
</tr>
@@ -12,7 +12,7 @@ ClickHouse has a registry of parts in ZooKeeper.

During startup, ClickHouse checks that the list of parts on the local disk is consistent with the list in ZooKeeper. If the lists differ too much, ClickHouse refuses to start, because it could be an issue with settings or with wrong Shard or Replica macros. But this safety limiter throws an exception only if the difference is more than 50% (in rows).

In your case the table is very small and the difference >50% (100.00 vs 150.00) comes from just a single part mismatch, which can be the result of a hard restart (see the sketch below for comparing the local and replica state).
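To see how far the local part set diverges from what the replica expects, a quick sketch (the database and table names are placeholders):

```sql
-- Local active parts on disk
SELECT count() AS local_active_parts
FROM system.parts
WHERE database = 'db' AND table = 'table' AND active;

-- Replica-side view, including queue size and parts known to ZooKeeper
SELECT *
FROM system.replicas
WHERE database = 'db' AND table = 'table'
FORMAT Vertical;
```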

```sql
SELECT * FROM system.merge_tree_settings WHERE name = 'replicated_max_ratio_of_wrong_parts'
```
