
chore: merge change sets on the dev branch #1423

Merged
merged 38 commits into main from dev on Jan 11, 2024

Conversation

@ShiKaiWi (Member) commented Jan 3, 2024

Rationale

For #1319, some breaking changes were introduced, and to speed up development the dev branch was chosen as the base branch for merging all the change sets. Now that #1319 has been fixed, the change sets can be merged into the main branch.

Detailed Changes

Resolve the conflicts and merge the change sets into the main branch.

Test Plan

Should pass the CI and the manual compatibility test.

jiacai2050 and others added 26 commits December 6, 2023 17:51
## Rationale
When `channel capacity < read_parallelism`, we will pass 0 as the channel
buffer size, which causes a panic:
```
2023-12-05 20:31:32.974 ERRO [components/panic_ext/src/lib.rs:54] thread 'ceres-read' panicked 'mpsc bounded channel requires buffer > 0' at "analytic_engine/src/sst/parquet/async_reader.rs:736"
```

## Detailed Changes
- Ensure the channel size is non-zero (see the sketch below)
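
A minimal sketch of the fix idea, assuming the buffer is derived from the configured capacity and the read parallelism (names are illustrative, not the actual code):

```rust
use tokio::sync::mpsc;

// Hypothetical sketch: derive the per-reader channel buffer from the configured
// capacity, but clamp it to at least 1, because `mpsc::channel` panics when the
// buffer is 0 (which happens when capacity < read_parallelism).
fn channel_buffer_size(capacity: usize, read_parallelism: usize) -> usize {
    (capacity / read_parallelism.max(1)).max(1)
}

fn main() {
    // capacity (2) < read_parallelism (8) would previously yield a 0-sized buffer.
    let (_tx, _rx) = mpsc::channel::<Vec<u8>>(channel_buffer_size(2, 8));
}
```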

## Test Plan
No need.
## Rationale
Close apache#1178

## Detailed Changes
- Use a string to represent `request_id`, generated as a random UUID v4, e.g.
`575c02e1-cd92-4c35-a5f3-353781163e93` (see the sketch below)
  - https://docs.rs/uuid/1.6.1/uuid/struct.Uuid.html#method.new_v4
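
A minimal sketch using the `uuid` crate linked above (the function name is illustrative; requires the crate's `v4` feature):

```rust
use uuid::Uuid;

// Sketch: represent the request id as a random UUID v4 string,
// e.g. "575c02e1-cd92-4c35-a5f3-353781163e93".
fn new_request_id() -> String {
    Uuid::new_v4().to_string()
}

fn main() {
    println!("request_id: {}", new_request_id());
}
```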

## Test Plan
Manually.
## Rationale
Because the logic related to creating tables has been reworked in
`horaemeta` (apache/incubator-horaedb-meta#288), the results
returned when errors are reported during table creation have changed.
Therefore, we need to correct the create-table results in the
integration test.

## Detailed Changes
* Update create table result in integration test.

## Test Plan
No need.
## Rationale
In PR apache#1344, the create-table results in the integration test were
updated, but the integration test still failed to pass due to an extra
blank row. Fix this problem.

## Detailed Changes
* Fix create table result.

## Test Plan
No need.
## Rationale


## Detailed Changes


## Test Plan
## Rationale
<del>This issue happens quite a lot in CI, and developers can do nothing
besides retry, which is very annoying. Since this test failure is caused
by a race condition and there is no easy way to fix it, I suggest we
disable it for now.
</del>


## Detailed Changes
Increase wait time to 10s.

## Test Plan

CI
## Rationale
A flush failure before table close may lead to a shard close failure.
However, such a failure is tolerable because the unflushed data can still
be recovered when the table is opened again.

## Detailed Changes
Ignore the flush failure before closing the table.

## Test Plan
Pass all the tests in the CI.
…apache#1363)

## Rationale
The metrics about the data fetched from object storage help estimate the
query load on one specific table, and the per-table load distribution can
help build a cluster topology with better load balance.

## Detailed Changes
Collect metrics for bytes fetched from object storage.

## Test Plan
Check the added metrics manually.
## Rationale
Metrics about SST bytes fetched via `get_ranges` are not collected.

## Detailed Changes
Collect the missing metrics.

## Test Plan
Query and check the metrics after executing a query.
## Rationale


## Detailed Changes


## Test Plan
Pass CI
… once (apache#1369)

## Rationale
The stats about the number of bytes fetched from the object store should
not include low-frequency reads (e.g. compaction), because such stats
are used to show the query load distribution across the tables.

## Detailed Changes
Skip collecting the fetched-bytes stats for low-frequency reads.
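
A minimal sketch of the idea with illustrative names (not the actual engine types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch: only high-frequency (query) reads contribute to the
// per-table fetched-bytes stats; low-frequency reads such as compaction are ignored.
enum ReadFrequency {
    High,
    Low,
}

fn record_fetched_bytes(freq: ReadFrequency, bytes: u64, fetched_bytes: &AtomicU64) {
    if matches!(freq, ReadFrequency::High) {
        fetched_bytes.fetch_add(bytes, Ordering::Relaxed);
    }
}

fn main() {
    let counter = AtomicU64::new(0);
    record_fetched_bytes(ReadFrequency::Low, 4096, &counter); // ignored (compaction)
    record_fetched_bytes(ReadFrequency::High, 1024, &counter); // counted (query)
    assert_eq!(counter.load(Ordering::Relaxed), 1024);
}
```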

## Test Plan
The CI tests should pass.
## Rationale
The HTTP API is mainly used for debugging and should not be blocked.

## Detailed Changes


## Test Plan
Pass CI
…che#1365)

## Rationale
Close apache#1105

## Detailed Changes
- Reduce the parameters in the sst write path
- Avoid building a dictionary for columns with massive numbers of unique values

## Test Plan
- New unit test for the changeset
- Observe the metrics with dictionary encoding disabled/enabled
## Rationale
The code that decides whether to collect metrics according to the read
frequency is duplicated.

## Detailed Changes
Remove the duplicated code.

## Test Plan
CI.
…pache#1372)

## Rationale
The column's value set is summarized into the metadata if the column has
low cardinality. With such information, the sampling for those columns can
be skipped.

## Detailed Changes
Skip sampling over the low-cardinality columns.

## Test Plan
Updated the unit tests.
## Rationale
apache#1003 tries to avoid frequent flush requests, which may generate
massive numbers of small SSTs, but it also removes the write stall from
the normal write path.

## Detailed Changes
Introduce `min_flush_interval` to avoid frequent flush requests and
restore the write stall mechanism.
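
A minimal sketch of the frequent-flush guard, assuming a simple time-based check (type and field names are illustrative):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: a flush request is only honored if the previous flush
// happened at least `min_flush_interval` ago; otherwise it is skipped so that
// frequent flushes cannot generate massive numbers of small SSTs.
struct FlushGuard {
    min_flush_interval: Duration,
    last_flush: Option<Instant>,
}

impl FlushGuard {
    fn should_flush(&mut self, now: Instant) -> bool {
        let allowed = self
            .last_flush
            .map_or(true, |last| now.duration_since(last) >= self.min_flush_interval);
        if allowed {
            self.last_flush = Some(now);
        }
        allowed
    }
}

fn main() {
    let mut guard = FlushGuard {
        min_flush_interval: Duration::from_secs(60),
        last_flush: None,
    };
    let now = Instant::now();
    assert!(guard.should_flush(now)); // first flush is allowed
    assert!(!guard.should_flush(now + Duration::from_secs(10))); // too soon, skipped
}
```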

## Test Plan
Add unit tests for the frequent flush check.
## Rationale
Some logs about queries are verbose, and some key logs about opening
shards are missing.

## Detailed Changes
Remove verbose logs and add missing key logs.

## Test Plan
CI.
…le (apache#1307)

## Rationale
Closes apache#1302 

The pulled Arrow record batches are guaranteed to include the primary key
columns; however, those primary key columns are unused by queries on
append-mode tables.
This PR refactors the whole record-batch pulling path for readability and
to avoid pulling primary key columns when they are unused.

## Detailed Changes
+ Refactor `RowProjector` to `RecordFetchingContext`, which holds just the
needed information, and pass it to `ScanRequest` & `SstReadOptions`
rather than the too-heavy `ProjectedSchema`.
+ Refactor `RecordBatchWithKey` to `FetchingRecordBatch`, which holds the
primary key indexes on demand.

## Test Plan
Tested by existing and newly added tests.
## Rationale
Reduce two `match`es to only one.

## Detailed Changes


## Test Plan
CI
## Rationale
When the data WAL is disabled, data is still encoded, which wastes CPU.

## Detailed Changes
Skip encoding when the data WAL is disabled.

## Test Plan
CI.
## Rationale
Currently there is no error log for the remote server, which makes it
hard to debug.

## Detailed Changes


## Test Plan
No need.
## Rationale
See apache#1405

## Detailed Changes
Disable percentile functions

## Test Plan
CI
## Rationale
Currently, an analyze SQL query cannot obtain detailed metrics of a
partitioned table.

## Detailed Changes
Return the metrics of the partitioned table to the remote client and then
collect the metrics in the client.

## Test Plan
- Existing tests
- Add new integration tests for explain analyze
## Rationale
Close apache#1299

## Detailed Changes
- Add a `PriorityRuntime` component and use it in the read API (see the
sketch after this list)
- For a normal query, its plan is executed in the higher-priority runtime
by default; when the executor decides the query is expensive, it spawns
`stream.poll` in the lower-priority runtime.
- For a distributed query, a priority field is added to the remote query
request so the remote node can decide which runtime to run on.
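
A minimal sketch of the runtime-selection idea (not the actual `PriorityRuntime` API; the structure and names below are assumptions):

```rust
use std::future::Future;
use tokio::runtime::{Builder, Runtime};

// Hypothetical sketch: queries run on the high-priority runtime by default and
// are moved to the low-priority runtime once they are judged expensive.
struct PriorityRuntime {
    high: Runtime,
    low: Runtime,
}

impl PriorityRuntime {
    fn new() -> std::io::Result<Self> {
        Ok(Self {
            high: Builder::new_multi_thread().worker_threads(4).enable_all().build()?,
            low: Builder::new_multi_thread().worker_threads(1).enable_all().build()?,
        })
    }

    fn spawn<F>(&self, expensive: bool, fut: F)
    where
        F: Future<Output = ()> + Send + 'static,
    {
        // Expensive queries (or remote requests carrying a low priority) go to
        // the low-priority runtime so they cannot starve cheap queries.
        let handle = if expensive {
            self.low.spawn(fut)
        } else {
            self.high.spawn(fut)
        };
        drop(handle); // fire-and-forget; the task keeps running after the handle is dropped
    }
}

fn main() -> std::io::Result<()> {
    let rt = PriorityRuntime::new()?;
    rt.spawn(false, async { /* cheap query plan */ });
    rt.spawn(true, async { /* expensive query: stream.poll moved here */ });
    Ok(())
}
```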


## Test Plan
Newly added UT
## Rationale
In apache#1260, we implemented distributed analyze, but metrics were also
returned for queries that are not analyze, which leads to a decrease in
query performance. This PR fixes that, so metrics are not returned for
normal queries.

## Detailed Changes
- Add an `is_analyze` field to determine whether the query is an analyze

## Test Plan
Existing tests

---------

Co-authored-by: jiacai2050 <dev@liujiacai.net>
@ShiKaiWi changed the title from "Merge dev to main" to "chore: merge change sets on the dev branch" on Jan 3, 2024
## Rationale
Queries with a long time range usually cost too many resources, which
affects the stability of the whole cluster.

## Detailed Changes
- Support blocking queries by query range


## Test Plan
Manually
```bash

curl 0:5000/admin/block -H 'content-type: application/json' -d '
{
  "operation": "Set",
  "write_block_list": [],
  "read_block_list": [],
  "block_rules": [
    {"type": "QueryRange", "content": "24h"}
  ]
}'
```
## Rationale


## Detailed Changes
- Attach endpoint to remote error

## Test Plan
CI
## Rationale
See apache#1040

## Detailed Changes
- Try loading page indexes

## Test Plan
CI
ZuLiangWang and others added 6 commits January 8, 2024 09:57
## Rationale
Refer to issue apache#1386: currently, if
the status of the shard is abnormal, we cannot get any useful
information from the error message `table not found`.

## Detailed Changes
* Add `TableStatus` in `cluster`; it can be used to get the status of a
table in the current cluster.
* Add `SchemaWithCluster`, which wraps the schema inside the cluster so
that the state of the cluster and the schema can be combined.

## Test Plan
Pass CI.
…pache#1271)

## Rationale
Conversion from the row format in the memtable to record batches in
DataFusion has been found to be a CPU bottleneck in production. To reduce
the CPU cost, this PR implements a layered memtable framework to support
gradual conversion during the normal write path (and before flush).
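
A rough sketch of the layered idea with illustrative toy types only (not the actual framework):

```rust
// Hypothetical sketch: new writes land in a mutable, row-oriented layer; sealed
// layers are converted to a columnar form ahead of flush, so the row-to-columnar
// conversion cost is paid gradually on the write path instead of all at once.
struct Row(Vec<u8>);
struct ColumnarBatch(Vec<Vec<u8>>);

enum Layer {
    /// Accepts new writes in row format.
    Mutable(Vec<Row>),
    /// Sealed and already converted to a columnar batch.
    Frozen(ColumnarBatch),
}

struct LayeredMemtable {
    layers: Vec<Layer>,
}

impl LayeredMemtable {
    fn write(&mut self, row: Row) {
        // Open a fresh mutable layer if the newest layer is already frozen (or absent).
        if !matches!(self.layers.last(), Some(Layer::Mutable(_))) {
            self.layers.push(Layer::Mutable(Vec::new()));
        }
        if let Some(Layer::Mutable(rows)) = self.layers.last_mut() {
            rows.push(row);
        }
    }

    /// Seal the current mutable layer by converting it to a columnar batch.
    fn seal_current(&mut self) {
        if matches!(self.layers.last(), Some(Layer::Mutable(_))) {
            if let Some(Layer::Mutable(rows)) = self.layers.pop() {
                let batch = ColumnarBatch(rows.into_iter().map(|r| r.0).collect());
                self.layers.push(Layer::Frozen(batch));
            }
        }
    }
}

fn main() {
    let mut memtable = LayeredMemtable { layers: Vec::new() };
    memtable.write(Row(vec![1, 2, 3]));
    memtable.seal_current();
    assert!(matches!(memtable.layers.last(), Some(Layer::Frozen(_))));
}
```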

## Detailed Changes
+ Impl layered memtable framework
+ Integrate it into the write path.

## Test Plan
Tested by new unit tests and integration tests.
## Rationale


## Detailed Changes
- Disable seq check when wal is disabled
- Fix request id in remote query.

## Test Plan
…ery process (apache#1431)

## Rationale
When there is a cache miss in the disk cache, it will:
1. Fetch the data from the remote store.
2. Insert the data into the cache, which incurs disk IO.
3. Return the data for the query.

We can move the second step to another task so that it does not block the
normal query process.
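
A minimal sketch of the idea with a toy in-memory stand-in for the disk cache (names are illustrative; the real implementation differs):

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::Mutex;

type Cache = Arc<Mutex<HashMap<String, Vec<u8>>>>;

// Stand-in for the remote object-store read.
async fn fetch_from_remote(_key: &str) -> Vec<u8> {
    vec![0u8; 16]
}

// Hypothetical sketch: on a cache miss, return the fetched data immediately and
// perform the cache insertion on a spawned task, so the disk write never
// blocks the query path.
async fn get(cache: Cache, key: String) -> Vec<u8> {
    if let Some(hit) = cache.lock().await.get(&key).cloned() {
        return hit;
    }
    let data = fetch_from_remote(&key).await;
    let to_cache = data.clone();
    // Write back to the cache on a separate task; the query does not wait for it.
    let _write_back = tokio::spawn(async move {
        cache.lock().await.insert(key, to_cache);
    });
    data
}

#[tokio::main]
async fn main() {
    let cache: Cache = Arc::new(Mutex::new(HashMap::new()));
    let data = get(cache.clone(), "sst/42".to_string()).await;
    assert_eq!(data.len(), 16);
}
```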

## Detailed Changes
- Make the disk write nonblocking
- Block explicitly in tests, otherwise they throw the error below:

> Cannot drop a runtime in a context where blocking is not allowed. This
happens when a runtime is dropped from within an asynchronous context.


## Test Plan
CI
## Rationale
Make it compatible with old table options.

## Detailed Changes
When `layered_memtable_opts` is not found in `TableOptions`, we disable
the layered memtable.
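
A minimal sketch of the fallback, with illustrative type and field names:

```rust
// Hypothetical sketch: old tables were created before `layered_memtable_opts`
// existed, so a missing value must mean "layered memtable disabled" to keep
// their behavior unchanged.
struct LayeredMemtableOptions {
    enable: bool,
}

struct TableOptions {
    layered_memtable_opts: Option<LayeredMemtableOptions>,
}

fn layered_memtable_enabled(opts: &TableOptions) -> bool {
    opts.layered_memtable_opts
        .as_ref()
        .map(|o| o.enable)
        .unwrap_or(false)
}

fn main() {
    let old_table = TableOptions { layered_memtable_opts: None };
    assert!(!layered_memtable_enabled(&old_table)); // old tables: disabled
}
```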

## Test Plan
Test manually.
@ShiKaiWi (Member, Author) commented Jan 9, 2024

@jiacai2050 When resolving the conflicts, a field called `request_id_str` was found, which is not a very good name: https://github.com/apache/incubator-horaedb-proto/blob/4a6f323b892c5944acdcf5447a3cc1e0c18f6e16/protos/engine/remote_engine.proto#L178C3-L178C29.
I guess a better name is necessary before this PR is merged.

@ShiKaiWi marked this pull request as ready for review on January 9, 2024 12:50
chunshao90 previously approved these changes Jan 10, 2024

@chunshao90 (Contributor) left a comment:

LGTM

@ShiKaiWi dismissed chunshao90’s stale review on January 10, 2024 15:08

Shouldn't approve until we can choose not to squash merge.

@jiacai2050 (Contributor) left a comment:

LGTM

@jiacai2050 merged commit 7b7eb1b into apache:main on Jan 11, 2024
8 checks passed