
chore: merge change sets on the dev branch #1423

Merged
merged 38 commits into main from dev on Jan 11, 2024

Conversation

@ShiKaiWi (Member) commented Jan 3, 2024

Rationale

For #1319, some breaking changes were introduced, and to speed up development the dev branch was chosen as the base branch for merging all the change sets. Now that #1319 has been fixed, the change sets can be merged into the main branch.

Detailed Changes

Resolve the conflicts and merge the change sets into the main branch.

Test Plan

Should pass the CI and the manual compatibility test.

jiacai2050 and others added 26 commits December 6, 2023 17:51
## Rationale
When `channel capacity < read_parallelism`, we will pass 0 as the channel
buffer size, which causes a panic:
```
2023-12-05 20:31:32.974 ERRO [components/panic_ext/src/lib.rs:54] thread 'ceres-read' panicked 'mpsc bounded channel requires buffer > 0' at "analytic_engine/src/sst/parquet/async_reader.rs:736"
```

## Detailed Changes
- Ensure the channel size is non-zero (see the sketch below)
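
A minimal sketch of the fix idea, assuming the buffer is derived from the configured capacity and the read parallelism (names are illustrative, not the actual code):

```rust
use tokio::sync::mpsc;

// Hypothetical sketch: derive the per-reader channel buffer from the configured
// capacity, but clamp it to at least 1, because `mpsc::channel` panics when the
// buffer is 0 (which happens when capacity < read_parallelism).
fn channel_buffer_size(capacity: usize, read_parallelism: usize) -> usize {
    (capacity / read_parallelism.max(1)).max(1)
}

fn main() {
    // capacity (2) < read_parallelism (8) would previously yield a 0-sized buffer.
    let (_tx, _rx) = mpsc::channel::<Vec<u8>>(channel_buffer_size(2, 8));
}
```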

## Test Plan
No need.
## Rationale
Close apache#1178

## Detailed Changes
- Use a string to represent `request_id`, generated as a random UUID v4, e.g.
`575c02e1-cd92-4c35-a5f3-353781163e93` (see the sketch below)
  - https://docs.rs/uuid/1.6.1/uuid/struct.Uuid.html#method.new_v4
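
A minimal sketch using the `uuid` crate linked above (the function name is illustrative; requires the crate's `v4` feature):

```rust
use uuid::Uuid;

// Sketch: represent the request id as a random UUID v4 string,
// e.g. "575c02e1-cd92-4c35-a5f3-353781163e93".
fn new_request_id() -> String {
    Uuid::new_v4().to_string()
}

fn main() {
    println!("request_id: {}", new_request_id());
}
```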

## Test Plan
Manually.
## Rationale
Because the logic related to creating tables has been reworked in
`horaemeta` (apache/incubator-horaedb-meta#288), the results
returned when errors are reported during table creation have changed.
Therefore, we need to correct the create-table results in the
integration test.

## Detailed Changes
* Update create table result in integration test.

## Test Plan
No need.
## Rationale
In PR apache#1344, the create-table results in the integration test were
updated, but the integration test still failed to pass due to an extra
blank row. Fix this problem.

## Detailed Changes
* Fix create table result.

## Test Plan
No need.
## Rationale


## Detailed Changes


## Test Plan
## Rationale
<del>This issue happens quite a lot in CI, and developers can do nothing
besides retry, which is very annoying. Since this test failure is caused
by a race condition and there is no easy way to fix it, I suggest we
disable it for now.
</del>


## Detailed Changes
Increase wait time to 10s.

## Test Plan

CI
## Rationale
A flush failure before table close may lead to a shard close failure.
However, such a failure is tolerable because the unflushed data can still
be recovered when the table is opened again.

## Detailed Changes
Ignore the flush failure before closing the table.

## Test Plan
Pass all the tests in the CI.
…apache#1363)

## Rationale
The metrics about the data fetched from object storage help estimate the
query load on one specific table, and the per-table load distribution can
help build a cluster topology with better load balance.

## Detailed Changes
Collect metrics for bytes fetched from object storage.

## Test Plan
Check the added metrics manually.
## Rationale
Metrics about SST bytes fetched via `get_ranges` are not collected.

## Detailed Changes
Collect the missing metrics.

## Test Plan
Query and check the metrics after executing a query.
## Rationale


## Detailed Changes


## Test Plan
Pass CI
… once (apache#1369)

## Rationale
The stats about the number of bytes fetched from the object store should
not include low-frequency reads (e.g. compaction), because such stats
are used to show the query load distribution across the tables.

## Detailed Changes
Skip collecting the fetched-bytes stats for low-frequency reads.
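
A minimal sketch of the idea with illustrative names (not the actual engine types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch: only high-frequency (query) reads contribute to the
// per-table fetched-bytes stats; low-frequency reads such as compaction are ignored.
enum ReadFrequency {
    High,
    Low,
}

fn record_fetched_bytes(freq: ReadFrequency, bytes: u64, fetched_bytes: &AtomicU64) {
    if matches!(freq, ReadFrequency::High) {
        fetched_bytes.fetch_add(bytes, Ordering::Relaxed);
    }
}

fn main() {
    let counter = AtomicU64::new(0);
    record_fetched_bytes(ReadFrequency::Low, 4096, &counter); // ignored (compaction)
    record_fetched_bytes(ReadFrequency::High, 1024, &counter); // counted (query)
    assert_eq!(counter.load(Ordering::Relaxed), 1024);
}
```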

## Test Plan
The CI tests should pass.
## Rationale
The HTTP API is mainly used for debugging and should not be blocked.

## Detailed Changes


## Test Plan
Pass CI
…che#1365)

## Rationale
Close apache#1105

## Detailed Changes
- Reduce the parameters in the sst write path
- Avoid building a dictionary for columns with massive numbers of unique values

## Test Plan
- New unit test for the changeset
- Observe the metrics with dictionary encoding disabled/enabled
## Rationale
The code that decides whether to collect metrics according to the read
frequency is duplicated.

## Detailed Changes
Remove the duplicated code.

## Test Plan
CI.
…pache#1372)

## Rationale
The column's value set is summarized into the metadata if the column has
low cardinality. With such information, the sampling for those columns can
be skipped.

## Detailed Changes
Skip sampling over the low-cardinality columns.

## Test Plan
Updated the unit tests.
## Rationale
apache#1003 tries to avoid frequent flush requests, which may generate
massive numbers of small SSTs, but it also removes the write stall from
the normal write path.

## Detailed Changes
Introduce `min_flush_interval` to avoid frequent flush requests and
restore the write stall mechanism.
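
A minimal sketch of the frequent-flush guard, assuming a simple time-based check (type and field names are illustrative):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: a flush request is only honored if the previous flush
// happened at least `min_flush_interval` ago; otherwise it is skipped so that
// frequent flushes cannot generate massive numbers of small SSTs.
struct FlushGuard {
    min_flush_interval: Duration,
    last_flush: Option<Instant>,
}

impl FlushGuard {
    fn should_flush(&mut self, now: Instant) -> bool {
        let allowed = self
            .last_flush
            .map_or(true, |last| now.duration_since(last) >= self.min_flush_interval);
        if allowed {
            self.last_flush = Some(now);
        }
        allowed
    }
}

fn main() {
    let mut guard = FlushGuard {
        min_flush_interval: Duration::from_secs(60),
        last_flush: None,
    };
    let now = Instant::now();
    assert!(guard.should_flush(now)); // first flush is allowed
    assert!(!guard.should_flush(now + Duration::from_secs(10))); // too soon, skipped
}
```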

## Test Plan
Add unit tests for the frequent flush check.
## Rationale
Some logs about queries are verbose, and some key logs about opening
shards are missing.

## Detailed Changes
Remove verbose logs and add missing key logs.

## Test Plan
CI.
…le (apache#1307)

## Rationale
Closes apache#1302 

The pulled Arrow record batches are guaranteed to include the primary key
columns; however, those primary key columns are unused by queries on
append-mode tables.
This PR refactors the whole record-batch pulling path for readability and
to avoid pulling primary key columns when they are unused.

## Detailed Changes
+ Refactor `RowProjector` to `RecordFetchingContext`, which holds just the
needed information, and pass it to `ScanRequest` & `SstReadOptions`
rather than the too-heavy `ProjectedSchema`.
+ Refactor `RecordBatchWithKey` to `FetchingRecordBatch`, which holds the
primary key indexes on demand.

## Test Plan
Tested by existing and newly added tests.
## Rationale
Reduce two `match`es to only one.

## Detailed Changes


## Test Plan
CI
## Rationale
When the data WAL is disabled, data is still encoded, which wastes CPU.

## Detailed Changes
Skip encoding when the data WAL is disabled.

## Test Plan
CI.
## Rationale
Currently there is no error log for the remote server, which makes it
hard to debug.

## Detailed Changes


## Test Plan
No need.
## Rationale
See apache#1405

## Detailed Changes
Disable percentile functions

## Test Plan
CI
## Rationale
Currently, an analyze SQL query cannot obtain detailed metrics of a
partitioned table.

## Detailed Changes
Return the metrics of the partitioned table to the remote client and then
collect the metrics in the client.

## Test Plan
- Existing tests
- Add new integration tests for explain analyze
## Rationale
Close apache#1299

## Detailed Changes
- Add a `PriorityRuntime` component and use it in the read API (see the
sketch after this list)
- For a normal query, its plan is executed in the higher-priority runtime
by default; when the executor decides the query is expensive, it spawns
`stream.poll` in the lower-priority runtime.
- For a distributed query, a priority field is added to the remote query
request so the remote node can decide which runtime to run on.
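
A minimal sketch of the runtime-selection idea (not the actual `PriorityRuntime` API; the structure and names below are assumptions):

```rust
use std::future::Future;
use tokio::runtime::{Builder, Runtime};

// Hypothetical sketch: queries run on the high-priority runtime by default and
// are moved to the low-priority runtime once they are judged expensive.
struct PriorityRuntime {
    high: Runtime,
    low: Runtime,
}

impl PriorityRuntime {
    fn new() -> std::io::Result<Self> {
        Ok(Self {
            high: Builder::new_multi_thread().worker_threads(4).enable_all().build()?,
            low: Builder::new_multi_thread().worker_threads(1).enable_all().build()?,
        })
    }

    fn spawn<F>(&self, expensive: bool, fut: F)
    where
        F: Future<Output = ()> + Send + 'static,
    {
        // Expensive queries (or remote requests carrying a low priority) go to
        // the low-priority runtime so they cannot starve cheap queries.
        let handle = if expensive {
            self.low.spawn(fut)
        } else {
            self.high.spawn(fut)
        };
        drop(handle); // fire-and-forget; the task keeps running after the handle is dropped
    }
}

fn main() -> std::io::Result<()> {
    let rt = PriorityRuntime::new()?;
    rt.spawn(false, async { /* cheap query plan */ });
    rt.spawn(true, async { /* expensive query: stream.poll moved here */ });
    Ok(())
}
```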


## Test Plan
Newly added UT
## Rationale
In apache#1260, we implemented distributed analyze, but metrics were also
returned for queries that are not analyze, which leads to a decrease in
query performance. This PR fixes that, so metrics are not returned for
normal queries.

## Detailed Changes
- Add an `is_analyze` field to determine whether the query is an analyze

## Test Plan
Existing tests

---------

Co-authored-by: jiacai2050 <dev@liujiacai.net>
@ShiKaiWi changed the title from "Merge dev to main" to "chore: merge change sets on the dev branch" on Jan 3, 2024
## Rationale
Queries with a long time range usually cost too many resources, which
affects the stability of the whole cluster.

## Detailed Changes
- Support blocking queries by query range


## Test Plan
Manually
```bash

curl 0:5000/admin/block -H 'content-type: application/json' -d '
{
  "operation": "Set",
  "write_block_list": [],
  "read_block_list": [],
  "block_rules": [
    {"type": "QueryRange", "content": "24h"}
  ]
}'
```
## Rationale


## Detailed Changes
- Attach endpoint to remote error

## Test Plan
CI
## Rationale
See apache#1040

## Detailed Changes
- Try loading page indexes

## Test Plan
CI
ZuLiangWang and others added 6 commits January 8, 2024 09:57
## Rationale
Refer to issue apache#1386: currently, if
the status of the shard is abnormal, we cannot get any useful
information from the error message `table not found`.

## Detailed Changes
* Add `TableStatus` in `cluster`; it can be used to get the status of a
table in the current cluster.
* Add `SchemaWithCluster`, which wraps the schema inside the cluster so
that the state of the cluster and the schema can be combined.

## Test Plan
Pass CI.
…pache#1271)

## Rationale
Conversion from the row format in the memtable to record batches in
DataFusion has been found to be a CPU bottleneck in production. To reduce
the CPU cost, this PR implements a layered memtable framework to support
gradual conversion during the normal write path (and before flush).
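
A rough sketch of the layered idea with illustrative toy types only (not the actual framework):

```rust
// Hypothetical sketch: new writes land in a mutable, row-oriented layer; sealed
// layers are converted to a columnar form ahead of flush, so the row-to-columnar
// conversion cost is paid gradually on the write path instead of all at once.
struct Row(Vec<u8>);
struct ColumnarBatch(Vec<Vec<u8>>);

enum Layer {
    /// Accepts new writes in row format.
    Mutable(Vec<Row>),
    /// Sealed and already converted to a columnar batch.
    Frozen(ColumnarBatch),
}

struct LayeredMemtable {
    layers: Vec<Layer>,
}

impl LayeredMemtable {
    fn write(&mut self, row: Row) {
        // Open a fresh mutable layer if the newest layer is already frozen (or absent).
        if !matches!(self.layers.last(), Some(Layer::Mutable(_))) {
            self.layers.push(Layer::Mutable(Vec::new()));
        }
        if let Some(Layer::Mutable(rows)) = self.layers.last_mut() {
            rows.push(row);
        }
    }

    /// Seal the current mutable layer by converting it to a columnar batch.
    fn seal_current(&mut self) {
        if matches!(self.layers.last(), Some(Layer::Mutable(_))) {
            if let Some(Layer::Mutable(rows)) = self.layers.pop() {
                let batch = ColumnarBatch(rows.into_iter().map(|r| r.0).collect());
                self.layers.push(Layer::Frozen(batch));
            }
        }
    }
}

fn main() {
    let mut memtable = LayeredMemtable { layers: Vec::new() };
    memtable.write(Row(vec![1, 2, 3]));
    memtable.seal_current();
    assert!(matches!(memtable.layers.last(), Some(Layer::Frozen(_))));
}
```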

## Detailed Changes
+ Impl layered memtable framework
+ Integrate it into the write path.

## Test Plan
Tested by new unit tests and integration tests.
## Rationale


## Detailed Changes
- Disable seq check when wal is disabled
- Fix request id in remote query.

## Test Plan
…ery process (apache#1431)

## Rationale
When there is a cache miss in the disk cache, it will:
1. Fetch the data from the remote store.
2. Insert the data into the cache, which incurs disk IO.
3. Return the data for the query.

We can move the second step to another task so that it does not block the
normal query process.
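
A minimal sketch of the idea with a toy in-memory stand-in for the disk cache (names are illustrative; the real implementation differs):

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::Mutex;

type Cache = Arc<Mutex<HashMap<String, Vec<u8>>>>;

// Stand-in for the remote object-store read.
async fn fetch_from_remote(_key: &str) -> Vec<u8> {
    vec![0u8; 16]
}

// Hypothetical sketch: on a cache miss, return the fetched data immediately and
// perform the cache insertion on a spawned task, so the disk write never
// blocks the query path.
async fn get(cache: Cache, key: String) -> Vec<u8> {
    if let Some(hit) = cache.lock().await.get(&key).cloned() {
        return hit;
    }
    let data = fetch_from_remote(&key).await;
    let to_cache = data.clone();
    // Write back to the cache on a separate task; the query does not wait for it.
    let _write_back = tokio::spawn(async move {
        cache.lock().await.insert(key, to_cache);
    });
    data
}

#[tokio::main]
async fn main() {
    let cache: Cache = Arc::new(Mutex::new(HashMap::new()));
    let data = get(cache.clone(), "sst/42".to_string()).await;
    assert_eq!(data.len(), 16);
}
```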

## Detailed Changes
- Make the disk write nonblocking
- Block explicitly in tests, otherwise they throw the error below:

> Cannot drop a runtime in a context where blocking is not allowed. This
happens when a runtime is dropped from within an asynchronous context.


## Test Plan
CI
## Rationale
Make it compatible with old table options.

## Detailed Changes
When `layered_memtable_opts` is not found in `TableOptions`, we disable
the layered memtable.
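
A minimal sketch of the fallback, with illustrative type and field names:

```rust
// Hypothetical sketch: old tables were created before `layered_memtable_opts`
// existed, so a missing value must mean "layered memtable disabled" to keep
// their behavior unchanged.
struct LayeredMemtableOptions {
    enable: bool,
}

struct TableOptions {
    layered_memtable_opts: Option<LayeredMemtableOptions>,
}

fn layered_memtable_enabled(opts: &TableOptions) -> bool {
    opts.layered_memtable_opts
        .as_ref()
        .map(|o| o.enable)
        .unwrap_or(false)
}

fn main() {
    let old_table = TableOptions { layered_memtable_opts: None };
    assert!(!layered_memtable_enabled(&old_table)); // old tables: disabled
}
```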

## Test Plan
Test manually.
@ShiKaiWi (Member, Author) commented Jan 9, 2024

@jiacai2050 When resolving the conflicts, a field called `request_id_str` was found, which is not a very good name: https://github.com/apache/incubator-horaedb-proto/blob/4a6f323b892c5944acdcf5447a3cc1e0c18f6e16/protos/engine/remote_engine.proto#L178C3-L178C29.
I guess a better name is necessary before this PR is merged.

@ShiKaiWi marked this pull request as ready for review on January 9, 2024 12:50
chunshao90 previously approved these changes Jan 10, 2024

@chunshao90 (Contributor) left a comment:

LGTM

@ShiKaiWi dismissed chunshao90’s stale review on January 10, 2024 15:08

Shouldn't approve until we can choose not to squash merge.

@jiacai2050 (Contributor) left a comment:

LGTM

@jiacai2050 merged commit 7b7eb1b into apache:main on Jan 11, 2024
8 checks passed