@zhiqiang-hhhh zhiqiang-hhhh commented Apr 1, 2025

rebase #49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt

NOTE: compilation of FE still has errors.

Thearas commented Apr 1, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh zhiqiang-hhhh changed the title from "Baidu vector index" to "[vector] fix compile on veector-index branch" Apr 1, 2025
@yiguolei yiguolei merged commit 6d26690 into apache:vector-index-dev Apr 1, 2025
2 of 3 checks passed
@zhiqiang-hhhh zhiqiang-hhhh deleted the baidu-vector-index branch April 1, 2025 07:55
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request May 20, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt

NOTE: compilation of FE still has errors.

---------

Co-authored-by: chenlinzhong <490103404@qq.com>
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Jun 26, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt

NOTE: compilation of FE still has errors.

---------

Co-authored-by: chenlinzhong <490103404@qq.com>

enable faiss hnsw (apache#49745)

```
CREATE TABLE `vector_table` (
  `siteid` int(11) NULL DEFAULT "10" COMMENT "",
  `embedding` array<float>  NOT NULL  COMMENT "",
  `comment` text NULL,
  INDEX idx_test_ann (`embedding`) USING ANN PROPERTIES(
    "index_type"="hnsw",
    "metric_type"="l2",
    "dim"="8",
    "max_degree"="100") COMMENT 'test diskann index',
  INDEX idx_comment (`comment`) USING INVERTED PROPERTIES(
    "support_phrase" = "true",
    "parser" = "english",
    "lower_case" = "true") COMMENT 'inverted index for comment'
) ENGINE=OLAP
DUPLICATE KEY(`siteid`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`siteid`) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO `vector_table` (`siteid`, `embedding`,`comment`) VALUES
(10, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,20],"emb1"),
(20, [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0,30],"emb2")
--------------

Query OK, 2 rows affected (0.07 sec)
{'label':'label_858347013b14baf_b9db5d59b5e30322', 'status':'VISIBLE', 'txnId':'18029'}
```

```
I20250401 19:18:17.977408 3765348 faiss_vector_index.cpp:86] Faiss index saved to faiss.idx, rows 2
```

[feature](vector) Extend the index interface to support vector indexing. (apache#49780)

[feature](index) update physical plan def for vector index

[fix](inverted index) Fix compilation error (apache#50286)

add is_virtual_slot (apache#50390)

[opt](Nereids) push down virtual column into scan (apache#50521)

[fix](vector) column prune is wrong on virtual column (apache#50550)

[fix](vector) equals and hashcode are wrong in olap relation (apache#50558)

[fix](vector) should retain all virtual columns (apache#50571)

[feature](vector) push down ann topn into scan

[opt](vector) reopen push down virtual column in filter rule

remove all x_ prefix (apache#50995)

[opt](vector) refactor some scan code

[feat] VirtualColumn & ANNTopN & AnnRangeSearch (apache#50403)

This is the first version with relatively stable behavior:

* Pass the TPC-H test without crashes.

* Perform range search without crashes.

* Perform ANN Top-N search without crashes.

* Perform compound search (range + Top-N) without crashes.

Some unit tests are unstable; they may be related to the Faiss library.

```text
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] VectorSearchTest.AnnTopNDescriptorEvaluateTopN
[  FAILED  ] VectorSearchTest.CompRangeSearch
[  FAILED  ] VectorSearchTest.RangeSearchNoSelector1
[  FAILED  ] VectorSearchTest.RangeSearchWithSelector1
```

The TPC-H test has some unstable failures:
```text
q13	Error: Failed to execute query q13 (cold run). Output:
ERROR 1105 (HY000) at line 18: errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Parameters start = 0, length = 4064, are out of bound in ColumnVector<T>::insert_range_from method (data.size() = 0).
```

fix virtual column expr context (apache#51085)

The failure of TPC-H q13 is fixed.

[opt](vector) only extract l2_distance from filter

fix rebase master

[fix] fix unstable test case (apache#51159)

Search results from different Faiss index objects are not identical when batch
insert mode is used.
So I refactored the test case to make sure we can compare the results of
native Faiss and the Doris vector index.

[vector search] Step forward on stability and functionality (apache#51213)

A huge step forward on stability and functionality.

1. Search parameters such as `ef_search` can be passed to the index as session
variables. This matches the behavior of pgvector and the DuckDB vector search
extension.
2. ORDER BY ... DESC is now processed correctly, falling back to brute-force
search when necessary.
3. Inner product is supported as an index metric, along with ORDER BY
inner_product.
4. When the metric in the SQL does not match the index metric, fall back to
brute force.

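The fallback rules in items 2 and 4 can be pictured with a small decision helper. This is only a sketch: `can_use_ann_index` and the table of natural sort directions are hypothetical names, and the assumption that an L2 query must sort ascending while an inner-product query must sort descending is mine, not something the commit spells out.

```python
# Assumed natural sort direction per metric (illustration only):
# a smaller L2 distance is closer, a larger inner product is closer.
NATURAL_ORDER_IS_ASCENDING = {"l2": True, "inner_product": False}


def can_use_ann_index(index_metric, query_metric, order_ascending):
    """Return True if a Top-N query could be served by the ANN index.

    False means the query would fall back to brute-force search.
    """
    if query_metric != index_metric:
        return False  # metric in the SQL does not match the index metric
    if query_metric not in NATURAL_ORDER_IS_ASCENDING:
        return False  # unknown metric: be conservative
    # Only the "nearest first" direction can be answered by the index.
    return NATURAL_ORDER_IS_ASCENDING[query_metric] == order_ascending


print(can_use_ann_index("l2", "l2", order_ascending=True))
print(can_use_ann_index("l2", "inner_product", order_ascending=False))
```

Keeping the decision in one predicate makes the brute-force path the default whenever any precondition is not met.
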
1. More unit tests.
2. Virtual column iterator.
3. According to a custom script, the results of range search, Top-N search, and
compound search are almost the same as native Faiss. The overlap rate of the
results is more than 90%. The remaining difference of up to 10% is introduced
by the batch insert mode of native Faiss.
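
The custom script is not included here, so the following is only a minimal sketch of how an overlap rate between two result sets could be computed. The function name `overlap_rate` and the sample row ids are hypothetical.

```python
def overlap_rate(reference_ids, candidate_ids):
    """Fraction of reference result ids that also appear in the candidate result."""
    reference = set(reference_ids)
    if not reference:
        return 1.0  # nothing to miss when the reference result is empty
    return len(reference & set(candidate_ids)) / len(reference)


# Hypothetical top-10 row ids from native Faiss and from the Doris index.
native_faiss_top10 = [3, 7, 12, 15, 21, 28, 33, 40, 47, 52]
doris_index_top10 = [3, 7, 12, 15, 21, 28, 33, 40, 47, 99]

rate = overlap_rate(native_faiss_top10, doris_index_top10)
print(f"overlap rate: {rate:.0%}")
```

Comparing sets of ids rather than ordered lists keeps the measure insensitive to ties in distance.
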

[fix] VirtualColumnIterator should do scatter to input column in its prepare function. (apache#51299)

step forward on vector search (apache#51374)

1. VirtualColumnIterator seek_to_ordinal should check its _max_ordinal.
2. Fix data race of ann_index_reader load_index.
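
The commit does not say how the `load_index` race was fixed. One common pattern for this kind of problem is to guard a lazy, one-time load with a lock; the sketch below illustrates that pattern only, and `LazyIndexReader` and its members are hypothetical names rather than the BE implementation.

```python
import threading


class LazyIndexReader:
    """Loads an index at most once, even when many threads ask for it."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds or reads the index
        self._lock = threading.Lock()
        self._index = None
        self.load_count = 0            # how many times the loader actually ran

    def load_index(self):
        # The check and the assignment happen under the same lock, so two
        # threads can never both observe "not loaded" and load twice.
        with self._lock:
            if self._index is None:
                self._index = self._loader()
                self.load_count += 1
            return self._index


reader = LazyIndexReader(lambda: object())
threads = [threading.Thread(target=reader.load_index) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(reader.load_count)
```
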

1. Add l2_distance_approximate & inner_product_approximate.
2. The xxx_approximate functions above are pushed down to the index, while
l2_distance/inner_product perform an exhaustive search.
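
For contrast with the index path, an exhaustive search simply scores every row. The sketch below does this for the two rows inserted into `vector_table` earlier; `exhaustive_top_n` is a hypothetical helper, and using the Euclidean (square-rooted) form of the L2 distance is an assumption of this example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exhaustive_top_n(rows, query, n):
    """Score every row against the query and return the n closest (siteid, distance)."""
    scored = [(siteid, l2_distance(embedding, query)) for siteid, embedding in rows]
    scored.sort(key=lambda item: item[1])  # smaller distance first
    return scored[:n]


# The two rows from the INSERT statement above.
rows = [
    (10, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]),
    (20, [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 30.0]),
]
query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]

top = exhaustive_top_n(rows, query, n=1)
print(top)
```

The cost grows linearly with the number of rows, which is why the approximate variants are the ones routed to the index.
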

1. Rename AnnTopNDescriptor to AnnTopNRuntime

Refactor: remove useless tools

more functionality on ann index (apache#51524)

1. Add check for index properties when creating index.
2. delete some useless cpp files.

Fix multi-threads range search

Fix rebase master compile

[fix](vector) plan conflict with topn lazy materialization

[fix](vector) lost row id slot desc

fix conflict with global lazy materialization (apache#52093)

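Resolve the conflict with the implementation of global lazy materialization for Top-N, plus some DDL enhancements.

Conflict resolution:
1. In finalizeForNereids, bind a Column to each virtual column. The unique id
of this Column counts down starting from INT32_MAX - 1.
2. The NAME of the Column bound to a virtual column must start with `__DORIS_VIRTUAL_COL__`.
3. The type must match the output type of the VirtualSlot.
4. The binding above must be done in the finalizeForNereids phase rather than in toThrift. Otherwise the
DescriptorTable cannot be handled, and only this way does explain show the real unique id.
5. In the `toThrift` phase, add the Column of each virtual column to the ColumnDesc.

On the BE, some special logic for computing the cid of virtual columns was removed; they are now handled like ordinary columns.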
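
The binding rules can be illustrated with a short sketch. It is not the FE code: `bind_virtual_columns`, the dictionary layout, and the numeric name suffixes are hypothetical, and only the `__DORIS_VIRTUAL_COL__` prefix and the countdown from INT32_MAX - 1 come from the description.

```python
INT32_MAX = 2**31 - 1
VIRTUAL_COLUMN_PREFIX = "__DORIS_VIRTUAL_COL__"


def bind_virtual_columns(virtual_slots):
    """Bind one column to each (label, output_type) virtual slot.

    Unique ids count down from INT32_MAX - 1 so they stay clear of the
    ids used by ordinary columns.
    """
    columns = []
    next_unique_id = INT32_MAX - 1
    for label, output_type in virtual_slots:
        columns.append({
            "name": f"{VIRTUAL_COLUMN_PREFIX}{label}",  # suffix is hypothetical
            "unique_id": next_unique_id,
            "type": output_type,  # must match the virtual slot's output type
        })
        next_unique_id -= 1
    return columns


columns = bind_virtual_columns([("0", "FLOAT"), ("1", "FLOAT")])
for column in columns:
    print(column)
```
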

[feature](score) Add score function (apache#52113)

fix ann index compaction & inner product plan (apache#52150)

DE

FIX COMPACTION & INNER_PRODUCT

FIX DELETE PREDICATE

RM USELESS

ann with fulltext

third party