[vector] fix compile on vector-index branch #49733
Merged
yiguolei
merged 10 commits into
apache:vector-index-dev
from
zhiqiang-hhhh:baidu-vector-index
Apr 1, 2025
…s into baidu-vector-index
yiguolei
approved these changes
Apr 1, 2025
zhiqiang-hhhh
added a commit
to zhiqiang-hhhh/doris
that referenced
this pull request
May 20, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has errors.
---------
Co-authored-by: chenlinzhong <490103404@qq.com>
zhiqiang-hhhh
added a commit
to zhiqiang-hhhh/doris
that referenced
this pull request
May 21, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has errors.
---------
Co-authored-by: chenlinzhong <490103404@qq.com>
zhiqiang-hhhh
added a commit
to zhiqiang-hhhh/doris
that referenced
this pull request
Jun 9, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has errors.
---------
Co-authored-by: chenlinzhong <490103404@qq.com>
zhiqiang-hhhh
added a commit
to zhiqiang-hhhh/doris
that referenced
this pull request
Jun 12, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has errors.
---------
Co-authored-by: chenlinzhong <490103404@qq.com>
zhiqiang-hhhh
added a commit
to zhiqiang-hhhh/doris
that referenced
this pull request
Jun 26, 2025
rebase apache#49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has errors.
---------
Co-authored-by: chenlinzhong <490103404@qq.com>

enable faiss hnsw (apache#49745)

```
CREATE TABLE `vector_table` (
  `siteid` int(11) NULL DEFAULT "10" COMMENT "",
  `embedding` array<float> NOT NULL COMMENT "",
  `comment` text NULL,
  INDEX idx_test_ann (`embedding`) USING ANN PROPERTIES(
    "index_type"="hnsw",
    "metric_type"="l2",
    "dim"="8",
    "max_degree"="100"
  ) COMMENT 'test diskann index',
  INDEX idx_comment (`comment`) USING INVERTED PROPERTIES(
    "support_phrase" = "true",
    "parser" = "english",
    "lower_case" = "true"
  ) COMMENT 'inverted index for comment'
) ENGINE=OLAP
duplicate KEY(`siteid`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`siteid`) BUCKETS 1
PROPERTIES (
  "replication_num" = "1"
);

INSERT INTO `vector_table` (`siteid`, `embedding`,`comment`) VALUES
  (10, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,20],"emb1"),
  (20, [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0,30],"emb2")
--------------
Query OK, 2 rows affected (0.07 sec)
{'label':'label_858347013b14baf_b9db5d59b5e30322', 'status':'VISIBLE', 'txnId':'18029'}
```

```
I20250401 19:18:17.977408 3765348 faiss_vector_index.cpp:86] Faiss index saved to faiss.idx, rows 2
```

[feature](vector) Extend the index interface to support vector indexing. (apache#49780)

[feature](index) update physical plan def for vector index

[fix](inverted index) Fix compilation error (apache#50286)

add is_virtual_slot (apache#50390)

[opt](Nereids) push down virtual column into scan (apache#50521)

[fix](vector) column prune is wrong on virtual column (apache#50550)

[fix](vector) equals and hashcode are wrong in olap relation (apache#50558)

[fix](vector) should retain all virtual columns (apache#50571)

[feature](vector) push down ann topn into scan

[opt](vector) reopen push down virtual column in filter rule

remove all x_ prefix (apache#50995)

[opt](vector) refactor some scan code

[feat] VirtualColumn & ANNTopN & AnnRangeSearch (apache#50403)

This is the first version with relatively stable behavior:
* Pass the TPC-H test without crashes.
* Perform range search without crashes.
* Perform ANN Top-N search without crashes.
* Perform compound search (range + Top-N) without crashes.

Some unit tests are unstable; they may be related to the Faiss library.

```text
[ FAILED ] 4 tests, listed below:
[ FAILED ] VectorSearchTest.AnnTopNDescriptorEvaluateTopN
[ FAILED ] VectorSearchTest.CompRangeSearch
[ FAILED ] VectorSearchTest.RangeSearchNoSelector1
[ FAILED ] VectorSearchTest.RangeSearchWithSelector1
```

The TPC-H test has an intermittent failure:

```text
q13 Error: Failed to execute query q13 (cold run). Output: ERROR 1105 (HY000) at line 18: errCode = 2, detailMessage = (10.16.10.2)[INTERNAL_ERROR]Parameters start = 0, length = 4064, are out of bound in ColumnVector<T>::insert_range_from method (data.size() = 0).
```

fix virtual column expr context (apache#51085)

The TPC-H q13 failure is fixed.

[opt](vector) only extract l2_distance from filter

fix rebase master

[fix] fix unstable test case (apache#51159)

Search results from different Faiss index objects are not identical when batch insert mode is used, so the test case was refactored to make sure the results of native Faiss and the Doris vector index can be compared.

[vector search] Step forward on stability and functionality (apache#51213)

A huge step forward on stability and functionality.

1. Search parameters such as `ef_search` can be passed to the index as session variables. This behavior is the same as in pg-vector and the duckdb vector search plug-in.
2. Correct processing for order by desc. Fall back to brute-force search when necessary.
3. Support using inner product as the index metric and ordering by inner_product.
4. When the metric in the SQL does not match the index metric, fall back to brute-force search.

1. More unit tests.
2. Virtual column iterator.
3. According to a custom script, the results of range search, Top-N search, and compound search are almost the same as those of native Faiss. The overlap rate of the results is more than 90%. The 10% difference is introduced by the batch insert mode of native Faiss.

[fix] VirtualColumnIterator should do scatter to input column in its prepare function. (apache#51299)

step forward on vector search (apache#51374)

1. VirtualColumnIterator seek_to_ordinal should check its _max_ordinal.
2. Fix data race of ann_index_reader load_index.

1. Add l2_distance_approximate & inner_product_approximate.
2. The xxx_approximate functions above are pushed down to the index; l2_distance/inner_product do an exhaustive search.

1. Rename AnnTopNDescriptor to AnnTopNRuntime.

Refactor: remove useless tools

more functionality on ann index (apache#51524)

1. Add a check for index properties when creating an index.
2. Delete some useless cpp files.

Fix multi-threads range search

Fix rebase master compile

[fix](vector) plan conflict with topn lazy materialization

[fix](vector) lost row id slot desc

fix conflict with global lazy materialization (apache#52093)

Resolve the conflict with the implementation of global Top-N lazy materialization, plus some DDL enhancements.

Conflict resolution:
1. In finalizeForNereids, bind a Column to each virtual column. The unique id of these Columns starts at INT32_MAX - 1 and decreases.
2. The NAME of the Column bound to a virtual column must start with `__DORIS_VIRTUAL_COL__`.
3. The type must match the output type of the VirtualSlot.
4. The binding must be done in the finalizeForNereids stage rather than in toThrift. Otherwise the DescriptorTable cannot be handled, and only this way can the real unique id be seen in explain.
5. In the `toThrift` stage, add the virtual column's Column to ColumnDesc.

On the BE, some special logic for computing the cid of virtual columns was removed; virtual columns are now handled like ordinary columns.

[feature](score) Add score function (apache#52113)

fix ann index compaction & inner product plan (apache#52150)

DE FIX COMPACTION & INNER_PRODUCT
FIX DELETE PREDICATE
RM USELESS

ann with fulltext

third party
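The overlap rate quoted in apache#51213 (more than 90% agreement with native Faiss) can be illustrated with a small sketch. This is not the custom script from that commit, which is not shown here; the helper names `l2_distance`, `brute_force_topn`, and `overlap_rate` are hypothetical, and exhaustive L2 search stands in for the reference result. The two vectors are the rows from the `INSERT` statement in apache#49745.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_topn(vectors, query, n):
    """Exhaustive search: ids of the n rows closest to `query`.

    `vectors` maps a row id to its embedding. Ties are broken by row id
    so the result is deterministic.
    """
    ranked = sorted(vectors, key=lambda rid: (l2_distance(vectors[rid], query), rid))
    return ranked[:n]


def overlap_rate(expected_ids, actual_ids):
    """Fraction of the expected ids that also appear in the actual result."""
    expected = set(expected_ids)
    if not expected:
        return 1.0
    return len(expected & set(actual_ids)) / len(expected)


# Rows taken from the INSERT statement in apache#49745.
rows = {
    10: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0],
    20: [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 30.0],
}

reference = brute_force_topn(rows, rows[10], 1)
# `index_result` is a placeholder for ids returned by an ANN index query.
index_result = [10]
print(overlap_rate(reference, index_result))
```

An overlap rate of 0.9 under this definition means that nine of every ten ids in the reference result also appear in the index result, regardless of their order.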
rebase #49703 on master
rm diskann src code (impl is kept as reference)
fix BE compile
fix fmt
NOTE: compilation of FE still has error.