TimescaleDB Releases an Enhanced Vector Index Based on DiskANN

Author

digoal

Date

2023-09-28

Background

https://www.timescale.com/blog/how-we-made-postgresql-the-best-vector-database/

Introducing Timescale Vector, PostgreSQL++ for production AI applications. Timescale Vector enhances pgvector with faster search, higher recall, and more efficient time-based filtering, making PostgreSQL your new go-to vector database. Timescale Vector is available today in early access on Timescale’s cloud data platform. Keep reading to learn why and how we built it. Then take it out for a ride: try Timescale Vector for free today, with a 90-day extended trial.

https://www.microsoft.com/en-us/research/project/project-akupara-approximate-nearest-neighbor-search-for-large-scale-semantic-search/

https://github.com/Microsoft/DiskANN

Deep learning-based embeddings are used widely for “dense retrieval” in information retrieval, computer vision, NLP, amongst others, owing to their ability to capture diverse types of semantic information. This paradigm constructs embeddings so that semantically similar items are closer in a high-dimensional metric space. The first step to enabling search and recommendation with such embeddings is to index the embeddings of the corpus and support approximate nearest-neighbor search (ANNS), a.k.a. Vector Search, for query embeddings. While ANNS is a fundamental problem that has been studied for decades, existing algorithms suffer from two main drawbacks: either their search accuracies are low, thereby affecting the quality of results downstream, or their memory (DRAM) footprint is enormous, making it hard to serve them at web scale.
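To make "search accuracy" concrete: for ANNS it is commonly reported as recall@k, the fraction of the true k nearest neighbors that the approximate search returns. The sketch below is only an illustration of that metric. The helper names `exact_top_k` and `recall_at_k` are hypothetical, the data is random, and the "approximate" search is a stand-in that scans a random half of the corpus; it is not DiskANN, HNSW, or any real index.

```python
import math
import random


def exact_top_k(query, corpus, k):
    """Brute-force search: indices of the k corpus vectors nearest to query (Euclidean)."""
    order = sorted(range(len(corpus)), key=lambda i: math.dist(query, corpus[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
corpus = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]

truth = exact_top_k(query, corpus, k=10)

# Stand-in for an approximate index (assumption for illustration only):
# scan a random 50% sample of the corpus instead of all of it.
sample = random.sample(range(len(corpus)), 500)
approx = sorted(sample, key=lambda i: math.dist(query, corpus[i]))[:10]

print("recall@10 =", recall_at_k(approx, truth))
```

Brute-force search always has recall 1.0, but its cost grows linearly with corpus size. ANNS indexes accept a small loss in recall in exchange for much lower latency.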

In this project, we are designing algorithms to address the challenges of scaling ANNS for web and enterprise search and recommendation systems. Our goal is to build systems that serve trillions of points in a streaming setting cost effectively. Below is a summary of the associated research directions:

DiskANN: an ANNS algorithm that achieves both high accuracy and a low DRAM footprint by suitably using auxiliary SSD storage, which is significantly more cost-effective than DRAM. Using DiskANN, we can index 5-10X more points per machine than the state-of-the-art DRAM-based solutions: e.g., DiskANN can index up to a billion vectors while achieving 95% search accuracy with 5ms latencies, while existing DRAM-based algorithms peak at 100-200M points for similar latency and accuracy.
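A rough calculation shows why DRAM becomes the bottleneck at this scale. The sketch below estimates only the size of the raw vectors. The 128-dimension, 4-byte float32 layout is an assumption chosen for illustration (the source does not state a dimension), and the helper `raw_vector_bytes` is hypothetical. Graph or index overhead would come on top of this.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_component=4):
    """Bytes needed to hold the raw vectors alone, excluding any index structure."""
    return num_vectors * dim * bytes_per_component


GIB = 1024 ** 3

# Assumed layout: 128 dimensions, float32 components (not stated in the source).
for n in (100_000_000, 1_000_000_000):
    size_gib = raw_vector_bytes(n, 128) / GIB
    print(f"{n:>13,} vectors x 128 dims: {size_gib:.1f} GiB")
```

Under these assumptions, a billion vectors need several hundred GiB before any index is built. That is why moving most of the data to SSD and keeping only a compact representation in DRAM changes how many points fit on one machine.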

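Timescale Vector is claimed to handle a billion vectors with ease. Its index reportedly takes as little as one tenth of the space of a pgvector HNSW index, search performance is slightly better than pgvector HNSW, and index build time is slightly faster than pgvector. For now it can only be tried on the TimescaleDB cloud edition.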

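What features would you like PostgreSQL | open-source PolarDB to add?

PolarDB: a cloud-native, distributed, open-source database

PolarDB learning map: boot camps, training and certification, online interactive labs, solutions, kernel development open courses, ecosystem partnerships, and prizes for sharing write-ups

PostgreSQL solutions collection

德哥 / digoal's GitHub: public service is a lifelong commitment.

PolarDB cloud service discount campaign in progress, from 55 yuan

About 德哥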
