Skip to content

Latest commit

 

History

History
132 lines (87 loc) · 5.5 KB

20240108_03.md

File metadata and controls

132 lines (87 loc) · 5.5 KB

pg_vectorize - 结合pgvector, openAI的 DB&AI 工具

作者

digoal

日期

2024-01-08

标签

PostgreSQL , PolarDB , DuckDB , db4ai , ai4db , pgvector , openAI


背景

pg_vectorize

The simplest way to do vector search in Postgres. Vectorize is a Postgres extension that automates that the transformation and orchestration of text to embeddings, allowing you to do vector and semantic search on existing data with as little as two function calls.

通义千问也有类似的API 可以直接讲text转换为向量, 参考: 《沉浸式学习PostgreSQL|PolarDB 16: 植入通义千问大模型+文本向量化模型, 让数据库具备AI能力》

One function call to initialize your data. Another function call to search.

Postgres Extensions:

  • pg_cron == 1.5
  • pgmq >= 0.30.0
  • pgvector >= 0.5.0

And you'll need an OpenAI key:

  • openai API key

安装

克隆项目

docker exec -ti pg bash              
              
              
cd /tmp              
git clone --depth 1 https://github.com/tembo-io/pg_vectorize.git         

配置cargo源, 参考: https://mirrors.ustc.edu.cn/help/crates.io-index.html

# export CARGO_HOME=/root                
                
# mkdir -vp ${CARGO_HOME:-$HOME/.cargo}                
                
# vi ${CARGO_HOME:-$HOME/.cargo}/config                
              
[source.crates-io]                
replace-with = 'ustc'                
                
[source.ustc]                
registry = "sparse+https://mirrors.ustc.edu.cn/crates.io-index/"                

安装插件

cd /tmp/pg_vectorize             
          
grep pgrx Cargo.toml    # 返回pgrx版本        
          
cargo install --locked --version 0.11.0 cargo-pgrx          
              
cargo pgrx init        # create PGRX_HOME 后, 立即ctrl^c 退出              
cargo pgrx init --pg14=`which pg_config`      # 不用管报警              
              
PGRX_IGNORE_RUST_VERSIONS=y cargo pgrx install --pg-config `which pg_config`               

如果遇到错误 error: none of the selected packages contains these features: pg14, did you mean: pg15?

vi Cargo.toml   
  
[features]  
default = ["pg15"]  
pg15 = ["pgrx/pg15", "pgrx-tests/pg15"]  
# 添加一行14的    
pg14 = ["pgrx/pg14", "pgrx-tests/pg14"]    
pg_test = []  
    Finished dev [unoptimized + debuginfo] target(s) in 8m 36s  
  Installing extension  
     Copying control file to /usr/share/postgresql/14/extension/vectorize.control  
     Copying shared library to /usr/lib/postgresql/14/lib/vectorize.so  
 Discovering SQL entities  
  Discovered 10 SQL entities: 0 schemas (0 unique), 4 functions, 0 types, 4 enums, 2 sqls, 0 ords, 0 hashes, 0 aggregates, 0 triggers  
     Writing SQL entities to /usr/share/postgresql/14/extension/vectorize--0.7.0.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.3.0--0.4.0.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.4.0--0.5.0.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.7.0--0.7.1.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.6.0--0.6.1.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.5.0--0.6.0.sql  
     Copying extension schema upgrade file to /usr/share/postgresql/14/extension/vectorize--0.2.0--0.3.0.sql  
    Finished installing vectorize  

打包插件文件

docker cp pg:/usr/share/postgresql/14/extension/vectorize.control  ~/pg14_amd64/      
docker cp pg:/usr/lib/postgresql/14/lib/vectorize.so  ~/pg14_amd64/      
docker cp pg:/usr/share/postgresql/14/extension/vectorize--0.7.0.sql  ~/pg14_amd64/      

参考

https://github.com/tembo-io/pg_vectorize

digoal's wechat