Skip to content

Latest commit

 

History

History
281 lines (190 loc) · 10.1 KB

20230912_01.md

File metadata and controls

281 lines (190 loc) · 10.1 KB

制作 PostgresML docker 镜像

作者

digoal

日期

2023-09-12

标签

PostgreSQL , PolarDB , 机器学习 , ai , 大模型 , 模型集市 , 向量数据库 , 向量 , 图像搜索 , 推荐系统 , 自然语言处理 , 情感分析 , 模型训练 , 谎言分析 , 消除重复问题 , 语义逻辑分析 , 词性分类 , 摘要提炼 ,
翻译 , 阅读理解 , 讲故事 , 上下文回答 , 填空


背景

postgresml的dockerfile内容:

with nvidia gpu

postgresml Dockerfile使用了nvidia的基础镜像, 目的是使用nvidia gpu 进行加速.

简单修改如下, 修改ubuntu源为国内的地址.

FROM nvidia/cuda:12.1.1-devel-ubuntu22.04   
ENV PATH="/usr/local/cuda/bin:${PATH}"  
  
RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
RUN sed -i 's/cn.archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
RUN sed -i 's/security.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
  
RUN apt-get update && apt-get install -y lsb-release curl ca-certificates gnupg coreutils sudo openssl  
RUN echo "deb [trusted=yes] https://apt.postgresml.org $(lsb_release -cs) main" > /etc/apt/sources.list.d/postgresml.list  
RUN echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list  
RUN curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor | tee /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg >/dev/null  
  
ENV TZ=UTC  
ENV DEBIAN_FRONTEND=noninteractive  
RUN apt-get update -y && apt-get install git postgresml-15 postgresml-dashboard -y  
RUN git clone --branch v0.5.0 https://github.com/pgvector/pgvector && cd pgvector && echo "trusted = true" >> vector.control && make && make install  
  
COPY entrypoint.sh /app/entrypoint.sh  
COPY dashboard.sh /app/dashboard.sh  
  
COPY --chown=postgres:postgres local_dev.conf /etc/postgresql/15/main/conf.d/01-local_dev.conf  
COPY --chown=postgres:postgres pg_hba.conf /etc/postgresql/15/main/pg_hba.conf  
  
ENTRYPOINT ["bash", "/app/entrypoint.sh"]  

如果你的环境有nvidia gpu, 如何打包PostgresML docker镜像?

git clone --depth 1 -b v2.7.9 https://github.com/postgresml/postgresml  
  
cd postgresml/docker  
  
docker build -t="postgresml/15:v2.7.9" . 2>&1 | tee /tmp/docker_build.log    

运行postgresml镜像:

docker run --platform linux/amd64 -d -it --gpus all --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pgml --shm-size=1g -p 5433:5432 -p 8000:8000 postgresml/15:v2.7.9 sudo -u postgresml psql -d postgresml   
  
docker exec -ti pgml bash    

without nvidia gpu

如果你的环境没有nvidia gpu, 换成普通镜像打包:

cd postgresml/docker  
  
vi Dockerfile   
FROM --platform=$TARGETPLATFORM ubuntu:22.04    
MAINTAINER digoal zhou "dege.zzz@alibaba-inc.com"    
ARG TARGETPLATFORM    
ARG BUILDPLATFORM    
RUN echo "I am running on $BUILDPLATFORM, building for $TARGETPLATFORM"    
ENV TZ=UTC  
ENV DEBIAN_FRONTEND=noninteractive  
  
RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
RUN sed -i 's/cn.archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
RUN sed -i 's/security.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list  
  
# FROM nvidia/cuda:12.1.1-devel-ubuntu22.04  
# ENV PATH="/usr/local/cuda/bin:${PATH}"  
RUN apt-get update && apt-get install -y lsb-release curl ca-certificates gnupg coreutils sudo openssl  
RUN echo "deb [trusted=yes] https://apt.postgresml.org $(lsb_release -cs) main" > /etc/apt/sources.list.d/postgresml.list  
RUN echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list  
RUN curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor | tee /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg >/dev/null  
  
RUN apt-get update -y && apt-get install git postgresml-15 postgresml-dashboard -y  
RUN git clone --branch v0.5.0 https://github.com/pgvector/pgvector && cd pgvector && echo "trusted = true" >> vector.control && make && make install  
  
COPY entrypoint.sh /app/entrypoint.sh  
COPY dashboard.sh /app/dashboard.sh  
  
COPY --chown=postgres:postgres local_dev.conf /etc/postgresql/15/main/conf.d/01-local_dev.conf  
COPY --chown=postgres:postgres pg_hba.conf /etc/postgresql/15/main/pg_hba.conf  
  
ENTRYPOINT ["bash", "/app/entrypoint.sh"]  

build docker image:

cd postgresml/docker  
  
docker build --platform=linux/amd64 -t="postgresml/15:v2.7.9" . 2>&1 | tee /tmp/docker_build.log    

运行postgresml镜像:

docker run --platform linux/amd64 -d -it --cap-add=SYS_PTRACE --cap-add SYS_ADMIN --privileged=true --name pgml --shm-size=1g -p 5433:5432 -p 8000:8000 postgresml/15:v2.7.9 sudo -u postgresml psql -d postgresml  
  
docker exec -ti pgml bash

su - postgresml

$ psql
\psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))
Type "help" for help.

postgresml=# \dx
                      List of installed extensions
  Name   | Version |   Schema   |              Description              
---------+---------+------------+---------------------------------------
 pgml    | 2.7.9   | pgml       | pgml:  Created by the PostgresML team
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

打开web, 体验postgresml dashboard:

推送到阿里云镜像仓库:

docker images
REPOSITORY                                                     TAG                    IMAGE ID       CREATED          SIZE
postgresml/15                                                  v2.7.9                 064b5796d5f1   56 minutes ago   8.08GB

docker tag 064b5796d5f1 registry.cn-hangzhou.aliyuncs.com/digoal/opensource_database:pgml15_v2.7.9  
  
docker push registry.cn-hangzhou.aliyuncs.com/digoal/opensource_database:pgml15_v2.7.9  

你如果想体验postgresml, 可以我做好的这个镜像:

docker pull registry.cn-hangzhou.aliyuncs.com/digoal/opensource_database:pgml15_v2.7.9  

环境

OS USER: postgresml , postgres    
  
  
DB SUPERUSER: root  
DB USER: postgresml  
DB PASSWORD: postgresml  
DB NAME: postgresml  
DB OWNER: postgresml  
SCHEMA: pgml  
EXTENSION: pgml  
PORT: 5432  
  
  
dashboard:    
BIN: /usr/bin/pgml-dashboard  
PORT: 8000  
WEB: (在宿主机打开)    
http://localhost:8000/    

遇到的问题

pgml需要在线连模型集市 huggingface 获取模型, 如果网络不通会报错.

postgresml=# SELECT pgml.transform(
        'translation_en_to_fr',
        inputs => ARRAY[
            'Welcome to the future!',
            'Where have you been all this time?'
        ]
    ) AS french;

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like t5-base is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

直接在已有PG实例上编译安装pgml插件

https://postgresml.org/docs/resources/developer-docs/installation

配置cargo源, 参考: https://mirrors.ustc.edu.cn/help/crates.io-index.html

# export CARGO_HOME=/root            
            
# mkdir -vp ${CARGO_HOME:-$HOME/.cargo}            
            
# vi ${CARGO_HOME:-$HOME/.cargo}/config            
          
[source.crates-io]            
replace-with = 'ustc'            
            
[source.ustc]            
registry = "sparse+https://mirrors.ustc.edu.cn/crates.io-index/"     
# 1 install rust       
# su - root     
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh    
source "$HOME/.cargo/env"  
# 下载pgml项目  
cd /tmp  
git clone --depth 1 https://github.com/postgresml/postgresml  
  
# 下载依赖代码  
cd /tmp/postgresml  
git submodule update --init --recursive  
  
# 安装pgml插件  
cd /tmp/postgresml/pgml-extension  
  
# pgrx 版本请参考 Cargo.toml 文件决定  
cargo install --locked --version 0.11.2 cargo-pgrx      
  
cargo pgrx init    # create PGRX_HOME 后, 立即ctrl^c 退出          
cargo pgrx init --pg15=`which pg_config`    # 不用管报警     
  
PGRX_IGNORE_RUST_VERSIONS=y cargo pgrx install --release --pg-config `which pg_config`      

参考

https://github.com/docker-library/official-images/blob/master/library/ubuntu

https://github.com/postgresml/postgresml

https://github.com/postgresml/postgresml/tree/master/docker

https://postgresml.org/docs/guides/setup/v2/installation

《PostgresML=模型集市+向量数据库+自定义模型 : 用postgresml体验AI应用(图像搜索、推荐系统和自然语言处理)与向量检索》

《postgresML - end-to-end machine learning system》

digoal's wechat