Question about the performance of an OCR service deployed with Serving #1861

Closed
xiulianzw opened this issue Oct 12, 2022 · 1 comment

Comments

@xiulianzw

Server OS: Ubuntu 16.04
CPU cores: 16
RAM: 64 GB
Image: registry.baidubce.com/paddlepaddle/serving:0.9.0-cuda10.1-cudnn7-runtime
OCR model: PP-OCRv3
Config file:

#rpc port; rpc_port and http_port may not both be empty. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
rpc_port: 18090

#http port; rpc_port and http_port may not both be empty. When rpc_port is available and http_port is empty, no http_port is generated automatically
http_port: 9999

#worker_num, the maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building a gRPC server and a DAG
##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the main thread's gRPC thread pool
worker_num: 20

#build_dag_each_worker: False, the framework creates one DAG inside the process; True, the framework creates an independent DAG in each process
build_dag_each_worker: False

dag:
    #Op resource type: True for the thread model; False for the process model
    is_thread_op: False

    #Number of retries
    retry: 3

    #Profiling: True generates Timeline performance data (with some performance impact); False disables it
    use_profile: False

    tracer:
        interval_s: 10
op:
    det:
        #Concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 4

        timeout: -1
        retry: 10

        #When the op config has no server_endpoints, the local service settings are read from local_service_conf
        local_service_conf:
            #Client type: brpc, grpc, or local_predictor. local_predictor does not start a Serving service; prediction runs in-process
            client_type: local_predictor

            #Path to the det model
            model_config: ./ppocr_det_v3_serving

            #Fetch list, using the alias_name of fetch_var in client_config; if unset, all output variables are fetched
            #fetch_list: ["sigmoid_0.tmp_0"]

            #batch_size: 4
            #auto_batching_timeout: 100

            #Compute device IDs: when devices is "" or unset, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on the listed GPU cards
            devices: "0"
            #Enable TensorRT acceleration for text detection
            #device_type: 2
            #precision: "fp16"

            #use_mkldnn: when enabling MKL-DNN, ir_optim=True must also be set, otherwise it has no effect
            #use_mkldnn: True
            mem_optim: True
            ir_optim: True
    rec:
        #Concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 4

        #Timeout in ms
        timeout: -1

        #Number of retries for Serving interaction; no retries by default
        retry: 10

        #When the op config has no server_endpoints, the local service settings are read from local_service_conf
        local_service_conf:

            #Client type: brpc, grpc, or local_predictor. local_predictor does not start a Serving service; prediction runs in-process
            client_type: local_predictor

            #Path to the rec model
            model_config: ./ppocr_rec_v3_serving

            #Fetch list, using the alias_name of fetch_var in client_config; if unset, all output variables are fetched
            #fetch_list: ["softmax_5.tmp_0"]

            #batch_size: 4
            #auto_batching_timeout: 1000

            #Compute device IDs: when devices is "" or unset, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on the listed GPU cards
            devices: "0"

            #Enable TensorRT acceleration for text recognition
            #device_type: 2
            #precision: "fp16"

            #use_mkldnn: when enabling MKL-DNN, ir_optim=True must also be set, otherwise it has no effect
            #use_mkldnn: True
            mem_optim: True
            ir_optim: True

The problem found in testing: the server has two GPUs. When I deploy the model on a single card, the FPS reaches 30; when I start two services, the FPS drops straight to 16, so running two services is no different from running one. Why is the gap so large?
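One thing worth checking: the config above pins both ops to `devices: "0"`, so if the second service instance was launched from the same file, both instances contend for GPU 0 (and the det/rec preprocessing of both pipelines also shares the same 16 CPU cores). A minimal sketch of the overrides the second instance's config would need, assuming the second card is card 1; the port numbers here are illustrative:

```yaml
# Hypothetical config for a second service instance: distinct ports
# and a distinct GPU card so the two instances do not share GPU 0.
rpc_port: 18091
http_port: 9998
op:
    det:
        local_service_conf:
            devices: "1"
    rec:
        local_service_conf:
            devices: "1"
```

If the two instances are already on separate cards and throughput still halves, CPU-side preprocessing contention is the next suspect to profile (the `tracer` section above reports per-op timings every 10 s).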

@dizhenx

dizhenx commented Jun 29, 2023

(Quotes the original post and configuration above in full.)

How can I add a CORS (cross-origin) configuration to this web_service?
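I am not aware of a CORS switch in the Serving pipeline's built-in HTTP server, so one common workaround is to put a thin proxy (or nginx) in front of it that attaches the CORS headers. A minimal stdlib-only sketch, assuming the pipeline listens on the `http_port: 9999` from the config above; the proxy port 8000 and handler names are illustrative:

```python
# Hypothetical CORS proxy in front of the Serving HTTP endpoint.
# Browsers talk to this proxy; it forwards POSTs to the pipeline
# service and adds the CORS headers to every response.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen, Request

UPSTREAM = "http://127.0.0.1:9999"  # http_port from the config above

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",  # tighten to your domain in production
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

class CorsProxy(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        # Answer CORS preflight requests locally, without hitting upstream.
        self.send_response(204)
        for k, v in CORS_HEADERS.items():
            self.send_header(k, v)
        self.end_headers()

    def do_POST(self):
        # Forward the request body to the Serving endpoint unchanged.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        resp = urlopen(Request(UPSTREAM + self.path, data=body,
                               headers={"Content-Type": "application/json"}))
        data = resp.read()
        self.send_response(resp.status)
        for k, v in CORS_HEADERS.items():
            self.send_header(k, v)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), CorsProxy).serve_forever()
```

The same three `Access-Control-*` headers can instead be added in an nginx `location` block if a reverse proxy is already in place.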

@paddle-bot (bot) closed this as completed on Jul 2, 2024