
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle (『飞桨』) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)

SNSerHello/Paddle

 
 


Ubuntu 20.04 LTS

Standard CUDA Environment Setup

CPU Version

$ mkdir build
$ cd build
$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=OFF \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release
$ make -j
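The backtick expression passed to -DPY_VERSION simply extracts the major.minor version of the interpreter on PATH. As a quick sanity check, the pipeline can be run on its own (shown here with `python3`, since a bare `python` may not exist on a fresh system):

```shell
# Prints e.g. "3.8": `python3 --version` emits "Python 3.8.13";
# the first cut keeps the version field, the second keeps major.minor.
python3 --version | cut -d ' ' -f 2 | cut -d '.' -f -2
```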

GPU Version

$ mkdir build
$ cd build
$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release
$ make -j

Anaconda3 Environment Setup

After setting up CUDA 10.1 + cuDNN 7.6.5 following py37-paddle-dev, that environment can be used to compile PaddlePaddle.

$ conda activate py37-paddle-dev
(py37-paddle-dev) $ mkdir -p $CONDA_PREFIX/etc/conda/activate.d
(py37-paddle-dev) $ nano $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
The file contents are as follows:
CUDA_ROOT=$CONDA_PREFIX
export LD_LIBRARY_PATH=$CUDA_ROOT/lib:$LD_LIBRARY_PATH
(py37-paddle-dev) $ mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
(py37-paddle-dev) $ nano $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
The file contents are as follows:
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | cut -d : -f 2-`
(py37-paddle-dev) $ ln -s $CONDA_PREFIX/pkgs/cuda-toolkit/lib64/libcudadevrt.a  $CONDA_PREFIX/lib/libcudadevrt.a
(py37-paddle-dev) $ conda deactivate
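The two nano sessions above can be replaced by one non-interactive snippet. This is only a convenience sketch writing the same file contents; it falls back to a scratch directory when no conda environment is active, so it can be dry-run outside conda:

```shell
# Write the conda activate/deactivate hooks without an editor.
# PREFIX normally points at the active env; the /tmp fallback is for dry runs only.
PREFIX=${CONDA_PREFIX:-/tmp/paddle-env-demo}
mkdir -p "$PREFIX/etc/conda/activate.d" "$PREFIX/etc/conda/deactivate.d"
cat > "$PREFIX/etc/conda/activate.d/env_vars.sh" <<'EOF'
CUDA_ROOT=$CONDA_PREFIX
export LD_LIBRARY_PATH=$CUDA_ROOT/lib:$LD_LIBRARY_PATH
EOF
cat > "$PREFIX/etc/conda/deactivate.d/env_vars.sh" <<'EOF'
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | cut -d : -f 2-`
EOF
```

Conda sources every `*.sh` script in activate.d/deactivate.d on activation and deactivation, which is why the hooks must carry the `.sh` extension.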

CPU Version

$ conda activate py37-paddle-dev
(py37-paddle-dev) $ mkdir build
(py37-paddle-dev) $ cd build
(py37-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=OFF \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release
(py37-paddle-dev) $ make -j

GPU Version

$ conda activate py37-paddle-dev
(py37-paddle-dev) $ mkdir build
(py37-paddle-dev) $ cd build
(py37-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI
(py37-paddle-dev) $ make -j

For NVIDIA 3070/3080/3090 GPUs, CUDA 10.x does not support compute capability compute_86, so you can fall back to compute_75 for the build, as shown below:

(py37-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DCUDA_ARCH_NAME=Ampere \
	-DCMAKE_CUDA_ARCHITECTURES=75 \
	-DCMAKE_MATCH_1=75 \
	-DCMAKE_MATCH_2=75

Notes

  • CUDA 10.x + cuDNN 7.6 requires NVIDIA driver release 396, 384.111+, 410, 418.xx, or 440.30. With a newer driver, the first initialization can take a very long time; you can verify the installation by running the following:

    In [1]: import paddle
    In [2]: paddle.utils.run_check()
    Running verify PaddlePaddle program ...
    PaddlePaddle works well on 1 GPU.
    PaddlePaddle works well on 1 GPUs.
    PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
    In [3]: paddle.fluid.install_check.run_check()
    Running Verify Fluid Program ...
    Your Paddle Fluid works well on SINGLE GPU or CPU.
    Your Paddle Fluid works well on MUTIPLE GPU or CPU.
    Your Paddle Fluid is installed successfully! Let's start deep Learning with Paddle Fluid now

Python 3.8 + CUDA 11.3 + cuDNN 8.2

For the py38-paddle-dev environment setup, see py38-paddle-dev.yaml. CUDA 11.3 supports compute_86, and cuDNN ≥ 8.0.2 supports CUDNN_FMA_MATH; see cudnnMathType_t.

Training Version

$ conda env create --file py38-paddle-dev.yaml
$ conda activate py38-paddle-dev
(py38-paddle-dev) $ mkdir -p $CONDA_PREFIX/etc/conda/activate.d
(py38-paddle-dev) $ nano $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
The file contents are as follows:
CUDA_ROOT=$CONDA_PREFIX
export LD_LIBRARY_PATH=$CUDA_ROOT/lib:$LD_LIBRARY_PATH
(py38-paddle-dev) $ mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
(py38-paddle-dev) $ nano $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh
The file contents are as follows:
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | cut -d : -f 2-`
(py38-paddle-dev) $ conda deactivate
# Use the newly configured environment
$ ulimit -n 4096
$ conda activate py38-paddle-dev
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI
(py38-paddle-dev) $ make -j

Note: Ubuntu 20.04 LTS defaults to 1024 open files per process. If running ulimit -n 4096 before every build is tedious, add the setting to ~/.bashrc so that every new bash session picks it up automatically; alternatively, write a one-shot PaddlePaddle build script that sets it for you.
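For the ~/.bashrc route mentioned above, a guarded append avoids adding the line twice (a small sketch; adjust the limit to taste):

```shell
# Add the ulimit setting to ~/.bashrc once; skip if it is already there.
grep -qxF 'ulimit -n 4096' ~/.bashrc 2>/dev/null || echo 'ulimit -n 4096' >> ~/.bashrc
```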

Inference Version

$ conda activate py38-paddle-dev
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DON_INFER=ON
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

After the build completes, two new directories appear under build: paddle_inference_c_install_dir and paddle_inference_install_dir.

paddle_inference_c_install_dir
paddle_inference_c_install_dir
├── paddle
│   ├── include
│   │   ├── pd_common.h
│   │   ├── pd_config.h
│   │   ├── pd_inference_api.h
│   │   ├── pd_predictor.h
│   │   ├── pd_tensor.h
│   │   ├── pd_types.h
│   │   └── pd_utils.h
│   └── lib
│       ├── libpaddle_inference_c.a
│       └── libpaddle_inference_c.so
├── third_party
│   └── install
│       ├── cryptopp
│       │   ├── include
│       │   └── lib
│       ├── gflags
│       │   ├── include
│       │   └── lib
│       ├── glog
│       │   ├── include
│       │   └── lib
│       ├── mkldnn
│       │   ├── include
│       │   └── lib
│       ├── mklml
│       │   ├── include
│       │   └── lib
│       ├── onnxruntime
│       │   ├── include
│       │   └── lib
│       ├── paddle2onnx
│       │   ├── include
│       │   └── lib
│       ├── protobuf
│       │   ├── include
│       │   └── lib
│       ├── utf8proc
│       │   ├── include
│       │   └── lib
│       └── xxhash
│           ├── include
│           └── lib
└── version.txt
paddle_inference_install_dir
paddle_inference_install_dir
├── CMakeCache.txt
├── paddle
│   ├── include
│   │   ├── crypto
│   │   │   └── cipher.h
│   │   ├── experimental
│   │   │   ├── ext_all.h
│   │   │   ├── phi
│   │   │   └── utils
│   │   ├── internal
│   │   │   └── framework.pb.h
│   │   ├── paddle_analysis_config.h
│   │   ├── paddle_api.h
│   │   ├── paddle_infer_contrib.h
│   │   ├── paddle_infer_declare.h
│   │   ├── paddle_inference_api.h
│   │   ├── paddle_mkldnn_quantizer_config.h
│   │   ├── paddle_pass_builder.h
│   │   └── paddle_tensor.h
│   └── lib
│       ├── libpaddle_inference.a
│       └── libpaddle_inference.so
├── third_party
│   ├── externalError
│   │   └── data
│   │       └── data
│   ├── install
│   │   ├── cryptopp
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── gflags
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── glog
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── mkldnn
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── mklml
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── onnxruntime
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── paddle2onnx
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── protobuf
│   │   │   ├── include
│   │   │   └── lib
│   │   ├── utf8proc
│   │   │   ├── include
│   │   │   └── lib
│   │   └── xxhash
│   │       ├── include
│   │       └── lib
│   └── threadpool
│       └── ThreadPool.h
└── version.txt

Building PaddlePaddle v2.3.1

Training Build

$ conda activate py38-paddle-dev
(py38-paddle-dev) $ git checkout v2.3.1
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ export PADDLE_VERSION=2.3.1
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

Training Build (NVIDIA 3070/3080/3090)

$ conda activate py38-paddle-dev
(py38-paddle-dev) $ git checkout v2.3.1
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ export PADDLE_VERSION=2.3.1
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DCMAKE_CUDA_ARCHITECTURES=86
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

Inference Build

$ conda activate py38-paddle-dev
(py38-paddle-dev) $ git checkout v2.3.1
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ export PADDLE_VERSION=2.3.1
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DWITH_ONNXRUNTIME=ON \
	-DON_INFER=ON
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

Inference Build (NVIDIA 3070/3080/3090)

$ conda activate py38-paddle-dev
(py38-paddle-dev) $ git checkout v2.3.1
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ export PADDLE_VERSION=2.3.1
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DWITH_ONNXRUNTIME=ON \
	-DON_INFER=ON \
	-DCMAKE_CUDA_ARCHITECTURES=86
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

Inference Build 2 (NVIDIA 3070/3080/3090)

$ sudo apt install libmkl-full-dev libgoogle-perftools-dev google-perftools
$ conda activate py38-paddle-dev
(py38-paddle-dev) $ git checkout v2.3.1
(py38-paddle-dev) $ mkdir build
(py38-paddle-dev) $ cd build
(py38-paddle-dev) $ export PADDLE_VERSION=2.3.1
(py38-paddle-dev) $ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DWITH_LITE=ON \
	-DCMAKE_CXX_FLAGS=-I/usr/include/mkl \
	-DWITH_ONEMKL=ON \
	-DWITH_PROFILER=ON \
	-DWITH_ONNXRUNTIME=ON \
	-DON_INFER=ON \
	-DCMAKE_CUDA_ARCHITECTURES=86
(py38-paddle-dev) $ ulimit -n 4096
(py38-paddle-dev) $ make -j

Docker Environment Setup

GPU Version (Python 3.7 + CUDA 10.2 + cuDNN 7.6)

$ sudo docker pull paddlepaddle/paddle:2.3.1-gpu-cuda10.2-cudnn7
$ sudo nvidia-docker run --rm -itv your_path/Paddle:/workspace -w /workspace paddlepaddle/paddle:2.3.1-gpu-cuda10.2-cudnn7 /bin/bash
$ mkdir build
$ cd build
$ export PADDLE_VERSION=2.3.1
$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DWITH_LITE=ON \
	-DWITH_ONEMKL=OFF \
	-DWITH_ONNXRUNTIME=ON \
	-DON_INFER=ON
$ make -j

GPU Version (Python 3.7 + CUDA 11.2 + cuDNN 8.1)

$ sudo docker pull paddlepaddle/paddle:2.3.1-gpu-cuda11.2-cudnn8
$ sudo nvidia-docker run --rm -itv your_path/Paddle:/workspace -w /workspace paddlepaddle/paddle:2.3.1-gpu-cuda11.2-cudnn8 /bin/bash
$ mkdir build
$ cd build
$ export PADDLE_VERSION=2.3.1
$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DWITH_LITE=ON \
	-DWITH_ONEMKL=OFF \
	-DWITH_ONNXRUNTIME=ON \
	-DON_INFER=ON
$ make -j

FAQ

问题1unsupported GNU version! gcc versions later than 8 are not supported!

Solution: see the GCC environment setup guide; installing GCC v8.4.0 is recommended, which also resolves the AVX intrinsic compilation problem.

问题2Too many open files

Solution: the default limit is 1024 open files; raise it to a larger value, e.g. 4096, as shown below:

$ ulimit -n
$ ulimit -n 4096
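A small sketch that raises the soft limit only when it is actually below the target; note that the soft limit cannot be raised above the hard limit, hence the guard:

```shell
# Raise the soft open-file limit to 4096 if it is currently lower.
cur=$(ulimit -Sn)
if [ "$cur" != "unlimited" ] && [ "$cur" -lt 4096 ]; then
    ulimit -Sn 4096 2>/dev/null || echo "hard limit $(ulimit -Hn) is below 4096"
fi
ulimit -Sn
```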

问题3nvcc fatal : Unsupported gpu architecture 'compute_86'

Solution:

1. Check the GPU architectures supported by the current nvcc; the listing shows the highest is compute_75:

$ nvcc --help | egrep -i compute
        specified with this option must be a 'virtual' architecture (such as compute_50).
        --gpu-architecture=sm_50' is equivalent to 'nvcc --gpu-architecture=compute_50
        --gpu-code=sm_50,compute_50'.
        Allowed values for this option:  'compute_30','compute_32','compute_35',
        'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
        'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',
        (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
        For instance, '--gpu-architecture=compute_35' is not compatible with '--gpu-code=sm_30',
        because the earlier compilation stages will assume the availability of 'compute_35'
        Allowed values for this option:  'compute_30','compute_32','compute_35',
        'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
        'compute_62','compute_70','compute_72','compute_75','sm_30','sm_32','sm_35',

2. Change the relevant -gencode flags in cmake/cuda.cmake to -gencode arch=compute_75,code=sm_75. If -DCUDA_ARCH_NAME=Ampere (or another value, such as Turing) was specified on the cmake command line, this step can be skipped.

3. Add build options targeting compute_75

The NVIDIA 3070/3080/3090 are Ampere-architecture cards, but under CUDA 10.2 NCCL only supports up to compute_75; the configuration below demonstrates the corresponding setup. For other cards, consult NVIDIA's Compare GeForce Graphics Cards page to determine the architecture.

sm_35 and sm_37: basic features + Kepler support + unified memory programming + dynamic parallelism support
sm_50, sm_52, and sm_53: + Maxwell support
sm_60, sm_61, and sm_62: + Pascal support
sm_70 and sm_72: + Volta support
sm_75: + Turing support
sm_80, sm_86, and sm_87: + NVIDIA Ampere GPU architecture support

source: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list

| | RTX 30 Series | RTX 20 Series | GTX 16 Series | GTX 10 Series | GTX 9 Series |
|---|---|---|---|---|---|
| Architecture Name | Ampere | Turing | Turing | Pascal | Maxwell |
| Streaming Multiprocessors | 2x FP32 | 1x FP32 | 1x FP32 | 1x FP32 | 1x FP32 |
| Ray Tracing Cores | Gen 2 | Gen 1 | - | - | - |
| Tensor Cores (AI) | Gen 3 | Gen 2 | - | - | - |
| Memory | Up to 24 GB GDDR6X | Up to 11 GB GDDR6 | Up to 6 GB GDDR6 | Up to 11 GB GDDR5X | Up to 6 GB GDDR5 |
| NVIDIA DLSS | Yes | Yes | - | - | - |
| NVIDIA Reflex | Yes | Yes | Yes | Yes | Yes |
| NVIDIA Broadcast | Yes | Yes | GTX 1650 Super or 1660 Super | - | - |
| NVIDIA GeForce Experience | Yes | Yes | Yes | Yes | Yes |
| Game Ready Drivers | Yes | Yes | Yes | Yes | Yes |
| NVIDIA Studio Drivers | Yes | Yes | Yes | Yes | - |
| NVIDIA ShadowPlay | Yes | Yes | Yes | Yes | Yes |
| NVIDIA Highlights | Yes | Yes | Yes | Yes | Yes |
| NVIDIA Ansel | Yes | Yes | Yes | Yes | Yes |
| NVIDIA Freestyle | Yes | Yes | Yes | Yes | Yes |
| VR Ready | Yes | Yes | GTX 1650 Super or higher | GTX 1060 or higher | GTX 970 or higher |
| NVIDIA Omniverse | Yes | Yes | - | - | - |
| PCIe | Gen 4 | Gen 3 | Gen 3 | Gen 3 | Gen 3 |
| NVIDIA Encoder (NVENC) | Gen 7 | Gen 7 | Gen 6 | Gen 6 | Gen 5 |
| NVIDIA Decoder (NVDEC) | Gen 5 | Gen 4 | Gen 4 | Gen 3 | Gen 2 |
| CUDA Capability | 8.6 | 7.5 | 7.5 | 6.1 | 5.2 |
| DX12 Ultimate | Yes | Yes | - | - | - |
| Video Outputs | HDMI 2.1, DisplayPort 1.4a | HDMI 2.0b, DisplayPort 1.4a | HDMI 2.0b, DisplayPort 1.4a | HDMI 2.0b, DisplayPort 1.4a | HDMI 2.0, DisplayPort 1.2 |

source: https://www.nvidia.com/en-us/geforce/graphics-cards/compare/?section=compare-specs

| cuDNN Package | Supported NVIDIA Hardware | CUDA Toolkit Version | CUDA Compute Capability | Supports static linking? |
|---|---|---|---|---|
| cuDNN 8.4.1 for CUDA 11.x | NVIDIA Ampere Architecture, NVIDIA Turing™, NVIDIA Volta™, NVIDIA Pascal™, NVIDIA Maxwell®, NVIDIA Kepler™ | 11.7, 11.6, 11.5, 11.4, 11.3 | SM 3.5 and later | Yes |
| | | 11.2, 11.1, 11.0 | SM 3.5 and later | No |
| cuDNN 8.4.1 for CUDA 10.2 | NVIDIA Turing, NVIDIA Volta, Xavier™, NVIDIA Pascal, NVIDIA Maxwell, NVIDIA Kepler | 10.2 | SM 3.0 and later | Yes |

source: https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html

$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DCUDA_ARCH_NAME=Ampere \
	-DCMAKE_CUDA_ARCHITECTURES=75 \
	-DCMAKE_MATCH_1=75 \
	-DCMAKE_MATCH_2=75

问题4error: identifier "__builtin_ia32_sqrtsd_round" is undefined

Solution 1: upgrade GCC to v8.4.0; see the GCC v8.4.0 installation guide.

Solution 2: modify CMakeLists.txt

option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" ${AVX_FOUND}) # line 243
Change it to:
option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" OFF)
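The same one-line change can be scripted with sed rather than edited by hand. The sketch below demonstrates the substitution on a scratch copy (the line number in CMakeLists.txt may drift between releases, so the pattern match is keyed on the option text, not the line):

```shell
# Demonstrate the WITH_AVX default flip on a scratch file.
f=/tmp/paddle-avx-demo-cmakelists.txt
echo 'option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" ${AVX_FOUND})' > "$f"
sed -i 's/\${AVX_FOUND})/OFF)/' "$f"
cat "$f"   # option(WITH_AVX "Compile PaddlePaddle with AVX intrinsics" OFF)
```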

问题5Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC, empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104" for policy details. Use the cmake_policy command to set the policy and suppress this warning.

Solution: add a CMAKE_CUDA_ARCHITECTURES definition when running cmake. Assuming an NVIDIA 3070/3080/3090 card:

$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI \
	-DCMAKE_CUDA_ARCHITECTURES=86

For other GPUs, the appropriate value can be found in the nvcc --help output shown under Issue 3.

Issue 6: How do I automatically set the Python version for the build?

$ cmake .. \
	-DPY_VERSION=`python --version | cut -d ' ' -f 2 | cut -d '.' -f -2` \
	-DWITH_GPU=ON \
	-DWITH_TESTING=OFF \
	-DCMAKE_BUILD_TYPE=Release \
	-DCUDA_TOOLKIT_ROOT_DIR=$CONDA_PREFIX \
	-DCUDA_SDK_ROOT_DIR=$CONDA_PREFIX \
	-DCUDNN_ROOT=$CONDA_PREFIX \
	-DNCCL_ROOT=$CONDA_PREFIX \
	-DCUPTI_ROOT=$CONDA_PREFIX/pkgs/cuda-toolkit/extras/CUPTI

References


Welcome to the PaddlePaddle GitHub.

PaddlePaddle, as the first independent R&D deep learning platform in China, has been officially open-sourced to professional communities since 2016. It is an industrial platform with advanced technologies and rich features that cover core deep learning frameworks, basic model libraries, end-to-end development kits, tools & components as well as service platforms. PaddlePaddle originated from industrial practice with dedication and commitment to industrialization. It has been adopted across a wide range of sectors including manufacturing, agriculture, and enterprise service, serving more than 5.35 million developers and 200,000 companies and generating 670,000 models. With such advantages, PaddlePaddle has helped an increasing number of partners commercialize AI.

Installation

Latest PaddlePaddle Release: v2.4

Our vision is to enable deep learning for everyone via PaddlePaddle. Please refer to our release announcement to track the latest features of PaddlePaddle.

Install Latest Stable Release:

# CPU
pip install paddlepaddle
# GPU
pip install paddlepaddle-gpu

For more information about installation, please view Quick Install

Now our developers can acquire Tesla V100 online computing resources for free. If you create a program by AI Studio, you will obtain 8 hours to train models online per day. Click here to start.

FOUR LEADING TECHNOLOGIES

  • Agile Framework for Industrial Development of Deep Neural Networks

    The PaddlePaddle deep learning framework facilitates the development while lowering the technical burden, through leveraging a programmable scheme to architect the neural networks. It supports both declarative programming and imperative programming with both development flexibility and high runtime performance preserved. The neural architectures could be automatically designed by algorithms with better performance than the ones designed by human experts.

  • Support Ultra-Large-Scale Training of Deep Neural Networks

    PaddlePaddle has made breakthroughs in ultra-large-scale deep neural networks training. It launched the world's first large-scale open-source training platform that supports the training of deep networks with 100 billion features and trillions of parameters using data sources distributed over hundreds of nodes. PaddlePaddle overcomes the online deep learning challenges for ultra-large-scale deep learning models, and further achieved real-time model updating with more than 1 trillion parameters. Click here to learn more

  • High-Performance Inference Engines for Comprehensive Deployment Environments

    PaddlePaddle is not only compatible with models trained in 3rd party open-source frameworks, but also offers complete inference products for various production scenarios. Our inference product line includes Paddle Inference: a native inference library for high-performance server and cloud inference; Paddle Serving: a service-oriented framework suitable for distributed and pipeline productions; Paddle Lite: an ultra-lightweight inference engine for mobile and IoT environments; and Paddle.js: a frontend inference engine for browsers and mini-apps. Furthermore, through extensive optimization for the leading hardware in each scenario, Paddle inference engines outperform most of the other mainstream frameworks.

  • Industry-Oriented Models and Libraries with Open Source Repositories

    PaddlePaddle includes and maintains more than 100 mainstream models that have been practiced and polished for a long time in the industry. Some of these models have won major prizes from key international competitions. In the meanwhile, PaddlePaddle has further more than 200 pre-training models (some of them with source codes) to facilitate the rapid development of industrial applications. Click here to learn more

Documentation

We provide English and Chinese documentation.

  • Guides

    You might want to start from how to implement deep learning basics with PaddlePaddle.

  • Practice

    Once you are familiar with Fluid, the next step is to build a more efficient model or invent your own Operator.

  • API Reference

    Our new API enables much shorter programs.

  • How to Contribute

    We appreciate your contributions!

Communication

  • Github Issues: bug reports, feature requests, install issues, usage issues, etc.
  • QQ discussion group: 441226485 (PaddlePaddle).
  • Forums: discuss implementations, research, etc.

Courses

  • Server Deployments: Courses introducing high performance server deployments via local and remote services.
  • Edge Deployments: Courses introducing edge deployments from mobile, IoT to web and applets.

Copyright and License

PaddlePaddle is provided under the Apache-2.0 license.
