rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm fails on CentOS7 #94

Closed
yug0slav opened this issue Oct 7, 2021 · 4 comments

@yug0slav

yug0slav commented Oct 7, 2021

  • OS: CentOS Linux release 7.9.2009 (Core)

  • Kernel: 3.10.0-1160.42.2.el7.x86_64

  • Nvidia Driver: NVIDIA-SMI 470.57.02 / Driver Version: 470.57.02 / CUDA Version: 11.4

  • Mellanox driver: MLNX_OFED_LINUX-5.4-1.0.3.0

  • Steps:

# ./build_module.sh

Building source rpm for nvidia_peer_memory...

Built: /tmp/nvidia_peer_memory-1.2-0.src.rpm

To install run on RPM based OS:
    # rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm
    # rpm -ivh <path to generated binary rpm file>

# rpmbuild --rebuild /tmp/nvidia_peer_memory-1.2-0.src.rpm
Installing /tmp/nvidia_peer_memory-1.2-0.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.kaBNqq
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf nvidia_peer_memory-1.2
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/nvidia_peer_memory-1.2.tar.gz
+ /usr/bin/tar -xvvf -
drwx------ root/root         0 2021-10-07 13:27 nvidia_peer_memory-1.2/
-rwx------ root/root      4949 2021-10-07 13:27 nvidia_peer_memory-1.2/Makefile
-rwx------ root/root      3321 2021-10-07 13:27 nvidia_peer_memory-1.2/README.md
-rwx------ root/root      2281 2021-10-07 13:27 nvidia_peer_memory-1.2/build_module.sh
-rwx------ root/root      5817 2021-10-07 13:27 nvidia_peer_memory-1.2/compat_nv-p2p.h
-rwx------ root/root      4208 2021-10-07 13:27 nvidia_peer_memory-1.2/create_nv.symvers.sh
drwx------ root/root         0 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/
-rwx------ root/root      2559 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/changelog
-rwx------ root/root         2 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/compat
-rwx------ root/root       910 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/control
-rwx------ root/root        10 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory-dkms.dkms
-rwx------ root/root        24 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory-dkms.install
-rwx------ root/root       245 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory-dkms.postinst
-rwx------ root/root        81 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory.install
-rwx------ root/root       198 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory.postinst
-rwx------ root/root       199 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/nvidia-peer-memory.prerm
drwx------ root/root         0 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/patches/
-rwx------ root/root       369 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/patches/dkms_name.patch
-rwx------ root/root        16 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/patches/series
-rwx------ root/root       557 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/rules
drwx------ root/root         0 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/source/
-rwx------ root/root        13 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/source/format
-rwx------ root/root       431 2021-10-07 13:27 nvidia_peer_memory-1.2/debian/updateInit.sh
-rwx------ root/root       614 2021-10-07 13:27 nvidia_peer_memory-1.2/dkms.conf
-rwx------ root/root      2756 2021-10-07 13:27 nvidia_peer_memory-1.2/nv_peer_mem
-rwx------ root/root     15135 2021-10-07 13:27 nvidia_peer_memory-1.2/nv_peer_mem.c
-rwx------ root/root        47 2021-10-07 13:27 nvidia_peer_memory-1.2/nv_peer_mem.conf
-rwx------ root/root       241 2021-10-07 13:27 nvidia_peer_memory-1.2/nv_peer_mem.upstart
-rwx------ root/root      3299 2021-10-07 13:27 nvidia_peer_memory-1.2/nvidia_peer_memory.spec
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd nvidia_peer_memory-1.2
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.3dxLME
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nvidia_peer_memory-1.2
+ export KVER=3.10.0-1160.42.2.el7.x86_64
+ KVER=3.10.0-1160.42.2.el7.x86_64
+ make KVER=3.10.0-1160.42.2.el7.x86_64 all
INFO: Building with MLNX_OFED from: /usr/src/ofa_kernel/default
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/create_nv.symvers.sh 3.10.0-1160.42.2.el7.x86_64
'/lib/modules/3.10.0-1160.42.2.el7.x86_64/extra/nvidia.ko.xz' -> './nvidia.ko.xz'
Getting symbol versions from nvidia.ko ...
Created: /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv.symvers
Found /usr/src/nvidia-470.57.02//nvidia/nv-p2p.h
/bin/cp -f /usr/src/nvidia-470.57.02//nvidia/nv-p2p.h /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv-p2p.h
echo -n "" > my.symvers
# get OFED symbols when building with MLNX_OFED
/bin/cp -f /usr/src/ofa_kernel/default/Module.symvers my.symvers
cat nv.symvers >> my.symvers
make -C /lib/modules/3.10.0-1160.42.2.el7.x86_64/build  M=/root/rpmbuild/BUILD/nvidia_peer_memory-1.2 KBUILD_EXTRA_SYMBOLS="/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/my.symvers" modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64'
INFO: Building with MLNX_OFED from: /usr/src/ofa_kernel/default
awk: fatal: cannot open file `nvidia_peer_memory.spec' for reading (No such file or directory)
  CC [M]  /root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.o
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:94:9: note: #pragma message: Enable nvidia_p2p_dma_map_pages support
 #pragma message("Enable nvidia_p2p_dma_map_pages support")
         ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: variable 'nv_mem_client_ex' has initializer but incomplete type
 static struct peer_memory_client_ex nv_mem_client_ex = { .client = {
               ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: unknown field 'client' specified in initializer
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: extra brace group at end of initializer
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:468:15: error: (near initialization for 'nv_mem_client_ex')
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:476:1: warning: excess elements in struct initializer [enabled by default]
 }};
 ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:476:1: warning: (near initialization for 'nv_mem_client_ex') [enabled by default]
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c: In function 'nv_mem_client_init':
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:483:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  strcpy(nv_mem_client_ex.client.name, DRV_NAME);
  ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:488:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  strcpy(nv_mem_client_ex.client.version, DRV_VERSION);
  ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:492:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  nv_mem_client_ex.client.version[IB_PEER_MEMORY_VER_MAX-1] = 1;
  ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:493:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  nv_mem_client_ex.ex_size = sizeof(struct peer_memory_client_ex);
  ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:493:36: error: invalid application of 'sizeof' to incomplete type 'struct peer_memory_client_ex'
  nv_mem_client_ex.ex_size = sizeof(struct peer_memory_client_ex);
                                    ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  nv_mem_client_ex.flags = PEER_MEM_INVALIDATE_UNMAPS;
  ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:27: error: 'PEER_MEM_INVALIDATE_UNMAPS' undeclared (first use in this function)
  nv_mem_client_ex.flags = PEER_MEM_INVALIDATE_UNMAPS;
                           ^
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:499:27: note: each undeclared identifier is reported only once for each function it appears in
/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.c:501:2: error: invalid use of undefined type 'struct peer_memory_client_ex'
  reg_handle = ib_register_peer_memory_client(&nv_mem_client_ex.client,
  ^
make[2]: *** [/root/rpmbuild/BUILD/nvidia_peer_memory-1.2/nv_peer_mem.o] Error 1
make[1]: *** [_module_/root/rpmbuild/BUILD/nvidia_peer_memory-1.2] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64'
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.3dxLME (%build)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.3dxLME (%build)
@drossetti

Same as #95. Please note the new requirement:

Please note that to build correctly, a MLNX_OFED carrying the Peer-direct fix for the bug "Peer-direct patch may cause deadlock due to lock inversion" (tracked by the Internal Ref. #2696789) is required, for example MLNX_OFED 5.3-1.0.0.1.43.
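
The compile errors above (`struct peer_memory_client_ex` incomplete, `PEER_MEM_INVALIDATE_UNMAPS` undeclared) suggest that the installed MLNX_OFED headers do not declare the extended peer-memory interface. A minimal sketch for checking this before running `rpmbuild` follows. The function name `has_peer_direct_ex` is hypothetical, the default directory is the one printed in the build log, and the assumption that the headers live under its `include/` subdirectory should be verified on the target system.

```shell
#!/bin/sh
# Hypothetical helper (not part of nv_peer_memory): report whether an
# MLNX_OFED source tree declares the identifiers the failing build
# could not find.
# Usage: has_peer_direct_ex [OFED_DIR]
has_peer_direct_ex() {
    # Default taken from the "Building with MLNX_OFED from:" log line.
    ofed_dir="${1:-/usr/src/ofa_kernel/default}"
    for sym in peer_memory_client_ex PEER_MEM_INVALIDATE_UNMAPS; do
        # Assumption: headers are under $ofed_dir/include.
        # -r: recurse, -q: quiet, -s: suppress errors for missing paths.
        if ! grep -rqs -e "$sym" "$ofed_dir/include"; then
            echo "missing: $sym"
            return 1
        fi
    done
    echo "ok"
}
```

Calling `has_peer_direct_ex` with no argument inspects the default tree; a result of `missing: ...` would be consistent with the errors in the log. This is a plain text search, so a match does not prove the build will succeed.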

@yug0slav
Author

I am not following... Was the bug fixed in 5.3-1.0.0.1.43? I am on 5.4-1.0.3.0, attempting to build and install nvidia_peer_memory-1.2.

@yug0slav
Author

Resolved in MLNX_OFED_LINUX-5.4-3.0.3.0.
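
Note that a simple numeric comparison would be misleading here: 5.4-1.0.3.0 sorts after the 5.3 release cited as an example, yet it failed to build. A small sketch that classifies a version string using only the data points reported in this thread is shown below. The function name `classify_ofed_version` is hypothetical, and the prefix and trailing colon it strips are an assumption about the form in which the version string is obtained.

```shell
#!/bin/sh
# Hypothetical helper: classify an MLNX_OFED version string using only
# the versions mentioned in this issue. Anything else is "unknown".
classify_ofed_version() {
    # Strip an optional "MLNX_OFED_LINUX-" prefix and a trailing ":".
    ver="${1#MLNX_OFED_LINUX-}"
    ver="${ver%:}"
    case "$ver" in
        5.4-1.0.3.0)    echo "reported failing" ;;
        5.4-3.0.3.0)    echo "reported working" ;;
        5.3-1.0.0.1.43) echo "cited as carrying the fix" ;;
        *)              echo "unknown" ;;
    esac
}
```

For example, `classify_ofed_version MLNX_OFED_LINUX-5.4-3.0.3.0` prints `reported working`. For versions outside this list, inspecting the installed headers is likely a more reliable test than the version number.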

@erwincoumans

So nv_peer_memory can't be used with ConnectX-3 cards (even though the hardware supports it)?

Note: MLNX_OFED 4.9-x LTS should be used by customers who would like to utilize one of the following:

  • NVIDIA ConnectX-3 Pro
  • NVIDIA ConnectX-3
  • NVIDIA Connect-IB
  • RDMA experimental verbs library (mlnx_lib)
  • OSs based on kernel version lower than 3.10

Note: All of the above are not available on MLNX_OFED 5.x branch.

Note: MLNX_OFED 5.4-x LTS should be used by customers who would like to utilize NVIDIA ConnectX-4 onwards adapter cards and keep using stable 5.4-x deployment and get:

  • Critical bug fixes
  • Support for new major OSs
