GPU Bandwidth #18

Closed
TheFl0w opened this issue May 10, 2017 · 13 comments

TheFl0w commented May 10, 2017

When we ran the memory bandwidth test on your NVIDIA TITAN Black at PSI, we got some unexpected results. If I remember correctly, we measured about 6000 MiB/s for data transfers between host and GPU. PCIe 3.0 should actually give us twice that bandwidth. I ran the same tests on the GPU I use at home (GTX 780) to find out whether consumer-grade GPUs are more limited when it comes to data transfer rates. It turned out that transfers on my card are as fast as those on the Tesla K80x cards we use in our HPC cluster. Can you post the results for your TITAN Black, please?

Here is the output of the bandwidth test:

Pinned (page-locked) memory

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 780
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12172.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     12454.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     213145.1

Pageable memory

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 780
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PAGEABLE Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6657.6

 Device to Host Bandwidth, 1 Device(s)
 PAGEABLE Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     6474.5

 Device to Device Bandwidth, 1 Device(s)
 PAGEABLE Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     212451.2
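For reference, the numbers above come from NVIDIA's bandwidthTest sample. Below is a minimal sketch of the same host-to-device measurement done directly with the CUDA runtime; the file name, buffer size and repetition count are illustrative choices, not taken from the sample.

    // bandwidth_sketch.cu (hypothetical name) -- rough reproduction of the
    // host-to-device part of the bandwidthTest sample.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Time `reps` host-to-device copies and return the rate in MB/s
    // (bytes / 1e6 per second, matching how bandwidthTest reports it).
    static float measure_h2d(void* host, void* dev, size_t bytes, int reps) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        for (int i = 0; i < reps; ++i)
            cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return (bytes * (float)reps / 1e6f) / (ms / 1e3f);
    }

    int main() {
        const size_t bytes = 32u << 20;    // 33554432 bytes, as in the quick-mode run
        const int reps = 20;

        void* dev = nullptr;
        cudaMalloc(&dev, bytes);

        void* pageable = malloc(bytes);    // ordinary pageable allocation
        void* pinned = nullptr;
        cudaMallocHost(&pinned, bytes);    // page-locked (pinned) allocation

        printf("pageable H2D: %8.1f MB/s\n", measure_h2d(pageable, dev, bytes, reps));
        printf("pinned   H2D: %8.1f MB/s\n", measure_h2d(pinned, dev, bytes, reps));

        free(pageable);
        cudaFreeHost(pinned);
        cudaFree(dev);
        return 0;
    }

On a healthy PCIe 3.0 x16 link the pinned path should land near the ~12 GB/s shown above, while the pageable path is limited by the extra host-side staging copy, which is roughly what the numbers above show.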

mbrueckner-psi commented May 10, 2017 via email

TheFl0w commented May 10, 2017

According to /proc/cpuinfo, the GPUs on our HPC cluster are paired with 32 cores of type Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz.

The test with my GPU at home was done on an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz.

I get the same results in both cases.

TheFl0w commented May 10, 2017

As far as I know, CPU clock frequency only matters for pageable memory anyway. Transfers from pinned memory are usually done with DMA, so the CPU is not involved in copying the data.
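A minimal sketch of that behaviour, assuming a page-locked buffer from cudaMallocHost, a stream, and an arbitrary bit of CPU busy-work (all of it illustrative):

    // overlap_sketch.cu (hypothetical name) -- a copy from pinned memory is
    // handled by the GPU's copy (DMA) engine, so the CPU is free to do other
    // work while the transfer is in flight.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 32u << 20;

        float* pinned = nullptr;
        cudaMallocHost((void**)&pinned, bytes);  // page-locked: eligible for async DMA
        float* dev = nullptr;
        cudaMalloc((void**)&dev, bytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Kick off the transfer; with pinned memory this call returns immediately.
        cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);

        // The CPU keeps working while the copy engine moves the data.
        double acc = 0.0;
        for (int i = 0; i < 1000000; ++i) acc += i * 0.5;

        cudaStreamSynchronize(stream);           // wait for the copy to finish
        printf("copy done, CPU-side result: %f\n", acc);

        cudaStreamDestroy(stream);
        cudaFreeHost(pinned);
        cudaFree(dev);
        return 0;
    }

With a pageable source buffer the runtime first copies the data into an internal pinned staging area, which is where the CPU and its clock frequency come back into play.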

lopez-c commented May 10, 2017

Hi,
We need to find out why the transfers are so slow. We will keep you up to date.

mbrueckner-psi commented May 10, 2017 via email

TheFl0w commented May 10, 2017

@mbrueckner-psi
To be honest, I have no idea how to fix this. Off the top of my head, possible causes are:

  • a bug in the driver; make sure you are using the latest version
  • a firmware problem on the mainboard
  • a PSU power connector that is no longer working
  • a faulty PCIe slot; maybe try another slot
  • a faulty GPU

If you have physical access to the system, check the PSU connectors and reseat the GPU in its PCIe slot.

TheFl0w commented May 10, 2017

I would like to gather some additional information about your GPU. Can you run the program I attached and post the results?

benchmark.zip
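(Not the attached program, but for anyone who wants to reproduce this: a sketch of how the same fields can be read with the CUDA runtime API.)

    // device_info_sketch.cu (hypothetical name)
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int driver = 0, runtime = 0;
        cudaDriverGetVersion(&driver);
        cudaRuntimeGetVersion(&runtime);
        printf("CUDA Driver version: %d\nCUDA Runtime version: %d\n\n", driver, runtime);

        int count = 0;
        cudaGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("%s\n", p.name);
            printf("  Compute capability: %d.%d\n", p.major, p.minor);
            printf("  Global memory: %.2f MiB\n", p.totalGlobalMem / (1024.0 * 1024.0));
            printf("  DMA engines: %d\n", p.asyncEngineCount);
            printf("  Multi processors: %d\n", p.multiProcessorCount);
            printf("  Warp size: %d\n", p.warpSize);
            printf("  Concurrent kernels: %d\n", p.concurrentKernels);
            printf("  Max grid size: %d, %d, %d\n",
                   p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
            printf("  Max block size: %d, %d, %d\n",
                   p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
            printf("  Max threads per block: %d\n", p.maxThreadsPerBlock);
        }
        return 0;
    }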

lopez-c commented May 10, 2017

Hi,
This is the result of the benchmark:

CUDA Driver version: 8000
CUDA Runtime version: 8000

Devices:
GeForce GTX TITAN Black
Compute capability: 3.5
Global memory: 6082.31 MiB
DMA engines: 1
Multi processors: 15
Warp size: 32
Max concurrent kernels: 1
Max grid size: 2147483647, 65535, 65535
Max block size: 1024, 1024, 64
Max threads per block: 1024

For some reason we are still trying to understand, it looks like the link between the CPU and the GPU is PCIe 2.0 instead of PCIe 3.0.

In theory both the GPU and the slot it is connected to are PCIe 3.0 capable, so yes, a bit strange.
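One way to confirm the negotiated link from software is NVML, which ships with the driver (nvidia-smi -q reports the same values in its PCIe section). A minimal sketch, with the device index hard-coded to 0 and error handling mostly omitted; build with -lnvidia-ml.

    // pcie_link_sketch.c (hypothetical name)
    #include <stdio.h>
    #include <nvml.h>

    int main(void) {
        if (nvmlInit() != NVML_SUCCESS) {
            fprintf(stderr, "failed to initialise NVML\n");
            return 1;
        }
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);          // first GPU

        unsigned int curGen = 0, curWidth = 0, maxGen = 0, maxWidth = 0;
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &curGen);
        nvmlDeviceGetCurrPcieLinkWidth(dev, &curWidth);
        nvmlDeviceGetMaxPcieLinkGeneration(dev, &maxGen);
        nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth);

        printf("current link: PCIe gen %u x%u\n", curGen, curWidth);
        printf("maximum link: PCIe gen %u x%u\n", maxGen, maxWidth);

        nvmlShutdown();
        return 0;
    }

One caveat: many GPUs drop the link to a lower generation when idle to save power, so the current values should be read while a transfer or kernel is running; the maximum values reflect what the device and the system it sits in can support.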

mbrueckner-psi commented May 10, 2017 via email

TheFl0w commented May 11, 2017

Okay, I have tried to find more possible explanations for our GPU bandwidth problem. A processor supports a certain number of PCIe lanes; in your case the maximum should be 40. However, different motherboard chipsets support different numbers of PCIe lanes, and sometimes the number of lanes available per PCIe slot depends on how many devices are connected. For example, if devices are plugged into slots 1 and 3, only 8 lanes each might be available. I can look into this, but I need to know which motherboard is used, which PCIe slots are occupied, and how many lanes are (theoretically) taken by those devices.

TL;DR: The maximum number of PCIe lanes could be the issue. For now I would like to know the model of the motherboard.
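On Linux the negotiated link can also be read straight from sysfs. A sketch, assuming the first CUDA device and the standard PCI attributes current_link_speed, current_link_width, max_link_speed and max_link_width:

    // sysfs_link_sketch.cu (hypothetical name) -- look up the GPU's PCI address
    // with the CUDA runtime and read the negotiated link parameters from sysfs.
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include <cuda_runtime.h>

    static void print_attr(const char* pci, const char* attr) {
        char path[256], value[64] = {0};
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", pci, attr);
        FILE* f = fopen(path, "r");
        if (f && fgets(value, sizeof(value), f)) {
            value[strcspn(value, "\n")] = '\0';
            printf("%-20s %s\n", attr, value);
        }
        if (f) fclose(f);
    }

    int main(void) {
        char pci[32] = {0};
        if (cudaDeviceGetPCIBusId(pci, (int)sizeof(pci), 0) != cudaSuccess) {
            fprintf(stderr, "no CUDA device found\n");
            return 1;
        }
        for (char* c = pci; *c; ++c) *c = (char)tolower(*c);  // sysfs uses lowercase hex

        printf("GPU at %s\n", pci);
        print_attr(pci, "current_link_speed");   // "5 GT/s" would mean PCIe 2.0
        print_attr(pci, "current_link_width");   // negotiated lane count
        print_attr(pci, "max_link_speed");
        print_attr(pci, "max_link_width");
        return 0;
    }

"8 GT/s" corresponds to PCIe 3.0 and "5 GT/s" to PCIe 2.0; if current_link_width comes back as 8 rather than 16, the lane-sharing explanation above would fit.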

mbrueckner-psi commented May 11, 2017 via email

TheFl0w commented May 11, 2017

This is what the manual says about the expansion slots.

Slot #  Technology  Bus Width  Connector Width  Bus Number  Form Factor           Notes
9       PCIe 3.0    x4         x8               32          Full Length / Height  For processor 2
8       PCIe 3.0    x16        x16              32          Full Length / Height  For processor 2
7       PCIe 3.0    x4         x8               32          Full Length / Height  For processor 2
6       PCIe 3.0    x16        x16              32          Full Length / Height  For processor 2
5       PCIe 2.0    x4         x8               0           Full Length / Height  For processor 2
4       PCIe 3.0    x4         x8               0           Full Length / Height  For processor 1
3       PCIe 3.0    x16        x16              0           Full Length / Height  For processor 1
2       PCIe 3.0    x4         x8               0           Full Length / Height  For processor 1
1       PCIe 3.0    x8         x16              0           Full Length / Height  For processor 1

dmidecode reported:

Designation: PCI-E Slot 8 
Type: x16 PCI Express 3

Expansion slot 1 has a connector width of x16 but only supports x8 electrically. Please make sure the card is not plugged into slot 1; if it is, consider moving it to slot 3 instead.

TheFl0w commented May 11, 2017

If Linux labels the PCIe slots correctly, I am out of ideas for now. I will ask around at work tomorrow; maybe this is a common problem.
