GPU Bandwidth #18
What's the CPU clock frequency? The server's CPUs (where the Titan is mounted) are Intel(R) Xeon(R) CPU E5-2680 0, running at only 2.70 GHz.
This is our output:
[l_brueckner_m@pc-jungfrau-test bandwidthTest]$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX TITAN Black
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5836.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6533.9
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 230384.2
[l_brueckner_m@pc-jungfrau-test bandwidthTest]$ ./bandwidthTest --memory=pageable
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX TITAN Black
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4026.1
Device to Host Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3891.8
Device to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 231035.7
On 10.05.2017 at 09:36, TheFl0w wrote:
When we ran the memory bandwidth test on your NVIDIA TITAN Black at PSI, we got some unexpected results. If I remember correctly, we measured about 6000 MiB/s for data transfer between host and GPU. PCIe 3.0 should actually give us twice the bandwidth. I ran the same tests on the GPU I use at home (GTX 780) so I could find out if consumer-grade GPUs are more limited when it comes to data transfer rates. It turned out that data transfer for my card is as fast as data transfer of the Tesla K80x cards we use in our HPC cluster. Can you post the results for your TITAN Black, please?
Here is the output of the bandwidth test:
Pinned memory (physically contiguous)
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 780
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12172.0
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12454.7
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 213145.1
Pageable memory (virtually contiguous)
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 780
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6657.6
Device to Host Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6474.5
Device to Device Bandwidth, 1 Device(s)
PAGEABLE Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 212451.2
The test I did with my GPU at home was done on an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz. I get the same results in both cases.
As far as I know, CPU clock frequency only matters for pageable memory anyway. If we transfer data from pinned memory, this is usually done with DMA, so the CPU would not be involved in copying the data.
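To illustrate that difference, here is a minimal sketch (not the bandwidthTest source) that times a host-to-device copy from pageable vs. pinned host memory, assuming device 0 and a 32 MiB buffer like the runs above:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Sketch only: compare pageable vs. pinned host-to-device copy bandwidth.
int main() {
    const size_t bytes = 32 << 20;
    void *d_buf, *pinned, *pageable = malloc(bytes);
    float ms;

    cudaFree(0);                     // force context creation before timing
    cudaMalloc(&d_buf, bytes);
    cudaMallocHost(&pinned, bytes);  // page-locked, so the DMA engine can read it directly

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Pageable path: the driver stages the data through an internal pinned
    // buffer, so the CPU and its memory bandwidth are involved in the copy.
    cudaEventRecord(start);
    cudaMemcpy(d_buf, pageable, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pageable: %.1f MB/s\n", bytes / ms / 1e3);

    // Pinned path: a plain DMA transfer, which is what bandwidthTest calls PINNED.
    cudaEventRecord(start);
    cudaMemcpy(d_buf, pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pinned:   %.1f MB/s\n", bytes / ms / 1e3);

    cudaFreeHost(pinned);
    cudaFree(d_buf);
    free(pageable);
    return 0;
}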
Hi,
We need to find out the reason why the transfers are so slow. We will keep you up to date.
Hi
lspci -nvvs 27:00.0
[...]
LnkCap: Port #0, Speed 5GT/s, Width x16, [...]
LnkSta: Speed 2.5GT/s, Width x16, [...]
dmidecode
[...]
Handle 0x0908, DMI type 9, 17 bytes
System Slot Information
Designation: PCI-E Slot 8
Type: x16 PCI Express 3
Current Usage: In Use
Length: Long
ID: 8
Characteristics:
3.3 V is provided
PME signal is supported
Bus Address: 0000:27:00.0
lspci shows that the card can handle 5 GT/s, but the link is only running at 2.5 GT/s.
This is strange, since 5 GT/s is PCIe 2.0 (Wikipedia) and NVIDIA claims that the Titan can do PCIe 3.0 (8 GT/s).
dmidecode shows that the slot can handle PCIe 3.0 with 16 lanes.
Martin
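As a cross-check of the lspci output: the kernel also exposes the negotiated and maximum PCIe link parameters in sysfs on kernels that provide these attributes. A minimal sketch in plain C, assuming the card is still at bus address 0000:27:00.0:

#include <stdio.h>

/* Print negotiated vs. maximum PCIe link speed and width from sysfs.
 * (Assumes bus address 0000:27:00.0 and a kernel that exposes these files.) */
static void print_attr(const char *name) {
    char path[256], value[64];
    snprintf(path, sizeof path, "/sys/bus/pci/devices/0000:27:00.0/%s", name);
    FILE *f = fopen(path, "r");
    if (f && fgets(value, sizeof value, f))
        printf("%-20s %s", name, value);   /* sysfs value already ends in '\n' */
    else
        printf("%-20s (not available)\n", name);
    if (f)
        fclose(f);
}

int main(void) {
    print_attr("current_link_speed");
    print_attr("max_link_speed");
    print_attr("current_link_width");
    print_attr("max_link_width");
    return 0;
}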
@mbrueckner-psi
If you have physical access to the system, maybe try checking the PSU connectors and reseating the GPU in its PCIe slot.
I would like to gather some additional information about your GPU. Can you run the program I attached and post the results?
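The attachment itself is not included in this thread. Purely as an illustration of the kind of query such a program might do (a hypothetical sketch, not the attached program), NVML can report the current and maximum PCIe link generation and width straight from the driver:

#include <stdio.h>
#include <nvml.h>

/* Illustration only: query PCIe link info of GPU 0 via NVML.
 * Build with something like: gcc pcie_info.c -lnvidia-ml */
int main(void) {
    nvmlDevice_t dev;
    unsigned int curGen, maxGen, curWidth, maxWidth;

    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "failed to initialise NVML\n");
        return 1;
    }
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &curGen) == NVML_SUCCESS &&
        nvmlDeviceGetMaxPcieLinkGeneration(dev, &maxGen) == NVML_SUCCESS &&
        nvmlDeviceGetCurrPcieLinkWidth(dev, &curWidth) == NVML_SUCCESS &&
        nvmlDeviceGetMaxPcieLinkWidth(dev, &maxWidth) == NVML_SUCCESS) {
        printf("PCIe link: Gen%u x%u (max Gen%u x%u)\n",
               curGen, curWidth, maxGen, maxWidth);
    }
    nvmlShutdown();
    return 0;
}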
Hi,
For some reason that we are still trying to understand, it looks like the link between the CPU and the GPU is PCIe v2.0 instead of PCIe v3.0. In theory the GPU is compatible with PCIe 3.0, and so is the slot it is connected to. So yes, a bit strange.
I've tried this:
Edit /etc/modprobe.d/local.conf or create a new file like /etc/modprobe.d/nvidia.conf and add:
options nvidia NVreg_EnablePCIeGen3=1
but it did not work.
Cheers
Aldo
Okay, so I have tried looking for more possible explanations for our GPU bandwidth problem. A processor supports a certain number of PCIe lanes; in your case that maximum should be 40. However, different chipsets on motherboards support different numbers of PCIe lanes, and sometimes the number of lanes available per PCIe slot depends on how many devices are connected. For example: if there are devices plugged into slots 1 and 3, only 8 lanes each are available. I can look into this, but I need to know which motherboard is used, which PCIe slots are occupied and how many lanes are (theoretically) taken by those devices. TL;DR: the maximum number of PCIe lanes could be the issue. For now I would like to know the model of the motherboard.
Hi,
it's an HP ML350p Gen8 server.
The GPU sits in a suitable slot; see the output of lspci and dmidecode above. There are 16 lanes connected to the GPU.
As said before: card and mainboard support PCIe 3.0 (8 GT/s), but the link is only PCIe 2.0 (5 GT/s). This limits the bandwidth to at most 8 GB/s.
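To put numbers on that limit, a small back-of-the-envelope helper (a sketch that assumes 8b/10b encoding for Gen1/Gen2 and 128b/130b for Gen3 and ignores packet/protocol overhead, which is why the roughly 6 GB/s measured with pinned memory is plausible for a Gen2 x16 link in practice):

#include <stdio.h>

/* Rough theoretical PCIe throughput per direction.
 * Assumptions: 8b/10b encoding for Gen1/Gen2, 128b/130b for Gen3,
 * no packet/protocol overhead (real transfers land noticeably lower). */
static double pcie_gb_per_s(int gen, int lanes)
{
    const double gt_per_s[] = { 2.5, 5.0, 8.0 };                 /* Gen1..Gen3 */
    const double encoding[] = { 8.0 / 10.0, 8.0 / 10.0, 128.0 / 130.0 };
    return gt_per_s[gen - 1] * encoding[gen - 1] * lanes / 8.0;  /* GB/s */
}

int main(void)
{
    printf("PCIe 2.0 x16: %4.1f GB/s\n", pcie_gb_per_s(2, 16));  /* ~8.0  */
    printf("PCIe 3.0 x16: %4.1f GB/s\n", pcie_gb_per_s(3, 16));  /* ~15.8 */
    return 0;
}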
This is what the manual says about the expansion slots.
dmidecode reported:
Expansion slot 1 has a connector width of x16 but only supports x8. Please make sure the card is not plugged into slot 1. If it is, consider placing it in slot 3 instead.
If Linux labels the PCIe slots correctly, I am out of ideas for now. I will ask around at work tomorrow; maybe this is a common problem.