<hr style="border-width:4px; border-color:coral"></hr>

# GPUs on the Redhawk cluster

<hr style="border-width:4px; border-color:coral"></hr>

We can use Slurm commands to get information about the nodes on Redhawk and the availability of their GPUs.

The `sinfo -N` command prints one line per node and shows that Redhawk has six nodes. The node states you will most often see are `alloc` (in use), `idle` (available), and `down` (not available). The `*` after `normal` marks `normal` as the default partition, which is used when a job does not request a partition explicitly.

In [14]:
%%bash

sinfo -N

NODELIST   NODES PARTITION STATE 
node1          1   normal* alloc 
node2          1   normal* idle  
node3          1   normal* idle  
node4          1   normal* idle  
node5          1   normal* idle  
node6          1   normal* idle  
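
When only some nodes are idle, the `--nodelist` argument used later in this notebook can be built from `sinfo` instead of typed by hand. The sketch below is a hypothetical helper, `idle_nodes`, that assumes the four-column `sinfo -N` layout shown above; it prints the idle nodes as a comma-separated list.

```bash
# idle_nodes (hypothetical helper): read `sinfo -N` output on stdin and print
# the nodes whose STATE column is exactly "idle", joined with commas so the
# result can be passed to `srun --nodelist=...`.
# Assumes the columns NODELIST NODES PARTITION STATE, as shown above.
idle_nodes() {
    awk 'NR > 1 && $4 == "idle" { print $1 }' | paste -s -d, -
}

# Example usage on Redhawk:
#   srun --nodelist="$(sinfo -N | idle_nodes)" nvidia-smi --list-gpus
```

States with a suffix, such as `idle*` (Slurm's marker for a node that is not responding), are deliberately not matched.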


<hr style="border-color:black; border-width:2px"></hr>

On Redhawk, each node has two GPUs, for a total of 12. To get more information on the GPUs, we first load the NVIDIA CUDA Toolkit module:

    module load cuda/10.1 

To load the toolkit automatically at login, add this line to your `~/.bashrc` file on Redhawk. Below, we also run `module list` to see which modules are loaded in the current environment; any of these can be added to your `.bashrc` in the same way.

In [15]:
%%bash 

module load cuda/10.1

module list


Currently Loaded Modules:
  1) autotools   3) ohpc         5) openmpi3/3.1.4   7) cuda/10.1
  2) prun/1.3    4) gnu8/8.3.0   6) anaconda/3.7
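
The `.bashrc` line can be made slightly more defensive, as sketched below. This assumes an Lmod-style module system (which the `module list` output above suggests), where `module` is a shell function that may not be defined in every non-interactive shell; the guard skips the load in that case instead of printing an error.

```bash
# ~/.bashrc fragment (sketch): load the CUDA 10.1 toolkit at login.
# `command -v module` succeeds only if a `module` function or command exists,
# so shells without the module system skip the load instead of failing
# with "module: command not found".
if command -v module > /dev/null 2>&1; then
    module load cuda/10.1
fi
```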

 



<hr style="border-color:black; border-width:2px"></hr>

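Once the module is loaded, we can run NVIDIA utilities such as `nvidia-smi`. The Slurm `srun` command runs the utility on a compute node; with no node specified, as below, Slurm picks one, and the `--nodelist` option lets us choose which node to query.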

In [20]:
%%bash 

srun nvidia-smi 

Thu Apr  2 16:12:52 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX TIT...  Off  | 00000000:02:00.0 Off |                  N/A |
| 18%   58C    P0    74W / 250W |      0MiB / 12212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 00000000:03:00.0 Off |                  N/A |
| 16%   57C    P0    68W / 250W |      0MiB / 12212MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-------
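
The table above is easy to read but awkward to script against. `nvidia-smi` can instead print selected fields as CSV through its `--query-gpu` and `--format` options. The hypothetical helper `free_gpus` below reads that CSV and prints the indices of GPUs that look unused; treating "no memory in use and utilization below 5%" as unused is an illustrative assumption, not an NVIDIA or Slurm rule.

```bash
# free_gpus (hypothetical helper): read CSV lines of the form
#   index, memory.used (MiB), utilization.gpu (%)
# and print the index of each GPU with no memory in use and utilization
# below the threshold given as $1 (default: 5 percent).
free_gpus() {
    local max_util="${1:-5}"
    awk -F', *' -v max="$max_util" '$2 == 0 && $3 < max { print $1 }'
}

# Example usage on Redhawk:
#   srun --nodelist=node2 nvidia-smi \
#        --query-gpu=index,memory.used,utilization.gpu \
#        --format=csv,noheader,nounits | free_gpus
```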

<hr style="border-color:black; border-width:2px"></hr>

The `--list-gpus` option returns just the model and UUID of each GPU. Nodes 1-4 each have two GeForce GTX Titan X GPUs, which were introduced in March 2015. Node 1 is left out of the query below because `sinfo` reported it as allocated.

**Note:** You may need to modify the commands below to include only those nodes that are available. 

In [17]:
%%bash 

srun --nodelist=node2,node3,node4 nvidia-smi --list-gpus

GPU 0: GeForce GTX TITAN X (UUID: GPU-16b25d2a-b8c7-92be-5a41-f7598b3fda06)
GPU 1: GeForce GTX TITAN X (UUID: GPU-c3ee78de-e0b2-c807-4117-4287bf2f898e)
GPU 0: GeForce GTX TITAN X (UUID: GPU-f45e1171-f4dd-4e84-45de-b1f83a07e52d)
GPU 1: GeForce GTX TITAN X (UUID: GPU-07a29663-2afe-47f1-e619-833d2b6446d2)
GPU 0: GeForce GTX TITAN X (UUID: GPU-7a59d1e0-95b1-4515-02c6-bbc61407a7dc)
GPU 1: GeForce GTX TITAN X (UUID: GPU-0ab7efce-479b-3a3f-5799-881569d86cc3)
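
When listing GPUs across several nodes, the output can be condensed into a count per model. The sketch below uses a hypothetical helper, `count_gpu_models`, and assumes the `GPU n: <model> (UUID: ...)` line format shown above.

```bash
# count_gpu_models (hypothetical helper): read `nvidia-smi --list-gpus`
# output (possibly from several nodes) on stdin and print one line per
# GPU model, "<count><TAB><model>", sorted by model name.
count_gpu_models() {
    awk '/^GPU [0-9]+: / {
             model = $0
             sub(/^GPU [0-9]+: /, "", model)   # drop the "GPU n: " prefix
             sub(/ \(UUID: .*$/, "", model)    # drop the trailing "(UUID: ...)"
             count[model]++
         }
         END { for (m in count) printf "%d\t%s\n", count[m], m }' |
        sort -t "$(printf '\t')" -k2,2    # order the lines by model name
}

# Example usage on Redhawk:
#   srun --nodelist=node2,node3,node4 nvidia-smi --list-gpus | count_gpu_models
```

For the six lines of output shown above, this would print a single line with a count of 6 for the GeForce GTX TITAN X.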


<hr style="border-color:black; border-width:2px"></hr>

**Node 5** has two GeForce GTX Titan GPUs.

In [18]:
%%bash 

srun --nodelist=node5  nvidia-smi --list-gpus

GPU 0: GeForce GTX TITAN (UUID: GPU-34dce90e-74c7-4136-b3cb-4565f26f8a3f)
GPU 1: GeForce GTX TITAN (UUID: GPU-3308982b-7882-294f-24d2-bc12a269b72a)


<hr style="border-color:black; border-width:2px"></hr>

**Node 6** has a Tesla K20c and a Tesla K40c, which were introduced in November 2012 and October 2013, respectively.  

In [19]:
%%bash 

srun --nodelist=node6 nvidia-smi --list-gpus

GPU 0: Tesla K20c (UUID: GPU-14a3bcd4-ed37-b9b1-bcc0-3fec63d8a81f)
GPU 1: Tesla K40c (UUID: GPU-f6e23783-a373-c7df-3fd0-728cd98d114f)


<hr style="border-color:black; border-width:2px"></hr>

We can also query each node with `nvidia-smi -q` for more detailed information about its GPUs and their current state.

In [7]:
%%bash 

srun  --nodelist=node4  nvidia-smi -q



Timestamp                           : Thu Apr  2 14:36:53 2020
Driver Version                      : 430.50
CUDA Version                        : 10.1

Attached GPUs                       : 2
GPU 00000000:02:00.0
    Product Name                    : GeForce GTX TITAN X
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0422215071154
    GPU UUID                        : GPU-f45e1171-f4dd-4e84-45de-b1f83a07e52d
    Minor Number                    : 0
    VBIOS Version                   : 84.00.1F.00.90
    MultiGPU Board                  : No
    Board ID                        : 0x200
    GPU Part Number                 : N/
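
The `-q` report is long. It can be narrowed with the `-d` (`--display`) option, for example `nvidia-smi -q -d MEMORY`, or filtered as text. The hypothetical helper `gpu_field` below sketches the second approach; it assumes the `Key : Value` layout shown above, with at least one space on each side of the colon.

```bash
# gpu_field (hypothetical helper): print the value of one field, such as
# "Product Name" or "GPU UUID", for every matching line of `nvidia-smi -q`
# output read from stdin. Assumes lines of the form "<spaces>Key<spaces>: Value".
gpu_field() {
    awk -v key="$1" -F' +: ' '{
        k = $1
        sub(/^ +/, "", k)           # strip the indentation before the key
        if (k == key) print $2
    }'
}

# Example usage on Redhawk:
#   srun --nodelist=node4 nvidia-smi -q | gpu_field "GPU UUID"
```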

<hr style="border-color:black; border-width:2px"></hr>

## Using `deviceQuery`

NVIDIA also provides demonstration programs that report more detailed information on the computing capabilities of the available GPUs. On Redhawk, these programs are located at:

    /apps/cuda/10.1/extras/demo_suite
    
One particularly useful demo is the `deviceQuery` utility. Below, we provide information on nodes 1-6.  
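
The full `deviceQuery` report runs to dozens of lines per GPU. The hypothetical helper `summarize_devices` below condenses it to one line per device; it assumes the line formats printed by the CUDA 10.1 `deviceQuery` shown in this notebook and may need adjusting for other versions.

```bash
# summarize_devices (hypothetical helper): condense `deviceQuery` output read
# from stdin into one tab-separated line per device:
#   name, compute capability, global memory (MBytes), SM count, CUDA cores
summarize_devices() {
    awk '
        /^Device [0-9]+: / {
            split($0, q, "\"")              # the device name is between quotes
            name = q[2]
        }
        /CUDA Capability Major\/Minor version number/ { cc = $NF }
        /Total amount of global memory/ {
            for (i = 1; i < NF; i++)        # the number just before "MBytes"
                if ($(i + 1) == "MBytes") mem = $i
        }
        /CUDA Cores\/MP/ {
            sm = $1
            gsub(/[()]/, "", sm)            # "(24)" becomes "24"
            printf "%s\t%s\t%s\t%s\t%s\n", name, cc, mem, sm, $(NF - 2)
        }
    '
}

# Example usage on Redhawk, one node of each type:
#   for node in node2 node5 node6; do
#       srun --nodelist="$node" /apps/cuda/10.1/extras/demo_suite/deviceQuery |
#           summarize_devices
#   done
```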

<hr style="border-color:black; border-width:2px"></hr>

### Nodes 1-4

**Nodes 1-4** all have GeForce GTX TITAN X GPUs. Key information about these GPUs:
 
    Device 0: "GeForce GTX TITAN X"
      CUDA Driver Version / Runtime Version          10.1 / 10.1
      CUDA Capability Major/Minor version number:    5.2
      Total amount of global memory:                 12213 MBytes (12806062080 bytes)
      (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores 
      ....
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)      
      ....
      
From the above, we learn that the GeForce GTX TITAN X has compute capability 5.2, 12 GB of `global memory`, 24 streaming multiprocessors with 128 CUDA cores each (3072 cores in total), and a maximum of 1024 threads per block.
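
The limit of 1024 threads per block, together with the warp size of 32, determines how a kernel launch is sized: covering `N` elements with blocks of `T` threads takes ceil(N/T) blocks, and `T` is best chosen as a multiple of 32. The hypothetical shell helper below sketches that arithmetic; the element counts in the example are illustrative only.

```bash
# launch_config (hypothetical helper): print the number of blocks needed to
# cover N elements with T threads per block, i.e. ceil(N / T), after checking
# T against the deviceQuery limits above (at most 1024 threads per block,
# warp size 32) and the grid x-dimension limit (2147483647 blocks).
launch_config() {
    local n=$1 t=$2
    if (( t < 1 || t > 1024 )); then
        echo "threads per block must be between 1 and 1024" >&2
        return 1
    fi
    if (( t % 32 != 0 )); then
        echo "warning: $t is not a multiple of the warp size (32)" >&2
    fi
    local blocks=$(( (n + t - 1) / t ))    # integer ceiling division
    if (( blocks > 2147483647 )); then
        echo "grid too large for the x dimension" >&2
        return 1
    fi
    echo "$blocks"
}

launch_config 1000000 256    # prints: 3907
```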

The source code for `deviceQuery`, along with many other CUDA demonstration programs, is available on GitHub [here](https://github.com/NVIDIA/cuda-samples).

In [12]:
%%bash 

srun --nodelist=node4  /apps/cuda/10.1/extras/demo_suite/deviceQuery

/apps/cuda/10.1/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN X"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 12213 MBytes (12806062080 bytes)
  (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
  GPU Max Clock rate:                            1076 MHz (1.08 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:           

<hr style="border-color:black; border-width:2px"></hr>

### Node 5

**Node 5** on Redhawk has two GeForce GTX TITAN GPUs. Key information about these GPUs:
 
    Device 0: "GeForce GTX TITAN"
      CUDA Driver Version / Runtime Version          10.1 / 10.1
      CUDA Capability Major/Minor version number:    3.5
      Total amount of global memory:                 6084 MBytes (6379143168 bytes)
      (14) Multiprocessors, (192) CUDA Cores/MP:     2688 CUDA Cores
      ....
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

      
From the above, we learn that the GeForce GTX TITAN has compute capability 3.5, 6 GB of `global memory`, and 14 streaming multiprocessors with 192 CUDA cores each (2688 cores in total).
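
The headline numbers in these summaries follow from simple arithmetic: the total core count is the number of multiprocessors times the cores per multiprocessor, and `deviceQuery` reports global memory in MBytes (that is, MiB) as the byte count divided by 2^20 = 1048576, rounded to the nearest integer. The hypothetical helper `device_totals` below checks both against the figures above.

```bash
# device_totals (hypothetical helper): given the SM count, CUDA cores per SM
# and total global memory in bytes, print the total CUDA core count and the
# memory in MiB (bytes / 2^20, rounded to the nearest integer).
device_totals() {
    local sms=$1 cores_per_sm=$2 bytes=$3
    local cores=$(( sms * cores_per_sm ))
    local mib=$(( (bytes + 524288) / 1048576 ))   # add 2^19 (half of 2^20) to round
    echo "$cores $mib"
}

# GeForce GTX TITAN (node 5): 14 SMs x 192 cores, 6379143168 bytes
device_totals 14 192 6379143168    # prints: 2688 6084
```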

In [10]:
%%bash 

srun --nodelist=node5  /apps/cuda/10.1/extras/demo_suite/deviceQuery

/apps/cuda/10.1/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 6084 MBytes (6379143168 bytes)
  (14) Multiprocessors, (192) CUDA Cores/MP:     2688 CUDA Cores
  GPU Max Clock rate:                            876 MHz (0.88 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               6

<hr style="border-color:black; border-width:2px"></hr>

### Node 6

**Node 6** has two different GPUs, the Tesla K20c and the Tesla K40c. Key information about these GPUs:

    Device 0: "Tesla K40c"
      CUDA Driver Version / Runtime Version          10.1 / 10.1
      CUDA Capability Major/Minor version number:    3.5
      Total amount of global memory:                 11441 MBytes (11996954624 bytes)
      (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
      ....
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
      
 and
 
    Device 1: "Tesla K20c"
      CUDA Driver Version / Runtime Version          10.1 / 10.1
      CUDA Capability Major/Minor version number:    3.5
      Total amount of global memory:                 4744 MBytes (4974313472 bytes)
      (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
      ....
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
      Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)      
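
On node 6 the two GPUs differ considerably (the K40c has more than twice the global memory of the K20c), so it can matter which one a program uses. One caveat when selecting a GPU by index: by default the CUDA runtime numbers devices fastest first, whereas `nvidia-smi` numbers them by PCI bus ID. This is consistent with the outputs in this notebook, where `nvidia-smi` lists the K20c as GPU 0 while `deviceQuery` reports the K40c as device 0. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes the CUDA numbering match `nvidia-smi`, and `CUDA_VISIBLE_DEVICES` restricts a program to the listed GPUs. The sketch below assumes that `srun` forwards the caller's environment to the job (Slurm's default) and that Slurm is not itself setting `CUDA_VISIBLE_DEVICES` on Redhawk; `largest_gpu` is a hypothetical helper.

```bash
# largest_gpu (hypothetical helper): read CSV lines "index, memory.total"
# and print the index of the GPU with the most memory.
largest_gpu() {
    awk -F', *' 'BEGIN { best = -1 } $2 > best { best = $2; idx = $1 } END { print idx }'
}

# Example usage on Redhawk (assumes srun forwards these variables):
#   idx=$(srun --nodelist=node6 nvidia-smi --query-gpu=index,memory.total \
#             --format=csv,noheader,nounits | largest_gpu)
#   CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES="$idx" \
#       srun --nodelist=node6 /apps/cuda/10.1/extras/demo_suite/deviceQuery
```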

In [13]:
%%bash 

srun --nodelist=node6  /apps/cuda/10.1/extras/demo_suite/deviceQuery

/apps/cuda/10.1/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla K40c"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 11441 MBytes (11996954624 bytes)
  (15) Multiprocessors, (192) CUDA Cores/MP:     2880 CUDA Cores
  GPU Max Clock rate:                            745 MHz (0.75 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 