## How do I access the GPU cluster?
+ The GPU cluster is at: `breeze.cin.ucsf.edu`
+ Connect with `ssh username@breeze.cin.ucsf.edu`
	+ Unlike on the virgas, you don't need to specify a port (it uses the default SSH port, 22)
+ There are currently 8 GPUs available, referred to by their IDs, which run from 0 to 7
+ Each GPU has 80 GB of memory


## How do I select a GPU to use?
To use GPU decoding, you must first install `cupy`. The best way to do this is with conda, because it also installs the correct cuda-toolkit:
```bash
conda install cupy
```


Next, select a single GPU for decoding by using `cp.cuda.Device(GPU_ID)` as a context manager (the `with` statement). Warning: if you don't use the context manager, cupy defaults to GPU 0.

For example, this runs the indented code on GPU #6.

```python
import logging

import cupy as cp
import numpy as np

from replay_trajectory_classification import SortedSpikesClassifier
from replay_trajectory_classification.sorted_spikes_simulation import make_simulated_run_data, make_continuous_replay
from replay_trajectory_classification.environments import Environment
from replay_trajectory_classification.continuous_state_transitions import RandomWalk, Uniform, Identity, estimate_movement_var


logging.basicConfig(
    level='INFO',
    format='%(asctime)s %(message)s',
    datefmt='%d-%b-%y %H:%M:%S')


# Create simulated data
time, position, sampling_frequency, spikes, place_fields = make_simulated_run_data()

replay_time, test_spikes = make_continuous_replay()

# Set up classifier
movement_var = estimate_movement_var(position, sampling_frequency)

environment = Environment(place_bin_size=np.sqrt(movement_var))
continuous_transition_types = [[RandomWalk(movement_var=movement_var * 120),  Uniform(), Identity()],
                                [Uniform(),                                   Uniform(), Uniform()],
                                [RandomWalk(movement_var=movement_var * 120), Uniform(), Identity()],
                               ]
classifier = SortedSpikesClassifier(
    environments=environment,
    continuous_transition_types=continuous_transition_types,
    sorted_spikes_algorithm='spiking_likelihood_kde_gpu',
    sorted_spikes_algorithm_params={'position_std': 3.0}
)
state_names = ['continuous', 'fragmented', 'stationary']

# Use GPU #6 (IDs start at 0)
GPU_ID = 6

# Use a context manager to specify which GPU (device) runs the indented code
with cp.cuda.Device(GPU_ID):
    # Fit the model place fields
    classifier.fit(
        position,
        spikes)

    # Run the model on the simulated replay
    results = classifier.predict(
        test_spikes,
        time=replay_time,
        state_names=state_names,
        use_gpu=True,
    )
```

```
18-May-22 13:50:00 Fitting initial conditions...
18-May-22 13:50:00 Fitting continuous state transition...
18-May-22 13:50:00 Fitting discrete state transition
18-May-22 13:50:00 Fitting place fields...
  0%|          | 0/19 [00:00<?, ?it/s]
18-May-22 13:50:01 Estimating likelihood...
  0%|          | 0/19 [00:00<?, ?it/s]
18-May-22 13:50:02 Estimating causal posterior...
18-May-22 13:50:03 Estimating acausal posterior...
```


## Which GPU should I use?
You can see which GPUs are occupied by running `nvidia-smi` on the command line, or `!nvidia-smi` in a Jupyter notebook. Pick a GPU with low memory usage. For example, in the output below, GPUs 1, 4, 6, and 7 are probably not in use because their memory usage is low. An idle GPU also draws only about 42 W of power:

```python
!nvidia-smi
```

```
Wed May 18 13:50:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100 80G...  On   | 00000000:4F:00.0 Off |                    0 |
| N/A   34C    P0    66W / 300W |   5261MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  On   | 00000000:52:00.0 Off |                    0 |
| N/A   30C    P0    42W / 300W |     38MiB / 81920MiB |      0%      Default |
|
```
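The manual check above can also be scripted. The following is a minimal sketch, not part of the original workflow. It assumes the installed `nvidia-smi` supports the `--query-gpu` and `--format=csv` options. The helper names (`parse_gpu_usage`, `pick_idle_gpus`, `query_gpus`) and the 1000 MiB threshold are hypothetical choices for illustration.

```python
import subprocess


def parse_gpu_usage(csv_text):
    """Parse lines of 'index, memory.used, power.draw' into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, memory_mib, power_w = (field.strip() for field in line.split(","))
        rows.append({
            "id": int(index),
            "memory_mib": float(memory_mib),
            "power_w": float(power_w),
        })
    return rows


def pick_idle_gpus(rows, max_memory_mib=1000):
    """Return IDs of GPUs whose memory usage is below the threshold.

    The 1000 MiB default is an arbitrary assumption; adjust as needed.
    """
    return [row["id"] for row in rows if row["memory_mib"] < max_memory_mib]


def query_gpus():
    """Ask nvidia-smi for per-GPU memory usage and power draw as CSV."""
    output = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_usage(output)


# Values copied from the two GPUs visible in the output above
sample = "0, 5261, 66.00\n1, 38, 42.00"
print(pick_idle_gpus(parse_gpu_usage(sample)))  # prints [1]
```

On the cluster you would call `pick_idle_gpus(query_gpus())` instead of parsing the sample string. Low memory usage is only a heuristic: another user may be about to start a job on the same GPU, so it is still worth checking with labmates before starting a long run.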


## How do I use multiple GPUs at once?
To use multiple GPUs, you need to install the `dask-cuda` package (imported in Python as `dask_cuda`) with conda:
```bash
conda install -c rapidsai -c nvidia -c conda-forge dask-cuda
```

Next, set up a client, which controls which GPUs are used. By default, it uses all the GPUs you have. To restrict it to specific GPUs, pass their IDs with the `CUDA_VISIBLE_DEVICES` argument like so:

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES=[4, 5, 6])
client = Client(cluster)

client
```

```
2022-05-18 13:50:10,288 - distributed.diskutils - INFO - Found stale lock file and directory '/stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-ly7bpyy1', purging
2022-05-18 13:50:10,296 - distributed.diskutils - INFO - Found stale lock file and directory '/stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-n6nteep3', purging
2022-05-18 13:50:10,302 - distributed.diskutils - INFO - Found stale lock file and directory '/stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-okcse855', purging
2022-05-18 13:50:10,305 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-05-18 13:50:10,313 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-05-18 13:50:10,319 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
```

The client summary displayed in the notebook:

```
Client
  Connection method: Cluster object
  Cluster type: dask_cuda.LocalCUDACluster
  Dashboard: http://127.0.0.1:8787/status

Cluster
  Dashboard: http://127.0.0.1:8787/status
  Workers: 3
  Total threads: 3
  Total memory: 3.94 TiB
  Status: running
  Using processes: True

Scheduler
  Comm: tcp://127.0.0.1:37725
  Workers: 3
  Dashboard: http://127.0.0.1:8787/status
  Total threads: 3
  Started: Just now
  Total memory: 3.94 TiB

Worker
  Comm: tcp://127.0.0.1:35731
  Total threads: 1
  Dashboard: http://127.0.0.1:46067/status
  Memory: 1.31 TiB
  Nanny: tcp://127.0.0.1:34151
  Local directory: /stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-n17o941r
  GPU: NVIDIA A100 80GB PCIe
  GPU memory: 80.00 GiB

Worker
  Comm: tcp://127.0.0.1:39549
  Total threads: 1
  Dashboard: http://127.0.0.1:42725/status
  Memory: 1.31 TiB
  Nanny: tcp://127.0.0.1:41769
  Local directory: /stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-ppy7ls9e
  GPU: NVIDIA A100 80GB PCIe
  GPU memory: 80.00 GiB

Worker
  Comm: tcp://127.0.0.1:39335
  Total threads: 1
  Dashboard: http://127.0.0.1:34189/status
  Memory: 1.31 TiB
  Nanny: tcp://127.0.0.1:46373
  Local directory: /stelmo/edeno/nwb_datajoint/notebooks/dask-worker-space/worker-m7thw3ne
  GPU: NVIDIA A100 80GB PCIe
  GPU memory: 80.00 GiB
```


This uses three GPUs, with IDs 4, 5, and 6.

Finally, to run the code, you need to wrap the function that you want to run on each GPU with the `dask.delayed` decorator.


For example, say we want to run the function `test_gpu` on each item of `data` where each item is processed on a different GPU.

```python
import cupy as cp
import dask
import logging

def setup_logger(name_logfile, path_logfile):
    """Sets up a logger for each function that outputs
    to the console and to a file"""
    logger = logging.getLogger(name_logfile)
    formatter = logging.Formatter('%(asctime)s %(message)s', datefmt='%d-%b-%y %H:%M:%S')
    fileHandler = logging.FileHandler(path_logfile, mode='w')
    fileHandler.setFormatter(formatter)
    streamHandler = logging.StreamHandler()
    streamHandler.setFormatter(formatter)

    logger.setLevel(logging.INFO)
    logger.addHandler(fileHandler)
    logger.addHandler(streamHandler)

    return logger

# This uses the dask.delayed decorator on the test_gpu function
@dask.delayed
def test_gpu(x, ind):
    # Create a log file for this run of the function
    logger = setup_logger(
        name_logfile=f'test_{ind}',
        path_logfile=f'test_{ind}.log')

    # Test to see if these go into different log files
    logger.info(f'This is a test of {ind}')
    logger.info('This should be in a unique file')

    # Run a GPU computation
    return cp.asnumpy(cp.mean(x[:, None] @ x[:, None].T, axis=0))


# Make up 10 fake datasets
x = cp.random.normal(size=10_000, dtype=cp.float32)
data = [x + i for i in range(10)]

# Build a list of delayed computations, one per dataset (nothing runs yet)
results = [test_gpu(item, ind) for ind, item in enumerate(data)]

# Run `dask.compute` on the results list for the code to run
dask.compute(*results)
```

```
18-May-22 13:50:12 This is a test of 4
18-May-22 13:50:12 This should be in a unique file
18-May-22 13:50:12 This is a test of 3
18-May-22 13:50:12 This should be in a unique file
18-May-22 13:50:12 This is a test of 1
18-May-22 13:50:12 This should be in a unique file
18-May-22 13:50:13 This is a test of 9
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 6
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 5
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 0
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 2
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 7
18-May-22 13:50:13 This should be in a unique file
18-May-22 13:50:13 This is a test of 8
18-May-22 13:50:13 This should be in a unique file
```

```
(array([ 0.00106875, -0.0032616 , -0.00759213, ..., -0.00612375,
         0.0059738 , -0.01329288], dtype=float32),
 array([ 1.1191896 ,  0.6762629 ,  0.23331782, ...,  0.38350907,
         1.6208985 , -0.34978032], dtype=float32),
 array([4.23731  , 3.3557875, 2.4742277, ..., 2.7731416, 5.235823 ,
        1.3137323], dtype=float32),
 array([ 9.3554325,  8.035313 ,  6.7151375, ...,  7.1627736, 10.850748 ,
         4.9772453], dtype=float32),
 array([16.47355 , 14.714837, 12.956048, ..., 13.552408, 18.465672,
        10.640757], dtype=float32),
 array([25.591675, 23.39436 , 21.196953, ..., 21.942041, 28.0806  ,
        18.304268], dtype=float32),
 array([36.7098  , 34.073883, 31.437868, ..., 32.331676, 39.695522,
        27.967781], dtype=float32),
 array([49.827915, 46.753407, 43.67878 , ..., 44.721313, 53.31045 ,
        39.631294], dtype=float32),
 array([64.946045, 61.432938, 57.9197  , ..., 59.11094 , 68.92538 ,
        53.2948  ], dtype=float32),
 array([82.064156, 78.11245 , 74.1
```

This example also shows how to create a separate log file for each item in `data` with the `setup_logger` function.
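One caveat with this pattern: `logging.getLogger` returns the same logger object for a given name. Calling `setup_logger` again with the same name in the same process (for example, by re-running the cell while the dask workers are still alive) attaches a second pair of handlers, and each message is then written twice. Below is a minimal sketch of one way to guard against that. `get_task_logger` is a hypothetical variant of `setup_logger`, not a function from the original notebook.

```python
import logging


def get_task_logger(name_logfile, path_logfile):
    """Like setup_logger, but attaches handlers only on the first call."""
    logger = logging.getLogger(name_logfile)
    logger.setLevel(logging.INFO)

    # A logger that already has handlers was configured earlier in this
    # process; reuse it instead of adding duplicate handlers.
    if logger.handlers:
        return logger

    formatter = logging.Formatter(
        '%(asctime)s %(message)s', datefmt='%d-%b-%y %H:%M:%S')
    file_handler = logging.FileHandler(path_logfile, mode='w')
    file_handler.setFormatter(formatter)
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(stream_handler)
    return logger
```

With this guard, a second call with the same name keeps writing to the file opened by the first call, so the log is not truncated on re-runs. If you would rather start a fresh log file each time, restarting the dask client is a simpler option.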