Clustering by GPU
==================

In short, we tried several technologies and found GPU acceleration to be the most efficient way to speed up clustering.

Requirements
============

CUDA
----

***Linux*** users can install CUDA by following the [NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).

Installing CUDA on ***Windows*** is a bit more complicated because Stereopy does not support Windows natively. By following the [CUDA on WSL User Guide](https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl), you can run Stereopy with the GPU option inside WSL.
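Before installing RAPIDS, it can help to confirm that the driver actually exposes a GPU inside Linux or WSL. The sketch below is not part of Stereopy; it assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on `PATH`, and the helper name `visible_gpus` is made up for illustration.

```python
import shutil
import subprocess


def visible_gpus():
    """Return one 'name, driver_version' string per GPU, or [] if none is reachable."""
    exe = shutil.which("nvidia-smi")  # assumption: installed together with the driver
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but could not talk to a device (or timed out).
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(visible_gpus())
```

An empty list means that `nvidia-smi` is either missing or could not reach a device, which is worth resolving before moving on to RAPIDS.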

RAPIDS on Anaconda
----

Select the version that matches your system on [RAPIDS' official website](https://rapids.ai/start.html). The following command matches my environment, so adjust the versions to fit yours:

``conda create -n rapids-22.12 -c rapidsai -c conda-forge -c nvidia rapids=22.12 python=3.8 cudatoolkit=11.5``

NOTE: I installed CUDA on WSL successfully with ***NVIDIA Studio Driver WHQL 522.30***, following the advice in this [bug report](https://forums.developer.nvidia.com/t/cudaruntimeapierror-100-call-to-cudaruntimegetversion-results-in-cuda-error-no-device/234740). For reference, this is my personal PC environment with CUDA:
* Intel Core i7-7700K
* NVIDIA-GeForce-RTX-3060 (NVIDIA-SMI 522.30; Driver Version: 522.30; CUDA Version: 11.8)
* WSL2 on Windows 10 (21H2)

Stereopy Installation
----

Installing Stereopy via Anaconda fails in this Conda GPU environment. Only the [PyPI Installation](https://stereopy.readthedocs.io/en/latest/General/Installation.html#pypi) succeeds.

``pip install stereopy``
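After both installs, a quick import check can show whether everything landed in the same environment. This is a minimal sketch using only the standard library. The module list is an assumption: `stereo` is the import name Stereopy uses, and `cudf`, `cuml` and `cugraph` are the RAPIDS packages the GPU path presumably relies on. `check_packages` is a hypothetical helper, not a Stereopy function.

```python
import importlib.util


def check_packages(names=("cudf", "cuml", "cugraph", "stereo")):
    """Map each top-level module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, the active Conda environment is probably not the one created above.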

# Test

Common part:

    import sys

    # Optional: prefer a local Stereopy checkout over the installed package.
    sys.path.insert(0, "/ldfssz1/ST_BI/USER/lyonlin/stereopy")
    import stereo as st

    gef_file = '/ldfssz1/ST_BI/USER/stereopy/test/xujunhao/data/SS200000135TL_D1/SS200000135TL_D1.gef'
    bin_size = 50

    # Load the GEF file at the chosen bin size and compute QC metrics.
    data = st.io.read_gef(gef_file, bin_size=bin_size)
    data.tl.cal_qc()
    print(data.exp_matrix.shape)

    # Normalize, log-transform, select highly variable genes, then run PCA.
    data.tl.normalize_total(target_sum=1e4)
    data.tl.log1p()
    data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)
    data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')

[2022-12-21 15:39:55][Stereo][66497][140419630081856][reader][625][INFO]: read_gef begin ...
[2022-12-21 15:40:10][Stereo][66497][140419630081856][reader][698][INFO]: the matrix has 149399 cells, and 24604 genes.
[2022-12-21 15:40:11][Stereo][66497][140419630081856][reader][706][INFO]: read_gef end.
[2022-12-21 15:40:12][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run cal_qc...
[2022-12-21 15:40:12][Stereo][66497][140419630081856][st_pipeline][35][INFO]: cal_qc end, consume time 0.3396s.
[2022-12-21 15:40:12][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run normalize_total...


(149399, 24604)


[2022-12-21 15:40:12][Stereo][66497][140419630081856][st_pipeline][35][INFO]: normalize_total end, consume time 0.5192s.
[2022-12-21 15:40:12][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run log1p...
[2022-12-21 15:40:13][Stereo][66497][140419630081856][st_pipeline][35][INFO]: log1p end, consume time 0.8321s.
[2022-12-21 15:40:13][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run highly_variable_genes...
[2022-12-21 15:40:17][Stereo][66497][140419630081856][st_pipeline][35][INFO]: highly_variable_genes end, consume time 3.4336s.
[2022-12-21 15:40:17][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run pca...
[2022-12-21 15:40:23][Stereo][66497][140419630081856][st_pipeline][35][INFO]: pca end, consume time 6.4518s.
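The per-step timings in logs like the one above can be collected with a small regular expression instead of being read by eye. The sketch below is not a Stereopy API: `step_times` is a hypothetical helper, and the pattern assumes the `<step> end, consume time <seconds>s.` wording seen in these log lines.

```python
import re

# Matches e.g. "]: pca end, consume time 6.4518s."
STEP_RE = re.compile(r"\]: (\w+) end, consume time ([0-9]+(?:\.[0-9]+)?)s\.")


def step_times(log_text):
    """Return {step_name: seconds} for every 'end, consume time' line in the log."""
    return {m.group(1): float(m.group(2)) for m in STEP_RE.finditer(log_text)}


sample = (
    "[2022-12-21 15:40:17][Stereo][66497][140419630081856][st_pipeline][35][INFO]: "
    "highly_variable_genes end, consume time 3.4336s.\n"
    "[2022-12-21 15:40:23][Stereo][66497][140419630081856][st_pipeline][35][INFO]: "
    "pca end, consume time 6.4518s.\n"
)
print(step_times(sample))  # {'highly_variable_genes': 3.4336, 'pca': 6.4518}
```

Lines without a timing, such as `read_gef end.`, are simply skipped.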


CPU clustering demo script:

    # CPU path: neighbor graph, UMAP embedding, then Leiden clustering.
    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)
    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')
    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')

[2022-12-21 14:42:14][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run neighbors...
[2022-12-21 14:42:41][Stereo][61764][139713726842688][st_pipeline][35][INFO]: neighbors end, consume time 26.8832s.
[2022-12-21 14:42:41][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run umap...


	completed  0  /  200 epochs
	completed  20  /  200 epochs
	completed  40  /  200 epochs
	completed  60  /  200 epochs
	completed  80  /  200 epochs
	completed  100  /  200 epochs
	completed  120  /  200 epochs
	completed  140  /  200 epochs
	completed  160  /  200 epochs
	completed  180  /  200 epochs


[2022-12-21 14:44:16][Stereo][61764][139713726842688][st_pipeline][35][INFO]: umap end, consume time 95.9194s.
[2022-12-21 14:44:16][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run leiden...
[2022-12-21 14:45:13][Stereo][61764][139713726842688][st_pipeline][35][INFO]: leiden end, consume time 56.9536s.


GPU clustering demo script:

    # GPU path: the same three steps, each with method='rapids'.
    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8, method='rapids')
    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral', method='rapids')
    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden', method='rapids')

[2022-12-21 14:45:13][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run neighbors...
[2022-12-21 14:45:27][Stereo][61764][139713726842688][neighbors][76][INFO]: cuml NearestNeighbors run end
[2022-12-21 14:45:30][Stereo][61764][139713726842688][st_pipeline][35][INFO]: neighbors end, consume time 16.5405s.
[2022-12-21 14:45:30][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run umap...


[D] [14:45:31.054023] /workspace/.conda-bld/work/cpp/src/umap/runner.cuh:107 n_neighbors=10
[D] [14:45:31.060267] /workspace/.conda-bld/work/cpp/src/umap/runner.cuh:129 Calling knn graph run


[2022-12-21 14:45:39][Stereo][61764][139713726842688][st_pipeline][35][INFO]: umap end, consume time 8.7355s.
[2022-12-21 14:45:39][Stereo][61764][139713726842688][st_pipeline][32][INFO]: start to run leiden...


[D] [14:45:31.366800] /workspace/.conda-bld/work/cpp/src/umap/runner.cuh:135 Done. Calling fuzzy simplicial set
[D] [14:45:31.373483] /workspace/.conda-bld/work/cpp/src/umap/fuzzy_simpl_set/naive.cuh:317 Smooth kNN Distances
[D] [14:45:31.373641] /workspace/.conda-bld/work/cpp/src/umap/fuzzy_simpl_set/naive.cuh:319 sigmas = [ 1.66096, 3.51944, 3.39145, 5.00403, 4.24579, 0.635422, 6.13477, 2.1259, 0.802788, 2.81464, 10.1738, 1.67606, 2.70644, 2.61655, 8.5838, 1.9319, 2.06683, 5.92374, 2.35748, 3.3945, 2.19998, 13.5659, 3.18521, 1.15547, 2.06163 ]

[D] [14:45:31.373683] /workspace/.conda-bld/work/cpp/src/umap/fuzzy_simpl_set/naive.cuh:321 rhos = [ 18.8295, 4.74797, 12.2983, 20.5062, 11.9926, 0.968774, 21.8343, 16.933, 13.0154, 20.0772, 4.94045, 4.10428, 15.5052, 10.3524, 14.3463, 7.09102, 21.7079, 27.1835, 17.1671, 5.06311, 7.3557, 12.2797, 11.9784, 8.74901, 5.08703 ]

[D] [14:45:31.373699] /workspace/.conda-bld/work/cpp/src/umap/fuzzy_simpl_set/naive.cuh:345 Compute Membership Strength


INFO:numba.cuda.cudadrv.driver:init
[2022-12-21 14:45:50][Stereo][61764][139713726842688][st_pipeline][35][INFO]: leiden end, consume time 10.9636s.
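To put the two runs side by side, the `consume time` values from the CPU and GPU logs above can be turned into per-step speedups. The numbers below are copied from those logs; the variable and function names are only illustrative.

```python
# Seconds per step, copied from the CPU and GPU logs above.
cpu_seconds = {"neighbors": 26.8832, "umap": 95.9194, "leiden": 56.9536}
gpu_seconds = {"neighbors": 16.5405, "umap": 8.7355, "leiden": 10.9636}


def speedups(cpu, gpu):
    """Return the CPU/GPU time ratio for each step present in both runs."""
    return {step: cpu[step] / gpu[step] for step in cpu if step in gpu}


for step, ratio in speedups(cpu_seconds, gpu_seconds).items():
    print(f"{step}: {ratio:.2f}x")
# neighbors: 1.63x
# umap: 10.98x
# leiden: 5.19x

total = sum(cpu_seconds.values()) / sum(gpu_seconds.values())
print(f"total: {total:.2f}x")  # total: 4.96x
```

On this dataset and hardware the largest gain comes from UMAP, while the neighbor search improves the least. These are single runs, so the ratios are indicative rather than a benchmark.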


# GPU Option for `neighbors`, `umap`, `leiden` and `louvain`

The GPU versions of `neighbors`, `umap` and `leiden` are shown above: set the `method` parameter to `'rapids'`.

`louvain` can also use GPU acceleration by setting `flavor` to `'rapids'`, as shown below:

    # Louvain selects the GPU backend through `flavor` rather than `method`.
    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8, method='rapids')
    data.tl.louvain(neighbors_res_key='neighbors', res_key='louvain', flavor='rapids', use_weights=True)

[2022-12-21 15:40:23][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run neighbors...
[2022-12-21 15:40:36][Stereo][66497][140419630081856][neighbors][76][INFO]: cuml NearestNeighbors run end
[2022-12-21 15:40:43][Stereo][66497][140419630081856][st_pipeline][35][INFO]: neighbors end, consume time 20.1793s.
[2022-12-21 15:40:43][Stereo][66497][140419630081856][st_pipeline][32][INFO]: start to run louvain...
INFO:numba.cuda.cudadrv.driver:init
[2022-12-21 15:41:11][Stereo][66497][140419630081856][st_pipeline][35][INFO]: louvain end, consume time 27.4569s.
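Since the CPU and GPU calls differ only in the extra `method='rapids'` argument, one way to keep a script portable is to add that argument conditionally. The wrapper below is a sketch under that assumption: `run_clustering` and `use_gpu` are hypothetical names, and `data` is expected to be the object prepared in the common part above. When `use_gpu` is false the argument is omitted, so Stereopy's own defaults apply.

```python
def run_clustering(data, use_gpu):
    """Run neighbors, UMAP and Leiden, adding method='rapids' only when use_gpu is true."""
    extra = {"method": "rapids"} if use_gpu else {}
    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8, **extra)
    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap',
                 init_pos='spectral', **extra)
    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden', **extra)
    return use_gpu
```

It would be called as `run_clustering(data, use_gpu=True)` on a machine with RAPIDS installed. A `louvain` variant would pass `flavor='rapids'` instead of `method`.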
