# Testing Memory Management with RAPIDS

This notebook demonstrates how RAPIDS Memory Manager (RMM) configuration affects GPU-accelerated single-cell analysis, using Harmony batch correction as the test case.

## Overview

Harmony batch correction repeatedly allocates and frees GPU arrays during its iterative optimization. This makes it particularly sensitive to memory management overhead. RMM configuration can significantly affect Harmony's performance by:

- **Reducing allocation overhead**: Memory pooling avoids most of the cost of repeated device malloc/free operations, because freed blocks are reused instead of being returned to the driver
- **Improving bandwidth utilization**: Efficient memory reuse can reduce memory bandwidth bottlenecks
- **Faster execution**: Fewer allocation/deallocation cycles speed up the overall workflow
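To make the pooling idea above concrete, here is a minimal CPU-only sketch. `ToyPool` and `run_iterations` are hypothetical names invented for this illustration; this is a toy model of block reuse, not how RMM implements its pool. It counts how often an iterative loop has to fall back to the backing allocator when freed blocks are kept for reuse.

```python
class ToyPool:
    """Toy size-bucketed pool (illustration only, not RMM's implementation)."""

    def __init__(self):
        self.free_blocks = {}    # block size -> list of freed, reusable blocks
        self.backing_allocs = 0  # number of calls to the (slow) backing allocator

    def allocate(self, size):
        bucket = self.free_blocks.get(size)
        if bucket:
            # A freed block of the right size exists: reuse it.
            return bucket.pop()
        # Nothing to reuse: fall back to the backing allocator.
        self.backing_allocs += 1
        return bytearray(size)

    def deallocate(self, block):
        # Keep the block for later reuse instead of releasing it.
        self.free_blocks.setdefault(len(block), []).append(block)


def run_iterations(pool, n_iters, sizes):
    """Mimic an iterative solver that allocates and frees the same shapes each step."""
    for _ in range(n_iters):
        blocks = [pool.allocate(s) for s in sizes]
        for b in blocks:
            pool.deallocate(b)
    return pool.backing_allocs


sizes = [1024, 4096, 1024]
n_iters = 100
pooled = run_iterations(ToyPool(), n_iters, sizes)
unpooled = n_iters * len(sizes)  # without a pool, every request hits the backing allocator
print(f"backing allocations with pool: {pooled}, without pool: {unpooled}")
```

In this toy model only the first iteration reaches the backing allocator; every later iteration is served from freed blocks. The same pattern is why a workload with many repeated, similarly sized allocations tends to benefit from a pooled allocator.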

## What is RMM?

The RAPIDS Memory Manager (RMM) provides efficient GPU memory allocation and pooling strategies that can significantly improve performance by:

- Reducing allocation overhead through memory pooling
- Supporting memory oversubscription with managed memory
- Providing fine-grained control over GPU memory usage

This notebook will compare different RMM configurations and their effects on Harmony batch correction workflows.

In [None]:
import rapids_singlecell as rsc
import scanpy as sc
import rmm
import cupy as cp
import pandas as pd
import anndata as ad
import decoupler as dc


ℹ️ Note: The dataset used in this notebook is generated in `01_demo_gpu.ipynb`. 

In [None]:
adata = sc.read_h5ad("h5/dli_decoupler.h5ad")

In [None]:
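from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=False,  # True would use managed memory, which allows oversubscription
    pool_allocator=True,  # enable the pool allocator (default is False)
)
# Route CuPy allocations through RMM so the settings above take effect
cp.cuda.set_allocator(rmm_cupy_allocator)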

In [None]:
%%time
rsc.pp.harmony_integrate(adata, key="assay", dtype=cp.float32)
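
To compare several RMM configurations rather than timing a single one, a small harness along the following lines may help. This is a sketch under assumptions: `compare_configs`, `fake_setup`, and `fake_workload` are hypothetical names introduced here, and the stand-in functions exist only so the block runs without a GPU. The configuration keys mirror the `managed_memory` and `pool_allocator` arguments used above.

```python
import time


def compare_configs(configs, setup, workload, repeats=3):
    """Time `workload` under each configuration and keep the best of `repeats` runs.

    configs  : dict mapping a label to keyword arguments for `setup`
    setup    : callable applying one configuration (e.g. re-initializing the allocator)
    workload : zero-argument callable to be timed
    """
    results = {}
    for name, kwargs in configs.items():
        timings = []
        for _ in range(repeats):
            setup(**kwargs)  # configuration cost is excluded from the timing
            start = time.perf_counter()
            workload()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


# The four combinations of the two flags used in this notebook.
configs = {
    "default": {"managed_memory": False, "pool_allocator": False},
    "pool": {"managed_memory": False, "pool_allocator": True},
    "managed": {"managed_memory": True, "pool_allocator": False},
    "managed+pool": {"managed_memory": True, "pool_allocator": True},
}

# Stand-ins so the sketch runs anywhere (hypothetical, replace on a GPU machine).
applied = []


def fake_setup(managed_memory, pool_allocator):
    applied.append((managed_memory, pool_allocator))


def fake_workload():
    sum(range(10_000))


results = compare_configs(configs, fake_setup, fake_workload, repeats=2)
for name, seconds in results.items():
    print(f"{name:>12}: {seconds:.6f} s")
```

In this notebook, `setup` could be a function that calls `rmm.reinitialize(**kwargs)` followed by `cp.cuda.set_allocator(rmm_cupy_allocator)`, and `workload` could run `rsc.pp.harmony_integrate` on a fresh copy of `adata` so that every run starts from the same state. Because re-initializing RMM replaces the active memory resource, it is safest to make sure no GPU arrays from a previous configuration are still in use when switching.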