# Step-by-Step NMF Analysis on scRNA-seq Data

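This notebook demonstrates how to perform Non-Negative Matrix Factorization (NMF) on single-cell RNA sequencing (scRNA-seq) data. We use the <strong>scanpy</strong> library for data handling and <strong>scikit-learn</strong> for the NMF implementation. Note that scikit-learn's `NMF` is a standard optimization-based factorization, not the Bayesian CoGAPS algorithm, so the code here is a lightweight stand-in for a CoGAPS-style analysis and does not run CoGAPS itself.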
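
Before running on real data, it can help to see what the factorization produces. The sketch below is an illustration on a small synthetic matrix, not part of the analysis itself: the matrix sizes, the `nndsvda` initialization, and `max_iter=1000` are assumptions chosen for the example. It factorizes a non-negative cells x genes matrix `X` into `W` (cells x components) and `H` (components x genes) so that `X ≈ W @ H`.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic non-negative "cells x genes" matrix (illustrative values, not real data)
rng = np.random.default_rng(0)
true_W = rng.random((50, 3))   # 50 cells, 3 hidden patterns
true_H = rng.random((3, 20))   # 3 patterns, 20 genes
X = true_W @ true_H

# Factorize X into non-negative W and H
model = NMF(n_components=3, init='nndsvda', random_state=0, max_iter=1000)
W = model.fit_transform(X)     # per-cell pattern weights, shape (50, 3)
H = model.components_          # per-gene pattern loadings, shape (3, 20)

# Relative Frobenius-norm error of the reconstruction W @ H
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape)
print(f"relative reconstruction error: {rel_error:.3f}")
```

The same shapes apply in the cell below: `W` has one row per cell and `H` has one column per gene.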

In [None]:
# Import necessary libraries
import scanpy as sc
import numpy as np
from sklearn.decomposition import NMF
import matplotlib.pyplot as plt
import pandas as pd

# Load scRNA-seq data (replace 'data.h5ad' with your dataset)
adata = sc.read_h5ad('data.h5ad')

# Preprocess data
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

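# Extract expression matrix as a dense array
# (adata.X may be sparse or already dense, so only call .toarray() when needed)
from scipy.sparse import issparse
expression_matrix = adata.X.toarray() if issparse(adata.X) else np.asarray(adata.X)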

# Perform NMF
model = NMF(n_components=8, init='random', random_state=42)
W = model.fit_transform(expression_matrix)
H = model.components_

# Add NMF results to AnnData object
adata.obsm['X_nmf'] = W
# varm entries must have one row per gene, so store H (components x genes) transposed
adata.varm['H_nmf'] = H.T

# Visualize NMF components
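# plt.matshow opens its own figure and would ignore the figsize set by plt.figure,
# so draw the heatmap on an explicit Axes instead
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(H, aspect='auto', cmap='viridis', interpolation='nearest')
fig.colorbar(im, ax=ax)
ax.set_xlabel('Genes')       # columns of H are genes
ax.set_ylabel('Components')  # rows of H are components
ax.set_title('NMF Components Heatmap')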
plt.show()

# Save the results
adata.write('nmf_results.h5ad')
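
A common next step is to inspect which genes load most strongly on each component. The helper below is a minimal sketch of that idea: `top_genes_per_component` is a hypothetical function written for this example (not part of scanpy or scikit-learn), and the toy matrix and gene names are made up for illustration.

```python
import numpy as np

def top_genes_per_component(H, gene_names, n_top=5):
    """Return {component index: names of the n_top highest-loading genes}."""
    H = np.asarray(H)
    gene_names = np.asarray(gene_names)
    top = {}
    for k, loadings in enumerate(H):
        # Sort gene indices by loading, largest first, and keep the first n_top
        order = np.argsort(loadings)[::-1][:n_top]
        top[k] = gene_names[order].tolist()
    return top

# Toy components x genes matrix with made-up gene names
H_toy = np.array([[0.9, 0.1, 0.0, 0.5],
                  [0.0, 0.7, 0.8, 0.1]])
genes_toy = ['geneA', 'geneB', 'geneC', 'geneD']
result = top_genes_per_component(H_toy, genes_toy, n_top=2)
print(result)
```

With the objects from the cell above, the equivalent call would be `top_genes_per_component(H, adata.var_names)`, assuming `H` is still the components x genes matrix returned by `model.components_`.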