# Data structure

## meta.data columns

**nCount_RNA**: Total UMI (transcript) count per nucleus.

**nFeature_RNA**: Number of unique genes detected per nucleus.

**sample_id**: Unique sample identifier (e.g., LV_####).

**percent.mt**: Percentage of reads mapping to mitochondrial genes, used for quality control.

**scDblFinder.weighted**: Weighted doublet score computed by scDblFinder to flag likely doublets.

**treatment**: Experimental condition (CTRL, ALDO, REC).

**sex**: Biological sex of the mouse (m, f).

**batch**: Batch number, representing independent sequencing runs (1, 2, 3).

**seurat_clusters**: Clustering result from Seurat based on transcriptomic similarity (0–14).

**cell_type**: Initial cell type annotation mapped from seurat_clusters.

**cell_type_sub**: Refined cell type annotation incorporating fibroblast subclustering and reassignments across the full dataset.

**cell_type_comb**: Collapsed cell type annotation where all cardiomyocyte clusters are grouped into a single CM category.

**phase**: Predicted cell cycle phase based on canonical marker expression (G1, S, G2M).

**cardiomyocyte**: Binary indicator of cardiomyocyte identity (CM, non-CM).
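
The categorical columns above list their expected values, which makes a quick consistency check possible. The following is a minimal sketch in base R, not part of the original analysis: `meta` is a toy stand-in with invented values, and `check_levels` is a hypothetical helper. With the real object, the data frame would presumably come from `obj@meta.data`.

```r
# Toy stand-in for obj@meta.data; the values are invented for illustration only.
meta <- data.frame(
  treatment = c("CTRL", "ALDO", "REC", "CTRL"),
  sex = c("m", "f", "f", "m"),
  batch = c(1, 2, 3, 1),
  phase = c("G1", "S", "G2M", "G1"),
  cardiomyocyte = c("CM", "non-CM", "CM", "non-CM"),
  stringsAsFactors = FALSE
)

# Allowed values, taken from the column descriptions above.
allowed <- list(
  treatment = c("CTRL", "ALDO", "REC"),
  sex = c("m", "f"),
  batch = c(1, 2, 3),
  phase = c("G1", "S", "G2M"),
  cardiomyocyte = c("CM", "non-CM")
)

# Hypothetical helper: for each column, return the values that fall outside
# the allowed set. A column that is missing from `meta` is reported as NA.
check_levels <- function(meta, allowed) {
  lapply(setNames(names(allowed), names(allowed)), function(col) {
    if (!col %in% names(meta)) return(NA)
    setdiff(unique(as.character(meta[[col]])), as.character(allowed[[col]]))
  })
}

problems <- check_levels(meta, allowed)

# TRUE only if every column exists and contains no unexpected value.
all_ok <- all(vapply(problems, function(x) length(x) == 0, logical(1)))
```

Comparing values as character strings keeps the check independent of whether a column is stored as a factor, a character vector, or a number.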

## embeddings

**harmony**, **pca**, **umap**: Embeddings calculated with protein-coding genes.

**_all**: Embeddings whose names end in `_all` are calculated with all genes.

# Load libraries

In [1]:
suppressPackageStartupMessages({
    suppressWarnings({
        library(Seurat)
        library(SeuratDisk)
    })})

In [2]:
setwd("/media/daten/dmeral/scseq_analysis/2024_LV_CTRL_ALDO_REC")

# Load object and clean up

In [None]:
obj <- LoadH5Seurat("subcluster/FB/full_obj_with_Subcluster_FB_annotations.h5seurat")

In [19]:
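# Clean up metadata: drop intermediate columns (assigning NULL removes a column)
# and rename the remaining ones to the names documented above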

obj$orig.ident <- NULL
obj$log1p_total_counts <- NULL
obj$log1p_n_genes_by_counts <- NULL
obj$soup_group <- NULL
obj$nCount_original.counts <- NULL
obj$nFeature_original.counts <- NULL
obj$ident <- NULL
obj$scDblFinder.class <- NULL
obj$scDblFinder.score <- NULL
obj$scDblFinder.cxds_score <- NULL
obj$RNA_snn_res.0.25 <- NULL
obj$CMgenes1 <- NULL
obj$seurat_clusters <- obj$Ident_numerical
obj$Ident_numerical <- NULL
obj$S.Score <- NULL
obj$G2M.Score <- NULL
obj$seurat_clusters_protein_coding <- NULL
obj$replicate <- NULL
obj$chamber <- NULL
obj$unique <- NULL
obj$cell_type_comb <- obj$cell_type_CMcomb
obj$cell_type_CMcomb <- NULL
obj$phase <- obj$Phase
obj$Phase <- NULL
obj$cardiomyocyte <- obj$Cardiomyocyte
obj$Cardiomyocyte <- NULL
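
The renames in the cell above all follow one pattern: copy the old column to its new name, then remove the old column. As a sketch only, the same pattern can be written once as a helper driven by a mapping. `rename_meta` and the toy data frame are hypothetical and operate on a plain data frame, not on the Seurat object itself.

```r
# Hypothetical helper: rename columns of a metadata data frame.
# `mapping` is a named character vector of the form c(old_name = "new_name").
# An existing column with the new name is overwritten, as in the cell above.
rename_meta <- function(meta, mapping) {
  for (old in names(mapping)) {
    new <- mapping[[old]]
    if (old %in% names(meta)) {
      meta[[new]] <- meta[[old]]  # copy to the new name
      meta[[old]] <- NULL         # drop the old column
    }
  }
  meta
}

# Toy stand-in for the metadata; the values are invented for illustration only.
toy <- data.frame(
  seurat_clusters = c(9, 9, 9),
  Ident_numerical = c(0, 3, 14),
  Phase = c("G1", "S", "G2M"),
  stringsAsFactors = FALSE
)

# The four renames performed in the clean-up cell.
mapping <- c(
  Ident_numerical = "seurat_clusters",
  cell_type_CMcomb = "cell_type_comb",
  Phase = "phase",
  Cardiomyocyte = "cardiomyocyte"
)

renamed <- rename_meta(toy, mapping)
```

Old names that are absent from the data frame are skipped silently, so the helper can be rerun on an object that has already been cleaned.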

In [27]:
# Standardize the embedding keys and remove the *_protein_coding reductions

Key(obj[["umap"]]) <- "UMAP_"
Key(obj[["harmony"]]) <- "HARMONY_"
Key(obj[["pca"]]) <- "PCA_"

obj[["harmony_protein_coding"]] <- NULL
obj[["pca_protein_coding"]] <- NULL
obj[["umap_protein_coding"]] <- NULL

In [33]:
saveRDS(obj, "seurat_objects/2025_MR_HFpEF_Meral.rds")

In [30]:
sessionInfo()

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /media/daten/dmeral/micromamba/envs/user_R/lib/libopenblasp-r0.3.26.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SeuratDisk_0.0.0.9021 SeuratObject_5.0.1    Seurat_4.4.0         

loaded via a namespace (and not attached):
  [1] Rtsne_0.17             colorspace_2.1-1       deldir_2.0-4          
  [4] ggridges_0.5.6         IRdisplay_1.1          base64enc_0.1-3       
  [7] dichromat_2.0-0.1      spat