# Basic Pipeline

2025-07-22

In [2]:
library(Seurat)
library(harmony)
library(ggplot2)
set.seed(42) # for reproducibility

save_path <- "basic_pipeline"

## Load data

The data used in this example come from the paper [Sikkema, L. et al. (2023)](https://doi.org/10.1038/s41591-023-02327-2).
The full dataset can be accessed through this [collection](https://cellxgene.cziscience.com/collections/6f6d381a-7701-4781-935c-db10d30de293) on the Cellxgene platform.
In this example, however, we use a subsampled version of the data.
Because the data are already a subsample, we set `min.cells` and `min.features` to 0 (which are also the defaults of `CreateSeuratObject()`) so that no genes or cells are filtered out when the object is created.

In [None]:
count_matrix <- read.csv("/BiO/data/process/basic_pipeline_data/HLCA_pulmonary_fibrosis_immune_raw.csv", row.names = 1)
meta.data <- read.csv("/BiO/data/process/basic_pipeline_data/HLCA_pulmonary_fibrosis_immune_meta.csv", row.names = 1)

# `so` stands for 's'eurat 'o'bject
# In the count matrix, genes are in rows and cells are in columns
so <- CreateSeuratObject(
  counts = count_matrix, meta.data = meta.data, assay = "RNA",
  min.cells = 0, min.features = 0, project = "HLCA_Pulmonary_Fibrosis_immune"
)



“Data is of class data.frame. Coercing to dgCMatrix.”
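
Before building the object, it can help to confirm that the cell names in the count matrix and in the metadata agree, since the metadata is attached to cells by name. The sketch below uses only base R and a made-up toy CSV; the barcodes, gene names, and the `counts_toy`/`meta_toy` variables are hypothetical and are not part of the dataset above. It also illustrates that `read.csv()` rewrites column names that are not syntactically valid R names unless `check.names = FALSE` is set. The barcodes shown in this example contain only letters, digits, and underscores, so they are not affected, but barcodes ending in `-1` would be.

```r
# Toy CSV standing in for a count file (hypothetical genes and barcodes).
csv_text <- "gene,AAAC-1,AAAG-1,AACT-1
GeneA,5,0,3
GeneB,0,2,1
GeneC,1,1,0"

# With the default check.names = TRUE, "AAAC-1" is rewritten to "AAAC.1".
mangled <- read.csv(text = csv_text, row.names = 1)

# check.names = FALSE keeps the barcodes exactly as written in the file.
counts_toy <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)

# Toy metadata whose rows are in a different order than the count columns.
meta_toy <- data.frame(
  disease = rep("pulmonary fibrosis", 3),
  row.names = c("AACT-1", "AAAC-1", "AAAG-1")
)

# Every cell must appear in both tables before we reorder.
stopifnot(setequal(colnames(counts_toy), rownames(meta_toy)))

# Reorder the metadata rows to follow the count matrix columns.
meta_toy <- meta_toy[colnames(counts_toy), , drop = FALSE]
aligned <- identical(rownames(meta_toy), colnames(counts_toy))
print(aligned)
```

If the check fails on real data, the usual suspects are rewritten barcodes or a metadata file that covers a different set of cells.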


In [5]:
print(head(so, n = 3))

                                                     orig.ident nCount_RNA
F01173_GCTGGGTTCCTGTAGA_haberman HLCA_Pulmonary_Fibrosis_immune       5525
F00431_CTAGAGTCATGCCACG_haberman HLCA_Pulmonary_Fibrosis_immune       2784
F01172_AGTAGTCGTCCGACGT_haberman HLCA_Pulmonary_Fibrosis_immune       1617
                                 nFeature_RNA            disease
F01173_GCTGGGTTCCTGTAGA_haberman         1877 pulmonary fibrosis
F00431_CTAGAGTCATGCCACG_haberman         1017 pulmonary fibrosis
F01172_AGTAGTCGTCCGACGT_haberman         1012 pulmonary fibrosis
                                                 study
F01173_GCTGGGTTCCTGTAGA_haberman Banovich_Kropski_2020
F00431_CTAGAGTCATGCCACG_haberman Banovich_Kropski_2020
F01172_AGTAGTCGTCCGACGT_haberman Banovich_Kropski_2020


## Normalization

### Seurat style

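In Seurat, normalization is a single call to `NormalizeData()`. With its default settings it applies the `LogNormalize` method: the counts of each cell are divided by that cell's total counts, multiplied by a scale factor of 10,000, and transformed with `log1p`.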

In [None]:
so <- NormalizeData(so)
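
To make the arithmetic behind the default log-normalization concrete, here is a minimal base-R sketch on a made-up 3 x 2 matrix. The names `counts_toy`, `scale_factor`, and `log_norm` are illustrative assumptions, and the sketch only mirrors the documented formula; it is not Seurat's own implementation.

```r
# Toy count matrix: genes in rows, cells in columns (hypothetical values).
counts_toy <- matrix(
  c(1, 3, 6,    # cell1, total 10
    0, 5, 15),  # cell2, total 20
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)

scale_factor <- 1e4               # same value as the NormalizeData() default
lib_size <- colSums(counts_toy)   # total counts per cell

# Divide each column by its library size, rescale, then apply log1p.
log_norm <- log1p(sweep(counts_toy, 2, lib_size, "/") * scale_factor)
print(log_norm)
```

After this transformation a zero count stays zero, and undoing the log with `expm1()` gives columns that each sum to the scale factor.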

### Scran style

With scran, we first convert the Seurat object to a `SingleCellExperiment` object and then normalize in three steps:

1. clustering cells using `quickCluster()`
2. calculating cell-specific size factors using `computeSumFactors()`
3. dividing counts of each cell by its size factor, and log2-transforming with a pseudocount of 1 using `logNormCounts()`

Then, we convert the `SingleCellExperiment` object back to a `Seurat` object.

In [None]:
library(scran)
library(scater)

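# Convert the Seurat object to a SingleCellExperiment
sce <- as.SingleCellExperiment(so)

# 1. Pre-cluster cells so size factors are estimated among similar cells
clusters <- quickCluster(sce)
# 2. Estimate cell-specific size factors by pooling counts within clusters
sce <- computeSumFactors(sce, clusters = clusters)
# 3. Divide by size factors and log2-transform; the argument is `pseudo.count`
sce.norm <- logNormCounts(sce, pseudo.count = 1)

# Convert back to Seurat, mapping the SCE assays to Seurat's counts and data
so <- as.Seurat(sce.norm, counts = "counts", data = "logcounts")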

“Layer ‘data’ is empty”
“Layer ‘scale.data’ is empty”
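
Step 3 of the scran workflow above can be sketched in base R. As a loudly labeled simplification, the sketch uses plain library-size factors (each cell's total divided by the mean total) in place of the pooled deconvolution factors that `computeSumFactors()` estimates; the toy matrix and the names `size_factors` and `logcounts_toy` are hypothetical.

```r
# Toy count matrix: genes in rows, cells in columns (hypothetical values).
counts_toy <- matrix(
  c(1, 3, 6,    # cell1, total 10
    0, 5, 15),  # cell2, total 20
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2"))
)

# Stand-in size factors: library sizes scaled to have a mean of 1.
# (scran's deconvolution factors would be used here in the real pipeline.)
lib_size <- colSums(counts_toy)
size_factors <- lib_size / mean(lib_size)

# Divide each cell by its size factor, then log2-transform with pseudocount 1.
logcounts_toy <- log2(sweep(counts_toy, 2, size_factors, "/") + 1)
print(logcounts_toy)
```

Unlike the Seurat-style transform, this uses base 2 and keeps the normalized values on the scale of the original counts, because the size factors are centered at 1 rather than rescaled to 10,000.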


## Batch-aware feature selection