# Standard Seurat Processing for Mol Bio sequencing

## Importing Commonly Used Libraries

In [1]:
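library(dplyr)         # data-frame verbs and the %>% pipe
library(Seurat)        # single-cell analysis workflow
library(patchwork)     # combining ggplot2 panels
library(H5weaver)      # HDF5 helpers (not called directly below)
library(hise)          # HISE SDK (not called directly below)
library(tidyverse)     # core tidyverse packages (ggplot2, readr, stringr, ...)
library(SeuratObject)  # Seurat object classes (also attached by Seurat)
library(ggrepel)       # non-overlapping plot labels
library(SeuratDisk)    # LoadH5Seurat() for .h5seurat files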



Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t


Loading required package: data.table


Attaching package: ‘data.table’


The following objects are masked from ‘package:dplyr’:

    between, first, last


Loading required package: Matrix

Loading required package: rhdf5


Attaching package: ‘H5weaver’


The following objects are masked from ‘package:rhdf5’:

    h5dump, h5ls


“running command 'timedatectl' had status 1”
── [1mAttaching core tidyverse packages[22m ──────────────────────── tidyverse 2.0.0 ──
[32m✔[39m [34mforcats  [39m 1.0.0     [32m✔[39m [34mreadr    [39m 2.1.5
[32m✔[39m [34mggplot2  [39m 3.4.3     [32m✔[39m [34mstringr  [39

## Creating Seurat Objects from Cell Ranger h5 Outputs

### Locating the h5 files

In [2]:
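# Recursively collect the path of every Cell Ranger filtered_feature_bc_matrix.h5 file
h5s <- list.files(
    path = '/home/jupyter/CS15_WHBL/CWB_Paper/01_Final_Data/03_Data/exp888_scrublet',
    pattern = 'filtered_feature_bc_matrix\\.h5$',  # regex: '\\.' matches a literal dot
    full.names = TRUE,
    recursive = TRUE
)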

### Creating Seurat Objects

In [3]:
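fully <- lapply(h5s, function(x) {
    # Sample (project) name: the part of the path after '/exp888_scrublet/'
    # and before '_sample_'; it becomes orig.ident for every cell in the sample
    pro <- strsplit(strsplit(x, '/exp888_scrublet/')[[1]][2], '_sample_')[[1]][1]

    mtx <- Read10X_h5(x)                          # count matrix (genes x cells)
    so <- CreateSeuratObject(mtx, project = pro)  # one Seurat object per sample
    return(so)
})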
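The project name is carved out of each file path with two nested `strsplit()` calls. A minimal base-R sketch of the same parsing, on hypothetical paths (the sample names `BR10001_TEC1`, `BR10002_TEC2` and the folder `run1` are made up), including one caveat of `recursive = TRUE`: any subfolder between `exp888_scrublet/` and the file name ends up in the result.

```r
# Same two-step split as the cell above; fixed = TRUE treats the separators
# as literal strings (they contain no regex metacharacters, so the result is unchanged)
parse_project <- function(path, marker = "/exp888_scrublet/") {
  after_marker <- strsplit(path, marker, fixed = TRUE)[[1]][2]
  strsplit(after_marker, "_sample_", fixed = TRUE)[[1]][1]
}

# Hypothetical file paths
paths <- c(
  "/data/exp888_scrublet/BR10001_TEC1_sample_filtered_feature_bc_matrix.h5",
  "/data/exp888_scrublet/BR10002_TEC2_sample_filtered_feature_bc_matrix.h5"
)
projects <- vapply(paths, parse_project, character(1), USE.NAMES = FALSE)
print(projects)

# A file one folder deeper keeps that folder in its project name
nested <- parse_project("/data/exp888_scrublet/run1/BR10001_TEC1_sample_filtered_feature_bc_matrix.h5")
print(nested)
```

If the h5 files sit in per-sample subfolders, `basename(x)` before the second split would drop the folder part.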


In [4]:
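# Scrublet doublet calls, one CSV per sample. list.files() returns sorted paths,
# so this list is assumed to follow the same sample order as h5s.
scrub <- list.files(
    path = '/home/jupyter/CS15_WHBL/CWB_Paper/01_Final_Data/03_Data/exp888_scrublet',
    pattern = '_2\\.csv$',  # regex: '\\.' matches a literal dot
    full.names = TRUE,
    recursive = TRUE
)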
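`list.files()` treats `pattern` as a regular expression, so in an unescaped pattern such as `'_2.csv$'` the dot matches any character. A small sketch with `grepl()` on made-up file names shows which names the loose and the escaped patterns accept:

```r
# Hypothetical file names: a genuine Scrublet CSV, a look-alike, and a different sample
files <- c("BR10001_TEC1_2.csv", "BR10001_TEC1_2xcsv", "BR10001_TEC1_12.csv")

loose  <- grepl("_2.csv$", files)    # '.' matches any character
strict <- grepl("_2\\.csv$", files)  # '\\.' matches only a literal dot

print(data.frame(files, loose, strict))
```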


In [5]:
scrubs <- lapply(scrub,read.csv)

In [6]:
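# The unnamed first CSV column (read in as X) holds the cell barcodes:
# move it into the row names and keep a copy in a Barcodes column
scrubs <- scrubs %>% lapply(function(x){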
    rownames(x) <- x$X
    x$X <- NULL
    x$Barcodes <- rownames(x)
    return(x)
})

In [7]:
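# Assumes h5s and scrub list the samples in the same order
stopifnot(length(fully) == length(scrubs))

for (i in seq_along(fully)) {
    # Every cell must have a Scrublet row; fail loudly instead of adding NAs
    stopifnot(all(colnames(fully[[i]]) %in% rownames(scrubs[[i]])))

    # AddMetaData matches rows to cells by row name (cell barcode), so the
    # Scrublet columns, including Predicted_Doublet, land on the right cells
    fully[[i]] <- AddMetaData(fully[[i]], metadata = scrubs[[i]])
}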
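This step only works if each Scrublet row is attached to the right cell. A minimal base-R sketch of aligning Scrublet calls to cells by barcode rather than by position, on made-up barcodes; `Doublet_Score` is a hypothetical column name, while `Predicted_Doublet` follows the filter used further down:

```r
# Toy per-cell metadata; row names are (made-up) cell barcodes
meta <- data.frame(nCount_RNA = c(1200, 3400, 800),
                   row.names = c("AAACCTG-1", "AAAGATG-1", "AACTCCC-1"))

# Toy Scrublet output for the same cells, deliberately in a different order
scrub <- data.frame(Doublet_Score = c(0.08, 0.41, 0.05),
                    Predicted_Doublet = c("False", "True", "False"),
                    row.names = c("AACTCCC-1", "AAACCTG-1", "AAAGATG-1"))

join_by_barcode <- function(meta, scrub) {
  idx <- match(rownames(meta), rownames(scrub))  # row of scrub holding each cell
  if (anyNA(idx)) stop("some cells are missing from the Scrublet table")
  for (col in names(scrub)) meta[[col]] <- scrub[[col]][idx]
  meta
}

joined <- join_by_barcode(meta, scrub)
print(joined)
```

Pasting the Scrublet columns on by position instead would give `AAACCTG-1` the first Scrublet row (`"False"`), which belongs to a different cell.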

In [8]:
# Merge all samples into one object, then collapse the per-sample counts layers
fully <- merge(x = fully[[1]], y = fully[-1]) %>% JoinLayers()


In [9]:
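# orig.ident is assumed to be laid out as '<7-character donor>_<4-character tech>'
fully$tech <- substr(fully$orig.ident, 9, 12)  # characters 9-12
fully$donor <- substr(fully$orig.ident, 1, 7)  # characters 1-7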
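The fixed character positions only work while every sample name keeps the same layout. A base-R sketch on hypothetical names (`BR10001_TEC1` and the like are made up) compares the positional split with a separator-based one that does not depend on the donor ID length:

```r
# Hypothetical sample names in the assumed '<donor>_<tech>' layout
orig_ident <- c("BR10001_TEC1", "BR10002_TEC2")

donor <- substr(orig_ident, 1, 7)   # characters 1-7
tech  <- substr(orig_ident, 9, 12)  # characters 9-12 (character 8 is the underscore)

# Separator-based alternative: text before / after the first underscore
donor_sep <- sub("_.*$", "", orig_ident)
tech_sep  <- sub("^[^_]*_", "", orig_ident)

print(data.frame(orig_ident, donor, tech, donor_sep, tech_sep))
```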


In [10]:
# Keep only the cells that Scrublet did not flag as doublets
fully <- fully %>% subset(subset = Predicted_Doublet == 'False')

In [11]:
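# Percentage of each cell's counts that come from mitochondrial genes (human 'MT-' prefix)
fully[["percent.mt"]] <- PercentageFeatureSet(fully, pattern = "^MT-")
# Drop cells with 5% or more mitochondrial counts
fully <- subset(fully, subset = percent.mt < 5)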
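`PercentageFeatureSet()` reports, for each cell, the percentage of its counts that come from features matching `pattern`. A base-R sketch of the same arithmetic on a made-up 3 x 3 count matrix, followed by the `< 5` cut-off:

```r
# Made-up counts: rows are genes, columns are cells (each cell has 100 counts)
counts <- matrix(
  c(10,  0,  5,   # MT-CO1
    90, 50, 45,   # CD3E
     0, 50, 50),  # LYZ
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "CD3E", "LYZ"), c("cell1", "cell2", "cell3"))
)

is_mt <- grepl("^MT-", rownames(counts))  # mitochondrial genes by name prefix
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

kept <- colnames(counts)[percent_mt < 5]  # strict '<': a cell at exactly 5% is dropped
print(percent_mt)
print(kept)
```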


### Normalizing, running PCA and UMAP, and clustering

In [12]:
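fully <- NormalizeData(fully) %>%       # log-normalize (LogNormalize, scale factor 10,000)
    FindVariableFeatures() %>%          # 2,000 highly variable genes (vst)
    ScaleData() %>%                     # center and scale the variable genes
    RunPCA() %>%                        # 50 PCs from the scaled variable genes
    RunUMAP(dims = 1:20) %>%            # 2-D UMAP embedding of the first 20 PCs
    FindNeighbors(dims = 1:20) %>%      # k-NN and shared-nearest-neighbor (SNN) graphs
    FindClusters(resolution = 0.5)      # Louvain clustering of the SNN graph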


Normalizing layer: counts

Finding variable features for layer counts

Centering and scaling data matrix

PC_ 1 
Positive:  SPI1, IFI30, CST3, LYZ, SERPINA1, NCF2, CD68, TYMP, S100A9, MNDA 
	   MPEG1, PLXNB2, FGL2, CYBB, CLEC7A, EMILIN2, HCK, VCAN, GRN, CFP 
	   LILRB2, CSF3R, ZNF385A, S100A8, KCTD12, MS4A6A, LRP1, KLF4, FCER1G, CD14 
Negative:  TRBC2, TRAC, IL32, TCF7, IL7R, FCMR, LTB, TRBC1, RORA, IKZF3 
	   LEF1, FAM102A, CD247, CD5, CD7, SAMD3, CCR7, PYHIN1, KLRK1, TRAT1 
	   THEMIS, CD69, PIM2, CTSW, MAL, NELL2, CCL5, PCED1B, IL2RB, PRF1 
PC_ 2 
Positive:  NKG7, PRF1, CST7, ANXA1, IL32, CCL5, GZMA, GNLY, SAMD3, CTSW 
	   SYNE1, KLRK1, KLRD1, IL2RB, ADGRG1, FGFBP2, SRGN, CD247, MYBL1, GZMH 
	   RORA, FCRL6, TGFBR3, ID2, MATK, CD7, GZMM, DOK2, CX3CR1, HOPX 
Negative:  IGHM, NIBAN3, CD79A, MS4A1, IGKC, IGHD, BANK1, CD22, FCRL1, PAX5 
	   FCRL2, CD79B, BLK, TNFRSF13C, BCL11A, POU2AF1, RALGPS2, TCL1A, RUBCNL, AFF3 
	   OSBPL10, COBLL1, FCRLA, WDFY4, BLNK, FCER2, IRF8, TCF4, SWAP70, CD2

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 197827
Number of edges: 5536651

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9312
Number of communities: 26
Elapsed time: 276 seconds
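`NormalizeData()` with its default `LogNormalize` method divides each cell's counts by that cell's total, multiplies by a scale factor (10,000 by default) and applies the natural-log `log1p`. A minimal base-R sketch on a small dense matrix (Seurat itself operates on sparse matrices):

```r
# Made-up counts: 3 genes x 2 cells (column totals 10 and 8)
counts <- matrix(c(1, 9, 0,
                   3, 1, 4), nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))

log_normalize <- function(counts, scale_factor = 1e4) {
  # divide each column (cell) by its total, rescale, then log(1 + x)
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

norm <- log_normalize(counts)
print(round(norm, 3))
```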


### Mapping cells to the multimodal PBMC reference


In [13]:
# Multimodal PBMC reference (RNA/SCT and ADT assays) carrying the celltype.l1-l3 labels
ref <- LoadH5Seurat(file = '/home/jupyter/pbmc_multimodal.h5seurat')


Validating h5Seurat file

Initializing ADT with data

Adding counts for ADT

Adding variable feature information for ADT

Adding miscellaneous information for ADT

Initializing SCT with data

Adding counts for SCT

Adding variable feature information for SCT

Adding miscellaneous information for SCT

Adding reduction apca

Adding cell embeddings for apca

Adding feature loadings for apca

Adding miscellaneous information for apca

Adding reduction aumap

Adding cell embeddings for aumap

Adding miscellaneous information for aumap

Adding reduction pca

Adding cell embeddings for pca

Adding feature loadings for pca

Adding miscellaneous information for pca

Adding reduction spca

Adding cell embeddings for spca

Adding feature loadings for spca

Adding miscellaneous information for spca

Adding reduction umap

Adding cell embeddings for umap

Adding miscellaneous information for umap

Adding reduction wnn.umap

Adding cell embeddings for wnn.umap

Adding miscellaneous information for w

In [14]:
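# Anchors between reference and query in the reference's supervised PCA (spca) space;
# the query is normalized with the reference's SCT model before projection
anchors <- FindTransferAnchors(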
    reference = ref,
    query = fully,
    normalization.method = "SCT",
    reference.reduction = "spca",
    dims = 1:50
)


Normalizing query using reference SCT model

Projecting cell embeddings

Finding neighborhoods

Finding anchors

	Found 13584 anchors



In [15]:
# Transfer the three levels of reference cell-type labels and the predicted ADT (protein) levels
fully <- TransferData(
    anchorset = anchors, 
    reference = ref, 
    query = fully,
    refdata = list(
        celltype.l1 = "celltype.l1",
        celltype.l2 = "celltype.l2",
        celltype.l3 = "celltype.l3",
        predicted_ADT = 'ADT'
    )
)

Finding integration vectors

Finding integration vector weights

Predicting cell labels

“Layer counts isn't present in the assay object; returning NULL”
Predicting cell labels

“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
“Layer counts isn't present in the assay object; returning NULL”
Predicting cell labels

“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
“Layer counts isn't present in the assay object; returning NULL”
Transfering 228 features onto reference data

“Layer counts isn't present in the assay object; returning NULL”


### Saving the Seurat object (SO) to a file that can be read into memory later

In [16]:
saveRDS(fully, '/home/jupyter/CS15_WHBL/CWB_Paper/01_Final_Data/03_Data/Fig_3_Final.rds')

In [17]:
sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SeuratDisk_0.0.0.9020 ggrepel_0.9.5         lubridate_1.9.3      
 [4] forcats_1.0.0         stringr_1.5.1         purrr_1.0.2          
 [7] readr_2.1.5           tidyr_1.3.1           tibble_3.2.1         
[10] ggplot2_3.4.3         tidyverse_2.0.0       hise_2.15.0          
[13] H5weaver_1.2.0        r