In [1]:
library(ArchR)
library(tidyverse)
library(BSgenome.Hsapiens.UCSC.hg38)
library(SingleCellExperiment)
library(anndata)
# library(sceasy)
# library(reticulate)
# use_condaenv('cellpymc')
# loompy <- reticulate::import('loompy')



In [2]:
getwd()

In [3]:
# Before starting a project, set the ArchR genome and the default number of threads for parallelization.
# Set the default genome to hg38.
addArchRGenome("hg38")

Setting default genome to Hg38.



In [4]:
# Set the default number of parallel threads to 1
addArchRThreads(threads = 1)

Setting default number of Parallel threads to 1.



# Read in ArchR project

In [5]:
archr_project_path = '/nfs/team205/heart/anndata_objects/Foetal/multiome_ATAC/ArchR/project_output'
proj = loadArchRProject(path = archr_project_path, showLogo = FALSE)
proj

Successfully loaded ArchRProject!


           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    



class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/Foetal/multiome_ATAC/ArchR/project_output 
samples(37): BHF_F_Hea11064670_BHF_F_Hea11031823
  BHF_F_Hea11064671_BHF_F_Hea11031824 ...
  HCAHeartST13386009_HCAHeartST13303419
  HCAHeartST13386010_HCAHeartST13303420
sampleColData names(1): ArrowFiles
cellColData names(59): Sample TSSEnrichment ... ReadsInPeaks FRIP
numberOfCells(1): 167022
medianTSS(1): 11.614
medianFrags(1): 10338.5

# AnnData to Seurat object

In [6]:
# convert the AnnData object to a Seurat object
# (one-off step, kept disabled; the resulting .rds file is read in below)
if (FALSE) {
    sceasy::convertFormat('/nfs/team205/heart/anndata_objects/Foetal/Feb28ObjectRaw_Multiome.h5ad',
                          from = "anndata", to = "seurat",
                          outFile = '/nfs/team205/heart/anndata_objects/Foetal/Feb28ObjectRaw_Multiome.rds')
}

X -> counts



An object of class Seurat 
36601 features across 211145 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)
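
The conversion cell above is switched off by hand. One alternative, sketched below, is to guard the step on whether the output file already exists. `convert_if_missing` and `fake_convert` are hypothetical helpers introduced only for illustration; in this notebook the converter would wrap the `sceasy::convertFormat` call shown above.

```r
# Hypothetical helper: run an expensive conversion only when its output is missing.
convert_if_missing <- function(in_file, out_file, converter) {
  if (file.exists(out_file)) {
    message("Found ", out_file, "; skipping conversion.")
    return(invisible(FALSE))
  }
  converter(in_file, out_file)
  invisible(TRUE)
}

# Stand-in converter for this demo; it only writes a placeholder object.
fake_convert <- function(in_file, out_file) {
  saveRDS(list(source = in_file), out_file)
}

out_file <- tempfile(fileext = ".rds")
first_run  <- convert_if_missing("input.h5ad", out_file, fake_convert)  # converts
second_run <- convert_if_missing("input.h5ad", out_file, fake_convert)  # skips
```

With this pattern the cell can stay enabled: re-running the notebook does not repeat the conversion once the output is present.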

# Prepare RNA data

### Prepare SingleCellExperiment

In [6]:
# read in seurat object
rna = readRDS('/nfs/team205/heart/anndata_objects/Foetal/Feb28ObjectRaw_Multiome.rds')
rna

Loading required package: SeuratObject


Attaching package: ‘SeuratObject’


The following object is masked from ‘package:SummarizedExperiment’:

    Assays




An object of class Seurat 
36601 features across 211145 samples within 1 assay 
Active assay: RNA (36601 features, 0 variable features)

In [7]:
# read in one Cell Ranger h5 file to obtain the gene coordinates (rowRanges)
rna_forRowRanges <- import10xFeatureMatrix(
    input = c("/nfs/team205/heart/mapped/cellranger-arc200/BHF_F_Hea13168898_BHF_F_Hea13168514/filtered_feature_bc_matrix.h5"),
    names = c("BHF_F_Hea13168898_BHF_F_Hea13168514")
   )

Importing Feature Matrix 1 of 1

Re-ordering RNA matricies for consistency.



In [8]:
# shared genes
shared_genes = intersect(rownames(rna_forRowRanges),rownames(rna))

# subset shared genes
rna = rna[shared_genes,]
rna_forRowRanges = rna_forRowRanges[shared_genes,]

# convert to SingleCellExperiment
rna = Seurat::as.SingleCellExperiment(rna)
rna

class: SingleCellExperiment 
dim: 36578 211145 
metadata(0):
assays(2): counts logcounts
rownames(36578): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(1): gene_id
colnames(211145):
  BHF_F_Hea11064670_BHF_F_Hea11031823_TTGCTTAGTGAGACTC-1
  BHF_F_Hea11064670_BHF_F_Hea11031823_ATAGATGCATTGTCCT-1 ...
  HCAHeartST13386010_HCAHeartST13303420_CGTAATGGTATTGAGT-1
  HCAHeartST13386010_HCAHeartST13303420_GCTGTAAGTCAATACG-1
colData names(50): sangerID combinedID ... nFeature_RNA ident
reducedDimNames(0):
altExpNames(0):
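
The subsetting step above keeps only the genes present in both objects and, because both are indexed with the same `shared_genes` vector, leaves their rows in the same order. A toy illustration with plain matrices (the gene and cell names are made up):

```r
# Two toy count matrices with partially overlapping, differently ordered genes.
mat_a <- matrix(1:8, nrow = 4,
                dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                c("cell1", "cell2")))
mat_b <- matrix(1:6, nrow = 3,
                dimnames = list(c("GeneD", "GeneB", "GeneX"),
                                c("cell1", "cell2")))

# intersect() returns the common names in the order of its first argument.
shared <- intersect(rownames(mat_a), rownames(mat_b))

# Index both matrices with the same vector so that their rows line up.
mat_a_sub <- mat_a[shared, , drop = FALSE]
mat_b_sub <- mat_b[shared, , drop = FALSE]

identical(rownames(mat_a_sub), rownames(mat_b_sub))
```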

### Add rowRanges data

What is rowRanges data?<br>
References:<br>
https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html<br>
https://robertamezquita.github.io/orchestratingSingleCellAnalysis/data-infrastructure.html#the-essentials-of-sce<br>

* `rowRanges` holds the genomic coordinates of each row (feature) of the experiment, one entry per row.
* It is stored as a `GRanges` or a `GRangesList` object.
* Each entry records the chromosome, start, and end coordinates (and strand) of a feature, such as a gene or another genomic region.
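
As a minimal sketch of the points above, the following builds a tiny `SummarizedExperiment` with `rowRanges` attached. The gene names, coordinates, and counts are invented for illustration and do not come from this dataset.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy genomic ranges for three made-up genes.
gene_ranges <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500, 200), end = c(300, 900, 450)),
  strand   = c("+", "-", "*")
)
names(gene_ranges) <- c("GeneA", "GeneB", "GeneC")

# Toy count matrix whose row names match the range names.
counts <- matrix(0:5, nrow = 3,
                 dimnames = list(names(gene_ranges), c("cell1", "cell2")))

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = gene_ranges)

rowRanges(se)         # one range per row of the count matrix
width(rowRanges(se))  # feature lengths in base pairs
```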

In [9]:
rowRanges(rna_forRowRanges)

GRanges object with 36578 ranges and 5 metadata columns:
                seqnames          ranges strand |    feature_type genome
                   <Rle>       <IRanges>  <Rle> |           <Rle>  <Rle>
  MIR1302-2HG       chr1     29553-30267      * | Gene Expression GRCh38
      FAM138A       chr1     36080-36081      * | Gene Expression GRCh38
        OR4F5       chr1     65418-69055      * | Gene Expression GRCh38
   AL627309.3       chr1     91104-91105      * | Gene Expression GRCh38
   AL627309.1       chr1   120931-133723      * | Gene Expression GRCh38
          ...        ...             ...    ... .             ...    ...
   AC141272.1 KI270728.1 1270983-1270984      * | Gene Expression GRCh38
   AC023491.2 KI270731.1     13000-13001      * | Gene Expression GRCh38
   AC007325.1 KI270734.1     72410-72411      * | Gene Expression GRCh38
   AC007325.4 KI270734.1   131493-131494      * | Gene Expression GRCh38
   AC007325.2 KI270734.1   161749-161852      * | Gene Expression G

In [10]:
# add rowRanges data
rowRanges(rna) <- rowRanges(rna_forRowRanges)
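
The assignment above relies on both objects listing the genes in the same order, which holds here because both were subset with `shared_genes`. A sketch of an explicit check before such an assignment, using toy data with made-up gene names:

```r
library(GenomicRanges)

# Row order of the (toy) expression object.
expr_genes <- c("GeneA", "GeneB", "GeneC")

# Ranges for the same genes, but listed in a different order.
ranges_unordered <- GRanges(
  seqnames = c("chr2", "chr1", "chr1"),
  ranges   = IRanges(start = c(200, 100, 500), end = c(450, 300, 900))
)
names(ranges_unordered) <- c("GeneC", "GeneA", "GeneB")

# Reorder the ranges by name, then confirm the order matches the expression rows.
ranges_ordered <- ranges_unordered[expr_genes]
stopifnot(identical(names(ranges_ordered), expr_genes))

start(ranges_ordered)
```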

### Modify cell names

In [11]:
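# add barcode column: strip the "<combinedID>_" prefix from each cell name
# (fixed = TRUE matches the prefix literally rather than as a regular expression)
name_parts = strsplit(colnames(rna), split = paste0(colData(rna)$combinedID, "_"), fixed = TRUE)
colData(rna)$cellbarcode = vapply(name_parts, function(x) x[2], character(1))

# rename cells to match the ArchR cell-name format, "<sample>#<barcode>"
colnames(rna) = paste0(colData(rna)$combinedID, '#', colData(rna)$cellbarcode)
colnames(rna)[1:10]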
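
The renaming above turns `<sample>_<barcode>` into `<sample>#<barcode>`. The same idea in base R on toy inputs (the sample IDs and barcodes below are made up), with a guard for names that do not start with their sample ID:

```r
# Toy cell names of the form "<sample>_<barcode>" and the sample ID of each cell.
cell_names <- c("SampleA_1_AAACGT-1", "SampleA_1_TTGCAA-1", "SampleB_2_GGCATC-1")
sample_ids <- c("SampleA_1", "SampleA_1", "SampleB_2")

# Stop early if any cell name does not begin with "<sample>_".
stopifnot(startsWith(cell_names, paste0(sample_ids, "_")))

# Drop the "<sample>_" prefix by position to recover the barcode.
barcodes <- substring(cell_names, nchar(sample_ids) + 2)

# Rebuild the names with "#" as the separator.
archr_names <- paste0(sample_ids, "#", barcodes)
archr_names
```

Removing the prefix by position avoids treating the sample ID as a regular expression, which matters if an ID ever contains characters such as `.` or `+`.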

In [12]:
# remove logcounts
assays(rna)['logcounts'] = NULL
rna

class: SingleCellExperiment 
dim: 36578 211145 
metadata(0):
assays(1): counts
rownames(36578): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(5): feature_type genome id interval name
colnames(211145):
  BHF_F_Hea11064670_BHF_F_Hea11031823#TTGCTTAGTGAGACTC-1
  BHF_F_Hea11064670_BHF_F_Hea11031823#ATAGATGCATTGTCCT-1 ...
  HCAHeartST13386010_HCAHeartST13303420#CGTAATGGTATTGAGT-1
  HCAHeartST13386010_HCAHeartST13303420#GCTGTAAGTCAATACG-1
colData names(51): sangerID combinedID ... ident cellbarcode
reducedDimNames(0):
altExpNames(0):

In [13]:
head(proj$cellNames)

In [14]:
length(intersect(proj$cellNames,colnames(rna)))
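
The count above gives the number of cells present in both modalities. A sketch of how the same comparison could be broken down per sample, on toy cell names in the `<sample>#<barcode>` format (all names are made up):

```r
# Toy cell names for the ATAC project and the RNA object.
atac_cells <- c("S1#AAAC-1", "S1#TTGC-1", "S2#GGCA-1", "S2#CCAT-1")
rna_cells  <- c("S1#AAAC-1", "S2#GGCA-1", "S2#CCAT-1", "S3#ACGT-1")

shared_cells <- intersect(atac_cells, rna_cells)

# Fraction of ATAC cells that also have RNA data.
overlap_fraction <- length(shared_cells) / length(atac_cells)

# Shared cells per sample: the sample is everything before the "#".
shared_per_sample <- table(sub("#.*$", "", shared_cells))

overlap_fraction
shared_per_sample
```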

# Add RNA data to ArchR object

In [None]:
# subset shared cells
# shared_cells = intersect(proj$cellNames,colnames(rna))
# proj_shared = proj[shared_cells, ]

In [15]:
proj <- addGeneExpressionMatrix(input = proj, seRNA = rna, force = TRUE)
proj

ArchR logging to : ArchRLogs/ArchR-addGeneExpressionMatrix-49213027a1f7-Date-2023-03-09_Time-22-44-06.log
If there is an issue, please report to github with logFile!

Overlap w/ scATAC = 1

2023-03-09 22:44:10 : 

Overlap Per Sample w/ scATAC : 7089STDY13216920_BHF_F_Hea13242527=10630,7089STDY13216921_BHF_F_Hea13242528=9214,7089STDY13216922_BHF_F_Hea13242529=3958,7089STDY13216923_BHF_F_Hea13242530=1,7089STDY13216924_BHF_F_Hea13242531=3783,7089STDY13216925_BHF_F_Hea13242532=2197,7089STDY13216926_BHF_F_Hea13242533=3622,7089STDY13216927_BHF_F_Hea13242534=4909,BHF_F_Hea11064670_BHF_F_Hea11031823=350,BHF_F_Hea11064671_BHF_F_Hea11031824=2218,BHF_F_Hea11064672_BHF_F_Hea11031825=3537,BHF_F_Hea11933666_BHF_F_Hea11596619=7905,BHF_F_Hea11933667_BHF_F_Hea11596620=8692,BHF_F_Hea11933668_BHF_F_Hea11596621=8194,BHF_F_Hea11933669_BHF_F_Hea11596622=7959,BHF_F_Hea11933670_BHF_F_Hea11596623=2824,BHF_F_Hea11933671_BHF_F_Hea11596624=3092,BHF_F_Hea11933672_BHF_F_Hea11596625=3143,BHF_F_Hea11933673_BHF_F_Hea1

class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/Foetal/multiome_ATAC/ArchR/project_output 
samples(37): BHF_F_Hea11064670_BHF_F_Hea11031823
  BHF_F_Hea11064671_BHF_F_Hea11031824 ...
  HCAHeartST13386009_HCAHeartST13303419
  HCAHeartST13386010_HCAHeartST13303420
sampleColData names(1): ArrowFiles
cellColData names(63): Sample TSSEnrichment ... Gex_MitoRatio
  Gex_RiboRatio
numberOfCells(1): 167022
medianTSS(1): 11.614
medianFrags(1): 10338.5

# Save

In [16]:
getwd()

In [17]:
# "project_output" is resolved relative to the working directory checked with getwd() above
saveArchRProject(ArchRProj = proj, outputDirectory = "project_output", load = FALSE)

Saving ArchRProject...

