In [1]:
library(ArchR)
library(tidyverse)
library(BSgenome.Hsapiens.UCSC.hg38)
library(SingleCellExperiment)
library(anndata)



In [2]:
setwd('/nfs/team205/heart/anndata_objects/8regions/ArchR')
getwd()

In [3]:
# Before starting a project, set the ArchR genome and the default number of threads for parallelization.
# Set the default genome to hg38.
addArchRGenome("hg38")

Setting default genome to Hg38.



In [4]:
# Set the default number of parallel threads (1 here)
addArchRThreads(threads = 1) 

Setting default number of Parallel threads to 1.



# Read in ArchR project

In [5]:
archr_project_path = '/nfs/team205/heart/anndata_objects/8regions/ArchR/project_output'
proj = loadArchRProject(path = archr_project_path, showLogo = FALSE)
proj

Successfully loaded ArchRProject!


           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    



class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/8regions/ArchR/project_output 
samples(47): HCAHeart9508627_HCAHeart9508819
  HCAHeart9508628_HCAHeart9508820 ...
  HCAHeartST13180618_HCAHeartST13177115
  HCAHeartST13180619_HCAHeartST13177116
sampleColData names(1): ArrowFiles
cellColData names(48): Sample TSSEnrichment ... ReadsInPeaks FRIP
numberOfCells(1): 139835
medianTSS(1): 8.699
medianFrags(1): 9459

# AnnData to Seurat object

In [7]:
# convert anndata to seurat object
if (TRUE) {
    sceasy::convertFormat(
        '/nfs/team205/heart/anndata_objects/8regions/RNA_adult-8reg_full_raw_Multiome.h5ad',
        from = "anndata", to = "seurat",
        outFile = '/nfs/team205/heart/anndata_objects/8regions/RNA_adult-8reg_full_raw_Multiome.rds'
    )
}

X -> counts



An object of class Seurat 
32732 features across 211060 samples within 1 assay 
Active assay: RNA (32732 features, 0 variable features)

# Prepare RNA data

### Prepare SingleCellExperiment

In [6]:
# read in seurat object
rna = readRDS('/nfs/team205/heart/anndata_objects/8regions/RNA_adult-8reg_full_raw_Multiome.rds')
rna

Loading required package: SeuratObject


Attaching package: ‘SeuratObject’


The following object is masked from ‘package:SummarizedExperiment’:

    Assays




An object of class Seurat 
32732 features across 211060 samples within 1 assay 
Active assay: RNA (32732 features, 0 variable features)

In [7]:
# read in CellRanger h5 data for rowRanges data
rna_forRowRanges <- import10xFeatureMatrix(
    input = c("/nfs/team205/heart/mapped/cellranger-arc200/HCAHeartST11350194_HCAHeartST11445771/filtered_feature_bc_matrix.h5"),
    names = c("HCAHeartST11350194_HCAHeartST11445771")
   )

Importing Feature Matrix 1 of 1

Re-ordering RNA matricies for consistency.



In [8]:
# shared genes
shared_genes = intersect(rownames(rna_forRowRanges),rownames(rna))

# subset shared genes
rna = rna[shared_genes,]
rna_forRowRanges = rna_forRowRanges[shared_genes,]

# convert to SingleCellExperiment
rna = Seurat::as.SingleCellExperiment(rna)
rna

class: SingleCellExperiment 
dim: 32711 211060 
metadata(0):
assays(2): counts logcounts
rownames(32711): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(1): gene_id
colnames(211060): HCAHeart9508627_HCAHeart9508819_TCAATCGCAGTAAAGC-1
  HCAHeart9508627_HCAHeart9508819_CTGCTCCCAACTGGCT-1 ...
  HCAHeartST13180619_HCAHeartST13177116_AGGCTAGCAGAAATGC-1
  HCAHeartST13180619_HCAHeartST13177116_GCAAGTCGTACGGGTT-1
colData names(16): sangerID combinedID ... nFeature_RNA ident
reducedDimNames(0):
altExpNames(0):

### Add rowRanges data

What is rowRanges data?<br>
References:<br>
https://www.bioconductor.org/packages/devel/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html<br>
https://robertamezquita.github.io/orchestratingSingleCellAnalysis/data-infrastructure.html#the-essentials-of-sce<br>

* rowRanges is stored as a GRanges or a GRangesList object.
* It holds the genomic range of each feature (here, each gene).
* Each entry records the chromosome, the start and end coordinates, and the strand of the feature (a gene or another genomic region).
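
As a minimal sketch of what the points above mean in practice, the block below builds a toy GRanges and attaches it to a SummarizedExperiment as rowRanges. The gene names, coordinates, and counts are hypothetical and exist only for illustration; the sketch assumes the Bioconductor packages GenomicRanges and SummarizedExperiment are installed.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy gene coordinates (hypothetical values, for illustration only)
gene_ranges <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges   = IRanges(start = c(100, 500, 200), end = c(300, 900, 450)),
    strand   = c("+", "-", "*")
)
names(gene_ranges) <- c("geneA", "geneB", "geneC")

# Toy count matrix whose rows are in the same order as the ranges
counts <- matrix(
    1:6, nrow = 3,
    dimnames = list(names(gene_ranges), c("cell1", "cell2"))
)

# Supplying rowRanges ties each row of the assay to a genomic interval
se <- SummarizedExperiment(
    assays    = list(counts = counts),
    rowRanges = gene_ranges
)

# One range per feature: chromosome, start-end, strand
rowRanges(se)
```

Each row of the count matrix is now paired with one genomic interval, which is the kind of per-gene coordinate information this notebook copies from the CellRanger import.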

In [9]:
rowRanges(rna_forRowRanges)

GRanges object with 32711 ranges and 5 metadata columns:
                seqnames          ranges strand |    feature_type genome
                   <Rle>       <IRanges>  <Rle> |           <Rle>  <Rle>
  MIR1302-2HG       chr1     29553-30267      * | Gene Expression GRCh38
      FAM138A       chr1     36080-36081      * | Gene Expression GRCh38
        OR4F5       chr1     65418-69055      * | Gene Expression GRCh38
   AL627309.3       chr1     91104-91105      * | Gene Expression GRCh38
   AL627309.1       chr1   120931-133723      * | Gene Expression GRCh38
          ...        ...             ...    ... .             ...    ...
   AC141272.1 KI270728.1 1270983-1270984      * | Gene Expression GRCh38
   AC023491.2 KI270731.1     13000-13001      * | Gene Expression GRCh38
   AC007325.1 KI270734.1     72410-72411      * | Gene Expression GRCh38
   AC007325.4 KI270734.1   131493-131494      * | Gene Expression GRCh38
   AC007325.2 KI270734.1   161749-161852      * | Gene Expression G

In [10]:
# add rowRanges data
rowRanges(rna) <- rowRanges(rna_forRowRanges)
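
The assignment above is only meaningful when both objects list the same genes in the same order, which the earlier subsetting by shared_genes is meant to guarantee. A minimal base-R sketch of that alignment step, using two hypothetical toy matrices (a and b) in place of the real objects:

```r
# Two toy matrices with overlapping but differently ordered row names
# (hypothetical stand-ins for the RNA object and the CellRanger import)
a <- matrix(1:6, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
b <- matrix(1:8, nrow = 4,
            dimnames = list(c("g3", "g0", "g1", "g2"), c("c1", "c2")))

# Genes present in both; intersect() keeps the order of its first argument
shared <- intersect(rownames(a), rownames(b))

# Subsetting both by the same character vector puts rows in the same order
a_sub <- a[shared, , drop = FALSE]
b_sub <- b[shared, , drop = FALSE]

# Fail early if the rows are not aligned
stopifnot(identical(rownames(a_sub), rownames(b_sub)))
```

A similar check on the real objects, for example comparing rownames(rna) with rownames(rna_forRowRanges) just before the assignment, would catch a mismatch before the coordinates are attached to the wrong genes.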

### Modify cell names to match the ArchR project

In [11]:
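# add a barcode column by stripping the "<combinedID>_" prefix from each cell name
# (fixed = TRUE treats the prefix as a literal string rather than a regular expression)
colData(rna)$cellbarcode = strsplit(colnames(rna), split = paste0(colData(rna)$combinedID, "_"), fixed = TRUE) %>%
    lapply(function(x){x[2]}) %>%
    unlist()

# rename cells to the "<combinedID>#<barcode>" format so that they match the ArchR object
colnames(rna) = paste0(colData(rna)$combinedID, '#', colData(rna)$cellbarcode)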
colnames(rna)[1:10]
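
The renaming above can be sketched on toy data in base R. The sample IDs and barcodes below are hypothetical, and the sketch uses substring() instead of strsplit(), which is an alternative way to drop a known prefix rather than the approach used in this notebook.

```r
# Toy inputs (hypothetical sample IDs and barcodes)
combined_id <- c("sampleA_sampleB", "sampleA_sampleB", "sampleC_sampleD")
rna_names   <- paste0(combined_id, "_", c("AAAC-1", "TTTG-1", "AAAC-1"))

# Every name is expected to start with "<combinedID>_"
stopifnot(startsWith(rna_names, paste0(combined_id, "_")))

# Drop the prefix: the barcode starts after the ID plus one underscore
barcode <- substring(rna_names, nchar(combined_id) + 2L)

# Rebuild the names in the "<sample>#<barcode>" form
archr_names <- paste0(combined_id, "#", barcode)
archr_names
```

Because the prefix is removed by position, this variant does not depend on regular-expression matching, at the cost of assuming that every name really begins with its own combinedID.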

In [12]:
# remove the logcounts assay so that only the raw counts remain
assays(rna)['logcounts'] = NULL
rna

class: SingleCellExperiment 
dim: 32711 211060 
metadata(0):
assays(1): counts
rownames(32711): MIR1302-2HG FAM138A ... AC007325.4 AC007325.2
rowData names(5): feature_type genome id interval name
colnames(211060): HCAHeart9508627_HCAHeart9508819#TCAATCGCAGTAAAGC-1
  HCAHeart9508627_HCAHeart9508819#CTGCTCCCAACTGGCT-1 ...
  HCAHeartST13180619_HCAHeartST13177116#AGGCTAGCAGAAATGC-1
  HCAHeartST13180619_HCAHeartST13177116#GCAAGTCGTACGGGTT-1
colData names(17): sangerID combinedID ... ident cellbarcode
reducedDimNames(0):
altExpNames(0):

In [13]:
head(proj$cellNames)

In [14]:
length(intersect(proj$cellNames,colnames(rna)))
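
The single number above can be broken down further. A minimal base-R sketch, using hypothetical toy cell names, that reports the fraction of ATAC cells with a matching RNA cell and the number of shared cells per sample:

```r
# Toy cell names in "<sample>#<barcode>" form (hypothetical values)
atac_cells <- c("s1#AAA-1", "s1#CCC-1", "s2#GGG-1")
rna_cells  <- c("s1#AAA-1", "s2#GGG-1", "s2#TTT-1")

# Cells present in both modalities
shared <- intersect(atac_cells, rna_cells)

# Fraction of ATAC cells that also have RNA data
frac_atac_with_rna <- length(shared) / length(atac_cells)

# Shared cells per sample: the sample name is everything before the "#"
per_sample <- table(sub("#.*$", "", shared))

frac_atac_with_rna
per_sample
```

On the real objects, atac_cells would correspond to proj$cellNames and rna_cells to colnames(rna); a low fraction, or a sample missing from the table, would suggest a naming mismatch worth resolving before adding the expression matrix.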

# Add RNA data to ArchR object

In [16]:
proj <- addGeneExpressionMatrix(input = proj, seRNA = rna, force = TRUE)
proj

ArchR logging to : ArchRLogs/ArchR-addGeneExpressionMatrix-9b361f4760-Date-2023-01-07_Time-23-15-22.log
If there is an issue, please report to github with logFile!

Overlap w/ scATAC = 1

2023-01-07 23:15:26 : 

Overlap Per Sample w/ scATAC : HCAHeart9508627_HCAHeart9508819=2228,HCAHeart9508628_HCAHeart9508820=5416,HCAHeart9508629_HCAHeart9508821=5424,HCAHeart9845431_HCAHeart9917173=2906,HCAHeart9845432_HCAHeart9917174=905,HCAHeart9845433_HCAHeart9917175=2638,HCAHeart9845434_HCAHeart9917176=2638,HCAHeart9845435_HCAHeart9917177=4242,HCAHeart9845436_HCAHeart9917178=2682,HCAHeartST10773165_HCAHeartST10781062=3503,HCAHeartST10773166_HCAHeartST10781063=7367,HCAHeartST10773167_HCAHeartST10781064=3149,HCAHeartST10773168_HCAHeartST10781065=6253,HCAHeartST10773169_HCAHeartST10781446=1489,HCAHeartST10773170_HCAHeartST10781447=4357,HCAHeartST10773171_HCAHeartST10781448=25,HCAHeartST11064574_HCAHeartST11023239=592,HCAHeartST11064575_HCAHeartST11023240=3799,HCAHeartST11064576_HCAHeartST11023241=797

class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/8regions/ArchR/project_output 
samples(47): HCAHeart9508627_HCAHeart9508819
  HCAHeart9508628_HCAHeart9508820 ...
  HCAHeartST13180618_HCAHeartST13177115
  HCAHeartST13180619_HCAHeartST13177116
sampleColData names(1): ArrowFiles
cellColData names(52): Sample TSSEnrichment ... Gex_MitoRatio
  Gex_RiboRatio
numberOfCells(1): 139835
medianTSS(1): 8.699
medianFrags(1): 9459

# Save

In [17]:
getwd()

In [18]:
saveArchRProject(ArchRProj = proj, outputDirectory = "project_output", load = FALSE)

Saving ArchRProject...

