# Multiome Seurat v4

Running Seurat on the first batch of multiome data from zebrafish samples. Reads were aligned with `cellranger-arc count` against a reference in which Ensembl IDs were added to GTF lines with missing gene name attributes (see [cellranger-arc-mkref.md](cellranger-arc-mkref.md)).
Adapted from the Seurat weighted nearest neighbor analysis vignette: https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html

* [Loading packages](#packages)

## Loading packages <a class="anchor" id="packages"></a>

In [None]:
library(Seurat)
library(Signac)
library(dplyr)
library(ggplot2)
library(hdf5r)
library(BSgenome.Drerio.UCSC.danRer11)
library(presto)

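Start from the `filtered_feature_bc_matrix.h5` file output by `cellranger-arc count`. HDF5 is a file format that can hold multidimensional arrays of scientific data; here a single file holds both the gene expression counts and the ATAC peak counts, which are read in and then split into separate matrices.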

In [3]:
# the 10x hdf5 file contains both data types. 
inputdata.10x <- Read10X_h5("multiome-sample/outs/filtered_feature_bc_matrix.h5")

# extract RNA and ATAC data
rna_counts <- inputdata.10x$`Gene Expression`
atac_counts <- inputdata.10x$Peaks

"'giveCsparse' has been deprecated; setting 'repr = "T"' for you"
Genome matrix has multiple modalities, returning a list of matrices for this genome
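
As the message above notes, a multi-modality file comes back as a named list with one matrix per modality. The sketch below mimics that structure with a toy list (the barcodes, feature names, and counts are all made up) and checks that both matrices share the same cell barcodes in the same order, which is what lets the ATAC data be attached to the RNA-based object later on.

```r
library(Matrix)

# Made-up barcodes and counts standing in for the list returned by Read10X_h5
barcodes <- c("AAACAGCCAAGGAATC-1", "AAACAGCCAATCCCTT-1", "AAACAGCCAATGCGCT-1")
toy.10x <- list(
  `Gene Expression` = Matrix(
    c(3, 0, 1, 0, 2, 5), nrow = 2, ncol = 3, sparse = TRUE,
    dimnames = list(c("actb1", "mt-co1"), barcodes)
  ),
  Peaks = Matrix(
    c(0, 1, 1, 0, 4, 0), nrow = 2, ncol = 3, sparse = TRUE,
    dimnames = list(c("chr1:1000-1500", "chr2:200-700"), barcodes)
  )
)

# One entry per modality
names(toy.10x)

# Both matrices should have identical columns (cells)
same.cells <- identical(
  colnames(toy.10x$`Gene Expression`),
  colnames(toy.10x$Peaks)
)
same.cells

# Features x cells for each modality
sapply(toy.10x, dim)
```

The same `identical(colnames(...), colnames(...))` check can be run on `rna_counts` and `atac_counts` from the cell above.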



Create the Seurat object from the RNA data and calculate the percentage of mitochondrial reads. N.B. zebrafish mitochondrial genes use a lower-case prefix (`^mt-`) rather than the upper-case `^MT-` used for human.
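
The pattern is a case-sensitive regular expression matched against the feature names, so the human-style `^MT-` would match nothing in a zebrafish matrix. A minimal base-R sketch on a toy count matrix (gene names and counts are invented for illustration) shows which names `^mt-` picks up and how the percentage is derived from them:

```r
# Toy count matrix: rows are genes, columns are cells (values are made up)
counts <- matrix(
  c( 5,  0,
     5, 10,
     0,  0,
    90, 40,
     0,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("mt-co1", "mt-nd1", "MT-CO1", "actb1", "mtor"),
    c("cell1", "cell2")
  )
)

# Only names starting with lower-case "mt-" match; "MT-CO1" and "mtor" do not
mt.genes <- grep("^mt-", rownames(counts), value = TRUE)
mt.genes

# Percentage of each cell's counts that come from the matched genes
percent.mt <- colSums(counts[mt.genes, , drop = FALSE]) / colSums(counts) * 100
percent.mt
```

Running `grep("^mt-", rownames(fish), value = TRUE)` on the real object is a quick way to confirm that the expected mitochondrial genes are being matched.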

In [4]:
fish <- CreateSeuratObject(counts = rna_counts)
fish[["percent.mt"]] <- PercentageFeatureSet(fish, pattern = "^mt-")

"Feature names cannot have underscores ('_'), replacing with dashes ('-')"


In [5]:
# Now add in the ATAC-seq data
# we'll only use peaks in standard chromosomes
grange.counts <- StringToGRanges(rownames(atac_counts), sep = c(":", "-"))
grange.use <- seqnames(grange.counts) %in% standardChromosomes(grange.counts)
atac_counts <- atac_counts[as.vector(grange.use), ]
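
Before filtering, it can be useful to see which sequences the peaks fall on. The sketch below uses only base R and hypothetical peak names in the `chrom:start-end` form. The regular expression is a rough stand-in for a "standard chromosome" filter (chr1 to chr25 plus chrM, assuming chr-prefixed names); it is not how `standardChromosomes` decides internally.

```r
# Hypothetical peak names; real ones come from rownames(atac_counts)
peaks <- c(
  "chr1:1000-1500",
  "chr1:9000-9400",
  "chr25:200-700",
  "chrUn_KN149696v1:10-410"
)

# Strip everything from the first ":" onwards to leave the sequence name
chrom <- sub(":.*$", "", peaks)

# Number of peaks per sequence
table(chrom)

# Rough approximation of a standard-chromosome filter
standard <- grepl("^chr([0-9]+|M)$", chrom)
peaks[standard]
```

Comparing `nrow(atac_counts)` before and after the filtering step above gives the number of peaks that were dropped.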


Add gene annotations to the chromatin assay. The Seurat tutorial uses `EnsDb.Hsapiens.v86`, which is a human annotation package and so cannot be used for zebrafish. Instead, the annotations can be extracted from the GTF file.

In [7]:
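# already attached in the package-loading cell; repeated so this cell can run on its own
library(BSgenome.Drerio.UCSC.danRer11)

# import the GTF and keep gene-level records only
gtf <- rtracklayer::import('Danio_rerio.GRCz11.103.gtf.replaced.filtered.chr', format = "gtf")
gene.coords <- gtf[gtf$type == 'gene']
# use UCSC-style sequence names and drop non-standard contigs
seqlevelsStyle(gene.coords) <- 'UCSC'
gene.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')

# build the chromatin assay from the filtered peak counts and the fragment file
frag.file <- "multiome-sample/outs/atac_fragments.tsv.gz"
chrom_assay <- CreateChromatinAssay(
  counts = atac_counts,
  sep = c(":", "-"),
  genome = 'danRer11',
  fragments = frag.file,
  min.cells = 10,
  annotation = gene.coords
)
fish[["ATAC"]] <- chrom_assay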
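
Signac expects the fragment file to have a tabix index (`.tbi`) next to it, and `cellranger-arc count` writes one alongside `atac_fragments.tsv.gz`. If the files have been moved or copied, a quick check like the sketch below can save a failed run. `check_fragments` is a hypothetical helper written for this note, not a Signac function, and the demonstration uses empty temporary files rather than real output.

```r
# Hypothetical helper: report whether the fragment file and its tabix index exist
check_fragments <- function(path) {
  index <- paste0(path, ".tbi")
  c(fragments = file.exists(path), index = file.exists(index))
}

# Demonstration on empty stand-in files in a temporary directory
tmp.dir <- tempfile("frag-demo-")
dir.create(tmp.dir)
demo.frag <- file.path(tmp.dir, "atac_fragments.tsv.gz")

file.create(demo.frag)
before <- check_fragments(demo.frag)  # index has not been created yet
before

file.create(paste0(demo.frag, ".tbi"))
after <- check_fragments(demo.frag)   # both files are now present
after
```

On the real data the call would be `check_fragments(frag.file)`.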

From here, the rest of the analysis follows the tutorial: https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis.html