License
Unknown, MIT licenses found
Licenses found
Unknown
LICENSE
MIT
LICENSE.md
PoolLab/ReferenceEnhancer
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# ReferenceEnhancer The goal of ReferenceEnhancer is to generate a scRNA-seq optimized transcriptomic reference. Generating a scRNA-seq optimized transcriptomic reference requires optimizing the genome annotation ("xxx.gtf") file that transcriptomic references are based on. The following three aspects of genome annotations need to be optimized: A) Resolving gene overlap derived read loss; B) Recovering intergenic reads from 3' un-annotated exons; and C) Recovering intronic reads. After optimizing and assembling the genome annotation, you can use "cellranger mkref" pipeline to assemble the optimized transcriptomic reference for mapping sequencing read data and compiling gene-cell matrices with the "cellranger count" (or other) pipeline. ## Installation You can install the development version of ReferenceEnhancer as follows: ``` r install.packages("devtools") require(devtools) install_github("PoolLab/ReferenceEnhancer") ``` ## Example # This is a sample workflow of the package: This is the basic workflow for optimizing a genome annotation for single-cell RNA-seq work using ReferenceEnhancer: 1. Load ReferenceEnhancer and import ENSEMBL/10x Genomics default genome annotation file (GTF). This file can be downloaded from 10x Genomics provided reference transcriptome "gene" folder at "<https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest>" or Ensembl.org if wish to customize more. For testing, we have provided a sample file. library(ReferenceEnhancer) genome_annotation <- LoadGtf(unoptimized_annotation_path = "test_genes.gtf") 2. Identify all overlapping genes based on the ENSEMBL/10x Genomics default genome annotation file (GTF), rank-order them according to the number of gene overlaps. Prioritize this gene list for manual curation focusing on exonically overlapping genes. The function saves the list of overlapping genes in working directory as overlapping_gene_list.csv. gene_overlaps <- IdentifyOverlappers(genome_annotation = genome_annotation) 3. Generate recommended actions for overlapping genes based on original genome annotation .gtf file and a list of overlapping genes. The function updates overlapping_gene_list.csv file with added recommendations. OverlapResolutions(genome_annotation = genome_annotation, overlap_data = gene_overlaps, gene_pattern = c("Rik$", "^Gm")) 4. Extract intergenic reads from Cell Ranger aligned bam file. The function saves extracted intergenic reads in working directory as intergenic_reads.bed. IsolateIntergenicReads(bam_file_name = "test_bam.bam", index_file_name = "test_index.bam.bai", barcode_length = 26) 5. Generate gene boundaries in order to assign intergenic reads to a specific gene. The function save resulting in working directory as gene_ranges.bed. Note: This step runs partially in bash/linux terminal. Before this step, make sure that bedops (<https://bedops.readthedocs.io/en/latest/>) has been installed to your computer. GenerateGeneLocationBed(genome_annotation = genome_annotation, bedops_loc = NULL) 6. Identify candidate genes for extension with excess 3' intergenic reads and create a rank ordered list of genes as a function of 3' intergenic read mapping within 10kb of known gene end. A rank ordered list of gene extension candidates is saved in working directory as gene_extension_candidates.csv. Note: This step runs partially in bash/linux terminal. Before this step, make sure that bedtools (<https://bedtools.readthedocs.io/en/latest/content/installation.html>) has been installed to your computer and that it has been added to to the path in your R environment. GenerateExtensionCandidates(bedtools_loc = NULL) 7. Create the final optimized annotation file. The function saves the result in working directory as optimized_reference.gtf. OptimizedAnnotationAssembler(unoptimized_annotation_path = "test_genes.gtf", gene_overlaps = "test_overlapping_gene_list.csv", gene_extension = "gene_extension_candidates.csv", gene_replacement = "test_gene_replacement.csv")
About
No description, website, or topics provided.
Resources
License
Unknown, MIT licenses found
Licenses found
Unknown
LICENSE
MIT
LICENSE.md
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published