Merge pull request #8 from Arcadia-Science/EAM/updates
Final updates for pub
elizabethmcd committed Dec 19, 2023
2 parents e200771 + 850de80 commit a7eae11
Showing 265 changed files with 7,199,625 additions and 11 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,10 +1,10 @@
sra_info/
*.pid
transcriptome-workflow/work/
transcriptome-workflow/test/
.nextflow/
*.log*
metadata/2023*
results/
.DS_Store
.RData
.Rhistory
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Arcadia Science

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
7 changes: 6 additions & 1 deletion README.md
@@ -1,3 +1,8 @@
# Repeat Expansions Validations

-This repository validates examples found in the `2023-repeats-profiling` pilot project by mapping RNASeq experiments to species that repeat expansion homologs were identified in.
+This repository validates examples found in the `2023-repeats-profiling` pilot project by mapping RNASeq experiments to species that repeat expansion homologs were identified in. The code in this repository is associated with the pub [Repeat expansions associated with human disease are present in diverse organisms](https://doi.org/10.57844/arcadia-e367-8b55).
+
+Instructions for running the workflow are in `transcriptome-workflow/`, and other scripts and metadata are for organizing the metadata for the RNAseq accessions.
+
+## Contributing
+If you use the code or ideas in this repository for your own work, please cite [DOI:10.57844/arcadia-e367-8b55](https://doi.org/10.57844/arcadia-e367-8b55). If you would like to contribute to this repository, please read our [guide on credit for contributions](https://github.com/Arcadia-Science/arcadia-software-handbook/blob/main/guides-and-standards/guide-credit-for-contributions.md).
Binary file added figs/all-species-tissue-expression-plots.jpg
Binary file added figs/all-species-tissue-expression-plots.pdf
Binary file not shown.
Binary file modified figs/all-species-tissue-expression-plots.png
43 changes: 43 additions & 0 deletions results/species-accession-counts.csv
@@ -0,0 +1,43 @@
species_clean,n
house_mouse,1144
Bengalese_finch,714
golden_hamster,521
pig,334
dog,266
Nile_tilapia,184
domestic_cat,139
domestic_guinea_pig,113
naked_mole-rat,105
sea_lamprey,99
gray_short-tailed_opossum,93
horse,75
emerald_rockcod,52
platypus,28
brown_bear,18
Northern_elephant_seal,17
monito_del_monte,17
Mongolian_gerbil,15
European_shrew,14
domestic_ferret,14
little_skate,14
Arctic_ground_squirrel,12
common_brushtail,9
fat_sand_rat,8
Simochromis_diagramma,6
great_roundleaf_bat,6
Upper_Galilee_mountains_blind_mole_rat,4
Australian_echidna,3
Malayan_pangolin,3
koala,3
long-finned_pilot_whale,3
Common_wall_lizard,2
Egyptian_rousette,2
lesser_Egyptian_jerboa,2
southern_two-toed_sloth,2
California_sea_lion,1
Myotis_myotis,1
Pacific_pocket_mouse,1
Tasmanian_devil,1
lion,1
ringed_pipefish,1
sooty_mangabey,1
2,049 changes: 2,049 additions & 0 deletions results/species-expression-counts-stats.csv

Large diffs are not rendered by default.

40 changes: 31 additions & 9 deletions scripts/parsing-counts-tables.R
@@ -1,6 +1,8 @@
library(tidyverse)
library(viridisLite)
library(ggpubr)
+library(stringr)
+library(stringi)

###########################################
# Parse counts results and inspect homologs of interest
@@ -11,7 +13,7 @@ library(ggpubr)
###########################################

# paths
-counts_dir <- "transcriptome-workflow/results/v2/counts/"
+counts_dir <- "transcriptome-workflow/results/counts/"
files <- dir(counts_dir, pattern = "*.htseq")

# Custom function to read a file
@@ -94,7 +96,7 @@ protein_accessions <- read.csv("metadata/2023-08-02-repeat-expansion-profiles.cs
select(-Accession, -Common.Name)

# path for parsed gtf tables
-gtf_dir <- "transcriptome-workflow/results/v2/gtf_tables/"
+gtf_dir <- "transcriptome-workflow/results/gtf_tables/"
gtf_files <- dir(gtf_dir, pattern = "*.tsv")

# Custom function to read tsvs
@@ -139,20 +141,38 @@ sample_counts <- counts_table_info %>%
###########################################
# Plotting
###########################################

+# Arcadia color scheme
+black <- "#09090A"
+grape <- "#5A4596"
+taffy <- "#E87485"
+tangerine <- "#FFB984"
+oat <- "#F5E4BE"
+
+magma_colors <- c(black, grape, taffy, tangerine, oat)
+
+magma_gradient <- colorRampPalette(magma_colors)
+
+gradient_100 <- magma_gradient(100)

species_percent_samples_expression <- homolog_count_table_stats %>%
select(species_name, gene, binary_count, tissue) %>%
left_join(sample_counts, by = c('species_name', 'tissue')) %>%
-group_by(species_name, gene, tissue) %>%
+mutate(species_upper = stri_trans_totitle(species_name, opts_brkiter = stri_opts_brkiter(type="sentence"))) %>%
+mutate(tissue_upper = stri_trans_totitle(tissue, opts_brkiter = stri_opts_brkiter(type="sentence"))) %>%
+group_by(species_upper, gene, tissue_upper) %>%
mutate(n_expressed_samples = sum(binary_count)) %>%
mutate(percent_expressed = n_expressed_samples / n_total_samples) %>%
-ggplot(aes(x=tissue, y=gene, fill=percent_expressed)) +
-geom_tile(color="black") +
-facet_wrap(~species_name, scales="free", nrow=2) +
-theme_classic() +
-theme(axis.text.x = element_text(angle = 80, hjust=1), legend.position = "bottom") +
+ggplot(aes(x=tissue_upper, y=gene, fill=percent_expressed)) +
+geom_tile(color="white", linewidth=0.5) +
+facet_wrap(~species_upper, scales="free", nrow=2) +
+theme_minimal() +
+theme(axis.text.x = element_text(angle = 80, hjust=1), legend.position = "bottom", axis.title.x=element_blank(), axis.title.y=element_blank()) +
scale_x_discrete(expand=c(0,0)) +
scale_y_discrete(expand=c(0,0)) +
-scale_fill_viridis_c()
+scale_fill_gradientn(colors=magma_gradient(100), name = "Percent Expressed")

+species_percent_samples_expression

# write table to csv
species_expression_table <- species_percent_samples_expression <- homolog_count_table_stats %>%
@@ -165,3 +185,5 @@ write.csv(species_expression_table, "results/species-expression-counts-stats.csv

# save plot
ggsave("figs/all-species-tissue-expression-plots.png", species_percent_samples_expression, width=11, height=8, units=c("in"))
+ggsave("figs/all-species-tissue-expression-plots.jpg", species_percent_samples_expression, width=11, height=8, units=c("in"))
+ggsave("figs/all-species-tissue-expression-plots.pdf", species_percent_samples_expression, width=11, height=8, units=c("in"))
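
The plotting change in `scripts/parsing-counts-tables.R` replaces the viridis fill with a gradient interpolated between five fixed hex colors. A minimal base R sketch of that palette step, runnable without the rest of the script, might look like the following. The hex values are copied from the diff; `n_steps` and `pct_to_color` are illustrative names that do not appear in the repository.

```r
# Hex values copied from the "Arcadia color scheme" block in the diff.
black <- "#09090A"
grape <- "#5A4596"
taffy <- "#E87485"
tangerine <- "#FFB984"
oat <- "#F5E4BE"

magma_colors <- c(black, grape, taffy, tangerine, oat)

# colorRampPalette() comes from grDevices, which is attached by default.
# It returns a function that interpolates n colors through the anchors.
magma_gradient <- colorRampPalette(magma_colors)

# n_steps is an illustrative name; the script asks for 100 steps.
n_steps <- 100
gradient_100 <- magma_gradient(n_steps)

# Hypothetical helper (not in the repository): map a fraction in [0, 1]
# to the nearest color in the gradient, dark for low and light for high.
pct_to_color <- function(p, palette = gradient_100) {
  stopifnot(all(p >= 0 & p <= 1))
  idx <- round(p * (length(palette) - 1)) + 1
  palette[idx]
}

print(length(gradient_100))
print(pct_to_color(c(0, 0.5, 1)))
```

In the script itself this mapping is handled by `scale_fill_gradientn(colors = magma_gradient(100))`, so the helper here only serves to inspect which color a given `percent_expressed` value would roughly correspond to.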
