# Overview of peptigate results on TSA tick salivary gland transcriptomes

This notebook gives a brief overview of the peptides that the peptigate pipeline predicted in tick salivary gland transcriptomes.
It combines the per-transcriptome output files and slices them in several ways to count the number and types of predicted peptides.

## Notebook setup

In [1]:
library(tidyverse)

── [1mAttaching core tidyverse packages[22m ────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
[32m✔[39m [34mdplyr    [39m 1.1.4     [32m✔[39m [34mreadr    [39m 2.1.5
[32m✔[39m [34mforcats  [39m 1.0.0     [32m✔[39m [34mstringr  [39m 1.5.1
[32m✔[39m [34mggplot2  [39m 3.5.0     [32m✔[39m [34mtibble   [39m 3.2.1
[32m✔[39m [34mlubridate[39m 1.9.3     [32m✔[39m [34mtidyr    [39m 1.3.1
[32m✔[39m [34mpurrr    [39m 1.0.2     
── [1mConflicts[22m ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
[36mℹ[39m Use the conflicted package ([3m[34m<http://conflicted.r-lib.org/>[39m[23m) to force all conflicts to become erro

In [2]:
# move up one directory so the relative outputs/ paths used below resolve
setwd("..")

## Read in and tally peptigate results

In [5]:
peptigate_predictions <- Sys.glob("outputs/tsa_tick_sg_transcriptomes/*/predictions/peptide_predictions.tsv") %>%
  map_dfr(read_tsv, show_col_types = FALSE)
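Binding the globbed files this way stacks every transcriptome's predictions but does not record which file each row came from. The minimal sketch below writes two small, hypothetical prediction files (`tsa_A`, `tsa_B`) to a temporary directory laid out like the glob pattern, then uses readr's `id` argument (readr 2.x) to read them in a single call while keeping the source path. The `source_file` and `transcriptome` column names are illustrative assumptions, not peptigate output columns.

```r
library(readr)
library(dplyr)
library(tibble)

# Hypothetical layout mirroring outputs/<transcriptome>/predictions/peptide_predictions.tsv
demo_root <- tempfile("peptigate_demo_")
for (tsa in c("tsa_A", "tsa_B")) {
  dir.create(file.path(demo_root, tsa, "predictions"), recursive = TRUE)
}
write_tsv(tibble(peptide_id = c("a1", "a2"), protein_sequence = c("MKLVAA", "GRFDEK")),
          file.path(demo_root, "tsa_A", "predictions", "peptide_predictions.tsv"))
write_tsv(tibble(peptide_id = "b1", protein_sequence = "MKLVAA"),
          file.path(demo_root, "tsa_B", "predictions", "peptide_predictions.tsv"))

files <- Sys.glob(file.path(demo_root, "*", "predictions", "peptide_predictions.tsv"))

# readr >= 2.0 reads a vector of paths in one call; `id` stores each row's file path
demo_predictions <- read_tsv(files, id = "source_file", show_col_types = FALSE) %>%
  # the transcriptome name is the directory two levels above each file
  mutate(transcriptome = basename(dirname(dirname(source_file))))

count(demo_predictions, transcriptome)
```

Keeping a per-row source column would let later tallies be split by transcriptome, at the cost of one extra column to drop before `distinct()`.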

In [22]:
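# drop per-prediction columns (IDs, coordinates, scores, tool) so that repeated
# predictions of the same peptide collapse to a single row
peptigate_predictions_distinct <- peptigate_predictions %>%
  select(-peptide_id, -nlpprecursor_class_score, -nlpprecursor_cleavage_score,
         -nucleotide_sequence, -start, -end, -prediction_tool, -peptide_class) %>%
  distinct()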
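The `select()`/`distinct()` step collapses rows only when every remaining column is identical. The toy sketch below uses hypothetical rows: one sORF peptide found in two transcriptomes and one cleavage peptide reported by two tools. It shows that dropping the row-specific columns is what lets `distinct()` merge them; the sequences, IDs, and coordinates are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical predictions: MKLVAA seen in two transcriptomes,
# GRFDEK reported by two different tools
toy_predictions <- tibble(
  peptide_id       = c("t1_p1", "t2_p7", "t1_p2", "t1_p3"),
  protein_sequence = c("MKLVAA", "MKLVAA", "GRFDEK", "GRFDEK"),
  peptide_type     = c("sORF", "sORF", "cleavage", "cleavage"),
  prediction_tool  = c("plmutils", "plmutils", "deeppeptide", "nlpprecursor"),
  start            = c(1, 4, 20, 20)
)

# With the row-specific columns removed, the four rows reduce to one per peptide
toy_distinct <- toy_predictions %>%
  select(-peptide_id, -prediction_tool, -start) %>%
  distinct()

toy_distinct
```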

In [25]:
peptigate_predictions_distinct %>%
  group_by(peptide_type) %>%
  tally()

| peptide_type | n |
|:--|--:|
| cleavage | 17468 |
| sORF | 206965 |


In [23]:
nrow(peptigate_predictions_distinct)

In [9]:
# number of unique peptide sequences; smaller than nrow() if a sequence still spans several rows
n_distinct(peptigate_predictions_distinct$protein_sequence)
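The two preceding cells measure different things: `nrow()` counts rows left after `distinct()`, while the unique-sequence count counts peptide sequences. They diverge whenever one sequence keeps more than one value in a remaining column, for example if the same sequence were labelled both `cleavage` and `sORF`. The hypothetical table below illustrates this and lists the sequences responsible.

```r
library(dplyr)
library(tibble)

# Hypothetical distinct table in which GRFDEK carries two peptide_type labels
toy_distinct <- tibble(
  protein_sequence = c("MKLVAA", "GRFDEK", "GRFDEK"),
  peptide_type     = c("sORF", "cleavage", "sORF")
)

n_rows <- nrow(toy_distinct)                         # rows after distinct()
n_seqs <- n_distinct(toy_distinct$protein_sequence)  # unique peptide sequences

# Sequences that still occupy more than one row
multi_row_seqs <- toy_distinct %>%
  count(protein_sequence) %>%
  filter(n > 1)
```

On the real data, an empty `multi_row_seqs` would mean the two counts agree.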

In [41]:
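peptigate_annotations <- Sys.glob("outputs/tsa_tick_sg_transcriptomes/*/predictions/peptide_annotations.tsv") %>%
  map_dfr(read_tsv, show_col_types = FALSE)
# one annotation row per sequence, so the join on sequence below does not duplicate predictions
peptigate_annotations_distinct <- distinct(peptigate_annotations, sequence, .keep_all = TRUE)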

In [48]:
peptigate_all_distinct <- left_join(peptigate_predictions_distinct, peptigate_annotations_distinct,
                                    by = c("protein_sequence" = "sequence"))
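A left join returns one output row per matching right-hand row, so it only preserves the prediction counts if the annotation table has at most one row per `sequence`. The sketch below uses hypothetical toy tables to show the duplication and one way to guard against it: deduplicate on the key and pass `relationship = "many-to-one"` (available in dplyr 1.1.0 and later), which makes dplyr stop with an error instead of silently adding rows.

```r
library(dplyr)
library(tibble)

toy_predictions <- tibble(
  protein_sequence = c("MKLVAA", "GRFDEK"),
  peptide_type     = c("sORF", "cleavage")
)

# Hypothetical annotations: MKLVAA was annotated twice (e.g. once per transcriptome)
toy_annotations <- tibble(
  sequence                = c("MKLVAA", "MKLVAA", "GRFDEK"),
  peptipedia_blast_result = c("blast hit", "blast hit", "no blast hit")
)

# A plain left join repeats MKLVAA once per matching annotation row
dup_join <- left_join(toy_predictions, toy_annotations,
                      by = c("protein_sequence" = "sequence"))

# The relationship check rejects the duplicated annotations outright
join_check <- tryCatch(
  left_join(toy_predictions, toy_annotations,
            by = c("protein_sequence" = "sequence"), relationship = "many-to-one"),
  error = function(e) "relationship check failed"
)

# Deduplicating on the key first satisfies the check
toy_annotations_distinct <- distinct(toy_annotations, sequence, .keep_all = TRUE)
safe_join <- left_join(toy_predictions, toy_annotations_distinct,
                       by = c("protein_sequence" = "sequence"),
                       relationship = "many-to-one")
```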

In [53]:
peptigate_all_distinct %>%
  group_by(peptide_type, peptipedia_blast_result) %>%
  tally()

| peptide_type | peptipedia_blast_result | n |
|:--|:--|--:|
| cleavage | blast hit | 245 |
| cleavage | no blast hit | 17223 |
| sORF | blast hit | 1697 |
| sORF | no blast hit | 205268 |
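The counts in this table convert directly into per-type hit rates. The sketch below copies the four counts from the table (nothing else is assumed) and computes the share of each peptide type with a Peptipedia BLAST hit. It also recomputes the per-type totals, which match the earlier `peptide_type` tally (17468 and 206965), a quick check that the annotation join did not add rows.

```r
library(dplyr)
library(tibble)

# Counts copied from the tally above
blast_tally <- tribble(
  ~peptide_type, ~peptipedia_blast_result, ~n,
  "cleavage",    "blast hit",              245,
  "cleavage",    "no blast hit",           17223,
  "sORF",        "blast hit",              1697,
  "sORF",        "no blast hit",           205268
)

# Share of peptides of each type with a Peptipedia BLAST hit
blast_rates <- blast_tally %>%
  group_by(peptide_type) %>%
  summarise(
    total    = sum(n),
    hits     = sum(n[peptipedia_blast_result == "blast hit"]),
    hit_rate = hits / total  # cleavage about 1.40%, sORF about 0.82%
  )
```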


In [55]:
colnames(peptigate_all_distinct)

In [56]:
# if a peptide sequence was predicted by more than one tool, keep only its first row
# (in the order the rows were read, not at random) and tally how many peptides each tool predicted
peptigate_predictions %>%
  group_by(protein_sequence) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  group_by(prediction_tool) %>%
  tally()

| prediction_tool | n |
|:--|--:|
| deeppeptide | 14324 |
| nlpprecursor | 3144 |
| plmutils | 206965 |
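`slice_head(n = 1)` keeps the first row of each group in its current row order, so when several tools predict the same sequence, the tool that gets credit depends on the order in which files and rows were read, not on chance. The toy sketch below uses hypothetical rows to show this: `distinct(protein_sequence, .keep_all = TRUE)` gives the same first-occurrence result, and reordering the input (here with `arrange()`) shifts the credit to a different tool.

```r
library(dplyr)
library(tibble)

# Hypothetical rows: GRFDEK was predicted by two tools
toy_predictions <- tibble(
  protein_sequence = c("GRFDEK", "MKLVAA", "GRFDEK"),
  prediction_tool  = c("nlpprecursor", "plmutils", "deeppeptide")
)

# First row per sequence, in current row order
by_slice <- toy_predictions %>%
  group_by(protein_sequence) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  count(prediction_tool)

# distinct() with .keep_all = TRUE also keeps the first occurrence of each sequence
by_distinct <- toy_predictions %>%
  distinct(protein_sequence, .keep_all = TRUE) %>%
  count(prediction_tool)

# Reordering the input changes which tool is credited with GRFDEK
by_reordered <- toy_predictions %>%
  arrange(prediction_tool) %>%
  distinct(protein_sequence, .keep_all = TRUE) %>%
  count(prediction_tool)
```

If a specific tool should take precedence, an explicit `arrange()` on a priority column before deduplicating makes that choice visible rather than leaving it to file order.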


In [57]:
sessionInfo()

R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS/LAPACK: /Users/taylorreiter/miniconda3/envs/tidyjupyter/lib/libopenblasp-r0.3.26.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
 [5] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
 [9] ggplot2_3.5.0   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] bit_4.0.5        gtable_0.3.4     jsonlite_1.8.8   compiler_4.3.3  
 [5] crayon_1.5.2     tidyselect_1.2.0 IRdisplay_1.1    parallel_4.3.3  
 [9] scales_1.3.0     uuid_1.2-0       fastmap_1.1.1    IRkernel_1.3.2  
[13] R6_2.5.1         generics_0.1.3   munsell_0.5.1  