In [1]:
library(readxl)
library(tidyverse)


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Expression_data.xlsx
It contains the quantification of individual proteins (columns) in the various samples (rows). Protein identifiers are from UniProt. The last number in the sample name is the replicate number within the sample group (e.g. S0R24h_4 is the fourth replicate of the S0R24h group).

In [3]:
expression_data <- read_xlsx("data/expression_data.xlsx")
expression_data

[1m[22mNew names:
[36m•[39m `` -> `...1`


...1,A0A067XG53,A0A075B5L5,A0A075B5P4,A0A075B5P5,A0A075B5P6,A0A087WNV1,A0A087WP47,A0A087WPF0,A0A087WPP8,⋯,Z4YJE9,Z4YJT3,Z4YJZ7,Z4YKC4,Z4YKM2,Z4YKT6,Z4YL78,Z4YLT8,Z4YN00,Z4YNA3
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
S0R24h_1,43488.0,0.0,0.0,20276.0,108610.0,458490,209890.0,0.0,31749,⋯,177570.0,2319800,94586.0,97746.0,59840,100420.0,1116400,37164.0,211870,0.0
S0R24h_2,55503.0,0.0,0.0,23570.0,134020.0,405240,160100.0,5383.2,43855,⋯,184240.0,2112400,133110.0,111870.0,70452,28561.0,1027600,3060.2,206690,0.0
S0R24h_3,52803.0,0.0,0.0,21050.0,134920.0,462110,106090.0,0.0,41016,⋯,140150.0,554320,50077.0,60430.0,12682,7099.9,626490,2284.8,251290,0.0
S0R24h_4,42735.0,0.0,7592.3,12950.0,106180.0,323430,222370.0,4422.8,0,⋯,155370.0,2397100,43383.0,95524.0,64375,48862.0,581480,20364.0,227450,508970.0
S0R24h_5,47269.0,0.0,0.0,22438.0,112100.0,393260,140350.0,0.0,39556,⋯,157810.0,2231300,95853.0,78070.0,58544,25261.0,1017200,13811.0,193340,0.0
S0R24h_6,41319.0,0.0,0.0,27004.0,182070.0,283180,197910.0,7488.7,43734,⋯,175480.0,1965100,49407.0,77490.0,36523,92820.0,908840,18727.0,174270,0.0
S0R48h_1,13286.0,0.0,0.0,26524.0,48645.0,353740,73440.0,0.0,20042,⋯,119980.0,1611700,20867.0,24792.0,31985,53277.0,561020,77070.0,144590,0.0
S0R48h_2,22146.0,0.0,0.0,19478.0,61485.0,221960,115830.0,1817.6,34113,⋯,140460.0,1807500,80323.0,37901.0,48602,99953.0,847190,78374.0,136730,640120.0
S0R48h_3,16531.0,0.0,0.0,27020.0,55285.0,272090,92319.0,0.0,0,⋯,99222.0,1432500,0.0,25349.0,21962,56371.0,456720,0.0,121670,62124.0
S0R48h_4,29387.0,0.0,0.0,31893.0,52574.0,319120,88818.0,0.0,21468,⋯,107950.0,1397100,21077.0,43121.0,14006,39204.0,384380,9009.4,146960,230830.0
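
The naming convention described above (group name, underscore, replicate number) can be unpacked into separate columns. The following is a minimal sketch on a toy tibble rather than the real file; it assumes every sample name contains exactly one underscore, and the names `samples` and `sample_info` are illustrative, not part of the original notebook.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the sample-name column (`...1`) of expression_data;
# the names are copied from the preview above.
samples <- tibble(sample = c("S0R24h_1", "S0R24h_4", "S0R48h_2"))

# Split at the underscore: the left part is the sample group, the right part
# the replicate number. `remove = FALSE` keeps the original name and
# `convert = TRUE` turns the replicate into an integer.
sample_info <- separate(
  samples, sample,
  into = c("group", "replicate"),
  sep = "_", remove = FALSE, convert = TRUE
)

sample_info
```

The same call should work on `expression_data` with `...1` in place of `sample`, provided the one-underscore assumption holds for all rows.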


## Comparisons.txt
Lists the sample groups that were compared in the statistical tests. It is mainly for context: as discussed, we would like an approach that considers all sample groups during training, rather than one based on pairs of sample groups as in the current method (statistical testing for differential expression, and pathway enrichment).

## MMU_Uniprot2Reactome.txt

Tab-separated. In order, the columns contain the UniProt ID of the protein, the ID of the Reactome pathway it belongs to, a link to that pathway, and additional information about the pathway/reaction: a text description, a code, and the organism (only mouse in this case).

In [9]:
uniprot_to_reactome <- read_tsv("data/MMU_Uniprot2Reactome.txt")

# The file's header row names the columns V1..V6; give them descriptive names.
uniprot_to_reactome <- uniprot_to_reactome %>%
  rename(
    uniprot_id   = V1,
    reactome_id  = V2,
    reactome_url = V3,
    description  = V4,
    code         = V5,
    organism     = V6
  )

uniprot_to_reactome

Rows: 79497 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (6): V1, V2, V3, V4, V5, V6

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


uniprot_id,reactome_id,reactome_url,description,code,organism
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
A0A075B5J3,R-MMU-198955,https://reactome.org/PathwayBrowser/#/R-MMU-198955,TCR complex interacts with peptide antigen-presenting MHC Class I,IEA,Mus musculus
A0A075B5J3,R-MMU-202165,https://reactome.org/PathwayBrowser/#/R-MMU-202165,Phosphorylation of ITAM motifs in CD3 complexes,IEA,Mus musculus
A0A075B5J3,R-MMU-202168,https://reactome.org/PathwayBrowser/#/R-MMU-202168,Phosphorylation of ZAP-70 by Lck,IEA,Mus musculus
A0A075B5J3,R-MMU-202174,https://reactome.org/PathwayBrowser/#/R-MMU-202174,Activation of ZAP-70,IEA,Mus musculus
A0A075B5J3,R-MMU-202214,https://reactome.org/PathwayBrowser/#/R-MMU-202214,Dephosphorylation of Lck-pY505 by CD45,IEA,Mus musculus
A0A075B5J3,R-MMU-202216,https://reactome.org/PathwayBrowser/#/R-MMU-202216,Phosphorylation of SLP-76,IEA,Mus musculus
A0A075B5J3,R-MMU-202233,https://reactome.org/PathwayBrowser/#/R-MMU-202233,Inactivation of Lck by Csk,IEA,Mus musculus
A0A075B5J3,R-MMU-202245,https://reactome.org/PathwayBrowser/#/R-MMU-202245,Phosphorylation of TBSMs in LAT,IEA,Mus musculus
A0A075B5J3,R-MMU-202248,https://reactome.org/PathwayBrowser/#/R-MMU-202248,Phosphorylation of PLC-gamma1,IEA,Mus musculus
A0A075B5J3,R-MMU-202291,https://reactome.org/PathwayBrowser/#/R-MMU-202291,Activation of Lck,IEA,Mus musculus
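
Because this table is keyed by UniProt ID, it can be joined to the protein columns of the expression data to see which measured proteins have a Reactome entry. The sketch below uses toy tibbles only: the two mapping rows are copied from the preview above, while the list of measured proteins is illustrative and the names `measured`, `mapping_toy`, `annotated`, and `coverage` are hypothetical helpers, not part of the original notebook.

```r
library(tibble)
library(dplyr)

# Illustrative protein IDs. In the notebook these would be the column names of
# expression_data, e.g. setdiff(names(expression_data), "...1").
measured <- tibble(uniprot_id = c("A0A067XG53", "A0A075B5J3"))

# Toy mapping: two rows copied from the preview of uniprot_to_reactome above.
mapping_toy <- tibble(
  uniprot_id  = c("A0A075B5J3", "A0A075B5J3"),
  reactome_id = c("R-MMU-198955", "R-MMU-202165")
)

# left_join keeps every measured protein; proteins without a mapping get NA.
annotated <- left_join(measured, mapping_toy, by = "uniprot_id")

# Number of pathways/reactions per protein (0 means no entry in the toy mapping).
coverage <- annotated %>%
  group_by(uniprot_id) %>%
  summarise(n_pathways = sum(!is.na(reactome_id)))

coverage
```

Joining on the distinct protein IDs first, rather than on the full sample-by-protein table, keeps the join one-to-many and the result small.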


## MMU_ReactomePathwaysRelation.txt
Tab-separated. Each row links a pair of mouse pathways/reactions from Reactome (from -> to).

In [13]:
reactome_pathways <- read_tsv("data/MMU_ReactomePathwaysRelation.txt", col_names = FALSE)

# Both columns hold Reactome IDs (R-MMU-...): X1 is the "from" entry, X2 the "to" entry.
reactome_pathways <- reactome_pathways %>%
  rename(uniprot_pathway_id = X1, reactome_reaction_id = X2)

reactome_pathways


Rows: 1740 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): X1, X2

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


uniprot_pathway_id,reactome_reaction_id
<chr>,<chr>
R-MMU-109581,R-MMU-109606
R-MMU-109581,R-MMU-169911
R-MMU-109581,R-MMU-5357769
R-MMU-109581,R-MMU-75153
R-MMU-109582,R-MMU-140877
R-MMU-109582,R-MMU-202733
R-MMU-109582,R-MMU-418346
R-MMU-109582,R-MMU-75205
R-MMU-109582,R-MMU-75892
R-MMU-109582,R-MMU-76002
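
Since each row is a directed link (from -> to), the table can be read as an edge list and walked to collect everything reachable from a given pathway. The following is a rough sketch using a breadth-first walk in base R on a toy edge list copied from the preview above; the function name `descendants` and the tibble `edges_toy` are hypothetical, and the column names simply mirror the ones assigned in the notebook.

```r
library(tibble)

# Toy edge list: rows copied from the preview of reactome_pathways above.
edges_toy <- tibble(
  uniprot_pathway_id   = c("R-MMU-109581", "R-MMU-109581", "R-MMU-109581",
                           "R-MMU-109581", "R-MMU-109582"),
  reactome_reaction_id = c("R-MMU-109606", "R-MMU-169911", "R-MMU-5357769",
                           "R-MMU-75153", "R-MMU-140877")
)

# Collect every ID reachable from `root` by following from -> to links.
descendants <- function(edges, root) {
  found <- character(0)
  frontier <- root
  while (length(frontier) > 0) {
    # All "to" entries whose "from" entry is in the current frontier
    children <- edges$reactome_reaction_id[edges$uniprot_pathway_id %in% frontier]
    # Only visit IDs not seen before, so a cycle cannot loop forever
    frontier <- setdiff(children, found)
    found <- union(found, frontier)
  }
  found
}

descendants(edges_toy, "R-MMU-109581")
```

In the toy data the walk stops after one level; on the full table it would continue through as many levels as the links provide.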
