## TYK2 FlowDMS Offsets

For the most recent full TYK2 FlowDMS, we obtained conflicting summary statistics from different runs. The underlying reason is that the chunks were separated before processing in one run but not in the other. This does not matter to the model itself, which operates per-position and only uses WT counts from the same chunk. However, it _does_ matter for the _offset_, which is computed as `mean(log(count))` within each sample. When the chunks were pre-separated, this quantity was computed within each sample-chunk; otherwise it was computed across the whole sample.
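To make the difference concrete, here is a minimal sketch on made-up counts. The data frame `toy` and the column names `offset_sample` and `offset_sample_chunk` are hypothetical and only illustrate the two grouping choices; they are not taken from the pipeline.

```r
library(dplyr)

# Hypothetical counts: one sample, two chunks sequenced at different depths.
toy <- data.frame(
  sample = "A25",
  chunk  = c("1", "1", "2", "2"),
  count  = c(10, 100, 1000, 10000)
)

offsets <- toy %>%
  # Chunks processed together: one offset per sample.
  group_by(sample) %>%
  mutate(offset_sample = mean(log(count))) %>%
  # Chunks pre-separated: one offset per sample-chunk.
  group_by(sample, chunk) %>%
  mutate(offset_sample_chunk = mean(log(count))) %>%
  ungroup()

print(offsets)
```

In this toy case the per-sample offset is the same for every row, while the per-sample-chunk offset differs between the two chunks, so the same counts are normalized differently depending on how the input was split.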

To see how this leads to the effect we observe in the midpoints, let's consider several models that differ only in the offset:

  - `mean(log(count))`
  - `log(sum(stop_counts))`
  - `log(sum(all_counts))`
  - no offset
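As a rough sketch, the four candidate offsets could be computed per sample as follows. The toy data, the output column names, and the assumption that stop codons are coded as `"*"` in `mut_aa` are illustrative guesses rather than the pipeline's actual definitions.

```r
library(dplyr)

# Hypothetical counts for a single sample; "*" is assumed to mark stop codons.
toy <- data.frame(
  sample = "A25",
  mut_aa = c("WT", "*", "K", "*"),
  count  = c(100, 10, 1000, 10)
)

offsets <- toy %>%
  group_by(sample) %>%
  summarize(
    off_mean_log = mean(log(count)),              # mean(log(count))
    off_stop     = log(sum(count[mut_aa == "*"])), # log(sum(stop_counts))
    off_all      = log(sum(count)),               # log(sum(all_counts))
    off_none     = 0                              # no offset
  )

print(offsets)
```

Note that `mean(log(count))` is undefined when any count is zero, so real data may need filtering or a pseudocount before this step; the sketch does not address that.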

As an example, let's grab a position in chunk 2, run these regressions, pull out the WT marginals, and compute the midpoints.
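Before turning to the real data, a toy log-link regression shows why the offset can move the condition estimates, and therefore anything derived from them. The sketch below uses a Poisson GLM as a stand-in, which is an assumption about the model family; the data, the column names, and the offset shift of 1 are all hypothetical.

```r
set.seed(1)

# Hypothetical counts under two conditions, five observations each.
d <- data.frame(
  condition = factor(rep(c("c0", "c25"), each = 5)),
  count     = c(rpois(5, 50), rpois(5, 200))
)

# Two offset choices: none, and one that adds 1 (on the log scale) to c25 only.
d$off_a <- 0
d$off_b <- ifelse(d$condition == "c25", 1, 0)

fit_a <- glm(count ~ condition, family = poisson, data = d, offset = off_a)
fit_b <- glm(count ~ condition, family = poisson, data = d, offset = off_b)

# The c25 coefficient absorbs the change in offset.
shift <- coef(fit_a)["conditionc25"] - coef(fit_b)["conditionc25"]
print(shift)
```

Because the offset enters the linear predictor with a fixed coefficient of 1, increasing it for one condition lowers that condition's estimated coefficient by the same amount. Offsets that differ between runs would therefore be expected to shift the condition terms even when the counts are identical.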

In [3]:
library(data.table)
library(tidyverse)

In [23]:
run2 <- read_tsv("../sumstats/TYK2-VAMP/run2/OCNT-VAMPLIB-1-assay-run2-vampseq.sumstats.tsv")
run3 <- read_tsv("../sumstats/OCNT-VAMPLIB-1-assay-run3-vampseq.sumstats.tsv")
run2_redo <- read_tsv("../../dms/sumstats/OCNT-VAMPLIB-1-assay-run2-all-vampseq.sumstats.tsv")

Rows: 199228 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (7): chunk, effect, component, group, term, mut_aa, version
dbl (8): pos, estimate, std.error, statistic, p.value, dispersion, condition...
lgl (1): clone

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 200436 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (7): chunk, effect, component, group, term, mut_aa, version
dbl (8): pos, estimate, std.error, statistic, p.value, dispersion, condition...
lgl (1): clone

ℹ Use `spec()` to retrieve the full colu

In [25]:
head(run3 %>% filter(chunk == 2))

clone,chunk,pos,effect,component,group,term,estimate,std.error,statistic,p.value,dispersion,mut_aa,condition_conc,df,version
<lgl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<chr>
,2,112,fixed,cond,,condition_conc25,1.7187117,0.01942115,88.496891,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc50,2.1579749,0.01913385,112.783105,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc75,2.4128794,0.0192486,125.35349,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc100,2.4287264,0.0195277,124.373399,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc25:mut_aa*,0.3198689,0.22059033,1.450059,0.1470422,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc50:mut_aa*,-0.9349292,0.22553761,-4.145336,3.393159e-05,1.701287,,,,v2.1.0


In [26]:
head(run2_redo)

clone,chunk,pos,effect,component,group,term,estimate,std.error,statistic,p.value,dispersion,mut_aa,condition_conc,df,version
<lgl>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<chr>
,2,112,fixed,cond,,condition_conc25,1.7187117,0.01942115,88.496891,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc50,2.1579749,0.01913385,112.783105,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc75,2.4128794,0.0192486,125.35349,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc100,2.4287264,0.0195277,124.373399,0.0,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc25:mut_aa*,0.3198689,0.22059033,1.450059,0.1470422,1.701287,,,,v2.1.0
,2,112,fixed,cond,,condition_conc50:mut_aa*,-0.9349292,0.22553761,-4.145336,3.393159e-05,1.701287,,,,v2.1.0


In [4]:
mc_all <- data.table::fread("../../dms/pipeline/OCNT-VAMPLIB-1-assay-run2/OCNT-VAMPLIB-1-assay-run2.mapped-counts-all-assigned.tsv")

In [5]:
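# Split the oligo name into its underscore-delimited fields. Oligos with fewer
# than 7 fields get NA for the missing pieces (hence the warning below).
mc_proc <- mc_all %>%
    separate(oligo, c("lib", "chunk", "wt_aa", "pos",
                        "mut_aa", "wt_codon", "mut_codon"), "_") %>%
    # Treat concentration as categorical and build a combined condition label.
    mutate(condition_conc = as.factor(condition_conc),
        condition = as.factor(paste0(condition, condition_conc)))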

“Expected 7 pieces. Missing pieces filled with `NA` in 503411 rows [3, 16, 21,
23, 39, 47, 49, 59, 60, 84, 107, 127, 143, 178, 181, 184, 201, 214, 230, 245,
...].”


In [7]:
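# Label synonymous oligos and oligos without a mut_aa field as WT,
# then make WT the reference level of the factor.
mc_proc2 <- mc_proc %>%
    mutate(mut_aa = if_else(wt_aa == mut_aa | is.na(mut_aa), "WT", mut_aa),
        mut_aa = relevel(as.factor(mut_aa), ref = "WT"))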

In [22]:
mc_proc2 %>%
    filter(mut_aa == "WT") %>%
    group_by(sample) %>%
    summarize(log(sum(count)))

sample,log(sum(count))
<chr>,<dbl>
A100,15.75461
A25,15.23304
A50,15.66088
A75,15.91
B100,15.49724
B25,15.09343
B50,15.31691
B75,15.47455
C100,15.23042
C25,14.78823
