# AUSAB-05, 07, 11 modeling at selected concentrations
I have selection data for these sera at seven concentrations spaced as 1.5-fold dilutions. Some of these sera also significantly neutralize the H6 neutralization standard at the higher concentrations, so models fit on data from all seven concentrations are quite noisy. Here, we compare them to models fit at only 2-4 selected concentrations.

In [1]:
import pickle

import altair as alt

import pandas as pd

import polyclonal

import warnings
warnings.filterwarnings('ignore')

In [2]:
import os
os.chdir('../../')

In [3]:
# set up function for mean prob escape chart to avoid clutter from large block of code

def plot_avg_escape(prob_escape):
    max_aa_subs = 4  # group if >= this many substitutions
    
    mean_prob_escape = (
        prob_escape.assign(
            n_subs=lambda x: (
                x["aa_substitutions_reference"]
                .str.split()
                .map(len)
                .clip(upper=max_aa_subs)
                .map(lambda n: str(n) if n < max_aa_subs else f">{max_aa_subs - 1}")
            )
        )
        .groupby(["antibody_concentration", "n_subs"], as_index=False)
        .aggregate({"prob_escape": "mean", "prob_escape_uncensored": "mean"})
        .rename(
            columns={
                "prob_escape": "censored to [0, 1]",
                "prob_escape_uncensored": "not censored",
            }
        )
        .melt(
            id_vars=["antibody_concentration", "n_subs"],
            var_name="censored",
            value_name="probability escape",
        )
    )

    mean_prob_escape_chart = (
        alt.Chart(mean_prob_escape)
        .encode(
            x=alt.X("antibody_concentration"),
            y=alt.Y(
                "probability escape",
                scale=alt.Scale(type="symlog", constant=0.05),
            ),
            column=alt.Column("censored", title=None),
            color=alt.Color("n_subs", title="n substitutions"),
            tooltip=[
                alt.Tooltip(c, format=".3g") if mean_prob_escape[c].dtype == float else c
                for c in mean_prob_escape.columns
            ],
        )
        .mark_line(point=True, size=0.5)
        .properties(width=200, height=125)
        .configure_axis(grid=False)
    )

    return mean_prob_escape_chart
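
The substitution-count binning inside `plot_avg_escape` can be checked in isolation. The following is a minimal sketch on made-up substitution strings; the helper name `bin_n_subs` and the toy series are illustrative and not part of the notebook. It reuses the same `split` / `clip` / `map` chain as the function above, so variants with four or more substitutions are pooled under one label.

```python
import pandas as pd


def bin_n_subs(aa_substitutions, max_aa_subs=4):
    """Label variants by substitution count, pooling counts >= max_aa_subs."""
    return (
        aa_substitutions.str.split()
        .map(len)
        .clip(upper=max_aa_subs)
        .map(lambda n: str(n) if n < max_aa_subs else f">{max_aa_subs - 1}")
    )


# Toy input (hypothetical): an empty string stands for an unmutated variant.
toy = pd.Series(["", "K278I", "R92S V347M R383S", "Q75M A163S S199A L367H"])
labels = bin_n_subs(toy).tolist()
print(labels)
```

Because the labels are strings, Altair treats `n_subs` as a categorical color channel rather than a numeric one.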

## AUSAB-05

In [4]:
prob_escape = pd.read_csv(
    "results/prob_escape/libA_221223_1_AUSAB-05_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
    "`no-antibody_count` >= no_antibody_count_threshold"
)  # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()

Unnamed: 0,library,antibody_sample,no-antibody_sample,aa_substitutions_sequential,n_aa_substitutions,barcode,prob_escape,prob_escape_uncensored,antibody_count,no-antibody_count,antibody_neut_standard_count,no-antibody_neut_standard_count,total_no_antibody_count,no_antibody_count_threshold,aa_substitutions_reference,antibody,antibody_concentration
0,libA,221223_1_antibody_AUSAB-05_0.056_1,221223_1_no-antibody_control_1,K297I,1,ATAACACAAAAAAGTA,0.0582,0.0582,51938,339935,74631,28408,10675748,21,K278I,AUSAB-05,0.056
1,libA,221223_1_antibody_AUSAB-05_0.056_1,221223_1_no-antibody_control_1,R111S V366M R402S,3,TATCTACCTAACGAAA,0.1608,0.1608,36366,86104,74631,28408,10675748,21,R92S V347M R383S,AUSAB-05,0.056
2,libA,221223_1_antibody_AUSAB-05_0.056_1,221223_1_no-antibody_control_1,L89I L263H Q520R,3,CTCTTTAAAATCCATT,0.2285,0.2285,29107,48487,74631,28408,10675748,21,L70I L244H Q501R,AUSAB-05,0.056
3,libA,221223_1_antibody_AUSAB-05_0.056_1,221223_1_no-antibody_control_1,Q94M A182S S218A L386H,4,ACAGAATACCTTAACG,0.2638,0.2638,24177,34880,74631,28408,10675748,21,Q75M A163S S199A L367H,AUSAB-05,0.056
4,libA,221223_1_antibody_AUSAB-05_0.056_1,221223_1_no-antibody_control_1,R220G N235M L263Q,3,CTAACCAGTTAGACAC,0.192,0.192,19784,39213,74631,28408,10675748,21,R201G N216M L244Q,AUSAB-05,0.056


In [5]:
display(
    prob_escape.groupby("antibody_concentration").aggregate(
        n_variants=pd.NamedAgg("barcode", "nunique")
    )
)

antibody_concentration,n_variants
0.0049,26308
0.0074,26308
0.0111,26308
0.0166,26308
0.0249,26308
0.0373,26308
0.056,26308
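
The concentrations in this table are consistent with a 1.5-fold dilution series. As a quick sanity check, here is a small sketch that regenerates the series, assuming the top concentration is the 0.056 that appears in the sample name and that every step is exactly 1.5-fold; `dilution_series` is a hypothetical helper, not something from the pipeline.

```python
def dilution_series(top, fold=1.5, n=7):
    """Return n concentrations from `top` downward in `fold`-fold steps, ascending."""
    return sorted(top / fold**i for i in range(n))


# Round to four decimals, matching the precision of the table above.
series_05 = [round(c, 4) for c in dilution_series(0.056)]
print(series_05)
```

The same helper can be pointed at the top concentration of any other serum to confirm which rounded values to expect in the `antibody_concentration` column.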


In [6]:
plot_avg_escape(prob_escape)

Note that at concentrations > 0.02, the probability of escape starts to *increase*. This is almost certainly an artifact of neutralization of the H6 neut standard at high serum concentrations. Referring back to the GFP-neut data, over 30% of H6 is effectively neutralized by AUSAB-05 at a dilution of 0.0249 (the fifth point on this plot).
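
To see why loss of the standard inflates the estimate, it helps to write out how `prob_escape_uncensored` relates to the count columns. The sketch below assumes the value is the variant-to-standard count ratio with serum divided by the same ratio without serum; that formula is inferred from the columns in the `prob_escape.head()` output rather than taken from the pipeline code, and both function names are hypothetical. Under that assumption, losing a fraction of the standard shrinks the denominator and scales every variant's apparent escape by the same factor.

```python
def prob_escape_from_counts(ab_count, no_ab_count, ab_std_count, no_ab_std_count):
    """Variant count relative to the neut standard, with vs. without serum."""
    return (ab_count / ab_std_count) / (no_ab_count / no_ab_std_count)


def inflation_factor(frac_standard_neutralized):
    """Fold-overestimate of prob escape if this fraction of the standard is lost."""
    return 1 / (1 - frac_standard_neutralized)


# Counts from the first row of the head() output (barcode ATAACACAAAAAAGTA),
# where the reported prob_escape is 0.0582.
p = prob_escape_from_counts(51938, 339935, 74631, 28408)
print(round(p, 4))

# Assumed 30% loss of the standard, as quoted for AUSAB-05 at 0.0249.
print(round(inflation_factor(0.30), 2))
```

With 30% of the standard neutralized, every escape probability would be overestimated by roughly 1.4-fold under this simple picture, which is in the direction of the upturn seen in the plot.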

Modeling on all concentrations returns a very noisy escape profile:

In [7]:
model = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model.fit(
    logfreq=200,
    reg_escape_weight=0.1
)

display(model.activity_wt_barplot())

display(model.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:31:06 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0     0.04303  1.3806e+05  1.3806e+05           0           0           0              0               0       2.9999
          54      3.7394       11185       11173      8.9896           0           0              0               0       3.0782
# Successfully finished at Wed Jan  4 13:31:10 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:31:10 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.059109       12227       12166      57.769  1.8221e-32           0              0               0       3.0782
          74      4.0656       12042       12009      28.182     0.91096           0              0               0        3.

Most escape profiles look similarly noisy until I remove the concentrations where neutralization of the H6 standard changes the trend of the average escape profile (i.e. Ab conc greater than ~0.02):

In [8]:
prob_escape_filtered = prob_escape.loc[
    prob_escape["antibody_concentration"].isin([0.0074, 0.0111])
]

plot_avg_escape(prob_escape_filtered)
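
Selecting concentrations by exact float match works here because the values are read straight from the CSV at fixed precision. If the concentrations were ever recomputed (for example from a dilution factor), a tolerance-based match would be more defensive. A minimal sketch, with `filter_concentrations` as a hypothetical helper and a toy data frame standing in for `prob_escape`:

```python
import numpy as np
import pandas as pd


def filter_concentrations(df, keep, col="antibody_concentration", rtol=1e-6):
    """Keep rows whose concentration matches any value in `keep` within rtol."""
    values = df[col].to_numpy()[:, None]  # shape (n_rows, 1)
    targets = np.asarray(keep, dtype=float)[None, :]  # shape (1, n_keep)
    mask = np.isclose(values, targets, rtol=rtol, atol=0).any(axis=1)
    return df.loc[mask]


# Toy frame (hypothetical) with four of the AUSAB-05 concentrations.
toy = pd.DataFrame({"antibody_concentration": [0.0049, 0.0074, 0.0111, 0.0166]})
kept = filter_concentrations(toy, [0.0074, 0.0111])
print(kept["antibody_concentration"].tolist())
```

Setting `atol=0` keeps the comparison purely relative, which matters because the concentrations themselves are small numbers.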

In [9]:
model_filtered_05 = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape_filtered.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model_filtered_05.fit(
    logfreq=200,
    reg_escape_weight=0.1,
)

display(model_filtered_05.activity_wt_barplot())

display(model_filtered_05.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:31:17 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.023134       44868       44865           0           0           0              0               0       3.6049
          42      1.8507      612.49      606.83      2.1069           0           0              0               0       3.5517
# Successfully finished at Wed Jan  4 13:31:19 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:31:19 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.028468      774.35      748.62       22.18  9.2654e-33           0              0               0       3.5517
          61      2.0708      746.34      739.99       2.199    0.088467           0              0               0       4.0

There is still a lot of background signal across sites in general, but we see distinct escape at sites 192 and 193, which are likely to be genuine escape mutations. These are in antigenic site B on the H3 head and have also shown up in other escape profiles.

After incorporating this extra dilution, we also see sites 92 and 94 popping out (antigenic site E), plus 200, 208, and 220 (in and directly around antigenic site D).

## AUSAB-07

This is one of the highly potent sera, so we don't see quite the same issue, and the generic model run with all 7 concentrations looks good:

In [10]:
prob_escape = pd.read_csv(
    "results/prob_escape/libA_221223_1_AUSAB-07_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
    "`no-antibody_count` >= no_antibody_count_threshold"
)  # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()

Unnamed: 0,library,antibody_sample,no-antibody_sample,aa_substitutions_sequential,n_aa_substitutions,barcode,prob_escape,prob_escape_uncensored,antibody_count,no-antibody_count,antibody_neut_standard_count,no-antibody_neut_standard_count,total_no_antibody_count,no_antibody_count_threshold,aa_substitutions_reference,antibody,antibody_concentration
0,libA,221223_1_antibody_AUSAB-07_0.00776_1,221223_1_no-antibody_control_1,K297I,1,ATAACACAAAAAAGTA,0.0081,0.0081,70292,339935,723026,28408,10675748,21,K278I,AUSAB-07,0.0078
1,libA,221223_1_antibody_AUSAB-07_0.00776_1,221223_1_no-antibody_control_1,Q63V K208A I261C K387T,4,TTCTGTCTGCCCGATT,1.0,2.1146,64099,1191,723026,28408,10675748,21,Q44V K189A I242C K368T,AUSAB-07,0.0078
2,libA,221223_1_antibody_AUSAB-07_0.00776_1,221223_1_no-antibody_control_1,D123H K208E,2,AAGCCACAAGGTACTA,0.3921,0.3921,51940,5204,723026,28408,10675748,21,D104H K189E,AUSAB-07,0.0078
3,libA,221223_1_antibody_AUSAB-07_0.00776_1,221223_1_no-antibody_control_1,K208N L263M R280S S298Q,4,GAGATAATTTTAACTT,0.7414,0.7414,41662,2208,723026,28408,10675748,21,K189N L244M R261S S279Q,AUSAB-07,0.0078
4,libA,221223_1_antibody_AUSAB-07_0.00776_1,221223_1_no-antibody_control_1,R111D K208V L263Q S298E F372Y F411Y,6,ATGGATGACAGATATG,0.7886,0.7886,30689,1529,723026,28408,10675748,21,R92D K189V L244Q S279E F353Y F392Y,AUSAB-07,0.0078


In [11]:
display(
    prob_escape.groupby("antibody_concentration").aggregate(
        n_variants=pd.NamedAgg("barcode", "nunique")
    )
)

antibody_concentration,n_variants
0.0007,26308
0.001,26308
0.0015,26308
0.0023,26308
0.0034,26308
0.0052,26308
0.0078,26308


In [12]:
plot_avg_escape(prob_escape)

In [13]:
model = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model.fit(
    logfreq=200,
    reg_escape_weight=0.1
)

display(model.activity_wt_barplot())

display(model.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:31:30 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.042403  1.5164e+05  1.5164e+05           0           0           0              0               0        4.979
         130      6.3169      6512.4      6497.6      12.118           0           0              0               0       2.6385
# Successfully finished at Wed Jan  4 13:31:36 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:31:37 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.047134      7837.7      7769.1      65.912  2.5825e-32           0              0               0       2.6385
         151      7.8844      7371.7      7294.3      70.044      4.1522           0              0               0        3.

As seen before, serum AUSAB-07 selects highly targeted escape mutations at site 189. Just to be consistent, I reduce to the lower selection concentrations, where we see consistent separation between the average prob escape for variants with different numbers of mutations:

In [14]:
prob_escape_filtered = prob_escape.loc[
    prob_escape["antibody_concentration"].isin([0.0010, 0.0015, 0.0023, 0.0034])
]

plot_avg_escape(prob_escape_filtered)

In [15]:
model_filtered_07 = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape_filtered.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model_filtered_07.fit(
    logfreq=200,
    reg_escape_weight=0.1
)

display(model_filtered_07.activity_wt_barplot())

display(model_filtered_07.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:31:49 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.035739       89223       89218           0           0           0              0               0       5.1932
          93      3.9526      2430.7      2418.5      9.2164           0           0              0               0       2.9911
# Successfully finished at Wed Jan  4 13:31:53 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:31:53 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.037071      2970.5      2904.7      62.781   2.874e-32           0              0               0       2.9911
         169      6.5415      2725.3      2657.2      60.502      3.9397           0              0               0       3.7

The model looks pretty much the same, the only change being an even stronger signal at site 189 (approx. 41 for summed prob escape, versus 32 in the previous model).
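
The site-level numbers quoted above are sums over the mutations at a site. To compare two fits on that basis outside of the interactive plot, one option is to aggregate a mutation-level table directly. The sketch below assumes a data frame with `site` and `escape` columns (the shape I would expect from the model's mutation-escape table, but treat the column names as an assumption) and assumes the summary is the sum of positive escape values; the plot may use a different statistic. The values are toy numbers, not taken from either fitted model.

```python
import pandas as pd


def site_total_escape(mut_escape, site_col="site", escape_col="escape"):
    """Sum positive mutation-level escape values per site, largest first."""
    return (
        mut_escape.assign(positive=lambda d: d[escape_col].clip(lower=0))
        .groupby(site_col)["positive"]
        .sum()
        .sort_values(ascending=False)
    )


# Toy values (hypothetical) for two sites.
toy = pd.DataFrame(
    {
        "site": [189, 189, 189, 193, 193],
        "escape": [2.5, 1.5, -0.5, 0.4, 0.1],
    }
)
totals = site_total_escape(toy)
print(totals)
```

Running the same aggregation on the all-concentration and filtered models would give a side-by-side table of the top sites instead of a comparison by eye.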

## AUSAB-11

In [16]:
prob_escape = pd.read_csv(
    "results/prob_escape/libA_221223_1_AUSAB-11_1_prob_escape.csv", keep_default_na=False, na_values="nan"
).query(
    "`no-antibody_count` >= no_antibody_count_threshold"
)  # filter for those with sufficient no-antibody counts
assert prob_escape.notnull().all().all()
prob_escape.head()

Unnamed: 0,library,antibody_sample,no-antibody_sample,aa_substitutions_sequential,n_aa_substitutions,barcode,prob_escape,prob_escape_uncensored,antibody_count,no-antibody_count,antibody_neut_standard_count,no-antibody_neut_standard_count,total_no_antibody_count,no_antibody_count_threshold,aa_substitutions_reference,antibody,antibody_concentration
0,libA,221223_1_antibody_AUSAB-11_0.0338_1,221223_1_no-antibody_control_1,N27E N57M T179Q Q192D R241L G294I D438E,7,TACCTATGAAAAACAT,1.0,7.9063,93000,1665,200696,28408,10675748,21,N8E N38M T160Q Q173D R222L G275I D419E,AUSAB-11,0.0338
1,libA,221223_1_antibody_AUSAB-11_0.0338_1,221223_1_no-antibody_control_1,K297I,1,ATAACACAAAAAAGTA,0.0278,0.0278,66658,339935,200696,28408,10675748,21,K278I,AUSAB-11,0.0338
2,libA,221223_1_antibody_AUSAB-11_0.0338_1,221223_1_no-antibody_control_1,R111S V366M R402S,3,TATCTACCTAACGAAA,0.0601,0.0601,36536,86104,200696,28408,10675748,21,R92S V347M R383S,AUSAB-11,0.0338
3,libA,221223_1_antibody_AUSAB-11_0.0338_1,221223_1_no-antibody_control_1,Y113F N141E K154S A182E L263I Q382T,6,AAGACCAAATTACCCA,0.1836,0.1836,31122,23991,200696,28408,10675748,21,Y94F N122E K135S A163E L244I Q363T,AUSAB-11,0.0338
4,libA,221223_1_antibody_AUSAB-11_0.0338_1,221223_1_no-antibody_control_1,L89I L263H Q520R,3,CTCTTTAAAATCCATT,0.0803,0.0803,27515,48487,200696,28408,10675748,21,L70I L244H Q501R,AUSAB-11,0.0338


In [17]:
display(
    prob_escape.groupby("antibody_concentration").aggregate(
        n_variants=pd.NamedAgg("barcode", "nunique")
    )
)

antibody_concentration,n_variants
0.003,26308
0.0045,26308
0.0067,26308
0.01,26308
0.015,26308
0.0225,26308
0.0338,26308


In [18]:
plot_avg_escape(prob_escape)

As with AUSAB-05, we see the average prob escape trend shift at serum concentrations that neutralize ~25% or more of the H6 standard.

In [19]:
model = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model.fit(
    logfreq=200,
    reg_escape_weight=0.1
)

display(model.activity_wt_barplot())

display(model.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:32:08 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.052113  1.1476e+05  1.1476e+05           0           0           0              0               0       3.5021
         126      6.6817       22848       22834      12.659           0           0              0               0      0.98895
# Successfully finished at Wed Jan  4 13:32:15 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:32:15 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.067844       26414       26361      51.737  1.6525e-32           0              0               0      0.98895
         176      11.732       25496       25412      76.064      5.9468           0              0               0       1.3

The model has more clearly resolved escape sites off the bat (note that site 189, the main AUSAB-07 escape site, also shows up here). Reduce to concentrations that do not significantly neutralize H6:

In [20]:
prob_escape_filtered = prob_escape.loc[
    prob_escape["antibody_concentration"].isin([0.0067, 0.0100, 0.0150])
]

plot_avg_escape(prob_escape_filtered)

In [23]:
model_filtered_11 = polyclonal.Polyclonal(
    n_epitopes=1,
    data_to_fit=prob_escape_filtered.rename(
        columns={
            "antibody_concentration": "concentration",
            "aa_substitutions_reference": "aa_substitutions",
        }
    ),
    alphabet=polyclonal.AAS_WITHSTOP_WITHGAP,
)

# fit model
opt_res = model_filtered_11.fit(
    logfreq=200,
    reg_escape_weight=0.1
)

display(model_filtered_11.activity_wt_barplot())

display(model_filtered_11.mut_escape_plot())

# First fitting site-level model.
# Starting optimization of 503 parameters at Wed Jan  4 13:34:24 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.026998       59026       59023           0           0           0              0               0       3.5049
         114      3.9248      5920.4      5905.6       13.04           0           0              0               0       1.7772
# Successfully finished at Wed Jan  4 13:34:28 2023.
# Starting optimization of 3244 parameters at Wed Jan  4 13:34:28 2023.
        step    time_sec        loss    fit_loss  reg_escape  reg_spread reg_spatial reg_uniqueness reg_uniqueness2 reg_activity
           0    0.035355      6951.6      6875.7      74.193  3.7293e-32           0              0               0       1.7772
         119      4.0167      6605.8      6540.1      60.457      2.7875           0              0               0       2.4

This looks pretty good! Could potentially use another intermediate concentration, but the escape profile makes sense as is. Site 189 showing up here as well as with AUSAB-07 is nice validation. Plus additional weaker escape sites, which tracks with this serum being less potent than AUSAB-07, but more potent than AUSAB-05.

The 'weaker escape sites' here are sites 163, 201, and 244. 201 is in antigenic site D (lower on the head), and 163 and 244 are not in any classically defined antigenic regions.

The one thing that concerns me is that saturation effects appear more strongly for this serum than for the others. I may want to shift my thinking toward 'saturation' having biological significance, i.e. we cannot expect to be in high excess of all antibody molecules in a serum, and any that have low relative concentration are going to saturate out more quickly. So maybe we can draw some conclusions about number of molecules vs potency? For reference, the predicted IC99 concentrations have an actual WT prob escape of 97-98% for the other two sera, but only 92% for AUSAB-11. This discrepancy used to show up when I was running selections under saturating conditions at very high TCID50/uL (although much more dramatically, like 60% actual prob escape at the predicted IC99).
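
For reference, the predicted-versus-actual comparison can be made explicit with the single-epitope form of the model. The sketch below assumes a Hill coefficient of 1 and no non-neutralizable fraction, so that the predicted wildtype escape probability at concentration `c` is `1 / (1 + c * exp(a))`, with `a` the wildtype activity. The activity value used is hypothetical and chosen only for illustration, and both function names are made up here.

```python
import math


def wt_prob_escape(conc, activity):
    """Predicted wildtype escape probability for a single epitope."""
    return 1 / (1 + conc * math.exp(activity))


def predicted_ic(frac_neutralized, activity):
    """Concentration at which the predicted neutralized fraction is reached."""
    return frac_neutralized / (1 - frac_neutralized) * math.exp(-activity)


activity = 4.0  # hypothetical wildtype activity, for illustration only
ic99 = predicted_ic(0.99, activity)

# By construction the model predicts 99% neutralization at its own IC99;
# the measured value at that concentration is what gets compared against it.
print(ic99, 1 - wt_prob_escape(ic99, activity))
```

A measured value that falls short of the model's 99% at this concentration is the kind of discrepancy described above, and under these assumptions it cannot be absorbed by the activity parameter alone.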

## Summary models
Replotting the 3 finalized models here for easier comparison. Not including avg prob escape plots, but all of these are the 'filtered' model results.

### AUSAB-05

In [24]:
display(model_filtered_05.mut_escape_plot(addtl_slider_stats={"times_seen": 3}))

### AUSAB-07

In [25]:
display(model_filtered_07.mut_escape_plot(addtl_slider_stats={"times_seen": 3}))

### AUSAB-11

In [26]:
display(model_filtered_11.mut_escape_plot(addtl_slider_stats={"times_seen": 3}))