# Computer-Aided Vaccine Design

In this practical we will rebuild the epitope selection framework for vaccine design proposed by Toussaint et al.

**Remember**: An epitope-based vaccine design pipeline can be broken down into three essential steps:
- Epitope discovery
- Epitope selection
- Epitope assembly

![EV-Design Pipeline](./img/ev_design_pipeline.png)

We will cover all three steps in this tutorial. 

## Prerequisites

Please make sure you are running a Python 2.7 kernel and have Conda installed. You need the following packages:

- Fred2 (A Python Framework for Immunoinformatics)
- GLPK (An ILP solver)


You can install the packages with the following commands:

**Fred2**:

    pip install git+https://github.com/SchubertLab/Fred2

**GLPK**:

    conda install -c conda-forge glpk


## 1. Epitope Discovery

We will design a seasonal Influenza vaccine based on the yearly [WHO recommendations](https://www.who.int/influenza/vaccines/virus/recommendations/2018_19_north/en/).

As targets we will use protein sequences of two recommended strains (A/Michigan/45/2015 (H1N1) and B/Colorado/06/2017) and will optimize the vaccine's efficacy for the European population.

To identify potential epitopes we will use [FRED](https://github.com/SchubertLab/Fred2), a Python-based API for immunoinformatics that interfaces with many epitope affinity prediction methods.

You might find FRED's [documentation](https://fred2.readthedocs.io/en/latest/) and [tutorial](https://github.com/FRED-2/Fred2/tree/master/Fred2/tutorials) useful.

Task 1.1: Read in the data and use FRED to generate Peptides and HLA objects
----
Use FRED's functions to read in the FASTA file and the HLA allele file and generate `Peptide` and `Allele` objects. Keep only HLA alleles with a frequency of at least 5%.


In [33]:
from Fred2.Core import Allele, Peptide, Protein,generate_peptides_from_proteins
from Fred2.IO import read_lines, read_fasta

proteins = read_fasta("./data/Influenza_ B_Colorado_06_2017_ A_Michigan_45_2015.fasta", in_type=Protein, id_position=0)
peptides = list(generate_peptides_from_proteins(proteins, window_size=9))


alleles = []
with open("./data/HLA_alleles_europe.tsv", "r") as f:
    for l in f:
        allele,prob = l.split()
        if float(prob) >= 0.05:
            alleles.append(Allele(allele,prob=float(prob.strip())))
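
To make the role of `window_size=9` concrete, here is a minimal plain-Python sketch of a sliding window over a sequence. It is only an illustration: `sliding_windows` and `toy_protein` are hypothetical names, the sequence is made up, and FRED's own generator returns `Peptide` objects rather than plain strings.

```python
def sliding_windows(sequence, window_size=9):
    """Return every contiguous substring of length window_size, in order."""
    last_start = len(sequence) - window_size
    return [sequence[i:i + window_size] for i in range(last_start + 1)]


# Hypothetical toy sequence, not taken from the Influenza FASTA file.
toy_protein = "MKAILVVLLYTFA"
toy_peptides = sliding_windows(toy_protein, window_size=9)

print(toy_peptides)
```

A sequence of length `n` yields `n - 9 + 1` overlapping 9-mers, and a sequence shorter than the window yields none.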
        

Task 1.2: Epitope prediction
----
Then, use an epitope prediction method to identify candidate epitopes for vaccination, and keep only epitopes with a predicted binding affinity of at most 50 nM.

In [34]:
from Fred2.EpitopePrediction import EpitopePredictorFactory
# You can either use a pre-defined operator from `operator` (`le` means "<=")
from operator import le
# or define your own comparator function like this
comparator = lambda a, b: a <= b

smmpmbc = EpitopePredictorFactory("smmpmbec")

ep_df = smmpmbc.predict(peptides, alleles=alleles)

ep_df_filtered = ep_df.filter_result([("smmpmbec",comparator, 50)])


## 2. Epitope Selection

To select the optimal set of epitopes for vaccination, we will rebuild OptiTope, Toussaint et al.'s epitope selection ILP, using Pyomo. Pyomo is a Python-based modeling language for constrained optimization problems. We will successively build the ILP formulation and, in the end, use it to select the optimal set of epitopes.

### 2.1 Overview

ILP formulations have five essential elements: Sets, Parameters, Variables, Objectives, and Constraints. The latter two use sets as indices for parameters and variables to formulate the equations that make up the objectives and constraints.


As a reminder, the complete formulation of OptiTope looks like this:


\begin{align}
   \text{Sets:}&&&\\
   \\
    E&&&\text{Set of epitopes}\\
    H&&&\text{Set of HLA alleles}\\
    A&&&\text{Set of antigens}\\
    I(h)\subseteq E&~\forall~h \in H&&\text{Set of epitopes binding to HLA $h$}\\
    \\
    \text{Parameters:}&&&\\
    \\
    i_{e, h}&&&\text{Immunogenicity of epitope $e$ bound to HLA $h$}\\
    p_h&&&\text{Allele frequency of HLA $h$ in target population}\\
    k&&&\text{Upper limit of selected epitopes}\\
    \tau_{H}&&&\text{Minimum number of covered HLA alleles}\\
    \\
    \text{Variables:}&&&\\
    \\
    x_e \in \{0,1\}&~\forall~e \in E&&\text{Decision variable whether epitope $e$ is selected}\\
    y_h\in \{0,1\}&~\forall~h \in H&&\text{Decision variable whether HLA $h$ is covered by the selected epitopes}\\
    \\
    \text{Objective:}&&&\\
    \\
    \arg\max_{\mathbf{x}} &\sum_{h \in H} p_h \sum_{e \in E} x_ei_{e,h}&&\\
    \\
    \text{Constraints:}&&&\\
    \\
    \sum_{e\in E} x_e \leq k&&&\\
    \sum_{e\in I(h)} x_e \geq y_h &~\forall h \in H&&\\
    \sum_{h \in H} y_h \geq \tau_{H} &&&\\
\end{align}
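
To see how the objective and the constraints interact, the following sketch solves a tiny OptiTope-style instance by brute-force enumeration instead of an ILP solver. All values are hypothetical toy data, and the helper names (`objective`, `covered_alleles`, `solve`) are made up for this illustration. Enumeration is only feasible because the instance has four epitopes.

```python
from itertools import combinations

# Hypothetical toy instance; none of these values come from the tutorial data.
E = ["e1", "e2", "e3", "e4"]
H = ["h1", "h2"]
p = {"h1": 0.6, "h2": 0.4}                      # allele frequencies p_h
imm = {("e1", "h1"): 0.9, ("e2", "h1"): 0.7,    # immunogenicities i_{e,h}
       ("e3", "h2"): 0.8, ("e4", "h2"): 0.3}    # missing pairs count as 0
I = {"h1": {"e1", "e2"}, "h2": {"e3", "e4"}}    # binders I(h)


def objective(selection):
    """Sum over h of p_h times the immunogenicity of the selected epitopes."""
    return sum(p[h] * imm.get((e, h), 0.0) for h in H for e in selection)


def covered_alleles(selection):
    """y_h can only be 1 if at least one selected epitope lies in I(h)."""
    return [h for h in H if I[h].intersection(selection)]


def solve(k, tau_H):
    """Enumerate every selection of at most k epitopes, keep the best feasible one."""
    best_score, best_selection = None, None
    for size in range(k + 1):
        for selection in combinations(E, size):
            if len(covered_alleles(selection)) < tau_H:
                continue  # violates the allele coverage constraint
            score = objective(selection)
            if best_score is None or score > best_score:
                best_score, best_selection = score, selection
    return best_score, best_selection


print(solve(k=2, tau_H=0))
print(solve(k=2, tau_H=2))
```

Without a coverage requirement the two strongest binders of `h1` are chosen. Requiring both alleles to be covered forces one epitope per allele and lowers the objective value.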




### 2.2 Abstract Versus Concrete Models

*(Adapted from Pyomo Documentation)*

A mathematical model can be defined using symbols that represent data
values.  For example, the following equations represent a linear program
(LP) to find optimal values for the vector $x$ with scalar parameters
$n$ and $m$, parameter vectors $b$ and $c$, and parameter matrix $a$:


\begin{array}{lll}
    \min       & \sum_{j=1}^n c_j x_j &\\
     \mathrm{s.t.} & \sum_{j=1}^n a_{ij} x_j \geq b_i & \forall i = 1 \ldots m\\
               & x_j \geq 0 & \forall j = 1 \ldots n
\end{array}

<div class="alert alert-block alert-info">
   As a convenience, we use the symbol $\forall$ to mean "for all"
   or "for each."
</div>

We call this an *abstract* or *symbolic* mathematical model since it
relies on unspecified parameter values.  Data values can be used to
specify a *model instance*.  The ``AbstractModel`` class provides a
context for defining and initializing abstract optimization models in
Pyomo when the data values will be supplied at the time a solution is to
be obtained.

In many contexts, a mathematical model can and should be directly
defined with the data values supplied at the time of the model
definition.  We call these *concrete* mathematical models.  For example,
the following LP model is a concrete instance of the previous abstract
model:


\begin{array}{ll}
    \min       & 2 x_1 + 3 x_2\\
     \mathrm{s.t.} & 3 x_1 + 4 x_2 \geq 1\\
               & x_1, x_2 \geq 0
\end{array}

The ``ConcreteModel`` class is used to define concrete optimization
models in Pyomo.
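
The distinction can also be sketched without Pyomo. In the snippet below, the hypothetical function `make_lp` plays the role of the abstract model (structure without data), and calling it with data yields the concrete instance from above. This is an analogy in plain Python under these assumptions, not how Pyomo is implemented. For this small LP the optimum lies at one of the two corner points of the feasible region, so it is enough to compare those.

```python
from fractions import Fraction


def make_lp(c, a, b):
    """'Abstract' model: the structure is fixed, the data (c, a, b) is not."""
    def objective(x):
        return sum(c_j * x_j for c_j, x_j in zip(c, x))

    def is_feasible(x):
        rows_ok = all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i
                      for row, b_i in zip(a, b))
        return rows_ok and all(x_j >= 0 for x_j in x)

    return objective, is_feasible


# 'Concrete' instance: min 2*x1 + 3*x2  s.t.  3*x1 + 4*x2 >= 1,  x1, x2 >= 0
objective, is_feasible = make_lp(c=[2, 3], a=[[3, 4]], b=[1])

# The two corner points of the feasible region of this instance.
corners = [(Fraction(1, 3), Fraction(0)), (Fraction(0), Fraction(1, 4))]
best = min((x for x in corners if is_feasible(x)), key=objective)

print(best)
print(objective(best))
```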

<div class="alert alert-block alert-info">
   Python programmers will probably prefer to write concrete models,
   while users of some other algebraic modeling languages may tend to
   prefer to write abstract models.  The choice is largely a matter of
   taste; some applications may be a little more straightforward using
   one or the other.
</div>


We will work with a ``ConcreteModel``. You can find an example of a ``ConcreteModel`` [here](https://pyomo.readthedocs.io/en/latest/pyomo_overview/simple_examples.html#a-simple-concrete-pyomo-model).

Task 2.1: ConcreteModel initialization
-----
1) Import the necessary Pyomo modules and initialize a concrete model.



In [101]:
from pyomo.environ import *

model = ConcreteModel()

### 2.3 Sets and Sets of Sets
*(Adapted from Pyomo Documentation)*


Sets can be declared using the `Set` and `RangeSet` functions or by
assigning set expressions.  The simplest set declaration creates a set
and postpones creation of its members:

    model.A = Set()

The ``Set`` function takes optional arguments such as:

- doc = String describing the set
- dimen = Dimension of the members of the set
- filter = A boolean function used during construction to indicate if a
  potential new member should be assigned to the set
- initialize = A function that returns the members to initialize the set.
- ordered = A boolean indicator that the set is ordered; the default is `False`
- validate = A boolean function that validates new member data
- virtual = A boolean indicator that the set will never have elements;
  it is unusual for a modeler to create a virtual set; they are
  typically used as domains for sets, parameters and variables
- within = Set used for validation; it is a super-set of the set being declared.


The `initialize` option can refer to a Python set, which can be
returned by a function or given directly as in

    model.D = Set(initialize=['red', 'green', 'blue'])

If sets are given as arguments to `Set` without keywords, they are
interpreted as indexes for an array of sets. For example, to create an
array of sets that is indexed by the members of the set `model.A`, use

    model.E = Set(model.A)

This is not restricted to one-dimensional indices. You can specify index structures of arbitrary dimension.

You can use `initialize` to construct a concrete array of sets either by providing a function that takes the model and an element of `model.A` as input and returns the entire set, or by providing a dictionary where the `key` is an element of `model.A` and the `value` is the entire set.
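
The two initialization styles can be mimicked in plain Python. The sketch below builds the binder sets $I(h)$ from a made-up affinity table, once with a per-index function and once as a precomputed dictionary. The peptide names, allele names and IC50 values are hypothetical, and the function omits the `model` argument that a Pyomo rule would receive.

```python
# Hypothetical IC50 values in nM, keyed by (epitope, allele); not real predictions.
ic50 = {
    ("pep1", "A*02:01"): 12.0,
    ("pep2", "A*02:01"): 480.0,
    ("pep2", "B*07:02"): 35.0,
    ("pep3", "B*07:02"): 49.9,
}
threshold = 50.0
alleles = sorted({h for (_, h) in ic50})


def binders(h):
    """Function style: one index goes in, the whole member list comes out."""
    return sorted(e for (e, a), value in ic50.items()
                  if a == h and value <= threshold)


# Option 1: call the function once per index.
I_from_function = {h: binders(h) for h in alleles}

# Option 2: precompute a dictionary that maps each index to its members.
I_from_dict = {}
for (e, h), value in sorted(ic50.items()):
    if value <= threshold:
        I_from_dict.setdefault(h, []).append(e)

print(I_from_function)
print(I_from_function == I_from_dict)
```

Note that the dictionary style only contains keys for alleles with at least one binder, so an allele without binders would need an explicit empty entry.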


You can find more details [here](https://pyomo.readthedocs.io/en/latest/pyomo_overview/simple_examples.html#a-simple-concrete-pyomo-model).


Task 2.2: OptiTope's Set definition
---
Define and initialize the Sets of OptiTope with the predicted epitope data.

<div class="alert alert-block alert-info">

<b>Note:</b> The row and column indices of the epitope prediction dataframe contain `Peptide` and `Allele` objects, respectively.
Through these objects you have access to the protein of origin of each peptide and to the allele probability in the target population.

You can find more information about the `Peptide`-Object [here](https://fred2.readthedocs.io/en/latest/Fred2.Core.html#module-Fred2.Core.Peptide) and for the `Allele`-Object [here](https://fred2.readthedocs.io/en/latest/Fred2.Core.html#module-Fred2.Core.Allele).
</div>

In [102]:
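# Row index level 0 holds the Peptide objects, the columns hold the Allele objects
eps = ep_df_filtered.index.get_level_values(0).tolist()
alleles = ep_df_filtered.columns.tolist()

# Simple Sets (indexed by the string representation of peptides and alleles)
model.E = Set(initialize=map(str, eps))
model.H = Set(initialize=map(str, alleles))

# Array of Sets: I[h] holds all epitopes that bind allele h with at most 50 nM
def init_I(model, h):
    d = ep_df_filtered.loc[:, h]
    d2 = d[d <= 50]
    return d2.index.get_level_values(0).tolist()
model.I = Set(model.H, initialize=init_I)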


### 2.4 Parameters 
*(Adapted from Pyomo Documentation)*

The word “parameters” is used in many settings. When discussing a Pyomo model, we use the word to refer to data that must be provided in order to find an optimal (or good) assignment of values to the decision variables. Parameters are declared with the `Param` function, which takes arguments that are somewhat similar to the `Set` function. For example, the following code snippet declares sets `model.A`, `model.B` and then a parameter array `model.P` that is indexed by `model.A` and `model.B`:

    model.A = Set()
    model.B = Set()
    model.P = Param(model.A, model.B)

In addition to sets that serve as indexes, the `Param` function takes the following command options:

- default = The value absent any other specification.
- doc = String describing the parameter
- initialize = A function (or Python object) that returns the members to initialize the parameter values.
- validate = A boolean function with arguments that are the prospective parameter value, the parameter indices and the model.
- within = Set used for validation; it specifies the domain of the parameter values.

These options perform in the same way as they do for `Set`. For example, suppose that `model.A = RangeSet(1,3)`. There are many ways to create a parameter that is a square matrix with 9, 16, 25 on the main diagonal and zeros elsewhere; here are two of them. First, using a Python object to initialize:

    v={}
    v[1,1] = 9
    v[2,2] = 16
    v[3,3] = 25
    model.S = Param(model.A, model.A, initialize=v, default=0)

And now using an initialization function that is automatically called once for each index tuple (remember that we are assuming that `model.A` contains 1, 2, 3):

    def s_init(model, i, j):
        if i == j:
            # (i + 2)**2 yields 9, 16, 25 for i = 1, 2, 3
            return (i + 2) ** 2
        else:
            return 0.0
    model.S1 = Param(model.A, model.A, initialize=s_init)

In this example, the index set contained integers, but index sets need not be numeric. It is very common to use strings.
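
As a plain-Python analogy (not Pyomo's actual implementation), the sketch below expands both initialization styles into full tables and checks that they describe the same matrix with 9, 16, 25 on the diagonal. The list `A` stands in for `RangeSet(1, 3)`, and the rule omits the `model` argument that Pyomo would pass.

```python
A = [1, 2, 3]  # stands in for RangeSet(1, 3)

# Way 1: sparse dictionary plus a default for every index that is missing.
v = {(1, 1): 9, (2, 2): 16, (3, 3): 25}
S = {(i, j): v.get((i, j), 0) for i in A for j in A}


# Way 2: a rule that is evaluated once per index tuple.
def s_init(i, j):
    return (i + 2) ** 2 if i == j else 0


S1 = {(i, j): s_init(i, j) for i in A for j in A}

print(S == S1)
```

The dictionary style is convenient when the data already exists as a table, while the rule style is convenient when values follow from the indices.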

Task 2.3: OptiTope's Parameter definition
---

Define and initialize the Parameters of OptiTope.


In [103]:
import math

model.k = Param(initialize=10, within=PositiveIntegers, mutable=True)

probs = {str(h):h.prob for h in alleles}
model.p = Param(model.H, initialize=lambda model, a: probs[a])
model.t_allele = Param(initialize=0, within=NonNegativeIntegers, mutable=True)
model.i = Param(model.E, model.H, initialize=lambda model, e, h: min(1., max(0.0, 1.0 - math.log(ep_df_filtered.loc[(e, "smmpmbec"), h], 50000))))
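
The initializer of `model.i` above converts a predicted IC50 value into a score between 0 and 1 via $1 - \log_{50000}(\text{IC50})$, clipped to that range. The stand-alone sketch below isolates this transformation so you can inspect it. The function name `ic50_to_score` and the example inputs are chosen for illustration only.

```python
import math


def ic50_to_score(ic50_nm, max_ic50=50000.0):
    """Map an IC50 in nM to [0, 1] via 1 - log_{max_ic50}(IC50), clipped."""
    return min(1.0, max(0.0, 1.0 - math.log(ic50_nm, max_ic50)))


# Stronger binders (lower IC50) receive higher scores.
for ic50 in (1.0, 50.0, 500.0, 50000.0):
    print(ic50, round(ic50_to_score(ic50), 3))
```

An IC50 of 1 nM maps to 1, an IC50 of 50000 nM or more maps to 0, and the 50 nM threshold used for filtering lands at roughly 0.64.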

### 2.5 Variables
*(Adapted from Pyomo Documentation)*

Variables are intended to ultimately be given values by an optimization package. They are declared and optionally bounded, given initial values, and documented using the Pyomo `Var` function. If index sets are given as arguments to this function they are used to index the variable. Other optional directives include:

- bounds = A function (or Python object) that gives a (lower,upper) bound pair for the variable
- domain = A set that is a super-set of the values the variable can take on.
- initialize = A function (or Python object) that gives a starting value for the variable; this is particularly important for non-linear models
- within = (synonym for domain)

The following code snippet illustrates some aspects of these options by declaring a singleton (i.e. unindexed) variable named `model.LumberJack` that will take on real values between zero and 6 and is initialized to 1.5:

    model.LumberJack = Var(within=NonNegativeReals, bounds=(0,6), initialize=1.5)


For more details see [here](https://pyomo.readthedocs.io/en/latest/pyomo_modeling_components/Variables.html).


Task 2.4: OptiTope's Variable definition
---
Define OptiTope's variables


In [104]:
model.x = Var(model.E, within=Binary)
model.y = Var(model.H, within=Binary)

### 2.6 Objectives
*(Adapted from Pyomo Documentation)*

An objective is a function of variables that returns a value that an optimization package attempts to maximize or minimize. The `Objective` function in Pyomo declares an objective. Although other mechanisms are possible, this function is typically passed the name of another function that gives the expression. Here is a very simple version of such a function that assumes `model.x` has previously been declared as a `Var`:


    def ObjRule(model):
        return 2*model.x[1] + 3*model.x[2]
    model.g = Objective(rule=ObjRule)


It is more common for an objective function to refer to parameters as in this example that assumes that `model.p` has been declared as a `Param` and that `model.x` has been declared with the same index set, while `model.y` has been declared as a singleton:


    def profrul(model):
        return summation(model.p, model.x) + model.y
    model.Obj = Objective(rule=profrul, sense=maximize)


This example uses the `sense` option to specify maximization. The default `sense` is minimize.

Task 2.5: OptiTope's Objective definition
---
Define OptiTope's objective function


In [105]:
model.obj = Objective(
            rule=lambda model: sum(model.x[e] * sum(model.p[a] * model.i[e, a] for a in model.H) for e in model.E),
            sense=maximize)

### 2.7 Constraints
*(Adapted from Pyomo Documentation)*

Most constraints are specified using equality or inequality expressions that are created using a rule, which is a Python function. For example, if the variable `model.x` has the indexes ‘butter’ and ‘scones’, then this constraint limits the sum over these indexes to be exactly three:

    def teaOKrule(model):
        return(model.x['butter'] + model.x['scones'] == 3)
    model.TeaConst = Constraint(rule=teaOKrule)

Instead of expressions involving equality (==) or inequalities (<= or >=), constraints can also be expressed using a 3-tuple of the form (lb, expr, ub), where lb and ub can be None, which is interpreted as lb <= expr <= ub. Variables can appear only in the middle expr. For example, the following two constraint declarations have the same meaning:

    model.x = Var()

    def aRule(model):
       return model.x >= 2
    model.Boundx = Constraint(rule=aRule)

    def bRule(model):
       return (2, model.x, None)
    model.boundx = Constraint(rule=bRule)

Constraints (and objectives) can be indexed by lists or sets. When the declaration contains lists or sets as arguments, the elements are iteratively passed to the rule function. If there is more than one, then the cross product is sent. For example the following constraint could be interpreted as placing a budget of `i` on the ith item to buy where the cost per item is given by the parameter `model.a`:

    model.A = RangeSet(1,10)
    model.a = Param(model.A, within=PositiveReals)
    model.ToBuy = Var(model.A)
    def bud_rule(model, i):
        return model.a[i]*model.ToBuy[i] <= i
    model.aBudget = Constraint(model.A, rule=bud_rule)

Task 2.6: OptiTope's Constraints
---
Define OptiTope's constraints

In [106]:
# Obligatory constraint (upper limit on the number of selected epitopes)
model.NofSelectedEpitopesCov = Constraint(rule=lambda model: sum(model.x[e] for e in model.E) <= model.k)

# Optional constraints (with t_allele = 0 they do not restrict the basic model)
# y[h] can only be 1 if at least one selected epitope binds allele h
model.IsAlleleCovConst = Constraint(model.H,
                                    rule=lambda model, h: sum(model.x[e] for e in model.I[h]) >= model.y[h])
# At least t_allele alleles have to be covered
model.MinAlleleCovConst = Constraint(rule=lambda model: sum(model.y[h] for h in model.H) >= model.t_allele)


### 2.8 Solving a `ConcreteModel`

If you have a `ConcreteModel`, add these lines at the bottom of your Python script to solve it:

    opt = SolverFactory('glpk')
    res = opt.solve(model)

To load the optimal variable values into your model and read out the selected epitopes, you can use this snippet:

    model.solutions.load_from(res)
    selected_peptides = [x for x in model.x if 0.9 <= model.x[x].value <= 1.2]
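
The snippet compares against a range instead of testing `value == 1` because solvers report binary variables as floating-point numbers, which may deviate slightly from exact integers. The following sketch illustrates the difference with made-up values. The dictionary `solver_values` is hypothetical and does not come from GLPK.

```python
# Hypothetical solver output: binary variables reported as floats.
solver_values = {"pep1": 1.0, "pep2": 0.9999999997, "pep3": 0.0, "pep4": 2.1e-12}

# An exact comparison misses values that are 1 only up to numerical tolerance.
exact = [e for e, v in sorted(solver_values.items()) if v == 1]

# A tolerant comparison treats everything close to 1 as selected.
tolerant = [e for e, v in sorted(solver_values.items()) if 0.9 <= v <= 1.2]

print(exact)
print(tolerant)
```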


In [107]:
from pyomo.opt import SolverFactory, TerminationCondition

opt = SolverFactory('glpk')
res = opt.solve(model) 

res.write(num=1)

# = Solver Results                                         =
# ----------------------------------------------------------
#   Problem Information
# ----------------------------------------------------------
Problem: 
- Name: unknown
  Lower bound: 5.61031116551
  Upper bound: 5.61031116551
  Number of objectives: 1
  Number of constraints: 17
  Number of variables: 2009
  Number of nonzeros: 4154
  Sense: maximize
# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver: 
- Status: ok
  Termination condition: optimal
  Statistics: 
    Branch and bound: 
      Number of bounded subproblems: 1
      Number of created subproblems: 1
  Error rc: 0
  Time: 0.0518410205841
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution: 
- number of solutions: 0
  number of solutions displayed: 0


In [108]:

model.solutions.load_from(res)
selected_peptides = [x for x in model.x if 0.9 <= model.x[x].value <= 1.2]
selected_peptides


['MMMGMFNML',
 'LLMDGTASL',
 'YLMAWKQVL',
 'FMQALQLLL',
 'YVLFHTSLL',
 'YVSASLSYL',
 'YLNPGNYSM',
 'FMYSDFHFI',
 'FVANFSMEL',
 'KMMKVLLFM']