# McGill-CSB/RNApyro

Finished first draft of the Outside section and cleaned up the equations for the probabilities.

 @@ -32,15 +32,15 @@ \subsection{Definitions} \end{array}\right.$\subsection{Energy Model} -The energy we will be composed of two function,$ES^{\beta}_{ab\to a'b'}$and -$EI_{(i,j),ab}$. The former is equal to the +The energy will be composed of two functions, $\text{ES}^{\beta}_{ab\to a'b'}$ and +$\text{EI}_{(i,j),ab}$. The former is equal to the stacking energy of the base pair with nucleotides $ab$ on top of the base pair with nucleotides $a'b'$, as set in the NNDB~\cite{Turner2010}. If one of the base pairs is not valid (i.e. not in $\{\text{GU},\text{UG},\text{CG},\text{GC}, \text{AU or UA}\}$), the value is a parameter $\beta \in [1,\infty]$. This allows us to either completely forbid a sequence where a base pair is not valid, when $\beta = \infty$, or only penalize it. -$EI_{(i,j), a'b'}$is the average of the sum of differences between the isostericity +$\text{EI}_{(i,j), a'b'}$ is the average of the sum of differences between the isostericity of base pairs at positions $(i,j)$ in $\Omega$ and $s_is_j$, and the isostericity of base pairs at positions $(i,j)$ in $\Omega$ and $ab$. It gives us an indication of whether the base pair $ab$ is more isosteric to the set $\Omega$ than the one on the sequence @@ -50,7 +50,9 @@ \subsection{Energy Model} will be used to balance the weight given to the stacking energy or the isostericity. \subsection{Inside} -To define the \emph{Inside} function$\Z{i,j}{m}{a,b}$as a recurrence, we will use as initial conditions: +The \emph{Inside} function $\Z{i,j}{m}{a,b}$ is the partition function considering only the +energy in subsequence $[i,j]$ over mutants of $s$ having exactly $m$ mutations between $[i,j]$ and whose nucleotide at position $i-1$ is $a$ (resp. in position $j+1$ it is $b$).
+We define the function $\Z{i,j}{m}{a,b}$ as a recurrence, with the following initial conditions: $\forall i \in (0,\cdots,n-1):\, \Z{i+1,i}{m}{a,b}=\left\{ @@ -59,11 +61,11 @@ \subsection{Inside} 0 &\text{Else } \end{array}\right.$ -In other words, when we evaluate the function$\mathcal Z$over only one nucleotide, there +In other words, when we evaluate the function $\mathcal Z$ after exhausting all positions, there is only one possible solution if there are $0$ mutations left, and none otherwise. Since the -energetics term only consider base pairs, they are not involved in the initial conditions. +energetic terms only depend on base pairs, they are not involved in the initial conditions. -The recurrence is composed of four terms: +The recursion itself is composed of four terms: $$\Z{i,j}{m}{a,b}:=\left\{ \begin{array}{ll} @@ -72,13 +74,13 @@ \subsection{Inside} \Z{i+1,j}{m-\Kron_{a',s_i}}{a',b} & \text{If }S_{i}=-1\\ \displaystyle \sum_{\substack{a',b'\in \B^2,\\ \Kron_{a'b',s_is_j}\le m}} - e^{\frac{-(\alpha ES_{a b \to a' b'}+(1-\alpha)EI_{(i,j),a'b'})}{RT}} + e^{\frac{-(\alpha \text{ES}^\beta_{a b \to a' b'}+(1-\alpha)\text{EI}_{(i,j),a'b'})}{RT}} \Z{i+1,j-1}{m-\Kron_{a'b',s_is_j}}{a',b'}& \text{Elif }S_i=j \land S_{i-1}=j+1\\ \displaystyle \sum_{\substack{a',b'\in \B^2,\\ \Kron_{a'b',s_is_k}\le m}} \sum_{m'=0}^{m-\Kron_{a'b',s_is_k}} - e^{\frac{-(1-\alpha)EI_{(i,k),a'b'}}{RT}} + e^{\frac{-(1-\alpha)\text{EI}_{(i,k),a'b'}}{RT}} \Z{i+1,k-1}{m-\Kron_{a'b',s_is_k}-m'}{a',b'} \Z{k+1,j}{m'}{b',b} & \text{Elif }S_i=k \land i < k \leq j\\ 0 &\text{Else} @@ -88,14 +90,16 @@ \subsection{Inside} \begin{description} \item[$S_{i}=-1$:] If the nucleotide at position $i$ is not paired, then the value is the same as if we increase the lower interval bound by 1 (i.e. $i+1$), and consider all possible - nucleotides a' at position i, taking into account the + nucleotides $a'$ at position $i$.
\item[$S_i=j$ and $S_{i-1}=j+1$:] If nucleotide $i$ is paired with $j$ and nucleotide $i-1$ is paired with $j+1$, we are in the only case where stacked base pairs can occur. We thus add the energy of the stacking and of the isostericity of the base pair $(i,j)$. What is left to compute is the \emph{inside} value of the interval $[i+1,j-1]$ over all possible nucleotides $a',b'\in B^2$ at positions $i$ and $j$ respectively. -\item[S_i=k and ij, we increase j to include it. + Thus, when we need +to evaluate an interval as $(-1,j)$, all stems between $(0,j)$ are taken into account and the +structure between $(j,n-1)$ must be a set of independent stems. Thus, all the outside energy is +equal to $\Z{j,n-1}{m}{X,X}$, for any $X\in B$. The recursion itself is the following:$$ \Y{i,j}{m}{a,b} = \left\{ \begin{array}{ll} \displaystyle - \sum_{\substack{a'\in \B,\\ \Kron_{a',s[i]}\le m}} - \Y{i-1,j}{m- \Kron_{a',s[i]}}{a',b} & - \text{Elif }S[i]=-1 \\ + \sum_{\substack{a'\in \B,\\ \Kron_{a',s_i}\le m}} + \Y{i-1,j}{m- \Kron_{a',s_i}}{a',b} & + \text{Elif }S_i=-1 \\ \displaystyle - \sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s[i]s[j]}\le m}} - e^{\frac{-ES_{ a' b' \to a b }}{RT}} - \Y{i-1,j+1}{m- \Kron_{a'b',s[i]s[j]}}{a',b'} & - \text{Elif }S{[i]}=j \land S{[i+1]}=j-1\\ + \sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s_is_j}\le m}} + e^{\frac{-(\alpha \text{ES}^\beta_{a b \to a' b'}+(1-\alpha)\text{EI}_{(i,j),a'b'})}{RT}} + \Y{i-1,j+1}{m- \Kron_{a'b',s_is_j}}{a',b'} & + \text{Elif }S_{i}=j \land S_{i+1}=j-1\\ \displaystyle - \sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s[i]s[k]}\le m}} - \sum_{m'=0}^{m-\Kron_{a'b',s[i]s[k]}} - e^{\frac{-EI_{s[i]s[k],a'b'}}{RT}} - \Y{i-1,k+1}{m- \Kron_{a'b',s[i]s[k]} - m'}{a',b'} + \sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s_is_k}\le m}} + \sum_{m'=0}^{m-\Kron_{a'b',s_is_k}} + e^{\frac{-(1-\alpha)\text{EI}_{(i,k),a'b'}}{RT}} + \Y{i-1,k+1}{m- \Kron_{a'b',s_is_k} - m'}{a',b'} \Z{j,k-1}{m'}{b,b'} & - \text{Elif }S{[i]}=k \geq j\\ + \text{Elif }S_{i}=k \geq j\\ \displaystyle - 
\sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s[k]s[i]}\le m}} - \sum_{m'=0}^{m-\Kron_{a'b',s[k]s[i]}} - e^{\frac{-EI_{s[i]s[k],a'b'}}{RT}} - \Y{k-1,j}{m- \Kron_{a'b',s[k]s[i]} - m'}{a',b} + \sum_{\substack{a'b'\in \B^2,\\ \Kron_{a'b',s_ks_i}\le m}} + \sum_{m'=0}^{m-\Kron_{a'b',s_ks_i}} + e^{\frac{-(1-\alpha)\text{EI}_{(k,i),a'b'}}{RT}} + \Y{k-1,j}{m- \Kron_{a'b',s_ks_i} - m'}{a',b} \Z{k+1,i-1}{m'}{a',b'} & - \text{Elif }-1 < S{[i]}=k < i\\ + \text{Elif }-1 < S_{i}=k < i\\ 0 & \text{Else} \end{array}\right. $$-\subsection*{} -We must have (if S[k] = -1): +The five cases can be broken down as follows. +\begin{description} +\item[$S_i=-1$:] If the nucleotide at position $i$ is not paired, then the value is the same +as if we decrease the lower interval bound by 1 (i.e. $i-1$), and consider all possible +nucleotides $a'$ at position $i$. +\item[$S_{i}=j$ and $S_{i+1}=j-1$:] If nucleotide $i$ is paired with $j$ and nucleotide $i+1$ is +paired with $j-1$, we are in the only case where stacked base pairs can occur. We thus add +the energy of the stacking and of the isostericity of the base pair $(i,j)$. What is left +to compute is the \emph{outside} value for the interval $[i-1,j+1]$ over all possible nucleotides +$a',b'\in B^2$ at positions $i$ and $j$ respectively. +\item[$S_{i}=k \geq j$:] If nucleotide $i$ is paired with position $k\geq j$, +and is not stacked inside, the +only term contributing directly to the energy is the isostericity of the base pair $(i,k)$. +We can then consider the outside interval $[i-1,k+1]$ by multiplying it by the \emph{inside} +value of the newly included interval (i.e. $[j,k-1]$), for +all possible values $a',b'\in B^2$ for nucleotides at positions $i$ and $k$ respectively. +\item[-1i \end{array}\right.$$ + +In every case, the denominator is the sum of the partition functions with exactly $m$ mutations, for $m$ smaller than or equal to our target $M$. The numerators are divided into the following three cases. +\begin{description} +\item[$S_i=-1$:] +\item[$S_i=k>i$:] +\item[$S_i=k