<h1> Implementing New Methods </h1>

This tutorial gives an overview of the important interfaces for implementing new prediction methods or writing wrapper classes for external tools. It also walks through a simple example of integrating a new epitope prediction method.

**Note**: If you are manipulating positional information of transcripts, proteins, peptides, or variants, please note that epytope uses 0-based positions: the first position is 0, not 1!
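As a small illustration of the 0-based convention, the following sketch converts a 1-based position, as often reported by annotation sources, into a 0-based index. The helper `to_zero_based` is hypothetical and not part of epytope; a plain string stands in for a protein sequence.

```python
def to_zero_based(pos_one_based):
    """Convert a 1-based position to a 0-based index (hypothetical helper, not an epytope function)."""
    if pos_one_based < 1:
        raise ValueError("1-based positions start at 1")
    return pos_one_based - 1


protein_seq = "SYFPEITHI"

# residue number 3 in 1-based numbering is index 2 in 0-based numbering
idx = to_zero_based(3)
residue = protein_seq[idx]
```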


<h2> Chapter 1: Abstract Base-Classes for Prediction Methods </h2>

We will illustrate the different interfaces based on epitope prediction, but there exist similar interfaces for cleavage site, cleavage fragment, and TAP prediction methods.

All epitope prediction methods have to implement at least a very rudimentary interface called `epytope.Core.AEpitopePrediction`:

In [None]:
class AEpitopePrediction(object, metaclass = APluginRegister):

    @property
    @abstractmethod
    def name(self):
        """
        Returns the name of the predictor

        :return: str 
        """
        raise NotImplementedError

    @property
    @abstractmethod
    def version(self):
        """
        Returns the version of the predictor
        
        :return: str
        """
        raise NotImplementedError

    @property
    @abstractmethod
    def supportedAlleles(self):
        """
        Returns the allele models supported by the predictor

        :return: set(str) - Iterable of supported alleles, e.g. [A*01:02, B*07:05]
        """
        raise NotImplementedError

    @property
    @abstractmethod
    def supportedLength(self):
        """
        Returns a list of supported peptide lengths

        :return: set(int) - Iterable of supported peptide lengths e.g. [8, 9, 10]
        """
        raise NotImplementedError


    @abstractmethod
    def convert_alleles(self, alleles):
        """
        Converts alleles into the internal allele representation of the predictor
        and returns a string representation

        :param list(Allele) alleles: The alleles for which the internal predictor
                                     representation is needed
        """
        raise NotImplementedError


<h3> APSSMEpitopePrediction </h3>
The prediction methods are further separated into the so-called `PSSM`, `SVM`, and `External` modules. Methods in `PSSM` are fully integrated linear prediction methods whose weight matrices can be found in `epytope.Data`. These methods only have to inherit from `epytope.EpitopePrediction.APSSMEpitopePrediction`, an abstract base class that already implements the prediction method. If the prediction matrices are stored in `epytope.Data` and are in the correct format, such as this:


In [None]:
<name>_<locus>_<super-and-sub-digit>_<length> = 
{
0: {'A': 0.0, 'C': 0.0, 'E': 1.09861228867, 'D': 1.09861228867, 'G': 0.0, 'F': 0.0, 'I': 0.0, 'H': 0.0, 'K': 0.0,
    'M': 0.0, 'L': 0.0, 'N': 0.0, 'Q': 0.0, 'P': -2.30258509299, 'S': 0.0, 'R': 0.0, 'T': 0.0, 'W': 0.0, 'V': 0.0,
    'Y': 0.0},
1: {'A': 0.0, 'C': 0.0, 'E': -2.30258509299, 'D': -2.30258509299, 'G': 0.0, 'F': -2.30258509299, 'I': 0.0, 'H': 0.0,
    'K': 1.09861228867, 'M': 0.0, 'L': 0.0, 'N': 0.0, 'Q': 0.0, 'P': 0.0, 'S': 0.0, 'R': 2.99573227355, 'T': 0.0,
    'W': -2.30258509299, 'V': 0.0, 'Y': -2.30258509299},
    ......
8: {'A': 0.0, 'C': 0.0, 'E': -2.30258509299, 'D': -2.30258509299, 'G': -2.30258509299, 'F': 0.0, 'I': 1.38629436112,
    'H': -2.30258509299, 'K': -2.30258509299, 'M': 1.38629436112, 'L': 2.99573227355, 'N': -1.60943791243,
    'Q': -2.30258509299, 'P': -2.30258509299, 'S': 0.0, 'R': -2.30258509299, 'T': 0.0, 'W': 0.0, 'V': 1.38629436112,
    'Y': 0.0}, 

#bias term stored at -1 
-1: {'con': -2.99573227355}
}

only the class properties have to be implemented. If, however, the prediction function is more complicated, please still inherit from `epytope.EpitopePrediction.APSSMEpitopePrediction` and override the prediction function accordingly.
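To make the matrix layout more concrete, here is a minimal sketch of how a linear PSSM score could be computed from such a dictionary. It assumes the score is the sum of the per-position weights plus the bias term stored at `-1`; the function `pssm_score` and the toy matrix values are made up for illustration, and the actual implementation in `APSSMEpitopePrediction` may differ in detail.

```python
# Toy 3-position matrix in the same layout as above (values are invented).
toy_pssm = {
    0: {'A': 0.5, 'C': -1.0},
    1: {'A': 0.0, 'C': 2.0},
    2: {'A': -0.5, 'C': 1.0},
    -1: {'con': -1.0},  # bias term stored at -1
}


def pssm_score(seq, pssm):
    """Sum the per-position weights of seq and add the bias term, if present."""
    n_positions = len([k for k in pssm if k >= 0])
    if len(seq) != n_positions:
        raise ValueError("sequence length does not match the matrix length")
    # residues missing from a row contribute a weight of 0.0
    score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(seq))
    return score + pssm.get(-1, {}).get('con', 0.0)


score = pssm_score("ACC", toy_pssm)  # 0.5 + 2.0 + 1.0 - 1.0
```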

<h3>ASVMEpitopePrediction</h3>
Methods in the `SVM` module are also fully integrated into epytope, and their trained SVM models can be found in `epytope.Data.svms`. epytope uses the Python binding of `svmlight`, so the SVM model files have to be in svmlight format. In addition to the basic `epytope.Core.AEpitopePrediction`, all epytope SVM classes implement the interface `epytope.Core.ASVM`:

In [None]:
class ASVM(object, metaclass = abc.ABCMeta):
    """
        Base class for SVM prediction tools
    """
   
    @abstractmethod
    def encode(self, peptides):
        """
        Returns the feature encoding for peptides

        :param List(Peptide)/Peptide peptides: List or Peptide object
        :return: list(Object) -- Feature encoding of the Peptide objects
        """
        raise NotImplementedError


This interface ensures that the peptides are in the correct input format for SVMlight's prediction method. The abstract base class `epytope.EpitopePrediction.ASVMEpitopePrediction` provides a rudimentary implementation of the function `predict`. If all SVM files are correctly stored in `epytope.Data` and `encode()` is implemented, it suffices to inherit from `epytope.EpitopePrediction.ASVMEpitopePrediction` and implement the defined class properties.
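The following sketch shows what an `encode()` implementation might produce. It is an assumption-laden illustration: it uses plain strings instead of `Peptide` objects, a simple one-hot scheme, and sparse `(feature_index, value)` pairs with a dummy label, which is the kind of input sparse SVM libraries typically expect. The function name `one_hot_encode` is hypothetical, and real predictors use their own feature encodings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(peptides):
    """Encode peptide strings as sparse one-hot features (illustrative scheme only)."""
    if isinstance(peptides, str):
        peptides = [peptides]
    encoded = []
    for pep in peptides:
        features = []
        for pos, aa in enumerate(pep):
            # one block of 20 features per position, feature indices start at 1
            index = pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa) + 1
            features.append((index, 1.0))
        encoded.append((0, features))  # 0 is a dummy label, unused for prediction
    return encoded


encoded = one_hot_encode("ACD")
```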

<h3> AExternalEpitopePrediction </h3>
Methods in this module are only loosely integrated into epytope: epytope simply calls their command line tools and pre- and post-processes their input and output. Hence the interfaces are more involved than the ones before. All external tools have to implement `epytope.Core.AExternal`:


In [None]:
class AExternal(object, metaclass = abc.ABCMeta):
    """
     Base class for external tools
    """
    

    @property
    @abstractmethod
    def command(self):
        """
        defines the external execution command 
        e.g. netMHC -p {peptides} -a {alleles} -x {out} {options} 
        """

    @abstractmethod
    def parse_external_result(self, _file):
        """
        Parses external results and returns an AResult object

        :param str _file: The file path or the external prediction results
        :return: AResult - Returns an AResult object
        """
        raise NotImplementedError

    def is_in_path(self):
        """
        checks whether the specified execution command can be found in PATH

        :return: bool - Whether or not command could be found in PATH
        """
        exe = self.command.split()[0]
        for try_path in os.environ["PATH"].split(os.pathsep):
            try_path = try_path.strip('"')
            exe_try = os.path.join(try_path, exe).strip()
            if os.path.isfile(exe_try) and os.access(exe_try, os.X_OK):
                return True
        return False

    @abstractmethod
    def get_external_version(self, path=None):
        """
        Returns the external version of the tool by executing
        >{command} --version

        The call might depend on the method and therefore has to be overridden;
        it is declared abstract to force implementers to do so.
        The function in the base class can be called with super()

        :param str path: Optional path to the executable if it deviates from self.command
        :return: str - The external version of the tool
        """
        exe = self.command.split()[0] if path is None else path
        try:
            p = subprocess.Popen(exe + ' --version', shell=True, stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            # communicate() reads both pipes and waits for the process to finish
            stdo, stde = p.communicate()
            if p.returncode > 0:
                raise RuntimeError("Could not check version of " + exe + " - Please check your installation and epytope "
                                   "wrapper implementation.")
        except Exception as e:
            raise RuntimeError(e)
        # stdout is returned as bytes and has to be decoded first
        return stdo.decode().strip()

    @abstractmethod
    def prepare_peptide_input(self, _peptides, _file):
        """
        Prepares sequence input for external tools
        and writes them to _file in the specific format

        NO return value!

        :param list(str) _peptides: The peptide sequences to write into _file
        :param File _file: File handle of the input file for the external tool
        """
        raise NotImplementedError

The binaries have to be globally executable, and the internal version number has to match the version of the external tool. When specifying the command line call, please use the following placeholders: `{peptides}` for the sequence input, `{alleles}` for the allele input, and `{out}` for the output file. `{options}` can be used to allow user-specific optional command line flags that are passed through epytope directly to the external tool. The placeholders are later filled with the appropriate replacements via Python's `str.format()` method.
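As a sketch of how the placeholders are filled, the command template from the docstring above can be formatted with `str.format()`. The file paths and the allele string below are invented for illustration; only the template itself is taken from the interface documentation.

```python
import shlex

# command template as given in the docstring of AExternal.command
command_template = "netMHC -p {peptides} -a {alleles} -x {out} {options}"

cmd = command_template.format(
    peptides="/tmp/peptides.txt",  # hypothetical input file
    alleles="HLA-A0201",           # hypothetical allele string
    out="/tmp/result.xls",         # hypothetical output file
    options="",                    # no additional user flags
)

# split into an argument list, e.g. for inspection or logging
args = shlex.split(cmd)
```

Leaving `options` empty simply drops the placeholder; any string passed here would be appended to the call unchanged.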

As usual, `epytope.EpitopePrediction.AExternalEpitopePrediction` combines `AEpitopePrediction` and `AExternal` and provides an implementation of `predict()` that calls all other methods to implement the complete interaction between epytope and the external tool. Additionally, `AExternalEpitopePrediction` extends the interface of `AEpitopePrediction.predict()` to the following:


In [None]:
class AExternalEpitopePrediction(AEpitopePrediction, AExternal):
    """
        Abstract class representing external prediction tools.

    """

    def predict(self, peptides, alleles=None, command=None, options=None, **kwargs):
        ...

The two new optional parameters `command=None` and `options=None` can be used to specify a path to an alternative binary and additional command line options, respectively. The alternative binary should be of the same version as the one specified in the epytope implementation, or should at least produce the same output format. The optional command line parameters are not checked by epytope and are handed over directly to the external tool.

New external methods should inherit from `AExternalEpitopePrediction` and implement the missing properties and functions accordingly.
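To give an impression of the pre- and post-processing a wrapper has to provide, the sketch below implements stand-alone versions of `prepare_peptide_input` and `parse_external_result`. Everything about the formats is an assumption: the input is written as one peptide per line, the tool output is assumed to be tab-separated `allele`, `peptide`, `score` lines, the parser reads from a file handle rather than a path, and a nested dictionary stands in for the `AResult` object that a real wrapper would return.

```python
import io


def prepare_peptide_input(peptides, handle):
    """Write one peptide sequence per line (hypothetical input format)."""
    handle.write("\n".join(peptides))


def parse_external_result(handle):
    """Parse hypothetical tab-separated lines of the form: allele, peptide, score."""
    result = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comment/header lines
        allele, peptide, score = line.split("\t")
        result.setdefault(allele, {})[peptide] = float(score)
    return result


# in-memory buffers stand in for the temporary files of a real run
in_buf = io.StringIO()
prepare_peptide_input(["SYFPEITHI", "KLLPEITHI"], in_buf)

out_buf = io.StringIO(
    "# allele\tpeptide\tscore\n"
    "A*02:01\tSYFPEITHI\t0.25\n"
    "A*02:01\tKLLPEITHI\t0.75\n"
)
parsed = parse_external_result(out_buf)
```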

<h3> Simple Example of a new Prediction Method </h3>

In [None]:
from epytope.EpitopePrediction import APSSMEpitopePrediction
from epytope.Core import EpitopePredictionResult, Peptide
import random
import pandas

class RandomEpitopePrediction(APSSMEpitopePrediction):
    __alleles = ["A*02:01"]
    __supported_length = [9]
    __name = "random"
    __version= "1.0"
    
    #the interface defines four class properties
    @property
    def name(self):
        #returns the name of the predictor
        return self.__name
    
    @property
    def supportedAlleles(self):
        #returns the supported alleles as strings (without the HLA prefix)
        return self.__alleles
    
    @property
    def supportedLength(self):
        #returns the supported epitope lengths as iterable
        return self.__supported_length
    
    @property
    def version(self):
        #returns the version of the predictor
        return self.__version
    
    #the interface defines a function converting epytope's HLA allele representation
    #into an internal representation used by different methods.
    #for this predictor we won't need it but still have to provide it!
    #the function consumes a list of alleles and converts them into the internally used representation
    def convert_alleles(self, alleles):
        #we just use the identity function
        return alleles
    
    #additionally the interface defines a function `predict` 
    #that consumes a list of peptides or a single peptide and optionally a list 
    #of allele objects
    #
    #this method implements the complete prediction routine
    def predict(self, peptides, alleles=None):
        
        #test whether one peptide or a list
        if isinstance(peptides, Peptide):
            peptides = [peptides]
        
        #if no alleles are specified do predictions for all supported alleles
        if alleles is None:
            alleles = self.supportedAlleles
        else:
            #keep only the supported alleles
            alleles = [a for a in alleles if a.name in self.supportedAlleles]
        
        result = {}
        #now predict binding/non-binding for each peptide at random
        for a in alleles:
            result[a] = {}
            for p in peptides:
                if random.random() >= 0.5:
                    result[a][p] = 1.0
                else:
                    result[a][p] = 0.0
        
        #create EpitopePredictionResult object. This is a multi-indexed DataFrame 
        #with Peptide and Method as multi-index and alleles as columns
        df_result = EpitopePredictionResult.from_dict(result)
        df_result.index = pandas.MultiIndex.from_tuples([tuple((i,self.name)) for i in df_result.index],
                                                        names=['Seq','Method'])
        return df_result


Now let's use our new predictor.

In [None]:
from epytope.EpitopePrediction import EpitopePredictorFactory
from epytope.Core import Peptide

EpitopePredictorFactory("random").predict(Peptide("SYFPEITHI"))
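The factory can find the new predictor by name because the base class is created with the metaclass `APluginRegister`. The sketch below illustrates the general idea of such a registering metaclass with simplified, hypothetical classes (`PluginRegister`, `Predictor`, `predictor_factory`); it is not epytope's actual implementation, which may, for example, also take the version into account.

```python
class PluginRegister(type):
    """Simplified stand-in for a metaclass that registers every subclass."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "registry"):
            # first class created with this metaclass: the common base class
            cls.registry = {}
        else:
            # every subclass registers itself under its lower-cased name
            cls.registry[cls.name.lower()] = cls


class Predictor(metaclass=PluginRegister):
    name = "base"


class RandomPredictor(Predictor):
    name = "random"


def predictor_factory(name):
    """Look up a registered predictor class by name and instantiate it."""
    return Predictor.registry[name.lower()]()


predictor = predictor_factory("Random")
```

With this pattern, defining the subclass is enough; no separate registration call is needed.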

<h2> Chapter 2: Abstract Base-Classes for HLA Typing </h2>

epytope currently offers only an interface to integrate external HLA typing tools, as the algorithms are often very involved and rely on additional third-party tools for data pre-processing. As usual, a very basic interface called `epytope.Core.AHLATyping`:

In [None]:
class AHLATyping(object, metaclass = APluginRegister):

    @property
    @abstractmethod
    def name(self):
        """
        Returns the name of the predictor

        :return:
        """
        raise NotImplementedError

    @property
    @abstractmethod
    def version(self):
        """
        parameter specifying the version of the prediction method

        """
        raise NotImplementedError

    @abstractmethod
    def predict(self, ngsFile, output, **kwargs):
        """
        Prediction method calling the HLA typing algorithm

        :param str ngsFile: The path to the input file containing the NGS reads
        :param str output: The path to the output file or directory
        :param kwargs: optional parameters directly handed over to the algorithm without checking
        :return: list(Allele) - A list of HLA alleles representing the genotype predicted by the algorithm
        """
        raise NotImplementedError

defines rudimentary functionality. `epytope.HLATyping.AExternalHLATyping` combines `epytope.Core.AHLATyping` and `epytope.Core.AExternal` and implements `predict()` with an extended signature. It also extends the interface by an additional abstract method, `clean_up()`.

In [None]:
class AExternalHLATyping(AHLATyping, AExternal):

    def predict(self, ngsFile, output, command=None, options=None, delete=True, **kwargs):
        """
        Implementation of prediction

        :param str ngsFile: The path to the NGS file of interest
        :param str output: The path to the output file or directory
        :param str command: The path to an alternative binary (if binary is not globally executable)
        :param str options: A string with additional options that is directly passed to the tool
        :param bool delete: Boolean indicator whether generated files should be deleted afterwards
        :return: list(Allele) - A list of Allele objects representing the most likely HLA genotype
        """

        ...

    @abstractmethod
    def clean_up(self, _output):
        """
        Cleans the generated files after prediction

        :param str _output: The path to the output file or directory
        """
        raise NotImplementedError

Similar to the optional inputs `command=None` and `options=None` in `AExternalEpitopePrediction`, these options can be used to specify the path to an alternative binary and additional command line inputs, respectively. The `delete` flag indicates whether the generated files should be deleted afterwards.
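A `clean_up` implementation depends on which files the wrapped tool generates. As a minimal sketch, assuming the tool writes either a single output file or an output directory, a stand-alone version could look as follows. The file name `typing_result.txt` is hypothetical, and the function is written without `self` so that it can be run on its own.

```python
import os
import shutil
import tempfile


def clean_up(output):
    """Remove a generated output file or directory, if it exists."""
    if os.path.isdir(output):
        shutil.rmtree(output)
    elif os.path.isfile(output):
        os.remove(output)


# simulate a tool run that produced one file inside an output directory
out_dir = tempfile.mkdtemp()
out_file = os.path.join(out_dir, "typing_result.txt")  # hypothetical file name
with open(out_file, "w") as handle:
    handle.write("A*02:01\n")

clean_up(out_file)
file_removed = not os.path.exists(out_file)

clean_up(out_dir)
dir_removed = not os.path.exists(out_dir)
```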