Generating Random Alanine Scanned Peptides
Peptide sequence mutator through Alanine Scanning
For more information, refer to this page on our wiki: https://2022.igem.wiki/mit-mahe/software
Figure: Left: Peptide sequence mutations on a sample sequence. Right: The list of all the output sequences.
The software mainly comprises the following modules:
- Alanine Scan ddG values: BUDE Alanine scan is employed to obtain the ddG value of each residue present in the input.
- Conservative mutations: A conservative mutation replaces a residue with another residue of the same type. For example, hydrophobic residues are replaced by hydrophobic residues. In our case, we went further and also grouped residues by charge.
- Random sampler: To reduce the user's workload, we select a subset of samples from the large sequence list produced as output. The selection is completely random.
- Command Line Interface (CLI): The software can be used locally from the command prompt.
- Integrable modules: The software's modules can be combined with other Python functions and libraries, and run in different Python environments.
- Mutation lock: To prevent an already modified residue from being mutated again, that part of the sequence is placed under a mutation lock.
- Aggrescan: A Bash script is included that takes the output and prepends FASTA headers to each sequence in the text file.
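The header-prepending step is done by a Bash script in the repository. Purely as an illustration of the idea, here is a minimal Python sketch; it is not the project's script, and the function name `to_fasta` and the `>seq_N` header naming are assumptions made for this example.

```python
def to_fasta(sequences, prefix="seq"):
    """Return FASTA-formatted text with one generated header per sequence.

    The header scheme (>seq_1, >seq_2, ...) is hypothetical; the actual
    Bash script may name its records differently.
    """
    # Drop blank lines and surrounding whitespace before numbering.
    cleaned = [s.strip() for s in sequences if s.strip()]
    return "".join(
        f">{prefix}_{index}\n{sequence}\n"
        for index, sequence in enumerate(cleaned, start=1)
    )


fasta_text = to_fasta(["GKDA", "", "GRDA"])
print(fasta_text, end="")
```

Each sequence gets its own header line, which is the layout that tools expecting FASTA input typically require.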
Protein engineering is the process of modifying a protein through substitution, insertion, or deletion so that it exhibits the characteristics and properties we desire. While designing our peptide, we faced a very large sample set of substitutions to work through by trial and error. Even then, we could not guarantee an increase in affinity over the original peptide, as our case required. To overcome this problem, we designed this software, which provides the most probable mutations that increase the binding affinity.
We also explored other point mutation software for our problem; however, those tools took only surface information into consideration. We decided to use the ddG value of a residue as the parameter for mutation, which increases the accuracy of the mutation.
- Open a terminal and go to the directory where you want the installation files to be downloaded.
  cd path_to_directory
- Clone the repository:
  git clone https://gitlab.igem.org/2022/software-tools/mit-mahe.git
- Install the package in editable mode:
  pip install -e .
- Run GRASP using the grasp command:
  grasp -h
pip install -r requirements.txt
Input: A docked structure of a receptor and a ligand.
Output: A text file containing all the possible mutated sequences. Additionally, a text file of randomly sampled mutations is created.
Options (run grasp -h for details): -g, -a, -o, -d, -l, -c
We use the ddG value of each residue, derived from BUDE Alanine scan, as the parameter for deciding mutations. We set a threshold ddG range for each residue (in our case, 0 to 1). A residue is mutated only if its ddG value lies within this range.
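The threshold check can be sketched as follows. This is a minimal illustration, not the project's code: the ddG values are made up, the function name `positions_to_mutate` is hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
# Hypothetical per-residue ddG values keyed by 1-based sequence position.
# Real values would come from the BUDE Alanine scan output.
ddg_by_position = {1: -0.4, 2: 0.3, 3: 1.8, 4: 0.0, 5: 0.95, 6: 1.0}


def positions_to_mutate(ddg_values, low=0.0, high=1.0):
    """Return the sorted positions whose ddG lies within [low, high].

    Inclusive bounds are an assumption made for this sketch.
    """
    return sorted(pos for pos, ddg in ddg_values.items() if low <= ddg <= high)


selected = positions_to_mutate(ddg_by_position)
print(selected)  # [2, 4, 5, 6]
```

Positions 1 and 3 are skipped because their ddG values fall outside the 0 to 1 range.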
Cartesian Product: The Cartesian Product of sets A and B is defined as the set of all ordered pairs (x, y) such that x belongs to A and y belongs to B. For example, if A = {1, 2} and B = {3, 4, 5}, then the Cartesian Product of A and B is {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}.
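The same example can be reproduced with Python's standard library, using `itertools.product`, the function the mutator relies on later in this document:

```python
from itertools import product

A = [1, 2]
B = [3, 4, 5]

# product() yields one tuple per ordered pair (x, y), with x from A and y from B.
pairs = list(product(A, B))
print(pairs)  # [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
```

The number of pairs is always `len(A) * len(B)`, which is why the output list grows quickly as more mutation positions are added.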
Originally, during a test run, the software took 2 hours 20 minutes to process 3 mutation positions. After incorporating the Cartesian product into the mutator, it processed 11 mutation positions in 10 seconds.
We first create a Mutater object, which supports the following three formats:
- Wild type amino acid, the position of the Amino acid, and the Mutant type it needs to be replaced with.
- Just the Wild type amino acid and its position, without the Mutant type. This can act as the template to create new Mutater objects.
- Just the position in the sequence.
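To make the three formats concrete, here is a small sketch of how such an object could be modeled. It is not the actual Mutater class: the name `MutationSpec`, the string notation (`A12G`, `A12`, `12`), and the `with_mutant` helper are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MutationSpec:
    """Hypothetical stand-in for a Mutater object."""
    position: int
    wild_type: Optional[str] = None
    mutant: Optional[str] = None

    def with_mutant(self, mutant: str) -> "MutationSpec":
        # Use a wild type + position entry as a template for a full mutation.
        return MutationSpec(self.position, self.wild_type, mutant)


# Assumed notation: optional wild type letter, position, optional mutant letter.
_PATTERN = re.compile(r"^([A-Z])?(\d+)([A-Z])?$")


def parse_mutation(text: str) -> MutationSpec:
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised mutation format: {text!r}")
    wild_type, position, mutant = match.groups()
    if mutant is not None and wild_type is None:
        # A mutant without a wild type is not one of the three formats.
        raise ValueError(f"Mutant given without wild type: {text!r}")
    return MutationSpec(int(position), wild_type, mutant)


full = parse_mutation("A12G")    # wild type, position, mutant
template = parse_mutation("A12") # wild type and position only
bare = parse_mutation("12")      # position only
print(full, template.with_mutant("V"), bare, sep="\n")
```

Keeping the object immutable means a template can be reused to spawn many candidate mutations without being altered itself.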
The Alanine scanning results from BUDE Alanine scan are parsed, which yields an array of positions where mutations may be required.
The sequence is taken as a template over which a special array is constructed, with the following steps:
- Iterate over the sequence and get the wild type amino acid at each stage.
- Retain the wild type if a mutation isn't required at this position. This happens when the position is not selected for mutation, or when the position is present in the mutation lock array.
- If a mutation is required at the current position, replace the entry at that position with an array of all the possible amino acids within the same group. This step is what achieves conservative mutations.
The special array is then passed to the itertools.product() function, which evaluates the Cartesian product of the elements of the special array. This results in an iterable object that contains all the final sequences, which are then stored in the text file.
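The construction of the special array and the product step can be sketched as below. This is an illustration under stated assumptions rather than the project's implementation: the residue grouping in `GROUPS` is a simplified example, and the names `build_choices` and `generate_sequences` are hypothetical.

```python
from itertools import product

# Simplified, illustrative residue groups (an assumption for this sketch):
# hydrophobic, polar uncharged, positively charged, negatively charged.
GROUPS = ("AVLIMFW", "STNQYCGP", "KRH", "DE")


def group_of(residue):
    for group in GROUPS:
        if residue in group:
            return group
    raise ValueError(f"Unknown residue: {residue!r}")


def build_choices(sequence, mutate_positions, locked_positions=()):
    """Build the special array: one list of candidate residues per position."""
    targets = set(mutate_positions)
    locked = set(locked_positions)
    choices = []
    for position, wild_type in enumerate(sequence, start=1):
        if position in targets and position not in locked:
            # Mutation required: offer every residue in the same group.
            choices.append(list(group_of(wild_type)))
        else:
            # No mutation required, or position is locked: keep the wild type.
            choices.append([wild_type])
    return choices


def generate_sequences(sequence, mutate_positions, locked_positions=()):
    """Lazily yield every sequence in the Cartesian product of the choices."""
    for combo in product(*build_choices(sequence, mutate_positions, locked_positions)):
        yield "".join(combo)


variants = list(generate_sequences("GKDA", mutate_positions=[2, 3], locked_positions=[3]))
print(variants)  # ['GKDA', 'GRDA', 'GHDA']
```

In this sketch the wild type stays among the candidates at each mutated position, so the original sequence appears in the output. Because `product` is lazy, sequences can be written to a file one at a time instead of being held in memory.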
A trivial random sampler is implemented using random.sample() on the sequences array, and the result is stored in a text file.
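A minimal sketch of such a sampler is shown below. The function name `sample_sequences`, the optional `seed` parameter, and the capping of `k` at the number of available sequences are assumptions added for this example; writing the result to a file is omitted.

```python
import random


def sample_sequences(sequences, k, seed=None):
    """Return k distinct sequences chosen uniformly at random.

    k is capped at the number of available sequences, since
    random.sample() raises ValueError when k exceeds the population size.
    """
    rng = random.Random(seed)  # a seed makes the selection reproducible
    pool = list(sequences)
    return rng.sample(pool, min(k, len(pool)))


all_sequences = ["GKDA", "GRDA", "GHDA", "GKEA", "GREA", "GHEA"]
subset = sample_sequences(all_sequences, k=3, seed=0)
print(subset)
```

Sampling is without replacement, so no position in the input list is picked twice within one draw.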
We plan to add a Monte Carlo approach to sampling.
iGEM MIT_MAHE 2022
This project is licensed under the Creative Commons Attribution 4.0 International license.