<h1>Available Selection Methods</h1>
<ul>
  <li><h2>Random</h2><p>Select residues completely randomly. Each of the 20 residues has a 1/20 chance of appearing in any spot.</p></li>
  <li><h2>Randomized Group Selection</h2><p>Listed as <code>Gaussian</code> in the <code>Random_Selection_Method</code> dropdown; the name is provisional while this is in beta. Selects residues using the probabilities provided. For example, if <code>Percent_Polar</code> is set to <code>0.3</code>, then each residue has a <code>30%</code> chance of being polar. Note that poor parameterization can result in this method never finding a solution.</p></li>
  <li><h2>Mutation -- Not implemented</h2><p>Begin with a random sequence. Each iteration randomly selects one residue that is keeping the sequence away from the desired output and mutates it into one that brings the sequence closer to the desired results.</p></li>
  <li><h2>Distribution -- Not implemented</h2><p>Select residues using a provided distribution. Note that this method skips the checks against the provided parameters and uses only the distributions provided.</p></li>
  <li><h2>Builder</h2><p>Builds a protein from scratch to explicitly satisfy the provided parameters. Not very fun, but it is by far the fastest and honestly the most reliable method here. It adds residues to satisfy one percentage parameter, then moves on to the next. The charged residues are then flipped as needed to reach the specified charge (to within 1), and finally the sequence is shuffled to randomize the order. Awkward parameterization can result in an incorrect length or a charge that is off by 1.</p></li>
</ul>
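<p>The Builder steps described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the notebook's own <code>Shuffler</code> function: the names <code>build_sequence</code> and <code>net_charge</code> are hypothetical, and the residue groups assume histidine and cysteine are left out for simplicity.</p>

```python
import math
import random

# Residue groups assumed for this sketch (H and C are deliberately omitted).
POLAR = "STNQ"
POSITIVE = "RK"
NEGATIVE = "ED"
NONPOLAR = "AVILMFYWGP"


def net_charge(seq):
    """Net charge: +1 per positive residue, -1 per negative residue."""
    return sum((c in POSITIVE) - (c in NEGATIVE) for c in seq)


def build_sequence(length, percent_polar, percent_charged, total_charge):
    """Hypothetical sketch of the Builder steps: fill, flip, pad, shuffle."""
    # Step 1: add enough polar residues, then enough charged residues.
    n_polar = math.ceil(length * percent_polar)
    n_charged = math.ceil(length * percent_charged)
    seq = [random.choice(POLAR) for _ in range(n_polar)]
    seq += [random.choice(POSITIVE + NEGATIVE) for _ in range(n_charged)]

    # Step 2: flip charged residues; each flip moves the net charge by 2,
    # so the loop stops once the charge is within 1 of the target.
    for i, c in enumerate(seq):
        gap = total_charge - net_charge(seq)
        if abs(gap) <= 1:
            break
        if gap > 0 and c in NEGATIVE:
            seq[i] = random.choice(POSITIVE)
        elif gap < 0 and c in POSITIVE:
            seq[i] = random.choice(NEGATIVE)

    # Step 3: pad with nonpolar residues up to the requested length.
    # If rounding already pushed the count past `length`, nothing is added,
    # which is one way the "incorrect length" caveat can arise.
    seq += [random.choice(NONPOLAR) for _ in range(length - len(seq))]

    # Step 4: shuffle so the groups are not laid out in blocks.
    random.shuffle(seq)
    return "".join(seq)


random.seed(0)
print(build_sequence(30, 0.3, 0.2, 2))
```

<p>Because every flip changes the net charge by exactly 2, a target whose parity differs from the number of charged residues can only be reached to within 1, which is why the Builder forces a charge range of at least 1.</p>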

In [None]:
#@title Run this cell to set, refresh, and apply your chosen settings

import random
import string
import math

import sys
import signal

AAs = "ARNDCEQGHILKMFPSTWYV"
POSITIVE = "RK"
NEGATIVE = "ED"
POLAR = "STNQ"
NONPOLAR = "AVILMFYWGP"
CHARGED = "RKED"

#@markdown Length of final output sequence
Length = 30 #@param{type: "integer"}
#@markdown Fraction (0 to 1) of the final sequence that should be polar
Percent_Polar = 0 #@param{type: "number"}
#@markdown Fraction (0 to 1) of the final sequence that should be charged
Percent_Charged = 0 #@param{type: "number"}
#@markdown Final total charge of the final sequence
Total_Charge = 0 #@param{type: "integer"}
#@markdown Plus-or-minus range for the final charge of the output sequence. Note that the "Builder" selection method forces at least a range of 1.
Charge_Range = 1 #@param{type: "integer"}
#@markdown Consider histidine (H, HIS) to be a charged residue. Otherwise it is considered polar
Count_Histidine_As_Charged = True #@param {type: "boolean"}
#@markdown Consider cysteine (C, CYS) to be a polar residue. Otherwise it is considered nonpolar
Count_Cysteine_As_Polar = True #@param {type: "boolean"}
#@markdown Consider all charged residues to also be polar
Count_Charged_As_Polar = False #@param{type: "boolean"}
#@markdown Disregard total charge as a check. "Total_Charge" will still be used for generation, but the net charge will not be checked when determining whether a sequence is complete
Disregard_Charge = False #@param{type: "boolean"}
#@markdown Random selection method
Random_Selection_Method = "Builder" #@param["Random", "Gaussian", "Mutation", "Builder"]
#@markdown Clean some of the polar residues out of the final result. If <code>Count_Charged_As_Polar</code> is checked, the resulting number of polar residues may run high. Check this box to push the result closer to the specified amount of polar residues, at the expense of variety and of the number of non-charged polar residues.
Clean_Polar = False #@param {type: "boolean"}


if Count_Cysteine_As_Polar:
  POLAR += "C"
else:
  NONPOLAR += "C"

if Count_Histidine_As_Charged:
  CHARGED += "H"
  POSITIVE += "H"
else:
  POLAR += "H"

POLAR_CHECK = POLAR
if Count_Charged_As_Polar:
  POLAR += POSITIVE
  POLAR += NEGATIVE

assert Percent_Charged >= 0 and Percent_Polar >= 0, "All provided probabilities must be positive or 0"
if not Count_Charged_As_Polar:
  assert Percent_Polar + Percent_Charged <= 1, "Sum of percentages must be less than or equal to 1 if charged residues are not considered polar"
else:
  assert Percent_Polar <= 1, "Percentage of polar residues must be less than or equal to 1"
  assert Percent_Charged <= 1, "Percentage of charged residues must be less than or equal to 1"
  conglomerate_prob = (Percent_Polar - Percent_Polar * (len(CHARGED) / (len(POLAR)))) + (Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR))))
  # assert conglomerate_prob <= 1, f"Modified probabilites will be >1.\n\tModified Percent_Polar = {Percent_Polar - Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))}\n\tModified Percent_Charged = {Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))}"
assert abs(Total_Charge) <= Length, "Magnitude of the total charge must be less than or equal to the length of the desired protein"

if Count_Charged_As_Polar:
  print(f"\nModified Percent_Charged = {Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))}\nModified Percent_Polar = {Percent_Polar - Percent_Polar * (len(CHARGED) / (len(POLAR)))}")

  # print(f"\nModified Percent_Polar = {Percent_Polar - Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))}\nModified Percent_Charged = {Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))}")
  # print(f"\nModified Percent_Polar = {Percent_Polar - Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))}\nModified Percent_Charged = {Percent_Charged + Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))}")


In [None]:
#@title Run this to generate a sequence using the settings above
def GetCharge(sequence):
  total_charge = 0
  for c in sequence:
    if c in POSITIVE:
      total_charge += 1
    elif c in NEGATIVE:
      total_charge -= 1
  # print(f"\n{total_charge}\n")
  return total_charge

def NumCharged(sequence):
  total = 0
  for c in sequence:
    if c in CHARGED:
      total += 1
  return total

def NumPolar(sequence):
  total = 0
  for c in sequence:
    if c in POLAR:
      total += 1
  return total

def PercentCharged(sequence):
  num_charged = 0
  for c in sequence:
    if c in POSITIVE or c in NEGATIVE:
      num_charged += 1
  return num_charged / len(sequence)

# Fraction of polar residues, excluding charged residues even when Count_Charged_As_Polar is set
def PercentPolarC(sequence):
  num_polar = 0
  for c in sequence:
    if c in POLAR_CHECK:
      num_polar += 1
  return num_polar / len(sequence)

def PercentPolar(sequence):
  num_polar = 0
  for c in sequence:
    if c in POLAR:
      num_polar += 1
  return num_polar / len(sequence)

def PercentNonPolar(sequence):
  num_not = 0
  for c in sequence:
    if c in NONPOLAR:
      num_not += 1
  return num_not / len(sequence) 

# Generate a sequence by drawing each residue uniformly from all 20 amino acids
def RandomSequence(length):
  sequence = ""
  for i in range(length):
    sequence += AAs[random.randrange(0, len(AAs))]
  return sequence

def GaussianSequence(length, Percent_Polar, Percent_Charged, Count_Charged_As_Polar, Total_Charge):
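  # Split charged picks so the expected net charge matches Total_Charge:
  # p_pos - p_neg = Total_Charge / (expected number of charged residues)
  n_charged = Percent_Charged * length
  if n_charged > 0:
    bias = max(-1.0, min(1.0, Total_Charge / n_charged))
  else:
    bias = 0.0  # no charged residues requested, so the split is never used
  p_pos = (1 + bias) / 2
  p_neg = 1 - p_pos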

  if Count_Charged_As_Polar:
    # Percent_Charged = Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))
    # Percent_Polar = Percent_Polar - Percent_Polar * (len(CHARGED) / (len(POLAR)))
    # Percent_Charged = Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))
    # # Percent_Polar = Percent_Polar - Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))
    # Percent_Polar = 1 - Percent_Polar * (len(POLAR)) / (len(CHARGED))
    #original
    Percent_Charged = Percent_Charged + Percent_Polar * (len(CHARGED) / (len(POLAR)))
    Percent_Polar = Percent_Polar - Percent_Polar * (len(POLAR_CHECK) / (len(POLAR)))

  sequence = ""
  for i in range(length):
    r = random.random()
    if r < Percent_Polar:
      sequence += POLAR[random.randrange(0, len(POLAR))]
    elif r < Percent_Polar + Percent_Charged:
      r2 = random.random()
      if p_pos > r2:
        sequence += POSITIVE[random.randrange(0, len(POSITIVE))]
      else:
        sequence += NEGATIVE[random.randrange(0, len(NEGATIVE))]
    else:
      sequence += NONPOLAR[random.randrange(0, len(NONPOLAR))]
  return sequence

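def ConvergentMutation(length, sequence, Percent_Charged, Count_Charged_As_Polar, Total_Charge):
  # Placeholder: the mutation method is not implemented yet. This only swaps in a
  # fresh random sequence when the input has the wrong length, and mutates nothing.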
  if len(sequence) != length:
    sequence = RandomSequence(length)
  return sequence

def Shuffler(length, Percent_Charged, Percent_Polar, Total_Charge):
  sequence = []
  # sequence = ["X"] * length
  for i in range(math.ceil(length * Percent_Polar)):
    sequence.append(random.choice(POLAR))
  for i in range(math.ceil(length * Percent_Polar), math.ceil(length * Percent_Polar) + math.ceil(length * Percent_Charged) - NumCharged(sequence)):
    sequence.append(random.choice(CHARGED))

  disparity = Total_Charge - GetCharge(sequence)
  # print(GetCharge(sequence), Total_Charge, disparity)
  ind = 0
  # Flip charged residues until the net charge is within 1 of the target.
  # Each flip moves the net charge by 2; stop if the residues run out.
  while abs(disparity) > 1 and ind < len(sequence):
    if disparity > 0 and sequence[ind] in NEGATIVE:
      sequence[ind] = random.choice(POSITIVE)
      disparity -= 2
    elif disparity < 0 and sequence[ind] in POSITIVE:
      sequence[ind] = random.choice(NEGATIVE)
      disparity += 2
    ind += 1
  disparity = Total_Charge - GetCharge(sequence)
  while len(sequence) < length:
    sequence.append(random.choice(NONPOLAR))
  random.shuffle(sequence)
  return "".join(sequence)

def CleanPolar(sequence):
  disparity = NumPolar(sequence) - math.floor(len(sequence) * Percent_Polar)
  ind = 0
  s = list(sequence)
  while disparity > 0 and ind < len(s):
    if s[ind] in POLAR_CHECK:
      s[ind] = random.choice(NONPOLAR)
      disparity -= 1
    ind += 1
  return "".join(s)

# def Shuffler(length, Percent_Charged, Percent_Polar, Total_Charge):
#   sequence = ["X"] * length
#   for i in range(math.ceil(length * Percent_Polar)):
#     # print(PercentPolar(sequence) < Percent_Polar)
#     if PercentPolar(sequence) < Percent_Polar:
#       sequence[i] = random.choice(POLAR)
#     elif PercentCharged(sequence) < Percent_Charged:
#       sequence[i] = random.choice(CHARGED)
#     else:
#       sequence[i] = random.choice(NONPOLAR)
#   for i in range(len(sequence)):
#     if GetCharge(sequence) == Total_Charge:
#       break
#     elif GetCharge(sequence) > Total_Charge and sequence[i] in POSITIVE:
#       sequence[i] = random.choice(NEGATIVE)
#     elif GetCharge(sequence) < Total_Charge and sequence[i] in NEGATIVE:
#       sequence[i] = random.choice(POSITIVE)
#   random.shuffle(sequence)
#   return "".join(sequence)



def ConditionsMet(sequence, p_pol, p_char, t_char):
  #Polar condition
  # print(sequence)
  polar = PercentPolar(sequence) >= p_pol
  if p_pol < 0:
    polar = True
  #Charge condition
  charge = PercentCharged(sequence) >= p_char
  if p_char < 0:
    charge = True
  #Charge total condition
  charge_t = False
  if Disregard_Charge:
    charge_t = True
  else:
    charge_t = GetCharge(sequence) <= (t_char + Charge_Range) and GetCharge(sequence) >= (t_char - Charge_Range)
  # print(sequence)
  # print(polar, charge, charge_t)
  # print(PercentPolar(sequence))
  # print(PercentPolarC(sequence))
  # print(PercentCharged(sequence))
  # raise
  return polar and charge and charge_t
  

# Most recent attempt, kept at module level so the interrupt handler can report it
curr_attempt = ""

def sigint_handler(signum, frame):
  # On interrupt (SIGINT), report the most recent attempt before exiting
  global curr_attempt
  result = curr_attempt
  if result:
    print(f"\nPrevious attempt before kill: \n{result}\nPercent Polar: {PercentPolar(result)}\nPercent Charged: {PercentCharged(result)}\nTotal Charge: {GetCharge(result)}")
  sys.exit()

signal.signal(signal.SIGINT, sigint_handler)


def GenerateSequence(length, Count_Charged_As_Polar, Total_Charge):
  global curr_attempt
  curr_attempt = ""
  if Random_Selection_Method == "Random":
    curr_attempt = RandomSequence(length)
  elif Random_Selection_Method == "Gaussian":
    curr_attempt = GaussianSequence(length, Percent_Polar, Percent_Charged, Count_Charged_As_Polar, Total_Charge)
  elif Random_Selection_Method == "Mutation":
    curr_attempt = ConvergentMutation(length, curr_attempt, Percent_Charged, Count_Charged_As_Polar, Total_Charge)
  elif Random_Selection_Method == "Builder":
    curr_attempt = Shuffler(length, Percent_Charged, Percent_Polar, Total_Charge)
  # seq = ""
  # for i in range(length):
  #   if Random_Selection_Method == "Gaussian":
  #     r = random.random()
  #     if r < Percent_Polar:
  #       seq += POLAR[random.randrange(0, len(POLAR))]
  #     elif Percent_Polar < r and r < Percent_Polar + Percent_Charged:
  #       seq += CHARGED[random.randrange(0, len(CHARGED))]
  #     else:
  #       seq += NONPOLAR[random.randrange(0, len(NONPOLAR))]
  #   elif Random_Selection_Method == "Random":
  #     seq += AAs[random.randrange(0, len(AAs))]
  return curr_attempt

def BuildRandomProtein(length, p_pol, p_char, t_char, Count_Charged_As_Polar, Clean_Polar):
  # global curr_attempt
  curr_attempt = GenerateSequence(length, Count_Charged_As_Polar, t_char)
  # while PercentCharged(curr_attempt) < p_char or PercentPolar(curr_attempt) < p_pol or GetCharge(curr_attempt) != t_char:
  attempt = 1  # avoid shadowing the built-in iter()
  print(f"\rAttempt {attempt}", end="")
  while not ConditionsMet(curr_attempt, p_pol, p_char, t_char):
    attempt += 1
    curr_attempt = GenerateSequence(length, Count_Charged_As_Polar, t_char)
    print(f"\rAttempt {attempt}", end="")
  print("\n")
  if Clean_Polar:
    curr_attempt = CleanPolar(curr_attempt)
  return curr_attempt

current_best = GenerateSequence(Length, Count_Charged_As_Polar, Total_Charge)

result = BuildRandomProtein(Length, Percent_Polar, Percent_Charged, Total_Charge, Count_Charged_As_Polar, Clean_Polar)
print(f"Result: {result}\nPercent Polar: {PercentPolar(result)}\nPercent Charged: {PercentCharged(result)}\nPercent NonPolar: {PercentNonPolar(result)}\nTotal Charge: {GetCharge(result)}")


Attempt 1

Result: APLAPYWIWMAGYFIGWMFWPMIVYGWALF
Percent Polar: 0.0
Percent Charged: 0.0
Percent NonPolar: 1.0
Total Charge: 0
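
The statistics reported above can be re-derived independently of the notebook's helpers. The snippet below is a small check under stated assumptions: `fraction_in` is a hypothetical helper, and the group strings are copied from the settings cell without C or H, since the sequence contains neither.

```python
# Residue groups from the settings cell (C and H omitted; neither appears below).
POSITIVE, NEGATIVE = "RK", "ED"
POLAR, NONPOLAR = "STNQ", "AVILMFYWGP"


def fraction_in(seq, group):
    """Fraction of residues in seq that belong to the given group."""
    return sum(c in group for c in seq) / len(seq)


seq = "APLAPYWIWMAGYFIGWMFWPMIVYGWALF"
charge = sum((c in POSITIVE) - (c in NEGATIVE) for c in seq)

print(len(seq))                                # 30
print(fraction_in(seq, POLAR))                 # 0.0
print(fraction_in(seq, POSITIVE + NEGATIVE))   # 0.0
print(fraction_in(seq, NONPOLAR))              # 1.0
print(charge)                                  # 0
```

With `Percent_Polar` and `Percent_Charged` both set to 0, an all-nonpolar sequence like this one is what the Builder is expected to produce.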
