<font color="Maroon"><h1 style="font-size:2em; "> Protein Folding Exercises! </h1></font><br>
<font color="DarkSlateBlue"><p style="font-size:17px; "><b><u>The following code blocks cover:</u></b><br><br>
    1. How to introduce mutations by randomly changing the amino acid sequence of a protein<br>
    2. How to predict the 3D structure of a protein given its amino acid sequence<br>
    3. How to calculate a score that measures how closely two protein structures resemble each other</p></font>

<font color="DarkSlateBlue"><p style="font-size:16px; ">The next code block reads the amino acid (AA) sequence from a FASTA-formatted file into a string called `AA_string`, which is then converted to a list called `AA_list`.<br><br>
Then, mutations are introduced into the amino acid sequence. The `partially_randomize` function does this by swapping randomly chosen pairs of residues, with the number of swaps set by a user-specified percentage of the sequence length. After running this code, you should see a new file in the same directory as the file named by `file_path`, containing the randomized sequence.<br><br>
Both the original and mutated sequences are saved in FASTA format.<br><br>
This code block imports the `random` and `os` modules. Both are part of the Python standard library, so no `pip install` is needed.<br><br>
    <b>NOTE:</b> Be sure to change the value of `file_path` to match the location of your FASTA file.
    </p></font>

In [None]:
import random
import os

def partially_randomize(lst, percent):
    """Shuffle part of lst in place by swapping randomly chosen pairs of positions."""
    n = len(lst)
    num_swaps = int(n * percent / 100)  # Calculate the number of swaps based on the percentage
    for _ in range(num_swaps):
        # Randomly select two indices for swapping (they may coincide, which leaves the list unchanged)
        idx1, idx2 = random.randint(0, n-1), random.randint(0, n-1)
        lst[idx1], lst[idx2] = lst[idx2], lst[idx1]

file_path = r'D:/Python/Protein Structure/Mouse class II alcohol dehydrogenase complex with NADH/rcsb_pdb_1E3E.fasta'

# Open the file in read mode
with open(file_path, 'r') as fasta_file:
    header = fasta_file.readline()
    AA = fasta_file.readlines() 

AA_string = ''.join([line.strip() for line in AA])
AA_list = list(AA_string)

p = float(input("Enter the percentage to randomize the sequences: "))

lst_copy = AA_list.copy()
partially_randomize(lst_copy, p)
randomized_file_name = f"rcsb_pdb_1E3E_{p}pct_randomized.fasta"
file_dir = os.path.dirname(file_path)
randomized_file_path = os.path.join(file_dir, randomized_file_name)
with open(randomized_file_path, 'w') as rand_file:
    rand_file.write(header)
    rand_file.write(''.join(lst_copy) + '\n')  # End the sequence line with a newline
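
<font color="DarkSlateBlue"><p style="font-size:16px; ">Because `partially_randomize` swaps residues rather than substituting new ones, the mutated sequence keeps the same amino acid composition. The share of positions that actually change is at most twice the requested percentage, and it can be smaller, since a swap may pick the same index twice or two identical residues. The sketch below checks this on a short made-up sequence. The `fraction_changed` helper and `demo_sequence` are hypothetical names introduced for illustration; they are not part of the exercise code.
</p></font>

```python
import random

def partially_randomize(lst, percent):
    """Same swap-based shuffle as the cell above, repeated so this sketch runs on its own."""
    n = len(lst)
    num_swaps = int(n * percent / 100)
    for _ in range(num_swaps):
        idx1, idx2 = random.randint(0, n - 1), random.randint(0, n - 1)
        lst[idx1], lst[idx2] = lst[idx2], lst[idx1]

def fraction_changed(original, mutated):
    """Hypothetical helper: fraction of positions whose residue differs."""
    differing = sum(a != b for a, b in zip(original, mutated))
    return differing / len(original)

random.seed(0)  # Fixed seed so the run is repeatable

# Made-up example sequence, not taken from the 1E3E FASTA file
demo_sequence = list("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV")
demo_mutated = demo_sequence.copy()
partially_randomize(demo_mutated, 20)

# Swapping only reorders residues, so the composition is unchanged
print("Same composition:", sorted(demo_sequence) == sorted(demo_mutated))
print("Fraction of positions changed:", round(fraction_changed(demo_sequence, demo_mutated), 3))
```

<font color="DarkSlateBlue"><p style="font-size:16px; ">If the goal is to model point mutations that replace a residue with a different amino acid, a substitution-based function would be needed instead of a swap-based one.
</p></font>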

<font color="DarkSlateBlue"><p style="font-size:16px; ">The following code block predicts the 3D structure of a protein using ESMFold. The function `load_sequence_from_fasta` reads the amino acid sequence from a FASTA-formatted file, and the function `predict_3d_structure` sends that sequence to the ESMFold API. The predicted 3D structure is saved in PDB format to a file named `predicted.pdb` in the current working directory.<br><br>
 <b>NOTE:</b> FASTA files generated in the previous code block can be used as input for this code.<br>
 <b>NOTE:</b> Be sure to change the value of `file_path` to match the location of your FASTA file.<br><br>
 The following Python libraries are used in this code block: `requests` and Biopython (imported as `Bio`, which provides the `SeqIO` module). If they are not already installed, run `pip install requests biopython`.
  </p></font>

In [None]:
import requests
from Bio import SeqIO

def load_sequence_from_fasta(file_path):
    # Load the sequence from a '.fasta' file
    with open(file_path, 'r') as fasta_file:
        sequence_record = SeqIO.read(fasta_file, 'fasta')
    return str(sequence_record.seq)

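def predict_3d_structure(sequence):
    # Send the raw sequence as the request body to the ESMFold API
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    # The 300-second timeout is an arbitrary choice so the call cannot hang forever
    response = requests.post('https://api.esmatlas.com/foldSequence/v1/pdb/', headers=headers, data=sequence, timeout=300)
    response.raise_for_status()  # Stop if the server returned an HTTP error instead of a structure
    pdb_string = response.content.decode('utf-8')

    # Save the predicted structure in the current working directory
    with open('predicted.pdb', 'w') as f:
        f.write(pdb_string)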

file_path = r'D:/Python/Protein Structure/Mouse class II alcohol dehydrogenase complex with NADH/rcsb_pdb_1E3E.fasta'
        
# Load sequence from FASTA file
sequence = load_sequence_from_fasta(file_path)

# Call the function with the input sequence
predict_3d_structure(sequence)
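
<font color="DarkSlateBlue"><p style="font-size:16px; ">A request to a remote folding service can fail if the input contains characters other than residue codes. As a minimal sketch, the sequence could be checked before it is passed to `predict_3d_structure`. The `validate_sequence` helper and its `max_length` default of 400 are assumptions made for illustration; consult the ESMFold API documentation for the actual limits.
</p></font>

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # One-letter codes of the 20 standard amino acids

def validate_sequence(sequence, max_length=400):
    """Hypothetical pre-flight check; max_length is an assumed limit, not a documented one."""
    # Remove whitespace and line breaks, then normalize to upper case
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("The sequence is empty.")
    invalid = sorted(set(cleaned) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"Unexpected residue codes: {', '.join(invalid)}")
    if len(cleaned) > max_length:
        raise ValueError(f"Sequence has {len(cleaned)} residues; the assumed limit is {max_length}.")
    return cleaned

# Made-up fragment with stray whitespace and lower-case letters
print(validate_sequence("mkta yiak\nqrqi"))  # prints MKTAYIAKQRQI
```

<font color="DarkSlateBlue"><p style="font-size:16px; ">Raising a `ValueError` before any network call makes input problems easier to tell apart from server-side failures.
</p></font>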

<font color="DarkSlateBlue"><p style="font-size:16px; ">The following code block compares the 3D structures of two proteins. It reads in two protein structures in PDB format. If both structures contain the same number of atoms, it superimposes them and reports the root-mean-square deviation (RMSD); otherwise it prints a warning.<br><br>
 <b>NOTE:</b> Be sure to change the values of `file_path1` and `file_path2` to match the locations of your PDB files.<br>
 <b>NOTE:</b> PDB files generated in the previous block can be used as input for this code.<br><br>
 This code block uses the `PDBParser` and `Superimposer` classes from `Bio.PDB`, which is part of Biopython. If Biopython is not already installed, run `pip install biopython`.
  </p></font>

In [None]:
# This code will compare the 3D structure of two proteins. It first reads in two PDB files.
# Then, it aligns all the atoms in the protein and calculates the RMSD value. 

from Bio.PDB import PDBParser, Superimposer

# Read PDB files
parser = PDBParser()
file_path1 = r'D:/Python/Protein Structure/original.pdb'
file_path2 = r'D:/Python/Protein Structure/predicted.pdb'
structure1 = parser.get_structure("Original", file_path1)
structure2 = parser.get_structure("predicted", file_path2)

# Get all atoms from the protein structures
atoms1 = list(structure1.get_atoms())
atoms2 = list(structure2.get_atoms())

# Check if the number of atoms is the same in both proteins
if len(atoms1) != len(atoms2):
    print("The number of atoms in both structures is different. Structural alignment may not be meaningful.")
else:
    # Perform structural alignment
    sup = Superimposer()
    sup.set_atoms(atoms1, atoms2)

    # Get the RMSD (root mean squared deviation) value after alignment
    rmsd = sup.rms

    print("RMSD: ", rmsd)
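
<font color="DarkSlateBlue"><p style="font-size:16px; ">The value that `Superimposer` reports in `sup.rms` is the RMSD of the paired atoms after the second set has been fitted onto the first. The formula itself is simple enough to sketch in plain Python. The `rmsd` function and the coordinates below are hypothetical illustrations; unlike `Superimposer`, this sketch performs no rotation or translation and assumes the points are already paired and aligned.
</p></font>

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the points are already paired and superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("Both coordinate lists must have the same length.")
    if not coords_a:
        raise ValueError("At least one coordinate pair is required.")
    # Sum of squared distances between each pair of corresponding points
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Made-up coordinates: every point in the second set is shifted by 1 along z
points_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
points_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 2.0, 1.0)]
print("RMSD:", rmsd(points_a, points_b))  # each pair is 1 unit apart, so the RMSD is 1.0
```

<font color="DarkSlateBlue"><p style="font-size:16px; ">An RMSD of 0 means the paired points coincide exactly; larger values indicate greater structural differences.
</p></font>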