**Name** : Emmanuel Ogbu

**ID Number** : 24034724

**Title** : Assessing sequence variation using Smith-Waterman local sequence alignment in BRCA1 variants.

**Introduction**

As a bioinformatician, it is important to be able to produce a sequence alignment and evaluate the number and types of variations in order to determine their biological impact, especially at the protein level. Assessing sequence variation is central to detecting and characterising genetic changes between sequences.

Assessing sequence variation allows bioinformaticians to see the changes that occur at the DNA level and how these changes affect the amino acids at the protein level. It reveals the types of mutation that occur, such as insertions, deletions and substitutions, that may affect gene or protein function. It also helps bioinformaticians understand genetic diversity, evolutionary history and disease susceptibility. An example is sequence variation in genes like BRCA1, where specific mutations are linked to hereditary breast and ovarian cancers.

The aim is to identify the functional consequences of these mutations at the protein level (Kumar et al. 2020). BRCA1, a nuclear phosphoprotein, plays an important role in maintaining genome integrity by functioning as a tumour suppressor involved in DNA repair (Kumar et al. 2020). Mutations in BRCA1 can impair DNA repair mechanisms and increase the risk of cancer (Kumar et al. 2020).


To solve this problem, a computational method was used that relies on Biopython to produce sequence alignments and assess the number and type of variants. Using local alignment, the method detects and categorises protein-level mutations in BRCA1 across two sample datasets, showing which type of mutation has occurred in the gene and how it affects the amino acid sequence, which could contribute to more serious outcomes such as cancer.

To obtain the sequence variation, we applied local alignment using the Smith-Waterman algorithm. This technique finds the best-matching regions between biological sequences, here between the reference BRCA1 mRNA and each isolate (Oliveira et al. 2022). It aligns only the similar segments and handles indels (insertions and deletions) through gap penalties. Once alignment has been done, we use Biopython to translate the sequences up to the first stop codon, converting the BRCA1 mRNA and partial gene isolates into protein (amino acid) sequences. We then compare the protein sequences to identify mutation types such as missense, synonymous and nonsense changes.

Other techniques are available for aligning sequences. The Needleman-Wunsch algorithm performs global alignment: it aligns full-length sequences from end to end, locating differences across the whole gene. It is best suited to sequences of similar length; with short or partial sequences that do not span the full mRNA reference, it can lead to misalignment. BLAST (Basic Local Alignment Search Tool) is another local alignment tool, used for sequence similarity searching (Zaru et al. 2023). It is faster and scales better for comparisons against large databases. However, BLAST is a heuristic and is less precise than Smith-Waterman, which is well suited to fine-resolution comparisons between closely related sequences such as the mRNA isolates and partial gene isolates. For this reason, the Smith-Waterman algorithm was chosen.

The goal of assessing sequence variation here is to use the Smith-Waterman algorithm to detect mutation types, to see whether the variants are synonymous or non-synonymous, and to determine whether these changes lead to functionally significant outcomes such as missense, nonsense or readthrough mutations, which may affect BRCA1 protein structure and activity.

**Logical Steps For Assessing Sequence Variation**

1. Data Processing
-  Load the reference mRNA sequences into a dictionary using Biopython SeqIO
-  Load the isolate gene sequences into a dictionary using Biopython SeqIO
-  Set up the Biopython PairwiseAligner for aligning the reference mRNA sequence and the isolate gene sequences
-  Define the Smith-Waterman local alignment scoring parameters.

2. Aligning the reference mRNA sequence and isolate gene sequences for DNA variant detection
-  Choose a reference mRNA sequence against which the nucleotides of all isolate genes are compared
-  Create a function that compares aligned reference and isolate sequences to detect SNPs, insertions and deletions
-  Align each isolate sequence to the reference mRNA using the Smith-Waterman algorithm, extract the aligned sequences as strings, then call the detect_mutations function on the aligned sequences.

3. Translate DNA Sequences into Amino Acid (Protein) Sequences and Assess Functional Impact
-  Create a second function that cleans the nucleotide sequence by removing gaps and converting it to uppercase.
-  Translate the cleaned nucleotide sequence into an amino acid (protein) sequence up to the first stop codon and return the protein as a string
-  Create a third function that compares the reference and isolate proteins position by position to detect amino acid changes, which are stored in a list.
-  Run translation and protein-level comparison: align each isolate to the reference with the Smith-Waterman algorithm, then translate the reference and isolate sequences into proteins
-  Using the third function, compare the reference protein to each isolate protein to identify mutation types such as nonsense, readthrough and missense, and report how many amino acid changes were found for each isolate.

**Method**

To solve this bioinformatics problem, two multi-FASTA files of BRCA1 variants were provided. The first step was to load the files. This was done by importing SeqIO for reading FASTA files, Seq for sequence manipulation and translation, and the Align module for sequence alignment from the Biopython library. Two dictionaries were created to store the IDs and sequences of the mRNA isolates and the partial gene isolates. Using the variables mRNA_file and gene_isolate_file, the mRNA isolates and partial gene isolates were loaded from the file directory with SeqIO.parse(). The total number of mRNA and gene isolates was then recorded. Finally, the dictionaries were printed to show the ID and sequence of each mRNA and gene isolate.

In [19]:
from Bio import SeqIO
from Bio import Align
from Bio.Seq import Seq

mRNA_sequences = {}  # maps each mRNA isolate ID to its sequence
gene_isolate_sequences = {}  # maps each partial gene isolate ID to its sequence


# Step 1 - Load both multi-FASTA files

# Load the mRNA isolate file
mRNA_file = "C:/Users/emman/OneDrive/Desktop/Alignment_problem_1/MB_problem/BRCA1_mRNA_isolate_D.fasta"
for record in SeqIO.parse(mRNA_file, "fasta"):  # iterate over the FASTA records
    mRNA_sequences[record.id] = record.seq  # store the sequence under its record ID

# Load the partial gene isolate file
gene_isolate_file = "C:/Users/emman/OneDrive/Desktop/Alignment_problem_1/MB_problem/BRCA1_partial_gene_isolates.fasta"
for record in SeqIO.parse(gene_isolate_file, "fasta"):  # iterate over the FASTA records
    gene_isolate_sequences[record.id] = record.seq  # store the sequence under its record ID

# Print the ID and sequence of every mRNA isolate
for mRNA_isolate_id, mRNA_iso_seq in mRNA_sequences.items():
    print("\nmRNA Isolates:")
    print(f"{mRNA_isolate_id}:\n{mRNA_iso_seq}\n")

# Print the ID and sequence of every partial gene isolate
for isolate_id, iso_seq in gene_isolate_sequences.items():
    print("\nPartial Gene Isolates:")
    print(f"{isolate_id}:\n{iso_seq}\n")


mRNA Isolates:
lcl|MF590182.1_cds_AWA45334.1_1:
AGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCA


mRNA Isolates:
lcl|MF590181.1_cds_AWA45333.1_1:
AGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAACCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCA


mRNA Isolates:
lcl|MF590180.1_cds_AWA45332.1_1:
GAGAGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGGAGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCA


mRNA Isolates:
lcl|MF590179.1_cds_AWA45331.1_1:
AGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATACTCATGCCAGCTCA

After parsing the files, a reference sequence was selected from the mRNA isolates. It serves as the standard for alignment and mutation comparison. The first sequence ID was obtained from the mRNA_sequences dictionary by taking index [0] of its list of keys, and a second variable stores the corresponding nucleotide sequence.

In [20]:
# Step 2 - Choose a reference mRNA sequence
# Pick the first sequence in the BRCA1 mRNA file as the reference
ref_id = list(mRNA_sequences.keys())[0]  # ID of the first sequence
ref_seq = mRNA_sequences[ref_id]  # sequence stored under that ID
print(f"Using '{ref_id}' as reference for alignment.\n")  # report the chosen reference


Using 'lcl|MF590182.1_cds_AWA45334.1_1' as reference for alignment.



Once the reference mRNA ID and sequence were obtained, the Smith-Waterman local alignment scoring parameters were set up using PairwiseAligner from the Biopython library. With the aligner mode set to 'local', the scoring parameters were chosen to balance the reward for matches against the penalties for mismatches and gaps. This allows local sequence similarity and variation between the reference and isolate sequences to be detected.

In [23]:
# Step 3 - Configure the pairwise aligner for Smith-Waterman local alignment
aligner = Align.PairwiseAligner()  # initialise the pairwise alignment object

aligner.mode = 'local'  # local alignment (Smith-Waterman)

aligner.match_score = 2  # reward for matching bases
aligner.mismatch_score = -1  # penalty for mismatched bases
aligner.open_gap_score = -2  # penalty for opening a gap
aligner.extend_gap_score = -0.5  # penalty for extending a gap
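
To make the role of these scores concrete, the following is a minimal pure-Python sketch of the Smith-Waterman scoring recurrence. It is not Biopython's implementation: the function name `smith_waterman_score` is hypothetical, and it assumes a single linear gap penalty of -2 instead of the separate open and extend penalties configured above. It returns only the best local score and where that alignment ends.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, (end_i, end_j)) for a local alignment of a and b.

    Simplifying assumption: one linear gap penalty, not open/extend penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets an alignment restart anywhere: this is what makes it local
            H[i][j] = max(0, diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


# First 8 bases of the reference against an isolate-like sequence with 3 extra leading bases
score, end = smith_waterman_score("AGGCATCC", "GAGAGGCATCC")
print(score, end)
```

Here the eight reference bases match the isolate exactly from its fourth base onwards, so the score is 8 matches at 2 points each, and the extra leading bases are simply left out of the local alignment.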

Then the first function was defined. It detects mutations from the aligned reference and isolate sequences. Inside the function, an empty list is created to store mutation information and a counter tracks the position in the reference mRNA. The function loops through each aligned pair of bases, tracks the position in the reference and identifies SNPs, insertions and deletions. Each mutation is stored as a dictionary holding the isolate ID, mutation type, position in the reference, reference base and isolate base; these dictionaries are appended to the mutations list, which the function returns. Positions are counted from the first aligned reference base, so they match reference coordinates only when the local alignment starts at the beginning of the reference.

In [24]:
#Step 4 - Define function to detect mutations from aligned sequences 
def detect_mutations(ref_aligned, isolate_aligned, isolate_id):
    """
    Compare aligned reference and isolate sequences to detect:
    1) SNPs (Single Nucleotide Polymorphisms)
    2) Insertions (bases in isolate but not in reference)
    3) Deletions (bases in reference but not in isolate)

    returns - mutations (list): List of dictionaries containing mutation information
    """

    mutations = []  # collects one dictionary per detected mutation
    reference_position = 0  # 1-based position in the aligned reference (gap columns are not counted)

    for i, (r, q) in enumerate(zip(ref_aligned, isolate_aligned)):
        if r != '-':  # the reference has a base in this column
            reference_position += 1  # advance only when the reference is not a gap
        if r == q:
            continue  # identical characters: no mutation in this column
        # Determine the mutation type
        if r != '-' and q != '-':  # both sequences have a base, but the bases differ
            mutation_type = "SNP"
        elif r == '-' and q != '-':  # gap in the reference, base in the isolate
            mutation_type = "Insertion"
        elif r != '-' and q == '-':  # base in the reference, gap in the isolate
            mutation_type = "Deletion"
        else:
            continue  # defensive: a gap in both sequences is already skipped by the r == q check

        #We will record the mutation information
        mutations.append({
            "Isolate": isolate_id,
            "Type": mutation_type,
            "Position_in_ref": reference_position,
            "Ref_base": r,
            "Iso_base": q
        })

    return mutations
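
As a small worked example of the column-by-column logic, the sketch below applies the same rules to a pair of made-up aligned strings. The function `classify_columns` is a hypothetical, condensed restatement of detect_mutations that returns tuples instead of dictionaries, and the toy sequences are not taken from the BRCA1 data.

```python
def classify_columns(ref_aligned, iso_aligned):
    """Return (type, reference_position, ref_base, iso_base) for each differing column."""
    events = []
    ref_pos = 0  # 1-based count of reference bases seen so far
    for r, q in zip(ref_aligned, iso_aligned):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # no change in this column
        if r == "-":
            kind = "Insertion"  # base present only in the isolate
        elif q == "-":
            kind = "Deletion"  # base present only in the reference
        else:
            kind = "SNP"  # both have a base, but they differ
        events.append((kind, ref_pos, r, q))
    return events


toy_ref = "ACG-TAC"  # hypothetical aligned reference
toy_iso = "ATGGT-C"  # hypothetical aligned isolate
for event in classify_columns(toy_ref, toy_iso):
    print(event)
```

Note that an insertion is reported at the position of the last reference base before it, because the reference counter does not advance on gap columns.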


Once the first function is defined, an empty list is created to store all mutations, and the mRNA and gene isolate dictionaries are combined into a single dictionary. Using the Biopython aligner, each sequence was aligned to the reference BRCA1 mRNA with Smith-Waterman local alignment. The aligned reference and isolate sequences were then extracted as the strings aligned_reference_str and aligned_isolate_str by parsing the formatted alignment and keeping the lines that start with 'target' or 'query'. These strings were passed to the detect_mutations function, which detects SNPs, insertions and deletions, tracks mutation positions and records the base change between reference and isolate. The mutations returned for each isolate were added to the list that collects all mutations across all isolates. This gives the number of mutations found and a summary of each mutation's type, position, reference base and isolate base, completing the DNA-level variant detection.

In [25]:
# Step 5 - Align all sequences against the reference to detect mutations

mutation_list = []  # collects the mutations found across all isolates

all_isolates = {**mRNA_sequences, **gene_isolate_sequences}  # combine the BRCA1 mRNA and partial gene isolate dictionaries

for isolates_id, iso_seq in all_isolates.items():  # align each isolate to the reference sequence
    alignment = aligner.align(ref_seq, iso_seq)[0]  # index 0 is the first of the best-scoring local alignments
    alignment_lines = alignment.format().split('\n')  # formatted alignment, one list item per line
    print(alignment)
    aligned_reference_str = ""  # aligned reference mRNA sequence, built up fragment by fragment
    aligned_isolate_str = ""  # aligned isolate sequence, built up fragment by fragment

    # Assumes each sequence line is "label, start coordinate, fragment" with no
    # trailing end coordinate, as in the printed output below
    for line in alignment_lines:
        if line.startswith("target") or line.startswith("query"):  # keep only the sequence lines
            parts = line.split(maxsplit=2)  # label, start coordinate, sequence fragment
            if len(parts) == 3:  # skip trailing lines that carry only a coordinate
                seq_fragment = parts[2]  # the third field is the sequence fragment
                if line.startswith("target"):
                    aligned_reference_str += seq_fragment  # append to the aligned reference
                elif line.startswith("query"):
                    aligned_isolate_str += seq_fragment  # append to the aligned isolate

    # Detect mutations between the aligned reference and the aligned isolate
    mutations = detect_mutations(aligned_reference_str, aligned_isolate_str, isolates_id)

    mutation_list.extend(mutations)  # add this isolate's mutations to the overall list

    #print(f"Aligned {isolates_id} - {len(mutations)} mutations found.")

    #for mutation in mutations:#so loop through each mutation founds 
        #print(
        #f"{mutation['Isolate']} | {mutation['Type']} at position {mutation['Position_in_ref']} "
        #f"| Ref: {mutation['Ref_base']} -> Iso: {mutation['Iso_base']}")#Print each mutation found for this isolate and there type of mutation, the position and the ref base and iso base that have mutated
        





target            0 AGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGC
                  0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query             0 AGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGC

target           60 ACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGAC
                 60 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query            60 ACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGAC

target          120 AGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGG
                120 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           120 AGAATGAATGTAGAAAAGGCTGAATTCTGTAATAAAAGCAAACAGCCTGGCTTAGCAAGG

target          180 AGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCA
                180 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           180 AGCCAACATAACAGATGGGCTGGAAGTAAGGAAACATGTAATGATAGGCGGACTCCCA

target          238
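
Once mutation_list has been filled, the per-isolate totals can be tallied by mutation type. The sketch below shows one possible way to do this with the standard library. The helper `summarise_mutations` is hypothetical, and the example records are invented placeholders that only mirror the dictionary keys used by detect_mutations; they are not results from the BRCA1 data.

```python
from collections import Counter


def summarise_mutations(mutation_list):
    """Group mutation dictionaries by isolate and count each mutation type."""
    summary = {}
    for mutation in mutation_list:
        counts = summary.setdefault(mutation["Isolate"], Counter())
        counts[mutation["Type"]] += 1
    return summary


# Invented placeholder records, using the same keys as detect_mutations
example_mutations = [
    {"Isolate": "isolate_A", "Type": "SNP", "Position_in_ref": 10, "Ref_base": "G", "Iso_base": "C"},
    {"Isolate": "isolate_B", "Type": "Deletion", "Position_in_ref": 4, "Ref_base": "A", "Iso_base": "-"},
    {"Isolate": "isolate_B", "Type": "SNP", "Position_in_ref": 7, "Ref_base": "T", "Iso_base": "A"},
    {"Isolate": "isolate_B", "Type": "SNP", "Position_in_ref": 9, "Ref_base": "C", "Iso_base": "G"},
]

for isolate, counts in summarise_mutations(example_mutations).items():
    print(isolate, dict(counts))
```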
          

Once DNA-level variant detection is done, the next step is to translate the nucleotide sequences into amino acid (protein) sequences and assess the functional impact of the DNA-level mutations. Two new functions were created. The first cleans a sequence by removing gaps and translates it into amino acids using Seq from the Biopython library; translation uses Seq.translate(to_stop=True), which stops at the first stop codon. The second compares the reference and isolate amino acid sequences to identify functional mutations at the protein level. The type of mutation identified at the protein level indicates how likely it is to affect protein function. A synonymous mutation changes the DNA but not the amino acid, so the protein sequence is unchanged. A missense mutation, in which a base substitution replaces one amino acid with another, can alter protein function; a well-known example is the single A-to-T substitution in the beta-globin gene that causes sickle cell disease.

In [21]:
# Step 7 - Translate DNA to amino acids and compare protein sequences
# Function to remove gaps and translate a nucleotide sequence into an amino acid sequence
def cleaned_translation(seq_str):
    """
    Clean a nucleotide sequence by removing gaps and translate it to amino acids.
    """
    cleaned = seq_str.replace('-', '').upper()  # remove gaps and force uppercase
    try:
        protein = Seq(cleaned).translate(to_stop=True)  # translate up to (not including) the first stop codon
        return str(protein)  # return the protein as a string
    except Exception as e:  # raised if the sequence cannot be translated
        print(f"Translation error: {e}")
        return ""  # return an empty string if an error occurs

# Function to compare amino acid sequences and identify functional mutations
def compare_proteins(ref_amino_acid, isolate_amino_acid, isolates_id):
    """
    Compare two amino acid sequences position by position and identify:
    1) Synonymous (same amino acid, so no change is recorded)
    2) Missense (one amino acid is substituted for another)
    3) Nonsense (a stop codon (*) appears in the isolate but not in the reference)
    4) Readthrough (the reference has a stop codon (*) where the isolate has an amino acid)

    Returns:
    List of mutation dictionaries, each containing
    a) Isolate ID
    b) Amino acid position
    c) Reference amino acid
    d) Isolate amino acid
    e) Type of mutation
    """

    protein_change = []  # collects one dictionary per amino acid change

    # Pair the residues position by position (1-based); zip stops at the shorter sequence
    for i, (r_amino_acid, q_amino_acid) in enumerate(zip(ref_amino_acid, isolate_amino_acid), start=1):
        if r_amino_acid == q_amino_acid:
            continue  # same amino acid: nothing to record
        elif q_amino_acid == '*':  # only reachable if the input keeps stop codons as '*'
            mutation_type = "Nonsense (Stop Codon)"  # premature stop codon in the isolate
        elif r_amino_acid == '*':  # only reachable if the input keeps stop codons as '*'
            mutation_type = "Readthrough"  # stop codon lost in the isolate
        else:
            mutation_type = "Missense"  # one amino acid replaced by another

        # Store the mutation information in a dictionary and add it to the list
        protein_change.append({
            "Isolate": isolates_id,
            "Amino_Acid_Position": i,
            "Ref_amino_acid": r_amino_acid,
            "Iso_amino_acid": q_amino_acid,
            "Type": mutation_type
        })

    return protein_change
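
One detail worth checking is how translating with to_stop=True interacts with the '*' checks in compare_proteins. The sketch below uses a small pure-Python translator as a stand-in for Seq.translate (the names `translate` and `differences` are hypothetical, and the codon table is the standard genetic code). The reference is the first 18 bases of the reference sequence printed earlier; the second sequence is an invented variant in which the third codon is changed to a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, to_stop=True):
    """Translate complete codons from frame 0; optionally stop before the first stop codon."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


def differences(ref_protein, iso_protein):
    """List (position, ref_residue, iso_residue) where the paired residues differ."""
    return [(i, r, q) for i, (r, q) in enumerate(zip(ref_protein, iso_protein), start=1) if r != q]


reference = "AGGCATCCAGAAAAGTAT"  # first 18 bases of the reference sequence
premature_stop = "AGGCATTAAGAAAAGTAT"  # invented variant: third codon CCA changed to TAA

print(translate(reference))       # RHPEKY
print(translate(premature_stop))  # RH
print(differences(translate(reference), translate(premature_stop)))  # []
print(differences(translate(reference, to_stop=False),
                  translate(premature_stop, to_stop=False)))  # [(3, 'P', '*')]
```

Under these assumptions, a premature stop shows up only as a shorter protein when translation stops at the first stop codon, and a position-by-position comparison then reports no change. Keeping the stop symbol in the translated sequence, or comparing protein lengths, would be one way to make nonsense and readthrough changes visible.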

Once the cleaned_translation and compare_proteins functions are defined, each isolate is looped over and aligned to the reference with the Biopython pairwise aligner using Smith-Waterman local alignment. The reference and isolate sequences are taken from the alignment object and cleaned by removing gaps, newlines and spaces. They are then translated into amino acid sequences with the cleaned_translation function, and the amino acid sequences are compared with the compare_proteins function, whose results are stored in a list. The output shows, for each isolate, the number of amino acid changes relative to the reference protein.

In [29]:
# Step 8 - Run translation and protein-level comparison

protein_mutations = []  # collects the protein-level changes across all isolates

for isolates_id, iso_seq in all_isolates.items():  # align each isolate to the reference sequence
    alignment = aligner.align(ref_seq, iso_seq)[0]  # first of the best-scoring Smith-Waterman local alignments

    # Take the reference and isolate sequences from the alignment object
    # (alignment.target and alignment.query hold the original, ungapped input sequences)
    aligned_reference_str = alignment.target  # reference sequence
    aligned_isolate_str = alignment.query  # isolate sequence

    # Clean sequences: remove gaps, newlines and spaces
    cleaned_amino_acid_reference = aligned_reference_str.replace("-", "").replace("\n", "").replace(" ", "")
    cleaned_amino_acid_isolate = aligned_isolate_str.replace("-", "").replace("\n", "").replace(" ", "")
    
    # Translate both the reference and isolate sequences to amino acids
    reference_protein = cleaned_translation(cleaned_amino_acid_reference)
    isolate_protein = cleaned_translation(cleaned_amino_acid_isolate)

    #print(reference_protein)
    #print(isolate_protein)

    # Compare the reference and isolate proteins (e.g. missense, nonsense and readthrough changes)
    protein_differences = compare_proteins(reference_protein, isolate_protein, isolates_id)

    # Add this isolate's protein changes to the overall list
    protein_mutations.extend(protein_differences)

    print(f"Compared Protein for {isolates_id} - {len(protein_differences)} amino acid changes.")  # report the number of amino acid changes for this isolate


Compared Protein for lcl|MF590182.1_cds_AWA45334.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590181.1_cds_AWA45333.1_1 - 1 amino acid changes.
Compared Protein for lcl|MF590180.1_cds_AWA45332.1_1 - 73 amino acid changes.
Compared Protein for lcl|MF590179.1_cds_AWA45331.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590178.1_cds_AWA45330.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590177.1_cds_AWA45329.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590176.1_cds_AWA45328.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590175.1_cds_AWA45327.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF590174.1_cds_AWA45326.1_1 - 73 amino acid changes.
Compared Protein for lcl|MF590173.1_cds_AWA45325.1_1 - 0 amino acid changes.
Compared Protein for lcl|MF945608.1_cds_AXG51130.1_1 - 75 amino acid changes.
Compared Protein for lcl|MF945607.1_cds_AXG51129.1_1 - 75 amino acid changes.
Compared Protein for lcl|MF945606.1_cds_AXG51128.1_1 - 74 amino acid cha

**Final Output of Results**

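After aligning the nucleotide sequences using local alignment, identifying nucleotide-level variants and then protein-level changes, the results show that every protein-level change detected was classified as missense. Several isolates showed no amino acid changes, and isolate MF590181 showed a single missense change (S to T at position 53), which points to a DNA base substitution that changes the encoded amino acid. Isolates reporting 73 to 75 changes should be interpreted with caution: for MF590180, each isolate residue matches the reference residue one position earlier (R, H, P, E, K in the reference against E, R, H, P, E in the isolate), which is consistent with its three extra leading bases (GAG) shifting the protein by one residue rather than with dozens of independent substitutions.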

In [103]:
print("\n=== Summary of All Protein-Level Mutations ===")
for mutation in protein_mutations:
    print(mutation)



=== Summary of All Protein-Level Mutations ===
{'Isolate': 'lcl|MF590181.1_cds_AWA45333.1_1', 'Amino_Acid_Position': 53, 'Ref_amino_acid': 'S', 'Iso_amino_acid': 'T', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 1, 'Ref_amino_acid': 'R', 'Iso_amino_acid': 'E', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 2, 'Ref_amino_acid': 'H', 'Iso_amino_acid': 'R', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 3, 'Ref_amino_acid': 'P', 'Iso_amino_acid': 'H', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 4, 'Ref_amino_acid': 'E', 'Iso_amino_acid': 'P', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 5, 'Ref_amino_acid': 'K', 'Iso_amino_acid': 'E', 'Type': 'Missense'}
{'Isolate': 'lcl|MF590180.1_cds_AWA45332.1_1', 'Amino_Acid_Position': 6, 'Ref_amino_acid': 'Y', 'Iso_amino_acid': 'K'
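
The pattern in the MF590180 rows above (R to E, H to R, P to H, and so on) can be checked with a short sketch. It takes the first six reference and isolate residues from that output and asks whether dropping a few leading isolate residues removes the mismatches. The helper names `count_mismatches` and `best_shift` are hypothetical, and this is only a quick diagnostic, not a replacement for aligning the protein sequences.

```python
def count_mismatches(ref_protein, iso_protein):
    """Count paired positions whose residues differ (zip stops at the shorter input)."""
    return sum(1 for r, q in zip(ref_protein, iso_protein) if r != q)


def best_shift(ref_protein, iso_protein, max_shift=3):
    """Drop 0..max_shift leading isolate residues; return (shift, mismatches) with the fewest mismatches."""
    candidates = [
        (count_mismatches(ref_protein, iso_protein[shift:]), shift)
        for shift in range(max_shift + 1)
    ]
    mismatches, shift = min(candidates)
    return shift, mismatches


ref_start = "RHPEKY"  # reference residues 1-6 from the output above
iso_start = "ERHPEK"  # MF590180 residues 1-6 from the output above

print(count_mismatches(ref_start, iso_start))  # 6
print(best_shift(ref_start, iso_start))        # (1, 0)
```

A shift of one residue removes every mismatch in this stretch, which suggests the long run of missense calls reflects an offset between the two proteins rather than true substitutions.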

**Discussion**

To tackle these biological questions, we used mRNA sequences, which reflect the expressed gene and therefore show coding-region mutations. We then translated the sequences into protein sequences to determine whether the nucleotide-level changes led to synonymous mutations, where the amino acid is unchanged; missense mutations, where one amino acid is substituted for another; nonsense mutations, where a stop codon appears in the isolate but not in the reference; or readthrough mutations, where a stop codon present in the reference is lost in the isolate. Identifying these changes allows their potential functional impact on the BRCA1 protein, and so their possible relevance to cancer, to be assessed.

**Conclusion**

**References**

* Kumar, P.S., Srikanth, L., Reddy, K.S. and Sarma, P.V.G.K. (2020) Novel mutations in the RING-finger domain of BRCA1 gene in clinically diagnosed breast cancer patients. 3 Biotech 10 (2), 47.
* Oliveira, F.F. de, Dias, L.A. and Fernandes, M.A.C. (2022) Proposal of Smith-Waterman algorithm on FPGA to accelerate the forward and backtracking steps. PLoS ONE 17 (6), e0254736.
* Zaru, R., Orchard, S. and UniProt Consortium (2023) UniProt Tools: BLAST, Align, Peptide Search, and ID Mapping. Current Protocols 3 (3), e697.