# PROJECT 4: TRANSLATION OF DNA TO PROTEIN

Let's work on the next project: translating DNA sequences into their corresponding protein sequences. This is an essential task in bioinformatics because proteins carry out most of the functions in a cell, and a protein's sequence can give insights into what it does.

**Steps to Perform DNA to Protein Translation**
1. Define a Function to Translate DNA to Protein
2. Parse the FASTA File and Use the Function
3. Print the Results

In [1]:
from Bio.Seq import Seq

In [2]:
from Bio import SeqIO

### Function to Translate DNA to Protein

**Seq(dna_sequence):** Converts the string of DNA sequence into a Biopython Seq object

**dna_seq.translate():** Translates the DNA sequence into a protein sequence, using the standard genetic code unless a different codon table is specified

In [3]:
# Function to translate DNA sequence to protein
def translate_dna_to_protein(dna_sequence):
    dna_seq = Seq(dna_sequence)
    return dna_seq.translate()
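
To make the translation step concrete, here is a minimal pure-Python sketch of what translating with the standard genetic code amounts to: split the DNA into codons and look each one up in a table. The names `STANDARD_TABLE` and `translate_standard` are hypothetical and not part of Biopython. This is only an illustration, not how `translate()` is implemented; it ignores ambiguous bases and alternative codon tables.

```python
from itertools import product

# Standard genetic code (NCBI table 1), listed in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_standard(dna):
    """Translate frame 0 of an unambiguous DNA string (hypothetical helper)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    # A codon containing anything other than A, C, G, T raises KeyError
    return "".join(STANDARD_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(translate_standard("ATGGCCTAA"))  # MA*
```

For real work, `Seq.translate()` remains the better choice because it also handles ambiguous bases and other codon tables.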

### Parse the FASTA File and Use the Function

In [4]:
# Function to parse a FASTA file and translate DNA sequences to protein
def parse_fasta_translate(file_path):
    for record in SeqIO.parse(file_path, "fasta"):
        protein_seq = translate_dna_to_protein(record.seq)
        print(f"ID: {record.id}")
        print(f"Protein Sequence: {protein_seq}\n")
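
For intuition about what the parsing step does, the following is a rough standard-library sketch of reading FASTA records: a line starting with `>` opens a new record, and the lines after it are joined into one sequence. The function name `read_fasta` and the two demo records are made up for illustration; `SeqIO.parse` handles more cases and should be preferred in practice.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited word after ">"
            record_id = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)

# Made-up demo input, not taken from Example1.fasta
example = StringIO(">seq1 demo record\nATGGCC\nTAA\n>seq2\nGGGCCC\n")
records = list(read_fasta(example))
print(records)  # [('seq1', 'ATGGCCTAA'), ('seq2', 'GGGCCC')]
```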

In [5]:
fasta_file = "Example1.fasta"

In [7]:
if fasta_file:
    parse_fasta_translate(fasta_file)
else:
    print("No FASTA file specified.")

ID: sequence1
Protein Sequence: GPPNKIRRS

ID: sequence2
Protein Sequence: HLG*LLLQHF

ID: sequence3
Protein Sequence: VQVKALQLWVQVKSA*



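**NOTE**: The asterisk (*) in the translated protein sequence represents a stop codon. By default, `translate()` keeps translating past a stop codon, which is why an asterisk can appear in the middle of a sequence, as in sequence2.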
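
If only the peptide up to the first stop codon is wanted, the translated string can be cut at the first asterisk. The sketch below does this with plain string operations on the three outputs shown above; the helper names `truncate_at_stop` and `split_on_stops` are hypothetical. Biopython can also stop at translation time with `translate(to_stop=True)`.

```python
def truncate_at_stop(protein):
    """Return the part of a translated sequence before the first '*'."""
    return protein.split("*", 1)[0]

def split_on_stops(protein):
    """Return the non-empty peptide segments between stop codons."""
    return [segment for segment in protein.split("*") if segment]

# The three protein sequences printed above
for protein in ["GPPNKIRRS", "HLG*LLLQHF", "VQVKALQLWVQVKSA*"]:
    print(truncate_at_stop(protein), split_on_stops(protein))
```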

### Fetch the FASTA Sequence from the Database and Translate It to a Protein Sequence

In [9]:
from Bio import Entrez

In [19]:
from io import StringIO

In [20]:
# Set email for NCBI Entrez
Entrez.email = "k26sangeetha@gmail.com"  # Replace with your email

In [21]:
# Function to fetch FASTA data from NCBI
def fetch_fasta_from_ncbi(query, database="nucleotide"):
    handle = Entrez.esearch(db=database, term=query, retmax=1)
    record = Entrez.read(handle)
    handle.close()
    if record["IdList"]:
        seq_id = record["IdList"][0]
        handle = Entrez.efetch(db=database, id=seq_id, rettype="fasta", retmode="text")
        fasta_data = handle.read()
        handle.close()
        return fasta_data
    else:
        return None

In [22]:
# Free-text search; with retmax=1 only the top hit is fetched, which may not be a human sequence
query = "Homo sapiens COX1"
fasta_data = fetch_fasta_from_ncbi(query)
print(fasta_data)

>PP914118.1 Taenia solium isolate B cytochrome c oxidase subunit I (COX1) gene, partial cds; mitochondrial
TAGATTTTTTAATGTTTTCTTTACATTTAGCTGGTGTATCAAGTATTTTTAGTTCTATTAATTTTATATG
TACATTATATAGAGTTTTTATGACTAATATATTTTCTCGTACATCTATAGTGTTATGATCTTATTTATTT
ACATCTATCTTGTTATTGGTTACTTTACCTGTTTTGGCAGCCGCTGTTACTATGCTTCTATTTGATCGTA
AATTTAGTTCTGCGTTTTTTGATCCGTTAGGAGGTGGTGATCCTGTTTTATTTCAACATATGTTTTGATT
TTTTGGTCATCCTGAGGTTTATGTGTTAATTCTTCCGGGGTTTGGTATAATTAGTCATATATGTTTGAGT
ATAAGTATGTGTTCTGATGCTTTTGGCTTTTATGGGTTATTGTTTGCTATGTTTTCAATAGTATGTTTAG
GAAGAAGTGTATGAGGGCATCATATGTTTACGGTTGGGTTAGATGTTAAGACGGCTGTATTTTTTAGTTC
TGTTACTATGATAATTGGAGTGCCTACGGGGATTAAGGTTTTTACTTGGCTTTATATGCTTTTAAAATCT
CGTGTTAATAAGAGTGATCCGGTTTTATGATGAATAATTTCGTTTATAGTATTGTTTACATTTGGTGGTG
TAACTGGTATTATTCTATCTGCTTGTGTATTAGATAAAGTTCTTCATGATACTTGGTTTGTTGTTGCTCA
TTTTCATT
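
Note that the record returned above is from *Taenia solium*, not *Homo sapiens*: the query is matched as free text and only the top hit is fetched. One possible way to narrow the search is to use Entrez field tags such as `[Organism]` and `[Gene Name]`. The helper `build_entrez_query` below is hypothetical, and whether the resulting query returns the intended human record has not been verified here; the gene may also be indexed under other symbols, such as MT-CO1.

```python
def build_entrez_query(organism, gene):
    """Build a fielded Entrez search term (hypothetical helper)."""
    # Quote the organism so a multi-word name is treated as one phrase
    return f'"{organism}"[Organism] AND {gene}[Gene Name]'

fielded_query = build_entrez_query("Homo sapiens", "COX1")
print(fielded_query)  # "Homo sapiens"[Organism] AND COX1[Gene Name]
```

The resulting string can be passed as the `query` argument of `fetch_fasta_from_ncbi`.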




In [23]:
# Function to parse FASTA data held in a string and translate each record with translate_dna_to_protein
def parse_fasta_fromdb_translate(fasta_string):
    fasta_io = StringIO(fasta_string)
    for record in SeqIO.parse(fasta_io, "fasta"):
        protein_seq = translate_dna_to_protein(record.seq)
        print(f"ID: {record.id}")
        print(f"Protein Sequence: {protein_seq}\n")

In [24]:
# Example usage
if fasta_data:
    parse_fasta_fromdb_translate(fasta_data)
else:
    print("No data fetched.")

ID: PP914118.1
Protein Sequence: *IF*CFLYI*LVYQVFLVLLILYVHYIEFL*LIYFLVHL*CYDLIYLHLSCYWLLYLFWQPLLLCFYLIVNLVLRFLIR*EVVILFYFNICFDFLVILRFMC*FFRGLV*LVIYV*V*VCVLMLLAFMGYCLLCFQ*YV*EEVYEGIICLRLG*MLRRLYFLVLLL**LECLRGLRFLLGFICF*NLVLIRVIRFYDE*FRL*YCLHLVV*LVLFYLLVY*IKFFMILGLLLLIFI
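
The many asterisks in this output suggest the translation settings do not fit the record. Two likely contributors, stated here as assumptions rather than verified facts about this sequence: the record is a flatworm mitochondrial gene, which uses a different genetic code (NCBI table 9, where TGA encodes tryptophan instead of a stop), and a partial cds need not start at the first base of a codon. The sketch below translates all three forward reading frames with a chosen codon table; `translate_frames` is a hypothetical helper and the demo sequence is made up.

```python
from Bio.Seq import Seq

def translate_frames(dna_sequence, table=1):
    """Translate the three forward reading frames with the given NCBI table."""
    seq = Seq(str(dna_sequence))
    proteins = {}
    for offset in range(3):
        frame = seq[offset:]
        # Trim to a multiple of three to avoid a partial-codon warning
        frame = frame[: len(frame) - len(frame) % 3]
        proteins[offset] = str(frame.translate(table=table))
    return proteins

demo = "ATGTGAGCTTAA"  # made-up sequence containing a TGA codon in frame 0
standard = translate_frames(demo, table=1)       # standard code
flatworm_mito = translate_frames(demo, table=9)  # echinoderm and flatworm mitochondrial code
for offset in range(3):
    print(offset, standard[offset], flatworm_mito[offset])
```

Applied to the fetched record, the frame and table that give the fewest internal stop codons would be a reasonable candidate for the correct translation.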

