**Install Biopython**

**Biopython is an essential Python library for bioinformatics. It provides tools to parse common biological file formats, manipulate sequences, and interact with online databases like NCBI. Its goal is to simplify complex bioinformatics tasks and help you write clean, repeatable analysis scripts.**

In [None]:
!pip install Biopython

Collecting Biopython
  Downloading biopython-1.85-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading biopython-1.85-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Installing collected packages: Biopython
Successfully installed Biopython-1.85
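
One quick way to confirm the install succeeded is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of Biopython.

```python
import importlib.metadata


def installed_version(package="biopython"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# Prints the version string if Biopython is installed, otherwise None
print("Biopython version:", installed_version())
```

If this prints `None`, the package is not visible to the current kernel, and restarting the runtime after `pip install` is usually the first thing to try.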


A simple DNA_Sequence_Analyzer that fetches a sequence from the NCBI nucleotide database by accession ID and analyzes it.
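
Before involving NCBI, the statistics this analyzer reports can be sketched in plain Python on a short made-up sequence. The sequence and the helper names below are assumptions for illustration only; the Biopython cell that follows uses `Seq` methods and `GC123` rather than these hand-written functions.

```python
# Illustrative only: a short made-up DNA sequence, not a real NCBI record
example_seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_counts(seq):
    """Count each of the four unambiguous DNA bases."""
    return {base: seq.count(base) for base in "ATGC"}


def gc_percent(seq):
    """Percentage of G and C bases over the full sequence length."""
    if not seq:
        return 0.0
    counts = base_counts(seq)
    return 100.0 * (counts["G"] + counts["C"]) / len(seq)


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribe(seq):
    """Coding-strand DNA to mRNA: replace T with U."""
    return seq.replace("T", "U")


print("Length:", len(example_seq))
print(f"GC content: {gc_percent(example_seq):.2f}%")
print("Nucleotide counts:", base_counts(example_seq))
print("Reverse complement:", reverse_complement(example_seq))
print("RNA sequence:", transcribe(example_seq))
```

Working through a small case like this makes it easier to sanity-check the numbers Biopython reports for a real record, since GC content is just the G and C counts divided by the sequence length.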

In [None]:
from Bio.Seq import Seq
from Bio.SeqUtils import GC123
from Bio import Entrez
from Bio import SeqIO # Import SeqIO for FASTA parsing

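def analyze_sequence_from_entrez(entrez_id):
    """Fetch a nucleotide record from NCBI by accession ID and print basic sequence statistics."""
    # NCBI asks for a contact email with Entrez requests; use your own address here
    Entrez.email = "azizisra2001@gmail.com"
    try:
        handle = Entrez.efetch(db="nucleotide", id=entrez_id, rettype="fasta", retmode="text")

        # SeqIO parses the FASTA text directly into a SeqRecord
        record = SeqIO.read(handle, "fasta")
        handle.close()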

        if not record.seq:
            print(f"Error: Could not retrieve sequence for Entrez ID {entrez_id}")
            return

        dna_seq = record.seq.upper()

        length = len(dna_seq)
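        # GC123 returns GC percentages as (total, 1st, 2nd, 3rd codon position); [0] is the total
        gc_content = GC123(dna_seq)[0]
        # Assumes only A, T, G and C are present (no ambiguous bases such as N)
        at_content = 100 - gc_content
        counts = {b: dna_seq.count(b) for b in 'ATGC'}
        rev_comp = dna_seq.reverse_complement()
        rna_seq = dna_seq.transcribe()
        # Translates from the first base and stops at the first stop codon;
        # for an mRNA record with a 5' UTR this may not match the annotated coding sequence
        protein_seq = rna_seq.translate(to_stop=True)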


        print(f"Sequence details for Entrez ID: {entrez_id}")
        print("Length:", length)
        print(f"GC content: {gc_content:.2f}%")
        print(f"AT content: {at_content:.2f}%")
        print("Nucleotide counts:", counts)
        print("Reverse complement:", rev_comp)
        print("RNA sequence:", rna_seq)
        print("Protein sequence:", protein_seq)


    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    accession_id = 'NM_001204686.1' # RefSeq mRNA accession (with version) to fetch

    analyze_sequence_from_entrez(accession_id)

Sequence details for Entrez ID: NM_001204686.1
Length: 968
GC content: 48.45%
AT content: 51.55%
Nucleotide counts: {'A': 255, 'T': 244, 'G': 224, 'C': 245}
Reverse complement: ATCAATAATGCCTTCAGCTTCTTAGGGGACCTTCTATTCCTGCGGGTATCGCTGGTATGAAATAAATCTGGAAACTTTAGTCAAACACGTGGAAAAAACCCGCTAAATCTGACACCACGTGGACTCCTCCTTTATAATAATAATCACCAAATTAAAAATTTTAAAAGCATTTACGCCCTCAACATGTCTAACTAAAGTTGTCCTCCAACTGCGCATGTCCACTGTTGCTCCTTCCGGTTCTGGATATTCTGGAGAAGAAATGGTCTGGCAGACGGCAGTACTGAGCCAGCTCAAATATCCGACACTGGTTGAAGCAACATTCGCATGTGATGGAGCCTGAGGCCTCTCTCTTGGTCAAGTAGGAGAAAGCTTCTTTCTTATTGAGCAGGATCCCTCGTAACTTGTCGTTCACATTTTCCGTGTCTCTTTTGACCATGTACTTGGCCAGGAACCTCCTGTTGCCCCCCAGAGAGCTGCACAGGTTGGAAATGATGACGTGCAGGTCTTCGCCGCACAGACCCCGCGGGTGGGGCCGCATGTAGCCGTTGCACGAGTGCTCGAAGTTGGCCAGGGATATGTCGAGGTTGGAGGCCAGCGTGAGCAGAAGGGTGAGCAGGCAGGCGTTGGCGGAGTGGCTCTGGAGGAGGAACTTGCTCATTCTGGCTACGAGAAGGCACCGTTGGAACAGTCAGGAGAGGCTGGTAGCACCAATGTTGTTTCACCCGAGTTAGTTTTTCACTGGCCAGCAAAATGGTCCTGTTAAGACCAGAAAGGCAGCTAGAACAACAGTTTCTGCTCTCGCCGGCTGGGAATAAAAAACCTT