# Advanced Biopython Learning Guide (Jupyter Notebook Format)

In [1]:
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction
from Bio.Blast import NCBIWWW, NCBIXML
from Bio.Align import MultipleSeqAlignment
from Bio.Align.Applications import ClustalwCommandline
import matplotlib.pyplot as plt


BiopythonDeprecationWarning (raised by importing `Bio.Align.Applications`):

Due to the ongoing maintenance burden of keeping command line application
wrappers up to date, we have decided to deprecate and eventually remove these
modules.

We instead now recommend building your command line and invoking it directly
with the subprocess module.
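
The warning recommends `subprocess` in place of the `Bio.Align.Applications` wrappers. A minimal sketch of that approach follows. It assumes a `clustalw2` executable on the PATH and ClustalW's `-infile=` option; the helper names `build_clustalw_command` and `run_clustalw` are hypothetical and not part of Biopython.

```python
import shutil
import subprocess


def build_clustalw_command(infile, executable="clustalw2"):
    """Return the argument list for a ClustalW run (hypothetical helper)."""
    # ClustalW takes its options in the form -name=value.
    return [executable, f"-infile={infile}"]


def run_clustalw(infile, executable="clustalw2"):
    """Run ClustalW if it is installed; return its stdout, or None if missing."""
    if shutil.which(executable) is None:
        return None  # executable not found on PATH, nothing to run
    result = subprocess.run(
        build_clustalw_command(infile, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_clustalw_command("example.fasta"))
```

Passing the command as a list avoids shell quoting problems, and the `shutil.which` check keeps the notebook runnable on machines without ClustalW.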


## 1. Working with DNA Sequences

In [2]:
print("--- DNA Sequence Operations ---")
dna_seq = Seq("ATGCTAGCTAGCTAGCTAGCTAGC")
print("Original DNA:", dna_seq)
print("Complement:", dna_seq.complement())
print("Reverse Complement:", dna_seq.reverse_complement())
print("GC Content:", gc_fraction(dna_seq) * 100)

--- DNA Sequence Operations ---
Original DNA: ATGCTAGCTAGCTAGCTAGCTAGC
Complement: TACGATCGATCGATCGATCGATCG
Reverse Complement: GCTAGCTAGCTAGCTAGCTAGCAT
GC Content: 50.0
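
To see what these methods compute, the same values can be reproduced in plain Python. This is only a cross-check sketch: it assumes an uppercase sequence containing nothing but A, C, G and T, whereas Biopython also handles other cases such as ambiguous nucleotides.

```python
# Plain-Python cross-check of the output above; no Biopython required.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

dna = "ATGCTAGCTAGCTAGCTAGCTAGC"

complement = dna.translate(COMPLEMENT)   # swap each base for its partner
reverse_complement = complement[::-1]    # read the complement backwards
gc_percent = 100 * (dna.count("G") + dna.count("C")) / len(dna)

print("Complement:", complement)
print("Reverse Complement:", reverse_complement)
print("GC Content:", gc_percent)
```

The sequence has 12 G or C bases out of 24, which gives the 50.0 printed by `gc_fraction(dna_seq) * 100`.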


## 2. Reading and Writing FASTA Files

In [3]:
print("\n--- Reading FASTA File ---")
for record in SeqIO.parse("example.fasta", "fasta"):
    print("ID:", record.id)
    print("Sequence:", record.seq[:50], "...")  # Show first 50 bases
    print("Sequence Length:", len(record.seq))


--- Reading FASTA File ---
ID: seq1
Sequence: ATGCGTACGTAGCTAGCTAGCTAGCTAGCTAGC ...
Sequence Length: 33
ID: seq2
Sequence: GCTAGCTAGCTAGCATCGATCGATCGATCGATC ...
Sequence Length: 33
ID: seq3
Sequence: TCGATCGATCGATCGATCGATCGATCGATCGAT ...
Sequence Length: 33


## 3. Translating DNA to Protein

In [4]:
print("\n--- Translating DNA to Protein ---")
mrna_seq = dna_seq.transcribe()
protein_seq = dna_seq.translate()
print("mRNA Sequence:", mrna_seq)
print("Protein Sequence:", protein_seq)


--- Translating DNA to Protein ---
mRNA Sequence: AUGCUAGCUAGCUAGCUAGCUAGC
Protein Sequence: MLAS*LAS
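
The `*` in the protein sequence marks the stop codon TAG at the fifth codon; by default, translation continues past it. The sketch below reproduces the result codon by codon in plain Python. Its codon table is deliberately limited to the five codons that occur in `dna_seq` (a full table has 64 entries), and the function is an illustration, not Biopython's implementation. Biopython's own `translate` method accepts a `to_stop=True` argument for the same purpose.

```python
# Codon table restricted to the codons that occur in dna_seq.
CODON_TABLE = {"ATG": "M", "CTA": "L", "GCT": "A", "AGC": "S", "TAG": "*"}


def translate(dna, to_stop=False):
    """Translate DNA codon by codon; optionally end at the first stop codon."""
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGCTAGCTAGCTAGCTAGCTAGC"
print(translate(dna))                # MLAS*LAS
print(translate(dna, to_stop=True))  # MLAS
```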


## 4. Running a BLAST Search

In [5]:
print("\n--- Running BLAST Search ---")
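# Submit the query to NCBI over the network; this can take a while.
result_handle = NCBIWWW.qblast("blastn", "nt", str(dna_seq))
blast_record = NCBIXML.read(result_handle)
result_handle.close()
# If no hits are returned, the loop body never runs and nothing is printed.
for alignment in blast_record.alignments[:5]:  # Show top 5 hits
    print("Alignment:", alignment.title)
    print("Length:", alignment.length)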


--- Running BLAST Search ---


## 5. Restriction Enzyme Analysis

In [6]:
from Bio.Restriction import RestrictionBatch

print("\n--- Restriction Enzyme Analysis ---")
enzymes = RestrictionBatch(["EcoRI", "BamHI", "HindIII"])
sites = enzymes.search(dna_seq)
for enzyme, positions in sites.items():
    print(f"{enzyme} cuts at positions: {positions}")


--- Restriction Enzyme Analysis ---
EcoRI cuts at positions: []
HindIII cuts at positions: []
BamHI cuts at positions: []
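
The empty lists are expected: none of the three recognition sequences occurs in `dna_seq`. A plain-Python sketch to confirm this follows. The helper `find_sites` is hypothetical and returns 0-based start indices of the recognition site, whereas `Bio.Restriction` reports cut positions, so the numbers are not directly comparable.

```python
# Recognition sequences (5'->3'). All three are palindromic, so searching
# one strand is enough.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

dna = "ATGCTAGCTAGCTAGCTAGCTAGC"


def find_sites(sequence, site):
    """Return the 0-based start index of every occurrence of site."""
    starts = []
    index = sequence.find(site)
    while index != -1:
        starts.append(index)
        index = sequence.find(site, index + 1)
    return starts


for name, site in RECOGNITION_SITES.items():
    print(name, site, find_sites(dna, site))
```

A sequence that does contain a site, for example `find_sites("AAAGAATTCAAA", "GAATTC")`, returns a non-empty list.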


## 6. Multiple Sequence Alignment

In [7]:
print("\n--- Multiple Sequence Alignment ---")
alignment = MultipleSeqAlignment([
    SeqIO.read("seq1.fasta", "fasta"),
    SeqIO.read("seq2.fasta", "fasta"),
    SeqIO.read("seq3.fasta", "fasta")
])
print(alignment)

# Running ClustalW (example, requires ClustalW installed).
# ClustalwCommandline is deprecated; the equivalent call via subprocess is:
# import subprocess
# subprocess.run(["clustalw2", "-infile=example.fasta"], check=True)


--- Multiple Sequence Alignment ---



BiopythonDeprecationWarning (excerpt), raised by the FASTA parser when a file has content before the first '>' header line:

Nowadays, the FASTA file format is usually understood not to have any such comments, and most software packages do not allow them. Therefore, the use of comments at the beginning of a FASTA file is now deprecated in Biopython.


(1) Modify your FASTA file to remove such comments at the beginning of the file.

(2) Use SeqIO.parse with the 'fasta-pearson' format instead of 'fasta'. This format is consistent with the FASTA format defined by William Pearson's FASTA aligner software. This format allows for comments before the first sequence; lines starting with the ';' character anywhere in the file are also regarded as comment lines and are ignored.

(3) Use the 'fasta-blast' format. This format regards any lines starting with '!', '#', or ';' as comment lines. The 'fasta-blast' format may be safer than the 'fasta-pearson' format, as it explicitly indicates which lines are comments.


ValueError: No records found in handle

The cell fails because `SeqIO.read` expects exactly one record per file, and the parser found no '>' header line, and therefore no record, in at least one of the three input files.
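
Before calling `SeqIO.read`, it can help to check that an input actually contains a FASTA header. The sketch below is a plain-Python diagnostic; the helper name `summarize_fasta_text` and the sample strings are illustrative assumptions, not the contents of the files used above.

```python
def summarize_fasta_text(text):
    """Return (first_non_blank_line, header_count) for FASTA-like text."""
    lines = [line for line in text.splitlines() if line.strip()]
    first_line = lines[0] if lines else None
    # Every FASTA record starts with a line beginning with '>'.
    header_count = sum(1 for line in lines if line.startswith(">"))
    return first_line, header_count


good = ">seq1\nATGCGTACGTAGC\n"
bad = "ATGCGTACGTAGC\n"  # sequence data without a '>' header line

print(summarize_fasta_text(good))
print(summarize_fasta_text(bad))
```

To inspect a file, pass its contents, for example `summarize_fasta_text(open("seq1.fasta").read())`. A header count of 0 would explain the `ValueError`. Since `example.fasta` yielded three 33-base records in Section 2, passing `SeqIO.parse("example.fasta", "fasta")` to `MultipleSeqAlignment` may be a workable alternative, because the constructor needs records of equal length.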

## 7. Genomic Data Visualization

In [ ]:
print("\n--- Genomic Data Visualization ---")
genes = ['Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5']
expression_levels = [23, 45, 12, 30, 18]
plt.bar(genes, expression_levels, color=['blue', 'red', 'green', 'purple', 'orange'])
plt.xlabel('Genes')
plt.ylabel('Expression Level')
plt.title('Gene Expression Levels')
plt.show()