<a href="https://colab.research.google.com/github/RobBurnap/Bioinformatics-MICR4203-MICR5203/blob/main/notebooks/L00_template_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


BIOINFO4/5203 — Colab Exercise Template

Use this template for every weekly exercise. It standardizes setup, data paths, and the final summary so grading in Canvas is quick.

Workflow

1. Click the "Open in Colab" link in Canvas (it points to this notebook on GitHub).
2. Run the Setup cells (they install Biopython and mount Google Drive).
3. Run the Exercise cells (edit as instructed for each lecture).
4. Verify that the Results Summary prints the values requested by Canvas.
5. Choose File → Print → Save as PDF, then upload both the .ipynb and the PDF to Canvas.

    Instructor note (delete in student copy if desired):

        Place datasets for this lecture at: Drive → BIOINFO4-5203-F25 → Data → Lxx_topic
        Update the constants in Config below: COURSE_DIR, LECTURE_CODE (e.g., L05), and TOPIC.
        For heavy jobs (trees, assemblies), provide the PETE output files in the same Data folder so students can analyze them here if the queue is busy.



**Auto‑setup + course folder (uses your Teaching path)**

In [26]:
# ===== 0) Install FIRST, then import =====
%pip install -q biopython

from google.colab import drive; drive.mount('/content/drive')

import os, pandas as pd
from Bio import SeqIO
import matplotlib.pyplot as plt

# ===== 1) Course folders (set COURSE_DIR once; edit LECTURE_CODE and TOPIC each week) =====
COURSE_DIR   = "/content/drive/MyDrive/Teaching/BIOINFO4-5203-F25"  # << your Drive folder
LECTURE_CODE = "L00_template"   # e.g., L01, L02, ...
TOPIC        = "smoke_test2"     # e.g., foundations, msa, blast, ...

# Derived paths (do not change)
DATA_DIR   = f"{COURSE_DIR}/Data/{LECTURE_CODE}_{TOPIC}"
OUTPUT_DIR = f"{COURSE_DIR}/Outputs/{LECTURE_CODE}_{TOPIC}"
for p in [f"{COURSE_DIR}/Data", f"{COURSE_DIR}/Outputs", f"{COURSE_DIR}/Notebooks", DATA_DIR, OUTPUT_DIR]:
    os.makedirs(p, exist_ok=True)

print("COURSE_DIR :", COURSE_DIR)
print("DATA_DIR   :", DATA_DIR)
print("OUTPUT_DIR :", OUTPUT_DIR)

print("✅ Setup complete! Your Google Drive is mounted at /content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
COURSE_DIR : /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25
DATA_DIR   : /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Data/L00_template_smoke_test2
OUTPUT_DIR : /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Outputs/L00_template_smoke_test2
✅ Setup complete! Your Google Drive is mounted at /content/drive
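
The derived paths in the setup cell follow one naming rule: `{COURSE_DIR}/Data/{LECTURE_CODE}_{TOPIC}`, plus the matching folder under `Outputs`. A minimal sketch of that rule as a reusable function might look like the following. `course_paths` is a hypothetical helper name, not part of the template, and the demo runs against a temporary directory instead of Google Drive so it can be tried outside Colab.

```python
from pathlib import Path
import tempfile


def course_paths(course_dir, lecture_code, topic):
    """Return (data_dir, output_dir) built with the template's naming rule.

    Hypothetical helper for illustration; the template itself uses plain
    f-strings and os.makedirs.
    """
    root = Path(course_dir)
    folder = f"{lecture_code}_{topic}"
    data_dir = root / "Data" / folder
    output_dir = root / "Outputs" / folder
    # Create the same folders the setup cell creates (parents included).
    for p in (data_dir, output_dir, root / "Notebooks"):
        p.mkdir(parents=True, exist_ok=True)
    return data_dir, output_dir


# Demo against a throwaway directory rather than a mounted Drive.
with tempfile.TemporaryDirectory() as tmp:
    data_dir, output_dir = course_paths(tmp, "L05", "msa")
    print(data_dir.relative_to(tmp).as_posix())    # Data/L05_msa
    print(output_dir.relative_to(tmp).as_posix())  # Outputs/L05_msa
```

In Colab, the first argument would be the `COURSE_DIR` value from the setup cell, after `drive.mount` has succeeded.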


**2) Make a tiny demo dataset (FASTA) in your Data/ folder**

In [27]:
# Create a tiny FASTA so the template is fully executable
fasta_text = """>seqA
MSTNPKPQRKTKRNTNRRPQDVKFPGGGNKK
>seqB
MSSSNTATAPKKKRKVGQAGGPPKKK
"""
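# Keep the full-length sequences in their own file; demo.fasta (written below) holds the short version
demo_original = f"{DATA_DIR}/demo-original.fasta"
with open(demo_original, "w") as f:
    f.write(fasta_text)
print("🧪 Wrote dataset:", demo_original)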


# ===== 2) Make a tiny FASTA in Data/ =====
demo_fa = f"{DATA_DIR}/demo.fasta"
with open(demo_fa, "w") as f:
    f.write(">seqA\nMSTNPKPQRK\n>seqB\nMSSSNTATAP\n")
print("Wrote:", demo_fa)

🧪 Wrote dataset: /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Data/L00_template_smoke_test2/demo-original.fasta
Wrote: /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Data/L00_template_smoke_test2/demo.fasta
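
Before parsing with Biopython, it can help to confirm what a FASTA file contains: header lines start with `>`, and every line until the next header belongs to that record's sequence. The sketch below is a deliberately minimal standard-library reader for illustration only, not a replacement for `SeqIO.parse`. `read_fasta` is a hypothetical name, and the text is an in-memory copy of the short demo records rather than the file on Drive.

```python
def read_fasta(text):
    """Parse FASTA text into an {id: sequence} dict (minimal, no validation)."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines may wrap; concatenate them under the current ID.
            records[current] += line
    return records


# In-memory copy of the two short demo records.
demo_text = ">seqA\nMSTNPKPQRK\n>seqB\nMSSSNTATAP\n"
parsed = read_fasta(demo_text)
for seq_id, seq in parsed.items():
    print(seq_id, len(seq))
# seqA 10
# seqB 10
```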


**3) Parse FASTA → CSV, then plot a quick figure → PNG in Outputs/**

In [28]:
# ===== 3) Parse FASTA -> CSV; Plot -> PNG in Outputs/ =====
records = list(SeqIO.parse(demo_fa, "fasta"))
df = pd.DataFrame({"id":[r.id for r in records],
                   "length":[len(r.seq) for r in records]})
display(df)

csv_path = f"{OUTPUT_DIR}/seq_lengths.csv"
png_path = f"{OUTPUT_DIR}/length_hist.png"

df.to_csv(csv_path, index=False)
plt.figure()
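df.set_index("id")["length"].plot(kind="bar")  # label each bar with its sequence ID
plt.title("Sequence lengths")
plt.xlabel("ID"); plt.ylabel("Length (aa)")  # demo sequences are proteins, so residues not nt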
plt.savefig(png_path, bbox_inches="tight"); plt.close()

print("CSV exists? ", os.path.exists(csv_path), "->", csv_path)
print("PNG exists? ", os.path.exists(png_path), "->", png_path)
print("OUTPUT_DIR contents:", os.listdir(OUTPUT_DIR))

     id  length
0  seqA      10
1  seqB      10


CSV exists?  True -> /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Outputs/L00_template_smoke_test2/seq_lengths.csv
PNG exists?  True -> /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Outputs/L00_template_smoke_test2/length_hist.png
OUTPUT_DIR contents: ['seq_lengths.csv', 'length_hist.png']
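
To double-check the CSV from step 3 without pandas, the file can be read back with the standard library. The sketch below assumes the two-column layout (`id`, `length`) written by `df.to_csv(csv_path, index=False)`. It uses an in-memory copy of the two demo rows as a stand-in for `seq_lengths.csv`.

```python
import csv
import io

# Stand-in for the contents of seq_lengths.csv (assumed layout: id,length).
csv_text = "id,length\nseqA,10\nseqB,10\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
lengths = [int(row["length"]) for row in rows]  # CSV values are read as strings

print("records:", len(rows))
print("min length:", min(lengths))
print("max length:", max(lengths))
```

To check the real file, replace `io.StringIO(csv_text)` with `open(csv_path, newline="")`.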


**4) Write a simple results summary (Canvas‑friendly)**

In [29]:
summary_path = f"{OUTPUT_DIR}/summary.txt"
with open(summary_path, "w") as f:
    f.write(f"LECTURE={LECTURE_CODE}\n")
    f.write(f"TOPIC={TOPIC}\n")
    f.write(f"N_records={len(records)}\n")
print("📝 Saved summary:", summary_path)

📝 Saved summary: /content/drive/MyDrive/Teaching/BIOINFO4-5203-F25/Outputs/L00_template_smoke_test2/summary.txt
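
The summary file is plain `KEY=value` text, one entry per line, so it is easy to read back and print before copying values into Canvas. The sketch below is one way to do that. `parse_summary` is a hypothetical helper name, and the example text mirrors what the cell above writes for the two demo records.

```python
def parse_summary(text):
    """Turn KEY=value lines into a dict; lines without '=' are ignored."""
    summary = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            summary[key.strip()] = value.strip()
    return summary


# Mirrors the summary written above for the demo dataset.
example = "LECTURE=L00_template\nTOPIC=smoke_test2\nN_records=2\n"
summary = parse_summary(example)
for key, value in summary.items():
    print(f"{key}: {value}")
```

To check the real file, pass the result of `open(summary_path).read()` in place of `example`.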
