This repository documents my experience as a freshman in two graduate-level courses in a field with which I have limited familiarity but plenty of curiosity. I would love to showcase my work and progress here, and I hope to contribute further to computational biology in the future.
Genome 540 is the first quarter of a two-quarter introduction to protein and DNA sequence analysis and molecular evolution, including probabilistic models of sequences and of sequence evolution, computational gene identification, pairwise sequence comparison and alignment (algorithms and statistical issues), multiple sequence alignment and evolutionary tree construction, comparative genomics, and protein sequence/structure relationships. These represent the central computational approaches for interpreting genomes. The statistical and algorithmic ideas discussed (which include maximum likelihood estimation, hidden Markov models, and dynamic programming) have wide applicability in other areas of computational and mathematical biology as well.
Homework 1: Suffix Arrays, UltraConserved Elements (UCE)
Homework 2: Order-0/Order-1 Markov Models
Homework 3: Log-Likelihood Ratios (LLR), CoDing Sequences (CDS)
Homework 4: Dynamic Programming, Weighted Directed Acyclic Graphs (WDAG)
Homework 5: Topological Sort, Network Optimization, Local Sequence Alignment, BLOSUM62
Homework 6: D-segment Algorithm, Poisson Distributions
Homework 7: Next-Generation Sequencing (NGS), Karlin-Altschul Theory
Homework 8: Hidden Markov Models (HMM), Baum-Welch Algorithm/Expectation Maximization (EM)
Homework 9: Viterbi/Max Sum Algorithm, Evolutionarily Conserved Segments (ECS), Putative Neutral/Functional Sequences
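To give a flavor of the dynamic programming behind the local sequence alignment topic in Homework 5, here is a minimal sketch of a Smith-Waterman style scoring routine. It is not the homework solution: the function name `local_alignment_score` and the simple match/mismatch/linear-gap scores are assumptions chosen for illustration, standing in for a substitution matrix such as BLOSUM62.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only: a fixed match/mismatch score and a linear
    gap penalty (assumed values, not BLOSUM62).
    """
    # prev holds the previous row of the DP table; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # start a new local alignment here
                prev[j - 1] + pair,   # extend diagonally (match/mismatch)
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTT", "GGACGGG"))
```

Only two rows of the table are kept in memory because this sketch returns the score alone; recovering the aligned segments themselves would require storing the full table or traceback pointers.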
Genome 541 provides a survey of topics within the field of computational molecular biology. The course is divided into five two-week blocks, each devoted to a single topic and taught by a different instructor. This year, the topics include protein structure, single-cell analysis, epigenomics, cancer genomics, and phylogenetics.
Protein Structure
Homework 1: Protein Energetics/Modeling, Ramachandran Space (phi-psi torsion angles)
Homework 2: Simulated Annealing Monte Carlo, Verlet Algorithm, Mfold Algorithm, Fragment Folding
Single-Cell Analysis
Homework 3: Manifold Learning, (Absorbing) Markov Chains, Diffusion Maps, Trajectory Inference
Homework 4: Variational Autoencoders (VAE), Gene Regulatory Networks, Cross-Modality Prediction
Epigenomics
Homework 5: Gibbs Sampling, MEME, Benjamini-Hochberg Test, Bonferroni Correction, ML for TF Binding
Homework 6: Hi-C, Multidimensional Scaling (MDS), Gradient Descent, 3D Structure Inference
Cancer Genomics
Homework 7: Single-Nucleotide Variant (SNV) Caller, Binomial Mixture Model (BMM), Maximum Likelihood Estimation, SNV Genotyping
Homework 8: Copy Number Alteration (CNA) Caller, Hidden Markov Model (HMM)
Phylogenetics
Homework 9: Likelihood-Based Phylogenetics, Substitution Models
Homework 10: Phylogenetic Trees, Bootstrapping, Bayesian Statistics