
TanushGoel/Genome540-541


Overview

This repository documents my freshman-year experience in two graduate-level courses in computational biology, a field with which I had limited familiarity but a great deal of curiosity. I would love to showcase my work and progress through this repository, and I hope to contribute further to the field in the future.

Genome540

Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis

Genome 540 is the first quarter of a two-quarter introduction to protein and DNA sequence analysis and molecular evolution, including probabilistic models of sequences and of sequence evolution, computational gene identification, pairwise sequence comparison and alignment (algorithms and statistical issues), multiple sequence alignment and evolutionary tree construction, comparative genomics, and protein sequence/structure relationships. These represent the central computational approaches for interpreting genomes. The statistical and algorithmic ideas discussed (which include maximum likelihood estimation, hidden Markov models, and dynamic programming) have wide applicability in other areas of computational and mathematical biology as well.

Homework 1: Suffix Arrays, Ultraconserved Elements (UCEs)

Homework 2: Order-0/Order-1 Markov Models

Homework 3: Log-Likelihood Ratios (LLR), Coding Sequences (CDS)

Homework 4: Dynamic Programming, Weighted Directed Acyclic Graphs (WDAG)

Homework 5: Topological Sort, Network Optimization, Local Sequence Alignment, BLOSUM62

Homework 6: D-segment Algorithm, Poisson Distributions

Homework 7: Next-Generation Sequencing (NGS), Karlin-Altschul Theory

Homework 8: Hidden Markov Models (HMM), Baum-Welch Algorithm/Expectation Maximization (EM)

Homework 9: Viterbi/Max Sum Algorithm, Evolutionarily Conserved Segments (ECS), Putative Neutral/Functional Sequences
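Several of the Genome 540 assignments build on dynamic programming, and Homework 5 applies it to local sequence alignment. As an illustration only, and not the code from those assignments, here is a minimal Smith-Waterman score computation. Each cell holds the best weight of a path into that node of the alignment grid, so the recurrence is a longest-path computation on a weighted DAG, in the spirit of Homework 4. The function name `local_alignment_score` and the toy scoring scheme (match +2, mismatch -1, linear gap -2) are assumptions chosen for readability; with BLOSUM62, as in Homework 5, the match/mismatch test would become a substitution-matrix lookup.

```python
def local_alignment_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    Cell (i, j) holds the best score of an alignment ending at a[i-1] and b[j-1];
    clamping at 0 lets an alignment start anywhere, which is what makes it local.
    Scoring parameters are hypothetical toy values, not BLOSUM62.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("GGACGTGG", "TTACGTTT"))  # shared "ACGT" scores 4 x 2 = 8
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment itself would need the full table (or traceback pointers) to walk back from the best cell.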

Genome541

Introduction to Computational Molecular Biology: Molecular Evolution

Genome 541 provides a survey of topics within the field of computational molecular biology. The course is divided into five two-week blocks, each devoted to a single topic and taught by a different instructor. This year, the topics include protein structure, single-cell analysis, epigenomics, cancer genomics, and phylogenetics.

Protein Structure

Homework 1: Protein Energetics/Modeling, Ramachandran Space (phi-psi torsion angles)

Homework 2: Simulated Annealing Monte Carlo, Verlet Algorithm, Mfold Algorithm, Fragment Folding
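Homework 2 lists the Verlet algorithm, a standard integrator for molecular dynamics. The velocity-Verlet form advances positions with the current acceleration, recomputes the force at the new positions, and then advances velocities using the average of the old and new accelerations. The sketch below applies it to a one-dimensional harmonic spring, an assumed toy stand-in for a bonded energy term rather than a protein force field; the function name and parameters are illustrative, not taken from the assignment.

```python
from typing import Callable, List, Tuple


def velocity_verlet(
    x0: float,
    v0: float,
    force: Callable[[float], float],
    mass: float = 1.0,
    dt: float = 0.01,
    n_steps: int = 1000,
) -> List[Tuple[float, float]]:
    """Integrate Newton's equations in 1-D; return (position, velocity) at every step."""
    x, v = x0, v0
    a = force(x) / mass  # acceleration at the starting position
    trajectory = [(x, v)]
    for _ in range(n_steps):
        # 1. Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # 2. Recompute the force (and acceleration) at the new position.
        a_new = force(x) / mass
        # 3. Advance the velocity with the average of the old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Toy example: harmonic spring with hypothetical stiffness K = 1.
K = 1.0
traj = velocity_verlet(x0=1.0, v0=0.0, force=lambda x: -K * x)
energies = [0.5 * v * v + 0.5 * K * x * x for x, v in traj]
print(f"energy range: {min(energies):.6f} to {max(energies):.6f}")
```

Because velocity Verlet is time-reversible and symplectic, the total energy oscillates slightly around its initial value of 0.5 instead of drifting, which is why it is preferred over a naive Euler update for long simulations. A real simulation would apply the same three steps to all atomic coordinates at once.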

Single-Cell Analysis

Homework 3: Manifold Learning, (Absorbing) Markov Chains, Diffusion Maps, Trajectory Inference

Homework 4: Variational Autoencoders (VAE), Gene Regulatory Networks, Cross-Modality Prediction
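Homework 3 lists absorbing Markov chains alongside trajectory inference. One common formulation turns a cell-to-cell transition matrix into fate probabilities by making terminal states absorbing and computing, for each transient state, the probability of ending in each absorbing state. Those probabilities satisfy B = QB + R, that is, B = (I - Q)^-1 R. The sketch below, with hypothetical names and a toy chain, solves that equation by fixed-point iteration in plain Python; it is an illustration of the idea, not the assignment's method, and it assumes every transient state can eventually reach an absorbing one, which is what makes the iteration converge.

```python
from typing import List

Matrix = List[List[float]]


def absorption_probabilities(
    Q: Matrix, R: Matrix, tol: float = 1e-12, max_iter: int = 100_000
) -> Matrix:
    """Absorption probabilities of an absorbing Markov chain.

    Q[i][j]: probability of moving from transient state i to transient state j.
    R[i][k]: probability of moving from transient state i to absorbing state k.
    Returns B with B[i][k] = P(absorbed in k | start in transient i), the solution
    of B = Q B + R (equivalently B = (I - Q)^-1 R), found by fixed-point iteration.
    Assumes every transient state can reach some absorbing state.
    """
    n, m = len(R), len(R[0])
    B = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        new_B = [
            [R[i][k] + sum(Q[i][j] * B[j][k] for j in range(n)) for k in range(m)]
            for i in range(n)
        ]
        delta = max(abs(new_B[i][k] - B[i][k]) for i in range(n) for k in range(m))
        B = new_B
        if delta < tol:
            break
    return B


# Hypothetical toy chain: states 0-4 on a line, 0 and 4 absorbing ("terminal fates");
# transient states 1-3 step left or right with probability 0.5.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],   # state 1 can jump straight to state 0
     [0.0, 0.0],
     [0.0, 0.5]]   # state 3 can jump straight to state 4
B = absorption_probabilities(Q, R)
for state, (p0, p4) in zip((1, 2, 3), B):
    print(f"state {state}: P(end at 0) = {p0:.3f}, P(end at 4) = {p4:.3f}")
```

For this symmetric random walk the probability of ending at state 4 from state i is i/4, so the printed values should be 0.25, 0.5, and 0.75. For a chain over thousands of cells, a sparse linear solver for (I - Q)B = R would be the practical choice.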

Epigenomics

Homework 5: Gibbs Sampling, MEME, Benjamini-Hochberg Procedure, Bonferroni Correction, Machine Learning for Transcription Factor (TF) Binding

Homework 6: Hi-C, Multidimensional Scaling (MDS), Gradient Descent, 3D Structure Inference
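Homework 5 pairs the Benjamini-Hochberg procedure with Bonferroni correction, two standard ways to handle the many p-values produced when scanning for motifs or binding sites. Bonferroni controls the family-wise error rate by testing each of the m p-values against alpha/m; Benjamini-Hochberg controls the false discovery rate by finding the largest rank k with p_(k) <= (k/m) * alpha and rejecting the k smallest p-values. A minimal sketch follows; the function names and the example p-values are made up for illustration.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up BH procedure: reject the k smallest p-values, where k is the largest
    rank (1-based) with p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values, e.g. from testing five candidate motif sites.
p = [0.012, 0.045, 0.025, 0.005, 0.2]
print("Bonferroni:", bonferroni(p))          # only p = 0.005 clears 0.05 / 5 = 0.01
print("BH:        ", benjamini_hochberg(p))  # the three smallest p-values are rejected
```

Because the procedure is step-up, every p-value ranked at or below k_max is rejected even if it did not individually pass its own threshold, which is why Benjamini-Hochberg is typically less conservative than Bonferroni.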

Cancer Genomics

Homework 7: Single-Nucleotide Variant (SNV) Caller, Binomial Mixture Model (BMM), Maximum Likelihood Estimation, SNV Genotyping

Homework 8: Copy Number Alteration (CNA) Caller, Hidden Markov Model (HMM)
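Homework 7 lists maximum likelihood estimation and SNV genotyping. The following is a simplified illustration, not the binomial mixture model from the assignment: treat the alternate-read count at a site as binomial, assume a fixed alternate-allele fraction for each germline genotype (the sequencing error rate `error` for homozygous reference, 0.5 for heterozygous, 1 - error for homozygous alternate), and pick the genotype with the highest log-likelihood. The function names, genotype labels, and default error rate are assumptions; a tumor-aware caller would also have to account for purity and subclonal allele fractions, which is one place a mixture model can help.

```python
import math
from typing import Dict


def genotype_log_likelihoods(
    ref_reads: int, alt_reads: int, error: float = 0.01
) -> Dict[str, float]:
    """Binomial log-likelihood of the observed alt-read count under each genotype.

    Assumed alt-allele fractions: error for hom_ref, 0.5 for het, 1 - error for hom_alt.
    """
    n = ref_reads + alt_reads
    alt_fraction = {"hom_ref": error, "het": 0.5, "hom_alt": 1.0 - error}
    log_comb = math.log(math.comb(n, alt_reads))  # identical for every genotype
    return {
        genotype: log_comb + alt_reads * math.log(f) + ref_reads * math.log(1.0 - f)
        for genotype, f in alt_fraction.items()
    }


def call_genotype(ref_reads: int, alt_reads: int, error: float = 0.01) -> str:
    """Maximum-likelihood genotype for one site."""
    ll = genotype_log_likelihoods(ref_reads, alt_reads, error)
    return max(ll, key=ll.get)


# Hypothetical (reference reads, alternate reads) counts at three sites.
for ref, alt in [(30, 0), (14, 12), (1, 25)]:
    print(ref, alt, call_genotype(ref, alt))
```

The binomial coefficient does not change which genotype wins, but keeping it makes the values true log-likelihoods, which matters if they are later compared across models or combined with priors.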

Phylogenetics

Homework 9: Likelihood-Based Phylogenetics, Substitution Models

Homework 10: Phylogenetic Trees, Bootstrapping, Bayesian Statistics
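Homework 9 covers likelihood-based phylogenetics and substitution models. The simplest substitution model, Jukes-Cantor (JC69), gives closed-form probabilities: after a branch of length t (expected substitutions per site), a base stays the same with probability 1/4 + (3/4)e^(-4t/3) and becomes one specific other base with probability 1/4 - (1/4)e^(-4t/3). For two aligned sequences, maximizing the likelihood over t gives the familiar distance d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The sketch below computes both; the function names and sequences are hypothetical, and a likelihood over a full tree would typically combine these per-branch probabilities with Felsenstein's pruning algorithm.

```python
import math
from typing import Tuple


def jc69_probabilities(t: float) -> Tuple[float, float]:
    """JC69 probabilities after branch length t (expected substitutions per site):
    (P(same base), P(one specific different base))."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def jc69_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length t > 0."""
    p_same, p_diff = jc69_probabilities(t)
    ll = 0.0
    for a, b in zip(seq1, seq2):
        # Stationary frequency 1/4 for the first base, then the transition probability.
        ll += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return ll


def jc69_distance(seq1: str, seq2: str) -> float:
    """Closed-form maximum-likelihood branch length: -3/4 * ln(1 - 4p/3)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        return math.inf  # too saturated for JC69 to give a finite distance
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


s1 = "ACGTACGTAC"
s2 = "ACGTTCGAAC"  # 2 of 10 sites differ, so p = 0.2
d = jc69_distance(s1, s2)
print(f"JC69 distance: {d:.4f}")
# The closed-form distance should beat nearby branch lengths on likelihood.
print(jc69_log_likelihood(s1, s2, d) > jc69_log_likelihood(s1, s2, 0.9 * d))
```

The distance exceeds the raw mismatch fraction (0.2 here) because JC69 corrects for multiple substitutions at the same site that leave no visible difference.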