Python code written for the edX course CSE181 (Genomic Data Science): analyzing biological sequences, identifying motifs and consensus sequences, and applying statistical and random search methods to genomic data

0916kj/genomic_data_science

genomic_data_science

  • week1_frequent_words: construct frequency arrays and identify frequent sub-sequences in genomes (analyzes e_coli.txt and vibrio_cholerae_genome.txt)
  • week2_frequent_words_w_mismatches: identify leading and lagging strands by G-C content, then count each sub-sequence together with its mismatched variants and reverse complement, which gives a more realistic picture of the most frequently repeated sub-sequences in a genome and helps locate the origin of replication (ori)
  • week3_motif_matrices: build NumPy arrays of regulatory motifs and the probability distributions of motif matrices, construct consensus motifs, and identify possible regulatory motif sets with brute-force and "greedy" searches
  • week4_random_motif_search: use a more accurate randomized search, iterated within each run and repeated over many runs, to identify likely regulatory motifs
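
As a rough illustration of the week 1 and week 2 ideas, the sketch below finds the positions of minimum G-C skew and the most frequent k-mers counted with mismatches and reverse complements. It is not the code from this repository: the function names (min_skew_positions, frequent_words_with_mismatches_rc) are hypothetical, the running "G minus C" skew is an assumed reading of "G-C content", and the brute-force enumeration of all 4^k candidates only suits small k.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(pattern):
    """Return the reverse complement of a DNA string."""
    return pattern.translate(COMPLEMENT)[::-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_skew_positions(genome):
    """Positions (prefix lengths) where the running count #G - #C is lowest.

    Assumption: the skew minimum is used as a candidate location for ori.
    """
    skew, lowest, positions = 0, 0, [0]
    for i, base in enumerate(genome, start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        if skew < lowest:
            lowest, positions = skew, [i]
        elif skew == lowest:
            positions.append(i)
    return positions


def frequent_words_with_mismatches_rc(text, k, d):
    """Most frequent k-mers, counting occurrences with at most d mismatches
    of both the k-mer and its reverse complement (brute force over 4**k)."""
    windows = Counter(text[i:i + k] for i in range(len(text) - k + 1))

    def approximate_count(pattern):
        return sum(n for w, n in windows.items() if hamming_distance(w, pattern) <= d)

    scores = {}
    for candidate in map("".join, product("ACGT", repeat=k)):
        scores[candidate] = (approximate_count(candidate)
                             + approximate_count(reverse_complement(candidate)))
    best = max(scores.values())
    return sorted(p for p, s in scores.items() if s == best)


print(min_skew_positions("CCGG"))
print(frequent_words_with_mismatches_rc("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4, 1))
```

Counting each distinct window once and weighting it by its multiplicity keeps the inner loop short; for genome-scale inputs, generating only the d-neighborhood of each window would be the more usual choice.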

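The week 3 and week 4 ideas (profile matrices, consensus motifs, and repeated randomized search) can be sketched in a similar way. This is a minimal version under stated assumptions, not the repository's implementation: it uses plain Python lists and dicts rather than NumPy arrays, adds Laplace pseudocounts of 1 to the profile, and all function names, the example sequences, the number of runs, and the seed are hypothetical.

```python
import random

BASES = "ACGT"


def profile_with_pseudocounts(motifs):
    """Per-column base probabilities with a pseudocount of 1 (an assumption)."""
    t = len(motifs)
    return [{b: (column.count(b) + 1) / (t + 4) for b in BASES}
            for column in zip(*motifs)]


def consensus(motifs):
    """Most common base in each column; ties go to the first base in ACGT order."""
    return "".join(max(BASES, key=column.count) for column in zip(*motifs))


def score(motifs):
    """Total number of mismatches between the motifs and their consensus."""
    cons = consensus(motifs)
    return sum(a != b for motif in motifs for a, b in zip(motif, cons))


def most_probable_kmer(text, k, profile):
    """The k-mer of text with the highest probability under the profile."""
    def probability(kmer):
        p = 1.0
        for i, base in enumerate(kmer):
            p *= profile[i][base]
        return p
    return max((text[i:i + k] for i in range(len(text) - k + 1)), key=probability)


def randomized_motif_search(dna, k, rng):
    """One run: start from random k-mers and iterate until the score stops improving."""
    best = []
    for seq in dna:
        start = rng.randrange(len(seq) - k + 1)
        best.append(seq[start:start + k])
    while True:
        profile = profile_with_pseudocounts(best)
        motifs = [most_probable_kmer(seq, k, profile) for seq in dna]
        if score(motifs) < score(best):
            best = motifs
        else:
            return best


def repeated_motif_search(dna, k, runs=200, seed=0):
    """Repeat the randomized search and keep the lowest-scoring motif set."""
    rng = random.Random(seed)
    return min((randomized_motif_search(dna, k, rng) for _ in range(runs)), key=score)


# Hypothetical input sequences, used only to exercise the functions.
example_dna = [
    "CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA",
    "GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG",
    "TAGTACCGAGACCGAAAGAAGTATACAGGCGT",
    "TAGATCAAGTTTCAGGTGCACGTCGGTGAACC",
    "AATCCACCAGCTCCACGTGCAATGTTGGCCTA",
]
best_motifs = repeated_motif_search(example_dna, k=8)
print(best_motifs, score(best_motifs))
```

Because each run starts from random positions, a single run often stalls in a poor local optimum; repeating it many times and keeping the lowest score is what makes the approach usable in practice.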