Research at DIMACS summer REU 2015 with Dr. Kevin Chen

Repository Description

This repository contains the work I did for the DIMACS REU. See my webpage here for more information on the project.

Files Description

Data Integration

For Spectacle Input (Spectacle code can be found here)

  • - given multiple files (one per chromosome), adds an additional feature (eRNA) to each file
  • - same as above, but also adds a p300 overlap feature

For SVM/Logit Input

  • integrate_all.ipynb - integrates chromatin marks, eRNA, p300, and tfbs into one file (extension of add_feature_v1); to add more features, use add_feature_v2
  • add_feature_v1.ipynb - given multiple files (one per chromosome), adds additional feature (p300) with output in one file for all chromosomes
  • add_feature_v2.ipynb - given one file (containing all chromosomes), adds additional feature (tfbs) to the file
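
The feature-adding steps above amount to annotating each genomic bin with whether it overlaps a set of peaks (for example p300 or tfbs). The sketch below is illustrative only and is not taken from the notebooks: it assumes BED-like, half-open (chrom, start, end) tuples, and the function names (merge_intervals, build_index, overlaps, add_feature) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_index(peaks):
    """Group peaks by chromosome and store sorted, disjoint start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """Return 1 if the half-open bin [start, end) overlaps any peak, else 0."""
    if chrom not in index:
        return 0
    starts, ends = index[chrom]
    # Last merged peak whose start lies strictly before the bin's end.
    i = bisect_left(starts, end) - 1
    return int(i >= 0 and ends[i] > start)


def add_feature(rows, peaks):
    """Append a binary overlap column to each (chrom, start, end, ...) row."""
    index = build_index(peaks)
    return [row + (overlaps(index, row[0], row[1], row[2]),) for row in rows]


# Made-up example data (not from the project).
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
rows = [("chr1", 0, 100), ("chr1", 250, 350), ("chr2", 55, 70), ("chr3", 0, 10)]
annotated = add_feature(rows, peaks)
```

Merging the peaks first keeps each lookup to a single binary search; an interval tree is an alternative when the peak intervals must be kept separate.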

Data Analysis

  • exp.R - conducts exploratory analysis of eRNA data with enhancer predictions and outputs eRNA data in condensed form
  • tss.ipynb - finds distances to nearest transcription start site for each state (labeled in Spectacle)
  • tss.R - plots output of tss.ipynb
  • p300.ipynb - finds overlap with p300 for each state (labeled in Spectacle)
  • p300.R - plots output of p300.ipynb
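
As a rough illustration of the distance computation described for tss.ipynb, the sketch below finds the distance from a position to the nearest transcription start site on the same chromosome. It is not the notebook's code: it assumes the TSS coordinates are already available as a sorted list, and the function name nearest_tss_distance is hypothetical.

```python
from bisect import bisect_left


def nearest_tss_distance(position, tss_sorted):
    """Distance from position to the closest value in the sorted list tss_sorted.

    Returns None when the chromosome has no annotated TSS.
    """
    if not tss_sorted:
        return None
    i = bisect_left(tss_sorted, position)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - position)      # nearest TSS at or after
    if i > 0:
        candidates.append(position - tss_sorted[i - 1])  # nearest TSS before
    return min(candidates)


# Made-up coordinates for illustration.
tss = [1000, 5000, 9000]
distances = [nearest_tss_distance(p, tss) for p in (0, 1200, 5000, 10000)]
```

Collecting these distances per Spectacle state gives the kind of table a plotting script such as tss.R could consume.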

Machine Learning Models

  • svm.ipynb - runs support vector machine
  • logit.ipynb - runs logistic regression


  • - interval search with tree (imported from here)