Latest commit 4df4e1a
Apr 24, 2014
This pipeline script generates reference-sorted, indexed BAM files from uniqued reads from RAD-tag sequencing lanes. To generate uniqued reads, see preprocess_radtag_lane.py.

Four accessory packages (five executables, since mcl and mcxload both come from the MCL package) and three Python libraries are used; they are listed below. For parallel execution, GNU parallel is also HIGHLY recommended. Experimental LSF support is also available.

REQUIREMENTS:
 - PATH must contain: blat, mcl, mcxload, muscle, samtools, [parallel] (optional)
 - PYTHONPATH must contain: numpy, gdata, editdist

Sources (as of this writing, March 09 2011):
 - blat: http://hgdownload.cse.ucsc.edu/downloads.html
 - mcl/mcxload: http://www.micans.org/mcl/
 - muscle: http://www.drive5.com/muscle/
 - samtools: http://samtools.sourceforge.net/
 - GNU parallel: http://savannah.gnu.org/projects/parallel/
 - numpy*: http://sourceforge.net/projects/numpy/files/
 - gdata: install gdata v2.0.10, which is included in this repository. More recent versions are known to be incompatible with rtd code, but are available at http://code.google.com/p/gdata-python-client/downloads/list
 - editdist: http://www.mindrot.org/projects/py-editdist/

 * N.B. numpy is also available as part of the excellent Enthought Python Distribution, available free for academic/non-profit use at http://www.enthought.com/products/epd.php

NOTE ON GOOGLE DOCUMENTS SPREADSHEETS:
It appears that, as of this writing (June 2012), the Google Spreadsheets API only correctly queries all fields of a user-edited spreadsheet if the first column is blank. Column A is therefore left blank in the tables generated by initialize_sample_DB.py. (I recommend hiding column A of all programmatically accessed Google Docs spreadsheets.)
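The PATH and PYTHONPATH requirements listed above can be checked before a run. The following is a minimal sketch in Python 3 and is not part of this repository; the function names are hypothetical. It only tests that each executable is found by shutil.which() and that each module can be located on sys.path (which includes PYTHONPATH). It does not check versions, so it cannot confirm that gdata is the required v2.0.10.

```python
import importlib.util
import shutil

# Executables the README requires on PATH; GNU parallel is optional.
REQUIRED_EXECUTABLES = ["blat", "mcl", "mcxload", "muscle", "samtools"]
OPTIONAL_EXECUTABLES = ["parallel"]
# Modules the README requires on PYTHONPATH (presence only, versions unchecked).
REQUIRED_MODULES = ["numpy", "gdata", "editdist"]


def missing_executables(names):
    """Return the names that shutil.which() cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_requirements():
    """Report missing dependencies; return True only if all required ones exist."""
    missing_exe = missing_executables(REQUIRED_EXECUTABLES)
    missing_mod = missing_modules(REQUIRED_MODULES)
    if missing_exe:
        print("missing from PATH:", ", ".join(missing_exe))
    if missing_mod:
        print("missing from PYTHONPATH:", ", ".join(missing_mod))
    for name in missing_executables(OPTIONAL_EXECUTABLES):
        print("optional, not found:", name)
    return not missing_exe and not missing_mod


if __name__ == "__main__":
    print("all requirements found" if check_requirements() else "requirements missing")
```

A missing optional tool such as parallel is reported but does not make check_requirements() return False.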
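The reference-sorted, indexed BAM described above is what samtools sort and samtools index produce. The sketch below is not the pipeline's own code, and the file names are hypothetical. It only illustrates those two steps, assuming samtools 1.x syntax (sort -o OUT IN). The samtools 0.1.x releases current when this README was written instead took an output prefix (samtools sort in.bam out_prefix). By default it only prints the commands.

```python
import shlex
import subprocess


def sort_and_index_commands(unsorted_bam, sorted_bam):
    """Build samtools argument lists that coordinate-sort a BAM and index it.

    Assumes samtools 1.x syntax ("sort -o OUT IN"); samtools 0.1.x
    instead expected an output prefix rather than a file name.
    """
    sort_cmd = ["samtools", "sort", "-o", sorted_bam, unsorted_bam]
    index_cmd = ["samtools", "index", sorted_bam]  # writes sorted_bam + ".bai"
    return [sort_cmd, index_cmd]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    run_commands(sort_and_index_commands("lane1.bam", "lane1.sorted.bam"))
```

Building argument lists instead of shell strings avoids quoting problems with unusual file names. Pass dry_run=False to actually invoke samtools.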
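The blank-column-A convention in the note above amounts to padding every row with an empty first cell before it is written to the sheet and dropping that cell when reading rows back. The following is a minimal sketch of that layout only. It does not call the gdata API, and both helper names and the example column names are hypothetical.

```python
def add_blank_column_a(rows):
    """Prepend an empty cell to every row so that column A stays blank."""
    return [[""] + list(row) for row in rows]


def drop_blank_column_a(rows):
    """Remove the leading (blank) cell from rows read back from the sheet."""
    return [list(row[1:]) for row in rows]


if __name__ == "__main__":
    # Hypothetical column names and values, for illustration only.
    table = [["sampleid", "flowcell", "lane"], ["S01", "FC123", "4"]]
    padded = add_blank_column_a(table)
    print(padded)
    print(drop_blank_column_a(padded))
```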