reference-free ddRADseq analysis tools
.gitignore updated .gitignore to include gdata build directory Jul 23, 2012
101013_lane7_sample_data.csv added test sample data csv Dec 12, 2012
DB_index_by_well.csv DB_index_by_well.csv added Jun 27, 2012
LICENSE lgpl3 added Aug 31, 2012
LSF.py prepro May 15, 2013
README README updated Jul 23, 2012
RE_site_dropout.py add iterative Jun 10, 2013
USAGE_NOTES Update USAGE_NOTES Dec 13, 2012
__init__.py 20120208 radtag_denovo code added Feb 8, 2012
bam2fastq_by_index.py 20120208 radtag_denovo code added Feb 8, 2012
calc_offby.py add iterative Jun 10, 2013
config.template.py config.template.py corrected May 19, 2012
convert_fq.py add iterative Jun 10, 2013
estimate_error_by_clustering.py tagged version May 30, 2012
evaluate_rtd_clustering.py added run_safe.py Sep 6, 2012
extract_perfect_RE_reads.py add iterative Jun 10, 2013
find_perfect_match_reads.py 20120208 radtag_denovo code added Feb 8, 2012
gdata-2.0.10.tar.gz added gdata 2.0.10 Jul 23, 2012
get_uniqued_lines_by_cluster.py multiple subject support added, bugfixes May 30, 2012
initialize_sample_DB.py 20120208 radtag_denovo code added Feb 8, 2012
iterative_rtd.py iterative_rtd updates Sep 24, 2013
mcl_id_triples_by_blat.py rtd fixes Jun 10, 2013
musclemap.py 20120208 radtag_denovo code added Feb 8, 2012
overlap_preprocess.py fixed sq lookup Nov 7, 2013
overlap_rtd.py add iterative Jun 10, 2013
plot_error.py add plot_error.py Mar 8, 2012
pool_lane_counts.py add pool Mar 8, 2012
preprocess_radtag_lane.py passthough for db records in legacy lookup Apr 8, 2014
preprocess_radtag_lane_vlbc.py refactored vcf_to_rqtl Sep 6, 2012
read_quality_statistics.py read_quality_statistics added Jul 27, 2012
rtd_run.py commit Sep 9, 2013
run_safe.py exception on 0 length return Apr 4, 2014
s_7_sequence-1M.txt.gz added sample sequence data Dec 12, 2012
sam_from_clust_uniqued.py DB_index_by_well.csv added Jun 27, 2012
simulate_loci.py simulation scripts updated to include efficiency predictions Jul 15, 2012
strip_rqtl_header_add_phenocols.py 20120208 radtag_denovo code added Feb 8, 2012
summarize_sequencing_stats.py switched .uniqued handling to compressed by default May 24, 2012
vcf_to_rqtl.py Merge branch 'master' of github.com:brantp/rtd May 15, 2013
vcf_to_rqtl_DB.py added htseq style vcf_to_rqtl (vcf_to_rqtl_DB.py) Apr 24, 2014
vcf_to_rqtl_from_template_map.py prepro May 15, 2013

README

The pipeline script generates a reference-sorted, indexed BAM file from the uniqued reads of radtag sequencing lanes.

To generate uniqued reads, see preprocess_radtag_lane.py.

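Four accessory programs (blat, MCL, muscle and samtools; mcl and mcxload both come from the MCL package)
and three Python libraries (numpy, gdata and editdist) are required; they are listed below.
For parallel execution, GNU parallel is also HIGHLY recommended.
Experimental LSF support is also available.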
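To illustrate why GNU parallel helps, here is a minimal sketch of running a batch of independent shell commands through it, with a serial fallback. This is NOT how rtd itself dispatches jobs (that code, and the LSF support, is not described here); `run_commands` is a hypothetical helper. It relies on one GNU parallel behaviour: given no command template, it executes each line of its input as a command.

```python
import shutil
import subprocess
from typing import List, Optional


def run_commands(commands: List[str], use_parallel: Optional[bool] = None) -> str:
    """Run independent shell commands, through GNU parallel when available.

    Returns "parallel" or "serial" to report which mode was used.
    """
    if use_parallel is None:
        # Assumes any `parallel` on PATH is GNU parallel (moreutils ships a
        # different program with the same name and different arguments).
        use_parallel = shutil.which("parallel") is not None
    if use_parallel:
        # Given no command template, GNU parallel runs each input line as a command;
        # check=True raises CalledProcessError if any job exits non-zero.
        subprocess.run(["parallel"], input="\n".join(commands) + "\n",
                       text=True, check=True)
        return "parallel"
    for cmd in commands:
        # Serial fallback: one shell per command, stopping at the first failure.
        subprocess.run(cmd, shell=True, check=True)
    return "serial"
```

Both paths raise `subprocess.CalledProcessError` on failure, so a caller does not need to know which mode was used.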

REQUIREMENTS:
-        PATH must contain: blat mcl mcxload muscle samtools [parallel]
-  PYTHONPATH must contain: numpy gdata editdist
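A quick way to confirm the REQUIREMENTS above before a run is to look each program up on PATH and each module up on the module search path. The sketch below is a hypothetical helper, not part of rtd; it is written for Python 3, and the module check only reflects the interpreter that runs it.

```python
import importlib.util
import shutil
from typing import Iterable, List

# Names taken from the REQUIREMENTS list; "parallel" is optional.
REQUIRED_PROGRAMS = ["blat", "mcl", "mcxload", "muscle", "samtools"]
OPTIONAL_PROGRAMS = ["parallel"]
REQUIRED_MODULES = ["numpy", "gdata", "editdist"]


def missing_programs(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found as executables on PATH."""
    return [n for n in names if shutil.which(n) is None]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    for label, missing in [
        ("required program", missing_programs(REQUIRED_PROGRAMS)),
        ("optional program", missing_programs(OPTIONAL_PROGRAMS)),
        ("Python module", missing_modules(REQUIRED_MODULES)),
    ]:
        for name in missing:
            print(f"missing {label}: {name}")
```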

Sources (as of this writing, March 9, 2011):
  blat         http://hgdownload.cse.ucsc.edu/downloads.html
  mcl/mcxload  http://www.micans.org/mcl/
  muscle       http://www.drive5.com/muscle/
  samtools     http://samtools.sourceforge.net/
  GNU parallel http://savannah.gnu.org/projects/parallel/

  numpy *      http://sourceforge.net/projects/numpy/files/
  gdata        install gdata v2.0.10, included in this repository as gdata-2.0.10.tar.gz
			(newer versions are known to be incompatible
			with rtd code, but are available at:
			http://code.google.com/p/gdata-python-client/downloads/list)
  editdist     http://www.mindrot.org/projects/py-editdist/

* N.B. numpy is also available as part of the excellent Enthought Python Distribution,
available free for academic/non-profit use at http://www.enthought.com/products/epd.php

NOTE ON GOOGLE DOCUMENTS SPREADSHEETS:
As of this writing (June 2012), the Google Spreadsheets API appears to query all fields
of a user-edited spreadsheet correctly only if the first column is blank.
Column A is therefore left blank in the tables generated by initialize_sample_DB.py.
(I recommend hiding column A of all programmatically accessed Google Docs spreadsheets.)
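The blank-column-A convention amounts to shifting every row one cell to the right before writing and dropping the first cell after reading. The helpers below are a hypothetical sketch of that bookkeeping only; they are not taken from initialize_sample_DB.py and make no gdata API calls.

```python
from typing import List, Sequence


def add_blank_column_a(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Prepend an empty cell to every row so that column A stays blank."""
    return [[""] + list(row) for row in rows]


def drop_column_a(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Inverse of add_blank_column_a: discard the first cell of each row."""
    return [list(row[1:]) for row in rows]
```

For example, a hypothetical header `["sampleid", "index"]` would be written as `["", "sampleid", "index"]`.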