Documented phyta-extract

1 parent 8be0592 commit 559ca5a5893c57e516776a96227dcf9808da72c2 Philipp Comans committed Nov 25, 2011
Showing with 26 additions and 2 deletions.
  1. +26 −2 README.rdoc
@@ -77,7 +77,19 @@ Phyta-assign takes the following command line arguments:
Phyta-split is designed to sort out bacterial, archaeal and viral contaminations from eukaryotic Expressed Sequence Tag (EST) and genomic data.
-The tool uses CSV files generated by phyta-assign as input. This input file is split into two new CSV files. The first file contains all sequences that are deemed to belong to eukaryotic organisms according to the rules stated below. The second file contains all sequences that are deemed to be prokaryotic or viral contaminations.
+The tool uses CSV files generated by phyta-assign as input. This input file is split into two new CSV files. The first file contains all sequences that are deemed to belong to eukaryotic organisms according to a set of rules that can be customized by the user. The second file contains all sequences that are deemed to be prokaryotic or viral contaminations.
+
+=== Usage
+
+[\--input-file, -i ] The output of phyta-assign in CSV format
+
+[\--output-clean, -c ] The name of the clean output table in CSV format (default: [name_of_input_file]_clean.csv)
+
+[\--output-contaminated, -d ] The name of the contaminated output table in CSV format (default: [name_of_input_file]_contaminated.csv)
+
+[\--filter, -f ] A file in YAML format containing a list of taxa to be considered contaminants (default: Use builtin filter capturing Bacteria, Archaea, Viruses and NONE. To learn how to write your own filters, visit https://github.com/PalMuc/bio-phyta/wiki/Custom-filters)
+
+[\--help, -h] Show a help message
=== Rules
@@ -125,7 +137,19 @@ You can also download this filter directly from https://github.com/PalMuc/bio-ph
Constructs two sub-libraries: a “clean” sub-library that consists of annotated sequences from the target species and a “contaminant” one that includes putative contaminant sequences.
-#TODO more
+The output files will be written in FASTA format.
+
+=== Usage
+
+[\--fasta, -f] The file containing the sequences in FASTA format
+
+[\--input-clean, -c] The name of the clean sequence table in CSV format
+
+[\--input-contaminated, -d] The name of the contaminated sequence table in CSV format
+
+[\--output-clean, -o ] The name of the FASTA file where clean sequences will be written to
+
+[\--output-contaminated, -p] The name of the FASTA file where contaminated sequences will be written to
+
+[\--help, -h] Show a help message
== Installation
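
The phyta-extract step documented in this commit can be pictured with a minimal Ruby sketch: given the sequence IDs listed in the clean and contaminated tables, it routes each FASTA record to one of two outputs. This is an illustration under stated assumptions, not the tool's actual implementation. The names `parse_fasta` and `extract` are hypothetical, the ID of a record is assumed to be the first whitespace-delimited token of its header, and reading the IDs out of the CSV tables is left out because the README does not describe the column layout.

```ruby
require "set"

# Hypothetical helper: parse FASTA text into [header, sequence] pairs.
# Multi-line sequences are joined into a single string.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      records << [line[1..-1], String.new]
    elsif records.any?
      records.last[1] << line
    end
  end
  records
end

# Hypothetical helper: split FASTA records into "clean" and "contaminated"
# FASTA strings. Assumes the record ID is the first token of the header.
# Records whose ID appears in neither list are dropped.
def extract(fasta_text, clean_ids, contaminated_ids)
  clean = clean_ids.to_set
  contaminated = contaminated_ids.to_set
  out = { clean: String.new, contaminated: String.new }
  parse_fasta(fasta_text).each do |header, sequence|
    id = header.split.first
    target = if clean.include?(id)
               :clean
             elsif contaminated.include?(id)
               :contaminated
             end
    out[target] << ">#{header}\n#{sequence}\n" if target
  end
  out
end
```

Calling `extract(fasta_text, ids_from_clean_table, ids_from_contaminated_table)` returns a hash whose `:clean` and `:contaminated` values could then be written to the files named by `--output-clean` and `--output-contaminated`. Note that this sketch writes each sequence on a single line rather than preserving the original line wrapping.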
