addressing Stefan's comments

biocore · Aug 14, 2017 · 9ed5653
1 parent 9155619
Showing 1 changed file with 14 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -2,10 +2,10 @@
 [![Build Status](https://travis-ci.org/biocore/microprot.svg?branch=master)](https://travis-ci.org/biocore/microprot)
 
 # microprot
-microProt is coded in Python 3.x
+microprot is coded in Python 3.x
 
 ## Introduction
-microProt clusters and annotates microbial metagenome sequences for the ultimate goal of predicting the 3-dimensional structure and function of these proteins.
+microprot clusters and annotates microbial metagenome sequences for the ultimate goal of predicting the 3-dimensional structure and function of these proteins.
 
 ## Install
 
@@ -29,27 +29,27 @@ Tools requiring manual installation are listed and linked below:
 All filenames are in the form: `GenomeID`\_`GeneID`\_`ResiduesFrom`-`ResiduesTo`.  
 For example, `CP003179.1_3319` means gene `3319` from genome `CP003179.1` (Sulfobacillus acidophilus DSM 10332), or `CP003179.1_3319_1-60` means amino acids 1 to 60 from that gene.
 
-### Extensions
+### File extensions
 
 * a3m  
-    An alignment format produced by HH-suite programs. It's a format similar to FASTA, but in sequence rows it contains additional information useful for the construction of HMMs (represented by [a-z]). A detailed description can be found in [HH-suite user guide](https://github.com/soedinglab/hh-suite/blob/master/hhsuite-userguide.pdf) (section 6.1).
+    An alignment file produced by HH-suite programs. It's a format similar to FASTA, but in sequence rows it contains additional information useful for the construction of HMMs (represented by [a-z]). A detailed description can be found in [HH-suite user guide](https://github.com/soedinglab/hh-suite/blob/master/hhsuite-userguide.pdf) (section 6.1).
 
 * out  
-    HH-suite output files reporting a list of hits for an input sequence, along with Probability, P-value, E-value and other parameters, as well as a set of pair-wise sequence alignments.
+    HH-suite output files reporting a list of hits for an input sequence, along with Probability, P-value, E-value and other parameters (hit list); as well as a set of pair-wise sequence alignments. A detailed description can be found in [HH-suite user guide](https://github.com/soedinglab/hh-suite/blob/master/hhsuite-userguide.pdf) (section 5).
 
 * match  
     Internal `microprot` files showing which sub-sequence of the input sequence matched defined by `config.yml` criteria for any of `E-value`, `P-value`, `Prob` or `minimum sequence length` in the `.out` file. Multiple hits are possible. The file is reported in a FASTA format.
 
 * non_match  
-    All sub-sequences longer than the `minimum sequence length` that do not meet the criteria for `match`.
+    All sub-sequences longer than the `minimum sequence length` that do not meet the criteria for `.match`. Internal `microprot` file.
 
 ### Example
-Gene `example_1` (`example_1.fasta`) with 100 residues is run against HHsearch and it returns 2 outputs: `example_1.out` and `example_1.a3m`. Sequence split parameters are:
+Gene `CP00000.0_1` (`CP00000.0_1.fasta`) with 100 residues is run against HHsearch and it returns 2 outputs: `CP00000.0_1.out` and `CP00000.0_1.a3m`. Sequence split parameters are:
 ```
 min_prob: 90.0
 min_fragment_length: 10
 ```
-and the hit list from `example_1.out` is:
+and the hit list portion of `CP00000.0_1.out` is:
 ```
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1ABC_A Uncharacterized protein  91.5   0.001   0.001   24.3   0.0   20   10-30    211-231 (260)
@@ -58,19 +58,19 @@ No Hit                             Prob E-value P-value  Score    SS Cols Query
 ```
 
 According to our criteria, hits 1 and 2 are matches (probability >= 90.0 and fragment length (from `Query_HMM`) >= 10).  
-So `example_1.match` file will contain sequences:
+So `CP00000.0_1.match` file will contain sequences:
 ```
->example_1_10-30
+>CP00000.0_1_10-30
 ---------EXAMPLEEXAMPLEEXAMPL-----------------------------------------
 ------------------------------
->example_1_33-88
+>CP00000.0_1_33-88
 ---------------------------------EXAMPLEEXAMPLEEXAMPLEEXAMPLEEXAMPLEEX
 AMPLEEXAMPLEEXAMPL------------
 ```
-and `example_1.non_match` will contain sequence:
+and `CP00000.0_1.non_match` will contain sequence:
 ```
->example_1_89-100
+>CP00000.0_1_89-100
 ----------------------------------------------------------------------
 ------------------EXAMPLEEXAMP
 ```
-Sub-sequences `example_1_1-9` and `example_1_31-33` will be dropped from subsequent analyses, as they did not match `minimum fragment length` criteria.
+Sub-sequences `CP00000.0_1_1-9` and `CP00000.0_1_31-33` will be dropped from subsequent analyses, as they did not match `minimum fragment length` criteria.
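
The splitting rule described in the README example can be sketched in a few lines of Python. This is a minimal illustration, not microprot's actual implementation: the function names `split_by_hits` and `fragment_name` are hypothetical, hits are assumed to be `(start, end, prob)` tuples with 1-based inclusive `Query HMM` coordinates, and both length checks are assumed to use `>=`. Only hit 1 (`10-30`, probability 91.5) and the range of hit 2 (`33-88`) come from the example; the probability of hit 2 and all of hit 3 are placeholder values, since those rows are not visible in this diff.

```python
def fragment_name(gene_id, start, end):
    """Build a name in the README's form: GenomeID_GeneID_ResiduesFrom-ResiduesTo."""
    return f"{gene_id}_{start}-{end}"


def split_by_hits(seq_len, hits, min_prob=90.0, min_fragment_length=10):
    """Partition residues 1..seq_len into match, non_match and dropped ranges.

    hits: iterable of (start, end, prob), 1-based inclusive query coordinates
    (assumed layout, not microprot's internal representation).
    """
    # A hit is a match if it is probable enough and long enough.
    matches = sorted(
        (start, end)
        for start, end, prob in hits
        if prob >= min_prob and end - start + 1 >= min_fragment_length
    )

    non_matches, dropped = [], []
    cursor = 1  # first residue not yet assigned to any range
    # The sentinel past the end of the sequence flushes the trailing gap.
    for start, end in matches + [(seq_len + 1, seq_len + 1)]:
        if start > cursor:
            gap = (cursor, start - 1)
            if gap[1] - gap[0] + 1 >= min_fragment_length:
                non_matches.append(gap)
            else:
                dropped.append(gap)  # too short to analyse further
        cursor = max(cursor, end + 1)
    return matches, non_matches, dropped


if __name__ == "__main__":
    gene = "CP00000.0_1"
    hits = [
        (10, 30, 91.5),  # hit 1, from the README example
        (33, 88, 90.5),  # hit 2: range from the example, probability assumed
        (90, 99, 45.0),  # hit 3: entirely hypothetical, below min_prob
    ]
    matches, non_matches, dropped = split_by_hits(100, hits)
    for kind, ranges in (("match", matches),
                         ("non_match", non_matches),
                         ("dropped", dropped)):
        for start, end in ranges:
            print(kind, fragment_name(gene, start, end))
```

Run on these inputs, the sketch assigns `10-30` and `33-88` to `match`, `89-100` to `non_match`, and drops `1-9`. It reports the second dropped gap as residues `31-32`, because the second match starts at residue 33.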