Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
ifiddes committed Sep 5, 2018
1 parent e2126d8 commit 41e0e7a
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,9 @@ If you are using IsoSeq data, it is recommended that you doing your mapping with

CAT relies on a proper GFF3 file from the reference. One very important part of this GFF3 file is the `biotype` tag, which follows the GENCODE/Ensembl convention. The concept of a `protein_coding` biotype is hard baked into the pipeline. Proper division of biotypes is very important for transMap filtering and consensus finding to work properly.

If your GFF3 has duplicate transcript names, the pipeline will complain. One common cause of this is PAR locus genes. You will want to remove PAR genes -- If your GFF3 came from GENCODE, you should be able to do this: `grep -v PAR_Y $gff > $gff.fixed`
If your GFF3 has duplicate transcript names, the pipeline will complain. One common cause of this is PAR locus genes. You will want to remove PAR genes -- If your GFF3 came from GENCODE, you should be able to do this: `grep -v PAR_Y $gff > $gff.fixed`.

To ensure that your GFF3 is valid and won't cause any problems, there is a script in the `programs` folder that will parse and validate your GFF3. Please run this script before running the pipeline!

# Execution modes

Expand Down

0 comments on commit 41e0e7a

Please sign in to comment.