- Fixed a bug that caused failures in "absolute" reproducibility. Prior
to this release, use of the "-srand" flag did not guarantee that the outputs
consensi.fa.classified and families-classified.stk were exactly the same
in sequence and sequence order. It did guarantee that the same samples
were drawn from the genome, and that equivalent scoring families were
derived at each step. In this release secondary sorts were added
to guarantee a fixed sort order among equally scoring results, generating
exactly the same output files each time the same random number generator seed
is used. NOTE: This change only applies to results generated with
this version and future releases.
- Added "-long" option to faToTwoBit to support larger genomes.
- Improved the gathering of RepeatScout exemplars for building
- Parallelized and improved the masking between rounds for faster
runs and fewer rediscoveries.
- The 'pa' (parallel batches) option has been replaced with a new
'threads' option which maps directly to the maximum number of
threads the program will attempt to use.
- Takes advantage of RepeatMasker 4.1.4 and RMBlast 2.13.0 parallel
- Larger default sample sizes are now possible with speed improvements.
The original sampling strategy can be selected with the new 'quick'
- ABBlast is no longer supported.
- Fixed a few visual artifacts in the viewMSA.pl html
- The program now generates a logfile in the working directory
named -rmod.log. This file contains the random seed number used
and some high level stats on the run for use with reporting problems
with the program.
- Fixed a problem with the orientation/coordinates provided in the
Stockholm output format that affected a subset of the sequences.
- Fixed a bug affecting the trim functions of the Linup tool.
- First release of a set of manual curation tools for use with de-novo generated TE libraries.
- Added generateSeedAlignments.pl to generate Dfam compatible seed alignments given a consensus based TE library and RepeatMasker output.
- Fixed bug in N50 calculation.
- Fixed Ruzzo-Tompa maximal scoring subsequences implementation.
- Several minor bugfixes.
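For reference, the N50 statistic mentioned above can be sketched as follows. This is a minimal illustration of the standard definition (the length L such that sequences of length L or longer account for at least half of the total bases); it is not the code used in RepeatModeler, and the function name n50 is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition: walk the lengths from longest to shortest and
    return the first length at which the running total reaches at least
    half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Total is 290, half is 145; 80 + 70 = 150 reaches it, so N50 is 70.
    print(n50([80, 70, 50, 40, 30, 20]))
```

A common mistake in N50 code is comparing against the total rather than half of it, or sorting in ascending order; both give a different number on the same input.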
Work around a bug introduced in the new NCBI Blast 2.10.0 with version 5 databases.
RepeatScout in some rare cases will generate models for very long-period satellites. This can cause Refiner to create an excessive number of off-diagonal alignments. This version filters out these rare cases.
This version now prints out the version of each dependency in the log.
RepeatModeler employs a genome sampling approach that is based
on a random number generator. In this release of RepeatModeler
we print out the random number generator seed at the start of
a run. This number can be used with the "-srand ####" flag in future
runs to exactly reproduce the samples taken from a given database.
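The idea behind seed-based reproducibility can be sketched in a few lines. This is only an illustration under stated assumptions, not RepeatModeler's sampling code: the function names, the fixed window size, and the use of the family name as a tie-breaker are all hypothetical. It shows that a generator built from a known seed returns the same windows each time, and that sorting on a secondary key removes any dependence on input order when scores tie.

```python
import random


def draw_samples(genome_length, sample_size, count, seed):
    """Return `count` (start, end) windows chosen with a seeded generator.

    Hypothetical helper: windows are drawn independently and may overlap.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    last_start = genome_length - sample_size
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [(start, start + sample_size) for start in starts]


def order_families(families):
    """Sort by descending score, breaking ties on the family name.

    The name is an assumed secondary key; any unique, stable field would do.
    """
    return sorted(families, key=lambda fam: (-fam["score"], fam["name"]))


if __name__ == "__main__":
    first = draw_samples(1_000_000, 40_000, 3, seed=12345)
    second = draw_samples(1_000_000, 40_000, 3, seed=12345)
    print(first == second)  # same seed, same windows

    families = [
        {"name": "family-2", "score": 10},
        {"name": "family-1", "score": 10},
        {"name": "family-3", "score": 25},
    ]
    print([fam["name"] for fam in order_families(families)])
```

Without the secondary key, two families with the same score keep whatever relative order they arrived in, so the output can differ between runs even when the samples themselves are identical.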
The final output files are now placed in the same directory as
the input database.
An additional output file is now generated containing the seed
alignment for each discovered family. This alignment is the source
of the final consensus and is stored in a Dfam compatible Stockholm
file. The new output files are named <database_name>-families.fa and
Support for Dfam_consensus has been built into this release. Two
utilities dfamConsensusTool.pl and renameIds.pl can be found in the
RepeatModeler util/ directory. The dfamConsensusTool script enables
one to upload curated seed alignments to the open Dfam_consensus
database from the command line. The renameIds script simplifies
the process of coming up with unique identifiers for a set of
RepeatModeler generated families given a naming template.