Transcript headers follow different formats #46

schorlton · 2022-08-08T19:27:58Z

Please report

version of RNA-Bloom with java -jar RNA-Bloom.jar -version
version of java with java -version
exact command used to run RNA-Bloom

I'm running RNA-Bloom indiscriminately on input files to see whether they assemble. I don't check the files beforehand because I want to leave it to RNA-Bloom to decide whether it can assemble anything. Interestingly, RNA-Bloom produces different FASTA header formats for different outputs.

Sometimes I get:
>3 l=228 c=1.1 s=8
other times I get:
>s1

Note that these come from different inputs. Is it possible to output the same header format each time? In the latter format, is the coverage 1?

Thanks!!

RNA-Bloom v2.0.0

java --version
openjdk 17.0.3-internal 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

Command:

rnabloom -outdir rnabloom_out -t 8 -long input.fastq -ntcard

Sample input read to reproduce single-element header:

@read1
AATTTGGGTGTTTAACCAGTCATCGCCTACCGTGACTTCGGATTCATCGTGTTTCGTTTTCGTGCGCCGCTTCAACATGGGGCTAATCATTGCTTTCGTGCGCCATTCAACATGGAATAATCATTGCTTTTTCGTGCGCCGCTTCAACATGGGGGGCCACGCGCGCGTCCCCCGAAGGCGCGTAACGCTGTGGCGGCCTGCTT
+
%*'('((,./;:3,''%%&#$%(*$$&(*-30441004/*.1110)*.06{?;?<)57??@76341{9334?C9B@:999JA?;88<@::7610/--+224.,,'&&''-612105'&&,127<<820.-:::34475{;545-?8454;==??8877...F{{{{<//101/.*,/12{{1.'&&$$$$%$'('''$%&&&'

kmnip · 2022-08-08T20:34:34Z

Hi @schorlton,

Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?
Or do you mean that different output FASTA files from the same assembly have different FASTA header formats?

If it is the latter, then it is actually intentional.

Ka Ming

schorlton · 2022-08-09T01:21:34Z

Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?

Yes, this. Different reads used as input lead to differently formatted FASTA headers. Sorry that wasn't clear. I like the

 >3 l=228 c=1.1 s=8

header format, as I use the coverage and length information. However, not all transcripts have this information in the header, e.g. if you run RNA-Bloom on the example read above, you'll only get a FASTA header with a sequence identifier and no coverage or length information.
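For anyone reading the headers downstream in the meantime, here is a minimal sketch of a parser that accepts both forms shown in this thread. It assumes only that the extra fields are whitespace-separated `key=value` pairs after the identifier; `parse_header` and `coverage` are hypothetical helper names, not part of RNA-Bloom, and treating `c` as coverage follows the usage described above rather than any documented specification.

```python
import re
from typing import Dict, Optional, Tuple

# Matches whitespace-separated key=value tokens such as "l=228" or "c=1.1".
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTA header into (sequence id, {key: value} fields).

    Hypothetical helper: works for ">3 l=228 c=1.1 s=8" and for ">s1".
    """
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    parts = text.split(None, 1)
    seq_id = parts[0] if parts else ""
    fields = dict(_FIELD.findall(parts[1])) if len(parts) > 1 else {}
    return seq_id, fields


def coverage(line: str) -> Optional[float]:
    """Return the c= value as a float, or None when the header has no c= field."""
    _, fields = parse_header(line)
    value = fields.get("c")
    return float(value) if value is not None else None
```

Returning `None` for a missing field, instead of guessing a default such as 1, keeps the caller from mistaking "not reported" for a real coverage value.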

kmnip · 2022-08-09T23:30:12Z

Ah, ok. You see this header style in some outputs but not others because some assemblies may have ended at an earlier stage.

To resolve this issue, I will try to standardize the final output FASTA regardless of the assembly endpoint.
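Until a release with standardized headers is available, a user-side workaround is possible for the length field only, since length can be recomputed from the sequence itself. The sketch below is an assumption-laden illustration, not RNA-Bloom's fix: `add_missing_length` is a hypothetical function, it assumes the `l=` field is the sequence length in bases, and it deliberately does not invent `c=` or `s=` values, which cannot be recovered from the sequence.

```python
from typing import Iterable, List


def add_missing_length(lines: Iterable[str]) -> List[str]:
    """Return FASTA lines with l=<length> appended to headers that lack it.

    Hypothetical helper: headers that already carry an l= field are left as-is.
    """
    records = []  # each entry is [header, [sequence lines]]
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            records.append([line, []])
        elif records and line:
            records[-1][1].append(line)

    out: List[str] = []
    for header, seq_lines in records:
        # Only the tokens after the identifier can be key=value fields.
        has_length = any(tok.startswith("l=") for tok in header.split()[1:])
        if not has_length:
            header = f"{header} l={sum(len(s) for s in seq_lines)}"
        out.append(header)
        out.extend(seq_lines)
    return out
```

For example, passing the lines of a file whose only record is `>s1` followed by a 6-base sequence yields the header `>s1 l=6`, while a header such as `>3 l=228 c=1.1 s=8` passes through unchanged.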

schorlton mentioned this issue Aug 8, 2022

Feature Request: More verbose logging? #47

Closed

3 tasks

kmnip self-assigned this Aug 8, 2022

kmnip added the "question" label (Further information is requested) Aug 8, 2022

kmnip added the "enhancement" label (New feature or request) Aug 9, 2022

kmnip added the "to be released" label (An update will be available in the next release) and removed the "question" label (Further information is requested) Sep 23, 2022

kmnip closed this as completed Mar 17, 2023

kmnip removed the "to be released" label (An update will be available in the next release) Mar 17, 2023
