Transcript headers follow different formats #46

Closed
3 tasks done
schorlton opened this issue Aug 8, 2022 · 3 comments
Labels
enhancement New feature or request


@schorlton

Please report

  • version of RNA-Bloom with java -jar RNA-Bloom.jar -version
  • version of java with java -version
  • exact command used to run RNA-Bloom

I'm running RNA-Bloom indiscriminately on input files to see whether they assemble. I don't check the files beforehand because I want to leave it to RNA-Bloom to decide whether it can assemble anything. Interestingly, RNA-Bloom produces different FASTA header formats for different outputs.

Sometimes I get:
>3 l=228 c=1.1 s=8
Other times I get:
>s1

Note that these come from different inputs. Is it possible to output the same header format each time? In the latter format, should I assume coverage=1?
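A script consuming these files has to cope with both header forms. The sketch below is a minimal Python helper, assuming the fields are whitespace-separated `key=value` tokens and that `c` holds the coverage (the thread only implies this; the meaning of `s` is not stated). `parse_header` and `coverage` are hypothetical names, not part of RNA-Bloom. A missing coverage is returned as `None` rather than silently assumed to be 1.

```python
import re
from typing import Dict, Optional, Tuple

# Matches tokens such as "l=228" or "c=1.1" (assumed key=value layout)
FIELD_RE = re.compile(r"^([A-Za-z]+)=(\S+)$")


def parse_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTA header into its identifier and any key=value fields.

    ">3 l=228 c=1.1 s=8" yields fields; ">s1" yields an empty dict.
    """
    tokens = line.lstrip(">").split()
    if not tokens:
        raise ValueError("empty FASTA header")
    seq_id, fields = tokens[0], {}
    for token in tokens[1:]:
        match = FIELD_RE.match(token)
        if match:
            fields[match.group(1)] = match.group(2)
    return seq_id, fields


def coverage(fields: Dict[str, str]) -> Optional[float]:
    # Assumption: "c" is the coverage field; report None instead of guessing 1
    value = fields.get("c")
    return float(value) if value is not None else None
```

For example, `parse_header(">3 l=228 c=1.1 s=8")` gives `("3", {"l": "228", "c": "1.1", "s": "8"})`, while `parse_header(">s1")` gives `("s1", {})`, so the caller can tell "no coverage reported" apart from a real value.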

Thanks!!

RNA-Bloom v2.0.0

java --version
openjdk 17.0.3-internal 2022-04-19
OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)

Command:

rnabloom -outdir rnabloom_out -t 8 -long input.fastq -ntcard

Sample input read that reproduces the identifier-only header:

@read1
AATTTGGGTGTTTAACCAGTCATCGCCTACCGTGACTTCGGATTCATCGTGTTTCGTTTTCGTGCGCCGCTTCAACATGGGGCTAATCATTGCTTTCGTGCGCCATTCAACATGGAATAATCATTGCTTTTTCGTGCGCCGCTTCAACATGGGGGGCCACGCGCGCGTCCCCCGAAGGCGCGTAACGCTGTGGCGGCCTGCTT
+
%*'('((,./;:3,''%%&#$%(*$$&(*-30441004/*.1110)*.06{?;?<)57??@76341{9334?C9B@:999JA?;88<@::7610/--+224.,,'&&''-612105'&&,127<<820.-:::34475{;545-?8454;==??8877...F{{{{<//101/.*,/12{{1.'&&$$$$%$'('''$%&&&'

@kmnip
Collaborator

kmnip commented Aug 8, 2022

Hi @schorlton,

Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?
Or do you mean that different output FASTA files from the same assembly have different FASTA header formats?

If it is the latter, then it is actually intentional.

Ka Ming

kmnip self-assigned this Aug 8, 2022
kmnip added the question Further information is requested label Aug 8, 2022
@schorlton
Author

Are you seeing different FASTA header formats in the final output (i.e. rnabloom.transcripts.fa) of different assemblies?

Yes, this. Different input reads lead to differently formatted FASTA headers. Sorry that wasn't clear. I like the

 >3 l=228 c=1.1 s=8

header format because I use the coverage and length information. However, not all transcripts have this information in the header. For example, if you run RNA-Bloom on the example read above, you only get a FASTA header with a sequence identifier and no coverage or length information.
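Of the two values, the length can always be recovered from the sequence itself when the `l=` field is missing; the coverage cannot. A minimal Python sketch of that fallback, assuming `l` is the transcript length and `c` the coverage as used above; `read_fasta` and `summarize` are hypothetical helpers, not RNA-Bloom functionality, and a missing coverage is reported as `None`.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(text: str) -> List[Tuple[str, int, Optional[float]]]:
    """Return (id, length, coverage) for every record.

    Length falls back to len(sequence) when there is no "l=" field;
    coverage is None when there is no "c=" field (assumed meanings).
    """
    rows = []
    for header, seq in read_fasta(text):
        tokens = header[1:].split()
        fields = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
        length = int(fields["l"]) if "l" in fields else len(seq)
        cov = float(fields["c"]) if "c" in fields else None
        rows.append((tokens[0], length, cov))
    return rows
```

On a toy input such as `">3 l=4 c=1.1 s=8\nACGT\n>s1\nACG\nTT\n"`, `summarize` returns `[("3", 4, 1.1), ("s1", 5, None)]`: the second record gets its length from the joined sequence, and its coverage is left unknown rather than set to 1.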

@kmnip
Collaborator

kmnip commented Aug 9, 2022

Ah, OK. You see this header style in some assemblies but not in others because some assemblies may have ended at an earlier stage.

To resolve this issue, I will try to standardize the final output FASTA regardless of the assembly endpoint.

kmnip added the enhancement New feature or request label Aug 9, 2022
kmnip added the to be released (An update will be available in the next release) label and removed the question (Further information is requested) label Sep 23, 2022
kmnip closed this as completed Mar 17, 2023
kmnip removed the to be released (An update will be available in the next release) label Mar 17, 2023