Generate reference index to use for bulk RNA-sequencing #59

Merged: 10 commits, Oct 27, 2021

allyhawkins

This PR, stacked on #58, adds the steps needed to create the index used for bulk RNA-seq mapping. I generate the reference fasta the same way we generated it for the splici index, except that I use only transcripts from spliced cDNA rather than spliced cDNA + introns. I then use that reference fasta to build a salmon index with decoys added, with the whole genome serving as the decoy sequences.
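For readers unfamiliar with decoy-aware indexing, here is a minimal Python sketch of the two inputs salmon expects when the whole genome is used as decoys: a `decoys.txt` listing the genome sequence names, and a combined "gentrome" fasta with the transcripts first and the genome after them. The function names and file names below are hypothetical and are not taken from this PR, which does this work inside the Nextflow workflow.

```python
from pathlib import Path


def read_fasta_names(fasta_path):
    """Return sequence names (header text up to the first whitespace), in file order."""
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:  # skip malformed headers that have no name
                    names.append(parts[0])
    return names


def build_decoy_inputs(transcripts_fa, genome_fa, out_dir):
    """Write decoys.txt and gentrome.fa (hypothetical file names) into out_dir.

    decoys.txt holds one genome sequence name per line.
    gentrome.fa is the transcript fasta followed by the genome fasta;
    the decoy sequences have to come after the transcripts.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    decoys_path = out_dir / "decoys.txt"
    gentrome_path = out_dir / "gentrome.fa"

    # Every sequence in the genome fasta is treated as a decoy.
    decoy_names = read_fasta_names(genome_fa)
    decoys_path.write_text("".join(f"{name}\n" for name in decoy_names))

    # Concatenate transcripts, then genome, making sure each line ends with a newline.
    with open(gentrome_path, "w") as out:
        for source in (transcripts_fa, genome_fa):
            with open(source) as handle:
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")

    return decoys_path, gentrome_path
```

The two files would then be passed to something like `salmon index -t gentrome.fa -d decoys.txt -i <index_dir>`; the exact options used in this workflow are in `build-index.nf`, not in this sketch.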

To do this, I modified our existing script for creating the splici reference fasta so that it also creates the spliced cDNA-only fasta, and renamed the file to reflect that it now preps both reference transcriptomes. I did this in one script rather than two because parsing the gtf and the whole primary genome sequence to annotate spliced and unspliced cDNA is time-consuming, and those steps only need to happen once before both fasta files are written, so I thought it was more efficient to keep everything in the same script.

I then modified the current workflow used to build the indices and added a step that creates the decoy salmon index from the reference fasta. I chose to keep just the two processes, one generating the fastas and one creating the salmon indices, and added the new step to the existing salmon index process rather than creating a third process. I could go either way on this, but it worked nicely because that process already takes as input all of the output from generating the two fastas.

Note that this is stacked on #58 because I started on that first, but this PR should probably be reviewed first, since fully testing #58 depends on generating the index.


@jashapiro jashapiro left a comment

This looks good!

My only real concern is that the annotation outputs may not be written out, because the annotation directory was removed as an output of the generate_fasta process. I would also rename that process to generate_references or generate_transcriptome, since it should make gtf files too.

I made some suggestions, based on something new I learned about DSL2 (emit statements), for separating the outputs we will use from the ones we won't.

build-index.nf: three inline review comments (outdated, resolved)
@allyhawkins allyhawkins changed the base branch from allyhawkins/add-bulk-workflow to main October 26, 2021 16:15

@jashapiro jashapiro left a comment

LGTM!

@allyhawkins allyhawkins merged commit d7462b7 into main Oct 27, 2021
@allyhawkins allyhawkins deleted the allyhawkins/bulk-index branch March 23, 2023 19:20