Generate reference index to use for bulk RNA-sequencing #59

allyhawkins · 2021-10-20T21:28:21Z

This PR, stacked on #58, adds the steps needed to create the index used for mapping bulk RNA-seq data. I generate the reference fasta the same way we generated it for the splici index, except that I use only the spliced cDNA transcripts rather than spliced cDNA + introns. I then build a salmon index from that reference fasta with decoys added, using the whole genome as the decoy sequence.
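
As a rough illustration of the decoy setup described above, here is a minimal sketch of how the two inputs for a decoy-aware salmon index could be prepared: a list of decoy names taken from the genome FASTA headers, and a combined file with the transcripts first and the genome after them. The function name, file names, and the assumption that both FASTA files are uncompressed are mine; the PR's actual commands are not shown in this conversation.

```python
from pathlib import Path


def prepare_decoy_inputs(transcripts_fa, genome_fa, out_dir):
    """Write decoys.txt and gentrome.fa (hypothetical names) into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    decoys_txt = out_dir / "decoys.txt"
    gentrome_fa = out_dir / "gentrome.fa"

    # Decoy names are the genome sequence IDs: the first whitespace-delimited
    # token of each FASTA header, without the leading ">".
    with open(genome_fa) as genome, open(decoys_txt, "w") as decoys:
        for line in genome:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    decoys.write(tokens[0] + "\n")

    # Transcripts go first, followed by the genome (decoy) sequences.
    # Every line is re-terminated so a missing final newline in the first
    # file cannot glue a sequence line onto the next header.
    with open(gentrome_fa, "w") as out:
        for path in (transcripts_fa, genome_fa):
            with open(path) as src:
                for line in src:
                    out.write(line.rstrip("\n") + "\n")

    return decoys_txt, gentrome_fa
```

With those two files in hand, the index itself would typically be built with something like `salmon index -t gentrome.fa -d decoys.txt -i <index_dir>`, where the index directory name is up to the workflow.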

To do this, I modified our existing script for creating the splici reference fasta so that it also creates the spliced cDNA-only fasta, and renamed the file to reflect that it now prepares both reference transcriptomes. I kept this in one script rather than two because parsing the gtf and the whole primary genome sequence to annotate spliced and unspliced cDNA is time-consuming, and those steps only need to happen once before writing both fasta files.
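
The "annotate once, write twice" reasoning above can be sketched as follows. This is not the script from the PR (which is not shown here); the `Feature` record, the `annotate_features` stand-in, and the output keys are all hypothetical and only illustrate why a single script avoids repeating the slow parsing step.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str      # hypothetical sequence ID
    kind: str      # "spliced" or "intron"
    sequence: str


def annotate_features():
    """Stand-in for the slow step: parsing the gtf and the genome sequence."""
    return [
        Feature("tx1", "spliced", "ACGTACGT"),
        Feature("tx1-I", "intron", "GGGTTT"),
        Feature("tx2", "spliced", "TTTTCCCC"),
    ]


def to_fasta(features):
    """Render features as FASTA text, one header and one sequence line each."""
    return "".join(f">{f.name}\n{f.sequence}\n" for f in features)


def build_references(features):
    """Derive both references from one set of annotated features."""
    spliced_only = [f for f in features if f.kind == "spliced"]
    return {
        "spliced_intron": to_fasta(features),    # splici-style reference
        "spliced_cdna": to_fasta(spliced_only),  # bulk reference
    }


# The expensive annotation runs once; both outputs reuse its result.
references = build_references(annotate_features())
```

Splitting this into two scripts would mean calling the equivalent of `annotate_features` twice, which is the cost the single-script design avoids.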

I then modified the existing workflow for building the indices, adding a step that creates the decoy salmon index from the reference fasta. I kept the workflow as two processes, one that generates the fasta files and one that creates the salmon indices, and added the new step to the existing salmon index process rather than creating a third process. I could go either way on this, but this arrangement works nicely because the index process takes as input all of the output from generating the two fastas.

Note that this is stacked on #58 because I started on that first, but this PR should probably be reviewed first, since fully testing #58 depends on generating the index.

jashapiro

This looks good!

My only real concern is that the annotation outputs may not be written out, since the annotation directory was removed as an output of the generate_fasta process (which I would rename generate_references or generate_transcriptome, since it should make gtf files too).

I made some suggestions, based on something new I learned about DSL2 (emit statements), for separating the outputs we will use from those we won't.

jashapiro

LGTM!

allyhawkins added 5 commits October 19, 2021 16:30

generate spliced reference fasta

556f0d0

update process to include decoy index generation

0c01eba

add steps to generate spliced cdna reference and fix typos

921a1db

change process name

299a942

Merge branch 'allyhawkins/add-bulk-workflow' into allyhawkins/bulk-index

c26f117

allyhawkins requested a review from jashapiro October 21, 2021 00:04

jashapiro reviewed Oct 21, 2021

build-index.nf: 2 comment threads (outdated, resolved)

allyhawkins and others added 3 commits October 21, 2021 10:33

Apply suggestions from code review

992f19a

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

Merge branch 'allyhawkins/add-bulk-workflow' into allyhawkins/bulk-index

959c283

change process name

ff5740b

jashapiro reviewed Oct 23, 2021

build-index.nf: 1 comment thread (outdated, resolved)

allyhawkins and others added 2 commits October 25, 2021 10:19

apply suggestions from code review

1db846b

Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>

Merge remote-tracking branch 'origin/main' into allyhawkins/bulk-index

356cee9

allyhawkins changed the base branch from allyhawkins/add-bulk-workflow to main October 26, 2021 16:15

allyhawkins requested a review from jashapiro October 26, 2021 16:15

jashapiro approved these changes Oct 27, 2021

allyhawkins merged commit d7462b7 into main Oct 27, 2021

jashapiro mentioned this pull request Dec 14, 2021

Prepare for new workflow release (0.1.3) #70

Closed

allyhawkins deleted the allyhawkins/bulk-index branch March 23, 2023 19:20
