[CANNOLI-29] Add minimal GMAP and GSNAP wrappers. #32

Closed
heuermh wants to merge 2 commits into bigdatagenomics:master from heuermh:gmap-gsnap

heuermh commented May 2, 2017

Fixes #29.

@heuermh heuermh changed the title Add minimal GMAP and GSNAP wrappers. [CANNOLI-29] Add minimal GMAP and GSNAP wrappers. May 2, 2017

coveralls commented May 2, 2017

Coverage Status

Coverage decreased (-4.3%) to 25.379% when pulling 2e8d35f on heuermh:gmap-gsnap into 82ef700 on bigdatagenomics:master.

AmplabJenkins commented May 2, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/17/
Test PASSed.

fnothaft left a comment

Two small nits! Let's chat tomorrow about getting "feature complete" across all the aligners (e.g., docker support, etc).

implicit val tFormatter = InterleavedFASTQInFormatter
implicit val uFormatter = new AnySAMOutFormatter

val gmapCommand = "gmap --dir= " + args.genomePath + " --db=" + args.genomeName + " --format=sampe"

fnothaft May 3, 2017

I'd make the path to gmap parametrizable.

implicit val tFormatter = InterleavedFASTQInFormatter
implicit val uFormatter = new AnySAMOutFormatter

val gsnapCommand = "gsnap --dir= " + args.genomePath + " --db=" + args.genomeName + " --format=sam"

fnothaft May 3, 2017

I'd make the path to gsnap parametrizable.
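
One way to read this suggestion, as a minimal sketch rather than the code that was actually committed: take the executable as an argument and assemble the command from a sequence of tokens. The `AlignerArgs` case class and its `executablePath` field are hypothetical names introduced for illustration; only the `--dir`, `--db`, and `--format` flags come from the snippets above. Joining tokens with `mkString(" ")` also avoids the stray space after `--dir=` in the string-concatenated version.

```scala
object AlignerCommands {
  // Hypothetical argument holder; the field names are illustrative and
  // are not the PR's actual Args class.
  final case class AlignerArgs(
    executablePath: String, // e.g. "gmap", "gsnap", or an absolute path
    genomePath: String,
    genomeName: String,
    format: String)

  // Builds the command line by joining the tokens with single spaces.
  def buildCommand(args: AlignerArgs): String =
    Seq(
      args.executablePath,
      "--dir=" + args.genomePath,
      "--db=" + args.genomeName,
      "--format=" + args.format
    ).mkString(" ")
}
```

With this shape, the same builder serves both wrappers: only the executable path and the `--format` value differ between GMAP and GSNAP.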

heuermh commented May 3, 2017

> Let's chat tomorrow about getting "feature complete" across all the aligners (e.g., docker support, etc).

+1

coveralls commented May 12, 2017

Coverage Status

Coverage decreased (-4.5%) to 25.188% when pulling de68025 on heuermh:gmap-gsnap into 82ef700 on bigdatagenomics:master.

AmplabJenkins commented May 12, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/22/
Test PASSed.

"--format=sampe").mkString(" ")

val output: AlignmentRecordRDD = input.pipe[AlignmentRecord, AlignmentRecordRDD, InterleavedFASTQInFormatter](gmapCommand)
.transform(_.cache())

fnothaft May 15, 2017

Why are we caching here, if we're saving direct to disk?

"--format=sam").mkString(" ")

val output: AlignmentRecordRDD = input.pipe[AlignmentRecord, AlignmentRecordRDD, InterleavedFASTQInFormatter](gsnapCommand)
.transform(_.cache())

fnothaft May 15, 2017

Ditto here. Why the cache?

heuermh commented Jul 1, 2017

This one is ready to go. Will need docker support and index mapping as described in #34.

coveralls commented Jul 1, 2017

Coverage Status

Coverage decreased (-4.3%) to 25.379% when pulling ec41460 on heuermh:gmap-gsnap into 82ef700 on bigdatagenomics:master.

AmplabJenkins commented Jul 1, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/28/
Test PASSed.

@heuermh heuermh force-pushed the heuermh:gmap-gsnap branch from ec41460 to ef7b858 Jul 5, 2017

heuermh commented Jul 5, 2017

Rebased.

coveralls commented Jul 5, 2017

Coverage Status

Coverage decreased (-2.4%) to 33.0% when pulling ef7b858 on heuermh:gmap-gsnap into dccc4a1 on bigdatagenomics:master.

AmplabJenkins commented Jul 5, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/38/
Test PASSed.

@heuermh heuermh force-pushed the heuermh:gmap-gsnap branch from ef7b858 to dd667b0 Jul 17, 2017

coveralls commented Jul 17, 2017

Coverage Status

Coverage decreased (-2.2%) to 30.241% when pulling dd667b0 on heuermh:gmap-gsnap into b026a94 on bigdatagenomics:master.

AmplabJenkins commented Jul 17, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/cannoli-prb/49/
Test PASSed.

fnothaft left a comment

Do these two tools accept interleaved FASTQ on stdin by default?

heuermh commented Jul 26, 2017

> Do these two tools accept interleaved FASTQ on stdin by default?

I should've read more closely: they accept FASTQ on stdin by default, but assume it contains single-end reads. From the GSNAP README:

Input to GSNAP should be either in FASTQ or FASTA format.  The FASTQ
input may include quality scores, which will then be included in SAM
output, if that output format is selected.  For single-end reads, the
FASTQ file may be piped into GSNAP, or given as its command-line
argument, like this

    cat <fastq_file> | gsnap -d <genome>

or

    gsnap -d <genome> <fastq_file>


For paired-end reads, the two corresponding FASTQ files should be
given as command-line arguments in pairs, like this

    gsnap -d <genome> <fastq_file_1> <fastq_file_2> [<fastq_file_3> <fastq_file_4>...]

A pipe cannot work since GSNAP needs to access both FASTQ files in
parallel.  The reads in FASTQ files may have varying lengths, if
desired.  Note that GSNAP can process multiple sets of paired-end
reads, by adding the files in pairs.  If you want to provide multiple
single-end files, you can either use "cat" to concatenate them into
the stdin of gsnap, like this:

    cat <fastq_file_1> [<fastq_file_2>...] | gsnap -d <genome>

or you can provide them all on the command line with the
--force-single-end flag, like this:

    gsnap -d <genome> --force-single-end <fastq_file_1> [<fastq_file_2>...]

which will process each FASTQ file one at a time as single-end reads,
and not try to pair them up.

fnothaft commented Jul 26, 2017

@heuermh thoughts on a path forward?

heuermh commented Jul 26, 2017

The project doesn't have an issue tracker, so I've sent an email to the author. The only other approach I can think of is splitting the interleaved reads into separate named pipes in a wrapper script, unless you have a better idea.
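
To make the splitting step concrete, here is a rough sketch of de-interleaving under stated assumptions; it is not code from this PR. It assumes strict four-line FASTQ records with the two mates of each pair adjacent in the input, and the `DeinterleaveFastq` object and its `split` method are hypothetical names. Creating the named pipes and passing them to `gsnap` as the two paired-end file arguments is left out.

```scala
object DeinterleaveFastq {
  // Splits interleaved FASTQ lines into (first-of-pair, second-of-pair) lines.
  // Assumes strict four-line records, with each pair stored as eight
  // consecutive lines: the first mate followed by the second mate.
  def split(lines: Seq[String]): (Seq[String], Seq[String]) = {
    require(lines.size % 8 == 0, "expected a whole number of read pairs (8 lines each)")
    val pairs = lines.grouped(8).toVector
    val first = pairs.flatMap(_.take(4))  // lines 1-4 of each pair
    val second = pairs.flatMap(_.drop(4)) // lines 5-8 of each pair
    (first, second)
  }
}
```

A wrapper along these lines would write each half to its own pipe so that GSNAP can read both in parallel, which is the requirement the README excerpt above gives for paired-end input.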

heuermh commented Mar 22, 2018

Closing as WontFix, no reply from author.

@heuermh heuermh closed this Mar 22, 2018

@heuermh heuermh added this to the 0.2.0 milestone Mar 22, 2018

@heuermh heuermh deleted the heuermh:gmap-gsnap branch Mar 22, 2018
